Implementing Genomic Selection and SNP Analysis in ASReml-SA

Breeding Data Analysis 2021-11-18

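Editor's note:

asreml is very powerful software, and because it is so powerful, many people do not know how to use it. Genomic selection in breeding builds on the conventional pedigree-based animal model, and animal models can themselves be very complex. A look at the asreml manual, which runs to more than 300 pages, makes this clear. As far as I know, its thickness can be expressed by this formula:

This tells us one thing: Professor Arthur Gilmour (the author of asreml) is a very patient and very capable statistician. He spent the better part of his life coding his hard work into this software, and I admire him for it.

This tutorial covers asreml applied to genomic selection and molecular breeding. What follows are my reading notes.

A friend said that our circle is already small, and if we still do not learn how to share and communicate, what will become of our discipline? That is also why I cannot stop. Nietzsche said: a surplus of strength is the proof of strength. He made moonlighting sound so justified that my resolve to remain a "slash youth" to the end grew even firmer. Enough chatter; the contents follow.

Contents:

Introduction

The main goal of this document is to show how genomic analyses are implemented in ASReml. It assumes that readers have some statistical background, so statistics and models are not covered in detail.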

1, Single-marker analysis

Example data:

ID,effect,SNP_1,SNP_100,SNP_1000,SNP_101,SNP_102,SNP_103,SNP_104,SNP_105,SNP_106,SNP_107,SNP_108,SNP_109,SNP_11,SNP_110,SNP_111,SNP_112,SNP_113,SNP_114
ID_1,-0.259731957336183,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ID_10,0.117554666740654,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
ID_100,0.00357380737732867,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ID_101,0.344906212015101,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0
ID_102,0.376403712779367,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1
ID_103,0.131676984710817,0,0,0,0,1,1,0,1,1,1,0,0,0,0,0,0,0,0
ID_104,0.41299708896122,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
ID_105,0.353890056009646,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ID_106,0.237438809186312,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ID_107,-0.316455302927825,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
ID_108,-0.235784805404543,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ID_109,0.0783501427411017,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
ID_11,0.0919863476998604,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0

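The first column is ID, the observation is effect, and the third and later columns are named after the SNPs.

Fit each marker in turn as a fixed factor, cycling over the markers: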

!cycle SNP_1 SNP_100 SNP_1000 SNP_101 SNP_102 SNP_103 SNP_104 SNP_105 SNP_106 SNP_107 SNP_108 SNP_109 SNP_11 SNP_110 SNP_111 SNP_112 SNP_113 SNP_114

dd.csv !SKIP 1

effect ~ mu $I

The significance of each SNP can be read from the .asr file. This is a single-marker analysis of variance.

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 651.0 0.83 0.363
14 SNP_109 2 651.0 5.20 0.006
Finished: 19 Oct 2018 17:04:23.666 LogL Converged
Folder: D:\spline\snp-asreml
Cycle 13 value is SNP_11
Reading dd.csv FREE FORMAT skipping 1 lines

Univariate analysis of effect
Summary of 654 records retained of 654 read
Warning: Fewer levels found in SNP_1 than specified
Warning: Fewer levels found in SNP_101 than specified
Warning: Fewer levels found in SNP_104 than specified
Warning: Fewer levels found in SNP_11 than specified
Warning: Fewer levels found in SNP_112 than specified
Forming 3 equations: 3 dense.
Initial updates will be shrunk by factor 0.316
Notice: 1 singularities detected in design matrix.
1 LogL= 603.924 S2= 0.56887E-01 652 df
2 LogL= 603.924 S2= 0.56887E-01 652 df

- - - Results from analysis of effect - - -
LogL: 603.92 0.568871E-01 652 2 SNP_11 "LogL Converged"
Akaike Information Criterion -1205.85 (assuming 1 parameters).
Bayesian Information Criterion -1201.37

Model_Term Gamma Sigma Sigma/SE % C
Residual SCA_V 654 1.00000 0.568871E-01 18.06 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 652.0 0.82 0.366
15 SNP_11 1 652.0 1.25 0.264
Finished: 19 Oct 2018 17:04:24.058 LogL Converged
Folder: D:\spline\snp-asreml
Cycle 14 value is SNP_110
Reading dd.csv FREE FORMAT skipping 1 lines

Univariate analysis of effect
Summary of 654 records retained of 654 read
Warning: Fewer levels found in SNP_1 than specified
Warning: Fewer levels found in SNP_101 than specified
Warning: Fewer levels found in SNP_104 than specified
Warning: Fewer levels found in SNP_11 than specified
Warning: Fewer levels found in SNP_112 than specified
Forming 3 equations: 3 dense.
Initial updates will be shrunk by factor 0.316
1 LogL= 601.263 S2= 0.56936E-01 651 df
2 LogL= 601.263 S2= 0.56936E-01 651 df

- - - Results from analysis of effect - - -
LogL: 601.26 0.569356E-01 651 2 SNP_110 "LogL Converged"
Akaike Information Criterion -1200.53 (assuming 1 parameters).
Bayesian Information Criterion -1196.05

Model_Term Gamma Sigma Sigma/SE % C
Residual SCA_V 654 1.00000 0.569356E-01 18.04 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 651.0 0.82 0.366
16 SNP_110 2 651.0 0.85 0.429
Finished: 19 Oct 2018 17:04:24.499 LogL Converged
Folder: D:\spline\snp-asreml
Cycle 15 value is SNP_111
Reading dd.csv FREE FORMAT skipping 1 lines

Univariate analysis of effect
Summary of 654 records retained of 654 read
Warning: Fewer levels found in SNP_1 than specified
Warning: Fewer levels found in SNP_101 than specified
Warning: Fewer levels found in SNP_104 than specified
Warning: Fewer levels found in SNP_11 than specified
Warning: Fewer levels found in SNP_112 than specified
Forming 3 equations: 3 dense.
Initial updates will be shrunk by factor 0.316
1 LogL= 600.791 S2= 0.57054E-01 651 df
2 LogL= 600.791 S2= 0.57054E-01 651 df

- - - Results from analysis of effect - - -
LogL: 600.79 0.570539E-01 651 2 SNP_111 "LogL Converged"
Local CYCLE LogL Peak at CYCLE: 12 SNP_109 LogL: 605.70 Deviance: 12.35
Akaike Information Criterion -1199.58 (assuming 1 parameters).
Bayesian Information Criterion -1195.10

Model_Term Gamma Sigma Sigma/SE % C
Residual SCA_V 654 1.00000 0.570539E-01 18.04 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 651.0 0.81 0.367
17 SNP_111 2 651.0 0.17 0.843
Finished: 19 Oct 2018 17:04:24.962 LogL Converged
Folder: D:\spline\snp-asreml
Cycle 16 value is SNP_112
Reading dd.csv FREE FORMAT skipping 1 lines

Univariate analysis of effect
Summary of 654 records retained of 654 read
Warning: Fewer levels found in SNP_1 than specified
Warning: Fewer levels found in SNP_101 than specified
Warning: Fewer levels found in SNP_104 than specified
Warning: Fewer levels found in SNP_11 than specified
Warning: Fewer levels found in SNP_112 than specified
Forming 3 equations: 3 dense.
Initial updates will be shrunk by factor 0.316
Notice: 1 singularities detected in design matrix.
1 LogL= 602.714 S2= 0.56989E-01 652 df
2 LogL= 602.714 S2= 0.56989E-01 652 df

- - - Results from analysis of effect - - -
LogL: 602.71 0.569893E-01 652 2 SNP_112 "LogL Converged"
Akaike Information Criterion -1203.43 (assuming 1 parameters).
Bayesian Information Criterion -1198.95

Model_Term Gamma Sigma Sigma/SE % C
Residual SCA_V 654 1.00000 0.569893E-01 18.06 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 652.0 0.82 0.367
18 SNP_112 1 652.0 0.08 0.776
Finished: 19 Oct 2018 17:04:25.435 LogL Converged
Folder: D:\spline\snp-asreml
Cycle 17 value is SNP_113
Reading dd.csv FREE FORMAT skipping 1 lines

Univariate analysis of effect
Summary of 654 records retained of 654 read
Warning: Fewer levels found in SNP_1 than specified
Warning: Fewer levels found in SNP_101 than specified
Warning: Fewer levels found in SNP_104 than specified
Warning: Fewer levels found in SNP_11 than specified
Warning: Fewer levels found in SNP_112 than specified
Forming 3 equations: 3 dense.
Initial updates will be shrunk by factor 0.316
1 LogL= 601.723 S2= 0.57001E-01 651 df
2 LogL= 601.723 S2= 0.57001E-01 651 df

- - - Results from analysis of effect - - -
LogL: 601.72 0.570011E-01 651 2 SNP_113 "LogL Converged"
Akaike Information Criterion -1201.45 (assuming 1 parameters).
Bayesian Information Criterion -1196.97

Model_Term Gamma Sigma Sigma/SE % C
Residual SCA_V 654 1.00000 0.570011E-01 18.04 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 651.0 0.82 0.367
19 SNP_113 2 651.0 0.47 0.623
Finished: 19 Oct 2018 17:04:25.904 LogL Converged
Folder: D:\spline\snp-asreml
Cycle 18 value is SNP_114
Reading dd.csv FREE FORMAT skipping 1 lines

Univariate analysis of effect
Summary of 654 records retained of 654 read
Warning: Fewer levels found in SNP_1 than specified
Warning: Fewer levels found in SNP_101 than specified
Warning: Fewer levels found in SNP_104 than specified
Warning: Fewer levels found in SNP_11 than specified
Warning: Fewer levels found in SNP_112 than specified
Forming 3 equations: 3 dense.
Initial updates will be shrunk by factor 0.316
1 LogL= 606.497 S2= 0.56038E-01 651 df
2 LogL= 606.497 S2= 0.56038E-01 651 df

- - - Results from analysis of effect - - -
LogL: 606.50 0.560380E-01 651 2 SNP_114 "LogL Converged"
Local CYCLE LogL Peak at CYCLE: 18 SNP_114 LogL: 606.50 Deviance: 13.94
Akaike Information Criterion -1210.99 (assuming 1 parameters).
Bayesian Information Criterion -1206.51

Model_Term Gamma Sigma Sigma/SE % C
Residual SCA_V 654 1.00000 0.560380E-01 18.04 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
21 mu 1 651.0 0.83 0.363
20 SNP_114 2 651.0 6.08 0.002
Best LogL 606.50 0.560380E-01 651 2 SNP_114 LogL Converged
Finished: 19 Oct 2018 17:04:26.403 LogL Converged

The results show that term 20 (SNP_114, F = 6.08, P = 0.002) is highly significant and term 14 (SNP_109, F = 5.20, P = 0.006) is significant.
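For a run with many cycles, the Wald rows can be pulled out of the .asr text programmatically instead of being read by eye. The following is a minimal Python sketch, assuming the rows have the layout printed above (term number, term name, NumDF, DenDF, F-inc, P-inc); the function name `significant_terms` and the 0.05 threshold are illustrative choices, not part of asreml.

```python
import re

# Matches data rows of the Wald F table as printed above, e.g.
# "14 SNP_109 2 651.0 5.20 0.006"
# (term number, term name, NumDF, DenDF, F-inc, P-inc).
WALD_ROW = re.compile(
    r"^\s*\d+\s+(\S+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

def significant_terms(asr_text, alpha=0.05):
    """Return (term, F-inc, P-inc) for non-mu rows with P-inc < alpha, sorted by P."""
    hits = []
    for line in asr_text.splitlines():
        m = WALD_ROW.match(line)
        if m is None or m.group(1) == "mu":
            continue  # not a Wald row, or the overall mean
        f_inc, p_inc = float(m.group(4)), float(m.group(5))
        if p_inc < alpha:
            hits.append((m.group(1), f_inc, p_inc))
    return sorted(hits, key=lambda t: t[2])

# Rows copied from the output above.
sample = """\
21 mu 1 651.0 0.83 0.363
14 SNP_109 2 651.0 5.20 0.006
15 SNP_11 1 652.0 1.25 0.264
20 SNP_114 2 651.0 6.08 0.002
"""
print(significant_terms(sample))
```

With hundreds of markers, a multiple-testing correction would normally be applied to these P-values before calling a marker significant.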

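We can also fit the marker as a random factor and judge the model by its log-likelihood. If the model is better than the null model (likelihood ratio test, LRT), the marker effect is significant.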

!cycle SNP_1 SNP_100 SNP_1000 SNP_101 SNP_102 SNP_103 SNP_104 SNP_105 SNP_106 SNP_107 SNP_108 SNP_109 SNP_11 SNP_110 SNP_111 SNP_112 SNP_113 SNP_114
dd.csv !SKIP 1

effect ~ mu !r $I

Results:

LogL: LogL Residual NEDF NIT Cycle Text
LogL: 607.75 0.564653E-01 653 6 SNP_1 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_100 "LogL Converged"
LogL: 606.11 0.569091E-01 653 7 SNP_1000 "LogL Converged"
LogL: 606.37 0.567870E-01 653 4 SNP_101 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_102 "LogL Converged"
LogL: 606.21 0.568392E-01 653 5 SNP_103 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_104 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_105 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_106 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_107 "LogL Converged"
LogL: 606.57 0.567311E-01 653 4 SNP_108 "LogL Converged"
LogL: 609.22 0.561598E-01 653 3 SNP_109 "LogL Converged"
LogL: 606.12 0.568872E-01 653 5 SNP_11 "LogL Converged"
Local CYCLE LogL Peak at CYCLE: 12 SNP_109 LogL: 609.22 Deviance: 6.22
LogL: 606.16 0.568635E-01 653 4 SNP_110 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_111 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_112 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 SNP_113 "LogL Converged"
LogL: 608.14 0.560577E-01 653 8 SNP_114 "LogL Converged"
Local CYCLE LogL Peak at CYCLE: 18 SNP_114 LogL: 608.14 Deviance: 4.06

The result is the same: the Local CYCLE peaks are at SNP_109 (deviance 6.22) and SNP_114 (deviance 4.06), indicating that these two SNP loci reach significance.
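The deviances reported at the peaks can be reproduced from the log-likelihoods and turned into p-values. The sketch below assumes the null model `effect ~ mu` has LogL = 606.105 (the value noted in a comment in the multi-marker job file later in this document) and uses the 50:50 mixture of chi-square(0) and chi-square(1) that is customary when testing a single variance component on its boundary. That mixture is an assumption of this sketch; the original only says "LRT".

```python
import math

def lrt_variance_component(logl_full, logl_null):
    """Likelihood ratio test for adding one variance component.

    Returns (deviance, p_value). The p-value halves the chi-square(1) tail
    because the variance is tested on the boundary of its parameter space.
    """
    deviance = 2.0 * (logl_full - logl_null)
    if deviance <= 0:
        return deviance, 1.0
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = 0.5 * math.erfc(math.sqrt(deviance / 2.0))
    return deviance, p_value

LOGL_NULL = 606.105  # effect ~ mu, taken from the job-file comment

for snp, logl in [("SNP_109", 609.22), ("SNP_114", 608.14)]:
    dev, p = lrt_variance_component(logl, LOGL_NULL)
    print(f"{snp}: deviance = {dev:.2f}, p = {p:.4f}")
```

The computed deviances agree with the 6.22 and 4.06 printed by asreml up to rounding of the displayed LogL values.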

An alternative form handles the case of many markers: instead of naming every marker in !cycle, declare them with !G N, where N is the number of markers. The drawback of this approach is that the SNP marker names are lost.

ID !A # ID_101
effect # 0.344906212015101
Marks !G 18
# !cycle SNP_1 SNP_100 SNP_1000 SNP_101 SNP_102 SNP_103 SNP_104 SNP_105 SNP_106 SNP_107 SNP_108 SNP_109 SNP_11 SNP_110 SNP_111 SNP_112 SNP_113 SNP_114
dd.csv !SKIP 1

!cycle 1:18
effect ~ mu !r Marks[$I]

Results:

LogL: LogL Residual NEDF NIT Cycle Text
LogL: 607.75 0.564653E-01 653 6 1 "LogL Converged"
LogL: 606.10 0.569091E-01 653 6 2 "LogL Converged"
LogL: 606.11 0.569091E-01 653 7 3 "LogL Converged"
LogL: 606.37 0.567870E-01 653 4 4 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 5 "LogL Converged"
LogL: 606.39 0.567814E-01 653 4 6 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 7 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 8 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 9 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 10 "LogL Converged"
LogL: 606.53 0.567416E-01 653 4 11 "LogL Converged"
LogL: 607.88 0.564391E-01 653 5 12 "LogL Converged"
LogL: 606.12 0.568872E-01 653 5 13 "LogL Converged"
Local CYCLE LogL Peak at CYCLE: 12 12 LogL: 607.88 Deviance: 3.55
LogL: 606.11 0.569077E-01 653 5 14 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 15 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 16 "LogL Converged"
LogL: 606.11 0.569091E-01 653 6 17 "LogL Converged"
LogL: 607.67 0.564839E-01 653 5 18 "LogL Converged"
Local CYCLE LogL Peak at CYCLE: 18 18 LogL: 607.67 Deviance: 3.12

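Inspect the BLUP values in the .sln file and sort them in Excel; two markers stand out with relatively large effects:
If a map position is available for each marker, the results can be plotted.

2, Multi-marker analysis

As the name suggests, all the Marks are fitted together in one analysis.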

ID !A # ID_101
effect # 0.344906212015101
Marks !G 18
# !cycle SNP_1 SNP_100 SNP_1000 SNP_101 SNP_102 SNP_103 SNP_104 SNP_105 SNP_106 SNP_107 SNP_108 SNP_109 SNP_11 SNP_110 SNP_111 SNP_112 SNP_113 SNP_114
dd.csv !SKIP 1

# !cycle 1:18
# effect ~ mu !r Marks[$I]

# effect ~ mu # LogL= 606.105
effect ~ mu !r Marks

Results:

8 LogL= 607.362 S2= 0.55772E-01 653 df 0.1377E-01
Final parameter values 0.1378E-01

- - - Results from analysis of effect - - -
Akaike Information Criterion -1210.72 (assuming 2 parameters).
Bayesian Information Criterion -1201.76

Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
Marks 17.30 0.965402E-01 53.0 1.0
Residual Variance 635.70 0.557723E-01 0.0 1.0

Model_Term Gamma Sigma Sigma/SE % C
Marks IDV_V 18 0.137842E-01 0.768776E-03 1.24 0 P
Residual SCA_V 654 1.00000 0.557723E-01 17.83 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
4 mu 1 73.6 1.52 0.222
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using algebraic derivatives.

Solution Standard Error T-value T-prev
4 mu
1 -0.167809E-01 0.136271E-01 -1.23
3 Marks 18 effects fitted

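The null model's LogL is 606.105 and the Marks model's is 607.362, a slight improvement.

Inspect the BLUP values in the .sln file.

3, Genomic selection

Theory

GBLUP is based on the formula:

M is an n*m matrix, where n is the number of individuals and m the number of markers, and g holds the BLUP value of each marker. As the number of markers grows, the case m >> n arises and the algorithm has to be adjusted. The form in common use today is: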
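For reference, here is a sketch of the standard formulation. It is an assumption and may differ in notation from the formulas the text refers to. It uses M, g, n and m as defined above; b, X, e, u and the scaling constant c are symbols introduced only for this sketch.

```latex
% Marker-effect model: y is the n-vector of phenotypes, Xb the fixed effects,
% M the n x m marker matrix and g the vector of m marker effects.
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{M}\mathbf{g} + \mathbf{e},
\qquad \mathbf{g} \sim N(\mathbf{0}, \mathbf{I}_m \sigma_g^2),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}_n \sigma_e^2)
\]

% Equivalent individual-level (GBLUP) form with u = Mg, which only needs
% an n x n matrix and is therefore preferred when m >> n.
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{G}\sigma_u^2),
\qquad \mathbf{G} = \frac{\mathbf{M}\mathbf{M}^{\top}}{c},
\qquad \sigma_u^2 = c\,\sigma_g^2
\]
```

The two forms give the same variance for the genetic values, since Var(Mg) = MM'σ_g² = Gσ_u².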

If the G matrix has already been computed, asreml can be used to estimate GBLUP with the following code:

!work 12 !ARG 1
QTL ANALYSIS
id !P
SEX !A
AGE !A
HEIGHT !M -9999

ibdgrm.ped !mark !alpha
ibdgrm.grm !ND !dense
ibdgrm.dat

HEIGHT ~ mu SEX !R nrm(id) grm1(id)
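  • The grm file holds the lower triangle of a dense matrix

  • The fixed factors are age and sex (the model line shown above fits only SEX)

  • The random factors are the additive effect and the genomic random effect

  • When estimating GBLUP, asreml also reports the marker effects; they are written to the .mef file.
    For a related R package, see wgaim

The next section introduces an extension of GS: Fast Bayes A.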

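4, Other methods for genomic selection

The EM BayesA-like method was developed with reference to Sun et al. (2012).

The marker matrix is usually coded as 0 1 2:

  • 0 is the major-allele homozygote, e.g. AA

  • 1 is the heterozygote, e.g. Aa

  • 2 is the minor-allele homozygote, e.g. aa

The matrix is constructed with the formula: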
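One widely used way to build a genomic relationship matrix from 0/1/2 codes is VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), where Z is the marker matrix centred by twice the allele frequency. The Python sketch below assumes that construction, which is not necessarily the formula the text refers to. The function name `vanraden_g` and the toy genotypes are made up for illustration.

```python
def vanraden_g(markers):
    """Genomic relationship matrix from 0/1/2 marker codes.

    markers: list of rows (one per individual), each a list of 0/1/2 counts
    of the coded allele. Assumes VanRaden's first method.
    """
    n, m = len(markers), len(markers[0])
    # Frequency of the counted allele at each SNP: mean genotype / 2.
    p = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    # Centre each marker column by 2p.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in markers]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]

# Toy data: 3 individuals x 3 SNPs (hypothetical).
M = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1]]
G = vanraden_g(M)
for row in G:
    print(["%.3f" % v for v in row])
```

Because the columns are centred, every row of G sums to zero, which is a quick sanity check on the result.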

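Specific parameters:

Bayes A assumes that the trait is controlled by major-effect QTL, with a small number of QTL explaining most of the variation, rather than assuming, as GBLUP does, that every marker has the same variance (normally distributed effects).

Fast Bayes A:

Implementing the Bayes B method in asreml:

Marker file format:

  • The file is named *.mkr

  • The first column is the genotype ID

  • The first row holds the SNP IDs

  • The mkr file must not contain missing values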
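The format rules above can be checked before handing the file to asreml. The Python sketch below makes several assumptions not stated in the original: the file is whitespace-delimited, genotypes use the 0/1/2 coding described earlier, and the header row may or may not carry a label above the ID column. The function name `read_marker_table` is hypothetical.

```python
def read_marker_table(text):
    """Parse a marker table: header row of SNP IDs, then one row per individual.

    Each data row is an ID followed by 0/1/2 codes. Raises ValueError on
    anything else, since the file must not contain missing values.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("need a header row and at least one individual")
    snp_ids, rows = lines[0], lines[1:]
    # If the header has a label above the ID column, drop it.
    if len(snp_ids) == len(rows[0]):
        snp_ids = snp_ids[1:]
    genotypes = {}
    for fields in rows:
        ind, codes = fields[0], fields[1:]
        if len(codes) != len(snp_ids):
            raise ValueError(
                f"{ind}: expected {len(snp_ids)} markers, got {len(codes)}"
            )
        bad = [c for c in codes if c not in ("0", "1", "2")]
        if bad:
            raise ValueError(f"{ind}: invalid or missing genotype {bad[0]!r}")
        genotypes[ind] = [int(c) for c in codes]
    return snp_ids, genotypes

example = """\
SNP_1 SNP_2 SNP_3
ID_1 0 1 2
ID_2 2 0 1
"""
snps, geno = read_marker_table(example)
print(snps, geno)
```

Rejecting the file outright on a missing code mirrors the rule that the mkr file may not contain missing values. Imputation, if needed, would have to happen before this step.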

Qualifiers for the marker file. They only take effect when placed on the same line as the marker file name:
filename.mkr

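  • !markers m # number of markers (may be omitted)

  • !IDS n # number of individuals (may be omitted)

  • !FBA k # determines whether asreml uses GBLUP (if omitted, GBLUP is used with a common marker variance, k=0). In Fast BayesA, k is the parameter of the inverse chi-square distribution assumed for the marker variances. If !FBA is used, the default is k=4. In general k should be greater than 3 and less than 20. When !FBA is present, asreml by default uses !EXTRA 5 to read the mef file and take its contents as starting values.

  • !FBB p # p is a percentage that sets the proportion of markers whose variance component is 0 (so the corresponding marker effects are also 0); this is how BayesB can be specified

  • !HEADER 0 # the marker file has no header row

  • !SKIP c # number of rows to skip

  • !CSKIP # number of columns to skip; use !SKIP -1 to indicate that the first column holds a SNP rather than an ID

    The following qualifiers are rarely used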

  • !OFFSET o

  • !CENTER

  • !SAVEGIV g

  • !PENALTY d

  • !DFOFFSET t

  • !MSCALE s

  • !PEV

Weighted G matrix

Standard GBLUP command

!workspace 1
title: standard GBLUP model
ID *
phenotype
genotype.mrk !markers 10031 !IDS 3226 # the marker file has 10031 SNPs and 3225 IDs
phenotype.txt !skip 1 !maxit 50 !gdense # use a dense matrix

phenotype ~ mu !r grm1(ID)
residual units

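Notes on the results

  • The GBLUP values of the genotyped individuals are in the .sln file

  • If the phenotype file has 1000 IDs and the marker file has 1500 IDs, the sln file will also have 1500 entries; the other 500 are GBLUP predictions (these individuals have no phenotype, so their GBLUP values are predicted from genotype alone)

  • The marker effects are in the .mef file; if !PEV follows the marker file name, the .mef file also contains standard errors

Fast Bayes A command
Often we are interested in markers with large effects, such as QTL. GBLUP estimates, however, are shrunken estimators: a QTL's effect is absorbed by the surrounding markers, which makes large-effect markers hard to detect.
The Bayes A model can identify a small number of large-effect markers, and the Fast Bayes-A-like method used here behaves similarly. For some traits, Fast Bayes-A predicts better than GBLUP.

Adjusting the diagonal D

Standard Fast-BayesA command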

!workspace 1
title: Fast-BayesA model
ID !A
phenotype
genotype.mrk !markers 10031 !IDS 3226 !FBA 4.2 # the marker file has 10031 SNPs and 3225 IDs; !FBA is set to 4.2
phenotype.txt !skip 1 !maxit 50

phenotype ~ mu !r grm1(ID) 0.808 !GF # the gamma for Vg is set to 0.808 and the variance component is held fixed
residual units

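Notes on the results

  • .mef contains the marker effects and the weights

  • .res contains the significant SNPs

Comparison of different k values, with Vg fixed versus estimated

Conclusions:

  • Results are better when k is around 4

  • Whether Vg is fixed or estimated makes little difference; it is estimated by default

5, Notes on using asreml

  • Only one GRM file can be used; if there are several, convert them to giv files

  • In Fast Bayes models only one GRM can be used; for any others, use giv

  • The order of the IDs must match the ID order in G; it is advisable to extract the IDs of G separately and declare them with !L

  • !PEV reports standard errors for the markers, but the results are unreliable

The GBLUP values of the genotypes are in .sln, the marker effects are in .mef, the marker weights are in .mef, and the large-effect markers are in the .res file.

6, Accounting for significant GWAS and QTL loci in asreml genomic selection

If large-effect SNPs have already been identified, they can be put into the model so that it uses the GWAS and QTL information and predicts more accurately.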

snp(ID, 954) snp(ID,4480)

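These terms can be fitted as either fixed or random factors.

Postscript

In GS, multi-trait GS models perform better than single-trait GS. asreml has many powerful functions to draw on, and the future looks promising.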
