MERCYRI hff92474 64位https://the./~sgtatham/putty/latest/w64/putty.exe Winscp: https://jaist.dl./project/winscp/WinSCP/5.13.4/WinSCP-5.13.4-Setup.exe 預處理原始數據 解壓: gunzip newBGIseq500_*.fq.gz gunzip chr17.vcf.gz fastq轉fasta 命令如下:sh fq-fa.sh fq-fa.sh awk '{if(NR%4 == 1){print ">" substr($0, 2)}}{if(NR%4 == 2){print}}' newBGIseq500_1.fq > newBGIseq500_1.fa awk '{if(NR%4 == 1){print ">" substr($0, 2)}}{if(NR%4 == 2){print}}' newBGIseq500_2.fq > newBGIseq500_2.fa 一、 1.成對reads數是1599999,能覆蓋。 reads計算方法:CL100024405L1C001為newBGIseq500_1.fq和newBGIseq500_2.fq中reads特有的特征,故通過grep -c 'CL100024405L1C001' *.fq 來計算PE reads數。 reads命令如下: sh reads.sh reads.sh grep -c 'CL100024405L1C001' newBGIseq500_1.fq grep -c 'CL100024405L1C001' newBGIseq500_2.fq 覆蓋方法及命令如下: Rscript coverage.R coverage.R 實際測序堿基數目<-1599999*100*2 需要覆蓋堿基數目<-6*1e6*40*0.99 實際測序堿基數目>需要覆蓋堿基數目 2.下載安裝soapdenovo軟件, 網址:https://github.com/aquaskyline/SOAPdenovo2 unzip SOAPdenovo2-master.zip cd SOAPdenovo2-master pwd /home/stu38/SOAPdenovo2-master vi ~/.bashrc source ~/.bashrc make 建立配置文件lib.cfg vi lib.cfg max_rd_len=100 [LIB] avg_ins=450 reverse_seq=0 asm_flags=3 rank=1 pair_num_cutoff=3 map_len=32 q1=/home/data/newBGIseq500_1.fq q2=/home/data/newBGIseq500_2.fq f1=/home/data/newBGIseq500_1.fa f2=/home/data/newBGIseq500_2.fa 運行soapdenovo,得到組裝結果ant.scafseq和ant.scafStatistics等 ./SOAPdenovo-127mer all -s lib.cfg -K 35 -D 1 -o ant >>2ant.log 計算組裝結果序列長度 命令如下:perl changdu.pl ant.scafseq > length.txt changdu.pl #!usr/bin/perl use warnings; use strict; my (@arr, $changdu, %hash, $key); open (FH, "$ARGV[0]"); #$ARGV[0]指的是腳本運行后跟的第一個參數 while(<FH>){ chomp; if (/^>/){ $key=$_; } $hash{$key}.=$_ unless /^>/; } foreach (keys %hash){ $changdu=length($hash{$_}); print "$_\t$changdu\t$hash{$_}\n"; } 或者 awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0" ":$0 }' ant.scafseq |awk '{print $1"\t"length($3)}' > length.txt 長度序列挑取及排序 命令如下:sh sort-length.sh sort-length.sh cut -f2 length.txt |sort -nr >sort-length.txt 長度累積曲線圖,詳見length.pdf 計算方法及命令如下:Rscript length.R pdf("length.pdf") dat<-read.table("/home/exam01/sort-length.txt") d<-cumsum(dat)/sum(dat) plot(d$V1,type="l",main="length distribution",xlab="length",ylab="frequency",col="red") dev.off() q() 3.組裝結果的n50是687 計算方法及命令如下: perl n50.pl ant.scafSeq n50.pl #/usr/bin/perl -w use strict; my ($len,$total)=(0,0); my @x; while(<>){ if(/^[\>\@]/){ if($len>0){ $total+=$len; push @x,$len; } $len=0; } else{ s/\s//g; $len+=length($_); } } if ($len>0){ $total+=$len; push @x,$len; } @x=sort{$b<=>$a} @x; my ($count,$half)=(0,0); for (my $j=0;$j<@x;$j++){ $count+=$x[$j]; if (($count>=$total/2)&&($half==0)){ print "N50: $x[$j]\n"; $half=$x[$j] }elsif ($count>=$total*0.9){ print "N90: $x[$j]\n"; exit; } } 二、 1.vcf中變異位點的qual值分布請查看文件qual.txt,qual分布圖詳見qual.pdf 計算方法及命令如下: 下載安裝vcftools 網址:https://jaist.dl./project/vcftools/vcftools_0.1.13.tar.gz tar zxvf vcftools_0.1.13.tar.gz pwd /home/soft/vcftools_0.1.13 vi ~/.bashrc source ~/.bashrccd vcftools_0.1.13 make 統(tǒng)計qual值分布,命令如下: vcftools --vcf chr17.vcf --site-quality --out Q 圖計算方法及命令如下:Rscript qual.R pdf("qual.pdf") dat<-read.table("/home/exam01/Q.1qual") hist(dat[,2],main="qual distribution",xlab="qual value",ylab="frequency") dev.off() q() 2.TP53基因變異信息請查看文件tp53.vcf 提取tp53基因變異情況,命令如下: vcftools --vcf chr17.vcf --chr chr17 --from-bp 7668402 --to-bp 7687550 --recode -c > tp53.vcf 計算變異位點總數目是41,命令如下:sh var.sh var.sh grep -v '^#' tp53.vcf |wc -l 計算各樣本tp53變異位點情況,未變異數目=變異總數目-純合和雜合變異數目,結果是27DMBDM4YT純合9個,雜合8個,未變異24個,7XKZJA3JWX純合6個,雜合33個,未變異2個,APRDKV0LDS純合8個,雜合8個,未變異25個。 純合計算方法及命令如下:sh hom-var1.sh hom-var1.sh less -S tp53.vcf | cut -f 10 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l hom-var2.sh less -S tp53.vcf | cut -f 11 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l hom-var3.sh less -S tp53.vcf | cut -f 12 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l 雜合計算方法及命令如下:sh het-var1.sh hom-var1.sh less -S tp53.vcf | cut -f 10 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l hom-var2.sh less -S tp53.vcf | cut -f 11 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l hom-var3.sh less -S tp53.vcf | cut -f 12 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l 三、 1.kmer圖詳見17mer.pdf,方法如下: 下載安裝gce 網址:ftp://ftp.genomics.org.cn/pub/gce tar -zxvf gce-1.0.0.tar.gz cd gce-1.0.0/kmerfreq/kmer_freq_hash pwd /home/soft/gce-1.0.0/ kmerfreq/kmer_freq_hash vi ~/.bashrc source ~/.bashrc 建立fq.list,命令如下:ls /home/data/*.fq > fq.list 運行kmer分析,得到output.freq.stat,進行后續(xù)R畫圖 命令如下:kmer_freq_hash -k 17 -i 450000000 -l fq.list 2>kmerfreq.log 圖計算方法及命令如下:Rscript 17mer.R pdf("17mer.pdf") dat<-read.table("/home/exam01/output.freq.stat") plot(dat[,2],type="l",main="17mer distribution",xlab="17mer depth",ylab="frequency",xlim=c(0,200),ylim=c(0,1e6)) dev.off() q() 2.基因組大小是6.4344e+06bp 計算方法及命令如下:less -S output.freq.stat | awk '{sum += $1*$2} END {print sum/41}' 四、 1.下載安裝bzip、bwa、samtools、bcftools bzip網址:https:///download/bzip2 bwa網址:https://jaist.dl./project/bio-bwa/bwa-0.7.17.tar.bz2 samtools網址:https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2 bcftools網址:https://github.com/samtools/bcftools/releases/download/1.9/bcftools-1.9.tar.bz2 bzip: tar -zxvf bzip2-1.0.6.tar.gz cd bzip2-1.0.6 pwd /home/soft/bzip2-1.0.6 vi ~/.bashrc source ~/.bashrc make bwa: tar jxvf bwa-0.7.17.tar.bz2 cd bwa pwd /home/soft/bwa-0.7.17 vi ~/.bashrc source ~/.bashrc make bcftools: tar jxvf bcftools-1.9.tar.bz2 cd bwa pwd /home/soft/bcftools-1.9 vi ~/.bashrc source ~/.bashrc make samtools: tar jxvf samtools-1.9.tar.bz2 cd bwa pwd /home/soft/samtools-1.9 vi ~/.bashrc source ~/.bashrc ./congfigure --without-curses make 2.bwa mem比對 bwa index *.fa bwa mem *.fa newBGIseq500_1.fq >mem.sam bwa aln ref.fa newBGIseq500_1.fq >read1.sai bwa aln ref.fa newBGIseq500_2.fq >read2.sai bwa sampe ref.fa read1.sai read2.sai newBGIseq500_1.fq newBGIseq500_2.fq >aln.sam 3.sam轉bam,sort samtools view -Sb mem.sam > mem.bam samtools sort mem.bam >mem.sort 4.call變異 Samtools mpileup -ugf ref.fa mem.sort |bcftools call –vm –O z –o test.vcf |
|