Variant calling
Document by Tongyoo P.
SSH client
File transfer
Calculate how much sequencing you need to hit a target depth of coverage (or vice versa)
What is SRA? http://www.ncbi.nlm.nih.gov/sra/
Using basic unix command for downloading file from remote server
Remainder of path:
/sra/sra-instant/reads/ByRun/sra/{SRR|ERR|DRR}/<first 6 characters of accession>/
Example data set: SRR5714433 https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR5714433
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR571/SRR5714433/SRR5714433.sra
Using sratoolkit function
prefetch -v SRR5714433
Convert sra to fastq: sra-toolkit https://github.com/ncbi/sra-tools/wiki/HowTo:-Binary-Installation
fastq-dump --split-3 SRR5714433.sra
Fastqc: ckeck quality http://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc
fastqc read.fastq
sickle - A windowed adaptive trimming tool for FASTQ files using quality https://github.com/najoshi/sickle
sickle pe/se
[-t ] : solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)
[-l ] : default 20
[-q ] : default 20
sickle pe -t sanger -f read1.fastq -r read2.fastq -o qtrim1.fastq -p qtrim2.fastq - s qtrim.unpaired.fastq
sickle se -t sanger -f input_file.fastq -o trimmed_output_file.fastq
Trimmomatic: A flexible read trimming tool for Illumina NGS data http://www.usadellab.org/cms/?page=trimmomatic
java -jar /share/apps/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
This will perform the following:
Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10) Remove leading low quality or N bases (below quality 3) (LEADING:3) Remove trailing low quality or N bases (below quality 3) (TRAILING:3) Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15) Drop reads below the 36 bases long (MINLEN:36)
BWA: Generate bam file (binary NGS mapping file) samtools pipeline: Using bam files for SNP and indel detection
cd
mkdir vcf
cd vcf
pwd
Make a link to fastq and reference fasta
ln -s /home/pumie/bam/contigs_nc.fasta .
ln -s /home/pumie/bam/ch11_read1.fq.gz .
ln -s /home/pumie/bam/ch11_read2.fq.gz .
ls -lh
Building index for reference sequence
seqkit seq -w 70 contigs_nc.fasta > contigs_nc70.fasta
mv contigs_nc.fasta contigs_nc.fasta_ori
mv contigs_nc70.fasta contigs_nc.fasta
bwa index -p idx -a bwtsw contigs_nc.fasta
align forward reads to reference sequence
bwa aln -t 4 idx ch11_read1.fq.gz > ch11f.bwa
align reverse reads to reference sequence
bwa aln -t 4 idx ch11_read2.fq.gz > ch11r.bwa
Combine forward and reverse read. Create SAM file
bwa sampe idx ch11f.bwa ch11r.bwa ch11_read1.fq.gz ch11_read2.fq.gz > ch11.sam
Manipulate reference sequence to equally line length
Convert sam to bam and SNP calling process using samtools Building index for reference sequence
samtools faidx contigs_nc.fasta
samtools import contigs_nc.fasta ch11.sam ch11.bam
samtools sort ch11.bam -o ch11.srt.bam -@ 4
samtools index ch11.srt.bam
samtools tview ch11.srt.bam contigs_nc.fasta
samtools mpileup -uf contigs_nc.fasta ch11.srt.bam | bcftools call -mv --threads 4 -> ch11snp.vcf
cp ch11snp.vcf ch11.srt.bam ch11.srt.bam.bai ../public_html/Tomato
cd ../public_html/Tomato/
ls -lha
Manipulate vcf for viewing in JBrowse
bgzip ch11snp.vcf
tabix -p vcf ch11snp.vcf.gz
cp /home/pumie/bam/tracks.conf TomatoV1/
http://www.breedserve.cab.kps.ku.ac.th/~anno/JBrowse/?data=../Tomato/TomatoV1