java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -R reference.fasta -T VariantAnnotator -V input.vcf -o output.vcf --resource:foo resource.vcf --expression foo.AF --expression foo.FILTER |
Annotate variant calls with context information |
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T GCContentByInterval -R reference.fasta -o output.txt -L input.intervals |
Calculates the GC content of the reference sequence for each interval |
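The Python sketch below roughly illustrates a per-interval GC calculation. It computes the GC fraction of the reference slice for each 1-based, inclusive interval (the same convention as `chr1:257-275`). It assumes the reference is already loaded into a dict that maps contig name to sequence. It also leaves ambiguous bases such as N out of the denominator; that is a choice made for this sketch, not documented GATK behavior. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other codes such as N are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no informative bases, e.g. an all-N gap
    return (s.count("G") + s.count("C")) / acgt


def gc_by_interval(
    reference: Dict[str, str], intervals: Iterable[Tuple[str, int, int]]
) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (contig, start, end, gc) for 1-based, inclusive intervals."""
    for contig, start, end in intervals:
        # Convert 1-based inclusive coordinates to a Python slice.
        yield contig, start, end, gc_content(reference[contig][start - 1:end])


if __name__ == "__main__":
    ref = {"chr1": "ACGTNNGGCC"}  # toy reference (hypothetical)
    for row in gc_by_interval(ref, [("chr1", 1, 4), ("chr1", 5, 10)]):
        print(row)  # ('chr1', 1, 4, 0.5) then ('chr1', 5, 10, 1.0)
```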
vt |
vt decompose_blocksub -a calls.vcf | vt normalize -r FASTA_FILE - > calls.clean.vcf |
For comparison purposes, it is very useful to normalize the VCF output, especially for complex graphs, which can produce large variant blocks containing many reference bases (note: requires [vt](http://genome.sph.umich.edu/wiki/Vt)). |
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T ValidateVariants -R reference.fasta -V input.vcf --dbsnp dbsnp.vcf |
Validate a VCF file with an extra strict set of criteria |
read_NVC.py |
read_NVC.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output |
This module checks for nucleotide composition bias. Because of random priming, certain patterns are overrepresented at the beginning (5' end) of reads. This bias can be examined with an NVC (nucleotide versus cycle) plot, which is generated by overlaying all reads and calculating the nucleotide composition at each read position (that is, each sequencing cycle). Under ideal conditions (the genome is random and RNA-seq reads are randomly sampled from it), we expect A% = C% = G% = T% = 25% at every position. |
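The NVC computation amounts to a per-position base tally. Below is a minimal Python version. It assumes reads are plain strings already in sequencing orientation, so position 1 is the 5' end. It is not RSeQC's implementation, and it does not cover how read_NVC.py handles reverse-strand alignments or clipped bases. With unbiased data, each of the A, C, G and T percentages should approach 25%.

```python
from collections import Counter
from typing import Dict, Iterable, List


def nucleotide_vs_cycle(reads: Iterable[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G, T and N at each read position (cycle), over all reads."""
    per_cycle: List[Counter] = []  # per_cycle[i] counts the bases seen at position i
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(per_cycle):
                per_cycle.append(Counter())
            per_cycle[i][base] += 1
    table = []
    for counts in per_cycle:
        total = sum(counts.values())  # number of reads long enough to reach this cycle
        table.append({b: 100.0 * counts[b] / total for b in "ACGTN"})
    return table


if __name__ == "__main__":
    reads = ["ACGT", "AGCT", "ATGC", "CAGT"]  # toy reads (hypothetical)
    for cycle, pct in enumerate(nucleotide_vs_cycle(reads), start=1):
        print(cycle, pct)
```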
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T SimulateReadsForVariants -R reference.fasta -V input_variants.vcf -o simulated_reads.bam --readDepth 50 --errorRate 25 |
Generate simulated reads for variants |
GEMINI autosomal recessive |
gemini autosomal_recessive test.auto_rec.db --columns "chrom,start,end,gene" |
Find variants meeting an autosomal recessive model |
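The genotype logic behind an autosomal recessive search can be framed as a per-variant check with three rules. Every affected sample is homozygous for the alternate allele, and no unaffected sample is. In addition, any unaffected, genotyped parent of an affected child is a heterozygous carrier. The Python below is a simplified illustration of that model, not GEMINI's implementation, which works on its own database and genotype encodings with its own options. The genotype strings and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def is_autosome(chrom: str) -> bool:
    """True for chr1..chr22-style names; sex chromosomes and mitochondria are excluded."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name not in {"X", "Y", "M", "MT"}


def fits_autosomal_recessive(
    chrom: str,
    genotypes: Dict[str, str],            # sample -> "hom_ref" | "het" | "hom_alt" (hypothetical labels)
    affected: Set[str],
    parents: Dict[str, Tuple[str, str]],  # child -> (father, mother)
) -> bool:
    if not is_autosome(chrom):
        return False
    for sample, gt in genotypes.items():
        # Affected samples must be homozygous alternate; unaffected samples must not be.
        if (sample in affected) != (gt == "hom_alt"):
            return False
    for child in affected:
        for parent in parents.get(child, ()):
            # An unaffected, genotyped parent of an affected child should be an obligate carrier.
            if parent in genotypes and parent not in affected and genotypes[parent] != "het":
                return False
    return True


if __name__ == "__main__":
    trio = {"kid": "hom_alt", "dad": "het", "mom": "het"}
    print(fits_autosomal_recessive("chr2", trio, {"kid"}, {"kid": ("dad", "mom")}))  # True
```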
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants -R reference.fasta --variant input.vcf -o output.vcf --reference_window_stop 208 |
Left-align indels in a variant callset |
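Left-alignment and trimming can be expressed as a short loop. First, repeatedly drop a final base shared by REF and ALT. Whenever an allele becomes empty, re-anchor both alleles on the preceding reference base. Finally, strip shared leading bases. The Python sketch below applies this widely used normalization procedure to a single biallelic record; it is the same idea behind `vt normalize`, listed earlier. It illustrates the technique under those assumptions and is not GATK's code. It does not handle multiallelic records or symbolic alleles.

```python
def left_align_and_trim(reference: str, pos: int, ref: str, alt: str):
    """Left-align and trim a biallelic variant.

    reference: the contig sequence; pos: 1-based VCF position of `ref`.
    Returns the normalized (pos, ref, alt).
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    if reference[pos - 1:pos - 1 + len(ref)].upper() != ref:
        raise ValueError("REF does not match the reference sequence")
    changed = True
    while changed:
        changed = False
        # Right-trim: drop a final base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele is now empty, prepend the preceding reference base to both alleles.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = reference[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True
    # Left-trim: drop leading shared bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    ref_seq = "GCACACAT"  # toy contig with a CA repeat
    # Deletion of one "CA" unit, written at its rightmost possible position:
    print(left_align_and_trim(ref_seq, 5, "ACA", "A"))  # (1, 'GCA', 'G')
```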
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T VariantsToBinaryPed -R reference.fasta -V variants.vcf -m metadata.fam -bed output.bed -bim output.bim -fam output.fam |
Convert VCF to binary pedigree file |
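For context on the output format, PLINK 1's binary .bed file starts with three magic bytes: 0x6C 0x1B 0x01, where the last byte indicates variant-major order. For each variant, ceil(n/4) bytes follow. Each byte packs 2-bit genotype codes for up to four samples, with the first sample in the lowest bits. The Python sketch below packs one variant's genotypes that way. The genotype labels and function name are hypothetical. The converter decides which VCF allele becomes A1, so no REF/ALT mapping is assumed here.

```python
from typing import List

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header; 0x01 = variant-major

# 2-bit genotype codes; A1/A2 are the first/second allele columns in the .bim file.
CODE = {"hom_a1": 0b00, "missing": 0b01, "het": 0b10, "hom_a2": 0b11}


def pack_variant(genotypes: List[str]) -> bytes:
    """Pack one variant's per-sample genotypes (in .fam order) into .bed bytes."""
    out = bytearray()
    for i in range(0, len(genotypes), 4):
        byte = 0
        for j, gt in enumerate(genotypes[i:i + 4]):
            byte |= CODE[gt] << (2 * j)  # first sample of the group goes in the lowest bits
        out.append(byte)  # a short final group leaves its high bits as 0 (padding)
    return bytes(out)


if __name__ == "__main__":
    body = pack_variant(["hom_a1", "het", "missing", "hom_a2", "het"])
    print((BED_MAGIC + body).hex())  # 6c1b01d802
```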
VarScan |
java -jar VarScan.jar compare [file1] [file2] [type] [output] OPTIONS |
Performs set-comparison operations on two files of variants. |
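Conceptually, the comparison types documented for VarScan's `[type]` argument (merge, intersect, unique1, unique2) are ordinary set operations. The Python sketch below models them on variant keys. Keying each record by chromosome and position is an assumption made for illustration. The sketch also ignores how VarScan actually parses, matches, and writes file contents. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Key = Tuple[str, int]  # (chromosome, position): an assumed matching key


def compare(file1: Iterable[Key], file2: Iterable[Key], kind: str) -> List[Key]:
    a, b = set(file1), set(file2)
    results = {
        "merge": a | b,      # variants in either file
        "intersect": a & b,  # variants in both files
        "unique1": a - b,    # variants only in file1
        "unique2": b - a,    # variants only in file2
    }
    if kind not in results:
        raise ValueError(f"unknown comparison type: {kind}")
    return sorted(results[kind])


if __name__ == "__main__":
    f1 = [("chr1", 100), ("chr1", 200), ("chr2", 50)]
    f2 = [("chr1", 200), ("chr2", 75)]
    for kind in ("merge", "intersect", "unique1", "unique2"):
        print(kind, compare(f1, f2, kind))
```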
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T CountMales -R reference.fasta -I samples.bam -o output.txt |
Count the number of reads seen from male samples |
VarScan |
java -jar VarScan.jar filter [variants file] OPTIONS |
Filter variants in a file by coverage, supporting reads, variant frequency, or average base quality. It is for use with output from pileup2snp or pileup2indel. |
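The four criteria above (coverage, supporting reads, variant frequency, and average base quality) can be combined into a single predicate over per-site read counts. In the Python below, the record fields and threshold parameters are hypothetical stand-ins, not VarScan's option names or defaults. Variant frequency is computed as variant-supporting reads divided by total (reference plus variant) reads.

```python
from dataclasses import dataclass


@dataclass
class Site:
    chrom: str
    pos: int
    ref_reads: int       # reads supporting the reference allele
    var_reads: int       # reads supporting the variant allele
    var_avg_qual: float  # average base quality of the variant-supporting bases


def passes_filter(site: Site, min_coverage: int, min_var_reads: int,
                  min_var_freq: float, min_avg_qual: float) -> bool:
    coverage = site.ref_reads + site.var_reads
    if coverage == 0 or coverage < min_coverage:
        return False
    if site.var_reads < min_var_reads:
        return False
    if site.var_reads / coverage < min_var_freq:
        return False
    return site.var_avg_qual >= min_avg_qual


if __name__ == "__main__":
    sites = [Site("chr1", 100, 18, 12, 30.0), Site("chr1", 250, 40, 1, 35.0)]
    kept = [s for s in sites if passes_filter(s, 10, 2, 0.2, 20.0)]
    print([(s.chrom, s.pos) for s in kept])  # [('chr1', 100)]
```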
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R myreference.fasta -before recal2.table -after recal3.table -plots recalQC.pdf |
Create plots to visualize base recalibration results |
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T VariantsToAllelicPrimitives -R reference.fasta -V input.vcf -o output.vcf |
Simplify multi-nucleotide variants (MNPs) into more basic/primitive alleles. |
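For an equal-length substitution, splitting into primitives means emitting one SNP for each position where REF and ALT differ, at the record's start plus that offset. For example, ACCCA to TCCCG yields A to T at the first base and A to G at the fifth. The Python sketch below does this for a single record. It deliberately rejects indels and is only an illustration, not GATK's implementation; the function name is hypothetical.

```python
from typing import List, Tuple


def mnp_to_primitives(pos: int, ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Split an equal-length substitution at 1-based `pos` into (pos, ref, alt) SNPs."""
    if len(ref) != len(alt):
        raise ValueError("only equal-length substitutions (SNPs/MNPs) are split here")
    # Keep only the offsets where the alleles actually differ.
    return [(pos + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


if __name__ == "__main__":
    print(mnp_to_primitives(1000, "ACCCA", "TCCCG"))  # [(1000, 'A', 'T'), (1004, 'A', 'G')]
```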
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R reference.fasta -I myinput.bam -knownSites bundle/my-trusted-snps.vcf -knownSites bundle/my-trusted-indels.vcf -o firstpass.table  # generate the first-pass recalibration table; both -knownSites inputs are optional but recommended
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R reference.fasta -I myinput.bam -knownSites bundle/my-trusted-snps.vcf -knownSites bundle/my-trusted-indels.vcf -BQSR firstpass.table -o secondpass.table  # generate the second-pass recalibration table
java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R reference.fasta -before firstpass.table -after secondpass.table -csv BQSR.csv -plots BQSR.pdf  # finally, generate the plots; -csv (keep a copy of the data as CSV) is optional |
Create plots to visualize base recalibration results |
GEMINI comp_hets |
gemini comp_hets --columns "chrom, start, end" test.comp_het_default.2.db |
Identify potential compound heterozygotes |
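A compound heterozygote carries two different heterozygous variants in the same gene, one on each haplotype. Without phased genotypes, a trio still supports a simple inference. If a variant is heterozygous in the child and in exactly one parent, and the other parent is homozygous reference, the child inherited it from that parent. The Python sketch below uses this rule to pair maternally and paternally inherited variants per gene. The record layout, genotype strings, sample names, and gene names are hypothetical, and the sketch does not reproduce GEMINI's own logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def candidate_comp_hets(
    variants: Iterable[dict], child: str, mother: str, father: str
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return gene -> (maternal variant ids, paternal variant ids) for genes hit on both haplotypes.

    Each variant is a dict with keys "id", "gene", and "gt" (sample -> "hom_ref" | "het" | "hom_alt").
    """
    maternal: Dict[str, List[str]] = defaultdict(list)
    paternal: Dict[str, List[str]] = defaultdict(list)
    for v in variants:
        gt = v["gt"]
        if gt[child] != "het":
            continue
        if gt[mother] == "het" and gt[father] == "hom_ref":
            maternal[v["gene"]].append(v["id"])  # transmitted by the mother
        elif gt[father] == "het" and gt[mother] == "hom_ref":
            paternal[v["gene"]].append(v["id"])  # transmitted by the father
        # Other parental combinations are ambiguous and skipped.
    return {g: (maternal[g], paternal[g]) for g in maternal if g in paternal}


if __name__ == "__main__":
    calls = [
        {"id": "v1", "gene": "GENE_A", "gt": {"kid": "het", "mom": "het", "dad": "hom_ref"}},
        {"id": "v2", "gene": "GENE_A", "gt": {"kid": "het", "mom": "hom_ref", "dad": "het"}},
        {"id": "v3", "gene": "GENE_B", "gt": {"kid": "het", "mom": "het", "dad": "hom_ref"}},
    ]
    print(candidate_comp_hets(calls, "kid", "mom", "dad"))  # {'GENE_A': (['v1'], ['v2'])}
```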
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I input.bam --known indels.vcf -o forIndelRealigner.intervals |
Define intervals to target for local realignment |
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T ValidateVariants -R reference.fasta -V input.vcf --validationTypeToExclude ALL |
Minimally validate a VCF file for adherence to the VCF format, excluding all strict validation checks |
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R reference.fasta -o output.fasta -L input.intervals -V input.vcf [--snpmask mask.vcf] |
Generate an alternative reference sequence over the specified interval |
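In outline, building the alternate sequence takes three steps. Take the reference bases for the interval, substitute the ALT base at each variant site inside it, and set masked positions (the role of `--snpmask`) to N. The Python sketch below handles single-base substitutions only. It lets a variant override a masked base at the same position; both that precedence and the omission of indels are simplifying assumptions of the sketch, not documented GATK behavior. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def alternate_reference(
    reference: str,
    start: int,
    end: int,
    snps: Iterable[Tuple[int, str, str]],
    mask: Iterable[int] = (),
) -> str:
    """Return reference[start..end] (1-based, inclusive) with SNPs applied and masked sites set to N."""
    seq = list(reference[start - 1:end])
    for pos in mask:
        if start <= pos <= end:
            seq[pos - start] = "N"
    for pos, ref, alt in snps:
        if not start <= pos <= end:
            continue  # variants outside the interval are ignored
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("this sketch handles single-base substitutions only")
        if reference[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF {ref} does not match the reference at position {pos}")
        seq[pos - start] = alt  # assumed: a variant overrides a mask at the same site
    return "".join(seq)


if __name__ == "__main__":
    contig = "ACGTACGTAC"  # toy contig (hypothetical)
    print(alternate_reference(contig, 3, 8, [(4, "T", "C")], mask=[6]))  # GCANGT
```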
java -jar GenomeAnalysisTK.jar |
java -jar GenomeAnalysisTK.jar -T CheckPileup -R reference.fasta -I your_data.bam --pileup:SAMPileup pileup_file.txt -L chr1:257-275 -o output_file_name |
Compare GATK's internal pileup to a reference Samtools pileup |