Genome Variant Analysis

GEMINI region
Function: Extract variants located in specific genes.
Usage: gemini region --gene PTPN22 my.db
java -jar GenomeAnalysisTK.jar
Function: Count contiguous regions in an interval list
Usage: java -jar GenomeAnalysisTK.jar -T CountIntervals -R reference.fasta -o output.txt -check intervals.list
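
To sanity-check an interval list before handing it to CountIntervals, the idea of "contiguous regions" can be sketched in plain shell. This is a rough stand-in, not GATK's implementation. It assumes one `contig:start-end` entry per line, contig names without `:` or `-`, and that overlapping or abutting intervals count as a single region. The function name `count_contiguous_regions` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count merged (contiguous) regions in an interval list
# read from stdin, one "contig:start-end" (or "contig:pos") entry per line.
count_contiguous_regions() {
  tr ':-' '\t\t' |
    sort -k1,1 -k2,2n |
    awk -F'\t' '
      # Treat a single-position entry such as "chr1:100" as start == end.
      $3 == "" { $3 = $2 }
      # Start a new region when the contig changes or a gap appears.
      $1 != contig || $2 + 0 > end + 1 { n++; contig = $1; end = $3 + 0; next }
      # Otherwise extend the current region if this interval reaches further.
      $3 + 0 > end { end = $3 + 0 }
      END { print n + 0 }
    '
}

# Example: five intervals that collapse into three contiguous regions.
printf 'chr1:100-200\nchr1:201-300\nchr1:500-600\nchr2:1-50\nchr1:150-250\n' |
  count_contiguous_regions
```

The input is sorted first so that a single pass is enough to merge neighbours.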
java -jar GenomeAnalysisTK.jar
Function: Left-align indels in a variant callset
Usage: java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants -R reference.fasta --variant input.vcf -o output.vcf
java -jar GenomeAnalysisTK.jar
Function: Collect statistics about sequence reads based on their SAM flags
Usage: java -jar GenomeAnalysisTK.jar -T FlagStat -R reference.fasta -I reads.bam [-o output.txt]
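
FlagStat summarizes reads by their SAM flags, so it can help to see how a single flag value breaks down into bits. The sketch below decodes the bit values defined in the SAM specification. The function name `decode_sam_flag` and the short labels are hypothetical and are not what GATK prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the SAM flag bits set in an integer.
decode_sam_flag() {
  local flag=$1
  # Bit values from the SAM specification; the labels are shorthand.
  local bits=(1 2 4 8 16 32 64 128 256 512 1024 2048)
  local labels=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                read1 read2 secondary qc_fail duplicate supplementary)
  local out=() i
  for i in "${!bits[@]}"; do
    if (( flag & bits[i] )); then
      out+=("${labels[i]}")
    fi
  done
  # Join the collected labels with commas.
  local IFS=,
  echo "${out[*]}"
}

decode_sam_flag 99    # prints: paired,proper_pair,mate_reverse,read1
decode_sam_flag 147   # prints: paired,proper_pair,reverse,read2
```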
java -jar GenomeAnalysisTK.jar
Function: Left-align indels in a variant callset, splitting multiallelic sites into biallelic records
Usage: java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants -R reference.fasta --variant input.vcf -o output.vcf --splitMultiallelics
java -jar GenomeAnalysisTK.jar
Function: Calculate basic statistics about the reference sequence itself
Usage: java -jar GenomeAnalysisTK.jar -T FastaStats -R reference.fasta [-o output.txt]
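
For a quick cross-check without GATK, similar reference totals can be approximated with awk. This is a minimal sketch. The function name `fasta_stats` and the output layout are hypothetical and do not reproduce the FastaStats report. It counts every non-header character as a base and treats A, C, G and T (either case) as regular bases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic totals for a FASTA file read from stdin.
fasta_stats() {
  awk '
    # Header lines start a new sequence and contribute no bases.
    /^>/ { seqs++; next }
    {
      line = toupper($0)
      sub(/\r$/, "", line)                     # tolerate Windows line endings
      total += length(line)
      regular += gsub(/[ACGT]/, "", line)      # gsub returns the match count
    }
    END {
      printf "sequences\t%d\ntotal_bases\t%d\nregular_bases\t%d\n",
             seqs, total, regular
    }
  '
}

# Example with two small records; N bases count toward the total only.
printf '>a\nACGTN\nacgt\n>b\nNNNN\n' | fasta_stats
```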
java -jar GenomeAnalysisTK.jar
Function: Select headers from a VCF source
Usage: java -jar GenomeAnalysisTK.jar -T SelectHeaders -R reference.fasta -V input.vcf -o output.vcf -hn FILTER -hn FORMAT -hn INFO
java -jar GenomeAnalysisTK.jar
Function: Output a list of intervals that are covered at or above a given threshold
Usage: java -jar GenomeAnalysisTK.jar -T FindCoveredIntervals -R reference.fasta -I my_file.bam [-cov 10] [-uncovered] -o output.list
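
The thresholding idea behind this tool can be illustrated on a per-base depth table. The sketch below is not GATK's algorithm. It assumes sorted, whitespace-separated input with the columns contig, position and depth (the layout `samtools depth` produces), and the function name `covered_intervals` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge consecutive positions with depth >= threshold
# into "contig:start-end" intervals. Reads "contig pos depth" rows from stdin.
covered_intervals() {
  local min_cov=${1:-10}
  awk -v min="$min_cov" '
    function emit() { if (open) print contig ":" start "-" end; open = 0 }
    # Close the current interval on low depth, a contig change, or a gap.
    $3 + 0 < min || $1 != contig || $2 + 0 != end + 1 { emit() }
    # Open or extend an interval on sufficiently covered positions.
    $3 + 0 >= min {
      if (!open) { contig = $1; start = $2 + 0; open = 1 }
      end = $2 + 0
    }
    END { emit() }
  '
}

# Example: a threshold of 10 splits chr1 at the low-depth position 3
# and at the missing position 5.
printf 'chr1\t1\t12\nchr1\t2\t15\nchr1\t3\t3\nchr1\t4\t10\nchr1\t6\t11\nchr2\t1\t20\nchr2\t2\t20\n' |
  covered_intervals 10
```

Positions absent from the input are treated as gaps, which matches depth tables that omit zero-coverage sites.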
java -jar GenomeAnalysisTK.jar
Function: Left-align indels within reads in a bam file
Usage: java -jar GenomeAnalysisTK.jar -R reference.fasta -T LeftAlignIndels -I reads.bam -o output_with_leftaligned_indels.bam
java -jar GenomeAnalysisTK.jar
Function: Left-align indels in a variant callset without trimming alleles
Usage: java -jar GenomeAnalysisTK.jar -T LeftAlignAndTrimVariants -R reference.fasta --variant input.vcf -o output.vcf --dontTrimAlleles
java -jar GenomeAnalysisTK.jar
Function: Select a subset of variants from a larger callset, here by the number and fraction of filtered genotypes
Usage: java -jar GenomeAnalysisTK.jar -R ref.fasta -T SelectVariants --variant input.vcf --maxFilteredGenotypes 5 --minFilteredGenotypes 2 --maxFractionFilteredGenotypes 0.60 --minFractionFilteredGenotypes 0.10
java -jar GenomeAnalysisTK.jar
Function: Randomly select variant records according to specified options
Usage: java -jar GenomeAnalysisTK.jar -T ValidationSiteSelectorWalker -R reference.fasta -V:foo input1.vcf -V:bar input2.vcf --numValidationSites 200 -sf samples.txt -o output.vcf -sampleMode POLY_BASED_ON_GT -freqMode UNIFORM -selectType INDEL
GEMINI interactions
Function: Find genes among variants that are interacting partners.
Usage: gemini interactions -g CTBP2 -r 3 example.db
java -jar GenomeAnalysisTK.jar
Function: Count the number of ROD objects encountered
Usage: java -jar GenomeAnalysisTK.jar -T CountRODs -R reference.fasta -o output.txt --rod input.vcf
vt
Function: Normalize VCF output for comparison purposes, which is especially useful for more complex graphs that can produce large variant blocks containing many reference bases. Requires [vt](http://genome.sph.umich.edu/wiki/Vt).
Usage: vt decompose_blocksub -a calls.vcf | vt normalize -r FASTA_FILE - > calls.clean.vcf