Genome Variant Analysis


java -jar GenomeAnalysisTK.jar -R reference.fasta -T HaplotypeCaller -I sample1.bam --emitRefConfidence GVCF [--dbsnp dbSNP.vcf] [-L targets.interval_list] -G Standard -G AS_Standard -o output.raw.snps.indels.AS.g.vcf
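
The square brackets in the command above mark optional arguments (`--dbsnp` and `-L`). As a minimal sketch of how that command could be assembled programmatically, the helper below builds the argument list and appends the optional pieces only when they are supplied. The function name `build_haplotypecaller_cmd` and its parameters are hypothetical, introduced for illustration; every flag and file name is taken from the command line above, and nothing is executed.

```python
from typing import List, Optional


def build_haplotypecaller_cmd(
    reference: str,
    bam: str,
    output: str,
    dbsnp: Optional[str] = None,
    intervals: Optional[str] = None,
) -> List[str]:
    """Assemble the GVCF-mode command shown above as an argument list.

    Hypothetical helper: only flags that appear in the example command are used.
    """
    cmd = [
        "java", "-jar", "GenomeAnalysisTK.jar",
        "-R", reference,
        "-T", "HaplotypeCaller",
        "-I", bam,
        "--emitRefConfidence", "GVCF",
    ]
    # Optional arguments (bracketed in the example) are added only if given.
    if dbsnp is not None:
        cmd += ["--dbsnp", dbsnp]
    if intervals is not None:
        cmd += ["-L", intervals]
    cmd += ["-G", "Standard", "-G", "AS_Standard", "-o", output]
    return cmd


cmd = build_haplotypecaller_cmd(
    "reference.fasta",
    "sample1.bam",
    "output.raw.snps.indels.AS.g.vcf",
    dbsnp="dbSNP.vcf",
)
print(" ".join(cmd))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns, should you choose to run it.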


Default value | Summary

Optional Inputs
none | Set of alleles to use in genotyping
none | dbSNP file

Optional Outputs
NA | Output the active region to this IGV formatted file
NA | Output the raw activity profile results in IGV format
NA | Write debug assembly graph information to this file
stdout | File to which variants should be written

Optional Parameters
0.0 | Fraction of contamination to aggressively remove
DISCOVERY | Specifies how to determine the alternate alleles to use for genotyping
[StandardAnnotation, StandardHCAnnotation] | One or more classes/groups of annotations to apply to variant calls
0.001 | Heterozygosity value used to compute prior likelihoods for any locus
0.01 | Standard deviation of heterozygosity for SNP and indel calling
1.25E-4 | Heterozygosity for indel calling
10000 | Maximum reads in an active region
10 | Minimum base quality required to consider a base for calling
10 | Minimum number of reads sharing the same alignment start for each genomic location in an active region
NA | Name of single sample to use from a multi-sample bam
2 | Ploidy per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
10.0 | The minimum phred-scaled confidence threshold at which variants should be called

Optional Flags
false | Annotate number of alleles observed
false | Use new AF model instead of the so-called exact model

Advanced Inputs
NA | Use this interval list file as the active regions to process
[] | Comparison VCF file

Advanced Outputs
NA | File to which assembled haplotypes should be written

Advanced Parameters
0.002 | Threshold for the probability of a profile state being active
NA | The active region extension; if not provided defaults to Walker annotated default
NA | The active region maximum size; if not provided defaults to Walker annotated default
[] | One or more specific annotations to apply to variant calls
CALLED_HAPLOTYPES | Which haplotypes should be written to the BAM
NA | The sigma of the band pass filter Gaussian kernel; if not provided defaults to Walker annotated default
NA | Contamination per sample
false | Mode for emitting reference confidence scores
[] | One or more specific annotations to exclude
10 | Flat gap continuation penalty for use in the Pair HMM
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] | Exclusive upper bounds for reference confidence GQ bands (must be in [1, 100] and specified in increasing order)
10 | The size of an indel to check for in the reference model
[] | Input prior for calls
[10, 25] | Kmer size to use in the read threading assembler
6 | Maximum number of alternate alleles to genotype
1024 | Maximum number of genotypes to consider at any site
100 | Maximum number of PL values to output
128 | Maximum number of haplotypes to consider for your population
30000 | Maximum reads per sample given to traversal map() function
10000000 | Maximum total reads given to traversal map() function
4 | Minimum length of a dangling branch to attempt recovery
2 | Minimum support to not prune paths in the graph
1 | Number of samples that must pass the minPruning threshold
EMIT_VARIANTS_ONLY | Which type of calls we should output
CONSERVATIVE | The PCR indel model to use
45 | The global assumed mismapping rate for reads

Advanced Flags
false | Allow graphs that have non-unique kmers in the reference
false | Annotate all sites with PLs
false | 1000G consensus mode
false | Print out very verbose debug information about each triggering active region
false | Don't skip calculations in ActiveRegions with no variants
false | Disable physical phasing
false | Disable iterating over kmer sizes when graph cycles are detected
false | If specified, we will not trim down the active region from the full region (active + extension) to just the active interval for genotyping
false | Do not analyze soft clipped bases in the reads
false | Emit reads that are dropped for filtering, trimming, realignment failure
false | If provided, all bases will be tagged as active
false | Use additional trigger on variants found in an external alleles file
false | Use the contamination-filtered read maps for the purposes of annotating variants
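
Two entries in the table above involve simple arithmetic that is easy to get wrong: the phred-scaled calling confidence (default 10.0) and the ploidy rule for pooled data. The sketch below works through both. It assumes the standard Phred definition, Q = -10 * log10(P(error)); the function names are hypothetical and are not part of the tool.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def pooled_ploidy(samples_per_pool: int, sample_ploidy: int = 2) -> int:
    """Ploidy value for pooled data:
    (number of samples in each pool) * (sample ploidy)."""
    return samples_per_pool * sample_ploidy


# The default confidence threshold of 10.0 corresponds to a 1-in-10 error chance.
print(phred_to_error_probability(10.0))  # 0.1

# A pool of 10 diploid samples would be run with a ploidy setting of 20.
print(pooled_ploidy(10, 2))  # 20
```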

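The GQ band entry describes its default as a list of exclusive upper bounds that must lie in [1, 100] and be given in increasing order. As an illustration of what "exclusive upper bound" means, the sketch below checks a list of bounds and finds the half-open band `[lower, upper)` that a given GQ falls into. The names are hypothetical, strict increase is assumed, and returning `None` as the upper bound for a GQ at or above the last bound is a choice made for this example, not documented tool behaviour.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Default bounds from the table: 1..60 in steps of 1, then 70, 80, 90, 99.
DEFAULT_GQ_BOUNDS: List[int] = list(range(1, 61)) + [70, 80, 90, 99]


def validate_bounds(bounds: List[int]) -> None:
    """Raise ValueError unless bounds are in [1, 100] and strictly increasing."""
    if any(not 1 <= b <= 100 for b in bounds):
        raise ValueError("each bound must be in [1, 100]")
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("bounds must be in increasing order")


def gq_band(gq: int, bounds: List[int] = DEFAULT_GQ_BOUNDS) -> Tuple[int, Optional[int]]:
    """Return the half-open band [lower, upper) containing gq.

    upper is None when gq is at or above the last bound (illustrative choice).
    """
    i = bisect_right(bounds, gq)  # index of the first bound strictly greater than gq
    lower = bounds[i - 1] if i > 0 else 0
    upper = bounds[i] if i < len(bounds) else None
    return lower, upper


validate_bounds(DEFAULT_GQ_BOUNDS)
print(gq_band(20))  # (20, 21)
print(gq_band(65))  # (60, 70)
```

With the default list, GQ values below 60 each get their own band, while higher values are grouped into progressively wider bands.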