Genome Variant Analysis


java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R reference.fasta \
    -input raw_variants.withASannotations.vcf \
    -AS \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_135.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
    -mode SNP \
    -recalFile output.AS.recal \
    -tranchesFile output.AS.tranches \
    -rscriptFile output.plots.AS.R
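
The command above repeats the same -resource and -an patterns several times, so it can help to assemble it from its parts. What follows is a minimal sketch in bash, not an official wrapper: the helper name resource_tag and the dry-run printing are assumptions made for illustration, while every flag and file name is taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: assembles the VariantRecalibrator command shown above as a
# bash array. resource_tag is a hypothetical helper, not part of GATK.

# Build one "-resource:<name>,known=..,training=..,truth=..,prior=.." tag.
resource_tag() {
    local name=$1 known=$2 training=$3 truth=$4 prior=$5
    printf -- '-resource:%s,known=%s,training=%s,truth=%s,prior=%s' \
        "$name" "$known" "$training" "$truth" "$prior"
}

# Annotations passed with -an, in the same order as the command above.
annotations=(QD MQ MQRankSum ReadPosRankSum FS SOR InbreedingCoeff)

cmd=(java -Xmx4g -jar GenomeAnalysisTK.jar -T VariantRecalibrator
     -R reference.fasta
     -input raw_variants.withASannotations.vcf
     -AS
     "$(resource_tag hapmap false true true 15.0)" hapmap_3.3.b37.sites.vcf
     "$(resource_tag omni false true false 12.0)" 1000G_omni2.5.b37.sites.vcf
     "$(resource_tag 1000G false true false 10.0)" 1000G_phase1.snps.high_confidence.vcf
     "$(resource_tag dbsnp true false false 2.0)" dbsnp_135.b37.vcf)

# Append one "-an <annotation>" pair per annotation.
for an in "${annotations[@]}"; do
    cmd+=(-an "$an")
done

cmd+=(-mode SNP
      -recalFile output.AS.recal
      -tranchesFile output.AS.tranches
      -rscriptFile output.plots.AS.R)

# Dry run: print the assembled command instead of executing it.
printf '%s ' "${cmd[@]}"
printf '\n'
```

Keeping the arguments in an array means each one stays a separate word, so replacing the final printf lines with "${cmd[@]}" would run the command as written.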


Argument name(s)  Default value              Summary

Required Inputs
-input            NA                         One or more VCFs of raw input variants to be recalibrated
-resource         []                         A list of sites for which to apply a prior probability of being correct but which aren't used by the algorithm (training and truth sets are required to run)

Required Outputs
-recalFile        NA                         The output recal file used by ApplyRecalibration
-tranchesFile     NA                         The output tranches file used by ApplyRecalibration

Required Parameters
-mode             SNP                        Recalibration mode to employ
-an               []                         The names of the annotations which should be used for calculations

Optional Inputs
                  NA                         Additional raw input variants to be used in building the model

Optional Outputs
-modelFile        stdout                     A GATKReport containing the positive and negative model fits
-rscriptFile      NA                         The output rscript file generated by the VQSR to aid in visualization of the input data and learned model

Optional Parameters
                  []                         If specified, the variant recalibrator will also use variants marked as filtered by the specified filter name in the input VCF file
                  2.15                       The expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on the optimization curve output figures (approx. 2.15 for whole-genome experiments). ONLY USED FOR PLOTTING PURPOSES!
                  [100.0, 99.9, 99.0, 90.0]  The levels of truth sensitivity at which to slice the data (in percent, that is, 1.0 for 1 percent)

Optional Flags
                  false                      If specified, the variant recalibrator will ignore all input filters. Useful to rerun the VQSR from a filtered output file.
                  false                      If specified, the variant recalibrator will output the VQSR model fit to the file specified by -modelFile or to stdout
-AS               false                      If specified, the variant recalibrator will attempt to use the allele-specific versions of the specified annotations.

Advanced Parameters
                  -5.0                       LOD score cutoff for selecting bad variants
                  0.001                      The Dirichlet parameter in the variational Bayes algorithm
                  1                          Number of attempts to build a model before failing
                  8                          Max number of Gaussians for the positive model
                  150                        Maximum number of VBEM iterations
                  2                          Max number of Gaussians for the negative model
                  2500000                    Maximum number of training data
                  1000                       Minimum number of bad variants
                  0                          Apply logit transform and jitter to MQ values
                  100                        Number of k-means iterations
                  20.0                       The number of prior counts to use in the variational Bayes algorithm
                  1.0                        The shrinkage parameter in the variational Bayes algorithm
                  10.0                       Annotation value divergence threshold (number of standard deviations from the means)

Advanced Flags
                  false                      Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation.
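
The truth-sensitivity levels listed under Optional Parameters are given in percent, which is easy to misread as fractions. The sketch below is illustrative arithmetic only, assuming that a level of p percent corresponds to roughly p percent of the truth-set sites being retained at that slice. The function name truth_sites_kept and the site count of 10000 are made up for the example, and this is not how the tool itself computes tranches.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; truth_sites_kept is a hypothetical helper and
# the truth-set size used below is an invented number.

# For a truth set of <total> sites, print roughly how many sites each
# truth-sensitivity level (given in percent) corresponds to.
truth_sites_kept() {
    local total=$1; shift
    local level kept
    for level in "$@"; do
        # Levels are percentages, so divide by 100 to get the fraction.
        kept=$(awk -v n="$total" -v p="$level" 'BEGIN { printf "%.0f", n * p / 100 }')
        printf '%s%% -> %s of %s truth sites\n' "$level" "$kept" "$total"
    done
}

# Default levels from the table: 100.0, 99.9, 99.0, 90.0
truth_sites_kept 10000 100.0 99.9 99.0 90.0
```

With these inputs the 99.0 level corresponds to 9900 of the 10000 hypothetical truth sites, which shows that 99.0 means 99 percent, not 0.99 percent.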
