Build a recalibration model to score variant quality for filtering purposes
java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R reference.fasta \
    -input raw_variants.withASannotations.vcf \
    -AS \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_135.b37.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \
    -mode SNP \
    -recalFile output.AS.recal \
    -tranchesFile output.AS.tranches \
    -rscriptFile output.plots.AS.R
Argument name(s) | Default value | Summary | |
---|---|---|---|
Required Inputs | |||
--input | NA | One or more VCFs of raw input variants to be recalibrated | |
--resource | [] | A list of sites to which a prior probability of being correct is applied but which are not otherwise used by the algorithm (training and truth sets are required to run) | |
Required Outputs | |||
--recal_file  -recalFile | NA | The output recal file used by ApplyRecalibration | |
--tranches_file  -tranchesFile | NA | The output tranches file used by ApplyRecalibration | |
Required Parameters | |||
--mode | SNP | Recalibration mode to employ | |
--use_annotation  -an | [] | The names of the annotations which should be used for calculations | |
Optional Inputs | |||
--aggregate | NA | Additional raw input variants to be used in building the model | |
Optional Outputs | |||
--model_file  -modelFile | stdout | A GATKReport containing the positive and negative model fits | |
--rscript_file  -rscriptFile | NA | The output rscript file generated by the VQSR to aid in visualization of the input data and learned model | |
Optional Parameters | |||
--ignore_filter  -ignoreFilter | [] | If specified, the variant recalibrator will also use variants marked as filtered by the specified filter name in the input VCF file | |
--target_titv  -titv | 2.15 | The expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on the optimization curve output figures. (approx 2.15 for whole genome experiments). ONLY USED FOR PLOTTING PURPOSES! | |
--TStranche  -tranche | [100.0, 99.9, 99.0, 90.0] | The levels of truth sensitivity at which to slice the data (in percent; that is, 1.0 for 1 percent) | |
Optional Flags | |||
--ignore_all_filters  -ignoreAllFilters | false | If specified, the variant recalibrator will ignore all input filters. Useful to rerun the VQSR from a filtered output file. | |
--output_model  -outputModel | false | If specified, the variant recalibrator will output the VQSR model fit to the file specified by -modelFile or to stdout | |
--useAlleleSpecificAnnotations  -AS | false | If specified, the variant recalibrator will attempt to use the allele-specific versions of the specified annotations. | |
Advanced Parameters | |||
--badLodCutoff | -5.0 | LOD score cutoff for selecting bad variants | |
--dirichlet | 0.001 | The dirichlet parameter in the variational Bayes algorithm. | |
--max_attempts | 1 | Number of attempts to build a model before failing | |
--maxGaussians  -mG | 8 | Max number of Gaussians for the positive model | |
--maxIterations  -mI | 150 | Maximum number of VBEM iterations | |
--maxNegativeGaussians  -mNG | 2 | Max number of Gaussians for the negative model | |
--maxNumTrainingData | 2500000 | Maximum number of training data | |
--minNumBadVariants  -minNumBad | 1000 | Minimum number of bad variants | |
--MQCapForLogitJitterTransform  -MQCap | 0 | Apply logit transform and jitter to MQ values | |
--numKMeans  -nKM | 100 | Number of k-means iterations | |
--priorCounts | 20.0 | The number of prior counts to use in the variational Bayes algorithm. | |
--shrinkage | 1.0 | The shrinkage parameter in the variational Bayes algorithm. | |
--stdThreshold  -std | 10.0 | Annotation value divergence threshold (number of standard deviations from the means) | |
Advanced Flags | |||
--trustAllPolymorphic  -allPoly | false | Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation. | |
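
The table above notes that the recal and tranches files are consumed by ApplyRecalibration. As a rough sketch of that follow-up step, the script below assembles an ApplyRecalibration command that reuses the file names from the example at the top of this page. The output name `recalibrated.AS.vcf`, the 99.0 truth sensitivity cutoff, and the `GATK_JAR`, `TS_FILTER_LEVEL` and `RUN_GATK` environment variables are assumptions made for illustration, not values prescribed by the tool. By default the script only prints the command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch only: apply the model built by VariantRecalibrator.
# All paths are placeholders taken from the usage example above.
set -euo pipefail

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"   # assumed jar location
TS_FILTER_LEVEL="${TS_FILTER_LEVEL:-99.0}"     # assumed cutoff; pick a level present in the tranches file

# The output file name (-o) is hypothetical; -mode and -AS mirror the
# settings used when the model was built.
cmd=(
  java -Xmx4g -jar "$GATK_JAR"
  -T ApplyRecalibration
  -R reference.fasta
  -input raw_variants.withASannotations.vcf
  -AS
  -mode SNP
  --ts_filter_level "$TS_FILTER_LEVEL"
  -recalFile output.AS.recal
  -tranchesFile output.AS.tranches
  -o recalibrated.AS.vcf
)

# Print the assembled command for review.
printf '%s ' "${cmd[@]}"
printf '\n'

# Run it only when explicitly requested with RUN_GATK=1.
if [[ "${RUN_GATK:-0}" == "1" ]]; then
  "${cmd[@]}"
fi
```

Setting `RUN_GATK=1` in the environment executes the command instead of only printing it. Keeping `-mode` and `-AS` identical between the two steps is the safer choice, since the recal file was produced under those settings.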