Genome Variant Analysis


java -jar VarScan.jar somatic [normal_pileup] [tumor_pileup] [output] OPTIONS


--output-snp - Output file for SNP calls [default: output.snp]
--output-indel - Output file for indel calls [default: output.indel]
--min-coverage - Minimum coverage in normal and tumor to call variant [8]
--min-coverage-normal - Minimum coverage in normal to call somatic [8]
--min-coverage-tumor - Minimum coverage in tumor to call somatic [6]
--min-var-freq - Minimum variant frequency to call a heterozygote [0.10]
--min-freq-for-hom - Minimum frequency to call homozygote [0.75]
--normal-purity - Estimated purity (non-tumor content) of normal sample [1.00]
--tumor-purity - Estimated purity (tumor content) of tumor sample [1.00]
--p-value - P-value threshold to call a heterozygote [0.99]
--somatic-p-value - P-value threshold to call a somatic site [0.05]
--strand-filter - If set to 1, removes variants with >90% strand bias
--validation - If set to 1, outputs all compared positions even if non-variant

Note that more specific options (e.g. min-coverage-normal) will override the default or specified value of less specific options (e.g. min-coverage).
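As a rough illustration of that precedence rule, the Python sketch below resolves the effective coverage thresholds. The function name resolve_min_coverage and the exact fallback order are assumptions made for illustration, not VarScan's actual code: it assumes that an explicitly given min-coverage applies to both samples unless a more specific option is also given, and that the documented defaults (8 for normal, 6 for tumor) apply otherwise.

```python
from typing import Optional, Tuple

# Documented defaults taken from the option list above.
DEFAULT_MIN_COVERAGE_NORMAL = 8
DEFAULT_MIN_COVERAGE_TUMOR = 6


def resolve_min_coverage(
    min_coverage: Optional[int] = None,
    min_coverage_normal: Optional[int] = None,
    min_coverage_tumor: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the (normal, tumor) coverage thresholds.

    Hypothetical helper. None means "not given on the command line".
    Assumed order: specific option > min-coverage > specific default.
    """
    normal = DEFAULT_MIN_COVERAGE_NORMAL
    tumor = DEFAULT_MIN_COVERAGE_TUMOR
    if min_coverage is not None:
        # The less specific option sets both thresholds first.
        normal = tumor = min_coverage
    if min_coverage_normal is not None:
        # A more specific option overrides it for its own sample.
        normal = min_coverage_normal
    if min_coverage_tumor is not None:
        tumor = min_coverage_tumor
    return normal, tumor
```

Under these assumptions, passing only min_coverage=10 gives thresholds of 10 for both samples, while adding min_coverage_tumor=4 lowers the tumor threshold alone.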

The normal and tumor purity values should each be between 0 and 1. The default (1) implies that the normal is 100% pure with no contaminating tumor cells, and the tumor is 100% pure with no contaminating stromal or other non-malignant cells. Set tumor-purity to something less than 1 if you have a low-purity tumor sample and thus expect lower variant allele frequencies for mutations. Set normal-purity to something less than 1 only if it's possible that there will be some tumor content in your "normal" sample, e.g. adjacent normal tissue for a solid tumor, malignant blood cells in the skin punch normal for some liquid tumors, etc.
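To see why a low-purity sample yields lower variant allele frequencies, consider this back-of-the-envelope Python sketch. It assumes a heterozygous somatic mutation in a diploid region with no copy-number change, so half of the reads from tumor cells carry the variant and reads from non-tumor cells carry none. The helper expected_het_vaf is hypothetical and says nothing about how VarScan itself uses the purity values.

```python
def expected_het_vaf(tumor_fraction: float) -> float:
    """Expected variant allele frequency of a heterozygous somatic mutation.

    tumor_fraction is the fraction of cells in the sample that are tumor
    cells. Assumes a diploid genome and no copy-number change at the site.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # Only one of the two alleles in each tumor cell carries the variant.
    return 0.5 * tumor_fraction
```

For a tumor sample with tumor-purity 0.4, the tumor fraction is 0.4 and the expected frequency is about 0.20 instead of 0.50. For a normal sample with normal-purity 0.9, the tumor fraction is 1 - 0.9, so roughly 5% of reads at a somatic site may show the variant even in the "normal".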

There are two p-value options. The first (p-value) is the significance threshold for the first-pass algorithm that determines, for each position, whether either normal or tumor is variant at that position. The second (somatic-p-value) is more important: it is the threshold below which read count differences between tumor and normal are deemed significant enough to classify the site as a somatic mutation or an LOH event. In the case of a shared (germline) variant, this p-value is used to determine whether the combined normal and tumor evidence differs significantly enough from the null hypothesis (no variant with same coverage) to report the variant. See the somatic mutation calling section for details.
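One way to picture the somatic p-value is as a test on the 2x2 table of reference and variant read counts in normal and tumor. The Python sketch below uses a one-tailed Fisher's exact test, which is a common choice for this kind of comparison. The text above does not state which test is applied, so treat the choice of test and the function name fisher_right_tail as assumptions.

```python
from math import comb


def fisher_right_tail(normal_ref: int, normal_var: int,
                      tumor_ref: int, tumor_var: int) -> float:
    """One-tailed Fisher's exact p-value for excess variant reads in tumor.

    Returns the probability, under the hypergeometric null, of seeing at
    least tumor_var variant reads in the tumor given the table margins.
    """
    tumor_total = tumor_ref + tumor_var
    var_total = normal_var + tumor_var
    total = normal_ref + normal_var + tumor_total
    # Largest number of variant reads the tumor could hold.
    upper = min(tumor_total, var_total)
    favourable = sum(
        comb(tumor_total, k) * comb(total - tumor_total, var_total - k)
        for k in range(tumor_var, upper + 1)
    )
    return favourable / comb(total, var_total)


# Made-up counts: normal 20 ref / 0 var, tumor 12 ref / 8 var.
p = fisher_right_tail(20, 0, 12, 8)
is_significant = p < 0.05  # compared against the somatic-p-value default
```

With these illustrative counts the p-value falls well below 0.05, so the site would pass the default threshold. Identical read counts in both samples give a p-value far above it.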

The output consists of two tab-delimited files (SNPs and indels) with the following columns:
chrom                 chromosome name
position              position (1-based from the pileup)
ref                   reference allele at this position
var                   variant allele at this position
normal_reads1         reads supporting reference allele in normal
normal_reads2         reads supporting variant allele in normal
normal_var_freq       frequency of variant allele by read count in normal
normal_gt             genotype call for Normal sample
tumor_reads1          reads supporting reference allele in tumor
tumor_reads2          reads supporting variant allele in tumor
tumor_var_freq        frequency of variant allele by read count in tumor
tumor_gt              genotype call for Tumor sample
somatic_status        status of variant (Germline, Somatic, or LOH)
variant_p_value       significance of variant read count vs. baseline error rate
somatic_p_value       significance of tumor read count vs. normal read count
tumor_reads1_plus     ref-supporting reads from + strand in tumor
tumor_reads1_minus    ref-supporting reads from - strand in tumor
tumor_reads2_plus     var-supporting reads from + strand in tumor
tumor_reads2_minus    var-supporting reads from - strand in tumor
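
The columns above can be read with any tab-aware parser. As a minimal sketch, the Python function below keeps only rows whose somatic_status is Somatic and whose somatic_p_value is below a cutoff. It assumes the file starts with a header row that uses the column names listed above; the function name somatic_calls and the max_p parameter are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def somatic_calls(lines: Iterable[str], max_p: float = 0.05) -> List[Dict[str, str]]:
    """Return rows classified as Somatic with somatic_p_value below max_p.

    lines is any iterable of text lines, such as an open file handle.
    Assumes a header row naming the columns (an assumption, see above).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row
        for row in reader
        if row["somatic_status"] == "Somatic"
        and float(row["somatic_p_value"]) < max_p
    ]
```

To apply it to the default SNP output file, open the file with open("output.snp", newline="") and pass the handle to somatic_calls. Each returned row is a dictionary keyed by column name, so row["chrom"] and row["position"] locate the call.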
