Calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.
java -jar VarScan.jar somatic [normal_pileup] [tumor_pileup] [output] OPTIONS
OPTIONS:
	--output-snp - Output file for SNP calls [default: output.snp]
	--output-indel - Output file for indel calls [default: output.indel]
	--min-coverage - Minimum coverage in normal and tumor to call variant [8]
	--min-coverage-normal - Minimum coverage in normal to call somatic [8]
	--min-coverage-tumor - Minimum coverage in tumor to call somatic [6]
	--min-var-freq - Minimum variant frequency to call a heterozygote [0.10]
	--min-freq-for-hom - Minimum frequency to call homozygote [0.75]
	--normal-purity - Estimated purity (non-tumor content) of normal sample [1.00]
	--tumor-purity - Estimated purity (tumor content) of tumor sample [1.00]
	--p-value - P-value threshold to call a heterozygote [0.99]
	--somatic-p-value - P-value threshold to call a somatic site [0.05]
	--strand-filter - If set to 1, removes variants with >90% strand bias
	--validation - If set to 1, outputs all compared positions even if non-variant
Note that more specific options (e.g. min-coverage-normal) will override the default or specified value of less specific options (e.g. min-coverage).
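The override rule can be illustrated with a short Python sketch. This is not VarScan's own code: the function name effective_min_coverage and the dictionary of user-supplied options are hypothetical, and the sketch assumes that a user-specified min-coverage applies to both samples unless the corresponding specific option is also given.

```python
# Documented defaults from the OPTIONS list above.
DEFAULTS = {
    "min-coverage": 8,
    "min-coverage-normal": 8,
    "min-coverage-tumor": 6,
}


def effective_min_coverage(user_opts):
    """Resolve per-sample coverage thresholds (illustrative precedence only).

    Assumed order of precedence for each sample:
      1. the specific option, if the user supplied it
      2. the general min-coverage, if the user supplied it
      3. the documented default of the specific option
    """
    resolved = {}
    for specific in ("min-coverage-normal", "min-coverage-tumor"):
        if specific in user_opts:
            resolved[specific] = user_opts[specific]
        elif "min-coverage" in user_opts:
            resolved[specific] = user_opts["min-coverage"]
        else:
            resolved[specific] = DEFAULTS[specific]
    return resolved


# Hypothetical usage: general threshold of 10, tumor threshold relaxed to 4.
thresholds = effective_min_coverage({"min-coverage": 10, "min-coverage-tumor": 4})
```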
The normal and tumor purity values should each be between 0 and 1. The default (1) implies that the normal is 100% pure with no contaminating tumor cells, and the tumor is 100% pure with no contaminating stromal or other non-malignant cells. Set tumor-purity to something less than 1 if you have a low-purity tumor sample and therefore expect lower variant allele frequencies for mutations. Set normal-purity to something less than 1 only if it's possible that there will be some tumor content in your "normal" sample, e.g. adjacent normal tissue for a solid tumor, malignant blood cells in the skin punch normal for some liquid tumors, etc.
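To see why purity matters, the following Python sketch estimates the variant allele frequency you would expect for a somatic mutation under simplifying assumptions: the mutation is heterozygous, clonal (present in every tumor cell), and lies in a diploid region. The function name expected_het_vaf is hypothetical, and this shows only the arithmetic behind the advice above, not how VarScan applies the purity options internally.

```python
def expected_het_vaf(tumor_purity, normal_purity=1.0):
    """Return (tumor_vaf, normal_vaf) expected for a clonal heterozygous
    somatic mutation in a diploid region.

    tumor_purity  : fraction of cells in the tumor sample that are tumor
    normal_purity : fraction of cells in the normal sample that are non-tumor
    """
    for name, value in (("tumor_purity", tumor_purity),
                        ("normal_purity", normal_purity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Each tumor cell carries the variant on one of its two copies.
    tumor_vaf = 0.5 * tumor_purity
    # Tumor cells contaminating the normal contribute variant reads too.
    normal_vaf = 0.5 * (1.0 - normal_purity)
    return tumor_vaf, normal_vaf
```

For example, a tumor sample estimated at 40% purity would be expected to show such mutations at roughly 20% variant allele frequency rather than 50%.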
There are two p-value options. The first (p-value) is the significance threshold for the first-pass algorithm that determines, for each position, whether either normal or tumor is variant at that position. The second (somatic-p-value) is more important; it is the threshold below which read count differences between tumor and normal are deemed significant enough to classify the variant as a somatic mutation or an LOH event. In the case of a shared (germline) variant, this p-value is used to determine whether the combined normal and tumor evidence differs significantly enough from the null hypothesis (no variant with same coverage) to report the variant. See the somatic mutation calling section for details.
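As a rough illustration of the tumor-versus-normal comparison, the sketch below computes a one-tailed Fisher's exact test on the 2x2 table of reference and variant read counts. The text above does not name the test, so the choice of Fisher's exact test is an assumption here, and the function name somatic_p_value_sketch is hypothetical. The result is not guaranteed to match the value VarScan reports.

```python
from math import comb


def somatic_p_value_sketch(normal_ref, normal_var, tumor_ref, tumor_var):
    """One-tailed Fisher's exact test: probability of seeing at least
    `tumor_var` variant reads in the tumor if variant reads were spread
    between the two samples by chance (margins held fixed)."""
    total = normal_ref + normal_var + tumor_ref + tumor_var
    total_var = normal_var + tumor_var      # all variant-supporting reads
    tumor_depth = tumor_ref + tumor_var     # all reads in the tumor sample

    denominator = comb(total, tumor_depth)
    p = 0.0
    # Sum hypergeometric probabilities for outcomes as or more extreme.
    for x in range(tumor_var, min(total_var, tumor_depth) + 1):
        p += comb(total_var, x) * comb(total - total_var, tumor_depth - x) / denominator
    return p


# Hypothetical counts: normal 20 ref / 0 var, tumor 10 ref / 10 var.
p = somatic_p_value_sketch(20, 0, 10, 10)
is_significant = p < 0.05  # compare against the somatic-p-value threshold
```

Under this sketch, a position would be classified as Somatic or LOH only when the p-value falls below the somatic-p-value threshold (0.05 by default).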
OUTPUT
Two tab-delimited files (SNPs and Indels) with the following columns:
	chrom               chromosome name
	position            position (1-based from the pileup)
	ref                 reference allele at this position
	var                 variant allele at this position
	normal_reads1       reads supporting reference allele
	normal_reads2       reads supporting variant allele
	normal_var_freq     frequency of variant allele by read count
	normal_gt           genotype call for Normal sample
	tumor_reads1        reads supporting reference allele
	tumor_reads2        reads supporting variant allele
	tumor_var_freq      frequency of variant allele by read count
	tumor_gt            genotype call for Tumor sample
	somatic_status      status of variant (Germline, Somatic, or LOH)
	variant_p_value     Significance of variant read count vs. baseline error rate
	somatic_p_value     Significance of tumor read count vs. normal read count
	tumor_reads1_plus   Ref-supporting reads from + strand in tumor
	tumor_reads1_minus  Ref-supporting reads from - strand in tumor
	tumor_reads2_plus   Var-supporting reads from + strand in tumor
	tumor_reads2_minus  Var-supporting reads from - strand in tumor
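A minimal Python sketch for reading one line of either output file into a dictionary keyed by the column names listed above. The function name parse_somatic_line and the sample line are hypothetical. The sketch assumes the columns appear in the documented order, leaves frequencies and p-values as strings because their exact formatting is not specified here, and expects the caller to skip any header line first.

```python
COLUMNS = [
    "chrom", "position", "ref", "var",
    "normal_reads1", "normal_reads2", "normal_var_freq", "normal_gt",
    "tumor_reads1", "tumor_reads2", "tumor_var_freq", "tumor_gt",
    "somatic_status", "variant_p_value", "somatic_p_value",
    "tumor_reads1_plus", "tumor_reads1_minus",
    "tumor_reads2_plus", "tumor_reads2_minus",
]

# Columns that hold integer positions or read counts.
INT_COLUMNS = [
    "position", "normal_reads1", "normal_reads2",
    "tumor_reads1", "tumor_reads2",
    "tumor_reads1_plus", "tumor_reads1_minus",
    "tumor_reads2_plus", "tumor_reads2_minus",
]


def parse_somatic_line(line):
    """Split one tab-delimited data line into a dict of named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(COLUMNS):
        raise ValueError(f"expected at least {len(COLUMNS)} fields, got {len(fields)}")
    # zip stops at the shorter sequence, so any extra trailing fields are ignored.
    record = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        record[name] = int(record[name])
    return record


# Hypothetical data line, for illustration only.
sample = "chr1\t12345\tA\tG\t20\t0\t0%\tA\t10\t10\t50%\tR\tSomatic\t1.0E-4\t2.2E-4\t5\t5\t6\t4"
record = parse_somatic_line(sample)
```

Keeping the strand-specific counts as integers makes it easy to apply your own downstream checks, for example confirming that variant-supporting reads come from both strands.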