Collect whole genome sequencing-related metrics. This tool computes metrics that are useful for evaluating the coverage and performance of whole genome sequencing experiments. These metrics include the percentages of reads that pass minimal base- and mapping-quality filters, as well as coverage (read-depth) levels.
The histogram output is optional. For a given run, it displays two separate outputs on the y-axis against a single set of x-axis values. Specifically, the first column in the histogram table (x-axis) is labeled 'coverage' and represents the possible coverage depths. However, the same column also represents the range of base quality scores, so it would be more accurately labeled 'sequence depth and base quality scores'. The second and third columns (y-axes) give, respectively, the number of bases at a specific sequence depth ('count') and the number of bases at a particular base quality score ('baseq_count').
Although this tool is similar to CollectWgsMetrics, the default thresholds for CollectRawWgsMetrics are less stringent. For example, CollectRawWgsMetrics sets its default base and mapping quality score thresholds to '3' and '0' respectively, while CollectWgsMetrics sets its default thresholds to '20' (at time of writing). Nevertheless, both tools allow the user to supply specific threshold values.
java -jar picard.jar CollectRawWgsMetrics I=input.bam O=raw_wgs_metrics.txt R=reference_sequence.fasta INCLUDE_BQ_HISTOGRAM=true
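The shared x-axis of the histogram table can be easier to see once the table is loaded into a program. The following is a minimal Python sketch, not part of Picard. It assumes that the metrics file marks the histogram with a line starting with `## HISTOGRAM`, followed by a tab-separated header row and data rows, and that a blank line ends the table. That layout is an assumption about the output file; only the column names 'coverage', 'count' and 'baseq_count' come from the description above. The function name `read_histogram` and the sample numbers are hypothetical.

```python
import io


def read_histogram(handle):
    """Return the histogram rows as a list of dicts keyed by column name.

    Assumed layout (not confirmed by the text above): a line starting with
    '## HISTOGRAM', then a tab-separated header, then tab-separated rows,
    terminated by a blank line or the end of the file.
    """
    rows = []
    header = None
    in_histogram = False
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("## HISTOGRAM"):
            in_histogram = True
            continue
        if not in_histogram:
            continue
        if not line.strip():
            if header is not None:
                break  # a blank line after the table ends it
            continue
        fields = line.split("\t")
        if header is None:
            header = fields  # e.g. coverage, count, baseq_count
        else:
            # float() tolerates counts written with or without a decimal part
            rows.append({name: float(value) for name, value in zip(header, fields)})
    return rows


# Made-up numbers, for illustration only.
sample = io.StringIO(
    "## HISTOGRAM\tjava.lang.Integer\n"
    "coverage\tcount\tbaseq_count\n"
    "0\t10\t0\n"
    "1\t25\t3\n"
    "\n"
)
histogram = read_histogram(sample)
for row in histogram:
    # 'coverage' doubles as the depth for 'count' and the quality score for 'baseq_count'
    print(row["coverage"], row["count"], row["baseq_count"])
```

To read a real file, the same function could be called as `read_histogram(open("raw_wgs_metrics.txt"))`, provided the file follows the assumed layout.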
MINIMUM_MAPPING_QUALITY (Integer) Minimum mapping quality for a read to contribute coverage. Default value: 0. This option can be set to 'null' to clear the default value.
MINIMUM_BASE_QUALITY (Integer) Minimum base quality for a base to contribute coverage. Default value: 3. This option can be set to 'null' to clear the default value.
COVERAGE_CAP (Integer) Treat bases with coverage exceeding this value as if they had coverage at this value. Default value: 100000. This option can be set to 'null' to clear the default value.
LOCUS_ACCUMULATION_CAP (Integer) At positions with coverage exceeding this value, completely ignore reads that accumulate beyond this value (so that they will not be considered for PCT_EXC_CAPPED). Used to keep memory consumption in check, but could create bias if set too low. Default value: 200000. This option can be set to 'null' to clear the default value.
INPUT (File) Input SAM or BAM file. Required.
OUTPUT (File) Output metrics file. Required.
REFERENCE_SEQUENCE (File) The reference sequence FASTA file that the reads were aligned to. Required.
STOP_AFTER (Long) For debugging purposes, stop after processing this many genomic bases. Default value: -1. This option can be set to 'null' to clear the default value.
INCLUDE_BQ_HISTOGRAM (Boolean) Determines whether to include the base quality histogram in the metrics file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
COUNT_UNPAIRED (Boolean) If true, count unpaired reads, and paired reads with one end unmapped. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
SAMPLE_SIZE (Integer) Sample size used for Theoretical Het Sensitivity sampling. Default value: 10000. This option can be set to 'null' to clear the default value.
INTERVALS (File) An interval list file that contains the positions to restrict the assessment. Please note that all bases of reads that overlap these intervals will be considered, even if some of those bases extend beyond the boundaries of the interval. The ideal use case for this argument is to use it to restrict the calculation to a subset of (whole) contigs. To restrict the calculation just to individual positions without overlap, please see CollectWgsMetricsFromSampledSites. Default value: null.
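As an illustration of overriding the defaults listed above, the following Python sketch assembles a command line that raises both quality thresholds to 20 (the CollectWgsMetrics defaults mentioned earlier) and restricts the assessment with an interval list. It assumes that the long option names are accepted in the same NAME=value form as the short I=, O= and R= in the usage example. The helper `build_command` and the file name `contigs.interval_list` are hypothetical. The sketch only prints the command and does not run it.

```python
import shlex


def build_command(input_bam, output_txt, reference_fasta, **options):
    """Assemble a CollectRawWgsMetrics argument list in NAME=value form."""
    args = [
        "java", "-jar", "picard.jar", "CollectRawWgsMetrics",
        f"INPUT={input_bam}",
        f"OUTPUT={output_txt}",
        f"REFERENCE_SEQUENCE={reference_fasta}",
    ]
    for name, value in options.items():
        if isinstance(value, bool):
            value = str(value).lower()  # Boolean options are listed as {true, false}
        args.append(f"{name}={value}")
    return args


cmd = build_command(
    "input.bam",
    "raw_wgs_metrics.txt",
    "reference_sequence.fasta",
    MINIMUM_MAPPING_QUALITY=20,
    MINIMUM_BASE_QUALITY=20,
    INCLUDE_BQ_HISTOGRAM=True,
    INTERVALS="contigs.interval_list",  # hypothetical file name
)
print(shlex.join(cmd))
```

Keeping the arguments as a list rather than a single string means the same value could later be passed to `subprocess.run(cmd, check=True)` without extra quoting.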