-
Function: Collect whole genome sequencing-related metrics. This tool computes metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments. These metrics include the percentages of reads that pass minimal base- and mapping-quality filters as well as coverage (read-depth) levels. The histogram output is optional and, for a given run, displays two separate outputs on the y-axis while using a single set of values for the x-axis. Specifically, the first column in the histogram table (x-axis) is labeled 'coverage' and represents different possible coverage depths. However, it also represents the range of values for the base quality scores and thus would be more accurately labeled 'sequence depth and base quality scores'. The second and third columns (y-axes) correspond to the number of bases at a specific sequence depth ('count') and the number of bases at a particular base quality score ('baseq_count'), respectively. Although similar to the CollectWgsMetrics tool, CollectRawWgsMetrics uses less stringent default thresholds. For example, CollectRawWgsMetrics has base and mapping quality score thresholds set to '3' and '0' respectively, while the CollectWgsMetrics tool has both default threshold values set to '20' (at time of writing). Nevertheless, both tools enable the user to input specific threshold values.
Usage: java -jar picard.jar CollectRawWgsMetrics I=input.bam O=raw_wgs_metrics.txt R=reference_sequence.fasta INCLUDE_BQ_HISTOGRAM=true
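To illustrate how the single x-axis column serves both y-axis columns, here is a minimal sketch that reads a small tab-separated histogram using the column names described above ('coverage', 'count', 'baseq_count'). The sample rows are invented for illustration, and the headers written by a given Picard version may differ.

```python
import csv
import io

# Hypothetical histogram rows (assumption); a real table comes from the tool's output file.
SAMPLE = "coverage\tcount\tbaseq_count\n0\t10\t0\n1\t30\t5\n2\t60\t95\n"

def mean_of_histogram(text, value_col, weight_col):
    """Weighted mean of value_col, using weight_col as the number of bases per row."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    total = sum(int(r[weight_col]) for r in rows)
    if total == 0:
        return 0.0
    return sum(int(r[value_col]) * int(r[weight_col]) for r in rows) / total

# The same x-axis column is read as a depth for 'count' ...
mean_depth = mean_of_histogram(SAMPLE, "coverage", "count")
# ... and as a base quality score for 'baseq_count'.
mean_baseq = mean_of_histogram(SAMPLE, "coverage", "baseq_count")
```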
-
Function: Subsets intervals from a reference sequence to a new FASTA file. This tool takes a list of intervals, reads the corresponding subsequences from a reference FASTA file and writes them to a new FASTA file as separate records. Note that the reference FASTA file must be accompanied by an index file and the interval list must be provided in Picard interval list format. The names provided for the intervals will be used to name the corresponding records in the output file.
Usage: java -jar picard.jar ExtractSequences INTERVAL_LIST=regions_of_interest.interval_list R=reference.fasta O=extracted_IL_sequences.fasta
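The extraction step can be sketched roughly as follows. This is not Picard's implementation: the reference is a plain in-memory dict standing in for an indexed FASTA, intervals are assumed to be 1-based and inclusive, strand is ignored, and the function names are hypothetical.

```python
def extract_sequences(reference, intervals):
    """Return (name, subsequence) records for 1-based, inclusive intervals.

    reference: dict mapping contig name -> sequence string (stand-in for an indexed FASTA).
    intervals: iterable of (contig, start, end, name) tuples.
    """
    records = []
    for contig, start, end, name in intervals:
        seq = reference[contig]
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"interval {name} lies outside contig {contig}")
        # Convert 1-based inclusive coordinates to a Python slice.
        records.append((name, seq[start - 1:end]))
    return records

def to_fasta(records):
    """Format records as FASTA text, one record per interval name."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, `extract_sequences({"chr1": "ACGTACGTAC"}, [("chr1", 2, 5, "regionA")])` yields a single record named `regionA`.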
-
Function: Convert alignments in BAM or SAM format into fastq format.
Usage: bam2fq.py -i test_PairedEnd_StrandSpecific_hg19.sam -o bam2fq_out1
-
Function: It’s very important to check whether the current sequencing depth is deep enough to perform
alternative splicing analyses. For a well-annotated organism, the number of expressed genes
in a particular tissue is almost fixed, so the number of splice junctions is also fixed. These
splice junctions can be predetermined from the reference gene model. All (annotated) splice
junctions should be rediscovered from saturated RNA-seq data; otherwise, downstream
alternative splicing analysis is problematic because low-abundance splice junctions are
missing. This module checks for saturation by resampling 5%, 10%, 15%, ..., 95% of the total
alignments from a BAM or SAM file, then detecting splice junctions in each subset and
comparing them to the reference gene model.
Usage: junction_saturation.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -r hg19.refseq.bed12 -o output
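The resampling idea can be sketched as below. This is a simplified illustration, not the RSeQC code: each alignment is reduced to the junction it supports (or None), the subsets are nested prefixes of one shuffled list, and the function name and tuple layout are assumptions.

```python
import random

def junction_saturation(read_junctions, known_junctions, seed=0):
    """Count known and novel junctions seen at 5%, 10%, ..., 95% of alignments.

    read_junctions: list with one entry per alignment; a junction tuple such as
                    (chrom, donor, acceptor), or None for an unspliced alignment.
    known_junctions: set of junction tuples taken from the reference gene model.
    Returns a list of (percent, n_known, n_novel) tuples.
    """
    rng = random.Random(seed)
    shuffled = read_junctions[:]
    rng.shuffle(shuffled)
    results = []
    for pct in range(5, 100, 5):
        n = len(shuffled) * pct // 100
        seen = {j for j in shuffled[:n] if j is not None}
        results.append((pct, len(seen & known_junctions), len(seen - known_junctions)))
    return results
```

If the count of known junctions is still rising at 95%, the library is probably not saturated for splicing analysis.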
-
Function: Lifts over an interval list from one reference build to another. This tool adjusts the coordinates in an interval list derived from one reference to match a new reference, based on a chain file that describes the correspondence between the two references. It is based on the UCSC liftOver tool (see: http://genome.ucsc.edu/cgi-bin/hgLiftOver) and uses a UCSC chain file to guide its operation. It accepts either Picard interval_list files or VCF files as interval inputs.
Usage: java -jar picard.jar LiftOverIntervalList I=input.interval_list O=output.interval_list SD=reference_sequence.dict CHAIN=build.chain
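The coordinate adjustment can be pictured with a heavily simplified sketch. It assumes the chain has already been reduced to ungapped aligned blocks on the same strand, and that an interval is lifted only when it falls entirely within one block; real chain files also carry gaps and strand information, and the names here are hypothetical.

```python
def lift_interval(start, end, blocks):
    """Map a 1-based inclusive interval to the new build, or return None.

    blocks: list of (old_start, old_end, new_start) ungapped aligned blocks
            (a simplifying assumption; not the chain file format itself).
    """
    for old_start, old_end, new_start in blocks:
        if old_start <= start and end <= old_end:
            offset = new_start - old_start
            return start + offset, end + offset
    # No single block contains the interval, so it cannot be lifted in this sketch.
    return None
```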
-
Function: Uses the samtools flagstat command to print descriptive information for a BAM dataset.
Usage: samtools flagstat in.sam|in.bam|in.cram
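The kind of tally that flagstat reports can be approximated by counting bits of the SAM FLAG field. The sketch below uses the standard flag bit values but does not reproduce samtools' exact categories or output format; the function name and labels are illustrative.

```python
from collections import Counter

# Standard SAM FLAG bits; the labels are chosen for this sketch.
FLAG_BITS = {
    "paired": 0x1,
    "properly paired": 0x2,
    "unmapped": 0x4,
    "secondary": 0x100,
    "duplicates": 0x400,
    "supplementary": 0x800,
}

def flag_summary(flags):
    """Count how many records have each FLAG bit set, plus totals."""
    counts = Counter(total=0, mapped=0)
    for flag in flags:
        counts["total"] += 1
        if not flag & 0x4:
            counts["mapped"] += 1
        for label, bit in FLAG_BITS.items():
            if flag & bit:
                counts[label] += 1
    return counts
```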
-
Function: Collect metrics to assess oxidative artifacts. This tool collects metrics quantifying the error rate resulting from oxidative artifacts. For a brief primer on oxidative artifacts, see the GATK Dictionary. This tool calculates the Phred-scaled probability that an alternate base call results from an oxidation artifact. This probability score is based on base context, sequencing read orientation, and the characteristic low allelic frequency. Please see the following reference for an in-depth discussion of the OxoG error rate.
Usage: java -jar picard.jar CollectOxoGMetrics I=input.bam O=oxoG_metrics.txt R=reference_sequence.fasta
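"Phred-scaled" means a probability p is reported as Q = -10 * log10(p). A minimal sketch of that conversion follows; the cap applied when p is zero is an arbitrary choice for this example, not a documented tool setting.

```python
import math

def phred(p, cap=100.0):
    """Convert an error probability to a Phred-scaled score, Q = -10 * log10(p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p == 0.0:
        # log10(0) is undefined; return the cap (an assumption of this sketch).
        return cap
    return min(cap, -10.0 * math.log10(p))
```

Under this scale an error rate of 1 in 1000 corresponds to a score of about 30, and higher scores indicate fewer artifacts.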
-
Function: Chart the distribution of quality scores.
Usage: java -jar picard.jar QualityScoreDistribution I=input.bam O=qual_score_dist.txt CHART=qual_score_dist.pdf
-
Function: Takes a SAM or BAM file and separates all the reads into one SAM or BAM file per library name. Reads that do not have a read group specified or whose read group does not have a library name are written to a file called 'unknown.' The format (SAM or BAM) of the output files matches that of the input file.
Usage: java -jar picard.jar SplitSamByLibrary
-
Function: Asserts that the last block of the provided gzip file (e.g., BAM) is well-formed; exits with return code 100 otherwise.
Usage: java -jar picard.jar CheckTerminatorBlock
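For BGZF-compressed files such as BAM, a well-formed ending is the 28-byte empty block defined in the SAM specification. The sketch below checks only for that marker on a seekable binary stream; it does not reproduce the tool's own classification or exit codes, and the function name is hypothetical.

```python
import io

# The 28-byte BGZF end-of-file marker (an empty BGZF block) from the SAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)

def has_bgzf_terminator(stream):
    """Return True if a seekable binary stream ends with the BGZF EOF block."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if size < len(BGZF_EOF):
        return False
    stream.seek(size - len(BGZF_EOF))
    return stream.read(len(BGZF_EOF)) == BGZF_EOF
```

With a file on disk this could be called as `has_bgzf_terminator(open(path, "rb"))`, where `path` is a placeholder.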
-
Function: Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.
Usage: java -jar picard.jar MergeVcfs
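The sorting requirement is what makes a streaming merge possible. As a rough sketch, assuming each record is reduced to a (contig, position, payload) tuple and the contig order comes from a sequence dictionary, the inputs can be combined with a k-way merge; this illustrates the idea only and is not Picard's implementation.

```python
import heapq

def merge_sorted_records(inputs, contig_order):
    """Merge several record lists that are each sorted by contig, then position.

    inputs: list of lists of (contig, position, payload) tuples.
    contig_order: contig names in sequence-dictionary order.
    """
    rank = {contig: i for i, contig in enumerate(contig_order)}

    def sort_key(record):
        return (rank[record[0]], record[1])

    # heapq.merge relies on every input already being sorted by the same key.
    return list(heapq.merge(*inputs, key=sort_key))
```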
-
Function: Given a BAM/SAM file and a reference gene model, this module calculates how mapped
reads are distributed over genome features (such as CDS exons, 5’ UTR exons, 3’ UTR exons, introns,
and intergenic regions). When genome features overlap (e.g. a region annotated
as both exon and intron by two different transcripts), they are prioritized as:
CDS exons > UTR exons > Introns > Intergenic regions. For example, if a read is mapped to
both a CDS exon and an intron, it is assigned to CDS exons.
Usage: read_distribution.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -r hg19.refseq.bed12
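The priority rule can be sketched in a few lines. The feature labels and function name below are illustrative and are not the identifiers used by read_distribution.py.

```python
# Priority order from the description above, highest first (labels are assumptions).
PRIORITY = ["CDS_Exons", "UTR_Exons", "Introns", "Intergenic"]

def assign_feature(overlapping):
    """Pick the highest-priority feature among those a read overlaps."""
    for feature in PRIORITY:
        if feature in overlapping:
            return feature
    # A read overlapping no annotated feature is treated as intergenic.
    return "Intergenic"
```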
-
Function: Collect metrics regarding GC bias. This tool collects information about the relative proportions of guanine (G) and cytosine (C) nucleotides in a sample. Regions of high and low G + C content have been shown to interfere with mapping/aligning, ultimately leading to fragmented genome assemblies and poor coverage in a phenomenon known as 'GC bias'. Detailed information on the effects of GC bias on the collection and analysis of sequencing data can be found at DOI: 10.1371/journal.pone.0062856.
Usage: java -jar picard.jar CollectGcBiasMetrics I=input.bam O=gc_bias_metrics.txt CHART=gc_bias_metrics.pdf S=summary_metrics.txt R=reference_sequence.fasta
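The underlying quantity is the G + C fraction of a stretch of reference sequence. A minimal sketch of computing it over sliding windows follows; the window size and the handling of ambiguous bases are assumptions for this example, not the tool's documented behavior.

```python
def gc_fraction(seq):
    """Fraction of G and C among the A/C/G/T bases of seq; None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt

def windowed_gc(seq, window=100):
    """GC fraction of every full-length window, sliding one base at a time."""
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]
```

Relating per-window GC content to the number of reads starting in each window is then what exposes under- or over-representation at the extremes.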
-
Function: The command bamtools resolve resolves paired-end reads. The resolving mode is required, and it can be -makeStats, -markPairs, or -twoPass.
Usage: bamtools resolve -twoPass -in input_alignments.bam -out output_alignments.bam
-
Function: Transforms raw Illumina sequencing data into an unmapped SAM or BAM file.
Usage: java -jar picard.jar IlluminaBasecallsToSam BASECALLS_DIR=/BaseCalls/ LANE=001 READ_STRUCTURE=25T8B25T RUN_BARCODE=run15 IGNORE_UNEXPECTED_BARCODES=true LIBRARY_PARAMS=library.params
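The READ_STRUCTURE value in the usage line describes how sequencing cycles are divided among reads: a number of cycles followed by a type letter (T for template, B for sample barcode, S for skip, M for molecular barcode). So 25T8B25T means 25 template cycles, 8 barcode cycles, then 25 template cycles. A small parsing sketch, with a hypothetical function name:

```python
import re

def parse_read_structure(structure):
    """Split a read structure such as '25T8B25T' into (cycles, type) pairs."""
    segments = re.findall(r"(\d+)([TBSM])", structure)
    # Reject strings that contain anything besides <number><type> segments.
    if not segments or "".join(n + kind for n, kind in segments) != structure:
        raise ValueError(f"invalid read structure: {structure!r}")
    return [(int(n), kind) for n, kind in segments]
```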