Software Usage Function
samtools index samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index] Index a coordinate-sorted BAM or CRAM file for fast random access. (Note that this does not work with SAM files even if they are bgzip compressed — to index such files, use tabix instead.)
java -jar picard.jar java -jar picard.jar ExtractSequences INTERVAL_LIST=regions_of_interest.interval_list R=reference.fasta O=extracted_IL_sequences.fasta Subsets intervals from a reference sequence to a new FASTA file. This tool takes a list of intervals, reads the corresponding subsequences from a reference FASTA file and writes them to a new FASTA file as separate records. Note that the reference FASTA file must be accompanied by an index file and the interval list must be provided in Picard list format. The names provided for the intervals will be used to name the corresponding records in the output file.
samtools sort samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-T tmpprefix] [-@ threads] [in.sam|in.bam|in.cram] This tool uses samtools sort command to sort BAM datasets in coordinate or read name order.
java -jar picard.jar java -jar picard.jar CheckTerminatorBlock Asserts the provided gzip file's (e.g., BAM) last block is well-formed; RC 100 otherwise
java -jar picard.jar java -jar picard.jar AddCommentsToBam I=input.bam O=modified_bam.bam C=comment_1 C="comment 2" Adds comments to the header of a BAM file. This tool makes a copy of the input BAM file, with a modified header that includes the comments specified at the command line (prefixed by @CO). Use double quotes to wrap comments that include whitespace or special characters. Note that this tool cannot be run on SAM files.
bamtools bamtools resolve -twoPass -in input_alignments.bam -out output_alignments.bam The command bamtools resolve resolves paired-end reads. The resolving mode is required, and it can be -makeStats, -markPairs, or -twoPass.
samtools stats samtools stats [options] in.sam|in.bam|in.cram [region...] samtools stats collects statistics from BAM files and outputs in a text format. The output can be visualized graphically using plot-bamstats.
java -jar picard.jar java -jar picard.jar CompareMetrics metricfile1.txt metricfile2.txt Compare two metrics files. This tool compares the metrics and histograms generated from metric tools to determine if the generated results are identical. This tool is useful to test and compare outputs when code changes are implemented. It is not meant for use by end-users of this toolkit. The tool's output simply indicates whether two metrics files are equal or not equal.
java -jar picard.jar java -jar picard.jar IlluminaBasecallsToSam BASECALLS_DIR=/BaseCalls/ LANE=001 READ_STRUCTURE=25T8B25T RUN_BARCODE=run15 IGNORE_UNEXPECTED_BARCODES=true LIBRARY_PARAMS=library.params Transforms raw Illumina sequencing data into an unmapped SAM or BAM file.
java -jar picard.jar java -jar picard.jar QualityScoreDistribution I=input.bam O=qual_score_dist.txt CHART=qual_score_dist.pdf Chart the distribution of quality scores.
java -jar picard.jar java -jar picard.jar CollectRawWgsMetrics I=input.bam O=raw_wgs_metrics.txt R=reference_sequence.fasta INCLUDE_BQ_HISTOGRAM=true Collect whole genome sequencing-related metrics. This tool computes metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments. These metrics include the percentages of reads that pass minimal base- and mapping-quality filters as well as coverage (read-depth) levels. The histogram output is optional and, for a given run, displays two separate outputs on the y-axis while using a single set of values for the x-axis. Specifically, the first column in the histogram table (x-axis) is labeled 'coverage' and represents different possible coverage depths. However, it also represents the range of values for the base quality scores and thus should probably be labeled 'sequence depth and base quality scores'. The second and third columns (y-axes) correspond to the numbers of bases at a specific sequence depth 'count' and the numbers of bases at a particular base quality score 'baseq_count' respectively. Although similar to the CollectWgsMetrics tool, the default thresholds for CollectRawWgsMetrics are less stringent. For example, CollectRawWgsMetrics has base and mapping quality score thresholds set to '3' and '0' respectively, while the CollectWgsMetrics tool has the default threshold values set to '20' (at time of writing). Nevertheless, both tools enable the user to input specific threshold values.
java -jar picard.jar java -jar picard.jar CalculateHsMetrics DEPRECATED: Use CollectHsMetrics instead. Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file. If a reference sequence is provided, AT/GC dropout metrics will be calculated, and the PER_TARGET_COVERAGE option can be used to output GC and mean coverage information for every target.
java -jar picard.jar java -jar picard.jar LiftOverIntervalList I=input.interval_list O=output.interval_list SD=reference_sequence.dict CHAIN=build.chain Lifts over an interval list from one reference build to another. This tool adjusts the coordinates in an interval list derived from one reference to match a new reference, based on a chain file that describes the correspondence between the two references. It is based on the UCSC liftOver tool (see: http://genome.ucsc.edu/cgi-bin/hgLiftOver) and uses a UCSC chain file to guide its operation. It accepts either Picard interval_list files or VCF files as interval inputs.
java -jar picard.jar java -jar picard.jar SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate Sorts a SAM or BAM file. This tool sorts the input SAM or BAM file by coordinate, queryname (QNAME), or some other property of the SAM record. The SortOrder of a SAM/BAM file is found in the SAM file header tag @HD in the field labeled SO.
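As a rough illustration of where the sort order lives, the sketch below reads the SO field from the @HD line of a SAM header supplied as plain text. The function name `sort_order_from_header` and the sample header are hypothetical, and this is not how Picard or samtools read headers internally; it only assumes the tab-separated TAG:VALUE layout of header lines.

```python
from typing import Optional


def sort_order_from_header(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if not line.startswith("@HD"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SO":
                return value
        return None  # @HD line present but carries no SO field
    return None  # no @HD line at all


if __name__ == "__main__":
    # Hypothetical two-line header used only for demonstration.
    example = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr20\tLN:1000"
    print(sort_order_from_header(example))
```

Returning None when the field is missing leaves it to the caller to decide how an unsorted or unlabeled file should be handled.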
read_quality.py read_quality.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output According to the SAM specification, if Q is the character representing the “base calling quality” in a SAM file, then Phred quality score = ord(Q) - 33. Here ord() is the Python function that returns the integer Unicode code point of a character; for example, ord(‘a’) returns 97. The Phred quality score is widely used to measure the “reliability” of base calling: a Phred quality score of 20 means there is a 1/100 chance that the base call is wrong, and a score of 30 means there is a 1/1000 chance. In general: Phred quality score = -10 × log10(P), where P is the probability that the base call is wrong.
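The relationship between quality characters, Phred scores, and error probabilities described above can be sketched in a few lines of Python. The function names are hypothetical and are not part of read_quality.py; the offset of 33 is the one stated in the row above.

```python
import math


def phred_from_char(q: str, offset: int = 33) -> int:
    """Decode one quality character into a Phred score: ord(Q) - 33."""
    return ord(q) - offset


def error_probability(phred: float) -> float:
    """Invert Q = -10 * log10(P) to get the probability of a wrong base call."""
    return 10 ** (-phred / 10)


def phred_from_probability(p: float) -> float:
    """Phred quality score = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # '5' and '?' have code points 53 and 63, so they decode to Phred 20 and 30.
    for ch in "5?":
        score = phred_from_char(ch)
        print(ch, score, error_probability(score))
```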
java -jar picard.jar java -jar picard.jar CollectHsMetrics I=input.bam O=hs_metrics.txt R=reference_sequence.fasta BAIT_INTERVALS=bait.interval_list TARGET_INTERVALS=target.interval_list Collects hybrid-selection (HS) metrics for a SAM or BAM file. This tool takes a SAM/BAM file input and collects metrics that are specific for sequence datasets generated through hybrid-selection. Hybrid-selection (HS) is the most commonly used technique to capture exon-specific sequences for targeted sequencing experiments such as exome sequencing; for more information, please see the corresponding GATK Dictionary entry.
bam_stat.py bam_stat.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam Summarizing mapping statistics of a BAM or SAM file.
read_distribution.py read_distribution.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -r hg19.refseq.bed12 Given a BAM/SAM file and a reference gene model, this module calculates how mapped reads are distributed over genome features (such as CDS exons, 5’ UTR exons, 3’ UTR exons, introns, and intergenic regions). When genome features overlap (e.g. a region could be annotated as both exon and intron by two different transcripts), they are prioritized as: CDS exons > UTR exons > Introns > Intergenic regions. For example, if a read is mapped to both a CDS exon and an intron, it is assigned to CDS exons.
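The priority rule for overlapping features can be illustrated with a small sketch. This is not the read_distribution.py implementation; the feature labels and the function `assign_feature` are hypothetical names chosen for the example, and a read with no overlapping annotation is assumed to count as intergenic.

```python
# Feature classes in descending priority, following the order given above.
PRIORITY = ["CDS_exon", "UTR_exon", "intron", "intergenic"]


def assign_feature(overlapping: set) -> str:
    """Pick the highest-priority feature class among those a read overlaps."""
    for feature in PRIORITY:
        if feature in overlapping:
            return feature
    # Assumption: a read overlapping no annotated feature is intergenic.
    return "intergenic"


if __name__ == "__main__":
    # A read annotated as both intron and CDS exon is counted as CDS exon.
    print(assign_feature({"intron", "CDS_exon"}))
```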
java -jar picard.jar java -jar picard.jar MergeVcfs Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.
java -jar picard.jar java -jar picard.jar Converts a BED file to a Picard Interval List. This tool provides easy conversion from BED to the Picard interval_list format, which is required by many Picard processing tools. Note that the coordinate system of BED files is such that the first base or position in a sequence is numbered "0", while in interval_list files it is numbered "1". BED files contain sequence data displayed in a flexible format that includes nine optional fields, in addition to three required fields within the annotation tracks. The required fields of a BED file include: chrom - The name of the chromosome (e.g. chr20) or scaffold (e.g. scaffold10671); chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered "0"; chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
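The coordinate shift between the two formats can be sketched as follows. This covers only the arithmetic on the three required BED fields, assuming interval_list coordinates are 1-based with an inclusive end; it does not write the header or the other columns of a real interval_list file, and the function name `bed_to_interval` is hypothetical.

```python
from typing import Tuple


def bed_to_interval(chrom: str, chrom_start: int, chrom_end: int) -> Tuple[str, int, int]:
    """Convert a 0-based, end-exclusive BED record to 1-based, end-inclusive coordinates."""
    # Assumption for this sketch: empty or reversed BED intervals are rejected.
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("expected 0 <= chromStart < chromEnd")
    # Start shifts by one; the exclusive BED end equals the inclusive 1-based end.
    return chrom, chrom_start + 1, chrom_end


if __name__ == "__main__":
    # The first 100 bases of chr20: BED 0-100 (bases numbered 0-99).
    print(bed_to_interval("chr20", 0, 100))
```

The interval length is preserved: 100 - 0 bases in BED terms equals 100 - 1 + 1 bases in 1-based inclusive terms.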