Software Usage Function
java -jar picard.jar java -jar picard.jar ConvertSequencingArtifactToOxoG I=artifact_metrics R=reference.fasta Extract OxoG metrics from generalized artifacts metrics.
divide_bam.py divide_bam.py -n 3 -i SingleEnd_StrandSpecific_50mer_Human_hg19.bam -o output Equally divide BAM file (m alignments) into n parts. Each part contains roughly m/n alignments that are randomly sampled from total alignments.
infer_experiment.py infer_experiment.py -r gene_annotation.bed -i input.bam [options] “Guess” how RNA-seq sequencing was configured, particularly how reads were stranded for strand-specific RNA-seq data.
java -jar picard.jar java -jar picard.jar CreateSequenceDictionary R=reference.fasta O=reference.dict Creates a sequence dictionary for a reference sequence. This tool creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records. The reference sequence can be gzipped (both .fasta and .fasta.gz are supported).
samtools split samtools split [options] merged.sam|merged.bam|merged.cram Splits the input dataset by read group, generating one output dataset per read group.
java -jar picard.jar java -jar picard.jar GatherVcfs Gathers multiple VCF files from a scatter operation into a single VCF file. Input files must be supplied in genomic order and must not have events at overlapping positions.
samtools view samtools view -b -h -o [OUTPUT BAM] -T [REFERENCE GENOME] sorted_input.bam Converts a SAM dataset into its sorted binary BAM (samtools > 1.3), or just sorts the original BAM file (samtools < 1.3).
java -jar picard.jar java -jar picard.jar MarkDuplicatesWithMateCigar I=input.bam O=mark_dups_w_mate_cig.bam M=mark_dups_w_mate_cig_metrics.txt Identifies duplicate reads, accounting for mate CIGAR. This tool locates and tags duplicate reads (both PCR and optical) in a BAM or SAM file, where duplicate reads are defined as originating from the same original fragment of DNA, taking into account the CIGAR string of read mates. It is intended as an improvement upon the original MarkDuplicates algorithm, from which it differs in several ways, including differences in how it breaks ties. It may be the most effective duplicate marking program available, as it handles all cases including clipped and gapped alignments and locates duplicate molecules using mate CIGAR information. However, please note that it is not yet used in the Broad's production pipeline, so use it at your own risk. Note also that this tool will not work with alignments that have large gaps or deletions, such as those from RNA-seq data. This is due to the need to buffer small genomic windows to ensure integrity of the duplicate marking, while large skips (e.g. skipping introns) in the alignment records would force making that window very large, thus exhausting memory.
java -jar picard.jar java -jar picard.jar GatherBamFiles I=input1.bam I=input2.bam O=gathered_files.bam Concatenate one or more BAM files as efficiently as possible. This tool performs a rapid "gather" operation on BAM files after scatter operations where the same process has been performed on different regions of a BAM file creating many smaller BAM files that now need to be concatenated (reassembled) back together. Assumes that the list of BAM files provided as INPUT are in the order that they should be concatenated and simply concatenates the bodies of the BAM files while retaining the header from the first file. Operates via copying of the gzip blocks directly for speed but also supports generation of an MD5 on the output and indexing of the output BAM file. Only supports BAM files, does not support SAM files.
java -jar picard.jar java -jar picard.jar SortVcf I=vcf_1.vcf I=vcf_2.vcf O=sorted.vcf Sorts one or more VCF files. This tool sorts the records in VCF files according to the order of the contigs in the header/sequence dictionary and then by coordinate. It can accept an external sequence dictionary. If no external dictionary is supplied, the VCF file headers of multiple inputs must have the same sequence dictionaries. If running on multiple inputs (originating from e.g. some scatter-gather runs), the input files must contain the same sample names in the same column order.
java -jar picard.jar java -jar picard.jar MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt Identifies duplicate reads.
java -jar picard.jar java -jar picard.jar MarkIlluminaAdapters INPUT=input.sam METRICS=metrics.txt Reads a SAM or BAM file and rewrites it with new adapter-trimming tags.
java -jar picard.jar java -jar picard.jar MakeSitesOnlyVcf Reads a VCF/VCF.gz/BCF and removes all genotype information from it while retaining all site level information, including annotations based on genotypes (e.g. AN, AF). Output can be any supported variant format including .vcf, .vcf.gz or .bcf.
java -jar picard.jar java -jar picard.jar CollectTargetedPcrMetrics I=input.bam O=pcr_metrics.txt R=reference_sequence.fasta AMPLICON_INTERVALS=amplicon.interval_list TARGET_INTERVALS=targets.interval_list Calculate PCR-related metrics from targeted sequencing data.
java -jar picard.jar java -jar picard.jar RenameSampleInVcf I=input.vcf O=renamed.vcf NEW_SAMPLE_NAME=sample123 Renames a sample within a VCF or BCF. This tool enables the user to rename a sample in either a VCF or BCF file. It is intended to change the name of a sample in a VCF prior to merging with VCF files in which one or more samples have similar names. Note that the input VCF file must be a single-sample VCF and that NEW_SAMPLE_NAME is required.
junction_saturation.py junction_saturation.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -r hg19.refseq.bed12 -o output It’s very important to check whether the current sequencing depth is deep enough to perform alternative splicing analyses. For a well-annotated organism, the number of expressed genes in a particular tissue is almost fixed, so the number of splice junctions is also fixed. The fixed splice junctions can be predetermined from the reference gene model. All (annotated) splice junctions should be rediscovered from saturated RNA-seq data; otherwise, downstream alternative splicing analysis is problematic because low-abundance splice junctions are missing. This module checks for saturation by resampling 5%, 10%, 15%, ..., 95% of total alignments from a BAM or SAM file, and then detects splice junctions from each subset and compares them to the reference gene model.
java -jar picard.jar java -jar picard.jar CollectInsertSizeMetrics I=input.bam O=insert_size_metrics.txt H=insert_size_histogram.pdf M=0.5
java -jar picard.jar java -jar picard.jar LiftoverVcf I=input.vcf O=lifted_over.vcf CHAIN=b37tohg19.chain REJECT=rejected_variants.vcf R=reference_sequence.fasta Lifts over a VCF file from one reference build to another. This tool adjusts the coordinates of variants within a VCF file to match a new reference. The output file will be sorted and indexed using the target reference build. To be clear, REFERENCE_SEQUENCE should be the target reference build. The tool is based on the UCSC liftOver tool (see: http://genome.ucsc.edu/cgi-bin/hgLiftOver) and uses a UCSC chain file to guide its operation. Note that records may be rejected because they cannot be lifted over or because of sequence incompatibilities between the source and target reference genomes. Rejected records will be emitted with filters to the REJECT file, using the source genome coordinates.
java -jar picard.jar java -jar picard.jar ExtractIlluminaBarcodes BASECALLS_DIR=/BaseCalls/ LANE=1 READ_STRUCTURE=25T8B25T BARCODE_FILE=barcodes.txt METRICS_FILE=metrics_output.txt Tool determines the barcode for each read in an Illumina lane.
CNVnator cnvnator [-genome name] -root out.root [-chrom name1 ...] -tree [file1.bam ...] Extracts read mappings from BAM/SAM files using CNVnator.
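
The saturation check described for junction_saturation.py in the table above can be illustrated with a minimal Python sketch. It is not RSeQC's implementation. The function name `junction_saturation`, the representation of a junction as a `(chrom, intron_start, intron_end)` tuple, and the choice to take each subset as a prefix of a single shuffle are all assumptions made for illustration. Reading a real BAM file is left out, so each alignment is given simply as the list of junctions it supports.

```python
import random


def junction_saturation(alignment_junctions, known_junctions,
                        percents=range(5, 100, 5), seed=0):
    """Count known and novel junctions recovered at each sampling percentage.

    alignment_junctions: one entry per alignment, each a list of junction
        tuples (empty for unspliced alignments). Hypothetical input format.
    known_junctions: junctions taken from the reference gene model.
    Returns a list of (percent, n_known_found, n_novel_found).
    """
    rng = random.Random(seed)
    shuffled = list(alignment_junctions)
    rng.shuffle(shuffled)  # random order, so every prefix is a random subset
    known = set(known_junctions)

    results = []
    for pct in percents:
        n_sampled = len(shuffled) * pct // 100
        seen = set()
        for junctions in shuffled[:n_sampled]:
            seen.update(junctions)
        results.append((pct, len(seen & known), len(seen - known)))
    return results


if __name__ == "__main__":
    # Toy data: two annotated junctions, one unannotated, many unspliced reads.
    known = {("chr1", 100, 200), ("chr1", 300, 400)}
    reads = ([[("chr1", 100, 200)]] * 20 + [[("chr1", 300, 400)]] * 20
             + [[("chr9", 1, 2)]] * 20 + [[]] * 40)
    for pct, n_known, n_novel in junction_saturation(reads, known):
        print(pct, n_known, n_novel)
```

If the count of known junctions stops rising well before the largest subsets, the library is likely close to saturation for annotated junctions. A count that is still climbing at 95% suggests that deeper sequencing would recover more.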