SAM/BAM Manipulation

java -jar picard.jar: Function: Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments, but only at a set of sampled positions. The sampled positions should be spaced at least one read length apart; otherwise, you risk double-counting reads in the metrics. If contig-sized intervals are needed, use the INTERVALS argument of CollectWgsMetrics instead.

Usage: java -jar picard.jar CollectWgsMetricsFromSampledSites
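The spacing rule can be checked with a short greedy filter over candidate positions. This is a minimal Python sketch, not part of Picard: the function name space_sites is hypothetical, and positions are assumed to be integers on a single contig. A read of length L covers L consecutive bases, so no single read can cover two sites that are at least L bases apart.

```python
from typing import Iterable, List


def space_sites(candidates: Iterable[int], read_length: int) -> List[int]:
    """Greedily keep sites so that consecutive kept sites are >= read_length apart.

    Hypothetical helper: all positions are assumed to lie on one contig.
    """
    kept: List[int] = []
    for pos in sorted(candidates):
        # A read spans read_length consecutive bases, so it cannot cover
        # two sites whose distance is at least read_length.
        if not kept or pos - kept[-1] >= read_length:
            kept.append(pos)
    return kept


print(space_sites([100, 120, 250, 260, 400], read_length=100))  # [100, 250, 400]
```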
java -jar picard.jar: Function: Converts VCF to BCF or BCF to VCF. This tool converts files between the plain-text VCF format and its binary compressed equivalent, BCF. Input and output formats are determined by the file extensions. For best results, ensure that an index file is present and set the REQUIRE_INDEX option to true.

Usage: java -jar picard.jar VcfFormatConverter I=input.vcf O=output.bcf REQUIRE_INDEX=true
java -jar picard.jar: Function: Generate index statistics from a BAM file. This tool calculates statistics from a BAM index (.bai) file, emulating the behavior of the "samtools idxstats" command. The statistics collected include counts of aligned and unaligned reads, as well as all records with no start coordinate. The input to the tool is the BAM file name, and the BAM file must be accompanied by a corresponding index file.

Usage: java -jar picard.jar BamIndexStats I=input.bam O=output
java -jar picard.jar: Function: Chart the nucleotide distribution per cycle in a SAM or BAM file. This tool produces a chart of the nucleotide distribution per cycle in a SAM or BAM file to enable assessment of systematic errors at specific positions in the reads.

Usage: java -jar picard.jar CollectBaseDistributionByCycle CHART=collect_base_dist_by_cycle.pdf I=input.bam O=output.txt
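The per-cycle tally behind such a chart can be sketched in a few lines of Python. This is an illustration only, not Picard's implementation: it assumes read sequences are already in sequencing orientation (so string index i corresponds to cycle i + 1) and ignores base qualities, read pairing, and read filtering.

```python
from collections import Counter
from typing import Dict, List


def base_distribution_by_cycle(reads: List[str]) -> List[Dict[str, float]]:
    """Fraction of A, C, G, T, N at each cycle (list index 0 is cycle 1)."""
    max_len = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            counts[cycle][base] += 1
    # Every cycle below max_len is reached by at least one read, so the total is > 0.
    return [
        {base: c[base] / sum(c.values()) for base in "ACGTN"}
        for c in counts
    ]


dist = base_distribution_by_cycle(["ACGT", "AAGT", "ACG"])
print(dist[1])  # cycle 2: A = 1/3, C = 2/3, all other bases 0
```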
java -jar picard.jar: Function: Computes a number of metrics that are useful for evaluating coverage and performance of sequencing experiments.

Usage: java -jar picard.jar CollectWgsMetricsFromQuerySorted
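As a rough illustration of what coverage metrics summarize, the Python sketch below computes a mean, a median, and the fraction of positions reaching several depth thresholds from a per-base depth array. The keys mirror Picard metric names such as MEAN_COVERAGE, MEDIAN_COVERAGE and PCT_10X, but the sketch omits the read and base filtering Picard's WGS metrics tools apply (for example by mapping quality, base quality and duplicate status), so its numbers should not be expected to match Picard's output. The function name and default thresholds are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def coverage_summary(depths: Sequence[int],
                     thresholds: Sequence[int] = (1, 10, 20)) -> Dict[str, float]:
    """Summarize a non-empty sequence of per-base depths."""
    n = len(depths)
    summary: Dict[str, float] = {
        "MEAN_COVERAGE": sum(depths) / n,
        "MEDIAN_COVERAGE": float(median(depths)),
    }
    for t in thresholds:
        # Fraction of positions covered by at least t reads.
        summary[f"PCT_{t}X"] = sum(1 for d in depths if d >= t) / n
    return summary


print(coverage_summary([0, 5, 10, 20, 15]))
# MEAN_COVERAGE 10.0, MEDIAN_COVERAGE 10.0, PCT_1X 0.8, PCT_10X 0.6, PCT_20X 0.2
```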
samtools targetcut: Function: This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and outputs a SAM with each sequence corresponding to a target. When option -f is in use, BAQ will be applied. This command is designed only for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].

Usage: samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>
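To make "continuity of read depth" concrete, the Python sketch below finds maximal runs of positions whose depth reaches a fixed threshold. It is a deliberately simplified stand-in: samtools targetcut uses its own penalty-based segmentation (tuned by -i and -0/-1/-2) and also builds consensus sequences, neither of which is reproduced here. The function name and the min_depth parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def covered_regions(depths: Sequence[int], min_depth: int) -> List[Tuple[int, int]]:
    """Maximal runs with depth >= min_depth, as 0-based half-open (start, end)."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, depth in enumerate(depths):
        if depth >= min_depth and start is None:
            start = pos  # a covered run begins
        elif depth < min_depth and start is not None:
            regions.append((start, pos))  # the run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(depths)))  # run extends to the end
    return regions


print(covered_regions([0, 3, 4, 0, 0, 5, 5], min_depth=3))  # [(1, 3), (5, 7)]
```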
java -jar picard.jar: Function: Downsamples a BAM file while ensuring that either both ends of a read pair are removed or neither is. The tool parses each read name to extract the position within the tile from which the read came, and the downsampling is based on this position, so running it on exactly the same input produces the same results.
Note 1: This is technology- and read-name-dependent. If your read names do not contain coordinate information, or if your BAM contains reads from multiple technologies (flowcell versions, sequencing machines), this will not work properly. The tool was designed with Illumina MiSeq/HiSeq in mind.
Note 2: The downsampling is not random. It depends deterministically on the position of the read within its tile.
Note 3: Downsampling twice with this program is not supported.
Note 4: You should run MarkDuplicates after downsampling.
Finally, the tool is designed to simulate sequencing less as accurately as possible, not to achieve an exact downsampling fraction. In particular, because reads may be distributed unevenly within lanes and tiles, the resulting downsampling percentage will not be determined accurately by the FRACTION argument.

Usage: java -jar picard.jar PositionBasedDownsampleSam
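The key properties of position-based downsampling (the decision depends only on the read name, so both mates of a pair share the same fate and reruns are reproducible) can be illustrated with a small Python sketch. The read-name layout assumed here is the common Illumina form instrument:run:flowcell:lane:tile:x:y, and the keep rule, a stripe test on the x coordinate with a hypothetical stripe width, is an invented example rather than Picard's selection rule.

```python
from typing import Optional, Tuple


def tile_xy(read_name: str) -> Optional[Tuple[int, int]]:
    """Parse (x, y) from an Illumina-style name instrument:run:flowcell:lane:tile:x:y."""
    fields = read_name.split(":")
    if len(fields) < 7:
        return None
    try:
        return int(fields[-2]), int(fields[-1])
    except ValueError:
        return None


def keep_read(read_name: str, fraction: float, stripe: int = 1000) -> bool:
    """Hypothetical rule: keep reads whose x coordinate falls in the first
    `fraction` of each `stripe`-wide band. Mates share a name, so they share the outcome."""
    xy = tile_xy(read_name)
    if xy is None:
        raise ValueError(f"no tile coordinates in read name: {read_name!r}")
    x, _y = xy
    return (x % stripe) < fraction * stripe


print(keep_read("M00123:45:000000000-ABCDE:1:1101:15230:1331", 0.5))  # True
print(keep_read("M00123:45:000000000-ABCDE:1:1101:15589:1331", 0.5))  # False
```

Because the kept fraction depends on how reads happen to be spread across each tile, the realized fraction only approximates the requested one, which is the same caveat the tool description gives for FRACTION.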
java -jar picard.jar: Function: Writes an interval list based on splitting a reference by Ns. This tool identifies positions in a reference where the bases are 'no-calls' and writes out an interval list using the resulting coordinates. This can be used to create an interval list for whole genome sequence (WGS), e.g. for scatter-gather purposes, as an alternative to using fixed-length intervals. The number of contiguous no-calls that can be tolerated before creating a break is adjustable from the command line.

Usage: java -jar picard.jar ScatterIntervalsByNs R=reference_sequence.fasta OT=BOTH O=output.interval_list
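The splitting logic can be sketched in Python as follows. This is an illustration under stated assumptions, not Picard's code: it works on a single in-memory sequence, reports 1-based inclusive intervals labeled ACGT or N, and uses a hypothetical max_to_merge parameter for the adjustable tolerance, so that runs of at most that many Ns lying between two ACGT stretches are absorbed into one ACGT interval.

```python
from typing import List, Tuple

Interval = Tuple[int, int, str]  # (start, end, kind), 1-based inclusive


def scatter_by_ns(seq: str, max_to_merge: int = 1) -> List[Interval]:
    """Split a sequence into ACGT and N intervals; max_to_merge is a hypothetical name."""
    # Run-length encode the sequence into alternating N / non-N runs.
    runs: List[list] = []
    for i, base in enumerate(seq.upper(), start=1):
        kind = "N" if base == "N" else "ACGT"
        if runs and runs[-1][2] == kind:
            runs[-1][1] = i
        else:
            runs.append([i, i, kind])

    merged: List[Interval] = []
    for idx, (start, end, kind) in enumerate(runs):
        # Runs alternate, so an interior N run is flanked by ACGT runs.
        interior = 0 < idx < len(runs) - 1
        if kind == "N" and interior and end - start + 1 <= max_to_merge:
            kind = "ACGT"  # tolerate a short N run inside sequence
        if merged and merged[-1][2] == kind:
            merged[-1] = (merged[-1][0], end, kind)
        else:
            merged.append((start, end, kind))
    return merged


print(scatter_by_ns("ACGTNACGTNNNAC"))
# [(1, 9, 'ACGT'), (10, 12, 'N'), (13, 14, 'ACGT')]
```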
java -jar picard.jar: Function: Convert a BAM file to a SAM file, or SAM to BAM. Input and output formats are determined by file extension.

Usage: java -jar picard.jar SamFormatConverter
junction_annotation.py: Function: For a given alignment file (-i) in BAM or SAM format and a reference gene model (-r) in BED format, this program compares detected splice junctions to the reference gene model. Splicing annotation is performed at two levels: splice event level and splice junction level.

Usage: junction_annotation.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output -r hg19.refseq.bed12
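Junction-level annotation amounts to comparing each detected intron's splice sites with those in the gene model. The Python sketch below assumes three categories, which follows our understanding of RSeQC's junction-level output: annotated (both splice sites occur in the reference), partial novel (only one does) and complete novel (neither does). The coordinate convention, the make_classifier name and the exact matching rule are assumptions; event-level counting is not shown. Under this rule, a junction joining two known sites that never pair in the reference still counts as annotated.

```python
from typing import Callable, Iterable, Tuple

Junction = Tuple[str, int, int]  # (chrom, intron_start, intron_end); convention assumed


def make_classifier(reference: Iterable[Junction]) -> Callable[[Junction], str]:
    """Build a classifier from reference junctions (hypothetical helper)."""
    ref = list(reference)
    starts = {(c, s) for c, s, _ in ref}
    ends = {(c, e) for c, _, e in ref}

    def classify(junction: Junction) -> str:
        chrom, start, end = junction
        # Count how many of the two splice sites are present in the reference.
        hits = ((chrom, start) in starts) + ((chrom, end) in ends)
        if hits == 2:
            return "annotated"
        if hits == 1:
            return "partial_novel"
        return "complete_novel"

    return classify


classify = make_classifier([("chr1", 100, 200), ("chr1", 300, 400)])
print(classify(("chr1", 100, 250)))  # partial_novel
```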
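samtools idxstats: Function: Retrieves and prints statistics from the index file. The output is tab-delimited, with one line per reference sequence giving the sequence name, sequence length, number of mapped reads, and number of unmapped reads.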

Usage: samtools idxstats in.sam|in.bam|in.cram
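Because idxstats output is plain tab-delimited text, it is easy to post-process. A minimal Python sketch, assuming the four-column layout samtools idxstats prints (reference name, sequence length, mapped reads, unmapped reads), including the final * line that counts unmapped reads with no coordinate. The function name is hypothetical and the example values are made up.

```python
from typing import Dict, Tuple


def parse_idxstats(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map reference name -> (length, mapped, unmapped)."""
    stats: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


# Made-up example values; the "*" line holds unmapped reads without a position.
example = "chr1\t248956422\t1000\t10\nchr2\t242193529\t800\t5\n*\t0\t0\t42\n"
stats = parse_idxstats(example)
total_mapped = sum(m for _, m, _ in stats.values())
total_unmapped = sum(u for _, _, u in stats.values())
print(total_mapped, total_unmapped)  # 1800 57
```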
bam2fq.py: Function: Convert alignments in BAM or SAM format into FASTQ format.

Usage: bam2fq.py -i test_SingleEnd_StrandSpecific_hg19.bam -s -o bam2fq_out2
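Converting a single alignment back to a FASTQ record mostly means undoing the reverse-complementing that SAM/BAM applies to reads aligned to the reverse strand (FLAG bit 0x10). The Python sketch below shows that step for one record; it is an illustration under that assumption, not bam2fq.py's code, the function name is hypothetical, and it ignores mate suffixes as well as secondary and supplementary alignments.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_fastq(name: str, flag: int, seq: str, qual: str) -> str:
    """Format one alignment as a FASTQ record in original read orientation."""
    if flag & 0x10:
        # Reverse-strand alignments store SEQ reverse-complemented and QUAL
        # reversed, so undo both to recover the read as sequenced.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


print(to_fastq("r1", 16, "AACG", "IIJ#"), end="")  # @r1 / CGTT / + / #JII
```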
java -jar picard.jar: Function: Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.

Usage: java -jar picard.jar UpdateVcfSequenceDictionary
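In a VCF, the sequence dictionary is carried by the ##contig header lines, so the update can be pictured as swapping those lines for new ones. A minimal Python sketch, assuming the new dictionary is given as (name, length) pairs; real dictionaries may carry more attributes (such as assembly or MD5), the function name is hypothetical, and data lines are not touched.

```python
from typing import List, Sequence, Tuple


def replace_contig_lines(header: Sequence[str],
                         dictionary: Sequence[Tuple[str, int]]) -> List[str]:
    """Drop existing ##contig lines and insert new ones just before #CHROM."""
    new_contigs = [f"##contig=<ID={name},length={length}>" for name, length in dictionary]
    out: List[str] = []
    for line in header:
        if line.startswith("##contig="):
            continue  # entry from the old dictionary
        if line.startswith("#CHROM"):
            out.extend(new_contigs)
        out.append(line)
    return out


header = ["##fileformat=VCFv4.2", "##contig=<ID=1,length=100>", "#CHROM\tPOS\tID\tREF\tALT"]
print(replace_contig_lines(header, [("chr1", 248956422)]))
```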
java -jar picard.jar: Function: Replaces the SAMFileHeader in a SAM or BAM file. This tool makes it possible to replace the header of a SAM or BAM file with the header of another file, or with a header block that has been edited manually (in a stub SAM file). The sort order (the SO tag of the @HD line) of the two input files must be the same. Note that validation is minimal, so it is up to the user to ensure that all the elements referred to in the SAMRecords are present in the new header.

Usage: java -jar picard.jar ReplaceSamHeader I=input_1.bam HEADER=input_2.bam O=bam_with_new_head.bam
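Since both inputs must share the same sort order, it can help to compare the SO tag of their @HD lines first. A small Python sketch operating on header text (for example as printed by samtools view -H); the function names are hypothetical, and a missing SO tag is returned as None, which this sketch treats as matching another missing tag.

```python
from typing import Optional


def sort_order(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def same_sort_order(header_a: str, header_b: str) -> bool:
    return sort_order(header_a) == sort_order(header_b)


print(sort_order("@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:100"))  # coordinate
```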
samtools cat: Function: Concatenate BAMs. The sequence dictionary of each input BAM must be identical, although this command does not check this. It uses a trick similar to that of reheader, which enables fast BAM concatenation.

Usage: samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]
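Because samtools cat does not verify that its inputs share a sequence dictionary, a quick pre-check can be worthwhile. The Python sketch below compares the ordered (SN, LN) pairs from the @SQ lines of each header text; the helpers are hypothetical and ignore optional @SQ tags such as M5.

```python
from typing import List, Tuple


def sequence_dictionary(header_text: str) -> List[Tuple[str, int]]:
    """Ordered (SN, LN) pairs from the @SQ lines of a SAM header."""
    entries: List[Tuple[str, int]] = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            entries.append((tags["SN"], int(tags["LN"])))
    return entries


def dictionaries_match(headers: List[str]) -> bool:
    """True if every header has the same @SQ names and lengths in the same order."""
    dicts = [sequence_dictionary(h) for h in headers]
    return all(d == dicts[0] for d in dicts[1:])


h1 = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr2\tLN:50"
h2 = "@SQ\tSN:chr2\tLN:50\n@SQ\tSN:chr1\tLN:100"
print(dictionaries_match([h1, h1]), dictionaries_match([h1, h2]))  # True False
```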