SAM/BAM Manipulation

java -jar picard.jar
Function: Asserts the validity of the specified Illumina basecalling data.
Usage: java -jar picard.jar CheckIlluminaDirectory BASECALLS_DIR=/BaseCalls/ READ_STRUCTURE=25T8B25T LANES=1 DATA_TYPES=BaseCalls
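The READ_STRUCTURE value in the usage above is a run of <count><letter> segments; in Picard's notation T marks template cycles and B marks sample-barcode cycles, so 25T8B25T reads as 25 template, 8 barcode, and 25 template cycles. The sketch below only illustrates how such a string breaks down. The helper name describe_read_structure is hypothetical and is not part of Picard.

```shell
# Hypothetical helper (not a Picard command): split a read structure such as
# 25T8B25T into its <count><letter> segments and add up the cycles.
describe_read_structure() (
    structure="$1"
    total=0
    # grep -o prints every <count><letter> segment on its own line.
    for segment in $(printf '%s\n' "$structure" | grep -oE '[0-9]+[A-Z]'); do
        count="${segment%?}"          # strip the trailing letter
        kind="${segment#"$count"}"    # strip the leading count
        printf '%s cycles of type %s\n' "$count" "$kind"
        total=$((total + count))
    done
    printf 'total cycles: %s\n' "$total"
)

describe_read_structure 25T8B25T
# 25 cycles of type T
# 8 cycles of type B
# 25 cycles of type T
# total cycles: 58
```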
java -jar picard.jar
Function: Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
Usage: java -jar picard.jar CollectWgsMetrics I=input.bam O=collect_wgs_metrics.txt R=reference_sequence.fasta
java -jar picard.jar
Function: Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments. This tool collects metrics about the percentage of reads that pass base-quality and mapping-quality filters, as well as coverage (read-depth) levels. The minimum base-quality and mapping-quality values and the maximum read depth (coverage cap) are user-defined. It extends CollectWgsMetrics by adding metrics computed only over sites with non-zero (>0) coverage.
Usage: java -jar picard.jar CollectWgsMetricsWithNonZeroCoverage I=input.bam O=collect_wgs_metrics.txt CHART=collect_wgs_metrics.pdf R=reference_sequence.fasta
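To make the difference between all-site and non-zero-site metrics concrete, here is a rough sketch that applies a coverage cap to a per-site depth table and reports the mean depth both ways. It does not reproduce Picard's calculation: the helper name coverage_summary, the three-column tab-separated input (chromosome, position, depth), the cap of 250, and the sample values are all assumptions made for illustration.

```shell
# Hypothetical helper (not Picard's implementation): read tab-separated
# "chrom pos depth" lines from stdin, cap each depth at the given value,
# and print the mean over all sites and over sites with depth > 0.
coverage_summary() (
    cap="$1"
    awk -F '\t' -v cap="$cap" '
        {
            d = ($3 > cap) ? cap : $3      # apply the coverage cap
            all += d
            if ($3 > 0) { nonzero += d; n_nonzero++ }
        }
        END {
            if (NR == 0 || n_nonzero == 0) { print "no covered sites"; exit }
            printf "mean_all=%.2f mean_nonzero=%.2f\n", all / NR, nonzero / n_nonzero
        }'
)

# Made-up depths: two uncovered sites, one modest site, one above the cap.
printf 'chr1\t1\t0\nchr1\t2\t10\nchr1\t3\t300\nchr1\t4\t0\n' | coverage_summary 250
# mean_all=65.00 mean_nonzero=130.00
```

The two uncovered sites pull the all-site mean down to 65, while the non-zero mean of 130 describes only the sites that were actually sequenced.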
java -jar picard.jar
Function: Downsamples a BAM file while keeping read pairs intact: either both ends of a pair are removed or neither is. The tool parses each read name to extract the position within the tile from which the read came, and bases the downsampling on that position. Running it again on exactly the same input produces the same output. Note 1: This is technology and read-name dependent. If your read names do not carry coordinate information, or if your BAM contains reads from multiple technologies (flowcell versions, sequencing machines), it will not work properly. It was designed with Illumina MiSeq/HiSeq in mind. Note 2: The downsampling is not random. It depends deterministically on the position of the read within its tile. Note 3: Downsampling twice with this program is not supported. Note 4: You should run MarkDuplicates after downsampling. Finally, the code was designed to simulate sequencing less as accurately as possible, not to hit an exact downsample fraction. In particular, because reads may be distributed unevenly within lanes and tiles, the resulting downsampling percentage will not exactly match the FRACTION argument.
Usage: java -jar picard.jar PositionBasedDownsampleSam I=input.bam O=downsampled.bam FRACTION=0.1
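The idea that a keep-or-drop decision can depend only on the read name may be easier to see with a toy example. The sketch below is explicitly not Picard's algorithm: it assumes Illumina-style read names whose last three colon-separated fields are tile, x, and y, and it keeps a name when (x + y) modulo 100 falls below a percentage. The helper name keep_by_position and the sample read names are hypothetical.

```shell
# Hypothetical helper, NOT Picard's algorithm: read one read name per line
# from stdin and print those whose x/y coordinates pass a simple,
# fully deterministic test.
keep_by_position() (
    percent="$1"
    awk -F ':' -v percent="$percent" '
        NF >= 7 {
            x = $(NF - 1)    # second-to-last field: x coordinate
            y = $NF          # last field: y coordinate
            if ((x + y) % 100 < percent) print
        }'
)

# Made-up read names in instrument:run:flowcell:lane:tile:x:y form.
printf '%s\n' \
    'A:1:FC:1:1101:1000:2010' \
    'A:1:FC:1:1101:1000:2050' \
    'A:1:FC:1:1102:1200:2024' \
    'A:1:FC:1:1102:1300:2099' | keep_by_position 25
# A:1:FC:1:1101:1000:2010
# A:1:FC:1:1102:1200:2024
```

Because both ends of a pair share one read name, any rule of this shape treats them identically, and rerunning it on the same input gives the same result. It also shows why the kept share need not equal the requested fraction: the outcome depends on how the coordinates happen to be distributed.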
java -jar picard.jar
Function: Collect metrics to quantify single-base sequencing artifacts.
Usage: java -jar picard.jar CollectSequencingArtifactMetrics I=input.bam O=artifact_metrics.txt R=reference_sequence.fasta
samtools depth
Function: Computes the depth at each position or region.
Usage: samtools depth [options] [in1.sam|in1.bam|in1.cram [in2.sam|in2.bam|in2.cram] [...]]
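For a single input file, samtools depth writes tab-separated lines of reference name, position, and depth, which makes the output easy to summarise with awk. The sketch below is one possible summary, not part of samtools: the helper name summarize_depth and the sample values are assumptions, and the helper reads from stdin so it can be tried without a BAM file.

```shell
# Hypothetical helper: summarise three-column depth output
# (reference name, position, depth) read from stdin.
summarize_depth() {
    awk -F '\t' '
        { sum += $3; if ($3 > max) max = $3 }
        END {
            if (NR == 0) { print "no positions"; exit }
            printf "positions=%d mean=%.2f max=%d\n", NR, sum / NR, max
        }'
}

# Inline sample standing in for real output (made-up values).
printf 'chr1\t100\t4\nchr1\t101\t6\nchr1\t102\t8\n' | summarize_depth
# positions=3 mean=6.00 max=8

# With samtools installed, the same helper could be fed directly:
#   samtools depth in1.bam | summarize_depth
```

The mean here covers only the positions that appear in the input, so if positions are missing from the depth output they do not count towards it.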
java -jar picard.jar
Function: Reverts SAM or BAM files to a previous state. This tool removes or restores certain properties of the SAM records, including alignment information, which can be used to produce an unmapped BAM (uBAM) from a previously aligned BAM. It can also restore the original quality scores of a BAM file that has already undergone base quality score recalibration (BQSR), provided the original qualities were retained.
Usage: java -jar picard.jar RevertSam I=input.bam O=reverted.bam
java -jar picard.jar
Function: Converts a VCF or BCF file to a Picard Interval List.
Usage: java -jar picard.jar VcfToIntervalList I=input_variants.vcf O=output.interval_list
bamtools
Function: Prints the number of alignments in the BAM file(s).
Usage: bamtools count -in <BAM file>
bamtools
Function: Sorts a BAM file according to the given option. In the usage below, -byname sorts the alignments by read name and writes them to output_alignments_sorted.bam.
Usage: bamtools sort -in input_alignments.bam -out output_alignments_sorted.bam -byname
java -jar picard.jar
Function: Generate index statistics from a BAM file. This tool calculates statistics from a BAM index (.bai) file, emulating the behavior of the "samtools idxstats" command. The statistics collected include counts of aligned and unaligned reads as well as all records with no start coordinate. The input to the tool is the BAM file name, but it must be accompanied by a corresponding index file.
Usage: java -jar picard.jar BamIndexStats I=input.bam O=output
java -jar picard.jar
Function: Collect metrics about reads that pass quality thresholds and Illumina-specific filters. This tool evaluates the overall quality of reads within a BAM file containing one read group. The output indicates the total number of bases within a read group that pass a minimum base quality score threshold and (in the case of Illumina data) pass Illumina quality filters as described in the GATK Dictionary entry.
Usage: java -jar picard.jar CollectQualityYieldMetrics I=input.bam O=quality_yield_metrics.txt
bamtools
Function: Filters the BAM file(s); the usage below keeps only alignments whose read length is 100.
Usage: bamtools filter -in <BAM file> -out <BAM file> -length 100
samtools targetcut
Function: This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and outputs a SAM with each sequence corresponding to a target. When option -f is in use, BAQ will be applied. This command is only designed for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].
Usage: samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>
java -jar picard.jar
Function: Computes a number of metrics that are useful for evaluating coverage and performance of sequencing experiments.
Usage: java -jar picard.jar CollectWgsMetricsFromQuerySorted