SAM/BAM Manipulation

java -jar picard.jar: Function: Collects Illumina lane metrics for the given BaseCalling analysis directory. This tool produces quality control metrics on cluster density for each lane of an Illumina flowcell. It takes Illumina TileMetrics data and writes lane-level and phasing-level metrics files to the output directory. In this context, phasing refers to the fraction of molecules that fall behind (phasing) or jump ahead (prephasing) during a read cycle.

Usage: java -jar picard.jar CollectIlluminaLaneMetrics RUN_DIR=test_run OUTPUT_DIRECTORY=Lane_output_metrics OUTPUT_PREFIX=experiment1 READ_STRUCTURE=25T8B25T
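
The READ_STRUCTURE value used above (25T8B25T) can be read as a list of cycle counts, each followed by a segment type. The sketch below is only an illustration of that notation, not Picard code. It assumes the usual Picard operator letters (T = template, B = sample barcode, M = molecular barcode, S = skip); the function name parse_read_structure is hypothetical.

```python
import re

# Assumed operator meanings, following Picard's READ_STRUCTURE convention.
SEGMENT_TYPES = {
    "T": "template",
    "B": "sample barcode",
    "M": "molecular barcode",
    "S": "skip",
}


def parse_read_structure(structure):
    """Split a string such as '25T8B25T' into (cycle_count, segment_type) pairs."""
    if not re.fullmatch(r"(\d+[TBMS])+", structure):
        raise ValueError(f"malformed read structure: {structure!r}")
    return [
        (int(count), SEGMENT_TYPES[op])
        for count, op in re.findall(r"(\d+)([TBMS])", structure)
    ]


segments = parse_read_structure("25T8B25T")
total_cycles = sum(count for count, _ in segments)
print(segments, total_cycles)
```

Under these assumptions, 25T8B25T describes 25 template cycles, an 8-cycle sample barcode, and another 25 template cycles.
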
java -jar picard.jar: Function: Transforms raw Illumina sequencing data into an unmapped SAM or BAM file.

Usage: java -jar picard.jar IlluminaBasecallsToSam BASECALLS_DIR=/BaseCalls/ LANE=001 READ_STRUCTURE=25T8B25T RUN_BARCODE=run15 IGNORE_UNEXPECTED_BARCODES=true LIBRARY_PARAMS=library.params
java -jar picard.jar: Function: Manipulates interval lists. This tool offers multiple interval list manipulation capabilities, including sorting, merging, subtracting, padding, customizing, and other set-theoretic operations. If given one or more inputs, the default operation is to merge and sort them. Other operations, e.g. interval subtraction, are controlled by the arguments. The tool lists intervals with respect to a reference sequence. Both interval_list and VCF files are accepted as input. The interval_list file format is relatively simple and reflects the SAM alignment format to a degree. A SAM-style header must be present in the file, listing the sequence records against which the intervals are described. After the header, the file contains records, one per line in text format, with the following tab-separated values:
-Sequence name (SN)
-Start position (1-based)
-End position (1-based, end inclusive)
-Strand (either + or -)
-Interval name (ideally unique names for intervals)

Usage: java -jar picard.jar IntervalListTools I=input.interval_list O=output.interval_list
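
To make the interval_list layout described above concrete, here is a minimal Python sketch that separates the SAM-style header from the five-column records. It is an illustration only, not part of Picard; the names Interval and parse_interval_list and the example data are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str   # sequence name, must match an SN in the header
    start: int    # 1-based
    end: int      # 1-based, end inclusive
    strand: str   # "+" or "-"
    name: str     # interval name, ideally unique

    def length(self):
        # Both ends are inclusive, hence the +1.
        return self.end - self.start + 1


def parse_interval_list(text):
    """Return (header_lines, intervals) parsed from interval_list text."""
    header, intervals = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        contig, start, end, strand, name = line.split("\t")
        if strand not in ("+", "-"):
            raise ValueError(f"invalid strand: {strand!r}")
        intervals.append(Interval(contig, int(start), int(end), strand, name))
    return header, intervals


# Made-up example data for illustration.
EXAMPLE = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "chr1\t100\t200\t+\ttarget_1\n"
    "chr1\t300\t350\t-\ttarget_2\n"
)

header, intervals = parse_interval_list(EXAMPLE)
print(len(header), [(iv.name, iv.length()) for iv in intervals])
```

Because both coordinates are 1-based and inclusive, the interval 100 to 200 covers 101 bases, which is the most common source of off-by-one errors when converting to or from BED.
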
bam2fq.py: Function: Convert alignments in BAM or SAM format into fastq format.

Usage: bam2fq.py -i test_PairedEnd_StrandSpecific_hg19.sam -o bam2fq_out1
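
As a rough picture of what a SAM-to-FASTQ conversion involves, the sketch below turns one SAM record into one FASTQ record. This is not the bam2fq.py implementation; it assumes that reads flagged as reverse strand (0x10) should be restored to their original orientation and that mates get /1 and /2 suffixes. The function name and the example record are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one SAM alignment line into a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]
    if flag & 0x10:
        # SAM stores reverse-strand reads reverse-complemented;
        # undo that to recover the sequence as it came off the sequencer.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        name += "/1"   # first read of a pair
    elif flag & 0x80:
        name += "/2"   # second read of a pair
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Made-up single-end, reverse-strand record.
record = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tAACG\tIIH#"
fastq = sam_line_to_fastq(record)
print(fastq, end="")
```
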
java -jar picard.jar: Function: Collects mean quality by cycle. This tool generates a data table and chart of mean quality by cycle from a BAM file. It is intended to be used on a single lane or a read group's worth of data, but can be applied to merged BAMs if needed. This metric gives an overall snapshot of sequencing machine performance. For most types of sequencing data, the output is expected to show a slight reduction in overall base quality scores towards the end of each read. Spikes in quality within reads are not expected and may indicate that technical problems occurred during sequencing.

Usage: java -jar picard.jar MeanQualityByCycle I=input.bam O=mean_qual_by_cycle.txt CHART=mean_qual_by_cycle.pdf
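
The metric itself is simple to state: for each cycle (position in the read), average the base qualities across all reads. The sketch below shows that calculation on plain quality strings. It is a simplification and not Picard's code: it assumes Phred+33 encoding, ignores read orientation and paired-end handling, and the function name and sample strings are hypothetical.

```python
def mean_quality_by_cycle(quality_strings, offset=33):
    """Return a list whose i-th entry is the mean Phred quality at cycle i."""
    totals, counts = [], []
    for qual in quality_strings:
        for cycle, char in enumerate(qual):
            if cycle == len(totals):
                # First read long enough to reach this cycle.
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / count for total, count in zip(totals, counts)]


# Made-up quality strings; reads may differ in length.
means = mean_quality_by_cycle(["IIIH", "IIH", "IHHF"])
print([round(m, 2) for m in means])
```

A gentle downward slope in the resulting list matches the expected behavior described above; an isolated dip or spike at one cycle would be the pattern worth investigating.
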
bamtools: Function: Print coverage data for a single BAM file

Usage: bamtools coverage -in <BAM file>
samtools quickcheck: Function: Quickly checks that input files appear to be intact. Checks that the beginning of the file contains a valid header (all formats) containing at least one target sequence, then seeks to the end of the file and checks that an end-of-file (EOF) marker is present and intact (BAM only).

Usage: samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]
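
The BAM end-of-file check can be illustrated in a few lines: a BAM file is a series of BGZF blocks, and a complete file ends with a fixed 28-byte empty block. The sketch below tests only the BGZF magic bytes and that trailing block. It does not parse the header or look for target sequences, so it is weaker than samtools quickcheck; the function name is hypothetical.

```python
import os
import tempfile

# First four bytes of every BGZF block (gzip magic with the extra-field flag).
BGZF_MAGIC = bytes.fromhex("1f8b0804")

# The 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_start_and_eof(path):
    """Return True if the file starts with BGZF magic and ends with the EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        if handle.read(len(BGZF_MAGIC)) != BGZF_MAGIC:
            return False
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.bgzf")
    truncated = os.path.join(tmp, "truncated.bgzf")
    # An EOF block on its own is a valid (empty) BGZF stream, though not a usable BAM.
    with open(complete, "wb") as handle:
        handle.write(BGZF_EOF)
    # Simulate a transfer that stopped partway through the final block.
    with open(truncated, "wb") as handle:
        handle.write(BGZF_EOF + BGZF_EOF[:20])
    complete_ok = has_bgzf_start_and_eof(complete)
    truncated_ok = has_bgzf_start_and_eof(truncated)

print(complete_ok, truncated_ok)
```
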
bamtools: Function: The command bamtools revert removes duplicate marks and restores original base qualities.

Usage: bamtools revert -in input_alignments.bam -out output_alignments_reverted.bam
java -jar picard.jar: Function: Collect jumping library metrics.

Usage: java -jar picard.jar CollectJumpingLibraryMetrics I=input.bam O=jumping_metrics.txt
java -jar picard.jar: Function: Creates a hash code based on the read groups (RG). This tool creates a hash code from the identifying information in the read groups (RG) of a SAM or BAM file header. Adding or removing RGs changes the hash code, enabling the user to quickly determine whether the read group information has changed.

Usage: java -jar picard.jar CalculateReadGroupChecksum I=input.bam
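
The idea behind a read group checksum can be sketched as hashing only the @RG lines of the header. The example below is an assumption-laden illustration: it uses MD5 over the sorted @RG lines, which is not claimed to match Picard's exact algorithm or output, and the function name and header text are hypothetical.

```python
import hashlib


def read_group_checksum(header_text):
    """Hash the @RG lines of a SAM header, ignoring all other header lines."""
    rg_lines = sorted(
        line for line in header_text.splitlines() if line.startswith("@RG\t")
    )
    return hashlib.md5("\n".join(rg_lines).encode("utf-8")).hexdigest()


# Made-up header for illustration.
HEADER = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:sampleA\tLB:lib1\n"
    "@RG\tID:rg2\tSM:sampleA\tLB:lib2\n"
)

original = read_group_checksum(HEADER)
with_extra_rg = read_group_checksum(HEADER + "@RG\tID:rg3\tSM:sampleB\n")
resorted_file = read_group_checksum(HEADER.replace("SO:coordinate", "SO:queryname"))

# Adding a read group changes the checksum; changing only @HD does not.
print(original != with_extra_rg, original == resorted_file)
```
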
java -jar picard.jar: Function: Normalizes lines of sequence in a FASTA file to be of the same length. This tool takes any FASTA-formatted file and reformats the sequence so that all sequence lines within a record have the same length (with the exception of the last line). The default is 100 bases per line; a custom line length (LINE_LENGTH) can be specified by the user. In addition, record names can be truncated at the first whitespace character to ensure downstream compatibility.

Usage: java -jar picard.jar NormalizeFasta I=input_sequence.fasta O=normalized_sequence.fasta
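
The rewrapping described above amounts to joining each record's sequence and cutting it into fixed-width lines. A minimal sketch follows; it is not Picard's implementation, it holds the whole file in memory, and the function and parameter names are hypothetical.

```python
def normalize_fasta(text, line_length=100, truncate_names=False):
    """Rewrap every FASTA record so sequence lines are line_length bases wide."""
    output, sequence_parts = [], []

    def flush():
        # Emit the sequence collected for the current record in fixed-width lines.
        sequence = "".join(sequence_parts)
        for start in range(0, len(sequence), line_length):
            output.append(sequence[start:start + line_length])
        sequence_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name = line[1:]
            if truncate_names and name.split():
                name = name.split()[0]   # keep text before the first whitespace
            output.append(">" + name)
        else:
            sequence_parts.append(line)
    flush()
    return "\n".join(output) + "\n"


# Made-up record with uneven line lengths.
raw = ">chr1 test sequence\nACGTAC\nGT\nACGTACGTAC\n"
normalized = normalize_fasta(raw, line_length=8, truncate_names=True)
print(normalized, end="")
```
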
java -jar picard.jar: Function: Sorts a SAM or BAM file. This tool sorts the input SAM or BAM file by coordinate, queryname (QNAME), or some other property of the SAM record. The SortOrder of a SAM/BAM file is found in the SAM file header tag @HD in the field labeled SO.

Usage: java -jar picard.jar SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate
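
Coordinate sorting orders records by the position of their reference sequence in the header (@SQ order), then by alignment start, with unmapped reads last, and records the result in the SO field of @HD. The sketch below shows that logic on SAM text. It is a simplified in-memory illustration rather than Picard's algorithm, and all function names and records are hypothetical.

```python
def set_sort_order(header, order):
    """Return header lines with the @HD SO field set to the given order."""
    for index, line in enumerate(header):
        if line.startswith("@HD"):
            fields = [f for f in line.split("\t") if not f.startswith("SO:")]
            updated = list(header)
            updated[index] = "\t".join(fields + [f"SO:{order}"])
            return updated
    return [f"@HD\tVN:1.6\tSO:{order}"] + list(header)


def sort_sam_by_coordinate(sam_text):
    """Sort SAM records by (reference index, position); unmapped reads go last."""
    lines = [line for line in sam_text.splitlines() if line]
    header = [line for line in lines if line.startswith("@")]
    records = [line for line in lines if not line.startswith("@")]

    # Reference order comes from the order of @SQ lines, not from the names.
    reference_index = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.split("\t"):
                if field.startswith("SN:"):
                    reference_index[field[3:]] = len(reference_index)

    def key(record):
        fields = record.split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            return (len(reference_index), 0)
        return (reference_index[rname], pos)

    return set_sort_order(header, "coordinate") + sorted(records, key=key)


# Made-up unsorted input.
UNSORTED = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:1000",
    "r1\t0\tchr2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
])

sorted_lines = sort_sam_by_coordinate(UNSORTED)
read_order = [line.split("\t")[0] for line in sorted_lines if not line.startswith("@")]
print(sorted_lines[0], read_order)
```
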
samtools calmd: Function: Generate the MD tag. If the MD tag is already present, this command will give a warning if the MD tag generated is different from the existing tag. Calmd can also read and write CRAM files although in most cases it is pointless as CRAM recalculates MD and NM tags on the fly. The one exception to this case is where both input and output CRAM files have been / are being created with the no_ref option.

Usage: samtools calmd [-eubrAESQ] <aln.bam> <ref.fasta>

Supported input format: BAM, CRAM, SAM
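
To show what the MD tag encodes, the sketch below builds it for the simplest case: a read aligned to the reference with no insertions, deletions, or clipping. Each run of matches becomes a number and each mismatch contributes the reference base at that position. This covers only a small part of what samtools calmd does; the function name is hypothetical.

```python
def md_tag_ungapped(read_seq, ref_seq):
    """Build an MD string for a gap-free alignment of equal-length sequences."""
    if len(read_seq) != len(ref_seq):
        raise ValueError("ungapped alignment requires sequences of equal length")
    parts, matches = [], 0
    for read_base, ref_base in zip(read_seq.upper(), ref_seq.upper()):
        if read_base == ref_base:
            matches += 1
        else:
            # Emit the match count (possibly 0), then the reference base.
            parts.append(str(matches))
            parts.append(ref_base)
            matches = 0
    parts.append(str(matches))   # MD always ends with a number
    return "".join(parts)


single_mismatch = md_tag_ungapped("ACGTACGT", "ACGAACGT")
perfect_match = md_tag_ungapped("ACGT", "ACGT")
adjacent_mismatches = md_tag_ungapped("TTGT", "ACGT")
print(single_mismatch, perfect_match, adjacent_mismatches)
```

Note the zeros in the third case: adjacent mismatches are separated by a match count of 0, and the string always starts and ends with a number.
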
java -jar picard.jar: Function: Designs oligonucleotide baits for hybrid selection reactions.

Usage: java -jar picard.jar BaitDesigner TARGET=targets.interval_list DESIGN_NAME=new_baits R=reference_sequence.fasta
java -jar picard.jar: Function: Merges multiple SAM and/or BAM files into a single file. This tool is used for combining SAM and/or BAM files from different runs or read groups, similarly to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html). Note that to prevent errors in downstream processing, it is critical to identify/label read groups appropriately. If different samples contain identical read group IDs, this tool will avoid collisions by modifying the read group IDs to be unique. For more information about read groups, see the GATK Dictionary entry.

Usage: java -jar picard.jar MergeSamFiles I=input_1.bam I=input_2.bam O=merged_files.bam