SAM/BAM Manipulation

read_duplication.py
Function: Two strategies are used to determine the read duplication rate. Sequence based: reads with identical sequences are regarded as duplicated reads. Mapping based: reads mapped to exactly the same genomic location are regarded as duplicated reads; spliced reads are regarded as duplicated reads when they map to the same starting position and are spliced the same way.
Usage: read_duplication.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output
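The two strategies can be illustrated with a minimal Python sketch. This is not the read_duplication.py implementation: the toy `reads` tuples and the `duplication_histogram` helper are hypothetical, and `splice_blocks` is an assumed stand-in for "how a read is spliced".

```python
from collections import Counter

# Hypothetical toy reads: (sequence, chromosome, start, splice_blocks).
# An unspliced read has a single block; a spliced read has one block per exon segment.
reads = [
    ("ACGT", "chr1", 100, ((100, 104),)),
    ("ACGT", "chr1", 100, ((100, 104),)),
    ("ACGT", "chr2", 500, ((500, 504),)),
    ("TTGA", "chr1", 200, ((200, 202), (300, 302))),
    ("TTGC", "chr1", 200, ((200, 202), (300, 302))),
]


def duplication_histogram(keys):
    """Map occurrence count -> number of distinct keys seen that many times."""
    per_key = Counter(keys)
    return dict(Counter(per_key.values()))


# Sequence based: the key is the read sequence alone.
seq_based = duplication_histogram(seq for seq, _, _, _ in reads)

# Mapping based: the key is the location plus the splicing pattern.
map_based = duplication_histogram(
    (chrom, start, blocks) for _, chrom, start, blocks in reads
)

print("sequence based:", seq_based)
print("mapping based:", map_based)
```

In this toy data the two strategies disagree: three reads share a sequence but map to two locations, while two reads with different sequences share a location and splicing pattern.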
java -jar picard.jar
Function: Prints a SAM or BAM file to the screen.
Usage: java -jar picard.jar ViewSam I=input.bam
java -jar picard.jar
Function: Manipulates interval lists. This tool offers multiple interval list file manipulation capabilities, including sorting, merging, subtracting, padding, customizing, and other set-theoretic operations. If given one or more inputs, the default operation is to merge and sort them. Other options, e.g. interval subtraction, are controlled by the arguments. The tool lists intervals with respect to a reference sequence. Both interval_list and VCF files are accepted as input. The interval_list file format is relatively simple and reflects the SAM alignment format to a degree. A SAM-style header must be present in the file that lists the sequence records against which the intervals are described. After the header, the file contains records, one per line in text format, with the following tab-separated values: sequence name (SN); start position (1-based); end position (1-based, end inclusive); strand (either + or -); interval name (ideally unique names for intervals).
Usage: java -jar picard.jar IntervalListTools I=input.interval_list O=output.interval_list
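To make the record layout concrete, here is a small Python sketch that parses interval_list text and performs a merge-and-sort. It is an illustration only, not Picard's code: `parse_interval_list`, `merge_and_sort`, the choice to merge abutting intervals, the "|" name separator, and ignoring strand during merging are all assumptions made for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int   # 1-based
    end: int     # 1-based, inclusive
    strand: str
    name: str


def parse_interval_list(text):
    """Return (header_lines, intervals) from interval_list text."""
    header, intervals = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):          # SAM-style header line
            header.append(line)
            continue
        contig, start, end, strand, name = line.split("\t")
        intervals.append(Interval(contig, int(start), int(end), strand, name))
    return header, intervals


def merge_and_sort(intervals, contig_order):
    """Sort by header contig order, then merge overlapping or abutting intervals."""
    rank = {contig: i for i, contig in enumerate(contig_order)}
    merged = []
    for iv in sorted(intervals, key=lambda iv: (rank[iv.contig], iv.start, iv.end)):
        last = merged[-1] if merged else None
        if last and last.contig == iv.contig and iv.start <= last.end + 1:
            # Extend the previous interval; strand is ignored in this sketch.
            merged[-1] = last._replace(end=max(last.end, iv.end),
                                       name=last.name + "|" + iv.name)
        else:
            merged.append(iv)
    return merged


SAMPLE = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:500\n"
    "chr2\t10\t20\t+\tc\n"
    "chr1\t50\t100\t+\ta\n"
    "chr1\t90\t120\t+\tb\n"
)
header, intervals = parse_interval_list(SAMPLE)
contigs = [field[3:] for h in header if h.startswith("@SQ")
           for field in h.split("\t") if field.startswith("SN:")]
merged = merge_and_sort(intervals, contigs)
```

The contig order comes from the @SQ lines of the header, which is why the header is required before any interval record can be interpreted.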
java -jar picard.jar
Function: Collects metrics from reduced representation bisulfite sequencing (RRBS) data.
Usage: java -jar picard.jar CollectRrbsMetrics R=reference_sequence.fasta I=input.bam M=basename_for_metrics_files
java -jar picard.jar
Function: Normalizes lines of sequence in a FASTA file to be of the same length. This tool takes any FASTA-formatted file and reformats the sequence to ensure that all of the sequence record lines are of the same length (with the exception of the last line). Although the default setting is 100 bases per line, a custom line_length can be specified by the user. In addition, record names can be truncated at the first whitespace character to ensure downstream compatibility.
Usage: java -jar picard.jar NormalizeFasta I=input_sequence.fasta O=normalized_sequence.fasta
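The reformatting described above amounts to joining each record's sequence and re-wrapping it at a fixed width. A minimal Python sketch follows; `normalize_fasta` and its `truncate_names` parameter are hypothetical names for this example, not Picard's API.

```python
def normalize_fasta(text, line_length=100, truncate_names=False):
    """Re-wrap every FASTA record so sequence lines have the same length."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    out = []
    for name, seq in records:
        if truncate_names:
            parts = name.split()
            name = parts[0] if parts else name   # cut at first whitespace
        out.append(">" + name)
        # Every slice is line_length long except possibly the last one.
        out.extend(seq[i:i + line_length] for i in range(0, len(seq), line_length))
    return "\n".join(out) + "\n"


example = normalize_fasta(">seq1 desc\nACGTAC\nGT\nACGTA\n",
                          line_length=5, truncate_names=True)
print(example)
```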
java -jar picard.jar
Function: Collect jumping library metrics.
Usage: java -jar picard.jar CollectJumpingLibraryMetrics I=input.bam O=jumping_metrics.txt
java -jar picard.jar
Function: Creates a hash code based on the read groups (RG). This tool creates a hash code based on identifying information in the read groups (RG) of a BAM or SAM file header. Addition or removal of RGs changes the hash code, enabling the user to quickly determine if changes have been made to the read group information.
Usage: java -jar picard.jar CalculateReadGroupChecksum I=input.bam
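The idea of a read-group checksum can be sketched in a few lines of Python. This is an assumption-laden illustration: it hashes whole @RG header lines with MD5 after sorting them, which is not necessarily the set of fields or the digest layout Picard uses, so its values should not be expected to match the tool's output. `read_group_checksum` is a hypothetical helper name.

```python
import hashlib


def read_group_checksum(sam_header):
    """Hash only the @RG lines of a SAM header, independent of their order."""
    rg_lines = sorted(line for line in sam_header.splitlines()
                      if line.startswith("@RG"))
    digest = hashlib.md5()
    for line in rg_lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


header_a = "@HD\tVN:1.6\n@RG\tID:a\tSM:s1\n@RG\tID:b\tSM:s1\n@PG\tID:bwa\n"
header_b = "@HD\tVN:1.6\n@RG\tID:b\tSM:s1\n@RG\tID:a\tSM:s1\n@PG\tID:other\n"
header_c = header_a + "@RG\tID:c\tSM:s2\n"

same = read_group_checksum(header_a) == read_group_checksum(header_b)
changed = read_group_checksum(header_a) != read_group_checksum(header_c)
print(same, changed)
```

Here a change to non-RG lines leaves the checksum untouched, while adding a read group changes it.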
bamtools
Function: Print coverage data for a single BAM file
Usage: bamtools coverage -in <BAM file>
java -jar picard.jar
Function: Collect mean quality by cycle. This tool generates a data table and chart of mean quality by cycle from a BAM file. It is intended to be used on a single lane or a read group's worth of data, but can be applied to merged BAMs if needed. This metric gives an overall snapshot of sequencing machine performance. For most types of sequencing data, the output is expected to show a slight reduction in overall base quality scores towards the end of each read. Spikes in quality within reads are not expected and may indicate that technical problems occurred during sequencing.
Usage: java -jar picard.jar MeanQualityByCycle I=input.bam O=mean_qual_by_cycle.txt CHART=mean_qual_by_cycle.pdf
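The metric itself is a per-position average of base qualities. A minimal Python sketch, assuming the qualities are available as Phred+33 strings already in sequencing order (the hypothetical `mean_quality_by_cycle` helper does not read BAM files or handle read orientation):

```python
def mean_quality_by_cycle(quality_strings, offset=33):
    """Return the mean Phred score at each cycle (read position)."""
    totals, counts = [], []
    for qual in quality_strings:
        for cycle, char in enumerate(qual):
            if cycle == len(totals):      # first read that reaches this cycle
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / count for total, count in zip(totals, counts)]


# 'I' encodes Phred 40 and '5' encodes Phred 20 under the +33 offset.
means = mean_quality_by_cycle(["II5", "I5"])
print(means)
```

Reads of different lengths are handled by counting, per cycle, only the reads that actually reach that cycle.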
java -jar picard.jar
Function: Splits SNPs and INDELs into separate files. This tool reads in a VCF or BCF file and writes out the SNPs and INDELs it contains to separate files. The headers of the two output files will be identical and index files will be created for both outputs. If records other than SNPs or INDELs are present, set the STRICT option to "false"; otherwise the tool will raise an exception and quit.
Usage: java -jar picard.jar SplitVcfs I=input.vcf SNP_OUTPUT=snp.vcf INDEL_OUTPUT=indel.vcf STRICT=false
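The splitting logic, including the effect of a strict flag, can be sketched in Python. The classification rule below is a simplification chosen for this example (it is not the variant typing Picard relies on), and `classify` and `split_vcf` are hypothetical names; no index files are produced.

```python
BASES = set("ACGTN")


def classify(ref, alts):
    """Return 'SNP', 'INDEL', or 'OTHER' from REF and the list of ALT alleles."""
    if not alts or any(not set(alt.upper()) <= BASES for alt in alts):
        return "OTHER"                    # symbolic, breakend, '*' or missing ALT
    if len(ref) == 1 and all(len(alt) == 1 for alt in alts):
        return "SNP"
    if all(len(alt) != len(ref) for alt in alts):
        return "INDEL"
    return "OTHER"                        # e.g. MNPs or mixed records


def split_vcf(lines, strict=True):
    """Return (snp_lines, indel_lines); both start with the same header."""
    header, snps, indels = [], [], []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        alts = [alt for alt in fields[4].split(",") if alt != "."]
        kind = classify(fields[3], alts)
        if kind == "SNP":
            snps.append(line)
        elif kind == "INDEL":
            indels.append(line)
        elif strict:
            raise ValueError("record is neither SNP nor INDEL: " + line)
    return header + snps, header + indels


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t10\t.\tA\tG",
    "chr1\t20\t.\tAT\tA",
    "chr1\t30\t.\tA\t<DEL>",
]
snp_out, indel_out = split_vcf(vcf_lines, strict=False)
```

With `strict=True` the same input raises an error at the symbolic `<DEL>` record, mirroring the behaviour described for STRICT.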
samtools fasta
Function: Converts a BAM or CRAM into either FASTQ or FASTA format depending on the command invoked.
Usage: samtools fasta [options] in.bam
java -jar picard.jar
Function: Fixes the NM, MD, and UQ tags in a SAM file. This tool takes in a SAM or BAM file (sorted by coordinate) and calculates the NM, MD, and UQ tags by comparing with the reference. This may be needed when MergeBamAlignment was run with SORT_ORDER different from 'coordinate' and thus could not fix these tags then.
Usage: java -jar picard.jar SetNmMdAndUqTags I=sorted.bam O=fixed.bam \
bamtools
Function: Merge multiple BAM files into one
Usage: bamtools merge -in input_alignments_1.bam -in input_alignments_2.bam -in input_alignments_3.bam -out output_alignments_merged.bam
java -jar picard.jar
Function: Merges multiple SAM and/or BAM files into a single file. This tool is used for combining SAM and/or BAM files from different runs or read groups, similarly to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html). Note that to prevent errors in downstream processing, it is critical to identify/label read groups appropriately. If different samples contain identical read group IDs, this tool will avoid collisions by modifying the read group IDs to be unique. For more information about read groups, see the GATK Dictionary entry.
Usage: java -jar picard.jar MergeSamFiles I=input_1.bam I=input_2.bam O=merged_files.bam
java -jar picard.jar
Function: Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories.
Usage: java -jar picard.jar CollectHiSeqXPfFailMetrics BASECALLS_DIR=/BaseCalls/ OUTPUT=/metrics/ LANE=001