Sam/Bam Manipulation

read_duplication.py: Function: Two strategies are used to determine the read duplication rate:
* Sequence based: reads with identical sequences are regarded as duplicated reads.
* Mapping based: reads mapped to exactly the same genomic location are regarded as duplicated reads. For spliced reads, reads that map to the same starting position and are spliced the same way are regarded as duplicated reads.

Usage: read_duplication.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output
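To make the difference between the two strategies concrete, here is a minimal sketch on toy data. It is not the read_duplication.py implementation: the read tuples, the `duplication_rate` helper, and the definition of the rate as "extra copies divided by total reads" are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy reads: (sequence, chromosome, start, splice junctions).
reads = [
    ("ACGTACGT", "chr1", 100, ()),
    ("ACGTACGT", "chr1", 100, ()),             # same sequence, same position
    ("ACGTACGT", "chr2", 500, ()),             # same sequence, other position
    ("GGCATTAG", "chr3", 200, ((250, 900),)),
    ("GGCATTAC", "chr3", 200, ((250, 900),)),  # same start, same splicing
    ("GGCATTAG", "chr3", 200, ((250, 950),)),  # same start, other splicing
]

def duplication_rate(keys):
    """Fraction of reads that are extra copies of an already-seen key
    (an assumed definition, chosen for this sketch)."""
    counts = Counter(keys)
    total = sum(counts.values())
    return (total - len(counts)) / total

# Sequence based: the key is the read sequence alone.
seq_rate = duplication_rate(seq for seq, _, _, _ in reads)

# Mapping based: the key is the location plus the splice pattern.
map_rate = duplication_rate(
    (chrom, start, junctions) for _, chrom, start, junctions in reads
)

print(f"sequence based: {seq_rate:.3f}")
print(f"mapping based:  {map_rate:.3f}")
```

On this toy set the sequence-based rate is 0.5 and the mapping-based rate is about 0.333, because two of the identical sequences map to different places and two of the co-located spliced reads differ in sequence or splicing.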
java -jar picard.jar: Function: Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments. This tool collects metrics about the percentages of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels. The minimum base- and mapping-quality values and the maximum read depth (coverage cap) are user defined. This extends CollectWgsMetrics by including metrics related only to sites with non-zero (>0) coverage.

Usage: java -jar picard.jar CollectWgsMetricsWithNonZeroCoverage I=input.bam O=collect_wgs_metrics.txt CHART=collect_wgs_metrics.pdf R=reference_sequence.fasta
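The effect of the coverage cap and of restricting the metrics to non-zero sites can be illustrated with a small sketch. This is not Picard's code: the `mean_coverage` function, the toy depths, and the cap value of 250 are assumptions for illustration, and the real tool reports many more metrics than a mean.

```python
def mean_coverage(depths, coverage_cap=250, nonzero_only=False):
    """Mean per-site depth after capping each site at coverage_cap.

    With nonzero_only=True, sites with zero coverage are left out,
    which mirrors the idea of the non-zero-coverage metrics.
    """
    capped = [min(d, coverage_cap) for d in depths]
    if nonzero_only:
        capped = [d for d in capped if d > 0]
    return sum(capped) / len(capped) if capped else 0.0

# Hypothetical per-site depths; the last site exceeds the cap.
depths = [0, 0, 10, 20, 30, 1000]

all_sites = mean_coverage(depths)
nonzero_sites = mean_coverage(depths, nonzero_only=True)

print(f"all sites:      {all_sites:.2f}")
print(f"non-zero sites: {nonzero_sites:.2f}")
```

Here the capped depths sum to 310, giving a mean of about 51.67 over all six sites and 77.5 over the four covered sites.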
java -jar picard.jar: Function: Reverts SAM or BAM files to a previous state. This tool removes or restores certain properties of the SAM records, including alignment information, which can be used to produce an unmapped BAM (uBAM) from a previously aligned BAM. It can also restore the original quality scores of a BAM file that has already undergone base quality score recalibration (BQSR), provided the original qualities were retained.

Usage: java -jar picard.jar RevertSam I=input.bam O=reverted.bam
java -jar picard.jar: Function: Fixes the NM, MD, and UQ tags in a SAM file. This tool takes a SAM or BAM file (sorted by coordinate) and calculates the NM, MD, and UQ tags by comparing each read with the reference. This may be needed when MergeBamAlignment was run with a SORT_ORDER other than 'coordinate' and therefore could not fix these tags at that point.

Usage: java -jar picard.jar SetNmMDAndUqTags I=sorted.bam O=fixed.bam \
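As a rough illustration of what "comparing with the reference" produces, the sketch below derives the three tags for a single ungapped alignment. It is a simplification under stated assumptions, not Picard's implementation: indels, clipping, and reverse-strand handling are ignored, the `nm_md_uq` helper is hypothetical, and UQ is taken here as the sum of base qualities at mismatching positions.

```python
def nm_md_uq(read, ref, quals):
    """Return (NM, MD, UQ) for an ungapped alignment.

    read, ref: aligned sequences of equal length
    quals:     integer Phred base qualities of the read
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")

    nm = 0          # number of mismatches (edit distance without indels)
    uq = 0          # assumed: sum of qualities at mismatching bases
    md_parts = []   # alternating match-run lengths and reference bases
    run = 0         # length of the current run of matching bases

    for read_base, ref_base, qual in zip(read, ref, quals):
        if read_base == ref_base:
            run += 1
        else:
            md_parts.append(str(run))
            md_parts.append(ref_base)  # MD records the reference base
            run = 0
            nm += 1
            uq += qual
    md_parts.append(str(run))          # MD always ends with a number
    return nm, "".join(md_parts), uq

nm, md, uq = nm_md_uq(
    read="ACGTTCGA",
    ref="ACGATCGT",
    quals=[30, 30, 30, 20, 30, 30, 30, 10],
)
print(nm, md, uq)
```

For this toy pair the mismatches are at the fourth and last bases, so the result is NM = 2, MD = 3A3T0, and UQ = 30.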
bamtools: Function: The command bamtools sort sorts a BAM file according to the given option. In the example below, output_alignments_sorted.bam is the resulting file, in which the alignments are sorted by name.

Usage: bamtools sort -in input_alignments.bam -out output_alignments_sorted.bam -byname
samtools depth: Function: Computes the depth at each position or region.

Usage: samtools depth [options] [in1.sam|in1.bam|in1.cram [in2.sam|in2.bam|in2.cram] [...]]
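The notion of per-position depth can be sketched in a few lines. This is only a conceptual model, not how samtools computes it: alignments are reduced to hypothetical 1-based inclusive (start, end) intervals on one reference, and CIGAR operations, quality filters, and flags are ignored.

```python
from collections import Counter

def depth_per_position(alignments):
    """Count how many alignments cover each position.

    alignments: iterable of (start, end), 1-based and inclusive.
    Returns a dict {position: depth} for covered positions only.
    """
    depth = Counter()
    for start, end in alignments:
        for pos in range(start, end + 1):
            depth[pos] += 1
    return dict(sorted(depth.items()))

# Hypothetical alignments on a single reference sequence.
alignments = [(1, 4), (3, 6), (4, 5)]
depth = depth_per_position(alignments)

for pos, d in depth.items():
    print(pos, d, sep="\t")
```

Position 4 is covered by all three toy alignments, so its depth is 3, while positions 1, 2, and 6 have depth 1.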
java -jar picard.jar: Function: Collect metrics about reads that pass quality thresholds and Illumina-specific filters. This tool evaluates the overall quality of reads within a bam file containing one read group. The output indicates the total numbers of bases within a read group that pass a minimum base quality score threshold and (in the case of Illumina data) pass Illumina quality filters as described in the GATK Dictionary entry.

Usage: java -jar picard.jar CollectQualityYieldMetrics I=input.bam O=quality_yield_metrics.txt \
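The idea of counting bases that pass a minimum base quality threshold can be sketched as follows. The `quality_yield` function and its output keys are hypothetical names, the thresholds of 20 and 30 are chosen for illustration, and Phred+33 encoded quality strings are assumed; Illumina pass-filter flags are not modelled.

```python
def quality_yield(quality_strings, offset=33):
    """Count total bases and bases at or above Q20 and Q30.

    quality_strings: iterable of ASCII-encoded quality strings
    offset:          ASCII offset of the encoding (33 assumed here)
    """
    total = q20 = q30 = 0
    for qualities in quality_strings:
        for char in qualities:
            q = ord(char) - offset  # decode the Phred score
            total += 1
            if q >= 20:
                q20 += 1
            if q >= 30:
                q30 += 1
    return {"total_bases": total, "q20_bases": q20, "q30_bases": q30}

# 'I' decodes to Q40, '5' to Q20 and '#' to Q2 under Phred+33.
metrics = quality_yield(["IIII5555", "####IIII"])
print(metrics)
```

Of the 16 toy bases, 12 reach Q20 and 8 reach Q30.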
java -jar picard.jar: Function: Asserts the validity for specified Illumina basecalling data.

Usage: java -jar picard.jar CheckIlluminaDirectory BASECALLS_DIR=/BaseCalls/ READ_STRUCTURE=25T8B25T LANES=1 DATA_TYPES=BaseCalls
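The READ_STRUCTURE value in the usage line is a compact description of how sequencing cycles are split into segments. A minimal parser is sketched below; the `parse_read_structure` helper is hypothetical, and the mapping of letters to segment types (T template, B sample barcode, M molecular barcode, S skip) is stated here as an assumption about the notation.

```python
import re

# Assumed meaning of the segment letters.
SEGMENT_TYPES = {
    "T": "template",
    "B": "sample barcode",
    "M": "molecular barcode",
    "S": "skip",
}

def parse_read_structure(structure):
    """Split a string such as '25T8B25T' into (cycles, type) pairs."""
    segments = re.findall(r"(\d+)([TBMS])", structure)
    # Reject input containing anything besides <number><letter> pairs.
    if not segments or "".join(n + t for n, t in segments) != structure:
        raise ValueError(f"invalid read structure: {structure!r}")
    return [(int(n), SEGMENT_TYPES[t]) for n, t in segments]

segments = parse_read_structure("25T8B25T")
total_cycles = sum(cycles for cycles, _ in segments)

for cycles, kind in segments:
    print(cycles, kind)
print("total cycles:", total_cycles)
```

Under these assumptions, 25T8B25T reads as 25 template cycles, 8 sample barcode cycles, and another 25 template cycles, for 58 cycles in total.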
java -jar picard.jar: Function: Replace read groups in a BAM file. This tool enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. For more information about read groups, see the GATK Dictionary entry. This tool accepts INPUT BAM and SAM files or URLs from the Global Alliance for Genomics and Health (GA4GH) (see http://ga4gh.org/#/documentation).

Usage: java -jar picard.jar AddOrReplaceReadGroups I=input.bam O=output.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=20
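To show what replacing read groups amounts to at the level of SAM text, the sketch below builds an @RG header line from the values in the usage example and tags one record with the new read group ID. Both helper functions are hypothetical and operate on plain tab-separated SAM lines; this is an illustration of the file format, not of Picard's code.

```python
def read_group_header(rgid, rglb, rgpl, rgpu, rgsm):
    """Build a SAM @RG header line from the read group fields."""
    fields = [("ID", rgid), ("LB", rglb), ("PL", rgpl),
              ("PU", rgpu), ("SM", rgsm)]
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)

def assign_read_group(sam_record, rgid):
    """Drop any existing RG tag from a SAM record and add the new one."""
    fields = sam_record.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]  # 11 mandatory columns
    optional = [f for f in optional if not f.startswith("RG:Z:")]
    optional.append(f"RG:Z:{rgid}")
    return "\t".join(mandatory + optional)

header = read_group_header("4", "lib1", "illumina", "unit1", "20")

# Hypothetical record that still carries an old read group.
record = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:old\tNM:i:0"
updated = assign_read_group(record, "4")

print(header)
print(updated)
```

The header line carries ID:4, LB:lib1, PL:illumina, PU:unit1, and SM:20, and the record ends with RG:Z:4 in place of the old tag.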
java -jar picard.jar: Function: Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories.

Usage: java -jar picard.jar CollectHiSeqXPfFailMetrics BASECALLS_DIR=/BaseCalls/ OUTPUT=/metrics/ LANE=001
java -jar picard.jar: Function: Converts a VCF or BCF file to a Picard Interval List.

Usage: java -jar picard.jar VcfToIntervalList
java -jar picard.jar: Function: Collect metrics to quantify single-base sequencing artifacts.

Usage: java -jar picard.jar CollectSequencingArtifactMetrics I=input.bam O=artifact_metrics.txt R=reference_sequence.fasta
bamtools: Function: Prints the number of alignments in the BAM file(s).

Usage: bamtools count -in <BAM file>
java -jar picard.jar: Function: Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.

Usage: java -jar picard.jar CollectWgsMetrics I=input.bam O=collect_wgs_metrics.txt R=reference_sequence.fasta
bamtools: Function: Merges multiple BAM files into one.

Usage: bamtools merge -in input_alignments_1.bam -in input_alignments_2.bam -in input_alignments_3.bam -out output_alignments_merged.bam
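When the inputs are already coordinate-sorted, merging them into one sorted stream is a k-way merge. The sketch below shows that idea on toy records; it makes no claim about how bamtools merges files internally, and the record layout (reference index, position, read name) and the `merge_sorted` helper are assumptions for illustration.

```python
import heapq

def merge_sorted(*streams):
    """Merge coordinate-sorted record streams into one sorted list.

    Each record is (reference_index, position, read_name); every input
    stream is assumed to be sorted by (reference_index, position).
    """
    return list(heapq.merge(*streams, key=lambda rec: (rec[0], rec[1])))

# Hypothetical contents of three coordinate-sorted inputs.
alignments_1 = [(0, 100, "r1"), (0, 300, "r4"), (1, 50, "r6")]
alignments_2 = [(0, 150, "r2"), (1, 10, "r5")]
alignments_3 = [(0, 200, "r3")]

merged = merge_sorted(alignments_1, alignments_2, alignments_3)
print([name for _, _, name in merged])
```

The merged order is r1 through r6. Because `heapq.merge` only looks at the head of each stream, the inputs never need to be loaded and re-sorted as a whole.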