SAM/BAM Manipulation

java -jar picard.jar: Function: Provides a large, configurable FIFO buffer that can be used to buffer input and output streams between programs, with a buffer size larger than that offered by native Unix FIFOs (usually 64k).

Usage: java -jar picard.jar FifoBuffer
samtools index: Function: Index a coordinate-sorted SAM/BAM/CRAM file for fast random access.

Usage: samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index]

Supported input format: BAM, CRAM, SAM
java -jar picard.jar: Function: Applies one or more hard filters to a VCF file to filter out genotypes and variants.

Usage: java -jar picard.jar FilterVcf
java -jar picard.jar: Function: Finds Mendelian violations of all types within a VCF. Takes in a VCF or BCF and a pedigree file and looks for high-confidence calls where the genotype of the offspring is incompatible with the genotypes of the parents. Assumes the existence of the format fields AD, DP, GT, GQ, and PL. Note that the implementation assumes that reads from the pseudoautosomal region (PAR) will be mapped to the female chromosome rather than the male. This requires that the PAR in the male chromosome be masked so that the aligner has a single contig to map to. This is normally done for the public releases of the human reference. Usage example: java -jar picard.jar FindMendelianViolations I=input.vcf TRIO=family.ped OUTPUT=mendelian.txt MIN_DP=20

Usage: java -jar picard.jar FindMendelianViolations
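
The incompatibility test described above can be illustrated with a minimal shell sketch. It is not Picard's implementation: the function names has_allele and is_mendelian_violation are hypothetical, and the sketch assumes complete diploid autosomal genotypes, ignoring depth and quality thresholds, missing calls, and sex chromosomes.

```shell
# has_allele GT ALLELE: succeed if the diploid genotype (e.g. "0/1") carries ALLELE.
# (Hypothetical helper for illustration only.)
has_allele() {
    case "/$1/" in
        */"$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# is_mendelian_violation CHILD MOTHER FATHER
# Succeeds (exit 0) when the child cannot have received one allele from each parent.
is_mendelian_violation() {
    # Treat phased (|) and unphased (/) genotypes alike.
    child=$(printf '%s' "$1" | tr '|' '/')
    mother=$(printf '%s' "$2" | tr '|' '/')
    father=$(printf '%s' "$3" | tr '|' '/')
    a=${child%%/*}   # first child allele
    b=${child##*/}   # second child allele
    if { has_allele "$mother" "$a" && has_allele "$father" "$b"; } ||
       { has_allele "$mother" "$b" && has_allele "$father" "$a"; }; then
        return 1     # consistent with Mendelian inheritance
    fi
    return 0         # violation
}

# A 1/1 child cannot come from a 0/0 mother, whatever the father carries.
if is_mendelian_violation "1/1" "0/0" "0/1"; then
    echo "violation"
fi
```

The real tool additionally applies filters such as MIN_DP before reporting a call, so this check alone would over-report on low-confidence genotypes.
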
java -jar picard.jar: Function: Subsets read data from a SAM or BAM file. This tool takes a SAM or BAM file and subsets it to a new file that either excludes or only includes aligned or unaligned reads (set using FILTER), or specific reads based on a list of read names supplied in the READ_LIST_FILE.

Usage: java -jar picard.jar FilterSamReads I=input.bam O=output.bam READ_LIST_FILE=read_names.txt FILTER=filter_value
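
One way to prepare a READ_LIST_FILE is sketched below, assuming the file holds one read name per line. The helper name read_names_from_sam is hypothetical; it reads SAM text on standard input, skips header lines, and keeps the unique values of the first column (QNAME).

```shell
# read_names_from_sam: print the unique read names found in SAM text on stdin.
# (Hypothetical helper for illustration only.)
read_names_from_sam() {
    # Header lines start with '@'; the read name is the first tab-separated field.
    grep -v '^@' | cut -f1 | sort -u
}

# Possible use, with the real tools installed (not run here):
#   samtools view -h input.bam | read_names_from_sam > read_names.txt
#   java -jar picard.jar FilterSamReads I=input.bam O=output.bam \
#       READ_LIST_FILE=read_names.txt FILTER=includeReadList
```

In practice the list would come from a subset of the reads (for example a region or flag selection) rather than from the whole file, since listing every read makes the filter a no-op.
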
java -jar picard.jar: Function: Collect multiple classes of metrics. This 'meta-metrics' tool runs one or more of the metrics collection modules at the same time to cut down on the time spent reading in data from input files. Available modules include CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectGcBiasMetrics, RnaSeqMetrics, CollectSequencingArtifactMetrics, and CollectQualityYieldMetrics. The tool produces outputs of '.pdf' and '.txt' files for each module, except for the CollectAlignmentSummaryMetrics module, which outputs only a '.txt' file. Output files are named by specifying a base name (without any file extensions).

Usage: java -jar picard.jar CollectMultipleMetrics I=input.bam O=multiple_metrics R=reference_sequence.fasta
java -jar picard.jar: Function: Converts a FASTQ file to an unaligned BAM or SAM file. This tool extracts read sequences and base qualities from the input FASTQ file and writes them out to a new file in unaligned BAM (uBAM) format. Read group information can be provided on the command line. Three versions of FASTQ quality scales are supported: FastqSanger, FastqSolexa and FastqIllumina (see http://maq.sourceforge.net/fastq.shtml for details). Input FASTQ files can be in GZip format (with .gz extension).

Usage: java -jar picard.jar FastqToSam F1=file_1.fastq O=fastq_to_bam.bam SM=for_tool_testing
CNVnator: Function: Extracts read mappings from BAM/SAM files using CNVnator.

Usage: cnvnator [-genome name] -root out.root [-chrom name1 ...] -tree [file1.bam ...]
samtools sort: Function: This tool uses the samtools sort command to sort BAM datasets in coordinate or read-name order.

Usage: samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-T tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]
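
Because samtools index expects coordinate-sorted input, sorting and indexing are often chained. The following is a minimal sketch of that chain; the function name sort_and_index and the SAMTOOLS override variable are assumptions of this example, not part of samtools.

```shell
# SAMTOOLS may be overridden (e.g. SAMTOOLS=echo for a dry run).
# This variable is an assumption of the sketch, not a samtools feature.
SAMTOOLS=${SAMTOOLS:-samtools}

# sort_and_index IN.bam OUT.bam
sort_and_index() {
    in_bam=$1
    out_bam=$2
    # Coordinate order is the default for samtools sort (no -n given).
    "$SAMTOOLS" sort -o "$out_bam" "$in_bam" || return 1
    # Index only if sorting succeeded.
    "$SAMTOOLS" index "$out_bam"
}

# Example call (requires samtools on PATH):
#   sort_and_index aln.bam aln.sorted.bam
```

Sorting with -n (read-name order) would produce a file that cannot be indexed this way, which is why the sketch leaves that flag out.
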
read_distribution.py: Function: Given a BAM/SAM file and a reference gene model, this module calculates how mapped reads are distributed over genome features (such as CDS exons, 5' UTR exons, 3' UTR exons, introns, and intergenic regions). When genome features overlap (e.g. a region could be annotated as both exon and intron by two different transcripts), they are prioritized as: CDS exons > UTR exons > Introns > Intergenic regions. For example, if a read is mapped to both a CDS exon and an intron, it is assigned to CDS exons.

Usage: read_distribution.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -r hg19.refseq.bed12
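
The priority rule can be sketched as a small shell function. This is only an illustration of the ordering described above, not the module's own code; the function name assign_feature and the labels cds_exon, utr_exon, intron, and intergenic are hypothetical.

```shell
# assign_feature LABEL...: print the highest-priority label among the features
# a read overlaps. Order: CDS exon > UTR exon > intron > intergenic.
# (Hypothetical helper and labels, for illustration only.)
assign_feature() {
    for wanted in cds_exon utr_exon intron intergenic; do
        for feature in "$@"; do
            if [ "$feature" = "$wanted" ]; then
                printf '%s\n' "$wanted"
                return 0
            fi
        done
    done
    # None of the supplied labels is recognized.
    return 1
}

# A read overlapping both an intron and a CDS exon is counted as CDS exon.
assign_feature intron cds_exon   # prints cds_exon
```
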
java -jar picard.jar: Function: Estimates the numbers of unique molecules in a sequencing library.

Usage: java -jar picard.jar EstimateLibraryComplexity I=input.bam O=est_lib_complex_metrics.txt
java -jar picard.jar: Function: Identifies duplicate reads using information from read positions and UMIs.

Usage: java -jar picard.jar UmiAwareMarkDuplicatesWithMateCigar
java -jar picard.jar: Function: Generate FASTQ file(s) from Illumina basecall read data.

Usage: java -jar picard.jar IlluminaBasecallsToFastq READ_STRUCTURE=25T8B25T BASECALLS_DIR=basecallDirectory LANE=001 OUTPUT_PREFIX=noBarcode.1 RUN_BARCODE=run15 FLOWCELL_BARCODE=abcdeACXX
java -jar picard.jar: Function: Verify mate-pair information between mates and fix if needed. This tool ensures that all mate-pair information is in sync between each read and its mate pair. If no OUTPUT file is supplied, the output is written to a temporary file and then copied over the INPUT file. Reads marked with the secondary alignment flag are written to the output file unchanged.

Usage: java -jar picard.jar FixMateInformation I=input.bam O=fixed_mate.bam
java -jar picard.jar: Function: Evaluate genotype concordance between callsets. This tool evaluates the concordance between genotype calls for samples in different callsets, where one is considered the truth (aka standard, or reference) and the other is the call being evaluated for accuracy.

Usage: java -jar picard.jar GenotypeConcordance CALL_VCF=input.vcf CALL_SAMPLE=sample_name O=gc_concordance.vcf TRUTH_VCF=truth_set.vcf TRUTH_SAMPLE=truth_sample#