java -jar picard.jar |
java -jar picard.jar IntervalListTools |
Manipulates interval lists. This tool offers multiple interval list file manipulation capabilities, including sorting, merging, subtracting, padding, customizing, and other set-theoretic operations. If given one or more inputs, the default operation is to merge and sort them. Other options, e.g. interval subtraction, are controlled by the arguments. The tool lists intervals with respect to a reference sequence. Both interval_list and VCF files are accepted as input. The interval_list file format is relatively simple and reflects the SAM alignment format to a degree. A SAM-style header must be present in the file that lists the sequence records against which the intervals are described. After the header, the file contains records, one per line in text format, with the following tab-separated values:
* Sequence name (SN)
* Start position (1-based)
* End position (1-based, end inclusive)
* Strand (either + or -)
* Interval name (ideally unique names for intervals) |
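The default merge-and-sort behaviour on interval_list records can be sketched in a few lines of Python. This is an illustration only, not Picard's implementation: the names `Interval`, `parse_intervals` and `merge_intervals` are hypothetical, and the sketch assumes that overlapping or abutting intervals on the same contig are merged, that strand is ignored when merging, and that merged names are joined with `|`.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int   # 1-based
    end: int     # 1-based, end inclusive
    strand: str
    name: str


def parse_intervals(text: str) -> List[Interval]:
    """Parse interval_list text, skipping SAM-style header lines (starting with '@')."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("@"):
            continue
        contig, start, end, strand, name = line.split("\t")
        intervals.append(Interval(contig, int(start), int(end), strand, name))
    return intervals


def merge_intervals(intervals: List[Interval], contig_order: List[str]) -> List[Interval]:
    """Sort by contig order then start; merge overlapping or abutting intervals (assumption)."""
    rank = {contig: i for i, contig in enumerate(contig_order)}
    ordered = sorted(intervals, key=lambda iv: (rank[iv.contig], iv.start, iv.end))
    merged: List[Interval] = []
    for iv in ordered:
        last = merged[-1] if merged else None
        if last is not None and last.contig == iv.contig and iv.start <= last.end + 1:
            # Extend the previous interval and keep both names (naming scheme is an assumption).
            merged[-1] = last._replace(end=max(last.end, iv.end), name=last.name + "|" + iv.name)
        else:
            merged.append(iv)
    return merged


EXAMPLE = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:500\n"
    "chr2\t10\t20\t+\tc\n"
    "chr1\t100\t200\t+\ta\n"
    "chr1\t150\t300\t+\tb\n"
)

for iv in merge_intervals(parse_intervals(EXAMPLE), ["chr1", "chr2"]):
    print("\t".join(map(str, iv)))
```

In this sketch the contig order is passed in explicitly; a real interval_list carries it in the `@SQ` header lines.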
bamtools |
bamtools coverage -in <BAM file> |
Print coverage data for a single BAM file |
java -jar picard.jar |
java -jar picard.jar CollectIlluminaLaneMetrics RUN_DIR=test_run OUTPUT_DIRECTORY=Lane_output_metrics OUTPUT_PREFIX=experiment1 READ_STRUCTURE=25T8B25T |
Collects Illumina lane metrics for the given BaseCalling analysis directory. This tool produces quality control metrics on cluster density for each lane of an Illumina flowcell. This tool takes Illumina TileMetrics data and places them into directories containing lane- and phasing-level metrics. In this context, phasing refers to the fraction of molecules that fall behind or jump ahead (prephasing) during a read cycle. |
java -jar picard.jar |
java -jar picard.jar ViewSam |
Prints a SAM or BAM file to the screen. |
java -jar picard.jar |
java -jar picard.jar ReorderSam |
Not to be confused with SortSam which sorts a SAM or BAM file with a valid sequence dictionary, ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs. Reads mapped to contigs absent in the new reference are dropped. Runs substantially faster if the input is an indexed BAM file. |
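The difference from sorting can be made concrete with a small sketch. It is only a model of the behaviour described above, not Picard code: `reorder_reads` is a hypothetical helper, reads are represented as `(contig, position, name)` tuples, and unmapped reads are left out of the picture.

```python
def reorder_reads(reads, reference_contigs):
    """Group reads by the contig order of the new reference (exact name match).

    Reads on contigs absent from the new reference are dropped.
    Order within each contig is preserved because sorted() is stable.
    """
    rank = {name: i for i, name in enumerate(reference_contigs)}
    kept = [read for read in reads if read[0] in rank]
    return sorted(kept, key=lambda read: rank[read[0]])


reads = [("chr2", 5, "a"), ("chr1", 9, "b"), ("chrM", 1, "c"), ("chr2", 1, "d")]
print(reorder_reads(reads, ["chr1", "chr2"]))
# → [('chr1', 9, 'b'), ('chr2', 5, 'a'), ('chr2', 1, 'd')]
```

Note that positions within `chr2` are not sorted; only the contig ordering changes.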
samtools merge |
samtools merge [-nur1f] [-h inh.sam] [-R reg] [-b <list>] <out.bam> <in1.bam> [<in2.bam> <in3.bam> ... <inN.bam>] |
Merge multiple sorted alignment files, producing a single sorted output file that contains all the input records and maintains the existing sort order. |
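Merging already-sorted inputs while keeping the sort order is a k-way merge. The sketch below shows the idea with Python's `heapq.merge`; it is not how samtools is implemented, and the record layout `(reference_index, position, name)` and the helper name `merge_sorted` are assumptions made for illustration.

```python
import heapq


def merge_sorted(*inputs):
    """K-way merge of coordinate-sorted record lists into one coordinate-sorted list."""
    # Each input must already be sorted by (reference_index, position).
    return list(heapq.merge(*inputs, key=lambda rec: (rec[0], rec[1])))


in1 = [(0, 5, "r1"), (0, 40, "r3"), (1, 7, "r5")]
in2 = [(0, 12, "r2"), (1, 2, "r4")]
print(merge_sorted(in1, in2))
# → [(0, 5, 'r1'), (0, 12, 'r2'), (0, 40, 'r3'), (1, 2, 'r4'), (1, 7, 'r5')]
```

Because each input is consumed in order, the merge never needs to hold more than one pending record per input.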
java -jar picard.jar |
java -jar picard.jar MergeBamAlignment ALIGNED=aligned.bam UNMAPPED=unmapped.bam O=merge_alignments.bam R=reference_sequence.fasta |
Merge alignment data from a SAM or BAM with data in an unmapped BAM file. This tool produces a new SAM or BAM file that includes all aligned and unaligned reads and also carries forward additional read attributes from the unmapped BAM (attributes that are otherwise lost in the process of alignment). The purpose of this tool is to use information from the unmapped BAM to fix up aligner output. The resulting file will be valid for use by other Picard tools. For simple BAM file merges, use MergeSamFiles. Note that MergeBamAlignment expects to find a sequence dictionary in the same directory as REFERENCE_SEQUENCE and expects it to have the same base name as the reference FASTA except with the extension ".dict". If the output sort order is not coordinate, then reads that are clipped due to adapters or overlapping will not contain the NM, MD, or UQ tags. |
java -jar picard.jar |
java -jar picard.jar CollectJumpingLibraryMetrics I=input.bam O=jumping_metrics.txt |
Collect jumping library metrics. |
java -jar picard.jar |
java -jar picard.jar NormalizeFasta I=input_sequence.fasta O=normalized_sequence.fasta |
Normalizes lines of sequence in a FASTA file to be of the same length. This tool takes any FASTA-formatted file and reformats the sequence to ensure that all of the sequence record lines are of the same length (with the exception of the last line). Although the default setting is 100 bases per line, a custom line_length can be specified by the user. In addition, record names can be truncated at the first instance of a whitespace character to ensure downstream compatibility. |
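The reformatting itself is simple to sketch. The function below is an illustrative stand-in, not the Picard code: `normalize_fasta` and its parameters `line_length` and `truncate_names` are names chosen for this example.

```python
def normalize_fasta(text, line_length=100, truncate_names=False):
    """Re-wrap every FASTA record so all sequence lines but the last have line_length bases."""
    out = []
    seq_parts = []

    def flush():
        # Emit the sequence collected so far in fixed-width chunks.
        seq = "".join(seq_parts)
        for i in range(0, len(seq), line_length):
            out.append(seq[i:i + line_length])
        seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name = line[1:]
            if truncate_names and name.split():
                name = name.split()[0]  # cut the name at the first whitespace
            out.append(">" + name)
        else:
            seq_parts.append(line)
    flush()
    return "\n".join(out) + "\n"


print(normalize_fasta(">seq1 some description\nACGTAC\nGT\nACG\n", line_length=4, truncate_names=True), end="")
```

With `line_length=4`, the eleven bases of `seq1` come out as `ACGT`, `ACGT`, `ACG`, and the header is reduced to `>seq1`.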
java -jar picard.jar |
java -jar picard.jar CollectRnaSeqMetrics I=input.bam O=output.RNA_Metrics REF_FLAT=ref_flat.txt STRAND=SECOND_READ_TRANSCRIPTION_STRAND RIBOSOMAL_INTERVALS=ribosomal.interval_list |
Produces RNA alignment metrics for a SAM or BAM file. |
java -jar picard.jar |
java -jar picard.jar SplitVcfs I=input.vcf SNP_OUTPUT=snp.vcf INDEL_OUTPUT=indel.vcf STRICT=false |
Splits SNPs and INDELs into separate files. This tool reads in a VCF or BCF file and writes out the SNPs and INDELs it contains to separate files. The headers of the two output files will be identical and index files will be created for both outputs. If records other than SNPs or INDELs are present, set the STRICT option to "false"; otherwise the tool will raise an exception and quit. |
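The role of STRICT can be illustrated with a simplified sketch. The classification rule here is an assumption made for the example (a SNP has single-base REF and ALT alleles; an indel has plain-base alleles of differing length), and `split_vcf_lines` is a hypothetical helper working on uncompressed VCF text lines, not the tool's actual logic.

```python
from typing import List, Tuple

BASES = set("ACGTN")


def is_base_string(allele: str) -> bool:
    """True for alleles made only of plain bases (excludes symbolic alleles such as <DEL>)."""
    return bool(allele) and set(allele.upper()) <= BASES


def split_vcf_lines(lines: List[str], strict: bool = True) -> Tuple[List[str], List[str]]:
    """Return (snp_lines, indel_lines); both outputs start with the same header lines."""
    header: List[str] = []
    snps: List[str] = []
    indels: List[str] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        ref, alts = fields[3], fields[4].split(",")
        plain = is_base_string(ref) and all(is_base_string(a) for a in alts)
        if plain and len(ref) == 1 and all(len(a) == 1 for a in alts):
            snps.append(line)
        elif plain and any(len(a) != len(ref) for a in alts):
            indels.append(line)
        elif strict:
            raise ValueError("record is neither SNP nor indel: " + line)
        # With strict=False, any other record type is skipped.
    return header + snps, header + indels


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10\t.\tA\tG\t.\t.\t.",
    "chr1\t20\t.\tAT\tA\t.\t.\t.",
    "chr1\t30\t.\tA\t<DEL>\t.\t.\t.",
]
snp_out, indel_out = split_vcf_lines(vcf, strict=False)
print(len(snp_out), len(indel_out))
```

Calling the same function with `strict=True` on this input raises `ValueError` at the symbolic `<DEL>` record, mirroring the behaviour described above.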
read_duplication.py |
read_duplication.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output |
Two strategies are used to determine the read duplication rate:
* Sequence based: reads with identical sequences are regarded as duplicated reads.
* Mapping based: reads mapped to exactly the same genomic location are regarded as duplicated reads. For spliced reads, reads that map to the same starting position and are spliced the same way are regarded as duplicated reads. |
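The two strategies differ only in the key used to decide that two reads are "the same". A minimal sketch, assuming reads are given as `(sequence, chrom, start, junctions)` tuples; the helper `occurrence_histogram` and this tuple layout are hypothetical and not taken from the script itself.

```python
from collections import Counter


def occurrence_histogram(keys):
    """Map occurrence count -> number of distinct keys seen that many times."""
    per_key = Counter(keys)
    return dict(Counter(per_key.values()))


reads = [
    ("ACGT", "chr1", 100, ()),
    ("ACGT", "chr1", 100, ()),
    ("ACGA", "chr1", 100, ()),
    ("TTTT", "chr2", 50, ((60, 90),)),   # spliced read, one junction
    ("TTTT", "chr2", 50, ((60, 95),)),   # same start, spliced differently
]

# Sequence based: the key is the read sequence alone.
by_sequence = occurrence_histogram(r[0] for r in reads)

# Mapping based: the key is the location plus the splice junctions.
by_mapping = occurrence_histogram((r[1], r[2], r[3]) for r in reads)

print(by_sequence)  # → {2: 2, 1: 1}
print(by_mapping)   # → {3: 1, 1: 2}
```

The two spliced reads share a sequence and a start position but count as distinct under the mapping-based key because their junctions differ.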
bamtools |
bamtools count -in <BAM file> |
Prints the number of alignments in the BAM file(s) |
java -jar picard.jar |
java -jar picard.jar MeanQualityByCycle I=input.bam O=mean_qual_by_cycle.txt CHART=mean_qual_by_cycle.pdf |
Collect mean quality by cycle. This tool generates a data table and chart of mean quality by cycle from a BAM file. It is intended to be used on a single lane or a read group's worth of data, but can be applied to merged BAMs if needed. This metric gives an overall snapshot of sequencing machine performance. For most types of sequencing data, the output is expected to show a slight reduction in overall base quality scores towards the end of each read. Spikes in quality within reads are not expected and may indicate that technical problems occurred during sequencing. |
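The metric itself is a per-position average of base qualities. The sketch below computes it from Phred+33 quality strings; it assumes plain text input rather than a BAM file, ignores read orientation and pairing, and uses a hypothetical function name `mean_quality_by_cycle`.

```python
def mean_quality_by_cycle(quality_strings, offset=33):
    """Return the mean Phred quality at each cycle (0-based list index = cycle - 1)."""
    sums = []
    counts = []
    for quals in quality_strings:
        for cycle, char in enumerate(quals):
            if cycle == len(sums):
                # First read long enough to reach this cycle.
                sums.append(0)
                counts.append(0)
            sums[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [total / n for total, n in zip(sums, counts)]


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(mean_quality_by_cycle(["II5", "I5"]))
# → [40.0, 30.0, 20.0]
```

A gently falling curve like this one matches the expected pattern; a sharp dip or spike at a single cycle would be the kind of signal worth investigating.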
samtools collate |
samtools collate [options] in.sam|in.bam|in.cram [out.prefix] |
Shuffles and groups reads together by their names. A faster alternative to a full query name sort, collate ensures that reads of the same name are grouped together in contiguous groups, but doesn't make any guarantees about the order of read names between groups. The output from this command should be suitable for any operation that requires all reads from the same template to be grouped together. |
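The guarantee described above (same-name reads contiguous, no ordering between groups) can be shown with an in-memory sketch. It models only the output contract, not the samtools algorithm; `collate` and the `(name, payload)` record layout are assumptions for the example.

```python
from collections import defaultdict


def collate(reads):
    """Group (name, payload) records so that records sharing a name are contiguous."""
    groups = defaultdict(list)
    for name, payload in reads:
        groups[name].append((name, payload))
    collated = []
    # Group order is whatever the dict yields; callers must not rely on it.
    for name in groups:
        collated.extend(groups[name])
    return collated


reads = [("r2", "first"), ("r1", "first"), ("r2", "second"), ("r1", "second")]
print(collate(reads))
```

No comparison between names is ever made, which is why grouping can be cheaper than a full query name sort.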
samtools view |
samtools view -o [OUTPUT SAM] [-h|-H] [INPUT BAM] |
Converts BAM dataset to SAM |
samtools fasta |
samtools fasta [options] in.bam |
Converts a BAM or CRAM into either FASTQ or FASTA format depending on the command invoked. |
java -jar picard.jar |
java -jar picard.jar CollectHiSeqXPfFailMetrics BASECALLS_DIR=/BaseCalls/ OUTPUT=/metrics/ LANE=001 |
Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories. |
java -jar picard.jar |
java -jar picard.jar MergeSamFiles I=input_1.bam I=input_2.bam O=merged_files.bam |
Merges multiple SAM and/or BAM files into a single file. This tool is used for combining SAM and/or BAM files from different runs or read groups, similarly to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html). Note that to prevent errors in downstream processing, it is critical to identify/label read groups appropriately. If different samples contain identical read group IDs, this tool will avoid collisions by modifying the read group IDs to be unique. For more information about read groups, see the GATK Dictionary entry. |
java -jar picard.jar |
java -jar picard.jar CalculateReadGroupChecksum I=input.bam |
Creates a hash code based on the read groups (RG). This tool creates a hash code based on identifying information in the read groups (RG) of a SAM or BAM file header. Addition or removal of RGs changes the hash code, enabling the user to quickly determine if changes have been made to the read group information. |
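The idea of a read group checksum can be sketched as follows. Which fields the tool hashes and which hash function it uses are not stated here, so this example makes its own assumptions: it hashes the full `@RG` header lines, sorted so that their order does not matter, with MD5. The function name `read_group_checksum` is hypothetical.

```python
import hashlib


def read_group_checksum(header_text):
    """Hash the @RG lines of a SAM header; other header lines do not affect the result."""
    rg_lines = sorted(line for line in header_text.splitlines() if line.startswith("@RG"))
    # MD5 over sorted @RG lines is an assumption of this sketch.
    return hashlib.md5("\n".join(rg_lines).encode("utf-8")).hexdigest()


header_a = "@HD\tVN:1.6\n@RG\tID:a\tSM:s1\n@RG\tID:b\tSM:s1\n"
header_b = header_a + "@RG\tID:c\tSM:s2\n"  # one read group added

print(read_group_checksum(header_a) == read_group_checksum(header_b))
# → False
```

Comparing two such values is enough to tell whether the read group sections of two files differ, without inspecting the headers by hand.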