Reads Manipulation


multiBamSummary BED-file --BED selection.bed --bamfiles file1.bam file2.bam -out results.npz


multiBamSummary is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.

This page describes how to run multiBamSummary on pre-defined genomic regions (BED-file mode). If you want to use it with bins across the entire genome (bins mode), please refer to this page.

Required arguments

  • --bamfiles FILE1 FILE2 [FILE1 FILE2 ...], -b FILE1 FILE2 [FILE1 FILE2 ...]: List of indexed bam files separated by spaces.
  • --outFileName OUTFILENAME, -out OUTFILENAME, -o OUTFILENAME: File name to save the coverage matrix. This matrix can be subsequently plotted using plotCorrelation or plotPCA.
  • --BED FILE1.bed FILE2.bed [FILE1.bed FILE2.bed ...]: Limits the coverage analysis to the regions specified in these files.
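
When the call is part of a larger script, the three required arguments can be assembled programmatically. The following is a minimal sketch, not part of deepTools: `build_command` and `run_if_available` are hypothetical helpers, and the file names are the placeholders from the example command at the top of this page.

```python
import shutil
import subprocess


def build_command(bed_files, bam_files, out_file):
    """Return the argument list for 'multiBamSummary BED-file' (hypothetical helper)."""
    if not bed_files or not bam_files:
        raise ValueError("at least one BED file and one BAM file are required")
    return (
        ["multiBamSummary", "BED-file"]
        + ["--BED", *bed_files]
        + ["--bamfiles", *bam_files]
        + ["--outFileName", out_file]
    )


def run_if_available(cmd):
    """Run the command only if the executable is on the PATH; return True if it was run."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Placeholder file names taken from the example command on this page.
cmd = build_command(["selection.bed"], ["file1.bam", "file2.bam"], "results.npz")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces. Calling `run_if_available(cmd)` would start the tool, provided deepTools is installed and the input files exist.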


General options
  • --labels sample1 sample2 [sample1 sample2 ...], -l sample1 sample2 [sample1 sample2 ...]: User defined labels instead of default labels from file names. Multiple labels have to be separated by a space, e.g. --labels sample1 sample2 sample3.
  • --smartLabels: Instead of manually specifying labels for the input BAM files, this causes deepTools to use the file name after removing the path and extension. (default: False)
  • --genomeChunkSize GENOMECHUNKSIZE: Manually specify the size of the genome provided to each processor. The default value of None specifies that this is determined by read density of the BAM file. (default: None)
  • --region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to. This is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000.
  • --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
  • --numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
  • --version: show program's version number and exit
  • --verbose, -v: Set to see processing messages. (default: False)
  • --help, -h: show this help message and exit
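
The two forms of the --region value described above (a chromosome alone, or chr:start:end) can be checked before a long run is started. This is an illustrative sketch only: `parse_region` is a hypothetical helper, not deepTools code, and it assumes that chromosome names contain no colon.

```python
def parse_region(region):
    """Split a --region style string into (chrom, start, end).

    start and end are None when only a chromosome name is given.
    """
    parts = region.split(":")
    if len(parts) == 1 and parts[0]:
        return parts[0], None, None
    if len(parts) == 3 and parts[0]:
        start, end = int(parts[1]), int(parts[2])
        if start < 0 or start >= end:
            raise ValueError(f"invalid coordinates in region: {region!r}")
        return parts[0], start, end
    raise ValueError(f"expected CHR or CHR:START:END, got {region!r}")


print(parse_region("chr10"))
print(parse_region("chr10:456700:891000"))
```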
Optional output options
  • --outRawCounts FILE: Save the counts per region to a tab-delimited file.
  • --scalingFactors FILE: Compute scaling factors (in the DESeq2 manner) compatible for use with bamCoverage and write them to a file. The file has tab-separated columns "sample" and "scalingFactor".
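
The file written by --scalingFactors can be read back with a few lines of code. The sketch below assumes the file starts with a header row containing the column names "sample" and "scalingFactor"; the example content and the helper `read_scaling_factors` are made up for illustration and are not output of a real run.

```python
import csv
import io

# Made-up example content; real values come from multiBamSummary --scalingFactors.
example = "sample\tscalingFactor\nfile1.bam\t1.25\nfile2.bam\t0.80\n"


def read_scaling_factors(handle):
    """Map each sample name to its scaling factor (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: float(row["scalingFactor"]) for row in reader}


factors = read_scaling_factors(io.StringIO(example))
for sample, factor in factors.items():
    print(f"{sample}\t{factor}")
```

To read a real file, pass an open file handle instead of the `io.StringIO` object. Each value could then be supplied to bamCoverage for the corresponding sample, for example through its --scaleFactor option.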
Read processing options
  • --extendReads [INT bp], -e [INT bp]: This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. NOTE: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions.
    • Single-end: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended.
    • Paired-end: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False)
  • --ignoreDuplicates: If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. (default: False)
  • --minMappingQuality INT: If set, only reads that have a mapping quality score of at least this are considered. (default: None)
  • --centerReads: By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False)
  • --samFlagInclude INT: Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage.
  • --samFlagExclude INT: Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. (Default: None)
  • --minFragmentLength INT: The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)
  • --maxFragmentLength INT: The maximum fragment length allowed for read/pair inclusion. With the default value of 0, no upper limit is applied. (Default: 0)
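
SAM flags are bit fields, so --samFlagInclude and --samFlagExclude are easiest to understand as bitwise tests. The sketch below is not deepTools code; it assumes the usual convention (as in samtools -f and -F) that every bit of the include value must be set in the read's flag and that any shared bit with the exclude value rejects the read. `passes_flag_filters` is a hypothetical helper.

```python
FIRST_IN_PAIR = 64   # SAM flag bit 0x40
REVERSE_STRAND = 16  # SAM flag bit 0x10


def passes_flag_filters(flag, include=None, exclude=None):
    """Return True if a read with this SAM flag would be kept."""
    if include is not None and (flag & include) != include:
        return False
    if exclude is not None and (flag & exclude) != 0:
        return False
    return True


# 99  = paired (1) + proper pair (2) + mate reverse (32) + first in pair (64)
# 147 = paired (1) + proper pair (2) + read reverse (16) + second in pair (128)
for flag in (99, 147):
    kept_first = passes_flag_filters(flag, include=FIRST_IN_PAIR)
    kept_forward = passes_flag_filters(flag, exclude=REVERSE_STRAND)
    print(flag, kept_first, kept_forward)
```

Under these assumptions, a read with flag 99 passes both example filters, while its mate with flag 147 is dropped by either of them.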
GTF/BED12 options
  • --metagene: When either a BED12 or GTF file is used to provide regions, perform the computation on the merged exons, rather than using the genomic interval defined by the 5-prime and 3-prime most transcript bounds (i.e., columns 2 and 3 of a BED file). If a BED3 or BED6 file is used as input, then columns 2 and 3 are used as an exon. (Default: False)
  • --transcriptID TRANSCRIPTID: When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as transcripts. (Default: transcript)
  • --exonID EXONID: When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as exons. CDS would be another common value for this. (Default: exon)
  • --transcript_id_designator TRANSCRIPT_ID_DESIGNATOR: Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files this is instead stored in the last column as a key:value pair (e.g., as 'transcript_id "ACTB"', for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key. (Default: transcript_id)
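
To illustrate what --transcript_id_designator selects, the sketch below extracts the value for a given key from the last column of a GTF line. It is a simplified, hypothetical parser (`region_id` is not a deepTools function), and the attribute string is a made-up example.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def region_id(attribute_column, designator="transcript_id"):
    """Return the value stored under the given key, or None if the key is absent."""
    attributes = dict(ATTRIBUTE.findall(attribute_column))
    return attributes.get(designator)


# Made-up attribute column of a GTF entry.
attrs = 'gene_id "G0001"; transcript_id "ACTB"; transcript_name "ACTB-201";'

print(region_id(attrs))                     # uses the default key, transcript_id
print(region_id(attrs, "transcript_name"))  # uses a different identifier
```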

