Category

Sam/Bam Manipulation


Usage

bamCoverage [options] -b reads.bam -o coverage.bw


Manual

bamCoverage is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.

bamCoverage works by first calculating the number of reads (either extended to match the fragment length or not) that overlap each bin in the genome. The resulting read counts can be normalized using a given scaling factor, the RPKM formula, or to get a 1x depth of coverage (RPGC). In the case of paired-end mapping, each read mate is treated independently to avoid a bias when a mixture of concordant and discordant pairs is present. This means that each end will be extended to match the fragment length.
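As a rough illustration of the counting step described above, the following Python sketch counts how many reads overlap each fixed-size bin on a single chromosome. It is not deepTools code: the function name `count_reads_per_bin`, the `(start, end, is_reverse)` read representation, and the simple single-end extension rule are assumptions made for this example.

```python
def count_reads_per_bin(reads, chrom_length, bin_size=50, extend_to=None):
    """Count the reads overlapping each bin of one chromosome.

    reads: iterable of (start, end, is_reverse) with 0-based, half-open
        coordinates (hypothetical representation, not a pysam object).
    extend_to: optional fragment length. Reads shorter than this are
        extended in their direction of orientation (assumed behaviour).
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for start, end, is_reverse in reads:
        if extend_to is not None and end - start < extend_to:
            if is_reverse:
                # Reverse-strand reads grow towards smaller coordinates.
                start = max(0, end - extend_to)
            else:
                end = min(chrom_length, start + extend_to)
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        # A read contributes one count to every bin it overlaps.
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1
    return counts


reads = [(10, 60, False), (120, 170, True)]
print(count_reads_per_bin(reads, chrom_length=200))                 # [1, 1, 1, 1]
print(count_reads_per_bin(reads, chrom_length=200, extend_to=100))  # [1, 2, 2, 1]
```

Extending the reads widens the footprint of each read, so neighbouring bins pick up additional counts, which is why the second call reports higher values in the middle bins.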

bamCoverage takes a BAM file as input. If you want genome-wide coverage information from reads or regions stored in BED files, consider using genomecov from the bedtools suite.

Required arguments

  • --bam BAM file, -b BAM file: BAM file to process

Options

  • Output:
    • --outFileName FILENAME, -o FILENAME: Output file name. (default: None)
    • --outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph}: Output file type. Either "bigwig" or "bedgraph". (default: bigwig)
  • Optional arguments:
    • --help, -h: show this help message and exit
    • --scaleFactor SCALEFACTOR: The computed scaling factor (or 1, if not applicable) will be multiplied by this. (Default: 1.0)
    • --MNase: Determine nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are defined by the two mate reads. Only fragment lengths between 130 and 200 bp are considered, to avoid dinucleosomes or other artifacts. By default, any fragments smaller or larger than this are ignored. To override this, use the --minFragmentLength and --maxFragmentLength options, which default to 130 and 200 when --MNase is set and they are not otherwise specified. NOTE: Requires paired-end data. A bin size of 1 is recommended. (default: False)
    • --Offset INT [INT ...]: Uses this offset inside of each read as the signal. This is useful in cases like RiboSeq or GROseq, where the signal is 12, 15 or 0 bases past the start of the read. This can be paired with the --filterRNAstrand option. Note that negative values indicate offsets from the end of each read. A value of 1 indicates the first base of the alignment (taking alignment orientation into account). Likewise, a value of -1 is the last base of the alignment. An offset of 0 is not permitted. If two values are specified, then they will be used to specify a range of positions. Note that specifying something like --Offset 5 -1 will result in the 5th through last position being used, which is equivalent to trimming 4 bases from the 5-prime end of alignments. Note that if you specify --centerReads, the centering will be performed before the offset. (default: None)
    • --filterRNAstrand {forward,reverse}: Selects RNA-seq reads (single-end or paired-end) originating from genes on the given strand. This option assumes a standard dUTP-based library preparation (that is, --filterRNAstrand=forward keeps minus-strand reads, which originally came from genes on the forward strand using a dUTP-based method). Consider using --samExcludeFlag instead for filtering by strand in other contexts. (default: None)
    • --version: show program's version number and exit
    • --binSize INT bp, -bs INT bp: Size of the bins, in bases, for the output of the bigwig/bedgraph file. (Default: 50)
    • --region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
    • --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant. (default: None)
    • --numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
    • --verbose, -v: Set to see processing messages. (default: False)
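The indexing rules described for --Offset (1-based, negative values counted from the end of the read, two values forming a range) can be sketched in Python as follows. This is only an illustration of the documented semantics, not the deepTools implementation: the function `offset_positions`, its list-based input, and the choice to return an empty list when the offset falls outside the read are assumptions.

```python
def offset_positions(aligned_positions, offset):
    """Select the signal positions of one read for a given --Offset value.

    aligned_positions: genomic positions of the aligned bases, ordered from
        the first to the last base of the alignment (orientation-aware).
    offset: list of one or two non-zero integers. Positive values are
        1-based from the start; negative values count from the end.
    """
    if not 1 <= len(offset) <= 2 or 0 in offset:
        raise ValueError("--Offset takes one or two non-zero integers")

    def to_index(value):
        # 1 -> index 0 (first base), -1 -> last index.
        return value - 1 if value > 0 else len(aligned_positions) + value

    first = to_index(offset[0])
    last = to_index(offset[-1])
    if first < 0 or last >= len(aligned_positions) or first > last:
        return []  # assumed behaviour: no signal if the offset is out of range
    return aligned_positions[first:last + 1]


read = list(range(100, 110))          # a 10-base alignment at positions 100..109
print(offset_positions(read, [1]))    # [100]
print(offset_positions(read, [-1]))   # [109]
print(offset_positions(read, [5, -1]))  # [104, 105, 106, 107, 108, 109]
```

The last call mirrors the `--Offset 5 -1` case in the option description: the 5th through last positions are kept, which amounts to trimming 4 bases from the 5-prime end.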
  • Read coverage normalization options:
    • --effectiveGenomeSize EFFECTIVEGENOMESIZE: The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. A table of values is available at the end of this page. (default: None)
    • --normalizeUsing {RPKM,CPM,BPM,RPGC,None}: Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. Mapped reads are counted after blacklist filtering (if applied). (Default: None)
      • RPKM = Reads Per Kilobase per Million mapped reads. RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)).
      • CPM = Counts Per Million mapped reads, same as CPM in RNA-seq. CPM (per bin) = number of reads per bin / number of mapped reads (in millions).
      • BPM = Bins Per Million mapped reads, same as TPM in RNA-seq. BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions).
      • RPGC = Reads Per Genomic Content (1x normalization). RPGC (per bin) = number of reads per bin / sequencing depth, where sequencing depth = (total number of mapped reads * fragment length) / effective genome size. The scaling factor applied is the inverse of this sequencing depth, so that the sample is scaled to 1x average coverage. This option requires --effectiveGenomeSize.
      • None = the default and equivalent to not setting this option at all.
      • Each read is considered independently. If you want to count only one mate from a pair in paired-end data, use the --samFlagInclude/--samFlagExclude options.
    • --exactScaling: Instead of computing scaling factors based on a sampling of the reads, process all of the reads to determine the exact number that will be used in the output. This requires significantly more time to compute, but will produce more accurate scaling factors in cases where alignments that are being filtered are rare and lumped together. In other words, this is only needed when region-based sampling is expected to produce incorrect results. (default: False)
    • --ignoreForNormalization IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...], -ignore IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...]: A space-delimited list of chromosome names that should be excluded when computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. A usage example is --ignoreForNormalization chrX chrM. (default: None)
    • --skipNonCoveredRegions, --skipNAs: This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zero. The decision to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for example, repetitive regions that should be skipped. (default: False)
    • --smoothLength INT bp: The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered. Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None)
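The per-bin formulas listed under --normalizeUsing and the windowed averaging described for --smoothLength can be written out as a short Python sketch. It only restates the arithmetic given in the option descriptions. The function names, keyword arguments, and the handling of bins at the ends of the list (averaging over the neighbours that exist) are assumptions for illustration, and RPGC is interpreted here as division by the sequencing depth.

```python
def normalize_bin(reads_in_bin, method, *, mapped_reads=None, bin_size=50,
                  sum_of_bin_counts=None, fragment_length=None,
                  effective_genome_size=None):
    """Apply one of the documented per-bin normalizations to a raw count."""
    if method is None:
        return float(reads_in_bin)
    if method == "RPKM":
        return reads_in_bin / ((mapped_reads / 1e6) * (bin_size / 1e3))
    if method == "CPM":
        return reads_in_bin / (mapped_reads / 1e6)
    if method == "BPM":
        return reads_in_bin / (sum_of_bin_counts / 1e6)
    if method == "RPGC":
        # Sequencing depth = (mapped reads * fragment length) / effective genome size.
        depth = mapped_reads * fragment_length / effective_genome_size
        return reads_in_bin / depth  # same as multiplying by 1 / depth
    raise ValueError(f"unknown normalization method: {method}")


def smooth(values, bin_size, smooth_length):
    """Average each bin with its neighbours inside a smooth_length window."""
    if smooth_length is None or smooth_length <= bin_size:
        return list(values)  # no smoothing is applied
    half = (smooth_length // bin_size) // 2
    smoothed = []
    for i in range(len(values)):
        # Assumed edge handling: use only the neighbours that exist.
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


print(normalize_bin(10, "CPM", mapped_reads=20_000_000))
print(normalize_bin(10, "RPGC", mapped_reads=20_000_000, fragment_length=200,
                    effective_genome_size=2913022398))  # GRCh38 value from the appendix
print(smooth([0, 3, 6, 3, 0], bin_size=20, smooth_length=60))
```

With a bin size of 20 and a smooth length of 60, each window covers three bins, matching the example given in the --smoothLength description.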
  • Read processing options:
    • --extendReads [INT bp], -e [INT bp]: This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception.
      • NOTE: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions.
      • Single-end: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended.
      • Paired-end: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False)
    • --ignoreDuplicates: If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. (default: False)
    • --minMappingQuality INT: If set, only reads that have a mapping quality score of at least this are considered. (default: None)
    • --centerReads: By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False)
    • --samFlagInclude INT: Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage. (Default: None)
    • --samFlagExclude INT: Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. (Default: None)
    • --minFragmentLength INT: The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)
    • --maxFragmentLength INT: The maximum fragment length allowed for read/pair inclusion. (Default: 0)
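The read filters above (--samFlagInclude, --samFlagExclude, --minMappingQuality, and the fragment length limits) combine bitwise tests on the SAM flag with simple thresholds. The following Python sketch shows one way such a filter could look. The function `passes_filters` and its arguments are hypothetical, and treating a fragment length limit of 0 as "no limit" is an assumption based on the listed defaults.

```python
def passes_filters(flag, mapq, fragment_length, *, sam_flag_include=None,
                   sam_flag_exclude=None, min_mapping_quality=None,
                   min_fragment_length=0, max_fragment_length=0):
    """Return True if a read with this SAM flag, MAPQ and fragment length is kept."""
    # Every bit requested by --samFlagInclude must be set on the read.
    if sam_flag_include and flag & sam_flag_include != sam_flag_include:
        return False
    # No bit listed in --samFlagExclude may be set on the read.
    if sam_flag_exclude and flag & sam_flag_exclude:
        return False
    if min_mapping_quality and mapq < min_mapping_quality:
        return False
    # Assumption: a limit of 0 disables the corresponding length filter.
    if min_fragment_length > 0 and abs(fragment_length) < min_fragment_length:
        return False
    if max_fragment_length > 0 and abs(fragment_length) > max_fragment_length:
        return False
    return True


# SAM flag 99 = paired, proper pair, mate reverse, first in pair.
# SAM flag 147 = paired, proper pair, read reverse, second in pair.
print(passes_filters(99, 60, 180, sam_flag_include=64))   # True: first mate
print(passes_filters(147, 60, 180, sam_flag_include=64))  # False: second mate
print(passes_filters(147, 60, 180, sam_flag_exclude=16))  # False: reverse strand
```

Using `--samFlagInclude 64` in this way keeps only first mates, so each properly paired fragment contributes to the coverage once.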

Appendix: Effective Genome Sizes

If multimapping reads are included:

Genome     Effective size (bp)
GRCh37     2864785220
GRCh38     2913022398
GRCm37     2620345972
GRCm38     2652783500
dm3        162367812
dm6        142573017
GRCz10     1369631918
WBcel235   100286401
TAIR10     119481543

If multimapping reads are not included:

Read length (bp)  GRCh37      GRCh38      GRCm37      GRCm38      dm3        dm6        GRCz10      WBcel235
50                2685511504  2701495761  2304947926  2308125349  130428560  125464728  1195445591  95159452
75                2736124973  2747877777  2404646224  2407883318  135004462  127324632  1251132686  96945445
100               2776919808  2805636331  2462481010  2467481108  139647232  129789873  1280189044  98259998
150               2827437033  2862010578  2489384235  2494787188  144307808  129941135  1312207169  98721253
200               2855464000  2887553303  2513019276  2520869189  148524010  132509163  1321355241  98672758

 

File formats this tool works with
BAM
