bamCoverage manual with usage examples

Usage

bamCoverage [options] -b reads.bam -o coverage.bw

Manual

bamCoverage is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.

bamCoverage first counts the reads (optionally extended to match the fragment length) that overlap each bin in the genome. The resulting read counts can be normalized using a given scaling factor, the RPKM formula, or scaling to a 1x depth of coverage (RPGC). In the case of paired-end mapping, each read mate is treated independently to avoid a bias when a mixture of concordant and discordant pairs is present. This means that each end will be extended to match the fragment length.
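
The counting step described above can be sketched in Python as follows. This is a simplified illustration that follows the description literally, not the deepTools implementation: reads are assumed to be 0-based, half-open (start, end) intervals on a single chromosome, and count_reads_per_bin is a hypothetical helper name.

```python
def count_reads_per_bin(reads, chrom_length, bin_size=50):
    """Count how many reads overlap each fixed-size bin.

    reads: iterable of (start, end) intervals, 0-based and half-open.
    Returns a list with one count per bin along the chromosome.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for start, end in reads:
        # Clip the interval to the chromosome and skip empty intervals.
        start = max(start, 0)
        end = min(end, chrom_length)
        if start >= end:
            continue
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size  # bin holding the last covered base
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1
    return counts


# Toy data (made up for illustration): three reads on a 200 bp chromosome.
toy_reads = [(10, 60), (40, 90), (150, 180)]
toy_counts = count_reads_per_bin(toy_reads, chrom_length=200, bin_size=50)
```

With these toy reads, the first two reads each touch bins 0 and 1, and the third read falls entirely inside bin 3.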

bamCoverage takes a BAM file as input. If you want genome-wide coverage information from reads or regions stored in BED files, consider using genomecov from the bedtools suite.

Required arguments

--bam BAM file, -b BAM file: BAM file to process

Options

Output:
- --outFileName FILENAME, -o FILENAME: Output file name. (default: None)
- --outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph}: Output file type. Either "bigwig" or "bedgraph". (default: bigwig)
Optional arguments:
- --help, -h: show this help message and exit
- --scaleFactor SCALEFACTOR: The computed scaling factor (or 1, if not applicable) will be multiplied by this. (Default: 1.0)
- --MNase: Determine nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are defined by the two mate reads. Only fragment lengths between 130 and 200 bp are considered, to avoid dinucleosomes or other artifacts. By default, any fragments smaller or larger than this are ignored. To override this, use the --minFragmentLength and --maxFragmentLength options, which default to 130 and 200 when --MNase is given and they are not otherwise specified. NOTE: Requires paired-end data. A bin size of 1 is recommended. (default: False)
- --Offset INT [INT ...]: Uses this offset inside of each read as the signal. This is useful in cases like RiboSeq or GROseq, where the signal is 12, 15 or 0 bases past the start of the read. This can be paired with the --filterRNAstrand option. Note that negative values indicate offsets from the end of each read. A value of 1 indicates the first base of the alignment (taking alignment orientation into account). Likewise, a value of -1 is the last base of the alignment. An offset of 0 is not permitted. If two values are specified, then they will be used to specify a range of positions. Note that specifying something like --Offset 5 -1 will result in the 5th through last position being used, which is equivalent to trimming 4 bases from the 5-prime end of alignments. Note that if you specify --centerReads, the centering will be performed before the offset. (default: None)
- --filterRNAstrand {forward,reverse}: Selects RNA-seq reads (single-end or paired-end) originating from genes on the given strand. This option assumes a standard dUTP-based library preparation (that is, --filterRNAstrand=forward keeps minus-strand reads, which originally came from genes on the forward strand using a dUTP-based method). Consider using --samExcludeFlag instead for filtering by strand in other contexts. (default: None)
- --version: show program's version number and exit
- --binSize INT bp, -bs INT bp: Size of the bins, in bases, for the output of the bigwig/bedgraph file. (Default: 50)
- --region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
- --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant. (default: None)
- --numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
- --verbose, -v: Set to see processing messages. (default: False)
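
The position rules given for --Offset above (1-based values, negative values counted from the end, two values forming a range) can be illustrated with a small Python sketch. It is not the deepTools code: select_offset_positions is a hypothetical helper, the alignment is assumed to be given as a list of reference positions ordered from the 5-prime to the 3-prime end of the read, and returning an empty list for out-of-range offsets is an assumption made here.

```python
def select_offset_positions(positions, offset):
    """Pick the reference positions selected by an --Offset style value.

    positions: reference positions covered by the alignment, ordered from
               the 5' end to the 3' end of the read (so a reverse-strand
               read lists decreasing coordinates).
    offset:    list of one or two non-zero integers.
    """
    if not 1 <= len(offset) <= 2:
        raise ValueError("expected one or two offset values")
    if any(value == 0 for value in offset):
        raise ValueError("an offset of 0 is not permitted")

    def to_index(value):
        # 1 is the first base, -1 is the last base of the alignment.
        return value - 1 if value > 0 else len(positions) + value

    start = to_index(offset[0])
    end = to_index(offset[-1])  # same as start when one value is given
    # Assumption: offsets outside the alignment select nothing.
    if start < 0 or end >= len(positions) or start > end:
        return []
    return positions[start:end + 1]


forward_read = list(range(100, 110))      # 10 bases on the forward strand
reverse_read = list(range(109, 99, -1))   # same bases, reverse-strand read

first_base = select_offset_positions(forward_read, [1])
trimmed = select_offset_positions(forward_read, [5, -1])
reverse_first = select_offset_positions(reverse_read, [1])
```

Here [5, -1] keeps the 5th through last position, which matches the description of trimming 4 bases from the 5-prime end, and the reverse-strand example shows why alignment orientation matters for a value of 1.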
Read coverage normalization options:
- --effectiveGenomeSize EFFECTIVEGENOMESIZE: The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. A table of values is available at the end of this page. (default: None)
- --normalizeUsing {RPKM,CPM,BPM,RPGC,None}: Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. Mapped reads are considered after blacklist filtering (if applied). Each read is considered independently; if you want to count only one mate from a pair in paired-end data, use the --samFlagInclude/--samFlagExclude options. (Default: None)
  - RPKM = Reads Per Kilobase per Million mapped reads. RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)).
  - CPM = Counts Per Million mapped reads, same as CPM in RNA-seq. CPM (per bin) = number of reads per bin / number of mapped reads (in millions).
  - BPM = Bins Per Million mapped reads, same as TPM in RNA-seq. BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions).
  - RPGC = Reads Per Genomic Content (1x normalization). RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. This scaling factor, in turn, is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize.
  - None = the default, equivalent to not setting this option at all.
- --exactScaling: Instead of computing scaling factors based on a sampling of the reads, process all of the reads to determine the exact number that will be used in the output. This requires significantly more time to compute, but will produce more accurate scaling factors in cases where alignments that are being filtered are rare and lumped together. In other words, this is only needed when region-based sampling is expected to produce incorrect results. (default: False)
- --ignoreForNormalization IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...], -ignore IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...]: A space-delimited list of chromosome names that should be excluded when computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. A usage example is --ignoreForNormalization chrX chrM. (default: None)
- --skipNonCoveredRegions, --skipNAs: This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zero. The decision to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for example, repetitive regions that should be skipped. (default: False)
- --smoothLength INT bp: The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered. Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None)
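
The per-bin formulas listed under --normalizeUsing and the averaging window described under --smoothLength can be sketched in Python as below. This is an illustration under stated assumptions, not the deepTools implementation: normalize_bins and smooth_bins are hypothetical names, RPGC is implemented as division by the sequencing depth (mapped reads * fragment length / effective genome size), and the way the smoothing window is truncated at the chromosome ends is a choice made for this sketch.

```python
def normalize_bins(counts, method, bin_size, mapped_reads,
                   fragment_length=None, effective_genome_size=None):
    """Apply one of the per-bin formulas to a list of raw bin counts."""
    if method is None:
        return [float(c) for c in counts]
    millions = mapped_reads / 1e6
    if method == "CPM":
        return [c / millions for c in counts]
    if method == "RPKM":
        bin_kb = bin_size / 1000
        return [c / (millions * bin_kb) for c in counts]
    if method == "BPM":
        total_millions = sum(counts) / 1e6
        return [c / total_millions for c in counts]
    if method == "RPGC":
        if fragment_length is None or effective_genome_size is None:
            raise ValueError("RPGC needs fragment_length and effective_genome_size")
        # Sequencing depth of the sample; dividing by it rescales to 1x.
        depth = mapped_reads * fragment_length / effective_genome_size
        return [c / depth for c in counts]
    raise ValueError(f"unknown method: {method}")


def smooth_bins(values, bin_size, smooth_length):
    """Average each bin with its neighbours inside the smoothing window."""
    window = smooth_length // bin_size  # number of bins per window
    if window <= 1:
        return list(values)  # window no larger than one bin: no smoothing
    left = (window - 1) // 2
    right = window - 1 - left
    smoothed = []
    for i in range(len(values)):
        # Assumption: the window is simply cut off at the ends.
        chunk = values[max(0, i - left):i + right + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy numbers, made up for illustration.
raw = [10, 30, 0, 60]
cpm = normalize_bins(raw, "CPM", bin_size=50, mapped_reads=2_000_000)
rpgc = normalize_bins(raw, "RPGC", bin_size=50, mapped_reads=2_000_000,
                      fragment_length=200, effective_genome_size=2_000_000_000)
smoothed = smooth_bins([3, 6, 9, 12], bin_size=20, smooth_length=60)
```

In the toy RPGC call the sequencing depth is 2,000,000 * 200 / 2,000,000,000 = 0.2, so every bin count is divided by 0.2. With a bin size of 20 and a smooth length of 60, each interior bin is averaged with one neighbor on each side, as in the --smoothLength description.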
Read processing options:
- --extendReads [INT bp], -e [INT bp]: This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception.
  - NOTE: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions.
  - Single-end: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended.
  - Paired-end: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False)
- --ignoreDuplicates: If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. (default: False)
- --minMappingQuality INT: If set, only reads that have a mapping quality score of at least this are considered. (default: None)
- --centerReads: By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False)
- --samFlagInclude INT: Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage. (Default: None)
- --samFlagExclude INT: Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. (Default: None)
- --minFragmentLength INT: The minimum fragment length needed for read/pair inclusion. This option is primarily useful in ATACseq experiments, for filtering mono- or di-nucleosome fragments. (Default: 0)
- --maxFragmentLength INT: The maximum fragment length needed for read/pair inclusion. (Default: 0)
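
The --samFlagInclude and --samFlagExclude examples above rely on the bitwise layout of the SAM FLAG field. The following Python sketch shows one plausible reading of the two options (every include bit must be set, no exclude bit may be set); passes_flag_filters is a hypothetical helper, and the exact behavior of bamCoverage should be checked against the tool itself.

```python
FIRST_IN_PAIR = 64   # SAM flag bit: read is the first mate
REVERSE_STRAND = 16  # SAM flag bit: read maps to the reverse strand


def passes_flag_filters(flag, sam_flag_include=None, sam_flag_exclude=None):
    """Return True if a read with this SAM flag survives both filters."""
    # Include filter: all requested bits must be present in the flag.
    if sam_flag_include is not None:
        if flag & sam_flag_include != sam_flag_include:
            return False
    # Exclude filter: none of the listed bits may be present.
    if sam_flag_exclude is not None:
        if flag & sam_flag_exclude != 0:
            return False
    return True


# Typical flags of a properly paired read pair:
# 99  = paired + proper pair + mate reverse + first in pair
# 147 = paired + proper pair + read reverse + second in pair
keep_first_mate = passes_flag_filters(99, sam_flag_include=FIRST_IN_PAIR)
drop_second_mate = passes_flag_filters(147, sam_flag_include=FIRST_IN_PAIR)
keep_forward = passes_flag_filters(99, sam_flag_exclude=REVERSE_STRAND)
drop_reverse = passes_flag_filters(147, sam_flag_exclude=REVERSE_STRAND)
```

Filtering with an include value of 64 keeps only the first mate of each pair, so each fragment contributes once, and an exclude value of 16 keeps only reads mapped to the forward strand.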

Appendix: Effective Genome Sizes

If multimapping reads are included:

Genome	Effective size
GRCh37	2864785220
GRCh38	2913022398
GRCm37	2620345972
GRCm38	2652783500
dm3	162367812
dm6	142573017
GRCz10	1369631918
WBcel235	100286401
TAIR10	119481543

If multimapping reads are not included:

Read length	GRCh37	GRCh38	GRCm37	GRCm38	dm3	dm6	GRCz10	WBcel235
50	2685511504	2701495761	2304947926	2308125349	130428560	125464728	1195445591	95159452
75	2736124973	2747877777	2404646224	2407883318	135004462	127324632	1251132686	96945445
100	2776919808	2805636331	2462481010	2467481108	139647232	129789873	1280189044	98259998
150	2827437033	2862010578	2489384235	2494787188	144307808	129941135	1312207169	98721253
200	2855464000	2887553303	2513019276	2520869189	148524010	132509163	1321355241	98672758
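
As a how-to for the table above, the Python sketch below looks up an effective genome size and assembles a bamCoverage argument list for RPGC normalization. Only two genomes are transcribed, with values copied from the table for uniquely mapping reads. The helper names are hypothetical, picking the tabulated read length closest to the actual one is an assumption of this sketch, and the file names are the placeholders from the Usage line.

```python
# Values copied from the table for the case without multimapping reads.
EFFECTIVE_GENOME_SIZE = {
    "GRCh38": {50: 2701495761, 75: 2747877777, 100: 2805636331,
               150: 2862010578, 200: 2887553303},
    "GRCm38": {50: 2308125349, 75: 2407883318, 100: 2467481108,
               150: 2494787188, 200: 2520869189},
}


def effective_genome_size(genome, read_length):
    """Return the tabulated size for the closest tabulated read length."""
    sizes = EFFECTIVE_GENOME_SIZE[genome]
    closest = min(sizes, key=lambda tabulated: abs(tabulated - read_length))
    return sizes[closest]


def rpgc_arguments(bam, out, genome, read_length):
    """Build a bamCoverage argument list for 1x (RPGC) normalization."""
    size = effective_genome_size(genome, read_length)
    return ["bamCoverage", "-b", bam, "-o", out,
            "--normalizeUsing", "RPGC",
            "--effectiveGenomeSize", str(size)]


example_args = rpgc_arguments("reads.bam", "coverage.bw", "GRCh38", 101)
command_line = " ".join(example_args)
```

The resulting list can be passed to a process launcher or joined into a command line; for 101 bp reads on GRCh38 the sketch falls back to the 100 bp column.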

File formats this tool works with

BAM
