Category
BigWig Manipulation
Usage
multiBigwigSummary bins -b file1.bw file2.bw -out results.npz [options]
Manual
multiBigwigSummary is a tool from the deepTools suite. This document is generated based on deepTools version 3.5.1.
Given a set of bigWig files, this tool calculates the average score in equally sized bins (10 kilobases by default, adjustable with --binSize/-bs) that consecutively cover the entire genome. The only exception is the last bin of each chromosome, which is often smaller. The output of this mode is commonly used to assess the overall similarity of different bigWig files. If you want to calculate mean scores for a set of regions instead, consider using multiBigwigSummary with BED inputs (the BED-file mode).
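The binning scheme described above can be sketched in Python as follows. This is a minimal illustration, not deepTools' implementation: tile_chromosome and bin_means are hypothetical helper names, and bin_means assumes a dense per-base signal (one value for every base), whereas real bigWig files store intervals and may leave bases without data.

```python
def tile_chromosome(chrom, chrom_len, bin_size=10000):
    """Cover [0, chrom_len) with consecutive bins; the last one may be shorter."""
    return [(chrom, start, min(start + bin_size, chrom_len))
            for start in range(0, chrom_len, bin_size)]


def bin_means(values, bin_size=10000):
    """Average a dense per-base signal (one float per base) over consecutive bins.

    ASSUMPTION (hypothetical helper): every base has a value; gaps in real
    bigWig data are not modelled here.
    """
    means = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]  # the final chunk may be shorter
        means.append(sum(chunk) / len(chunk))
    return means


# Example: a 25 kb chromosome yields two full 10 kb bins and one 5 kb bin.
print(tile_chromosome("chr19", 25000))
# [('chr19', 0, 10000), ('chr19', 10000, 20000), ('chr19', 20000, 25000)]
```

Averaging inside each bin, rather than summing, keeps the truncated last bin comparable with the full-sized ones.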
Required arguments
- --bwfiles FILE1 FILE2 [FILE1 FILE2 ...], -b FILE1 FILE2 [FILE1 FILE2 ...]: List of bigWig files, separated by spaces.
- --outFileName OUTFILENAME, -out OUTFILENAME, -o OUTFILENAME: File name to save the compressed matrix file (npz format) needed by the plotPCA and plotCorrelation tools.
Options
- --labels sample1 sample2 [sample1 sample2 ...], -l sample1 sample2 [sample1 sample2 ...]: User-defined labels instead of default labels from file names. Multiple labels have to be separated by spaces, e.g., --labels sample1 sample2 sample3 (default: None)
- --smartLabels: Instead of manually specifying labels for the input bigWig files, this causes deepTools to use the file name after removing the path and extension. (default: False)
- --chromosomesToSkip chr1 chr2 [chr1 chr2 ...]: List of chromosomes that you do not want to be included. Useful for removing random or extra chromosomes. (default: None)
- --binSize INT, -bs INT: Size (in bases) of the windows sampled from the genome. (Default: 10000)
- --distanceBetweenBins INT, -n INT: By default, multiBigwigSummary considers adjacent bins of the specified --binSize. However, to reduce the computation time, a larger distance between bins can be given. Larger distances result in fewer considered bins. (Default: 0)
- --version: show program's version number and exit
- --region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to; this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
- --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant. (default: None)
- --numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
- --verbose, -v: Set to see processing messages. (default: False)
- --help, -h: show this help message and exit
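The following rough sketch shows how --binSize, --distanceBetweenBins, --region and --chromosomesToSkip narrow the set of bins. The function names are hypothetical, and two behaviours are assumptions rather than documented deepTools behaviour: each bin is taken to start binSize + distanceBetweenBins bases after the previous one, and bins inside a --region are laid out starting from the region's start coordinate.

```python
def parse_region(region):
    """Split 'CHR' or 'CHR:START:END' into (chrom, start, end); start/end are None if absent."""
    parts = region.split(":")
    if len(parts) == 1:
        return parts[0], None, None
    if len(parts) == 3:
        chrom, start, end = parts
        return chrom, int(start), int(end)
    raise ValueError(f"expected CHR or CHR:START:END, got {region!r}")


def plan_bins(chrom_sizes, bin_size=10000, distance_between_bins=0,
              region=None, chromosomes_to_skip=()):
    """Return the (chrom, start, end) bins left after applying the options.

    ASSUMPTIONS (hypothetical, not taken from deepTools): consecutive bins start
    bin_size + distance_between_bins bases apart, and bins inside a region are
    laid out from the region's start coordinate.
    """
    if region is not None:
        chrom, r_start, r_end = parse_region(region)
        lo = 0 if r_start is None else r_start
        hi = chrom_sizes[chrom] if r_end is None else min(r_end, chrom_sizes[chrom])
        limits = {chrom: (lo, hi)}
    else:
        limits = {chrom: (0, size) for chrom, size in chrom_sizes.items()}

    step = bin_size + distance_between_bins  # distance 0 gives adjacent bins
    bins = []
    for chrom, (lo, hi) in limits.items():
        if chrom in chromosomes_to_skip:
            continue
        for start in range(lo, hi, step):
            # The last bin is truncated at the chromosome (or region) end.
            bins.append((chrom, start, min(start + bin_size, hi)))
    return bins


# Example: skip an unplaced contig and leave a 10 kb gap between 10 kb bins.
sizes = {"chr1": 50000, "chr2": 30000, "chrUn_x": 5000}
print(plan_bins(sizes, bin_size=10000, distance_between_bins=10000,
                chromosomes_to_skip=("chrUn_x",)))
```

Under these assumptions, a distance equal to the bin size halves the number of bins, which is the computation-time saving the --distanceBetweenBins description refers to.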
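Output optional options
- --outRawCounts OUTRAWCOUNTS: Save the average score per bin for each bigWig file to a single tab-delimited file.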
deepBlue options
Options used only for remote bedgraph/wig files hosted on deepBlue
- --deepBlueURL DEEPBLUEURL: For remote bedGraph/wiggle files hosted on deepBlue, this specifies the server URL. The default is "http://deepblue.mpi-inf.mpg.de/xmlrpc", which should not be changed without good reason. (default: http://deepblue.mpi-inf.mpg.de/xmlrpc)
- --userKey USERKEY: For remote bedGraph/wiggle files hosted on deepBlue, this specifies the user key to use for access. The default is "anonymous_key", which suffices for public datasets. If you need access to a restricted-access or private dataset, request a key from deepBlue and specify it here. (default: anonymous_key)
- --deepBlueTempDir DEEPBLUETEMPDIR: If specified, temporary files from preloading datasets from deepBlue will be written here (note: this directory must exist). If not specified, temporary files are written wherever your system normally writes them. (default: None)
- --deepBlueKeepTemp: If specified, temporary bigWig files from preloading deepBlue datasets are not deleted. A message will be printed noting where these files are and what sample they correspond to. These can then be used if you wish to analyse the same sample with the same regions again. (default: False)
Example
In the following example, the average values for our test ENCODE ChIP-Seq datasets are computed for consecutive genome bins (default size: 10kb) by using the bins mode.
$ multiBigwigSummary bins \
-b testFiles/H3K4Me1.bigWig testFiles/H3K4Me3.bigWig testFiles/H3K27Me3.bigWig testFiles/Input.bigWig \
--labels H3K4me1 H3K4me3 H3K27me3 input \
-out scores_per_bin.npz --outRawCounts scores_per_bin.tab
$ head scores_per_bin.tab
#'chr' 'start' 'end' 'H3K4me1' 'H3K4me3' 'H3K27me3' 'input'
19 0 10000 0.0 0.0 0.0 0.0
19 10000 20000 0.0 0.0 0.0 0.0
19 20000 30000 0.0 0.0 0.0 0.0
19 30000 40000 0.0 0.0 0.0 0.0
19 40000 50000 0.0 0.0 0.0 0.0
19 50000 60000 0.0221538461538 0.0 0.00482142857143 0.0522717391304
19 60000 70000 4.27391282051 1.625 0.634116071429 1.29124347826
19 70000 80000 13.0891675214 24.65 1.8180625 2.80073695652
19 80000 90000 1.74591965812 0.29 4.35576785714 0.92987826087
The output npz file, scores_per_bin.npz, can be used by tools like plotPCA and plotCorrelation to generate PCA and correlation plots. The optional output (--outRawCounts), scores_per_bin.tab, is a simple tab-delimited file that can be used with any other program. The first three columns define the genomic region (chromosome, start, end) for which the scores were averaged; each remaining column holds the average score of one bigWig file.
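Because the --outRawCounts file is plain tab-delimited text, it can be processed without deepTools. The sketch below (hypothetical helper names read_raw_counts and pearson) reads the header and rows in the format shown in the example and computes a Pearson correlation between two samples, skipping bins where either value is NaN. It is only an illustration and not how plotCorrelation is implemented; it assumes fields are separated by single tab characters and that header labels are quoted as in the example.

```python
import math


def read_raw_counts(lines):
    """Parse --outRawCounts text (an iterable of lines, e.g. an open file).

    Returns (labels, rows): labels are the sample names with the '#' and
    quotes removed; rows are (chrom, start, end, [score, ...]) tuples.
    ASSUMPTION: fields are separated by single tab characters.
    """
    labels, rows = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Header like: #'chr' 'start' 'end' 'H3K4me1' ...
            labels = [f.lstrip("#").strip("'") for f in fields[3:]]
            continue
        scores = [float(x) for x in fields[3:]]  # float() also accepts 'nan'
        rows.append((fields[0], int(fields[1]), int(fields[2]), scores))
    return labels, rows


def pearson(xs, ys):
    """Pearson correlation of two score columns, ignoring pairs containing NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)  # undefined if either column is constant


# Usage with the file produced in the example above:
# with open("scores_per_bin.tab") as fh:
#     labels, rows = read_raw_counts(fh)
# columns = {lab: [r[3][i] for r in rows] for i, lab in enumerate(labels)}
# print(pearson(columns["H3K4me1"], columns["H3K4me3"]))
```

Note that bins where every sample is 0.0, like the first rows in the example, still count towards this correlation; filter them out first if they should not.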