Category

BigWig Manipulation


Usage

multiBigwigSummary bins -b file1.bw file2.bw -out results.npz [options]


Manual

multiBigwigSummary is a tool from the deepTools suite. This document is based on deepTools version 3.5.1.

Given a set of bigWig files, this tool calculates the average score in equally sized bins (10 kilobases by default; adjustable with -bs) that consecutively cover the entire genome. The only exception is the last bin of a chromosome, which is often smaller. The output of this mode is commonly used to assess the overall similarity of different bigWig files. If you want to calculate mean scores for a set of regions, consider using multiBigwigSummary with BED inputs.
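
The binning scheme described above can be sketched in a few lines of Python. This is an illustration only, not the deepTools implementation; the function name tile_chromosome and the chromosome length are hypothetical.

```python
def tile_chromosome(chrom_length, bin_size=10000):
    """Return (start, end) pairs of consecutive bins covering a chromosome.

    Coordinates are 0-based and half-open, like the start/end columns of
    the --outRawCounts table in the Example section.
    """
    bins = []
    for start in range(0, chrom_length, bin_size):
        # The last bin is cut off at the chromosome end, so it is
        # usually shorter than bin_size.
        end = min(start + bin_size, chrom_length)
        bins.append((start, end))
    return bins


# Hypothetical chromosome of 25,300 bases with the default 10 kb bin size.
example_bins = tile_chromosome(25300)
print(example_bins)  # [(0, 10000), (10000, 20000), (20000, 25300)]
```

Only the final bin (20000 to 25300) is shorter than the requested bin size.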

Required arguments

  • --bwfiles FILE1 FILE2 [FILE1 FILE2 ...], -b FILE1 FILE2 [FILE1 FILE2 ...]: List of bigWig files, separated by spaces.
  • --outFileName OUTFILENAME, -out OUTFILENAME, -o OUTFILENAME: File name to save the compressed matrix file (npz format) needed by the plotPCA and plotCorrelation tools.

Options

  • --labels sample1 sample2 [sample1 sample2 ...], -l sample1 sample2 [sample1 sample2 ...]: User defined labels instead of default labels from file names. Multiple labels have to be separated by spaces, e.g., --labels sample1 sample2 sample3 (default: None)
  • --smartLabels: Instead of manually specifying labels for the input bigWig files, this causes deepTools to use the file name after removing the path and extension. (default: False)
  • --chromosomesToSkip chr1 chr2 [chr1 chr2 ...]: List of chromosomes that you do not want to be included. Useful to remove random or extra chr. (default: None)
  • --binSize INT, -bs INT: Size (in bases) of the windows sampled from the genome. (Default: 10000)
  • --distanceBetweenBins INT, -n INT: By default, multiBigwigSummary considers adjacent bins of the specified --binSize. However, to reduce the computation time, a larger distance between bins can be given. Larger distances result in fewer bins being considered. (Default: 0)
  • --version: show program's version number and exit
  • --region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to. This is useful for reducing the computing time when testing parameters. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
  • --blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant. (default: None)
  • --numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
  • --verbose, -v: Set to see processing messages. (default: False)
  • --help, -h: show this help message and exit
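
To give a feel for how --distanceBetweenBins reduces the number of bins, here is a rough sketch. It assumes that each bin starts --binSize plus --distanceBetweenBins bases after the previous one; that spacing rule is an assumption made for illustration and is not stated in this manual. The function name and the chromosome length are hypothetical.

```python
def sampled_bin_starts(chrom_length, bin_size=10000, distance_between_bins=0):
    """Return the start positions of the bins that would be sampled."""
    # Assumption: consecutive bin starts are separated by the bin size
    # plus the requested gap between bins.
    step = bin_size + distance_between_bins
    return list(range(0, chrom_length, step))


chrom_length = 1_000_000  # hypothetical chromosome length

adjacent = sampled_bin_starts(chrom_length)
spaced = sampled_bin_starts(chrom_length, distance_between_bins=40000)
print(len(adjacent), len(spaced))  # 100 20
```

Under this assumption, a 40 kb gap between 10 kb bins cuts the number of bins, and with it the work per bigWig file, to one fifth.
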

Output optional options

  • --outRawCounts FILE: Save average scores per region for each bigWig file to a single tab-delimited file. (default: None)

deepBlue options

Options used only for remote bedgraph/wig files hosted on deepBlue

  • --deepBlueURL DEEPBLUEURL: For remote bedgraph/wiggle files hosted on deepBlue, this specifies the server URL. The default is "http://deepblue.mpi-inf.mpg.de/xmlrpc", which should not be changed without good reason. (default: http://deepblue.mpi-inf.mpg.de/xmlrpc)
  • --userKey USERKEY: For remote bedgraph/wiggle files hosted on deepBlue, this specifies the user key to use for access. The default is "anonymous_key", which suffices for public datasets. If you need access to a restricted or private dataset, request a key from deepBlue and specify it here. (default: anonymous_key)
  • --deepBlueTempDir DEEPBLUETEMPDIR: If specified, temporary files from preloading datasets from deepBlue will be written here (note that this directory must exist). If not specified, temporary files are written wherever your system normally writes them. (default: None)
  • --deepBlueKeepTemp: If specified, temporary bigWig files from preloading deepBlue datasets are not deleted. A message will be printed noting where these files are and what sample they correspond to. These can then be used if you wish to analyse the same sample with the same regions again. (default: False)

Example

In the following example, the average values for our test ENCODE ChIP-Seq datasets are computed for consecutive genome bins (default size: 10kb) by using the bins mode.

$ multiBigwigSummary bins \
 -b testFiles/H3K4Me1.bigWig testFiles/H3K4Me3.bigWig testFiles/H3K27Me3.bigWig testFiles/Input.bigWig \
 --labels H3K4me1 H3K4me3 H3K27me3 input \
 -out scores_per_bin.npz --outRawCounts scores_per_bin.tab
$ head scores_per_bin.tab
    #'chr'  'start' 'end'   'H3K4me1'       'H3K4me3'       'H3K27me3'      'input'
    19      0       10000   0.0     0.0     0.0     0.0
    19      10000   20000   0.0     0.0     0.0     0.0
    19      20000   30000   0.0     0.0     0.0     0.0
    19      30000   40000   0.0     0.0     0.0     0.0
    19      40000   50000   0.0     0.0     0.0     0.0
    19      50000   60000   0.0221538461538 0.0     0.00482142857143        0.0522717391304
    19      60000   70000   4.27391282051   1.625   0.634116071429  1.29124347826
    19      70000   80000   13.0891675214   24.65   1.8180625       2.80073695652
    19      80000   90000   1.74591965812   0.29    4.35576785714   0.92987826087

The output npz file, scores_per_bin.npz, can be used by tools like plotPCA and plotCorrelation to generate PCA and correlation plots. The optional output (--outRawCounts), scores_per_bin.tab, is a simple tab-delimited file that can be used with any other program. The first three columns define the region of the genome for which the scores were summarized; each remaining column holds the average score of one bigWig file.
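
As an illustration of using the tab-delimited output in another program, the following minimal sketch parses an --outRawCounts table with the Python standard library. The rows are copied from the example output above; the function name read_raw_counts is hypothetical and not part of deepTools.

```python
import csv
import io

# A few rows of scores_per_bin.tab from the example above (tab-separated).
RAW_COUNTS = (
    "#'chr'\t'start'\t'end'\t'H3K4me1'\t'H3K4me3'\t'H3K27me3'\t'input'\n"
    "19\t50000\t60000\t0.0221538461538\t0.0\t0.00482142857143\t0.0522717391304\n"
    "19\t60000\t70000\t4.27391282051\t1.625\t0.634116071429\t1.29124347826\n"
    "19\t70000\t80000\t13.0891675214\t24.65\t1.8180625\t2.80073695652\n"
)


def read_raw_counts(handle):
    """Parse an --outRawCounts table into (labels, regions, scores)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Header fields look like #'chr' or 'H3K4me1': drop '#' and the quotes.
    names = [field.lstrip("#").strip("'") for field in header]
    labels = names[3:]  # the first three columns are chr, start, end
    regions, scores = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        regions.append((row[0], int(row[1]), int(row[2])))
        scores.append([float(value) for value in row[3:]])
    return labels, regions, scores


labels, regions, scores = read_raw_counts(io.StringIO(RAW_COUNTS))
print(labels)      # ['H3K4me1', 'H3K4me3', 'H3K27me3', 'input']
print(regions[0])  # ('19', 50000, 60000)
```

To read a real file, pass an open file handle instead of the in-memory string, for example open("scores_per_bin.tab", newline="").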

File formats this tool works with
bigWig
