multiBigwigSummary manual with usage examples

Usage

multiBigwigSummary BED-file -b file1.bw file2.bw -out results.npz --BED selection.bed

Manual

multiBigwigSummary is a tool from the deepTools suite. This document was generated from deepTools version 3.5.1.

Given a set of bigWig files, this tool calculates the average score of each file in every region defined in the BED files. To calculate mean scores across the entire genome instead, use multiBigwigSummary bins. A common use of the BED-file mode is to compare scores (e.g. ChIP-seq scores) between samples over a set of pre-defined peak regions.
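To make "average score per region" concrete, the following Python sketch computes a coverage-weighted mean over one region from a toy list of (start, end, value) intervals, similar to how a bigWig file stores signal. It is an illustration only: the function name region_mean and the toy intervals are hypothetical, and deepTools reads the bigWig files directly, so its handling of uncovered bases may differ.

```python
def region_mean(intervals, start, end):
    """Mean signal over the half-open region [start, end).

    `intervals` is a list of (start, end, value) tuples. Bases not covered
    by any interval are ignored; None is returned if nothing is covered.
    """
    total = 0.0
    covered = 0
    for iv_start, iv_end, value in intervals:
        # Number of bases this interval shares with the region.
        overlap = min(end, iv_end) - max(start, iv_start)
        if overlap > 0:
            total += value * overlap
            covered += overlap
    return total / covered if covered else None


# Toy signal (assumption, not taken from any real file).
signal = [(0, 100, 2.0), (100, 200, 4.0), (300, 400, 10.0)]
print(region_mean(signal, 50, 150))   # 50 bases at 2.0 and 50 bases at 4.0 -> 3.0
print(region_mean(signal, 200, 300))  # no covered bases -> None
```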

Required arguments

--bwfiles FILE1 FILE2 [FILE1 FILE2 ...], -b FILE1 FILE2 [FILE1 FILE2 ...]: List of bigWig files, separated by spaces.
--outFileName OUTFILENAME, -out OUTFILENAME, -o OUTFILENAME: File name to save the compressed matrix file (npz format) needed by the plotPCA and plotCorrelation tools.
--BED file1.bed file2.bed [file1.bed file2.bed ...]: Limits the analysis to the regions specified in these files.
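When the tool is called from a pipeline, the required arguments can be assembled programmatically. The sketch below builds the argument list in Python using only flags listed in this manual. The helper name build_command and the file names are hypothetical, and the check for one label per bigWig file is a sanity check added by this sketch. The command is only printed, not executed.

```python
import shlex


def build_command(bigwigs, bed_files, out_npz, labels=None, raw_counts=None):
    """Return the argv list for a multiBigwigSummary BED-file run."""
    if labels is not None and len(labels) != len(bigwigs):
        raise ValueError("need exactly one label per bigWig file")
    argv = ["multiBigwigSummary", "BED-file",
            "--bwfiles", *bigwigs,
            "--BED", *bed_files,
            "--outFileName", out_npz]
    if labels:
        argv += ["--labels", *labels]
    if raw_counts:
        argv += ["--outRawCounts", raw_counts]
    return argv


cmd = build_command(["file1.bw", "file2.bw"], ["selection.bed"], "results.npz",
                    labels=["sample1", "sample2"])
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True), which needs deepTools on PATH.
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.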

Options

--labels sample1 sample2 [sample1 sample2 ...], -l sample1 sample2 [sample1 sample2 ...]: User defined labels instead of default labels from file names. Multiple labels have to be separated by spaces, e.g., --labels sample1 sample2 sample3 (default: None)
--smartLabels: Instead of manually specifying labels for the input bigWig files, this causes deepTools to use the file name after removing the path and extension. (default: False)
--chromosomesToSkip chr1 chr2 [chr1 chr2 ...]: List of chromosomes that you do not want to be included. Useful to remove random or extra chr. (default: None)
--region CHR:START:END, -r CHR:START:END: Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
--blackListFileName BED file [BED file ...], -bl BED file [BED file ...]: A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant. (default: None)
--numberOfProcessors INT, -p INT: Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (Default: 1)
--help, -h: show this help message and exit
--verbose, -v: Set to see processing messages.
--version: show program's version number and exit
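The --region format described above (a chromosome name alone, or chr:start:end) can be validated before a long run is started. The following Python sketch is a hypothetical helper, not the parser deepTools uses, and deepTools may accept forms that this sketch rejects.

```python
def parse_region(region):
    """Split 'CHR' or 'CHR:START:END' into (chrom, start, end).

    start and end are None when only a chromosome name is given.
    """
    parts = region.split(":")
    if len(parts) == 1:
        return parts[0], None, None
    if len(parts) != 3:
        raise ValueError(f"expected CHR or CHR:START:END, got {region!r}")
    chrom, start, end = parts[0], int(parts[1]), int(parts[2])
    if start >= end:
        raise ValueError("START must be smaller than END")
    return chrom, start, end


print(parse_region("chr10"))
print(parse_region("chr10:456700:891000"))
```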

Optional output options

--outRawCounts FILE: Save average scores per region for each bigWig file to a single tab-delimited file. (default: None)

GTF/BED12 options

--metagene: When either a BED12 or GTF file is used to provide regions, perform the computation on the merged exons, rather than using the genomic interval defined by the 5-prime and 3-prime most transcript bound (i.e., columns 2 and 3 of a BED file). If a BED3 or BED6 file is used as input, then columns 2 and 3 are used as an exon. (Default: False)
--transcriptID TRANSCRIPTID: When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as transcripts. (Default: transcript)
--exonID EXONID: When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as exons. CDS would be another common value for this. (Default: exon)
--transcript_id_designator TRANSCRIPT_ID_DESIGNATOR: Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files this is instead stored in the last column as a key:value pair (e.g., as 'transcript_id "ACTB"', for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key. (Default: transcript_id)
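To see which identifier a given --transcript_id_designator key would select, it can help to look up that key in the last column of a GTF line. The Python sketch below does this with a regular expression. The helper name gtf_attribute and the attribute string are toy assumptions for illustration, and this is not how deepTools itself parses GTF files.

```python
import re


def gtf_attribute(attributes, key="transcript_id"):
    """Return the value stored under `key` in a GTF attribute column, or None."""
    # The key must start the column or follow a ';' so that a key such as
    # "id" does not match inside "transcript_id".
    pattern = r'(?:^|;)\s*' + re.escape(key) + r'\s+"([^"]*)"'
    match = re.search(pattern, attributes)
    return match.group(1) if match else None


# Toy attribute column (assumption, not from a real annotation).
attrs = 'gene_id "G1"; transcript_id "ACTB"; gene_name "actin";'
print(gtf_attribute(attrs))             # default key transcript_id
print(gtf_attribute(attrs, "gene_id"))  # a different identifier
```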

deepBlue options

Options used only for remote bedgraph/wig files hosted on deepBlue

--deepBlueURL DEEPBLUEURL: For remote bedgraph/wiggle files hosted on deepBlue, this specifies the server URL. The default is "http://deepblue.mpi-inf.mpg.de/xmlrpc", which should not be changed without good reason. (default: http://deepblue.mpi-inf.mpg.de/xmlrpc)
--userKey USERKEY: For remote bedgraph/wiggle files hosted on deepBlue, this specifies the user key to use for access. The default is "anonymous_key", which suffices for public datasets. If you need access to a restricted-access or private dataset, then request a key from deepBlue and specify it here. (default: anonymous_key)
--deepBlueTempDir DEEPBLUETEMPDIR: If specified, temporary files from preloading datasets from deepBlue will be written here (note that this directory must exist). If not specified, temporary files are written wherever they would normally be written on your system. (default: None)
--deepBlueKeepTemp: If specified, temporary bigWig files from preloading deepBlue datasets are not deleted. A message will be printed noting where these files are and what sample they correspond to. These can then be used if you wish to analyse the same sample with the same regions again. (default: False)

Example

In the following example, the average values for our test ENCODE ChIP-Seq datasets are computed for a set of genes.

$ multiBigwigSummary BED-file \
 --bwfiles testFiles/*bigWig \
 --BED testFiles/genes.bed \
 --labels H3K27me3 H3K4me1 H3K4me3 H3K9me3 input \
 -out scores_per_transcript.npz --outRawCounts scores_per_transcript.tab

$ head scores_per_transcript.tab
 #'chr'     'start' 'end'   'H3K27me3'      'H3K4me1'       'H3K4me3'       'H3K9me3'       'input'
19  60104   70951   0.663422099099  4.37103606574   14.9609108509   0.596631607217  1.34274297191
19  60950   70966   0.714223982699  4.54650763906   16.2336261981   0.62173674295   1.41719308888
19  62114   70944   0.747578769617  4.84009060023   18.2951302378   0.648723472352  1.51324474371
19  63820   70951   0.781816722009  5.30500631048   22.5579862572   0.682862029229  1.55490104062
19  65057   66382   0.528301886792  5.45886792453   0.523018867925  0.555471698113  1.97056603774
19  65821   66416   0.411764705882  3.0     0.636974789916  0.168067226891  1.67226890756
19  65821   70945   0.844600775761  4.79176424668   31.1346604215   0.693073728066  1.47911787666
19  66319   66492   0.774566473988  1.59537572254   0.0     0.0     0.578034682081
19  66345   71535   0.877430197151  5.49036608863   43.978805395    0.746026011561  1.43545279383

The output npz file, scores_per_transcript.npz, can be used by tools like plotPCA and plotCorrelation to generate PCA and correlation plots. The optional output (--outRawCounts), scores_per_transcript.tab, is a simple tab-delimited file that can be used with any other program. The first three columns define the genomic region over which the scores were averaged; each remaining column holds the average score of one bigWig file.
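As an example of reading the tab-delimited file in another program, the Python sketch below parses the --outRawCounts layout shown above, including the quoted header that starts with '#'. The function name read_raw_counts is hypothetical, and the embedded sample is an excerpt of the table above (first two regions, first three sample columns), so that the sketch runs without any files.

```python
import csv
import io


def read_raw_counts(handle):
    """Parse --outRawCounts output into (labels, rows).

    rows is a list of (chrom, start, end, scores) tuples, where scores
    holds one float per bigWig file, in the order given by labels.
    """
    reader = csv.reader(handle, delimiter="\t")
    # Header fields look like #'chr', 'start', ...: drop '#' and the quotes.
    header = [name.strip().lstrip("#").strip("'") for name in next(reader)]
    labels = header[3:]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        scores = [float(value) for value in fields[3:]]
        rows.append((chrom, start, end, scores))
    return labels, rows


sample = (
    "#'chr'\t'start'\t'end'\t'H3K27me3'\t'H3K4me1'\t'H3K4me3'\n"
    "19\t60104\t70951\t0.663422099099\t4.37103606574\t14.9609108509\n"
    "19\t60950\t70966\t0.714223982699\t4.54650763906\t16.2336261981\n"
)
labels, rows = read_raw_counts(io.StringIO(sample))
print(labels)
print(rows[0])
```

For a real file, pass an open file object instead of the io.StringIO wrapper, for example open("scores_per_transcript.tab", newline="").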
