computeMatrix manual with usage examples

Usage

computeMatrix scale-regions -S <bigWig file> -R <bed file> -b 1000
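The usage line above can also be assembled from a script. The following is a minimal sketch, not part of deepTools: the helper name build_scale_regions_cmd, the file names (signal.bw, genes.bed, matrix.gz) and the numeric values are all hypothetical and only illustrate how the options described in the manual fit together. It builds and prints the command without running it. Note that on the command line long options are typed with two ASCII hyphens (for example --binSize).

```python
import shlex

def build_scale_regions_cmd(bigwig, bed, out, upstream=1000, downstream=1000,
                            body_length=1000, bin_size=10):
    """Return a computeMatrix scale-regions command as an argument list.

    The numeric values are illustrative choices for this sketch.
    """
    return [
        "computeMatrix", "scale-regions",
        "-S", bigwig,                # --scoreFileName
        "-R", bed,                   # --regionsFileName
        "-b", str(upstream),         # --beforeRegionStartLength
        "-a", str(downstream),       # --afterRegionStartLength
        "-m", str(body_length),      # --regionBodyLength
        "--binSize", str(bin_size),
        "-out", out,                 # --outFileName
    ]

# Hypothetical file names, used only for illustration.
cmd = build_scale_regions_cmd("signal.bw", "genes.bed", "matrix.gz")
print(shlex.join(cmd))
```

If deepTools is installed, the same list could be handed to subprocess.run(cmd, check=True); passing a list avoids shell quoting problems with unusual file names.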

Manual

computeMatrix is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.

–regionsFileName, -R   File name, in BED format, containing the regions to plot. If multiple bed files are given, each one is considered a group that can be plotted separately. Also, adding a “#” symbol in the bed file causes all the regions until the previous “#” to be considered one group.
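To make the “#” grouping rule concrete, here is a rough sketch of how a BED file could be split into groups. It is not the deepTools parser: the function split_bed_groups is hypothetical, using the text after “#” as the group label is an assumption, and header lines such as track or browser are not handled.

```python
def split_bed_groups(lines):
    """Split BED lines into groups; a line starting with '#' closes the current group."""
    groups, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if current:
                # Assumption: the text after '#' is used as the group label.
                label = line.lstrip("#").strip() or None
                groups.append((label, current))
                current = []
            continue
        chrom, start, end = line.split("\t")[:3]
        current.append((chrom, int(start), int(end)))
    if current:
        # Regions after the last '#' form a final, unlabeled group.
        groups.append((None, current))
    return groups

# Made-up regions for illustration.
bed_lines = [
    "chr1\t100\t200",
    "chr1\t500\t900",
    "#group A",
    "chr2\t50\t150",
    "#group B",
]
groups = split_bed_groups(bed_lines)
for label, regions in groups:
    print(label, len(regions))
```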
–scoreFileName, -S   bigWig file(s) containing the scores to be plotted. BigWig files can be obtained by using the bamCoverage or bamCompare tools. More information about the bigWig file format can be found at http://genome.ucsc.edu/goldenPath/help/bigWig.html
–outFileName, -out   File name to save the gzipped matrix file needed by the “plotHeatmap” and “plotProfile” tools.
–outFileNameMatrix   If this option is given, then the matrix of values underlying the heatmap will be saved using the indicated name, e.g. IndividualValues.tab. This matrix can easily be loaded into R or other programs.
–outFileSortedRegions   File name in which the regions are saved after skipping zeros or min/max threshold values. The order of the regions in the file follows the sorting order selected. This is useful, for example, to generate other heatmaps keeping the sorting of the first heatmap. Example: Heatmap1sortedRegions.bed
–version   Show the program’s version number and exit.
–regionBodyLength, -m   Distance in bases to which all regions will be fit.
–startLabel   Label shown in the plot for the start of the region. Default is TSS (transcription start site), but could be changed to anything, e.g. “peak start”. Note that this is only useful if you plan to plot the results yourself and not, for example, with plotHeatmap, which will override this.
–endLabel   Label shown in the plot for the region end. Default is TES (transcription end site). See the –startLabel option for more information.
–beforeRegionStartLength, -b, –upstream   Distance upstream of the start site of the regions defined in the region file. If the regions are genes, this would be the distance upstream of the transcription start site.
–afterRegionStartLength, -a, –downstream   Distance downstream of the end site of the given regions. If the regions are genes, this would be the distance downstream of the transcription end site.
–unscaled5prime   Number of bases at the 5-prime end of the region to exclude from scaling. By default, each region is scaled to a given length (see the –regionBodyLength option). In some cases it is useful to look at unscaled signals around region boundaries, so this setting specifies the number of unscaled bases on the 5-prime end of each boundary.
–unscaled3prime   Like –unscaled5prime, but for the 3-prime end.
–binSize, -bs   Length, in bases, of the non-overlapping bins used for averaging the score over the region length.
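The length options and the bin size together determine how wide the output matrix is. The following arithmetic sketch assumes scale-regions mode, where each sample's row is laid out as upstream flank, unscaled 5-prime part, scaled body, unscaled 3-prime part and downstream flank, and it assumes every length is a multiple of the bin size. The function matrix_columns is hypothetical, and the body length of 1000 and bin size of 10 are values chosen for this example.

```python
def matrix_columns(upstream, body, downstream, bin_size, unscaled5=0, unscaled3=0):
    """Number of bins (matrix columns) each sample contributes per region."""
    lengths = [upstream, unscaled5, body, unscaled3, downstream]
    for length in lengths:
        # Assumption of this sketch: lengths must divide evenly into bins.
        if length % bin_size != 0:
            raise ValueError(f"{length} is not a multiple of bin size {bin_size}")
    return sum(lengths) // bin_size

# Corresponds to "-b 1000" with an assumed body length of 1000 and bin size of 10.
columns = matrix_columns(upstream=1000, body=1000, downstream=0, bin_size=10)
print(columns)  # 200
```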
–sortRegions   Possible choices: descend, ascend, no, keep
Whether the output file should present the regions sorted. The default is to not sort the regions. Note that this is only useful if you plan to plot the results yourself and not, for example, with plotHeatmap, which will override this. Note also that unsorted output will be in whatever order the regions happen to be processed in and not match the order in the input files. If you require the output order to match that of the input regions, then either specify “keep” or use computeMatrixOperations to resort the results file.
–sortUsing   Possible choices: mean, median, max, min, sum, region_length
–sortUsingSamples   List of sample numbers (order as in matrix), that are used for sorting by –sortUsing, no value uses all samples, example: –sortUsingSamples 1 3
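The three sorting options interact as follows in this simplified sketch. It is an illustration of the idea, not the deepTools code: the function sort_regions and the row layout (a name plus one flat list holding all samples' bins) are assumptions, region_length is left out because the toy rows carry no coordinates, and both “no” and “keep” simply return the input order here.

```python
from statistics import mean, median

STATS = {"mean": mean, "median": median, "max": max, "min": min, "sum": sum}

def sort_regions(rows, cols_per_sample, order="descend", using="mean", samples=None):
    """Sort (name, values) rows; values concatenates the bins of all samples."""
    if order in ("no", "keep"):
        return list(rows)  # simplification: input order is kept in both cases
    stat = STATS[using]

    def key(row):
        _, values = row
        n_samples = len(values) // cols_per_sample
        chosen = samples or range(1, n_samples + 1)  # 1-based sample numbers
        picked = []
        for s in chosen:
            picked.extend(values[(s - 1) * cols_per_sample : s * cols_per_sample])
        return stat(picked)

    return sorted(rows, key=key, reverse=(order == "descend"))

# Two samples with two bins each; the values are made up.
rows = [("a", [1, 1, 9, 9]), ("b", [5, 5, 0, 0]), ("c", [3, 3, 3, 3])]
by_all = [name for name, _ in sort_regions(rows, 2)]
by_first = [name for name, _ in sort_regions(rows, 2, samples=[1])]
print(by_all, by_first)
```

Restricting the sort to sample 1 changes the order because region “a” is weak in the first sample but strong in the second.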
–averageTypeBins   Possible choices: mean, median, min, max, std, sum
Define the type of statistic that should be used over the bin size range. The options are: “mean”, “median”, “min”, “max”, “sum” and “std”. The default is “mean”.
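As an illustration of what the bin statistic means, the sketch below reduces per-base scores to one value per non-overlapping bin. It is a simplified model rather than the deepTools implementation: the function summarize_bins is hypothetical, missing values are not modeled, and using the population standard deviation for “std” is an assumption.

```python
from statistics import mean, median, pstdev

BIN_STATS = {
    "mean": mean,
    "median": median,
    "min": min,
    "max": max,
    "sum": sum,
    "std": pstdev,  # assumption: population standard deviation
}

def summarize_bins(scores, bin_size, average_type="mean"):
    """Reduce per-base scores to one value per non-overlapping bin."""
    stat = BIN_STATS[average_type]
    return [stat(scores[i:i + bin_size]) for i in range(0, len(scores), bin_size)]

# Eight made-up per-base scores, summarized in bins of four bases.
scores = [0, 2, 4, 6, 1, 1, 1, 1]
bin_means = summarize_bins(scores, 4)
bin_maxima = summarize_bins(scores, 4, "max")
print(bin_means, bin_maxima)
```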
–missingDataAsZero   If set, missing data (NAs) will be treated as zeros. The default is to ignore such cases, which will be depicted as black areas in a heatmap. See the –missingDataColor argument of the plotHeatmap command for additional options.
–skipZeros   Whether regions with only scores of zero should be included or not. Default is to include them.
–minThreshold   Numeric value. Any region containing a value that is less than or equal to this will be skipped. This is useful to skip, for example, genes where the read count is zero for any of the bins. This could be the result of unmappable areas and can bias the overall results.
–maxThreshold   Numeric value. Any region containing a value greater than or equal to this will be skipped. The maxThreshold is useful to skip those few regions with very high read counts (e.g. microsatellites) that may bias the average values.
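The three filters described above (–skipZeros, –minThreshold and –maxThreshold) can be summarized in a few lines. This is a sketch that follows the wording of the manual, with a hypothetical function keep_region and made-up values; missing data is not modeled.

```python
def keep_region(values, skip_zeros=False, min_threshold=None, max_threshold=None):
    """Return True if a region's bin values pass all active filters."""
    if skip_zeros and all(v == 0 for v in values):
        return False  # region contains only zeros
    if min_threshold is not None and any(v <= min_threshold for v in values):
        return False  # at least one bin is at or below the minimum
    if max_threshold is not None and any(v >= max_threshold for v in values):
        return False  # at least one bin is at or above the maximum
    return True

# Made-up bin values for three regions.
regions = {"r1": [0, 0, 0], "r2": [1, 2, 3], "r3": [1, 50, 2]}
kept = [name for name, values in regions.items()
        if keep_region(values, skip_zeros=True, max_threshold=50)]
print(kept)  # ['r2']
```

Note that a single bin reaching a threshold is enough to drop the whole region, which is why “r3” is removed even though most of its bins are low.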
–blackListFileName, -bl   A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered.
–quiet, -q   Set to remove any warning or processing messages.
–scale   If set, all values are multiplied by this number.
–numberOfProcessors, -p   Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.
–metagene   When either a BED12 or GTF file is used to provide regions, perform the computation on the merged exons, rather than using the genomic interval defined by the 5-prime and 3-prime most transcript bound (i.e., columns 2 and 3 of a BED file). If a BED3 or BED6 file is used as input, then columns 2 and 3 are used as an exon.
–exonID   When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as exons. CDS would be another common value for this.
–deepBlueURL   For remote bedgraph/wiggle files hosted on deepBlue, this specifies the server URL. The default is “http://deepblue.mpi-inf.mpg.de/xmlrpc”, which should not be changed without good reason.
–userKey   For remote bedgraph/wiggle files hosted on deepBlue, this specifies the user key to use for access. The default is “anonymous_key”, which suffices for public datasets. If you need access to a restricted access/private dataset, then request a key from deepBlue and specify it here.
–referencePoint   Possible choices: TSS, TES, center
The reference point for the plotting could be either the region start (TSS), the region end (TES) or the center of the region. Note that regardless of what you specify, plotHeatmap/plotProfile will default to using “TSS” as the label.
–nanAfterEnd   If set, any values after the region end are discarded. This is useful to visualize the region end when not using the scale-regions mode and when the reference-point is set to the TSS.
–transcriptID   When a GTF file is used to provide regions, only entries with this value as their feature (column 3) will be processed as transcripts.
–transcript_id_designator   Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files this is instead stored in the last column as a key:value pair (e.g., as ‘transcript_id “ACTB”’, for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key.
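The key:value lookup in the last GTF column can be pictured with the following sketch. It is not how deepTools parses GTF files: the function region_id_from_gtf is hypothetical, the attribute string is made up, and the regular expression only covers the common key "value"; layout.

```python
import re

def region_id_from_gtf(attributes, key="transcript_id"):
    """Return the value stored under `key` in a GTF attribute column, or None."""
    pattern = r'(?:^|;)\s*' + re.escape(key) + r'\s+"([^"]*)"'
    match = re.search(pattern, attributes)
    return match.group(1) if match else None

# Made-up attribute column for illustration.
attrs = 'gene_id "gene_0001"; transcript_id "ACTB"; gene_name "example";'
default_id = region_id_from_gtf(attrs)
gene_id = region_id_from_gtf(attrs, key="gene_id")
print(default_id, gene_id)
```

Passing a different key mirrors what setting –transcript_id_designator to that key is described as doing: the region is labeled with the value stored under that key instead.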
–deepBlueTempDir   If specified, temporary files from preloading datasets from deepBlue will be written here (note, this directory must exist). If not specified, the system's default location for temporary files is used.
–deepBlueKeepTemp   If specified, temporary bigWig files from preloading deepBlue datasets are not deleted. A message will be printed noting where these files are and what sample they correspond to. These can then be used if you wish to analyse the same sample with the same regions again.