Reference Code backup Executable files
This tool prepares an intermediate file (a gzipped table of values) that contains scores associated with genomic regions. The regions can either be scaled to the same size (using the scale-regions mode) or you can choose the start, end, or center of each region as the focus point for the score calculations.
computeMatrix scale-regions -S <biwig file> -R <bed file> -b 1000
computeMatrix
is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.
–regionsFileName, -R File name, in BED format, containing the regions to plot. If multiple bed files are given, each one is considered a group that can be plotted separately. Also, adding a “#” symbol in the bed file causes all the regions until the previous “#” to be considered one group.
–scoreFileName, -S bigWig file(s) containing the scores to be plotted. BigWig files can be obtained by using the bamCoverage or bamCompare tools. More information about the bigWig file format can be found at http://genome.ucsc.edu/goldenPath/help/bigWig.html
–outFileName, -out File name to save the gzipped matrix file needed by the “plotHeatmap” and “plotProfile” tools.
–outFileNameMatrix If this option is given, then the matrix of values underlying the heatmap will be saved using the indicated name, e.g. IndividualValues.tab.This matrix can easily be loaded into R or other programs.
–outFileSortedRegions File name in which the regions are saved after skiping zeros or min/max threshold values. The order of the regions in the file follows the sorting order selected. This is useful, for example, to generate other heatmaps keeping the sorting of the first heatmap. Example: Heatmap1sortedRegions.bed
–version show program’s version number and exit
–regionBodyLength, -m Distance in bases to which all regions will be fit.
–startLabel Label shown in the plot for the start of the region. Default is TSS (transcription start site), but could be changed to anything, e.g. “peak start”. Note that this is only useful if you plan to plot the results yourself and not, for example, with plotHeatmap, which will override this.
–endLabel Label shown in the plot for the region end. Default is TES (transcription end site). See the –startLabel option for more information.
–beforeRegionStartLength, -b, –upstream Distance upstream of the start site of the regions defined in the region file. If the regions are genes, this would be the distance upstream of the transcription start site.
–afterRegionStartLength, -a, –downstream Distance downstream of the end site of the given regions. If the regions are genes, this would be the distance downstream of the transcription end site.
–unscaled5prime Number of bases at the 5-prime end of the region to exclude from scaling. By default, each region is scaled to a given length (see the –regionBodyLength option). In some cases it is useful to look at unscaled signals around region boundaries, so this setting specifies the number of unscaled bases on the 5-prime end of each boundary.
–unscaled3prime Like –unscaled3prime, but for the 3-prime end.
–binSize, -bs Length, in bases, of the non-overlapping bins for averaging the score over the regions length.
–sortRegions Possible choices: descend, ascend, no, keep
Whether the output file should present the regions sorted. The default is to not sort the regions. Note that this is only useful if you plan to plot the results yourself and not, for example, with plotHeatmap, which will override this. Note also that unsorted output will be in whatever order the regions happen to be processed in and not match the order in the input files. If you require the output order to match that of the input regions, then either specify “keep” or use computeMatrixOperations to resort the results file.
–sortUsing Possible choices: mean, median, max, min, sum, region_length
–sortUsingSamples List of sample numbers (order as in matrix), that are used for sorting by –sortUsing, no value uses all samples, example: –sortUsingSamples 1 3
–averageTypeBins Possible choices: mean, median, min, max, std, sum
Define the type of statistic that should be used over the bin size range. The options are: “mean”, “median”, “min”, “max”, “sum” and “std”. The default is “mean”.
–missingDataAsZero If set, missing data (NAs) will be treated as zeros. The default is to ignore such cases, which will be depicted as black areas in a heatmap. (see the –missingDataColor argument of the plotHeatmap command for additional options).
–skipZeros Whether regions with only scores of zero should be included or not. Default is to include them.
–minThreshold Numeric value. Any region containing a value that is less than or equal to this will be skipped. This is useful to skip, for example, genes where the read count is zero for any of the bins. This could be the result of unmappable areas and can bias the overall results.
–maxThreshold Numeric value. Any region containing a value greater than or equal to this will be skipped. The maxThreshold is useful to skip those few regions with very high read counts (e.g. micro satellites) that may bias the average values.
–blackListFileName, -bl A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered.
–quiet, -q Set to remove any warning or processing messages.
–scale If set, all values are multiplied by this number.
–numberOfProcessors, -p Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.
–metagene When either a BED12 or GTF file are used to provide regions, perform the computation on the merged exons, rather than using the genomic interval defined by the 5-prime and 3-prime most transcript bound (i.e., columns 2 and 3 of a BED file). If a BED3 or BED6 file is used as input, then columns 2 and 3 are used as an exon.
–exonID When a GTF file is used to provide regions, only entries with this value as their feature (column 2) will be processed as exons. CDS would be another common value for this.
–deepBlueURL For remote files bedgraph/wiggle files hosted on deepBlue, this specifies the server URL. The default is “http://deepblue.mpi-inf.mpg.de/xmlrpc”, which should not be changed without good reason.
–userKey For remote files bedgraph/wiggle files hosted on deepBlue, this specifies the user key to use for access. The default is “anonymous_key”, which suffices for public datasets. If you need access to a restricted access/private dataset, then request a key from deepBlue and specify it here.
–referencePoint Possible choices: TSS, TES, center
The reference point for the plotting could be either the region start (TSS), the region end (TES) or the center of the region. Note that regardless of what you specify, plotHeatmap/plotProfile will default to using “TSS” as the label.
–nanAfterEnd If set, any values after the region end are discarded. This is useful to visualize the region end when not using the scale-regions mode and when the reference-point is set to the TSS.
–transcriptID When a GTF file is used to provide regions, only entries with this value as their feature (column 2) will be processed as transcripts.
–transcript_id_designator Each region has an ID (e.g., ACTB) assigned to it, which for BED files is either column 4 (if it exists) or the interval bounds. For GTF files this is instead stored in the last column as a key:value pair (e.g., as ‘transcript_id “ACTB”’, for a key of transcript_id and a value of ACTB). In some cases it can be convenient to use a different identifier. To do so, set this to the desired key.
–deepBlueTempDir If specified, temporary files from preloading datasets from deepBlue will be written here (note, this directory must exist). If not specified, where ever temporary files would normally be written on your system is used.
–deepBlueKeepTemp If specified, temporary bigWig files from preloading deepBlue datasets are not deleted. A message will be printed noting where these files are and what sample they correspond to. These can then be used if you wish to analyse the same sample with the same regions again.