Category

Plot


Usage

hifive hic-normalize <SUBCOMMAND> [-h] [-m MINDIST] [-x MAXDIST] [-c CHROMS] [-o OUTPUT] [-q] [normalization options] project
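
As a rough illustration of the usage line above, the following Python sketch assembles an argument list in the same order. The helper name `build_normalize_command`, the file names, and the subcommand name `express` are assumptions made for this example (the subcommand is inferred from the option names in the manual below); only the flags shown in the usage line are used.

```python
import shlex


def build_normalize_command(subcommand, project, min_dist=None, max_dist=None,
                            chroms=None, output=None, quiet=False):
    """Assemble an argv list that follows the usage line (hypothetical helper)."""
    argv = ["hifive", "hic-normalize", subcommand]
    if min_dist is not None:
        argv += ["-m", str(min_dist)]
    if max_dist is not None:
        argv += ["-x", str(max_dist)]
    if chroms:
        # -c takes a comma-separated list of chromosome names.
        argv += ["-c", ",".join(chroms)]
    if output is not None:
        argv += ["-o", output]
    if quiet:
        argv.append("-q")
    # The project file is the final positional argument.
    argv.append(project)
    return argv


# Hypothetical file names and subcommand; adjust to your own project.
cmd = build_normalize_command("express", "sample.hcp", min_dist=10000,
                              chroms=["1", "2"], output="sample_norm.hcp",
                              quiet=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once HiFive is installed; printing it first makes it easy to check the command before running it.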


Manual

-h, --help    Display the help message and command/subcommand options and arguments and exit.
-q, --quiet    Suppress all messages generated during HiFive processing.
-F, --fend FILE    A tabular file in a format compatible with HiCPipe containing fragment and fend indices, fragment length, start or end position, and any additional fragment features desired (see Loading HiC Fends for more information).
-B, --bed FILE    A BED file containing either restriction enzyme fragment coordinates or restriction enzyme cutsite coordinates. Fragment features may be included in columns after the strand column. Features should be formatted with one feature per column and two values per feature separated by a comma. If the coordinates are of RE fragment boundaries, the feature values should correspond to the upstream end of the fragment followed by the downstream end of the fragment. If the coordinates are of RE cutsites, the values should correspond to the sequence just upstream of the cutsite followed by the sequence just downstream of the cutsite. If additional features are included, the BED file must have a header line identifying the features.
-L, --length FILE    A tab-separated text file containing chromosome names and lengths. Must be used in conjunction with a positive value of ‘binned’.
--binned int    Indicates what size bins to break genome into. If None is passed, fend-level resolution is kept.
-r, --re str    The name of the restriction enzyme.
-g, --genome str    The name of the genome.
-S, --bam FILES    A pair of BAM filenames separated by spaces corresponding to the two independently-mapped ends of a set of reads. Multiple file pairs may be passed by calling this argument more than once. This option is mutually exclusive with -R/--raw and -M/--mat.
-R, --raw FILE    A tabular file containing pairs of mapped read positions (see Loading HiC Data for more information).
-M, --mat FILE    A tabular file containing pairs of fend indices and their corresponding numbers of reads (see Loading HiC Data for more information).
-X, --matrix FILE    A tab-separated binned matrix containing summed fend interactions.
-i, --insert int    The maximum allowable insert size, as measured by the sum of both read end mapping positions to the nearest RE cutsite in the direction of alignment.
--skip-duplicate-filtering    Skip filtering of PCR duplicates (only applicable to raw and bam files).
-f, --min-interactions int    The minimum number of interactions with valid fends to keep a fend in the analysis. [20]
-m, --min-distance int    The minimum distance between fend midpoints to include in calculating numbers of interactions for fend filtering and (if called by hic-normalize or hic-complete) the minimum interaction distance included in learning correction parameter values. [0]
-x, --max-distance int    The maximum distance between fend midpoints to include in calculating numbers of interactions for fend filtering and (if called by hic-normalize or hic-complete) the maximum interaction distance included in learning correction parameter values. A value of zero indicates no maximum distance cutoff. [0]
-j, --min-binsize int    The cutoff size limit for the smallest distance bin used for estimating the distance dependence (see HiC Distance Dependence Estimation for more information). [1000]
-n, --num-bins int    The number of bins to partition the interaction size ranges into for estimating the distance dependence function (see HiC Distance Dependence Estimation for more information). A value of zero indicates that finding the distance dependence function should be skipped.
-c, --chromosomes str    A comma-separated list of chromosome names to include fends from when calculating correction parameter values. [all chromosomes]
-o, --output FILE    An optional filename to save the updated HiFive project to, leaving the original unchanged. [None]
-o, --output FILES    A set of three filenames separated by spaces to save the newly-created HiFive fend, dataset, and project files to. Mutually exclusive with -P/--prefix.
-P, --prefix str    A prefix for the output filenames. The file extensions .fends, .hcd, and .hcp will be used for the fragment, dataset, and project files, respectively. This option is mutually exclusive with -o/--output.
-b, --max-iterations int    The maximum number of iterations to run the learning process for. [1000]
-g, --min-change dec    The minimum allowable absolute gradient size to continue the learning process. [0.0005]
-p, --precalculate    Prior to beginning learning, set initial guesses for each correction value to be learned to the fragment’s mean difference between its log-counts and predicted distance-dependence signal.
-l, --learning-step dec    The scaling factor for decreasing learning rate by if step doesn’t meet Armijo criterion. [0.5]
-a, --probability-model    Which probability model to use for normalization (binomial or poisson).
-e, --express-iterations int    The number of iterations to run the learning process for. [1000]
-d, --remove-distance    Calculate and divide out the predicted distance-dependence signal from each count prior to learning correction parameters.
-w, --express-reads str    Which set of reads to use for learning correction parameter values, cis, trans, or all. [cis]
-g, --min-change    The minimum mean change in fend correction parameter values needed to keep running past ‘iterations’ number of iterations. If using the Knight-Ruiz algorithm this is the residual cutoff.
-f, --min-interactions int    The minimum number of interactions for fend filtering if refiltering is required due to distance cutoff parameters or selected reads to be used. [20]
-k, --binary bool    Use binary indicator instead of counts.
-z, --knight-ruiz bool    Use the Knight-Ruiz matrix balancing algorithm instead of weighted matrix balancing. This option ignores ‘iterations’.
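
The interplay between the iteration count and the minimum change described in the express options above can be pictured with a small matrix-balancing sketch. This is a generic damped iterative balancing loop written for illustration, not HiFive's implementation and not the Knight-Ruiz algorithm; the function name `balance`, the `hard_cap` safeguard, the example matrix, and the chosen `min_change` value are all assumptions.

```python
def balance(counts, iterations=1000, min_change=1e-4, hard_cap=100000):
    """Scale a symmetric count matrix so all corrected row sums become equal.

    Runs at least `iterations` rounds, then keeps going only while the mean
    change in correction values is at least `min_change`. `hard_cap` is a
    hypothetical safeguard against endless loops.
    """
    n = len(counts)
    corrections = [1.0] * n
    for step in range(1, hard_cap + 1):
        # Row sums after dividing each count by both correction values.
        sums = [sum(counts[i][j] / (corrections[i] * corrections[j])
                    for j in range(n)) for i in range(n)]
        mean_sum = sum(sums) / n
        # Damped update: rows with too much signal get a larger correction.
        updated = [c * (s / mean_sum) ** 0.5 for c, s in zip(corrections, sums)]
        change = sum(abs(u - c) for u, c in zip(updated, corrections)) / n
        corrections = updated
        if step >= iterations and change < min_change:
            break
    return corrections


# Toy symmetric matrix of interaction counts between three fends.
counts = [[10.0, 4.0, 2.0],
          [4.0, 8.0, 6.0],
          [2.0, 6.0, 20.0]]
corrections = balance(counts, iterations=10, min_change=1e-9)
corrected_rows = [sum(counts[i][j] / (corrections[i] * corrections[j])
                      for j in range(3)) for i in range(3)]
print(corrections)
print(corrected_rows)
```

After the loop finishes, the corrected row sums should agree closely, which is the property matrix balancing aims for.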
-r, --binning-iterations int    The maximum number of iterations to run the learning process for. [1000]
-t, --learning-threshold dec    The maximum change in log-likelihood necessary to stop the learning process early. [1.0]
-y, --binning-reads str    Which set of reads to use for learning correction parameter values, cis, trans, or all. [cis]
-v, --model str    A comma-separated list of fend features to calculate corrections for. Acceptable features are len (length), distance, and any features loaded in the BED or FEND file used to create the HiFive fend file. [len,distance]
-s, --model-bins str    A comma-separated list of numbers of bins to partition fend features into for modeling. [20,20]
-u, --parameter-types str    A comma-separated list of model parameter types. Acceptable values are even, fixed, even-const, and fixed-const. Even means that fend features are partitioned such that each bin has approximately even numbers of fends. Fixed means that the range of the feature is divided into fixed-width bins. The -const suffix indicates that the correction values are held at their seed-values and not updated. [even,fixed-const]
--pseudocounts int    The number of pseudo-counts to add to each bin prior to seeding and learning normalization values. [None]
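
The difference between the even and fixed parameter types described above can be illustrated with a short Python sketch. The function names and the toy fragment lengths are made up for this example, and the edge-selection rule is one simple way to get roughly equal bin occupancy, not necessarily the one HiFive uses.

```python
def fixed_edges(values, num_bins):
    """Split the value range into bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + width * i for i in range(num_bins + 1)]


def even_edges(values, num_bins):
    """Pick edges so each bin holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(n * i) // num_bins] for i in range(num_bins)]
    edges.append(ordered[-1])
    return edges


def count_per_bin(values, edges):
    """Count values per bin; the upper edge of the last bin is inclusive."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = len(counts) - 1
        for b in range(len(counts)):
            if v < edges[b + 1]:
                idx = b
                break
        counts[idx] += 1
    return counts


# Toy fragment lengths with one long outlier.
lengths = [100, 120, 150, 200, 260, 400, 900, 5000]
fixed_counts = count_per_bin(lengths, fixed_edges(lengths, 4))
even_counts = count_per_bin(lengths, even_edges(lengths, 4))
print(fixed_counts)  # [7, 0, 0, 1]
print(even_counts)   # [2, 2, 2, 2]
```

With a skewed feature such as fragment length, fixed-width bins leave most bins nearly empty, while even bins keep a comparable number of fends in each.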
-b, --binsize int    The width of bins (in basepairs) to partition data into. A value of zero indicates that each bin is to correspond with a single fend. [10000]
-t, --trans    Calculate and include trans interactions in heatmaps.
-c, --chromosomes str    A comma-separated list of chromosome names to include in the heatmaps. [all chromosomes]
-d, --datatype str    Type of data to produce for the heatmaps. Valid options are raw, fend (only fend corrections applied), distance (only distance-dependence signal removed), enrichment (both fend correction and distance-dependence signal removed), and expected (only predicted signal). [fend]
-F, --format str    The format of the output heatmap. Valid options are hdf5, txt, and npz. [hdf5]
-M, --matrix    Store output as a tab-separated matrix of values.
-y, --dynamically-bin    Dynamically bin heatmap.
-x, --expansion-binsize int    The size of bins, in base pairs, to group data into for expanding under-populated bins. [10000]
-f, --minobservations int    The minimum number of observed reads in a bin for it to be considered valid. [20]
-a, --search-distance int    The furthest distance from the bin midpoint to expand bounds. If set to zero, there is no limit on expansion distance. [0]
-v, --remove-failed    If a non-zero ‘search-distance’ is given, it is possible for a bin not to meet the ‘minobservations’ criterion before the search stops. If this occurs and ‘remove-failed’ is set, the observed and expected values for that bin are zero.
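
How the dynamic binning options above interact can be sketched in one dimension. This is a deliberately simplified illustration, not HiFive's algorithm: it works on a single row of bins rather than a heatmap, measures `search_distance` in bins instead of base pairs, and ignores the separate expansion bin size. The function name and the toy counts are assumptions.

```python
def dynamically_bin(observed, min_observations=20, search_distance=0,
                    remove_failed=False):
    """Expand each under-populated bin outward until it has enough reads."""
    n = len(observed)
    result = []
    for i in range(n):
        total = observed[i]
        radius = 0
        while total < min_observations:
            radius += 1
            # A search distance of zero means no limit on expansion.
            past_limit = search_distance and radius > search_distance
            out_of_data = i - radius < 0 and i + radius >= n
            if past_limit or out_of_data:
                break
            if i - radius >= 0:
                total += observed[i - radius]
            if i + radius < n:
                total += observed[i + radius]
        # A bin that never reached the threshold is zeroed if requested.
        if total < min_observations and remove_failed:
            total = 0
        result.append(total)
    return result


reads = [30, 5, 4, 3, 25]
limited = dynamically_bin(reads, search_distance=1, remove_failed=True)
unlimited = dynamically_bin(reads, search_distance=0)
print(limited)    # [30, 39, 0, 32, 25]
print(unlimited)  # [30, 39, 67, 32, 25]
```

In the limited run the middle bin cannot reach 20 reads within one bin of its position, so it is set to zero; with no limit it keeps expanding until the threshold is met.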
-c, --chromosome str    The chromosome to pull data from.
-b, --binsize int    The width of bins (in basepairs) to partition data into. A value of zero indicates that each bin is to correspond with a single fend.
-s, --start int    The first coordinate of the chromosome to pull data from. None indicates the beginning of the chromosome. [None]
-e, --stop int    The last coordinate + 1 of the chromosome to pull data from. None indicates the end of the chromosome. [None]
-m, --max-distance int    The largest interaction distance to include in the interval file. A value of zero indicates no upper limit. [0]
-d, --datatype str    Type of data to produce for the heatmaps. Valid options are raw, fend (only fend corrections applied), distance (only distance-dependence signal removed), enrichment (both fend correction and distance-dependence signal removed), and expected (only predicted signal). [fend]
-y, --dynamically-bin    Dynamically bin heatmap.
-x, --expansion-binsize int    The size of bins, in base pairs, to group data into for expanding under-populated bins. [10000]
-f, --minobservations int    The minimum number of observed reads in a bin for it to be considered valid. [20]
-a, --search-distance int    The furthest distance from the bin midpoint to expand bounds. If set to zero, there is no limit on expansion distance. [0]
-v, --remove-failed    If a non-zero ‘search-distance’ is given, it is possible for a bin not to meet the ‘minobservations’ criterion before the search stops. If this occurs and ‘remove-failed’ is set, the observed and expected values for that bin are zero.
-i, --image FILE    Generate an image from the region or regions for which heatmap data is being calculated. [None]
-p, --pdf    Format the image as a pdf. [None]
-r, --rotate    Rotate the image 45 degrees so the chromosome axis is horizontal and only plot the triangle above this axis.
-t, --ticks    Add coordinate ticks and labels to heatmap. This option can only be used if a pdf is requested.
-l, --legend    Add a color scale bar corresponding to interaction strength. This option can only be used if a pdf is requested.
-n, --names    Add chromosome names to the plot. This option can only be used if a pdf is requested.
-k, --keyword str    Pass additional plotting options accepted by the plotting module. Arguments should be of the format KEYWORD=VALUE. This option can be passed multiple times. [None]
-t, --trans    Calculate and include trans interactions in heatmaps.
-c, --chromosomes str    A comma-separated list of chromosome names to include in the heatmaps. [all chromosomes]
-f, --minobservations int    The minimum number of observed reads in a bin for it to be considered valid. [20]
-B, --maximum-binsize int    The largest sized bin to use (minimum resolution), in base pairs. [1280000]
-b, --minimum-binsize int    The smallest sized bin to use (maximum resolution), in base pairs. [10000]
-R, --maximum-trans-binsize int    The largest sized bin to use (minimum resolution) for inter-chromosomal interactions, in base pairs. If not specified, this defaults to the value of the -B option. [use -B value]
-r, --minimum-trans-binsize int    The smallest sized bin to use (maximum resolution) for inter-chromosomal interactions, in base pairs. If not specified, this defaults to the value of the -b option. [use -b value]
-m, --mid-binsize    The smallest sized bin to use for binning the entire chromosome, in base pairs. This is used to balance memory usage vs. speed and does not affect the output. [40000]
-d, --datatype str    Type of data to produce for the heatmaps. Valid options are raw, fend (only fend corrections applied), distance (only distance-dependence signal removed), enrichment (both fend correction and distance-dependence signal removed), and expected (only predicted signal). [fend]

