Category

Sam/Bam Manipulation


Usage

java -jar picard.jar EstimateLibraryComplexity I=input.bam O=est_lib_complex_metrics.txt


Manual

INPUT (File)    One or more files to combine and estimate library complexity from. Reads can be mapped or unmapped. Default value: null. This option may be specified 0 or more times.
OUTPUT (File)    Output file to write per-library metrics to. Required.
MIN_IDENTICAL_BASES (Integer)    The minimum number of bases at the start of each read that must be identical for reads to be grouped together for duplicate detection. In effect, roughly total_reads / 4^MIN_IDENTICAL_BASES reads are compared at a time, so lower values produce more accurate results but consume exponentially more memory and CPU. Default value: 5. This option can be set to 'null' to clear the default value.
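The memory/CPU trade-off comes from bucketing: only read pairs whose leading bases match are compared against each other, and comparisons inside a bucket grow quadratically with its size. The minimal Python sketch below illustrates that idea; it is not Picard's implementation. It assumes (hypothetically) that a read pair is just two base strings and that the key is built from the first MIN_IDENTICAL_BASES bases of both reads in the pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a read pair is two base strings (read 1, read 2).
ReadPair = Tuple[str, str]
PrefixKey = Tuple[str, str]


def group_by_prefix(pairs: List[ReadPair], min_identical_bases: int = 5) -> Dict[PrefixKey, List[ReadPair]]:
    """Bucket read pairs so that only pairs sharing a key are compared.

    Assumption: the key is the first `min_identical_bases` bases of BOTH reads.
    """
    groups: Dict[PrefixKey, List[ReadPair]] = defaultdict(list)
    for read1, read2 in pairs:
        key = (read1[:min_identical_bases], read2[:min_identical_bases])
        groups[key].append((read1, read2))
    return dict(groups)


def comparisons_needed(groups: Dict[PrefixKey, List[ReadPair]]) -> int:
    # All-vs-all inside each bucket: n * (n - 1) / 2 comparisons for n pairs.
    return sum(len(g) * (len(g) - 1) // 2 for g in groups.values())


pairs = [
    ("ACGTACGTAA", "TTTTTGGGCC"),
    ("ACGTACGTAT", "TTTTTGGGCA"),  # same 5-base prefixes as the first pair
    ("GGGGGACGTA", "TTTTTGGGCC"),  # different read-1 prefix
]
groups = group_by_prefix(pairs, min_identical_bases=5)
print(len(groups))                 # 2
print(comparisons_needed(groups))  # 1
```

Calling `group_by_prefix(pairs, min_identical_bases=0)` puts all three pairs in a single bucket and raises the count to 3 comparisons, which is the cost of a smaller key on a tiny scale.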
MAX_DIFF_RATE (Double)    The maximum rate of differences between two reads for them to be called identical. Default value: 0.03. This option can be set to 'null' to clear the default value.
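A rate threshold translates into a mismatch budget per comparison. The Python sketch below shows one plausible reading for a single pair of reads; the function name is hypothetical, and the choices to compare only the shorter read's length and to round the budget down are assumptions, not documented Picard behavior.

```python
def are_identical(seq1: str, seq2: str, max_diff_rate: float = 0.03) -> bool:
    """Sketch: call two reads identical if their mismatch rate is within max_diff_rate.

    Assumptions: only the overlapping length (the shorter read) is compared,
    and the allowed number of mismatches is rounded down.
    """
    length = min(len(seq1), len(seq2))
    if length == 0:
        return False
    max_diffs = int(length * max_diff_rate)  # e.g. 50 bases * 0.03 -> 1 mismatch allowed
    diffs = 0
    for a, b in zip(seq1[:length], seq2[:length]):
        if a != b:
            diffs += 1
            if diffs > max_diffs:
                return False  # stop early once the budget is exceeded
    return True


a = "A" * 50
b = "A" * 49 + "C"   # 1 mismatch in 50 bases
c = "A" * 48 + "CC"  # 2 mismatches in 50 bases
print(are_identical(a, b))  # True
print(are_identical(a, c))  # False
```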
MIN_MEAN_QUALITY (Integer)    The minimum mean quality of the bases in a read pair for the pair to be analyzed. Read pairs with a lower average quality are filtered out and not considered in any calculations. Default value: 20. This option can be set to 'null' to clear the default value.
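The quality filter is a simple threshold on an average. The Python sketch below assumes (an assumption, not confirmed by this page) that the mean is taken over all base qualities of both reads together, expressed as integer Phred scores; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def passes_quality(quals1: Sequence[int], quals2: Sequence[int], min_mean_quality: int = 20) -> bool:
    """Return True if the pair's mean base quality is at least min_mean_quality.

    Assumption: the mean is taken over all bases of both reads together.
    """
    all_quals = list(quals1) + list(quals2)
    if not all_quals:
        return False  # nothing to judge; treat as filtered
    return mean(all_quals) >= min_mean_quality


print(passes_quality([30] * 10, [10] * 10))  # True  (mean 20)
print(passes_quality([30] * 10, [5] * 10))   # False (mean 17.5)
```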
MAX_GROUP_RATIO (Integer)    Do not process self-similar groups that are more than this many times larger than the mean expected group size. For example, if the input contains 10 million read pairs and MIN_IDENTICAL_BASES is set to 5, the mean expected group size is approximately 10 reads. Default value: 500. This option can be set to 'null' to clear the default value.
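The example figure of about 10 reads is what you get if the grouping key uses MIN_IDENTICAL_BASES bases from each of the two reads, giving 4^10 ≈ 1.05 million possible keys; a key built from a single read would give about 9,766 pairs per group instead. The Python sketch below assumes the two-read key and shows how the MAX_GROUP_RATIO cutoff could then be applied. The function names are hypothetical.

```python
def expected_group_size(n_read_pairs: int, min_identical_bases: int = 5) -> float:
    # Assumption: the key spans MIN_IDENTICAL_BASES bases of each of the two reads,
    # giving 4 ** (2 * min_identical_bases) possible keys.
    return n_read_pairs / 4 ** (2 * min_identical_bases)


def should_skip(group_size: int, mean_group_size: float, max_group_ratio: int = 500) -> bool:
    # Groups more than max_group_ratio times the mean size are not processed.
    return group_size > max_group_ratio * mean_group_size


mean_size = expected_group_size(10_000_000, 5)
print(round(mean_size, 2))           # 9.54
print(should_skip(4000, mean_size))  # False (limit is about 4768 pairs)
print(should_skip(5000, mean_size))  # True
```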
BARCODE_TAG (String)    Barcode SAM tag (e.g. BC for 10X Genomics). Default value: null.
READ_ONE_BARCODE_TAG (String)    Read one barcode SAM tag (e.g. BX for 10X Genomics). Default value: null.
READ_TWO_BARCODE_TAG (String)    Read two barcode SAM tag (e.g. BX for 10X Genomics). Default value: null.
MAX_READ_LENGTH (Integer)    The maximum number of bases to consider when comparing reads (0 means no maximum). Default value: 0. This option can be set to 'null' to clear the default value.
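Truncating reads before comparison can make reads that differ only in their tails look identical. The small Python sketch below illustrates the "0 means no maximum" rule; the function name is hypothetical, and rejecting negative values is an assumption made for the sketch.

```python
def bases_to_compare(seq: str, max_read_length: int = 0) -> str:
    """Return the part of a read used for comparison."""
    if max_read_length < 0:
        raise ValueError("max_read_length must be >= 0")  # assumption for this sketch
    if max_read_length == 0:
        return seq  # 0 means "no maximum": keep the whole read
    return seq[:max_read_length]


read_a = "ACGTACGT"
read_b = "ACGTTTTT"
print(bases_to_compare(read_a, 4) == bases_to_compare(read_b, 4))  # True
print(bases_to_compare(read_a) == bases_to_compare(read_b))        # False
```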
MIN_GROUP_COUNT (Integer)    Minimum group count. On a per-library basis, the tool counts the number of groups of duplicates that have a particular size and omits from consideration any count that is less than this value. For example, if only one group of duplicates with size 500 is seen, it is omitted from the metric calculations when MIN_GROUP_COUNT is set to 2. Setting this to 2 may help remove technical artifacts, such as adapter dimers, from the library size calculation. Default value: 2. This option can be set to 'null' to clear the default value.
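The filtering described above works on a histogram of group sizes rather than on individual groups. The Python sketch below builds that histogram from a hypothetical list of per-group sizes and drops any size seen fewer than MIN_GROUP_COUNT times; it is an illustration, not Picard's code.

```python
from collections import Counter
from typing import Dict, Iterable


def group_size_histogram(group_sizes: Iterable[int], min_group_count: int = 2) -> Dict[int, int]:
    """Count how many duplicate groups have each size, then drop sparse sizes."""
    histogram = Counter(group_sizes)  # size -> number of groups with that size
    return {size: count for size, count in sorted(histogram.items()) if count >= min_group_count}


sizes = [1, 1, 1, 2, 2, 3, 500]  # hypothetical: one group each of size 3 and 500
print(group_size_histogram(sizes))  # {1: 3, 2: 2}
```

With `min_group_count=1` nothing is dropped, so the lone size-500 group (for example, an adapter-dimer artifact) would feed into the library size estimate.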
READ_NAME_REGEX (String)    Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order, and it must match the entire read name. Note that if the default regex is specified, a regex match is not actually done; instead, the read name is split on the colon character. For 5-element names, the 3rd, 4th and 5th elements are assumed to be the tile, x and y values. For 7-element names (CASAVA 1.8), the 5th, 6th and 7th elements are assumed to be the tile, x and y values. Default value: . This option can be set to 'null' to clear the default value.
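Both parsing paths can be sketched in a few lines of Python: the colon-splitting shortcut for 5- and 7-element names, and a custom regex with three capture groups that must match the whole name. Function names and the example read names are hypothetical; returning `None` for unparseable names is an assumption of the sketch.

```python
import re
from typing import Optional, Tuple

Location = Tuple[int, int, int]  # (tile, x, y)


def parse_by_colons(read_name: str) -> Optional[Location]:
    """Mimic the split-on-colon shortcut described for the default regex."""
    fields = read_name.split(":")
    if len(fields) == 5:
        tile, x, y = fields[2], fields[3], fields[4]
    elif len(fields) == 7:  # CASAVA 1.8 style names
        tile, x, y = fields[4], fields[5], fields[6]
    else:
        return None  # assumption: other shapes yield no location
    try:
        return int(tile), int(x), int(y)
    except ValueError:
        return None  # assumption: non-numeric fields yield no location


def parse_by_regex(read_name: str, pattern: str) -> Optional[Location]:
    """Custom regex: three capture groups (tile, x, y) matching the whole name."""
    match = re.fullmatch(pattern, read_name)
    if match is None:
        return None
    tile, x, y = match.groups()
    return int(tile), int(x), int(y)


print(parse_by_colons("HWUSI-EAS100R:6:73:941:1973"))             # (73, 941, 1973)
print(parse_by_colons("EAS139:136:FC706VJ:2:2104:15343:197393"))  # (2104, 15343, 197393)
print(parse_by_regex("HWUSI-EAS100R:6:73:941:1973", r"[^:]+:\d+:(\d+):(\d+):(\d+)"))  # (73, 941, 1973)
```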
OPTICAL_DUPLICATE_PIXEL_DISTANCE (Integer)    The maximum offset between two duplicate clusters for them to be considered optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is more appropriate. For other platforms and models, users should experiment to find what works best. Default value: 100. This option can be set to 'null' to clear the default value.
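Once tile, x and y are known, the pixel-distance test reduces to a bounding-box check. The Python sketch below assumes two duplicate clusters are optical if they share a tile and are within the distance on both axes; it deliberately ignores any other conditions a real implementation may apply, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLocation:
    tile: int
    x: int
    y: int


def is_optical_duplicate(a: ClusterLocation, b: ClusterLocation, pixel_distance: int = 100) -> bool:
    """Two duplicate clusters count as optical if they share a tile and lie
    within pixel_distance of each other on both the x and y axes."""
    return (
        a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )


first = ClusterLocation(tile=2104, x=15343, y=197393)
near = ClusterLocation(tile=2104, x=15400, y=197450)
far = ClusterLocation(tile=2104, x=17000, y=197393)

print(is_optical_duplicate(first, near))                      # True
print(is_optical_duplicate(first, far))                       # False (dx = 1657)
print(is_optical_duplicate(first, far, pixel_distance=2500))  # True
```

The last call shows why the larger value suggested for patterned flowcells catches duplicate pairs that the default of 100 would miss.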

