Category

Sam/Bam Manipulation


Usage

java -jar picard.jar UmiAwareMarkDuplicatesWithMateCigar
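
A minimal invocation might look like the sketch below. It uses only options documented in the Manual section, written in Picard's NAME=value argument style. The file names (input.bam, marked.bam, dup_metrics.txt) and the PICARD_JAR and DRY_RUN variables are placeholders invented for this illustration, and other Picard versions may require arguments that are not listed on this page. By default the script only prints the command, so it can be inspected before anything is run.

```shell
#!/bin/sh
set -eu

# Placeholder path; point PICARD_JAR at your own picard.jar.
PICARD_JAR="${PICARD_JAR:-picard.jar}"

# Build the command as positional parameters so each argument stays one word.
# File names are placeholders; the UMI options repeat the documented defaults.
set -- java -jar "$PICARD_JAR" UmiAwareMarkDuplicatesWithMateCigar \
  INPUT=input.bam \
  OUTPUT=marked.bam \
  METRICS_FILE=dup_metrics.txt \
  UMI_TAG_NAME=RX \
  ASSIGNED_UMI_TAG=MI \
  MAX_EDIT_DISTANCE_TO_JOIN=1

# Show the command; execute it only when DRY_RUN=0 is exported.
echo "$*"
if [ "${DRY_RUN:-1}" = "0" ]; then
  "$@"
fi
```

Run it with DRY_RUN=0 in the environment once the printed command looks right. The input must be coordinate sorted, as noted under INPUT.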


Manual

MAX_EDIT_DISTANCE_TO_JOIN (Integer)    Largest edit distance at which two UMIs are still joined, that is, treated as coming from the same source molecule. UMIs that differ by more than this are considered to come from distinct source molecules. Default value: 1. This option can be set to 'null' to clear the default value.
UMI_TAG_NAME (String)    Tag name to use for the UMI. Default value: RX. This option can be set to 'null' to clear the default value.
ASSIGNED_UMI_TAG (String)    Tag name to use for the assigned UMI. Default value: MI. This option can be set to 'null' to clear the default value.
ALLOW_MISSING_UMIS (Boolean)    FOR TESTING ONLY: allow for missing UMIs if data doesn't have UMIs. This option is intended to be used ONLY for testing the code. Use MarkDuplicatesWithMateCigar if data has no UMIs. Mixed data (where some reads have UMIs and others do not) is not supported. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP (Integer)    This option is obsolete. ReadEnds will always be spilled to disk. Default value: 50000. This option can be set to 'null' to clear the default value.
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP (Integer)    Maximum number of file handles to keep open when spilling read ends to disk. Set this number a little lower than the per-process maximum number of files that may be open. This number can be found by executing the 'ulimit -n' command on a Unix system. Default value: 8000. This option can be set to 'null' to clear the default value.
SORTING_COLLECTION_SIZE_RATIO (Double)    This number, together with the maximum RAM available to the JVM, determines the memory footprint used by some of the sorting collections. If you are running out of memory, try reducing this number. Default value: 0.25. This option can be set to 'null' to clear the default value.
BARCODE_TAG (String)    Barcode SAM tag (e.g. BC for 10X Genomics). Default value: null.
READ_ONE_BARCODE_TAG (String)    Read one barcode SAM tag (e.g. BX for 10X Genomics). Default value: null.
READ_TWO_BARCODE_TAG (String)    Read two barcode SAM tag (e.g. BX for 10X Genomics). Default value: null.
REMOVE_SEQUENCING_DUPLICATES (Boolean)    If true remove 'optical' duplicates and other duplicates that appear to have arisen from the sequencing process instead of the library preparation process, even if REMOVE_DUPLICATES is false. If REMOVE_DUPLICATES is true, all duplicates are removed and this option is ignored. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
TAGGING_POLICY (DuplicateTaggingPolicy)    Determines how duplicate types are recorded in the DT optional attribute. Default value: DontTag. This option can be set to 'null' to clear the default value. Possible values: {DontTag, OpticalOnly, All}
INPUT (String)    One or more input SAM or BAM files to analyze. Must be coordinate sorted. Default value: null. This option may be specified 0 or more times.
OUTPUT (File)    The output file to write marked records to. Required.
METRICS_FILE (File)    File to write duplication metrics to. Required.
REMOVE_DUPLICATES (Boolean)    If true do not write duplicates to the output file instead of writing them with appropriate flags set. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ASSUME_SORTED (Boolean)    If true, assume that the input file is coordinate sorted even if the header says otherwise. Deprecated; use ASSUME_SORT_ORDER=coordinate instead. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} Cannot be used in conjunction with option(s) ASSUME_SORT_ORDER (ASO)
ASSUME_SORT_ORDER (SortOrder)    If not null, assume that the input file has this order even if the header says otherwise. Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate} Cannot be used in conjunction with option(s) ASSUME_SORTED (AS)
DUPLICATE_SCORING_STRATEGY (ScoringStrategy)    The scoring strategy for choosing the non-duplicate among candidates. Default value: SUM_OF_BASE_QUALITIES. This option can be set to 'null' to clear the default value. Possible values: {SUM_OF_BASE_QUALITIES, TOTAL_MAPPED_REFERENCE_LENGTH, RANDOM}
PROGRAM_RECORD_ID (String)    The program record ID for the @PG record(s) created by this program. Set to null to disable PG record creation. This string may have a suffix appended to avoid collision with other program record IDs. Default value: MarkDuplicates. This option can be set to 'null' to clear the default value.
PROGRAM_GROUP_VERSION (String)    Value of VN tag of PG record to be created. If not specified, the version will be detected automatically. Default value: null.
PROGRAM_GROUP_COMMAND_LINE (String)    Value of CL tag of PG record to be created. If not supplied the command line will be detected automatically. Default value: null.
PROGRAM_GROUP_NAME (String)    Value of PN tag of PG record to be created. Default value: UmiAwareMarkDuplicatesWithMateCigar. This option can be set to 'null' to clear the default value.
COMMENT (String)    Comment(s) to include in the output file's header. Default value: null. This option may be specified 0 or more times.
READ_NAME_REGEX (String)    Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done; instead, the read name is split on the colon character. For 5-element names, the 3rd, 4th and 5th elements are assumed to be the tile, x and y values. For 7-element names (CASAVA 1.8), the 5th, 6th and 7th elements are assumed to be the tile, x and y values. Default value: . This option can be set to 'null' to clear the default value.
OPTICAL_DUPLICATE_PIXEL_DISTANCE (Integer)    The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is more appropriate. For other platforms and models, users should experiment to find what works best. Default value: 100. This option can be set to 'null' to clear the default value.
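
As an illustration of how a few of the tuning options above could be combined, the sketch below derives MAX_FILE_HANDLES_FOR_READ_ENDS_MAP from 'ulimit -n' and sets OPTICAL_DUPLICATE_PIXEL_DISTANCE to the value suggested for patterned flowcells. The 10% safety margin, the file names, and the PICARD_JAR and DRY_RUN variables are assumptions made for this example, not recommendations from the tool's authors. The command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

PICARD_JAR="${PICARD_JAR:-picard.jar}"  # placeholder path

# Derive a file-handle cap a little below the per-process limit.
# The 10% margin is an assumption; the manual only says "a little lower".
limit=$(ulimit -n)
case "$limit" in
  ''|*[!0-9]*) handles=8000 ;;                 # e.g. "unlimited": keep the documented default
  *)           handles=$((limit * 9 / 10)) ;;
esac
if [ "$handles" -lt 1 ]; then handles=1; fi

# Placeholder file names; 2500 is the value the manual suggests for
# patterned flowcells, and OpticalOnly records optical duplicates in the DT tag.
set -- java -jar "$PICARD_JAR" UmiAwareMarkDuplicatesWithMateCigar \
  INPUT=input.bam \
  OUTPUT=marked.bam \
  METRICS_FILE=dup_metrics.txt \
  MAX_FILE_HANDLES_FOR_READ_ENDS_MAP="$handles" \
  OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 \
  TAGGING_POLICY=OpticalOnly

# Show the command; execute it only when DRY_RUN=0 is exported.
echo "$*"
if [ "${DRY_RUN:-1}" = "0" ]; then
  "$@"
fi
```

For unpatterned flowcells, leaving OPTICAL_DUPLICATE_PIXEL_DISTANCE at its default of 100 is the documented choice.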

