Category

Sam/Bam Manipulation


Usage

java -jar picard.jar IlluminaBasecallsToFastq READ_STRUCTURE=25T8B25T BASECALLS_DIR=basecallDirectory LANE=001 OUTPUT_PREFIX=noBarcode.1 RUN_BARCODE=run15 FLOWCELL_BARCODE=abcdeACXX


Manual

BASECALLS_DIR (File)    The basecalls directory. Required.
BARCODES_DIR (File)    The barcodes directory with _barcode.txt files (generated by ExtractIlluminaBarcodes). If not set, use BASECALLS_DIR. Default value: null.
LANE (Integer)    Lane number. Required.
OUTPUT_PREFIX (File)    The prefix for output FASTQs. Extensions as described above are appended. Use this option for a non-barcoded run, or for a barcoded run in which it is not desired to demultiplex reads into separate files by barcode. Required. Cannot be used in conjunction with option(s) MULTIPLEX_PARAMS.
RUN_BARCODE (String)    The barcode of the run. Prefixed to read names. Required.
MACHINE_NAME (String)    The name of the machine on which the run was sequenced; required if emitting Casava1.8-style read name headers. Default value: null.
FLOWCELL_BARCODE (String)    The barcode of the flowcell that was sequenced; required if emitting Casava1.8-style read name headers. Default value: null.
READ_STRUCTURE (String)    A description of the logical structure of clusters in an Illumina run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for sample barcode, M for molecular barcode, T for template, and S for skip). E.g. if the input data consists of 80-base clusters and we provide a read structure of "28T8M8B8S28T", then the sequence may be split up into four reads:
    * read one with 28 cycles (bases) of template
    * read two with 8 cycles (bases) of molecular barcode (e.g. unique molecular barcode)
    * read three with 8 cycles (bases) of sample barcode
    * 8 cycles (bases) skipped
    * read four with 28 cycles (bases) of template
    The skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein. Required.
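The read structure notation above can be checked by hand before launching a run. The following is a minimal Python sketch, not part of Picard: the function name `parse_read_structure` and the `SEGMENT_TYPES` table are hypothetical helpers introduced here for illustration, and the sketch only splits the string into segments and sums the cycles.

```python
import re

# Cycle-type letters as listed in the READ_STRUCTURE description.
SEGMENT_TYPES = {
    "T": "template",
    "B": "sample barcode",
    "M": "molecular barcode",
    "S": "skip",
}

def parse_read_structure(read_structure):
    """Split a read structure such as '28T8M8B8S28T' into (cycles, type) pairs.

    Hypothetical helper for illustration; Picard does its own parsing.
    """
    if not re.fullmatch(r"(\d+[TBMS])+", read_structure):
        raise ValueError(f"malformed read structure: {read_structure!r}")
    return [(int(n), t) for n, t in re.findall(r"(\d+)([TBMS])", read_structure)]

segments = parse_read_structure("28T8M8B8S28T")
total_cycles = sum(n for n, _ in segments)

for n, t in segments:
    print(f"{n:>3} cycles: {SEGMENT_TYPES[t]}")
print("total cycles:", total_cycles)
```

Summing the segment lengths is a quick way to confirm that a read structure accounts for every cycle in the run, as in the 80-base example.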
MULTIPLEX_PARAMS (File)    Tab-separated file for creating all output FASTQs demultiplexed by barcode for a lane with a single IlluminaBasecallsToFastq invocation. The columns are OUTPUT_PREFIX, and BARCODE_1, BARCODE_2 ... BARCODE_X where X = number of barcodes per cluster (optional). A row with BARCODE_1 set to 'N' is used to specify an output prefix for reads with no barcode match. Required. Cannot be used in conjunction with option(s) OUTPUT_PREFIX (O).
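As an illustration of the MULTIPLEX_PARAMS layout, here is a small Python sketch that builds such a tab-separated table for a single-barcode run. The sample prefixes and barcode sequences are made up, `build_multiplex_params` is a hypothetical helper, and writing the column names as a header row is an assumption; check it against the Picard version in use.

```python
import csv
import io

# Hypothetical samples: output prefix -> sample barcode (one barcode per cluster).
samples = {"sampleA": "ACGTACGT", "sampleB": "TGCATGCA"}

def build_multiplex_params(samples, unmatched_prefix):
    """Return the text of a tab-separated MULTIPLEX_PARAMS table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumption: column names are written as a header row.
    writer.writerow(["OUTPUT_PREFIX", "BARCODE_1"])
    for prefix, barcode in samples.items():
        writer.writerow([prefix, barcode])
    # A row with BARCODE_1 set to 'N' names the output prefix for unmatched reads.
    writer.writerow([unmatched_prefix, "N"])
    return buf.getvalue()

params_text = build_multiplex_params(samples, "unmatched")
print(params_text, end="")
```

The resulting text would be saved to a file and passed as MULTIPLEX_PARAMS in place of OUTPUT_PREFIX, since the two options cannot be combined.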
ADAPTERS_TO_CHECK (IlluminaAdapterPair)    Which adapters to look for in the read. Default value: [INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM]. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END}. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
NUM_PROCESSORS (Integer)    The number of threads to run in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS
FIRST_TILE (Integer)    If set, this is the first tile to be processed (used for debugging). Note that tiles are not processed in numerical order. Default value: null.
TILE_LIMIT (Integer)    If set, process no more than this many tiles (used for debugging). Default value: null.
APPLY_EAMSS_FILTER (Boolean)    Apply EAMSS filtering to identify inappropriately quality scored bases towards the ends of reads and convert their quality scores to Q2. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
FORCE_GC (Boolean)    If true, call System.gc() periodically. This is useful in cases in which the -Xmx value passed is larger than the available memory. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_READS_IN_RAM_PER_TILE (Integer)    Configure SortingCollections to store this many records before spilling to disk. For an indexed run, each SortingCollection gets this value/number of indices. Default value: 1200000. This option can be set to 'null' to clear the default value.
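The per-collection share described for an indexed run is simple arithmetic. A rough Python sketch follows; `records_per_collection` is a hypothetical helper, the 96-index run is an invented example, and the use of integer division is an assumption rather than something stated in the option text.

```python
# Default stated in the MAX_READS_IN_RAM_PER_TILE option text.
DEFAULT_MAX_READS_IN_RAM_PER_TILE = 1_200_000

def records_per_collection(max_reads_in_ram_per_tile, num_indices):
    """Records each SortingCollection may hold before spilling to disk.

    Assumption: the value is split evenly using integer division.
    """
    if num_indices < 1:
        raise ValueError("num_indices must be at least 1")
    return max_reads_in_ram_per_tile // num_indices

# Hypothetical indexed run with 96 sample barcodes.
per_collection = records_per_collection(DEFAULT_MAX_READS_IN_RAM_PER_TILE, 96)
print(per_collection)
```

With many indices the per-collection buffer becomes small, which is worth keeping in mind when tuning this value against the available heap.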
MINIMUM_QUALITY (Integer)    The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice lower values have been observed. Default value: 2. This option can be set to 'null' to clear the default value.
INCLUDE_NON_PF_READS (Boolean)    Whether to include non-PF reads. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
IGNORE_UNEXPECTED_BARCODES (Boolean)    Whether to ignore reads whose barcodes are not found in MULTIPLEX_PARAMS. Useful when outputting FASTQs for only a subset of the barcodes in a lane. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
READ_NAME_FORMAT (ReadNameFormat)    The read name header formatting to emit. Casava1.8 formatting has additional information beyond Illumina, including: the passing-filter flag value for the read, the flowcell name, and the sequencer name. Default value: CASAVA_1_8. This option can be set to 'null' to clear the default value. Possible values: {CASAVA_1_8, ILLUMINA}
COMPRESS_OUTPUTS (Boolean)    Compress output FASTQ files using gzip and append a .gz extension to the file names. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

