Category

Mapping


Usage

maq map [-n nmis] [-a maxins] [-c] [-1 len1] [-2 len2] [-d adap3] [-m mutrate] [-u unmapped] [-e maxerr] [-M c|g] [-N] [-H allhits] [-C maxhits] out.aln.map in.ref.bfa in.read1.bfq [in.read2.bfq] 2> out.map.log


Manual

  • -n INT    Maximum number of mismatches that can always be found [2]
  • -a INT    Maximum outer distance for a correct read pair [250]
  • -A INT    Maximum outer distance of two RF paired reads (0 to disable) [0]
  • -c    Map reads in the colour space (for SOLiD only)
  • -1 INT    Read length for the first read, 0 for auto [0]
  • -2 INT    Read length for the second read, 0 for auto [0]
  • -m FLOAT    Mutation rate between the reference sequences and the reads [0.001]
  • -d FILE    Specify a file containing a single line of the 3’-adapter sequence [null]
  • -u FILE    Dump unmapped reads and reads containing more than nmis mismatches to a separate file [null]
  • -e INT    Threshold on the sum of mismatching base qualities [70]
  • -H FILE    Dump multiple/all 0- and 1-mismatch hits to FILE [null]
  • -C INT    Maximum number of hits to output. Unlimited if larger than 512. [250]
  • -M c|g    Methylation alignment mode. All C (or G) on the forward strand will be changed to T (or A). This option is for testing only.
  • -N    Store the mismatch positions in the output file out.aln.map. When this option is in use, the maximum allowed read length is 55bp.
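
The usage line and options above can be sketched as a small Python helper that assembles the argument list for a run. This is only an illustration: the function name build_maq_map_argv and its keyword names are hypothetical, it covers just four of the options (-n, -a, -e, -u), and it does not run maq itself.

```python
from typing import List, Optional


def build_maq_map_argv(
    out_map: str,
    ref_bfa: str,
    read1_bfq: str,
    read2_bfq: Optional[str] = None,
    nmis: Optional[int] = None,
    maxins: Optional[int] = None,
    maxerr: Optional[int] = None,
    unmapped: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `maq map` (hypothetical helper)."""
    argv = ["maq", "map"]
    # Emit a flag only when the caller overrides the default from the manual.
    for flag, value in (("-n", nmis), ("-a", maxins), ("-e", maxerr), ("-u", unmapped)):
        if value is not None:
            argv += [flag, str(value)]
    # Positional arguments follow the order of the usage line.
    argv += [out_map, ref_bfa, read1_bfq]
    if read2_bfq is not None:
        argv.append(read2_bfq)  # second file only for paired-end runs
    return argv
```

The resulting list could be handed to subprocess.run, with stderr redirected to a log file to mirror the `2> out.map.log` part of the usage line.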

NOTE:

  • Paired-end reads should be prepared in two files, one for each end, with the reads sorted in the same order. This means the k-th read in the first file is mated with the k-th read in the second file. The corresponding read names must be identical up to the trailing ‘/1’ or ‘/2’. For example, this pair of read names is allowed: ‘EAS1_1_5_100_200/1’ and ‘EAS1_1_5_100_200/2’. The trailing ‘/[12]’ is usually generated by the GAPipeline to distinguish the two ends in a pair.
  • The output is a compressed binary file. Its format is affected by the endianness of the machine.
  • The best way to run this command is to provide about 1 to 3 million reads as input. More reads consume more memory.
  • Option -n controls the sensitivity of the alignment. By default, a hit with up to 2 mismatches can always be found. A higher -n finds more hits and also improves the accuracy of mapping qualities, but at the cost of speed.
  • Alignments with many high-quality mismatches should be discarded as false alignments or possible contamination. This behaviour is controlled by option -e. The -e threshold is applied only approximately because base qualities are divided by 10 at a certain stage of the alignment. The -Q option of the assemble command sets the threshold precisely.
  • A pair of reads is said to be correctly paired if and only if the orientation is FR and the outer distance of the pair is no larger than maxins. There is no limit on the minimum insert size. This setting is determined by the paired-end alignment algorithm used in Maq. Requiring a minimum insert size would lead to some wrong alignments with highly overestimated mapping qualities.
  • Currently, read pairs from Illumina/Solexa long-insert libraries have RF read orientation. The maximum insert size for such pairs is set by option -A. However, a long-insert library is also mixed with a small fraction of short-insert read pairs, so -a should be set correctly as well.
  • Sometimes the 5’-end of the 3’-adapter, or even the entire 3’-adapter sequence, may be sequenced. Providing -d makes Maq remove the adapter contamination.
  • Given 2 million reads as input, maq usually takes 800MB memory.
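
Two of the rules in the notes above are easy to check before or after a run: the read-name convention for paired-end input, and the definition of a correctly paired read pair. The Python sketch below illustrates both. The function names are hypothetical, the read names are assumed to have been extracted into lists already, and the pairing test only restates the definition in the notes; it is not Maq's own code.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def strip_mate_suffix(name: str) -> str:
    """Drop a trailing '/1' or '/2' from a read name, if present."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def first_mismatched_pair(
    names1: Iterable[str], names2: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, name1, name2) for the first k-th pair whose names
    disagree, or None if every pair matches. A missing mate (files of
    different length) is reported with None in place of the name."""
    for k, (a, b) in enumerate(zip_longest(names1, names2)):
        if a is None or b is None:
            return (k, a, b)
        if strip_mate_suffix(a) != strip_mate_suffix(b):
            return (k, a, b)
    return None


def is_correct_pair(orientation: str, outer_distance: int, maxins: int = 250) -> bool:
    """Restate the note: FR orientation and outer distance <= maxins.
    There is deliberately no lower bound on the distance."""
    return orientation == "FR" and outer_distance <= maxins
```

For example, first_mismatched_pair(["EAS1_1_5_100_200/1"], ["EAS1_1_5_100_200/2"]) returns None, because the two names are identical once the trailing ‘/1’ and ‘/2’ are removed.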

