Category

Multiple sequence alignments / Conservation


Usage

mafSplit splits.bed outRoot file(s).maf


Manual

This tool is part of UCSC Genome Browser's utilities.

Positional arguments

  • splits.bed: Regions to be split
  • outRoot: Prefix for the output MAF file names (e.g. outRoot123.maf). End it with a slash, as in maf/, to write the files into a folder
  • file(s).maf: One or more input MAF files to be split

Options

  • -byTarget: Make one file per target sequence. (splits.bed input is ignored, but you still need to put a placeholder string).
  • -outDirDepth=N: For use only with -byTarget. Create N levels of output directory under the current directory. This helps prevent NFS problems caused by a large number of files in one directory. Using -outDirDepth=3 would produce ./1/2/3/outRoot123.maf.
  • -useSequenceName: For use only with -byTarget. Instead of auto-incrementing an integer to determine output filename, expect each target sequence name to end with a unique number and use that number as the integer to tack onto outRoot.
  • -useFullSequenceName: For use only with -byTarget. Instead of auto-incrementing an integer to determine output filename, use the target sequence name to tack onto outRoot.
  • -useHashedName=N: For use only with -byTarget. Instead of auto-incrementing an integer or requiring a unique number in the sequence name, use a hash function on the sequence name to compute an N-bit number. This limits the maximum number of file names to 2^N and ensures that the split file names stay consistent even if different subsets of sequences appear in different pairwise MAFs, because the same name always hashes to the same number. This option is useful when a "scaffold-based" assembly has more than one sequence name pattern, e.g. both chroms and scaffolds.
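
Since -byTarget writes one file per target sequence, it can help to count the distinct targets before splitting, for example to judge whether -outDirDepth or -useHashedName is worth using. The sketch below is not part of mafSplit; list_targets is a hypothetical helper, and it assumes the usual MAF convention that the first "s" line of each "a" block is the target.

```shell
# Hypothetical helper, not part of mafSplit.
# Assumption: the target is the first "s" line of each "a" block.
list_targets() {
    awk '
        /^a/ { want = 1; next }               # a new alignment block starts
        /^s/ && want { print $2; want = 0 }   # first s line holds the target name
    ' "$@" | sort -u
}
```

For example, list_targets danRer11.mm10.synNet.maf | wc -l prints the number of distinct target sequences found in the file.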

Example

Suppose you want to compare conservation between mouse and zebrafish sequences, and you already have the multiple alignment file (danRer11.mm10.synNet.maf). The next step is to calculate conservation scores with a tool such as phastCons, but phastCons takes one chromosome at a time as input. In this case, you can use the following command to split the alignments by chromosome:

mafSplit -byTarget -useFullSequenceName placeholder.bed maf/ danRer11.mm10.synNet.maf

MAFs for each chromosome will be written to the folder maf:

$ ls maf
chr10.maf                 chr1.maf                  chr24.maf                 chr7.maf
chr11.maf                 chr21.maf                 chr25_KZ115732v1_alt.maf  chr8_KZ115269v1_alt.maf
chr12.maf                 chr22.maf                 chr25.maf                 chr8.maf
chr14.maf                 chr23_KZ114986v1_alt.maf  chr2.maf                  chr9_KZ114904v1_alt.maf
chr15.maf                 chr23_KZ115679v1_alt.maf  chr3_KZ115071v1_alt.maf   chr9_KZ115282v1_alt.maf
chr16.maf                 chr23_KZ115681v1_alt.maf  chr3.maf                  chr9.maf
chr17_KZ115524v1_alt.maf  chr23_KZ115682v1_alt.maf  chr4.maf                  chrM.maf
chr17.maf                 chr23_KZ115692v1_alt.maf  chr5_KZ115115v1_alt.maf   chrUn_KN150185v1.maf
chr18.maf                 chr23.maf                 chr5.maf
chr19.maf                 chr24_KZ115712v1_alt.maf  chr6.maf
chr1_KZ115007v1_alt.maf   chr24_KZ115722v1_alt.maf  chr7_KZ115231v1_alt.maf
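
To feed the split files to a per-chromosome tool, one option is to loop over the folder and recover each sequence name from its file name. This is a minimal sketch that assumes the maf/ layout shown above; chrom_from_maf is a hypothetical helper, and the printf line is only a stand-in for the real downstream command.

```shell
# Hypothetical helper: strip the folder and the .maf suffix from a path.
chrom_from_maf() {
    basename "$1" .maf
}

# Walk the split files; skip cleanly if the folder is empty or missing.
for f in maf/*.maf; do
    [ -e "$f" ] || continue
    chrom=$(chrom_from_maf "$f")
    # Replace this line with the per-chromosome command you want to run.
    printf '%s\t%s\n' "$chrom" "$f"
done
```

Each iteration prints the sequence name and the path of its MAF file, separated by a tab.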

File formats this tool works with
MAF
