Reference Code backup Executable files
Extract a maf that just has a subset of species.
Multiple sequence alignments / Conservation
mafSpeciesSubset in.maf species.lst out.maf
This tool is part of UCSC Genome Browser's utilities.
mafFrag
results where we use this for the gene nameIn the following example, we extract the MAF of chromosome 22 for human(hg38) and Chimpanzee (panTro5) (defined in a file called species.lst) from the multiz30way multiple alignment file (stored in the file maf/chr22.maf.gz). The extracted multiple alignments are written to a file called results.maf.
$ cat species.lst hg38 panTro5 $ mafSpeciesSubset maf/chr22.maf.gz species.lst results.maf $ head results.maf ##maf version=1 scoring=roast.v3.3 a score=59305.000000 s hg38.chr22 10510000 7 + 50818468 GAATTCT s panTro5.chr20 39659357 7 - 66533130 GAATTCT i panTro5.chr20 N 0 C 0 a score=1614586.000000 s hg38.chr22 10510007 206 + 50818468 TGTGTTTATATAATAAGATGTCCTATAATTTCTGTTTGGAATATAAAATCAGCAACTAATATGTATTTTCAAAGCATTATAAATACAGAGTGCTAAGTTACTTCACTGTGAAATGTAGTCATATAAAGAACATAATAATTATACTGGATTATTTTTAAATGGGCTGTCTAACATTATATTAAAAGGTTTCATCAGTAATTCATTAT s panTro5.chr20 39659364 206 - 66533130 TGTGTTTATATAATATGGTGTCCTATAATTTCTGTTTGGAATATAAAATCAGCAACTAGTGTGTATTTTCAAAGCATTATAAATACAGAGTGCTAAGTGACCTCACTGTGAAATGTAGTCATATAAAGAATATAATAATTATACTGGATTCTTTTTAAATGGGCTGTCTAACATTATATTAAAAGGTATCTCCAGTAATTCATTAT i panTro5.chr20 C 0 C 0
Extracting the MAF for an entire chromosome can take some time, if you are only interested in a set of genomic ranges (say you define them in a bed file called range.bed), then you can extract MAF fragments with mafFrags
and pipe the output to mafSpeciesSubset
:
$ mafFrags hg38 multiz30way range.bed stdout | \ mafSpeciesSubset stdin species.lst results.maf