Multiple sequence alignments / Conservation
mafsInRegion regions.bed out.maf|outDir in.maf(s)
mafsInRegion is one of the UCSC Genome Browser command-line utilities.
Assume you have a set of ChIP-seq peaks in a BED file called peaks.bed (this corresponds to the first positional argument, regions.bed):
$ head peaks.bed
chr1 9740 10540 CTCF_1.294
chr1 10901 11701 CTCF_2.954
chr1 15866 16666 CTCF_3.40
chr1 25692 26492 CTCF_4.78
chr1 91003 91803 CTCF_5.199
chr1 104574 105374 CTCF_6.51
chr1 138816 139616 CTCF_7.75
chr1 237350 238150 CTCF_8.435
chr1 250914 251714 CTCF_9.52
chr1 324692 325492 CTCF_10.64
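Before running the extraction it can help to sanity-check the regions file. The following is a minimal sketch, not part of mafsInRegion: it assumes the four whitespace-separated columns shown in the listing (chromosome, start, end, name) and relies on the BED convention that start is 0-based and end is exclusive, so a region's length is end minus start. The names BedRegion and parse_bed4 are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class BedRegion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def parse_bed4(lines: Iterable[str]) -> List[BedRegion]:
    """Parse BED lines with at least four columns; skip blanks and header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns: {line!r}")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if start >= end:
            raise ValueError(f"start must be smaller than end: {line!r}")
        regions.append(BedRegion(chrom, start, end, name))
    return regions


# First two peaks from the listing above.
example = """chr1 9740 10540 CTCF_1.294
chr1 10901 11701 CTCF_2.954
"""
regions = parse_bed4(example.splitlines())
for region in regions:
    print(region.name, region.end - region.start)
```

Both example peaks come out 800 bases long.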
Assume you also have the UCSC multiple alignments for 46 organisms, stored as MAF files in a folder called maf (this corresponds to the third positional argument, in.maf(s)):
$ ls maf/*.maf
maf/chr10.maf                 maf/chr23_KZ114986v1_alt.maf  maf/chr4.maf
maf/chr11.maf                 maf/chr23_KZ115679v1_alt.maf  maf/chr5_KZ115115v1_alt.maf
maf/chr12.maf                 maf/chr23_KZ115681v1_alt.maf  maf/chr5.maf
maf/chr14.maf                 maf/chr23_KZ115682v1_alt.maf  maf/chr6.maf
maf/chr15.maf                 maf/chr23_KZ115692v1_alt.maf  maf/chr7_KZ115231v1_alt.maf
maf/chr16.maf                 maf/chr23.maf                 maf/chr7.maf
maf/chr17_KZ115524v1_alt.maf  maf/chr24_KZ115712v1_alt.maf  maf/chr8_KZ115269v1_alt.maf
maf/chr17.maf                 maf/chr24_KZ115722v1_alt.maf  maf/chr8.maf
maf/chr18.maf                 maf/chr24.maf                 maf/chr9_KZ114904v1_alt.maf
maf/chr19.maf                 maf/chr25_KZ115732v1_alt.maf  maf/chr9_KZ115282v1_alt.maf
maf/chr1_KZ115007v1_alt.maf   maf/chr25.maf                 maf/chr9.maf
maf/chr1.maf                  maf/chr2.maf                  maf/chrM.maf
maf/chr21.maf                 maf/chr3_KZ115071v1_alt.maf   maf/chrUn_KN150185v1.maf
maf/chr22.maf                 maf/chr3.maf
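The listing above has no chr13.maf or chr20.maf, so peaks on those sequences would presumably yield no alignment blocks. A minimal sketch of a pre-flight check follows. It assumes one MAF file per sequence, named after that sequence as in the listing; the helper name chroms_without_maf is hypothetical and the check is not something mafsInRegion itself performs.

```python
import tempfile
from pathlib import Path


def chroms_without_maf(bed_chroms, maf_dir):
    """Return BED chromosome names that have no <chrom>.maf file in maf_dir."""
    available = {path.stem for path in Path(maf_dir).glob("*.maf")}
    return sorted(set(bed_chroms) - available)


# Demonstration on a throwaway directory holding two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for chrom in ("chr1", "chr12"):
        (Path(tmp) / f"{chrom}.maf").touch()
    missing = chroms_without_maf(["chr1", "chr13", "chr1"], tmp)

print(missing)  # ['chr13']
```

With real data, the first argument would be the chromosome column of peaks.bed and the second the maf folder.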
You can use mafsInRegion to extract the multiple sequence alignments that fall within the ChIP-seq peaks and save them to a file called output.maf (this corresponds to the second positional argument, out.maf):
$ mafsInRegion peaks.bed output.maf maf/*.maf
The above command writes content like the following to the output file:
$ head output.maf
##maf version=1 scoring=blastz
a score=0.000000
s danRer11.chr10 21784387 800 + 45420867 TTAGATGTGAATGATAATGCACCTGAGATCATCATCACATCCTCACCCAAACCTGTGCGAGAAGATGCGCCTGCTGGGACAATGGTAGCTTTAATAAACGTTAAAGATTTAGATTCGGGCATAAACGGGAACGTAACTCTTCTTATTTTATCTGATACTCCTTTTAAATTAAAGCCAACATTTGCAAACCATTACGCACTGGTAACAGATTCAAAATTAGATCGAGAAAAGTTTCCTAAATATGACATTGAGCTTAAAGCATCAGACTCTGGATCACCTCCACTGGTATCAAGCAAACTCA--TTACAGTTAATATACTAGATGTCAATGATAATCCTCCTGTTTTCTCTGAACGTGTGTACTCGGTTTACATTAAAGAAAACAGCGCTCCAGGATCGATATTAGCATCGGTGACAGCATCAGATCTAGATACAGGAGAAAATGCAAAAATTGTGTATTCAGTTATTGATACTAATACTCGAGACGTACCTGTCTCTTCCTATGTATACATAAACGCAGAAAATGGCAGTATATTTAGCATGCACTCGTTTGATTACGAGAAAAT-AAAGGTCT-----TTCATGTTATTGTGCTTGCCAAAGATCAAGGCTCCCAATCTCTGAGCAGCAACGCTACTGTTCATGTATTTATTCTGGACCAGAACGATAATGCACCTGCTGTCATTTACCCGTCCACATCCATGGGCTCGGTCTCTAAT-CAGAGGA-----TGCCCCGTTCTGCTAAAGCAGGACATCTCGTTACTAAGGTAACGGCAGTGGACGCGGACTCGGGTCATAACGCCTGGCTGTT
s mm10.chr18 37767127 803 + 90702639 CTGGACGTGAACGACAATGCCCCTGAAGTAGCCATCACGTCCCTCACCAACTCTGTCCCAGAAAACTCTCCCCAAGGGACATTAATAGCACTTTTAAACGTAAATGATCAAGATTCTGGGGAAAATGGACAGGTAATCTGTTCCATCCAAGAGAATCTGCCCTTTAAGTTAGAAAAGTCTTACGGAAACTATTATAGATTAGTCACAGATGCAGTCCTGGACCGAGAAGAGGTTCCTAGTTACAACATCACAATGACCGCCACTGACAGGGGAAGTCCGCCCCTGACAACAG--AAACTCACCTCGCACTGGACATAGCAGACACGAACGATAACTCGCCCGTTTTCCTTCAGGCCTCATACTGGGCCTACATCCCAGAGAATAACCCTAGAGGGGCCTCTATCGCATCCGTGACCGCCCACGACCCCGACAGTGACAAAAATGCCCAAGTCACTTACTCCCTAGCTGAGGACACCCACCAGGGCGTGCCCCTTTCCTCTTACGTTTCCATCAACTCGGACACTGGTGTCCTGTACGCACTGCATTCCTTTGACTACGAGCAGTTCCCAGACCTACAACTGCAAGTGATAGCGCGTGACAGCGGGGA------CCCGCCACTCAGCAGCAACGTGTCACTGAGCTTGTTCGTGCTGGATCAGAATGACAACGTCCCCGAAATCTTGTACCCCACTCTCCCTACCGAC---GGTTCTACTGGAGTGGAGCTAGCACCCCGCTCAGCAGAGCCTGGCTATCTGGTGACAAAGGTGGTGGCAGTGGACAGAGACTCAGGACAGAACGCCTGGCTGTC
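Each alignment block in the output begins with an "a" line, followed by one "s" line per aligned sequence. In the MAF format an "s" line carries six fields after the leading "s": source name, start, size (the number of aligned bases, gaps excluded), strand, the total length of the source sequence, and the aligned text. A minimal sketch of reading these blocks follows. The names MafSeq and parse_maf_blocks are hypothetical, the short sequences and coordinates in the example are made up for illustration, and other MAF line types (such as "i", "e" and "q") are simply ignored.

```python
from typing import Iterable, List, NamedTuple


class MafSeq(NamedTuple):
    src: str       # e.g. "mm10.chr18" (assembly.sequence)
    start: int     # 0-based start on the given strand
    size: int      # aligned bases, not counting "-" gaps
    strand: str
    src_size: int  # total length of the source sequence
    text: str      # aligned text, including gaps


def parse_maf_blocks(lines: Iterable[str]) -> List[List[MafSeq]]:
    """Group the "s" lines of a MAF file into one list per alignment block."""
    blocks: List[List[MafSeq]] = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line == "a" or line.startswith("a "):
            current = []
            blocks.append(current)
        elif line.startswith("s ") and current is not None:
            _, src, start, size, strand, src_size, text = line.split()
            current.append(
                MafSeq(src, int(start), int(size), strand, int(src_size), text)
            )
    return blocks


# Toy block: illustrative coordinates and sequences, not real data.
example = """##maf version=1 scoring=blastz
a score=0.000000
s danRer11.chr10 100 8 + 45420867 ACGT--ACGT
s mm10.chr18 200 10 + 90702639 ACGTTTACGT
"""
blocks = parse_maf_blocks(example.splitlines())
for row in blocks[0]:
    # The size field should equal the number of non-gap characters.
    assert len(row.text.replace("-", "")) == row.size
    print(row.src, row.start, row.start + row.size)
```

To read the real output, pass an open file handle for output.maf in place of example.splitlines().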