faFrag [options] in.fa start end out.fa
faFrag extracts a piece of DNA from a FASTA file. It is part of the UCSC Genome Browser utilities.
In the following example, we will extract the sequence for the gene ISG15 (chr1:1013497-1014540) from the reference sequence for human chromosome 1 (chr1.fa):
$ faFrag chr1.fa 1013497 1014540 isg15.fa
Wrote 1043 bases to isg15.fa
$ head isg15.fa
>chr1:1013497-1014540
gcggctgagaggcagcgaactcatctttgccagtacaggagcttgtgccg
tggcccacagcccacagcccacagccatggtaaggcagatgtcacaggtg
gggggaggtgggctctgtgccagccaattttcgtctccctcccccagcca
aggtctcccaggggtgcagggagagcggagctgctcagagcttggccagg
ttctaagtgtgctcctgaaagcaggtcacccctgagatcctcagggtggg
gcacagaggggcaccctagcaggtaaagggaggccacgggatggcggtgg
gcagctggccttctagtaacgagccctcagtgccttctgtgcctggggtc
cctgccggcgggatgtagaggacagacaggagggagcactgtccctgggt
acaggagctcgccctgcagccagtgccttgtgtgtggtgggcctggggct
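The base count reported above (1043) equals end minus start, which is what a 0-based start with an exclusive end would produce. The following is a minimal sketch of that arithmetic, assuming this convention holds for your version of faFrag (the usage text is the authority here). The variable names are illustrative only and are not part of the tool.

```shell
# Coordinates taken from the faFrag example above.
start=1013497
end=1014540

# Length of a half-open interval [start, end): matches "Wrote 1043 bases".
length=$((end - start))
echo "half-open length: $length"

# ASSUMPTION: if the same numbers are read as 1-based, inclusive coordinates
# (for example, copied from a browser position), the span is one base longer
# and the 0-based start passed to the tool would be one lower.
inclusive_length=$((end - start + 1))
zero_based_start=$((start - 1))
echo "1-based inclusive length: $inclusive_length"
echo "0-based start for the same region: $zero_based_start"
```

If the extracted sequence appears to be missing its first base, an off-by-one of this kind is a likely cause.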
Note: if a FASTA file contains multiple sequences, faFrag
only operates on the first one and prints a warning like the following:
More than one sequence in GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta, just using first
If this is not what you want, use faOneRecord
to extract the sequence that contains your region first (chr1 in this case), then run faFrag
on the extracted file:
$ faOneRecord GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta chr1 > chr1.fa
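To find out in advance whether a FASTA file holds more than one sequence, one option is to count its header lines with standard tools. The sketch below uses only grep and a small throwaway file. The file contents and variable names are hypothetical and stand in for a real reference file.

```shell
# Build a tiny two-record FASTA file purely for illustration (made-up data).
fasta=$(mktemp)
printf '>chr1\nACGT\n>chr2\nTTGA\n' > "$fasta"

# Every FASTA record begins with a header line that starts with '>'.
n_records=$(grep -c '^>' "$fasta")
echo "records: $n_records"

# With more than one record, extract the record of interest before slicing.
if [ "$n_records" -gt 1 ]; then
    echo "multiple sequences found; extract a single record first"
fi

rm -f "$fasta"
```

On a real multi-sequence reference, a count greater than one indicates that the faOneRecord step above is needed before running faFrag.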