Profile the nucleotide content of intervals in a fasta file
bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
This tool is part of the bedtools suite and is also known as nucBed.
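The first line of the output holds the column names and starts with #. The leading columns are copied unchanged from the input -bed file and are named <colNum>_usercol: if the input BED file has 4 columns, the first four output columns are 1_usercol, 2_usercol, 3_usercol and 4_usercol. The following columns are then reported for each BED entry: pct_at (fraction of A and T bases), pct_gc (fraction of G and C bases), num_A, num_C, num_G and num_T (count of each base), num_N (count of Ns), num_oth (count of other characters) and seq_len (interval length). Despite the pct_ prefix, the two content columns are fractions between 0 and 1, not percentages. With -seq, the interval's sequence follows as one extra column.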
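The two fraction columns can be derived from the count columns. The sketch below is a hypothetical awk helper, not part of bedtools, that recomputes them for two rows copied from the examples on this page. It assumes a 4-column input BED (so num_A to num_T sit in columns 7 to 10 and seq_len in column 13) and assumes both fractions are taken over seq_len, which is consistent with every row shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of bedtools): recompute pct_at and pct_gc
# from the count columns of `bedtools nuc` output.
# Assumes a 4-column input BED, so the counts are in columns 7-10
# (num_A, num_C, num_G, num_T) and seq_len is column 13, and assumes the
# fractions are taken over seq_len.

# Two data rows copied from the example output on this page.
sample_rows() {
  cat <<'EOF'
chr1 104896 105048 EH38E2776520 0.552632 0.447368 51 33 35 33 0 0 152
chr1 180743 180904 EH38E2776522 0.503106 0.496894 54 79 1 27 0 0 161
EOF
}

# Print: name, recomputed AT fraction, recomputed GC fraction.
recompute_fractions() {
  awk '!/^#/ { printf "%s %.6f %.6f\n", $4, ($7 + $10) / $13, ($8 + $9) / $13 }'
}

sample_rows | recompute_fractions
```

Both recomputed rows match the reported 5_pct_at and 6_pct_gc values (for example, (51 + 33) / 152 = 0.552632).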
Profile regions
$ bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed | head
#1_usercol 2_usercol 3_usercol 4_usercol 5_pct_at 6_pct_gc 7_num_A 8_num_C 9_num_G 10_num_T 11_num_N 12_num_oth 13_seq_len
chr1 104896 105048 EH38E2776520 0.552632 0.447368 51 33 35 33 0 0 152
chr1 138866 139134 EH38E2776521 0.406716 0.593284 59 55 104 50 0 0 268
chr1 180743 180904 EH38E2776522 0.503106 0.496894 54 79 1 27 0 0 161
chr1 181014 181237 EH38E2776523 0.269058 0.730942 37 82 81 23 0 0 223
chr1 181289 181639 EH38E2776524 0.240000 0.760000 58 99 167 26 0 0 350
chr1 267925 268171 EH38E2776528 0.483740 0.516260 49 55 72 70 0 0 246
chr1 271226 271468 EH38E2776529 0.595041 0.404959 55 49 49 89 0 0 242
chr1 274329 274481 EH38E2776530 0.598684 0.401316 47 25 36 44 0 0 152
chr1 586036 586264 EH38E2776532 0.464912 0.535088 48 55 67 58 0 0 228
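A common follow-up is to keep only GC-rich intervals. The sketch below uses a hypothetical helper, gc_rich, that keeps rows whose 6_pct_gc is at least a threshold and sorts them from highest GC down. A header and four rows copied from the output above stand in for a live bedtools run; in practice you would pipe the bedtools nuc command into it. The 0.6 threshold is an arbitrary choice for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of bedtools): keep GC-rich intervals from
# `bedtools nuc` output. In practice the input would be produced by
#   bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed
# Here a header and four rows copied from that output stand in for it.
sample_nuc_output() {
  cat <<'EOF'
#1_usercol 2_usercol 3_usercol 4_usercol 5_pct_at 6_pct_gc 7_num_A 8_num_C 9_num_G 10_num_T 11_num_N 12_num_oth 13_seq_len
chr1 138866 139134 EH38E2776521 0.406716 0.593284 59 55 104 50 0 0 268
chr1 181014 181237 EH38E2776523 0.269058 0.730942 37 82 81 23 0 0 223
chr1 181289 181639 EH38E2776524 0.240000 0.760000 58 99 167 26 0 0 350
chr1 271226 271468 EH38E2776529 0.595041 0.404959 55 49 49 89 0 0 242
EOF
}

# Print chrom, start, end, name and GC fraction for rows whose GC fraction
# (column 6) is at least the threshold given as $1, highest GC first.
# The header line starts with '#' and is skipped.
gc_rich() {
  awk -v min="$1" '!/^#/ && $6 + 0 >= min + 0 { print $1, $2, $3, $4, $6 }' |
    LC_ALL=C sort -k5,5nr
}

sample_nuc_output | gc_rich 0.6
```

With these sample rows, only EH38E2776524 (0.760000) and EH38E2776523 (0.730942) pass the threshold; LC_ALL=C keeps the numeric sort from depending on the locale's decimal separator.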
Profile regions according to strand (-s)
$ bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed -s | head
#1_usercol 2_usercol 3_usercol 4_usercol 5_pct_at 6_pct_gc 7_num_A 8_num_C 9_num_G 10_num_T 11_num_N 12_num_oth 13_seq_len
chr1 104896 105048 EH38E2776520 0.552632 0.447368 51 33 35 33 0 0 152
chr1 138866 139134 EH38E2776521 0.406716 0.593284 59 55 104 50 0 0 268
chr1 180743 180904 EH38E2776522 0.503106 0.496894 54 79 1 27 0 0 161
chr1 181014 181237 EH38E2776523 0.269058 0.730942 37 82 81 23 0 0 223
chr1 181289 181639 EH38E2776524 0.240000 0.760000 58 99 167 26 0 0 350
chr1 267925 268171 EH38E2776528 0.483740 0.516260 49 55 72 70 0 0 246
chr1 271226 271468 EH38E2776529 0.595041 0.404959 55 49 49 89 0 0 242
chr1 274329 274481 EH38E2776530 0.598684 0.401316 47 25 36 44 0 0 152
chr1 586036 586264 EH38E2776532 0.464912 0.535088 48 55 67 58 0 0 228
rDHS.bed has only four columns and therefore no strand column, so -s has no effect here: the output is identical to the unstranded run.
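The -s option only matters for intervals that carry a strand. Assuming, as the option's description ("profile the sequence according to strand") suggests, that -s reverse-complements the sequence of '-' strand intervals before counting, num_A and num_T swap for those intervals, as do num_C and num_G, while pct_at and pct_gc stay the same. The sketch below illustrates that effect with awk and tr on the first 30 bases of EH38E2776522; revcomp and count_bases are hypothetical helpers, not bedtools commands.

```bash
#!/usr/bin/env bash
# Illustration only: how reverse-complementing a sequence changes base counts.
# Assumption: `bedtools nuc -s` reverse-complements '-' strand intervals.

# Reverse the sequence with awk (portable, no `rev` needed), then complement
# each base with tr (case preserved).
revcomp() {
  awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' |
    tr 'ACGTacgt' 'TGCAtgca'
}

# Print the counts of A, C, G and T in the given sequence, space-separated.
# gsub returns the number of matches; replacing a base with itself leaves
# the line unchanged.
count_bases() {
  printf '%s\n' "$1" |
    awk '{ printf "%d %d %d %d\n", gsub(/A/, "A"), gsub(/C/, "C"), gsub(/G/, "G"), gsub(/T/, "T") }'
}

# First 30 bases of EH38E2776522 (a telomeric-repeat region) from the -seq example.
fwd="TCTAGCCCTAACCCTAACCCTAACCCTAAC"
rc="$(printf '%s\n' "$fwd" | revcomp)"

echo "+ strand A C G T: $(count_bases "$fwd")"
echo "- strand A C G T: $(count_bases "$rc")"
```

The fragment has 9 A, 14 C, 1 G and 6 T; its reverse complement has 6 A, 1 C, 14 G and 9 T, so the GC fraction (15/30) is the same on both strands.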
Profile regions and get the corresponding sequences
$ bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed -seq | head
#1_usercol 2_usercol 3_usercol 4_usercol 5_pct_at 6_pct_gc 7_num_A 8_num_C 9_num_G 10_num_T 11_num_N 12_num_oth 13_seq_len 14_seq
chr1 104896 105048 EH38E2776520 0.552632 0.447368 51 33 35 33 0 0 152 TTTCAGATCTCTAGAACTATCCATCAGTGAAATGGATTGCAAATACAAAGAGTAATACCATGTCACTTAAGAATAGAATCATGGACGAGGCTGCCACCTGCTGTTGGGGGCCACTGCAGAAGAAATTCCAGAACACTGGACTGGAGAGCACC
chr1 138866 139134 EH38E2776521 0.406716 0.593284 59 55 104 50 0 0 268 GGCCTGGAGAAGCCCCCATGAGGCAGAGGTTGGGCCTGTAGACGCTGACAGGAGGCAGGAGCTGGGCCTGGACAGGTCAACTTGAGGAGATTTTGGGCCTTCATAGGCCACCAGGAGGCAGCAGTTGGGACTAGAGAGTCTGACTTGAGTAAGTTTTGGGCCCGGAGATGATGTCCTGGGACAGGAGTTGGCCGTGGAGAGGCCACCGTGAGGCATAAGCTGGATGTAGAGAGGCCAGTGTGAGGCAAGACCTGGGCCTGTCTAGGCT
chr1 180743 180904 EH38E2776522 0.503106 0.496894 54 79 1 27 0 0 161 TCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACAACCCTAACCCTAACCCTAACAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
chr1 181014 181237 EH38E2776523 0.269058 0.730942 37 82 81 23 0 0 223 GGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAGGACAACGCAGCTCCGCCCTCGCGGTACTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCTCGCC
chr1 181289 181639 EH38E2776524 0.240000 0.760000 58 99 167 26 0 0 350 CGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGAGGAGGCGTGGCACAGGCGCAGAGACACATGCTAGCGCGCCCAGGGGAGGAGGCGTGGCGCAGGCGCAGAGAGGCGCGCCGTGCTGCCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGCGGGGTTGGGGGGGGCGTGTGTTACAGGAGCAAAGTCGCACGGCGCCGGGCTGGGGGCGGGGGGGGGGGGGGGGGCGCCGTGCACGCGCAGAAACTCACGTCACGGCGGCGCGG
chr1 267925 268171 EH38E2776528 0.483740 0.516260 49 55 72 70 0 0 246 AAGGGTTGCTTGACCCACAGATGTGAAGCTGAGGCTGAAGGAGACTGATGTGGTTTCTCCTCAGTTTCTCTGTGCAGCACCAGGTGGCAGCAGAGGTCAGCAAGGCAAACCCGAGCCCGGGGATGCGGAGTGGGGGCAGCTACGTCCTCTCTTGAGCTACAGCAGATTCACTCTGTTCTGTTTCATTGTTGTTTAGTTTGCGTTGTGTTTCTCCAACTTTGTGCCTCATCAGGAAAAGCTTTGGAT
chr1 271226 271468 EH38E2776529 0.595041 0.404959 55 49 49 89 0 0 242 ACAGTGGTTTCAGGCAGCATCTGAAGACAGTAAAAGCAGAAGCTCCAAGGCTTCTTACATTCTAGCCTGGAAAATTACATCACATTGCTTCCTTCATATTTTTTTGGCAAATCAGGTTGCAAGGCTTGCCCAGATTAGGGTAAAGAGGCAAAGAGGCTCCTTTTCTTTTCTTTTCTTTTCTTTTTTCTTTTTTTTTTTTTTTTGAGTCAGAATCTCGCTCTGTTGCCCAGGCTGGAGTGCAG
chr1 274329 274481 EH38E2776530 0.598684 0.401316 47 25 36 44 0 0 152 GAGAATACAGTAAACTCTATGAGGCAAGCTATAAACATGTAGCATTGTGATTAGGGCTGGTTCTCCTTCTAGAGATATGGTAGGATTGCAATTTCATACCATCCTTGAAGTTAGAGAGAGCCATGTGACTCATTTAGCCAATGAACTGTGAG
chr1 586036 586264 EH38E2776532 0.464912 0.535088 48 55 67 58 0 0 228 ATTTTCCTGAGAGGAAAGCTTTCCCACATTATTCAGCTTCTGAAAGGGTTGCTTGACCCACAGATGTGAAGCTGAGGCTGAAGGAGACTGATGTGGTTTCTCCTCAGTTTCTCTGTGCGGCACCAGGTGGCAGCAGAGGTCAGCAAGGCAAACCCGAGCCCGGGGATGCGGGGTGGGGGCAGCTACGTCCTCTCTTGAGCTACAGCAGATTCACTCTGTTCTGTTTCA
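Because -seq reports each sequence as an extra column, it can be pulled out with standard text tools. The sketch below uses a hypothetical awk helper, nuc_seq_to_fasta, that turns the rows into FASTA records and, as a sanity check, warns when a sequence's length disagrees with seq_len. It assumes, as in the example above, that the sequence (14_seq) is the last column and 13_seq_len the one before it; the record naming scheme name::chrom:start-end is an arbitrary choice for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of bedtools): turn `bedtools nuc -seq` output
# into FASTA. Assumes the sequence is the last column and seq_len the one
# before it, as in the example on this page. The header naming scheme
# (name::chrom:start-end) is an arbitrary choice for this sketch.
nuc_seq_to_fasta() {
  awk '
    /^#/ { next }                       # skip the header line
    length($NF) != $(NF - 1) {          # sanity check against seq_len
      printf "warning: %s length mismatch\n", $4 > "/dev/stderr"
    }
    { printf ">%s::%s:%s-%s\n%s\n", $4, $1, $2, $3, $NF }
  '
}

# One row copied from the -seq example output on this page.
sample_row() {
  cat <<'EOF'
chr1 274329 274481 EH38E2776530 0.598684 0.401316 47 25 36 44 0 0 152 GAGAATACAGTAAACTCTATGAGGCAAGCTATAAACATGTAGCATTGTGATTAGGGCTGGTTCTCCTTCTAGAGATATGGTAGGATTGCAATTTCATACCATCCTTGAAGTTAGAGAGAGCCATGTGACTCATTTAGCCAATGAACTGTGAG
EOF
}

sample_row | nuc_seq_to_fasta
```

For this row the helper emits a single record headed >EH38E2776530::chr1:274329-274481 whose sequence is 152 bases long, matching 13_seq_len, so no warning is printed.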