Category

Sequence Analysis


Usage

bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>


Manual

This tool is part of the bedtools suite and is also known as nucBed.

Required arguments

  • -fi Path: Input FASTA file
  • -bed Path: BED/GFF/VCF file of ranges to extract from -fi

Options

  • -s: Profile the sequence according to strand.
  • -seq: Print the extracted sequence.
  • -pattern SEQ: Report the number of times a user-defined sequence is observed (case-sensitive).
  • -C: Ignore case when matching -pattern. By default, case matters.
  • -fullHeader: Use the full FASTA header. By default, only the word before the first space or tab is used.

Output format

The first line of the output holds the column names. The leading columns are copied from the original -bed file and are named colNum_usercol: for an input BED file with 4 columns, the first four output columns are 1_usercol, 2_usercol, 3_usercol, and 4_usercol. The following fields are then appended to each BED entry. In the header, each of these names also carries its column number as a prefix (for example, 5_pct_at):

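  1. %AT content, reported as a fraction between 0 and 1, e.g. 0.552632 (pct_at)
  2. %GC content, reported as a fraction between 0 and 1, e.g. 0.447368 (pct_gc)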
  3. Number of As observed (num_A)
  4. Number of Cs observed (num_C)
  5. Number of Gs observed (num_G)
  6. Number of Ts observed (num_T)
  7. Number of Ns observed (num_N)
  8. Number of other bases observed (num_oth)
  9. The length of the explored sequence/interval (seq_len)
  10. The sequence extracted from the FASTA file (seq; optional, only if -seq is used)
  11. The number of times the user's pattern was observed (user_patt_count; optional, only if -pattern is used)
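
As a rough illustration of how this layout might be consumed downstream, the following Python sketch parses `bedtools nuc` output into one dictionary per interval. The function name `parse_nuc_output` and the inline `SAMPLE` are hypothetical; the sample row is copied from the first example on this page. The sketch assumes the real output is tab-delimited and that the header is the first line. It keeps the `N_usercol` names as they are and strips the column-number prefix from the appended fields.

```python
import io

# Appended fields that hold integers or fractions (names as listed above).
INT_COLS = {"num_A", "num_C", "num_G", "num_T", "num_N", "num_oth",
            "seq_len", "user_patt_count"}
FLOAT_COLS = {"pct_at", "pct_gc"}


def parse_nuc_output(handle):
    """Yield one dict per interval from tab-delimited `bedtools nuc` output."""
    header = handle.readline().rstrip("\n").lstrip("#").split("\t")
    names = []
    for col in header:
        if col.endswith("_usercol"):
            names.append(col)                    # keep "1_usercol", "2_usercol", ...
        else:
            names.append(col.split("_", 1)[1])   # "5_pct_at" -> "pct_at"
    for line in handle:
        if not line.strip():
            continue
        row = dict(zip(names, line.rstrip("\n").split("\t")))
        for key, value in list(row.items()):
            if key in INT_COLS:
                row[key] = int(value)
            elif key in FLOAT_COLS:
                row[key] = float(value)
        yield row


# Hypothetical sample: header plus one row taken from the examples on this page.
HEADER = ["#1_usercol", "2_usercol", "3_usercol", "4_usercol", "5_pct_at",
          "6_pct_gc", "7_num_A", "8_num_C", "9_num_G", "10_num_T",
          "11_num_N", "12_num_oth", "13_seq_len"]
ROW = ["chr1", "104896", "105048", "EH38E2776520", "0.552632", "0.447368",
       "51", "33", "35", "33", "0", "0", "152"]
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

for row in parse_nuc_output(io.StringIO(SAMPLE)):
    # Cross-check: pct_at should match (num_A + num_T) / seq_len.
    at_fraction = (row["num_A"] + row["num_T"]) / row["seq_len"]
    print(row["4_usercol"], row["pct_at"], round(at_fraction, 6))
```

For real data, the same function could be given an open file handle or the captured standard output of the command instead of `io.StringIO`.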

Examples

Profile regions (regardless of strand)

$ bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed | head
#1_usercol    2_usercol    3_usercol    4_usercol    5_pct_at    6_pct_gc    7_num_A    8_num_C    9_num_G    10_num_T    11_num_N    12_num_oth    13_seq_len
chr1    104896    105048    EH38E2776520    0.552632    0.447368    51    33    35    33    0    0    152
chr1    138866    139134    EH38E2776521    0.406716    0.593284    59    55    104    50    0    0    268
chr1    180743    180904    EH38E2776522    0.503106    0.496894    54    79    1    27    0    0    161
chr1    181014    181237    EH38E2776523    0.269058    0.730942    37    82    81    23    0    0    223
chr1    181289    181639    EH38E2776524    0.240000    0.760000    58    99    167    26    0    0    350
chr1    267925    268171    EH38E2776528    0.483740    0.516260    49    55    72    70    0    0    246
chr1    271226    271468    EH38E2776529    0.595041    0.404959    55    49    49    89    0    0    242
chr1    274329    274481    EH38E2776530    0.598684    0.401316    47    25    36    44    0    0    152
chr1    586036    586264    EH38E2776532    0.464912    0.535088    48    55    67    58    0    0    228

Profile regions (aware of strand information)

$ bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed -s | head
#1_usercol    2_usercol    3_usercol    4_usercol    5_pct_at    6_pct_gc    7_num_A    8_num_C    9_num_G    10_num_T    11_num_N    12_num_oth    13_seq_len
chr1    104896    105048    EH38E2776520    0.552632    0.447368    51    33    35    33    0    0    152
chr1    138866    139134    EH38E2776521    0.406716    0.593284    59    55    104    50    0    0    268
chr1    180743    180904    EH38E2776522    0.503106    0.496894    54    79    1    27    0    0    161
chr1    181014    181237    EH38E2776523    0.269058    0.730942    37    82    81    23    0    0    223
chr1    181289    181639    EH38E2776524    0.240000    0.760000    58    99    167    26    0    0    350
chr1    267925    268171    EH38E2776528    0.483740    0.516260    49    55    72    70    0    0    246
chr1    271226    271468    EH38E2776529    0.595041    0.404959    55    49    49    89    0    0    242
chr1    274329    274481    EH38E2776530    0.598684    0.401316    47    25    36    44    0    0    152
chr1    586036    586264    EH38E2776532    0.464912    0.535088    48    55    67    58    0    0    228
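
To illustrate what strand-aware profiling could change, here is a small Python sketch. It assumes that -s reverse-complements minus-strand intervals before counting. That is an assumption about the tool's behaviour: this page only says the sequence is profiled "according to strand". Under that assumption, num_A swaps with num_T and num_C swaps with num_G, while pct_at, pct_gc and seq_len stay the same. Rows with no strand column, like the four-column rows shown above, would have nothing to flip, which would be consistent with the two example outputs being identical. The helper names and the toy sequence are hypothetical.

```python
from collections import Counter

# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def base_counts(seq):
    """Count A, C, G and T, ignoring case."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


plus_strand = "AAACCGT"                      # toy sequence, not from the genome
minus_strand = reverse_complement(plus_strand)

print(base_counts(plus_strand))              # counts as read on the + strand
print(base_counts(minus_strand))             # A/T and C/G counts are swapped
```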

Profile regions and get the corresponding sequences

$ bedtools nuc -fi GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -bed rDHS.bed -seq | head
#1_usercol    2_usercol    3_usercol    4_usercol    5_pct_at    6_pct_gc    7_num_A    8_num_C    9_num_G    10_num_T    11_num_N    12_num_oth    13_seq_len    14_seq
chr1    104896    105048    EH38E2776520    0.552632    0.447368    51    33    35    33    0    0    152    TTTCAGATCTCTAGAACTATCCATCAGTGAAATGGATTGCAAATACAAAGAGTAATACCATGTCACTTAAGAATAGAATCATGGACGAGGCTGCCACCTGCTGTTGGGGGCCACTGCAGAAGAAATTCCAGAACACTGGACTGGAGAGCACC
chr1    138866    139134    EH38E2776521    0.406716    0.593284    59    55    104    50    0    0    268    GGCCTGGAGAAGCCCCCATGAGGCAGAGGTTGGGCCTGTAGACGCTGACAGGAGGCAGGAGCTGGGCCTGGACAGGTCAACTTGAGGAGATTTTGGGCCTTCATAGGCCACCAGGAGGCAGCAGTTGGGACTAGAGAGTCTGACTTGAGTAAGTTTTGGGCCCGGAGATGATGTCCTGGGACAGGAGTTGGCCGTGGAGAGGCCACCGTGAGGCATAAGCTGGATGTAGAGAGGCCAGTGTGAGGCAAGACCTGGGCCTGTCTAGGCT
chr1    180743    180904    EH38E2776522    0.503106    0.496894    54    79    1    27    0    0    161    TCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACAACCCTAACCCTAACCCTAACAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
chr1    181014    181237    EH38E2776523    0.269058    0.730942    37    82    81    23    0    0    223    GGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAGGACAACGCAGCTCCGCCCTCGCGGTACTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCTCGCC
chr1    181289    181639    EH38E2776524    0.240000    0.760000    58    99    167    26    0    0    350    CGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGAGGAGGCGTGGCACAGGCGCAGAGACACATGCTAGCGCGCCCAGGGGAGGAGGCGTGGCGCAGGCGCAGAGAGGCGCGCCGTGCTGCCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGCGGGGTTGGGGGGGGCGTGTGTTACAGGAGCAAAGTCGCACGGCGCCGGGCTGGGGGCGGGGGGGGGGGGGGGGGCGCCGTGCACGCGCAGAAACTCACGTCACGGCGGCGCGG
chr1    267925    268171    EH38E2776528    0.483740    0.516260    49    55    72    70    0    0    246    AAGGGTTGCTTGACCCACAGATGTGAAGCTGAGGCTGAAGGAGACTGATGTGGTTTCTCCTCAGTTTCTCTGTGCAGCACCAGGTGGCAGCAGAGGTCAGCAAGGCAAACCCGAGCCCGGGGATGCGGAGTGGGGGCAGCTACGTCCTCTCTTGAGCTACAGCAGATTCACTCTGTTCTGTTTCATTGTTGTTTAGTTTGCGTTGTGTTTCTCCAACTTTGTGCCTCATCAGGAAAAGCTTTGGAT
chr1    271226    271468    EH38E2776529    0.595041    0.404959    55    49    49    89    0    0    242    ACAGTGGTTTCAGGCAGCATCTGAAGACAGTAAAAGCAGAAGCTCCAAGGCTTCTTACATTCTAGCCTGGAAAATTACATCACATTGCTTCCTTCATATTTTTTTGGCAAATCAGGTTGCAAGGCTTGCCCAGATTAGGGTAAAGAGGCAAAGAGGCTCCTTTTCTTTTCTTTTCTTTTCTTTTTTCTTTTTTTTTTTTTTTTGAGTCAGAATCTCGCTCTGTTGCCCAGGCTGGAGTGCAG
chr1    274329    274481    EH38E2776530    0.598684    0.401316    47    25    36    44    0    0    152    GAGAATACAGTAAACTCTATGAGGCAAGCTATAAACATGTAGCATTGTGATTAGGGCTGGTTCTCCTTCTAGAGATATGGTAGGATTGCAATTTCATACCATCCTTGAAGTTAGAGAGAGCCATGTGACTCATTTAGCCAATGAACTGTGAG
chr1    586036    586264    EH38E2776532    0.464912    0.535088    48    55    67    58    0    0    228    ATTTTCCTGAGAGGAAAGCTTTCCCACATTATTCAGCTTCTGAAAGGGTTGCTTGACCCACAGATGTGAAGCTGAGGCTGAAGGAGACTGATGTGGTTTCTCCTCAGTTTCTCTGTGCGGCACCAGGTGGCAGCAGAGGTCAGCAAGGCAAACCCGAGCCCGGGGATGCGGGGTGGGGGCAGCTACGTCCTCTCTTGAGCTACAGCAGATTCACTCTGTTCTGTTTCA
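
When -seq is used, the reported statistics can be recomputed from the sequence column as a sanity check. The Python sketch below is one way to do that, not the tool's own implementation. It assumes that counting ignores case, that N is tallied separately from other non-ACGT characters, and that pct_at and pct_gc are fractions of the full sequence length. These assumptions are consistent with the example rows above but are not stated on this page. The function name `profile_sequence` is hypothetical.

```python
def profile_sequence(seq):
    """Recompute nuc-style statistics for one extracted sequence."""
    s = seq.upper()
    length = len(s)
    counts = {base: s.count(base) for base in "ACGTN"}
    other = length - sum(counts.values())    # anything that is not A, C, G, T or N
    return {
        "pct_at": (counts["A"] + counts["T"]) / length if length else 0.0,
        "pct_gc": (counts["G"] + counts["C"]) / length if length else 0.0,
        "num_A": counts["A"],
        "num_C": counts["C"],
        "num_G": counts["G"],
        "num_T": counts["T"],
        "num_N": counts["N"],
        "num_oth": other,
        "seq_len": length,
    }


# Toy sequence (not from the genome) with mixed case, Ns and one other symbol.
stats = profile_sequence("AACGTNnx")
print(stats)
```

Comparing these values with columns 5 to 13 of the same row would flag a mismatch between the FASTA file and the reported profile.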

File formats this tool works with

BED, GFF, GTF, VCF