Category

Sequence Analysis


Usage

faCount file(s).fa


Manual

This tool is part of UCSC Genome Browser's utilities.

faCount counts the occurrences of A, C, G, T, and N bases and of CpG dinucleotides in each sequence record of the given input files and prints the results to stdout.
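
To make the counted quantities concrete, here is a minimal Python sketch of the same kind of tally. It is not faCount's implementation: the helper names count_bases and read_fasta are hypothetical, and counting lowercase (soft-masked) bases together with uppercase ones is an assumption.

```python
from collections import Counter


def count_bases(seq):
    """Tally A, C, G, T, N and CpG dinucleotides in one sequence string."""
    seq = seq.upper()  # assumption: lowercase (soft-masked) bases are counted too
    tally = Counter(seq)
    return {
        "len": len(seq),
        "A": tally["A"],
        "C": tally["C"],
        "G": tally["G"],
        "T": tally["T"],
        "N": tally["N"],
        "cpg": seq.count("CG"),  # a C immediately followed by a G
    }


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Toy input standing in for a real FASTA file.
toy = [">seq1", "ACGTN", "cgcg", ">seq2", "TTAA"]
for name, seq in read_fasta(toy):
    print(name, count_bases(seq))
```

For a real file, the same loop could be fed an open file handle instead of the toy list.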

Required arguments

  • file(s).fa: one or more input FASTA files

Options

  • -summary: show only summary statistics
  • -dinuc: include statistics on dinucleotide frequencies
  • -strands: count bases on both strands

Examples

Per-sequence stats

Count base occurrences for the human reference genome (hg38):

$ faCount GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta | head
#seq    len    A    C    G    T    N    cpg
chr1    248956422    67070277    48055043    48111528    67244164    18475410    2375159
chr2    242193529    71791213    48318180    48450903    71987932    1645301    2192670
chr3    198295559    59689091    39233483    39344259    59833302    195424    1673293
chr4    190214555    58561236    36236976    36331025    58623430    461888    1503429
chr5    181538259    54053328    35315012    35401468    54213385    2555066    1523709
chr6    170805979    51345477    33646690    33713330    51373025    727457    1511189
chr7    159345973    47058248    32317984    32378859    47215040    375842    1622825
chr8    145138636    43333530    29030173    29103787    43300646    370500    1338200
chr9    138394717    35736329    25099811    25170662    35783748    16604167    1255728

Overall stats

By adding the option -summary, you can get the overall stats for all sequences:

$ faCount GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -summary | head
#seq    len    A    C    G    T    N    cpg
total    3099922541    866420001    598683433    600854940    868918077    165046090    29303965
prcnt    1.0      0.2795    0.1931    0.1938    0.2803    0.0532    0.0095
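
Judging from the numbers above, the prcnt row appears to be each total divided by the total length, rounded to four decimal places. The Python sketch below checks that reading against the output shown; parse_facount is a hypothetical helper, and whitespace-separated columns are an assumption about the output format.

```python
def parse_facount(text):
    """Parse faCount-style output into {row_name: {column: number}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("#"):
            header = fields[1:]  # column names after '#seq'
            continue
        rows[fields[0]] = dict(zip(header, (float(x) for x in fields[1:])))
    return rows


# The -summary output from the example above, pasted verbatim.
summary = """\
#seq    len    A    C    G    T    N    cpg
total    3099922541    866420001    598683433    600854940    868918077    165046090    29303965
prcnt    1.0      0.2795    0.1931    0.1938    0.2803    0.0532    0.0095
"""

rows = parse_facount(summary)
total = rows["total"]

# Divide every column of the total row by the total length.
fractions = {col: round(n / total["len"], 4) for col, n in total.items()}
print(fractions)
print(fractions == rows["prcnt"])
```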

Both strand stats

Use the -strands option to count base occurrences on both the forward and reverse strands of the human reference genome (hg38):

$ faCount GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -strands | head
#seq    len    A    C    G    T    N    cpg
chr1    497912844    134314441    96166571    96166571    134314441    36950820    4750318
chr2    484387058    143779145    96769083    96769083    143779145    3290602    4385340
chr3    396591118    119522393    78577742    78577742    119522393    390848    3346586
chr4    380429110    117184666    72568001    72568001    117184666    923776    3006858
chr5    363076518    108266713    70716480    70716480    108266713    5110132    3047418
chr6    341611958    102718502    67360020    67360020    102718502    1454914    3022378
chr7    318691946    94273288    64696843    64696843    94273288    751684    3245650
chr8    290277272    86634176    58133960    58133960    86634176    741000    2676400
chr9    276789434    71520077    50270473    50270473    71520077    33208334    2511456
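
In the both-strand output, the A and T columns are equal, as are C and G, and every length is double the single-strand value. This is what one would expect if the reverse complement is counted alongside the forward strand. The Python sketch below derives both-strand counts from the forward-strand chr1 row shown earlier; both_strand_counts is a hypothetical helper for illustration, not part of faCount.

```python
def both_strand_counts(fwd):
    """Derive both-strand counts from forward-strand counts.

    On the reverse complement every A reads as T, every C as G,
    N stays N, and a CG dinucleotide reads as CG again.
    """
    return {
        "len": 2 * fwd["len"],
        "A": fwd["A"] + fwd["T"],
        "C": fwd["C"] + fwd["G"],
        "G": fwd["G"] + fwd["C"],
        "T": fwd["T"] + fwd["A"],
        "N": 2 * fwd["N"],
        "cpg": 2 * fwd["cpg"],
    }


# Forward-strand chr1 row from the per-sequence example.
chr1 = {
    "len": 248956422,
    "A": 67070277,
    "C": 48055043,
    "G": 48111528,
    "T": 67244164,
    "N": 18475410,
    "cpg": 2375159,
}
print(both_strand_counts(chr1))
```

The derived values agree with the chr1 row of the -strands output above.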

