Print total base count and related statistics for sequences stored in FASTA files.
faSize file(s).fa
This tool is part of UCSC Genome Browser's utilities.
By default, faSize reports the total number of bases, broken down into hard-masked bases (Ns), soft-masked bases (lower case), and unmasked bases (upper case). It also prints the mean, standard deviation, minimum, maximum, and median of the sequence sizes. The following example shows the summary statistics for the human reference genome (hg38):
    $ faSize GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta
    3099922541 bases (165046090 N's 2934876451 real 2934876451 upper 0 lower) in 195 sequences in 1 files
    Total size: mean 15897038.7 sd 46804464.6 min 970 (chrUn_KI270394v1) max 248956422 (chr1) median 32032
    N count: mean 846390.2 sd 3850369.1
    U count: mean 15050648.5 sd 45227268.4
    L count: mean 0.0 sd 0.0
    %0.00 masked total, %0.00 masked real
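To make the categories in the summary line concrete, here is a minimal Python sketch that tallies N, upper-case, and lower-case bases per sequence. It is an illustration, not faSize's implementation: the function name `count_bases` is hypothetical, and counting a lower-case `n` as an N is an assumption.

```python
def count_bases(fasta_lines):
    """Tally bases per sequence in the categories shown in the summary.

    Illustrative only; this is not how faSize itself is implemented.
    """
    stats = {}       # sequence name -> {"size", "n", "upper", "lower"}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: take the first word after '>' as the name.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            stats[current] = {"size": 0, "n": 0, "upper": 0, "lower": 0}
            continue
        if current is None:
            continue  # sequence data before any header: ignore
        rec = stats[current]
        for base in line:
            rec["size"] += 1
            if base in "Nn":          # assumption: 'n' counts as N too
                rec["n"] += 1         # hard-masked
            elif base.islower():
                rec["lower"] += 1     # soft-masked
            else:
                rec["upper"] += 1     # unmasked
    return stats


example = [">seq1 demo", "ACGTNNacgt", ">seq2", "NNNN"]
per_seq = count_bases(example)
total = sum(r["size"] for r in per_seq.values())
n_total = sum(r["n"] for r in per_seq.values())
real_total = total - n_total          # "real" bases are the non-N bases
print(total, n_total, real_total)     # prints: 14 6 8
```

In the hg38 output above, the same relationship holds: 3099922541 total bases minus 165046090 Ns gives 2934876451 real bases.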
With the -tab option, faSize prints the stats in a tab-separated format:
    $ faSize -tab GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta
    baseCount 3099922541
    nBaseCount 165046090
    realBaseCount 2934876451
    upperBaseCount 2934876451
    lowerBaseCount 0
    seqCount 195
    fileCount 1
    meanSize 15897038.7
    SdSize 46804464.6
    minSize 970
    minSeqSize chrUn_KI270394v1
    maxSize 248956422
    maxSeqSize chr1
    medianSize 32032
    nCountMean 846390.2
    nCountSd 3850369.1
    upperCountMean 15050648.5
    upperCountSd 45227268.4
    lowerCountMean 0.0
    lowerCountSd 0.0
    fracMasked 0.00
    fracRealMasked 0.00
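The key/value layout is convenient for scripts. As a rough sketch, the output could be loaded into a Python dictionary like this; `parse_fasize_tab` is a hypothetical helper, and the sample text is copied from the example above rather than produced by running the tool.

```python
def parse_fasize_tab(text):
    """Parse `faSize -tab` output into a dict, one key/value pair per line."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        key, value = fields
        try:
            stats[key] = int(value)
        except ValueError:
            try:
                stats[key] = float(value)
            except ValueError:
                stats[key] = value  # non-numeric, e.g. a sequence name
    return stats


# A few lines taken from the hg38 example output.
sample = (
    "baseCount\t3099922541\n"
    "nBaseCount\t165046090\n"
    "realBaseCount\t2934876451\n"
    "meanSize\t15897038.7\n"
    "minSeqSize\tchrUn_KI270394v1\n"
)
tab_stats = parse_fasize_tab(sample)
n_fraction = tab_stats["nBaseCount"] / tab_stats["baseCount"]
print(f"{n_fraction:.2%} of bases are N")
```

In practice the text would come from the command itself, for example by capturing its standard output with `subprocess.run(..., capture_output=True, text=True)`.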
With the -detailed option, faSize prints the name and size of each sequence in the FASTA file instead of the summary:
    $ faSize -detailed GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta | head -n 5
    chr1 248956422
    chr2 242193529
    chr3 198295559
    chr4 190214555
    chr5 181538259
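The per-sequence sizes can be summarized in a script. The sketch below reads name/size pairs into a dictionary and derives a few of the statistics that the default summary reports; `parse_detailed` is a hypothetical helper, and the sample holds only the five rows shown above, so the numbers it produces describe those rows, not the whole genome.

```python
import statistics


def parse_detailed(text):
    """Parse `faSize -detailed` output into a {name: size} dict."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


# The five rows from the example above.
detailed_sample = """\
chr1 248956422
chr2 242193529
chr3 198295559
chr4 190214555
chr5 181538259
"""
sizes = parse_detailed(detailed_sample)
largest = max(sizes, key=sizes.get)
smallest = min(sizes, key=sizes.get)
median_size = statistics.median(sizes.values())
print(largest, smallest, median_size)  # prints: chr1 chr5 198295559
```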
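With the -veryDetailed option, faSize prints six columns for each sequence in the FASTA file: name, size, number of Ns, number of real (non-N) bases, number of upper-case bases, and number of lower-case bases: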
    $ faSize -veryDetailed GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta | head -n 5
    chr1 248956422 18475410 230481012 230481012 0
    chr2 242193529 1645301 240548228 240548228 0
    chr3 198295559 195424 198100135 198100135 0
    chr4 190214555 461888 189752667 189752667 0
    chr5 181538259 2555066 178983193 178983193 0
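As a sketch of how these rows might be consumed, the Python snippet below parses them into named records and computes the fraction of Ns per sequence. The column order (name, size, Ns, real, upper, lower) is an assumption that is consistent with the rows above, where Ns plus real bases equals the size; `SeqStats` and `parse_very_detailed` are hypothetical names.

```python
from typing import NamedTuple


class SeqStats(NamedTuple):
    name: str
    size: int
    n: int       # hard-masked bases (Ns)
    real: int    # non-N bases
    upper: int   # upper-case bases
    lower: int   # lower-case (soft-masked) bases


def parse_very_detailed(text):
    """Parse `faSize -veryDetailed` rows, assuming the six-column order above."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        rows.append(SeqStats(fields[0], *map(int, fields[1:])))
    return rows


# Two rows from the example above.
very_detailed_sample = """\
chr1 248956422 18475410 230481012 230481012 0
chr2 242193529 1645301 240548228 240548228 0
"""
rows = parse_very_detailed(very_detailed_sample)
for row in rows:
    # Sanity check on the assumed column order: Ns + real bases = size.
    assert row.n + row.real == row.size
    print(f"{row.name}\t{row.n / row.size:.2%} N")
```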