Perform Fisher's test to see if the amount of overlap between the two sets of intervals is more or less than expected, given their coverage and the genome size.
bedtools fisher [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam> -g <genome file>
This tool is part of the bedtools suite, and it's also known as fisher.
This implementation first calculates the number of overlaps and the number of intervals unique to each file, and it infers (or accepts) the number of intervals that are not present in either file. It then constructs a 2x2 contingency table and performs Fisher's exact test.
| | in -b | not in -b |
|---|---|---|
| in -a | Number of overlaps (denoted n11) | Number of query intervals - Number of overlaps (denoted n12) |
| not in -a | Number of db intervals - Number of overlaps (denoted n21) | Number of possible intervals - n11 - n12 - n21 |
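Given the four cells of the table, the test itself can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the bedtools implementation: the function names are hypothetical, and the two-tailed p-value uses the common "sum of all tables no more likely than the observed one" convention, which is an assumption here.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_pmf(k, row1, col1, total):
    """P(n11 = k) when the row and column totals of the table are held fixed."""
    return exp(
        _log_comb(col1, k)
        + _log_comb(total - col1, row1 - k)
        - _log_comb(total, row1)
    )


def fisher_exact(n11, n12, n21, n22):
    """Return (left, right, two_tail) p-values for a 2x2 table of counts."""
    row1 = n11 + n12            # intervals in -a
    col1 = n11 + n21            # intervals in -b
    total = n11 + n12 + n21 + n22
    lo = max(0, row1 + col1 - total)   # smallest feasible n11
    hi = min(row1, col1)               # largest feasible n11
    pmf = {k: hypergeom_pmf(k, row1, col1, total) for k in range(lo, hi + 1)}
    left = sum(p for k, p in pmf.items() if k <= n11)
    right = sum(p for k, p in pmf.items() if k >= n11)
    # Assumed convention: add up every table at most as likely as the observed one.
    observed = pmf[n11]
    two_tail = sum(p for p in pmf.values() if p <= observed * (1 + 1e-7))
    return min(left, 1.0), min(right, 1.0), min(two_tail, 1.0)


# Toy counts (hypothetical): 10 query intervals, 8 db intervals,
# 6 overlaps, 30 possible intervals.
left, right, two_tail = fisher_exact(6, 4, 2, 18)
print(left, right, two_tail)
```

A small right-tail p-value suggests more overlap than expected by chance; a small left-tail p-value suggests less.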
The total number of possible intervals is based on a heuristic that uses the mean sizes of the intervals in the a and b sets and the size of the genome. For example, if the average sizes of intervals in a and b are 100 and 150, respectively, and the genome has 5,000 bp, then this implementation estimates around 20 possible intervals in total. Before using this tool, please consider carefully whether this heuristic fits your assumptions.
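One formula consistent with the example above is the genome size divided by the sum of the two mean interval lengths. The sketch below assumes that form; the function name is hypothetical, and the exact expression used by the tool should be checked against its source before relying on it.

```python
def estimate_possible_intervals(genome_size, mean_len_a, mean_len_b):
    """Estimate the number of possible intervals.

    Assumed form of the heuristic: genome size divided by the sum of
    the mean interval lengths of the a and b sets.
    """
    return int(genome_size / (mean_len_a + mean_len_b))


# Values from the example in the text: means of 100 and 150 on a 5,000 bp genome.
print(estimate_possible_intervals(5000, 100, 150))  # 20
```

Because this estimate sets the "not in -a, not in -b" cell of the table, any mismatch between the heuristic and your data feeds directly into the p-values.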
- The genome file passed to -g can be generated with a tool such as fetchChromSizes.
- Input files must be sorted by chromosome and then by start position (for example, sort -k1,1 -k2,2n in.bed > in.sorted.bed or bedtools sort for BED files).
- If -e is used with -f 0.90 and -F 0.10, this requires that either 90% of A is covered OR 10% of B is covered. Without -e, both fractions would have to be satisfied.
- If -f is 0.90 and -r is used, this requires that B overlap at least 90% of A and that A also overlaps at least 90% of B.

Example usage and output:

    $ bedtools fisher -a gcp_chr22.bam \
        -b chr22.test.bed \
        -g GRCh38_no_alt_analysis_set_GCA_000001405.15.genome \
        -bed
    # Number of query intervals: 926535
    # Number of db intervals: 714888
    # Number of overlaps: 622725
    # Number of possible intervals (estimated): 9061211
    # phyper(622725 - 1, 926535, 9061211 - 926535, 714888, lower.tail=F)
    # Contingency Table Of Counts
    #_________________________________________
    #           |  in -b       | not in -b    |
    #     in -a | 622725       | 303810       |
    # not in -a | 92163        | 8042513      |
    #_________________________________________
    # p-values for fisher's exact test
    left    right   two-tail    ratio
    1       0       0           178.867
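The numbers in this output can be checked by hand. The sketch below rebuilds the four table cells from the reported counts and computes the sample odds ratio n11*n22 / (n12*n21). Reading the ratio column as this odds ratio is an assumption, though the arithmetic agrees with the reported value.

```python
# Counts copied from the header lines of the example output.
n_query = 926535
n_db = 714888
n_overlaps = 622725
n_possible = 9061211

# Cells of the 2x2 contingency table.
n11 = n_overlaps                       # in -a and in -b
n12 = n_query - n_overlaps             # in -a, not in -b
n21 = n_db - n_overlaps                # in -b, not in -a
n22 = n_possible - n11 - n12 - n21     # in neither

# Sample odds ratio (assumed meaning of the "ratio" column).
odds_ratio = (n11 * n22) / (n12 * n21)

print(n11, n12, n21, n22)    # 622725 303810 92163 8042513
print(round(odds_ratio, 3))  # 178.867
```

A ratio well above 1, together with a right-tail p-value of 0, indicates far more overlap between the two interval sets than the estimated background would predict.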