Category

Genomic Interval Manipulation


Usage

bedtools reldist [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>


Manual

This tool is part of the bedtools suite.

Traditional methods of comparing two sets of genomic intervals often rely on the number or fraction of overlapping intervals. However, these methods can miss spatial correlations in which intervals from the two sets are consistently close to each other but rarely intersect. For instance, enhancers and transcription start sites are typically in close proximity but seldom overlap, so an overlap-based comparison makes them look much like two sets of random intervals.

To address this, Favorov et al. introduced a relative distance metric that captures the distribution of relative distances between each interval in one set and its two nearest intervals in the other set. If the two sets are not spatially correlated, the relative distances are expected to be uniformly distributed between 0 and 0.5. If the intervals are closer than would be expected by chance, the distribution of observed relative distances shifts towards lower values. bedtools reldist implements this idea.

        ~~~~~~~~~~~~~~~r=20~~~~~~~~~~~~~~~~
A:    ====                              ======
B:              =====
        ~~~d1=3~~~|~~~~~~~~~~d2=17~~~~~~~~~|

In the above case, the relative distance is calculated as $\frac{\min(d_1,d_2)}{r}=\frac{\min(3,17)}{20}=\frac{3}{20}=0.15$.
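
The calculation in the diagram can be sketched in a few lines of Python. This is only an illustration of the formula, not the bedtools implementation: the function name `relative_distance` is hypothetical, and it assumes each feature has already been reduced to a single coordinate (the text above does not say how intervals are turned into points).

```python
from bisect import bisect_right

def relative_distance(query, refs):
    """Relative distance of one position to its two flanking reference positions.

    query: a single coordinate (assumption: the feature is reduced to a point).
    refs:  sorted list of reference coordinates from the other set.
    Returns None when the query is not flanked on both sides.
    """
    i = bisect_right(refs, query)
    # A query that coincides with a reference position has distance 0.
    if i > 0 and refs[i - 1] == query:
        return 0.0
    if i == 0 or i == len(refs):
        return None
    d1 = query - refs[i - 1]   # distance to the nearest reference on the left
    d2 = refs[i] - query       # distance to the nearest reference on the right
    r = d1 + d2                # distance between the two flanking references
    return min(d1, d2) / r

# Values from the diagram: d1 = 3, d2 = 17, r = 20
print(relative_distance(13, [10, 30]))  # 0.15
```

Because the smaller of the two distances is divided by their sum, the result can never exceed 0.5, which matches the range quoted above.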

Required arguments

  • -a BED/GFF/VCF file: file A. Each feature in A is compared to B in search of overlaps. Use stdin if passing A with a UNIX pipe.
  • -b BED/GFF/VCF file: file B. Use stdin if passing B with a UNIX pipe.

Options

  • -detail: Instead of a summary, report the relative distance for each interval in A

Examples

By default, bedtools reldist reports the distribution of relative distances between two sets of intervals. The output reports the frequency of each relative distance (ranging from 0.0 to 0.5). If the two sets of intervals are randomly distributed with respect to one another, each relative distance “bin” will be roughly equally represented (i.e., a uniform distribution). For example, consider the relative distance distribution for exons and AluY elements:

$ bedtools reldist \
    -a data/refseq.chr1.exons.bed.gz \
    -b data/aluY.chr1.bed.gz | head -n 5
0.00  164 43408 0.004
0.01  551 43408 0.013
0.02  598 43408 0.014
0.03  637 43408 0.015
0.04  793 43408 0.018

In contrast, consider the relative distance distribution observed between exons and conserved elements:

$ bedtools reldist \
   -a data/refseq.chr1.exons.bed.gz \
   -b data/gerp.chr1.bed.gz | head -n 5
reldist  count total fraction
0.00  20629 43422 0.475
0.01  2629  43422 0.061
0.02  1427  43422 0.033
0.03  985 43422 0.023

Moreover, if one compares a set of intervals against itself, every interval overlaps an interval in the other set (itself). As such, the relative distances will all be 0.0:

$ bedtools reldist \
    -a data/refseq.chr1.exons.bed.gz \
    -b data/refseq.chr1.exons.bed.gz
reldist  count total fraction
0.00  43424 43424 1.000
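
The four summary columns in the examples above (reldist, count, total, fraction) can be reproduced from a list of per-interval relative distances with a short Python sketch. The function name `summarize` is hypothetical, the 0.01 bin width is inferred from the output shown, and rounding each value down to its bin is an assumption rather than documented bedtools behavior.

```python
import math
from collections import Counter

def summarize(rel_dists, bin_width=0.01):
    """Bin relative distances and return (reldist, count, total, fraction) rows.

    Assumption: each value is floored to the lower edge of its bin.
    The small epsilon guards against floating-point error in the division.
    """
    bins = Counter(math.floor(d / bin_width + 1e-9) for d in rel_dists)
    total = len(rel_dists)
    rows = []
    for b in sorted(bins):
        count = bins[b]
        rows.append((round(b * bin_width, 2), count, total, round(count / total, 3)))
    return rows

# Made-up input values, for illustration only
for row in summarize([0.0, 0.0, 0.15, 0.29]):
    print(*row, sep="\t")
```

Under the null expectation of no spatial correlation, the fraction column would be roughly flat across bins; an excess in the lowest bins, as in the conserved-element example, indicates that the two sets are closer than expected by chance.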

File formats this tool works with
BED, GFF, GTF, VCF
