Merges overlapping BED/GFF/VCF entries into a single interval.
bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
This tool is part of the bedtools suite.
Multiple operations can be specified in a comma-delimited list. If there is only one column but multiple operations, all operations are applied to that column. Likewise, if there is only one operation but multiple columns, that operation is applied to all columns. Otherwise, the number of columns must match the number of operations, and the operations are applied in respective order. E.g., "-c 5,4,6 -o sum,mean,count" will give the sum of column 5, the mean of column 4, and the count of column 6. The order of output columns will match the ordering given in the command.
-delim "|". Default: ",".
By default, merge combines overlapping (by at least 1 bp) and/or bookended intervals into a single, “flattened” or “merged” interval.
$ cat A.bed
chr1 100 200
chr1 180 250
chr1 250 500
chr1 501 1000
$ bedtools merge -i A.bed
chr1 100 500
chr1 501 1000
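The default behavior shown above can be sketched in a few lines of Python. This is only an illustration of the merging rule, not how bedtools is implemented; the function name merge_intervals is hypothetical, and the sketch assumes BED-style half-open coordinates, so an interval that starts exactly where the previous one ends counts as bookended.

```python
def merge_intervals(intervals):
    """Merge (chrom, start, end) tuples that overlap or are bookended.

    Hypothetical helper for illustration; not part of bedtools.
    """
    merged = []
    # Sorting by chromosome, then start, lets us compare each interval
    # only with the most recently emitted merged interval.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            # Overlapping or bookended: extend the previous interval.
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


a_bed = [
    ("chr1", 100, 200),
    ("chr1", 180, 250),
    ("chr1", 250, 500),
    ("chr1", 501, 1000),
]

for chrom, start, end in merge_intervals(a_bed):
    print(chrom, start, end, sep="\t")
```

With the A.bed intervals above, the sketch yields chr1 100 500 and chr1 501 1000: the last interval stays separate because it starts 1 bp past the end of the merged one.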
The -s option will only merge intervals that are overlapping/bookended and are on the same strand.
$ cat A.bed
chr1 100 200 a1 1 +
chr1 180 250 a2 2 +
chr1 250 500 a3 3 -
chr1 501 1000 a4 4 +
$ bedtools merge -i A.bed -s
chr1 100 250
chr1 501 1000
chr1 250 500
To also report the strand, you could use the -c and -o operators (see below for more details):
$ bedtools merge -i A.bed -s -c 6 -o distinct
chr1 100 250 +
chr1 501 1000 +
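As a rough illustration of what -s does, the following Python sketch merges intervals only when they share both chromosome and strand. It is a sketch under assumptions, not bedtools' implementation: merge_by_strand is a hypothetical name, and records are reduced to (chrom, start, end, strand) tuples.

```python
def merge_by_strand(records):
    """Merge (chrom, start, end, strand) tuples within each strand.

    Hypothetical helper for illustration; not part of bedtools.
    """
    merged = []
    # Sort by chromosome, strand, then coordinates so that each strand
    # is processed as one contiguous run.
    ordered = sorted(records, key=lambda r: (r[0], r[3], r[1], r[2]))
    for chrom, start, end, strand in ordered:
        prev = merged[-1] if merged else None
        if prev and prev[0] == chrom and prev[3] == strand and start <= prev[2]:
            # Same strand and overlapping/bookended: extend the interval.
            merged[-1] = (chrom, prev[1], max(prev[2], end), strand)
        else:
            merged.append((chrom, start, end, strand))
    return merged


a_bed = [
    ("chr1", 100, 200, "+"),
    ("chr1", 180, 250, "+"),
    ("chr1", 250, 500, "-"),
    ("chr1", 501, 1000, "+"),
]

for record in merge_by_strand(a_bed):
    print(*record, sep="\t")
```

The minus-strand interval chr1 250 500 is kept apart even though it is bookended with chr1 180 250, because the two lie on different strands.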
The -S option will only merge intervals for a specific strand. For example, to only report merged intervals on the + strand:
$ cat A.bed
chr1 100 200 a1 1 +
chr1 180 250 a2 2 +
chr1 250 500 a3 3 -
chr1 501 1000 a4 4 +
$ bedtools merge -i A.bed -S +
chr1 100 250
chr1 501 1000
To also report the strand, you could use the -c and -o operators (see below for more details):
$ bedtools merge -i A.bed -S + -c 6 -o distinct
chr1 100 250 +
chr1 501 1000 +
By default, only overlapping or book-ended features are combined into a new feature. However, one can force merge to combine more distant features with the -d option. For example, were one to set -d 1000, any features that overlap or are within 1000 base pairs of one another will be combined.
$ cat A.bed
chr1 100 200
chr1 501 1000
$ bedtools merge -i A.bed
chr1 100 200
chr1 501 1000
$ bedtools merge -i A.bed -d 1000
chr1 100 1000
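The effect of -d can be sketched in Python by allowing a gap between consecutive intervals. This is an illustrative sketch, not bedtools' code: merge_with_distance and max_distance are hypothetical names, and the sketch assumes the distance is measured as the next start minus the previous end in BED coordinates.

```python
def merge_with_distance(intervals, max_distance=0):
    """Merge (chrom, start, end) tuples separated by at most max_distance bp.

    Hypothetical helper for illustration; max_distance=0 merges only
    overlapping or bookended intervals.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        # Assumption: the gap is the next start minus the previous end.
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_distance:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


a_bed = [("chr1", 100, 200), ("chr1", 501, 1000)]

print(merge_with_distance(a_bed))                     # two separate intervals
print(merge_with_distance(a_bed, max_distance=1000))  # one interval: chr1 100 1000
```

With a distance of 1000, the two intervals from the example collapse into a single interval spanning 100 to 1000.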
When merging intervals, we often want to summarize or keep track of the values observed in specific columns (e.g., the feature name or score) from the original, unmerged intervals. When used together, the -c and -o options allow one to select specific columns (-c) and apply an operation (-o) to each column. The result will be appended to the default, merged interval output. For example, one could use the following to report the count of intervals that were merged into each resulting interval (this replaces the -n option that existed prior to version 2.20.0).
$ cat A.bed
chr1 100 200
chr1 180 250
chr1 250 500
chr1 501 1000
$ bedtools merge -i A.bed -c 1 -o count
chr1 100 500 3
chr1 501 1000 1
We could also use these options to report the mean of the score (#5) field:
$ cat A.bed
chr1 100 200 a1 1 +
chr1 180 250 a2 2 +
chr1 250 500 a3 3 -
chr1 501 1000 a4 4 +
$ bedtools merge -i A.bed -c 5 -o mean
chr1 100 500 2
chr1 501 1000 4
Let’s get fancy and report the mean, min, and max of the score column:
$ bedtools merge -i A.bed -c 5 -o mean,min,max
chr1 100 500 2 1 3
chr1 501 1000 4 4 4
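The command above gives one column and three operations, which relies on the pairing rule described earlier for -c and -o. A minimal Python sketch of that rule follows; pair_columns_and_operations is a hypothetical helper written for illustration and is not part of bedtools.

```python
def pair_columns_and_operations(columns, operations):
    """Pair each -c column with an -o operation (hypothetical helper)."""
    if len(columns) == 1:
        # One column, many operations: every operation uses that column.
        columns = columns * len(operations)
    elif len(operations) == 1:
        # Many columns, one operation: the operation applies to each column.
        operations = operations * len(columns)
    elif len(columns) != len(operations):
        raise ValueError("number of columns must match number of operations")
    # Output order follows the order given on the command line.
    return list(zip(columns, operations))


print(pair_columns_and_operations([5], ["mean", "min", "max"]))
print(pair_columns_and_operations([5, 4, 6], ["sum", "mean", "count"]))
```

The first call pairs column 5 with each of mean, min, and max; the second pairs the columns and operations position by position.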
Let’s also report a comma-separated list of the strands:
$ bedtools merge -i A.bed -c 5,5,5,6 -o mean,min,max,collapse
chr1 100 500 2 1 3 +,+,-
chr1 501 1000 4 4 4 +
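To make the column summaries concrete, here is a small Python sketch that merges the example records and applies one operation per selected column. It is an illustration under assumptions rather than bedtools' implementation: merge_and_summarize and the OPERATIONS table are hypothetical, only a handful of operations are covered, the input is assumed to be sorted by chromosome and start, and numbers are printed in Python's compact "g" format.

```python
from statistics import mean

# Hypothetical subset of -o operations; each takes the string values of
# one column across the records merged into a single interval.
OPERATIONS = {
    "count": lambda values: str(len(values)),
    "mean": lambda values: format(mean(float(v) for v in values), "g"),
    "min": lambda values: format(min(float(v) for v in values), "g"),
    "max": lambda values: format(max(float(v) for v in values), "g"),
    "collapse": lambda values: ",".join(values),
}


def merge_and_summarize(rows, columns, operations):
    """Merge sorted BED rows and summarize the given 1-based columns."""
    clusters = []  # each entry: [chrom, start, end, member_rows]
    for row in rows:
        chrom, start, end = row[0], int(row[1]), int(row[2])
        if clusters and clusters[-1][0] == chrom and start <= clusters[-1][2]:
            clusters[-1][2] = max(clusters[-1][2], end)
            clusters[-1][3].append(row)
        else:
            clusters.append([chrom, start, end, [row]])
    output = []
    for chrom, start, end, members in clusters:
        summary = [
            OPERATIONS[op]([member[col - 1] for member in members])
            for col, op in zip(columns, operations)
        ]
        output.append([chrom, str(start), str(end)] + summary)
    return output


a_bed = [
    "chr1 100 200 a1 1 +".split(),
    "chr1 180 250 a2 2 +".split(),
    "chr1 250 500 a3 3 -".split(),
    "chr1 501 1000 a4 4 +".split(),
]

for fields in merge_and_summarize(a_bed, [5, 5, 5, 6], ["mean", "min", "max", "collapse"]):
    print("\t".join(fields))
```

For the A.bed records above, the sketch reproduces the two output rows of the last command: the mean, min, and max of the scores, followed by the collapsed strands.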
Hopefully this provides a clear picture of what can be done.