Category

Genomic Interval Manipulation


Usage

bedtools subtract [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>


Manual

This tool is part of the bedtools suite and is also known as subtractBed.

bedtools subtract searches for features in B that overlap A by at least the fraction of A given by the -f option (by default, any overlap of 1 bp or more). If an overlapping feature is found in B, the overlapping portion is removed from A and the remaining portion of A is reported. If a feature in B overlaps all of a feature in A, the A feature is not reported. If no feature in B overlaps a feature in A by at least the -f fraction, the A feature is reported in its entirety.
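The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior for intervals on a single chromosome, not bedtools' actual implementation; the function name `subtract` and the `min_frac` parameter (standing in for -f) are hypothetical.

```python
def subtract(a, b_features, min_frac=1e-9):
    """Return the pieces of interval `a` left after removing qualifying B overlaps.

    `a` and every B feature are (start, end) tuples on the same chromosome,
    in BED-style half-open coordinates. `min_frac` plays the role of -f.
    """
    a_start, a_end = a
    a_len = a_end - a_start
    remaining = [(a_start, a_end)]
    for b_start, b_end in sorted(b_features):
        overlap = min(a_end, b_end) - max(a_start, b_start)
        # Ignore B features that miss A or cover less than min_frac of A.
        if overlap <= 0 or overlap / a_len < min_frac:
            continue
        next_remaining = []
        for start, end in remaining:
            if b_start > start:
                # Part of this piece lies to the left of B.
                next_remaining.append((start, min(end, b_start)))
            if b_end < end:
                # Part of this piece lies to the right of B.
                next_remaining.append((max(start, b_end), end))
        remaining = next_remaining
    return remaining
```

For instance, `subtract((100, 200), [(180, 300)])` returns `[(100, 180)]`, while an A interval fully covered by B yields an empty list.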

Required arguments

  • -a string: Path to the input file A in bed/gff/vcf format.
  • -b string: Path to the input file B in bed/gff/vcf format.

Options

  • -A: Remove the entire A feature if it has any overlap with B. By default, only the portion of A that overlaps B is subtracted. With -A, any overlap (or an overlap meeting the -f fraction, if given) removes the whole feature.
  • -N: Same as -A, except that when used with -f, the overlap amount is the sum over all overlapping B features rather than the overlap with any single feature.
  • -wb: Write the original entry in B for each overlap. Useful for knowing what A overlaps. Restricted by -f and -r.
  • -wo: Write the original A and B entries plus the number of base pairs of overlap between the two features. Overlaps restricted by -f and -r. Only A features with overlap are reported.
  • -s: Require same strandedness. That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
  • -S: Require different strandedness. That is, only report hits in B that overlap A on the opposite strand. By default, overlaps are reported without respect to strand.
  • -f float: Minimum overlap required as a fraction of A. Default is $10^{-9}$ (i.e., 1bp).
  • -F float: Minimum overlap required as a fraction of B. Default is $10^{-9}$ (i.e., 1bp).
  • -r: Require that the fraction overlap be reciprocal for A AND B. In other words, if -f 0.90 and -r are used, this requires that B overlap 90% of A and A also overlaps 90% of B.
  • -e: Require that the minimum fraction be satisfied for A OR B. In other words, if -e is used with -f 0.90 and -F 0.10, this requires that either 90% of A is covered OR 10% of B is covered. Without -e, both fractions would have to be satisfied.
  • -split: Treat "split" BAM or BED12 entries as distinct BED intervals.
  • -g genome_file: Provide a genome file to enforce consistent chromosome sort order across input files. Only applies when used with -sorted option.
  • -nonamecheck: For sorted data, don't throw an error if the files use different naming conventions for the same chromosome, e.g., "chr1" vs "chr01".
  • -sorted: Use the "chromsweep" algorithm for sorted (-k1,1 -k2,2n) input.
  • -bed: If using BAM input, write output as BED.
  • -header: Print the header from the A file prior to results.
  • -nobuf: Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time.
  • -iobuf integer: Specify the amount of memory to use for input buffer. Optional suffixes K/M/G supported. Note: currently has no effect with compressed files.
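How -f, -F, -r, and -e combine can be easier to follow as a predicate. The sketch below is an assumption-laden reading of the option descriptions above, not bedtools source code; `overlap_qualifies` and its keyword names are hypothetical, and it treats -r as applying the -f fraction to B as well.

```python
def overlap_qualifies(a, b, f=1e-9, F=1e-9, reciprocal=False, either=False):
    """Decide whether the overlap between A and B meets the fraction thresholds.

    a, b       -- (start, end) tuples in half-open coordinates
    f, F       -- minimum overlap as a fraction of A and of B (like -f / -F)
    reciprocal -- like -r: the -f fraction must also hold for B
    either     -- like -e: satisfying A's OR B's threshold is enough
    """
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False
    if reciprocal:
        F = f  # assumed reading of -r: reuse the A fraction for B
    a_ok = overlap / (a_end - a_start) >= f
    b_ok = overlap / (b_end - b_start) >= F
    return (a_ok or b_ok) if either else (a_ok and b_ok)
```

With A at 100-200 and B at 180-300, the overlap is 20 bp, which is 20% of A and about 16.7% of B, so `f=0.80, F=0.10` fails without `either=True` and passes with it.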

Examples

By default, bedtools subtract removes each overlapping interval in B from A. If a feature in B completely overlaps a feature in A, the A feature is removed.

$ cat A.bed
chr1  10   20
chr1  100  200

$ cat B.bed
chr1  0    30
chr1  180  300

$ bedtools subtract -a A.bed -b B.bed
chr1  100  180

Requiring a minimal overlap fraction before subtracting

This option behaves the same as the -f option for bedtools intersect. In this case, subtract will only subtract an overlap with B if it covers at least the fraction of A defined by -f. If an overlap is found, but it does not meet the overlap fraction, the original A feature is reported without subtraction.

$ cat A.bed
chr1  100  200

$ cat B.bed
chr1  180  300

$ bedtools subtract -a A.bed -b B.bed -f 0.10
chr1  100  180

$ bedtools subtract -a A.bed -b B.bed -f 0.80
chr1  100  200
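
As a worked check of the two runs above, using only the coordinates shown in the example, the overlap between A and B and its fraction of A are:

$$\text{overlap} = \min(200, 300) - \max(100, 180) = 20\ \text{bp}, \qquad \frac{\text{overlap}}{|A|} = \frac{20}{200 - 100} = 0.20$$

Since $0.20 \ge 0.10$, the overlap is subtracted with -f 0.10. Since $0.20 < 0.80$, A is reported intact with -f 0.80.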

Enforcing same strandedness

This option behaves the same as the -s option for bedtools intersect while scanning for features in B that should be subtracted from A.

$ cat A.bed
chr1  100  200    a1  1   +

$ cat B.bed
chr1  80   120    b1  1   +
chr1  180  300    b2  1   -

$ bedtools subtract -a A.bed -b B.bed -s
chr1  120  200    a1  1   +

Enforcing opposite strandedness

This option behaves the same as the -S option for bedtools intersect while scanning for features in B that should be subtracted from A.

$ cat A.bed
chr1  100  200    a1  1   +

$ cat B.bed
chr1  80   120    b1  1   +
chr1  180  300    b2  1   -

$ bedtools subtract -a A.bed -b B.bed -S
chr1  100  180    a1  1   +
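
Both strand options can be thought of as a filter on which B features are eligible for subtraction. The following is a minimal sketch under that assumption; `strand_filter` and its `mode` values are hypothetical names, and the handling of features with an unknown strand (".") is deliberately not modeled.

```python
def strand_filter(a_strand, b_features, mode=None):
    """Keep only the B features eligible for subtraction from A.

    b_features -- list of (start, end, name, strand) tuples
    mode       -- None (ignore strand), "same" (like -s), "opposite" (like -S)
    """
    if mode not in (None, "same", "opposite"):
        raise ValueError(f"unknown mode: {mode!r}")
    kept = []
    for feature in b_features:
        b_strand = feature[-1]
        if mode == "same" and b_strand != a_strand:
            continue  # -s: skip B features on the other strand
        if mode == "opposite" and b_strand == a_strand:
            continue  # -S: skip B features on the same strand
        kept.append(feature)
    return kept


b_features = [(80, 120, "b1", "+"), (180, 300, "b2", "-")]
same = strand_filter("+", b_features, mode="same")
opposite = strand_filter("+", b_features, mode="opposite")
```

With A on the + strand, `same` holds only b1 and `opposite` holds only b2, which matches the two outputs above: b1 trims the left end of A under -s, and b2 trims the right end under -S.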

Remove features with any overlap

Unlike the default behavior, the -A option will completely remove a feature from A if it has even 1bp of overlap with a feature in B.
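
The difference between -A and -N when combined with -f can be illustrated with a small sketch. It follows the option descriptions in this manual rather than the bedtools source; `keep_feature` and `sum_overlaps` are hypothetical names, and the naive sum assumes the B features do not overlap one another.

```python
def keep_feature(a, b_features, min_frac=1e-9, sum_overlaps=False):
    """Return True if A would be kept, False if it would be removed entirely.

    sum_overlaps=False mimics -A: any single B overlap reaching min_frac removes A.
    sum_overlaps=True  mimics -N: the summed B overlaps must reach min_frac.
    """
    a_start, a_end = a
    a_len = a_end - a_start
    # Overlap in bp between A and each B feature (0 when they do not touch).
    overlaps = [
        max(0, min(a_end, b_end) - max(a_start, b_start))
        for b_start, b_end in b_features
    ]
    if sum_overlaps:
        total = sum(overlaps)
        hit = total > 0 and total / a_len >= min_frac
    else:
        hit = any(o > 0 and o / a_len >= min_frac for o in overlaps)
    return not hit
```

For example, with A at 100-200 and two B features covering 100-130 and 170-200, neither B reaches `min_frac=0.5` on its own, so the -A style check keeps A, whereas the summed 60% coverage removes it under the -N style check.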

$ cat A.bed
chr1  100  200

$ cat B.bed
chr1  180  300

$ bedtools subtract -a A.bed -b B.bed
chr1  100  180

$ bedtools subtract -a A.bed -b B.bed -A

File formats this tool works with

BED, GFF, GTF, VCF
