Category

Genomic Interval Manipulation


Usage

bedtools sample [OPTIONS] -i <bed/gff/vcf/bam>


Manual

This tool is part of the bedtools suite.

bedtools sample is a command-line tool that draws a random sample of records from an input file using the reservoir sampling algorithm. It can handle various file formats such as BED, GFF, VCF, and BAM. Ensure you have enough memory for the requested sample size, as all selected records are held in memory before output.
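The reservoir sampling idea can be sketched in a few lines of awk. This is a minimal illustration of the textbook "Algorithm R", not the bedtools C++ implementation. The function name reservoir_sample and its two positional arguments (sample size and seed, standing in for -n and -seed) are hypothetical. As with bedtools, at most n records are held in memory at any time.

```shell
# Hypothetical helper: reservoir_sample N SEED < records
# Illustrates textbook reservoir sampling (Algorithm R); it is NOT the bedtools code.
reservoir_sample() {
  awk -v k="$1" -v seed="$2" '
    BEGIN { srand(seed) }              # fixed seed => reproducible sample
    NR <= k { slot[NR] = $0; next }    # fill the reservoir with the first k records
    {
      # Pick j uniformly from 1..NR and overwrite slot j only if it lies in the
      # reservoir, so the current record is kept with probability k/NR.
      j = int(rand() * NR) + 1
      if (j <= k) slot[j] = $0
    }
    END {
      n = (NR < k) ? NR : k            # fewer than k records: output all of them
      for (i = 1; i <= n; i++) print slot[i]
    }'
}
```

For example, `seq 1 100 | reservoir_sample 10 42` prints 10 distinct numbers, and rerunning it with the same seed prints the same 10. The sample is uniform because a record at position i > n enters the reservoir with probability n/i, and each later record j evicts it with probability (n/j)·(1/n) = 1/j, so it survives to the end with probability (n/i)·∏_{j=i+1}^{N} (j−1)/j = (n/i)·(i/N) = n/N. The first n records reach the same n/N via the product starting at j = n+1.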

Required arguments

  • -i input file(s): The input file(s) in BED/GFF/VCF/BAM format.

Options

  • -n INTEGER: The number of records to generate. Default is 1000000.
  • -seed INTEGER: Supply an integer seed for shuffling. The seed is chosen automatically by default.
  • -ubam: Write uncompressed BAM output. Default writes compressed BAM.
  • -s strand: Require same strandedness, i.e. only sample records on one strand. Use '-s forward' or '-s reverse' for forward or reverse strand records, respectively. By default, records are reported without respect to strand.
  • -header: Print the header from the input file prior to results.
  • -bed: If using BAM input, write output as BED.
  • -nobuf: Disable buffered output. Each line of output is printed as it is generated rather than saved in a buffer. This makes writing large outputs noticeably slower, but is useful when other tools and scripts need to process the output one line at a time.
  • -iobuf INTEGER[K/M/G]: Specify the amount of memory to use for the input buffer. Optional suffixes K/M/G are supported. Note: currently has no effect with compressed files.

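To make the -s option concrete, here is a rough awk approximation of which records remain eligible for sampling. It assumes BED6-style input with the strand in column 6, and assumes '-s forward' corresponds to '+' and '-s reverse' to '-'. The function name strand_filter is hypothetical and only illustrates the selection rule, not how bedtools applies it internally.

```shell
# Hypothetical helper: strand_filter forward|reverse < records.bed
# Keeps only records whose 6th (strand) column matches the requested strand.
strand_filter() {
  local want
  case "$1" in
    forward) want="+" ;;
    reverse) want="-" ;;
    *) echo "strand_filter: expected 'forward' or 'reverse'" >&2; return 1 ;;
  esac
  # Tab-separated input; print a line only when column 6 equals the wanted strand.
  awk -F '\t' -v want="$want" '$6 == want'
}
```

Sampling from the filtered stream, for example `strand_filter forward < input.bed`, then gives a sample drawn only from '+' records.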
Examples

Randomly sample a subset of lines from a BED file

In the following example, bedtools sample randomly samples 8000000 records (as specified by the -n option) without replacement from the input file (input.bed). The result is saved to the output file sampled.bed.

$ bedtools sample -i input.bed -n 8000000 > sampled.bed

# let's take a look at the input file
$ head input.bed
chr2	186011035	186011111	N	1000	-
chr2	186010990	186011048	N	1000	+
chr3	160860209	160860285	N	1000	-
chr3	160860106	160860164	N	1000	+
chr2	69367652	69367728	N	1000	+
chr2	69367701	69367759	N	1000	-
chr1	173848214	173848277	N	1000	+
chr1	173848219	173848277	N	1000	-
chr6	110537162	110537238	N	1000	-
chr6	110537086	110537144	N	1000	+

# let's take a look at the sampled output
$ head sampled.bed
chr14	106356536	106356612	N	1000	+
chr14	102972879	102972937	N	1000	-
chr1	224182389	224182461	N	1000	+
chr3	160860106	160860164	N	1000	+
chr5	87127382	87127440	N	1000	-
chr2	69367701	69367759	N	1000	-
chr1	173848214	173848277	N	1000	+
chr12	75556438	75556514	N	1000	+
chr14	73977197	73977273	N	1000	+
chrX	123428052	123428128	N	1000	-
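The two properties described in this example (the requested number of records, drawn without replacement) can be spot-checked with standard Unix tools. The sketch below assumes the input has at least n records and that neither file has a header line (i.e. -header was not used); the helper name check_sample is hypothetical. It sorts both files, so it needs temporary disk space proportional to the input.

```shell
# Hypothetical helper: check_sample N INPUT SAMPLE
# Succeeds if SAMPLE has exactly N lines and no line appears in SAMPLE
# more often than it appears in INPUT (consistent with sampling without replacement).
check_sample() {
  local n="$1" input="$2" sample="$3"
  [ "$(wc -l < "$sample")" -eq "$n" ] || return 1
  # comm -13 prints lines found only in the second sorted stream, counting duplicates,
  # so an empty result means SAMPLE is a sub-multiset of INPUT.
  [ -z "$(LC_ALL=C comm -13 <(LC_ALL=C sort "$input") <(LC_ALL=C sort "$sample"))" ]
}
```

For this example that would be `check_sample 8000000 input.bed sampled.bed && echo ok`.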

File formats this tool works with
BED, BAM, GFF, GTF, VCF
