Sample random records from the input BED/GFF/VCF/BAM file(s) without replacement using the reservoir sampling algorithm
bedtools sample [OPTIONS] -i <bed/gff/vcf/bam>
This tool is part of the bedtools suite.
bedtools sample is a command-line tool that draws a random sample of records from the input file(s) using the reservoir sampling algorithm. It handles BED, GFF, VCF, and BAM files. Ensure you have enough memory for the requested sample size, as all selected records are held in memory before output.
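To illustrate why memory use scales with the sample size rather than the input size, here is a minimal sketch of reservoir sampling (the classic "Algorithm R") in Python. It is not the bedtools source code: the function name reservoir_sample, its parameters, and the synthetic BED-like records are assumptions made for illustration only.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(records: Iterable[str], n: int,
                     seed: Optional[int] = None) -> List[str]:
    """Return up to n records chosen uniformly at random, without replacement.

    Hypothetical helper for illustration; it is not part of bedtools.
    Only the n selected records are kept in memory at any time.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, record in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Pick a slot in [0, i]; the new record replaces an existing
            # one only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir


# Usage: sample 5 records from 1000 synthetic BED-like lines.
lines = (f"chr1\t{start}\t{start + 50}" for start in range(0, 50000, 50))
sample = reservoir_sample(lines, n=5, seed=42)
print(len(sample))
```

The input is read once as a stream, so it never has to fit in memory; only the reservoir does. If the input has fewer than n records, every record is returned.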
In the following example, bedtools sample randomly samples 8000000 records (as specified by the -n option) without replacement from the input file (input.bed). The result is saved to the output file sampled.bed.
$ bedtools sample -i input.bed -n 8000000 > sampled.bed

# let's take a look at the input file
$ head input.bed
chr2 186011035 186011111 N 1000 -
chr2 186010990 186011048 N 1000 +
chr3 160860209 160860285 N 1000 -
chr3 160860106 160860164 N 1000 +
chr2 69367652 69367728 N 1000 +
chr2 69367701 69367759 N 1000 -
chr1 173848214 173848277 N 1000 +
chr1 173848219 173848277 N 1000 -
chr6 110537162 110537238 N 1000 -
chr6 110537086 110537144 N 1000 +

# let's take a look at the sampled output
$ head sampled.bed
chr14 106356536 106356612 N 1000 +
chr14 102972879 102972937 N 1000 -
chr1 224182389 224182461 N 1000 +
chr3 160860106 160860164 N 1000 +
chr5 87127382 87127440 N 1000 -
chr2 69367701 69367759 N 1000 -
chr1 173848214 173848277 N 1000 +
chr12 75556438 75556514 N 1000 +
chr14 73977197 73977273 N 1000 +
chrX 123428052 123428128 N 1000 -