Category

Genomic Interval Manipulation


Usage

bedtools shuffle [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>


Manual

This tool is part of the bedtools suite.

bedtools shuffle randomly permutes the genomic locations of a feature file within a genome defined in a genome file. By default, shuffle repositions each feature in the input file (defined by -i) on a random chromosome at a random position. The size and strand of each feature are preserved. shuffle is useful as a null basis against which to test the significance of associations of one feature with another. By default, the shuffled regions are written to stdout; you can use redirection to save them to a file.

Related: bedtools sample

Required arguments

  • -i path: Path to a BED/GFF/VCF file which includes all the input regions.
  • -g path: Chromosome size file. The file should be tab-delimited and structured as follows: <chromName> <chromSize>. You can get this file for the genome release that you are working with by using fetchChromSizes.
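
As a minimal sketch of the genome file layout described above, the following Python reads a two-column, tab-delimited file into a dictionary. The helper name read_genome is hypothetical and is not part of bedtools; the in-memory example stands in for a real file on disk.

```python
import io


def read_genome(handle):
    """Parse '<chromName> TAB <chromSize>' lines into a dict (hypothetical helper)."""
    sizes = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(
                f"line {line_no}: expected <chromName><TAB><chromSize>"
            )
        # Only the first two columns are used; the size must be an integer.
        sizes[fields[0]] = int(fields[1])
    return sizes


# In-memory stand-in for a genome file such as my.genome.
example = io.StringIO("chr1\t10000\nchr2\t8000\nchr3\t5000\nchr4\t2000\n")
genome = read_genome(example)
print(genome)
```

A check like this can catch a space-delimited or otherwise malformed genome file before it is passed to -g.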

Options

  • -excl path: A BED/GFF/VCF file of coordinates in which features from -i should not be placed (e.g., genome gaps).
  • -incl path: Instead of randomly placing features in a genome, the -incl option defines a BED/GFF/VCF file of coordinates in which features in -i should be randomly placed (e.g., genes.bed). Larger -incl intervals will contain more shuffled regions. This method DISABLES -chromFirst.
  • -chrom: Keep features in -i on the same chromosome. Solely permute their location on the chromosome. By default, both the chromosome and position are randomly chosen. This option forces use of -chromFirst.
  • -seed int: Supply an integer seed for the shuffling. This will allow feature shuffling experiments to be recreated exactly as the seed for the pseudo-random number generation will be constant. By default, the seed is chosen automatically.
  • -f float: Maximum overlap (as a fraction of the -i feature) with an -excl feature that is tolerated before searching for a new, randomized locus. For example, -f 0.10 allows up to 10% of a randomized feature to overlap with a given feature in the -excl file. Cannot be used with -incl file. Default is 1E-9 (i.e., 1bp).
  • -chromFirst: Instead of choosing a position randomly among the entire genome (the default), first choose a chrom randomly, and then choose a random start coordinate on that chrom. This leads to features being ~uniformly distributed among the chroms, as opposed to features being distributed as a function of chrom size.
  • -bedpe: Indicate that the input file (defined by -i) is in BEDPE format.
  • -maxTries int: Max. number of attempts to find a home for a shuffled interval in the presence of -incl or -excl. Default = 1000.
  • -noOverlapping: Don't allow shuffled intervals to overlap.
  • -allowBeyondChromEnd: Allow the original length of a record to extend beyond the length of the chromosome. In this case, the end coordinate of the shuffled interval will be set to the chromosome's length. By default, an interval's original length must be fully contained within the chromosome.
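
The difference between the default placement and -chromFirst, and the effect of -seed, can be illustrated with a small Python sketch. This is not the bedtools implementation: the function place and its parameters are hypothetical, and the sketch simplifies by only considering chromosomes long enough to hold the feature, whereas the real tool has its own retry logic (see -maxTries).

```python
import random


def place(length, genome, rng, chrom_first=False):
    """Pick a random (chrom, start, end) for a feature of the given length.

    Illustrative only; assumes the feature must fit entirely on the chromosome.
    """
    chroms = [c for c in genome if genome[c] >= length]
    if not chroms:
        raise ValueError("feature is longer than every chromosome")
    if chrom_first:
        # Every chromosome is equally likely, regardless of its size.
        chrom = rng.choice(chroms)
    else:
        # Larger chromosomes receive proportionally more features.
        weights = [genome[c] for c in chroms]
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
    start = rng.randint(0, genome[chrom] - length)
    return chrom, start, start + length


genome = {"chr1": 10000, "chr2": 8000, "chr3": 5000, "chr4": 2000}

# A fixed seed makes the pseudo-random placement repeatable.
run_a = place(100, genome, random.Random(42))
run_b = place(100, genome, random.Random(42))
print(run_a == run_b)  # same seed, same placement
```

In this model, drawing many features without chrom_first puts most of them on chr1, while chrom_first spreads them roughly evenly across the four chromosomes.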

Example

The following command shuffles the regions defined in A.bed (-i) using the chromosome sizes (-g) defined in my.genome, and writes the results to a new file called result.bed.

$ cat A.bed
chr1  0  100  a1  1  +
chr1  0  1000 a2  2  -
$ cat my.genome
chr1 10000
chr2 8000
chr3 5000
chr4 2000
$ bedtools shuffle -i A.bed -g my.genome > result.bed
$ cat result.bed
chr4 1498 1598 a1 1 +
chr3 2156 3156 a2 2 -
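
One way to sanity-check a shuffled file is to confirm that each feature kept its size and strand and still fits on its new chromosome. The Python sketch below does this for the example records above; the helper names parse_bed and check_shuffle are hypothetical, and matching records by the name column assumes that feature names are unique.

```python
def parse_bed(text):
    """Map feature name -> (chrom, start, end, score, strand) for 6-column BED."""
    rows = {}
    for line in text.strip().splitlines():
        chrom, start, end, name, score, strand = line.split()
        rows[name] = (chrom, int(start), int(end), score, strand)
    return rows


def check_shuffle(original, shuffled, genome):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for name, (_, start, end, _, strand) in original.items():
        s_chrom, s_start, s_end, _, s_strand = shuffled[name]
        if s_end - s_start != end - start:
            problems.append(f"{name}: size changed")
        if s_strand != strand:
            problems.append(f"{name}: strand changed")
        if s_start < 0 or s_end > genome[s_chrom]:
            problems.append(f"{name}: outside {s_chrom}")
    return problems


original = parse_bed("""chr1  0  100  a1  1  +
chr1  0  1000 a2  2  -""")
shuffled = parse_bed("""chr4 1498 1598 a1 1 +
chr3 2156 3156 a2 2 -""")
genome = {"chr1": 10000, "chr2": 8000, "chr3": 5000, "chr4": 2000}

print(check_shuffle(original, shuffled, genome))
```

For the example records, both features keep their lengths (100 and 1000 bases) and strands, so no problems are reported.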

File formats this tool works with
BED, GFF, GTF, VCF
