bcftools view manual with usage examples

Usage

bcftools view [options] <in.vcf.gz> [region1 [...]]

Manual

Required arguments

in.vcf.gz file: Input VCF/BCF file.

Options

region1 [...]: Additional regions.

Output options

-G, --drop-genotypes: Drop individual genotype information (after subsetting if -s option set).
-h, -H, --header-only, --no-header: Print the header only/suppress the header in VCF output.
-l, --compression-level int: Compression level: 0 uncompressed, 1 best speed, 9 best compression [-1].
--no-version: Do not append version and command line to the header.
-o, --output file: Output file name [stdout].
-O, --output-type b|u|z|v:
- b: compressed BCF
- u: uncompressed BCF
- z: compressed VCF
- v: uncompressed VCF (default).
-r, --regions region: Restrict to comma-separated list of regions. Overlapping records are matched even when the starting coordinate is outside of the region, unlike the -t/-T options where only the POS coordinate is checked. Note that -r cannot be used in combination with -R.
-R, --regions-file file: Restrict to regions listed in a file. Regions can be specified either on command line or in a VCF, BED, or tab-delimited file (the default). The columns of the tab-delimited file can contain either positions (two-column format: CHROM, POS) or intervals (three-column format: CHROM, BEG, END), but not both. Positions are 1-based and inclusive. The columns of the tab-delimited BED file are also CHROM, BEG and END (trailing columns are ignored), but coordinates are 0-based, half-open. For a file to be treated as BED rather than as a 1-based tab-delimited file, it must have the ".bed" or ".bed.gz" suffix (case-insensitive). Uncompressed files are stored in memory, while bgzip-compressed and tabix-indexed region files are streamed. Note that sequence names must match exactly: "chr20" is not the same as "20". Also note that chromosome ordering in FILE will be respected: the VCF will be processed in the order in which chromosomes first appear in FILE. However, within chromosomes, the VCF will always be processed in ascending genomic coordinate order, no matter what order the regions appear in FILE. Note that overlapping regions in FILE can result in duplicated, out-of-order positions in the output. This option requires indexed VCF/BCF files. Note that -R cannot be used in combination with -r.
-t, --targets [^]region: Similar to -r, but the next position is accessed by streaming the whole VCF/BCF rather than using the tbi/csi index. Both -r and -t options can be applied simultaneously: -r uses the index to jump to a region and -t discards positions which are not in the targets. Unlike -r, targets can be prefixed with "^" to request logical complement. For example, "^X,Y,MT" indicates that sequences X, Y and MT should be skipped. Yet another difference between the -t/-T and -r/-R is that -r/-R checks for proper overlaps and considers both POS and the end position of an indel, while -t/-T considers the POS coordinate only (by default; see also --regions-overlap and --targets-overlap). Note that -t cannot be used in combination with -T.
-T, --targets-file [^]file: Similar to -R but streams rather than index-jumps.
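--targets-overlap 0|1|2: This option controls how overlapping records are determined: set to pos or 0 if the VCF record has to have POS inside a region (the default behavior of -t/-T); set to record or 1 if overlapping records with POS outside a region should also be included (the default behavior of -r/-R); or set to variant or 2 to include only records whose actual sequence variation overlaps the region.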
--threads int: Use multithreading with int worker threads [0].
-W [FMT], --write-index[=FMT]: Automatically index the output file. FMT is optional and can be one of "tbi" or "csi" depending on output file format.
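The coordinate conventions of -R and the overlap rules of -r versus -t are easy to mix up, so here is a minimal sketch on a toy file. Everything in it is an assumption made for illustration: the temporary file names, the sequence name "20", and the two records. It also uses bcftools index, a separate bcftools command, to build the index that -r and -R require. The bcftools calls run only if bcftools is installed.

```shell
tmp=$(mktemp -d)

# One interval, base 100 of sequence "20", in the 1-based inclusive
# tab-delimited form (CHROM, BEG, END).
printf '20\t100\t100\n' > "$tmp/regions.tsv"

# The same interval as BED (0-based, half-open): BEG - 1, END unchanged.
awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3 }' \
    "$tmp/regions.tsv" > "$tmp/regions.bed"

# Toy sites-only VCF: a deletion at POS 99 whose REF covers 99-101,
# and a SNP at POS 150, well outside the interval.
{
    printf '%s\n' '##fileformat=VCFv4.2' '##contig=<ID=20,length=1000>'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '20\t99\t.\tATG\tA\t.\tPASS\t.\n'
    printf '20\t150\t.\tC\tT\t.\tPASS\t.\n'
} > "$tmp/demo.vcf"

if command -v bcftools >/dev/null 2>&1; then
    # -r and -R jump via the index, so compress and index first.
    bcftools view -O z -o "$tmp/demo.vcf.gz" "$tmp/demo.vcf"
    bcftools index "$tmp/demo.vcf.gz"

    # -r / -R check overlap: the deletion starting at 99 covers base 100.
    hits_r=$(bcftools view -H -r 20:100-100 "$tmp/demo.vcf.gz" | wc -l | tr -d ' ')
    hits_bed=$(bcftools view -H -R "$tmp/regions.bed" "$tmp/demo.vcf.gz" | wc -l | tr -d ' ')

    # -t checks POS only (by default): POS 99 is not inside 100-100.
    hits_t=$(bcftools view -H -t 20:100-100 "$tmp/demo.vcf.gz" | wc -l | tr -d ' ')

    echo "-r: $hits_r  -R (BED): $hits_bed  -t: $hits_t"
fi
```

Going by the option descriptions above, -r and -R should each report the deletion while -t reports nothing, because the deletion's POS lies one base before the requested interval.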

Subset options

-A, --trim-unseen-alleles: Remove the unseen allele <*> or <NON_REF> at variant sites when the option is given once (-A) or at all sites when the option is given twice (-AA).
-a, --trim-alt-alleles: Trim ALT alleles not seen in the genotype fields. Note that if no alternate allele remains after trimming, the record itself is not removed but ALT is set to ".". If the option -s or -S is given, removes alleles not seen in the subset. INFO and FORMAT tags declared as Type=A, G or R will be trimmed as well.
-I, --no-update: Do not (re)calculate INFO fields for the subset (currently INFO/AC and INFO/AN).
-s, --samples [^]list: Comma-separated list of samples to include or exclude if prefixed with "^". (Note that when multiple samples are to be excluded, the "^" prefix is still present only once, e.g. "^SAMPLE1,SAMPLE2".) The sample order is updated to reflect that given on the command line. Some tags will be updated (unless the -I option is used).
-S, --samples-file [^]file: File of sample names to include or exclude if prefixed with "^". One sample per line. See also the note above for the -s option. The sample order is updated to reflect that given in the input file.
--force-samples: Only warn about unknown subset samples.
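As a sketch of the -s semantics described above (list order and the single "^" prefix), the following builds a three-sample toy VCF and prints the sample columns of the resulting header. The sample names S1, S2, S3 and the file name are hypothetical. The bcftools calls run only if bcftools is installed.

```shell
tmp=$(mktemp -d)

# Toy VCF with three samples and one record.
{
    printf '%s\n' '##fileformat=VCFv4.2' '##contig=<ID=20,length=1000>' \
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
    printf '20\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t1/1\n'
} > "$tmp/demo.vcf"

if command -v bcftools >/dev/null 2>&1; then
    # Keep S3 and S1; the output columns follow the order given here.
    # The last header line is the #CHROM line; samples start at column 10.
    kept=$(bcftools view -h -s S3,S1 "$tmp/demo.vcf" | tail -n 1 | cut -f 10- | tr '\t' ',')

    # Exclude S1 and S2: a single "^" covers the whole list.
    rest=$(bcftools view -h -s '^S1,S2' "$tmp/demo.vcf" | tail -n 1 | cut -f 10- | tr '\t' ',')

    echo "kept: $kept"
    echo "rest: $rest"
fi
```

Quoting the "^" list is a precaution, since some shells give "^" a special meaning.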

Filter options

Note that filter options below dealing with counting the number of alleles will, for speed, first check for the values of AC and AN in the INFO column to avoid parsing all the genotype (FORMAT/GT) fields in the VCF. This means that filters like --uncalled, --exclude-uncalled, or --min-af 0.1 will be calculated from INFO/AC and INFO/AN when available or FORMAT/GT otherwise. However, bcftools will not attempt to use any other existing field, such as INFO/AF. For that, use --exclude 'AF<0.1' instead (quoted, so that the shell does not treat "<" as a redirection).
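To illustrate the difference between the count-based filters and an expression on INFO/AF, this sketch uses a toy record whose AF tag is deliberately inconsistent with its AC and AN values. The file name, sample names, and all values are made up for the example. The bcftools calls run only if bcftools is installed.

```shell
tmp=$(mktemp -d)

# One record: AC=1, AN=4 (frequency 0.25 from counts), but a stale AF=0.05.
{
    printf '%s\n' '##fileformat=VCFv4.2' '##contig=<ID=20,length=1000>' \
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">' \
        '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total alleles">' \
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency (stale here)">' \
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
    printf '20\t100\t.\tA\tG\t.\tPASS\tAC=1;AN=4;AF=0.05\tGT\t0/1\t0/0\n'
} > "$tmp/demo.vcf"

if command -v bcftools >/dev/null 2>&1; then
    # --min-af works from the allele counts (1/4 = 0.25), ignoring INFO/AF.
    by_count=$(bcftools view -H -q 0.1 "$tmp/demo.vcf" | wc -l | tr -d ' ')

    # The expression reads the INFO/AF tag itself (0.05).
    by_tag=$(bcftools view -H -e 'AF<0.1' "$tmp/demo.vcf" | wc -l | tr -d ' ')

    echo "kept by -q 0.1: $by_count, kept by -e 'AF<0.1': $by_tag"
fi
```

Under the behavior described above, the first command should keep the record and the second should drop it.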

Also note that one must be careful when sample subsetting and filtering are performed in a single command, because the order of internal operations can influence the result. For example, the -i/-e filtering is performed before sample removal, but the -P filtering is performed after, and some operations are inherently ambiguous: allele counts, for example, can be taken from the INFO column when present but are calculated on the fly when absent. Therefore it is strongly recommended to spell out the required order explicitly by separating such commands into two steps. (Make sure to use the -O u option when piping!)
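The two-step form recommended above could look like the following sketch: subset the samples first, stream uncompressed BCF through the pipe, then filter on the subset. The toy file, the sample names, and the choice of -c 1 as the filter are assumptions made for illustration. The bcftools calls run only if bcftools is installed.

```shell
tmp=$(mktemp -d)

# Two records: the ALT allele at POS 100 is carried by S1,
# the ALT allele at POS 200 only by S3.
{
    printf '%s\n' '##fileformat=VCFv4.2' '##contig=<ID=20,length=1000>' \
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
    printf '20\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0\n'
    printf '20\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t1/1\n'
} > "$tmp/demo.vcf"

if command -v bcftools >/dev/null 2>&1; then
    # Step 1: keep S1 and S2 only, emitting uncompressed BCF (-O u).
    # Step 2: require at least one non-reference allele in what is left.
    two_step=$(bcftools view -O u -s S1,S2 "$tmp/demo.vcf" \
        | bcftools view -H -c 1 - \
        | cut -f 2)

    echo "positions kept: $two_step"
fi
```

Because the second command only ever sees S1 and S2, the site at POS 200 has no non-reference allele left and is dropped, leaving POS 100.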

-c, -C, --min-ac, --max-ac int[:type]: Minimum (-c or --min-ac) / maximum (-C or --max-ac) count for non-reference (nref, the default), 1st alternate (alt1), least frequent (minor), most frequent (major) or sum of all but most frequent (nonmajor) alleles.
-f, --apply-filters list: Require at least one of the listed FILTER strings (e.g., "PASS,.").
-g, --genotype [^]hom|het|miss: Require one or more hom/het/missing genotype or, if prefixed with "^", exclude sites with hom/het/missing genotypes.
-i, -e, --include, --exclude expr: Select (-i or --include) / exclude (-e or --exclude) sites for which the expression is true (see man page for details).
-k, -n, --known, --novel: Select known (-k or --known) / novel (-n or --novel) sites only (ID is not/is '.').
-m, -M, --min-alleles, --max-alleles int: Minimum (-m or --min-alleles) / Maximum (-M or --max-alleles) number of alleles listed in REF and ALT (e.g., -m 2 -M 2 for biallelic sites).
-p, -P, --phased, --exclude-phased: Select (-p or --phased) / exclude (-P or --exclude-phased) sites where all samples are phased.
-q, -Q, --min-af, --max-af float[:type]: Minimum (-q or --min-af) / Maximum frequency (-Q or --max-af) for non-reference (nref, the default), 1st alternate (alt1), least frequent (minor), most frequent (major), or sum of all but most frequent (nonmajor) alleles.
-u, -U, --uncalled, --exclude-uncalled: Select (-u or --uncalled) / exclude (-U or --exclude-uncalled) sites without a called genotype.
-v, -V, --types, --exclude-types list: Select (-v or --types) / exclude (-V or --exclude-types) comma-separated list of variant types. Supported types:
- snps
- indels
- mnps
- ref
- bnd
- other
-x, -X, --private, --exclude-private: Select (-x or --private) / exclude (-X or --exclude-private) sites where the non-reference alleles are exclusive (private) to the subset samples.
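Several of the filter options above are commonly combined. As a sketch, the following selects biallelic SNPs by pairing -m 2 -M 2 with -v snps on a toy sites-only file. The file name and the three records are hypothetical. The bcftools call runs only if bcftools is installed.

```shell
tmp=$(mktemp -d)

# Three records: a biallelic SNP, a multiallelic SNP, and a deletion.
{
    printf '%s\n' '##fileformat=VCFv4.2' '##contig=<ID=20,length=1000>'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '20\t100\t.\tA\tG\t.\tPASS\t.\n'
    printf '20\t200\t.\tC\tT,G\t.\tPASS\t.\n'
    printf '20\t300\t.\tAT\tA\t.\tPASS\t.\n'
} > "$tmp/demo.vcf"

if command -v bcftools >/dev/null 2>&1; then
    # -m 2 -M 2: exactly two alleles across REF and ALT; -v snps: SNPs only.
    pos=$(bcftools view -H -m 2 -M 2 -v snps "$tmp/demo.vcf" | cut -f 2)

    echo "biallelic SNP positions: $pos"
fi
```

Only the record at POS 100 satisfies both conditions: the site at 200 has three alleles and the site at 300 is an indel.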

Formerly known as bcftools subset.

File formats this tool works with

VCF
