Category

Genome Annotation


Usage

gffcompare [options]* {-i <input_gtf_list> | <input1.gtf> [<input2.gtf> .. <inputN.gtf>]}


Manual

GffCompare provides classification and reference annotation mapping and matching statistics for RNA-Seq assemblies (transfrags) or other generic GFF/GTF files.

GffCompare also clusters and tracks transcripts across multiple GFF/GTF files (samples). Matching transcripts (those with identical intron chains) are written to <outprefix>.tracking, and a nonredundant set of transcripts across all input files is written to the GTF file <outprefix>.combined.gtf, with a single representative transfrag chosen for each clique of matching transfrags across samples. The <outprefix> is set with the -o option.
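As a minimal sketch of the multi-sample tracking described above, the following compares two query files against a reference. The file names (ref_annotation.gff, sample1.gtf, sample2.gtf) and the output prefix "cmp" are placeholders chosen for illustration, not names used by gffcompare itself.

```shell
#!/bin/sh
# Hypothetical inputs: replace with your own files (paths assumed to contain no spaces).
REF="ref_annotation.gff"
OUT="cmp"

# -r sets the reference annotation, -o the output prefix.
CMD="gffcompare -r $REF -o $OUT sample1.gtf sample2.gtf"
echo "$CMD"

# Run only if gffcompare is installed and all inputs exist.
if command -v gffcompare >/dev/null 2>&1 \
   && [ -f "$REF" ] && [ -f sample1.gtf ] && [ -f sample2.gtf ]; then
    $CMD
    # Files named in the text above: cross-sample matches and the nonredundant set.
    ls "$OUT.tracking" "$OUT.combined.gtf"
fi
```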

Options

  • -i <input_gtf_list>: provide a text file input_gtf_list with a list of (query) GTF files to process instead of expecting them as command-line arguments (useful when a large number of GTF files should be processed).
  • -o <outprefix>: All output files created by gffcompare will have this prefix (e.g. .loci, .tracking, etc.). If this option is not provided, the default output prefix is "gffcmp".
  • -r <gff_file>: An optional “reference” annotation GFF file. Each sample is matched against this file, and sample isoforms are tagged as overlapping, matching, or novel where appropriate. See the refmap and tmap output file descriptions below.
  • -R: If -r was specified, this option causes gffcompare to ignore reference transcripts that are not overlapped by any transcript in the input files. Useful for ignoring annotated transcripts that are not present in your RNA-Seq samples, and thus adjusting the “sensitivity” calculation in the accuracy report.
  • -Q: If -r was specified, this option causes gffcompare to ignore input transcripts that are not overlapped by any transcript in the reference. Useful for adjusting the “precision” calculation in the accuracy report.
  • -M: discard (ignore) single-exon transfrags and reference transcripts (i.e. consider only multi-exon transcripts)
  • -N: discard (ignore) single-exon reference transcripts; single-exon transfrags are still considered, but they will never find an exact match
  • -D: discard "duplicate" (redundant) query transfrags (i.e. those with the same intron chain) within a single sample (and thus disable "annotation" mode)
  • -s <genome_path>: path to genome sequences (optional); this will enable the "repeat" ('r') classcode assessment; <genome_path> should be a full path to a multi-FASTA file, preferably indexed with samtools faidx; repeats must be soft-masked (lower case) in the genomic sequence
  • -e <dist>: Maximum distance (range) allowed from free ends of terminal exons of reference transcripts when assessing exon accuracy. By default, this is 100.
  • -d <dist>: Maximum distance (range) for grouping transcript start sites, by default 100.
  • -T: Do not generate .tmap and .refmap files for each input file
  • -V: Verbose mode. gffcompare prints more messages to stderr about what it is doing, including warnings about any inconsistencies or potential issues found while reading the given GFF file(s).
  • -h/--help: Prints the help message and exits.
  • -v/--version: display gffcompare version
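The -i option can be combined with the reference-related options above. The sketch below assumes the list file holds one query GTF path per line; the file names and the "cmp" prefix are hypothetical.

```shell
#!/bin/sh
# Hypothetical file names; paths are assumed to contain no spaces.
LIST="gtf_list.txt"
REF="ref_annotation.gff"

# Assumption: the list file contains one query GTF path per line.
printf '%s\n' sample1.gtf sample2.gtf sample3.gtf > "$LIST"

# -R and -Q only take effect together with -r (see the option list above).
CMD="gffcompare -r $REF -R -Q -o cmp -i $LIST"
echo "$CMD"

# Run only if gffcompare is installed and the inputs exist.
if command -v gffcompare >/dev/null 2>&1 \
   && [ -f "$REF" ] && [ -f sample1.gtf ] \
   && [ -f sample2.gtf ] && [ -f sample3.gtf ]; then
    $CMD
fi
```

Passing the list through -i keeps the command line short when there are many samples, which is the use case the option description mentions.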

Output options

  • -p <tprefix>: The name prefix to use for consensus/combined transcripts in the <outprefix>.combined.gtf file (default: 'TCONS')
  • -C: Discard the “contained” transfrags from the .combined.gtf output. By default, without this option, gffcompare writes in that file isoforms that were found to be fully contained/covered (with the same compatible intron structure) by other transfrags in the same locus, with the attribute “contained_in” showing the first container transfrag found. (Note: this behavior is the opposite of Cuffcompare's -C option).
  • -A: Like -C but will not discard intron-redundant transfrags if they start on a different 5' exon (keep alternate transcript start sites)
  • -X: Like -C but also discard contained transfrags if transfrag ends stick out within the container's introns
  • -K: For -C/-A/-X, do NOT discard any redundant transfrag matching a reference
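A possible way to combine the output options: discard contained transfrags with -C, keep any that match a reference with -K, and rename the combined transcripts with -p. The prefix "MRG" and all file names here are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical inputs and prefixes (paths assumed to contain no spaces).
REF="ref_annotation.gff"
TPREFIX="MRG"   # replaces the default 'TCONS' name prefix

# -C drops contained transfrags; -K protects those matching a reference.
CMD="gffcompare -r $REF -C -K -p $TPREFIX -o cmp sample1.gtf sample2.gtf"
echo "$CMD"

# Run only if gffcompare is installed and all inputs exist.
if command -v gffcompare >/dev/null 2>&1 \
   && [ -f "$REF" ] && [ -f sample1.gtf ] && [ -f sample2.gtf ]; then
    $CMD
fi
```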

This document was generated using GffCompare v0.12.1.

