Reference Code backup Executable files
This tool computes the GC bias using the method proposed in Benjamini and Speed. The output is used to plot the results and can also be used later on to correct the bias with the tool correctGCbias. There are two plots produced by the tool: a boxplot showing the absolute read numbers per GC-content bin and an x-y plot depicting the ratio of observed/expected reads per GC-content bin.
computeGCBias -b file.bam --effectiveGenomeSize 2150570000 -g mm9.2bit -l 200 --GCbiasFrequenciesFile freq.txt [options]
computeGCBias
is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.
–bamfile, -b Sorted BAM file.
–effectiveGenomeSize The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. Common values are: mm9: 2150570000, hg19:2451960000, dm3:121400000 and ce10:93260000. See Table 2 of http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030377 or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html for several effective genome sizes. This value is needed to detect enriched regions that, if not discarded can bias the results.
–genome, -g Genome in two bit format. Most genomes can be found here: http://hgdownload.cse.ucsc.edu/gbdb/ Search for the .2bit ending. Otherwise, fasta files can be converted to 2bit using the UCSC programm called faToTwoBit available for different plattforms at http://hgdownload.cse.ucsc.edu/admin/exe/
–fragmentLength, -l Fragment length used for the sequencing. If paired-end reads are used, the fragment length is computed based from the bam file
–sampleSize Number of sampling points to be considered.
–extraSampling BED file containing genomic regions for which extra sampling is required because they are underrepresented in the genome.
–version show program’s version number and exit
–region, -r Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example –region chr10 or –region chr10:456700:891000.
–blackListFileName, -bl A BED or GTF file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. Please note that you should adjust the effective genome size, if relevant.
–numberOfProcessors, -p Number of processors to use. Type “max/2” to use half the maximum number of processors or “max” to use all available processors.
–verbose, -v Set to see processing messages.
–GCbiasFrequenciesFile, -freq Path to save the file containing the observed and expected read frequencies per %GC-content. This file is needed to run the correctGCBias tool. This is a text file.
–plotFileFormat Possible choices: png, pdf, svg, eps
image format type. If given, this option overrides the image format based on the plotFile ending. The available options are: “png”, “eps”, “pdf” and “svg”
–biasPlot If given, a diagnostic image summarizing the GC-bias will be saved.
–regionSize To plot the reads per %GC over a regionthe size of the region is required. By default, the bin size is set to 300 bases, which is close to the standard fragment size for Illumina machines. However, if the depth of sequencing is low, a larger bin size will be required, otherwise many bins will not overlap with any read