This tool corrects GC bias using the method proposed by [Benjamini & Speed (2012). Nucleic Acids Research, 40(10)]. It removes reads from regions whose coverage is higher than expected (typically GC-rich regions) and adds reads to regions with fewer reads than expected (typically AT-rich regions). computeGCBias must be run first to generate the frequency table required here.
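The direction of the correction can be illustrated with a simplified Python sketch. It assumes the observed and expected read frequencies have already been reduced to one value per GC-content bin (this layout and the helper name correction_factors are hypothetical, not the computeGCBias file format or a deepTools function). The per-bin factor expected/observed is below 1 for over-represented bins, where reads would be removed, and above 1 for under-represented bins, where reads would be added. This is not the deepTools implementation.

```python
from typing import List


def correction_factors(observed: List[float], expected: List[float]) -> List[float]:
    """Return expected/observed for each GC bin (hypothetical helper).

    factor < 1: bin is over-represented, reads should be removed.
    factor > 1: bin is under-represented, reads should be added.
    Bins without observed reads are left unchanged (factor 1.0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [exp / obs if obs > 0 else 1.0 for obs, exp in zip(observed, expected)]


if __name__ == "__main__":
    # Made-up frequencies for three GC bins: AT-rich, medium, GC-rich.
    observed = [0.10, 0.50, 0.40]
    expected = [0.20, 0.50, 0.30]
    print([round(f, 3) for f in correction_factors(observed, expected)])  # [2.0, 1.0, 0.75]
```

In this toy example the AT-rich bin would roughly double its reads and the GC-rich bin would keep about three quarters of them, matching the behavior described above.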
correctGCBias -b file.bam --effectiveGenomeSize 2150570000 -g mm9.2bit --GCbiasFrequenciesFile freq.txt -o gc_corrected.bam [options]
correctGCBias is a tool from the deepTools suite. The information on this page is based on deepTools version 3.5.1.
--bamfile, -b Sorted BAM file to correct.
--effectiveGenomeSize The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. Common values are: mm9: 2150570000, hg19: 2451960000, dm3: 121400000 and ce10: 93260000. See Table 2 of http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030377 or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html for the effective sizes of further genomes. This value is needed to detect enriched regions that, if not discarded, could bias the results.
--genome, -g Genome in 2bit format. Most genomes can be found at http://hgdownload.cse.ucsc.edu/gbdb/ (look for files ending in .2bit). Otherwise, FASTA files can be converted to 2bit with faToTwoBit, available at http://hgdownload.cse.ucsc.edu/admin/exe/
--GCbiasFrequenciesFile, -freq Output file from computeGCBias containing the observed and expected read frequencies per GC content.
--correctedFile, -o Name of the corrected file. The file ending determines the output format: ".bam" for a BAM file, ".bw" for a bigWig file, or ".bg" for a bedGraph file.
--version Show the program's version number and exit.
--binSize, -bs Size of the bins, in bases, for bigWig/bedGraph output.
--region, -r Region of the genome to limit the operation to; this is useful for reducing computing time when testing parameters. The format is chr or chr:start:end, for example --region chr10 or --region chr10:456700:891000.
--numberOfProcessors, -p Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors.
--verbose, -v Set to see processing messages.
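The --effectiveGenomeSize description notes that stretches of NNNN should be discarded. A minimal Python sketch of that single step, assuming the sequence is already in memory as a string, counts the non-N bases. The helper count_non_n_bases is hypothetical; because it ignores mappability and excluded repeats, it only gives an upper bound on the effective genome size, not the published values listed above.

```python
def count_non_n_bases(sequence: str) -> int:
    """Count bases that are not N/n (hypothetical helper).

    Only assembly gaps are excluded; unmappable or repetitive regions
    are still counted, so the result overestimates the effective size.
    """
    return sum(1 for base in sequence if base not in "Nn")


if __name__ == "__main__":
    toy_chromosome = "NNNNACGTACGTNNNNGGCC"
    print(count_non_n_bases(toy_chromosome))  # 12
```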
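For --correctedFile, the output format follows from the file ending. A minimal Python sketch of that mapping is shown below; output_format is a hypothetical helper, not the deepTools code, and it assumes the ending must match exactly (no case folding).

```python
from pathlib import Path

# File endings accepted for --correctedFile and the format each implies.
_FORMATS = {".bam": "BAM", ".bw": "bigWig", ".bg": "bedGraph"}


def output_format(path: str) -> str:
    """Return the output format implied by the file ending (hypothetical helper)."""
    suffix = Path(path).suffix
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported ending {suffix!r}; use .bam, .bw or .bg")
    return _FORMATS[suffix]


if __name__ == "__main__":
    print(output_format("gc_corrected.bam"))  # BAM
    print(output_format("gc_corrected.bw"))  # bigWig
```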
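--binSize sets the bin width for bigWig/bedGraph output. The Python sketch below tiles one chromosome into consecutive bins of that size. The helper bins is hypothetical, and truncating the last bin at the chromosome end is an assumption about edge handling, not documented deepTools behavior.

```python
from typing import Iterator, Tuple


def bins(chrom_length: int, bin_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) bins of bin_size bases (hypothetical helper).

    Assumption: the final bin is truncated at the chromosome end.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    for start in range(0, chrom_length, bin_size):
        yield start, min(start + bin_size, chrom_length)


if __name__ == "__main__":
    print(list(bins(250, 100)))  # [(0, 100), (100, 200), (200, 250)]
```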
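The --region value is either a chromosome name alone (chr10) or chr:start:end (chr10:456700:891000). A minimal Python parser for exactly these two forms is sketched below; parse_region is a hypothetical helper, and other notations deepTools may accept are not covered.

```python
from typing import Optional, Tuple


def parse_region(region: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Parse 'chr' or 'chr:start:end' into (chrom, start, end) (hypothetical helper).

    A bare chromosome name returns None for start and end (whole chromosome).
    """
    parts = region.split(":")
    if len(parts) == 1:
        return parts[0], None, None
    if len(parts) == 3:
        return parts[0], int(parts[1]), int(parts[2])
    raise ValueError(f"cannot parse region {region!r}")


if __name__ == "__main__":
    print(parse_region("chr10"))  # ('chr10', None, None)
    print(parse_region("chr10:456700:891000"))  # ('chr10', 456700, 891000)
```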
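--numberOfProcessors accepts "max", "max/2", or a number. The Python sketch below resolves these values using os.cpu_count(); resolve_processors is a hypothetical helper, and rounding "max/2" down while keeping at least one processor is an assumption, not documented deepTools behavior.

```python
import os


def resolve_processors(value: str) -> int:
    """Translate 'max', 'max/2' or an integer string into a processor count (hypothetical helper)."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    if value == "max":
        return available
    if value == "max/2":
        # Assumption: round down, but never go below one processor.
        return max(1, available // 2)
    requested = int(value)
    if requested < 1:
        raise ValueError("number of processors must be at least 1")
    return requested


if __name__ == "__main__":
    print(resolve_processors("4"))  # 4
    print(resolve_processors("max/2"))
```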