
3.13. Gene level RPKM of annotated eukaryotic genomes is using total gene reads instead of total exon reads - GWB 7.0 to 7.0.3

Description

This problem has been fixed in the CLC Genomics Workbench version 7.0.4 and the Genomics Server 6.0.4.

 

A problem in the CLC Genomics Workbench versions 7.0, 7.0.1, 7.0.2 and 7.0.3 affects the RPKM values reported at the gene level in RNA-seq analyses that use annotated references with both mRNA and gene tracks.

From our investigations so far, it appears that the total number of reads mapping to a gene is being used in the numerator of the RPKM calculation instead of the total number of reads mapping to the gene's exons.
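
To illustrate the difference, here is a minimal sketch assuming the commonly used definition of RPKM (reads per kilobase of exon model per million mapped reads). The function name and all numbers are hypothetical, and treating total gene reads as exon reads plus intron reads is a simplification made for this example only.

```python
def rpkm(reads, exon_length_bp, total_mapped_reads):
    """RPKM = reads * 10^9 / (exon length in bp * total mapped reads)."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: some reads map to exons, some to introns.
exon_reads = 800
intron_reads = 200
total_gene_reads = exon_reads + intron_reads  # simplification for this sketch

exon_length_bp = 2000
total_mapped_reads = 10_000_000

# Intended calculation: exon reads in the numerator.
correct_rpkm = rpkm(exon_reads, exon_length_bp, total_mapped_reads)

# Calculation as described for the affected versions: gene reads in the numerator.
affected_rpkm = rpkm(total_gene_reads, exon_length_bp, total_mapped_reads)

print(correct_rpkm, affected_rpkm)
```

With these made-up numbers the affected value is higher than the intended one, and the gap grows with the share of reads mapping to introns. A gene with no intron reads gives the same value either way.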

 

Who is affected

You could be affected by this if:

  • you are running the RNA-seq analysis tool in the Genomics Workbench 7.0, 7.0.1, 7.0.2 or 7.0.3 AND
  • you are working with gene and mRNA annotations. Generally speaking, this affects people running analyses with annotated eukaryotic references AND
  • you are working with the results in the GE track, that is, the gene-level results, AND
  • you have used RPKM values of the GE (gene) track output in statistical tests, such as t-tests or ANOVA.

This issue will be particularly noticeable if you are working with data where you expect mapping to many intronic regions. See the section below for more detail on finding out if you are likely to be affected by this.

If you have not run, and do not plan to run, statistical analyses on your gene-level RPKM values, but you do wish to report the gene RPKM values, then please check whether your RNA-seq results have been affected (see section below). If they have, you can re-run the analyses in the Genomics Workbench 7.0.4, or you can export the RNA-seq results to a text or Excel-based format and use spreadsheet or other software to calculate the RPKM values correctly.
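
As a rough sketch of the "other software" route, the following recalculates gene-level RPKM from an exported table using exon reads in the numerator. The column names ("Name", "Total exon reads", "Exon length"), the example rows, and the total mapped read count are assumptions made for illustration. Check them against the headers in your own export, and supply the total number of mapped reads for your sample yourself.

```python
import csv
import io


def recalculate_rpkm(table_text, total_mapped_reads,
                     name_col="Name",
                     exon_reads_col="Total exon reads",
                     exon_length_col="Exon length"):
    """Return {gene name: RPKM based on exon reads} from CSV text.

    Column names are assumed; adjust them to match the exported file.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(table_text)):
        exon_reads = float(row[exon_reads_col])
        exon_length_bp = float(row[exon_length_col])
        if exon_length_bp <= 0:
            continue  # skip rows without a usable exon length
        results[row[name_col]] = (
            exon_reads * 1e9 / (exon_length_bp * total_mapped_reads)
        )
    return results


# Hypothetical export with two genes.
example_export = """Name,Total exon reads,Total intron reads,Exon length
geneA,800,200,2000
geneB,150,0,1500
"""

updated = recalculate_rpkm(example_export, total_mapped_reads=10_000_000)
for gene, value in updated.items():
    print(gene, value)
```

To use this on a real export, read the file contents into a string (for example with `open(path, newline="").read()`) and pass that string in place of `example_export`.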

 

You are not affected if

  • you are running an older version of the Genomics Workbench, or you are running the Legacy RNA-seq analysis tool in the affected versions of the CLC Genomics Workbench, OR
  • you are running analyses involving just annotated genes, as would generally be the case for annotated prokaryotic genomes, or you are using a list of sequences as references, where no annotations are involved OR
  • you are working with the results in the TE track, that is, the transcript-level results OR
  • you are working with count-based results, and are running DGE or proportion-based statistical tests, OR
  • no, or very few, reads were mapped to intronic regions of your genes.

 

Are your analyses affected by this problem?

If you are in the situation described in the "You could be affected by this if" section above, then you can further investigate if this problem has had an impact on your results by doing the following:

Check your Experiment data

In Experiment data, two or more RNA-seq samples have been compared. Using Experiment results, you can check whether any genes appeared to be differentially expressed because of reads mapping to intronic regions.

  • Ensure the column reporting intron reads is visible. 
  • Click on the name of the intron reads column to sort the column. Sort it in descending order, so the highest values are listed first.
  • For the genes at the top of the sorted list, check the values in the columns you used when deciding whether genes were differentially expressed. If none of those values fall in a range of interest to you, then conclusions based on your current results will not have been affected.

If a reasonable number of reads map to intronic regions of some genes in your experiment, then even if your conclusions were not affected, the RPKM values for those genes will be affected. If you plan to report the RPKM for such genes, then these will need to be re-calculated. See the section below called "Getting updated RPKM values for the RNA-seq samples and re-running statistical analyses".

 

Check your RNA-seq GE output

Here, you can check if the RPKM values for any of your RNA-seq samples have been reported incorrectly. Given the nature of this problem, this may not result in incorrect conclusions related to differential expression. Please refer to your Experiment level data, as described above, to see if this problem may have led to any incorrect conclusions being drawn.

  • View the GE output in table format. (This is the default view.)
  • Ensure the column reporting intron reads is visible. 
  • Click on the name of the intron reads column to sort the column. Sort it in descending order, so the highest values are listed first.
  • Are there any genes with reads mapping to intronic regions? 
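
The same check can be sketched outside the Workbench on exported results. The example below lists genes with intron reads, highest first, and estimates how much the gene-level RPKM would be inflated if gene reads replaced exon reads in the numerator. The dictionary keys and values are hypothetical, and the inflation estimate assumes that only the numerator differs and that gene reads equal exon reads plus intron reads.

```python
def genes_with_intron_reads(rows, name_key="name",
                            exon_key="exon_reads",
                            intron_key="intron_reads"):
    """Return (name, intron reads, estimated inflation factor) tuples,
    sorted by intron reads in descending order."""
    flagged = []
    for row in rows:
        exon = row[exon_key]
        intron = row[intron_key]
        if intron > 0 and exon > 0:
            # Same denominator in both calculations, so the ratio of the
            # two RPKM values reduces to the ratio of the numerators.
            inflation = (exon + intron) / exon
            flagged.append((row[name_key], intron, inflation))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Hypothetical gene-level rows.
example_rows = [
    {"name": "geneA", "exon_reads": 800, "intron_reads": 200},
    {"name": "geneB", "exon_reads": 150, "intron_reads": 0},
    {"name": "geneC", "exon_reads": 1000, "intron_reads": 500},
]

flagged_genes = genes_with_intron_reads(example_rows)
for name, intron, inflation in flagged_genes:
    print(name, intron, inflation)
```

Genes that do not appear in the output have no intron reads (or no exon reads) in this sketch. An inflation factor close to 1 suggests the reported RPKM is close to the intended value.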

 

Recommended action if your analyses are affected by this problem

This problem has been fixed in the CLC Genomics Workbench version 7.0.4 and the Genomics Server 6.0.4.

Please upgrade your software so that correct RPKM values are generated in future analyses.

If the checks above reveal that you are affected by this problem, then please re-run your RNA-seq analyses in the Genomics Workbench 7.0.4/Genomics Server 6.0.4 or higher.

 

Work-arounds if RNA-seq analyses from version 7.0 through 7.0.3 cannot be re-run

The information below should not be needed after May 14, 2014 when the software was updated to fix this issue. It has been left here for cases where re-running the RNA-seq analyses using the updated software is not feasible.

 

Getting updated RPKM values for the RNA-seq samples and re-running statistical analyses

If you just wish to get updated RPKM values for each RNA-seq sample, then you only need to run the first two steps listed below.

 

Run count-based statistical analyses

Count-based statistical tests are not affected by this issue, as RPKM values are not used. If you wish to try using the count-based tests, they are described in our manual:

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Empirical_analysis_DGE.html

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Tests_on_proportions.html

 

 
