
2.18. Issues with Create UMI Reads tool for QIAseq panel data analysis

  1. Issue description
  2. Affected software versions
  3. Some information about the fix implemented
  4. Recommendations
  5. How to identify affected datasets
  6. Benchmark
  7. Useful links

Issue description

The Create UMI Reads tool distributed with the QIAseq Targeted Panel Analysis plugin, versions 1.0, 1.0.1 and 1.1, handles the merging of paired reads into UMI reads incorrectly when the contributing raw reads have been amplified by more than one gene-specific primer, that is, when the R1 reads for that UMI group start at different primer positions.

Two issues have been identified:

  1. In affected cases, some paired end reads with the same UMI tag and the same R2 mapping positions, but with different R1 mapping positions, were put into separate UMI groups when they should have been put into a single UMI group.

  2. When the "Ignore end gaps" option was turned off and the "Minimum supporting consensus fraction" was low, UMI reads could end up longer than the original reads, with stretches of bases assigned quality scores of 0. Workflows delivered by QIAseq Targeted Panel Analysis 1.1 had this combination set as the default.

    Variants reported in regions containing such stretches of bases may not represent the sample data. These stretches, representing regions between the amplified fragments, were filled in with bases from the reference sequence and given a quality score of 0. Thus, any false reference variants identified in a region like this would be expected to have a very low quality score.
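The first issue can be illustrated with a toy model. The sketch below is not the plugin's implementation: the tuple layout, the example coordinates and the function name group_pairs are assumptions made for illustration only. It shows how including the R1 mapping position in the grouping key splits read pairs that share a UMI tag and an R2 mapping position.

```python
from collections import defaultdict

# Hypothetical read pairs as (umi, r1_start, r2_start); values are made up.
pairs = [
    ("ACGTACGTACGT", 1000, 1250),  # fragment amplified by one primer
    ("ACGTACGTACGT", 1040, 1250),  # same fragment, amplified by a second primer
    ("TTGCATTGCATT", 1000, 1300),  # unrelated fragment
]

def group_pairs(pairs, include_r1_start):
    """Group read pairs by UMI and R2 start, optionally also by R1 start."""
    groups = defaultdict(list)
    for umi, r1_start, r2_start in pairs:
        if include_r1_start:
            key = (umi, r1_start, r2_start)  # toy model of the affected behaviour
        else:
            key = (umi, r2_start)            # toy model of the intended behaviour
        groups[key].append((umi, r1_start, r2_start))
    return dict(groups)

affected = group_pairs(pairs, include_r1_start=True)
intended = group_pairs(pairs, include_r1_start=False)
print(len(affected), len(intended))
```

In this toy data set, the two pairs carrying the same UMI end up in separate groups when the R1 position is part of the key, and in a single group when it is not.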

 

Affected software versions

  • QIAseq Targeted Panel Analysis and QIAseq Targeted Panel Analysis Server Plugin, versions 1.0, 1.0.1 and 1.1
  • QIAseq DNA V3 Panel Analysis and QIAseq DNA V3 Panel Analysis Server Plugin, versions 1.0 and 1.0.1

This issue was fixed in QIAseq Targeted Panel Analysis and QIAseq Targeted Panel Analysis Server Plugin, version 1.2 and it does not affect the Biomedical Genomics Analysis or Biomedical Genomics Analysis Server plugin.

 

Some information about the fix implemented

Reads amplified with different primers but originating from the same UMI labeled DNA fragment, and therefore belonging to the same UMI group, are now put into the same UMI group. This will generally result in fewer, larger (more reads) UMI groups, which in turn should lead to more accurate UMI reads.

Stretches representing regions between the amplified fragments of a UMI group are now filled with Ns, reflecting the fact that the base in the sample is unknown. Reads with Ns at a particular position are ignored during variant detection. This change has the additional outcome that UMI reads created using paired end reads from target regions with multiple primers will generally have higher quality scores, thereby increasing the overall average quality scores of the UMI reads.
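As a rough illustration of why filling with Ns matters, the sketch below counts bases at one position across a handful of toy UMI reads, skipping any read that has an N there. The sequences, the function name base_counts and the simple counting approach are assumptions for illustration; the variant detection tools are considerably more involved.

```python
from collections import Counter

def base_counts(reads, pos):
    """Count bases at 0-based position pos, skipping reads that carry N there."""
    return Counter(read[pos] for read in reads if read[pos] != "N")

# Toy 8 bp window: the reference has C at index 4, the sample has T.
covered = ["ACGTTGCA", "ACGTTGCA"]      # UMI reads that cover the position
ref_filled = covered + ["ACGTCGCA"]     # gap at index 3-5 filled from the reference
n_filled = covered + ["ACGNNNCA"]       # same gap filled with Ns

print(base_counts(ref_filled, 4))  # the reference-filled read adds support for C
print(base_counts(n_filled, 4))    # the N-filled read is ignored at this position
```

With the reference-filled read, the position appears to have one read supporting the reference base even though that base was never sequenced; with the N-filled read, only the bases that were actually observed are counted.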

 

Recommendations

1)     Upgrade to a plugin version not affected by this issue.

2)     If using customized workflows based on those distributed with QIAseq Targeted Panel Analysis 1.1 or earlier, open a new copy of the workflow and customize it according to your needs (recommended).

Alternatively, adjust the filtering steps of your existing workflow after upgrading the plugin.

Explanation: When you upgrade, the new version of Create UMI Reads will be available for all workflows, but settings in workflows other than those distributed by the plugin are not updated. Our benchmarking suggests that not updating the filtering strategy can increase the number of false positives reported. Please see the Benchmark section below for further details.

3)     Rerun some of the analyses originally done using affected plugin versions to determine whether the earlier results are likely to be affected by this issue. How to determine this is described in the next section. If you determine that your targets are not affected by this issue, then no further action is needed. If targets of interest are affected, then we recommend that analyses are rerun with the new software version.

 

How to identify affected datasets

The number of UMI reads created from raw reads amplified using more than one gene-specific primer can be found in the "UMI reads being longer than the input reads" section of the Create UMI Reads Report, as created using QIAseq Targeted Panel Analysis Plugin version 1.2 or Biomedical Genomics Analysis plugin version 1.0.

UMI reads with stretches of Ns seen in the Mapped UMI Reads Track generated using QIAseq Targeted Panel Analysis Plugin version 1.2 or Biomedical Genomics Analysis plugin version 1.0 suggest that the data set is affected and thus that earlier analysis results are likely to be affected. An example of such a case is shown in Figure 1, below.

 

Figure 1: Tracks showing UMI reads created using QIAseq Targeted Panel Analysis Plugin versions 1.0, 1.1 and 1.2, from top to bottom respectively, mapped to the reference genome. The same UMI-labelled DNA fragment was amplified with different gene-specific primers in some areas. The UMI reads in the "Version 1.2" track at the bottom were created using a version of the Create UMI Reads tool without the issue reported here. From this track, we can see that this region was affected: there is a lower UMI coverage value compared to the tracks above it, as some UMI groups are now larger. In addition, stretches of Ns are visible, shown as grey boxes. In the earlier versions, such stretches would have been filled using bases from the reference and given a quality score of 0.
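If UMI read sequences have been exported from the Workbench as plain text, a quick scan for stretches of Ns could look like the sketch below. This is only an illustration and does not use any CLC API: the function name n_stretches, the min_len threshold and the example sequences are all hypothetical.

```python
import re

def n_stretches(seq, min_len=2):
    """Return (start, end) spans of runs of N that are at least min_len long."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"N+", seq)
        if m.end() - m.start() >= min_len
    ]

# Hypothetical UMI read sequences keyed by made-up read names.
umi_reads = {
    "umi_read_1": "ACGTACGTNNNNNNACGTACGT",
    "umi_read_2": "ACGTACGTACGTACGTACGT",
}

for name, seq in umi_reads.items():
    spans = n_stretches(seq)
    if spans:
        print(name, "contains N stretches at", spans)
```

Reads flagged this way would correspond to the grey boxes described in the Figure 1 caption, and the regions they cover are candidates for a closer look at earlier results.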

 

Benchmark

This issue has a relatively small effect on the variants called, as illustrated by the similar F1 score values shown below for different situations. However, this benchmarking suggests that accompanying changes to filtering steps improve results further, illustrated by comparing the second and third columns of data in the table below.

 

                   Version 1.1           Version 1.1 but with a fixed    Version 1.2
                   (Affected Create      Create UMI Reads tool and       (Fixed Create UMI
                   UMI Reads tool)       without other accompanying      Reads tool and
                                         changes                         accompanying changes)

True Positives     242                   246                             258
False Positives    45                    54                              47
False Negatives    30                    26                              14
F1 score           0.866                 0.860                           0.894

Table 1: Benchmark results generated using the Identify QIAseq DNA Somatic Variants workflow on an Illumina paired end dataset containing 272 known variants.
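The F1 scores in Table 1 follow from the standard definition F1 = 2TP / (2TP + FP + FN). The short sketch below recomputes them from the counts in the table; the dictionary keys are shortened labels chosen here for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 score computed from true positive, false positive and false negative counts."""
    return 2 * tp / (2 * tp + fp + fn)

# (TP, FP, FN) counts taken from Table 1.
results = {
    "Version 1.1": (242, 45, 30),
    "Version 1.1 with fixed tool only": (246, 54, 26),
    "Version 1.2": (258, 47, 14),
}

for label, (tp, fp, fn) in results.items():
    print(f"{label}: F1 = {f1_score(tp, fp, fn):.3f}")
```

In each column, true positives plus false negatives add up to the 272 known variants in the data set.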

 

Useful links

Updating workflows stored in CLC Workbench Navigation Areas:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Configuring_workflow_tools.html

Updating workflows you installed on a CLC Workbench:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Workflow_version_update.html

Biomedical Genomics Analysis (plugin) manual and Latest Improvements:
http://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/current/index.php?manual=Introduction.html

https://www.qiagenbioinformatics.com/biomedical-genomics-analysis-latest-improvements/

QIAseq Targeted Panel Analysis (plugin) manual and Latest Improvements:
http://resources.qiagenbioinformatics.com/manuals/qiaseqpanels/current/QIAseq_Panel_Analysis_Plugin.pdf

https://www.qiagenbioinformatics.com/qiaseq-panel-latest-improvements/
