CLC software: Important notifications

Important messages related to CLC software products

1. Issues affecting latest releases of products

1.1. Two reference insertion variants can be reported at the same position

Issue description

In regions where the evidence could support different insertions, it is possible for 2 reference variants to be reported at the insertion position, instead of one reference variant, as would be expected.

If this effect is observed in your data, we recommend carefully inspecting the corresponding area of the read mapping to evaluate the reported variants.

Two reference insertions at a given position will look something like this in a variant track:

 

Affected software

  • CLC Genomics Workbench 7.5 and up
  • Biomedical Genomics Workbench 2.1 and up
  • CLC Genomics Server 6.5 and up

2. Issues affecting only older versions of products

2.1. Import of COSMIC Release v90 is not supported

Issue description

The Track importer for the COSMIC variation database does not support the import of COSMIC Release v90 due to issues with that version. From QIAGEN CLC Genomics Workbench 20.0.4 and QIAGEN CLC Genomics Server 20.0.4, import of COSMIC version 91 is supported. Import of versions earlier than v90 is also supported.

From version 91, COSV IDs are used instead of COSM, with each COSV ID imported as a single variant with information from all relevant transcripts and samples. Further details are available in the CLC Genomics Workbench manual.

 

Recommendations

To use COSMIC data with the CLC Genomics Workbench or CLC Genomics Server, please upgrade your software to a version that supports version 91 (or later), or download and import an earlier version than version 90 of the COSMIC database.

Information about downloading the COSMIC database can be found at: https://cancer.sanger.ac.uk/cosmic/download

 

2.2. Restrictions on sequence names when creating BLAST databases

Issue description

New requirements for sequence identifiers, introduced with NCBI BLAST+ 2.8.1, are affecting the creation of BLAST databases in some versions of QIAGEN CLC software. (See the "Affected software and tools" section below.) 

These requirements include a 50 character limit for local accessions, and stringent checks on the format of accessions that resemble PDB identifiers. The effect of these changes can be exacerbated in the QIAGEN CLC software due to how we handle sequence identifiers to avoid duplicate identifiers, which are not accepted by makeblastdb, the NCBI BLAST+ program for creating BLAST databases.

Recommendations

Upgrade your software to a version where this restriction is not present. See the "Affected software and tools" section below for details.

If upgrading is not an option, the following work-arounds are available:

  1. Download a pre-formatted database from the NCBI using Download BLAST Database.

  2. Install an older version of a QIAGEN CLC Workbench and use its BLAST-related tools. See the linked FAQ about how to get installers for older versions of the software.

    BLAST databases made using older versions of the QIAGEN CLC Workbenches and QIAGEN CLC Genomics Server can be searched using tools in the affected versions of the software.

  3. Use Batch Rename to rename the sequences with unique identifiers shorter than 50 characters.

    Here, we recommend avoiding underscores if you will be building BLAST databases using affected versions of the software, as these can lead to identifiers appearing to be malformed PDB identifiers, which will cause makeblastdb (the NCBI tool used by Create BLAST Database) to fail.

  4. Create or obtain BLAST databases from another source, and place these in a location your QIAGEN CLC software knows about, e.g. using Manage BLAST Databases for QIAGEN CLC Workbenches, or by configuring the QIAGEN CLC Genomics Server accordingly.
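Before renaming sequences as described in work-around 3, it can help to find out which names are likely to cause trouble. The following is a minimal Python sketch, not part of the QIAGEN CLC software: the function name and the example names are hypothetical, and the checks only mirror the recommendations above (unique names, shorter than 50 characters, no underscores). It does not reproduce the actual validation performed by makeblastdb.

```python
MAX_LEN = 50  # work-around 3 recommends identifiers shorter than 50 characters


def check_identifiers(names):
    """Return a list of (name, reasons) for names that may need renaming.

    Illustrative checks only; they follow the recommendations in the text
    and are not the rules implemented by makeblastdb itself.
    """
    report = []
    seen = set()
    for name in names:
        reasons = []
        if len(name) >= MAX_LEN:
            reasons.append("not shorter than 50 characters")
        if "_" in name:
            # Underscores can make a name resemble a malformed PDB identifier.
            reasons.append("contains an underscore")
        if name in seen:
            reasons.append("duplicate identifier")
        seen.add(name)
        if reasons:
            report.append((name, reasons))
    return report


if __name__ == "__main__":
    # Made-up example names.
    for name, reasons in check_identifiers(["seq1", "1ABC_A", "seq1"]):
        print(name, "->", ", ".join(reasons))
```

Names flagged by such a check are candidates for renaming with Batch Rename before creating the database.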

 

Affected software and tools

This problem affects only the following QIAGEN CLC software, where BLAST+ 2.9.0 was included:

Software affected

  • QIAGEN CLC Main Workbench 20.0, 20.0.1 and 20.0.2
  • QIAGEN CLC Genomics Workbench 20.0, 20.0.1 and 20.0.2
  • QIAGEN CLC Genomics Server 20.0, 20.0.1 and 20.0.2

This issue was addressed in CLC Genomics Workbench 20.0.3, CLC Main Workbench 20.0.3 and CLC Genomics Server 20.0.3, where we replaced the BLAST+ 2.9.0 makeblastdb tool, used by Create BLAST Database, with makeblastdb from BLAST+ 2.6.0. This is the same version used in CLC Genomics Workbench 12.x. This change does not affect the searching of BLAST databases.

Earlier product versions (before 20.0) are not affected by this problem. Such versions include programs from BLAST+ 2.6.0 and earlier, which predate the restrictions relating to sequence identifiers that underlie this issue.

Tools affected

  • Create BLAST Database
  • BLAST, for running local BLAST searches where the search is run against a set of sequences as the target. In this case, a database is created on the fly, which can fail if identifiers do not meet NCBI's requirements.

2.3. Incorrect annotation and visualization of a small minority of overlapping variants

Issue description

An issue relating to identifying overlapping annotations has been identified that has several potential effects, described below. All are expected to be rare.

1) Annotate RNA Variants incorrectly annotates a small number of variants

Using Annotate RNA Variants, a tool delivered by the Biomedical Genomics Analysis plugin, a small number of variants at specific positions of a given mRNA track may have the following annotations incorrectly applied:

  • Matches known intron
  • Possible splice signatures
  • Conserved splice signature

It is the features of a particular mRNA track that influence the locations affected by this problem. The read mapping and reference sequence used do not have any influence.

In affected software versions, this problem is present in the Perform QIAseq Multimodal Analysis (Illumina) and the Perform QIAseq RNA Fusion XP Analysis ready-to-use workflows, as these include the Annotate RNA Variants tool. The preconfigured mRNA track used in these workflows, Homo_sapiens_refseq_GRCh38.p13_no_alt_analysis_set_pt.wonderland_RNA, may have incorrect  annotations for variants called at the following 23 genomic positions:

Chromosome Position
1 19115428
1 100409180
1 100409182
1 202155532
1 202155534
2 203049947
12 110628918
12 110628919
14 49654066
14 49654068
16 1940862
16 72098644
16 72098646
16 89191471
16 89191473
18 54942674
18 54942676
19 19233905
19 19233906
19 54095358
X 15543134
X 15543135
X 15543136


We cannot exclude the possibility that variants at other positions may be affected, but the above list includes the ones we expect to be affected in this particular mRNA track.
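To check whether any variants in an exported table fall at the positions listed above, something along the following lines could be used. This is a minimal Python sketch with hypothetical names. It assumes that chromosome names may or may not carry a "chr" prefix and that positions in your table use the same coordinate convention as the list above. It applies only to the preconfigured mRNA track named above.

```python
# The 23 positions listed above for the preconfigured mRNA track.
AFFECTED_POSITIONS = {
    "1": {19115428, 100409180, 100409182, 202155532, 202155534},
    "2": {203049947},
    "12": {110628918, 110628919},
    "14": {49654066, 49654068},
    "16": {1940862, 72098644, 72098646, 89191471, 89191473},
    "18": {54942674, 54942676},
    "19": {19233905, 19233906, 54095358},
    "X": {15543134, 15543135, 15543136},
}


def is_listed_position(chrom, pos):
    """Return True if (chrom, pos) is one of the listed positions.

    A leading "chr" is stripped; this naming convention is an assumption.
    """
    chrom = str(chrom)
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return int(pos) in AFFECTED_POSITIONS.get(chrom, set())


if __name__ == "__main__":
    # Made-up variant positions for illustration.
    for chrom, pos in [("chr1", 19115428), ("2", 12345)]:
        print(chrom, pos, is_listed_position(chrom, pos))
```

Variants at other positions may still be affected, as noted above, so a negative result from such a check is not a guarantee.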

If you use a different mRNA track with this tool, or with workflows that include this tool, then the above position list does not apply. Other positions are likely to be affected. We expect the number of positions to be of the same magnitude.

2) Transcript Discovery occasionally identifies an incorrect exon boundary

Using Transcript Discovery, a tool delivered by the Transcript Discovery plugin, an incorrect exon boundary can occasionally be identified. Due to the expected level of sensitivity and precision of this tool, we expect this to have very little impact in practice.

3) Visualization of affected variant annotations is not as expected

This issue can manifest itself as a cosmetic problem in the rendering of annotations and variants, in some editors. In affected positions, annotations and variants may be displayed in multiple vertical layers, instead of beside one another, or they may appear to be "hopping" vertically when you scroll in the editor.

This is a visualization problem that can affect track views, track lists, and (non-track) sequence editors. The underlying, recorded position of the annotations and variants is not affected.

Affected software and tools

This issue was addressed in CLC Genomics Workbench 20.0.3 and CLC Genomics Server 20.0.3.

  • Annotate RNA Variants of the Biomedical Genomics Analysis plugin is affected when used on CLC Genomics Workbench 20.0, 20.0.1 or 20.0.2. The same tool delivered by the Biomedical Genomics Analysis Server Plugin is affected when used on CLC Genomics Server 20.0, 20.0.1 or 20.0.2.
  • Transcript Discovery of the Transcript Discovery plugin is affected when run on CLC Genomics Workbench version 20.0.2 or any earlier version. The same tool delivered by the Transcript Discovery Server Plugin is affected when used on CLC Genomics Server 20.0, 20.0.1 or 20.0.2.
  • Visualizations of affected results can be affected in CLC Genomics Workbench version 20.0.2 or any earlier version.

 

2.4. False positives and misannotation of some fusions in fusion detection workflows

Issue description

Three issues have been identified affecting tools for fusion detection, and thereby affecting the fusion detection workflows delivered by the Biomedical Genomics Analysis 20.0 plugin, which contain these tools. The affected workflows are:

Full details of the software, workflows and tools affected are provided in the "Affected software and tools" section below.

These problems were addressed in Biomedical Genomics Analysis 20.0.1 and Biomedical Genomics Analysis Server Plugin 20.0.1 through a combination of bug fixes and improvements.

Issue 1. Incorrect annotations of fusions

Some fusions identified are not being annotated correctly by Annotate Fusions with Known Fusion Information: Some known fusions detected in the data are not being annotated, while some unknown fusions detected may be annotated as known fusions.

Issue 2. Fusions with breakpoints in close proximity are reported with the same read count

For two fusions where both breakpoints are within 12bp of one another, a given read can be counted as supporting both those fusions. This can then lead to the two fusions being assigned the same read count, as shown in Figure 1. Closer inspection of the read-mapping may reveal that one of the fusions has much better support than the other.

Figure 1. Two fusions with identical 5' breakpoints and 3' breakpoints within 12 bp of each other are listed here. The read count is reported to be the same for both, but inspection of the read mapping could reveal better support for one of them.
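When reviewing a table of reported fusions, pairs that may be affected by Issue 2 can be shortlisted for manual inspection of the read mapping. The following is a minimal Python sketch, not a tool from the plugin. The record fields (name, chrom5, pos5, chrom3, pos3, reads) are hypothetical, and "within 12bp" is assumed to mean a distance of at most 12.

```python
from itertools import combinations


def close_pairs_with_equal_counts(fusions, max_dist=12):
    """Return name pairs of fusions whose 5' and 3' breakpoints are both
    within max_dist of each other and whose read counts are identical.

    Field names are illustrative assumptions about an exported fusion table.
    """
    flagged = []
    for a, b in combinations(fusions, 2):
        same_chroms = a["chrom5"] == b["chrom5"] and a["chrom3"] == b["chrom3"]
        close = (
            abs(a["pos5"] - b["pos5"]) <= max_dist
            and abs(a["pos3"] - b["pos3"]) <= max_dist
        )
        if same_chroms and close and a["reads"] == b["reads"]:
            flagged.append((a["name"], b["name"]))
    return flagged


if __name__ == "__main__":
    # Made-up records for illustration.
    example = [
        {"name": "F1", "chrom5": "2", "pos5": 1000, "chrom3": "7", "pos3": 5000, "reads": 40},
        {"name": "F2", "chrom5": "2", "pos5": 1000, "chrom3": "7", "pos3": 5009, "reads": 40},
    ]
    print(close_pairs_with_equal_counts(example))
```

Pairs flagged this way are only candidates; the read mapping still needs to be inspected to see which fusion has better support.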

Issue 3. Large numbers of false positives being reported for some datasets

Functionality introduced in the Biomedical Genomics Analysis 20.0 plugin to detect exotic fusions, i.e. fusions where one or both breakpoints are not at an exon boundary (Figure 2), has led to a large number of such fusions being detected for some datasets, a large fraction of which are false positives.

Figure 2. Examples of exotic fusions into the middle of an exon and into an intron

Recommendations

We plan to release an update to the software where these issues will be addressed. In the meantime, possible actions to take when detecting fusions are:

  1. Use older versions of the software. These issues do not affect the tools and workflows of Biomedical Genomics Analysis Plugin 1.2.x.

    You need CLC Genomics Workbench 12.x to install and use Biomedical Genomics Analysis 1.2.x, and CLC Genomics Server 11.x to install and use Biomedical Genomics Analysis Server Plugin 1.2.x.

    See the related FAQ, listed below, for information on getting installers for older versions of the software.

  2. If continuing to use the tools and workflows for fusion detection included with Biomedical Genomics Analysis 20.0, the number of false positives, and the impact of these, can be decreased by:
    • Adjusting the following settings in your custom workflows, or when otherwise using the affected tools. Workflows distributed with the software already include these suggestions.
      • Trim reads for homopolymers prior to fusion detection and annotation using the Trim Reads tool. Relevant settings depend on the purpose of the workflow. See Figure 3.
      • Increase the "Breakpoint distance" parameter in the Refine Fusion Genes tool to 25, as shown in Figure 4. (The default value is 10.)

    • Reviewing the evidence for exotic fusions as described in the Interpretation of fusion results section of the manual.

Figure 3. Homopolymer trimming settings included in workflows delivered by Biomedical Genomics Analysis 20.0

 

Figure 4. Breakpoint distance parameter of the Refine Fusion Genes tool increased to 25 from the default value of 10.

 

Affected software and tools

The issues described in this section were addressed in Biomedical Genomics Analysis 20.0.1 and Biomedical Genomics Analysis Server Plugin 20.0.1 through a combination of bug fixes and improvements.

Affected software:

  • Biomedical Genomics Analysis plugin 20.0
  • Biomedical Genomics Analysis Server Plugin 20.0

On affected software versions, the following workflows were affected:

as those workflows contained one or more of the following affected tools:

 

2.5. Expression values from UPX 3' Transcriptome Kit data are systematically too high in many cases

General information

The issue described here affects results of the tools Quantify QIAseq UPX 3' and Analyze QIAseq Panels guide -> UPX 3' RNA of the Biomedical Genomics Analysis plugin when used on QIAGEN CLC Genomics Workbench 20.0 or 20.0.1.

If you are using affected tools, please upgrade your software. This issue was fixed in QIAGEN CLC Genomics Workbench 20.0.2 and QIAGEN CLC Genomics Server 20.0.2. See the "Affected software and tools" section below for further details.

To check the software version used to generate a data element, you can refer to its history information.

Issue description

The QIAseq UPX 3' Transcriptome protocol involves two amplification steps, one before and one after fragmentation:

 

When quantifying expression, affected software only uses UMIs to correct for duplicate molecules generated during the second amplification step. It does not correct for duplicate molecules from the first amplification step.

This issue leads to systematically high expression values for affected samples.


Results likely to be affected

Samples are most likely to be affected when the sequencing depth is high compared to the amount of input RNA. For example, most single-cell data is likely to be affected.

Any applications making direct use of expression values (including TPM and RPKM and absolute counts) will be affected.

Results less likely to be affected

Where effects are normalized: It is likely that overall results will not be strongly affected for analyses involving normalization. For example, the most significant differentially expressed genes of an analysis are likely to be correctly detected despite this issue.


QIAseq UPX 3' Targeted RNA Panels: Analysis of data from these panels is largely unaffected by this issue. However, for some panels, there is a possibility that the same molecule will be amplified by two different primers. In such cases the resulting duplicate molecules will not be detected due to the issue described here.

 

Affected software and tools

This issue affects the tools Quantify QIAseq UPX 3' and Analyze QIAseq Panels guide -> UPX 3' RNA, delivered by the following plugins:

  • Biomedical Genomics Analysis 1.2.x and 20.0.x
  • Biomedical Genomics Analysis Server Plugin 1.2.x and 20.0.x

This issue was fixed in CLC Genomics Workbench 20.0.2 and CLC Genomics Server 20.0.2, where choosing the "3' sequencing" Library type setting of RNA-Seq Analysis, when analyzing reads that have been annotated with UMIs by tools of the Biomedical Genomics Analysis plugin, results in expression values in the GE track being based on the number of distinct UMIs for each gene, rather than the number of reads.
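The difference between counting reads and counting distinct UMIs per gene can be illustrated with a small Python sketch. This is a simplified illustration only, not the implementation used by RNA-Seq Analysis: the input format (one gene name and one UMI sequence per read) and the gene names and UMIs shown are made up.

```python
from collections import defaultdict


def count_reads_and_umis(reads):
    """Return {gene: (read_count, distinct_umi_count)}.

    `reads` is an iterable of (gene, umi) pairs; this input format is an
    assumption made for illustration.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis[gene].add(umi)
    return {gene: (read_counts[gene], len(umis[gene])) for gene in read_counts}


if __name__ == "__main__":
    # Two of the three GAPDH reads share a UMI, so they are treated as
    # duplicates of one original molecule when counting by UMI.
    reads = [("GAPDH", "AAAC"), ("GAPDH", "AAAC"), ("GAPDH", "GGTT"), ("ACTB", "CCCA")]
    for gene, (n_reads, n_umis) in count_reads_and_umis(reads).items():
        print(gene, "reads:", n_reads, "distinct UMIs:", n_umis)
```

When duplicates are not collapsed, the read count exceeds the number of original molecules, which is the direction of the bias described above.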

 

2.6. Create MLST Scheme tool leads to strains being linked to wrong sequence type in some cases

Issue description

The "Create MLST Scheme" tool in affected software versions associates sequences with alleles based on the position of each sequence in the list of those selected. This means that sequences will be associated with the correct allele type only if

  • there is one sequence present for every allelic number in the profile, and
  • the sequences are listed in the same order as the allelic number in the profile. (I.e. the allelic number in the profile corresponds to the position of the allele in the sequence list).

Otherwise, the association of sequence types with alleles will be incorrect.

Tools that download and import schemes directly from public sites like PubMLST are not affected by this issue, specifically the "Download MLST Schemes (PubMLST)" and "Download Other MLST Scheme" tools provided by the CLC Microbial Genomics Module and the CLC MLST Module.


Expected impact and recommendations

Impact

We expect many MLST schemes created using "Create MLST Scheme", where sequences have been added to genes in the profile, to be affected when using the software listed below. Where an affected scheme is being used, in affected software versions or later software versions, isolates are likely to be associated with the wrong sequence type.


Recommendations

If an MLST scheme was created using an affected software version (see below), and sequences were added to genes, we recommend checking that:

  • There are as many sequences listed as there are alleles in the profile, and that
  • The positions of the alleles in the sequence list and allelic numbers in the profile match up as expected.


This can be done by opening the scheme and going to the Allele Table view. If you find your MLST scheme is affected by this problem, please discard the scheme and disregard the sequence types in any results that have been generated using it.
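The two checks above can also be expressed programmatically for one gene of a scheme. The following is a minimal Python sketch, not a feature of the CLC software. It assumes you have extracted, by hand or otherwise, the allelic numbers from the profile and the allelic numbers of the sequences in list order, and it assumes that allelic numbering starts at 1.

```python
def order_matches(profile_allele_numbers, sequence_allele_numbers):
    """Return True if there is one sequence per allelic number in the profile
    and the sequence at list position i (1-based) is allele number i.

    Both arguments are lists of integers for a single gene; how they are
    obtained from a scheme is left to the user (assumption).
    """
    if len(sequence_allele_numbers) != len(set(profile_allele_numbers)):
        return False
    return all(
        number == position
        for position, number in enumerate(sequence_allele_numbers, start=1)
    )


if __name__ == "__main__":
    print(order_matches([1, 2, 3], [1, 2, 3]))  # same order, nothing missing
    print(order_matches([1, 2, 3], [1, 3, 2]))  # sequences listed out of order
```

A False result for any gene suggests the scheme should be treated as affected, as described above.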


New schemes, unaffected by this problem, may be set up in one of the following ways:

  • Download a scheme from a public resource directly, if one is available, using the "Download MLST Schemes (PubMLST)" or "Download Other MLST Scheme" tool.
  • Create a new scheme using an updated version of the software that is not affected by this problem when it becomes available.
  • Create a new scheme using an affected software version, but either do not add sequences to it, or when adding sequences, ensure there is a sequence for each allele in the profile and that the allelic sequences are in the same order as expected by the profiles. This can be done by:
    • Selecting individual sequence elements in the correct order in the tool wizard, or
    • Importing a file containing all the allelic sequences ordered as expected into the CLC Workbench. If no sequence data is available for a particular allele, a sequence name should be present in the file, but with no sequence listed under it. This results in an empty sequence for that allele within the imported sequence list.
    • Editing the allele list in the profile outside the CLC Workbench so it contains just the entries that will have sequences associated, in the relevant order.


Affected software and versions

  • CLC Microbial Genomics Module 1.1 through 4.5
  • CLC MLST Module – all versions up to 1.9.1

This issue was addressed in CLC Microbial Genomics Module 4.8 and CLC MLST Module  1.9.2.

2.7. Incorrect ARO numbers on some ResFinder entries in QMI-AR databases

Issue description

Due to a processing error, some of the 373 entries in the QIAGEN Microbial Insight - Antimicrobial Resistance (QMI-AR) database originating from ResFinder have been assigned incorrect ARO (Antibiotic Resistance Ontology) numbers.

Two versions of the affected database are available: the QMI-AR Nucleotide Database, which contains nucleotide sequences and the QMI-AR Peptide Marker Database, which contains peptide markers.

Expected impact and recommendations

QMI-AR Nucleotide Database

Impact: Using the "Find Resistance with Nucleotide DB" tool with the nucleotide version of QMI-AR may yield misleading results as some of the genes originating from the ResFinder Database are associated with incorrect ARO numbers.

Recommendations: Affected gene entries originate from the ResFinder database. Entries originating from ResFinder are clearly marked in the QMI-AR database with the prefix "RESF_" in the "Find Resistance table". We do not have a list of which of the 373 entries are affected by the problem described here and thus recommend that entries from the ResFinder database reported when using the QMI-AR (2019-05) database are ignored.
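If results have been exported from the Find Resistance table, entries originating from ResFinder can be separated out by their "RESF_" prefix. The following is a minimal Python sketch. The function name and the example entry names are hypothetical, and it assumes the entry names are available as plain strings.

```python
def split_resfinder_entries(entry_names, prefix="RESF_"):
    """Split entry names into (ResFinder-derived, other) lists by prefix."""
    resfinder = [name for name in entry_names if name.startswith(prefix)]
    other = [name for name in entry_names if not name.startswith(prefix)]
    return resfinder, other


if __name__ == "__main__":
    # Made-up entry names for illustration.
    resfinder, other = split_resfinder_entries(["RESF_example1", "example2"])
    print("Ignore (from ResFinder):", resfinder)
    print("Keep:", other)
```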

QMI-AR Peptide Database

Impact: Results using "Find Resistance with ShortBRED" with QMI-AR (2019-05) are likely less affected by this problem because homologous genes are clustered during the preparation of this database. Due to this clustering step, we expect the CARD ARO assignment to take precedence over any potentially incorrect ResFinder ARO assignments. However, any misannotated gene may give rise to a nonspecific annotation on the resulting markers as the ARO assignments will be lifted to an LCA of all constituent sequences of a given cluster.

Recommendation: While we do not expect the resistance(s) identified with the "Find Resistance with ShortBRED" tool to be misleading, we cannot say definitively that they will be meaningful when entries from ResFinder are included. We unfortunately have no specific recommendations in this case except to interpret results bearing this in mind.


Affected versions

  • QMI-AR Nucleotide Database (2019-05)
  • QMI-AR Peptide Marker Database (2019-05)

These were made available for download using the Download Resistance Database tool of CLC Microbial Genomics 4.5 on June 27, 2019. This problem was corrected in updates to these databases (2019-09) released on September 16, 2019.

2.8. Reference multi-nucleotide variants (MNVs) are removed when applying filter or annotation tools under some circumstances

Issue description

In rare cases, reference MNVs may be removed in error by tools that add and remove information from variants and tools that filter variants. This occurs when an MNV is called as the reference allele for a region containing multiple non-reference variants, and some of the non-reference variants are subsequently filtered away. Reference variants without perfectly matching alternate variants may then, in some cases, be removed.

Expected impact

This issue is expected to affect a minority of variants, arising where there is low support for some of the called variants. This is most likely in analyses where very low frequency variants are being considered.

Details

The chance of calling multiple alternate alleles in the same region increases when detecting very low frequency variants, especially if the data contained in a read mapping are of low quality. If variant detection is followed by filtering steps using tools such as "Remove False Positives", as would be common in a workflow context, any reference MNV allele appearing without a corresponding variant MNV would be filtered away. This would result in these variants being represented without corresponding reference alleles, even though the data supported the existence of reference alleles. SNVs in this situation in affected software would have incorrect or misleading zygosity.

Prior to CLC Genomics Workbench 12.0 and CLC Genomics Server 11.0, downstream issues after exporting to VCF could also arise for affected variants as two or more overlapping heterozygous variants without corresponding reference alleles would be reported on separate lines as "reference allele unknown" (a "." in the GT field), when they should have been reported on a single line, as heterozygous alleles.

 

Further background

In one of the later steps of variant detection, if contiguous single nucleotide variants (SNVs) have been found, evidence is sought in the reads for the presence of an MNV. If this evidence exists, the contiguous SNVs are reported as an MNV. Otherwise, they remain classified as SNVs.

During the final step of variant detection, count and coverage filters are applied, and potential variants identified before this point will be filtered out if they do not meet the necessary cut-off criteria.

For a region with two or more contiguous, heterozygous SNVs in the data, it is possible for a potential variant MNV allele to be filtered out at this step such that the variants are represented as multiple SNVs. The reference MNV may then be filtered out by downstream tools, leaving the SNV alleles with no corresponding reference variant, when the data supported the presence of one.

Changes to VCF export introduced in CLC Genomics Workbench 12.0 and CLC Genomics Server 11.0 partially mitigate this situation through the introduction of 4 options for how complex variants should be represented. Of particular note with relation to this issue is the "Reference overlap" option, which is the default, and when selected results in overlapping alternate alleles being appropriately assigned a heterozygous genotype when the reference variant does not perfectly match overlapping alternate variants.

Affected software

The problem exists as described for the following software versions:

  • CLC Genomics Workbench 6.0 - 12.0
  • CLC Genomics Server 5.0 – 11.0
  • All versions of Biomedical Genomics Workbench

The downstream effects of the problem were partially mitigated through changes to the VCF export functionality introduced in CLC Genomics Workbench 12.0 and CLC Genomics Server 11.0 described in the Further Background section above.

Further mitigation of this problem was introduced with CLC Genomics Workbench 12.0.1 and CLC Genomics Server 11.0.1, where the behavior of the following tools was changed so that reference variants without exact matching non-reference variants are retained if they partially overlap non-reference variants: Annotate with Flanking Sequence, Annotate with Conservation Score, Annotate with Exon Numbers, Remove Variants Present in Control Reads, Remove Marginal Variants, Remove Orphan Reference Variants, Filter against Known Variants, Filter Based on Overlap, GO Enrichment Analysis, Link Variants to 3D Protein Structure, Predict Splice Site Effect, TRIO Analysis, Identify Shared Variants, Add Information from Overlapping Genes (legacy), Compare Simple Variant Tracks (legacy) and Remove Variants Found in Allele Frequency Community (from the Ingenuity Variant Analysis plugin).

2.9. Loss of reference allele information for neighboring SNPs when using certain downstream filtering tools after variant calling

An issue was discovered where multi-nucleotide variants (MNVs) representing a reference allele would be filtered out when any of the tools listed below was used.

As a consequence of this problem, the number of reads reported as supporting the reference allele could be incorrect after filtering was carried out. If the output was then exported, e.g. to VCF, the incorrect counts would also be exported.

We expect the issue described here to have little or no impact on the identification or interpretation of variant calls within the Workbench. 

Software affected

  • CLC Genomics Workbench 6.0.4 to 12.0
  • CLC Genomics Server 5.0.4 to 11.0
  • All versions of Biomedical Genomics Workbench

List of tools that cause the described behavior

CLC Genomics Workbench

  1. Annotate with Overlap Information
  2. Annotate with Flanking Sequences
  3. Annotate with Exon Numbers
  4. Compare Variants within Group
  5. GO Enrichment Analysis

Biomedical Genomics Workbench

  1. Add Information from Overlapping Genes
  2. Add Flanking Sequences
  3. Add Exon Numbers
  4. Compare Shared Variants in a Group of Samples

2.10. Sporadic corruption of analysis outputs on GPFS file systems

Issue Description

When using a distributed file system like GPFS, a bug in Java 10 can lead to output files from analyses being corrupted. This issue has been observed with CLC software using Java 10. In practice, we have observed the issue affecting analysis log files most frequently. However, if intermediate output files generated during a workflow run are corrupted, this can result in the workflow failing before it completes.

Analyses may complete normally if they are re-launched. This problem, while sporadic, can be frequent enough to be quite inconvenient.

 

Affected software versions

This issue only affects systems running Linux with a distributed file system, such as GPFS. The following software versions are affected:

  • CLC Genomics Server 11.0
  • CLC Genomics Workbench 12.0
  • CLC Main Workbench 8.1

This issue was fixed in CLC Genomics Server 11.0.1, CLC Genomics Workbench 12.0.1 and CLC Main Workbench 8.1.1 through the introduction of a work-around at the code level that resolves the problem when working on setups where data storage is on GPFS file systems. However, this issue can also affect our internal indexing systems, so we highly recommend that software from affected release lines (CLC Genomics Server 11.x, CLC Genomics Workbench 12.x and CLC Main Workbench 8.1.x) not be installed directly on a GPFS file system.

2.11. Bonferroni and FDR multiple testing corrections too strict for differential expression analyses

Issue description

Calculations for the Bonferroni and FDR multiple testing corrections in Differential Expression for RNA-Seq and Differential Expression in Two Groups used an inflated value for the number of tests performed. In affected software versions, this number included both the number of tests performed and the number of untestable genes/transcripts, i.e. those with NaN as an expression value.

The impact of this is that fewer differentially expressed transcripts/genes are reported after applying a p-value cut-off based on these corrections than should have been the case. The missing transcripts/genes will be those nearest to the p-value cut-off, i.e. they will not be the most significantly differentially expressed transcripts/genes.
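The effect of an inflated test count can be seen in a small worked example. The following Python sketch is illustrative only: it assumes the FDR correction follows the Benjamini-Hochberg procedure, and the p-values and the number of untestable genes are made up. It is not the code used by the affected tools.

```python
def bonferroni(pvalues, n_tests):
    """Bonferroni-adjusted p-values for a given number of tests."""
    return [min(1.0, p * n_tests) for p in pvalues]


def benjamini_hochberg(pvalues, n_tests):
    """Benjamini-Hochberg adjusted p-values for a given number of tests."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n_tests / rank)
        adjusted[i] = running_min
    return adjusted


def n_significant(adjusted, cutoff=0.05):
    """Number of adjusted p-values below the cut-off."""
    return sum(1 for q in adjusted if q < cutoff)


if __name__ == "__main__":
    raw = [0.001, 0.012, 0.03, 0.2]  # made-up raw p-values of testable genes
    n_testable = len(raw)
    n_untestable = 4                 # made-up number of genes with NaN expression
    for label, m in (("testable only", n_testable),
                     ("inflated", n_testable + n_untestable)):
        print(label,
              "Bonferroni:", n_significant(bonferroni(raw, m)),
              "FDR:", n_significant(benjamini_hochberg(raw, m)))
```

With these made-up values and a cut-off of 0.05, counting the untestable genes reduces the number of significant results from 2 to 1 for Bonferroni and from 3 to 2 for FDR. The results that drop out are those closest to the cut-off, as described above.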

Even when using affected software:

  • Transcripts/genes that are reported as differentially expressed are correctly reported.
  • For sets of samples with no NaN values, the multiple testing corrections are correct.
  • 'Raw' p-values are correctly reported.

 

Recommendations

For results generated using affected software versions, upgrading to an unaffected version is generally recommended. If analyses are then re-run using the tools Differential Expression for RNA-Seq and Differential Expression in Two Groups, an increased number of differentially expressed transcripts/genes may be reported due to the changes made to the multiple testing correction methods.

Affected versions

  • CLC Genomics Workbench 10.0 to 12.0.2
  • CLC Genomics Server 9.0 to 11.0.2
  • Biomedical Genomics Workbench 5.0 and above
  • The Advanced RNA-Seq plugin available for CLC Genomics Workbench 9.x and CLC Genomics Server 8.x

A fix was implemented in CLC Genomics Workbench 12.0.3 and CLC Genomics Server 11.0.3.

2.12. Local realignment of certain UMI reads may differ between runs

Issue description

The Local Realignment tool may realign some reads from the same read mapping slightly differently in different runs when run on multithreaded systems. This issue can occur if a read mapping used as input has reads with gaps that have been left aligned as well as reads with gaps that have been right aligned. The reads that can be affected in such mappings are identical reads with the same start and end positions, but with gaps (insertions or deletions) that are aligned differently.

This issue can affect the realignment of read mappings created by the Create UMI Reads tool of the Biomedical Genomics Analysis plugin, where gaps in reads can be left aligned or right aligned. The issue is most likely to affect data that contain repetitive regions or homopolymeric regions, such as Ion Torrent data.

This issue does not affect the realignment of read mappings created by other tools in the CLC Genomics Workbench or CLC Genomics Server, such as Map Reads to Reference, Map Reads to Contigs, RNA-Seq Analysis and Map Bisulfite Reads to Reference. Gaps in mappings generated by these are consistently left aligned.


Expected impact

If used for variant detection, the minor differences in realigned read mappings due to this issue can lead to a few very high quality variants having slightly different QUAL values. For example, for one of the mappings, a variant called may have the maximum (capped) QUAL value of 200, while for another, it may have a QUAL value of 159.546 (the second highest possible value) or 156.536 (the third highest possible value). In workflows with downstream filtering steps, it is thus possible that some variants pass a QUAL-based filter in one run, but not in another.

Workflows distributed with plugins for affected versions of CLC software filter for QUAL values greater than 20, and thus this issue should not affect results generated using these.
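The effect of a downstream QUAL filter on the example values above can be sketched as follows. This is an illustration only: the function name and the strict threshold of 180 are hypothetical, while the QUAL values 200 and 159.546 and the threshold of 20 are taken from the text above.

```python
def passes_qual_filter(qual: float, threshold: float) -> bool:
    """Return True if a variant's QUAL value is strictly greater than the threshold."""
    return qual > threshold


# QUAL values reported for the same variant in two runs (values from the text above).
run_a_qual = 200.0    # maximum (capped) QUAL value
run_b_qual = 159.546  # second highest possible value

# Threshold used by workflows distributed with plugins: both runs agree.
default_threshold = 20.0
# Hypothetical, much stricter threshold: the two runs disagree.
strict_threshold = 180.0

for threshold in (default_threshold, strict_threshold):
    a_passes = passes_qual_filter(run_a_qual, threshold)
    b_passes = passes_qual_filter(run_b_qual, threshold)
    print(f"threshold {threshold}: run A passes={a_passes}, run B passes={b_passes}")
```

Only filters with thresholds that fall between the possible high QUAL values would be expected to give different results between runs.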

 

Affected software

  • CLC Genomics Workbench 12.0, 12.0.1 and 12.0.2
  • CLC Genomics Server 11.0, 11.0.1 and 11.0.2

A fix for this issue was implemented in CLC Genomics Workbench 12.0.3 and CLC Genomics Server 11.0.3.

 

2.13. Variants that span a target region are not called

Issue description

When calling variants with the Basic Variant Detection, Low Frequency Variant Detection, or Fixed Ploidy Variant Detection tool and using the "Restrict calling to target regions" option, variants that span the target region, that is, variants that overlap with and are longer than the target region, are not detected.

This issue would primarily affect the detection of deletions. An example where this might arise would be where a deletion occurred at the DNA level, but the target was defined based on an exon. If the deletion was longer than the exon, it would not be detected.

The effect of this is illustrated on a small synthetic example. Here, when 4bp target regions are provided, variants are called as expected. When the target regions are reduced to 2bp, the deletions are no longer called because they span the target regions, but other variants are still called as expected.

Affected software

  • CLC Genomics Workbench 6.5 to 12.0.2
  • CLC Genomics Server 5.5 to 11.0.2
  • All versions of Biomedical Genomics Workbench

From CLC Genomics Workbench 12.0.3 and CLC Genomics Server 11.0.3, variants extending up to 50 nucleotides beyond either end of a target region are now reported in full, while variants extending even further will include only the first 50 nucleotides beyond the target region. Insertions at the right hand border of a target region are now considered to be a variant within the target region.
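The reporting rule described above for fixed versions can be sketched as a simple interval clip. This is a minimal illustration, not the implementation used by the variant detection tools. The function name, the 0-based half-open coordinate convention and the example coordinates are assumptions, and insertions at target borders are not modeled.

```python
def reported_extent(variant_start, variant_end, target_start, target_end, flank=50):
    """Clip a variant interval to the target region plus `flank` bases on each side.

    Coordinates are 0-based, half-open [start, end). Returns None when the
    variant does not overlap the target region at all.
    """
    if variant_end <= target_start or variant_start >= target_end:
        return None
    return (max(variant_start, target_start - flank),
            min(variant_end, target_end + flank))


# Hypothetical target region covering positions 1000 to 1100.
target = (1000, 1100)

# Deletion extending 20 bases beyond each end: reported in full.
print(reported_extent(980, 1120, *target))

# Deletion extending 100 and 200 bases beyond the ends: clipped to 50 on each side.
print(reported_extent(900, 1300, *target))
```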

2.14. Variants covering the exact length of a target region are not called and variants are not called for target regions of 1bp

Issue description

When calling variants with the Basic Variant Detection, Low Frequency Variant Detection, or Fixed Ploidy Variant Detection tool and using the "Restrict calling to target regions" option, the following problems exist:

  • Any variant that is exactly the length of the target region will not be called. For example, if a target region is made that is 1bp long, with the intention of detecting a known SNV at that position, that SNV will not be detected. Similarly, any target region defined such that it covers exactly the length of a variant will be affected by this problem.

  • No variants will be called for target regions of 1 bp regardless of the length of the variant.

 

The effect of these issues is illustrated using a small, synthetic example. Here, 4bp target regions are provided, and six variants are called as expected. After reducing the target regions to 2bp, the 2bp MNVs, AG (left) and CA (right), are not called because they exactly match the coordinates of the target regions. When the target regions are further reduced to 1bp, no variants are called.
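The two conditions listed above can be expressed as a small check for flagging variant and target combinations that would be missed on affected versions. This is a sketch under stated assumptions: the function name is hypothetical, intervals are 0-based and half-open, and "exactly the length of the target region" is interpreted as the variant having the same coordinates as the target, as in the synthetic example.

```python
def missed_on_affected_versions(variant, target):
    """Return True if a variant overlapping `target` matches one of the two problem cases.

    `variant` and `target` are (start, end) tuples, 0-based and half-open.
    """
    v_start, v_end = variant
    t_start, t_end = target
    overlaps = v_start < t_end and t_start < v_end
    if not overlaps:
        return False  # the variant is not relevant to this target region
    target_is_1bp = (t_end - t_start) == 1
    exact_match = (v_start, v_end) == (t_start, t_end)
    return target_is_1bp or exact_match


# 2bp MNV with a 4bp target around it: called as expected.
print(missed_on_affected_versions((10, 12), (9, 13)))
# The same MNV with a 2bp target at identical coordinates: missed.
print(missed_on_affected_versions((10, 12), (10, 12)))
# Any variant overlapping a 1bp target: missed.
print(missed_on_affected_versions((10, 13), (11, 12)))
```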

Affected software

  • CLC Genomics Workbench 6.5 to CLC Genomics Workbench 12.0.1
  • CLC Genomics Server 5.5 to CLC Genomics Server 11.0.1
  • All versions of Biomedical Genomics Workbench

This issue was fixed in CLC Genomics Workbench 12.0.2 and CLC Genomics Server 11.0.2.

2.15. Interquartile range test of the Detect MSI Status tool reports all loci as unstable

Issue description

The interquartile range test used in the Detect MSI Status tool, distributed with the Biomedical Genomics Analysis 1.1 plugin, does not work as intended. If this statistical test is used, all loci will be reported as unstable, regardless of whether they are stable or not.

Please do not select the interquartile range test when using this tool or running workflows that include it, when using an affected software version.

Expected impact

We expect the impact of this issue to be limited. The default test type for the Detect MSI Status tool, including when run within ready-to-use workflows, is "Standard deviation", which is not affected by this problem.

Affected software

Biomedical Genomics Analysis 1.1 and Biomedical Genomics Analysis Server Plugin 1.1

This issue was fixed in Biomedical Genomics Analysis 1.2.

2.16. Some larger single deletions reported as multiple, individual deletions when affine gap scoring used

Issue description

When using affine gap scoring in Map Reads to Reference, Map Reads to Contigs, RNA-Seq Analysis, or Add Reads to Contig on affected software versions, sub-optimal alignments could occur because opening a gap was not sufficiently penalized in cases where gap extension should have been preferred. Variants are still called in the right position(s) when variant detection is done using such mappings as input, but in affected regions, multiple, individual deletions may be reported where a single, larger deletion would have been more appropriate.

This problem is expected to predominantly affect mappings with lower quality, long reads, for example, PacBio or Oxford Nanopore data, or when using reference sequences from species different to the one the reads originated from.

By default, Map Reads to Reference, Map Reads to Contigs, RNA-Seq Analysis and Add Reads to Contig use linear gap scoring, which is not affected by this problem. However, affine gap scoring is the default for these tools when included in Ready-to-Use workflows delivered by the Biomedical Genomics Workbench and by the plugins QIAseq Targeted Panel Analysis and QIAGEN GeneRead Panel Analysis. Affine gap scoring is also the default in at least some of the workflows delivered with the CLC Microbial Genomics Module.

 

Affected software versions

  • CLC Genomics Workbench 8.0 - 11.x
  • Biomedical Genomics Workbench - all versions
  • CLC Genomics Server 7.0 - 10.x
  • CLC Cancer Workbench 2.0

This problem will also affect the CLC Genome Finishing Module installed on the above software versions, as it delivers the tool Add Reads to Contig. In addition, any workflows delivered by plugins installed on these versions that include Map Reads to Reference, Map Reads to Contigs or RNA-Seq Analysis and use affine gap scoring will be affected.

This issue was fixed in CLC Genomics Workbench 12.0 and CLC Genomics Server 11.0. Workflows delivered by any version of the Biomedical Genomics Analysis plugin or by CLC Microbial Genomics Module 4.0 and higher are therefore not affected by this problem.

2.17. QIAseq Mitochondria Panel DHS-105Z workflow analyses use incorrect genetic code

Issue description

The Mitochondria Panel (DHS-105Z) workflow launched using the Analyze QIAseq Panel tool uses the standard genetic code rather than the vertebrate mitochondrial code in affected software versions. Used as delivered in affected versions, this affects predictions based on codons that differ between these 2 codes. Of particular note is the potential for false positives or false negatives relating to whether a variant causes premature translation termination, due to incorrect nonsense annotations.

In addition, the Genetic Code parameter of the Amino Acid Changes tool in the Ready-to-Use workflows is locked to "Standard" in affected software versions. Thus, when using such workflows, the suggested work-around should also be used when analyzing data from the QIAseq Mitochondria Panel DHS-105Z.
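The codons that differ between the two genetic codes can be listed explicitly to see which ones can lead to incorrect nonsense annotations. The sketch below is illustrative only and is not part of the workflow. The codon assignments follow NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial), and the function name is hypothetical.

```python
# Codons whose meaning differs between the standard genetic code
# (NCBI translation table 1) and the vertebrate mitochondrial code (table 2).
# "*" denotes a stop codon.
STANDARD = {"AGA": "R", "AGG": "R", "ATA": "I", "TGA": "*"}
VERTEBRATE_MITO = {"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def nonsense_call_differs(codon: str) -> bool:
    """Return True if a codon is a stop codon in exactly one of the two codes."""
    codon = codon.upper()
    if codon not in STANDARD:
        return False  # codon is translated identically by both codes
    return (STANDARD[codon] == "*") != (VERTEBRATE_MITO[codon] == "*")


for codon in ("AGA", "AGG", "ATA", "TGA", "TAA"):
    print(codon, nonsense_call_differs(codon))
```

A variant creating TGA would be annotated as nonsense under the standard code but codes for tryptophan in vertebrate mitochondria (a potential false positive). A variant creating AGA or AGG would be annotated as arginine under the standard code but is a stop codon in vertebrate mitochondria (a potential false negative).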

 

Affected software

  • Biomedical Genomics Analysis 1.0
  • QIAseq Targeted Panel Analysis 1.0, 1.0.1, 1.1 and 1.2
  • QIAseq DNA V3 Panel Analysis 1.0 and 1.0.1

This issue was fixed in Biomedical Genomics Analysis 1.1 and the corresponding server plugin.

Work around for affected software versions

The work-around is documented at:

http://resources.qiagenbioinformatics.com//manuals/biomedicalgenomicsanalysis/100/index.php?manual=Mitochondria_panel.html

This work around can be carried out in any of the affected software versions.

2.18. Issues with Create UMI Reads tool for QIAseq panel data analysis

  1. Issue description
  2. Affected software versions
  3. Some information about the fix implemented
  4. Recommendations
  5. How to identify affected datasets
  6. Benchmark
  7. Useful links

Issue description

The Create UMI Reads tool distributed with the QIAseq Targeted Panel Analysis plugin, versions 1.0, 1.0.1 and 1.1, handles the merging of paired reads into UMI reads incorrectly when the contributing raw reads have been amplified by more than one gene-specific primer, that is, when the R1 reads for that UMI group start at different primer positions.

Two issues have been identified:

  1. In affected cases, some paired end reads with the same UMI tag and the same R2 mapping positions, but with different R1 mapping positions, were put into separate UMI groups when they should have been put into a single UMI group.

  2. When the "Ignore end gaps" option is turned off and where the Minimum supporting consensus fraction was low, UMI reads could end up longer than the original reads, with stretches of bases assigned quality scores of 0. Workflows delivered by QIAseq Targeted Panel Analysis 1.1 had this combination set as the default.

    Variants reported in regions containing such stretches of bases may not represent the sample data. These stretches, representing regions between the amplified fragments, were filled in with bases from the reference sequence and given a quality score of 0. Thus, any false reference variants identified in a region like this would be expected to have a very low quality score.

 

Affected software versions

  • QIAseq Targeted Panel Analysis and QIAseq Targeted Panel Analysis Server Plugin, versions 1.0, 1.0.1 and 1.1
  • QIAseq DNA V3 Panel Analysis and QIAseq DNA V3 Panel Analysis Server Plugin, versions 1.0 and 1.0.1

This issue was fixed in QIAseq Targeted Panel Analysis and QIAseq Targeted Panel Analysis Server Plugin, version 1.2. It does not affect the Biomedical Genomics Analysis or Biomedical Genomics Analysis Server plugins.

 

Some information about the fix implemented

Reads amplified with different primers but originating from the same UMI labeled DNA fragment, and therefore belonging to the same UMI group, are now put into the same UMI group. This will generally result in fewer, larger (more reads) UMI groups, which in turn should lead to more accurate UMI reads.
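The change in grouping can be sketched with a simplified model in which each read pair is described by its UMI tag and the mapping positions of R1 and R2. This is not the implementation used by the Create UMI Reads tool. The dictionary keys, read names and positions are hypothetical and serve only to show how including the R1 position in the grouping key splits a UMI group.

```python
from collections import defaultdict


def group_reads(reads, include_r1_position):
    """Group read pairs by UMI and R2 position, optionally also by R1 position."""
    groups = defaultdict(list)
    for read in reads:
        key = (read["umi"], read["r2_pos"])
        if include_r1_position:
            key += (read["r1_pos"],)
        groups[key].append(read["name"])
    return dict(groups)


# Three read pairs from the same UMI-labeled fragment; the third was amplified
# by a different gene-specific primer, so its R1 starts at a different position.
reads = [
    {"name": "pair1", "umi": "ACGTACGTACGT", "r1_pos": 100, "r2_pos": 400},
    {"name": "pair2", "umi": "ACGTACGTACGT", "r1_pos": 100, "r2_pos": 400},
    {"name": "pair3", "umi": "ACGTACGTACGT", "r1_pos": 250, "r2_pos": 400},
]

# Behavior corresponding to the affected versions: two separate groups.
print(len(group_reads(reads, include_r1_position=True)))
# Behavior corresponding to the fix: one larger group.
print(len(group_reads(reads, include_r1_position=False)))
```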

Stretches representing regions between the amplified fragments of a UMI group are now filled with Ns, reflecting the fact that the base in the sample is unknown. Reads with Ns at a particular position are ignored during variant detection. This change has the additional outcome that UMI reads created using paired end reads from target regions with multiple primers will generally have higher quality scores, thereby increasing the overall average quality scores of the UMI reads.

 

Recommendations

1)     Upgrade to a plugin version not affected by this issue.

2)     If using customized workflows based on those distributed with QIAseq Targeted Panel Analysis 1.1 or earlier, open a new copy of the workflow and customize it according to your needs (recommended).

Alternatively, adjust the filtering steps of your existing workflow after upgrading the plugin.

Explanation: When you upgrade, the new version of Create UMI Reads will be available for all workflows, but settings in workflows other than those distributed by the plugin are not updated. Our benchmarking suggests that not updating the filtering strategy can increase the number of false positives reported. Please see the Benchmark section below for further details.

3)     Re-run some of the analyses originally done using affected plugin versions to determine whether the earlier results are likely to be affected by this issue. How to determine this is described in the next section. If you determine that your targets are not affected by this issue, then no further action is needed. If targets of interest are affected, then we recommend that analyses are re-run with the new software version.

 

How to identify affected datasets

The number of UMI reads created from raw reads amplified using more than one gene specific primer can be found in the "UMI reads being longer than the input reads" section of the Create UMI Reads Report, as created using QIAseq Targeted Panel Analysis Plugin version 1.2 or Biomedical Genomics Analysis plugin version 1.0.

UMI reads with stretches of Ns seen in the Mapped UMI Reads Track generated using QIAseq Targeted Panel Analysis Plugin version 1.2 or Biomedical Genomics Analysis plugin version 1.0 would suggest that the data set is affected and thus that earlier analysis results are likely to be affected. An example of such a case is shown in Figure 1, below.

 

Figure 1: Tracks showing UMI reads created using QIAseq Targeted Panel Analysis Plugin versions 1.0, 1.1 and 1.2, from top to bottom respectively, mapped to the reference genome. The same UMI-labelled DNA fragment was amplified with different gene specific primers in some areas. The UMI reads in the "Version 1.2" track at the bottom, were created using a version of the Create UMI Reads tool without the issue reported here. From this track, we can see this region was affected: There is a lower UMI coverage value compared to the tracks above it, as some UMI groups are now larger. In addition, stretches of Ns are visible, shown as grey boxes. In the earlier versions, such stretches would have been filled using bases from the reference and given a quality score of 0.

 

Benchmark

This issue has a relatively small effect on the variants called, as illustrated by the similar F1 score values shown below for different situations. However, this benchmarking suggests that accompanying changes to filtering steps improve results further, illustrated by comparing the second and third columns of data in the table below.

 

| | Version 1.1 (affected Create UMI Reads tool) | Version 1.1 but with a fixed Create UMI Reads tool and without other accompanying changes | Version 1.2 (fixed Create UMI Reads tool and accompanying changes) |
|---|---|---|---|
| True Positives | 242 | 246 | 258 |
| False Positives | 45 | 54 | 47 |
| False Negatives | 30 | 26 | 14 |
| F1 score | 0.866 | 0.860 | 0.894 |

Table 1: Benchmark results generated using the Identify QIAseq DNA Somatic Variants workflow on an Illumina paired end dataset containing 272 known variants.
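The F1 scores in Table 1 can be reproduced from the true positive, false positive and false negative counts. The sketch below assumes the standard definition of the F1 score as the harmonic mean of precision and recall. The function name and the short column labels are chosen here for illustration.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 score, the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    return 2 * true_positives / (2 * true_positives + false_positives + false_negatives)


# (TP, FP, FN) counts taken from Table 1.
benchmark = {
    "Version 1.1 (affected tool)": (242, 45, 30),
    "Version 1.1 with fixed tool only": (246, 54, 26),
    "Version 1.2 (fixed tool and accompanying changes)": (258, 47, 14),
}

for label, (tp, fp, fn) in benchmark.items():
    print(f"{label}: F1 = {f1_score(tp, fp, fn):.3f}")
```

In each column, true positives and false negatives sum to the 272 known variants in the dataset.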

 

Useful links

Updating workflows stored in CLC Workbench Navigation Areas:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Configuring_workflow_tools.html

Updating workflows you installed on a CLC Workbench:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Workflow_version_update.html

Biomedical Genomics Analysis (plugin) manual and Latest Improvements:
http://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/current/index.php?manual=Introduction.html

https://www.qiagenbioinformatics.com/biomedical-genomics-analysis-latest-improvements/

QIAseq Targeted Panel Analysis (plugin) manual and Latest Improvements:
http://resources.qiagenbioinformatics.com/manuals/qiaseqpanels/current/QIAseq_Panel_Analysis_Plugin.pdf

https://www.qiagenbioinformatics.com/qiaseq-panel-latest-improvements/

2.19. Variant calling on results from RNA-Seq Analysis cannot detect insertions and deletions at intron-exon boundaries

Issue description

Read mappings produced by the RNA-Seq Analysis tool treat deletions and insertions at exon-intron boundaries as splice variants. This means that such deletions/insertions are not detected in downstream variant calling.

 

Figure 1: An example of the issue as seen in CLC Genomics Workbench 11. The short deletion in the middle of the right-hand side exon, seen in the Read Mapping track, is found by the variant detection tools in the Workbench and appears in the variant track. However, the deletions at the end of the first exon and the start of the second exon are not identified by the variant detection tools, but rather are treated as if they were splice variants. In addition, the inserted sequence "GAAAA" is not detected.

 

This issue will be fixed in the forthcoming CLC Genomics Workbench 12 release, expected at the end of November, 2018. With that release, the data above would look like that shown in Figure 2, below.

Figure 2: The same data analyzed in the upcoming CLC Genomics Workbench 12. All deletions and insertions in the mapping can, in principle, be found by the variant detection tools, subject to the statistical model and settings used. This image shows the results of an analysis with very lax settings for illustration purposes. This corrected behavior implicitly favors the hypothesis of a deletion/insertion over a novel splice junction.

 

Affected software

  • CLC Genomics Workbench 11.x and earlier
  • All versions of Biomedical Genomics Workbench
  • CLC Genomics Server 10.x and earlier

2.20. RNA-Seq Analysis does not count some reads covering start-end position of a circular chromosome

A problem with read counts has been identified in the RNA-Seq Analysis tool when using circular reference sequences. It affects all single reads and some paired end reads that map across the start/end positions of a circular reference where there is a gene annotation over this region. Affected reads are mistakenly considered as mapping to an intergenic region.

For affected analyses, all count values for the genes crossing the start/end position of the circular reference, as well as values with derivations depending on these counts, will be incorrect.

This issue does not affect RNA-Seq analyses where the "One reference sequence per transcript" option has been selected. It also does not affect analyses using the older legacy RNA-Seq Analysis tool.

 

Work around

If your analyses are affected by this problem and you have not yet upgraded to a version of the software where this problem has been addressed, the work around involves redefining the start point of circular references to lie outside a gene region and re-running the analyses with the new references. These are the standard steps that would be involved:

  1. Create a standard, annotated sequence or sequence list using the circular reference by using the Convert from Tracks tool, selecting the sequence track and all annotation tracks that you want to use for this or any other analysis involving these references.

  2. View the reference sequence to be adjusted in circular view and move the start position to somewhere not covered by a gene annotation. How to do this is described in the manual at:

    http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Mark_molecule_as_circular_specify_starting_point.html
  3. Use the Convert to Tracks tool on the annotated reference sequence(s), choosing to generate a sequence track and all the annotation track types you originally chose when you converted to tracks.

Please ensure the new references and annotation tracks are easily distinguished from the old ones. Using a mix of the new reference sequences with old annotation tracks, or vice versa, would cause incorrect results. This is because the annotations would be in the wrong location along the reference sequence if a mix of the old and new reference tracks were used.
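Why old and new tracks must not be mixed can be seen from a small coordinate example. The sketch below is illustrative only and does not reflect how the Workbench stores tracks. The toy sequence, the function names and the 0-based coordinates are assumptions.

```python
def rotate_reference(sequence: str, new_start: int) -> str:
    """Rotate a circular sequence so that index `new_start` becomes position 0."""
    return sequence[new_start:] + sequence[:new_start]


def shift_annotation(start: int, length: int, new_start: int, seq_length: int):
    """Map an annotation given by start and length onto the rotated sequence."""
    new_begin = (start - new_start) % seq_length
    return new_begin, new_begin + length


def extract(sequence: str, start: int, length: int) -> str:
    """Extract `length` bases from a circular sequence, wrapping past the end if needed."""
    return (sequence + sequence)[start:start + length]


reference = "ACGTTGCAAGGCTTAACCGG"   # toy circular reference, 20 bases
gene_start, gene_length = 17, 6      # gene crosses the start/end position

# Move the start position to index 10, which lies outside the gene.
new_reference = rotate_reference(reference, 10)
new_begin, new_end = shift_annotation(gene_start, gene_length, 10, len(reference))

print(extract(reference, gene_start, gene_length))      # gene on the old reference
print(new_reference[new_begin:new_end])                 # same gene, new coordinates
print(extract(new_reference, gene_start, gene_length))  # old coordinates, new reference
```

With matching reference and annotation coordinates, the same gene sequence is recovered and the gene no longer crosses the start/end position. Applying the old coordinates to the new reference returns a different stretch of sequence.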

Software affected

  • CLC Genomics Workbench 7.x, 8.x, 9.x, 10.x and 11.x
  • All versions of Biomedical Genomics Workbench
  • CLC Genomics Server 6.x, 7.x, 8.x, 9.x and 10.x

2.21. Some comparison operators in the Identify Candidate Variants tool do not work

Issue description

Comparisons involving the following operators in the Identify Candidate Variants tool do not work and produce empty results:

  • >=  
  • <=
  • abs value > 
  • abs value <
  • abs value >= 
  • abs value <=

The History information of a data element output by this tool after using an affected comparison operator looks correct, but the comparison is not applied as expected.

Other operators are not affected by this problem.

 

Recommendation

Avoid setting up filters using the affected comparison operators in the Identify Candidate Variants tool on affected software. In most cases, a single operator equivalent can be specified instead. For example, the condition x>=15 could be specified as x>14 when filtering on integer values.
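The rewrite of inclusive comparisons as strict ones can be sketched as below. This applies to integer values only, and the function name is hypothetical. The "abs value" operators are not covered, as all of them are affected.

```python
def strict_equivalent(operator: str, value: int):
    """Rewrite an inclusive integer comparison as an equivalent strict comparison."""
    if operator == ">=":
        return ">", value - 1
    if operator == "<=":
        return "<", value + 1
    raise ValueError(f"no strict equivalent defined for operator {operator!r}")


# The example from the text: x >= 15 becomes x > 14.
print(strict_equivalent(">=", 15))
# Similarly, x <= 15 becomes x < 16.
print(strict_equivalent("<=", 15))
```

For non-integer values, such as frequencies, there is no exact equivalent, and a threshold slightly below or above the intended value would have to be chosen instead.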

Affected software

  • CLC Genomics Workbench 10.1, 10.1.1, 10.1.2, 10.1.3, 11.0 and 11.0.1
  • Biomedical Genomics Workbench 4.1, 4.1.1, 4.1.2, 4.1.3, 5.0 and 5.0.1
  • CLC Genomics Server 9.1, 9.1.1, 9.1.2, 9.1.3, 10.0 and 10.0.1

This issue affects the Identify Candidate Variants tool, whether run directly or via a workflow.

Ready-to-use workflows distributed with the Biomedical Genomics Workbench or plugins that contain the Identify Candidate Variants tool are not affected as distributed, as the default configurations contain only working comparison operators. If such a workflow is copied, and an Identify Candidate Variants element is edited to include an affected operator, then the newly created workflow will be affected when run on the software versions listed above.

 

 

 

2.22. Variant calls possible within common sequence regions of QIAseq DNA panel results under specific circumstances

Issue description

For paired QIAseq panel data, common sequence is left on the end of the non-indexed read when that read is longer than the DNA fragment being sequenced.

This will usually not cause problems, as the contaminating common sequence is not aligned to the reference by the read mapper, and so is ignored by variant detection tools. However, false positive variants may on occasion be called when i) enough of the common sequence, by chance, matches the reference, leading to alignment and ii) there is at least one mismatch in that region, leading to a variant call.

We expect false positive variants in such regions to occur rarely in practice due to the combination of circumstances needed. See the Further Background section below for more information.

This issue is caused by a problem with the Remove Ligation Artifacts tool. This tool, and workflows that include this tool, are distributed in the QIAseq DNA V3 Panel Analysis and the QIAseq Targeted Panel Analysis plugins. A full list of software and versions affected is provided at the end of this article.

 

Further background

When the length of a sequencing read exceeds the length of the DNA fragment being sequenced, the read can end with artifacts from library preparation, such as adapter sequence. 

In the case of paired QIAseq V3 panel data, an artifact routinely encountered is "common sequence" at the 3' end of the non-indexed read.  This is shown in simplified form below.  If the read is long enough, it may also contain some of the UMI sequence.

The Remove Ligation Artifacts tool trims away artifacts, so that they are not considered in variant detection analysis. However, in the affected software, regions of common sequence and UMI sequence remain on the non-indexed read of a pair. This is generally not a problem, as the common sequence and UMI do not usually match the reference where the read has mapped. Non-matching regions in read mappings are ignored when variant detection analysis is run.

However, if the part of the common sequence adjacent to the DNA fragment, by chance, matches the reference well enough to be mapped, variant detection tools will consider that region. At least 2 bases of the common sequence near the target region must match the reference for this to happen with default read mapping penalties. If such a region matches perfectly, then this does not affect the variant detection results as no variants will be reported. However, if at least one base does not match the reference, then a variant can potentially be called within the common sequence region. 

Examples

A false positive variant, in the common sequence at the 5' end of the DNA fragment of interest:

The CCT within the red box is part of the common sequence, but the 2 Cs happen to match the reference at this point. This leads to these bases being mapped to the reference and being considered during variant detection analysis.  The T of the CCT does not match the reference, and here has been called as a variant by the variant detection tool.

 

A false positive variant, in the common sequence at the 3' end of the DNA fragment of interest:

The AGGA in the red box is part of the common sequence. Three of the four bases in this region happen to match the reference, so this region is mapped to the reference and is thus considered during variant detection analysis. The first G in this region does not match the reference, and in this case is called as a variant by the variant detection tool.

 

Other symptoms

When considering if a particular variant is a false positive due to this issue, it can be useful to consider that the common sequence will appear on only one of the pair members - either only the ones that map in the reverse direction to the reference, OR only the ones that map in the forward direction to the reference. To check for this, choose the option in the Reads Track section of the side panel called "Disconnect paired reads" and check if all the affected reads are red or all of them are green.

 

Affected software

The Remove Ligation Artifacts tool and the QIAseq DNA v3 Panel Analysis workflow distributed in:

  • QIAseq DNA V3 Panel Analysis 1.0  (plugin)
  • QIAseq DNA V3 Panel Analysis Server Plugin 1.0

The Remove Ligation Artifacts tool and the Targeted DNA and Targeted RNAScan workflows distributed in:

  • QIAseq Targeted Panel Analysis 1.0  (plugin)
  • QIAseq Targeted Panel Analysis Server Plugin 1.0

The Detect QIAseq RNAscan Fusions (beta) workflow distributed in:

  • QIAseq Targeted RNAScan Panel Analysis 0.5.1 beta 1
  • QIAseq Targeted RNAScan Panel Analysis  Server Plugin 0.5.1 beta 1

 

This issue was fixed in

  • QIAseq Targeted Panel Analysis 1.0.1
  • QIAseq Targeted Panel Analysis Server Plugin 1.0.1
  • QIAseq DNA V3 Panel Analysis 1.0.1
  • QIAseq DNA V3 Panel Analysis Server Plugin 1.0.1
  • QIAseq Targeted RNAScan Panel Analysis 0.5.2 beta 1
  • QIAseq Targeted RNAScan Panel Analysis Server Plugin 0.5.2 beta 1

2.23. Acidobacteria and "dsDNA viruses, no RNA stage" are not included in databases created with Create Microbial Reference Database

Issue description

When the option "Bacteria" is selected as an NCBI source when configuring the Create Microbial Reference Database tool, Acidobacteria and "dsDNA viruses, no RNA stage" will NOT be downloaded and therefore will NOT be part of the resulting database. Accordingly, Taxonomic Profiling analyses that use this database will not annotate Acidobacteria-derived sequences.

Recommendation

Do not use the CLC Microbial Genomics Module for Taxonomic Profiling analysis when your results depend on the presence of Acidobacteria and "dsDNA viruses, no RNA stage".

Affected software

The problem affects the CLC Microbial Genomics Module versions 2.0 through to 2.5.1 inclusive.

2.24. Incorrect taxonomic IDs assignment during Taxonomic Profiling as consequence of adding viruses to the reference database

Issue description

When the option "Viruses" is selected as an NCBI source when configuring the Create Microbial Reference Database tool, incorrect taxonomies will be assigned to the organism names. This in turn leads to incorrect taxonomic IDs being assigned during Taxonomic Profiling analysis.

This problem exists whether or not other NCBI sources are selected in addition to "Viruses". As long as "Viruses" is selected, incorrect taxonomies for all organisms will result.

Recommendation

Please upgrade your plugin. This issue was addressed in CLC Microbial Genomics Module 2.5.1.

Do not include "Viruses" as a source when running the Create Microbial Reference Database tool with the CLC Microbial Genomics Module 2.0 through to 2.5 inclusive.

Affected software

The problem affects the CLC Microbial Genomics Module versions 2.0 through to 2.5 inclusive.

 

 

 

 

 

2.25. Variant coding impact for duplications of 5' splice sites in genes on positive strand may be missed

Issue description

Duplications located on the 5' intron-exon boundary of a coding region for a gene on the positive strand are reported as insertions in the intron rather than as duplications within the exon. This is due to the left-alignment of insertions relative to the reference sequence in both Workbenches, as described in the manual section Gap placement.
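How left-alignment moves such a duplication into the intron can be shown with a generic left-alignment routine. This is a minimal sketch of the general technique, not the Workbench implementation. The toy sequence, the exon start position and the function name are invented for illustration.

```python
def left_align_insertion(reference: str, position: int, inserted: str):
    """Shift an insertion as far left as possible without changing the resulting sequence.

    `position` is the 0-based index in `reference` before which `inserted` is placed.
    """
    while position > 0 and reference[position - 1] == inserted[-1]:
        # Rotate the inserted bases by one and move the insertion one base left.
        inserted = inserted[-1] + inserted[:-1]
        position -= 1
    return position, inserted


reference = "TTTCAGATGGCC"  # toy sequence: intron "TTTCAG" followed by exon "ATGGCC"
exon_start = 6

# Duplication of the first three exon bases ("ATG"), written as an insertion
# placed directly after the original copy, inside the exon.
position, inserted = left_align_insertion(reference, 9, "ATG")
print(position, inserted)

# Both representations describe the same resulting sequence.
original = reference[:9] + "ATG" + reference[9:]
shifted = reference[:position] + inserted + reference[position:]
print(original == shifted, position < exon_start)
```

After left-alignment, the insertion position lies before the exon start, so an annotation based on that position sees an intronic insertion.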

Data affected

A duplication at a 5' intron-exon junction must be present in a gene on the positive strand for this issue to affect results.

Genes on the negative strand are not affected by this issue.

Expected impact

An affected variant would not be reported as having a coding impact, when it should be reported as having a coding impact, when using the "Add Information about Amino Acid Change" tool in the Biomedical Genomics Workbench or the "Amino Acid Change" tool in the CLC Genomics Workbench.

This problem is expected to arise infrequently, due to the nature of the variants affected.

Software affected

  • Biomedical Genomics Workbench 2.1 through 5.0
  • CLC Genomics Workbench 5 through 11.0
  • CLC Genomics Server 4.0 through 10.0

This issue was addressed in Biomedical Genomics Workbench 5.0.1, CLC Genomics Workbench 11.0.1 and CLC Genomics Server 10.0.1.

 

2.26. Workflows with Map Reads to Reference as the first step use only the first 120 selected elements

Issue Description

When running a workflow that contains the Map Reads to Reference tool as the first step, only the reads in the first 120 selected elements will be used. The only indication of this limitation is provided in the Workflow wizard, where "(120)" is shown even if more than 120 input elements are selected.

Analyses affected

Workflows containing the Map Reads to Reference tool as the first step are affected when more than 120 elements are selected. This includes when running such a workflow in batch mode, where one or more batch units contains more than 120 elements.

This issue does *not* affect workflows where the Map Reads to Reference tool is not the first step. While the limit of 120 input elements still exists, such a workflow will fail with an informative error if another workflow element passes more than 120 elements into the Map Reads to Reference tool.

If more than 120 data elements are supplied to the Map Reads to Reference tool when run from the Workbench Toolbox, (i.e. not within a workflow), the limit of 120 input elements exists, but a warning is shown in the wizard saying "At most 120 inputs are allowed. Create a new sequence list first."

Expected impact

Any read mapping generated in a workflow with Map Reads to Reference as the first workflow tool and more than 120 input files would be missing reads that were expected to be included, without obvious indication that this is the case.

Work arounds

1) Concatenate sequence elements or sequence lists into one or more sequence lists, such that no more than 120 data elements contain all the reads to be mapped.

How can I concatenate sequence lists and when do I need to?

2) Generate several mappings, manually or using batching, using 120 or fewer input elements per mapping task. The resulting mappings could then be merged later, if desired.
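Splitting a large set of inputs into groups of at most 120 elements, as in the second work-around, can be sketched as follows. The element names and the function name are hypothetical. Running the mapping for each group is done in the Workbench and is not shown.

```python
def split_into_batches(elements, max_per_batch=120):
    """Split a list of input elements into consecutive batches of at most `max_per_batch`."""
    return [elements[i:i + max_per_batch]
            for i in range(0, len(elements), max_per_batch)]


# Hypothetical names for 250 input elements.
input_elements = [f"sample_{n:03d}_reads" for n in range(1, 251)]

batches = split_into_batches(input_elements)
for index, batch in enumerate(batches, start=1):
    print(f"mapping task {index}: {len(batch)} input elements")
```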

Affected software

  • CLC Genomics Workbench 7.5 through 9.5.5, and versions 10.0 through 10.1.1.
  • Biomedical Genomics Workbench 2.1 through 3.5.5, and versions 4.0 through 4.1.1.

Analyses launched from the above Workbenches to run on CLC Genomics Server versions 6.5 through 8.5.5, and versions 9.0 through 9.1.1, are also affected.

 

2.27. InDels and Structural Variants tool produces additional breakpoints if broken pairs included

Issue Description

Additional breakpoints can be reported by the InDels and Structural Variants tool when both broken and intact reads map to a given region. This issue only affects analyses where the option Ignore broken pairs is unchecked and when using one of the affected software versions. By default, Ignore broken pairs is checked.

Expected Impact

  • Variants affected are reported as "complex" variants, and are included in the Structural Variants (SV) track instead of the Indel Variants (InDel) track output by the InDels and Structural Variants tool.
  • Any tool that uses an affected InDel track thus also has the potential to be affected via knock-on effects of any missing insertion or deletion entries.

    For example,
    • If an affected InDel track is used as a guidance track for the Local Realignment tool, then for regions where indels are not present due to this issue, but should be, the Local Realignment mapping output could be affected.
    • If an affected mapping were then used as input to Basic Variant Detection, Fixed Ploidy Variant Detection or Low Frequency Variant Detection tool, indels affected by this issue may then also be missed by the variant detection tools, despite the existence of reads supporting them in the data.

 

Recommendation

If working with software affected by this issue, please keep the Ignore broken pairs option checked when running the InDels and Structural Variants tool.

Affected Software

  • CLC Genomics Workbench 9.5 through 9.5.5, and versions 10.0 through 10.1.1
  • Biomedical Genomics Workbench 3.5 through 3.5.5, and versions 4.0 through 4.1.1
  • CLC Genomics Server 8.5 through 8.5.5, and versions 9.0 through 9.1.1

2.28. Some PacBio reads and other reads longer than 500bp are included in mappings despite not passing length and similarity cut-offs

Issue description

An issue with read mappings was found where length and similarity fraction cut-offs were ignored for the small subset of PacBio reads and reads of other types longer than 500bp where:

  • the best alignment of the read involved a stretch of residues that exactly match the reference sequence, and where
  • no extension from the alignment seed was possible.

When reads are affected by this problem, they are included in read mappings when they should not be. Such reads will have only a short section of the total read that matches the reference. When the mapping is viewed, such reads are recognized by their long unaligned ends.
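To make the description above concrete, the sketch below flags reads whose aligned portion is too short or too dissimilar. It uses a simplified reading of the length and similarity fraction cut-offs that may differ from the mapper's exact definition. The `MappedRead` fields and the threshold values are illustrative assumptions, not taken from the software.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    name: str
    read_length: int     # total number of residues in the read
    aligned_length: int  # residues inside the alignment (excludes unaligned ends)
    matches: int         # aligned residues identical to the reference


def passes_cutoffs(read: MappedRead,
                   length_fraction: float = 0.5,
                   similarity_fraction: float = 0.8) -> bool:
    """Simplified check: enough of the read aligns, and the aligned part is similar enough.

    The default thresholds are illustrative assumptions only.
    """
    if read.read_length <= 0 or read.aligned_length <= 0:
        return False
    long_enough = read.aligned_length / read.read_length >= length_fraction
    similar_enough = read.matches / read.aligned_length >= similarity_fraction
    return long_enough and similar_enough


if __name__ == "__main__":
    # A 2000 bp read where only a 30 bp exact-match seed aligns: long unaligned ends.
    suspicious = MappedRead("read_a", read_length=2000, aligned_length=30, matches=30)
    normal = MappedRead("read_b", read_length=2000, aligned_length=1900, matches=1850)
    for read in (suspicious, normal):
        print(read.name, "passes" if passes_cutoffs(read) else "should have been excluded")
```

A read like `read_a` has a perfect similarity fraction over its short aligned stretch, which is why only the length criterion identifies it here.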

Expected impact

For variant detection, we expect little or no effect on outcome. This issue is expected to affect only a small subset of reads that are over 500bp, and for those reads, only a small stretch of matches is added to the mapping. No mismatches are added. Most of an affected mapped read will consist of unaligned ends, and such areas do not contribute to variant calling.

For those mapping contigs to a reference, there is the possibility of some contigs mapping very badly. If affected this way, it is likely to be immediately obvious when the mapping is viewed. A badly mapped contig would have only a short length of residues matching the reference, with most of the contig's residues being unaligned.

Affected software

  • CLC Genomics Workbench 10.0 through 10.1.1
  • Biomedical Genomics Workbench 4.0 through 4.1.1
  • CLC Genomics Server 9.0 through 9.1.1
  • CLC Assembly Cell 5.0, 5.0.1, 5.0.2, and 5.0.3

2.29. Read counts reported for contained multi-nucleotide variants (MNVs) are too low under certain circumstances

Under certain limited circumstances, the read count reported for the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools is lower than it should be.

This issue affects a small subset of variants where all the following apply:

  • there are reads suggesting the presence of a large MNV or deletion and reads suggesting the presence of smaller MNVs within the larger variant region, AND
  • the smaller, contained MNVs have a different sequence than the larger variant, AND
  • one or more of the potential variants of that region are disregarded during the variant calling process and reads from the disregarded variants are reassigned to potential variants still under consideration that they support, AND
  • a potential variant supported by these reassigned reads is then also disregarded.

In such a situation, the initially reassigned reads should be checked again for support of potential variants still under consideration. However, in affected software versions, this second round of checks is not done.

Thus, under this set of circumstances, the count reported for a variant can be lower than it should be.

We expect this issue to affect very few variants and to have limited impact for those affected. However, some false negatives may be present in affected software versions: Some MNVs fitting the description above, which are supported by reads initially allocated to other potential variants, might pass count or frequency filters that they currently do not, including filters within the variant callers themselves, if this problem were not present.

Software affected

  • From CLC Genomics Workbench 7.5 up to and including version 9.5.4. Also affects version 10.0 and 10.0.1.
  • From Biomedical Genomics Workbench 2.1 up to and including version 3.5.4. Also affects version 4.0.
  • From CLC Genomics Server 6.5 up to and including version 8.5.4. Also affects version 9.0.

This issue was fixed in the following versions. Later releases in the same major version line (e.g. 9.x, 4.x, etc.) also contain the fix.

  • CLC Genomics Workbench 9.5.5
  • CLC Genomics Workbench 10.1
  • Biomedical Genomics Workbench 3.5.5
  • Biomedical Genomics Workbench 4.1
  • CLC Genomics Server 8.5.5
  • CLC Genomics Server 9.1

2.30. Variant callers incorrectly calculate the BaseQRankSum

The BaseQRankSum values reported by the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools were found to be incorrect for the majority of variants reported.

We expect the issue described here to have little or no impact on the identification or interpretation of variant calls within the Workbench.

This issue only affects variant identification and interpretation if the BaseQRankSum has been used as a quality or filter criterion during data analysis. Our default parameter settings and ready-to-use workflows do not make use of the BaseQRankSum values.

The BaseQRankSum value should not be used for the interpretation of variants in affected software versions.

Software affected

  • From CLC Genomics Workbench 7.5 up to and including version 9.5.4. Also affects version 10.0 and 10.0.1.
  • From Biomedical Genomics Workbench 2.1 up to and including version 3.5.4. Also affects version 4.0.
  • From CLC Genomics Server 6.5 up to and including version 8.5.4. Also affects version 9.0.

This issue was fixed in the following versions. Later releases in the same major version line (e.g. 9.x, 4.x, etc.) also contain the fix.

  • CLC Genomics Workbench 9.5.5
  • CLC Genomics Workbench 10.1
  • Biomedical Genomics Workbench 3.5.5
  • Biomedical Genomics Workbench 4.1
  • CLC Genomics Server 8.5.5
  • CLC Genomics Server 9.1

3. Issues affecting only versions of products released prior to June 2017

3.1. Search for Sequences at NCBI tool missing first and last result

An issue with the Search for Sequences at NCBI tool was introduced in Workbench products released on March 2, 2017, where the first and last search results on each result page returned by the search were missing.

  • For searches for a specific accession number, the outcome of this problem was that no result was listed, even though the "Total number of hits" reported was 1. 
  • For searches with multiple results, the first and last entry on each page of results returned was missing.

Software affected

This issue was introduced in, and only affects:

  • CLC Genomics Workbench 10.0
  • CLC Main Workbench 7.8
  • CLC Sequence Viewer 7.8

This issue was fixed in CLC Genomics Workbench 10.0.1, CLC Main Workbench 7.8.1 and CLC Sequence Viewer 7.8.1, released on March 15, 2017.

3.2. Marginally higher counts reported for a small subset of variants

The count and read count values reported by the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools were found to be marginally higher than was actually the case for a small minority of cases.

We expect the issue described here to have little or no impact on the identification or interpretation of variant calls within the Workbench.

The issue described here has been fixed for the

  • CLC Genomics Workbench 9.5.4
  • Biomedical Genomics Workbench 3.5.4 and
  • CLC Genomics Server 8.5.4

These versions were released on February 14, 2017.

Issue details

This issue involves potential variants that overlap, where one of the overlapping variants, but not the other, is confirmed and reported as a variant in the results.

The consequences of this issue are, for the small group of affected variants:

  • Slightly higher variant frequencies could be reported than should have been. For some cases this could result in allele frequencies above 100% being reported.
  • Using software where this issue has been fixed could result in a small decrease in the final number of variants reported when compared to results reported using earlier versions. This would be due to some potential variants no longer passing the count, read count or allele frequency filtering constraints set. To give an idea of the magnitude of the change that one might observe: in our tests, for a particular analysis that reported 250,000 variants, 30 fewer were reported with the same parameters and filters applied after the fix to this issue was implemented.

Software affected

  • CLC Genomics Workbench 7.5 up to and including version 9.5.3
  • Biomedical Genomics Workbench 2.1 up to and including version 3.5.3
  • CLC Genomics Server 6.5 up to and including version 8.5.3

 

3.3. Coverages and read counts for variants in certain circumstances are incorrect

The reporting of coverage, read coverage, read count and forward and reverse read count of the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools could be incorrect for variants meeting the particular conditions described below.

We expect the issues described here to have little or no impact on identification or interpretation of variant calls.

The issues described here have been fixed for the

  • CLC Genomics Workbench 9.5.3
  • Biomedical Genomics Workbench 3.5.3 and
  • CLC Genomics Server 8.5.3

These versions were released on December 14, 2016.

 

Issue description

  • For SNVs with no immediately adjacent variants, overlapping reads of a pair that had conflicting base calls for that variant position were contributing to the values calculated for coverage, read coverage, and read count of that variant. Such reads should not have contributed to these values.
  • For SNVs with no immediately adjacent variants, and where paired read data is used, if the second read of a pair containing the variant did not meet the requirements of the quality filter, neither the first nor the second read of that pair was contributing to the coverage calculated for the variant. In such cases, if the first read did pass the quality filter, it should have contributed to the coverage calculation.
  • For variants identified as adjacent to one or more other variants, the values for count, read count, and forward- and reverse read count could be incorrect for variants found in overlapping regions of a pair of reads.
  • The coverage of a longer variant that contained another variant was being reported for both the longer variant and the contained variant.

 

Software affected

  • CLC Genomics Workbench 7.5 up to and including version 9.5.2
  • Biomedical Genomics Workbench 2.1 up to and including version 3.5.2
  • CLC Genomics Server 6.5 up to and including version 8.5.2

 

3.4. Simultaneously run RNA-Seq jobs using the EM option can fail or produce incorrect results on Workbenches

The issue described below has been fixed for CLC Genomics Workbench 9.5.2 and Biomedical Genomics Workbench 3.5.2. If you are running CLC Genomics Workbench 9.5.1 or 9.5, or Biomedical Genomics Workbench 3.5.1 or 3.5, please update your installation.

 

Issue description

Using CLC Genomics Workbench 9.5.1 and 9.5 and Biomedical Genomics Workbench 3.5.1 and 3.5, RNA-Seq Analysis may fail or report wrong results if multiple instances of the RNA-Seq Analysis tool are run simultaneously with the EM option turned on.

The following conditions are necessary before the issue can arise:

  • RNA-Seq Analysis is run with the EM option turned on AND
  • Two or more such RNA-Seq Analysis tasks are run concurrently.

This does not affect jobs run using the Batch option, or jobs run on a CLC Genomics Server.

 

Actions to take

Upgrade your Workbench to the latest version.

We generally recommend that the Batch option is used for launching multiple instances of a particular type of analysis. Using this option, jobs launched are run sequentially.

If you have been running multiple RNA-Seq analyses simultaneously on an affected Workbench, and you are not able to upgrade immediately, then on your current Workbench, re-run the RNA-Seq Analysis tasks sequentially.

Symptoms of this issue

One or more of the concurrently run RNA-Seq Analysis tasks fails with an error. Currently we are aware of three possible error messages being associated with this issue: "Index out of bounds", "IllegalArgumentException" and "Modifying call after finished not supported".

Some concurrently run RNA-Seq Analysis tasks that have EM turned on may succeed, but these may include incorrect results.

 

3.5. Failure and incomplete imports for gzip and bzip2 compressed Illumina and Ion Torrent files

The issue described below has been fixed for CLC Genomics Workbench 9.5.1, Biomedical Genomics Workbench 3.5.1 and CLC Genomics Server 8.5.1. If you are running the CLC Genomics Workbench 9.5, Biomedical Genomics Workbench 3.5 or CLC Genomics Server 8.5, please update your installation.

 

Description of the issue

The problems in the list below can arise when using the tools Import | Illumina and Import | Ion Torrent to import fastq files compressed with gzip (.gzip) or bzip2 (.bz2). The problems are expected to be seen most frequently on Windows systems, but can sometimes also arise on Mac or Linux. The software affected is CLC Genomics Workbench 9.5, Biomedical Genomics Workbench 3.5 and CLC Genomics Server 8.5.

  • Import can fail with an error message "Errors occurred: see log".  No sequences are imported. The failure is sporadic: sometimes an import will succeed, but other times it will not.
  • Import may seem to succeed, when in fact only a subset of the sequences in the fastq file(s) are imported.

 

Workaround if upgrading immediately is not an option

If you are running one of the affected software versions, we recommend that you upgrade your software as soon as possible. However, if you cannot do that immediately, a work-around to this issue is to decompress the sequence files before importing them. 
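The files can be decompressed with any standard tool. As one option, the sketch below uses only the Python standard library. The helper name `decompress` and the example file name are hypothetical, and the output is written next to the input with the compression suffix removed.

```python
import bz2
import gzip
import shutil
from pathlib import Path

# Map of recognised compression suffixes to the matching standard-library opener.
OPENERS = {".gz": gzip.open, ".gzip": gzip.open, ".bz2": bz2.open}


def decompress(path: Path) -> Path:
    """Write an uncompressed copy of a gzip or bzip2 file and return its path."""
    opener = OPENERS.get(path.suffix.lower())
    if opener is None:
        raise ValueError(f"Unrecognised compression suffix: {path.suffix!r}")
    target = path.with_suffix("")  # e.g. reads.fastq.gz -> reads.fastq
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite {target}")
    with opener(path, "rb") as source, open(target, "wb") as destination:
        shutil.copyfileobj(source, destination)
    return target


# Example (hypothetical file name):
# decompress(Path("sample_R1.fastq.gz"))
```

The uncompressed files can then be selected in the import tool in place of the compressed ones.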

 

Software affected by this problem

  • Biomedical Genomics Workbench 3.5
  • CLC Genomics Workbench 9.5
  • CLC Genomics Server 8.5

Other versions of these products are not affected.

 

 

3.6. Genes with transcripts and with identical names reported as having 0 counts even with mapped reads

The issue

When running the RNA-Seq tool using track based references and

  • using the option "Genomes annotated with genes and transcripts", and
  • two or more genes had the same name, and
  • a transcript could be assigned to each of these genes from the mRNA track

then the value in the "Transcripts annotated" column in the GE track and in the TE track was 0, and all counts for such genes were reported as zero, even when there were reads mapping to them.

 

Which software and versions are affected?

This issue affects:

  • CLC Genomics Workbench 9.0, 8.0 through 8.5.2, and 7.0 through 7.5.5
  • Biomedical Genomics Workbench 3.0, and 2.1 through 2.5.2
  • CLC Cancer Research Workbench 1.0 through 2.0
  • CLC Genomics Server 8.0, 7.0 through 7.5.2, and 6.0 through 6.5.6

 

Which software versions are fixed?

This issue is fixed in the CLC Genomics Workbench 9.0.1, Biomedical Genomics Workbench 3.0.1 and CLC Genomics Server 8.0.1, released on June 9, 2016.

It has also been addressed in the previous release line: CLC Genomics Workbench 8.5.3, Biomedical Genomics Workbench 2.5.3, and CLC Genomics Server 7.5.3, released June 16, 2016. 

The release notes describing the fix for this issue can be found on the Latest Improvements pages for each product. This particular fix is described as follows:

Fixed an issue with the RNA-Seq Analysis tool that could arise when the "Genomes annotated with genes and transcripts" option was chosen: If two or more genes had the same name, and a transcript could be assigned to each from the mRNA track, then the value in the "Transcripts annotated" column in the GE track and in the TE track was 0. Furthermore, all counts for such genes were reported as zero, even when there were reads mapping to them.

 

How can I check if my RNA-Seq Analysis has been affected by the issue?

To find out if your analysis is affected:

  • Sort the TE track table on the column: "Transcripts annotated"
  • Any transcripts that have 0 in this column are affected. All other transcripts are not affected by this bug.
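The same check can be made outside the Workbench if the TE track table has been exported. The sketch below assumes a comma-separated export with a header row. The "Transcripts annotated" column name comes from the description above, while the "Name" column and the function name are assumptions for illustration.

```python
import csv
import io
from typing import List, TextIO


def transcripts_with_zero_annotated(table_file: TextIO,
                                    count_column: str = "Transcripts annotated",
                                    name_column: str = "Name") -> List[str]:
    """Return names of rows whose count_column holds 0 (blank cells are skipped)."""
    affected = []
    for row in csv.DictReader(table_file):
        value = (row.get(count_column) or "").strip()
        if value and float(value) == 0:
            affected.append(row[name_column])
    return affected


if __name__ == "__main__":
    # Hypothetical export with two affected transcripts.
    example = io.StringIO(
        "Name,Transcripts annotated\n"
        "TX1,0\n"
        "TX2,1\n"
        "TX3,0\n"
    )
    print(transcripts_with_zero_annotated(example))
```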

The bug was present at a very late stage in the RNA-Seq algorithm execution, when the calculation results were entered into the table. The underlying calculations were done correctly, but values for the duplicated gene names were excluded. The nonzero values that made it into the table are correct and will not change with the fix.

While transcripts with 0 in the "Transcripts annotated" column (affected transcripts) will often be seen when using the affected software versions under the analysis conditions described above, we do not anticipate that it will be necessary to re-run analyses in most cases. Only a very small number of genes/transcripts would generally be affected and these may not be genes of interest to the analysis. For example, in many cases, genes given the same name code for rRNAs, snoRNAs, and other miscellaneous RNAs, rather than mRNAs.

If you do see affected transcripts associated with genes of potential interest, then we recommend re-running the analysis using a version of the software that includes a fix for this problem.

 

 

 

3.7. Error in BED format file output for non-continuous annotations

Who is affected

Users who have exported annotation information to BED format from the Workbench versions listed below, where the annotations consist of so-called block list entries (e.g. an mRNA made up of multiple exons).

Export of continuous annotations (e.g. genes of prokaryotes) is not affected.

  • CLC Genomics Workbench 8.5.1 and earlier
  • CLC Cancer Research Workbench 2.0 and earlier
  • Biomedical Genomics Workbench 2.5.1 and earlier
  • CLC Genomics Server 7.5.1 and earlier

What are the symptoms

According to the BED format, the 12th column of the exported BED file, "blockStarts", should report the starting position of each 'block' of a feature (e.g. the individual exons) relative to the start of the feature as reported in the "chromStart" column. The blockStarts column instead mistakenly gives the absolute genomic sequence coordinates for the start of exons.

Re-import of the BED file into the Workbench or import into alternate programs, will produce distorted annotation information.
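For files that cannot easily be re-exported, the relationship between the two conventions can be sketched as follows. This is a minimal illustration, not a supported repair tool. It assumes that the absolute block starts written by the affected versions are on the same coordinate scale as chromStart, so that subtracting chromStart gives a first block start of 0. Please check a few entries by hand before relying on output from code like this. The function name is hypothetical.

```python
def fix_block_starts(line: str) -> str:
    """Convert absolute blockStarts (column 12) of one BED line to chromStart-relative values.

    Lines without block columns, or whose first block start is already 0,
    are returned unchanged.
    """
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    if len(fields) < 12:
        return stripped
    chrom_start = int(fields[1])
    had_trailing_comma = fields[11].endswith(",")
    starts = [int(value) for value in fields[11].split(",") if value]
    if not starts or starts[0] == 0:
        return stripped  # already relative (or nothing to convert)
    relative = ",".join(str(start - chrom_start) for start in starts)
    fields[11] = relative + ("," if had_trailing_comma else "")
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical two-exon feature with absolute block starts 1000 and 4000.
    affected = "chr1\t1000\t5000\ttx1\t0\t+\t1000\t5000\t0\t2\t500,1000\t1000,4000"
    print(fix_block_starts(affected))
```

In this example the last block starts at relative position 3000 and is 1000 residues long, so it ends at chromEnd - chromStart = 4000, as the BED format requires.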

When will this be fixed

This issue was fixed in the following versions, released in March, 2016:

  • CLC Genomics Workbench 9.0 
  • Biomedical Genomics Workbench 3.0
  • CLC Genomics Server 8.0

3.8. Read Mapping Global alignment setting resulting in too many unmapped reads (multiple tools)

Who is affected

For all the below mentioned affected tools, this issue is relevant only if the parameter Global alignment has been selected. Note that this is not the default setting. Users will need to specifically have checked the Global alignment option for results to be affected by this issue.

 

What are the symptoms

  • The number of mapped reads is substantially lower than what is expected or what was seen with previous Workbench versions.
  • For paired read sets, the number of broken reads will be significantly higher than expected (as one of the mates will mistakenly not be mapped).

What is affected

This issue affects users running the following tools in the following software:

'Map Reads to Reference', 'Map Reads to Contigs' and 'Add Reads to Contigs' (CLC Finishing Module) in:

  • CLC Genomics Workbench 7.5.3 through 8.0.2
  • CLC Cancer Research Workbench 1.5.4 through 2.0 / Biomedical Genomics Workbench 2.1 and 2.1.1
  • CLC Genomics Server 6.5.4 through 7.0.2

'RNA seq analysis' in:

  • CLC Genomics Workbench 8.0, 8.0.1 and 8.0.2
  • CLC Cancer Research Workbench 2.0 / Biomedical Genomics Workbench 2.1 and 2.1.1
  • CLC Genomics Server 7.0, 7.0.1 and 7.0.2

This issue was fixed in the following Workbenches and the corresponding Servers:

  • CLC Genomics Workbench 7.5.5, CLC Cancer Research Workbench 1.5.6 and CLC Genomics Server 6.5.6 - released August 18, 2015
  • CLC Genomics Workbench 8.0.3, Biomedical Genomics Workbench 2.1.2 and CLC Genomics Server 7.0.3 - released August 13, 2015

3.9. Import of low quality SOLiD fastq paired reads may result in incorrectly matched pairs

Summary

When importing SOLiD paired reads in colorspace fastq format using the SOLiD import tool, reads without quality scores will be discarded. This can lead to a mispairing of later reads in the list. For example, with a pair of forward-reverse sequences with names read_1 and read_2, if read_1 had no quality scores, it would be discarded. However, read_2 is kept in the paired list and ends up paired with the next forward read, which is not its partner. Thus, the resulting Sequence List in the Workbench contains incorrectly paired reads.

Incorrectly matched pairs will lead to problems in any analysis where paired information is taken into account, e.g. when mapping reads to reference sequences, most pairs would be expected to be recorded as broken pairs.
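The mispairing described above can be illustrated with a small model. It is not the importer's code: the read names, the "template/F" and "template/R" naming convention, and the function names are all hypothetical, and reads are reduced to a name plus an optional quality string.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, Optional[str]]  # (read name, quality string or None if missing)


def drop_without_quality(reads: List[Read]) -> List[Read]:
    """Discard reads that have no quality scores."""
    return [read for read in reads if read[1]]


def pair_by_position(forward: List[Read], reverse: List[Read]) -> List[Tuple[str, str]]:
    """Pair the n-th remaining forward read with the n-th remaining reverse read."""
    return [(f[0], r[0]) for f, r in zip(forward, reverse)]


def pair_by_template(forward: List[Read], reverse: List[Read]) -> List[Tuple[str, str]]:
    """Pair reads only when both mates of the same template are still present."""
    reverse_by_template = {name.rsplit("/", 1)[0]: name for name, _ in reverse}
    pairs = []
    for name, _ in forward:
        template = name.rsplit("/", 1)[0]
        if template in reverse_by_template:
            pairs.append((name, reverse_by_template[template]))
    return pairs


if __name__ == "__main__":
    forward = [("a/F", None), ("b/F", "!!!!"), ("c/F", "!!!!")]  # a/F lacks quality scores
    reverse = [("a/R", "!!!!"), ("b/R", "!!!!"), ("c/R", "!!!!")]
    kept_forward = drop_without_quality(forward)
    kept_reverse = drop_without_quality(reverse)
    print("positional pairing:", pair_by_position(kept_forward, kept_reverse))
    print("template pairing:  ", pair_by_template(kept_forward, kept_reverse))
```

With positional pairing, every pair after the discarded read is shifted by one, which is why most pairs would later be recorded as broken.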

Who is affected

Anyone who has SOLiD colorspace fastq data, for which quality data is missing for one or more reads, and who has imported the data using the tool: Import | SOLiD. 

You will not be affected if quality information is present for all your SOLiD sequencing reads.

The issue is present in CLC Genomics Workbench 8.0.1 and prior versions, the CLC Cancer Research Workbench 2.0 and earlier, and the Biomedical Genomics Workbench 2.1.

This issue was fixed in CLC Genomics Workbench 8.5 and Biomedical Genomics Workbench 2.5, released in September, 2015.

 

3.10. Base space fastq files imported using the SOLiD Fastq importer report incorrect sequences

Summary

Base space fastq files imported using the SOLiD Fastq importer of the CLC Genomics Workbench 8.0 and earlier, or the CLC Cancer Research Workbench 2.0 and earlier, result in Sequence Lists with incorrect sequence data. We are very sorry for any inconvenience this has caused.

Who is affected

Anyone who has imported a base space fastq file using the tool Import | SOLiD in the CLC Genomics Workbench 8.0, or 7.5.2 and earlier, or in the CLC Cancer Research Workbench 2.0, or 1.5.3 and earlier. This issue does not affect import of fastq files in color space.

What are the symptoms?

The Sequence List will report sequences that are not correct. If you chose to keep names on import, you can open your base space fastq file in a text viewer and compare the sequences at the top of the file with the sequences of the same name in the Sequence List: they will not resemble each other. If you did not choose to keep names on import, then the sequences should be in the same order as in your original fastq file if it was single (unpaired) data. For paired data, the first sequence should be the first sequence in one of your two fastq files. It is anticipated that you will see that the top sequence does not match either of these.

When will this be fixed?

This issue has been fixed in the CLC Genomics Workbench 7.5.3, the CLC Genomics Workbench 8.0.1, the CLC Cancer Research Workbench 1.5.4 and the Biomedical Genomics Workbench 2.1, released in April, 2015.

 

3.11. RNA seq analysis on SOLiD data producing incorrect results in CLC Genomics Workbench 7.5 and 7.5.1 plus CLC Cancer Research Workbench 1.5, 1.5.1 and 1.5.2

Who is affected

This issue only affects users running RNA seq analysis on SOLiD data in the following Workbenches:

  • CLC Genomics Workbench 7.5 and 7.5.1
  • CLC Cancer Research Workbench 1.5, 1.5.1 and 1.5.2
  • CLC Genomics Server 6.5 and 6.5.1

This issue was fixed with CLC Genomics Workbench 7.5.2, CLC Cancer Research Workbench 1.5.3 and corresponding CLC Servers, released Feb 17, 2015.

What are the symptoms

  • You will see reads mapping to exon-exon boundaries only. No reads map fully within one exon.
  • The number of mapped reads is substantially lower than what is expected or what was seen with previous Workbench versions.

The expression levels are incorrect and should not be trusted! Please delete the affected RNA seq result files!

 

 

3.12. Positive fold change instead of absolute fold change filtering in Cancer Research Workbench 1.5.x, 2.0 and Genomics Workbench 8.0

This issue affects the "Extract Differentially Expressed Genes" tool in the Cancer Research Workbench 1.5 and 2.0 and its counterpart tool in CLC Genomics Workbench 8.0, "Create Track from Experiment".

This issue was fixed in the Cancer Research Workbench 1.5.4, Biomedical Genomics Workbench 2.1, and the CLC Genomics Workbench 8.0.1.

Both these tools have the facility to filter results based on various values, including fold change. In affected versions, filtering on fold change keeps only values greater than the value provided in the Wizard. For example, in the images below, fold changes greater than 1.5 would be filtered for. What is commonly desired, however, is to filter on values that are either less than -1.5 or greater than 1.5, i.e. abs(fold change) > 1.5.
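The difference between the two filters can be shown with a few example values. This is a generic sketch of the comparison only. The function name and the fold change values are made up for illustration and do not come from the Workbench.

```python
from typing import List


def filter_fold_changes(values: List[float], threshold: float, absolute: bool) -> List[float]:
    """Keep fold changes beyond the threshold.

    absolute=False keeps only values greater than the threshold (positive-only filter).
    absolute=True keeps values whose magnitude is greater than the threshold.
    """
    if absolute:
        return [value for value in values if abs(value) > threshold]
    return [value for value in values if value > threshold]


if __name__ == "__main__":
    fold_changes = [-2.0, -1.2, 1.2, 1.8]  # made-up example values
    print("positive only:", filter_fold_changes(fold_changes, 1.5, absolute=False))
    print("absolute:     ", filter_fold_changes(fold_changes, 1.5, absolute=True))
```

With a threshold of 1.5, the positive-only filter drops the -2.0 entry, which the absolute filter retains.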

 

3.13. Gene level RPKM of annotated eukaryotic genomes is using total gene reads instead of total exon reads - GWB 7.0 to 7.0.3

Description

This problem has been fixed in the CLC Genomics Workbench version 7.0.4 and the Genomics Server 6.0.4.

 

A problem exists in the Genomics Workbench that affects the reporting of RPKM values at the gene level when using annotated references involving both mRNA and gene tracks in RNA-seq analyses in the CLC Genomics Workbench versions 7.0, 7.0.1, 7.0.2 and 7.0.3.

From our investigations so far, it appears that the problem is that the total number of reads mapping to a gene are being used in the numerator of the RPKM calculation instead of the total number of reads mapping to exons.
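The effect of the numerator can be illustrated with the commonly used RPKM definition (reads per kilobase of exon model per million mapped reads). The sketch below is an assumption-laden illustration: the exact totals and lengths the Workbench uses in the denominator are not stated here, and all numbers are made up.

```python
def rpkm(reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads (standard definition)."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return reads * 1_000_000_000 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Made-up gene: 900 reads map to exons, 100 further reads map to introns.
    exon_reads, intron_reads = 900, 100
    exon_length_bp = 2_000
    total_mapped_reads = 10_000_000

    expected = rpkm(exon_reads, exon_length_bp, total_mapped_reads)
    # Affected versions appear to use total gene reads (exon + intron) in the numerator.
    affected = rpkm(exon_reads + intron_reads, exon_length_bp, total_mapped_reads)
    print(f"exon reads in numerator:       {expected:.2f}")
    print(f"total gene reads in numerator: {affected:.2f}")
```

The two values differ only when reads map to intronic regions, which matches the conditions under which this issue is noticeable.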

 

Who is affected

You could be affected by this if:

  • you are running the RNA-seq analysis tool in the Genomics Workbench 7.0, 7.0.1, 7.0.2 or 7.0.3 AND
  • you are working with gene and mRNA annotations. Generally speaking, this affects people running analyses with annotated eukaryotic references AND
  • you are working with the results in the GE track, that is, the gene-level results, AND
  • you have used RPKM values of the GE (gene) track output in statistical tests, such as t-tests or ANOVA.

This issue will be particularly noticeable if you are working with data where you expect mapping to many intronic regions. See the section below for more detail on finding out if you are likely to be affected by this.

If you have not, and do not plan to, run statistical analyses on your gene level RPKM values, but you do wish to report the gene RPKM values, then please check if your RNA-seq results have been affected (see section below). If they have, you can re-run the analyses in the Genomics Workbench 7.0.4, or you can export the RNA-seq results to a text or Excel-based format and use spreadsheet or other software to calculate the RPKM values correctly.

 

You are not affected if

  • you are running an older version of the Genomics Workbench  or you are running the Legacy RNA-seq analysis tool in the affected versions of the CLC Genomics Workbench, OR
  • you are running analyses involving just annotated genes, as would generally be the case for annotated prokaryotic genomes, or you are using a list of sequences as references, where no annotations are involved OR
  • you are working with the results in the TE track, that is, the transcript-level results OR
  • you are working with count-based results, and are running DGE or proportion-based statistical tests, OR
  • no, or very few, reads were mapped to intronic regions of your genes.

 

Are your analyses affected by this problem?

If you are in the situation described in the "You could be affected by this if" section above, then you can further investigate if this problem has had an impact on your results by doing the following:

Check your Experiment data

In Experiment data, two or more RNA-seq samples have been compared. Using Experiment results, you can check if any genes have looked like they were differentially expressed due to mapping to intronic regions.

  • Ensure the column reporting intron reads is visible. 
  • Click on the name of the intron reads column to sort the column. Sort it in descending order, so the highest values are listed first.
  • Check the values in the columns you have used in considering if genes have been differentially expressed or not. If none of the values there have been in a range of interest to you, then conclusions based on your current results will not have been affected.

If there are a reasonable number of reads mapping to intronic regions of some genes in your experiment, then even if your conclusions were not affected, the RPKM values for genes where reads mapped to introns will be affected. If you plan to report the RPKM for such genes, then these will need to be re-calculated. See the section below called "Getting updated RPKM values for the RNA-seq samples and re-running statistical analyses".

 

Check your RNA-seq GE output

Here, you can check if the RPKM values for any of your RNA-seq samples have been reported incorrectly. Given the nature of this problem, this may not result in incorrect conclusions related to differential expression. Please refer to your Experiment level data, as described above, to see if this problem may have led to any incorrect conclusions being drawn.

  • View the GE output in table format. (This is the default view.)
  • Ensure the column reporting intron reads is visible. 
  • Click on the name of the intron reads column to sort the column. Sort it in descending order, so the highest values are listed first.
  • Are there any genes with reads mapping to intronic regions? 

 

Recommended action if your analyses are affected by this problem

This problem has been fixed in the CLC Genomics Workbench version 7.0.4 and the Genomics Server 6.0.4.

Please upgrade your software so that correct RPKM values are generated in future.

If the checks above reveal that you are affected by this problem, then please re-run your RNA-seq analyses in the Genomics Workbench 7.0.4/Genomics Server 6.0.4 or higher.

 

Work-arounds if RNA-seq analyses from version 7.0 through 7.0.3 cannot be re-run

The information below should not be needed after May 14, 2014 when the software was updated to fix this issue. It has been left here for cases where re-running the RNA-seq analyses using the updated software is not feasible.

 

Getting updated RPKM values for the RNA-seq samples and re-running statistical analyses

If you just wish to get updated RPKM values for each RNA-seq sample, then you only need to run the first two steps listed below.

 

Run count-based statistical analyses

Count-based statistical tests are not affected by this issue, as RPKM values are not used. If you wish to try using the count-based tests, they are described in our manual:

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Empirical_analysis_DGE.html

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Tests_on_proportions.html

 

 

3.14. SignalP and TMHMM plugins not working in older Workbenches (7.0 and earlier)

Working versions of the TMHMM and SignalP plugins have been released for:

  • Genomics Workbench versions 7.0.3 as well as point versions 7.0.1 and 7.0.2
  • Main Workbench version 7.0.2 as well as point version 7.0.1
  • Drug Discovery Workbench 1.0.2 as well as point version 1.0.1

This fixes the problem described below.

Please note that the plugins in older versions of the Workbench are still currently affected.  (Entry date: May 2014).

If you are using one of the versions of the Workbenches listed above and you have one or both of these plugins installed, you should see a pop-up box inviting you to update them if they are not the latest version. You will need to be running the Workbench as an administrative user to update plugins.

If you are running an older version of a Workbench, and your license is currently covered by our Maintenance, Upgrades and Support (MUS) program, then we recommend that you upgrade to the latest version of the Workbench. If you are connecting to a Server product from your Workbench, then please make sure that your Server is in the 6.x line before carrying out this upgrade.

 

Who do the SignalP and TMHMM problems affect?

This issue affects signal peptide prediction using the SignalP plugin and TMHMM analysis using the TMHMM plugin in the Main, Genomics and Drug Discovery Workbenches older than those listed at the top of this entry.

Symptom

  • In the Main and Genomics Workbenches version 7.0, the analyses fail with a java null pointer exception.
  • In earlier versions of the Workbenches, the results of SignalP analysis suggest that no signals have been detected, but in fact, the analysis has not run properly.

The underlying issue

For prediction of signal peptides and transmembrane helices, the Workbench uses SignalP and TMHMM, respectively, as made available as a free service from a third party. Unfortunately the services used in these plugins, which are not identical to the public SignalP and TMHMM webservices, are currently not responding.

We continue to hope that the third party providers will make these services available again in the future, but we do not have control over this.

 

Suggested Workarounds for people using older Workbenches

SignalP workaround

The public SignalP webservice appears to still be available. To run your analysis you can submit your input data manually to that webservice. To do this:

  1. Use the mouse to select the input sequence and copy it using ctrl+C/cmd+C. Alternatively, export the sequence(s) in FASTA format.
  2. Go to the webservice http://www.cbs.dtu.dk/services/SignalP/.
  3. Paste the sequence into the webservice or use the generated FASTA file as input.
  4. Set the parameters as appropriate and press 'Submit'.
  5. The calculation will start. Specify an email address and press 'Send email' to get the calculation to return the results.


Transmembrane Helix Prediction workarounds

a) Use the public webservice by manually transferring the input data

  1. Use the mouse to select the input sequence and copy it using ctrl+C/cmd+C. Alternatively, export the sequence(s) in FASTA format.
  2. Go to the webservice http://www.cbs.dtu.dk/services/TMHMM/
  3. Paste the sequence into the webservice or use the generated FASTA file as input.
  4. Press 'Submit'
  5. The calculation will start. Specify an email address and press 'Send email' to get the calculation to return the results.
  6. The ranges of residues predicted to be in transmembrane helices (TMhelix) are specified in the results.

b) Use the hydrophobicity scales provided with the workbench:

  1. Open the sequence of interest and go to the 'Protein info' tab in the side panel.
  2. A variety of hydrophobicity scales are listed. Choose e.g. Kyte-Doolittle
  3. Apply it as background color. Segments of hydrophobic residues will then show up in red. Hydrophobic segments of length around 20 residues and with an alpha-helix secondary structure will indicate the potential transmembrane helices.
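The idea behind workaround b) can also be sketched in code. The example below is an illustration only and does not reproduce what the Workbench does: it averages Kyte-Doolittle hydropathy values over a sliding window and reports regions where the mean is high. The window length of 19 and the threshold of 1.6 are assumptions taken from common practice with this scale, and the function name is hypothetical. Secondary structure is not considered here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(sequence, window=19, threshold=1.6):
    """Return merged (start, end) ranges, 1-based inclusive, covered by
    windows whose mean hydropathy is at least `threshold`.

    Residues not in the scale are scored as 0.0 (an arbitrary choice).
    """
    seq = sequence.upper()
    segments = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                # Overlapping or adjacent window: extend the previous segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Made-up sequence: a hydrophobic stretch flanked by charged residues.
toy_protein = "D" * 20 + "L" * 20 + "K" * 20
print(hydrophobic_segments(toy_protein))
```

Because whole windows are reported, the ranges returned are somewhat wider than the hydrophobic core itself. Treat them as candidate regions to inspect, not as predictions equivalent to TMHMM output.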

3.15. Double gene and transcript annotations from Ensembl gtf files from late February to March 10, 2014

Description

With the release of Ensembl version 75 came substantial changes to the annotation format. The CLC Genomics Workbench is not able to appropriately import the annotations from version 75 of Ensembl as yet. The symptom of the problem is that annotations such as genes and transcripts are duplicated. That is, each gene or each transcript appears twice in the Workbench. (See image below.)

The settings for the Download Genomes tool in the Genomics Workbench version 7.0 and 6.x have been altered to retrieve data from Ensembl version 74 rather than version 75. The annotation format used in Ensembl version 74 is interpreted correctly by the CLC Genomics Workbench.

This issue has been addressed for the Genomics Workbench 7.0.1 and newer.

Recommended action if your data is affected

If you have used the Download Genomes tool to retrieve annotations from Ensembl since late February, or if you have yourself downloaded gtf annotation files from Ensembl version 75 and imported these into the Workbench, then we recommend that:

  • You delete these annotations. (See information at the bottom of this page.)
  • You download and import version 74 of the Ensembl annotations, either by using Download Genomes or by downloading the gtf file from Ensembl and importing it using the import tools of the Workbench.
  • If you have run RNA-seq analyses or other analyses that depend on these annotations, please re-run these after importing Ensembl version 74 annotations.

Who this affects

  • People who have imported gtf files from Ensembl version 75 into the Workbench using the Annotate with GFF tool or the Import Tracks tool
  • People who have used Download Genomes to import data from Ensembl. This includes annotations for human, mouse and many of the other genomes offered via the Download Genomes tool.

This does not affect people using plant or bacterial genomes provided via Ensembl, as these do not come from the same source.

This does not affect people using annotations from other resources, for example, TAIR.

How can you tell if your annotations are affected?

1) Check the history information for your annotation track. Do this by opening the annotation track and clicking on the small icon that looks like a clock (version 7) or a book with a bookmark (earlier versions) at the bottom of the window. Here, you can see if the version of the annotations used was 75 or not. If it is 74 or earlier, then your data is not affected.

2) Zoom in on an annotation track. For example, the gene track. Each annotation will be duplicated, as shown in the image below.

 

3) If you are working with RNA-seq analyses results and you sort the gene table output on the feature ID, you will notice that genes are present twice: once with their expected name and once with the name with a -1 attached.
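For long gene tables, check 3) can be automated on a list of feature IDs copied or exported from the table. The following is a minimal sketch, assuming the duplicate always carries the suffix "-1" as described above. The function name and example IDs are hypothetical.

```python
def duplicated_feature_ids(feature_ids, suffix="-1"):
    """Return IDs that are present both as NAME and as NAME + suffix."""
    ids = set(feature_ids)
    return sorted(name for name in ids if name + suffix in ids)


# Made-up feature IDs standing in for the gene table column.
example_ids = ["BRCA2", "BRCA2-1", "TP53", "TP53-1", "MYC"]
print(duplicated_feature_ids(example_ids))
```

A gene whose genuine name happens to end in "-1" next to a similarly named gene would also be reported, so a non-empty result is a prompt to inspect the track, not proof of duplication.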

 

What does it mean for your results?

The duplicate Gene annotations will pose a problem if you run the RNA-Seq Analysis tool and set the 'Maximum number of hits for a read' to '1'. The reference used by the RNA-Seq tool is a list of all Gene annotation sequences. Because all Gene annotations are duplicated, each Gene sequence is present twice in that list. Consequently, a read that maps to a specific gene will be seen as mapping equally well to two positions in the reference: the two identical Gene sequences. This in turn means that with 'Maximum number of hits for a read' set to '1', no reads will map.
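The effect can be illustrated with a toy model. This is not the Workbench's read mapper: it uses exact substring matching, and the function name, sequences and gene names are made up. It only shows why a read that hits two identical reference sequences is discarded when at most one hit is allowed.

```python
def map_read(read, references, max_hits):
    """Toy mapper: return the names of references containing the read,
    or an empty list if the number of hits exceeds max_hits."""
    hits = [name for name, seq in references if read in seq]
    return hits if len(hits) <= max_hits else []


gene = "ATGGCGTACGTTAGC"
read = "GCGTACG"

# Reference list with each gene present once.
unique_refs = [("geneA", gene)]
# Reference list where the gene annotation has been duplicated.
duplicated_refs = [("geneA", gene), ("geneA-1", gene)]

print(map_read(read, unique_refs, max_hits=1))
print(map_read(read, duplicated_refs, max_hits=1))
```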

In addition, if you run the tool 'Annotate with overlap Information' to add Gene annotation information to e.g. a variant track, then the duplicate Gene annotation issue will result in duplicate entries in the resulting annotation columns of the variant track table view.

We recommend that you re-run any analyses that have depended on annotations from Ensembl version 75.

 

How to delete annotations

If you are working with tracks, you just need to move the relevant track objects to the trash.

If you are working with stand-alone reference sequences or sequence lists, then information on deleting annotations is provided in the manual here:

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Removing_annotations.html

3.16. Problem with de novo assembly of single circular contigs in Genomics Workbench 6.5.2, 7.0 and Assembly Cell 4.2.1

A problem has been discovered in the Genomics Workbench versions 6.5.2 and 7.0, CLC Genomics Server 5.5.2 and 6.0, and the Assembly Cell 4.2.1. This problem will affect only a very small minority of cases.

This issue was fixed in the CLC Genomics Workbench 7.0.1, CLC Genomics Server 6.0.1 and Assembly Cell 4.2.2.

When does this problem arise?

Using the affected software versions, both the following need to be true for this problem to arise:

  • a complete, circular contig is output in the assembly results (i.e. a single contig, which is circular)
  • the data was assembled using one of the software versions listed above.

As assembling directly into a single contig is rare, this issue will likely arise very seldom in practice and would be expected to occur only in cases where the assembly involves a very small circular genome (e.g. a plasmid).

How can you tell if your assembly is affected?

There is currently no obvious reporting of when a contig has been detected as being circular by the assembler. In practice, this issue should only affect you if you are working with data expected to assemble to a very small, circular contig, and only in those cases where it results in a single contig.

If this is true for your assembly, then when this issue arises, the symptom is that only the first part of the contig will be output correctly. The remaining part of the contig will either contain random nucleotides or it will not be output at all. For example, in the single case of this we have observed in practice, the contig in question contained over 97% A characters. How much of the contig the assembler is able to output correctly will vary, but in most cases the correct region will be about 100bp or less.
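One quick check for the kind of symptom seen in the observed case is to look at the base composition of the contig. The following is a minimal sketch. The function names and the 0.9 threshold are assumptions chosen for illustration, and a contig whose remaining part contains random nucleotides would not be caught by this check.

```python
from collections import Counter


def dominant_base_fraction(contig):
    """Return (base, fraction) for the most frequent character in the contig."""
    seq = contig.upper()
    if not seq:
        raise ValueError("empty contig")
    base, count = Counter(seq).most_common(1)[0]
    return base, count / len(seq)


def looks_suspicious(contig, threshold=0.9):
    """Flag contigs dominated by a single base (threshold is an assumption)."""
    return dominant_base_fraction(contig)[1] >= threshold


# Made-up contig: a short normal-looking start followed by a long run of A.
toy_contig = "ACGTTGCA" * 3 + "A" * 976
print(dominant_base_fraction(toy_contig), looks_suspicious(toy_contig))
```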

 

3.17. Tracks in some track lists replaced by empty area when zooming or scrolling in Genomics WB 6.5 and 6.5.1

This issue is related only to visualizing track lists and variant tracks when the viewing options for insertions are active. The issue and the work around do not affect the underlying data or any downstream analyses.

This issue affects users of  the Genomics Workbench 6.5 and 6.5.1, working with

  • tracklists containing two or more read mappings and/or
  • tracklists containing a read mapping and variant tracks, where viewing options associated with insertions in variant tracks have been turned on

When working with a single variant track, the same underlying problem causes variants to disappear from the view if viewing settings related to insertions are turned on.

This issue was fixed in CLC Genomics Workbench 6.5.2, released in January, 2014.

 

The tracklist viewing issue in detail

When working with a track list that contains two or more read mappings, one can end up being presented with a blank, grey area instead of tracks when zooming in, or scrolling back and forth along the mapping when zoomed partially or fully in.  The image below shows what one might see when zoomed fully out (left) and what one sometimes sees after zooming and scrolling (right).

The problem is related to the visualization of insertions in read mappings in track lists. Insertions are locations where an extra base (relative to the reference sequence) appears in a read or reads at a particular position in a mapping.

Work-around for CLC Genomics Workbench 6.5.0 and 6.5.1

  • Open up the view setting option for Read Mappings in the right hand pane.
  • Click on the small triangle to the left of the Read Mappings section to open up the relevant options list. 
  • Set the percentage for "Hide insertions below (%)" to 101.

 

The result of this change is that insertions relative to the reference in the mapping tracks are not shown. This does not affect your data or downstream analysis.

When working with individual variant tracks, please uncheck any boxes in the Insertions section of the viewing settings in the right hand pane of the Workbench to avoid any issues.

3.18. Trio analysis producing incorrect results from 'unknown zygosity' tracks in Genomics Workbench 6.5 and previous

This issue relates only to those cases where the trio analysis was run using variant tracks containing variants with 'unknown' zygosity. This would apply to e.g. a scenario with more than one affected child and where the 'child' input track for the trio analysis consisted of a 'common' sibling variant track created using the tools 'Compare Sample Variants' and 'Compare Variants Within Groups'. With these tools, if a variant is found in more than one offspring but with different zygosity, the zygosity reported in the output track will be 'unknown'.
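How an 'unknown' zygosity can arise is sketched below. This is an illustration of the rule just described, not the implementation of the Workbench tools. The function name, the variant keys and the zygosity labels are hypothetical.

```python
def common_variants(sibling_a, sibling_b):
    """Return variants present in both siblings.

    Each input maps a variant key to its zygosity. If the zygosity differs
    between the siblings, the common variant is labelled 'unknown'.
    """
    common = {}
    for variant, zygosity_a in sibling_a.items():
        if variant in sibling_b:
            zygosity_b = sibling_b[variant]
            common[variant] = zygosity_a if zygosity_a == zygosity_b else "unknown"
    return common


# Made-up variants keyed by (chromosome, position, reference, allele).
sib1 = {("1", 1000, "A", "G"): "heterozygous",
        ("2", 5000, "C", "T"): "homozygous",
        ("3", 700, "G", "A"): "heterozygous"}
sib2 = {("1", 1000, "A", "G"): "heterozygous",
        ("2", 5000, "C", "T"): "heterozygous"}

print(common_variants(sib1, sib2))
```

In this example the variant on chromosome 2 is found in both siblings with different zygosity, so it would carry the 'unknown' label in a combined track.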

This issue does not apply to a regular trio analysis with a basic '1 child-mother-father' variant set.

The implications:

The trio analysis tool requires the zygosity information to find out if a variant is de novo or not, and if a variant looks suspicious.

In Genomics Workbench 6.5 and earlier versions, the trio analysis would run even when input variants were listed as having unknown zygosity. This would have given rise to potentially incorrect results. Therefore, we strongly recommend that all affected trio analyses are re-run in Genomics Workbench 6.5.1.

From Genomics Workbench 6.5.1, trio analysis will no longer run when input tracks contain 'unknown' zygosity variants. To run a trio analysis with more than one offspring involved, you will need to consider one of the following approaches:

1. Use alternate filtering tool to find common variants (only zygosity of sib 1 will be used)

Use the following filter to find the offspring common variants: Toolbox|Resequencing Analysis|Annotate and Filter Variants|Filter against Known Variants

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Filter_against_known_variants.html

Please note that with this tool the zygosity in the output 'common' track will be identical to offspring 1 (the filtered track/first input). You will want to consider the implications for variants where the zygosity is not identical between offspring.

2. Running one trio analysis per sibling and merging the results

Run one trio analysis for each offspring.

Filter the trio analysis outputs using 'Filter Variants Within Groups' to find the common variants.

3.19. Very slow progress through Wizard steps when there are many reference sequences - Genomics WB 6.5 and 6.5.1

The issue

For the CLC Genomics Workbench 6.5 and 6.5.1, when working with many thousands of reference sequences, it can take a very long time for each step of the Wizard to load when setting up an analysis. Examples of tasks where this effect is known to exist are:

  • Map Reads to Reference
  • Probabilistic Variant Detection
  • Quality-based Variant Detection

Any analysis involving a reference set including many thousands of sequences is likely to be affected.

Please note that this issue does not affect the analysis itself or the results. It only affects the speed of working through the Wizard when setting up the job through the Genomics Workbench.

This issue was fixed in CLC Genomics Workbench 6.5.2, released in January, 2014.

 

Workaround for CLC Genomics Workbench 6.5.0 and 6.5.1

The workaround on affected versions is to set up a one-step Workflow for the analysis concerned.

Attached to this FAQ entry are three example Workflows that can be installed and run directly in the Genomics Workbench 6.5 or 6.5.1.

Information on how to create Workflows can be found in our manual starting here:

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Creating_workflow.html

Information on how to install Workflows like those attached to this entry can be found in our manual here:

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Installing_workflow.html

 

3.20. VCF export - updating the "Exclude from enforced diploid" list in Workflow design - GWB 6.5.1

The issue

When designing a Workflow and configuring VCF export functionality in the Genomics Workbench 6.5.1, clicking in the box labeled Enforce diploid does not cause the elements for the Exclude from enforced diploid box to be updated automatically.

The work-around

Work-around one (recommended)

Click on the small icon of a lock beside the Exclude from enforced diploid box. This causes the elements to be updated to reflect your chosen genome.

Clicking on the lock icon twice will set it back to its original state. (By default, for this option, it is locked.)

Work-around two

After choosing your genome and clicking in the box next to Enforce diploid, close the configuration wizard by clicking on the button labeled Finish.  Then double click on the Export VCF element in the Workflow design form again. Doing this will cause the list for the Exclude from enforced diploid element to be updated.

 

Further information

VCF export from the Genomics Workbench version 6.5.1 has been improved by the addition of an option to specify whether the genome in question should be considered diploid. In addition, the option to exclude certain references in the genome from being considered diploid has been added. For example, in the case of mammalian male samples, one would want to specify that chromosomes X, Y and the mitochondrial reference should not be considered diploid.

When calling the VCF export functionality directly, if you check the box labeled Enforce diploid, the list of references in your selected genome is updated, so that if you then check on the plus symbol to the right of the Exclude from enforced diploid box, you see a list of those references to choose from. As of version 7.0 of the Genomics Workbench, released in February, 2014, the Exclude from enforced diploid field should be updated automatically when configuring a Workflow also.

3.21. Download Genomes and importing tracks from ensembl-style GTF files not interpreting genes correctly in Genomics Workbench 6.5

A serious issue with the interpretation of ensembl-style gtf files when using the Download Genomes functionality or the Import Tracks functionality was discovered in version 6.5 of the Genomics Workbench. This issue has been resolved in version 6.5.1.

If you have downloaded gene annotations using Download Genomes or have chosen to import ensembl-style gtf annotation files using the tool Import | Tracks using version 6.5 of the Genomics Workbench, then we highly recommend that you:

  • delete the annotation tracks imported using Genomics Workbench version 6.5 via the Download Genomes tool or directly via import of gtf format files,
  • upgrade your Genomics Workbench to version 6.5.1 and
  • import your annotations again using the new Workbench version.

 

Please note that this issue only affects version 6.5 of the Workbench. It does not affect 6.0.5 or any earlier versions.



 

3.22. Annotation tracks of some large genomes affected in GWB 6.0.3 and earlier

Summary description

An important issue has come to our attention associated with the annotation of genome tracks for organisms other than human, where

  • there are more than 22 autosomes and
  • where the annotation files used contain references to an X chromosome and/or a Y chromosome and/or a mitochondrial sequence.  

This issue only affects annotations imported as tracks or those annotations that have been applied via Download Genomes.

For the genomes available via the Download Genome functionality of the Genomics Workbench 6.0.3 and earlier, this affects:

Bos taurus: where incorrect annotations were applied to chromosomes 23 and 25, and no annotations were applied to the X chromosome and the mitochondrial reference.

Danio rerio: where incorrect annotations were applied to chromosome 25 and no annotations were applied to the mitochondrial reference.

Gallus gallus: where incorrect annotations were applied to chromosome 25 and no annotations were applied to the mitochondrial reference.

 

Recommended action if your data may be affected

We highly recommend that anyone working with data for these organisms, or those who have applied track-based annotations for any other organism matching the description given in the first paragraph:

 

Further details

The identified issue causes the following general problems when importing a track from an annotation file:

  • Annotations in the file for the X chromosome (see synonyms affected in the table below) will have been applied to a chromosome called chr23, 23 or chromosome_23, and no annotations will have been applied to the X chromosome itself.
  • Annotations in the file for the Y chromosome (see synonyms affected in the table below) will have been applied to a chromosome called chr24, 24 or chromosome_24, and no annotations will have been applied to the Y chromosome itself.
  • Annotations in the file for the mitochondria (see synonyms affected in the table below) will have been applied to a chromosome called chr25, 25 or chromosome_25, and no annotations will have been applied to the mitochondrial reference itself.

Several synonyms would be recognized for each of the sequences mentioned above in annotations files. If the labels being used in your annotation files do not match any of these, then your data will not be affected by this issue.

 

Reference Sequence    Names and synonyms
chromosome X          X, chrX, chromosome_X
chromosome Y          Y, chrY, chromosome_Y
mitochondria          M, MT, chrM, chrMT, chromosome_M, chromosome_MT
chromosome 23         23, chr23, chromosome_23
chromosome 24         24, chr24, chromosome_24
chromosome 25         25, chr25, chromosome_25
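The synonym list can be used to screen your own data for the conditions described in this entry. The following is a rough sketch, not a definitive test. The function name is hypothetical, and the presence of a reference sequence named as chromosome 23, 24 or 25 is used here as a stand-in for "more than 22 autosomes", which is an assumption.

```python
# Synonyms taken from the table above.
X_Y_MT_SYNONYMS = {
    "X", "chrX", "chromosome_X",
    "Y", "chrY", "chromosome_Y",
    "M", "MT", "chrM", "chrMT", "chromosome_M", "chromosome_MT",
}
NUMBERED_23_TO_25 = {
    "23", "chr23", "chromosome_23",
    "24", "chr24", "chromosome_24",
    "25", "chr25", "chromosome_25",
}


def may_be_affected(annotation_names, reference_names):
    """Return True if the annotation file names X, Y or mitochondrial
    sequences using a listed synonym AND the reference has a sequence
    named as chromosome 23, 24 or 25."""
    uses_synonym = bool(set(annotation_names) & X_Y_MT_SYNONYMS)
    has_high_numbered = bool(set(reference_names) & NUMBERED_23_TO_25)
    return uses_synonym and has_high_numbered


# Made-up name sets, e.g. from the first column of a GTF file and from the
# sequence names of the genome track.
print(may_be_affected({"chr1", "chrX", "chrM"}, {"chr1", "chr23", "chr25", "chrX", "chrM"}))
print(may_be_affected({"1", "X", "MT"}, {"1", "22", "X", "MT"}))
```

A True result only indicates that the track is worth inspecting. Annotations applied with the Annotate with GFF tool, or created from sequence objects, are not affected regardless of the names used.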

 

 

This issue does not affect:

  • any organism where the genomic data or the annotation files do not have greater than 22 autosomes
  • any organism where the annotation file did not refer to an X chromosome, a Y chromosome or a mitochondrial sequence
  • any annotations applied to sequence objects or sequence lists using the Annotate with GFF tool
  • any annotated sequences imported as sequence objects or sequence lists directly, or downloaded using the Search at NCBI functionality
  • any track-based annotations generated using Convert to Tracks after importing sequence objects or sequence lists directly, or downloaded using the Search at NCBI functionality
  • any organism where the naming scheme for the annotations for the X, Y or mitochondrial reference had names that are not in the synonym list provided above.

The current RNA-seq tool makes use of reference data in sequence objects or sequence list objects, not tracks. If your references were not generated from tracks, then your data, and thus results, are not affected by this issue.

3.23. Starting up CLC Servers on Mac systems, autumn/winter 2014

This notification pertains only to CLC Servers running on the Mac operating system and concerns only starting up the CLC Server after installing or upgrading the CLC Server software. The specific CLC software versions affected are given in the sections below.

 

Start-up issue for Mac OS X 10.7, 10.8 and 10.9

This problem will be seen for

  • CLC Genomics Server 6.5
  • CLC Drug Discovery Server 1.0
  • CLC Science Server 3.5
  • CLC Bioinformatics Database 4.0

If you choose the option during installation to start up the CLC Server after installation, it does not start up properly. To address this issue, please run the following two commands. Please note that only the second command should be preceded by sudo.

launchctl unload /Library/LaunchDaemons/com.clcbio.clcgenomicsserver.plist

sudo launchctl load /Library/LaunchDaemons/com.clcbio.clcgenomicsserver.plist

 

Information for Mac 10.10 (Yosemite)

Supported CLC Server versions on Yosemite

Only the CLC Genomics Server 6.5.1 and CLC Drug Discovery Server 1.0.1 have been tested on Mac 10.10 (Yosemite). We do not anticipate any problems running versions 6.5 and 1.0, respectively, on Yosemite once the Server is up and running, but these earlier versions have not been tested on Yosemite and if you experience problems with them, we recommend upgrading your CLC Server software.

Known issue with initial CLC Server start up after installation/upgrade

An issue with the installer of the versions listed above leads to a problem installing the service script for starting and stopping the CLC Server on Mac systems running Mac OS X 10.10 (Yosemite).

If you have already upgraded to Yosemite and plan to continue to run it, then the following instructions provide a work-around for this issue. Instructions are included for each supported, affected CLC Server.


CLC Genomics Server 6.5.1

1) Issue the following command in a terminal

sudo /Applications/CLCGenomicsServer/CLCGenomicsServer stop

2) Create a file called com.clcbio.clcgenomicsserver.plist under the folder /Library/LaunchDaemons/. That is, create the file

/Library/LaunchDaemons/com.clcbio.clcgenomicsserver.plist

3) Copy exactly the following contents into that file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>Label</key>
   <string>com.clcbio.clcgenomicsserver</string>
   <key>ProgramArguments</key>
   <array>
      <string>/Applications/CLCGenomicsServer/CLCGenomicsServer</string>
      <string>start-launchd</string>
   </array>
   <key>OnDemand</key>
   <true/>
   <key>RunAtLoad</key>
   <true/>
</dict>
</plist>

4) Save that file.

5) Run the following command:

sudo launchctl load /Library/LaunchDaemons/com.clcbio.clcgenomicsserver.plist

 

CLC Drug Discovery Server 1.0.1

1) Issue the following command in a terminal

sudo /Applications/CLCDrugDiscoveryServer/CLCDrugDiscoveryServer stop

2) Create a file called com.clcbio.clcdrugdiscoveryserver.plist under the folder /Library/LaunchDaemons/. That is, create the file

/Library/LaunchDaemons/com.clcbio.clcdrugdiscoveryserver.plist

3) Copy exactly the following contents into that file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
   <key>Label</key>
   <string>com.clcbio.clcdrugdiscoveryserver</string>
   <key>ProgramArguments</key>
   <array>
      <string>/Applications/CLCDrugDiscoveryServer/CLCDrugDiscoveryServer</string>
      <string>start-launchd</string>
   </array>
   <key>OnDemand</key>
   <true/>
   <key>RunAtLoad</key>
   <true/>
</dict>
</plist>

4) Save that file.

5) Run the following command:

sudo launchctl load /Library/LaunchDaemons/com.clcbio.clcdrugdiscoveryserver.plist
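The two property list files above differ only in the label and the path to the server executable. If you prefer to generate and check the file contents instead of copying them by hand, the following minimal sketch uses Python's standard plistlib module. The function name is hypothetical, and the header written by plistlib may differ in detail from the text shown above. The sketch only builds and parses the contents; writing the file under /Library/LaunchDaemons/ and loading it with launchctl still need to be done as described in the steps above.

```python
import plistlib


def launchd_plist(label, executable):
    """Build XML property list contents mirroring the files shown above."""
    job = {
        "Label": label,
        "ProgramArguments": [executable, "start-launchd"],
        "OnDemand": True,
        "RunAtLoad": True,
    }
    # plistlib.dumps returns bytes in XML format by default.
    return plistlib.dumps(job)


data = launchd_plist(
    "com.clcbio.clcgenomicsserver",
    "/Applications/CLCGenomicsServer/CLCGenomicsServer",
)
print(data.decode("utf-8"))

# Parsing the result back is a simple way to confirm it is well-formed.
parsed = plistlib.loads(data)
print(parsed["Label"])
```

Parsing a hand-edited file with plistlib.loads in the same way is a quick check that no tag was lost when copying the contents.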