3.9. Import of low-quality SOLiD fastq paired reads may result in incorrectly matched pairs


When importing SOLiD paired reads in colorspace fastq format using the SOLiD import tool, reads without quality scores are discarded. This can lead to mispairing of later reads in the list. For example, given a forward-reverse pair of sequences named read_1 and read_2, if read_1 has no quality scores, it is discarded. However, read_2 is kept in the paired list and ends up paired with the next forward read, which is not its partner. As a result, the Sequence List in the Workbench contains incorrectly paired reads.
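
The effect described above can be sketched in a few lines of Python. This is an illustration only, not the Workbench's import code: it assumes, purely for demonstration, that reads sit in one list in forward-reverse order and are paired by consecutive position after filtering. The read names, quality strings, and the helper pair_by_position are hypothetical.

```python
def pair_by_position(reads):
    """Pair consecutive reads: (0, 1), (2, 3), ... A trailing unpaired read is dropped."""
    return [(reads[i], reads[i + 1]) for i in range(0, len(reads) - 1, 2)]


# Hypothetical reads as (name, quality_string). Odd-numbered reads are forward
# reads, even-numbered reads are their reverse partners.
# read_1 has no quality scores.
reads = [
    ("read_1", ""), ("read_2", "5:8"),
    ("read_3", "97;"), ("read_4", "6<4"),
    ("read_5", "88:"), ("read_6", "7;9"),
]

# Discard reads without quality scores, then pair what is left by position.
kept = [name for name, quality in reads if quality]
pairs = pair_by_position(kept)

print(pairs)  # [('read_2', 'read_3'), ('read_4', 'read_5')]
```

Once read_1 is dropped, every later read shifts by one position, so read_2 is paired with read_3 rather than with its true partner, and all subsequent pairs are also wrong.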

Incorrectly matched pairs will lead to problems in any analysis where paired information is taken into account. For example, when mapping reads to reference sequences, most pairs would be expected to be recorded as broken pairs.

Who is affected

Anyone who has SOLiD colorspace fastq data in which quality data is missing for one or more reads, and who has imported the data using the tool: Import | SOLiD.

You will not be affected if quality information is present for all your SOLiD sequencing reads.
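
One way to check whether a file contains such reads is sketched below. It rests on two assumptions that may not hold for your data: each record occupies exactly four lines (@name, sequence, +, quality), and a read "without quality scores" appears as an empty quality line. The function name reads_missing_quality and the example records are hypothetical.

```python
def reads_missing_quality(lines):
    """Return the names of reads whose quality line is empty.

    Assumes four lines per record: @name, sequence, +, quality.
    """
    records = [line.rstrip("\n") for line in lines]
    missing = []
    for i in range(0, len(records) - 3, 4):
        header, _sequence, _plus, quality = records[i:i + 4]
        if not quality.strip():
            missing.append(header.lstrip("@"))
    return missing


# Hypothetical colorspace records; read_1 has an empty quality line.
example = [
    "@read_1", "T0120", "+", "",
    "@read_2", "G3021", "+", "5:8;",
]

print(reads_missing_quality(example))  # ['read_1']
```

To check a real file, pass an open file handle instead of the example list. If the returned list is empty under these assumptions, no reads with missing quality scores were found.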

The issue is present in CLC Genomics Workbench 8.0.1 and earlier, CLC Cancer Research Workbench 2.0 and earlier, and Biomedical Genomics Workbench 2.1.

This issue was fixed in CLC Genomics Workbench 8.5 and Biomedical Genomics Workbench 2.5, released in September 2015.


