1.5. How can I import files containing interleaved paired data?
The CLC Genomics Workbench import tools expect that there will be separate files for the members of a pair. That is, all reads in file 1 will have a mate in file 2.
One can, however, import perfectly interleaved, paired sequences into the Workbench.
Here "perfectly" means that each sequence in a pair is immediately followed by its mate. That is, sequences 1 and 2 are members of a pair, 3 and 4 are members of a pair, and so on. This condition must be met because the method outlined below involves importing the data as single reads and marking it as paired after import. Reads will then be paired based purely on their position in the list of reads, with read 1 paired with read 2, read 3 paired with read 4, and so on.
In other words, no matching of read names will be done. The method outlined below therefore does not allow for single reads within the file. It will also not work properly if the file contains unsorted paired data, where all mates may be present but do not necessarily appear directly next to their mate in the file.
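Because the Workbench will not check read names for you, it can be worth checking the file yourself before import. The following is a minimal Python sketch of such a check, not a CLC tool. It assumes each record spans exactly four lines, and that mates share a name once a trailing /1 or /2, or anything after the first space, is removed. The function names are hypothetical, and other naming schemes would need their own handling.

```python
import io


def read_names(handle):
    """Yield the read name from each 4-line FASTQ record in handle."""
    for i, line in enumerate(handle):
        if i % 4 == 0:  # header lines are lines 1, 5, 9, ... of the file
            line = line.rstrip("\n")
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected a header starting with '@'")
            yield line[1:]


def base_name(name):
    """Drop mate markers: text after the first space, then a trailing /1 or /2."""
    parts = name.split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def first_mismatch(handle):
    """Return the 1-based position of the first read whose mate does not
    directly follow it, or None if the file is perfectly interleaved."""
    names = read_names(handle)
    position = 1
    for first in names:
        second = next(names, None)  # the read expected to be the mate
        if second is None or base_name(first) != base_name(second):
            return position
        position += 2
    return None


# Small in-memory demonstration with one correctly interleaved pair.
demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTTT\n+\nIIII\n")
print(first_mismatch(demo))  # None: both reads share the base name r1
```

To check a real file, open it and pass the handle, for example first_mismatch(open("reads.fastq")), where reads.fastq is a placeholder file name. A return value of None suggests the file meets the condition described above. Any other value is the position of the first read that is unpaired or out of order.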
The method: Assuming that your data is a fastq file consisting of perfectly interleaved, paired sequences, you just need to:
- Import the fastq file as single (unpaired) reads, for example by using the Import | Illumina importer with the option indicating paired reads left unchecked.
- After the import is complete, open the resulting sequence list in the Genomics Workbench.
- Choose the Element Information view, by clicking on the small icon at the bottom of the viewing area that looks like a piece of paper with a green checkmark on it.
- Within this view, check the "Paired Sequences" box, enter minimum and maximum distances that are relevant for your sample, and adjust the orientation as needed.
More detailed information about handling paired data within the CLC Genomics Workbench can be found at the following page of our manual: