How can I import files containing interleaved paired data?
This FAQ explains how to import a single file of interleaved paired-end sequencing data.

The QIAGEN CLC Genomics Workbench import tools expect separate files for the members of a pair. That is, every read in file 1 has a mate in file 2. One can, however, import paired sequences from a single file into the Workbench if they are perfectly interleaved.

"Perfectly" means that each sequence in a pair is immediately followed by its mate. That is, sequences 1 and 2 are members of a pair, sequences 3 and 4 are members of a pair, and so on. This condition must be met because the method outlined below involves importing the data as single reads and marking it as paired after import. Reads are then paired based purely on their position in the list of reads, with read 1 paired with read 2, read 3 paired with read 4, and so on.

In other words, no matching of read names is done. The method outlined below does not allow single (unpaired) reads in the file, nor will it work properly if the file contains unsorted paired data, where all mates may be present but do not necessarily appear directly next to their mate in the file.

The method: Assuming that your data is a fastq file consisting of perfectly interleaved, paired sequences, you just need to:
More detailed information about handling paired data within the CLC Genomics Workbench can be found at the following page of our manual.
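
Because the pairing described above is purely positional, it can be worth checking before import that a file really is perfectly interleaved. The following is a minimal Python sketch of such a check and is not part of the Workbench. It assumes each fastq record occupies exactly four lines, and that mates share a read name either as the first whitespace-delimited token of the header or with a trailing /1 or /2 suffix. The function names (read_names, base_name, is_perfectly_interleaved) are hypothetical and chosen for illustration only.

```python
import io
from itertools import islice


def read_names(handle):
    """Yield the read name of each record, assuming 4 lines per record."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        header = record[0].strip()
        if len(record) < 4 or not header.startswith("@") or len(header) < 2:
            raise ValueError("malformed fastq record: %r" % header)
        # Keep the first whitespace-delimited token, without the leading '@'.
        yield header[1:].split()[0]


def base_name(name):
    """Strip a trailing /1 or /2 mate suffix, if present (assumed convention)."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def is_perfectly_interleaved(handle):
    """Return True if every read is immediately followed by its mate."""
    names = read_names(handle)
    for first in names:
        # Consume the next record from the same iterator as the candidate mate.
        second = next(names, None)
        if second is None or base_name(first) != base_name(second):
            return False
    return True


# Usage with a small in-memory example; for a real file, pass open(path).
example = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read1/2\nTTGG\n+\nIIII\n"
)
print(is_perfectly_interleaved(example))
```

A file with an odd number of reads, or with any read whose neighbour carries a different name, fails this check. Such a file would be paired incorrectly by a purely positional method, so it should be sorted or split into separate mate files before import.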