2.3. How can I concatenate sequence lists and when do I need to?
- How to concatenate FASTQ files from different lanes
- How to concatenate sequence lists together
- Concatenating sequence lists is not necessary in most cases... aka How can I use more than one sequence list for an analysis?
- Concatenating two or more sequence lists makes sense when...
From QIAGEN CLC Genomics Workbench 20 onward, FASTQ files from the same Illumina sequencing run but from different lanes can be merged into a single sequence list during import by selecting the option "Join reads from different lanes".
This functionality is described on the following manual page:
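Where the import option is not available, lane files can also be combined before import. The following is a minimal Python sketch of that idea, not part of the Workbench. It assumes the files follow the standard Illumina naming scheme (`<sample>_S<n>_L<lane>_<R1|R2>_001.fastq.gz`); the function names, the input and output folders, and the `_merged` suffix are hypothetical choices made for illustration.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed Illumina naming scheme: <sample>_S<n>_L<lane>_<R1|R2>_001.fastq.gz
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+_S\d+)_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_by_lane(file_names):
    """Group file names by (sample, read), ordered by lane number."""
    groups = defaultdict(list)
    for name in file_names:
        match = LANE_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        key = (match["sample"], match["read"])
        groups[key].append((int(match["lane"]), name))
    # Sorting by lane keeps R1 and R2 in the same order, so pairs stay in sync.
    return {key: [name for _, name in sorted(items)] for key, items in groups.items()}


def concatenate_lanes(input_dir, output_dir):
    """Write one merged .fastq.gz per (sample, read) into output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in input_dir.iterdir() if p.is_file()]
    written = []
    for (sample, read), lane_files in sorted(group_by_lane(names).items()):
        target = output_dir / f"{sample}_{read}_merged.fastq.gz"
        with open(target, "wb") as out:
            for name in lane_files:
                # Concatenated gzip members form a valid gzip file,
                # so the raw bytes can be appended without decompressing.
                with open(input_dir / name, "rb") as src:
                    shutil.copyfileobj(src, out)
        written.append(target)
    return written
```

Calling `concatenate_lanes("raw_fastq", "merged_fastq")` (both folder names are placeholders) would produce one file per sample and read direction, which can then be imported as usual. Files whose names do not match the assumed pattern are skipped rather than guessed at.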
How to concatenate two or more sequence lists together is covered in our Workbench manuals. For Genomics Workbench, the relevant manual link is:
The information there is pertinent to all CLC Workbenches.
Analysis tools that accept sequence lists as input can accept two or more sequence lists at once. Thus, there is no need to concatenate the lists prior to analysis of data that should be analyzed together.
For example, if you have two or more sets of sequence reads from a single sample that you wish to enter into a mapping, de novo assembly or other tool, you just select the relevant sequence lists in the Wizard, as shown below:
If you have two or more sets of sequence reads from each sample and wish to analyze the samples using the Batch option, set up a folder structure with a top folder that contains one folder per sample, each holding the sequence lists to be included in the analysis. When you check Batch and select the top folder, the content of each sample folder is analyzed as one batch unit. That is, all the reads from the sequence lists in a sample folder are analyzed as if they came from one large sequence list:
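To make the grouping rule concrete, the following conceptual Python sketch models how a folder layout maps to batch units. It is only an illustration of the rule described above, not the Workbench's implementation or API. The function name `batch_units_by_folder` and the example folder and list names are hypothetical, and the sketch assumes that only elements located exactly one folder below the top folder are considered.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def batch_units_by_folder(top_folder, element_paths):
    """Map each sample folder under top_folder to the sequence lists it holds."""
    top = PurePosixPath(top_folder)
    units = defaultdict(list)
    for raw in element_paths:
        path = PurePosixPath(raw)
        # Assumption: only elements exactly one folder below the top folder count.
        if path.parent.parent != top:
            continue
        units[path.parent.name].append(path.name)
    return {name: sorted(items) for name, items in sorted(units.items())}


units = batch_units_by_folder(
    "Project",
    [
        "Project/SampleA/run1_reads",
        "Project/SampleA/run2_reads",
        "Project/SampleB/run1_reads",
    ],
)
for sample, lists in units.items():
    print(sample, "->", lists)
```

In this model, both sequence lists in the `SampleA` folder form one batch unit, while `SampleB` forms a second unit with a single list.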
If running a workflow in batch, the batch units can be defined based on either folder structure or metadata, as described on the following manual page:
Cases where concatenating sequence lists can be useful are:
1) Viewing annotations across a sequence set
If you wish to view and search the annotations on all the sequences in a set, those sequences need to be in a single sequence list. Relevant manual links include:
2) Organization (convenience)
One could store many sequence lists in a folder, but an alternative would be to concatenate them into one sequence list.
Please note that we recommend concatenating only smaller lists (e.g. thousands of sequences or fewer), not very large sequence lists such as lists of high-throughput sequencing data.