
2.3. How can I concatenate sequence lists and when do I need to?


How to concatenate fastq files from different lanes

From QIAGEN CLC Genomics Workbench 20 onward, FASTQ files from the same Illumina sequencing run but from different lanes can be merged into a single sequence list during import by selecting the option Join reads from different lanes.

This functionality is described on the following manual page:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Illumina.html
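The Workbench performs this merge itself during import. To illustrate what joining reads from different lanes means, the Python sketch below groups FASTQ file names by sample and read direction while ignoring the lane. It assumes the common Illumina bcl2fastq naming pattern `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`; the function `group_lanes` is hypothetical and not part of any CLC API.

```python
import re
from collections import defaultdict

# Assumed Illumina-style file name: <sample>_S<n>_L<lane>_R<read>_001.fastq[.gz]
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq(?:\.gz)?$"
)


def group_lanes(filenames):
    """Group FASTQ file names by (sample, read), ignoring the lane.

    Each group holds the files that would end up in one joined sequence list.
    Names that do not match the assumed pattern are returned separately so
    they are never merged by accident.
    """
    groups = defaultdict(list)
    unmatched = []
    for name in filenames:
        m = ILLUMINA_NAME.match(name)
        if m is None:
            unmatched.append(name)
            continue
        key = (m.group("sample"), m.group("read"))
        groups[key].append((int(m.group("lane")), name))
    # Sort each group by lane number so the lanes are joined in order.
    joined = {key: [name for _, name in sorted(files)] for key, files in groups.items()}
    return joined, unmatched
```

For example, `tumor_S1_L002_R1_001.fastq.gz` and `tumor_S1_L001_R1_001.fastq.gz` fall into the same `("tumor", "1")` group, ordered lane 1 first, while the R2 file forms its own group.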


How to concatenate sequence lists together

Concatenating two or more sequence lists is covered in our Workbench manuals. For the Genomics Workbench, the relevant manual page is:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Sequence_Lists.html

The information there is pertinent to all CLC Workbenches.
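Conceptually, concatenation yields one list containing the sequences of all the inputs, in input order. The minimal Python model below illustrates only that idea; `Sequence`, `SequenceList` and `concatenate` are hypothetical names, not Workbench objects, and the choice to leave the inputs unchanged and return a new list is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sequence:
    name: str
    residues: str


@dataclass
class SequenceList:
    name: str
    sequences: List[Sequence] = field(default_factory=list)


def concatenate(lists, name):
    """Return a new SequenceList holding every sequence of `lists`, in order.

    This sketch does not modify the input lists; it only copies references
    to their sequences into the new list.
    """
    merged = SequenceList(name=name)
    for seq_list in lists:
        merged.sequences.extend(seq_list.sequences)
    return merged
```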


Concatenating sequence lists is not necessary in most cases

Analysis tools that accept sequence lists as input can accept two or more sequence lists at once. There is therefore no need to concatenate lists containing data that should be analyzed together before running the analysis.

For example, if you have two or more sets of sequence reads from a single sample that you wish to use as input to a mapping, de novo assembly or other tool, simply select all the relevant sequence lists in the Wizard.
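Why concatenation is unnecessary can be sketched in a few lines of Python: a tool that accepts several inputs can iterate over them one after another and sees exactly the same reads as it would from a single merged list. Here each list is modeled as a plain Python list of read strings, and `total_bases` is a hypothetical stand-in for an analysis step, not a Workbench tool.

```python
from itertools import chain


def total_bases(*read_lists):
    """Sum read lengths over any number of read lists without merging them.

    itertools.chain walks the lists in turn, so the result is identical to
    running on one concatenated list, but no merged copy is ever built.
    """
    return sum(len(read) for read in chain(*read_lists))


lane_a = ["ACGT", "GGA"]
lane_b = ["TTTTT"]
print(total_bases(lane_a, lane_b))  # 12, same as total_bases(lane_a + lane_b)
```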

If you have two or more sets of sequence reads from each sample and wish to analyze the samples using the Batch option, set up a folder structure with a top folder containing one subfolder per sample, each holding the sequence lists to be included in the analysis for that sample. When Batch is checked and the top folder is selected, the contents of each sample folder are analyzed as one batch unit. That is, all the reads from the sequence lists in a sample folder are analyzed as if they came from one large sequence list.
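The folder-to-batch-unit mapping described above can be sketched in Python as follows. This only models the idea that each sample subfolder becomes one unit; `batch_units` is a hypothetical helper operating on an ordinary file system folder, not on the Workbench Navigation Area, and as a simplification it ignores any files placed directly in the top folder.

```python
from pathlib import Path


def batch_units(top_folder):
    """Map each immediate subfolder of `top_folder` to the files inside it.

    Each subfolder is treated as one batch unit whose files are analyzed
    together. Simplification: files lying directly in the top folder are
    not assigned to any unit by this sketch.
    """
    top = Path(top_folder)
    units = {}
    for sample_dir in sorted(p for p in top.iterdir() if p.is_dir()):
        units[sample_dir.name] = sorted(f.name for f in sample_dir.iterdir() if f.is_file())
    return units
```

With `sampleA/run1.fastq`, `sampleA/run2.fastq` and `sampleB/run1.fastq` under the top folder, the sketch yields two units: `sampleA` with two inputs and `sampleB` with one.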


When running a workflow in batch mode, the batch units can be defined based on either the folder structure or metadata, as described on the following manual page:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Running_workflows_in_batch_mode.html
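Defining batch units from metadata amounts to grouping inputs by the value in a chosen column. The Python sketch below shows that grouping on CSV text; the column names `file` and `sample` and the helper `batch_units_from_metadata` are hypothetical, and the Workbench's own metadata handling is the one described on the linked manual page, not this code.

```python
import csv
import io
from collections import defaultdict


def batch_units_from_metadata(metadata_csv, file_column, unit_column):
    """Group input names into batch units using a metadata table.

    `metadata_csv` is CSV text with a header row. `file_column` names the
    column holding each input name; `unit_column` names the column whose
    value defines the unit. Rows sharing a unit value land in the same unit.
    """
    units = defaultdict(list)
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        units[row[unit_column]].append(row[file_column])
    return dict(units)
```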


Concatenating two or more sequence lists makes sense when...

Cases where concatenating sequence lists can be useful are:

1) Viewing annotations across a sequence set

If you wish to view and search all annotations on all the sequences in a set, those sequences need to be in a single sequence list. Relevant manual links include:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=View_Annotations_in_sequence_views.html

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=View_Annotations_in_table.html
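The benefit of having all sequences in one list is that a single search covers the whole set. The Python sketch below models that with hypothetical `Annotation` and `AnnotatedSequence` classes and a `find_annotations` helper; it illustrates the idea only and is not how the Workbench annotation views are implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    kind: str  # e.g. "gene" or "CDS"
    name: str
    start: int
    end: int


@dataclass
class AnnotatedSequence:
    name: str
    annotations: List[Annotation]


def find_annotations(sequence_list, query):
    """Return (sequence name, annotation) pairs whose annotation name contains `query`.

    The match is case-insensitive. Because all sequences are in one list,
    one pass over that list covers every annotation in the set.
    """
    q = query.lower()
    return [
        (seq.name, ann)
        for seq in sequence_list
        for ann in seq.annotations
        if q in ann.name.lower()
    ]
```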


2) Organization (convenience)

One could store many sequence lists in a folder, but an alternative would be to concatenate them into one sequence list.

Please note that we recommend doing this only for smaller lists (e.g. thousands of sequences or fewer), not for very large sequence lists such as lists of high-throughput sequencing data.
