1.3. How to concatenate four sequence lists of NextSeq data into one sequence list?

This FAQ relates to CLC Genomics Workbench 12.0.x and previous versions. In CLC Genomics Workbench 20.0, reads from different lanes can be joined on import. For more information, please see the manual page:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Illumina.html

Illumina NextSeq machines have four physical lanes and produce eight fastq files per paired-end sample, i.e. four R1 and four R2 fastq files.
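
To make the naming concrete, the short Python sketch below lists the eight file names expected for one sample. It assumes the default bcl2fastq naming pattern <sample>_S<number>_L00<lane>_R<read>_001.fastq.gz; the function name and the sample name SampleA are hypothetical, and your files may be named differently.

```python
from __future__ import annotations


def nextseq_fastq_names(sample: str, sample_number: int) -> list[str]:
    """Return the eight fastq file names expected for one paired-end NextSeq sample.

    Assumption: default bcl2fastq naming <sample>_S<n>_L00<lane>_R<read>_001.fastq.gz
    """
    names = []
    for lane in range(1, 5):        # four physical lanes: L001..L004
        for read in (1, 2):         # R1 and R2 of the paired-end run
            names.append(f"{sample}_S{sample_number}_L{lane:03d}_R{read}_001.fastq.gz")
    return names


# Hypothetical sample: prints eight names, from SampleA_S1_L001_R1_001.fastq.gz
# to SampleA_S1_L004_R2_001.fastq.gz
for name in nextseq_fastq_names("SampleA", 1):
    print(name)
```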

If you wish to have these concatenated into one sequence list (and not four) for analysis in CLC Genomics Workbench, this can be done after import using the New Sequence list option found under:

File | New | Sequence list

If your sequencing center uses Illumina's bcl2fastq tool, an alternative solution is to ask the sequencing center to use the "--no-lane-splitting" option, which makes bcl2fastq output only two fastq files per sample, i.e. one R1 and one R2 fastq file.
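
For reference, the option is given on the bcl2fastq command line by whoever runs the conversion. The Python sketch below only assembles and prints such a command, it does not run it. The run folder and output paths are hypothetical placeholders, and a real invocation will usually include further options chosen by the sequencing center.

```python
import shlex


def bcl2fastq_command(run_folder: str, output_dir: str) -> list[str]:
    """Assemble (but do not execute) a bcl2fastq call that does not split by lane."""
    return [
        "bcl2fastq",
        "--runfolder-dir", run_folder,  # NextSeq run folder containing the BCL files
        "--output-dir", output_dir,     # destination for the fastq files
        "--no-lane-splitting",          # one R1 and one R2 file per sample instead of one per lane
    ]


# Hypothetical paths, for illustration only
cmd = bcl2fastq_command("/data/run_folder", "/data/fastq")
print(shlex.join(cmd))
```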

Please note that it is not necessary to concatenate the fastq files before analysis in CLC Genomics Workbench. For more information about how to analyze the data without concatenating, please see the related FAQ pages at the bottom of this page.

If you have a large number of sequence lists that you wish to concatenate, we suggest using a workflow with multiple inputs to automate the concatenation based on an Excel spreadsheet that describes which fastq files belong to which samples.

More information on batch launching workflows with multiple inputs can be found at the manual page:

Batch launching workflows with multiple inputs

Furthermore, we suggest using the batch rename function to keep sample names short and informative.

A detailed example of how this can be done is given below. It is divided into the following sections:

  1. Build a workflow with multiple inputs to concatenate the sequence lists and produce a QC report for the reads.
  2. Create Excel spreadsheet describing the samples
  3. Import fastq files and batch rename the resulting sequence lists
  4. Run the installed workflow in batch mode
  5. Batch rename output files from the Workflow
  6. Relevant manual pages

1. Build a workflow with multiple inputs to concatenate the sequence lists

Build a workflow with multiple inputs that includes the Sequence list and QC for Sequencing Reads elements. When run in batch mode, this workflow automatically concatenates the sequence lists of each sample and arranges the results in folders based on the Excel spreadsheet describing the samples.

In this example, the Graphical Report output was configured as "Graphical QC Report" to ease batch renaming at a later stage.

2. Create Excel spreadsheet describing the samples

The Excel spreadsheet should include three columns:

  • The first column should contain a unique ID for the fastq files, e.g. an ID including the lane number (L001, L002, L003 and L004).
  • The second column should contain a sample name used for grouping. This may be the ID shared by the fastq files of a sample or a descriptive name of the sample.
  • The third column should contain the type of data. All rows should have the same type, e.g. NextSeq reads.
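
If the file names follow a consistent pattern, the spreadsheet rows can be generated rather than typed. The Python sketch below is one way to do this, assuming the default bcl2fastq naming (<sample>_S<number>_L00<lane>_R<read>_001.fastq.gz). It writes one row per sample and lane, because the R1 and R2 files of a lane end up in one paired sequence list, and it uses the hypothetical header names ID, Sample ID and Type. The output is CSV text, which can be opened in Excel and saved as a spreadsheet.

```python
import csv
import io
import re

# Assumption: default bcl2fastq naming <sample>_S<n>_L00<lane>_R<read>_001.fastq(.gz)
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+_S\d+)_(?P<lane>L\d{3})_R[12]_001\.fastq(\.gz)?$")


def spreadsheet_rows(fastq_names, data_type="NextSeq reads"):
    """Return sorted (unique ID, sample ID, type) rows, one per sample and lane."""
    rows = {}
    for name in fastq_names:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name}")
        unique_id = f"{match['sample']}_{match['lane']}"  # e.g. SampleA_S1_L001
        rows[unique_id] = (unique_id, match["sample"], data_type)  # R1 and R2 share a row
    return sorted(rows.values())


def to_csv(rows):
    """Render the rows as CSV text with a header line (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["ID", "Sample ID", "Type"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical file names for two samples
files = [
    "SampleA_S1_L001_R1_001.fastq.gz", "SampleA_S1_L001_R2_001.fastq.gz",
    "SampleA_S1_L002_R1_001.fastq.gz", "SampleA_S1_L002_R2_001.fastq.gz",
    "SampleB_S2_L001_R1_001.fastq.gz", "SampleB_S2_L001_R2_001.fastq.gz",
]
print(to_csv(spreadsheet_rows(files)))
```

Here the sample ID is the part of the file name shared by all files of a sample (for example SampleA_S1); a descriptive sample name could be used in the second column instead.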

3. Import fastq files and batch rename the resulting sequence lists

Gather the fastq files for all samples in one folder before import, so that all samples can be imported in one go. You also need to create a folder in the Workbench for the imported files, as this folder must be selected when the workflow is run in batch mode.

On import, the Workbench merges the R1 and R2 fastq files into one paired sequence list for each lane. When several files are used as input for one output, the resulting object in the Workbench is named after the first input file. Hence, the name of each sequence list includes R1_001, even though the list contains both the R1 and R2 reads. Furthermore, "(paired)" is appended to the name to indicate that the sequence list contains paired reads. The batch rename function can be used to remove _R1_001 from the names by replacing it with nothing.
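
The naming described above, and the effect of replacing _R1_001 with nothing, can be modelled with a few lines of Python. This is only a sketch of the resulting names, not the Workbench's own logic: it assumes the file extension is dropped from the first input file name before " (paired)" is appended, and the helper names are hypothetical.

```python
def imported_name(first_input_file: str) -> str:
    """Model the name of a paired sequence list: first input file name + " (paired)".

    Assumption: the .fastq/.fq and .gz extensions are not part of the name.
    """
    base = first_input_file
    for suffix in (".gz", ".fastq", ".fq"):  # strip .gz first, then .fastq or .fq
        if base.endswith(suffix):
            base = base[: -len(suffix)]
    return f"{base} (paired)"


def batch_replace(names, old, new=""):
    """Model 'Replace part of name': replace `old` with `new` in every name."""
    return [name.replace(old, new) for name in names]


# Hypothetical R1 files of one sample, one per lane (R1 is the first input file)
lists = [imported_name(f"SampleA_S1_L00{lane}_R1_001.fastq.gz") for lane in range(1, 5)]
print(lists)
print(batch_replace(lists, "_R1_001"))
```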

Batch renaming of all sequence lists is done in the following way:

  • Launch the batch rename tool.
  • Use the right-click option Add folder content to add the sequence lists to the batch renaming, and click Next.

  • In this example no objects should be excluded. Click Next.
  • Select the option Rename Elements and click Next.
  • Choose the option Replace part of name and enter _R1_001 in the From box. Leave the To box empty.

  • Click Finish.
  • Confirm Renaming by clicking Yes.

The sequence lists are now renamed.

4. Run the installed workflow in batch mode.

  • Right-click the installed workflow and choose the option Run in batch mode.

  • Select the Excel spreadsheet describing the data.
  • Select the Folder with all the imported samples.
  • Select the Partial option for the data association.
  • Click Next.

  • Set Group by to Sample ID and Type to Type.
  • Choose the Type, e.g. NextSeq Reads in this example, for both the first and the second input.
  • Click Next.

  • Create or choose a folder where you want to store the workflow results.
  • Click Finish.

The output of the workflow is a subfolder for each batch unit/sample, named according to the Sample ID. Each subfolder contains:

  • Sequence List
  • Graphical Report
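
Conceptually, the batch run forms one batch unit per Sample ID and concatenates the sequence lists whose names match the IDs in the first column. The Python sketch below models that grouping under the assumption that a Partial match means "the ID occurs somewhere in the element name"; the exact matching rules of the Workbench may differ, and the function name is hypothetical. It also shows why the IDs must be unique: an ID that matches zero or several elements is reported as an error.

```python
from collections import defaultdict


def batch_units(rows, element_names):
    """Group element names into batch units, one per sample ID.

    rows: (unique ID, sample ID, type) tuples from the spreadsheet.
    element_names: names of the imported and renamed sequence lists.
    Assumption: "Partial" matching is modelled as a substring test.
    """
    units = defaultdict(list)
    for unique_id, sample_id, _data_type in rows:
        matches = [name for name in element_names if unique_id in name]
        if len(matches) != 1:
            raise ValueError(f"{unique_id!r} matched {len(matches)} elements")
        units[sample_id].append(matches[0])
    return dict(units)


# Hypothetical spreadsheet rows and Workbench element names
rows = [
    ("SampleA_S1_L001", "SampleA_S1", "NextSeq reads"),
    ("SampleA_S1_L002", "SampleA_S1", "NextSeq reads"),
    ("SampleB_S2_L001", "SampleB_S2", "NextSeq reads"),
]
elements = ["SampleA_S1_L001 (paired)", "SampleA_S1_L002 (paired)", "SampleB_S2_L001 (paired)"]

for sample_id, inputs in batch_units(rows, elements).items():
    print(f"{sample_id}: concatenate {inputs}")
```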

5. Batch rename output files from the Workflow

The batch rename function in the Workbench can be used to replace the name New Sequence List with the sample ID and to add the sample ID to the name of the QC report.

To replace New Sequence List with the Sample ID:

  • Launch the batch rename tool.
  • Select the top-level workflow result folder, use the right-click option Add folder content (recursively) and click Next.

  • Filter using the search term New so that only the New Sequence Lists are included.
  • Select the option Rename Elements.
  • Choose the option Replace full name and press Shift + F1 to see the available options. Select #BR-F# to replace the name with the name of the parent folder.

  • Click Finish.
  • Confirm renaming by clicking Yes.

Repeat the steps above to add the sample ID to the name of the Graphical Report:

  • This time, choose the option Add text to name and select #BR-F# to add the folder name. In this example, the sample name is added at the beginning of the name, so a space is included after #BR-F# to separate it from the original name.

The sequence lists are now concatenated, renamed and ready for your subsequent analysis.
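
The effect of the two renaming passes on one sample folder can be summarized with the Python sketch below. It assumes that #BR-F# expands to the name of the parent folder, as described above, and models the New search filter as a simple substring test; the function name and the sample name are hypothetical.

```python
def rename_outputs(folder_name, element_names):
    """Model both batch renames for the elements in one sample folder.

    - Elements matching the search term "New" get their full name replaced by
      #BR-F#, i.e. the folder name (first pass).
    - Other elements get "#BR-F# " added at the beginning (second pass).
    """
    renamed = []
    for name in element_names:
        if "New" in name:
            renamed.append(folder_name)
        else:
            renamed.append(f"{folder_name} {name}")
    return renamed


# Hypothetical sample folder created by the batch run
print(rename_outputs("SampleA_S1", ["New Sequence List", "Graphical QC Report"]))
# ['SampleA_S1', 'SampleA_S1 Graphical QC Report']
```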

6. Relevant manual pages

Relevant manual pages related to the information in this FAQ are:

Related Pages