
1.2. How to import, arrange and batch analyze data from an Illumina NextSeq machine when having multiple samples?


Illumina NextSeq machines have four physical lanes and produce eight fastq files per sample, i.e. four R1 and four R2 fastq files per sample.
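
The file count can be checked with a short script before import. The following is only a minimal sketch, assuming the files follow the usual Illumina naming pattern <sample>_S<n>_L00<lane>_R<read>_001.fastq.gz. The file names and the helper group_fastq_files are hypothetical and are not part of the Workbench.

```python
import re
from collections import defaultdict

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R<read>_001.fastq(.gz)
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+_S\d+)_(?P<lane>L\d{3})_(?P<read>R[12])_001\.fastq(\.gz)?$"
)

def group_fastq_files(file_names):
    """Group fastq file names by sample as {sample: [(lane, read), ...]}."""
    groups = defaultdict(list)
    for name in file_names:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected fastq file name: {name}")
        groups[match["sample"]].append((match["lane"], match["read"]))
    # Sort so that lanes and reads appear in a predictable order
    return {sample: sorted(parts) for sample, parts in groups.items()}

# Hypothetical file names for one sample: four lanes, R1 and R2 for each
example_files = [
    f"SampleA_S1_L00{lane}_R{read}_001.fastq.gz"
    for lane in range(1, 5)
    for read in (1, 2)
]
grouped = group_fastq_files(example_files)
for sample, parts in grouped.items():
    print(sample, len(parts), "files")
```

A sample that does not end up with eight entries here is probably missing a lane or a read file.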

 

To batch analyze samples consisting of multiple sequence lists in CLC Genomics Workbench, the sequence lists either have to be concatenated or placed in folders representing batch units. This is described in the related FAQs linked at the bottom of this FAQ.

 

In this FAQ we describe how you can use a multiple inputs workflow to automate the folder generation and batch analysis of the samples in a single workflow, without concatenating the sequence lists. A multiple inputs workflow can sort your sequence lists into folders based on an Excel spreadsheet describing which fastq files belong to which samples, after which each sample is analyzed in batch.

 

More information on batch launching workflows with multiple inputs can be found at the manual page:

Batch launching workflows with multiple inputs

Furthermore, we suggest using the batch rename function to keep sample names short and informative.

 

A detailed guide is found below. It is divided into the following sections:

  1. Build a workflow with multiple inputs to prepare and analyze the data
  2. Create Excel spreadsheet describing the samples
  3. Import fastq files and batch rename the resulting sequence lists
  4. Run the installed workflow in batch mode
  5. Use batch rename to rename the results
  6. Relevant manual pages

 

1. Build a workflow with multiple inputs to prepare and analyze the data

The workflow may include Trim Reads and QC for Sequencing Reads with individual inputs, even though the same data will be used as input for both tools. When run in batch mode, this workflow then automatically sorts and arranges the samples in folders based on the Excel spreadsheet describing the samples. The workflow may also include analysis steps that depend on the application, e.g. de novo assembly, resequencing, etc. In this example we use de novo assembly as the application.

With the default naming option in a workflow, the Workbench names an output object that is produced from multiple inputs after the first input object. We therefore suggest giving such outputs a generic name, e.g. Assembly with mapping, and then adding the sample name using batch rename after the analysis. If you output the Trimmed Reads and Unmapped reads, we suggest collecting them in a folder, as several files are output. In that case the default name is appropriate.

To run the workflow in batch mode the workflow needs to be installed. This is done by clicking the Installation button.

 

2. Create Excel spreadsheet describing the samples

The Excel spreadsheet should include three columns:

  • The first column should include a unique ID for the fastq files, e.g. an ID including the lane number (L001, L002, L003 and L004).
  • The second column should include a sample name for the grouping. This may be the shared ID from the fastq files or a descriptive name of the sample.
  • The third column should include the type of data. All data should have the same type, e.g. NextSeq reads.
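
As an illustration of this three-column layout, the rows can be generated from the lane-level IDs. This is a minimal sketch under stated assumptions: the IDs end with a lane suffix such as _L001, the column headers Sample ID and Type are taken from step 4 below, and the header ID for the first column as well as the helper build_sample_rows are hypothetical. The script writes comma-separated text, which would still have to be opened in Excel and saved as a spreadsheet.

```python
import csv
import io

def build_sample_rows(sequence_list_ids, data_type="NextSeq reads"):
    """Return one row per lane-level ID: (unique ID, sample name, type)."""
    rows = []
    for unique_id in sequence_list_ids:
        # Assumption: the ID ends with the lane suffix, e.g. "_L001"
        sample_name, _, lane = unique_id.rpartition("_")
        if not (sample_name and lane.startswith("L") and lane[1:].isdigit()):
            raise ValueError(f"ID without lane suffix: {unique_id}")
        rows.append((unique_id, sample_name, data_type))
    return rows

# Hypothetical IDs for one sample sequenced on four lanes
ids = [f"SampleA_S1_L00{lane}" for lane in range(1, 5)]
rows = build_sample_rows(ids)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ID", "Sample ID", "Type"])  # "ID" is a placeholder header
writer.writerows(rows)
table_text = buffer.getvalue()
print(table_text)
```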

 

 

3. Import fastq files and batch rename the resulting sequence lists

Gather all fastq files for all samples in one folder before import to allow import of all samples in one go. You also need to create a folder for the imported files in the Workbench as this folder needs to be selected for the following batch workflow.

On import, the Workbench merges the fastq files into one paired sequence list for each lane. When several files are used as input for one output, the resulting object in the Workbench is named after the first input file. Hence, the sequence list names will include R1_001, even though the lists contain both the R1 and R2 reads. Furthermore, (paired) is appended to the name to indicate that the sequence list contains paired reads. The batch rename function can be used to remove _R1_001 from the names by replacing it with nothing.

Batch renaming of all sequence lists is done in the following way:

  • Launch the batch rename tool.
  • Use the right click option to Add folder content to the batch naming and click Next.

  • In this example no objects should be excluded. Click Next.
  • Select the option Rename Elements and click Next.
  • Choose the option Replace part of name and enter _R1_001 in the From box. Leave the To box empty.

  • Click Finish.
  • Confirm Renaming by clicking Yes.

The sequence lists are now renamed.
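
The effect of this replacement can be previewed with plain string handling. This is a sketch only: it mimics the Replace part of name option with an ordinary Python string replacement and is not how the Workbench implements it. The sequence list names and the helper preview_replace are hypothetical.

```python
def preview_replace(names, old="_R1_001", new=""):
    """Map each current name to the name it would get after the replacement."""
    return {name: name.replace(old, new) for name in names}

# Hypothetical names of the imported paired sequence lists, one per lane
names = [f"SampleA_S1_L00{lane}_R1_001 (paired)" for lane in range(1, 5)]
preview = preview_replace(names)
for before, after in preview.items():
    print(f"{before} -> {after}")
```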

4. Run the installed workflow in batch mode

  • Right click the installed workflow and choose the option Run in batch mode.

  • Select the Excel spreadsheet describing the data.
  • Select the Folder with all the imported samples.
  • Select the Partial option for the data association.

 

  • Click Next.
  • Set Group by to Sample ID and Type to Type.
  • Choose the Type, e.g. NextSeq Reads in this example, for both the first and the second input.
  • Click Next.

  • Create or choose a folder where you want to store the workflow results.
  • Click Finish.
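
The grouping selected with Group by can be pictured as follows. This is a plain Python sketch of the idea, not Workbench code: rows of the spreadsheet that share a Sample ID and have the chosen Type form one batch unit. The rows, the first column header ID and the helper group_into_batch_units are hypothetical.

```python
from collections import defaultdict

def group_into_batch_units(rows, group_by="Sample ID",
                           type_column="Type", wanted_type="NextSeq reads"):
    """Collect the IDs of all rows of the wanted type under their group value."""
    units = defaultdict(list)
    for row in rows:
        if row[type_column] == wanted_type:
            units[row[group_by]].append(row["ID"])
    return dict(units)

# Hypothetical spreadsheet rows: two samples, four lanes each
rows = [
    {"ID": f"{sample}_L00{lane}", "Sample ID": sample, "Type": "NextSeq reads"}
    for sample in ("SampleA_S1", "SampleB_S2")
    for lane in range(1, 5)
]
batch_units = group_into_batch_units(rows)
for sample, ids in batch_units.items():
    print(sample, ids)
```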

The output of the workflow is a subfolder for each batch unit/sample, named according to the Sample ID. Each subfolder contains:

  • Trim Reads Report
  • Graphical QC Report
  • Assembly Report
  • Assembly with mapping
  • Folder with the trimmed reads
  • Folder with the unmapped reads

 

 

5. Use batch rename to rename the results

The batch rename option in the Workbench can be used to add the sample ID to the names of the results.

This is done in the following way:

  • Launch the batch rename tool.
  • Select the top workflow result folder, use the right click option to Add folder content (recursively) and click Next.

  • Exclude the sequence lists from the batch rename using the text: (paired)

  • Select the option Rename Elements and click Next.
  • Select the option Add text to name. Press Shift + F1 to see the available options. Choose #BR-F# to add the folder name. In this example we want the sample name at the beginning of the object name, so we add a space after #BR-F#.

  • Click Finish.
  • Confirm Renaming by clicking Yes.

The sample names from the subfolders are now added to the beginning of the object names.
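
The resulting names can be previewed in the same spirit. This sketch assumes that #BR-F# stands for the name of the folder directly containing each object and that objects whose name contains (paired) are excluded, as in the steps above. The paths and the helper preview_folder_prefix are hypothetical.

```python
from pathlib import PurePosixPath

def preview_folder_prefix(object_paths, exclude_text="(paired)"):
    """Return {path: new name}, where the new name is '<folder> <old name>'."""
    renamed = {}
    for path_text in object_paths:
        path = PurePosixPath(path_text)
        if exclude_text in path.name:
            continue  # excluded from renaming, like the sequence lists above
        renamed[path_text] = f"{path.parent.name} {path.name}"
    return renamed

# Hypothetical result objects for one sample
paths = [
    "Results/SampleA_S1/Assembly with mapping",
    "Results/SampleA_S1/Assembly Report",
    "Results/SampleA_S1/Trimmed reads/SampleA_S1_L001 (paired)",
]
renamed = preview_folder_prefix(paths)
for old_path, new_name in renamed.items():
    print(f"{old_path} -> {new_name}")
```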

 

 

The trimmed sequence lists will be named according to the inputs, while the unmapped reads will be named according to the first input, but with the original name in brackets [].

 

6. Relevant manual pages

Manual pages related to the information in this FAQ are:

Related Pages