
1.4. How can I keep the input sample name in the extracted consensus sequences from mappings?

By default, an extracted consensus sequence is named after the reference sequence, followed by "consensus". When consensus sequences are extracted from multiple samples mapped to the same reference and then exported (for example, to FASTA format), they therefore all have the same name.

The image below illustrates the purpose of this FAQ, which is to replace the default name with the sample name.

 

 

 

The steps below replace the reference-based names of the consensus sequences with the input sample names.

 

Step 1: Include Extract Consensus Sequence in a Workflow. To name the consensus sequences extracted from read mappings after your input samples, build a Workflow that contains a mapping step followed by Extract Consensus Sequence. When configuring the name of the final output, use the placeholder {input} or {2}. With either placeholder, the sequence list containing the consensus sequence is given the same name as the input.

More details about output names in Workflows can be found at the following manual page: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Output.html
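
To make the effect of the placeholders concrete, here is a minimal Python sketch of the substitution described in Step 1. It is an illustration only, not the Workbench's implementation: the function name expand_output_name and the sample names are hypothetical, and only the two placeholders mentioned above are handled.

```python
def expand_output_name(template: str, input_name: str) -> str:
    """Substitute the input-name placeholders in an output name template.

    Illustration only: {input} and {2} are the placeholders named in Step 1;
    how the Workbench resolves them internally is not shown here.
    """
    for placeholder in ("{input}", "{2}"):
        template = template.replace(placeholder, input_name)
    return template


# Hypothetical sample names, as if the Workflow were run in batch mode.
samples = ["sample_A", "sample_B"]

# With the output name set to "{input}", each output is named after its input.
output_names = [expand_output_name("{input}", s) for s in samples]
```

Each batch unit is expanded separately, which is why every sample ends up with its own distinctly named sequence list.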

 

 

  • To run the Workflow with multiple samples, select the Batch option.
  • Select the samples either by choosing the folder that contains them or by choosing the individual elements.

 

 

Step 2: Use the Batch Rename tool to rename the consensus sequences within the sequence lists. To rename the consensus sequences in multiple sequence lists at the same time, follow this approach:

  • Launch the Batch Rename tool.
  • Use the right-click option Add folder content or Add folder content (recursively).

  • Choose the option Rename sequences in sequence lists.

  • To replace the full name of the consensus sequence, select the option Replace full name. If you would rather add the sample name to the existing name, use the option Add text to name.
  • Press Shift + F1 to see the available options. Choose #BR-PE# to replace the name of the consensus sequence with that of the parent element. In this case, the parent element is the sequence list that contains the consensus sequence.

  • The consensus sequences within the sequence lists are now renamed.

 

The batch rename tool is described in detail at the following manual page:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Batch_Rename.html
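
The two renaming options in Step 2 can be pictured with the following Python sketch. It only models the outcome and is not how the Batch Rename tool is implemented: the SequenceList class, the reference name NC_000000, the sample names, and the choice of prefixing with an underscore separator in add_text_to_name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceList:
    """Hypothetical stand-in for a sequence list (the parent element)."""
    name: str
    sequence_names: List[str]


def replace_full_name(seq_list: SequenceList) -> SequenceList:
    # Models "Replace full name" with #BR-PE#:
    # every sequence takes the name of its parent element.
    new_names = [seq_list.name for _ in seq_list.sequence_names]
    return SequenceList(seq_list.name, new_names)


def add_text_to_name(seq_list: SequenceList, separator: str = "_") -> SequenceList:
    # Models "Add text to name" with #BR-PE#. Adding the text as a prefix
    # with an underscore separator is an assumption for this sketch.
    new_names = [f"{seq_list.name}{separator}{n}" for n in seq_list.sequence_names]
    return SequenceList(seq_list.name, new_names)


# Two sequence lists named after their samples (Step 1), each holding a
# consensus sequence that still carries the default reference-based name.
lists = [
    SequenceList("sample_A", ["NC_000000 consensus"]),
    SequenceList("sample_B", ["NC_000000 consensus"]),
]

renamed = [replace_full_name(sl) for sl in lists]
prefixed = [add_text_to_name(sl) for sl in lists]
```

After replace_full_name, the two consensus sequences carry distinct names, so they can be told apart once exported.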

 

Step 3: Export to FASTA. Once the consensus sequences have been renamed, you can export them in FASTA format. The sample name is then retained in the exported FASTA header.

The consensus sequences can be exported either to multiple FASTA files or to one single FASTA file. To export to a single FASTA file, select the option Output a single file. In that case, you may wish to specify a custom name; if you leave the default, the FASTA file is named after the first input.
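
As a rough picture of what the two export choices produce, the Python sketch below builds FASTA text from renamed consensus sequences. It is not the Workbench exporter: the to_fasta helper, the 60-character line width, the .fa file names, and the sequences themselves are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def to_fasta(records: List[Tuple[str, str]], line_width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA text.

    The name goes on the header line after '>', which is where the
    sample name ends up once the sequences have been renamed.
    """
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        # Wrap the sequence at a fixed width (60 is an arbitrary choice).
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"


# Hypothetical renamed consensus sequences, one per sample.
records = [("sample_A", "ACGTACGT"), ("sample_B", "ACGTTCGT")]

# Counterpart of "Output a single file": all records in one FASTA text.
single_file = to_fasta(records)

# Counterpart of exporting to multiple files: one FASTA text per sample.
per_sample: Dict[str, str] = {
    f"{name}.fa": to_fasta([(name, seq)]) for name, seq in records
}
```

In the single-file case the headers are the only thing distinguishing the samples, which is why the renaming in Step 2 matters before export.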

 

 

Export from the Workbench is described at the following manual page:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Data_export.html

 

NB: If you are using a Workbench version earlier than CLC Genomics Workbench 12, the batch rename function is installed as a plugin. The Plugins manager is launched by clicking the Plugins button in the top toolbar. Please note that to install plugins, you need to run your Workbench as an administrative user.
