9.1. Which steps should I follow to perform a resequencing analysis in the CLC Genomics Workbench?
To perform your resequencing study, you can follow the steps below:
1 Import Reference and Sequencing Reads
a) Import or Download Reference Genome
There are several options for importing a reference genome into the CLC Genomics Workbench. These options are listed below:
- Download Genomes
- Search for Sequences at NCBI
- Import Tracks, for importing FASTA and GFF3 or GFF2/GTF/GVF files
The easiest way to download the reference genome for selected organisms is via the 'Download Genomes' tool in the Workbench. Another option is to perform a search in the NCBI Entrez database. More information on these tools is provided in the sections below:
If your reference sequence and annotations are in separate files and the reference sequence is in FASTA file format, you will first need to import the reference sequence using the 'Import Tracks' tool. You can find this tool here:
Import in the Toolbar | Tracks
To import reference annotations, use the 'Import Tracks' tool again. The annotations can be imported in GFF3 or GFF2/GTF/GVF file format. Please make sure to select the reference genome just imported from the FASTA file at the bottom of the wizard. More information can be found in our manual below:
*Annotating variants with known variants from variant databases is a key concept when working with resequencing data. In a later step (step 10), once you have the list of identified variants, you may want to annotate them with known variants from variant databases. Any variant track can be used as a known variants track. You can download a variant track from a variant database resource specific to the organism you are working with and import it using the 'Import Tracks' tool. The reference sequence that the variant track relates to must already be in the Workbench before you import the variant track.
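When the reference sequence and annotations come from separate FASTA and GFF3 files, as described above, it can help to check before import that both files refer to the same sequence names. The following is a minimal sketch of such a check, run outside the Workbench. It is not a Workbench feature, and it assumes that the annotation file names its sequences in column 1 exactly as they appear as the first word of the FASTA headers. The function names and toy data are illustrative only.

```python
import io


def fasta_names(handle):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gff_seqids(handle):
    """Return the set of values found in column 1 (seqid) of a GFF3/GTF file."""
    seqids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines follow
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(fasta_handle, gff_handle):
    """Return annotation sequence names that are absent from the FASTA file."""
    return sorted(gff_seqids(gff_handle) - fasta_names(fasta_handle))


# Toy data standing in for real files (hypothetical content).
fasta = io.StringIO(">chr1 example\nACGT\n>chr2\nGGCC\n")
gff = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    "chrM\tsrc\tgene\t1\t2\t.\t+\t.\tID=g2\n"
)
missing = unmatched_seqids(fasta, gff)
print(missing)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` calls on the FASTA and GFF3 files. An empty result means every annotated sequence name was found in the FASTA file.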
b) Import Sequencing Reads
There are dedicated tools for importing high-throughput sequencing data into the CLC Genomics Workbench:
For example, Illumina reads are imported with the Illumina importer. Please click on the Import button in the top toolbar and choose Illumina. If you have paired reads, you should select "Paired reads" in the General options. For more information on the Illumina importer, please see our manual below:
2 Trim Sequence
To remove unwanted or poor-quality bases from the reads prior to mapping, you can use our 'Trim Reads' tool. The tool supports quality trimming, adapter trimming and length trimming. You can access the 'Trim Reads' tool from:
Toolbox | Preparing Sequencing Data | Trim Reads
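To illustrate what quality trimming and length trimming mean in general, here is a minimal sketch. It is not the algorithm used by the 'Trim Reads' tool. It simply removes trailing bases whose Phred score falls below a threshold and then applies a minimum length filter. The Phred+33 quality encoding, the threshold of 20 and the minimum length of 15 are assumptions chosen for the example.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q.

    qual is the FASTQ quality string; offset=33 assumes Phred+33 encoding.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_length_filter(seq, min_len=15):
    """Length trimming in its simplest form: discard reads that are too short."""
    return len(seq) >= min_len


# Toy read: eight high-quality bases ('I' = Q40) followed by two poor ones.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGTAC", "IIIIIIII#!")
print(trimmed_seq, trimmed_qual, passes_length_filter(trimmed_seq, min_len=5))
```

The two low-quality bases at the 3' end are removed, and the remaining eight bases pass a minimum length of 5.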
For more information please see:
3 Map Reads to Reference
In this step you will map the trimmed reads to the reference sequence. Please run the 'Map Reads to Reference' tool from:
Toolbox | Resequencing Analysis | Map Reads to Reference
Please see our manual and the subsection pages below on Map Reads to Reference.
4 InDels and Structural Variant detection
The 'InDels and Structural Variants' tool will help you to identify structural variants such as insertions, deletions, inversions, translocations and tandem duplications in read mappings. This tool relies exclusively on information derived from unaligned ends of the reads in the mappings.
The Reads track output from the 'Map Reads to Reference' tool can be used as input for the 'InDels and Structural Variants' tool, which can be accessed from:
Toolbox | Resequencing Analysis | Variant Detection | InDels and Structural Variants
More information on this tool can be found in our manual (please see the subsection pages):
5 Prepare Guidance Variant Track
The InDel variant track and the Structural Variant track obtained in step 4 can be combined using the 'Prepare Guidance Variant Track' tool. The tool is part of the Biomedical Genomics Analysis plugin, which needs to be installed in the Workbench before this tool can be used. Once the plugin is installed, the tool is available from:
Tools | Resequencing Analysis | Prepare Guidance Variant Track
The combined track can then be used as the guidance track for the 'Local Realignment' tool in the next step.
More information about this tool is available from the link below:
6 Local Realignment
Next, you can run the 'Local Realignment' tool to improve the alignments in the existing read mapping, using the combined guidance track obtained in step 5. You can run this tool from:
Toolbox | Resequencing Analysis | Local Realignment
Please see our manual on Local Realignment:
Using the combined InDel and Structural Variant track as guidance will locally realign the reads to include InDels of up to 200 bp in the mapped part of the reads, so that these larger InDels can be called by the variant detection tools. This would otherwise not be possible, as such large InDels cannot be included in the mapped reads during the standard read mapping procedure.
7 Create Statistics for Target Regions (optional for targeted amplicon sequencing)
For a targeted amplicon sequencing experiment, you may run the 'QC for Targeted Sequencing' tool, which reports the performance (enrichment and specificity) of a targeted resequencing experiment. You need to provide an annotation track with the target regions (e.g. an imported BED file) and a read mapping (Reads track). The tool investigates the read mapping to determine whether the targeted regions have been adequately covered by sequencing reads and how specifically the reads map to the targeted regions.
The target region BED file can be imported using the 'Import Tracks' tool. You can run the 'QC for Targeted Sequencing' tool from:
Toolbox | Resequencing Analysis | QC for Targeted Sequencing
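Before importing a target region BED file, it may be worth checking that it is well formed. The sketch below is a standalone check, not part of the Workbench. It assumes a tab-separated BED file whose first three columns are chromosome, 0-based start and exclusive end, as in the standard BED format. It also treats zero-length intervals as invalid, which is an assumption made for target regions in this example. Function names and toy data are illustrative.

```python
import io


def read_bed_regions(handle):
    """Parse the first three BED columns and validate each interval."""
    regions = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, comments and UCSC-style header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions


def total_target_size(regions):
    """Sum of region lengths; BED ends are exclusive, so length = end - start.

    Overlapping regions are counted more than once in this simple sum.
    """
    return sum(end - start for _, start, end in regions)


# Toy BED content standing in for a real target region file.
bed = io.StringIO("chr1\t100\t200\nchr1\t300\t450\nchr2\t0\t50\n")
regions = read_bed_regions(bed)
target_size = total_target_size(regions)
print(len(regions), target_size)
```

For the toy data, three regions are read with a combined length of 300 bp.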
More information on the 'QC for Targeted Sequencing' tool can be found at:
8 Identifying Variants/Mutations
a) Variant Detection
You can then run variant detection on your locally realigned read mapping. Please see the manual for an overview of the variant detection tools available in the Workbench:
You can access the variant detection tools in the Workbench from:
Toolbox | Resequencing Analysis | Variant Detection
The variant track output is discussed here:
b) Identify Known Mutations from Sample Mappings
If you simply want to know whether the variants in a given list are present in your samples, you can use the 'Identify Known Mutations from Sample Mappings' tool.
Two types of input are required to run this tool:
- A variant track that holds the specific variants that you wish to test for. If you want to test for variants from an external database, you need to import them (e.g. from a VCF file) beforehand using the 'Import Tracks' tool. They will then be saved as a variant track in the Workbench Navigation Area.
- The read mapping(s) that you wish to check for the presence (or absence) of specific variants. Please use the reads track from the Local Realignment step.
You can run this tool from:
Toolbox | Resequencing Analysis | Identify Known Mutations from Mappings
For more information on how to run the 'Identify Known Mutations from Mappings' tool, please see our manual below:
9 Predict Functional Consequences
a) Predict Amino Acid Changes
To predict or classify the functional impact of the variants, you can run the 'Amino Acid Changes' tool. This tool adds the HGVS nomenclature for variants within the coding regions of genes. To assess the functional impact of the variants in your variant list, please run the 'Amino Acid Changes' tool from:
Toolbox | Resequencing Analysis | Functional Consequences | Amino Acid Changes
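For readers unfamiliar with HGVS nomenclature, the sketch below shows the general shape of HGVS descriptions for a simple substitution: a coding DNA change such as c.1799T>A and the corresponding protein change such as p.Val600Glu. This is only an illustration of the notation. The helper names are hypothetical, and the exact strings that the 'Amino Acid Changes' tool writes to the variant track may differ in detail.

```python
# One-letter to three-letter amino acid codes, as used in HGVS protein descriptions.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def hgvs_coding_substitution(position, ref_base, alt_base):
    """Format a single-nucleotide substitution relative to the coding sequence."""
    return f"c.{position}{ref_base}>{alt_base}"


def hgvs_protein_substitution(position, ref_aa, alt_aa):
    """Format a missense change using three-letter amino acid codes."""
    return f"p.{THREE_LETTER[ref_aa]}{position}{THREE_LETTER[alt_aa]}"


coding_change = hgvs_coding_substitution(1799, "T", "A")
protein_change = hgvs_protein_substitution(600, "V", "E")
print(coding_change, protein_change)
```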
More information on this tool can be found in our manual:
b) Predict Splice Site Effect
The 'Predict Splice Site Effect' tool analyzes a variant track and determines whether the variants fall within potential splice sites. You can run this tool from:
Toolbox | Resequencing Analysis | Functional Consequences | Predict Splice Site Effect
A detailed description of this tool can be found at:
10 Annotate Variants
To annotate variants with information from databases of known variants, you can use 'Annotate from Known Variants', accessible from:
Toolbox | Resequencing Analysis | Variant Annotation | Annotate from Known Variants
To learn more about this tool, please see our manual:
11 Create Track List for visualization and inspection of the data
Finally, you can create a track list to visualize and inspect the data and to navigate easily to the detected variants.
Workflow in CLC Genomics Workbench
To avoid going through the wizard for each of these tools, you may wish to build a Workflow, which is a pipeline of interconnected tools. Figure 1 below shows an example of a Workflow that can be used as a variant detection pipeline.
Please see our manual and its subsection pages for more information on creating a Workflow.
Figure 1: Example of a resequencing Workflow including the most important steps. More elements, inputs and outputs can be added.
If you have multiple samples, you can also run the Workflow in batch mode, which automates the processing of multiple samples so that you go through the wizard steps only once. You can read about the batch function in the manual:
Starting from CLC Genomics Workbench version 12.0, you can also use various ready-to-use Workflows relevant for your NGS dataset by installing the Biomedical Genomics Analysis plugin. These Workflows were formerly part of the Biomedical Genomics Workbench.
Detailed information about these Workflows is available in the following section:
NGS application specific Workflows are described below: