Which steps should I follow to perform a resequencing analysis in the CLC Genomics Workbench?

Introduction Video on Resequencing

The introduction video below starts after the reference sequence has been set up and the sequencing data have been imported, QC-checked and trimmed:

How to perform DNA-seq and resequencing data analyses using QIAGEN CLC Genomics Workbench

Stepwise guidance on different tools to be used in a resequencing study

1 Import Reference and Sequencing Reads

a) Import or Download Reference Genome

There are several options for importing a reference genome into the CLC Genomics Workbench. These options are listed below:

Download Genomes (video)
Search for Sequences at NCBI
Import Tracks, for importing FASTA and GFF3 or GFF2/GTF/GVF files (video)

The easiest way to download the reference genome for selected organisms is the 'Download Genomes' tool in the Workbench. Alternatively, you can search the NCBI Entrez database. More information on these tools is provided in the manual sections below:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Download_Genomes.html

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Search_Sequences_at_NCBI.html

If your reference sequence and annotations are in separate files and the reference sequence is in FASTA file format, you will first need to import the reference sequence using the 'Import Tracks' tool. You can find this tool here:

Import in the Toolbar | Tracks

To import reference annotations, also use the 'Import Tracks' tool. Annotations can be imported in GFF3 or GFF2/GTF/GVF file format. At the bottom of the wizard, make sure to select the reference genome you just imported from the FASTA file. More information can be found in our manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Import_tracks.html

*Annotating variants with known variants from variant databases is a key concept when working with resequencing data. In a later step (step 10), once you have the list of identified variants, you may want to annotate them with known variants from variant databases. Any variant track can be used as a known variants track. You can download variant files from variant database resources specific to the organism you are working with and import them with the 'Import Tracks' tool. The reference sequence that the variant track is based on must already be in the Workbench before you import the variant track.
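A common reason a downloaded variant file does not line up with the reference is that the chromosome names differ (for example "1" in the variant file versus "chr1" in the FASTA file). The sketch below is a hypothetical helper that runs outside the Workbench, not part of it; it assumes uncompressed FASTA and VCF files read as lists of lines and reports VCF chromosome names that have no matching reference sequence.

```python
# Hypothetical helper (not part of the Workbench): list chromosome names used
# in a VCF that do not occur in the reference FASTA.
# Assumes uncompressed FASTA and VCF content given as lists of text lines.

def fasta_names(lines):
    """Sequence names: the first word after '>' on each FASTA header line."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def vcf_chroms(lines):
    """CHROM values (first tab-separated column) of VCF data lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def unmatched_chroms(fasta_lines, vcf_lines):
    """VCF chromosome names with no matching sequence in the reference."""
    return sorted(vcf_chroms(vcf_lines) - fasta_names(fasta_lines))


if __name__ == "__main__":
    fasta = [">chr1 example", "ACGT", ">chr2", "ACGT"]
    vcf = ["##fileformat=VCFv4.2",
           "#CHROM\tPOS\tID\tREF\tALT",
           "1\t100\t.\tA\tG",
           "chr2\t5\t.\tC\tT"]
    print(unmatched_chroms(fasta, vcf))  # ['1']
```

An empty list means every VCF chromosome name has a matching reference sequence; any names reported would need to be renamed in one of the files before the two can be used together.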

b) Import Sequencing Reads

There are dedicated tools for importing high-throughput sequencing data into the CLC Genomics Workbench:

https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Import_NGS_Reads.html

For example, Illumina reads are imported with the Illumina importer: click the Import button in the top toolbar and choose Illumina. If you have paired reads, select "Paired reads" in the General options. For more information on the Illumina importer, please see our manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Illumina.html
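When reads are imported as pairs, the two files are expected to list the mates of each pair in the same order. The hypothetical helper below is a minimal sketch, not part of the Illumina importer; it assumes uncompressed FASTQ files with four lines per record and compares the read names in an R1 and an R2 file before import.

```python
# Hypothetical pre-import check: do two paired-end FASTQ files list the same
# reads in the same order? Assumes 4 lines per FASTQ record.
from itertools import zip_longest


def fastq_names(lines):
    """Yield read names from FASTQ lines.

    The leading '@', any description after the first space and a trailing
    /1 or /2 mate suffix are removed so that mates compare equal.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            name = line.strip()[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def check_pairs(r1_lines, r2_lines):
    """Return 'ok' or a message describing the first mismatching record."""
    pairs = zip_longest(fastq_names(r1_lines), fastq_names(r2_lines))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:  # also catches files with different record counts (None)
            return f"mismatch at record {n}: {a!r} vs {b!r}"
    return "ok"
```

For files on disk, pass open file handles, e.g. check_pairs(open(r1_path), open(r2_path)), where r1_path and r2_path are placeholders; gzip-compressed files can be opened with gzip.open(path, "rt").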

2 Trim Sequence

To remove unwanted or poor-quality bases from the reads prior to mapping, you can use our 'Trim Reads' tool, which supports quality trimming, adapter trimming and length trimming. You can access the 'Trim Reads' tool from:

Toolbox | Preparing Sequencing Data | Trim Reads

For more information please see:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_Reads.html
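A common approach to quality trimming is the modified-Mott algorithm: each base contributes "limit minus its error probability", and the read is cut down to the segment with the highest running sum. The sketch below illustrates that idea only; it is not the Workbench's implementation, it assumes Phred scores already decoded to integers, and the limit of 0.05 is an assumed example value.

```python
# Sketch of modified-Mott quality trimming (illustration, not the Workbench code).

def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def mott_trim(quals, limit=0.05):
    """Return (start, end) of the highest-scoring segment, end exclusive.

    Each base scores limit - P(error): good bases add to a running sum, poor
    bases subtract from it. The running sum is reset to zero whenever it drops
    below zero, and the segment with the largest sum is kept.
    """
    best_start = best_end = 0
    best_sum = 0.0
    start, running = 0, 0.0
    for i, q in enumerate(quals):
        running += limit - phred_to_error(q)
        if running < 0:
            running, start = 0.0, i + 1
        elif running > best_sum:
            best_sum, best_start, best_end = running, start, i + 1
    return best_start, best_end
```

The read would then be trimmed to seq[start:end]; a result of (0, 0) means no segment passes the limit and the read would be discarded.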

3 Map Reads to Reference

In this step you will map the trimmed reads to the reference sequence. Please run the 'Map Reads to Reference' tool from:

Toolbox | Resequencing Analysis | Map Reads to Reference

For details, please see our manual page on Map Reads to Reference and its subsection pages:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Map_Reads_Reference.html

4 InDels and Structural Variant detection

The 'InDels and Structural Variants' tool helps you identify structural variants such as insertions, deletions, inversions, translocations and tandem duplications in read mappings. The tool relies exclusively on information derived from the unaligned ends of the reads in the mappings.

The Reads track output from the 'Map Reads to Reference' tool can be used as input for the 'InDels and Structural Variants' tool, which can be accessed from:

Toolbox | Resequencing Analysis | Variant Detection | InDels and Structural Variants

More information on this tool can be found in our manual (please see the subsection pages):

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=InDels_Structural_Variants.html
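When a read mapping is exported to SAM/BAM, unaligned read ends are typically represented as soft-clipped bases (S operations in the CIGAR string). The sketch below only illustrates how the length of these unaligned ends can be read from a CIGAR string; it is not the detection algorithm used by the tool, and the min_len threshold of 10 bases is a hypothetical value.

```python
# Sketch: measure unaligned ends (soft clips) of a read from its CIGAR string.
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may only appear at the ends and carry no read sequence.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return 0, 0
    left = core[0][0] if core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def has_long_unaligned_end(cigar, min_len=10):
    """Flag reads whose unaligned end is long enough to hint at a breakpoint.

    min_len is a hypothetical example threshold.
    """
    return max(soft_clip_lengths(cigar)) >= min_len
```

Reads flagged this way are the kind of evidence the text describes: many reads with long unaligned ends ending at the same reference position suggest a breakpoint there.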

5 Prepare Guidance Variant Track

The InDel variant track and the Structural Variant track obtained in step 4 can be combined using the 'Prepare Guidance Variant Track' tool. This tool is part of the Biomedical Genomics Analysis plugin, which needs to be installed in the Workbench before the tool can be used. Once the plugin is installed, the tool is available from:

Tools | Resequencing Analysis | Prepare Guidance Variant Track

The combined track can then be used as a guidance track for the 'Local Realignment' tool in the next step.

More information about this tool is available from the link below:

http://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/current/index.php?manual=Prepare_Guidance_Variant_Track.html
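Conceptually, combining the two tracks means merging two position-sorted lists of variants into a single list. The sketch below illustrates only that idea, with variants represented as hypothetical (chromosome, start, type, length) tuples; it is not how the 'Prepare Guidance Variant Track' tool is implemented.

```python
# Sketch: combine two variant lists into one sorted, de-duplicated list.
import heapq


def combine_tracks(indels, structural_variants):
    """Merge two lists of hypothetical (chrom, start, type, length) tuples.

    Both lists are sorted, merged in order, and identical records that occur
    in both inputs are kept once. Chromosomes sort as plain strings here.
    """
    combined = []
    for record in heapq.merge(sorted(indels), sorted(structural_variants)):
        if not combined or combined[-1] != record:
            combined.append(record)
    return combined
```

Because chromosomes are compared as plain strings in this sketch, "chr10" sorts before "chr2"; a real implementation would follow the order of the sequences in the reference.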

6 Local Realignment

Next, you can run the 'Local Realignment' tool to improve the alignments in the existing read mapping, using the combined guidance track obtained in step 5. You can run this tool from:

Toolbox | Resequencing Analysis | Local Realignment

Please see our manual on Local Realignment:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Local_Realignment.html

When the combined InDel and Structural Variant track is used as guidance, reads are locally realigned so that InDels of up to 200 bp can be included in the aligned part of the reads, which allows the variant detection tools to call these larger InDels. Without guidance this is not possible, because the standard read mapping procedure cannot include such large InDels in the aligned part of the reads.
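The toy sketch below illustrates why a guidance track helps. A read whose tail spans a large deletion typically ends up with an unaligned end; if a deletion from the guidance track starts exactly where the aligned part stops, and the unaligned bases match the reference after the deletion, the read can instead be aligned across the deletion. This is a simplified illustration with hypothetical names, not the Local Realignment algorithm, and it ignores insertions, mismatches and unaligned read starts.

```python
# Toy sketch: can a guidance deletion explain a read's unaligned tail?

def realign_with_deletion(ref, read, start, aligned, del_pos, del_len):
    """Return a new CIGAR if a known deletion explains the unaligned tail.

    ref      reference sequence (0-based coordinates)
    read     read sequence; its first `aligned` bases map to ref at `start`
    del_pos  0-based reference position where the guidance deletion starts
    del_len  deletion length in bases
    """
    if read[:aligned] != ref[start:start + aligned]:
        return None  # aligned part does not match the reference
    if start + aligned != del_pos:
        return None  # unaligned tail does not begin at the deletion
    tail = read[aligned:]
    after = ref[del_pos + del_len:del_pos + del_len + len(tail)]
    if not tail or tail != after:
        return None  # tail is empty or not explained by the deletion
    return f"{aligned}M{del_len}D{len(tail)}M"
```

For example, with the reference AAAACCCCGGGGTTTT and the read AAAAGGGG aligned from position 0 with 4 aligned bases, a 4 bp guidance deletion at position 4 turns the unaligned tail into the alignment 4M4D4M.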

7 Create Statistics for Target Regions (optional for targeted amplicon sequencing)

For targeted amplicon sequencing experiments, you may run the 'QC for Targeted Sequencing' tool, which reports the performance (enrichment and specificity) of a targeted resequencing experiment. As input, you need to provide an annotation track with the target regions (e.g. an imported BED file) and a read mapping (Reads track). The tool examines the read mapping to determine whether the target regions have been adequately covered by sequencing reads and how specifically the reads map to the target regions.

The target region BED file can be imported using the 'Import Tracks' option. The 'QC for Targeted Sequencing' tool itself can be run from:

Toolbox | Quality Control | QC for Targeted Sequencing

More information on 'QC for Targeted Sequencing' tool can be found at:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=QC_Targeted_Sequencing.html
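To make specificity and coverage more concrete, the sketch below computes two simple metrics from BED-style intervals (0-based, end exclusive): the fraction of reads overlapping any target region, and the fraction of target bases covered by at least min_cov reads. These definitions are simplified assumptions for illustration; the 'QC for Targeted Sequencing' report defines its own metrics, as described in the manual page above.

```python
# Sketch: simplified on-target and coverage metrics for targeted sequencing.

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) overlap."""
    return a_start < b_end and b_start < a_end


def targeted_qc(targets, reads, min_cov=1):
    """Return (fraction of reads on target, fraction of target bases covered).

    targets and reads are lists of (chrom, start, end) tuples, 0-based and
    end-exclusive as in BED files. min_cov is an assumed example threshold.
    """
    on_target = sum(
        1 for chrom, start, end in reads
        if any(chrom == t_chrom and overlaps(start, end, t_start, t_end)
               for t_chrom, t_start, t_end in targets)
    )
    specificity = on_target / len(reads) if reads else 0.0

    covered = total = 0
    for t_chrom, t_start, t_end in targets:
        for pos in range(t_start, t_end):
            depth = sum(1 for chrom, start, end in reads
                        if chrom == t_chrom and start <= pos < end)
            total += 1
            covered += depth >= min_cov
    coverage_fraction = covered / total if total else 0.0
    return specificity, coverage_fraction
```

This brute-force version is fine for a handful of intervals; for real data an interval index or a coverage array per chromosome would be the usual choice.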

8 Identifying Variants/Mutations

a) Variant Detection

You can then run variant detection on your locally realigned read mapping. Please see the manual for an overview of the variant detection tools available in the Workbench:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Variant_Detection_tools.html

You can access the variant detection tools in the Workbench from:

Toolbox | Resequencing Analysis | Variant Detection

The variant track output is discussed here:

https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Variant_tracks.html
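Variant detection decides, position by position, whether the observed bases support a variant. The toy sketch below shows the simplest form of that idea, applying hypothetical minimum coverage and allele frequency thresholds to the bases observed at one position; the Workbench variant detection tools use their own, more elaborate models and filters.

```python
# Toy sketch: threshold-based SNV call at a single position.
from collections import Counter


def call_snv(ref_base, observed_bases, min_cov=10, min_freq=0.35):
    """Return candidate (allele, count, frequency) tuples, or None if coverage is too low.

    min_cov and min_freq are hypothetical example thresholds.
    """
    counts = Counter(observed_bases)
    depth = sum(counts.values())
    if depth < min_cov:
        return None  # too few reads to make a call
    calls = []
    for allele, n in counts.most_common():
        frequency = n / depth
        if allele != ref_base and frequency >= min_freq:
            calls.append((allele, n, round(frequency, 3)))
    return calls


if __name__ == "__main__":
    print(call_snv("A", "AAAAAAGGGG"))  # [('G', 4, 0.4)]
```

Here the G allele is seen in 4 of 10 reads (frequency 0.4), which passes the assumed 0.35 threshold, so it is reported as a candidate SNV.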

b) Identify Known Mutations from Sample Mappings

If you only want to know whether specific variants are present in your samples, you can use the 'Identify Known Mutations from Sample Mappings' tool.

Two types of input are required to run this tool:

A variant track containing the specific variants that you wish to test for. To test for variants from an external database, first import them (e.g. from a VCF file) using the Import Tracks option; they are then saved as a variant track in the Workbench navigation area.
The read mapping(s) that you wish to check for the presence (or absence) of the specific variants. Please use the Reads track from the Local Realignment step.

You can run this tool from:

Toolbox | Resequencing Analysis | Identify Known Mutations from Sample Mappings

For more information on how to run the 'Identify Known Mutations from Sample Mappings' tool, please see our manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Identify_Known_Mutations_from_Sample_Mappings.html
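The core question of this step can be sketched as follows: for each known variant, how many reads cover the position and how many of them carry the alternative allele? Reporting coverage alongside the allele count matters, because a variant that is not observed at a well-covered position is evidence of absence, while a position without coverage gives no information. The sketch below handles SNVs only and uses a hypothetical pileup dictionary and min_alt_reads threshold; it is not the tool's implementation.

```python
# Sketch: coverage and allele support for a list of known SNVs.
from collections import Counter


def check_known(known, pileup, min_alt_reads=2):
    """Summarize support for each known SNV.

    known: list of (chrom, pos, ref, alt) tuples.
    pileup: hypothetical dict mapping (chrom, pos) to a string of read bases.
    min_alt_reads: hypothetical detection threshold.
    """
    results = []
    for chrom, pos, ref, alt in known:
        counts = Counter(pileup.get((chrom, pos), ""))
        coverage = sum(counts.values())
        alt_count = counts[alt]
        results.append({
            "variant": f"{chrom}:{pos}{ref}>{alt}",
            "coverage": coverage,
            "alt_count": alt_count,
            # None means no reads: presence or absence cannot be assessed
            "detected": alt_count >= min_alt_reads if coverage else None,
        })
    return results
```

In the returned records, "detected" is True or False only where there is coverage; None flags positions that would need more sequencing before a conclusion can be drawn.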

9 Predict Functional Consequences

a) Predict Amino Acid Changes

To predict or classify the functional impact of variants, you can run the 'Amino Acid Changes' tool. This tool adds HGVS nomenclature for variants within the coding regions of genes. To determine the functional impact of the variants you have identified, run the 'Amino Acid Changes' tool from:

Toolbox | Resequencing Analysis | Functional Consequences | Amino Acid Changes

More information on this tool can be found in our manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Amino_Acid_Changes.html
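To illustrate the notation, the sketch below derives the c. (coding DNA) and p. (protein) descriptions for a single-base substitution, assuming a plus-strand coding sequence that starts at the A of the ATG start codon and the standard genetic code. It is a simplified example, not how the 'Amino Acid Changes' tool works; for instance, HGVS recommends writing predicted protein changes in parentheses, such as p.(Gly2Asp), which is omitted here.

```python
# Sketch: HGVS-style c. and p. notation for a substitution in a coding sequence.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def hgvs_substitution(cds, c_pos, alt):
    """Return (c_notation, p_notation) for a single-base change.

    cds    coding sequence starting at the A of ATG (plus strand, uppercase)
    c_pos  1-based position within the coding sequence
    alt    alternative base
    """
    ref = cds[c_pos - 1]
    codon_index = (c_pos - 1) // 3
    codon_start = codon_index * 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = (c_pos - 1) % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    aa_pos = codon_index + 1
    c_notation = f"c.{c_pos}{ref}>{alt}"
    if ref_aa == alt_aa:
        return c_notation, f"p.{ref_aa}{aa_pos}="  # synonymous change
    return c_notation, f"p.{ref_aa}{aa_pos}{alt_aa}"
```

For example, in the coding sequence ATGGGTTAA, a G>A change at c.5 turns codon 2 from GGT (Gly) into GAT (Asp), giving c.5G>A and p.Gly2Asp.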

b) Predict Splice Site Effect

The 'Predict Splice Site Effect' tool analyzes a variant track and determines whether the variants fall within potential splice sites. You can run this tool from:

Toolbox | Resequencing Analysis | Functional Consequences | Predict Splice Site Effect

A detailed description of this tool can be found at:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Predict_Splice_Site_Effect.html
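A simple way to picture the question this tool answers is a window check around exon boundaries. The sketch below flags variants that fall within a hypothetical window of intronic bases next to an exon end (donor side) or the next exon start (acceptor side). It assumes a plus-strand transcript with 1-based, inclusive exon coordinates and is not the tool's own definition of a splice site.

```python
# Sketch: does a variant fall in a small window next to an exon boundary?

def splice_site_side(pos, exons, window=2):
    """Return "donor", "acceptor" or None for a 1-based variant position.

    exons: (start, end) tuples, 1-based and inclusive, of a plus-strand
    transcript. window=2 covers the canonical GT (donor) and AG (acceptor)
    intron dinucleotides; it is an assumed example value.
    """
    exons = sorted(exons)
    for (_, exon_end), (next_start, _) in zip(exons, exons[1:]):
        if exon_end < pos <= exon_end + window:
            return "donor"      # first intronic bases after an exon
        if next_start - window <= pos < next_start:
            return "acceptor"   # last intronic bases before the next exon
    return None
```

With exons at 100-200 and 300-400, positions 201-202 fall on the donor side and 298-299 on the acceptor side, while a variant at 250 lies deep in the intron and is not flagged.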

10 Annotate Variants

To annotate variants with information from databases of known variants, you can use 'Annotate from Known Variants', accessible from:

Toolbox | Resequencing Analysis | Variant Annotation | Annotate from Known Variants

For more information on this tool, please see our manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Annotate_from_Known_Variants.html
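At its simplest, annotating from known variants means looking up each detected variant by chromosome, position, reference allele and alternative allele in the known variants track. The sketch below shows that exact-match lookup with hypothetical field names and IDs. The same insertion or deletion can be described at different positions in repetitive sequence, so exact matching can miss it unless both tracks use the same representation; this sketch does no such normalization.

```python
# Sketch: exact-match annotation of variants from a known-variants list.

def annotate_known(variants, known):
    """Attach the ID of an exactly matching known variant (or None) to each variant.

    Both inputs are lists of dicts with "chrom", "pos", "ref" and "alt" keys;
    known variants may carry an "id". The field names are hypothetical.
    """
    def key(v):
        return v["chrom"], v["pos"], v["ref"], v["alt"]

    index = {key(k): k.get("id") for k in known}
    return [{**v, "known_id": index.get(key(v))} for v in variants]
```

Matching on the alternative allele as well as the position avoids annotating a variant with a different change that happens to occur at the same site.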

11 Create Track List for visualization and inspection of the data

Finally, you can create a track list for easy navigation to the detected variants and for visualization and inspection of the data. For more information, see:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Track_lists.html

Workflow in CLC Genomics Workbench

To avoid going through the wizard of each tool separately, you may wish to build a Workflow, which is a pipeline of interconnected tools. Figure 1 below shows an example of a Workflow that can be used as a variant detection pipeline.

Please see our manual and its subsection pages for more information on creating a Workflow.

https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Creating_editing_workflows.html

Figure 1: Example of a resequencing Workflow including the most important steps. More elements, inputs and outputs can be added.

If you have multiple samples, you can also run the workflow in batch mode, which automates the processing of all samples while going through the wizard steps only once. You can read about batch processing in the manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Batch_processing.html
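A batch run processes one set of inputs per sample. If your paired FASTQ files follow the common Illumina naming style (for example SampleA_S1_L001_R1_001.fastq.gz), a small script like the hypothetical sketch below can check that every sample has both an R1 and an R2 file before you start a batch run. The file naming pattern is an assumption; adjust the regular expression to your own convention.

```python
# Hypothetical helper: group paired FASTQ files per sample before a batch run.
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1[_NNN].fastq[.gz] / <sample>_R2[...]
PAIR_RE = re.compile(r"^(?P<sample>.+?)_R(?P<mate>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$")


def group_pairs(filenames):
    """Return {sample: (R1 file, R2 file)} for samples that have both mates.

    Files that do not match the pattern, and samples missing a mate, are skipped.
    """
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_RE.match(name)
        if match:
            mates[match["sample"]][match["mate"]] = name
    return {
        sample: (files["1"], files["2"])
        for sample, files in sorted(mates.items())
        if set(files) == {"1", "2"}
    }
```

Comparing the samples found here with your sample sheet quickly reveals missing or misnamed files, such as a sample with only an R1 file.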

Installing the Biomedical Genomics Analysis plugin makes various template Workflows for NGS datasets available:

https://digitalinsights.qiagen.com/plugins/biomedical-genomics-analysis/

Detailed information about these Workflows is available in the following section:

https://resources.qiagenbioinformatics.com/manuals/biomedicalgenomicsanalysis/current/index.php?manual=WGS_WES_TAS_WTS_template_workflow_descriptions.html

NGS application specific Workflows are described below:
