
10.1. How can I find the location and orientation of my adapters in the reads?

If you suspect that you have adapters in your data, or you are aware that you have adapters in the reads, but do not know the location (5' or 3' end) and orientation (5'-3' or 3'-5'), you can find out using the steps described for our different workbenches in the section below.


CLC Genomics Workbench

1) First create a subset of 1000 reads. These will be used for identifying the location and orientation of the adapters in the reads. 

You can create the subset using the Sample reads tool: 

Toolbox | NGS Core Tools | Sample reads

In the wizard, select the option to Sample an absolute number and set the Sample size to 1000.

You will now get a new Sequence List that only includes 1000 reads from the original Sequence List. Creating this small subset of reads is necessary, as the tools used in the next step were not designed for large datasets.
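Outside the Workbench, the same kind of subset can be drawn from a FASTQ file with a short script. The following is a minimal sketch, not the method used by the Sample reads tool: it assumes plain 4-line FASTQ records, and the function names `read_fastq` and `sample_reads` are hypothetical names chosen for this illustration. It uses reservoir sampling, so the file is read once and only the sampled reads are held in memory.

```python
import random
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from 4-line FASTQ records.

    Assumption: sequences and qualities are not wrapped over several lines.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def sample_reads(records: Iterable[FastqRecord], n: int = 1000,
                 seed: int = 0) -> List[FastqRecord]:
    """Return up to n records chosen uniformly at random in a single pass."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec  # replace with probability n / (i + 1)
    return reservoir
```

With a file on disk, this could be called as `sample_reads(read_fastq(open("reads.fastq")), n=1000)`, where `reads.fastq` is a placeholder file name.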

 

2) Next, look for the adapters in your reads by running a motif search, using either the Motif Search tool from the Toolbox or the dynamic motif search in the side panel. The advantage of the Motif Search tool from the Toolbox is that it lets you account for sequencing errors in the adapters and gives you a table overview of the identified motifs (adapters). The dynamic motif search, on the other hand, lets you quickly add new motifs (adapters) and save the view settings, so that the same motif (adapter) search can be applied to other Sequence Lists.

Both options are described below:

2a) Motif search from the Toolbox 

  • First create a Motif List with the adapter sequences: 

File | New | Create Motif List

In the wizard you can import a FASTA file with all the adapter sequences in the 5'-3' orientation. If you do not have your adapters in a FASTA file, you can add each adapter sequence manually by clicking the Add button.

  • After creating the Motif List run the Motif Search tool on the Sequence List with the 1000 reads:

    Toolbox | Classical Sequence Analysis | General Sequence Analysis | Motif Search

    Use the following parameter settings:
    • Motif search type: Motif List
    • Choose the newly created Motif List
    • Set the Accuracy (%) to a value between 50 and 100, depending on the accuracy you expect of your sequencing data.
    • Select Include Negative Strand
    • Make sure that the option to Add annotations to sequences is selected

  • Finish the wizard. The output of the Motif search from the Toolbox is a motif table and the input Sequence List updated with motif annotations.
  • View the motif annotations on the sequences. To do this, make sure that the Motif annotations are selected for display in the side panel. If only a few motifs (adapters) are found, it can be helpful to view the Sequence List in a split view with the motif table. In the split view you can select a row in the motif table, and the view of the Sequence List will jump to that position.

    Please see the example screen shot below:
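To make the Accuracy (%) and Include Negative Strand settings concrete, the search can be sketched outside the Workbench as follows. This is a simplified illustration, not the algorithm used by the Motif Search tool: it assumes accuracy means the percentage of matching positions in an ungapped comparison (no insertions or deletions), and the names `find_motif`, `Hit` and `reverse_complement` are hypothetical.

```python
from typing import List, NamedTuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class Hit(NamedTuple):
    start: int       # 0-based start position in the read
    strand: str      # "+" (motif as given) or "-" (reverse complement)
    accuracy: float  # percentage of matching positions


def find_motif(read: str, motif: str, min_accuracy: float = 80.0,
               include_negative: bool = True) -> List[Hit]:
    """Slide the motif along the read and report windows above min_accuracy."""
    if not motif:
        raise ValueError("motif must not be empty")
    read = read.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    if include_negative:
        patterns.append(("-", reverse_complement(motif)))
    m = len(motif)
    hits: List[Hit] = []
    for strand, pattern in patterns:
        for start in range(len(read) - m + 1):
            window = read[start:start + m]
            matches = sum(a == b for a, b in zip(window, pattern))
            accuracy = 100.0 * matches / m
            if accuracy >= min_accuracy:
                hits.append(Hit(start, strand, accuracy))
    return hits
```

Lowering `min_accuracy` tolerates more sequencing errors in the adapter but also reports more chance matches, which is the same trade-off as choosing a value between 50 and 100 in the wizard.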

 

 

2b) Dynamic motif search

  • A dynamic motif can be added either by clicking the Add Motif button and then pasting in the adapter name and sequence in the window that opens up or by clicking the Manage Motifs button, after which you can select a Motif List.

  • Select to Include reverse motif and to show the added adapter in the side panel.

    Please see the example screen shot below:


  • In this example both adapters are located in the 3' end of the read. Sequence Lists in the CLC Genomics Workbench always display the reads in the 5'-3' orientation, so an adapter located at the end (to the right) of the read is in the 3' end. The Adapter 1 annotation (arrow) points to the right, which means that the adapter is found in the forward (5'-3') direction. The Adapter 2 annotation (arrow) points to the left, which means that the adapter sequence is found in the reverse-complement (3'-5') orientation.
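The reading of the annotations described above can be expressed as a small rule: the position of a hit within the read gives the location, and the strand of the hit gives the orientation. The sketch below is only an illustration under stated assumptions. It takes hits as `(start, motif_len, read_len, strand)` tuples, calls a hit "3' end" when its center lies in the second half of the read (a rough heuristic chosen for this example), and uses the hypothetical names `describe_hit` and `summarize`.

```python
from collections import Counter
from typing import Iterable, Tuple

HitInfo = Tuple[int, int, int, str]  # (start, motif_len, read_len, strand)


def describe_hit(start: int, motif_len: int, read_len: int,
                 strand: str) -> Tuple[str, str]:
    """Translate a hit into (location, orientation) labels.

    Reads are assumed to be displayed 5'-3', so the right-hand side of the
    read is its 3' end. strand is "+" for the adapter as given and "-" for
    its reverse complement.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    center = start + motif_len / 2
    location = "3' end" if center >= read_len / 2 else "5' end"
    orientation = "forward" if strand == "+" else "reverse-complement"
    return location, orientation


def summarize(hits: Iterable[HitInfo]) -> Counter:
    """Count how often each (location, orientation) pair occurs."""
    return Counter(describe_hit(*hit) for hit in hits)
```

Tallying over the whole subset, rather than trusting a single read, helps separate a consistent adapter signal from chance matches.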

 

3) Now you know the location and orientation of the adapters and can therefore create a Trim Adapter List:

File | New | Trim Adapter List

In this example both adapters were found in the 3' end of the reads. To trim away the adapters and everything 3' of them, you need to select Minus under the Strand option in the Trim Adapter List. Trimming on the minus strand is equivalent to reverse complementing all the reads. For Adapter 1, which was found in the forward direction in the Sequence List, you therefore need to enter the adapter sequence in the reverse-complement (3'-5') orientation when trimming on the minus strand. For Adapter 2, which was found in the reverse-complement (3'-5') orientation, you can enter the adapter sequence as it is (5'-3'), since it is already in the correct orientation for searching the minus strand.

To trim off the adapters in this example:

  • Trim on the minus strand for both adapters as they are both found in the 3' end of the reads
  • For Adapter 1 enter the adapter sequence in the reverse-complement orientation (3'-5')
  • For Adapter 2 enter the adapter sequence in the forward direction (5'-3')
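The rule in the list above can be written down as a small helper that returns the sequence to type into the Trim Adapter List when trimming on the minus strand. This is a sketch of the rule as described in this example only; the function name `minus_strand_entry` and the orientation labels are hypothetical, and the adapter sequence in the usage note is a made-up placeholder.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def minus_strand_entry(adapter: str, observed_orientation: str) -> str:
    """Sequence to enter when trimming a 3' adapter on the minus strand.

    adapter is the adapter sequence written 5'-3'. observed_orientation is
    how the adapter appeared in the reads: "forward" (arrow pointing right)
    or "reverse-complement" (arrow pointing left).
    """
    if observed_orientation == "forward":
        # Seen as given in the reads, so enter its reverse complement.
        return reverse_complement(adapter)
    if observed_orientation == "reverse-complement":
        # Already in the right orientation for the minus strand.
        return adapter.upper()
    raise ValueError("expected 'forward' or 'reverse-complement'")
```

For instance, `minus_strand_entry("AGATCGGAAG", "forward")` returns `"CTTCCGATCT"`, while the same adapter observed as `"reverse-complement"` is returned unchanged.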

In this example we used the default parameters for the Alignment score costs and Match thresholds. These might not be appropriate for your data. If you need to optimize them, you can find examples describing the parameters in the Adapter Trimming section of the manual.

Please see the screenshot showing how the Trim Adapter List looks in this example:

 

 

4) Finally you may now trim the original Sequence List using the Trim Sequences tool and Trim Adapter List:

Toolbox | NGS Core Tools | Trim Sequences

After running this tool your adapters will be removed from the reads. The resulting Sequence List is ready for further analyses.

 

Biomedical Genomics Workbench

1) First create a subset of 1000 reads. These will be used for identifying the location and orientation of the adapters in the reads.  Creating this small subset of reads is necessary as the tool used in step 2 is not designed for large datasets:

  • Open the Sequence List in the table view.

  • Select the first row then scroll down. 

  • While pressing the Shift key, click a row in the Sequence List. You can now see the number of reads selected in the lower right corner of the Workbench.

  • If you have not reached ~1000 reads, scroll up or down to decrease or increase the number of reads. With the Shift key pressed, click a new row to adjust your selection. It is not important that you select exactly 1000 reads; anything between 1000 and 3000 reads will do.

  • After selecting the subset, please click the Create New Sequence List button to create the subset.

Please see the attached screenshot showing this:

 

2) The next step is to find the adapter sequence in the reads using the Dynamic motif search:

  • Open the new Sequence List you created in step 1 in the graphical (default) view. 

  • Add a dynamic motif by clicking the Add Motif button in the side panel, after which you can paste in the adapter name and sequence in the window that opens up.

  • Check the option to Include reverse motif and to show the added adapter in the side panel.

    Please see the example screen shot below:

In this example both adapters are located in the 3' end of the read. Sequence Lists in the Biomedical Genomics Workbench always display the reads in the 5'-3' orientation, so an adapter located at the end (to the right) of the read is in the 3' end. The Adapter 1 annotation (arrow) points to the right, which means that the adapter is found in the forward (5'-3') direction. The Adapter 2 annotation (arrow) points to the left, which means that the adapter sequence is found in the reverse-complement (3'-5') orientation.

 

3) Now you know the location and orientation of the adapters and can therefore create a Trim Adapter List:

File | New | Trim Adapter List

In this example both adapters were found in the 3' end of the reads. To trim away the adapters and everything 3' of them, you need to select Minus under the Strand option in the Trim Adapter List. Trimming on the minus strand is equivalent to reverse complementing all the reads. For Adapter 1, which was found in the forward direction in the Sequence List, you therefore need to enter the adapter sequence in the reverse-complement (3'-5') orientation when trimming on the minus strand. For Adapter 2, which was found in the reverse-complement (3'-5') orientation, you can enter the adapter sequence as it is (5'-3'), since it is already in the correct orientation for searching the minus strand.

To trim off the adapters in this example:

  • Trim on the minus strand for both adapters as they are both found in the 3' end of the reads
  • For Adapter 1 enter the adapter sequence in the reverse-complement orientation (3'-5')
  • For Adapter 2 enter the adapter sequence in the forward direction (5'-3')

In this example we used the default parameters for the Alignment score costs and Match thresholds. These might not be appropriate for your data. If you need to optimize them, you can find examples describing the parameters in the Adapter Trimming section of the manual.

Please see the screenshot showing how the Trim Adapter List looks in this example:

 

 

4) Finally you may now trim the original Sequence List using the Trim Sequences tool and Trim Adapter List:

Toolbox | Preparing Raw Data | Trim Sequences

After running this tool, the result will be a trimmed Sequence List where the adapters have been removed from the reads. You may now use this trimmed Sequence List for further analyses.

Background info:

To properly trim the adapters of your high throughput sequencing reads it is necessary to know the location and orientation of the adapters in the reads.

If you enter the wrong strand or adapter orientation in the Trim Adapter List, you risk the following:

  • Not all of the adapters will be trimmed away
  • Trimming on the wrong strand and thereby trimming away the interesting part of the read, leaving only the sequence outside of the adapter
  • Trimming away the full read, when this was not intentional

Having adapters on the reads while performing different analyses in the CLC Genomics Workbench or Biomedical Genomics Workbench can have adverse consequences for the quality of your analysis. It is especially important to have your adapters trimmed off for de novo assemblies, as the adapters will affect the assembly. For read mappings, the adapter can either cause the read not to map, because the length fraction cannot be met, or the adapter will be left as an unaligned end.

Adapter sequences are sometimes unexpectedly found in reads. This can happen, for example, if the DNA molecules were fragmented into shorter pieces than expected during the DNA fragmentation step in the wet lab, so that sequencing continues through the fragment and into the adapter. If you suspect an issue with adapters in your data, or you know that you have adapters in the reads but do not know their location (5' or 3' end) and orientation (5'-3' or 3'-5'), it is important to investigate this so that you can take the right measures to remove the adapters properly.
