6.6. How can contig coverage be included in exported fasta headers?

This FAQ page explains how to export a fasta file in which each header includes coverage information along with the sequence name. This is a common requirement for downstream applications such as RAST or MT-RAST.

If you have not run the De Novo Assembly tool...

To include coverage information in the fasta headers, take the following steps:

  1. Run the De novo assembly tool with the following mapping options selected:
    • Map reads back to the contigs (slow).
    • Update contigs.

  2. Extract the updated contig sequences to a Sequence List:
    • Select all rows in the mapping table by clicking a single row and then pressing Ctrl+A (⌘+A on Mac).
    • Click the Extract Contigs button.

  3. Export the Sequence List to a fasta file. The average coverage of each contig will then be included in its header.
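Downstream scripts may need to read the coverage values back out of the exported headers. The sketch below is a minimal example that assumes each header has the form `>Contig_1 Average coverage: 45.67`; this exact wording is an assumption, so check a header in your own exported file and adjust the hypothetical `COVERAGE_PATTERN` if it differs. The function name `coverage_from_fasta` is likewise illustrative.

```python
import re
from typing import Dict, Iterable

# ASSUMPTION: headers look like ">Contig_1 Average coverage: 45.67".
# Adjust this pattern to match the headers in your own exported file.
COVERAGE_PATTERN = re.compile(r"Average coverage:\s*([0-9]+(?:\.[0-9]+)?)")


def coverage_from_fasta(lines: Iterable[str]) -> Dict[str, float]:
    """Map each sequence name to the average coverage found in its header.

    Headers without a recognisable coverage field are skipped.
    """
    coverages: Dict[str, float] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no header information
        header = line[1:].strip()
        # The sequence name is taken to be the first whitespace-separated token.
        name = header.split()[0] if header else ""
        match = COVERAGE_PATTERN.search(header)
        if match:
            coverages[name] = float(match.group(1))
    return coverages


if __name__ == "__main__":
    example = [
        ">Contig_1 Average coverage: 45.67",
        "ACGTACGTAC",
        ">Contig_2 Average coverage: 3.00",
        "GGGCCCAAAT",
    ]
    print(coverage_from_fasta(example))
```

To read a real export, pass an open file handle instead of a list, for example `with open("contigs.fa") as fh: coverage_from_fasta(fh)` (the file name is hypothetical).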

If you have already run the De Novo Assembly tool with the Create simple contig sequences (fast) option selected...

If you selected the option Create simple contig sequences (fast) when running the De novo assembly, you can obtain a Mapping table with coverage information by mapping the reads with the Map reads to contigs tool. In this case, use the options:

  • Update contigs
  • Create stand-alone read mappings

Then extract the contigs and export them as described in steps 2 and 3 above.

If you did NOT select Update contigs, the Mapping table will show a button labelled Extract Consensus instead of the Extract Contigs button. In that case, coverage is not included upon extraction, because the consensus sequence is extracted rather than the contig sequence, and you will need to re-run the De novo assembly as described above.

Background

During the De novo assembly, coverage information about the reads contributing to each contig is not calculated. This is because contigs are built from the words, or k-mers, used to generate the de Bruijn graph, rather than from the original reads. As a result, the reads that map to a contig sequence are not necessarily the original reads that contributed to it. The coverage obtained through the steps above is therefore the coverage resulting from mapping the reads to the contig sequences.
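As a rough illustration of coverage that results from mapping, the sketch below computes the per-base depth and average coverage of one contig from the positions where reads were mapped: average coverage is the total number of aligned bases divided by the contig length. It is a simplified model, assuming reads are given as hypothetical 0-based, half-open `(start, end)` intervals with no gaps or clipping; it is not the CLC tool's own implementation, and the function names are illustrative.

```python
from typing import List, Tuple


def per_base_depth(contig_length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Count how many mapped reads cover each position of the contig.

    Reads are 0-based, half-open (start, end) intervals on the contig;
    intervals are clipped to the contig bounds. A difference array lets
    each read be recorded in O(1), followed by one pass over the contig.
    """
    diff = [0] * (contig_length + 1)
    for start, end in reads:
        start = max(0, start)
        end = min(contig_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth: List[int] = []
    running = 0
    for i in range(contig_length):
        running += diff[i]
        depth.append(running)
    return depth


def average_coverage(contig_length: int, reads: List[Tuple[int, int]]) -> float:
    """Mean depth over all contig positions (aligned bases / contig length)."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return sum(per_base_depth(contig_length, reads)) / contig_length


if __name__ == "__main__":
    reads = [(0, 5), (2, 7), (5, 10)]
    print(per_base_depth(10, reads))   # [1, 1, 2, 2, 2, 2, 2, 1, 1, 1]
    print(average_coverage(10, reads))  # 1.5
```

The difference-array approach keeps the cost linear in the contig length plus the number of reads, rather than touching every base of every read.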

For more information on how the De novo assembly works, see the following section of the manual:

The CLC de novo assembly algorithm
