6.3. Why do I get contigs shorter than the minimum contig length I specified?

If, when setting up your de novo assembly in the Genomics Workbench, you either chose Simple Contig output or chose to map your reads back to the contigs with the "Update contigs" box unchecked, you should not see contigs shorter than the minimum contig length you requested.

If, however, you chose to map your reads back to the contigs and also checked the "Update contigs" box, some of the contigs returned to you may be shorter than the minimum length you specified.

This is because, in this case, once the contigs have been assembled, all contigs that meet the length restriction you set are kept and passed to the read mapping tool. All the reads are then mapped back to those contigs. The option to update the contigs means that:

  • any contigs with no reads mapping to them are thrown away, and
  • any regions of a contig with no reads mapping to them are thrown away.

For the latter situation, suppose there is a long contig with many reads mapping to the 5' end and many reads mapping to the 3' end, but none mapping to a long tract in the middle. That middle tract will be chopped out. Similarly, regions at the ends of a contig that are not covered by any reads will be trimmed away.

Removing regions that no reads map back to can leave the final list of contigs with members shorter than the minimum length you specified for the output of the assembly itself.
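The effect described above can be illustrated with a small Python sketch. This is not the Workbench's own algorithm, only a toy model of the idea: the function name `covered_segments`, the read coordinates, and the minimum length of 200 are all made up for illustration, and the sketch assumes that each stretch covered by reads is kept as a separate piece once the uncovered regions are removed.

```python
def covered_segments(contig_length, reads):
    """Return half-open (start, end) intervals of a contig covered by at least one read.

    Hypothetical helper for illustration only; reads are (start, end) positions
    on the contig, with end exclusive.
    """
    segments = []
    for start, end in sorted(reads):
        # Clip each read to the contig boundaries.
        start = max(start, 0)
        end = min(end, contig_length)
        if start >= end:
            continue
        if segments and start <= segments[-1][1]:
            # Read overlaps or touches the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            # Gap with no coverage: start a new segment.
            segments.append((start, end))
    return segments


# Made-up example: a 1000 bp contig that passed a 200 bp minimum length,
# with reads mapping only to the first and last 150 bp.
min_contig_length = 200
contig_length = 1000
reads = [(0, 100), (50, 150), (850, 950), (900, 1000)]

segments = covered_segments(contig_length, reads)
lengths = [end - start for start, end in segments]
short = [n for n in lengths if n < min_contig_length]

print(segments)
print(short)
```

In this toy case the original contig easily met the minimum length, yet both pieces left after the uncovered middle is removed fall below it.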

If you choose to continue with the contigs returned after the mapping and updating steps, you can create a sublist of those meeting any new size restriction by using the filtering tools on the table of results and then creating a new data object containing just the results you are interested in.
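For readers who export their contigs and work with them outside the Workbench, the same kind of length restriction can be sketched in a few lines of Python. This is only an illustration and is not the Workbench's table filter: the function name `filter_by_length`, the contig names, and the sequences are hypothetical.

```python
def filter_by_length(contigs, min_length):
    """Keep only the contigs whose sequence is at least min_length long.

    Hypothetical helper: contigs is a dict mapping contig name to sequence.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


# Made-up contigs of 240, 80 and 200 bases.
contigs = {
    "contig_1": "ACGT" * 60,
    "contig_2": "ACGT" * 20,
    "contig_3": "ACGT" * 50,
}

kept = filter_by_length(contigs, min_length=200)
print(sorted(kept))
```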

Information on filtering tables can be found in our manual here:

http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Filtering_tables.html

 
