N characters in de novo assembly outputs can represent two things, depending on the de novo assembly parameters used. It is important to know what the Ns represent when considering submission of your genomic assembly to a public repository such as the NCBI.


If the de novo assembly was performed with the option "Perform Scaffolding" turned OFF, then the N characters can represent:
  1. Positions where all the input sequencing reads themselves contain Ns.

If the de novo assembly was performed with the option "Perform Scaffolding" turned ON, then the N characters can represent:
  1. Positions where all the input sequencing reads themselves contain Ns.
  2. Regions between scaffolded contigs. Here, the number of Ns represents the approximate distance between contigs in the reported scaffold.

The first case (positions where all the input sequencing reads contain Ns) should be rare. To tell the two cases apart, check whether a scaffold annotation is associated with each tract of Ns in the assembly output. One way to do this is illustrated in the following figure:

User-added image
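
Outside the Workbench, tracts of Ns can also be located in an exported FASTA file with a short script. The following is a minimal sketch, assuming the assembly has been exported as FASTA; the function names read_fasta and find_n_runs are hypothetical and not part of any CLC product. Note that it only reports where the Ns are. It cannot tell whether a tract is a scaffold gap or comes from Ns in the reads; only the scaffold annotation shows that.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_n_runs(sequence: str, min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of Ns; start is 1-based."""
    runs = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start() + 1, length))
    return runs


# Small in-memory example; for a real file, pass an open file handle instead.
example = [">contig_1 example", "ACGTNNNNAC", "GTNACGT"]
for name, seq in read_fasta(example):
    for start, length in find_n_runs(seq):
        print(name, start, length)
```

Long tracts reported by such a script are the ones worth checking against the scaffold annotations in the Workbench.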


For more information on how scaffolding can be used to optimize the graph using paired reads, please see the manual:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Optimization_graph_using_paired_reads.html#sec:scaffolding
 

Direct export to AGP format, suitable for submission to the NCBI, is available in the Genomics Workbench. This is described in the manual here:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=AGP_export.html
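
To illustrate how tracts of Ns relate to an AGP file, the sketch below splits one scaffold sequence at its N tracts and writes tab-separated rows in the style of AGP version 2.x: a "W" row for each contig and an "N" row for each gap. This is only an illustration, not the Workbench exporter. The function name scaffold_to_agp, the generated contig IDs, the min_gap threshold of 10, and the choice of "scaffold" / "paired-ends" as gap type and linkage evidence are all assumptions made for the example. The Workbench export remains the recommended route for an actual submission.

```python
import re
from typing import List


def scaffold_to_agp(scaffold_id: str, sequence: str, min_gap: int = 10) -> List[str]:
    """Return AGP-style rows for one scaffold, splitting at runs of Ns.

    Runs shorter than min_gap (an assumed threshold) stay inside the
    contig, as expected for Ns that come from the reads themselves.
    """
    gaps = [m for m in re.finditer(r"[Nn]+", sequence)
            if m.end() - m.start() >= min_gap]
    rows = []
    part = 0       # AGP part number within the scaffold
    contig_no = 0  # counter for the hypothetical contig IDs
    pos = 0        # 0-based start of the current contig

    for gap in gaps + [None]:
        end = gap.start() if gap is not None else len(sequence)
        if end > pos:
            part += 1
            contig_no += 1
            contig_id = f"{scaffold_id}_contig{contig_no}"  # hypothetical ID
            rows.append("\t".join(map(str, [
                scaffold_id, pos + 1, end, part,
                "W", contig_id, 1, end - pos, "+"])))
        if gap is not None:
            part += 1
            gap_length = gap.end() - gap.start()
            rows.append("\t".join(map(str, [
                scaffold_id, gap.start() + 1, gap.end(), part,
                "N", gap_length, "scaffold", "yes", "paired-ends"])))
            pos = gap.end()
    return rows


for row in scaffold_to_agp("scaffold_1", "ACGT" + "N" * 10 + "ACG"):
    print(row)
```

In this example the ten Ns become a single gap row between two contig rows, which mirrors how the number of Ns in a scaffold encodes the approximate distance between contigs.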

 

To close the gaps represented by Ns, we recommend using the CLC Genome Finishing Module. More information about this module can be found on this webpage:

https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-clc-genome-finishing-module