What do Ns represent in the output of my de novo assembly?
N characters in de novo assembly outputs can represent two things, depending on the de novo assembly parameters used. It is important to know what the Ns represent when considering submission of your genomic assembly to a public repository such as the NCBI.

If the de novo assembly was performed with the option "Perform Scaffolding" turned OFF, then the N characters can represent:
If the de novo assembly was performed with the option "Perform Scaffolding" turned ON, then the N characters can represent:
The first option should be rare, but you can check for it by looking for a scaffold annotation associated with the tracts of Ns in the assembly output. One way to do this is illustrated in the following figure:

For more information on how scaffolding can be used to optimize the graph using paired reads, please see the manual: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Optimization_graph_using_paired_reads.html#sec:scaffolding

Direct export to AGP format, suitable for submission to the NCBI, is available in the Genomics Workbench. This is described in the manual here:

To close gaps that are filled with Ns, we recommend using the CLC Genome Finishing Module. More information regarding this module can be found on this webpage:
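If you have exported the assembly to FASTA, the tracts of Ns can also be located outside the Workbench. The following is a minimal Python sketch, not a Workbench feature; the function names `read_fasta` and `find_n_tracts` are hypothetical and chosen only for illustration. It reports where the runs of Ns are but cannot tell you what they represent, so the positions still need to be compared against the scaffold annotations in the Workbench.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Use the first word of the header as the sequence name.
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_n_tracts(
    lines: Iterable[str], min_length: int = 1
) -> Dict[str, List[Tuple[int, int]]]:
    """Return 1-based, inclusive (start, end) positions of N runs per sequence.

    Only runs of at least `min_length` Ns are reported. Sequences
    without any qualifying run are left out of the result.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    tracts: Dict[str, List[Tuple[int, int]]] = {}
    for name, seq in read_fasta(lines):
        hits = [(m.start() + 1, m.end()) for m in pattern.finditer(seq)]
        if hits:
            tracts[name] = hits
    return tracts


if __name__ == "__main__":
    # Small in-memory example; for a real file, pass an open file handle.
    demo = [">contig_1 demo", "ACGTNNNNAC", "GTNACGT", ">contig_2", "ACGT"]
    for contig, runs in find_n_tracts(demo).items():
        for start, end in runs:
            print(f"{contig}\t{start}\t{end}\t{end - start + 1}")
```

For a real assembly, pass an open file handle instead of the demo list, for example `find_n_tracts(open("assembly.fa"))`, where the file name is a placeholder. Raising `min_length` filters out isolated Ns so that only longer tracts are listed.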