6.1. How much memory does a de novo assembly take?

Background

The amount of memory required for any particular de novo assembly will depend on a combination of genome size, data type, data quality and data volume. In simple (and vague) terms, the larger the de Bruijn graph gets, the more memory is needed. Data quality can have an impact on the size of the graph, as can features in the data such as lots of repeats, high heterozygosity, and so on.

Our White Paper on de novo assembly describes the tool and gives details of example assemblies, including the memory of the systems that the assemblies were run on:

http://resources.qiagenbioinformatics.com//white-papers/White_paper_on_de_novo_assembly_4.pdf

The White Paper outlines the memory requirements when using the Assembly Cell. For users of the Genomics Workbench, a de novo assembly with Simple Contig output runs in three phases: a pre-processing phase, a computational phase and a post-processing phase. The Assembly Cell de novo assembly program is the same as the computational phase of a de novo assembly run via the Genomics Workbench, so running via the Workbench adds some overhead. Relative to the memory values reported in the White Paper, this means the following for Workbench memory requirements:

  • If nothing else will be run on the machine except the de novo assembly via the Genomics Workbench, you can try adding approximately 1/4 to 1/3 to the amount of memory reported as used in the White Paper when estimating the minimum amount that you would need on your system.
  • More commonly, you will also want the machine to be usable for other small tasks, in which case you should consider adding 1/2, or perhaps the full amount again, to the memory reported in the White Paper.

For example, if you decided that your assembly task would likely require about 24 GB of memory for the computational phase, based on the White Paper information, then you should be considering a system with 32 to 48 GB of memory.
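
The arithmetic behind this rule of thumb can be written out as a small Python sketch. It is only an illustration of the overhead fractions given above, not a tool provided with the software; the function and constant names are hypothetical, and the result is a rough starting point rather than a guarantee.

```python
from fractions import Fraction

# Overhead fractions taken from the rule of thumb above.
# The names below are illustrative only.
DEDICATED_OVERHEAD = (Fraction(1, 4), Fraction(1, 3))  # machine runs only the assembly
SHARED_OVERHEAD = (Fraction(1, 2), Fraction(1, 1))     # machine also handles other small tasks


def workbench_memory_range(white_paper_gb, shared_machine=True):
    """Return a rough (low, high) memory estimate in GB for a Workbench run.

    white_paper_gb is the memory reported in the White Paper for the
    computational phase of a comparable assembly.
    """
    low, high = SHARED_OVERHEAD if shared_machine else DEDICATED_OVERHEAD
    return (float(white_paper_gb * (1 + low)), float(white_paper_gb * (1 + high)))


if __name__ == "__main__":
    print(workbench_memory_range(24, shared_machine=False))  # dedicated machine
    print(workbench_memory_range(24, shared_machine=True))   # shared machine
```

For the 24 GB example, this gives 30 to 32 GB on a dedicated machine and 36 to 48 GB on a machine that is also used for other small tasks.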

We cannot make guarantees of course, as so much depends on your particular data set.

We outline our recommended system requirements for running the Genomics Workbench on our website, but for the reasons outlined above, this does not include any specifics about requirements for de novo assembly:

https://www.qiagenbioinformatics.com/system-requirements/

 

Suggestions

If you are running out of memory, it may be that your system does not have enough memory for the assembly you are trying to run. However, there are some things that can reduce memory use. Some of the suggestions below (especially II and III) can also affect the quality of the output and the speed at which the assembly completes.

 

Suggestion I

Are other tasks running on the same machine at the same time? These could be other de novo assemblies, read mappings, or jobs unrelated to your CLC software that require memory to run. The computational phase of the de novo assembly assumes it has access to as much memory as it needs and does not account for situations where other tasks are running at the same time. If several things were running on the system when the error occurred, please try running a single de novo assembly again when nothing else is running on the system.
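
One way to check this before starting a run is to compare the memory currently available with your estimate for the assembly. The Python sketch below assumes a Linux system, where /proc/meminfo reports a MemAvailable line in kB; the function names are hypothetical, and this check is not part of the CLC software.

```python
def available_memory_gb(meminfo_text):
    """Extract MemAvailable from /proc/meminfo-style text and return it in GB.

    Assumes the Linux format, e.g. "MemAvailable:   33554432 kB".
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # second field is the value in kB
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in the supplied text")


def enough_memory(meminfo_text, required_gb):
    """Return True if the available memory covers the estimated requirement."""
    return available_memory_gb(meminfo_text) >= required_gb
```

On Linux, the text can be obtained by reading the file /proc/meminfo and passing its contents to enough_memory together with your estimate in GB. If the result is False, other tasks are probably holding memory that the assembly would need.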

 
Suggestion II
 
Have you trimmed all adapters from your sequences? This can make a big difference to the resource demands of de novo assembly as well as to the quality of the output.
 
Adapter trimming is covered in the manual in this section:
 
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Adapter_trimming.html
 
 
Suggestion III
 
Have you trimmed your reads for quality? Supplying only high-quality data can make a big difference to the resource demands of de novo assembly.
 
Quality trimming is covered in the manual in this section:
 
 
 
 
Information on how to improve your de novo assembly can be found in the best practices guides on this manual page:
 
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Best_practices.html
 
