7.3. Why are my local BLAST searches taking so long?
A BLAST search that is taking a long time is, in most cases, still running as expected.
What can affect the speed of a local BLAST search?
How long a BLAST job takes to run depends on many things, including:
- The size of the query set, i.e. how much sequence data there is, and also the nature of your query set. A few very large sequences can take longer than many smaller sequences.
- The size of the database you are searching.
- The parameter choices, such as the number of threads specified, the word size, the expect value and the number of hits to be reported. The default number of threads in the Workbench is the number of cores on your machine. Please note, however, that beyond 4 to 6 threads you may not see much further improvement in speed. The specifics depend on your exact search. For example, searching with many small sequences may scale better than searching with a few large sequences.
- The type of BLAST search you are running. For example, blastx searches take much longer than blastn searches, as the entire query set has to be translated in all 6 reading frames and a search is then executed for each of those frames. tblastx and tblastn searches will take even longer, for similar reasons.
- How busy the disks holding the databases are.
In our experience running BLAST searches, via CLC software or using NCBI BLAST+ commands directly, a search that takes only a few minutes at quiet times can take hours when demand on the disks where the databases are stored is very high.
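As an illustration of why translated searches such as blastx involve more work than blastn, the sketch below translates a nucleotide query in all six reading frames, so that one query yields six protein sequences to search with. It assumes the standard genetic code and is not the code BLAST+ itself uses. The function names are chosen for this example only.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unrecognised codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")` returns six translations, one per frame, each of which would then have to be compared against the protein database.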
Checking whether BLAST is still running
When you launch a local BLAST job on a CLC Workbench or CLC Genomics Server, an NCBI BLAST+ program is run on your local system in the background to do the actual searching. If you are concerned about whether a BLAST job you launched from the CLC Workbench is still running, check for the relevant executable among the processes running on your machine.
You can check the running processes using the Task Manager (Windows), the Activity Monitor (Mac) or the process table (Linux). The BLAST executables have the same name as the type of BLAST search launched. For example, if you ran a blastn search, look for a running process with the name blastn. For a blastx search, look for a running process with the name blastx.
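If you prefer to check from a script, the following is a minimal sketch of the same idea. It assumes a Linux or Mac system where the `ps` command is available, and the function names are chosen for this example only. On Windows the process list would need to be obtained another way.

```python
import subprocess

# BLAST+ search programs named in this article, plus blastp.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def find_blast_processes(process_names):
    """Return the BLAST+ program names found in a list of process names."""
    found = []
    for name in process_names:
        # Drop any directory part and a Windows ".exe" suffix before comparing.
        base = name.strip().replace("\\", "/").split("/")[-1].lower()
        if base.endswith(".exe"):
            base = base[:-4]
        if base in BLAST_PROGRAMS:
            found.append(base)
    return found


def list_process_names():
    """List the command names of all processes using ps (Linux or Mac)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `find_blast_processes(list_process_names())` returns an empty list when no BLAST+ program is running, and a list such as `["blastn"]` when a blastn search is in progress.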
Running multiple BLAST searches simultaneously
For much of the time during a search, BLAST+ programs do not use all the threads available. Thus, for a large query set, the overall search time can sometimes be decreased by splitting the query set into several smaller sequence lists and then launching a separate BLAST search for each, so that several searches are running at the same time.
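When running NCBI BLAST+ commands directly on FASTA files, the splitting step can be scripted. The sketch below is one possible approach, not a CLC Workbench feature: it distributes sequences over a chosen number of parts so that each part holds a similar total amount of sequence, since a few very large sequences can take longer than many small ones. The function names are chosen for this example only.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_balanced(records, n_parts):
    """Distribute records over n_parts lists with similar total sequence length."""
    parts = [[] for _ in range(n_parts)]
    totals = [0] * n_parts
    # Place the longest sequences first, each into the currently lightest part.
    for record in sorted(records, key=lambda r: len(r[1]), reverse=True):
        lightest = totals.index(min(totals))
        parts[lightest].append(record)
        totals[lightest] += len(record[1])
    return parts


def to_fasta(records):
    """Format (header, sequence) tuples back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Each part returned by `split_balanced` can be written to its own file with `to_fasta` and used as the query for a separate search.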
Considerations when running BLAST searches simultaneously
- There is no guarantee time will be saved using this route.
- Each CLC Workbench BLAST job will report the results separately.
- The number of threads per search job, defined when setting up each search, should take into account the available resources of the machine. (The number may need to be decreased per job when several jobs will be run at the same time.)
- Each BLAST job being run needs its own memory and disk I/O. For example, each search requires that the database be read from disk into memory. This is a particularly important consideration when working with large databases.
When trying this route, we would generally recommend limiting the number of simultaneous BLAST jobs to something relatively conservative in the first instance (e.g. 2 to 4 jobs), and testing the impact on performance as you increase from there.
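For those launching NCBI BLAST+ commands directly, the considerations above can be sketched as follows. This is an illustration under stated assumptions rather than a recommended setup: the cap of 6 threads per job reflects the "4 to 6 threads" remark earlier in this article, the function names are chosen for this example, and any file and database names you pass in are your own. The options `-query`, `-db`, `-out` and `-num_threads` are standard BLAST+ command line options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def threads_per_job(total_cores, n_jobs, max_useful=6):
    """Share the cores between jobs: at least 1 thread, at most max_useful."""
    return max(1, min(max_useful, total_cores // n_jobs))


def build_command(program, query_path, db, out_path, threads):
    """Build a BLAST+ command line as an argument list."""
    return [
        program,
        "-query", query_path,
        "-db", db,
        "-out", out_path,
        "-num_threads", str(threads),
    ]


def run_jobs(commands, max_jobs):
    """Run the commands with at most max_jobs running at the same time.

    Returns the exit code of each command, in the order given.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

For instance, on an 8-core machine with 3 jobs, `threads_per_job(8, 3)` gives 2 threads per job. Starting with `max_jobs` set to 2 and timing the result before increasing it follows the conservative approach suggested above.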
You can find information about how to split up sequence lists in our Frequently Asked Questions (FAQ) area here:
How can I make subsets of a Sequence List?