1.6. How can I check if my data file is corrupt?
While there can be other causes, errors when importing large files that report the file is not recognized as valid are often due to data corruption, in particular file truncation. This happens relatively frequently when transferring large files, which is commonplace when working with high-throughput sequencing data.
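Many sequencing data files are distributed gzip-compressed (for example .fastq.gz). That is an assumption here, not something stated above. For such files, truncation can often be detected even without a reference copy, because a cut-off gzip stream ends before its end-of-stream marker. The following minimal Python sketch (Python 3.8 or later) uses a hypothetical helper, gzip_is_complete, that reads the whole file through the standard gzip module and reports whether it decompresses cleanly. It only catches damage that breaks the compressed stream, so it complements, rather than replaces, an md5 checksum comparison.

```python
import gzip
import os
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Hypothetical helper: return True if the gzip file at `path`
    decompresses to the very end without errors, False otherwise.

    A truncated stream raises EOFError, a damaged header or a CRC/length
    mismatch in the trailer raises gzip.BadGzipFile, and corrupt deflate
    data can raise zlib.error. A missing file raises FileNotFoundError,
    which is deliberately not caught here.
    """
    # Python's gzip module reads a zero-byte file as empty without raising,
    # so treat that case as incomplete explicitly.
    if os.path.getsize(path) == 0:
        return False
    try:
        with gzip.open(path, "rb") as handle:
            # Read in chunks so that large files never need to fit in memory.
            while handle.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
    return True
```

A file that passes this check can still differ from the original (for example, if it was copied from an already damaged source), which is why comparing checksums remains the more reliable test.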
One way to test whether a data file was corrupted during transfer is to obtain the md5 checksum of the original file and compare it to the md5 checksum of the copy you are working with. If the two checksums match, the two files are, for all practical purposes, identical. If they differ, then, in the case of sequencing data files for example, this indicates that a problem occurred when copying the data. The only solution in this case is to get a new copy of the data.
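A minimal Python sketch of this comparison, using only the standard hashlib module, follows. The helper names md5_of_file and copies_match are hypothetical, and the checksum of the original file is assumed to be available as a 32-character hexadecimal string (for example, published by the data provider). The file is read in 1 MiB chunks, an arbitrary choice, so memory use stays constant however large the file is.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: return the md5 checksum of a file as a
    32-character lower-case hex string, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() keeps calling handle.read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_md5: str, copy_path: str) -> bool:
    """Hypothetical helper: compare the checksum of the original file
    with the checksum of the local copy (case and surrounding
    whitespace in the original checksum are ignored)."""
    return original_md5.strip().lower() == md5_of_file(copy_path)
```

If copies_match returns False, even a single byte of the local copy differs from the original, and a fresh copy of the data is needed.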
A variety of tools can be installed and run to calculate the md5 checksum of a file; some are listed on the Wikipedia page about checksums. A couple of the possibilities are described below.
One tool used by Windows users for generating md5 checksums is md5summer, which can be downloaded from:
We are not specifically suggesting this tool over others, but if you do not already have a tool installed, the above is one you could try.
On most Linux systems, you should be able to run the command
md5sum <filename>
to generate the md5 checksum for a given file. Running
man md5sum
will write the documentation for the tool to your terminal.
The command name and the syntax for running the md5 check on Mac OS X is
md5 <filename>
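The Linux and Mac OS X tools print their results in different formats: md5sum prints the checksum followed by the file name (with a "*" before the name in binary mode), while md5 prints a line of the form MD5 (filename) = checksum. When the checksum of the original file was computed on one system and the copy is checked on another, it can help to extract just the 32-character checksum before comparing. The Python sketch below does this with hypothetical helpers, extract_md5 and same_checksum. The regular expressions reflect the typical output of the two tools and are an assumption; each input is assumed to be a single line of output or a bare checksum, and md5sum's escaped file names are not handled.

```python
import re

# md5sum (Linux):   "<checksum>  <file name>"  ("*" replaces the 2nd space in binary mode)
LINUX_STYLE = re.compile(r"^([0-9a-fA-F]{32}) [ *].+$")
# md5 (Mac OS X):   "MD5 (<file name>) = <checksum>"
MAC_STYLE = re.compile(r"^MD5 \(.+\) = ([0-9a-fA-F]{32})$")
# A bare checksum, e.g. copied from a data provider's web page.
BARE = re.compile(r"^([0-9a-fA-F]{32})$")


def extract_md5(line: str) -> str:
    """Hypothetical helper: return the lower-case checksum from one line
    of md5sum or md5 output, or from a bare checksum.

    Raises ValueError if the line matches none of the expected formats.
    """
    text = line.strip()
    for pattern in (LINUX_STYLE, MAC_STYLE, BARE):
        match = pattern.match(text)
        if match:
            return match.group(1).lower()
    raise ValueError(f"unrecognised checksum line: {line!r}")


def same_checksum(line_a: str, line_b: str) -> bool:
    """Hypothetical helper: True if two lines report the same md5 checksum."""
    return extract_md5(line_a) == extract_md5(line_b)
```

Comparing only the extracted checksums means that differences in file names or paths between the two systems do not affect the result.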
If you are certain your file is intact and the import still fails, please submit an error report to the CLC Support team. In the description field, it helps us if you include background information, for example whether you have already compared the md5 checksums.