CLC FAQ - Import, export, and downloads

Questions related to data import into and export from the CLC Workbenches or Servers, as well as the use of various download tools.

1. Import - General

1.1. Why isn't my Genbank file accepted for import?

The CLC Workbenches accept standard Genbank format files, such as those you can obtain from the Genbank repository.

We are aware that some third party software tools generate Genbank format files that are not entirely standard, and when this occurs, such files may not be recognized as Genbank files by the CLC Workbenches. Examples of issues we have seen in the past, which could be worth checking in Genbank files you have been unable to import, include:

  • non-standard Locus lines (e.g. Locus lines where identifiers have spaces in them)

  • inclusion of non-standard characters in annotations

  • missing SOURCE or ORGANISM fields

  • badly formatted feature entries. Examples we have seen include feature entries without a name, feature names that run straight into the region with no tab between, and standard qualifiers grouped under non-standard qualifiers put in by third party software.

In at least one piece of software we are aware of, there was an option in the preferences for "strict Genbank" format to be used. That caused the formatting used to adhere to the Genbank format and those files could be imported into the Workbench. If the software you are using offers such an option or feature, then turning that on may be worth trying.

The Genbank/EMBL/DDBJ feature definition can be found online at:

http://www.insdc.org/files/feature_table.html
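A few of the issues listed above can be screened for with a short script before attempting an import. The following is a minimal sketch in Python, not part of the CLC software. The function name check_genbank_text is hypothetical, "non-standard characters" is interpreted here as non-ASCII characters, and the LOCUS check assumes the usual layout of name followed by sequence length. These are assumptions, and a file that passes may still fail to import for other reasons.

```python
def check_genbank_text(text):
    """Return a list of warnings for a few issues named in this FAQ entry."""
    warnings = []
    lines = text.splitlines()

    locus_lines = [ln for ln in lines if ln.startswith("LOCUS")]
    if not locus_lines:
        warnings.append("no LOCUS line found")
    for ln in locus_lines:
        # Usual layout: LOCUS <name> <length> bp ...
        # If the third field is not a number, the name may contain a space.
        fields = ln.split()
        if len(fields) < 4 or not fields[2].isdigit():
            warnings.append("suspicious LOCUS line: " + ln.strip())

    if not any(ln.startswith("SOURCE") for ln in lines):
        warnings.append("missing SOURCE field")
    if not any(ln.strip().startswith("ORGANISM") for ln in lines):
        warnings.append("missing ORGANISM field")

    # Assumption: treat any non-ASCII character as "non-standard".
    if any(ord(ch) > 127 for ch in text):
        warnings.append("non-ASCII characters present")
    return warnings
```

An empty list means none of these particular issues were detected. It does not confirm that the file is valid Genbank format.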

If you are confident that your file is in valid Genbank format and you are still experiencing problems importing it into a CLC Workbench, and you are eligible for support from us, then please contact the Support team by emailing support-clcbio@qiagen.com. Attaching a copy of your Genbank file to your email may help us troubleshoot the issue. 

1.2. Why do the CLC Workbenches not import the full chromatogram from AB1 or ABI files?

The CLC Workbenches import bases as listed in the AB1 or ABI file. They do not call bases from the chromatogram, so they only import traces for the bases listed in the file. Thus, if bases have been trimmed on quality while the full chromatogram is retained by the sequence provider, the part of the chromatogram with no bases called will not be imported.

To import the full chromatogram please ask your sequence provider not to trim bases before sharing the AB1 or ABI files.

1.3. Why is my data import failing?

There are three common types of errors seen when importing data into the Workbench. These are:


Errors associated with the choice of importer

Errors involving the term "Expected token" or "File not a X file", where X could be something like "trace".

Importing a file through the Standard import tool will usually run fine on properly formatted, intact files.

When import fails with a message including the words "Expected token" or a message about the file not being imported because it is not a particular type of file, the usual cause is that the import option was set to "Force Import as Type" and an inappropriate format type was selected.

If you experience this, then please try the import again, this time selecting the import option "Automatic import".

An example of the type of error message this pertains to is shown below, as it would appear in the Advanced tab of the error message window:

java.text.ParseException: 1: Expected token: 'AS', was: 'PK'

The Standard Import tool is described in more detail in the manual here:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Import_using_import_dialog.html
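In the example message above, 'PK' happens to be the two characters that begin every ZIP archive, which would be consistent with a zipped file being forced through an importer for a different format. If you are unsure what kind of file you have, looking at its first few bytes can help. The sketch below is an illustration in Python only. The function names are hypothetical, the list of signatures is deliberately short, and it says nothing about how the Workbench itself detects formats.

```python
def sniff_file_type(first_bytes):
    """Guess a file type from its leading bytes (a rough heuristic)."""
    if first_bytes.startswith(b"PK"):
        return "zip archive"
    if first_bytes.startswith(b"\x1f\x8b"):
        # BAM files are gzip-compatible, so they start the same way.
        return "gzip-compressed (includes BAM)"
    if first_bytes.startswith(b"ABIF"):
        return "ABI/AB1 trace"
    if first_bytes.startswith(b"@"):
        return "possibly FASTQ or SAM text"
    if first_bytes.startswith(b">"):
        return "possibly FASTA"
    return "unknown"


def sniff_path(path):
    """Read the first four bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        return sniff_file_type(handle.read(4))
```

If the guessed type does not match the format chosen under Force Import as Type, that mismatch is a likely cause of the error.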

 

Errors associated with corrupt data

There are a number of different errors that can be associated with corrupt data. They include:

  • QualityScores with N scores is not valid for sequence of length Y
  • Unexpected end of ZLIB input stream
  • An unexpected error occurred while parsing. This points to corrupt data. Please double-check your data and try again!

Errors like those in the list above indicate that the data is corrupt. This is a frequent problem when large files are copied across networks.

When such errors arise, please check your data file to make sure it is intact. In some cases, it can be necessary to obtain a new copy from an original source. If you are not sure how to do this, we have some advice about how to check if data files are intact in a related FAQ entry:

How can I check if my data file is corrupt?

If you are certain your data is intact and the license you are using is covered by our Maintenance, Upgrades and Support (MUS) program, please get in touch with the Support team. In this case, we usually need to ask for a copy of the data to investigate further. For small datasets, you can send the data as an attachment. For large datasets, we can set up an ftp area for you to transfer the data to.

 

Errors associated with misformatted data

Genbank files not importing

This topic is discussed in a related FAQ entry: Why isn't my Genbank file accepted for import?

1.4. How to import predesigned primers to the workbench?

  • First make a list of the primers that you want to import in e.g. Excel.
  • The first column should be the primer name and the second column should be the primer sequence. There must not be any empty rows between the primers.
  • Save the file as a .csv file. This can be done by selecting CSV (comma delimited) (*.csv) under Save as type.
  • Import the .csv file into your CLC Workbench using the standard import option and leaving the import setting as automatic.

The primers are now imported as a sequence list.

  • To extract the primers (sequences) as individual primers go to:

Main Workbench: Toolbox | General Sequence Analysis | Extract Sequences

Genomics Workbench: Toolbox | Classical Sequence Analysis | General Sequence Analysis | Extract Sequences

Biomedical Genomics Workbench: Toolbox | Tools | Helper Tools | Extract sequences

  • In the wizard that opens up, choose your imported sequence list and click Next.
  • In the next step of the wizard, choose to Extract to single sequences and click Next.
  • Choose to Save your results and finish the wizard.

Your primer sequences will now be extracted as single sequences (Figure 1).

Figure 1: The sequences in the sequence list are here extracted to single sequences, i.e. individual primers.

Each primer can now be analyzed for its properties. However, please note that it is only possible to select one primer at a time for analyzing primer properties.

Attached is an example .csv file with 10 primer sequences.
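If your primers are already stored somewhere other than a spreadsheet, the two-column file described in this entry can also be produced with a script. The following is a minimal sketch in Python. The function name primers_to_csv and the primer names in the usage comment are made up for illustration.

```python
import csv
import io


def primers_to_csv(primers):
    """Return CSV text: primer name in column 1, primer sequence in column 2."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, sequence in primers:
        name, sequence = name.strip(), sequence.strip()
        if not name or not sequence:
            continue  # skip incomplete entries so no empty rows are written
        writer.writerow([name, sequence])
    return buffer.getvalue()


# Example usage (hypothetical primers):
# with open("primers.csv", "w") as handle:
#     handle.write(primers_to_csv([("fwd1", "ACGTACGT"), ("rev1", "TTGACCAT")]))
```

The resulting file can then be imported using the standard import option as described above.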

1.5. How can I import files containing interleaved paired data?

The CLC Genomics Workbench import tools expect that there will be separate files for the members of a pair. That is, all reads in file 1 will have a mate in file 2.

One can, however, import perfectly interleaved, paired sequences into the Workbench.

Here "perfectly" means that each sequence in a pair is followed by its mate. That is, sequence 1 and 2 are members of a pair, 3 and 4 are members of a pair, and so on. This condition must be met as the method outlined below involves importing the data as single reads, and marking it as paired after import. Reads will then be paired based purely on their position in the list of reads, with read 1 paired with read 2, read 3 paired with read 4, and so on.

In other words, no matching of reads names will be done, and the method outlined below will not allow for the inclusion of single reads within the file, nor will it work properly if the file contains unsorted paired data, where all mates may be present, but do not necessarily appear in a consecutive position to their mate in the file.

The method: Assuming that your data is a fastq file consisting of perfectly interleaved, paired sequences, you just need to:

  • Import this fastq as single data, by, for example, using the Import | Illumina importer and not checking the option indicating paired reads.
  • After the import is complete, open the resulting sequence list in your Genomics Workbench.
  • Choose the Element Information view, by clicking on the small icon at the bottom of the viewing area that looks like a piece of paper with a green checkmark on it.
  • Within this view, check the "Paired Sequences" box, enter minimum and maximum distances that are relevant for your sample, and adjust the orientation as needed.
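Because the Workbench pairs these reads by position only, it can be worth confirming beforehand that the file really is perfectly interleaved. The sketch below shows one way this could be checked in Python. It assumes four-line fastq records and that mates share a name once an optional trailing /1 or /2 is removed. Both are assumptions about your data, and the function names are hypothetical.

```python
def read_names(fastq_text):
    """Yield the name of each record, assuming 4-line fastq records."""
    lines = fastq_text.splitlines()
    for i in range(0, len(lines), 4):
        header = lines[i]
        if not header.startswith("@"):
            raise ValueError("record %d does not start with '@'" % (i // 4 + 1))
        # Keep only the part before the first whitespace.
        yield header[1:].split()[0]


def base_name(name):
    """Strip a trailing /1 or /2 mate suffix if present."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def is_perfectly_interleaved(fastq_text):
    """True if records 1 and 2, 3 and 4, and so on share a base name."""
    names = list(read_names(fastq_text))
    if len(names) % 2 != 0:
        return False  # an odd count means at least one read has no mate
    return all(base_name(names[i]) == base_name(names[i + 1])
               for i in range(0, len(names), 2))
```

For large files, the same logic would need to read the file line by line rather than holding the whole text in memory.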

More detailed information about handling paired data within the CLC Genomics Workbench can be found at the following page of our manual:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=General_notes_on_handling_paired_data.html

1.6. How can I check if my data file is corrupt?

While there can be other reasons behind it, errors related to the import of large files indicating that they are not recognized as valid are often associated with data corruption and in particular file truncation. This is a relatively frequent occurrence when transferring large files, as is commonplace when working with high throughput sequencing data.

One way to test that the data file has not been corrupted on transfer is to get the md5 checksum for the original file and compare it to the md5 checksum of the copy of the file you are working with.  If the two checksums are the same, then the two files are the same. If they are different, then, in the case of sequencing data files for example, this would suggest there was a problem when copying the data.  The only solution in this case is to get a new copy of the data.

 

There are a variety of tools that one can install and run to find out the md5 checksum for a file. Some are provided on the Wikipedia page about checksums. We include a couple of the possibilities below.

 

Windows:

We are aware that a tool used among Windows users for generating md5 checksums is md5summer, which can be downloaded from:

http://www.md5summer.org/download.html

We are not specifically suggesting this tool over others, but if you do not already have a tool installed, the above is one you could try.

 

Linux:

On most systems, you should be able to run the command

md5sum <filename>

to generate the md5 checksum for a given file.

man md5sum

will write the documentation for the tool to your terminal.

 

Mac:

The command name and the syntax for running the md5 check on Mac OS X is

md5sum-lite <filename>
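If none of the tools above are available, the same comparison can be made on any platform where Python is installed, using its standard hashlib module. This is a minimal sketch, and the function names are our own choice for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the md5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large sequencing files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original_path, copy_path):
    """True if the two files have the same md5 checksum."""
    return md5_of_file(original_path) == md5_of_file(copy_path)
```

In practice the original file is often on another machine, in which case you would run md5_of_file on each machine and compare the two strings by eye.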

 

If you are certain your file is intact, please submit an error report to the CLC Support team. In the description field, it is helpful to us if you include background information, for example whether you have already checked the md5 checksums.

2. Import and Export of SAM/BAM

2.1. How can I import sequence reads in a SAM or BAM file?

You can import the sequencing reads from a BAM file into the CLC Genomics and Biomedical Genomics Workbenches using the standard importer:

File | Import | Standard import

Please leave the import type set to Automatic.

After completing the import wizard, a popup window will appear asking you if you would like to skip importing this BAM file. Please click No in this popup window.

The imported data from the BAM file will be a single Sequence List, or can be multiple Sequence Lists - one for each Read Group in the BAM file. 

 

For paired data imported this way, we recommend that you go to the Element Info view of your data set to check, and where necessary, reset the minimum and maximum distance values. The values set by default are:

  • where a PI tag is present in the SAM/BAM file, the insert size range is set as:

maximum = 2 * PI tag value
minimum  = 1/2 * PI tag value

  • where there is no PI tag in the SAM/BAM file, the Workbench default distances are used: minimum 1, maximum 1000.
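The default values described above amount to the following small calculation, shown here in Python purely as a worked illustration. The function name is hypothetical, and how the Workbench rounds a non-integer result is not covered here.

```python
def default_pair_distances(pi_value=None):
    """Return (minimum, maximum) distances as described in this entry."""
    if pi_value is None:
        # No PI tag in the SAM/BAM file: Workbench default distances.
        return (1, 1000)
    return (pi_value / 2, 2 * pi_value)
```

For example, a PI tag value of 300 gives a minimum of 150 and a maximum of 600.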

Setting the paired read distance range is described here:
http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=General_notes_on_handling_paired_data.html

 

 

2.2. Why do I run out of memory when importing a BAM file containing paired data?

Memory consumption for BAM import depends mainly on the number of broken pairs. A member of a broken pair is kept in memory until its mate is found. If your BAM file contains many broken pairs, then this could account for the high amount of memory being used.

If you have a SAM/BAM file containing many broken pairs and are running out of memory when trying to import it, please try sorting the file on readname and then importing. One can sort on readname using the samtools command samtools sort -n.

Importing the sorted file should mean that broken pair mates are found earlier, thus releasing the broken pair member from memory earlier.

The above assumes that the paired data in your SAM/BAM file meets the SAM specification in terms of the naming of the mates of the pairs. Section 1.4 of the SAM specification has the following information:

"QNAME: Query template NAME. Reads/segments having identical QNAME are regarded to come from the same template"

What this means is that the expectation in a SAM/BAM file is that members of a pair will have the same name.

We have seen a couple of instances where importing a BAM file containing mapped, paired data failed with an out of memory error because each member of a pair in the BAM file had a different name. Due to the different names, each such read is seen as part of a pair, but a pair for which the mate can never be found as it has a different name. In that case, many sequences would be held in memory, which can eventually lead to an out of memory error.

For example, if you had a read pair with names like this:

CAAAA_7_0013_5111_1
CAAAA_7_0013_5111_2

instead of both members of that pair having the name CAAAA_7_0013_5111, then you will likely run into an out of memory error.
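One way to spot this naming problem is to look for reads that are flagged as paired but whose name occurs only once in the file. The sketch below does this in Python for SAM-format text, for example a sample of lines produced from the BAM file by a third party tool. The function name is hypothetical. Note that genuinely broken pairs will also be reported, so a handful of hits is not necessarily a problem, whereas nearly every read being reported suggests mates have been given different names.

```python
from collections import Counter


def unmatched_paired_reads(sam_text):
    """Return names of paired-flagged reads whose name occurs only once."""
    counts = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x1:  # 0x1: the read is marked as part of a pair
            counts[qname] += 1
    return sorted(name for name, n in counts.items() if n == 1)
```

With the two example names above, both reads would be reported, since neither name appears twice.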

If you had enough memory to import a file with the sort of naming shown above without encountering an out of memory error, then the mapping that results from that import will likely not be what you want. That is, all such reads would have been recorded as single reads, instead of members of a pair, because when the mates were never found, the sequence would have been marked as single by the CLC Genomics Workbench.

If you are able to generate a BAM format file that meets the SAM specifications, then the CLC Workbench should be able to import the resulting BAM file.

 

Further details about SAM/BAM formats and the CLC Genomics Workbench

Information on the flags the CLC Workbench uses for SAM/BAM format files is outlined in our manual here:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Flags.html

and the SAM specification is at:

http://samtools.sourceforge.net/SAM1.pdf

One open source tool that can be useful in checking a SAM or BAM format file is "samtools". We cannot provide support for such third party tools, but if you are interested in it, then the samtools package is available from:

http://sourceforge.net/projects/samtools/files/samtools

2.3. How can I import a BAM file containing data mapped to the hg19 UCSC genome?

If you are attempting to import a BAM format file where the UCSC hg19 reference was used for the mapping process, it is necessary to have the UCSC reference sequences selected in the import wizard of the Workbench. This is different from the hg19 reference obtained through the Download Reference Genome tool in Genomics Workbench and Data Management in Biomedical Genomics Workbench. Information in this FAQ page assumes that the BAM file you are attempting to import was generated by using the UCSC hg19 sequences as the reference for the mapping job that created the BAM file. 

To successfully import your UCSC hg19 based BAM file it is necessary to:

  1. Obtain and import the UCSC hg19 reference sequences into the Workbench

  2. Import your BAM file with the UCSC hg19 reference sequences selected

 Detailed information about how to obtain the UCSC hg19 reference sequence as well as background information regarding this is described below.

 


 

Obtaining the UCSC hg19 reference sequences

If you do not have the UCSC hg19 reference sequences you may obtain them through one of the proposed methods:

 

Request the original reference used to generate the BAM

The best way to ensure you are using the proper reference sequences would be to ask the provider of your BAM file. The person or group who ran the mapping job to generate the BAM file will likely have access to the original fasta that was used as reference for the creation of the BAM. After you have obtained these sequences, you may then import them into the Workbench.

 

Download the reference data from UCSC

The UCSC provides their hg19 reference sequence data on their website. You may download this data directly from the UCSC. To obtain this data directly from the UCSC:

  1. Go to the UCSC hg19 directory of chromosome data: 
         http://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes/

  2. Download each chromosome represented in your BAM files by scrolling to the bottom of the page and clicking each link. The standard set of chromosomes that are most likely included are as follows:

    • chr1.fa.gz
    • chr2.fa.gz
    • chr3.fa.gz
    • chr4.fa.gz
    • chr5.fa.gz
    • chr6.fa.gz
    • chr7.fa.gz
    • chr8.fa.gz
    • chr9.fa.gz
    • chr10.fa.gz
    • chr11.fa.gz
    • chr12.fa.gz
    • chr13.fa.gz
    • chr14.fa.gz
    • chr15.fa.gz
    • chr16.fa.gz
    • chr17.fa.gz
    • chr18.fa.gz
    • chr19.fa.gz
    • chr20.fa.gz
    • chr21.fa.gz
    • chr22.fa.gz
    • chrM.fa.gz
    • chrX.fa.gz
    • chrY.fa.gz

  3. Import all downloaded files into the Workbench by selecting all the gz fasta files in the Import tracks wizard.

More general information about the UCSC provided human data can be found on their webpage: 
     http://hgdownload.soe.ucsc.edu/downloads.html#human 

 

Download the UCSC hg19 reference from NCBI (Only CLC Genomics Workbench)

  1. Download and import the 22 human autosomes and both sex chromosomes from hg19/GRCh37 and the older mitochondrial sequence (NC_001807), with annotations, from Genbank.

    • To do this, use the tool at  Download | Search for Sequences at NCBI

    • Copy and paste the following text as search term as shown in the image below.

      NC_000001.10, NC_000002.11, NC_000003.11, NC_000004.11, NC_000005.9, NC_000006.11, NC_000007.13, NC_000008.10, NC_000009.11, NC_000010.10, NC_000011.9, NC_000012.11, NC_000013.10, NC_000014.8, NC_000015.9, NC_000016.9, NC_000017.10, NC_000018.9, NC_000019.9, NC_000020.10, NC_000021.8, NC_000022.10, NC_000023.10, NC_000024.9, NC_001807.4

    • Select all the rows that result from this search and then click on the button labeled Download and Save.

      Note that this search will give you the 24 normal chromosomes and the mitochondrial chromosome only. If you wish to get a reference set including the ChrUn clone contigs you will need to look up the NCBI accession numbers for these as well.

      NCBI Search

  2. Change the names for each reference sequence so that they will match what has been used in your SAM/BAM mapping file. If the standard UCSC references have been used, then it is likely that chromosome 1 will be named chr1, chromosome 2 will be named chr2 and so on. A table linking the NCBI chromosome names with the corresponding standard UCSC names is provided below.

    You can do this by hand within the Workbench or you can make use of the Batch Rename plugin.

  3. Once the sequences have the correct names, you can convert these reference sequences to Tracks, as described in our manual here:

    http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Convert_tracks.html

    It is the Genbank files that are downloaded from NCBI, thus you can choose to create annotation tracks as well.  Please click on the green + symbol beside the Annotation tracks box in the Wizard to choose the annotation types you wish to create tracks for.

 

Configuring Data Management in Biomedical Genomics Workbench to use UCSC hg19

If you are working with the Biomedical Genomics Workbench, please follow the instructions in the FAQ: How can I use a different reference genome in the Biomedical Genomics Workbench?

 

Background information

The reference sequences in the Workbench must match, both names and lengths, to import mapping data from SAM/BAM files into the Workbench. If a reference sequence differs in either name or length from what is reported in the BAM file, then the Workbench will not see this as a match. This is because reads need to be placed in the right position against the right reference sequence.
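To see which reference is causing a mismatch, the names and lengths recorded in the header of the SAM/BAM file can be compared with those of the references you have in the Workbench. The sketch below, in Python, works on the header as text (the @SQ lines, which carry SN for name and LN for length). The function names are hypothetical, and the Workbench reference names and lengths have to be supplied by you, as this sketch has no access to Workbench data.

```python
def parse_sq_lines(sam_header_text):
    """Return {reference name: length} from the @SQ lines of a SAM header."""
    refs = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        refs[tags["SN"]] = int(tags["LN"])
    return refs


def find_mismatches(file_refs, workbench_refs):
    """List references in the file that differ in name or length."""
    problems = []
    for name, length in file_refs.items():
        if name not in workbench_refs:
            problems.append("%s: no reference with this name" % name)
        elif workbench_refs[name] != length:
            problems.append("%s: length %d in file, %d in Workbench"
                            % (name, length, workbench_refs[name]))
    return problems
```

A mitochondrial sequence reported with length 16571 in the file but 16569 in the Workbench is the typical sign of the UCSC versus Ensembl/Genbank difference described below.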

For people working with the hg19 human genome reference, different versions of the mitochondrial sequence are in common use. For example, the hg19 reference sequences provided by Ensembl or Genbank use the newer mitochondrial sequence (length 16569bp, Genbank accession NC_012920). UCSC uses a different mitochondrial reference (length 16571bp, Genbank accession NC_001807.4). For further information about the UCSC decision to use a different mitochondrial sequence, please see the UCSC Note on chrM:

http://genome-euro.ucsc.edu/cgi-bin/hgGateway?hgsid=187301261&clade=mammal&org=Human&db=hg19&redirect=auto&source=genome.ucsc.edu

If your mappings are against the hg19 reference sequence and you are seeing warnings about the mitochondrial sequence when importing a SAM or BAM mapping file into the CLC Genomics Workbench, the most common cause for this issue is that your mapping was done using UCSC references and the reference set in the Workbench is from Ensembl or Genbank. These Ensembl and Genbank versions of hg19 include the newer mitochondrial reference, NC_012920, rather than the older one included in the UCSC version, NC_001807.4.  If you used the Download Reference Genome Data tool or Data Management, the hg19 reference genome is from Ensembl and thus has the newer hg19 mitochondrial sequence (length 16569).

  

 

Linking of Genbank GRCh37 accession numbers, sequence names and UCSC hg19 reference sequences
NCBI Accession   NCBI name   UCSC name
NC_000001.10 NC_000001 chr1
NC_000002.11 NC_000002 chr2
NC_000003.11 NC_000003 chr3
NC_000004.11 NC_000004 chr4
NC_000005.9 NC_000005 chr5
NC_000006.11 NC_000006 chr6
NC_000007.13 NC_000007 chr7
NC_000008.10 NC_000008 chr8
NC_000009.11 NC_000009 chr9
NC_000010.10 NC_000010 chr10
NC_000011.9 NC_000011 chr11
NC_000012.11 NC_000012 chr12
NC_000013.10 NC_000013 chr13
NC_000014.8 NC_000014 chr14
NC_000015.9 NC_000015 chr15
NC_000016.9 NC_000016 chr16
NC_000017.10 NC_000017 chr17
NC_000018.9 NC_000018 chr18
NC_000019.9 NC_000019 chr19
NC_000020.10 NC_000020 chr20
NC_000021.8 NC_000021 chr21
NC_000022.10 NC_000022 chr22
NC_000023.10 NC_000023 chrX
NC_000024.9 NC_000024 chrY
NC_001807.4 NC_001807 chrM
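The table follows a regular pattern, so if you are preparing names outside the Workbench, the mapping can be expressed in a few lines. This is a sketch in Python covering only the 25 sequences in the table. The function name is hypothetical and it is not a replacement for the Batch Rename plugin.

```python
def ucsc_name(ncbi_accession):
    """Map a GRCh37 accession from the table above to its UCSC hg19 name."""
    base = ncbi_accession.split(".")[0]  # drop the version suffix, e.g. ".10"
    if base == "NC_001807":
        return "chrM"
    prefix, _, digits = base.partition("_")
    if prefix != "NC" or not digits.isdigit():
        raise ValueError("unrecognized accession: " + ncbi_accession)
    number = int(digits)
    if 1 <= number <= 22:
        return "chr%d" % number
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    raise ValueError("not in the table: " + ncbi_accession)
```

Accessions outside the table, such as the newer mitochondrial sequence NC_012920, deliberately raise an error rather than being given a guessed name.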

2.4. How can I import mappings from a SAM/BAM file where the reference names are different to those in the Workbench?

To import mapping data from a SAM or BAM file you need to already have the reference sequences in the Workbench. The reference sequences in the SAM/BAM file and in the Workbench must match in both name and length in order to be able to import mapped data.

If the reference names in a SAM/BAM file do not match the reference names in the Workbench, then the easiest route is usually to change names of the reference sequences in the Workbench to match those in your SAM/BAM file.

The issue of reference names commonly arises when using data from resources where different naming schemes are applied. For example, in the case of the human genome, chromosomes in different public resources have different naming patterns, such as "chrR", "R" and "NC_00000R", where R is an integer or a letter, e.g. chr1, 1, chrX and X. If you have a set of reference sequences in the Workbench that use one naming scheme and your SAM/BAM file contains references using a different naming scheme, then the method below can be used to create a reference set that can be used for importing the mapping data.

 

The general process is:

You do not need to convert your reference set back to track format if you started with a set of references in track format. The genome sequence information in the original track set is the same as that in the stand-alone sequence list you created. Thus, if you are working with a track-based read mapping, you can just use that alongside your original track-based reference genome sequence, for example, in a Track List.

Extra notes:

When working with tracks within the Workbench:

    • The names of the references are not checked when determining if different track objects are compatible with one another - for example, can their contents be compared or can they be added to the same track list. Rather, the number of references in a track set and their lengths are used as the basis for determining if particular track sets are compatible.

    • When importing annotations as tracks using the Import Tracks functionality, the names chrR, R and chromosome R (e.g. chr1, 1 and chromosome 1) are considered synonyms.

      The above two points mean that the type of renaming described here, for SAM/BAM mapping file import, is not necessary for other types of work.

 

2.5. How are mappings to circular references handled when exporting to or importing from SAM/BAM files?

The way the CLC Workbenches handle export to or import of mappings from SAM/BAM format reflects the fact that there is no explicit convention for the handling of reads that span the origin of a circular reference in the SAM/BAM format (https://samtools.github.io/hts-specs/SAMv1.pdf). Our aim is to represent the information as correctly as possible with this in mind.

 

Things of note when exporting mappings with a circular reference to SAM/BAM format

  • Reads that map across the origin of a circular reference sequence are labeled as unmapped (flag 4) when exported to SAM/BAM format from the CLC Genomics Workbench and Biomedical Genomics Workbench.

If your primary aim is to export a mapping to SAM/BAM format, then you may wish to mark circular references as linear instead before running the read mapping. If this is done, then reads that would have mapped across the origin of the circular reference will map to the end of the linearized reference where they best match, assuming that minimum length and similarity fraction requirements are met. The part of the read that matches the other end of the linearized reference will extend beyond the end of the reference.

Otherwise, we would generally recommend that circular genomes are marked as circular so that reads that span the origin can be mapped and viewed accordingly.

 

Things of note when importing mappings with a circular reference from SAM/BAM format

  • Mappings imported into the CLC Workbench from a SAM/BAM file will be presented as if the references were linear, even when the references selected when importing the SAM/BAM file are marked as circular.
  • Reads flagged as unmapped in the SAM/BAM file can be imported to a sequence list by checking the "Import unmapped reads" option in the SAM/BAM Mapping Files import tool
  • Reads that originally mapped across the origin of a circular reference will be marked as unmapped if you exported the mapping from a CLC Workbench. Such reads would therefore not be present in the re-imported mapping. You would likely notice a drop in the coverage level at either end of the reference sequence in such a case.
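To get a sense of how many reads in an exported file carry the unmapped flag (flag 4), the FLAG column of the SAM records can be inspected. The following Python sketch counts them in SAM-format text. The function name is hypothetical, and the count includes every unmapped read in the file, not only those that originally spanned the origin of a circular reference.

```python
def count_unmapped(sam_text):
    """Return (unmapped, total) read counts for SAM-format text."""
    unmapped = 0
    total = 0
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        total += 1
        if flag & 0x4:  # 0x4: the read is unmapped
            unmapped += 1
    return unmapped, total
```

Comparing this count for a mapping exported with the reference marked as circular against one marked as linear gives a rough idea of how many reads are affected.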

When working with SAM/BAM files and where one or more reference sequence is circular, we recommend that you:

The images below demonstrate some differences that can be observed when mapping to a circular reference that has been marked in the CLC Workbench as circular or has been marked as linear.

 

 

Figure 1: Visualizations of mappings to circular or linear versions of a reference sequence in a CLC Workbench. The top images show the left and right hand ends of a mapping done against a reference marked as linear. The corresponding images for a mapping done against a reference marked as circular are shown in the bottom images.

Linear reference, top images: A read that would have mapped across the origin here maps to one end of the linearized reference, and extends beyond it, as indicated by a < to the left of the read at the 5' end, or a > symbol to the right of the read at the 3' end. In this example 10 reads mapped to the 5' end of the reference, while 6 reads mapped to the 3' end. One read that mapped to the circular reference could not map to either end of the linearized version.

Circular reference, bottom images: The origin of the circular reference is at position 0 in the view. Reads that mapped across the origin are indicated by a << symbol at the left hand side and a >> symbol at the right hand side. Here, 17 reads map across the origin, and the coverage at each end of the reference is thus 17.

 

 

Figure 2: Comparison between mappings done with circular or linear versions of a reference in a CLC Workbench before and after export and re-import from a BAM file.

Track 1, top track: The left hand end of a mapping against a reference marked as linear. Reads that extended beyond the end of the reference sequence are present, as illustrated by the < symbols at the left hand side.

Track 2: The left hand end of the mapping shown in track 1, after it was exported to BAM format and reimported into the CLC Workbench. The reads that extended beyond the end of the reference sequence are still present after re-import, as illustrated by the < symbols at the left hand side.

Track 3: The left hand end of a mapping against a reference marked as circular. Reads that map across the origin of the reference sequence are present, as illustrated by the << symbols at the left hand side.

Track 4, bottom track: The left hand end of the mapping shown in track 3, after it was exported to BAM format and reimported into the CLC Workbench. Reads that spanned the circular reference origin in the original mapping are marked as unmapped in the exported BAM file. They are thus not present in the mapping after re-import into the CLC Workbench.

3. Export and printing

3.1. Can I rearrange the layout of a sequence or alignment for publishing?

You can change the layout using the view settings in the side panel, which are available when your data object is open in the viewing area. If you need more detailed control over e.g. how annotation labels are placed etc, you can export the data in a vector graphics format. To do this, click on the button labelled Graphics in the toolbar, or go to File | Export Graphics and select the Vector graphs (.svg) format.

As svg is a vector graphics format, all the elements in the view can be edited and moved individually using a vector graphics editor.

3.2. Why is the output either too big or too small when I print, compared to what I see on screen?

The images sent from the Workbench have a resolution of 600dpi. If the settings for your printer specify a different image quality, the image will be scaled. To print the image in the proper size, you need to access the properties of your printer after pressing Print from the Workbench. Set the printing resolution to 600dpi, and the image should be printed in the right size. For details on how to set the printing resolution of your printer, consult your operating system manual or printer manual.
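As a rough worked example of why the size changes, consider the ratio between the 600dpi resolution of the image and the resolution the printer is set to. The Python sketch below assumes the scaling is simply this ratio. That is our simplifying assumption for illustration, and actual printer drivers may behave differently.

```python
WORKBENCH_DPI = 600  # resolution of images sent from the Workbench


def printed_scale(printer_dpi):
    """Approximate size factor of the printout relative to the intended size.

    Assumption: the printer places one image pixel per printer dot.
    """
    return WORKBENCH_DPI / printer_dpi
```

Under this assumption a printer set to 300dpi would print the image at about twice the intended size, and one set to 1200dpi at about half the size.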

4. Download tools

4.1. Why has Download Genome failed in the CLC Genomics Workbench?

The Download Genome functionality of the Genomics Workbench goes out to third party URLs to access the data resources requested. The most common causes of problems with this tool are listed below:

1) You are working at a site with a proxy

To access locations on the external network, you will need to ensure your Workbench is configured with the correct proxy information. How to enter proxy information into the Genomics Workbench is described in our manual here:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Network_configuration.html

Once the settings are entered, please restart your Workbench and try using the Download Genomes tool again.

 

2) The remote site with the data is not accepting connections at this time.

If this is the case, then trying again another time should solve the problem.

 

3) The remote site is blocking access from your site.

This is not very common, but can happen. 

 

What to do if Download Genome fails for you

  • Check your proxy settings.

    If you are working at a site where there is a proxy, you can check if it is configured in the Workbench correctly by trying out other tools in the Workbench that connect to external sites. One example of this is the "Search for Sequences at NCBI" tool, which is under the Download button on the Workbench toolbar.

    If that tool works for you, but Download Genomes does not,  then,

  • Please wait a little while, for example a half hour, and then try the Download Genomes tool again. If there was a problem at the external site where the data is being downloaded from, sometimes the site will become available again after a short while.

If the above suggestions do not work, then please:

4.2. Why are no BLAST databases listed in the Download BLAST Databases window?

The tool in the CLC Workbenches at:

Toolbox | BLAST | Download BLAST Databases

connects to a site at the NCBI:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/

where they make pre-formatted BLAST databases available. This means that you must be able to connect to the external network from the machine that the CLC Workbench is running on, and if there is a proxy at your site, the Workbench must be configured to use the relevant proxy information.

If your machine is unable to connect to the external network, and specifically to the ftp site above, then the "Download BLAST Database" window will not have any content.
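A quick way to find out whether the machine can open a direct connection to the NCBI ftp server is to try a plain network connection to it. The sketch below uses Python's standard socket module. The function name is hypothetical, and the check makes a direct connection only, so at a site where all traffic must go through a proxy it may report a failure even though the Workbench would work once the proxy is configured.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """True if a direct TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


# Example usage, assuming the standard ftp port 21:
# print(can_reach("ftp.ncbi.nlm.nih.gov", 21))
```

A result of False is a reason to speak to your local IT contact about network access or proxy settings.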

Instructions for how to configure the Workbench with proxy information can be found in our manual at:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Network_configuration.html

You may need to talk to a local IT person to find what the correct settings are for your site.

If connecting to the external network is not possible on the machine the Workbench is installed on, you could download pre-formatted databases on another machine and put them in a CLC database location, or you can create your own BLAST databases. These are described in our manual at:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Make_pre_formatted_BLAST_databases_available.html

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_local_BLAST_databases.html

 

Please note that the nt and nr databases from the NCBI are very big. If these are the databases you wish to search against, it will be best to download the pre-formatted database files rather than attempting to build these databases yourself. You may also wish to check whether others at your site are already making the relevant BLAST databases available for searching, so you do not have to download and store the databases yourself.