
2.4. How can I import mappings from a SAM/BAM file where the reference names are different to those in the Workbench?

To import mapping data from a SAM or BAM file, the reference sequences must already be in the Workbench. The reference sequences in the SAM/BAM file and those in the Workbench must match in both name and length for the mapped data to be imported.
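Before importing, it can help to check which references in a SAM header would fail the name and length requirement. The following is a minimal sketch rather than anything the Workbench itself provides: it assumes a plain-text SAM header (for a BAM file, the header can be printed with `samtools view -H`), and the function names, the `workbench_refs` dictionary and the toy lengths are all hypothetical. The name and length of each Workbench reference are assumed to have been collected by hand.

```python
def parse_sam_references(header_lines):
    """Return {reference name: length} from the @SQ lines of a SAM header."""
    refs = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue  # only @SQ lines describe reference sequences
        # Each field after the record type is TAG:VALUE; SN is the name, LN the length.
        fields = dict(
            field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:]
        )
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def find_mismatches(sam_refs, workbench_refs):
    """List SAM references with no same-name, same-length Workbench reference."""
    problems = []
    for name, length in sam_refs.items():
        if name not in workbench_refs:
            problems.append((name, "name not found"))
        elif workbench_refs[name] != length:
            problems.append((name, "length differs"))
    return problems


# Toy header and toy lengths, for illustration only.
header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chrX\tLN:500",
]
workbench_refs = {"1": 1000, "chrX": 400}  # hypothetical names and lengths

for name, reason in find_mismatches(parse_sam_references(header), workbench_refs):
    print(name, reason)
```

With this toy input, chr1 is reported because no Workbench reference has that name, and chrX is reported because the lengths differ.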

If the reference names in a SAM/BAM file do not match the reference names in the Workbench, the easiest route is usually to change the names of the reference sequences in the Workbench to match those in your SAM/BAM file.

The issue of reference names commonly arises when using data from resources that apply different naming schemes. For example, in the case of the human genome, chromosomes in different public resources follow different naming patterns, such as "chrR", "R" and "NC_00000R", where R is an integer or a letter, e.g. chr1, 1, chrX and X. If you have a set of reference sequences in the Workbench that use one naming scheme and your SAM/BAM file contains references using a different naming scheme, then the method below can be used to create a reference set that can be used for importing the mapping data.
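When the two naming schemes differ only by the "chr" prefix, the old-to-new name pairs needed for renaming can be worked out mechanically. The sketch below is an illustration only: the function names are hypothetical, it handles just the "chrR" and "R" patterns, and it does not perform any renaming in the Workbench. Names it cannot pair, such as accession-style "NC_" names or mitochondrial names that differ between resources (e.g. MT and chrM), are left out of the result and need to be matched by hand.

```python
def candidate_names(name):
    """Return the name itself plus its variant with or without the 'chr' prefix."""
    if name.startswith("chr"):
        return [name, name[3:]]
    return [name, "chr" + name]


def build_rename_map(workbench_names, sam_names):
    """Map each Workbench reference name to the matching name used in the SAM/BAM file.

    Names with no counterpart under the simple prefix rule are omitted.
    """
    sam_names = set(sam_names)
    rename = {}
    for name in workbench_names:
        for candidate in candidate_names(name):
            if candidate in sam_names:
                rename[name] = candidate
                break
    return rename


# Illustrative names only.
rename = build_rename_map(["1", "X", "MT"], ["chr1", "chrX", "chrM"])
unmatched = [name for name in ["1", "X", "MT"] if name not in rename]
print(rename)
print(unmatched)
```

Here "1" and "X" are paired with "chr1" and "chrX", while "MT" is listed as unmatched. Matching names does not guarantee matching lengths, so the lengths still need to agree for the import to work.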

 

The general process is:

You do not need to convert your reference set back to track format if you started with a set of references in track format. The genome sequence information in the original track set is the same as that in the stand-alone sequence list you created. Thus, if you are working with a track-based read mapping, you can just use that alongside your original track-based reference genome sequence, for example, in a Track List.

Extra notes:

When working with tracks within the Workbench:

    • The names of the references are not checked when determining whether different track objects are compatible with one another, for example, whether their contents can be compared or whether they can be added to the same track list. Rather, the number of references in a track set and their lengths are used as the basis for determining whether particular track sets are compatible.

    • When importing annotations as tracks using the Import Tracks functionality, the names chrR, R and chromosome R (e.g. chr1, 1 and chromosome 1) are considered synonyms.

      The above two points mean that the type of renaming described here, for SAM/BAM mapping file import, is not necessary for other types of work.

 
