How can I import mappings from a SAM/BAM file where the reference names are different to those in the Workbench?
This FAQ covers guidelines on how to import a SAM/BAM file whose reference names differ from those present in the Workbench.
To import mapping data from a SAM or BAM file, the reference sequences must already be in the Workbench. The reference sequences in the SAM/BAM file and in the Workbench must match in both name and length for the mapped data to be imported. If the reference names in a SAM/BAM file do not match the reference names in the Workbench, the easiest route is usually to change the names of the reference sequences in the Workbench to match those in your SAM/BAM file.
The issue of reference names commonly arises when using data from resources that apply different naming schemes. For example, in the case of the human genome, chromosomes in different public resources follow different naming patterns, such as "chrR", "R" and "NC_00000R", where R is an integer or a letter, e.g. chr1, 1, chrX and X. If you have a set of reference sequences in the Workbench that use one naming scheme and your SAM/BAM file contains references using a different naming scheme, then the method below can be used to create a reference set that can be used for importing the mapping data.
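Separately from the Workbench procedure, it can help to see exactly which names differ before renaming anything. The following is a minimal Python sketch, not part of the Workbench, that reads the @SQ lines of a SAM header (the SN and LN tags hold each reference name and length) and compares them with a name-to-length table for the Workbench references. The function names, the `workbench_refs` dictionary and the sequence lengths are hypothetical and purely illustrative, and only the "chrR" versus "R" schemes are handled; "NC_" accessions would need their own lookup table. The header text can be obtained, for example, with `samtools view -H`.

```python
def parse_sq_lines(header_text):
    """Return {reference name: length} from the @SQ lines of a SAM header."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG, @CO and any other header lines
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def toggle_chr_prefix(name):
    """Switch a name between the "chrR" and "R" naming schemes."""
    return name[3:] if name.startswith("chr") else "chr" + name


def compare_references(sam_refs, workbench_refs):
    """Classify each SAM/BAM reference against the Workbench reference set."""
    report = {}
    for name, length in sam_refs.items():
        alt = toggle_chr_prefix(name)
        if workbench_refs.get(name) == length:
            report[name] = "ok"
        elif name in workbench_refs:
            report[name] = "length mismatch"
        elif workbench_refs.get(alt) == length:
            # Same length under the other naming scheme: a rename candidate.
            report[name] = f"rename Workbench sequence '{alt}' to '{name}'"
        else:
            report[name] = "no match"
    return report


if __name__ == "__main__":
    # Illustrative header and lengths only; not real genome sizes.
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chrX\tLN:500\n"
    workbench_refs = {"1": 1000, "X": 500}  # hypothetical Workbench names
    for name, status in compare_references(parse_sq_lines(header), workbench_refs).items():
        print(name, "->", status)
```

A reference reported as a rename candidate has the same length under the other naming scheme, which is the case the renaming approach in this FAQ addresses. A length mismatch usually points to a different assembly or sequence, which renaming alone will not fix.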
You do not need to convert your reference set back to track format if you started with a set of references in track format. The genome sequence information in the original track set is the same as that in the stand-alone Sequence List you created. Thus, if you are working with a track-based read mapping, you can just use that alongside your original track-based reference genome sequence, for example, in a Track List.
Extra notes on working with tracks within the Workbench:
The above two points mean that the type of renaming described here, for SAM/BAM mapping file import, is not necessary for other types of analysis.