
5.9. Should I use a masked reference when working with exome or amplicon data?

For the reasons described under "The reasoning" below, we recommend that you map exome and amplicon reads to the full genome representing the biological source of those reads.

After mapping to the full reference genome you can restrict InDel and Structural Variant Detection, Variant Detection, and Quality reporting to the target regions.
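As a rough illustration of what restricting results to the target regions means, here is a minimal Python sketch that keeps only variant calls falling inside target intervals. This is not how the Workbench implements it; the function name `restrict_to_targets`, the tuple layouts, and the half-open interval convention are assumptions made for this example only.

```python
def restrict_to_targets(variants, targets):
    """Keep variants whose position lies inside any target interval.

    variants: iterable of (chrom, pos, description) tuples (hypothetical layout)
    targets:  iterable of (chrom, start, end) tuples, half-open [start, end)
    """
    # Group the target intervals by chromosome for quick lookup.
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))

    # A simple linear scan per variant; fine for a small illustrative example.
    return [
        v for v in variants
        if any(start <= v[1] < end for start, end in by_chrom.get(v[0], []))
    ]


# Made-up example data: reads were mapped to the full genome, and the
# filtering to target regions happens only afterwards.
targets = [("chr1", 100, 200), ("chr2", 50, 80)]
variants = [
    ("chr1", 150, "A>G"),  # inside a target
    ("chr1", 250, "C>T"),  # outside all targets
    ("chr2", 79, "G>A"),   # inside a target (end is exclusive)
    ("chr3", 10, "T>C"),   # chromosome has no targets
]
kept = restrict_to_targets(variants, targets)
print(kept)
```

The point of the sketch is the order of operations: mapping sees the whole genome, and the target regions are applied only to the downstream results.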

Therefore, our exome and targeted amplicon sequencing Ready-to-use Workflows in Biomedical Genomics Workbench by default do not include the option for reference masking during mapping, only during the downstream steps.

That said, there can be situations where masking is appropriate, depending on the organism and the amplicon design. To find out whether masking is appropriate in your case, run the analyses both with and without reference masking and compare the results. This is also described in our manual at:

http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=References_masking.html
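If you export the variant calls from both runs, the comparison suggested above can be sketched along these lines. This is only a sketch: the function name `compare_calls` and the `(chrom, pos, ref, alt)` tuple layout are assumptions, and the example calls are invented.

```python
def compare_calls(unmasked, masked):
    """Split two variant call lists into shared and run-specific calls.

    Each call is a hashable tuple such as (chrom, pos, ref, alt).
    """
    unmasked, masked = set(unmasked), set(masked)
    return {
        "shared": sorted(unmasked & masked),
        "only_unmasked": sorted(unmasked - masked),
        "only_masked": sorted(masked - unmasked),
    }


# Invented example: the masked run reports one extra call.
unmasked_calls = [("chr1", 150, "A", "G"), ("chr2", 79, "G", "A")]
masked_calls = [
    ("chr1", 150, "A", "G"),
    ("chr1", 171, "C", "T"),
    ("chr2", 79, "G", "A"),
]
report = compare_calls(unmasked_calls, masked_calls)
for category, calls in report.items():
    print(category, calls)
```

Calls that appear only in the masked run are worth a closer look, since they may stem from reads that would have mapped elsewhere in the full genome.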

 

The reasoning

When mapping reads generated from defined regions of the genome, for example in the case of amplicon or exome data, the sample may contain reads from regions of the genome that were not among the intended targets. If this occurs and you map your reads against a reference that has been masked so that only the intended target regions are available, the mapper will still try to map all reads. This includes, of course, any reads that were generated from regions outside the intended targets.

In this case, chances are that at least some of the reads generated from regions outside the intended targets will map to the masked reference. That is, reads whose source regions are not included in the reference you provide may still map to that reference if they match well enough (according to the parameters you provide). Such reads often do not match the masked reference as well as they would have matched their true source region, so had the true source region been available to map to, the reads would likely have mapped there preferentially.
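The effect described above can be shown with a toy example. The sketch below uses a deliberately simple ungapped, mismatch-counting mapper, which is not the algorithm used by the CLC read mapper; the sequences, region names, and the `max_mismatches` threshold are all made up for illustration.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches):
    """Return (region, offset, mismatches) for the best ungapped placement.

    reference: dict mapping region name to sequence.
    Returns None if no placement has at most max_mismatches mismatches.
    """
    best = None
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[offset:offset + len(read)])
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, offset, mm)
    return best


# A target region and a similar off-target region (2 differing bases).
target = "ACGTACGTTAGCCTAGGATC"
off_target = "ACGTACGATAGCCTAGCATC"

full_reference = {"target": target, "off_target": off_target}
masked_reference = {"target": target}  # only the intended target is available

# A read that truly originates from the off-target region.
read = off_target

hit_masked = best_hit(read, masked_reference, max_mismatches=2)
hit_full = best_hit(read, full_reference, max_mismatches=2)
print("masked reference:", hit_masked)
print("full reference:  ", hit_full)
```

Against the masked reference, the off-target read is placed on the target with two mismatches because nothing better is available. Against the full reference, it goes to its true source with no mismatches. In a real analysis, mismatches of the first kind could then be misread as variants in the target region.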
