Should I use a masked reference when working with exome or amplicon data?
This FAQ provides background information on why it is often not appropriate to use a masked reference for mapping. For the reason described below, we recommend that you map exome and amplicon reads to the full genome representing the biological source of those reads. After mapping to the full reference genome, you can restrict InDel and Structural Variant Detection, Variant Detection, and Quality reporting to the target regions. Therefore, our exome and targeted amplicon sequencing Template (Ready-to-use) Workflows, provided with the Biomedical Genomics Analysis plugin, do not by default offer reference masking during mapping; restriction to the target regions is applied only in the downstream steps.
That said, there can be situations where masking is appropriate, as this can depend on the organism and amplicon design. To investigate whether masking is appropriate in your case, please run the analyses both with and without reference masking and compare the results. This is also described in our manual as follows:
The reasoning
When mapping reads generated from defined regions of the genome, for example in the case of amplicon or exome data, it is possible that there are reads in the sample set representing regions of your genome that were not among the intended targets. If this occurs, and you map your reads against a reference that has been masked so only the intended target regions are available, then the mapper will try to map all reads. This includes, of course, any reads that were generated from regions outside the intended regions.
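The recommended order of operations (map to the full genome first, then restrict downstream results to the target regions) can be illustrated with a minimal Python sketch. This is not how the workflows implement it; the function names, the tuple layout for variants, and the BED-style 0-based half-open target intervals are all assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import defaultdict


def build_target_index(targets):
    """Group hypothetical (chrom, start, end) targets by chromosome.

    Intervals are assumed to be 0-based and half-open, as in BED files.
    Overlapping or touching intervals are merged so lookups stay simple.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Extend the previous interval instead of adding a new one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def in_targets(index, chrom, pos):
    """Return True if the 0-based position lies inside a target interval."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]


def restrict_to_targets(variants, targets):
    """Keep only variants (chrom, pos, description) inside the targets.

    The variants are assumed to come from reads mapped to the FULL
    reference, so off-target reads were free to map where they belong.
    """
    index = build_target_index(targets)
    return [v for v in variants if in_targets(index, v[0], v[1])]


# Hypothetical example data, for illustration only.
targets = [("chr1", 100, 200), ("chr2", 50, 80)]
variants = [
    ("chr1", 150, "A>G"),  # inside a target
    ("chr1", 250, "C>T"),  # off-target, dropped at reporting time
    ("chr2", 79, "G>A"),   # last base of a target
    ("chr3", 10, "T>C"),   # chromosome without targets
]
on_target = restrict_to_targets(variants, targets)
```

The point of the sketch is the ordering: off-target variants are discarded only at the reporting stage, after every read has had the chance to map to its true origin in the full genome.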