Quantitative and qualitative assessment of pollen DNA metabarcoding using constructed species mixtures

Publication Title

Molecular Ecology




Keywords

amplification bias, copy number bias, DNA extraction bias, DNA metabarcoding, pollen DNA barcoding, quantitative DNA metabarcoding


© 2018 John Wiley & Sons Ltd

Pollen DNA metabarcoding, the marker-based genetic identification of potentially mixed-species pollen samples, has applications across a variety of fields. While basic species-level pollen identification using standard DNA barcode markers is established, the extent to which metabarcoding (a) correctly assigns species identities to mixes (qualitative matching) and (b) generates sequence reads in proportion to each species' relative abundance in a sample (quantitative matching) is unclear, as these have not been assessed relative to known standards. We tested the quantitative and qualitative robustness of metabarcoding in constructed pollen mixtures varying in species richness (1–9 species), taxonomic relatedness (within genera to across class) and rarity (5%–100% of grains), using Illumina MiSeq with the markers rbcL and ITS2. Qualitatively, species composition determinations were largely correct, but false positives and negatives occurred. False negatives were typically driven by lack of a barcode gap or by rarity in a sample. Species richness and taxonomic relatedness, however, did not strongly impact correct determinations. False positives were likely driven by contamination, chimeric sequences and/or misidentification by the bioinformatics pipeline. Quantitatively, the proportion of reads for each species was only weakly correlated with its relative abundance, in contrast to suggestions from some other studies. Quantitative mismatches are not correctable by consistent scaling factors, but instead depend on which other species are present in a sample. Together, our results show that metabarcoding is largely robust for determining pollen presence/absence but that sequence reads should not be used to infer relative abundance of pollen grains.
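The two kinds of matching described above can be illustrated with a minimal Python sketch. It is not the authors' analysis: the function names, the use of a Pearson correlation, and the grain and read counts are all assumptions chosen for illustration, and the numbers are invented rather than taken from the study. Qualitative matching is treated as a presence/absence comparison, and quantitative matching as the correlation between known grain proportions and observed read proportions.

```python
from math import sqrt


def proportions(counts):
    """Convert raw counts per species into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts sum to zero")
    return {species: n / total for species, n in counts.items()}


def qualitative_match(expected, observed):
    """Compare presence/absence; return (false_negatives, false_positives)."""
    present_expected = {sp for sp, n in expected.items() if n > 0}
    present_observed = {sp for sp, n in observed.items() if n > 0}
    false_negatives = sorted(present_expected - present_observed)
    false_positives = sorted(present_observed - present_expected)
    return false_negatives, false_positives


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def quantitative_match(expected, observed):
    """Correlate known grain proportions with observed read proportions."""
    species = sorted(set(expected) | set(observed))
    p_expected = proportions({sp: expected.get(sp, 0) for sp in species})
    p_observed = proportions({sp: observed.get(sp, 0) for sp in species})
    return pearson([p_expected[sp] for sp in species],
                   [p_observed[sp] for sp in species])


# Hypothetical constructed mixture (pollen grain counts) and hypothetical
# sequencing output (read counts). Invented values, not data from the paper.
grains = {"species_A": 50, "species_B": 30, "species_C": 15, "species_D": 5}
reads = {"species_A": 1200, "species_B": 6100, "species_C": 2500, "species_E": 200}

missed, spurious = qualitative_match(grains, reads)
print("false negatives:", missed)
print("false positives:", spurious)
print("read/grain correlation:", round(quantitative_match(grains, reads), 3))
```

In this made-up sample the rare species_D is missed and species_E appears without having been added, mirroring the false negatives and false positives described in the abstract. A single correlation per sample also cannot capture the context dependence the abstract reports, so a real assessment would compare many mixtures with differing species composition.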
