A bioinformatic pipeline for identifying informative SNP panels for parentage assignment from RADseq data

Abstract

The development of high-throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site-associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci.
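
The idea of selecting unlinked loci with high statistical power can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a reference genome is available so each SNP has a chromosome and position, it uses minor allele frequency (MAF) as a stand-in for informativeness, and it treats physical distance as a rough proxy for linkage. The names `Snp` and `select_panel` and the thresholds `min_maf` and `min_dist` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int     # base-pair position on the chromosome
    maf: float   # minor allele frequency, between 0 and 0.5


def select_panel(snps, min_maf=0.3, min_dist=100_000,
                 panel_size: Optional[int] = None):
    """Greedy selection (illustrative assumption, not the published method).

    Candidates are visited from highest to lowest MAF. A SNP is kept only
    if it lies at least `min_dist` bp from every already-chosen SNP on the
    same chromosome.
    """
    candidates = sorted(
        (s for s in snps if s.maf >= min_maf),
        key=lambda s: (-s.maf, s.snp_id),  # ties broken by id for determinism
    )
    chosen = []
    for s in candidates:
        if panel_size is not None and len(chosen) >= panel_size:
            break
        if all(c.chrom != s.chrom or abs(c.pos - s.pos) >= min_dist
               for c in chosen):
            chosen.append(s)
    return chosen


# Made-up toy data, not taken from the study.
demo = [
    Snp("a", "chr1", 1_000, 0.48),
    Snp("b", "chr1", 50_000, 0.45),   # too close to "a", so it is skipped
    Snp("c", "chr1", 500_000, 0.40),
    Snp("d", "chr2", 2_000, 0.10),    # below the MAF threshold
    Snp("e", "chr2", 9_000, 0.35),
]
panel = select_panel(demo)
print([s.snp_id for s in panel])  # ['a', 'c', 'e']
```

Without a reference genome, positions are unknown, so a real workflow would need a different linkage criterion (for example, pairwise linkage disequilibrium between loci); the distance filter here is only one possible simplification.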

Citation

Andrews, Kimberly R., Jennifer R. Adams, E. Frances Cassirer, Raina K. Plowright, Colby Gardner, Maggie Dwire, Paul A. Hohenlohe, and Lisette P. Waits. "A bioinformatic pipeline for identifying informative SNP panels for parentage assignment from RADseq data." Molecular Ecology Resources 18, no. 6 (June 2018): 1263-1281. DOI:10.1111/1755-0998.12910.