New Zealand Statistical Association 2024 Conference

Timothy Bilton

AgResearch

Relatedness estimation in pooled samples sequenced using low-depth high-throughput methods

This is joint work with Ken Dodds and Andrew Griffiths.

Relatedness information is central to many analyses in genetic applications. One such application is genomic selection, where the variance-covariance matrix of the random effect terms in the linear mixed model fitted to generate breeding values is proportional to the relatedness. In some breeding applications, samples from several (typically related) individuals are pooled into a single sample for sequencing, either to reduce cost or because the phenotyping is at the pooled level. Current methods in the literature for estimating relatedness for pooled samples do not appropriately account for the structure of the pool. In addition, researchers are increasingly using high-throughput sequencing (HTS) methods for genotyping. HTS methods provide a low-cost and efficient approach to genotyping, but the data generated are subject to errors in the form of miscalled bases and heterozygous genotypes being miscalled as homozygous because of low read depth. Here, we derive an appropriate estimator of relatedness for pooled samples using HTS data. We present the theory behind the estimator and perform a simulation study to explore the properties of the estimator.
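
As a rough illustration of why low read depth causes heterozygous genotypes to be miscalled as homozygous, the sketch below computes the chance that every read at a heterozygous site carries the same allele. It assumes a single diploid individual (not a pool), reads that sample each of the two alleles with equal probability, and no sequencing error. The function name is hypothetical, and this is not the estimator derived in the talk.

```python
def prob_het_seen_as_hom(depth: int) -> float:
    """Probability that all reads at a heterozygous diploid site show one allele.

    Assumptions (illustrative only): each read independently samples either
    allele with probability 1/2, and there are no sequencing errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # All reads show allele A or all show allele B: 2 * (1/2)**depth.
    return 0.5 ** (depth - 1)


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, prob_het_seen_as_hom(depth))
```

Under these assumptions, a heterozygote always appears homozygous at depth 1 and still does so half the time at depth 2, which is why methods for low-depth HTS data need to model read depth explicitly.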