New Zealand Statistical Association 2024 Conference
Joane Elleouet
Scion
Predicting the size of any species' genome using the taxonomy tree
This is joint work with Céline Mercier
Many biological applications require or benefit from knowing the genome size of the organisms present in an environment. This information provides many insights into how these organisms have evolved in response to ecological processes. Human knowledge of species' genomes has recently grown exponentially and is neatly stored in the well-organised databases of the National Center for Biotechnology Information (NCBI). This has enabled the development of the genomesizeR package, a new tool that estimates the genome size of any fully or partially taxonomically identified organism. Here we describe the statistical models created for this tool. We highlight insightful aspects of statistical model development and the challenges associated with highly hierarchical models, touching on the topic of distributional models. We also describe a model validation process that allows us to compare the strengths and weaknesses of frequentist and Bayesian statistical approaches, as well as those of a non-model-based algorithm.