New Zealand Statistical Association 2024 Conference

Qingyu Meng

University of Auckland

An explainable disease risk prediction model based on deep transfer learning using high-dimensional genomic data

Developing accurate disease risk prediction models is crucial in the pursuit of precision medicine. High-dimensional genomic data offer valuable insights for biostatistics but present significant analytical challenges due to extensive noise and complex relationships. Deep learning has emerged as a leading approach in areas such as computer vision, natural language processing, and speech recognition, and this state-of-the-art framework holds promise for genomic data analysis. However, deep learning models often struggle with the curse of dimensionality and lack biological interpretability, which limits their effectiveness.

In this study, we introduce a deep neural network (DNN)-based framework for prediction modelling. First, we perform feature selection using a newly proposed groupwise feature importance score, which efficiently identifies genes with both linear and non-linear genetic variant effects. Then, we develop an explainable transfer-learning DNN method that directly integrates the feature selection information and performs the downstream analysis. Because the network is stack-style, it is also compatible with other feature selection methods. Additionally, our DNN framework is biologically interpretable, as it focuses on the selected predictive genes, and it is computationally efficient and suitable for genome-wide data. In extensive simulations and real data applications, our method outperformed existing methods in both detecting predictive features and predicting disease risk.
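The abstract does not define the groupwise score or the network architecture, so the following is only a rough illustration of the general idea of scoring genes as groups of variants. It is a minimal sketch that swaps in a standard technique, groupwise permutation importance (permuting all variant columns of one gene together and measuring the increase in prediction error), with a quadratic least-squares model standing in for the DNN so that both additive and interaction effects can register. It is not the authors' score. The function names, gene names, simulated data, and the 0.1 selection threshold are all hypothetical.

```python
import numpy as np


def quadratic_features(X):
    """Intercept, linear, squared and pairwise-product terms (hypothetical stand-in for a DNN)."""
    n, p = X.shape
    cols = [np.ones(n)]
    cols.extend(X[:, j] for j in range(p))
    for i in range(p):
        for j in range(i, p):
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)


def fit_quadratic(X, y):
    """Least-squares fit on quadratic features; returns a predict function."""
    beta, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return lambda X_new: quadratic_features(X_new) @ beta


def groupwise_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Mean increase in MSE when all columns of one gene are permuted jointly.

    `groups` maps a (hypothetical) gene name to the column indices of its variants.
    Permuting a gene's columns with one shared row permutation keeps the
    within-gene structure intact but breaks the gene's link to y, so linear
    and non-linear effects of that gene both contribute to its score.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    scores = {}
    for gene, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            idx = rng.permutation(X.shape[0])
            X_perm[:, cols] = X[idx][:, cols]
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        scores[gene] = float(np.mean(increases))
    return scores


# Simulated example: SNP genotypes coded 0/1/2, three hypothetical genes of two SNPs each.
rng = np.random.default_rng(42)
n, p = 1000, 6
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
# gene_A acts additively, gene_B only through an interaction, gene_C is pure noise.
y = (X[:, 0] + X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(0, 0.5, n)
groups = {"gene_A": [0, 1], "gene_B": [2, 3], "gene_C": [4, 5]}

# Fit on a training split, score genes on held-out data.
predict = fit_quadratic(X[:600], y[:600])
scores = groupwise_permutation_importance(predict, X[600:], y[600:], groups)
selected = [g for g, s in scores.items() if s > 0.1]  # hypothetical threshold
print(scores, selected)
```

Scoring on held-out data keeps a model's fit to noise from inflating the scores of irrelevant genes. In a stack-style design, the columns of the `selected` genes would then be passed to a downstream predictor; that step and the transfer-learning component are not sketched here.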