New Zealand Statistical Association 2024 Conference

Sajeeka Nanayakkara

University of Otago

Evaluating the predictive performance of regression and machine learning models for rare events: A simulation study on structural recurrence in thyroid cancer

This is joint work with Jiaxu Zeng, Robin Turner, Matthew Parry and Mark Sywak

In clinical settings, risk prediction models are crucial for aiding decision-making. Various methods, including regression-based and machine learning approaches, are used to develop these models, yet selecting the most appropriate method remains a challenge. We performed a simulation study to examine how the data-generating process influences the relative predictive performance of regression-based and machine learning methods for predicting structural recurrence in thyroid cancer patients, particularly when events are rare. The dataset comprised patients from the endocrine surgical unit of the University of Sydney (2000–2018), with 13 predictors covering demographic, tumour, and lymph-node characteristics. Eight methods served as data-generating processes to simulate outcomes: logistic regression with all selected predictors, backward elimination, shrinkage methods (LASSO, ridge, and elastic net), random forests, gradient boosting machines, and extreme gradient boosting. For each data-generating process, training and test samples were generated by resampling with replacement from the original dataset, with training sample sizes of 500, 2,000 and 10,000, and a large test dataset of 100,000 observations. The same eight methods were then trained on each simulated training sample, and performance was evaluated on the test sample using the c-statistic, calibration slope, integrated calibration index, and prediction error. Shrinkage methods outperformed the other methods across all data-generating processes, demonstrating robustness in handling rare events. Conversely, random forests performed poorly, with overfitting becoming more pronounced as the training sample size increased. This study emphasises the importance of selecting predictive models based on data characteristics and event occurrence.