New Zealand Statistical Association 2024 Conference
Hengzhe Zhang
Victoria University of Wellington
EvoFeat: Genetic programming-based feature engineering approach to tabular data
This is joint work with Qi Chen, Bing Xue, and Mengjie Zhang.
In recent years, the emergence of the transformer architecture has led deep learning methods to yield better results than conventional tree-based models on tabular data classification. Most of these findings attribute the success of deep learning to the expressive feature construction capabilities of neural networks. Nonetheless, in real-world practice, manually designed high-order features paired with traditional machine learning methods are still widely used, because features learned by neural networks can be prone to overfitting. In this talk, we propose a genetic programming-based feature engineering algorithm that automates the feature construction process through trial and improvement. Importantly, genetic programming provides an opportunity to optimize symbolic models, which gradient-based methods often find hard to optimize. On a large-scale classification benchmark involving 130 datasets, the experimental results demonstrate that the proposed method outperforms existing, fine-tuned state-of-the-art tree-based and deep-learning-based classification algorithms.
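To make the idea of constructing features "through trial and improvement" concrete, the following is a minimal toy sketch of genetic programming over symbolic feature expressions. It is not the EvoFeat algorithm: the primitive set, the mutation-only search, the single-threshold fitness, and every name in it (`OPS`, `random_tree`, `mutate`, `fitness`, `evolve`) are assumptions made purely for illustration.

```python
import random

# Hypothetical primitive set: binary operators used to combine input columns.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def random_tree(n_features, depth, rng):
    """Grow a random expression: a leaf is a column index, a node is (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    op = rng.choice(sorted(OPS))
    return (op,
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, row):
    """Compute the constructed feature value for one data row."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def mutate(tree, n_features, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_features, rng), right)
    return (op, left, mutate(right, n_features, rng))

def fitness(tree, X, y):
    """Training accuracy of the best single-threshold rule on the constructed feature."""
    values = [evaluate(tree, row) for row in X]
    best = 0.0
    for t in set(values):
        preds = [1 if v > t else 0 for v in values]
        acc = sum(p == label for p, label in zip(preds, y)) / len(y)
        best = max(best, acc, 1.0 - acc)  # 1 - acc covers the flipped rule
    return best

def evolve(X, y, pop_size=20, generations=20, seed=0):
    """Keep the better half of the population and refill it with mutated copies."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [random_tree(n_features, 2, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [mutate(rng.choice(parents), n_features, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=lambda t: fitness(t, X, y))
    return best, fitness(best, X, y)

# Toy data (an assumption): the label depends on the sign of x0 * x1,
# so no threshold on a single raw column can separate the classes.
X = [[a, b] for a in (-2, -1, 1, 2) for b in (-2, -1, 1, 2)]
y = [1 if a * b > 0 else 0 for a, b in X]

best_tree, best_score = evolve(X, y)
print(best_tree, best_score)
```

On this toy data, a raw column scores no better than chance under the threshold rule, while the product of the two columns separates the classes exactly. This is the kind of high-order symbolic feature that such a search can discover without gradients. The fitness here is training accuracy only; a practical method would need some form of validation to guard against overfitting.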