New Zealand Statistical Association 2024 Conference


Hengzhe Zhang

Victoria University of Wellington

EvoFeat: Genetic programming-based feature engineering approach to tabular data


This is joint work with Qi Chen, Bing Xue, and Mengjie Zhang.

In recent years, transformer architectures have enabled deep learning methods to outperform conventional tree-based models on tabular data classification. Most of these findings attribute the success of deep learning to the expressive feature construction capabilities of neural networks. Nonetheless, in real-world practice, manually designed high-order features combined with traditional machine learning methods are still widely used, because features learned by neural networks can be prone to overfitting. In this talk, we propose a genetic programming-based feature engineering algorithm that automates feature construction through iterative trial and improvement. Importantly, genetic programming can optimize symbolic models, which are often hard to optimize with gradient-based methods. On a large-scale classification benchmark of 130 datasets, experimental results show that the proposed method outperforms fine-tuned state-of-the-art tree-based and deep-learning-based classification algorithms.
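
As a rough illustration of what evolving symbolic features by trial and improvement can look like, here is a minimal sketch in Python. It is not the EvoFeat algorithm. The function set, the single-threshold fitness measure, the truncation selection scheme, the synthetic data, and every name in the code (`random_tree`, `mutate`, `fitness`, `evolve`, `make_data`, `show`) are assumptions chosen to keep the example short.

```python
import operator
import random

def protected_div(a, b):
    """Division that returns 1.0 when the divisor is (almost) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Hypothetical function set; each entry is (name, binary callable).
FUNCTIONS = [("add", operator.add), ("sub", operator.sub),
             ("mul", operator.mul), ("div", protected_div)]

def random_tree(n_features, depth, rng):
    """Grow a random expression tree; a leaf is a feature index (int)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    name, fn = rng.choice(FUNCTIONS)
    return (name, fn, random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, row):
    """Compute the constructed feature for one data row."""
    if isinstance(tree, int):
        return row[tree]
    _, fn, left, right = tree
    return fn(evaluate(left, row), evaluate(right, row))

def show(tree):
    """Render a tree as a readable symbolic expression."""
    if isinstance(tree, int):
        return f"x{tree}"
    name, _, left, right = tree
    return f"{name}({show(left)}, {show(right)})"

def mutate(tree, n_features, rng):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, 2, rng)
    name, fn, left, right = tree
    if rng.random() < 0.5:
        return (name, fn, mutate(left, n_features, rng), right)
    return (name, fn, left, mutate(right, n_features, rng))

def fitness(tree, X, y):
    """Best training accuracy of a single-threshold rule on the feature."""
    values = [evaluate(tree, row) for row in X]
    best = 0.0
    for t in values:
        acc = sum((v > t) == bool(label) for v, label in zip(values, y)) / len(y)
        best = max(best, acc, 1.0 - acc)  # allow either threshold direction
    return best

def evolve(X, y, pop_size=30, generations=15, seed=0):
    """Elitist loop: keep the better half, refill with mutated copies."""
    rng = random.Random(seed)
    n_features = len(X[0])
    # Start from the raw features plus random trees, so the result is
    # never worse (on this fitness) than the best raw feature.
    population = list(range(n_features))
    population += [random_tree(n_features, 3, rng)
                   for _ in range(pop_size - n_features)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [mutate(rng.choice(parents), n_features, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda t: fitness(t, X, y))

def make_data(n=80, seed=1):
    """Synthetic task: the label depends on the product of two features."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    y = [int(row[0] * row[1] > 0.25) for row in X]
    return X, y

X, y = make_data()
best = evolve(X, y)
raw_best = max(fitness(i, X, y) for i in range(len(X[0])))
print("best feature:", show(best))
print("constructed:", fitness(best, X, y), "best raw feature:", raw_best)
```

The search only needs a fitness score for each candidate expression, so no gradient through the symbolic structure is required. This sketch scores features on the training data alone; given the overfitting concern raised above, a realistic system would presumably evaluate candidates on held-out data or by cross-validation.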

