I posted about AutoML-Zero in my newsletter a few weeks ago. I find genetic programming and evolutionary search for machine learning really powerful. Here’s another article that references AutoML-Zero and briefly discusses how AI is evolving with these approaches.

The program discovers algorithms using a loose approximation of evolution. It starts by creating a population of 100 candidate algorithms by randomly combining mathematical operations. It then tests them on a simple task, such as an image recognition problem where it has to decide whether a picture shows a cat or a truck. (via ScienceMag)

and,

The system creates thousands of these populations at once, which lets it churn through tens of thousands of algorithms a second until it finds a good solution. The program also uses tricks to speed up the search, like occasionally exchanging algorithms between populations to prevent any evolutionary dead ends, and automatically weeding out duplicate algorithms.
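To make the two quoted passages concrete, here is a minimal sketch of that kind of search: several populations evolve in parallel, duplicates are removed, and the best candidate occasionally migrates to a neighbouring population. This is not the AutoML-Zero code. The operation set, the toy target f(x) = 2x + 3, the population sizes, and every function name are assumptions chosen to keep the example small.

```python
import random

# Hypothetical building blocks; a real system would use a much richer set.
OPS = {
    "add1": lambda x: x + 1.0,
    "sub1": lambda x: x - 1.0,
    "double": lambda x: x * 2.0,
    "halve": lambda x: x / 2.0,
    "negate": lambda x: -x,
}
OP_NAMES = sorted(OPS)
INPUTS = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]


def target(x):
    # Toy task (an assumption): recover f(x) = 2x + 3.
    return 2.0 * x + 3.0


def run(program, x):
    # A program is a tuple of operation names applied left to right.
    for name in program:
        x = OPS[name](x)
    return x


def error(program):
    # Sum of squared errors on the toy task; lower is better.
    return sum((run(program, x) - target(x)) ** 2 for x in INPUTS)


def random_program(rng, max_len=5):
    return tuple(rng.choice(OP_NAMES) for _ in range(rng.randint(1, max_len)))


def mutate(program, rng):
    # Replace one randomly chosen operation.
    ops = list(program)
    ops[rng.randrange(len(ops))] = rng.choice(OP_NAMES)
    return tuple(ops)


def step(population, rng):
    # Drop duplicates (order-preserving), keep the better half, refill with mutants.
    unique = list(dict.fromkeys(population))
    survivors = sorted(unique, key=error)[: len(population) // 2]
    n_children = len(population) - len(survivors)
    children = [mutate(rng.choice(survivors), rng) for _ in range(n_children)]
    return survivors + children


def migrate(islands):
    # Ring migration: each island's worst member is replaced by the
    # previous island's best member.
    bests = [min(island, key=error) for island in islands]
    for i, island in enumerate(islands):
        worst = max(range(len(island)), key=lambda j: error(island[j]))
        island[worst] = bests[i - 1]


def evolve(n_islands=4, pop_size=20, generations=30, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[random_program(rng) for _ in range(pop_size)]
               for _ in range(n_islands)]
    initial_best = min(error(p) for island in islands for p in island)
    for gen in range(1, generations + 1):
        islands = [step(island, rng) for island in islands]
        if gen % migrate_every == 0:
            migrate(islands)
    best = min((p for island in islands for p in island), key=error)
    return best, error(best), initial_best
```

Calling `evolve()` returns the best program found, its error, and the best error in the initial random populations. Because the better half of each population always survives, the final error can never be worse than the initial one.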

When they talk about populations, they mean sets of candidate algorithms. In hyperparameter-tuning terms, think of different algorithms with different hyperparameters. For example, a parent could be an SVM with a linear kernel and some C value, and its offspring (child) could be an SVM with an RBF kernel and some C and gamma values. Whichever candidate gives the best performance is kept and then tweaked further.
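The parent and child idea can be sketched as a small evolutionary loop over SVM-style configurations. This is only an illustration: `toy_score` is a made-up stand-in for a real validation score (in practice you would plug in something like cross-validated accuracy), and the mutation rates, search ranges, and function names are all assumptions.

```python
import math
import random


def random_config(rng):
    # gamma only exists for the RBF kernel in this sketch.
    kernel = rng.choice(["linear", "rbf"])
    config = {"kernel": kernel, "C": 10 ** rng.uniform(-2, 2)}
    if kernel == "rbf":
        config["gamma"] = 10 ** rng.uniform(-3, 1)
    return config


def mutate(parent, rng):
    # The child inherits the parent's settings with small random changes.
    child = dict(parent)
    if rng.random() < 0.2:  # occasionally switch kernel
        child["kernel"] = "rbf" if parent["kernel"] == "linear" else "linear"
    child["C"] = parent["C"] * 10 ** rng.gauss(0, 0.3)
    if child["kernel"] == "rbf":
        gamma = parent.get("gamma", 0.1)  # 0.1 is an arbitrary starting value
        child["gamma"] = gamma * 10 ** rng.gauss(0, 0.3)
    else:
        child.pop("gamma", None)
    return child


def toy_score(config):
    # Placeholder fitness (an assumption, not a real model evaluation):
    # it peaks at an RBF kernel with C = 10 and gamma = 0.01.
    penalty = abs(math.log10(config["C"]) - 1.0)
    if config["kernel"] == "rbf":
        penalty += abs(math.log10(config["gamma"]) + 2.0)
    else:
        penalty += 1.5
    return 1.0 / (1.0 + penalty)


def evolve(score, generations=25, pop_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top third as parents and refill with mutated children.
        parents = sorted(population, key=score, reverse=True)[: pop_size // 3]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score)
```

`evolve(toy_score)` returns the best configuration found. Swapping `toy_score` for a function that trains and validates a real model is the only change needed to use the same loop on actual data.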

I really like these approaches to hyperparameter tuning and feature engineering. They definitely have their uses and can squeeze out more performance for your optimization task.

Feature Stores

The concept of feature generation for ‘profit’ is what every large organization wants to capitalize on right now. What makes evolutionary-inspired feature engineering so powerful is that it optimizes performance by building the right set of features with the right hyperparameters and the right algorithm. These organizations want to build “feature stores” that are created after a data scientist builds models with generated features and gets some wild and awesome result. The data scientist is then supposed to hand the code that generates these new features to the Hadoop admin.

The Hadoop admin is then supposed to generate all the features on their cluster and use those features downstream for other models. I think this approach is dangerous. If you use an evolutionary approach to optimize your models and features, you should realize that they’re tied together in one pipeline. You can’t just break them apart, use the optimized model and hyperparameters on some other dataset, or extract the generated features for another model, and expect to get great performance. It doesn’t work that way.

Each pipeline, whether it’s the generated features or the optimized model, is tuned for a specific task on a specific dataset. For example, if it was built on a churn dataset, then you should apply those features and that tuned algorithm only to a churn scoring dataset with the same columns.

This is why the concept of “Feature Stores” is not the right way to think about this problem. I think the proper way to think about it is as “Feature Pipelines”. Generate optimized pipelines specific to your data and go from there.
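One way to picture a “Feature Pipeline” is as a single object that bundles the expected input columns, the generated features, and the tuned model, and refuses to score data that doesn’t match. The sketch below is a hypothetical illustration, not a real library API: the class, the churn column names, the feature functions, and the weights are all made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Row = Dict[str, float]


@dataclass(frozen=True)
class FeaturePipeline:
    # Columns the pipeline was tuned on; scoring requires all of them.
    input_columns: Sequence[str]
    # Generated features, each computed from a raw input row.
    feature_fns: Dict[str, Callable[[Row], float]]
    # A stand-in "model": a linear score over the generated features.
    weights: Dict[str, float]
    bias: float = 0.0

    def transform(self, row: Row) -> Row:
        missing = [c for c in self.input_columns if c not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        return {name: fn(row) for name, fn in self.feature_fns.items()}

    def score(self, row: Row) -> float:
        features = self.transform(row)
        return self.bias + sum(self.weights[n] * v for n, v in features.items())


# Hypothetical churn pipeline; names and numbers are for illustration only.
churn_pipeline = FeaturePipeline(
    input_columns=["monthly_charges", "tenure_months", "support_calls"],
    feature_fns={
        "charges_per_tenure": lambda r: r["monthly_charges"] / max(r["tenure_months"], 1.0),
        "calls_per_month": lambda r: r["support_calls"] / max(r["tenure_months"], 1.0),
    },
    weights={"charges_per_tenure": 0.01, "calls_per_month": 0.5},
    bias=-0.2,
)
```

Scoring a row with the expected columns works, while a row from a differently shaped dataset raises a `ValueError` instead of silently producing a meaningless score. The features and the model travel together, which is the whole point.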