What's new in Driverless AI?
Arno, H2O’s CTO, gave a great 1+ hour overview of what’s new in Driverless AI version 1.4.1. If you check back in a few weeks/months, it’ll be even better. In all honesty, I have never seen a company innovate this fast.
Below are my notes from the video:
- H2O-3 is the open source product
- Driverless AI is the commercial product
- Does the Feature Engineering for you
- When you have Domain Knowledge, Feature Engineering can give you a huge lift
- Salary, Job Title, Zip Code example
- For people in this Zip Code with this # of cars, generate the mean of their salaries as a new feature
- Create out-of-fold estimates
- Don’t let a row’s own target value feed into the feature it is trained on
- Written in Python, with CUDA and C++ under the hood that Python directs
- Able to create good models in an automated way
- Driverless AI does not handle images
- Handles strings, numbers, and categoricals
- Datasets can be 100s of gigabytes
- Creates 100s of models with 1,000s of new features
- Creates an ensemble model after it’s done
- Then creates an exportable model (Java runtime or Python)
- C++ version is being worked on
- All standalone models
- Connect with Python client or via the web browser
- Changelog is on docs.h2o.ai
- Tests against Kaggle datasets
- BNP Paribas Kaggle set, Driverless AI ranked in the top 10 out of the box
- Took Driverless AI 2 hours, whereas it took Grandmasters 2 months
- Discussed how Logloss is interpreted
- Uses Reusable Holdout (RH) and subsamples of RH
- Driverless AI uses unsupervised methods to make supervised models
- Uses XGBoost, GLM, LightGBM, TensorFlow CNN, and RuleFit
- Implemented with R’s data.table for feature engineering and munging
- Working on an open source version of R’s data.table in Python
- Overview of how Driverless AI handles outliers (AutoViz)
- AutoViz only plots what you should see, not 100s of scatterplots like Tableau
- Overview of the GUI and what you can do
- Validation and test sets: how to use them and when
- Checks for data shift between the training and test sets
- Includes a Machine Learning Interpretability suite
- Does Time Series and NLP
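
To make the Salary / Zip Code bullets above concrete, here is a minimal sketch of out-of-fold mean encoding in plain Python. It is not Driverless AI's code: the function name `out_of_fold_mean_encode`, the round-robin fold assignment, and the toy data are all my own assumptions for illustration. The idea is that each row's "mean salary for my group" feature is computed only from rows in other folds, so the row never sees its own salary.

```python
from collections import defaultdict


def out_of_fold_mean_encode(groups, targets, n_folds=3):
    """Replace each row's group key with the mean target of that group,
    computed only from rows that fall in the other folds."""
    n = len(groups)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, seen = 0.0, 0
        # Accumulate statistics from every row NOT in the current fold.
        # Folds are assigned round-robin here; a real pipeline would shuffle.
        for i in range(n):
            if i % n_folds != fold:
                sums[groups[i]] += targets[i]
                counts[groups[i]] += 1
                total += targets[i]
                seen += 1
        fallback = total / seen if seen else 0.0
        # Encode the rows that ARE in the current fold.
        for i in range(n):
            if i % n_folds == fold:
                g = groups[i]
                # Groups unseen outside this fold fall back to the
                # mean of the other folds.
                encoded[i] = sums[g] / counts[g] if counts[g] else fallback
    return encoded


# Hypothetical toy data: (zip code, number of cars) groups and salaries.
groups = [("94043", 2), ("94043", 2), ("94043", 2),
          ("10001", 0), ("10001", 0), ("10001", 0)]
salaries = [100.0, 120.0, 140.0, 60.0, 70.0, 80.0]

encoded = out_of_fold_mean_encode(groups, salaries, n_folds=3)
print(encoded)  # [130.0, 120.0, 110.0, 75.0, 70.0, 65.0]
```

Note that the first row has salary 100 but gets 130, the mean of the other two rows in its group. A naive group mean (120) would have included the row's own salary, which is exactly the leak the notes warn about.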
And much more! Arno’s presentation style is excellent and he makes Data Science easy to understand.
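
On the Logloss point in the notes: I didn't write down Arno's exact explanation, so the sketch below is just the standard binary log loss definition in plain Python, with a hypothetical helper name `log_loss` and made-up inputs. It shows the usual way to read the number: lower is better, a 50/50 guess on every row scores about 0.693 (the natural log of 2), and confident wrong predictions are punished heavily.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss. y_true holds 0/1 labels,
    y_prob holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up examples for a single positive row.
uninformed = log_loss([1], [0.5])        # coin-flip prediction
confident_right = log_loss([1], [0.99])  # close to 0
confident_wrong = log_loss([1], [0.01])  # much larger than the coin flip

print(uninformed, confident_right, confident_wrong)
```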