Time Series for H2O with Modeltime

Notes from the video:

- Matt Dancho, founder of Business Science, was introduced to H2O-3 via the AutoML package
- Sample code in R shared; sample forecasting project on Walmart Sales
- Tidymodels standardizes machine learning packages; Modeltime loads H2O
- Multiple time series
- Create a forecast time horizon, assess a 52-week forecast
- Create preprocessing steps, which help the H2O algos find good features
- Some columns are normalized by the pre-processing
- Extracted time-related features (i.e. week number, day of the week, etc.)
- Initializes H2O-3; the Stacked Ensemble model will be the best but hard to interpret
- The Modeltime workflow starts with a table; Modeltime is an organizational tool
- Modeltime Calibrate will extract the residuals of the models
- Visualizing the forecast on the test set generates nice charts
- Built a single H2O-3 model to predict on 7 different time series
- This is very scalable, instead of looping through everything
- Refit the model on the entire training data and then did a forward walk of 52 weeks
- The Modeltime ecosystem was created to help with higher-frequency time series, at scale, in an automated way
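The video uses the R modeltime.h2o workflow, but the core step it wraps is an H2O AutoML run on calendar features. Here is a minimal Python sketch of that idea; the file name, column names ("id", "value"), and the 52-week split are illustrative assumptions, not code from the video.

```python
import h2o
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()

# Hypothetical dataset: one row per week per series ("id" identifies the series).
df = pd.read_csv("walmart_sales.csv", parse_dates=["date"])

# Extract simple time-based features, as the preprocessing step in the video does.
df["week_num"] = df["date"].dt.isocalendar().week.astype(int)
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year

# Hold out the last 52 weeks to assess the forecast.
cutoff = df["date"].max() - pd.Timedelta(weeks=52)
train = h2o.H2OFrame(df[df["date"] < cutoff])
test = h2o.H2OFrame(df[df["date"] >= cutoff])

# One AutoML run covers all series at once instead of looping through them.
aml = H2OAutoML(max_runtime_secs=300, seed=42)
aml.train(x=["week_num", "month", "year", "id"], y="value", training_frame=train)

preds = aml.leader.predict(test)  # forecast the held-out 52 weeks
print(aml.leaderboard.head())     # Stacked Ensembles typically land at the top
```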

H2O Wave App Tutorials

Just like what I did with my general H2O-3 Tutorials, I’m starting a separate H2O Wave Tutorial post for you all here. I will add to this as I go along and build more apps.

Stock Chart Generator Wave App

This is a simple candlestick chart generator that defaults to JNJ when you load it. All you need to do is add your Alphavantage API key where it says `"apikey": "XXXXXXX" } #ENTER YOUR ALPHAVANTAGE KEY HERE` ...
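To give a feel for the structure, here is a minimal sketch of a Wave app that pulls daily JNJ prices from Alphavantage's TIME_SERIES_DAILY endpoint and lists recent closes in a markdown card. The real app draws a candlestick chart; the `/stocks` route, the card layout, and the field names pulled from the JSON response are assumptions for illustration, and you still need to paste in your own API key.

```python
import requests
from h2o_wave import Q, app, main, ui  # 'main' registers the ASGI app for the Wave server

API_KEY = "XXXXXXX"  # ENTER YOUR ALPHAVANTAGE KEY HERE

@app('/stocks')
async def serve(q: Q):
    # Fetch daily prices for JNJ from Alphavantage.
    resp = requests.get(
        "https://www.alphavantage.co/query",
        params={"function": "TIME_SERIES_DAILY", "symbol": "JNJ", "apikey": API_KEY},
    )
    series = resp.json().get("Time Series (Daily)", {})

    # Show the five most recent closing prices in a simple markdown card.
    rows = [f"- {day}: close {vals['4. close']}" for day, vals in list(series.items())[:5]]
    q.page['quotes'] = ui.markdown_card(
        box='1 1 4 4',
        title='JNJ daily closes',
        content="\n".join(rows) or "No data returned - check your API key.",
    )
    await q.page.save()
```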

Install H2O's Wave on AWS Lightsail or EC2

I recently had to set up H2O’s Wave Server on AWS Lightsail and build a simple Wave App as a Proof of Concept. If you’ve never heard of H2O Wave, then you’ve been missing out on a cool new app development framework. We use it at H2O to build AI-based apps, and you can too, or use it for all kinds of other interactive applications. It’s like Django or Streamlit, but better and faster to build and deploy. ...

Isolation Forests in H2O.ai

A new feature has been added to H2O-3 open source: isolation forests. I’ve always been a fan of understanding outliers and love using One-Class SVMs as a method, but isolation forests appear to be better at finding outliers in most cases. From the H2O.ai blog: There are multiple approaches to an unsupervised anomaly detection problem that try to exploit the differences between the properties of common and unique observations. The idea behind the Isolation Forest is as follows. ...
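For a sense of how this looks in practice, here is a minimal sketch using the H2O-3 Python API; the dataset name, score threshold, and column handling are placeholder assumptions.

```python
import h2o
from h2o.estimators import H2OIsolationForestEstimator

h2o.init()
data = h2o.import_file("transactions.csv")  # hypothetical dataset of numeric features

# Isolation forest is unsupervised: no response column is given to train().
iso = H2OIsolationForestEstimator(ntrees=100, sample_size=256, seed=42)
iso.train(training_frame=data)

# predict() returns an anomaly score and the mean tree depth per row;
# shorter paths (higher scores) mean the row was easier to isolate.
scores = iso.predict(data)
flagged = data.cbind(scores)
anomalies = flagged[flagged["predict"] > 0.75]  # placeholder threshold
print(anomalies.head())
```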

MLI with RSparkling

Last evening my colleague Navdeep Gill (@Navdeep_Gill_) posted a link to his latest talk titled “Interpretable Machine Learning with RSparkling.” Navdeep is part of our MLI team and has a wealth of experience to share about explaining black boxes with modern techniques like Shapley values and LIME.

Machine Learning Interpretability (MLI)

H2O has this awesome open source Big Data software called Sparkling Water. It’s similar to RapidMiner’s Radoop but 1) open source, 2) more powerful, and 3) tested by the masses. It’s stable and runs on many a Hadoop cluster with Spark. The neat thing about Sparkling Water is that you can take the H2O.ai algorithms and push them down to the cluster to train on your ‘Big Data.’ There are quite a few powerful, fast, and accurate algorithms in H2O-3. H2O-3 is the current version of the open source set of algorithms, and H2O.ai continues to develop this suite over time. Most recently they added [Isolation Forests](/isolation-forests-h2oai/)! ...
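RSparkling is the R interface; the rough Python analogue below (PySparkling) sketches the push-down idea from memory of the Sparkling Water API: start an H2OContext on the Spark cluster, convert a Spark DataFrame into an H2OFrame, and train an H2O algorithm where the data lives. The exact `getOrCreate` call, the dataset path, and the column names are assumptions and vary by Sparkling Water version.

```python
from pyspark.sql import SparkSession
from pysparkling import H2OContext
from h2o.estimators import H2OGradientBoostingEstimator

spark = SparkSession.builder.appName("sparkling-water-demo").getOrCreate()
hc = H2OContext.getOrCreate()  # attaches an H2O cluster to the Spark executors

# Hypothetical dataset already sitting on the cluster.
spark_df = spark.read.parquet("hdfs:///data/loans.parquet")
h2o_frame = hc.asH2OFrame(spark_df)  # hand the distributed data to H2O

# Train an H2O algorithm directly on the cluster-resident data.
gbm = H2OGradientBoostingEstimator(ntrees=200, seed=42)
gbm.train(y="default", training_frame=h2o_frame)
```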

Interpreting Machine Learning Models

I found this short 8-minute video from H2O World about Machine Learning Interpretability (MLI). It’s given by Patrick Hall, the lead for building these capabilities in Driverless AI. My notes from the video are below:

- ML as an opaque black box is no longer the case
- Cracking the black box with LIME and Shapley values
- Lloyd Shapley won the Nobel Prize in Economics in 2012
- After a Driverless AI model runs, a dashboard is created
- Shows the complex engineered features and the original features
- Global Shapley values are like feature importance but include negative and positive contributions
- Quickly identify the important features in the dataset
- Then go to Partial Dependence Plots, which show the average prediction of the model across different values of a feature
- Row-by-row analysis of each feature can be done to understand interactions and generate reason codes
- Shapley is accurate for feature contribution; LIME is an approximation
- Done via a stacked ensemble model
- Can be deployed via a Python scoring pipeline
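The dashboard above is Driverless AI, but you can pull the same style of Shapley-based reason codes from open-source H2O-3 tree models. A minimal sketch, assuming a placeholder credit dataset and column names:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
data = h2o.import_file("credit.csv")  # hypothetical dataset
train, test = data.split_frame(ratios=[0.8], seed=42)

gbm = H2OGradientBoostingEstimator(ntrees=100, seed=42)
gbm.train(y="default", training_frame=train)

# Per-row Shapley contributions (plus a BiasTerm column): positive values push
# the prediction up, negative values push it down - the basis for reason codes.
contribs = gbm.predict_contributions(test)
print(contribs.head())

# Partial dependence for a single feature (column name is a placeholder):
# gbm.partial_plot(data=test, cols=["income"], plot=False)
```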

Making AI Happen Without Getting Fired

I watched Mike Gualtieri’s keynote presentation from H2O World San Francisco (2019) and found it very insightful in a non-technical, MBA sort of way. The gist of the presentation is to really look at all the business connections to doing data science. It’s not just about the problem at hand, but rather about setting yourself up for success and, as he puts it, not getting fired! ...

Feature Engineering in Driverless AI

Dmitry Larko, Kaggle Grandmaster and Senior Data Scientist at H2O.ai, goes into depth on how to apply feature engineering in general and in Driverless AI. This video is over a year old and the version of Driverless AI shown is in beta form; the current version is much more developed today. This is by far one of the best videos I’ve seen on the topic of feature engineering, not because I work for H2O.ai, but because it approaches the concepts in an easy-to-understand manner. Plus, Dmitry does an awesome job of helping viewers understand with great examples. ...
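As a generic taste of the kind of transform covered in talks like this, here is a small sketch of out-of-fold target (mean) encoding of a categorical column. The column names, toy data, and smoothing constant are illustrative assumptions, not material from the video.

```python
import pandas as pd
from sklearn.model_selection import KFold

def target_encode(df: pd.DataFrame, cat_col: str, target: str, smoothing: float = 10.0) -> pd.Series:
    """Replace each category with a smoothed, out-of-fold mean of the target."""
    global_mean = df[target].mean()
    encoded = pd.Series(index=df.index, dtype=float)
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(df):
        fold = df.iloc[train_idx]
        stats = fold.groupby(cat_col)[target].agg(["mean", "count"])
        # Shrink rare categories toward the global mean to avoid overfitting.
        smooth = (stats["mean"] * stats["count"] + global_mean * smoothing) / (stats["count"] + smoothing)
        encoded.iloc[val_idx] = df.iloc[val_idx][cat_col].map(smooth).fillna(global_mean).values
    return encoded

# Toy usage with made-up data.
df = pd.DataFrame({"store": ["A", "A", "B", "B", "C"], "sales": [10, 12, 3, 4, 8]})
df["store_te"] = target_encode(df, "store", "sales")
print(df)
```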

What's new in Driverless AI?

Arno, H2O’s CTO, gave a great 1+ hour overview of what’s new in Driverless AI version 1.4.1. If you check back in a few weeks or months, it’ll be even better. In all honesty, I have never seen a company innovate this fast. Below are my notes from the video: ...

Getting Started in Data Science Part 2

I’m finally getting around to writing Part 2 of Getting Started in Data Science. The first part can be found here. I made suggestions for university students interested in the field of Data Science. I even made a video about it too.

Pick Two, Master One

Pick two computer languages and become proficient in one and a master of the other. Or, pick a platform like H2O-Flow or RapidMiner and a language. Become a master at one but proficient in the other. This way you can set yourself apart from other students or applicants. ...