Below you will find pages that utilize the taxonomy term “H2Oai”
Posts
Time Series for H2O with Modeltime
Notes from the video:
- Matt Dancho, founder of Business Science
- Introduced to H2O-3 via the AutoML package
- Sample code in R shared
- Sample forecasting project / Walmart Sales
- Tidymodels standardizes machine learning packages
- Modeltime loads H2O
- Multiple time series
- Create a forecast time horizon; assess a 52-week forecast
- Create preprocessing steps to help the H2O algos find good features (see the Python sketch after these notes)
- Some columns are normalized by the pre-processing
- Extracted time-related features (i.
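The walkthrough itself is in R with modeltime and modeltime.h2o, but the idea translates. Here is a minimal Python sketch of the same workflow, assuming a hypothetical `walmart_sales_weekly.csv` with `Date` and `Weekly_Sales` columns: derive simple calendar features, hold out the final 52 weeks as the assessment window, and let H2O AutoML do the rest.

```python
import h2o
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()

# Hypothetical weekly data with 'Date' and 'Weekly_Sales' columns.
df = pd.read_csv("walmart_sales_weekly.csv", parse_dates=["Date"])

# Simple calendar features stand in for the preprocessing recipe in the video.
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["week"] = df["Date"].dt.isocalendar().week.astype(int)

# Hold out the final 52 weeks as the assessment window.
cutoff = df["Date"].max() - pd.Timedelta(weeks=52)
train = h2o.H2OFrame(df[df["Date"] <= cutoff].drop(columns=["Date"]))
test = h2o.H2OFrame(df[df["Date"] > cutoff].drop(columns=["Date"]))

aml = H2OAutoML(max_runtime_secs=300, seed=42)
aml.train(y="Weekly_Sales", training_frame=train, leaderboard_frame=test)
print(aml.leaderboard.head())
```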
Posts
H2O Wave App Tutorials
Just like what I did with my general H2O-3 Tutorials, I’m starting a separate H2O Wave Tutorial post for you all here. I will add to this as I go along and build more apps.
Stock Chart Generator Wave App: This is a simple candlestick chart generator that defaults to JNJ when you load it. All you need to do is add your Alphavantage API key where it says "apikey": "XXXXXXX" } #ENTER YOUR ALPHAVANTAGE KEY HERE
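For reference, here is a rough sketch of how an app like this could be wired together (not the original source): pull daily prices from Alphavantage's TIME_SERIES_DAILY endpoint, build a Plotly candlestick figure, and embed it in a Wave frame card. The route name, box layout, and file structure are placeholders.

```python
import requests
import plotly.graph_objects as go
from h2o_wave import Q, app, main, ui  # noqa: F401 (main is the Wave entry point)


@app('/stocks')
async def serve(q: Q):
    symbol = 'JNJ'  # the app defaults to JNJ
    # Blocking request is fine for a sketch; a real app might fetch asynchronously.
    r = requests.get('https://www.alphavantage.co/query', params={
        'function': 'TIME_SERIES_DAILY',
        'symbol': symbol,
        'apikey': 'XXXXXXX',  # ENTER YOUR ALPHAVANTAGE KEY HERE
    })
    series = r.json()['Time Series (Daily)']
    dates = sorted(series)
    fig = go.Figure(go.Candlestick(
        x=dates,
        open=[float(series[d]['1. open']) for d in dates],
        high=[float(series[d]['2. high']) for d in dates],
        low=[float(series[d]['3. low']) for d in dates],
        close=[float(series[d]['4. close']) for d in dates],
    ))
    q.page['chart'] = ui.frame_card(
        box='1 1 8 6',
        title=f'{symbol} daily candlestick',
        content=fig.to_html(include_plotlyjs='cdn'),
    )
    await q.page.save()
```

With the Wave server available, something like `wave run` against this module should serve it; the real tutorial app may be structured differently.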
Posts
Install H2O's Wave on AWS Lightsail or EC2
I recently had to set up H2O’s Wave Server on AWS Lightsail and build a simple Wave App as a Proof of Concept.
If you’ve never heard of H2O Wave, then you’ve been missing out on a cool new app development framework. We use it at H2O to build AI-based apps, and you can too, or use it for all kinds of other interactive applications. It’s like Django or Streamlit, but better and faster to build and deploy.
Posts
Isolation Forests in H2O.ai
A new feature has been added to open source H2O-3: isolation forests. I’ve always been a fan of understanding outliers and love using One-Class SVMs as a method, but isolation forests appear to be better at finding outliers in most cases.
From the H2O.ai blog:
There are multiple approaches to an unsupervised anomaly detection problem that try to exploit the differences between the properties of common and unique observations.
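As a quick illustration, here is a minimal sketch of H2O-3's isolation forest estimator in Python, assuming a hypothetical transactions.csv of numeric features; the 0.8 score cutoff is arbitrary and problem-specific.

```python
import h2o
from h2o.estimators import H2OIsolationForestEstimator

h2o.init()

# Hypothetical unlabeled frame of numeric features.
data = h2o.import_file("transactions.csv")

iso = H2OIsolationForestEstimator(ntrees=100, sample_size=256, seed=42)
iso.train(training_frame=data)

# Each row gets a normalized anomaly score ('predict') and the mean tree path
# length ('mean_length'); shorter paths mean the point was easier to isolate.
scores = iso.predict(data)
anomalies = data[scores["predict"] > 0.8, :]  # the 0.8 cutoff is arbitrary
print(anomalies.head())
```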
Posts
MLI with RSparkling
Last evening my colleague Navdeep Gill (@Navdeep_Gill_) posted a link to his latest talk titled “Interpretable Machine Learning with RSparkling.” Navdeep is part of our MLI team and has a wealth of experience to share about explaining black boxes with modern techniques like Shapley values and LIME.
Machine Learning Interpretability (MLI): H2O has this awesome open source Big Data software called Sparkling Water. It’s similar to RapidMiner’s Radoop, but 1) open source, 2) more powerful, and 3) tested by the masses.
Posts
Interpreting Machine Learning Models
I found this short 8-minute video from H2O World about Machine Learning Interpretability (MLI). It’s given by Patrick Hall, the lead for building these capabilities in Driverless AI.
My notes from the video are below:
- ML as an opaque black box is no longer the case
- Cracking the black box with LIME and Shapley values
- Lloyd Shapley won the Nobel Prize in Economics in 2012 (the values are named after him)
- After a Driverless AI model runs, a dashboard is created
- Shows the complex engineered features and the original features
- Global Shapley values are like feature importance but include negative and positive contributions
- Quickly identify the important features in the dataset
- Then go to partial dependence plots, which show the average prediction of the model across different values of a feature
- Row-by-row analysis of each feature can be done to understand interactions and generate reason codes
- Shapley is accurate for feature contribution; LIME is an approximation (see the sketch after these notes)
- Done via a stacked ensemble model
- Can be deployed via the Python scoring pipeline
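The dashboard described above is part of Driverless AI, but a rough open-source analogue of two of its pieces (per-row Shapley contributions and partial dependence) can be sketched with H2O-3's Python API. The dataset, response column, and 'income' feature below are hypothetical.

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Hypothetical binary-classification data with a 'response' column and an
# 'income' feature; swap in your own frame and column names.
df = h2o.import_file("credit.csv")
df["response"] = df["response"].asfactor()
train, test = df.split_frame(ratios=[0.8], seed=42)

gbm = H2OGradientBoostingEstimator(ntrees=200, seed=42)
gbm.train(y="response", training_frame=train)

# Per-row Shapley contributions (one column per feature plus a BiasTerm),
# the raw material for local reason codes.
contribs = gbm.predict_contributions(test)
print(contribs.head())

# Partial dependence: the model's average prediction across values of a feature.
pdp = gbm.partial_plot(test, cols=["income"], plot=False)
print(pdp)
```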
Posts
Making AI Happen Without Getting Fired
I watched Mike Gualtieri’s keynote presentation from H2O World San Francisco (2019) and found it to be very insightful in a non-technical, MBA sort of way. The gist of the presentation is to really look at all the business connections to doing data science. It’s not just about the problem at hand but rather setting yourself up for success, and as he puts it, not getting fired!
My notes from the video are below (emphasis mine):
Posts
Feature Engineering in Driverless AI
Dmitry Larko, Kaggle Grandmaster and Senior Data Scientist at H2O.ai, goes into depth on how to apply feature engineering in general and in Driverless AI. This video is over a year old and the version of Driverless AI shown is in beta form; the current version is much more developed today.
This is by far one of the best videos I’ve seen on the topic of feature engineering, and not just because I work for H2O.
Posts
What's new in Driverless AI?
Arno, H2O’s CTO, gave a great 1+ hour overview of what’s new with Driverless AI version 1.4.1. If you check back in a few weeks/months, it’ll be even better. In all honesty, I have never seen a company innovate this fast. Below are my notes from the video:
- H2O-3 is the open source product; Driverless AI is the commercial product
- Does the feature engineering for you
- When you have domain knowledge, feature engineering can give you a huge lift
- Salary, job title, zip code example: for people in this zip code with a given # of cars, generate the mean of salaries
- Create out-of-fold estimates; don't let a row's own target leak into its training feature (see the sketch after these notes)
- Written in Python; CUDA and C++ are under the hood, directed by the Python layer
- Able to create good models in an automated way
- Driverless AI does not handle images
- Handles strings, numbers, and categoricals
- Data can be 100s of gigabytes
- Creates 100s of models with 1,000s of new features
- Creates an ensemble model after it's done
- Then creates an exportable model (Java runtime or Python); a C++ version is being worked on
- All standalone models
- Connect with the Python client or via the web browser
- Changelog is on docs.
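To make the out-of-fold idea concrete, here is a small pandas sketch of mean (target) encoding a zip code column against salary, computed out-of-fold so a row never contributes to its own encoding. The data and column names are made up for illustration; Driverless AI's actual transformations are more elaborate, but the leakage-avoidance idea is the same.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Made-up frame with a high-cardinality 'zip_code' column and a 'salary' target,
# mirroring the salary / job title / zip code example from the talk.
df = pd.DataFrame({
    "zip_code": ["02139", "02139", "10001", "10001", "94105", "94105"],
    "salary":   [85000, 91000, 72000, 68000, 120000, 115000],
})

# Out-of-fold mean encoding: each row's encoding is computed from the other
# folds, so a row never sees its own target value.
df["zip_salary_mean"] = float("nan")
kf = KFold(n_splits=3, shuffle=True, random_state=42)
for fit_idx, enc_idx in kf.split(df):
    fold_means = df.iloc[fit_idx].groupby("zip_code")["salary"].mean()
    df.loc[df.index[enc_idx], "zip_salary_mean"] = (
        df.iloc[enc_idx]["zip_code"].map(fold_means)
    )

# Categories unseen in a fold fall back to the global mean.
df["zip_salary_mean"] = df["zip_salary_mean"].fillna(df["salary"].mean())
print(df)
```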
Posts
Open Source
Dear Friend, I’ve been thinking a lot about open source lately. I’ve also been thinking about closed source and open core too. All those words. What do they mean? Why does it sound so important and confusing at the same time?
Selling AI: I’m back in sales now, and you could say that I sell ‘AI’. What a strange thing to say, sell AI. I help sell support and Driverless AI.
Posts
Machine Learning Interpretability in R
In this video the presenter goes over a new R package called ‘iml.’ This package has a lot of power for explaining global and local feature importance. These explanations are critical, especially in the health field and if you’re under GDPR regulations. Now, with the combination of Shapley, LIME, and partial dependence plots, you can figure out how the model works and why.
I think we’ll see a lot of innovation in the ‘model interpretation’ space going forward.
Posts
Matrix Factorization for Missing Values
I stumbled across an interesting reddit post about using matrix factorization (MF) for imputing missing values. The original poster was trying to model a complex time series that had missing values. The solution was to use matrix factorization to impute those missing values.
Since I had never heard of that application before, I got curious and searched the web for information. I came across this post using matrix factorization and Python to impute missing values.
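The idea is easy to sketch with plain NumPy: factor the partially observed matrix into two low-rank factors by fitting only the observed entries, then read the missing cells off the reconstruction. A toy example, not the code from the linked post:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy matrix with missing entries (NaN); for a time series this might be laid
# out as, say, days x sensors.
X = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])
mask = ~np.isnan(X)

# Factor X ~ U @ V.T with rank k, fitting only the observed entries
# by simple gradient descent with a small L2 penalty.
k, lr, reg = 2, 0.01, 0.1
U = rng.normal(scale=0.1, size=(X.shape[0], k))
V = rng.normal(scale=0.1, size=(X.shape[1], k))

for _ in range(2000):
    pred = U @ V.T
    err = np.where(mask, X - pred, 0.0)  # zero out the unobserved cells
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

# Fill the missing cells with the low-rank reconstruction.
X_imputed = np.where(mask, X, U @ V.T)
print(np.round(X_imputed, 2))
```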
Posts
MLI Using LIME Framework
I found this talk to be fascinating. I’ve been a big fan of LIME but never really understood the details of how it works under the hood. I understood that it works on an observation-by-observation basis, but I never knew that it perturbs the data, tests against the black box model, and then builds a simple linear model to explain it. Really cool. My notes are below the video.
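To make that description concrete, here is a small sketch of the LIME idea using NumPy and scikit-learn rather than the lime package itself: perturb around one observation, score the perturbations with the black box, weight them by proximity, and fit a simple linear surrogate whose coefficients act as the local explanation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A stand-in black box; any model with a predict() is treated the same way.
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] ** 2 + X[:, 2] + rng.normal(scale=0.1, size=500)
black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

x0 = X[0]  # the single observation to explain

# 1) perturb the data around the observation,
perturbed = x0 + rng.normal(scale=0.5, size=(1000, X.shape[1]))

# 2) test the perturbations against the black box model,
preds = black_box.predict(perturbed)

# 3) fit a simple, proximity-weighted linear model to explain it locally.
weights = np.exp(-np.linalg.norm(perturbed - x0, axis=1) ** 2 / 0.5)
surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
print("local coefficients:", np.round(surrogate.coef_, 2))
```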
Posts
Exploring H2O.ai
A few years ago RapidMiner incorporated a fantastic open source library from H2O.ai. That gave the platform Deep Learning, GLM, and GBT algos, something it was lacking for a long time. If you were to look at my usage statistics, I’d bet you’d see that the Deep Learning and GLM algos are my favorites.
Just late last year, H2O.ai released their Driverless AI platform, an automated modeling platform that scales easily to GPUs.
Posts
Guide to Getting Started in Data Science
This is the foreword to an updated ultimate guide on getting started in data science. I wanted to write a set of ‘getting started’ posts to share with readers on how I became a data scientist at RapidMiner. How I went from a Civil Engineer with an MBA to working for an amazing startup. Granted, I’m not a classically trained Data Scientist; I hardly knew how to code in the beginning, but with the right tools and attitude, you can ‘huff’ your way into this field.
Newsletter
Hi friends, I would love it if you signed up for my newsletter. I try not to be obtrusive with it and want to genuinely share valuable information with you. This is a Data Science/Machine Learning related newsletter with a few other random topics related to consulting life.
At the rate I’m sending out newsletters it might be once or twice a year! So completely non-obtrusive and gentle.
If you want to sign up for it, you can do so here or below.
Posts
Introduction to Deep Learning
This is a great introduction to Deep Learning. I know I learned a few things from Phillip.
Some key concepts:
- RapidMiner can now do GPU deep learning; supports NVIDIA
- Easy: using the already loaded Neural Net operators
- Harder: using the H2O.ai Deep Learning operator
- Hardest: using Keras with RapidMiner; Keras requires a more complex setup with RapidMiner (see the sketch after this list)
- CNNs, RNNs, LSTMs, etc. are now available via the RapidMiner GUI
- Keras supports TensorFlow, CNTK, and Theano
- Needs Python v3
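The video drives Keras through RapidMiner's operators; for context, here is a minimal standalone Keras sketch (assuming TensorFlow 2.x on Python 3) of the kind of network the operator wraps, trained on made-up data.

```python
import numpy as np
from tensorflow import keras

# Made-up data: 1,000 rows, 20 features, binary label.
X = np.random.rand(1000, 20)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))
```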
Posts
Latest Writings Elsewhere
October 2016: Just a quick list of the content I’ve created someplace other than this blog. The current list is 100% RapidMiner related, but I’d like to branch out into guest posting. If any of my readers would like a guest post from me on their blog or site, please contact me.
- Tips and Tricks: How to use Python and R with Rapidminer
- Tips and Tricks: Different Ways to Join Data
- How to Use Data Science to Predict Qualified Leads

December 2018: I’m happy to announce my very first article went live on the H2O.