Below you will find pages that utilize the taxonomy term “AI Machine Learning”
Posts
Python to the Rescue
I’m evaluating whether I should move this blog to another CMS platform so I can start building a community around it like I had before.
Right now this blog runs on Hugo and AWS Amplify, and it’s freaking awesome. I push new posts to GitHub, AWS pulls the changes and rebuilds the site, and then I can check that it looks fine before I merge into master.
Restarting the Site
I shut Neural Market Trends down on the last day of August thinking it was going to be for good. Things have changed on my end and I’m thinking of restarting this site. I’m thinking of moving back to my roots and building a community.
Why? Because of something I saw a few days ago.
I was on a work call the other day and a prospect was sharing his screen.
Kubernetes & Julia Language
I’m coming up on my 3rd anniversary at H2O.ai, and we’ve embarked on an interesting pivot of sorts. Our new product is based on Kubernetes. At first I thought Kubernetes was just another buzzword and a way for the cloud providers to extract more money from users, but I’ve come to realize that it’s not.
Of course, there are pros and cons with Kubernetes: it’s new and isn’t widely adopted by organizations as of yet, but it’s neat from a container isolation and scalability standpoint.
Time Series for H2O with Modeltime
Notes from the Video
- Matt Dancho, founder of Business Science
- Introduced to H2O-3 via the AutoML package
- Sample code in R shared
- Sample forecasting project / Walmart Sales
- Tidymodels standardizes machine learning packages
- Modeltime loads H2O
- Multiple time series
- Create a forecast time horizon; assess a 52-week forecast
- Create preprocessing steps, which help the H2O algos find good features
- Some columns are normalized during pre-processing
- Extracted time-related features (i.
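Modeltime and its preprocessing recipes are R tools, so the snippet below is only a rough Python analogue of two steps in these notes: building a 52-week forecast horizon and extracting time-related features from each date. The function names, the chosen feature set, and the example date are assumptions for illustration, not Modeltime's API.

```python
from datetime import date, timedelta

def forecast_horizon(last_date, weeks=52):
    """Future weekly dates, starting one week after the last observation."""
    return [last_date + timedelta(weeks=i) for i in range(1, weeks + 1)]

def time_features(d):
    """A small, hypothetical set of calendar features for one date."""
    _, iso_week, iso_weekday = d.isocalendar()
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "iso_week": iso_week,
        "day_of_week": iso_weekday,          # 1 = Monday ... 7 = Sunday
        "day_of_year": d.timetuple().tm_yday,
    }

if __name__ == "__main__":
    horizon = forecast_horizon(date(2012, 10, 26))  # arbitrary example date
    rows = [time_features(d) for d in horizon]
    print(len(rows), horizon[0], rows[0])
```

In a real pipeline these feature rows would be joined to the history and the future frame before handing both to the learner.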
2020 AI Market in Review
2020! What a crazy year! We had the COVID-19 pandemic, major economic shocks, two big IPOs in the immediate ‘AI’ space, and lots of new machine learning libraries and innovation.
Right now there are so many startups in so many different niche markets or services that it’s hard to count. I’ll highlight the ones that I stumbled across.
First is Spell.ml. A buddy of mine ended up there and clued me into what they do.
Facebook & Political Ads Targeting
Talk about scummy: Facebook is getting nervous. NYU researchers @LauraEdelson2 and Damon McCoy were sent a cease and desist letter over their research into how Facebook targets political ads. To say the least, this is a very hot topic considering all the voter suppression and false information that led to the current Administration.
At a minimum, this question SHOULD be asked and it SHOULD be answered to everyone’s satisfaction. Facebook, IMHO, has become a tool for propaganda and manipulation.
Python Tutorials
This is my ongoing list of Python Tutorials. I’m currently merging my various Python Tutorials into one cohesive list below as a way to reduce the amount of posts all over my site. Thank you for your patience and please feel free to drop me a Tweet if you have any comments or questions.
Learning Python Programming the Easy Way
I picked up Python programming when I needed to do something but couldn't figure out how to connect the dots.
H2O Tutorials
Introduction to Driverless AI
I finally posted a new video on my YouTube channel after a year of no activity. It felt good and is part of my ‘content refresh’ project I’m working on. In this video I do an introduction to Driverless AI and its EDA capabilities. The forthcoming videos will go into the training, testing, diagnostics, machine learning interpretability, and much more. Please drop a comment or question in the channel if you have any.
RapidMiner Tutorials
It would appear that a rolling list of all my old and newer RapidMiner Tutorials would be helpful for readers. It should be noted that these tutorials and videos were made for different versions of RapidMiner (versions 5 to 7) and may look a bit old and dated, BUT they are still relevant to this day. The functionality of RapidMiner is still the same, but the internal organization and look will be different.
TensorFlow & Julia on a Jetson Nano
A few months ago I bought a Jetson Nano as a Christmas present for myself. I promptly got busy with work and life and forgot to set it up until a month ago. I followed the instructions on how to flash the unit and got Ubuntu 18 up and running. That was the easy part; installing Julia, TensorFlow, and Python was where it got less easy.
Software is eating the World
I started playing bass guitar when I was in high school. The whole goal was to make our own music and get ‘chicks.’ I won’t lie, I was a horny teenager. Over the course of the years and through college, I played in heavy metal, punk, and improv music bands. I loved every minute of it. We recorded, played live gigs in New York City - even played at CBGB’s twice - and built up a small DIY following.
AI is Evolving
I posted about AutoML-Zero in my newsletter a few weeks ago. I found genetic programming, and applying evolutionary search to machine learning, to be really powerful. Here’s another article that references AutoML-Zero and briefly talks about how AI is evolving with these approaches.
The program discovers algorithms using a loose approximation of evolution. It starts by creating a population of 100 candidate algorithms by randomly combining mathematical operations. It then tests them on a simple task, such as an image recognition problem where it has to decide whether a picture shows a cat or a truck.
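To make the evolutionary loop concrete, here is a toy sketch that evolves tiny "programs" (fixed-length sequences of add/multiply instructions) toward a target function. It loosely follows the regularized-evolution idea (tournament selection, mutate the winner, discard the oldest candidate) with the population of 100 mentioned in the quote; the instruction set, the target, and all parameters are hypothetical and far simpler than AutoML-Zero's real search space.

```python
import random
from collections import deque

# Hypothetical instruction set: each instruction is (op, integer constant).
OPS = {
    "add": lambda v, c: v + c,
    "mul": lambda v, c: v * c,
}

def random_instruction(rng):
    return (rng.choice(list(OPS)), rng.randint(-5, 5))

def random_program(rng, length=3):
    return [random_instruction(rng) for _ in range(length)]

def run(program, x):
    v = x
    for op, c in program:
        v = OPS[op](v, c)
    return v

def error(program, xs, target):
    """Mean squared error on the task the program should solve."""
    return sum((run(program, x) - target(x)) ** 2 for x in xs) / len(xs)

def mutate(program, rng):
    child = list(program)
    child[rng.randrange(len(child))] = random_instruction(rng)
    return child

def evolve(target, population_size=100, cycles=2000, tournament=10, seed=0):
    rng = random.Random(seed)
    xs = [x / 4 for x in range(-8, 9)]
    population = deque()
    for _ in range(population_size):
        prog = random_program(rng)
        population.append((prog, error(prog, xs, target)))
    best = min(population, key=lambda p: p[1])
    initial_best_error = best[1]
    for _ in range(cycles):
        sample = rng.sample(list(population), tournament)
        parent = min(sample, key=lambda p: p[1])[0]   # tournament winner
        child = mutate(parent, rng)
        scored = (child, error(child, xs, target))
        population.append(scored)
        population.popleft()                          # the oldest candidate dies
        if scored[1] < best[1]:
            best = scored
    return best, initial_best_error

if __name__ == "__main__":
    best, start = evolve(lambda x: 3 * x + 2)
    print("initial best error:", start)
    print("evolved program:", best[0], "error:", best[1])
```

Removing the oldest rather than the worst candidate is the "regularized" part: good programs survive only if they keep being re-selected and mutated.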
Groovy over Python?
After a few frustrating events where I had some Python code blow up because of dependencies, I started looking hard at using Groovy going forward.
For some simple things, Groovy and Python are very easy. For example, if I wanted to read the latest sales from Park.IO and print them out, I could do the following in Python on my Mac.
```python
import pandas as pd

# Read the Park.io orders export straight from its URL and print it
df = pd.read_csv('https://park.io/orders/export.csv')
print(df)
```
If I tried that on my old Raspberry Pi, I’d run into dependency issues w.
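Since the trouble on the Pi was dependencies rather than the task itself, it's worth noting that this particular job needs nothing beyond Python's standard library. A minimal sketch, assuming the export is ordinary CSV with a header row (the column names used in any test data are made up, since the real export's columns aren't shown):

```python
import csv
import io
import urllib.request

def parse_orders(csv_text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def fetch_orders(url="https://park.io/orders/export.csv"):
    """Download the export with urllib (no pandas) and parse it."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return parse_orders(text)
```

Calling `for row in fetch_orders(): print(row)` would mirror the pandas version, printing one dict per order instead of a formatted table.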
Groovy for Data Science
I was looking through Reddit the other day and came across an interesting post in r/Groovy. The group itself has been relatively dead for a while but sprang to life with a question about using Groovy for Data Science. Granted, Python IS eating the world right now for Data Science with R right behind it and Java still hanging out with all the Big Data stuff.
I’ve always liked Groovy because it’s very Python-like (high-level-ish) but compiles down to bytecode at runtime.
How to Recognize AI Snakeoil
My colleague posted an interesting Princeton presentation to our group Slack channel. It’s about how AI is being used in social settings such as hiring decisions. I completely get why HR departments and companies want to do this. They get hundreds of resumes and want to weed through them to get to the right candidates. This is where AI can help: optimizing processes to help people make the right decisions.
Go Master Quits, AI too powerful
I just read that Lee Sedol is retiring from the competitive Go world because of AlphaGo.
The South Korean said he had decided to retire after realizing: "I’m not at the top even if I become the number one."
This is a bit sad but understandable. I wrote about Lee’s historic match with AlphaGo a while ago. He ended up losing four games out of five, which was an ass-kicking for the #1 Go player in the world.
AI Trading & Lehman Brothers?
WOW! I posted this over 12 years ago. Talk about being spot on the money! I hate to do this but I’m going to quote myself here:
I firmly believe that data mining, AI, and machine learning trading will accelerate over the years. Who knows, maybe my little model will move markets one day! :)
As far as the trend in AI goes, it’s really getting started now. AI is being adopted by the next tier of companies.
Investing in the S&P500 beats AI Trading
I forgot to link to this image back in early May 2019. It’s from Bloomberg and it makes a lot of sense; it may be eye-opening for some.
Essentially it shows that the market is the be-all and end-all. Of course, the market is an average: some funds or AI traders beat the average, but others underperform it. Over time your little ‘edge’ will eventually underperform, and you have to keep writing new code and new strategies.
Downloading SEC.GOV data
I’ve finally found a way to download SEC.GOV data in a consistent and less stressful way. I want to give the University of Notre Dame Software Repository for Accounting and Finance a shout out for their excellent work. Thanks to them I can finally start taming this beast.
I’ve struggled for years trying to figure out how to download SEC data. That repository is so wacky that it’s hard to find filings in there.
Auto Support Resistance Lines in Forex
On the heels of my last post, I’ve extended those functions to the EURUSD pair. The data starts in 2019 (this year) and runs through yesterday. It’s actually a pretty neat script: it pulls data from Oanda and then generates the support and resistance lines for that particular pair. The next step would be to create a buy/sell order in the Oanda Practice Account. Once I do that, it’s a matter of writing a trading strategy and testing it in real time.
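The post doesn't show the script, so here is a minimal sketch of one common way to pick support and resistance levels: treat a bar as support if it is the lowest price within a window on either side, and as resistance if it is the highest. The window size and the sample prices are hypothetical, and fetching candles from Oanda is left out.

```python
def pivot_levels(prices, window=2):
    """Return (supports, resistances): prices that are the lowest / highest
    value within +/- `window` bars around them."""
    supports, resistances = [], []
    for i in range(window, len(prices) - window):
        neighborhood = prices[i - window:i + window + 1]
        if prices[i] == min(neighborhood):
            supports.append(prices[i])
        if prices[i] == max(neighborhood):
            resistances.append(prices[i])
    return supports, resistances

if __name__ == "__main__":
    # Made-up EURUSD-style closes
    closes = [1.10, 1.12, 1.15, 1.13, 1.11, 1.09, 1.12, 1.14, 1.16, 1.13, 1.12]
    print(pivot_levels(closes))
```

A wider window yields fewer, more significant levels; nearby levels are often merged before being drawn as lines.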
Isolation Forests in H2O.ai
A new feature has been added to H2O-3 open source: isolation forests. I’ve always been a fan of understanding outliers and love using One-Class SVMs as a method, but isolation forests appear to be better at finding outliers in most cases.
From the H2O.ai blog:
There are multiple approaches to an unsupervised anomaly detection problem that try to exploit the differences between the properties of common and unique observations.
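The core idea behind isolation forests is that outliers are easier to separate: random axis-aligned splits isolate them in fewer steps. Below is a small from-scratch sketch of that algorithm (random trees on subsamples, average path length, normalized anomaly score) for illustration only; it is not H2O's implementation, which the H2O-3 Python API exposes as `H2OIsolationForestEstimator`. Tree count, sample size, and seed are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalize tree depths (c(2) is set to 1 by convention)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with a random feature and a random split value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                          # cannot split on a constant feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus an estimate for the unbuilt subtree."""
    if node[0] == "leaf":
        return depth + c(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)

def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; values near 1 mean 'isolated quickly'.
    Needs at least 2 points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return [2 ** (-sum(path_length(x, t) for t in trees) / n_trees / c(psi))
            for x in data]

if __name__ == "__main__":
    cluster = [(i * 0.1, j * 0.1) for i in range(5) for j in range(6)]
    print(isolation_forest_scores(cluster + [(8.0, 8.0)])[-1])
```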
MLI with RSparkling
Last evening my colleague Navdeep Gill (@Navdeep_Gill_) posted a link to his latest talk titled “Interpretable Machine Learning with RSparkling.” Navdeep is part of our MLI team and has a wealth of experience to share about explaining black boxes with modern techniques like Shapley values and LIME.
Machine Learning Interpretability (MLI)
H2O has this awesome open source Big Data software called Sparkling Water. It’s similar to RapidMiner’s Radoop but 1) open source, 2) more powerful, and 3) tested by the masses.
Interpreting Machine Learning Models
I found this short 8 minute video from H2O World about Machine Learning Interpretability (MLI). It’s given by Patrick Hall, the lead for building these capabilities in Driverless AI.
My notes from the video are below:
- ML as an opaque black box is no longer the case
- Cracking the black box with LIME and Shapley values
- Lloyd Shapley, the originator of Shapley values, shared the 2012 Nobel Prize in Economics
- After a Driverless AI model runs, a dashboard is created
- Shows the complex engineered features and the original features
- Global Shapley values are like feature importance and include negative and positive contributions
- Quickly identify the important features in the dataset
- Then go to Partial Dependence Plots, which show the average prediction of the model across different values of a feature
- Row-by-row analysis of each feature can be done to understand interactions and generate reason codes
- Shapley is accurate for feature contribution; LIME is an approximation
- Done via a stacked ensemble model
- Can be deployed via the Python Scoring Pipeline
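To see what "Shapley is accurate for feature contribution" means, here is a brute-force sketch that computes exact Shapley values for one row by averaging each feature's marginal contribution over every coalition of the other features. Treating an "absent" feature as its baseline value is an assumption (one of several possible definitions), and the cost grows exponentially with the number of features, which is why production tools rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one row `x`; absent features take `baseline`."""
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(row)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

if __name__ == "__main__":
    model = lambda r: 2 * r[0] - 3 * r[1] + r[0] * r[2]
    print(shapley_values(model, [1, 2, 3], [0, 0, 0]))
```

The values always sum to `model(x) - model(baseline)`; in the example the interaction term `r[0] * r[2]` is split equally between features 0 and 2.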
Microsoft the AI Powerhouse
I’ve been a long-term shareholder of MSFT and I’ve been rewarded quite well. Under the leadership of CEO Nadella, Microsoft has become an AI powerhouse, and I believe he’ll win the cloud computing wars. Right now Amazon is the dominant player, but based on what I see in Azure’s development, I think both companies will be ‘neck and neck’ in a few short years.
Investors have rewarded the stock with new highs and a $1 trillion valuation (before it retraced to just below that mark).
Machine Learning Making Pesto Tastier
Now this is something I can get behind: using machine learning to make pesto tastier! The article is really about growing basil, the key ingredient in pesto, with higher concentrations of the volatile compounds that affect its taste.
The researchers behind the AI-optimized basil used machine learning to determine the growing conditions that would maximize the concentration of the volatile compounds responsible for basil’s flavor. The study appears in the journal PLOS One today.
TensorFlow and High Level APIs
I got a chance to watch this great presentation on the upcoming release of TensorFlow v2 by Martin Wicke. He goes over the big changes - and there’s a lot - plus how you can upgrade your earlier versions of TensorFlow to the new one. Let’s hope the new version is faster than before! My video notes are below:
TensorFlow
- Since its release, TensorFlow (TF) has grown into a vibrant community
- Learned a lot about how people used TF
- Realized using TF can be painful
- You can do everything in TF, but what is the best way?
- TF 2.
Making AI Happen Without Getting Fired
I watched Mike Gualtieri’s keynote presentation from H2O World San Francisco (2019) and found it to be very insightful in a non-technical, MBA kind of way. The gist of the presentation is to look at all the business connections to doing data science. It’s not just about the problem at hand but about setting yourself up for success and, as he puts it, not getting fired!
My notes from the video are below (emphasis mine):
Functional Programming in Python
I’m spending time trying to understand the differences between writing classes and functions in Python. Which one is better and why? From what I’m gathering, a lot of people are tired of writing classes in general. Classes are used in Object Oriented Programming (OOP), and some Python coders hate it because it means writing many lines of code when only a few really matter. So those programmers prefer functional programming (FP) in Python instead.
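A small, hypothetical illustration of the two styles: the same "total of paid orders" computed once with a class that holds state and once with plain functions composed with `filter` and `functools.reduce`. The order data is made up.

```python
from functools import reduce

# Object-oriented version: the data lives on the instance.
class SalesReport:
    def __init__(self, orders):
        self.orders = orders

    def paid(self):
        return [o for o in self.orders if o["paid"]]

    def total(self):
        return sum(o["amount"] for o in self.paid())

# Functional version: small functions, data passed in explicitly.
def paid(orders):
    return filter(lambda o: o["paid"], orders)

def total(orders):
    return reduce(lambda acc, o: acc + o["amount"], orders, 0)

orders = [
    {"amount": 120, "paid": True},
    {"amount": 80, "paid": False},
    {"amount": 45, "paid": True},
]

print(SalesReport(orders).total())  # 165
print(total(paid(orders)))          # 165
```

The functional version has fewer moving parts and is easy to test piece by piece; the class starts to pay off once there is real state or behavior to bundle together.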
Feature Engineering in Driverless AI
Dmitry Larko, Kaggle Grandmaster, and Senior Data Scientist at H2O.ai goes into depth on how to apply feature engineering in general and in Driverless AI. This video is over a year old and the version of Driverless AI shown is in beta form. The current version is much more developed today.
This is by far one of the best videos I’ve seen on the topic of feature engineering, and not just because I work for H2O.
What's new in Driverless AI?
Arno, H2O’s CTO, gave a great 1+ hour overview of what’s new in Driverless AI version 1.4.1. If you check back in a few weeks or months, it’ll be even better. In all honesty, I have never seen a company innovate this fast. Below are my notes from the video:
- H2O-3 is the open source product
- Driverless AI is the commercial product
- Does feature engineering for you
- When you have domain knowledge, feature engineering can give you a huge lift
- Salary, Job Title, Zip Code example
- What about people in this Zip Code with # of cars? >> generate mean of salaries
- Create out-of-fold estimates
- Don’t take your own prediction feature for training
- Written in Python; CUDA and C++ run under the hood, directed by Python
- Able to create good models in an automated way
- Driverless AI does not handle images
- Handles strings, numbers, and categoricals
- Can be 100’s of gigabytes
- Creates 100’s of models with 1,000’s of new features
- Creates an ensemble model after it’s done
- Then creates an exportable model (Java runtime or Python)
- C++ version is being worked on
- All standalone models
- Connect with the Python client or via the web browser
- Changelog is on docs.
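The Zip Code and salary example is a form of target (mean) encoding, and the "out-of-fold" note is what keeps it honest. Below is a minimal sketch, assuming round-robin fold assignment: each row's encoded value is its category's mean salary computed only from the other folds, so a row's own target never leaks into its feature. It illustrates the general technique, not Driverless AI's internal code.

```python
from collections import defaultdict

def out_of_fold_mean_encoding(categories, targets, n_folds=5):
    """Encode each category by the mean target of the *other* folds."""
    n = len(categories)
    folds = [i % n_folds for i in range(n)]        # round-robin fold assignment
    encoded = [0.0] * n
    for k in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        other_total, other_count = 0.0, 0
        for cat, y, fold in zip(categories, targets, folds):
            if fold != k:                            # statistics from other folds only
                sums[cat] += y
                counts[cat] += 1
                other_total += y
                other_count += 1
        # Unseen categories fall back to the other folds' overall mean
        fallback = other_total / other_count if other_count else 0.0
        for i in range(n):
            if folds[i] == k:
                cat = categories[i]
                encoded[i] = sums[cat] / counts[cat] if counts[cat] else fallback
    return encoded

if __name__ == "__main__":
    zips = ["10001", "10001", "94105", "94105"]
    salaries = [50, 70, 100, 140]                  # made-up numbers
    print(out_of_fold_mean_encoding(zips, salaries, n_folds=2))
```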
Flux Machine Learning for Julia
There was a HUGE announcement on the Julia blog a few days ago. The convergence of a language for machine learning and marrying it with a compiler just got a bit closer. Julia announced Flux, a machine learning framework for Julia.
Julia Language started out with the goal of creating a language that was elegant for computations (i.e. math and machine learning), easy to code in, and able to take advantage of everything the hardware offers through a specialized compiler.
What is Reusable Holdout?
Overfitting and introducing bias during model training are always big topics in data science. Typically you train a model using cross-validation: you build a model on k-1 folds and test it on the remaining fold. That fold is the holdout set, and it will usually work very well if, and only if, the trained model is independent of the holdout set. Under normal circumstances this works well, but you might begin to leak information into the model as the test fold changes.
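One concrete answer to the leakage problem is the Thresholdout mechanism from Dwork et al.'s work on the reusable holdout: answer a query with the training-set estimate unless it differs from the holdout estimate by more than a noisy threshold, and only then reveal a noisy holdout value (spending budget). The class below is a simplified sketch of that idea; the default threshold, noise scale, and budget are arbitrary assumptions.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

class Thresholdout:
    def __init__(self, train, holdout, threshold=0.04, sigma=0.01, budget=100, seed=0):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma, self.budget = threshold, sigma, budget
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + laplace(self.rng, 2 * sigma)

    def query(self, phi):
        """Estimate the mean of phi(row), e.g. a model's accuracy on each row."""
        if self.budget < 1:
            raise RuntimeError("holdout budget exhausted")
        train_mean = sum(map(phi, self.train)) / len(self.train)
        holdout_mean = sum(map(phi, self.holdout)) / len(self.holdout)
        gap = abs(holdout_mean - train_mean)
        if gap > self.noisy_threshold + laplace(self.rng, 4 * self.sigma):
            # The training estimate looks overfit: spend budget, answer with a
            # noisy holdout estimate, and refresh the threshold noise.
            self.budget -= 1
            self.noisy_threshold = self.threshold + laplace(self.rng, 2 * self.sigma)
            return holdout_mean + laplace(self.rng, self.sigma)
        return train_mean
```

Because most answers come from the training set and the holdout is only revealed through noise, the same holdout can be queried many more times before it stops being a fair test.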
Open Source
Dear Friend, I’ve been thinking a lot about open source lately. I’ve also been thinking of closed source and open core too. All those words. What do they mean? Why do they sound so important and confusing at the same time?
Selling AI
I’m back in sales now and you could say that I sell ‘AI’. What a strange thing to say, sell AI. I help sell support and Driverless AI.
Machine Learning Interpretability in R
In this video the presenter goes over a new R package called ‘iML.’ This package has a lot of power when explaining global and local feature importance. These explanations are critical, especially in the health field and if you’re under GDPR regulations. Now, with the combination of Shapley values, LIME, and partial dependence plots, you can figure out how the model works and why.
I think we’ll see a lot of innovation in the ‘model interpretation’ space going forward.
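Partial dependence, one of the techniques mentioned above, is easy to sketch: force one feature to each value on a grid for every row, score with the model, and average. This is a plain-Python illustration with hypothetical names, not the `iml` package's interface.

```python
def partial_dependence(model, rows, feature, grid):
    """Average model prediction when `feature` is forced to each grid value.

    `rows` is a list of feature lists; every row keeps its other features."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature] = value
            preds.append(model(modified))
        curve.append(sum(preds) / len(preds))
    return curve

if __name__ == "__main__":
    model = lambda r: 2 * r[0] + r[1]
    print(partial_dependence(model, [[0, 1], [0, 3]], feature=0, grid=[0, 1, 2]))
```

For this toy linear model the curve is a straight line with slope 2; for a black-box model the shape of the curve is what tells you how the feature is used on average.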
Matrix Factorization for Missing Values
I stumbled across an interesting Reddit post about using matrix factorization (MF) to impute missing values. The original poster was trying to solve a complex time series that had missing values. The solution was to use matrix factorization to impute those missing values.
Since I had never heard of that application before, I got curious and searched the web for information. I came across this post using matrix factorization and Python to impute missing values.
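A minimal sketch of the idea, assuming a generic low-rank model fitted with stochastic gradient descent on the observed cells only (not necessarily the method used in the linked post): learn row factors `U` and column factors `V`, then fill each missing cell with the dot product of its row and column factors. The rank, learning rate, regularization, and epoch count are hypothetical defaults.

```python
import random

def mf_impute(matrix, rank=1, lr=0.01, reg=0.001, epochs=5000, seed=0):
    """Fill None entries with a low-rank factorization fitted by SGD
    on the observed entries only."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j, matrix[i][j]) for i in range(n_rows) for j in range(n_cols)
                if matrix[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, r in observed:
            err = r - sum(U[i][k] * V[j][k] for k in range(rank))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)   # gradient step on row factor
                V[j][k] += lr * (err * u - reg * v)   # gradient step on column factor
    return [[matrix[i][j] if matrix[i][j] is not None
             else sum(U[i][k] * V[j][k] for k in range(rank))
             for j in range(n_cols)] for i in range(n_rows)]

if __name__ == "__main__":
    print(mf_impute([[1, 2, 3], [2, 4, 6], [3, 6, None]]))
```

The toy matrix is exactly rank 1, so the fitted factors predict a value close to 9 for the missing cell; real data is noisier and the rank becomes a tuning choice.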
Why (most) Twitter Bots Suck
I’m going to be the first to admit that I use Python to send out Tweets to my followers. I have a few scripts that parse RSS feeds and do retweets on an hourly basis. They work fine but they do get ‘gamed’ occasionally. That’s the problem with automation, isn’t it? Getting gamed can cause all kinds of havoc for your brand and reputation, so you have to be careful. Has this happened to me?
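The RSS half of such a script can be done with the standard library. This hypothetical sketch parses feed items and applies two simple guards before anything would be tweeted: skip links already posted and skip titles containing blocked words. The blocklist is an illustrative assumption, not a complete defense against gaming, and posting to Twitter is not shown.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items

def tweetable(items, already_posted, blocked_words):
    """Drop links already tweeted and titles containing blocked words."""
    keep = []
    for title, link in items:
        lowered = title.lower()
        if link in already_posted:
            continue
        if any(word.lower() in lowered for word in blocked_words):
            continue
        keep.append((title, link))
    return keep

SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>New H2O release</title><link>https://example.com/a</link></item>
<item><title>FREE crypto giveaway!!!</title><link>https://example.com/b</link></item>
<item><title>Old post</title><link>https://example.com/c</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(tweetable(parse_rss(SAMPLE_FEED), {"https://example.com/c"}, ["giveaway"]))
```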
MLI Using LIME Framework
I found this talk to be fascinating. I’ve been a big fan of LIME but never really understood the details of how it works under the hood. I understood that it works on an observation-by-observation basis, but I never knew that it perturbs the data, tests the perturbed samples against the black-box model, and then builds a simple linear model to explain it. Really cool. My notes are below the video.
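The three steps described above (perturb, score with the black box, fit a simple linear model) can be sketched in plain Python. This version perturbs with Gaussian noise, weights samples with an exponential proximity kernel, and solves weighted least squares; those choices are assumptions, and the actual `lime` library differs in its sampling, kernel, and feature selection.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lime_explain(black_box, instance, n_samples=500, noise=1.0, kernel_width=1.0, seed=0):
    """Local linear surrogate around `instance`; returns [intercept, coef_1, ...]."""
    rng = random.Random(seed)
    X, y, w = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0, noise) for v in instance]          # 1. perturb
        dist2 = sum((a - b) ** 2 for a, b in zip(z, instance))
        X.append([1.0] + z)
        y.append(black_box(z))                                   # 2. score with black box
        w.append(math.exp(-dist2 / kernel_width ** 2))           # 3. proximity weight
    # 4. Weighted least squares via normal equations: (X^T W X) beta = X^T W y
    p = len(instance) + 1
    A = [[sum(w[s] * X[s][i] * X[s][j] for s in range(n_samples)) for j in range(p)]
         for i in range(p)]
    b = [sum(w[s] * X[s][i] * y[s] for s in range(n_samples)) for i in range(p)]
    return solve(A, b)

if __name__ == "__main__":
    # Nonlinear black box: the local slope of z0**2 near z0 = 1 is about 2
    print(lime_explain(lambda z: z[0] ** 2 + z[1], [1.0, 0.0], noise=0.1))
```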
Exploring H2O.ai
A few years ago RapidMiner incorporated a fantastic open source library from H2O.ai. That gave the platform Deep Learning, GLM, and GBT algos, something it had lacked for a long time. If you were to look at my usage statistics, I’d bet you’d see that the Deep Learning and GLM algos are my favorites.
Late last year H2O.ai released its Driverless AI platform, an automated modeling platform that can scale easily to GPUs.
Beta Testing an Instagram Hashtag Tool
Continuing the stream of consciousness from my Working with Instagram API, JSONPath, and RapidMiner post, I started beta testing a new and improved Instagram Hashtag Tool. I’ve even opened it up to a few beta testers (ping me if you want to try it). It uses a RapidMiner Server on the backend to watch a Dropbox folder. Once you put a text file into the ‘In’ folder, it triggers a process and spits back a spreadsheet into the ‘Out’ folder.
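Stripped of RapidMiner and Dropbox, the workflow is a folder watcher: poll an ‘In’ folder, process each new text file, and write a spreadsheet to ‘Out’. The sketch below shows that pattern with the standard library; the hashtag counting is a hypothetical placeholder for the real Instagram processing, and a CSV file stands in for the spreadsheet.

```python
import csv
import re
import tempfile
from collections import Counter
from pathlib import Path

def process_file(txt_path, out_dir):
    """Hypothetical placeholder job: count hashtags and write them to CSV."""
    text = txt_path.read_text(encoding="utf-8").lower()
    tags = re.findall(r"#\w+", text)
    out_path = out_dir / (txt_path.stem + ".csv")
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["hashtag", "count"])
        writer.writerows(Counter(tags).most_common())
    return out_path

def poll_once(in_dir, out_dir):
    """One polling pass: handle every .txt in 'In' that has no result in 'Out'."""
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for txt in sorted(in_dir.glob("*.txt")):
        if not (out_dir / (txt.stem + ".csv")).exists():
            produced.append(process_file(txt, out_dir))
    return produced

if __name__ == "__main__":
    # Demo in a throwaway directory; a real watcher would call poll_once
    # in a loop with time.sleep() between passes.
    root = Path(tempfile.mkdtemp())
    (root / "In").mkdir()
    (root / "In" / "job.txt").write_text("#pesto #basil #Pesto", encoding="utf-8")
    print(poll_once(root / "In", root / "Out"))
```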
Guide to Getting Started in Data Science
This is the foreword to an updated ultimate guide on getting started in data science. I wanted to write a set of ‘getting started’ posts to share with readers how I became a data scientist at RapidMiner: how I went from a Civil Engineer with an MBA to working for an amazing startup. Granted, I’m not a classically trained data scientist, and I hardly knew how to code in the beginning, but with the right tools and attitude, you can ‘huff’ your way into this field.
My Chinese Big Brother - Part 2
Coming on the heels of “I told you so,” China is using facial recognition to make sure you’re a good Chinese citizen.
Authorities in Shenzhen, China, have set up artificial intelligence-powered CCTV cameras to scan the faces of those who jaywalk at major intersections and display their identities on large LED screens for all to see.
If that isn’t punishment enough, plans are now in place to link the current system with cellular technology, so offenders will also be sent a text message with a fine as soon as they are caught crossing the road against traffic lights.
Data Science & Machine Learning
When I first taught myself ‘data science,’ there wasn’t a lot on the Internet to help me. I spent years cobbling information together, reading what I could find about it. Now there’s a plethora of Data Science and Machine Learning education available: forums, open source libraries, and much, much more. Most of it is free and damn good. There’s no better time for a non data scientist or machine learning wannabe to learn about it, if you’re willing to put in the time.
Build a Machine Learning Framework
Great article by Florian Cäsar on how his team developed a new machine learning framework. From scratch. In 491 steps!
He sums up the entire process in this great quote:
| *From images, text files, or your cat videos, bits are fed to the data pipeline that transforms them into usable data chunks and in turn to data sets,*
| *which are then fed in small pieces to a trainer that manages all the training and passes it right on to the underlying neural network,*
| *which consists of many underlying neural network layers connected through an arbitrarily linear or funky architecture,*
| *which consist of many underlying neurons that form the smallest computational unit and are nudged in the right direction according to the trainer’s optimiser,*
| *which takes the network and the transient training data in the shape of layer buffers, marks the parameters it can improve, runs every layer, and calculates a “how well did we do” score based on the calculated and correct answers from the supplied small pieces of the given dataset according to the optimiser’s settings,*
| *which computes the gradient of every parameter with respect to the score and then nudges the individual neurons correspondingly,*
| *which then is run again and again until the optimiser reports results that are good enough as set in a rich criteria and hook system,*
| *which is based on global and local nested parameter-identifier-registries that contain the shared parameters and distribute them safely to all workers*
| *which are the actual workhorses of the training process that do as their operator says using individual and separate mathematical backends,*
| *which use the layer-defined placeholder computation graphs and put in the raw data and then execute it on their computational backend,*
| *which are all also managed by the operator that distributes the worker’s work as needed and configured and also functions as a coordinator to the owning trainer,*
| *which connects the network, the optimiser, the operator, the initialisers,*
| *which tell the trainer with which distribution to initialise what parameters, which work similar to hooks that act as a bridge between them all and communicate with external things using the Sigma environment,*
| *which is the container and laid-back manager to everything that also supplies and runs these external things called monitors,*
| *which can be truly anything that makes use of the training data and*
| *which finally display the learned funny cat image*
| *… from the hooks from the workers from their operator from its assigned network from its dozens of layers from its millions of individual neurons derived from some data records from data chunks from data sets from data extractors.*
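Stripped down to a single "neuron", the loop in that quote (data sets fed in small pieces, a "how well did we do" score, gradients that nudge the parameters, repeat until good enough) looks roughly like the sketch below. It fits `y = w * x + b` with mini-batch gradient descent; it illustrates the general pattern, not Sigma's API, and the hyperparameters are arbitrary.

```python
import random

def batches(data, batch_size, rng):
    """Shuffle the data set and yield it in small pieces."""
    data = list(data)
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def train(data, lr=0.05, batch_size=8, max_epochs=500, good_enough=1e-4, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # initialiser
    loss = float("inf")
    for _ in range(max_epochs):
        for batch in batches(data, batch_size, rng):
            # Gradient of the batch's mean squared error w.r.t. w and b
            grad_w = grad_b = 0.0
            for x, y in batch:
                err = (w * x + b) - y
                grad_w += 2 * err * x / len(batch)
                grad_b += 2 * err / len(batch)
            # Optimiser step: nudge each parameter against its gradient
            w -= lr * grad_w
            b -= lr * grad_b
        # "How well did we do" score on the whole data set
        loss = sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)
        if loss < good_enough:                       # stopping criterion
            break
    return w, b, loss

if __name__ == "__main__":
    points = [(x / 10, 2 * (x / 10) + 1) for x in range(-20, 21)]
    print(train(points))
```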
How StockTwits Uses Machine Learning to Make Better Products
A fascinating behind-the-scenes interview with StockTwits’ Senior Data Scientist Garrett Hoffman. He shares great tidbits on how StockTwits uses machine learning for sentiment analysis. I’ve summarized the highlights below:
- Idea generation is a huge barrier for active trading
- The next generation of traders uses social media to make decisions
- Garrett solves data problems and builds features for the StockTwits platform
- This includes production data science, product analytics, and insights research
- Understanding social dynamics makes for a better user experience
- Focus is to understand the social dynamics of the StockTwits (ST) community
- Focuses on what’s happening inside the ST community
- ST’s market sentiment model helps users with decision making
- Users tag content as bullish or bearish
- Only 20 to 30% of content is tagged
- Using ST’s market sentiment model increases coverage to 100%
- For data science work, the Python stack is used
- Uses NumPy, SciPy, Pandas, and Scikit-Learn
- Jupyter Notebooks for research and prototyping
- Flask for API deployment
- For deep learning, uses TensorFlow with AWS EC2 instances
- Can spin up GPUs as needed
- Deep learning methods used are Recurrent Neural Nets, Word2Vec, and Autoencoders
- Stays abreast of new machine learning techniques from blogs, conferences, and Twitter
- Follows Twitter accounts from Google, Spotify, Apple, and small tech companies
- One area ST wants to improve on is DevOps around data science
- Bridge the gap between the research/prototype phase and embedding it into the tech stack for deployment
- Misconception that complex solutions are best
- Complexity is ONLY OK if it leads to deeper insight
- Simple solutions are best
- Future long-term ideas: use AI around natural language
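StockTwits' production model uses deep learning, but the coverage idea (train on the 20 to 30% of messages users tagged, then label the rest) can be illustrated with a much simpler stand-in: a multinomial Naive Bayes classifier. The messages below are made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase words, keeping $cashtags together."""
    return re.findall(r"[a-z$']+", text.lower())

class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        def log_score(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_priors[c] + sum(
                math.log((self.counts[c][tok] + 1) / denom) for tok in tokenize(text))
        return max(self.classes, key=log_score)

# Made-up messages standing in for the user-tagged share of the stream.
tagged = ["$AAPL breaking out, buying more calls", "strong earnings, going long",
          "this is going to tank, selling", "weak guidance, short it"]
tags = ["bullish", "bullish", "bearish", "bearish"]
model = NaiveBayesSentiment().fit(tagged, tags)

if __name__ == "__main__":
    print(model.predict("buying calls before earnings"))
```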
Fraud Analytics in RapidMiner
This is a great video presentation on Fraud Analytics use case with RapidMiner. See my notes below.
## Some key concepts
The more complex the model, the lower the training error but the higher the test error. Simple models are better; try explaining them to children. Data scientists understand the technical aspects but need to communicate results with analysts. Sell results to businesses. Tie $ to the results. Speak the same language as the business. Map performance metrics to business-related figures.
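"Tie $ to the results" can be as simple as translating a confusion matrix into dollars. The cost model here (a fixed review cost per flagged transaction and an average loss per fraud) and all the numbers are hypothetical.

```python
def business_value(tp, fp, fn, avg_fraud_loss, review_cost):
    """Hypothetical cost model for a fraud classifier's confusion matrix."""
    saved = tp * avg_fraud_loss          # fraud caught before it costs money
    review = (tp + fp) * review_cost     # every flagged case gets reviewed
    missed = fn * avg_fraud_loss         # fraud the model let through
    return {"saved": saved, "review_cost": review,
            "missed_loss": missed, "net_benefit": saved - review}

# net_benefit = 80 * 500 - 280 * 10 = 37200
print(business_value(tp=80, fp=200, fn=20, avg_fraud_loss=500, review_cost=10))
```

Framed this way, a model with slightly lower accuracy can still win if it flags fewer false positives that each cost analyst time.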
AlphaGo Zero learns on its own
The news dropped that Google’s new implementation of AlphaGo, called AlphaGo Zero, was able to learn completely on its own. No training set of human games was used; rather, it built its own training data by playing against itself, and it went on to beat the older AlphaGo.
Earlier versions of AlphaGo were taught to play the game using two methods. In the first, called supervised learning, researchers fed the program 100,000 top amateur Go games and taught it to imitate what it saw.
Introduction to Deep Learning
This is a great introduction to Deep Learning. I know I learned a few things from Phillip.
Some key concepts
- RapidMiner can now do GPU deep learning; supports NVIDIA
- Easy: using the already loaded Neural Net operators
- Harder: using the H2O.ai Deep Learning operator
- Hardest: using Keras with RapidMiner
- Keras requires a more complex setup with RapidMiner
- CNN, RNN, LSTM, etc. are now available via the RapidMiner GUI
- Keras supports TensorFlow, CNTK, and Theano
- Needs Python v3
Orange 3 is impressive
I’ve been keeping a lazy eye on Orange over the years, and its (fairly) recent update has made it quite an impressive contender in the Data Science visual platform space. While it’s not RapidMiner, it does have a lot of great things going for it. First, its entire core was rewritten to tightly integrate with Scikit-Learn and Python. It has a decent time series ‘add-on’, which comes stock with ARIMA. It has a really good Text Processing ‘add-on’ that gives the user finer control than RapidMiner’s, and it has a great Geo Map built in.
Weaponizing AI
File this under “no shit Sherlock,” but hackers are already weaponizing machine learning.
The AI, named SNAP_R, sent simulated spear-phishing tweets to over 800 users at a rate of 6.75 tweets per minute, luring 275 victims. By contrast, Forbes staff writer Thomas Fox-Brewster, who participated in the experiment, was only able to pump out 1.075 tweets a minute, making just 129 attempts and luring in just 49 users. via Gizmodo
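Working the quoted numbers through shows where the AI's edge came from. Taking 800 as the denominator (the article says "over 800", so SNAP_R's true rate is at most this):

```latex
\begin{aligned}
\text{SNAP\_R success rate} &\le \tfrac{275}{800} \approx 34\% \\
\text{Human success rate} &= \tfrac{49}{129} \approx 38\% \\
\text{Tweets-per-minute ratio} &= \tfrac{6.75}{1.075} \approx 6.3
\end{aligned}
```

So the human's hit rate was comparable, even slightly higher, but SNAP_R sent roughly six times as many tweets per minute: the advantage is scale.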
Product Qualified Lead Model
One of the big corporate strategy things I worked on was developing and putting into production a PQL model. It was essentially a propensity-to-buy model that analyzed usage patterns on the RapidMiner software platform and bucketed new downloaders into those that were likely to buy or not buy. It was incredibly successful and helped the sales team focus on their leads better.
My former colleague Tom shares his thoughts on it since it’s been in production for over a year now.
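A propensity-to-buy model like this is, at its simplest, a binary classifier on usage features. The sketch below is a hypothetical stand-in, not the actual PQL model: a logistic regression trained with batch gradient descent on two made-up usage features, plus a threshold that buckets new downloaders.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, row):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on log loss; returns [bias, w1, ..., wd]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def bucket(w, row, threshold=0.5):
    """Route a new downloader into a 'likely to buy' or 'not likely' bucket."""
    p = predict_proba(w, row)
    return ("likely to buy" if p >= threshold else "not likely"), p

# Hypothetical usage features: [sessions in first week, processes run]
X = [[5, 3], [6, 4], [4, 3], [1, 0], [0, 1], [1, 1]]
y = [1, 1, 1, 0, 0, 0]
weights = fit_logistic(X, y)

if __name__ == "__main__":
    print(bucket(weights, [5, 4]), bucket(weights, [0, 0]))
```

In practice the threshold is a business decision: lowering it hands sales more leads at the cost of more false positives.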
What Works; What Doesn't Work
An important lesson I’ve learned while working at a Startup is to do more of what works and jettison what doesn’t work, quickly. That’s the way to success, the rest is just noise and a waste of time. This lesson can be applied to everything in life.
Data is your friend
We generate data all the time; whether or not it’s captured in a database or spreadsheet, just by being alive you throw off data points.
Is it Possible to Automate Data Science?
A few months ago I read about a programmer that automated his job down to the point where the coffee machine would make him lattes! Despite the ethical quandary, I thought it was pretty cool to automate your job with scripts. Then I wondered, was it possible to automate data science? Or at least parts of it? This general question proved to be a rabbit hole of exploration.
StackExchange has an ongoing discussion into another programmer’s automation of his tasks.
Millennials can't catch a break!
This is just nuts. Millennials just can’t seem to catch a break. Now AI is coming for their jobs.
Research released by Gallup on Thursday indicates a collision between technology and “business as usual” is coming soon, and the fallout will be ugly, especially for Millennials. Automation and artificial intelligence (AI) are among the most disruptive forces descending upon the workplace, says the Gallup report, and 37% of Millennials “are at high risk of having their job replaced by automation, compared with 32% of those in the two older generations.
Process Mining
Let’s talk Process Mining and Engineering. Why? Because it’s the silver bullet that many Engineering firms are looking for but have never found. Let me explain.
I’ve spent years working at small and large Engineering firms and they hardly made any money. Engineering firms are notorious for making 1 to 3% net margin at the end of the day. The joke was that the only reason these firms exist is to keep people employed!
Big Data and Infrastructure
I have a daily downtime routine. Every evening I set aside about an hour and think. I sit or walk around the house and ruminate about all sorts of random things. Sometimes it’s with a glass of wine and more often it’s with a cup of black tea and milk. Sometimes my mind wanders to what I did that day or what I didn’t finish. Other times I get inspired to write a new blog post or create a new tutorial.
Machine Learning on a Raspberry Pi
It looks like Google is catching up to the idea of machine learning on a Raspberry Pi! Someone put RapidMiner on a Pi back in 2013 but it was slow because the Pi was underpowered.
The Pi has been a great thin client and a small, but capable server. I’ve used it for my Personal Weather Station project and as an FTP server. Based on the news, things are about to get interesting for both Google and Raspberry Pi.
Posts
My Chinese Big Brother
One of my Asian friends recently posted a link to a terrifying use of Machine Learning. This is what I call the “dark side” of this field, the use of machine learning by a government to make you behave a certain way.
1984’s Big Brother
China is building its own version of 1984’s Big Brother, a massive scoring system that’s probably a large-scale classification algorithm sitting on top of a Big Data platform like Hadoop.
Posts
Latest Writings Elsewhere
October 2016
Just a quick list of the content I’ve created someplace other than this blog. This current list is 100% RapidMiner related, but I’d like to branch out into guest posting. If any of my readers would like a guest post from me on their blog or site, please contact me.
Tips and Tricks: How to use Python and R with Rapidminer
Tips and Tricks: Different Ways to Join Data
How to Use Data Science to Predict Qualified Leads
December 2018
I’m happy to announce my very first article went live on the H2O.
Posts
Prescriptive Analytics and My Heart
The world right now is awash in Predictive Analytics, the mystery of Big Data, and the rise of the glorious and magical Data Scientist. Most of the time we hear these buzz words in relation to some marketing campaign, election, or credit score application process, but what about applying these tools and people to a project that can benefit the welfare of humanity?
Well, one group of data scientists and a healthcare provider in Washington State are doing just that.
Posts
Big Data's Dirty Little Secret
Do you know what you’re doing with your Big Data? Based on my experience, only a small group of companies really do.
There’s just one little problem, as Jason Waxman, vice president and general manager of Intel’s cloud platforms group, told investors in a webcast last week.
“This is the dirty little secret about big data: No one actually knows what to do with it,” said Waxman, as ComputerWorld reported (emphasis mine).
Posts
Churn Models, Cable TV, and My Wife
Churn models are not new in the analytics world; they’re heavily used by mobile telcos and other corporations that want to keep their loyal customers happy and win back customers who are on the brink of churning. In some cases, these models help identify a group of customers who are such a pain in the butt to keep that it’s better to let them go.
Makes business sense, right? Keep the best customers happy, then split the customers who are about to leave into those you can save and those you can’t.
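The “save them or let them go” logic can be sketched as a simple rule applied on top of a churn model’s output. This is a minimal Python sketch under stated assumptions, not any particular telco’s method: the `Customer` fields, the 0.5 threshold, and the value-versus-cost rule are all hypothetical, and the churn probabilities are assumed to come from an already-trained model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # assumed output of some churn model, 0.0 to 1.0
    expected_value: float     # hypothetical future revenue if the customer stays
    retention_cost: float     # hypothetical cost of an offer to keep them


def segment(customer: Customer, churn_threshold: float = 0.5) -> str:
    """Place a customer into one of three groups using illustrative rules."""
    if customer.churn_probability < churn_threshold:
        return "keep happy"  # not likely to leave
    if customer.expected_value > customer.retention_cost:
        return "save"        # likely to leave, but worth the retention spend
    return "let go"          # likely to leave and costs more to keep than they bring in


customers = [
    Customer("A", churn_probability=0.1, expected_value=500.0, retention_cost=50.0),
    Customer("B", churn_probability=0.8, expected_value=400.0, retention_cost=100.0),
    Customer("C", churn_probability=0.9, expected_value=60.0, retention_cost=150.0),
]
for c in customers:
    print(c.name, segment(c))
# A keep happy
# B save
# C let go
```

In a real project the threshold and the value estimates would come from the business, not from the model itself.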
Posts
Betting on RapidMiner in a Big Way
This past Friday I resigned from my position as a Civil Engineering Manager at SYSTRA, my employer of the last 6+ years. I did this because the opportunity of a lifetime knocked on my door, an opportunity that will give me a chance to pursue my passion in an exciting and growing field. In short, an opportunity to follow my dreams.
I've accepted a position as a senior consultant at RapidMiner, in their Boston headquarters, and I couldn't be more excited about this.
Posts
The Power of Tinkering
I finally got my Personal Weather Station (PWS) to upload current weather data to Wunderground last night. You have no idea how happy this made me, considering I started this project over a year ago but then got interrupted by “life.”
I dedicated my first Raspberry Pi to Bitcoin mining, so I needed a second one (Pis are addictive, and cheap) to finally get my PWS up and running on the Internet.
Posts
Raspberry Pi Tutorials
Below you’ll find a merged set of posts on my Raspberry Pi Tutorials and fun links I found. The Raspberry Pi is a cheap, credit-card-sized computer that helps kids ‘hack’ and learn computing. It’s a great way to build lots of fun projects too. I hope to add to this list as time goes on, but I thought my readers would like to see an updated list from my archives.
Posts
What is the WhiBo plugin for Rapidminer?
Today’s guest post, about an awesome new plugin for RapidMiner, is from Milan Vukicevic. Although I walked in at the very end of his presentation at RCOMM 2010, I sat down with Milan on my last day and he gave me a personal demo of WhiBo. The main application I see for this plugin in the financial world is its ability to build algorithms on new data, find patterns, and tweak parameters in ways that were never possible before.
Posts
Using the SVM RBF Kernel
Wow, I’m happy to announce that today is the first of a two-part guest post series. Today’s guest post is by Marin Matijas, who gave a presentation at RCOMM 2010 about Short Term Load Forecasting using Support Vector Machines (SVM). I asked Marin to elaborate a little on his use of the Radial Basis Function (RBF) kernel in RapidMiner’s SVM operator, and here’s what he had to say! I did edit the post a bit for readability.
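For readers new to the RBF kernel: it scores how similar two data points are as exp(-gamma * ||x - y||^2), which is the form LIBSVM uses, with gamma as the parameter you tune. Below is a minimal Python sketch of the kernel function only; it is not Marin’s load-forecasting model or RapidMiner’s operator, and the example points and gamma values are arbitrary choices for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2), the form LIBSVM uses."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Squared Euclidean distance between the two feature vectors
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical points always score 1.0; similarity decays with distance.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))             # 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1), about 0.368
# A larger gamma makes similarity fall off faster for the same pair of points.
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=2.0))  # exp(-4), about 0.018
```

Because gamma controls how quickly similarity drops off, it is usually tuned together with the SVM’s C parameter, for example by cross-validation.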
Posts
OECD Factbook eXplorer
I think I’m going to start a new weekly data visualization feature on Neural Market Trends, starting with the OECD. I love websites that take complex, sometimes boring data and display it in a visually pleasing way. This week’s data visualization feature is the OECD Factbook eXplorer, which is maintained by the Organization for Economic Co-operation and Development.
The OECD is an organization that compiles all kinds of data for many countries, and the eXplorer then lets you display that data on a Flash-based map.
Posts
Humans No Match for Go Bot Overlords?
During my time away from blogging, I’ve found a new distraction, an addiction really, to keep my overactive mind busy. It’s called Go, an ancient strategy game that’s really big in Asia but catching on here. It’s a fun game, and I routinely get beaten online by 6-year-old children from all over the world.
So, what does Go have in common with neural nets and AI? Quite a lot, actually, because programmers are working frantically to build a Go program that can beat humans.
Posts
SVM Kernel Application
I promised my readers over a month ago that I would post about YALE/RapidMiner’s LibSVM operator. Unfortunately, life got in the way, so I’m resorting to a multi-part series just to get the information out to you. Bear with me over the next few days (or weeks) as I write about this exciting, powerful, and complicated learner.
First off, I use the LibSVM operator in YALE 3.
Posts
Guide to AI Transformation
Thirteen years ago I made a simple post about building neural net models. Since then, the field of applied machine learning and data science has changed a great deal. In a single decade, new tools and the open source movement have altered how companies do ‘AI.’ They moved from mining data in databases to building data lakes and clusters. Now, companies large and small are seeking ways to harness these tools under the umbrella of ‘AI.’
Posts
Automated Trading
In my old blog, Digital Breakfast, I posted a few times about my desire to build an Automated Trading System (ATS) using Excel. I figured I’d build it in Excel since I know that software best, and, conveniently enough, Interactive Brokers offers an API for Excel. Today, I located my old real-time ETF Trend Signal system, which generates trend signals from statistical performance measures, and decided to begin additional back-end development on it.
Posts
Wall Street Using AI To Trade
I heard this first on Bloomberg Radio and then found the article. It’s about the ever-increasing use of data mining and AI in the financial markets.
In his cubicle overlooking the trading floor, Kearns, 44, consults with Lehman Brothers traders as Ph.D.s tap away at secret software. The programs they’re writing are designed to sift through billions of trades and spot subtle patterns in world markets.
Kearns, a computer scientist who has a doctorate from Harvard University, says the code is part of a dream he’s been chasing for more than two decades: to imbue computers with artificial intelligence, or AI.