Coding RapidMiner in Python

Back in middle school we learned about log tables. We learned how to look up values in the table, interpolate between them, and then use the result in our equations. Later on they allowed us to use calculators, which made our lives easier and faster.

Fast forward many years to a Sunday morning this October. I was at my dining table with my laptop open, fooling around with Pandas and IPython Notebooks (aka Jupyter). I wanted to see how long it would take me to transform a customer data set (i.e. ETL and generating new attributes) and then run a simple k-NN cross-validation. This is a routine and fast task in RapidMiner, but I wanted to code it the hard way and see how long it would take me.
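For anyone curious what the hand-coded version of that task looks like, here is a minimal sketch. It is not the notebook from that Sunday morning: the customer table is synthetic, and the column names, the derived attribute, k=5, and the 10 folds are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_customers(n=200, seed=0):
    """Build a small synthetic customer table (all columns are hypothetical)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "total_spend": rng.gamma(shape=2.0, scale=150.0, size=n).round(2),
        "num_orders": rng.integers(1, 20, size=n),  # 1..19, so never zero
        "days_since_last_order": rng.integers(0, 365, size=n),
    })
    # Made-up label: customers who have not ordered in a long time count as churned.
    # It is derived from a feature, so the scores below say nothing about real data.
    df["churned"] = (df["days_since_last_order"] > 180).astype(int)
    return df


def add_features(df):
    """The 'generate new attributes' step: derive a column from existing ones."""
    out = df.copy()
    out["avg_order_value"] = out["total_spend"] / out["num_orders"]
    return out


def knn_cross_validate(df, target="churned", k=5, folds=10):
    """Scale the features, fit k-NN, and return one accuracy score per fold."""
    X = df.drop(columns=[target])
    y = df[target]
    # k-NN is distance based, so scaling is done inside the pipeline per fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    return cross_val_score(model, X, y, cv=folds)


customers = add_features(make_customers())
scores = knn_cross_validate(customers)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Putting the scaler inside the pipeline is a design choice: it keeps each fold's test rows out of the scaling statistics, which is the sort of detail you have to remember yourself when you build the process by hand.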

Mind you, I’m not a Python coder. I learned how to cobble together scripts when I needed them, and I’m a novice at best. But with a bit of coaching from my friend, I was able to put together this routine process in about 3 hours. I did have a few hiccups, though. I had to alter my thought process when I was using Pandas/Scikit-Learn, but I persevered.

Granted, a seasoned Python coder could do this in about 30 minutes, but it was a big accomplishment for me.

This little exercise did teach me a few things. It taught me that Pandas and Scikit-learn aren’t hard and that I could do it. It taught me that this old dog can learn new tricks, a theory I like to confirm from time to time. It taught me that RapidMiner saves you a ridiculous amount of time in model building. Finally, it taught me that a data scientist with coding skills can easily make the transition to RapidMiner. In fact, I think there is a bigger benefit going from a coding environment to a code-free environment. It's much like learning log tables first and then using a calculator.



Date
October 28, 2015