Scripting in RapidMiner Part I - Macros

A great Medium article from my former colleague Martin Liebig (Schmitz). In this article I would like to show how you can access and change RapidMiner variables (called macros, which are always strings). We will opt for a Groovy script. The advantage of Groovy over the other methods is that you can access RapidMiner’s objects directly. With the other options you transfer data into their respective data frames, transform it, and pass it back. Groovy is far more direct in this sense. - Via Medium ...

RapidMiner Tutorials

I thought a rolling list of all my old and newer RapidMiner tutorials would be helpful for readers. Note that these tutorials and videos were made for different versions of RapidMiner (versions 5 to 7) and may look a bit dated, but they are still relevant today. The functionality of RapidMiner is the same, but the internal organization and look will be different. Most of the operators have the same names, but things like the Time Series plugin have now been fully incorporated into RapidMiner Studio. ...

Getting Started in Data Science Part 2

I’m finally getting around to writing Part 2 of Getting Started in Data Science. The first part can be found here. I made suggestions for university students interested in the field of Data Science. I even made a video about it too. Pick Two, Master One: Pick two computer languages, become proficient in one, and master the other. Or, pick a platform like H2O-Flow or RapidMiner and a language. Become a master at one but proficient in the other. This way you can set yourself apart from other students or applicants. ...

Why I Left RapidMiner

For those who are wondering why I left RapidMiner, my dream job, there are no gory details to share. The simple reason is I got burnt out. My time at RapidMiner brought some of the best learning and growth years of my entire professional career. I solved problems, made presentations to C-suite people, and worked with some of the best talent. The flipside of this was that it wasn’t easy and it sure as hell wasn’t a smooth ride. I worked through some of the most tumultuous years at RapidMiner. We had 3 years of management changes and radical 180-degree strategy changes while I was there. All this ‘chaos’ eventually took its toll on me. ...

Python, RapidMiner, and Carriage Returns

I’ve been working on some Python code for a RapidMiner process. What I want to do is simplify my Instagram Hashtag Tool and make it go faster. Part of that work is extracting the Instagram comments for text processing. I ran into utter hell trying to export those comments into a CSV file that RapidMiner could read. It was exporting the data just fine, but the comments contained carriage returns. For some strange reason, RapidMiner cannot read data with carriage returns in a cell. It can only read the first line. Luckily, with the help of some users, I managed to find a workaround: do all the carriage return stripping on my end before export. ...
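A minimal Python sketch of that workaround, not the code from the post itself: it replaces carriage returns and newlines inside each comment with a space before writing the CSV. The function names, the `comment` column header, and the sample comments are all hypothetical and only serve the illustration.

```python
import csv
import io
import re


def strip_line_breaks(text):
    """Replace each run of carriage returns / newlines with one space."""
    return re.sub(r"[\r\n]+", " ", text).strip()


def write_comments_csv(comments, handle):
    """Write one cleaned comment per row to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["comment"])  # hypothetical column name
    for comment in comments:
        writer.writerow([strip_line_breaks(comment)])


# Hypothetical sample data standing in for extracted Instagram comments.
sample_comments = ["Love this!\r\nSo cool", "One line only"]

buffer = io.StringIO()
write_comments_csv(sample_comments, buffer)
print(buffer.getvalue())
```

Each comment then occupies exactly one physical line in the file, so a reader that stops at the first line break inside a cell no longer loses text.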

Thomas Ott

Beta Testing an Instagram Hashtag Tool

Continuing the stream of consciousness from my Working with Instagram API, JSONPath, and RapidMiner post, I started beta testing a new and improved Instagram Hashtag Tool. I’ve even opened it up to a few beta testers (ping me if you want to try it). It uses a RapidMiner Server on the backend to watch a Dropbox folder. Once you put a text file into the ‘In’ folder, it triggers a process and spits back a spreadsheet in the ‘Out’ folder. Pretty straightforward. ...

Word Clouds in RapidMiner and R

There was a question from the RapidMiner Community on how to make word clouds in RapidMiner using R. It’s really easy. First, you’ll need to make sure you have the Execute R extension installed and configured, then you need to download the “wordcloud” and “RColorBrewer” packages from the CRAN repository. ...

RapidMiner's New Time Series Extension

I’m really liking the overhaul of the old Value Series extension. RapidMiner has been building a new Time Series extension that started with the great ARIMA operator. Now they’re adding new Windowing and Sliding Window operators. My notes are below the video.
Notes:
New time series datasets added to the updated Time Series extension (gas station prices), plus three more sample data templates.
New Windowing operator with easier-to-use parameters: new Indices parameter, new Horizon Width and Horizon Offset.
New Process Windows operator. It’s like a Loop for Time Series data - NICE!
New Forecast Validation. Appears to be a redo of the old Sliding Window Validation operator.
On the Testing side of the Forecast Validation operator, you don’t need to use Apply Model.
The difference between the Forecast Validation (FV) and Cross Validation (CV) operators is that the model delivered by FV is always the LAST model that was trained, unlike CV, which trains on the entire data set in the final iteration.
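As a rough illustration of the windowing idea in the notes above, here is a plain Python sketch that cuts a series into (window, horizon) pairs. It is not RapidMiner's implementation: the function name, the `step` parameter, and the sample prices are hypothetical, and it assumes that the horizon offset is the number of values skipped between the end of a window and the start of its horizon, and that the horizon width is the number of values to forecast.

```python
def make_windows(series, window_size, horizon_width=1, horizon_offset=0, step=1):
    """Split a sequence into (window, horizon) pairs.

    Assumed meaning of the parameters (an illustration only):
    - window_size: number of past values in each window
    - horizon_width: number of future values to predict
    - horizon_offset: values skipped between window end and horizon start
    - step: how far the window moves between pairs (must be positive)
    """
    pairs = []
    start = 0
    while True:
        horizon_start = start + window_size + horizon_offset
        horizon_end = horizon_start + horizon_width
        if horizon_end > len(series):
            break  # not enough values left for a full horizon
        window = series[start:start + window_size]
        horizon = series[horizon_start:horizon_end]
        pairs.append((window, horizon))
        start += step
    return pairs


# Hypothetical price series, standing in for a sample data set.
prices = [10, 11, 12, 13, 14, 15]
pairs = make_windows(prices, window_size=3, horizon_width=1)
for window, horizon in pairs:
    print(window, "->", horizon)
```

A forecast-style validation would then train on earlier pairs and test on later ones, keeping the time order intact rather than shuffling rows.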

Learn RapidMiner Livestream Volume 4

I completely fail in this live stream and it’s hilarious. Before I crash and burn, I do talk about how to upsample, downsample, and balance data! Come check it out.

Learn RapidMiner Livestream Volume 1

I had my first YouTube LiveStream on how to use RapidMiner. It’s about 48 minutes long; I give a GUI overview and do some text mining. I even answer a few questions on using PHP and RapidMiner. The audio starts about 3 minutes in. I’m tentatively scheduling the next one for Friday 5/11 at 8 AM EDT (New York time). ...
