Data Science

Machine Learning in Excel

I wrote an article about Microsoft integrating Python in Excel. I love the idea and I think it’s going to finally get machine learning into the hands of Excel users. Check out this tutorial from Anaconda on how to use scikit-learn in Excel. The new Python in Excel integration by Microsoft and Anaconda grants access to the entire Python ecosystem for data science and machine learning. Thanks to its direct connection to Anaconda Distribution, we can leverage built-in functionality with packages like NumPy, pandas, Seaborn, and scikit-learn directly within our Excel workbooks. ...

Our quest for robust time series forecasting at scale

An older link from April 2017 that I believe became AutoGluon. AutoGluon is fantastic for time series and a host of other AutoML use cases. So, what models do we include in our ensemble? Pretty much any reasonable model we can get our hands on! Specific models include variants on many well-known approaches, such as the Bass Diffusion Model, the Theta Model, Logistic models, bsts, STL, Holt-Winters and other Exponential Smoothing models, Seasonal and other ARIMA-based models, Year-over-Year growth models, custom models, and more. Indeed, model diversity is a specific objective in creating our ensemble as it is essential to the success of model averaging. Our aspiration is that the models will produce something akin to a representative and not overly repetitive covering of the space of reasonable models. Further, by using well-known, well-vetted models, we attempt to create not merely a “wisdom of crowds” but a “wisdom of crowds of experts” scenario, in the spirit of Mannes [6]. ...
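To make the model averaging idea in the excerpt concrete, here is a minimal sketch in plain Python. It is not the method from the linked post: the three baseline forecasters (naive, seasonal naive, drift) are simple stand-ins I picked for illustration instead of the Holt-Winters, ARIMA, and other models it lists, and the function names, the season length of 4, and the sample history are all assumptions.

```python
from statistics import mean, median


def naive_forecast(history, horizon):
    # Repeat the last observed value for every future step.
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season=4):
    # Repeat the values from the last full season (assumed length: 4).
    return [history[-season + (h % season)] for h in range(horizon)]


def drift_forecast(history, horizon):
    # Extend the straight line between the first and last observations.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def ensemble_forecast(history, horizon, combine=mean):
    # Run every model, then combine their forecasts step by step.
    models = (naive_forecast, seasonal_naive_forecast, drift_forecast)
    forecasts = [model(history, horizon) for model in models]
    return [combine(step) for step in zip(*forecasts)]


# Made-up quarterly series, for illustration only.
history = [10, 12, 14, 11, 12, 14, 16, 13]
combined = ensemble_forecast(history, horizon=4)
robust = ensemble_forecast(history, horizon=4, combine=median)
```

Passing `combine=median` swaps the plain average for a combiner that is less sensitive to a single model going off the rails, which is one reason a diverse set of models helps.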

LIBSVM - A Library For Support Vector Machines

It looks like the old LibSVM library is still kicking around and being updated. The last release was a bug fix in July 2023. Version 3.32 released on July 9, 2023. We fix some minor bugs. Plus changes how to use in Python:

    > pip install -U libsvm-official

The python directory is re-organized so

    >>> from libsvm.svmutil import *

instead of

    >>> from svmutil import *

Via Webpage

Regular Expressions for Data Scientists in Python

An oldie but goodie. Regular Expressions are a must for anybody using Python, doing Data Science, or just ETL’ing data. Regular expressions (regex) are essentially text patterns that you can use to automate searching through and replacing elements within strings of text. This can make cleaning and working with text-based data sets much easier, saving you the trouble of having to search through mountains of text by hand. Via Dataquest ...
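As a small illustration of the search-and-replace cleanup described above, here is a sketch using only Python's built-in `re` module. The sample rows, the pattern names, and the `clean_row` helper are made up for this example and do not come from the Dataquest tutorial; the email pattern is deliberately simple and not a full address validator.

```python
import re

# Hypothetical messy records, invented for illustration.
raw_rows = [
    "  Alice   Smith <ALICE@Example.com>  2023-07-09 ",
    "Bob Jones <bob.jones@example.org> 2017-04-17",
    "no contact details here",
]

# Simplified patterns: an email-like token and an ISO-style date.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def clean_row(row):
    # Search for the first email and date, if any, in the raw text.
    email = EMAIL_RE.search(row)
    date = DATE_RE.search(row)
    return {
        # Collapse runs of whitespace and trim the ends.
        "text": re.sub(r"\s+", " ", row).strip(),
        "email": email.group(0).lower() if email else None,
        "year": int(date.group(1)) if date else None,
    }


cleaned = [clean_row(row) for row in raw_rows]
```

Compiling the patterns once with `re.compile` keeps the helper readable and avoids re-parsing the pattern for every row.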

Download 243 Free Ebooks on Design Data Software

Good reference materials, definitely worth going through this list. If you head over to this page, you can access 243 free ebooks covering a range of different topics. Below, we’ve divided the books into sections (and provided links to them), indicated the number of books in each section, and listed a few attractive/representative titles. - via Open Culture

Scripting in RapidMiner Part I - Macros

A great Medium article from my former colleague Martin Liebig (Schmitz). In this article I would like to show how you can access and change RapidMiner variables (called Macros and are always strings). We will opt for a Groovy script. The advantage of Groovy over the other methods is, that you can access RapidMiner’s objects directly. With the other options you transfer data over into their respective data frames, transform them, and pass them back. Groovy is way more direct in this sense. - Via Medium ...

Sentiment Analysis of Tweets With Python, NLTK, Word2vec & Scikit-learn - Marcin Zabłocki blog

A good tutorial on how to do advanced sentiment analysis. The goal of this project was to predict sentiment for the given Twitter post using Python. Sentiment analysis can predict many different emotions attached to the text, but in this report only 3 major were considered: positive, negative and neutral. The training dataset was small (just over 5900 examples) and the data within it was highly skewed, which greatly impacted on the difficulty of building good classifier. After creating a lot of custom features, utilizing both bag-of-words and word2vec representations and applying the Extreme Gradient Boosting algorithm, the classification accuracy at level of 58% was achieved. - via Zablo.net ...

Data Science and Machine Learning Roundup

In this week’s Data Science and Machine Learning link round-up, we’ll share some links that caught our (my) eye during the past few weeks.

Python to the Rescue

I’m evaluating whether or not I should move this blog to another CMS platform so I can start building a community around it like it was before. Right now this blog runs on Hugo and AWS Amplify and it’s freaking awesome. I push new posts to GitHub, AWS pulls the changes and rebuilds the site, and then I can look at it to make sure it looks fine before I merge into master. ...

Restarting the Site

I shut Neural Market Trends down on the last day of August thinking it was going to be for good. Things have changed on my end and I’m thinking of restarting this site. I’m thinking of moving back to my roots and building a community. Why? Because of something I saw a few days ago. I was on a work call the other day and a prospect was sharing his screen. I noticed he had a bookmark to the popular HackerNoon site. ...