Is DataRobot in Trouble?

The news coming out of DataRobot lately has given me pause. Is a power struggle happening? Are sales flagging? I’m not sure what it is but there are a lot of interesting LinkedIn and Glassdoor Reviews out there for sure.

Before we dive in, here’s my big disclaimer: I work for H2O.ai, and this post is my personal observation of what’s going on in an industry that I’m a part of. My views don’t reflect those of my employer or any other employees there.

The big news is that DataRobot laid off 7% of its staff after a very aggressive hiring stint.

…artificial intelligence company DataRobot cut more than 7,500 jobs globally from April 1 to May 16, according to employment tracker Layoffs.fyi. [via Bloomberg]

It gets better: there have been Board departures, and former employees are speaking out.

Former Board member Christopher Lynch posts:

Shareholder Jacob S laments the recent layoffs that allegedly happened after the Sales Kick-off and the President’s Club celebration. Talk about bad optics! That’s right up there with “let them eat cake.”

I did do a search for any timestamped photos but I couldn’t find any at this time. My guess? They’ve all been scrubbed.

Of course, the Glassdoor Reviews are all over the place, from praise to downright scathing.

That’s to be expected as the firm tries to manage the message out there but I offer snapshots of two negative reviews. One from before the layoffs and one during (if the April 1 to May 16th layoff dates are correct).

I get it, selling AI software is hard. I’m in Sales, just like that reviewer above. Selling a complex product that is only understood by highly trained people, even if you abstract a lot of it, is so damn hard.

To sell a complex product you need a very focused organization, with clear communication, and a stellar go-to-market (GTM) strategy to sell effectively.

It sounds like there are organizational and communication problems inside DataRobot if you were to believe the review below.

Growing pains

There are a lot of good things about the DataRobot product from what I’ve seen, especially their UI, but after raising so much money and acquiring a bunch of other companies, you would think they had their “stuff” better together.

For the first time since I can remember, we got to peer in behind the curtain of a secretive organization and see what was happening.

What do I see? I see massive growing pains at DataRobot. I see the pain of being under pressure to perform. I see a flat startup organizational structure being forced into the traditional hierarchy structure.

I want to wish them luck. They were the first company to create an AutoML product and they built a unicorn! But unicorns are mythical creatures and money talks, and that’s all that matters in this market.

America Is Great

First off this is not a political post but I will come out and say it. Make America Great Again is a stupid slogan. I admit there are problems with the middle class but America is great. Period.

Why? Because of what I wrote two years ago. It’s as relevant today as it was back then.

Startup Land

Two years ago I wrote a LinkedIn article on a train ride home titled “I believe in America.” I was trying to be cute, working in a Godfather movie reference to make a catchy title. I had just spent a few days at the CIC building in Cambridge, MA, and was just utterly overwhelmed with inspiration. The CIC building is an incubator with every floor bursting at the seams with startups. Even reading what I wrote two years ago still gives me goosebumps.

The CIC boasts that it hosts the most startups in a single location, anywhere on the planet, and I believe them. Floor upon floor is filled with startups, incubators, talent, and raw brainpower the likes of which I’ve never seen. The hallway discussions I overheard, the mathematical proof scribbles on whiteboards, and the presentations I saw as I was rushing to a meeting made me feel like I was witnessing the future of America – and it’s exciting! The startups in that building epitomize the qualities that make America great, and I was a part of it.

The reality is that only a handful of these startups are going to make it. Even fewer will make it big, and most will die. They’ll fade away or explode in a ball of fire. Those are the breaks, but isn’t that America? The freedom to take a risk and show the world why your idea is awesome. This is why America is great today and will continue to be so tomorrow.

Hug an Engineer and punch a Banker

Howard understands this too and his recent blog post caught my eye.

America is greater than ever in 2016. We live longer, we have awesome drugs, Shake Shack, sushi, uber, free trading, open borders, the social web, Amazon Prime, Google maps, an eye on cancer and Alzheimer’s, and engineers up the ying yang.

You won’t believe how important my iPhone is and how much I use Uber and Google Maps. Entrepreneurs build empires on the back of Apps. Big Data and Data Science will continue to move into primetime and affect everything it touches. Blockchain and Fintech will revolutionize business as we know it. The world will continue to connect more, be more open, and be hyper-aware of bullshit.

Yes, I’m bullish on America and Tech.

It’s no longer just ok to hug an engineer and punch a banker. It’s ‘hug an engineer, hoard designers, suck up to centimillionaires and punch a banker’.

You got that right Howard!

How StockTwits Uses Machine Learning

Fascinating behind the scenes interview of StockTwits Senior Data
Scientist Garrett Hoffman.


He shares great tidbits on how StockTwits uses machine learning for
sentiment analysis. I’ve summarized the highlights below:

  • Idea generation is a huge barrier for active trading
  • Next gen of traders uses social media to make decisions
  • Garrett solves data problems and builds features for the StockTwits
    platform
  • This includes: production data science, product analytics, and
    insights research
  • Understanding social dynamics makes for a better user experience
  • Focus is to understand social dynamics of StockTwits (ST) community
  • Focuses on what’s happening inside the ST community
  • ST’s market sentiment model helps users with decision making
  • Users ‘tag’ content for bullish or bearish classes
  • Only 20 to 30% of content is tagged
  • Using ST’s market sentiment model increases coverage to 100%
  • For Data Science work, Python Stack is used
  • Use: Numpy, SciPy, Pandas, Scikit-Learn
  • Jupyter Notebooks for research and prototyping
  • Flask for API deployment
  • For Deep Learning, uses TensorFlow with AWS EC2 instances
  • Can spin up GPUs as needed
  • Deep Learning methods used are Recurrent Neural Nets, Word2Vec, and
    Autoencoders
  • Stays abreast of new machine learning techniques from blogs,
    conferences and Twitter
  • Follows Twitter accounts from Google, Spotify, Apple, and small tech
    companies
  • One area ST wants to improve on is DevOps around Data Science
  • Bridge the gap between research/prototype phase and embedding it into
    tech stack for deployment
  • Misconception that complex solutions are best
  • Complexity ONLY ok if it leads to deeper insight
  • Simple solutions are best
  • Future long-term ideas: use AI around natural language
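
The coverage point in the notes above (users tag only 20 to 30% of messages, and a model fills in the rest) can be illustrated with a toy example. This is a minimal sketch only: the interview mentions Recurrent Neural Nets and Word2Vec, whereas the code below swaps in a much simpler naive Bayes classifier in plain Python, and the messages and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(tagged):
    """Count words per label from (text, label) pairs tagged by users."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in tagged:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the most likely label for an untagged message (naive Bayes)."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: how common this label is among tagged messages.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical user-tagged messages (the small tagged share of content).
tagged = [
    ("buying more shares great earnings", "bullish"),
    ("breakout coming going long", "bullish"),
    ("selling everything terrible earnings", "bearish"),
    ("going short weak guidance", "bearish"),
]
# Hypothetical untagged messages that the model labels to extend coverage.
untagged = ["great breakout buying", "terrible guidance selling"]

model = train(tagged)
filled = [(msg, predict(msg, *model)) for msg in untagged]
for msg, label in filled:
    print(f"{label:8s} {msg}")
```

A simple baseline like this also fits the "simple solutions are best" point from the interview: it gives you something to beat before reaching for deep learning.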

Natural Language Processing First Steps: How Algorithms Understand Text | NVIDIA Developer Blog

Textual data sets are often very large, so we need to be conscious of speed. Therefore, we’ve considered some improvements that allow us to perform vectorization in parallel. We also considered some tradeoffs between interpretability, speed and memory usage. By applying machine learning to these vectors, we open up the field of NLP (Natural Language Processing). In addition, vectorization also allows us to apply similarity metrics to text, enabling full-text search and improved fuzzy matching applications.
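
To make the vectorization idea concrete, here is a minimal sketch in plain Python, assuming a simple bag-of-words count vector and cosine similarity as the similarity metric. It is not the NVIDIA post's implementation and does nothing in parallel; the documents and query are made up for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn text into a sparse bag-of-words vector (token -> count)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical mini corpus and search query.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
query = vectorize("cat on a mat")

# Rank documents by similarity to the query: a tiny full-text search.
ranked = sorted(
    docs,
    key=lambda d: cosine_similarity(query, vectorize(d)),
    reverse=True,
)
for doc in ranked:
    print(round(cosine_similarity(query, vectorize(doc)), 3), doc)
```

Once text is a vector, the same representation can feed a machine learning model or a search ranking like the one above.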


Data Science and Machine Learning Roundup

First off is Julia Language for Data Engineering (Medium link). Author Logan Kilpatrick explains how to use the Julia Language packages DataFrames.jl and CSV.jl to do some basic data engineering. He doesn’t stop there but shows you how to work with databases in Julia Language as well and shares a lot of links to videos and additional information.

Meta AI, which I believe was formerly Facebook AI, has released a new open-source package called data2vec. Data2vec operates in the self-supervised area of machine learning. Self-supervised learning is explained as:

Self-supervision enables computers to learn about the world just by observing it and then figuring out the structure of images, speech, or text. Having machines that don’t need to be explicitly taught to classify images or understand spoken language is simply much more scalable. – via Meta AI.

Are you interested in running machine learning on small and low-powered chips? How about doing deep learning on tiny devices? That’s the goal of TinyML.

TinyML seeks to bring the power of deep learning to small microcontrollers and chips that are cheap to produce, use low power, and can run on a battery. I really like this idea since I have several Raspberry Pi computers and a Jetson Nano.

This is a treasure trove of data science cheat sheets. Bookmark this Kaggle page, you will thank me for it.

In other news, it looks like Alteryx is trying to stay relevant by snapping up “holy shit are they still around” Big Data profiling company Trifacta.

Last but not least, how bad was Zillow’s Zestimate? It looks like it was way off the market and the company is shedding more than 20% of its workforce. OUCH! They incurred a $420 million loss. More OUCH!

Isolation Forests in H2O.ai

A new feature has been added to open-source H2O-3: isolation forests. I’ve always been a fan of understanding outliers and love using One-Class SVMs as a method, but isolation forests appear to be better at finding outliers in most cases.

From the H2O.ai blog:

There are multiple approaches to an unsupervised anomaly detection problem that try to exploit the differences between the properties of common and unique observations. The idea behind the Isolation Forest is as follows.

We start by building multiple decision trees such that the trees isolate the observations in their leaves. Ideally, each leaf of the tree isolates exactly one observation from your data set. The trees are being split randomly. We assume that if one observation is similar to others in our data set, it will take more random splits to perfectly isolate this observation, as opposed to isolating an outlier.

For an outlier that has some feature values significantly different from the other observations, randomly finding the split isolating it should not be too hard. As we build multiple isolation trees, hence the isolation forest, for each observation we can calculate the average number of splits across all the trees that isolate the observation. The average number of splits is then used as a score, where the less splits the observation needs, the more likely it is to be anomalous.

While there are other methods of outlier detection like LOF (local outlier factor), it appears that Isolation Forests tend to be better than One-Class SVMs at finding outliers.
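
The scoring idea described in the H2O.ai excerpt can be sketched in a few lines of plain Python. This is an illustration only, not the H2O-3 or Scikit-Learn implementation: it skips subsampling and score normalization, it simulates each point's random splits directly rather than building reusable trees, and the data, the planted outlier, and the function names are all made up.

```python
import random

def isolation_depth(point, data, rng, max_depth=50):
    """Count the random splits needed until `point` sits alone in its partition."""
    subset = data
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        # Only split on features that still vary within the current partition.
        candidates = [
            f for f in range(len(point))
            if min(row[f] for row in subset) != max(row[f] for row in subset)
        ]
        if not candidates:
            break
        feature = rng.choice(candidates)
        lo = min(row[feature] for row in subset)
        hi = max(row[feature] for row in subset)
        threshold = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the point.
        if point[feature] < threshold:
            subset = [row for row in subset if row[feature] < threshold]
        else:
            subset = [row for row in subset if row[feature] >= threshold]
        depth += 1
    return depth

def average_depth(point, data, n_trees=200, seed=0):
    """Average split count over many random runs; lower means more anomalous."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Hypothetical data: a tight 2-D cluster plus one obvious outlier.
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)]
inlier = min(data, key=lambda p: p[0] ** 2 + p[1] ** 2)  # most central point
outlier = (8.0, 8.0)
data.append(outlier)

outlier_depth = average_depth(outlier, data)
inlier_depth = average_depth(inlier, data)
print("outlier:", outlier_depth, "inlier:", inlier_depth)
```

The planted outlier should need far fewer splits on average than the central point, which is the signal an isolation forest turns into an anomaly score.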

See this handy image from the Scikit-Learn site:

Anomaly Detection Comparison

Interesting indeed. I plan on using this new feature on some work I’m doing for customers.

Startup Funding

ICYMI the Startup markets are getting hotter in the Data Science space. Every time I turn around, some small company got millions of dollars in startup funding. It used to be a company with an algorithm or data science library but now it’s Data Science platforms.

These platforms are suddenly all the rage and many new entrants are racing to gain market and mind share.

The above image from FundersandFounders.com really captures a successful startup from inception to IPO. Most interesting for me is how the ownership “pie” is cut over time. If you’re the Founder, you first start out with a 50/50 share with your Co-Founder. Then you get some Seed money, say from an Angel Investor like Howard, which takes a small % of ownership.

Startup Growth

As the Startup grows and matures it should attract more VC money and the ownership pie changes. With every VC investor, you sell parts of your company. This is incredibly important if you want to maintain control of your company and should be carefully analyzed.
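
To see how the pie changes, here is a rough worked example in Python. The rounds and percentages are hypothetical numbers chosen for illustration, and the sketch ignores real-world details like option pools, convertible notes, and liquidation preferences.

```python
def dilute(cap_table, investor, stake):
    """Sell `stake` (a fraction between 0 and 1) of the company to `investor`.

    Every existing holder is scaled down proportionally to make room.
    """
    diluted = {holder: share * (1 - stake) for holder, share in cap_table.items()}
    diluted[investor] = diluted.get(investor, 0.0) + stake
    return diluted

# Start with a 50/50 split between the two founders.
cap_table = {"Founder": 0.5, "Co-Founder": 0.5}

# Hypothetical funding rounds: (investor, fraction of the company sold).
rounds = [("Angel", 0.10), ("Series A VC", 0.25), ("Series B VC", 0.20)]

for investor, stake in rounds:
    cap_table = dilute(cap_table, investor, stake)

for holder, share in cap_table.items():
    print(f"{holder:12s} {share:.1%}")
```

With these made-up numbers each founder goes from 50% to 27% after three rounds, and together they no longer hold a majority. That's why the size of each slice you sell matters so much for control.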

My personal opinion is that you can do all this without VC money but it will be harder and take longer. It could take decades, and in this industry time is not your friend. The market is so hot that your competitors will exploit your weaknesses within a quarter or less.

So in essence you really need startup funding from VCs to be agile and build/keep your market share. Just keep an eye on those term sheets and make sure that the “pie” is big enough for everyone.

Labeling Training Data Correctly

When you’re dealing with a classification problem in machine learning, good labeled data is crucial. The more time you spend labeling training data correctly, the better. This is because your model’s performance and deployment will depend on it. Always remember that garbage in means garbage out.

Thoughts on labeling data

I recently listened to a great O’Reilly podcast on this subject. They interviewed Lukas Biewald, Chief Data Scientist and Founder of CrowdFlower. CrowdFlower provides their clients with top notch labeled training data for various machine learning tasks, and they’re busy!
 
The few bits that caught my ear were how much of the training data is used in deep learning. They’re also seeing more image labeled data for self driving cars.
 
The best part of the interview was Lukas’s discussion on using a Raspberry Pi with TensorFlow! How cool is that?

The Podcast
