Labeling Training Data Correctly

When you’re dealing with a classification problem in machine learning, good labeled data is crucial. The more time you spend labeling training data correctly, the better. This is because your model’s performance and deployment will depend on it. Always remember that garbage in means garbage out.

Thoughts on labeling data

I recently listened to a great O’Reilly podcast on this subject. They interviewed Lukas Biewald, Chief Data Scientist and Founder of CrowdFlower. CrowdFlower provides their clients with top notch labeled training data for various machine learning tasks, and they’re busy!
 
The few bits that caught my ear were how much of the training data is used in deep learning. They’re also seeing more image labeled data for self driving cars.
 
The best part of the interview as Lukas’s discussion on using a Raspberry Pi with Tensor Flow! How cool is that?

The Podcast

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: