This is a great video presentation on a fraud analytics use case with RapidMiner. See my notes below.

##Some key concepts

  • The more complex the model, the lower the training error, but past a certain point the test error goes up (overfitting).
  • Simple models are better. Try explaining them to children.
  • Data scientists understand the technical side but need to communicate results to analysts.
  • Sell results to businesses. Tie $ to the results.
  • Speak the same language as the business. Map performance metrics to business-related figures.
  • AUC and recall don’t necessarily mean $ to the business, so show how they translate.
  • A/B testing is widely used in marketing. Also consider a “do nothing” model and compare it with implementing the data science solution.
  • Don’t fear sharing best practices and ideas with similar businesses.
  • The fraud model follows a traditional validation method: 80% training and 20% holdout.
  • Both the training and holdout sets are taken from the same time period.
  • Handy trick: use the sum of transactions as example weights. (this is cool)
  • Apply a $ value to your true/false positives and true/false negatives.
  • Compare with the default model (no model).
  • Generate a money plot!
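
To make the last few points concrete for myself, here is a minimal Python sketch of putting a $ value on each prediction outcome and comparing against the "do nothing" default. Everything in it is my own assumption rather than something from the presentation: the `Transaction` class, the `REVIEW_COST` figure, the toy holdout data, and the simplification that a caught fraud saves the full transaction amount. The weighted recall at the end is one possible reading of the example-weights trick, using transaction amounts as weights.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float    # transaction value in $
    is_fraud: bool   # ground-truth label
    flagged: bool    # model prediction (True = flagged as fraud)


# Hypothetical cost of manually reviewing one flagged transaction.
REVIEW_COST = 5.0


def dollar_value(transactions, review_cost=REVIEW_COST):
    """Net $ outcome of acting on the predictions (assumed cost model)."""
    total = 0.0
    for t in transactions:
        if t.flagged and t.is_fraud:
            total += t.amount - review_cost   # true positive: loss avoided, minus review
        elif t.flagged and not t.is_fraud:
            total -= review_cost              # false positive: wasted review
        elif t.is_fraud:
            total -= t.amount                 # false negative: fraud loss
        # true negative: no cost, nothing to add
    return total


def do_nothing(transactions):
    """Default model: never flag anything."""
    return [Transaction(t.amount, t.is_fraud, False) for t in transactions]


def recall(transactions, weighted=False):
    """Recall on fraud cases, optionally weighted by transaction amount."""
    def weight(t):
        return t.amount if weighted else 1.0
    caught = sum(weight(t) for t in transactions if t.is_fraud and t.flagged)
    total = sum(weight(t) for t in transactions if t.is_fraud)
    return caught / total


# Toy holdout set, invented for illustration only.
holdout = [
    Transaction(1000.0, True, True),
    Transaction(50.0, True, False),
    Transaction(200.0, False, True),
    Transaction(30.0, False, False),
    Transaction(400.0, True, True),
]

model_value = dollar_value(holdout)
baseline_value = dollar_value(do_nothing(holdout))
uplift = model_value - baseline_value

print(f"Model:      ${model_value:,.2f}")
print(f"Do nothing: ${baseline_value:,.2f}")
print(f"Uplift:     ${uplift:,.2f}")
print(f"Recall: {recall(holdout):.2f}, $-weighted recall: {recall(holdout, weighted=True):.2f}")
```

In this toy data the model misses one of three frauds, but the missed one is small, so the $-weighted recall looks much better than the plain recall. Plotting `model_value` against `baseline_value` over a range of decision thresholds would be one way to get something like a money plot.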

##Some questions

  • How does this relate to regression?
  • If a simple model is not good enough, how do you sell a complex one like Deep Learning?
  • Is it better to have the Data Science team be embedded in the Business Unit or as a separate team?
  • How do you try to explain the uncertainty of prediction intervals to business stakeholders?
  • How do you account for seasonal drift?
  • The model will drift over time. Should it be updated or retrained periodically?
  • Do you build a model to optimize business results or is it a byproduct of the prediction?