EuroPython 2018: Trust me, I'm a Data Scientist - ethics for builders of data-based applications
Sarah Diot-Girard is an engineer working on Machine Learning, focused on finding solutions to engineering problems using Data Science.
Data collection
Data collection always captures the past, and learning from the past means learning its biases. Imagine learning about suitability for studies from historical student data – you would conclude that men are much more suited to technical studies than women, or let social class determine the outcome.
During data collection you are also subject to sampling bias – your dataset may be even more biased than you suspect.
Data encoding
You can encode text as bag of words or as word embeddings. Bag-of-words encoding basically only checks whether documents share the same strings, while word embeddings (word2vec) capture similarity between words from context – they can even learn analogies! Cool! But learned analogies suffer from the same issue we met earlier: they pick up the biases of the past.
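A minimal sketch of what this looks like in practice, using gensim's pretrained GloVe vectors (the library, model name, and the specific word pairs are my assumptions for illustration, not from the talk):

```python
# Sketch: probing analogies in pretrained word vectors.
# gensim and 'glove-wiki-gigaword-50' are illustrative assumptions.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads the model on first use

# The classic analogy: king - man + woman ~= queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The same vector arithmetic also reproduces historical biases in the corpus,
# e.g. gendered associations with occupations.
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))
```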
Remember: Don't hoard data! Work on anonymised data whenever you can! Delete data you don't need.
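As a rough sketch of what that can mean in code: drop columns you don't need and replace direct identifiers with salted hashes (the column names here are hypothetical):

```python
# Sketch: deleting unneeded columns and pseudonymising identifiers.
# Column names ('email', 'full_name', 'user_id') are made up for illustration.
import hashlib
import pandas as pd

def pseudonymise(value: str, salt: str) -> str:
    """Replace a direct identifier with a salted hash."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

df = pd.read_csv("users.csv")
df = df.drop(columns=["email", "full_name"])  # delete data you don't need
df["user_id"] = df["user_id"].astype(str).map(lambda v: pseudonymise(v, salt="s3cr3t"))
df.to_csv("users_anonymised.csv", index=False)
```

Note that salted hashing is pseudonymisation rather than true anonymisation; under the GDPR, pseudonymised data is still personal data.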
Fairness
What does it mean for an algorithm to be fair?
There are several candidate criteria, each trying to give people the same chances based on the same scores while disregarding irrelevant attributes:
- Score equality
- Predictive parity
- Error rate balance
- Equal acceptance rate (reflecting the diversity of the starting set)
All of these criteria seem fair and reasonable, but you cannot have them all at once. Inherent trade-offs are always there – your job is to make them explicit and to tell people with respect to which criteria the algorithm is fair.
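As an illustration (synthetic data and metric choices are mine, not the speaker's), here is a sketch that computes a few of these criteria per group; on real data with different base rates you will typically find they cannot all be equalised at once:

```python
# Sketch: comparing fairness criteria across two groups on synthetic data.
import numpy as np

def group_metrics(y_true, y_pred, group):
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        out[g] = {
            "acceptance_rate": yp.mean(),                                            # equal acceptance rate
            "precision": yt[yp == 1].mean() if (yp == 1).any() else float("nan"),    # predictive parity
            "false_negative_rate": (yp[yt == 1] == 0).mean() if (yt == 1).any() else float("nan"),  # error rate balance
        }
    return out

rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=1000, p=[0.7, 0.3])
y_true = rng.binomial(1, np.where(group == "A", 0.5, 0.3))  # groups with different base rates
y_pred = rng.binomial(1, np.where(y_true == 1, 0.8, 0.2))   # an imperfect classifier
print(group_metrics(y_true, y_pred, group))
```

This trade-off is not an accident: when base rates differ between groups, it is in general mathematically impossible for an imperfect classifier to satisfy predictive parity and error rate balance at the same time.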
Deep learning
Who needs interpretability when you have deep learning? Well. You need to understand how your model arrives at its conclusions and which metrics you are applying – otherwise you can't fix issues reliably. And it violates ethics: you need to be able to show that your algorithms are reliable and fair. Bonus points: you are going against the GDPR, which gives people the right to understandable/explainable decisions and processing, for instance via interpretable models such as decision trees. There are cool libraries like ELI5, which show which words/tokens had a larger or smaller impact on a given prediction.
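A minimal sketch of the kind of inspection ELI5 supports for a linear text classifier (the toy dataset and pipeline are assumptions for illustration):

```python
# Sketch: using ELI5 to see which tokens push a prediction up or down.
# The tiny dataset and pipeline below are made up for illustration.
import eli5
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product, works well", "terrible, broke after a day",
         "works as advertised", "awful quality, do not buy"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

# Explain one prediction: which words contributed, and by how much.
explanation = eli5.explain_prediction(clf, "great quality but broke quickly", vec=vec)
print(eli5.format_as_text(explanation))
```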
After building a model: look at surrogate models, and do a sensitivity analysis to confirm your model's validity.
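One common way to do this (a sketch, not necessarily the speaker's exact recipe) is a global surrogate: fit an interpretable model on the black box's predictions and measure how faithfully it mimics them, then perturb inputs to see how sensitive the predictions are. The data and models below are illustrative assumptions.

```python
# Sketch: a global surrogate tree for a black-box model, plus a crude sensitivity check.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X, y)
bb_pred = black_box.predict(X)

# Surrogate: a shallow, readable tree trained to imitate the black box.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, bb_pred)
fidelity = (surrogate.predict(X) == bb_pred).mean()
print(f"surrogate fidelity: {fidelity:.2%}")

# Sensitivity: how many predictions flip if one feature is shifted by one std?
X_perturbed = X.copy()
X_perturbed[:, 0] += X[:, 0].std()
flips = (black_box.predict(X_perturbed) != bb_pred).mean()
print(f"predictions changed by perturbing feature 0: {flips:.2%}")
```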
Dealing with minority classes is really tricky. Evaluate them on a separate held-out dataset rather than overfitting to the same dataset, and don't rely on preprocessing alone.
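One concrete reading of this (my interpretation, with an illustrative toy dataset): report per-class metrics on a held-out split, so the minority class cannot hide behind overall accuracy.

```python
# Sketch: per-class metrics on a held-out split for an imbalanced problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Imbalanced toy dataset: roughly 5% positives.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# classification_report shows precision/recall per class, including the minority one.
print(classification_report(y_te, clf.predict(X_te)))
```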
Human biases
Of course, biases aren't restricted to the data; we carry them ourselves – for example, assuming causality where there is only correlation.
In production
Beware of feedback loops, where your prediction causes itself to come true and reinforces itself. Also be wary of exerting influence too widely before you understand where your problems are.
Data is not neutral, algorithms are not objective, and data scientists are not exempt from bias.