Cathy O’Neil is without a doubt the most morally and ethically pragmatic thinker about technology on the planet. She doesn’t see crystal balls, hyperloops or cryogenic freezers like other tech wizards such as Ray Kurzweil or Elon Musk. She sees the world threatened by the very thing they all gush over: algorithms and data.
Everyone from Cedric Villani to Yuval Noah Harari has levelled praise at O’Neil for being one of the first popular thinkers looking at data and technology with searing criticism.
Her best seller Weapons of Math Destruction, released this year, gives the wider public an insight into how algorithms and data can work against us by creating systemic discrimination and further threatening democracy, in areas ranging from recidivism in the justice system to your chances of securing a loan. It’s a new world out there but certainly not a brave one; rather, as O’Neil would probably put it, just plain immoral.
You work as a data scientist, how did you build up a case for this book? Was it just seeing the inefficacy of data over many years?
A lot of it was actually pure thought for me, because if you have a historically biased data set and you train a new algorithm on that data set, it will just pick up the patterns. And, you know, I worked around people who, for the most part, don’t get discriminated against: well-educated white and Asian men who are data scientists. The data science community is not very heterogeneous, and the people who are doing well in that community are well off, so we are building these algorithms but we are not subject to them, or to the extent that we are, we are benefitting.
That means we are less sensitive to the question of what happens when we replicate past mistakes, to the extent that we think these are mistakes. Many of them are; sometimes there’s a consensus on that and sometimes there isn’t.
Upon reading this book, I felt like a lot of it was background noise for me, I felt if we can just get over all of this morality and prejudice stuff then in 10-20 years it will just create a utopia. But maybe that is just being naïve?
I mean I’ve never heard anyone actually admit that so I appreciate it but it is an unspoken assumption that algorithms and technology will let us transcend issues of class, race and gender. It’s not working out that way at all so I am not an idealist.
Alarming society about inherent prejudice of data
But don’t you think they’re just teething problems?
It depends on how we address it. They could be, and that’s what I’m hoping: that the current pattern, whereby we use these algorithms as tools of oppression, further punishing the unlucky and rewarding the lucky, will shift. But we will have to be demonstrably more interventionist than we are now, because at the moment we have this thing where we just think everything is good and it will work out because it’s technology, as if that is magical, and it’s not magical.
This is a speculative idea but it popped into my head when I was reading this book. What do you think about creating a unique set of technology to fight against the current inequality as a result of data?
I think it’s a good question, and it’s not ridiculous, but the odds are low that it would work. If you think about the stewards of data, they are very large, dominant and successful companies, and historically speaking people with power don’t give that up to put themselves at a disadvantage. I don’t think they will spontaneously realise that they have been taking advantage and that they should become more fair-minded.
It seems like people are reading a lot of books by people like Nick Bostrom, Yuval Noah Harari, Max Tegmark and Ray Kurzweil. I think you are the first person I’ve seen who is really pragmatic in your approach.
Yes, and in fact, I have very little patience for those people.
The hope that wasn’t technology
Ok, talk to me about that.
Harari is an exception, he’s not like the others. But Kurzweil needs help. He is dreaming a dream that is actually counterproductive to reality. For example, he is allowing Silicon Valley, which is essentially made up of young, successful white and Asian men, to imagine that they will never get old. And when you imagine that you will never get old you do not prepare for quality of life when you are old, and you don’t solve the problems that life would impose. So I feel like we have a very underdeveloped sense of what the problems are that we are solving; in Silicon Valley we are solving the very fucking few problems that those engineers at Facebook actually face.
I guess their response would be that if they can solve it for themselves they can solve it for everyone, meaning longevity.
But the average person is worried about healthcare, not about living forever. So solve this problem. It’s a distraction is what I’m saying, and it’s an obsession with the far future. The futurists think so much about the generations to follow that they stop thinking about the people who are already alive, already suffering and in need of help. Those are the people we should be addressing.
So let’s take an example. What if someone like Barack Obama were to sit down and read this book, someone who holds a lot of power over our lives. What do you think he would say about it?
Actually, Barack Obama already did that. The White House was very proactive in this sense and put out more than one Big Data report. The last one called for auditing of algorithms to make sure that there wasn’t any implicit, unintentional discrimination. The problem is that in the Trump era there is very little chance that will move forward. But I do have hope that it will move forward in the next 20 years, because the evidence of harm is going to become more obvious and people will be demanding accountability.
I’m interested in countries that are only starting to build up their technology and data systems. What happens there? Can we get there before the same damage is done?
Yeah, I think about that a lot. I’m more afraid than excited at the concept of moving into other countries with big data. Europe has better privacy protection than the US, and for now the UK has European privacy laws, but most other countries have worse privacy protections and accountability for algorithms than the US does. So for example, for credit scores in the US there are anti-discrimination and accountability laws, ECOA and FCRA. They aren’t applied widely enough, but they are applied to loan offers, and those laws don’t exist in other countries.
For example, it’s against the law to use race as a variable in your algorithm for credit scoring. Unfortunately, people get around it by using proxies for race, but let’s put that to one side. I’ve been getting emails from people in India asking if it is wrong to use caste as a variable in their credit score. To me, that seems very obviously wrong but I don’t think there is a law against it. There is very little data protection whatsoever in South America and what that means to me is it’s a very vulnerable population, so things could go wrong very quickly.
Not the great tech dream we were all hoping for?
A lot of the framework of how technology works today could even be put down to the architecture of society in the first place – how we make decisions and connect with each other. Would you agree with that?
To a large extent, the data scientists of today are like the monks of 2000 years ago, when they were literate but they didn’t show other people how to write. We have the same kind of secret language in data science where we aren’t letting people understand it but we still expect them to trust it and be afraid of it.
It’s like almost any religion where we have the leaders holding secrets, giving them power and authority. From my perspective as a mathematician, it is an abuse of this authority; it’s a mathematical authority, which I believe mathematicians deserve because it is trustworthy. But this isn’t math – this is human-developed opinion embedded in the mathematical code, which is really not math because there is no proof or theorem here. There could be evidence and science, but we have never been asked to produce that evidence. Instead we just ‘flash our badge’ and people trust us and blindly accept what we give them.
What example in the book surprised you the most when you were doing the research?
The thing that kept me up at night was the recidivism algorithms, partly because they are used by judges to decide on sentences for defendants, who are sentenced for longer if their scores are higher, and their scores are higher if they were raised in the ‘wrong’ neighbourhood, i.e. poor or black. It’s completely crazy, and I’m not sure what was more disgusting to me: that we embedded racism and classism into these algorithms, or that we don’t even know if things are worse or better after introducing these scores. We haven’t quantified it, which is to say that the system is so messed up right now that we might have even improved it by injecting these very deeply flawed risk scores into the system, and that is a sad state of affairs. I think most people who are white, educated and Harvard PhDs don’t understand just how bad shit is, and how we desperately need to improve systems. If we could improve them with good algorithms, how much better would that be than with terrible algorithms?
Fixing the issue of biased algorithms
You say in the book that ‘big data processors codify the past, they don’t invent the future’. I really like that. Maybe you can tell us what you mean by it?
What I mean by that is that any machine learning algorithm is trained by historical data and a definition of success. For example, if you are training a hiring algorithm you need a bunch of old applications and some of the people who got hired and succeeded. Then you train that algorithm on that data with that definition of success and all it does is pick up patterns of the past. If you apply it to a current pool and someone who is now applying for the job, it will say, ‘Do you look like someone who was successful in applying in the past?’ And that is what machine learning algorithms are very good at, but if your past practices were racist or sexist then they will discriminate against people because they are codifying the past.
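As an editorial aside, the point about codifying the past can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the applicant records, the group labels "A" and "B", and the function names `train` and `score` are all invented here, and the "model" is a bare frequency count rather than any real hiring system.

```python
from collections import defaultdict

# Hypothetical historical applications: (group, qualified, was_hired).
# Invented data in which past hiring favoured group "A".
HISTORY = [
    ("A", True, True), ("A", True, True), ("A", True, True),
    ("A", False, True), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False),
    ("B", False, False), ("B", False, False),
]


def train(history):
    """Learn the historical hire rate for each (group, qualified) pattern."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [hired, total]
    for group, qualified, hired in history:
        counts[(group, qualified)][0] += int(hired)
        counts[(group, qualified)][1] += 1
    return {pattern: hired / total for pattern, (hired, total) in counts.items()}


def score(model, group, qualified):
    """Score a new applicant by how often similar past applicants were hired."""
    return model[(group, qualified)]


model = train(HISTORY)
score_a = score(model, "A", True)  # qualified applicant from group A
score_b = score(model, "B", True)  # equally qualified applicant from group B
print(score_a, score_b)
```

Both new applicants are equally qualified, yet the one from group "A" scores higher, purely because the definition of success was "looks like someone who was hired before".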
Well, we’ve only had this technology for a short amount of time. Obviously, algorithms have been around for a while, but in our primitive state we are flawed individuals so how do we usher this technology into the 21st century? There must be a light at the end of the tunnel here?
We have to start demanding accountability for algorithms, which means we have to demand it for the important algorithms rather than the ones that data scientists create in their basements. We are talking about the ones that affect people, meaning that we measure the effects on these people. We have to ask: for whom does this fail? What does this failure look like and how bad is it? Also we have to ask if the failure is worth it from a cost/benefit analysis. We just accept everything blindly and we don’t compare the efficiency versus the fairness.
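As an editorial aside, one way to make "for whom does this fail?" concrete is to compare error rates across groups. The sketch below is a minimal illustration, not a method described in the interview: the records, the group labels "X" and "Y", and the helper names are hypothetical, and the false positive rate is just one of several possible measures of failure.

```python
# Hypothetical audit records for a risk score: each has a group label,
# whether the algorithm flagged the person as high risk, and what
# actually happened. All values are invented for illustration.
RECORDS = [
    {"group": "X", "flagged": True, "actual": False},
    {"group": "X", "flagged": False, "actual": False},
    {"group": "X", "flagged": False, "actual": False},
    {"group": "X", "flagged": False, "actual": False},
    {"group": "X", "flagged": True, "actual": True},
    {"group": "Y", "flagged": True, "actual": False},
    {"group": "Y", "flagged": True, "actual": False},
    {"group": "Y", "flagged": True, "actual": False},
    {"group": "Y", "flagged": False, "actual": False},
    {"group": "Y", "flagged": True, "actual": True},
]


def false_positive_rate(records):
    """Share of people with a negative outcome who were flagged anyway."""
    negatives = [r for r in records if not r["actual"]]
    if not negatives:
        return None  # rate is undefined with no negative outcomes
    return sum(r["flagged"] for r in negatives) / len(negatives)


def audit_by_group(records):
    """Compute the false positive rate separately for each group."""
    groups = {}
    for r in records:
        groups.setdefault(r["group"], []).append(r)
    return {g: false_positive_rate(rs) for g, rs in groups.items()}


rates = audit_by_group(RECORDS)
print(rates)
```

In this invented data the score wrongly flags group "Y" three times as often as group "X", which is the kind of gap an audit would surface and then weigh against whatever the algorithm gains in efficiency.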
Weapons of Math Destruction is out now through Penguin