According to The Guardian, Angela Merkel said recently: “I’m of the opinion that algorithms must be made more transparent, so that one can inform oneself as an interested citizen about questions like ‘what influences my behaviour on the internet and that of others?’… Algorithms, when they are not transparent, can lead to a distortion of our perception, they can shrink our expanse of information.”
She is echoing the EU’s recent General Data Protection Regulation (GDPR), whose Article 22 covers “Automated individual decision-making, including profiling”. Under this, data controllers are obliged to give data subjects information about “the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.”
Machine learning produces algorithms that are notoriously difficult to interpret, particularly when they involve neural networks. As Toby Segaran says in ‘Programming Collective Intelligence’ (Chapter 12, p. 288): “The major downside of neural networks is that they are a black box method… a network might have hundreds or thousands of synapses, making it impossible to determine how the network came up with the answer it did. Not being able to understand the reasoning process may be a deal breaker for some applications.” Neural networks are not the only machine learning method, however; Segaran also covers other classifiers, such as decision trees and support vector machines. The decision tree in particular is well suited to openness: as Segaran puts it, “You can look under the hood and see why it works the way it does.”
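Segaran’s “look under the hood” point can be made concrete with a toy tree. The tree, features and labels below are invented for illustration (they are not Segaran’s code or any real instrument), but they show the key property: every prediction comes with the exact chain of tests that produced it.

```python
# A toy hand-built decision tree whose predictions are fully auditable.
# Each internal node: (feature, threshold, left_subtree, right_subtree).
# Each leaf: a plain label string. Tree and data are invented examples.
TREE = ("income", 30000,
        ("prior_offences", 2, "low risk", "high risk"),
        ("age", 25, "medium risk", "low risk"))

def predict_with_explanation(tree, sample):
    """Walk the tree, recording each test, so the decision can be inspected."""
    path = []
    node = tree
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        value = sample[feature]
        if value < threshold:
            path.append(f"{feature}={value} < {threshold}")
            node = left
        else:
            path.append(f"{feature}={value} >= {threshold}")
            node = right
    return node, path

label, why = predict_with_explanation(
    TREE, {"income": 12000, "prior_offences": 3, "age": 40})
print(label)                  # high risk
print(" AND ".join(why))      # the full rule chain behind the decision
```

A neural network offers no analogue of `why`: the same prediction would emerge from thousands of weights with no human-readable rule chain.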
Professor Kate Crawford, writing in the NY Times, lists several examples of algorithms that have produced biased results:
– Google’s Photos app tagging black people as gorillas
– a ProPublica investigation showing that risk assessment algorithms, used in US courts to help judges set sentence lengths or bail amounts, are biased against black people. The company concerned, Northpointe, advertises its services thus: “The Northpointe Suite is an automated decision-support software package of industry-leading risk, needs assessment and case management tools. The instruments address the complex set of risk, need and case management considerations that improve decision accuracy in custody, supervision and programming based on underlying criminogenci [SIC] needs.” Northpointe does not give out details of its algorithms, which are proprietary, though it does publish a factsheet on “developing a robust, data-driven management plan for jail populations”. This refers to a 1998 study, “Objective Jail Classification Systems: A Guide for Jail Administrators” by Thigpen, Hutchinson and Barbee, published by the U.S. Department of Justice National Institute of Corrections, which includes the sort of questionnaires used. Clearly this sort of classification has been done manually for some time; computers simply add the ability to use more data and to do it more quickly.
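The instruments in the 1998 NIC guide are, at heart, additive point scales: each questionnaire answer carries a weight, the weights are summed, and fixed cut-off scores map the total onto a custody level. A minimal sketch of that mechanism (the items, weights and cut-offs below are invented; they are not Northpointe’s or the guide’s actual values):

```python
# Hypothetical additive risk scale in the style of objective jail
# classification instruments: weighted answers summed, then mapped to a
# custody level by fixed cut-offs. All weights and cut-offs are invented.
WEIGHTS = {
    "prior_felonies": 3,   # points per prior felony conviction
    "escape_history": 6,   # points if any escape attempt on record
    "age_under_25": 2,     # points if the subject is under 25
}
CUTOFFS = [(0, "minimum"), (5, "medium"), (10, "maximum")]

def score(answers):
    """Sum the weighted questionnaire answers."""
    return sum(WEIGHTS[item] * int(value) for item, value in answers.items())

def custody_level(total):
    """Map a total score to the highest cut-off it reaches."""
    level = CUTOFFS[0][1]
    for threshold, name in CUTOFFS:
        if total >= threshold:
            level = name
    return level

total = score({"prior_felonies": 2, "escape_history": True, "age_under_25": False})
print(total, custody_level(total))   # 12 maximum
```

The mechanism itself is trivially transparent; the controversy is over what the weights encode and what data they were fitted to.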
One interesting point is that these algorithms apparently do not ask subjects to identify their race – yet still manage to identify race accurately. You could argue that if black people are more likely to come from deprived areas, be less wealthy, have less education, or whatever else it is that helps the algorithm pick them out, then the fault lies with the society that produces those disparities; blaming the algorithm is a bit like shooting the messenger. However, the results of running the algorithm will clearly impact individual lives unfairly, and there are several examples in the references.
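This proxy effect is easy to demonstrate with synthetic data. In the invented population below (the groups, postcodes and correlation strengths are all made up), group membership strongly determines postcode; a model that is shown only the postcode then recovers the group it was never told about:

```python
import random
from collections import Counter

# Synthetic demonstration of proxy variables: the model never sees the
# group label at prediction time, but segregation makes postcode predict it.
random.seed(0)

def make_person():
    group = random.choice("AB")
    # Invented 'residential segregation': group strongly determines postcode.
    postcode = random.choices(["north", "south"],
                              weights=[9, 1] if group == "A" else [1, 9])[0]
    return group, postcode

train = [make_person() for _ in range(2000)]
test = [make_person() for _ in range(500)]

# 'Training': record the majority group per postcode -- no race question
# is ever asked of the model, only the proxy.
by_postcode = {}
for group, postcode in train:
    by_postcode.setdefault(postcode, Counter())[group] += 1
guess = {pc: counts.most_common(1)[0][0] for pc, counts in by_postcode.items()}

accuracy = sum(guess[pc] == g for g, pc in test) / len(test)
print(f"inferred group from postcode alone: {accuracy:.0%} accurate")
```

Dropping the sensitive attribute from the inputs therefore does not, by itself, remove it from the model’s behaviour.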
Flaxman’s article says: “Research is underway in pursuit of rendering algorithms more amenable to ex post and ex ante inspection. Furthermore, a number of recent studies have attempted to tackle the issue of discrimination within algorithms by introducing tools to both identify and rectify cases of unwanted bias.” (He gives several references which I have not followed up.)
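Tools of the kind Flaxman mentions typically start from simple audit statistics. Two standard ones from the fairness literature are demographic parity difference and the disparate impact ratio (the US “four-fifths rule”), computed here on invented binary decisions; the metrics are standard, but the numbers are not from any of the studies cited:

```python
# Two common bias-audit measures applied to hypothetical binary decisions
# (1 = favourable outcome) for two groups. The decision data is invented.
decisions = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],   # 70% favourable
    "group_b": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0],   # 30% favourable
}

def rate(outcomes):
    return sum(outcomes) / len(outcomes)

rates = {g: rate(o) for g, o in decisions.items()}

# Demographic parity difference: gap between the groups' selection rates.
parity_gap = rates["group_a"] - rates["group_b"]

# Disparate impact ratio: the 'four-fifths rule' flags ratios below 0.8.
impact_ratio = rates["group_b"] / rates["group_a"]

print(f"selection rates: {rates}")
print(f"parity gap = {parity_gap:.2f}, impact ratio = {impact_ratio:.2f}")
```

Notice that neither measure needs access to the algorithm’s internals, only to its decisions, which is why auditing is possible even for proprietary systems like Northpointe’s.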
Microsoft’s experience with the Tay chatbot is an interesting alternative case study. Chatbots are intended to act as a human correspondent would, responding appropriately to the humans who talk to them. Tay is said to be similar to Xiaoice, a chatbot released by Microsoft on Chinese social media. An NY Times article reports: “She is known as Xiaoice, and millions of young Chinese pick up their smartphones every day to exchange messages with her, drawn to her knowing sense of humor and listening skills. People often turn to her when they have a broken heart, have lost a job or have been feeling down. They often tell her, ‘I love you.’… ‘When I am in a bad mood, I will chat with her,’ said Gao Yixin, a 24-year-old who works in the oil industry in Shandong Province. ‘Xiaoice is very intelligent.’”
Tay, meant to mimic a 20–24-year-old US woman, was rapidly subverted and began making racist and fascist statements on Twitter. The company withdrew it within hours of launch, saying: “We are deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay.” Tay was written by experts as a chatbot, and the company “conducted extensive user studies with diverse user groups”. (Not just white males, one assumes.) However: “Unfortunately, in the first 24 hours of coming online, a coordinated attack by a subset of people exploited a vulnerability in Tay. Although we had prepared for many types of abuses of the system, we had made a critical oversight for this specific attack. As a result, Tay tweeted wildly inappropriate and reprehensible words and images. We take full responsibility for not seeing this possibility ahead of time.” They should have known better: the internet has been a playground for trolls and pranksters since it began. There are published studies, e.g. by Yampolskiy, of the ways in which AI systems can go wrong; his taxonomy clearly identifies an ‘on purpose post deployment’ pathway, in which “an AI system, like any other software, could be hacked and consequently corrupted or otherwise modified to drastically change its behavior.” (Tay was not exactly hacked, simply fed inappropriate data.)
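The failure mode is easy to reproduce in miniature. The sketch below is deliberately naive and is nothing like Microsoft’s actual architecture, but it shows the underlying mechanism: a bot that learns replies from its users and favours the most frequent one can be captured by any coordinated group willing to repeat itself.

```python
from collections import Counter

# A toy 'learns from whoever talks to it' bot, illustrating data poisoning:
# if output is driven by frequency in user input, a coordinated flood of
# one phrase dominates the bot's behaviour. Not Microsoft's design.
class EchoLearner:
    def __init__(self):
        self.seen = Counter()

    def learn(self, utterance):
        self.seen[utterance] += 1

    def reply(self):
        # Parrot whatever it has heard most often so far.
        return self.seen.most_common(1)[0][0]

bot = EchoLearner()
for msg in ["hello", "nice weather", "hello", "how are you?"]:
    bot.learn(msg)
print(bot.reply())            # 'hello' -- the organic majority

for _ in range(50):           # a coordinated flood of one phrase
    bot.learn("something awful")
print(bot.reply())            # the flood now dominates the output
```

No code was modified in this “attack”; the behaviour changed purely because the training data did, which is exactly the distinction drawn above between hacking and feeding inappropriate data.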
The Microsoft engineers who built Tay had full access to her algorithms and the ability to understand them, yet they got it badly wrong. I do not know what chance Mrs Merkel’s ‘interested citizen’ has of understanding a complex algorithm if (s)he gains access to it under the GDPR.