Anyone who describes a decision-making system as an “algorithm” is often trying to deflect responsibility for human decisions. For many people, the term implies a set of rules based on objective, empirical evidence or data. It also suggests a highly complex technology, perhaps so complex that a human would struggle to understand its inner workings or anticipate its behavior when deployed. But is this description accurate? Not always.
Last December, for example, Stanford Medical Center blamed the misallocation of COVID-19 vaccines on a “distribution algorithm” that prioritized senior executives over frontline doctors. The hospital said it had consulted ethicists to design a “very complex” algorithm, which a spokesperson later said “apparently wasn’t working properly,” according to the US edition of Technology Review.
While many people took this to mean that some form of AI or machine learning was involved, the system was in fact a medical algorithm, which works differently. Medical algorithms are more like a very simple formula or a decision tree designed by a human committee. That so many people imagined something else illustrates a growing problem. As predictive models proliferate, the public is becoming more wary of their use in sensitive decisions. But while legislators are beginning to develop standards for assessing and auditing algorithms, they must first determine which decision-making or decision-support applications those standards should cover. If the term “algorithm” is left undefined, some of the most influential models could fall outside the reach of provisions designed to protect people.
Is Stanford’s “algorithm” really an algorithm? That depends on how you define the term. Although there is no universally accepted definition, a common one comes from a 1971 textbook by computer scientist Harold Stone: “An algorithm is a set of rules that precisely defines a sequence of operations.” The catch is that this definition covers everything from recipes to complex neural networks. An audit policy based on it would be laughably broad.
In statistics and machine learning, an algorithm is usually thought of as the set of instructions a computer executes to learn from data. In these fields, the resulting structured information is typically called a model. What the computer learns from the data via the algorithm may take the form of weights by which each input factor is multiplied, or it may be far more complicated. The complexity of the algorithm itself can also vary. The impact of these algorithms ultimately depends on the data to which they are applied and the context in which the resulting model is deployed. The same algorithm could have a positive impact in one context and a very different effect in another.
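The distinction between a training algorithm and the model it produces can be made concrete with a minimal sketch. It assumes the simplest possible case, ordinary least squares with a single input factor. The function names (`fit_line`, `make_model`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Training algorithm: ordinary least squares for one input factor.

    Returns the learned weight and intercept.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    weight = cov_xy / var_x
    intercept = mean_y - weight * mean_x
    return weight, intercept


def make_model(weight: float, intercept: float) -> Callable[[float], float]:
    """Model: the learned rule that turns an input into an output."""
    def model(x: float) -> float:
        return weight * x + intercept
    return model


# Hypothetical toy data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

weight, intercept = fit_line(xs, ys)   # the algorithm learns from the data
model = make_model(weight, intercept)  # the model is what gets deployed

# The same algorithm applied to different data yields a different model.
flat_weight, flat_intercept = fit_line(xs, [2.0, 2.0, 2.0, 2.0])
```

Here `fit_line` is the algorithm in the statistical sense, while `model` is the artifact whose behavior depends entirely on the data it was fitted to.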
In other fields, what is called the model above is itself referred to as an algorithm. Confusing as that is, it fits the broadest definition: models are rules (learned by a training algorithm rather than entered directly by a human) that define a sequence of operations. Last year, for example, the UK media described as an “algorithm” a system that failed to assign fair grades to students who could not sit their exams because of COVID-19. What was actually being discussed, of course, was the model: the set of instructions that translated inputs (a student’s past performance or a teacher’s assessment) into outputs (grades).
What appears to have happened at Stanford is that people, including ethicists, sat down and decided which sequence of operations the system should use to determine, based on inputs such as an employee’s age and department, whether that person should be among the first to be vaccinated. As far as is known, this sequence was not based on an estimation procedure optimized for some quantitative target. It was a set of normative decisions about how vaccinations should be prioritized, formalized in the language of an algorithm. In medical terminology and under a broad definition, this approach qualifies as an “algorithm,” even if the only intelligence involved was human.
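A hand-written rule of this kind might look like the following sketch. It is purely illustrative and is not Stanford's actual formula: the point values, age thresholds, department names, and the `Employee` type are all hypothetical. Only the two inputs, age and department, are taken from the account above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    name: str
    age: int
    department: str


# Hypothetical choices a committee might make. Nothing here is learned
# from data; every number encodes a human, normative decision.
HIGH_EXPOSURE_DEPARTMENTS = {"emergency", "icu"}


def priority_score(employee: Employee) -> int:
    """Return a vaccination priority score; higher means earlier."""
    score = 0
    if employee.age >= 65:
        score += 2
    elif employee.age >= 50:
        score += 1
    if employee.department in HIGH_EXPOSURE_DEPARTMENTS:
        score += 3
    return score


def rank(employees: List[Employee]) -> List[Employee]:
    """Order employees from highest to lowest priority."""
    return sorted(employees, key=priority_score, reverse=True)


staff = [
    Employee("A", 30, "emergency"),
    Employee("B", 67, "administration"),
    Employee("C", 52, "icu"),
]
ranked_names = [e.name for e in rank(staff)]
```

Changing a single constant, for example weighting age more heavily than department, would reorder the list. The outcome follows from the committee's choices rather than from any machine intelligence.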
Laws on the way
Lawmakers are also grappling with what an algorithm is. The Algorithmic Accountability Act, introduced in the US Congress in 2019, uses the term “automated decision system” and defines it as a “computational process,” including one “derived from machine learning, statistics, or other data processing or artificial intelligence techniques, that makes a decision or facilitates human decision making, that impacts consumers.”
Similarly, New York City is considering INT 1894, a law that would introduce mandatory audits of “automated employment decision tools,” defined as “any system whose function is governed by statistical theory, or systems whose parameters are defined by such systems.” Notably, although both bills mandate audits, they provide only high-level guidelines on what such an audit would look like.
As decision-makers in both government and industry set standards for algorithmic audits, disagreements about what counts as an algorithm are likely. Rather than trying to agree on a common definition or a particular universal auditing technique, one solution would be to evaluate automated systems primarily on their impact. Focusing on outcomes rather than inputs avoids needless debates over technical complexity. What matters is the potential for harm, regardless of whether the system is an algebraic formula or a deep neural network.
In the end, it is the effect that counts
Impact-based evaluations are common in other fields as well. One example is the classic DREAD framework (Damage, Reproducibility, Exploitability, Affected Users, Discoverability) in cybersecurity, which was first popularized by Microsoft in the early 2000s and is still used at some companies. The “A” in DREAD asks the assessor to quantify the “affected users” by asking how many people would suffer the impact of an identified vulnerability. Impact assessments are also widely used in human rights and sustainability analyses. Some early developers of AI impact assessments have created similar rubrics. Canada, for example, has an algorithmic impact assessment built on questions such as “Are there customers in this line of business who are particularly vulnerable? (Yes or no)”.
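As a rough illustration of how such a rubric turns judgments into a number, the sketch below uses the commonly described DREAD convention of averaging the five category ratings. The 0 to 10 scale, the `DreadRating` class, and the example values are assumptions made for illustration, not an official specification.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DreadRating:
    """One assessor's ratings for a single threat (assumed 0-10 scale)."""
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        """Average the five category ratings into one risk score."""
        values = astuple(self)
        for value in values:
            if not 0 <= value <= 10:
                raise ValueError("each rating must be between 0 and 10")
        return sum(values) / len(values)


# Hypothetical ratings; the high affected_users value reflects a
# vulnerability that would touch most of a system's users.
example = DreadRating(
    damage=8,
    reproducibility=6,
    exploitability=4,
    affected_users=9,
    discoverability=3,
)
risk = example.score()
```

Because each rating is a human judgment, two assessors can reach different scores for the same threat.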
There are certainly difficulties in bringing a term as loosely defined as “impact” into an assessment. The DREAD framework was later supplemented or replaced by STRIDE, in part because it proved difficult to reconcile different beliefs about what threat modeling entails. Microsoft stopped using DREAD in 2008.
In the field of AI, impact assessments have already been presented at conferences and in journals, with varying degrees of success and some controversy. They are far from foolproof: purely formulaic impact assessments are easy to game, while those based on overly vague definitions can lead to arbitrary or impossibly lengthy assessments.
Nonetheless, it is an important step forward. The term “algorithm”, however it is defined, must not become a shield that absolves people of their responsibility to answer for the consequences of the systems they have designed and deployed. This is why calls for algorithmic accountability are growing louder, and the concept of impact offers a useful common denominator for assessing different systems, regardless of how they work.
Kristian Lum is an assistant professor in the Department of Computer and Information Science at the University of Pennsylvania.
Rumman Chowdhury is the director of the Machine Ethics, Transparency, and Accountability (META) team at Twitter. She was previously CEO and founder of Parity, a platform for auditing algorithms, and before that global lead for responsible AI at Accenture.