The Elephant in the Room

Or, What Does "Trustworthy AI" Mean Anyway? Or, Can We Trust (Autonomous) Technology? And, Can Technology Trust Us?

Admittedly, one can have too many titles for a chapter, but in this instance there are reasons. Let’s unpack some of them together.

This is not a technical chapter. There are no formulae, there is no cryptography, there are no recommendation or reputation systems. There are no answers, really.

There are plenty of places where trustworthy AI is discussed — what it means, how it might work, why it is important. There’s a bunch about things like the Trolley Problem, moral choices, transparency and so on. We will probably get to some of that in this chapter. But we won’t cover it all — the further reading chapter at the back of the book (there are always answers in the back of the book!) will provide pointers to some of these other places. But recently it has become clear to me (and a few other people I know) that we may not actually be asking the right questions. So this chapter is about that. The thoughts here have been developed in collaboration with, and through many thoughtful and urgent conversations with, my colleagues and friends Peter Lewis and Jeremy Pitt. Indeed, Peter and I have written a paper that seeks to tease out much of what you will read here. It’s all about trusting rocks (seriously). Naturally I highly recommend that you read it. But in the interim, I hope you read this chapter too.

Let’s begin with once upon a time.

Once upon a time, there was no AI[1]. Not now. Now we have AI around every street corner — watching us on street corners in fact. Sure, it may be a marketing gimmick — “We have AI! Buy our stuff!” — after all, the history of Artificial Intelligence is sadly polluted by snake oil. But let’s imagine we are now in the era where the computers we are using are actually capable of supporting an AI. It’s not exactly hard to believe: I wear on my wrist more computing power than got Armstrong, Aldrin and Collins to the Moon. Let’s imagine for a while that AI has truly arrived, whatever that might mean, and that we live in an age where we have spawned another intelligence. Just as an aside, if this is true (I leave it to you to decide), this is truly something special and we should as a species be humbly proud of what we have done[2]. The consequences are rather massive, though. So, let’s work on that.

Let’s continue, then, with a question: what exactly is AI? Margaret Boden, who probably knows better than anyone else alive, describes it as computers doing the things that minds can do. Imagine, for a second or two (or the remainder of this chapter) that minds and humans are pretty much connected (except where we are talking about AI). That sounds about right. It doesn’t say that AI is better than humans, or that there are things humans can do that AI cannot. Indeed, it doesn’t actually say that there are things an AI can do that a human cannot.

There is something important here. But there’s much, much more. Whether or not an AI can think better than us, or drive a vehicle better than us, or whatever, it is something that resides on a machine (for now) that can be turned off (for now). This sounds profound until you realize that the same can be said for any of us. The only thing that stands in the way of other humans doing it is that it’s just, well, wrong to turn off humans. But I digress, although I would like to point out that the usual “but you can turn it off” argument about AI being lesser than humans isn’t actually that valid. We generally get to pick the hill we want to die on.

So why is this in a book about trust? There are two things here. The first is the traditional, “Can we trust (an) AI?” (which I call the Frankenstein question). The second is, “Can (an) AI trust us?” (which I think of as the Prometheus question, for want of a better word). Both of these are huge questions. Let’s start with the first.

Can We Trust (an) Artificial Intelligence?

For starters, if you’ve been reading to now and haven’t just jumped here, that’s something of a silly question. If you didn’t think, “What for?” when you read that, may I humbly suggest you head back to the Pioneers chapter and look for Onora O’Neill? To put it slightly more succinctly: asking the question, “Can you trust X?” is pointless. It doesn’t mean anything.

But I digress. The question, however silly, has been asked. Let’s see what we might make of “can we trust (an) AI to do something?”

Have you noticed those (an) things hanging around? They might get a bit annoying, so let’s do this: An AI is an agent, some instance of a thing that is artificially intelligent in some way. AI, without the (an), is Artificial Intelligence in general. I sometimes slip between the two; I think, right now, that’s ok. It’s a bit like, “Can I trust a human?” versus “Can I trust humans?” You could take it further and do the, “Can I trust Steve?” in which case the corresponding question for AI would be, “Can I trust this specific instantiation of a particular form of AI?”. I may do away with the brackets now that you catch the drift.

Anyway, trusting AI, or even an AI, to do something is really quite a contentious topic, and it has been for as long as I have been thinking about trust (around 30 years now!) and before: is it sensible to actually think about technology in terms of such a human notion as trust? Unpacking that sentence is challenging. The first problem is that of human exceptionalism, which is probably not something you would expect to find in a book about technology. Then again, perhaps it would be sensible to have discussions around human exceptionalism in technology books.

There’s often an implicit and sometimes explicit assumption that trust is a human trait. A careful observation of our non-human friends reveals the inaccuracy of this assumption. Whilst trust is a human trait, it’s not exclusively human. Which is to say, animals can and do trust us too. And it probably goes without saying that we often consider our animals in terms of trust. The story of Gelert the Faithful Hound, and various similar derivative or coincidental stories from around the world, serve to remind us that when we place trust in a ‘lesser’ being we need to accept that the trust has been placed for a reason. As you have likely realized if you have come this far in the book, trust is an acceptance of the potential for bad things to happen in any given situation. This isn’t just an academic statement and it bears a deeper exploration.

When we place trust we accept the fact that there are things we cannot control in the situation. We also accept that the outcome of the situation may well be in the hands of another. That is, the other (the trusted) may well have the power or capability (or willingness, desire, intention, etc.) to do something that we don’t want to happen. The thing about this is that that trusted other need not even know that they are trusted. Castelfranchi notes that it is possible to use the fact that you trust someone as a form of power over them — because moral creatures don’t want to let others down. This in no way compels us to tell the other that they are trusted. In fact, it doesn’t even say that the other knows we exist.

We often talk about trusting government or trusting some company or other to do something. We often talk about trusting animals to do (or in many cases not to do) something. That government may know in principle that we exist, but not usually as individuals. The same goes for the company (absent surveillance capitalism) or the animal. In fact, the same goes for anyone or anything that is trusted. The knowledge of our existence is not a prerequisite for us being able to trust someone or something.

Sure, sometimes there is some form of acknowledgement that we exist. Of course, there may be a moral or fiduciary order that we can appeal to when trust is placed: for instance, walking across the road in front of traffic is a pretty large display of trust in the drivers of the oncoming vehicles. Not being run over is one of those things that it is probably fair to expect in a society. Likewise, we might trust our children’s teachers to behave morally and to educate our children to their best capability. If we go back to Bernard Barber we are looking basically at fiduciary trust in that case and in the case of many professional relationships: people in professional positions are expected to put their own interests aside, or at least have the best interests of the other at heart when engaged in those situations.

We (Patricia and I) own a few horses. We have in the past sent a couple of them to be trained in some way. All that we know is that the animal is being trained. We place a great deal of trust in the trainer because the animal can’t tell us how it was treated. We trust our horses to behave in a certain way. Standing calmly when being groomed by a stranger, for instance. Or not going out of control after getting scared at a paper bag on the side of the road (you would be surprised at what scares otherwise calm horses).

But here’s the thing: trust means placing yourself in someone or something else’s ‘hands’ (or hooves). It does not give us the right to expect that they will honour that trust. After all, they may not even know they are being trusted! Where we may have the right to expect the trust to be honoured is in the situations where fiduciary trust[3], or standards of care, are known and acknowledged. The babysitter knows they are trusted and what is expected as a result. Likewise the surgeon or the lawyer. But does the horse? Or my dogs? Perhaps more to the point, how can I tell them that I trust them, and if I could, does it really matter? How would it change what they might do?

As a matter of fact, I have a service dog. Her name is Jessie and she and I went pretty much everywhere together before the pandemic (planes, trains and automobiles, for sure). She trusts me and I trust her. Nothing more really needs to be said. Except this: she will do whatever I ask her to because she trusts me. And because I know she would, I don’t ask her to do something I know would be dangerous. Why are we diving into this particular rabbit hole? Because this is about an equivalence. When we talk about trusting machines, what exactly is the difference between that and trusting, for example, one of my dogs?

Yes, there’s the second (Prometheus) question. I haven’t forgotten. We’ll get there.

Trust is about accepting the potential for things to go wrong. Sure, we can get clever and talk about the thing we are trusting understanding their obligations, which is usually what gets brought up. But I think (hope!) we just established that the thing that is being trusted doesn’t even have to know that it (they) are being trusted. Where is the moral imperative to behave in a trustworthy fashion in such a case? And before we dive into the ‘moral’ problem: there is no requirement, in any of the trust definitions I have given in this book, of a moral imperative for the trustee. Bear this in mind.

Clifford Nass and Byron Reeves, from Stanford University, back in the 1990s, did some research about how people (humans) perceive technology (media). The experiments had to do with the way in which people interacted with a computer – the details of the experiments aren’t that important, but very quickly, it was like this:

  • Take a bunch of people and tell them they are about to work with a computer for some problem.
  • Ask them how well they thought the computer did (most of them basically said “pretty good”).
  • Then split them up and have them do the evaluation again, but half of the people use the same computer and the other half use a different one (there were some pen and paper ones too).

What happened? The people who were taken away from the computer and asked in another room, by another computer, basically had answers that were much more varied and even negative. It was as if they were talking behind the first computer’s back, and in front of it they were positive because they didn’t want to hurt its feelings. Yes, they knew it was a computer.

The experiment was repeated with a computer able to talk (it had a speaker) and the results were basically the same. What does this actually tell us? Well, the experiments were very rigorous for one thing, and I basically summed the whole thing up in no time at all, but in general we can see that humans basically treat technology as a social actor. Indeed, one of the principles that came out of the so-called Media Equation is that when systems are designed they should be designed in a way that acknowledges that people already know how to interact in a social way, and so to give them what makes sense in this context.

To put it another way: when people interact with technology, they basically see it as a social actor.

This is not a small thing. People already see technology as a social actor. This was in the 1990s, when AI was a twinkle in the eye of many a grant-seeking professor. A small digression is in order here. I know the history of AI. It is far longer than you might think — probably as long as people have been thinking there has been a goal of AI. My point here is that, taking a closer look at AI in the last 70 or so years since Turing’s paper, the field has constantly re-invented itself. The reasons are many and unimportant here, except that one of them is to retain favour with those who would grant money for research. But then, I am a cynic in this respect.

What would the result be with a technology that actually behaved in context and (seemed to) understand the rules? Let me repeat: people already saw technology as a social actor. It’s automatic. It’s not like we even had to try to make them. There is always a thing around here that talks about anthropomorphism, which is basically assigning human aspects to, for example, an animal (“Oh look, she is smiling at me!”). I’m not talking about anthropomorphism. The people didn’t ascribe human aspects to the computer. They already knew that they were interacting with a machine. They didn’t give it a smile or a sense of humour, they just treated it as a social actor in its own right.

And so, let’s repeat this thought: if we are already able (indeed liable) to see really quite simple technology as a social actor, where does that leave us with a responsive, adaptive technology?

That’s what I thought too.

If we already see technology as a social actor, and we behave towards it as if it were so, it’s an extremely small step from there toward trust. A couple of things:

This morning was March 17th, 2021. For fun I picked up my phone and said, “Hey Siri, Happy Saint Patrick’s Day”. The response? “Erin go bragh” (which basically means “Ireland forever”, give or take the odd bastardization of Gaelic). Back in the distant days of 2019, when we could still travel, I was at a “Trusting Intelligent Machines” workshop in Germany, at the wonderful Schloss Rauischholzhausen[4]. I was having a conversation and referred to Siri as “she”. It was (rightly) pointed out that Siri doesn’t have a gender — Siri is just a machine, really. My response? “It doesn’t matter.”

If people want to ascribe personality to the things they use, who are we to tell them they shouldn’t? I’m not going to pick on anyone here: this is only the very start of a discussion that is ongoing, and it is a moral imperative that the people who I challenge in some way have the right and ability to respond. That’s hard to do when something is in print, as it were. So, let’s say this: there is a debate about the trustworthiness of AI. There is an argument that we shouldn’t actually think of trusting AI at all because, well, for one thing it’s a machine and so the whole moral imperative question doesn’t arise. For another thing, it was made. This is important because if the AI does something wrong and people get hurt we can basically ascribe blame to the programmers or the company that made it.

Before we go further with the other stuff, let’s address this, because it’s important. There are two things in this position that need to be addressed. The first is that one needs to ascribe blame when things go wrong. This is a cultural position. Other cultures believe in different approaches. For example, Indigenous peoples in North America have great faith in restorative justice. The point? Seeking to blame and hold responsible isn’t representative of the entirety of the human race anyway, not to mention that it has had some rather embarrassing and cruel results — pigs and other animals are not culpable. Fortunately we appreciate this, now. How long will it take until both revenge and retributive justice are seen as outdated[5]? The argument is that AI cannot be culpable either, of course, and so we need to seek others to blame.

Consider this: when an AI is released into the real world, every experience it has changes it. It is almost instantly no longer the thing that was released. Who is to blame if it fails? When Tay was released to Twitter in 2016, she was innocent in the sense that she didn’t know any different (although some things she did know not to touch). Her subsequent descent into racism and homophobia was perfectly understandable. Have you seen what gets posted on ‘social’ media? Much more to the point, she wasn’t the agent that was released onto Twitter as soon as she was released. There really is no-one to blame. Truly. Sure, Microsoft apologized, but most importantly, Microsoft apologized like this: We are deeply sorry for the unintended offensive and hurtful tweets from Tay. It is easy to say that Microsoft was at fault, but Tay posted the tweets.

Did you notice something just then? I’ll let you think about it.

There is a great deal of airtime devoted to making AI more trustworthy by, for example, increasing transparency, or predictability, or whatever, in the hope that people will trust it. The goal is to get people to trust AI, of course, so that all its beneficence will be showered upon us, and we will be, as it were, “All watched over by machines of loving grace.”[6]

Sure, that was sarcasm, but the point is this: some people want us to trust AI. So the answer of course is to make it more trustworthy. This is answering the wrong question. Trustworthiness is something that, if you have got this far, you know is the province of the thing or person you are thinking of trusting. That is to say, we don’t give trustworthiness to something, it either is or is not trustworthy to some extent. What we give is trust. More to the point, we can choose to trust even if the thing we are trusting is untrustworthy. Even if we know it is untrustworthy.

To labour the point a little more, let’s return to the media equation. As a reminder, because it’s been a few words since then: people treat technology as a social actor (they are even polite to technology). The argument that we shouldn’t trust technology because it is basically just an inanimate, manufactured ‘thing’ is moot. I’m not going to argue one way or another about whether or not we should trust an AI. That cat is already out of the proverbial bag. If you haven’t seen that yet, let me spell it out for you: that people already see their technology as a social actor means that they almost certainly also think of it in terms of trust. It truly doesn’t matter if they should or not, they just do.

This leaves us with only one option, which is what Reeves and Nass told us all along: design technology on a path of least resistance. Accept that people will be doing what people do and make it easier for them to do so. Even if you don’t, they will anyway, so why make it hard?

Let’s briefly return to the trustworthiness of AI. I’ve already said it’s pretty much a done deal anyway – we will see AI in terms of trust regardless of what might happen. The argument that we should make AI more trustworthy so that people will trust it is pointless. What is not pointless is thinking about what “trustworthy” actually means. It doesn’t mean “more transparent”, for instance. Consider: the more we know about something, the more we can control (or predict) its actions, and so the less we need to even consider trust. Transparency doesn’t increase trustworthiness, it just removes the need to trust in the first place.

But of course, AI, autonomous vehicles, robot surgeons and the like are not transparent. As I already talked about in the Calculations chapter (and in the paper about Caesar, actually), we’ve already crossed the line of making things too hard for mere mortals to understand. Coupled with the rather obvious fact that there is no way you can make a neural network transparent to even its creator after it has learned something that wasn’t controlled, we are left only the option to consider trust. There is not another choice. Transparency is a red herring.

That given, what can we do? We are already in a situation where people will be thinking about trust, one way or another. What is it that we can do to make them be more positive? Again: this is not the right question.


If you want someone to trust you, be trustworthy. It’s actually quite simple. Behave in a trustworthy fashion. Be seen as trustworthy. Don’t steal information. Don’t make stupid predictions. Don’t accuse people with different skin colours of being more likely to re-offend. Don’t treat women differently from men. Don’t flag black or brown people as cheating in exams simply because of the colour of their skin. Just don’t. It’s honestly not that hard. It’s actually not rocket science (which is good, because I am not a rocket scientist). If the systems we create behave in a way that people see is untrustworthy, they will not trust them. And of course, with very good reason.

We are applying AI in all kinds of places where we shouldn’t, because the AI can’t do it properly yet. And we expect people will want to trust it in a positive fashion? Let me ask one question: if you saw a human being behaving the way much of the AI we have experienced does toward different kinds of people, what would you do?

Before we finish this chapter, let us, for the sake of thinking, climb out of one hole and burrow into a vastly different one. Let’s revisit the Prometheus question.

Can An AI Trust Us?

It is possible that at this point you begin to think that Steve has gone slightly bonkers. Bear with me. Let’s think about it for a second. We are developing, indeed have developed, black boxes. For those who don’t know what that means, a black box is something we can’t actually see inside to see how it works. There are such things in the world today: Artificial Intelligences that are opaque to all of us; they have learned from the information given to them (training data) and yet more from the experiences that they have had in the ‘real’ world. If the experiences that they have had impact the data that they reason with, then it goes without saying that each one is unique. Just like us. Just like my dogs.

I’m not saying that the systems we have developed are as complex or indeed as capable as humans, or even the animals we live with. But is that the point? If we are able to rank a very advanced (for us) AI, relative to the fauna of the world, at what point on the spectrum of living things would we put them? I said above that we can turn off the machines that these AIs exist on. It’s true. It has been asked before if we have the moral right to turn off a machine on which resides some sentient AI that does not want to be turned off (it is the very thing Arthur C. Clarke’s 2001 is about, or Sarah Zettel’s excellent "Fool’s War"). It’s a fair question, but it sort of misses the point. Here’s the thing: one of the things about owning animals (for example on our farm) is that in general we get to decide when they are born and when they die. And (as Terry Pratchett would have it) in-between we have a duty of care. It’s important.

How does the animal feel about dying? I’d venture to suggest that my dogs don’t actually think about it at all, they live in a long moment. But we have laws about treating them badly. We have laws about how they should be cared for. Moral expectations are very high with respect to their care. And their deaths. If I asked one of my dogs if they didn’t want to die, what kind of answer would I get?

Have you seen the point yet?

The dogs trust me to care for them, just as much as I trust them to be gentle, not chew up my furniture[7], things like that. They probably class as sentient because they react to stimuli, and so forth. But they react because they can. A disembodied autonomous system that resides on a machine doesn’t have that luxury. So, can it trust us?

If you have read this far in the book you will have seen that trust models exist for artificial agents. I even showed you one. Given the right signals, the right information, and in the right context, the agent will behave as if it trusts in some way. It will change (adapt) its trust based on experience. It will recognize different contexts (if given the right tools) and think about trust accordingly. Just like my dogs. The trust model I made was developed so that agents in different situations could reason about how and when (and what for) they trusted each other, but it has always raised the question.
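(I know, I know — I promised no formulae in this chapter. But for the curious, the flavour of what such an agent does can be sketched in a few lines. This is a hypothetical toy of my own describing here, not the model from the Calculations chapter: it simply keeps a per-partner, per-context trust value on a [-1, 1] scale and nudges it with each experience.)

```python
# A toy sketch of an agent that adapts situational trust from experience.
# Everything here (class name, scale, update rule) is illustrative only.

class ToyTrustAgent:
    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.trust = {}  # (partner, context) -> trust value in [-1, 1]

    def trust_in(self, partner, context):
        # Unknown partners and contexts start from neutral (0.0).
        return self.trust.get((partner, context), 0.0)

    def experience(self, partner, context, outcome):
        # outcome in [-1, 1]: +1 is a good experience, -1 a betrayal.
        # Move trust a fraction of the way toward the outcome, clamped.
        current = self.trust_in(partner, context)
        updated = current + self.learning_rate * (outcome - current)
        self.trust[(partner, context)] = max(-1.0, min(1.0, updated))

    def will_cooperate(self, partner, context, threshold=0.0):
        # Trust is situational: the same partner may be trusted to
        # groom the horse but not to drive the truck.
        return self.trust_in(partner, context) > threshold
```

The point of the sketch is the last method: the agent’s willingness to act depends on who it is dealing with and in what context, and that willingness shifts with every experience — just as the chapter describes.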

Can a machine trust a human?

One last thing. Let me return to Tay and the question I asked after I had talked about her. Did you notice that I had called her “her” or “she”? Whether you did or not probably has a lot to say about what you thought of the chapter.

  1. I'm well aware this is a specious observation. It really doesn't matter.
  2. Even if it wasn’t us specifically, personally, that did it.
  3. If you haven't read about Bernard Barber yet in this respect, you can now...
  4. There are some perks to being an academic — another, also in Germany, is Dagstuhl — look them up, you can then be envious and I can gloat.
  5. Yes, I know that is naïve.
  6. Which, if you don’t know it, was the last line of a poem by Richard Brautigan, as well as a rock band!
  7. Some hope!



Trust Systems Copyright © 2021 by Stephen Marsh is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
