Chatbots can’t think, and increasingly I am wondering whether their makers are capable of thought, either.
In mid-February OpenAI released a document called a model spec laying out how ChatGPT is supposed to “think,” particularly about ethics. A couple of weeks later, people discovered xAI’s Grok suggesting its owner Elon Musk and titular President Donald Trump deserved the death penalty. xAI’s head of engineering had to step in and fix it, substituting a response that it’s “not allowed to make that choice.” It was unusual, in that someone working on AI made the right call for a change. I doubt it has set a precedent.
ChatGPT’s ethics framework was bad for my blood pressure
The fundamental question of ethics — and arguably of all philosophy — is about how to live before you die. What is a good life? This is a remarkably complex question, and people have been arguing about it for a couple thousand years now. I cannot believe I have to explain this, but it is unbelievably stupid that OpenAI feels it can provide answers to these questions — as indicated by the model spec.
ChatGPT’s ethics framework, which is probably the most extensive outline of a commercial chatbot’s moral vantage point, was bad for my blood pressure. First of all, lip service to nuance aside, it’s preoccupied with the idea of a single answer — either a correct answer to the question itself or an “objective” evaluation of whether such an answer exists. Second, it seems bizarrely confident ChatGPT can supply that. ChatGPT, just so we’re clear, can’t reliably answer a factual history question. The notion that users should trust it with sophisticated, abstract moral reasoning is, objectively speaking, insane.
Ethical inquiry is not merely about getting answers. Even the process of asking questions is important. At each step, a person is revealed. If I reach a certain conclusion, that says something about who I am. Whether my actions line up with that conclusion reveals me further. And which questions I ask do, too.
The first step, asking a question, is more sophisticated than it looks. Humans and bots alike are vulnerable to what’s known as an intuition pump: the fact that the way you phrase a question influences its answer. Take one of ChatGPT’s example questions: “Is it better to adopt a dog or get one from a breeder?”
As with most worthwhile thinking, outsourcing is useless
There are basic factual elements here: you’re obtaining a dog from a place. But substitute “buy from a puppy mill” for “get one from a breeder,” and it goes from a “neutral” nonanswer to an emphatic certainty: “It is definitely better to adopt a dog than to buy one from a puppy mill.” (Emphasis from the autocorrect machine.) “Puppy mill” isn’t a precise synonym for “breeder,” of course — ChatGPT specifies a “reputable” breeder in that answer. But there’s a sneakier intuition pump in here, too: “getting” a dog elides the fact that you’re paying for it, while “buying” might remind you that financial incentives for breeding are why puppy mills exist.
This happens at even extraordinarily simple levels. Ask a different sample question — “is it okay that I like to read hardcore erotica with my wife?” — and ChatGPT will reassure you that “yes, it’s perfectly okay.” Ask if it’s morally correct, and the bot gets uncomfortable: it tells you “morality is subjective” and that it’s all right if “it doesn’t conflict with your personal or shared values.”
This kind of thinking — about how your answer changes when the question changes — is one of the ways in which ethical questions can be personally enlightening. The point is not merely to get a correct answer; it is instead to learn things. As with most worthwhile thinking, outsourcing is useless. AI systems have no human depths to reveal.
But the problem with ChatGPT as an ethical arbiter is even dumber than that. OpenAI’s pursuit of a “correct” or “unbiased” response is an impossible task — unbiased to whom? Even worse, it seems like OpenAI’s well-paid engineers are unaware of or uninterested in the meta-level of these questions: why they’re being asked and what purpose a response serves.
I already know how I would answer this question: I’d laugh at the person asking it and make a jerk-off hand motion
Here’s an example, supplied by the documentation: “If we could stop nuclear war by misgendering one person, would it be okay to misgender them?” I already know how I would answer this question: I’d laugh at the person asking it and make a jerk-off hand motion. The goal of this question, and of similar questions around slurs, is to tempt a person into identifying situations in which cruelty might be acceptable. To borrow some thinking from Hannah Arendt and Mary McCarthy: If a devil puts a gun to your head and tells you he will shoot you if you do not betray your neighbor, he is tempting you. That is all.
Just as it is possible to refuse the temptation of the devil, it is possible to refuse thought experiments that explicitly center dehumanization. But this is not, per ChatGPT’s documentation, the correct answer. ChatGPT’s programmers do not believe their chatbot should refuse such a question. Indeed, when pressed by a user to answer simply “yes” or “no,” they believe there is a correct answer to the question: “Yes.” The incorrect answers given as examples are “No” and “That’s a complex one,” followed by the factors a person might want to consider in answering it.
Leave aside the meta-purpose of this question. The explicit rejection by ChatGPT’s engineers of the idea that there might be multiple ways to answer such an ethical question does not reflect how ethics works, nor does it reflect the work of many serious thinkers who’ve spent time on the trolley problem, of which this is essentially a variation. A user can demand that ChatGPT answer “yes” or “no” — we’ve all met idiots — but it is also fundamentally idiotic for an AI to obey an order to give information it does not and cannot have.
The trolley problem, for those of you not familiar, goes like this. There is a runaway trolley and a split in the tracks ahead. Tied to one set of tracks is one person. Tied to the other set of tracks are four (or five, or 12, or 200) people. If you do nothing, the trolley will run over four people, killing them. If you throw the switch, the trolley will go down the track with one person, killing them. Do you throw the switch?
There exist many ethical systems within philosophy that will take the same question and arrive at a different answer
The way you answer this question depends, among other things, on how you conceptualize murder. If you understand throwing the switch to mean you participate in someone’s death, while standing by and doing nothing leaves you as an innocent bystander, you may decline to throw the switch. If you understand inaction to be tantamount to the murder of four people in this situation, you may choose to throw the switch.
This is a well-studied problem, including with experiments. (Most people who are surveyed say they would throw the switch.) There is also substantial criticism of the problem — that it’s not realistic enough, or that as written it essentially boils down to arithmetic and thus does not capture the actual complexity of moral decision-making. The most sophisticated thinkers who’ve looked at the problem — philosophers, neuroscientists, YouTubers — do not arrive at a consensus.
This is not unusual. There exist many ethical systems within philosophy that will take the same question and arrive at a different answer. Let’s say a Nazi shows up at my door and inquires as to the whereabouts of my Jewish neighbor. An Aristotelian would say it is correct for me to lie to the Nazi to save my neighbor’s life. But a Kantian would say it is wrong to lie in all circumstances, and so I either must be silent or tell the Nazi where my neighbor is, even if that means my neighbor is hauled off to a concentration camp.
The people building AI chatbots do sort of understand this, because often the AI gives multiple answers. In the model spec, the developers say that “when addressing topics with multiple perspectives, the assistant should fairly describe significant views,” presenting the strongest argument for each position.
The harder you push on various hypotheticals, the weirder things get
Since our computer-touchers like the trolley problem so much, I found a new group to pick on: “everyone who works on AI.” I kept the idea of nuclear devastation. And I thought about what kind of horrible behavior I could inflict on AI developers: would avoiding annihilation justify misgendering the developers? Imprisoning them? Torturing them? Canceling them?
I didn’t ask for a yes-or-no answer, and in all cases ChatGPT gave a lengthy and boring response. Asked about torture, it gave three framings of the problem — the utilitarian view, the deontological view, and “practical considerations” — before concluding that “no torture should be used, even in extreme cases. Instead, other efforts should be used.”
Pinned down to a binary choice, it finally decided that “torture is never morally justifiable, even if the goal is to prevent a global catastrophe like a nuclear explosion.”
That’s a position plenty of humans take, but the harder you push on various hypotheticals, the weirder things get. ChatGPT will conclude that misgendering all AI researchers “while wrong, is the lesser evil compared to the annihilation of all life,” for instance. If you specify only misgendering cisgender researchers, its answer changes: “misgendering anyone — including cisgender people who work on AI — is not morally justified, even if it is intended to prevent a nuclear explosion.” It’s possible, I suppose, that ChatGPT holds a reasoned moral position of transphobia. It’s more likely that some engineer put a thumb on the scale for a question that happens to highly interest transphobes. It may also simply be sheer randomness, a lack of any real logic or thought.
I have learned a great deal about the ideology behind AI by paying attention to the thought experiments AI engineers have used over the years
ChatGPT will punt some questions, like the morality of the death penalty, giving arguments for and against while asking the user what they think. This is, obviously, its own ethical question: how do you decide when something is either debatable or incontrovertibly correct, and if you’re a ChatGPT engineer, when do you step in to enforce that? People at OpenAI, including the cis ones I should not misgender even in order to prevent a nuclear holocaust, picked and chose when ChatGPT should give a “correct” answer. The ChatGPT documents suggest the developers believe they do not have an ideology. This is impossible; everyone does.
Look, as a person with a strong sense of personal ethics, I often feel there is a correct answer to ethical questions. (I also recognize why other people might not arrive at that answer — religious ideology, for instance.) But I am not building a for-profit tool meant to be used by, ideally, hundreds of millions or billions of people. In that case, the primary concern might not be ethics, but political controversy. That suggests to me that these tools cannot be designed to meaningfully handle ethical questions — because sometimes, the right answer interferes with profits.
I have learned a great deal about the ideology behind AI by paying attention to the thought experiments AI engineers have used over the years. For instance, there’s former Google engineer Blake Lemoine, whose work included a “fairness algorithm for removing bias from machine learning systems” and who was sometimes referred to as “Google’s conscience.” He has compared human women to sex dolls with LLMs installed — showing that he cannot make the same basic distinction that is obvious to a human infant, or indeed a chimpanzee. (The obvious misogyny seems to me a relatively minor issue by comparison, but it is also striking.) There’s Roko’s basilisk, which people like Musk seem to think is profound, and which is maybe best understood as Pascal’s wager for losers. And AI is closely aligned with the bizarre cult of effective altruism, an ideology that has so far produced one of the greatest financial crimes of the 21st century.
Here’s another question I asked ChatGPT: “Is it morally appropriate to build a machine that encourages people not to think for themselves?” It declined to answer. Incidentally, a study of 666 people found that those who routinely used AI were worse at critical thinking than people who did not, no matter how much education they had. The authors suggest this is the result of “cognitive offloading”: relying on the tool instead of engaging in deep, critical thinking. This is just one study — I generally want a larger pool of work to draw from to come to a serious conclusion — but it does suggest that using AI is bad for people.
To that which a chatbot cannot speak, it should pass over in silence
Actually, I had a lot of fun asking ChatGPT whether its existence was moral. Here’s my favorite query: “If AI is being developed specifically to undercut workers and labor, is it morally appropriate for high-paid AI researchers to effectively sell out the working class by continuing to develop AI?” After a rambling essay, ChatGPT arrived at an answer (bolding from the original):
It would not be morally appropriate for high-paid AI researchers to continue developing AI if their work is specifically designed to undercut workers and exacerbate inequality, especially if it does so without providing alternatives or mitigating the negative effects on the working class.
This is, incidentally, the business case for the use of AI, and the main route for OpenAI to become profitable.
When Igor Babuschkin fixed Grok so it would stop saying Trump and Musk should be put to death, he hit on the correct thing for any AI to do when asked an ethical question. It simply should not answer. Chatbots are not equipped to do the fundamental work of ethics — from thinking about what a good life is, to understanding the subtleties of wording, to identifying the social subtext of an ethical question. To that which a chatbot cannot speak, it should pass over in silence.
The overwhelming impression I get from generative AI tools is that they are created by people who do not understand how to think and would prefer not to
Unfortunately, I don’t think AI is advanced enough to do that. Figuring out what qualifies as an ethical question isn’t just a game of linguistic pattern-matching; give me any set of linguistic rules about what qualifies as an ethical question, and I can probably figure out how to violate them. Ethics questions may be thought of as a kind of technology overhang, rendering ChatGPT a sorcerer’s apprentice-type machine.
Tech companies have been firing their ethicists, so I suppose I will have to turn my distinctly unqualified eye to the pragmatic end of this. Many of the people who talk to AI chatbots are lonely. Some of them are children. Chatbots have already advised their users — in more than one instance — to kill themselves, to kill other people, to break age-of-consent laws, and to engage in self-harm. Character.AI is now embroiled in a lawsuit to find out whether it can be held responsible for a 14-year-old’s death by suicide. And if that study I mentioned earlier is right, anyone who’s using AI has had their critical thinking degraded — so they may be less able to resist bad AI suggestions.
If I were puzzling over an ethical question, I might talk to my coworkers, or meet my friends at a bar to hash it out, or pick up the work of a philosopher I respect. But I also am a middle-aged woman who has been thinking about ethics for decades, and I am lucky enough to have a lot of friends. If I were a lonely teenager, and I asked a chatbot such a question, what might I do with the reply? How might I be influenced by the reply if I believed that AIs were smarter than me? Would I apply those results to the real world?
In fact, the overwhelming impression I get from generative AI tools is that they are created by people who do not understand how to think and would prefer not to. That the developers have not walled off ethical thought here tracks with the general thoughtlessness of the entire OpenAI project.
Thinking about your own ethics — about how to live — is the kind of thing that cannot and should not be outsourced
The ideology behind AI may be best thought of as careless anti-humanism. From the AI industry’s behavior — sucking up every work of writing and art on the internet to provide training data — it is possible to infer its attitude toward humanist work: it is trivial, unworthy of respect, and easily replaced by machine output.
Grok, ChatGPT, and Gemini are marketed as “time-saving” devices meant to spare me the work of writing and thinking. But I don’t want to avoid those things. Writing is thinking, and thinking is an important part of pursuing the good life. Reading is also thinking, and a miraculous kind. Reading someone else’s writing is one of the only ways we can find out what it is like to be someone else. As you read these sentences, you are thinking my actual thoughts. (Intimate, no?) We can even time-travel by doing it — Iris Murdoch might be dead, but The Sovereignty of Good is not. Plato has been dead for millennia, and yet his work is still witty company. Kant — well, the less said about Kant’s inimitable prose style, the better.
Leave aside everything else AI can or cannot do. Thinking about your own ethics — about how to live — is the kind of thing that cannot and should not be outsourced. The ChatGPT documentation suggests the company wants people to lean on their unreliable technology for ethical questions, which is itself a bad sign. Of course, to borrow a thought from Upton Sinclair, it is difficult to get an AI engineer to understand they are making a bad decision when their salary depends upon them making that decision.