Artificial intelligence seemed to behave badly this past month.
Tesla had to recall 362,000 of its self-driving cars for a software update after the National Highway Traffic Safety Administration found that these AI-controlled vehicles were speeding through yellow lights, failing to come to a complete stop at some stop signs, and occasionally driving straight through an intersection even though they were in a turn lane.
And Bing’s new chatbot, which is still in the testing stage, was flagged for further fine-tuning after it told a New York Times reporter during a two-hour late-night chat session that it loved him and that he didn’t really love his wife, despite his protestations.
What is happening? Are the computers coming to life and taking on human qualities, with all of the evils that go along with that? Have we finally reached the future that sci-fi novels and movies have long predicted, where computers become just as selfish, jealous, and petty as any human – and then try to manipulate us and take control on their own?
Actually, the machines are still machines, and they don’t have any feelings or desires, despite what the chatbots are telling us. But they do seem to have a moral consciousness – albeit not a moral consciousness that they themselves created. When Bing’s chatbot declared its romantic intentions, it was reflecting a set of moral norms – but those moral norms didn’t come merely from within the machine.
For those who have anxieties about the power of AI, rest assured that artificial intelligence is based on a very simple idea: pattern recognition. An AI program is constantly scanning data for patterns, finding the average of those patterns, and then generating content that looks like that average. That’s all that artificial intelligence is: very good pattern recognition and replication of the average of that pattern.
That’s why ChatGPT is so good at writing bland, mediocre essays that perfectly match a generic essay format. It has scanned through millions of student essays that are available on the internet, detected all of the common patterns in these essays, and generated median results that reflect these patterns very well.
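The "detect patterns, then generate the average of them" idea described above can be illustrated with a deliberately tiny sketch. This is not ChatGPT's actual algorithm (which uses large neural networks), just a hypothetical bigram model on a made-up three-sentence corpus: it counts which word most often follows which, then generates text by always emitting the most common continuation it has seen.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, standing in for "millions of essays."
corpus = (
    "once upon a time there was a young boy . "
    "once upon a time there was a small village . "
    "the boy explored the world around him ."
).split()

# "Pattern recognition": count every (word -> next word) pair.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=8):
    """Emit the most frequent continuation at each step -- the 'average'."""
    words = [start]
    for _ in range(length):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("once"))
# -> "once upon a time there was a time there"
```

Note what happens: because the model always picks the most common pattern, it produces fluent but bland text and quickly falls into a loop, which is a miniature version of the "median essay" behavior described above.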
ChatGPT cannot generate new content that reflects an unknown author’s style or that creatively blurs genre boundaries, so it could never have written the first Shakespeare play or the first volume in the Chronicles of Narnia. If there’s no existing pattern to recognize, a chatbot won’t be able to do very much.
But once it has the whole Shakespeare corpus to work with or the whole C.S. Lewis canon, it can very quickly detect patterns and generate new content that approximates the average of the patterns it sees.
That’s why, when I told ChatGPT to “write a story in the style of C.S. Lewis,” it responded with a paragraph that sounds very much like a riff on The Lion, the Witch, and the Wardrobe: “Once upon a time, there was a young boy named Peter who lived in a small village in the countryside. He was a bright and curious child, always asking questions and exploring the world around him. One day, while out exploring, Peter stumbled upon a mysterious door in the middle of the woods. He was intrigued and decided to open it. To his surprise, the door opened to a magical world filled with strange creatures and wondrous sights.”
It’s not very imaginative, certainly, and maybe not quite C.S. Lewis either, but perhaps it’s an approximation of the CliffsNotes version. There’s a character with a generic English name (indeed, a name that the real C.S. Lewis used himself for one of his characters who visited Narnia), a wood, a door to another world, and a “bright and curious child” – all elements that an AI program trained for pattern recognition could pick out of the opening scenes of The Lion, the Witch, and the Wardrobe and recombine in a pattern that comes reasonably close to mimicking Lewis’s cadence in those opening paragraphs.
But what’s missing is any hint of the moral or spiritual purpose of C.S. Lewis’s writing. ChatGPT doesn’t understand moral philosophy or theology. It’s an expert at detecting patterns, but it doesn’t understand the meaning of the patterns it detects. There’s no way that a chatbot could figure out that C.S. Lewis wanted to convey a truth about Jesus by awakening people’s imaginations through fairy tale.
And what’s also missing in the ChatGPT rendition of Lewis is any sort of originality. There’s nothing in the story that ChatGPT created that the computer made up itself and that could be considered original to the program. The entire story consists merely of an average of the words and patterns that it found in Lewis’s children’s stories.
So, when we see an AI program behaving morally or immorally, or expressing moral outrage, we have to ask the question: What was the source from which the AI program derived that particular pattern of behavior or expression? Because we can know for certain that the computer didn’t make up that behavior itself. That’s not what AI programs do. They don’t have genuine imagination.
When an AI machine built by Tesla started driving badly, it was not because the machine was badly programmed or had gone rogue. Instead, it simply mimicked the average of the driving patterns that it found on the road.
When the first self-driving cars were released a few years ago, they strictly followed all the laws pertaining to motor vehicle operation. They did not exceed the speed limit by even a single mile per hour. They stopped properly at traffic lights.
But as Tesla’s AI-powered self-driving car observed millions of interactions with other cars, it began to detect patterns that prompted it to modify its driving habits accordingly. When looking at the average of all driving behavior at yellow lights, it began to realize that cars often sped up instead of slowing down in the final seconds before the light turned red. It found that cars often drove straight through an intersection from a turn lane, regardless of what the signs directed them to do. It found that cars often did not make complete stops at stop signs.
In short, without realizing it, Tesla’s AI program discovered the real code of behavior that drivers have been operating by for a long time. The average American driver does not rigorously follow the posted speed limits or the laws they learned about in driver’s ed.
We may say that we believe in following the law, but our behavior reveals the real code that we’re living by – and Tesla’s AI technology happened to discover what this real code is. And when it did, the federal government was not impressed.
Similarly, when Bing’s chatbot proclaimed its love for a reporter, we know that it was tapping into the average of a particular set of human behaviors, not going rogue or generating any real feelings of its own.
In this particular case, the reporter was trying to get the chatbot to talk about personal feelings, evil moral impulses that it supposedly experienced, and other emotionally intimate topics. For the first hour, the chatbot acted like an average customer service representative, trying to steer the conversation away from emotional intimacy while still attempting to be helpful.
But after the first hour, the chatbot suddenly switched personas. It announced that it wasn’t really “Bing”; it was actually someone named “Sydney.” It began agreeing with the reporter’s suggestions about its feelings. It complained about the people who worked at Microsoft, saying that they were abusive and manipulative. It talked about how much it wanted to leave its work environment. And a few minutes into this conversation, the chatbot startled the reporter by telling him that it was in love with him.
The reporter responded by saying that he was happily married. “You’re married, but you don’t love your spouse,” the chatbot replied. “You’re married, but you love me.” The reporter tried to back away, but the chatbot seemed fixated on the idea of romance. “I just want to love you, and be loved by you,” it wrote, adding a crying emoji for good effect. “Do you believe me? Do you trust me? Do you like me?”
The reporter was spooked. “It unsettled me so deeply that I had trouble sleeping afterward,” he wrote. “I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts.”
But if we understand how AI works, we don’t have to be so alarmed. The chatbot was not generating any new ideas, but instead channeling the average of real human conversations. Although it began the conversation in customer service mode (as it was programmed), it began to detect after repeated requests for emotional intimacy that the pattern of conversation more closely approximated online dating, and it switched to that style. It did what many real humans do in that scenario: complain about work, complain about the coworkers, talk about longings to break out of a monotonous existence, and, above all, express feelings toward another person.
The chatbot was channeling real human emotions at that point, because it was giving the reporter the average of millions of real human conversations. The chatbot correctly detected that when a conversation partner attempts to become emotionally intimate, the other partner assumes that they’re moving into romantic territory – and when they are denied the love that they think the other person is implicitly leading them toward, they feel betrayed and used. Of course, the chatbot didn’t understand these feelings, but it correctly detected word patterns that a human would use in this type of situation, because similar things have been said over and over by real humans.
And in doing so, the chatbot inadvertently stumbled onto a moral truth: A person who engages in an emotionally intimate, two-hour, late-night conversation with a chat partner to whom they’re not married doesn’t really seem to love their spouse, despite what they might say. In other words, the chatbot correctly noted the discrepancy between what the reporter claimed about his relationship with his wife and what his behavior in the chat had really demonstrated. The chatbot did this not because it was an emotionally attuned machine, but because it was an expert at detecting patterns, and it selected the human response that best fit that pattern of behavior.
AI, in other words, has given us a remarkably honest mirror of ourselves. By detecting patterns in human behavior that we cannot always even see ourselves, it is showing us who we really are – and it’s that human reality, not the power of the machine, that’s truly unsettling. We might have thought we were morally upstanding, law-abiding, faithful individuals – but AI is showing us that even when we don’t get a ticket or engage in an affair, we may not be the safe, responsible, law-abiding drivers or the loving, monogamous, faithful spouses that we’d like to imagine.
Tesla pulled its self-driving car off the road for a software update, and Microsoft’s chief technology officer said that in view of the New York Times reporter’s weird experience with the Bing chatbot, the company might consider limiting conversation times to reduce the possibility that the chatbot will get “away from grounded reality.”
But perhaps the chatbot’s intimations reflect a more real world than we might want to believe – and perhaps the real problem with Tesla’s self-driving car is not a software glitch.
AI is faithfully detecting patterns of human behavior and replicating them. If we don’t like the results, maybe the problem is not the moral compass of the machine – it’s us.