Much has been said about artificial intelligence (AI) and machine learning, which learn from their interactions so that they can work better in the future. But a number of incidents have pointed out how easy it is for AI systems to go astray, either by accident or because people deliberately set out to corrupt them.
That computer programs respond based on the data they’re given isn’t new; “Garbage in, garbage out” dates back to the 1950s. As computers and software grow increasingly sophisticated in the data sources they accept and the responses they can provide, however, it’s not always as easy to detect what might be “garbage.”
One of the oldest examples of such manipulated AI systems is the “Google bomb,” where people created a particular catchphrase, associated it with something else, and posted the connection around the Internet. Eventually, searching for either the catchphrase or the associated item would result in the connection coming up high on Google search results. Examples include 1999’s “more evil than Satan himself” with Microsoft, and 2006’s “miserable failure” with President George W. Bush.
More recently, researchers gave Watson, IBM’s supercomputer-based AI system, the Urban Dictionary as a data source. This became a problem because the Urban Dictionary has obscene words in it. Like a three-year-old, Watson obligingly started using some of his new vocabulary. Eventually, that data had to be scrubbed from Watson and filtering developed to keep him from using certain words.
One could argue that, particularly in developing an AI system intended to communicate with humans using natural language, using obscene terms isn’t necessarily a bad thing, in the right context. Unfortunately, Watson wasn’t yet smart enough to know when the use of such terms was appropriate.
Then people started getting more directly involved. In 2014, data scientist Anthony Garvan developed a Web-based game called Bot or Not? that paired players either with other players or an AI program, had them converse, and let them guess which they were talking to. The AI program also learned how to converse from the other players. He quickly had to shut it down when it started spouting racist sentiments.
“A handful of people spammed the bot with tons of racist messages,” Garvan writes. “Although the number of offenders was small, they were an extremely dedicated bunch, and it was enough to contaminate the entire dataset.”
Most recently, Microsoft created an AI Twitter chatbot named Tay. While chatbots had been created before that could parrot Tweets sent to them—sometimes with embarrassing results—Tay was different in that she could not only repeat Tweets, but learn from them and start using the phrases and ideas in context. « The more humans share with me, the more I learn,” Tay chirped when she was first turned on.
Little did she know. “Microsoft’s AI developers sent Tay to the Internet to learn how to be human, but the Internet is a terrible place to figure that out,” writes Ryan Faith in Vice.
As with Bot or Not?, several people systematically Tweeted various sexist and racist sentiments to Tay, and she promptly started using these new ideas in her own Tweets. Microsoft ended up having to shut down the system less than 24 hours later.
But concern about AI systems goes beyond an occasional off-color remark or a shocking Tweet. For example, as AI systems become used for functions such as predicting who might commit a crime, some worry they will start performing racial profiling, based on the data used to teach them.
“The fundamental assumption of every machine learning algorithm is that the past is correct, and anything coming in the future will be, and should be, like the past,” Garvan writes. “This is a fine assumption to make when you are Netflix trying to predict what movie you’ll like, but is immoral when applied to many other situations. It’s one thing to get cursed out by an AI, but wholly another when one puts you in jail, denies you a mortgage, or decides to audit you.”
And with deliberate intervention like what happened with Tay, this could get worse. “Just because developers might succeed in creating a safe AI, it doesn’t mean that it will not become unsafe at some later point,” writes Roman Yampolskiy in his paper Taxonomy of Pathways to Dangerous AI. “This can happen rather innocuously as a result of someone lying to the AI and purposefully supplying it with incorrect information or more explicitly as a result of someone giving the AI orders to perform illegal or dangerous actions against others.”
In addition, Google and other AI research firms are concerned that robots and AI systems that get “rewarded” for the proper action could end up “reward hacking,” or creating problems that it can solve and thus get rewarded. “If we reward the robot for achieving an environment free of messes, it might disable its vision so that it won’t find any messes, or cover over messes with materials it can’t see through, or simply hide when humans are around so they can’t tell it about new types of messes,” notes the report Concrete Problems in AI Safety. Reward hacking already happens often enough that it could be a “deep and general problem,” the report warns.
It’s similar to the Cobra Effect in humans, where people rewarded for a specific behavior do what they can to ensure that the rewards keep coming—in that case, raising cobras so they can continue getting bounties on dead cobras.
As with humans, this is particularly likely to happen if the AI system is aware of the metric used for giving rewards. “A designer might notice that under ordinary circumstances, a cleaning robot’s success in cleaning up the office is proportional to the rate at which it consumes cleaning supplies, such as bleach,” notes the report. “However, if we base the robot’s reward on this measure, it might use more bleach than it needs, or simply pour bleach down the drain in order to give the appearance of success.” (Developers of such systems may need a “lime equation.”)
Market researchers are already considering this idea in machine-based high-frequency stock trading, writes Mark Melin in ValueWalk. “Can a high frequency trading machine both engage in illegal behavior and then cover up its tracks?” he writes. “Don’t laugh, it is a current issue being considered.”
As we soon face a Jetsons-style future with self-driving cars and other AI systems, people are thinking about how to implement Asimov’s Three Laws of Robotics. Perhaps Asimov didn’t expect, though, that some people would deliberately try to teach the robots to violate those laws.