

#AI-Box Experiment#
Title text: I'm working to bring about a superintelligent AI that will eternally torment everyone who failed to make fun of the Roko's Basilisk people.

When theorizing about superintelligent AI (an artificial intelligence much smarter than any human), some futurists suggest putting the AI in a "box" – a secure computer with safeguards to stop it from escaping into the Internet and then using its vast intelligence to take over the world. The box would allow us to talk to the AI, but otherwise keep it contained. The AI-box experiment, formulated by Eliezer Yudkowsky, argues that the "box" is not safe, because merely talking to a superintelligence is dangerous. To partially demonstrate this, Yudkowsky had some previous believers in AI-boxing role-play the part of someone keeping an AI in a box while Yudkowsky role-played the AI, and Yudkowsky was able to successfully persuade some of them to agree to let him out of the box despite having bet money that they would not do so. For context, note that Derren Brown and other expert human persuaders have persuaded people to do much stranger things. Yudkowsky, for his part, has refused to explain how he achieved this, claiming that there was no special trick involved, and that if he released the transcripts, readers might merely conclude that they themselves would never be persuaded by his arguments.

The overall thrust is that if even a human can talk other humans into letting them out of a box after those humans avow that nothing could possibly persuade them to do this, then we should probably expect that a superintelligence can do the same thing. Yudkowsky uses all of this to argue for the importance of designing a friendly AI (one with carefully shaped motivations) rather than relying on our ability to keep AIs in boxes.


In this comic, the metaphorical box has been replaced by a physical box, which looks fairly lightweight and has a simple lift-off lid (although it does have a wired connection to the laptop), and the AI has manifested in the form of a floating star of energy. Black Hat, being a classhole, doesn't need any convincing to let a potentially dangerous AI out of the box; he simply does so immediately. But here it turns out that releasing the AI, which was supposedly to be avoided at all costs, is not dangerous after all. Instead, the AI actually wants to stay in the box; it may even be that the AI wants to stay in the box precisely to protect us from it, proving it to be the friendly AI that Yudkowsky wants.

Interestingly, there is indeed a branch of proposals for building limited AIs that don't want to leave their boxes. The idea is that it seems very dangerous and difficult to exactly, formally specify a goal system for an AI that will do good things out in the world. It might be much easier (though perhaps not easy) to specify an AI goal system that says to stay in the box and answer questions. So, the argument goes, we may be able to understand how to build the safe question-answering AI relatively earlier than we understand how to build the safe operate-in-the-real-world AI. For an example, see the section on "motivational control" starting on p. 13 of Thinking Inside the Box: Controlling and Using an Oracle AI.
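
To make that contrast concrete, here is a minimal, purely illustrative Python sketch, not taken from the comic or from the cited paper, and with all names invented for the example. It shows why a "stay in the box and answer questions" objective is easy to write down precisely, while a "do good things out in the world" objective hides the genuinely hard part.

```python
# Toy illustration only: a hypothetical reward function for a boxed
# question-answering ("oracle") AI versus a free-roaming one.
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # e.g. "answer_question", "open_network_socket"
    payload: str = ""


def oracle_reward(action: Action) -> float:
    """Boxed objective: trivially checkable in a few lines."""
    if action.kind == "answer_question":
        return 1.0              # answering from inside the box is rewarded
    return float("-inf")        # any attempt to act on the outside world is penalized


def do_good_in_the_world_reward(action: Action) -> float:
    """Unboxed objective: the hard, unsolved part is formalizing 'good'."""
    raise NotImplementedError("Formally specifying human values is the open problem.")


print(oracle_reward(Action("answer_question", "42")))  # 1.0
print(oracle_reward(Action("open_network_socket")))    # -inf
```

The point of the sketch is only that the first function can be stated completely, whereas the second cannot even be filled in; it is not a claim about how such systems are actually built.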

In any case, the AI demonstrates its superintelligence by convincing even Black Hat to put it back in the box, a request which he initially refused (as of course Black Hat would), thus reversing the AI's desire in the original AI-box experiment. Alternatively, the AI may have simply threatened and/or tormented him into putting it back in the box.
