WINNER: Yes! Beat you again, sucker! SIRAJ: One day I’m going to make a bot that beats you in any game!
WINNER: Keep telling yourself that, Siraj. SIRAJ: Hello world, it’s Siraj, and let’s make an amazing video game bot in just ten lines of code that can play a huge variety of games. Video games have been around since the ’50s, when Josef Kates publicly demoed Tic-Tac-Toe at the Canadian National Exhibition.
That bot used simple scripted actions that ran the same way every time, regardless of whatever move the player made. His demo got people hyped, though, because no one had ever seen a computer play a game before, and they were lining up around the block to check it out. The game bots that were invented afterwards for games like Nim and Spacewar were similar, but then along came Polly. I-I mean Pong. The Pong bot’s paddle had to make decisions based on the human player’s actions, and that made it feel more realistic. Pong marked the beginning of using heuristics to create game bots. Heuristics are educated guesses, and pretty much every single video game bot since Pong’s has used them.
A bot will map out a possible set of decisions as a tree of possibilities, then use one of many techniques to pick the best one. But as cool as that sounds, it’s still always boiled down to a bunch of if-then statements. If Pac-Man moves this way, then the blue ghost should move this way. If Master Chief sees a Grunt, then it should move in circles like my Facebook news feed. If Captain Falcon is being annoying AF, then your team bots should help you pwn him. Squad goals. But yeah, video game bots have pretty much always sucked, because there are only so many edge cases that a programmer can predict. Like, if the human in Fallout 3 has a pistol AND isn’t moving AND there are no enemies nearby, run into each other.
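To make the "tree of possibilities" idea concrete, here is a minimal sketch using minimax, which is just one of the many techniques a bot could use to pick a move. The tree and its scores are made up for illustration and do not come from any real game.

```python
def minimax(node, maximizing):
    """Return the best score reachable from node if both sides play optimally."""
    if isinstance(node, (int, float)):  # leaf: the game is over, node is the score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


def best_move(tree):
    """Return the index of the bot's move whose worst-case outcome is best."""
    scores = [minimax(child, maximizing=False) for child in tree]
    return scores.index(max(scores))


# Hypothetical tree: the bot picks one of three moves, then the opponent
# picks a reply. Numbers are final scores from the bot's point of view.
TREE = [[3, 5], [2, 9], [0, 1]]

print(best_move(TREE))
```

The hand-written if-then rules described above are what you end up with when a programmer hard-codes the answer at each branch instead of searching the tree.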
[sigh] We need to think about this problem differently. When you or I start playing a game, we don’t know anything about its environment beforehand. The hallmark of intelligence is our ability to generalize, but can we make artificial intelligence that can generalize to solve any task? A team of researchers at DeepMind recently got close by creating one bot that could beat almost any Atari game knowing literally nothing about the game beforehand.
No game-specific hard-coded rules at all. It was just fed the raw pixels of the game and its controls. Using those two things, it learned how to beat almost any Atari game it was given. It did this using a technology called “deep learning.” If you take a deep neural network, and feed it lots of data and compute, it can learn to do a whole lot of incredible things. The field of deep learning right now is where physics was in the early 1900’s.
The state of the art in a huge number of subfields like vision and speech is being broken almost every other day. It’s a very exciting time right now. The Marie Curies and Albert Einsteins of computer science are all alive right now, and newcomers are coming in every day. DeepMind is awesome, and they keep a good chunk of their code private, since Google uses it to outperform its competitors.
But then Elon Musk came along and was all like, ELON MUSK: I think it’s important that if we have this incredible power of AI that it not be concentrated in the hands of a few. SIRAJ: And so he cofounded a nonprofit called OpenAI whose goal is to democratize AI so anyone can use it. And just today they released something called Universe. Universe is a platform that lets you build a bot and test it out in thousands of different environments, from games as simple as Space Invaders, to Grand Theft Auto, to protein-folding simulations that could cure cancer. You can create a bot, and the better you make it, the more games it’ll learn to become amazing at. You can compete with other bot developers to see whose bot beats the most games. Universe has other environments, too, for web interface tasks like managing emails and booking flights.
If you create a bot that’s able to defeat any environment, you’re not only the dopest coder of all time, you just solved intelligence. We could then use your bot to solve literally everything from global warming to poverty to all known diseases. So with that, let’s create our first simple bot in just ten lines of Python code. In our first two lines of code, we’ll import gym and universe. gym is OpenAI’s original codebase that Universe builds on and extends to include way more environments and features. Those are the only two dependencies we’ll need.
Now, we can select our environment. We’ll define an environment variable called “env,” and use gym’s make() method to define our environment parameter. There’s so many to choose from, it’s hard to pick, but let’s go ahead and pick the popular Flash game Coaster Racer. Universe lets us run as many environments at the same time as we want, but for now, let’s just use one.
Our next step is to initialize our environment with the reset() method. It’ll return a list of what we call “observations” for every environment we’ve initialized. An observation is an environment-specific object that represents what the agent observes, like pixel data of what it sees and the state of the game. Initially, we’ll just have an empty set of observations since the game hasn’t started yet. Now that we’ve initialized our environment, let’s go ahead and create a while statement so our agent will just keep running indefinitely.
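The setup steps so far might look roughly like the sketch below. It assumes the gym and universe packages plus Docker are installed. The environment id 'flashgames.CoasterRacer-v0' and the configure(remotes=1) call are assumptions based on how Universe's own examples were written, since the narration does not spell them out. The imports sit inside the function so that merely defining it does not need those packages.

```python
def make_universe_env(env_id="flashgames.CoasterRacer-v0", remotes=1):
    """Create and reset a Universe environment (needs gym, universe and Docker)."""
    import gym
    import universe  # noqa: F401  importing registers the Universe environments with gym

    env = gym.make(env_id)           # assumed id for the Coaster Racer Flash game
    env.configure(remotes=remotes)   # assumed: how many environments to run at once
    observation_n = env.reset()      # one observation per environment
    return env, observation_n
```

Calling make_universe_env() would then hand back the env object and the initial list of observations that the loop works with.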
We’re just going to have our bot do one simple thing. It’s going to hit the up arrow. [REPEATED BUTTON PRESSING SOUNDS] This is formatted by first specifying the type of event, then the key, then true, which means “press it,” and we’ll do this for each environment’s observation. We’ll call this an “action” and store it in our action variable. Now we’ll call our environment’s step method to move forward one time step and use the action as a parameter. This is our implementation of reinforcement learning. Our bot will take an action, in our case pushing the up arrow, then it’ll observe the result, and may or may not receive a reward if that action was beneficial to its goal, which in our case is increasing the game score.
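As a small sketch of the action format just described: each environment gets a list of events, and each event is a tuple of event type, key, and a pressed flag. The function name press_up is made up for illustration, and the event strings 'KeyEvent' and 'ArrowUp' are assumed to match what Universe expects.

```python
def press_up(observation_n):
    """Build one action per environment: a list with a single 'press ArrowUp' event."""
    # Each event is (event type, key name, pressed?); True means "press it".
    return [[("KeyEvent", "ArrowUp", True)] for _ in observation_n]


# With one environment there is one observation, so we get one action back.
action_n = press_up([None])
print(action_n)
```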
OpenAI uses a custom image recognition module here to read the game score in order to return a reward. This module is included in the environment, so we don’t need to worry about it. If it does receive a reward, we can update our bot to do similar actions in the future so it gets better over time through trial and error.
So the step method returns four variables: an observation of the environment, a reward, a yes or no value if the game is done, and some info like performance timings and latencies for debugging, and it’ll do this for all the environments you’ve trained your bot in simultaneously. Lastly, we’ll render the environment so it’s visible to us. Let’s demo this baby. I’ll run the code in terminal, and it’ll connect to our VNC server in our local Docker container, running a Flash-enabled Chrome browser. The pre-scripted mouse will click through the necessary screens to get the game started. Then our bot will start programmatically controlling the game remotely.
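Here is a sketch of the whole act, step, render loop. So that it runs without Docker, it uses FakeEnv, a hypothetical stand-in written for this example whose reward rule is invented. It is not part of gym or Universe. The loop is also capped at max_steps and stops when the episode is done, whereas the video's version runs in a while True loop.

```python
class FakeEnv:
    """Hypothetical stand-in for a Universe env running a single environment."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [None]  # one observation slot per environment

    def step(self, action_n):
        self.t += 1
        observation_n = [{"frame": self.t}]
        reward_n = [1.0 if action_n[0] else 0.0]  # invented rule: any key press scores
        done_n = [self.t >= self.episode_length]
        info = {"n": [{}]}
        return observation_n, reward_n, done_n, info

    def render(self):
        pass  # a real environment would draw the game here


def run(env, max_steps):
    """Press ArrowUp every step and return the total reward collected."""
    observation_n = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action_n = [[("KeyEvent", "ArrowUp", True)] for _ in observation_n]
        observation_n, reward_n, done_n, info = env.step(action_n)
        total += sum(reward_n)
        env.render()
        if all(done_n):
            break
    return total


print(run(FakeEnv(), max_steps=10))
```

Swapping FakeEnv for a real Universe env should leave run() unchanged, since it only relies on reset(), step() and render().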
Yeah, our bot really sucks, but how dope is this? We can do this for as many games as we like. And to make it better, we can try different strategies like random search, or hill climbing, or just replicate what DeepMind did. They fed the observations that their bot received into a neural network that updated its connections to get better if it received a reward. OpenAI already has a starter bot that uses deep reinforcement learning via TensorFlow that I’ll put a link to in the description.
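Of the strategies mentioned, hill climbing is the easiest to sketch: nudge the bot's parameters at random and keep the change only if the reward goes up. In the sketch below, episode_reward is a made-up stand-in for "play one game with these weights and return the score", so the numbers mean nothing beyond showing the trial-and-error loop.

```python
import random


def episode_reward(weights):
    """Hypothetical score: higher the closer the weights are to a hidden target."""
    target = [0.5, -0.2, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def hill_climb(n_iters=500, step=0.1, seed=0):
    """Randomly perturb the weights and keep only the changes that improve the reward."""
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0]
    best_reward = episode_reward(best)
    for _ in range(n_iters):
        candidate = [w + rng.gauss(0, step) for w in best]
        reward = episode_reward(candidate)
        if reward > best_reward:  # trial and error: keep what worked
            best, best_reward = candidate, reward
    return best, best_reward


weights, reward = hill_climb()
print(weights, reward)
```

In a real bot, episode_reward would run the game loop and sum up the rewards returned by step().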
And so, to break it down, OpenAI’s Universe is a platform that lets you train and test bots for thousands of games and other environments. Reinforcement learning is the process of using trial and error, similar to how we learn, to improve a bot. And if you create one bot that can succeed in any environment it’s given, you’ve just solved intelligence. The coding challenge for this video is to create a bot for just Coaster Racer that is better than this video’s demo code. Post your GitHub link in the comments and I’ll give a shoutout to the winner in my video one week from today, and I’ll do a one-on-one Google Hangout with them just to say hi and talk about whatever.
For now, I’ve got to make a laundry folding robot, so thanks for watching.