Dota is one of the last games we expected to be playable by advanced AI, let alone mastered. Until now, machines have stuck to beating us in simpler arenas. Previously conquered games like Chess and Go have restricted move sets and are governed by turn-based systems. The number of permutations in Chess is limited: there are only so many squares to move to, and only one move can be made per turn.
Chess, since it goes in turns, is not continuous. Dota 2 is. As the OpenAI blog states:
One AI milestone is to exceed human capabilities in a complex video game like StarCraft or Dota. Relative to previous AI milestones like Chess or Go, complex video games start to capture the messiness and continuous nature of the real world.
Units and buildings can only see the area around them. The rest of the map is covered in a fog hiding enemies and their strategies. Strong play requires making inferences based on incomplete data, as well as modeling what one’s opponent might be up to. Both chess and Go are full-information games. In Dota, each hero can take dozens of actions, and many actions target either another unit or a position on the ground. We discretize the space into 170,000 possible actions per hero (not all valid each tick, such as using a spell on cooldown); not counting the continuous parts, there are an average of ~1,000 valid actions each tick. The average number of actions in chess is 35; in Go, 250.
We went from 35-250 possible actions per move to over 1,000 per tick per player, in a ten-player game. Additionally, the bots have to recognize and reason about what the enemy team might be doing in the fog of war. Consider this: if you knew where your enemies were on the map at all times, how would your play change? You could split push at will, knowing no one could stop you. You could dodge any gank and defend against any oncoming push. The game gets significantly harder as information becomes more limited, but OpenAI has found a way to navigate these challenges by using massive amounts of data.
Harvesting massive amounts of data and analyzing it for advantages is nothing new. You can go on Dotabuff right now and see that Bounty Hunter is best this week against Riki, Undying, and Clinkz, while being worst against PL, CK, and Meepo. Sometimes the reason for the advantage is obvious: Riki gets Tracked by Bounty Hunter and killed, while PL can purge Track easily, allowing him to escape. Anyone who plays Dota 2 can tell you why Bounty Hunter is good against Riki. By parsing a large number of matches (at this moment, 67,691 matches of Riki vs Bounty Hunter), Dotabuff can tell you without you having to think about it.
The law of large numbers is a principle of probability stating that, given enough trials, the observed frequency of an event approaches its true probability. If you flip a fair coin 10 times, you might get 7 heads and 3 tails. Flip it 20 times, and you might end up with 12 heads and 8 tails. Flip it 100 times, and you might see 51 heads and 49 tails. The larger your sample size, the closer your observed frequencies get to the underlying 50-50 odds.
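You can watch this convergence happen in a few lines of code. This is a minimal sketch (the function name and the sample sizes are my own, not from any OpenAI code): flip a simulated fair coin more and more times and see the heads frequency drift toward 0.5.

```python
import random

def heads_frequency(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed frequency wobbles at small samples and settles near 0.5 at large ones.
for n in (10, 100, 10_000, 1_000_000):
    print(n, heads_frequency(n))
```

The small runs can land noticeably off 50%, but by a million flips the frequency is pinned within a fraction of a percent of it, which is exactly the effect that makes large match datasets trustworthy.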
Dota 2 begins as a 50-50 game. The two sides are evenly matched, and the creeps are of equal strength. As you pick different heroes, that win percentage can change. After 67,691 matches of Bounty Hunter vs Riki, the winrate between the two is not an even 50%, so you can deduce that one has the advantage over the other. Tools like this can be incredibly powerful, especially for less common matchups. For instance, you may not know off the top of your head how well OD does against PL mid; you have to think about how the lane plays out. Large amounts of data can tell you instantly. That said, large amounts of data aren't the answer to everything; there can be false positives. The bigger the sample size, the less variance there is, and if the winrate difference persists at scale, you've probably got something meaningful.
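The "false positive" point can be made concrete with a confidence interval. This sketch uses the standard normal approximation for a proportion; the win counts below are hypothetical numbers I chose for illustration, not real Dotabuff figures.

```python
import math

def winrate_interval(wins, games, z=1.96):
    """Approximate 95% confidence interval for a winrate
    (normal approximation to the binomial)."""
    p = wins / games
    margin = z * math.sqrt(p * (1 - p) / games)
    return p - margin, p + margin

# Hypothetical: a 52% observed winrate over 100 games vs over 67,691 games.
print(winrate_interval(52, 100))        # wide interval: can't rule out a true 50%
print(winrate_interval(35_200, 67_691)) # narrow interval: clearly above 50%
```

At 100 games a 52% winrate could easily be noise, since the interval straddles 50%. At 67,691 games the same 52% sits comfortably above 50%, which is why the matchup numbers on a site with tens of thousands of parsed games are worth trusting.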
OpenAI Five plays 180 years of Dota 2 per day, or 900 years per day if you count each hero separately. The law of large numbers comes into effect once again: as the bots try new strategies and see which ones win more, they learn which is more effective. Humans can't match such raw computing power. In a game where playing pubs is important for learning new strategies and experimenting with new builds, the bots reign supreme through sheer volume.
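The "try strategies, keep what wins" loop can be sketched as a simple bandit-style learner. To be clear, this is not OpenAI Five's actual training method (which is large-scale reinforcement learning); it's a toy epsilon-greedy illustration with made-up winrates, showing how sheer game volume lets a learner discover which strategy is stronger.

```python
import random

def learn_best_strategy(true_winrates, n_games=50_000, eps=0.1, seed=0):
    """Epsilon-greedy sketch: mostly play the strategy with the best
    empirical winrate so far, but keep exploring a fraction of the time."""
    rng = random.Random(seed)
    wins = [0] * len(true_winrates)
    plays = [0] * len(true_winrates)
    for _ in range(n_games):
        if rng.random() < eps:
            s = rng.randrange(len(true_winrates))  # explore a random strategy
        else:
            s = max(range(len(true_winrates)),     # exploit the best-looking one
                    key=lambda i: wins[i] / plays[i] if plays[i] else 0.5)
        plays[s] += 1
        wins[s] += rng.random() < true_winrates[s]  # simulate the game outcome
    return plays

# Hypothetical: strategy 1 (say, jungle trading) truly wins 55%, strategy 0 wins 50%.
plays = learn_best_strategy([0.50, 0.55])
```

After tens of thousands of simulated games, the learner has shifted the vast majority of its play onto the higher-winrate strategy. A 5% edge is invisible over a handful of pubs but unmistakable over 180 years of games a day.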
The most striking thing we see in the OpenAI Five video is that the bots start to control two-thirds of the map. Blitz notes that the bots have learned to use their advantage to take over the enemy jungle. There are several reasons controlling the enemy jungle is good: you take away the enemy cores' safe space to farm, and it's much easier to move around that side of the map due to its design, with more entrance points and more big open spaces.
As you can see here, there is far more open space in the Dire safelane jungle than in the Dire offlane jungle. The Dire offlane jungle has an ancient camp in the way, along with a tangle of trees and cliffs. The Dire safelane jungle has much more room to maneuver in case of a fight, and offers superior ward placement locations. As Blitz says, it's one of the highest-level plays you can make in Dota 2. He goes on to say that he played for 8 years without ever thinking of it, until Team Liquid told him about it. Jungle trading became the most prevalent strategy in the run-up to TI7 and at the event itself, and Liquid's mastery of it was the biggest reason they won.
You can see here how Liquid have warded the enemy half of the map and are placing their cores in the enemy jungle to gain a farm advantage. Jungle trading was one of the biggest strategic revelations in map movement to hit the game in a long time, and the bots figured it out by themselves at scary speed.
Of course, jungle trading wasn't always so effective. Changes to the map, the number of entry points into the jungle, creep camp placements, shrines, and other factors all influence how well it works. But the important point is this: what took one of the most brilliant strategic teams Dota 2 has seen an unknown amount of time to learn and master, the bots picked up quickly. By brute-forcing a large number of games and figuring out which strategy gave them the larger win percentage, the bots learned. The methodical creep of statistics slowly tips the scales until they find the most efficient play, because that's what these bots are: efficient. They think in terms of numbers and advantages, like how much gold they're earning, and they can calculate all of it instantly, while mere humans cannot.
In the future, as patches change, I can't help but wonder if they will be on the cutting edge of forging new metas and finding the most efficient plays. How quickly can they figure out what is most efficient and what gives them the highest win percentage differential? There is still a ways to go: the bots are limited by certain rules, with no wards, no Roshan, no scanning, and only mirror matchups. However, we have gone from a strict 1v1 bot to something that can play as a team, using a strategy employed by TI winners that it figured out on its own. If I had the funding needed to kickstart my own OpenAI, I would do it in a heartbeat, as should any professional team. The bots that OpenAI creates could be revolutionary, and they could be critical to developing new ways of thinking about the game faster than anyone else on the playing field.