AI is developing at a breakneck pace, and the technology may eventually reshape vast swaths of our intellectual landscape. Right now, researchers are pushing the field forward by teaching AI to play games, with hopes of besting human players.
How can these AI bots be given the best infrastructure to learn on their own?
For relatively simple games like chess, computers have routinely beaten pro players since the late '90s. However, chess has a comparatively small number of permutations. In gaming, permutations refer to the number of decision points available to players. For instance, on the first move of a chess game I have 20 legal moves to choose from, and my decisions branch out from there into an elaborate decision tree. By copying the moves in that decision tree that a pro would make, the computer plays like a pro that almost never makes mistakes. While an impressive feat, this isn't technically AI, because the computer never learned anything; it was told which moves were best.
Compared to chess, the ancient Chinese game Go has vastly more permutations. Due to the added complexity, simply copying the moves a pro player would make within a decision tree becomes impossible – you'd spend centuries painstakingly mapping out moves and teaching the computer.
Enter Deepmind, a recently acquired subsidiary of Google that has been absolutely tearing it up in the field of AI research. To create a program that can now reliably beat pros at Go, Deepmind needed to develop an innovative new form of AI [possible paywall].
Here’s a brief and horrifically oversimplified summary of the steps Deepmind took:
1. Take a database of games played by pro Go players and format it into something the AI can read
2. Ensure the AI can replicate some of the behaviors it reads
3. Create a reward structure so the AI can learn and identify more effective behaviors
4. Play the AI against itself thousands of times, allowing the AI to tweak itself between matches
5. Profit. This model leverages an artificial form of natural selection
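The loop in steps 4 and 5 can be sketched as a toy evolutionary self-play loop. Everything here is illustrative: `play_match`, the single `aggression` parameter, and the mutation step are stand-ins for what is in reality a deep neural network pipeline, not Deepmind's actual method.

```python
import random

def play_match(policy_a, policy_b):
    """Stand-in for a full game: the more heavily weighted policy wins more often."""
    total = policy_a["aggression"] + policy_b["aggression"]
    return "a" if random.random() < policy_a["aggression"] / total else "b"

def mutate(policy, step=0.05):
    """Tweak the policy slightly between matches (step 4)."""
    return {k: max(0.01, v + random.uniform(-step, step)) for k, v in policy.items()}

def self_play(generations=1000):
    champion = {"aggression": 0.5}
    for _ in range(generations):
        challenger = mutate(champion)
        # Keep whichever version wins -- an artificial form of natural selection.
        if play_match(challenger, champion) == "a":
            champion = challenger
    return champion
```

The key property is that no human ever tells the program which moves are best; better parameters simply survive more matches.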
So now we have AI that can win at possibly the most strategically complex board game in the world – so what? The key impact is that eventually, the same types of frameworks used to win games may well lay the foundation for the types of AI in sci-fi movies. Each game conquered brings us one step closer to a revolutionary technology.
Deepmind recently announced they were turning their sights on a new 1v1 game: Starcraft. This is insanely exciting for a few key reasons.
First, Starcraft has basically infinite permutations. Trying to count Starcraft's permutations would be like trying to count soccer's – it depends on which blades of grass each player stands on.
Second, Starcraft has incredible strategic depth with many layers present, and also has a great deal of hidden information. Each player will need to speculate what their opponent is doing with limited clues. Many of the concepts mirror actual warfare: economic growth, soldier training, tactical formations, technological arms-racing, scouting and deception.
Finally, Starcraft is awesome. It's played professionally in Korea, where six-figure salaries go to players treated like sports superstars. I personally play extensively and am very excited to play around with Deepmind's soon-to-be openly available toolkit.
Before I delve into my proposed framework, a few quick caveats:
-I’m not advanced enough to write anything close to a technical paper on AI, and this is very conceptual in nature.
-Deepmind is definitely using some cutting-edge techniques that I don’t fully understand, and whatever they come up with will surely include some mind-blowing stuff above the scope of this post.
-I don’t know what framework Deepmind will use or what the overlaid API will look like.
In a game with such an insane degree of strategic depth, how do we create a way for the AI to effectively learn?
We need only look at our own brains to see the critical element: compartmentalization. The more complex the task, the more necessary specialization and cooperation between modules become. By compartmentalizing an AI brain, different elements can learn and evolve semi-independently.
In other games that DeepMind has mastered, the AI learns quite abstractly by recognizing patterns, repeating them, and learning which are most effective. This process is proven and should continue. However, with a game as complex as Starcraft, featuring a huge variety of available tasks, there is a strong incentive for specialized components. In essence, instead of one AI mind there should be nine working in concert.
See the nine modules highlighted in blue:
None of these modules are alone; they’re operating much as a team of generals would. If the Scout module discovers the enemy has a particular advanced technology, it needs to relay that information to almost all of the other modules. It’s definitely relevant information for the Defenses, Rally and Tech modules, as those three are best positioned to formulate an answer to the impending enemy attack. In another interaction, the Tech module should alert the Rally module that a new type of unit will soon be coming down the production pipeline and should be resourced for.
When a module receives new information, a variable should assign an importance value to the info. If critically important information is being handed from module A to module B, then it’s time for the Prioritization module to give module B some extra time calling the shots. This could mimic a human player having a panicked realization, but hopefully without the sweaty palms.
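One way to sketch this hand-off, with hypothetical module names and a made-up boost-and-renormalize rule for the Prioritization module:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    receiver: str
    info: str
    importance: float  # 0.0 (routine) to 1.0 (critical)

@dataclass
class Prioritizer:
    # Fraction of "thinking time" each module currently gets (illustrative values).
    allotments: dict = field(default_factory=lambda: {
        "Scout": 0.2, "Defenses": 0.2, "Rally": 0.3, "Economy": 0.3})

    def route(self, msg: Message, boost=0.1, threshold=0.7):
        # Critically important information earns the receiving module
        # some extra time calling the shots.
        if msg.importance >= threshold:
            self.allotments[msg.receiver] += boost
            total = sum(self.allotments.values())
            # Renormalize so the allotments still sum to 1.
            self.allotments = {k: v / total for k, v in self.allotments.items()}
```

A Scout-to-Defenses message with importance 0.9 would bump the Defenses share while proportionally shrinking everyone else's – the panicked realization, minus the sweaty palms.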
Most high-level Starcraft players share a commonality in attitude. When they finish a game, win or lose, they reflect on that game and consider what they could have improved. I might lose a game due to being blindsided by an opponent who sneakily developed a technology which I wasn’t ready to respond to. In order to improve, I should realize that I didn’t scout enough. My Prioritize module should have been giving more time to the Scout module, and perhaps not as much to the Economic module.
An important limitation to note is that the AI will have its APM capped. APM is a term that was invented largely for Starcraft and stands for Actions per Minute. While the AI can think as much as it wants behind the scenes, it can only interact with the game at something like 300 APM depending on where DeepMind chooses to cap it. An APM score of 300 (six actions per second!) is about the physical limitation of pro human players, so this cap keeps the AI on a fairly level playing field with humans. This effectively limits the touchpoints the AI has to the game, but doesn’t limit the underlying volume of calculations where the modular generals discuss their next move. This makes conversation among modules an inexpensive resource, while actual moves are precious.
A huge responsibility for the AI is juggling its priorities throughout the game. This gives rise to the Prioritization module, perhaps the most important of them all. It leads the other modules and thereby gives the AI its personality. It's the only module that never interacts directly with the game, so it never uses APM and can run constantly.
The goal of the Prioritization module is to give the other eight modules the right blend of control over the game state. At the start of the game, players don’t have any military units and must focus entirely on expanding their economy, building an army and developing technology. As a result, the Prioritization module is only giving time to those three modules at the outset. This gives way to other priorities as the game develops.
The above chart demonstrates what a pre-programmed Prioritization module might look like: a preset of variables. The variables being modified are the percentages of time each of the action modules receives throughout the game. Between games, the AI must be able to modify these variables in order to learn and evolve strategically.
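A minimal sketch of such a preset and its between-game mutation, assuming made-up phase names, module names, and percentages (these are rough guesses a human might seed it with, not tuned values):

```python
import random

# Hypothetical preset: share of decision time per module at each game phase.
# At the outset only Economy, Rally and Tech get time; each row sums to 1.0.
SCHEDULE = {
    "early": {"Economy": 0.5, "Rally": 0.3, "Tech": 0.2, "Scout": 0.0},
    "mid":   {"Economy": 0.3, "Rally": 0.3, "Tech": 0.2, "Scout": 0.2},
    "late":  {"Economy": 0.1, "Rally": 0.4, "Tech": 0.3, "Scout": 0.2},
}

def mutate_schedule(schedule, step=0.05):
    """Perturb each phase's blend between games, then renormalize to sum to 1."""
    evolved = {}
    for phase, blend in schedule.items():
        noisy = {m: max(0.0, s + random.uniform(-step, step)) for m, s in blend.items()}
        total = sum(noisy.values())
        evolved[phase] = {m: s / total for m, s in noisy.items()}
    return evolved
```

Across thousands of games, presets that win would be kept and re-mutated, letting the blend drift away from the human starting guess.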
In-game adaptation also plays a key role, and Prioritization serves as a central repository for variables which allow reactionary play. In the words of Starcraft commentator Artosis:
“When your opponent attacks, defend.
If he defends, expand.
If he expands, attack.”
Reaction is critical to succeeding in Starcraft, so the Prioritization module needs to adjust its allotments according to opponent activity.
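Artosis's heuristic reduces to a tiny reaction table. This is an illustrative sketch, with "scout" as an assumed fallback for unrecognized activity:

```python
# Artosis's rock-paper-scissors heuristic as a lookup table.
REACTION = {"attack": "defend", "defend": "expand", "expand": "attack"}

def react(opponent_activity: str) -> str:
    """Pick which posture the Prioritization module should favor."""
    # If we don't know what the opponent is doing, the answer is to find out.
    return REACTION.get(opponent_activity, "scout")
```

In practice the output wouldn't be a single word but a shift in the allotment percentages toward the corresponding modules.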
The Module Breakdown – For Starcraft Nerds
All charts are representative of data tables, and all data tables must be tweakable between games for the AI to evolve. The charts show starting inputs based on rough advice from human players, but the AI will surely evolve those inputs into something better. In order to evolve, the AI will need to constantly measure its expectations against actual outcomes. If it often makes a particular trade that it calculates to be favorable, only to find an unfavorable outcome, the data tables that gave it the false expectation would be adjusted.
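The expectation-versus-outcome adjustment can be sketched as a simple exponential moving average; the learning rate here is an arbitrary assumption, not anything from Deepmind:

```python
def update_expectation(expected_value, observed_value, learning_rate=0.1):
    """Nudge a data-table entry toward what actually happened.

    Run between games: a trade the table rated as favorable (expected 1.0)
    that keeps turning out badly (observed 0.0) drifts toward unfavorable.
    """
    return expected_value + learning_rate * (observed_value - expected_value)
```

Repeated over many games, a persistently wrong table entry converges toward the observed reality while a correct one stays put.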
For the SC2 Fanatics
The charts are speculative and not based on advanced theorycrafting like you might find on some other forums. I’m using Terran (without MULEs) for most of my illustrations because it tends to be the most straightforward and relatable, but each race would likely need its own AI framework.
Module – Expand Economy
This module owns the process of branching out and taking new bases. It looks at the map and game state holistically to determine whether to expand toward an opponent, away from one, or take an island or gold base. It ensures that mineral lines are properly saturated and projects how long those bases are expected to last before depletion.
Above, the AI plans for depletion of current bases in order to maintain an efficient mining total. This model assumes that four is the ideal active base count.
Below, the AI is equipped with an equivalency chart which allows it to recover from an attack where bases or workers are lost. If the AI loses a swath of workers and is down to 20 workers but retains 3 active bases, it knows to rebuild up to about 50 workers.
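A sketch of that equivalency chart as a lookup. Only the 3-base, ~50-worker pairing comes from the example above; the other entries are guesses for illustration:

```python
# Hypothetical equivalency chart: ideal worker count per active base count.
# The 3-base entry matches the ~50-worker rebuild target described above.
TARGET_WORKERS = {1: 20, 2: 36, 3: 50, 4: 66}

def workers_to_rebuild(current_workers: int, active_bases: int) -> int:
    """How many workers the Economy module should queue after losses."""
    target = TARGET_WORKERS.get(min(active_bases, 4), 0)
    return max(0, target - current_workers)
```

Like every other table, these targets would themselves be tweakable between games.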
If Expand encounters an issue while taking a base, such as a burrowed ling or pylon block, it relays this to the Attack modules to clear out the obstruction.
Module – Fortify Defenses
This module determines the value of establishing static defenses, and builds them if worthwhile. It also positions the standing army not yet being used by Deploy in a way which maximizes the defensibility of current bases.
As shown in the API preview video from Deepmind, layers will be used to stage data in map form into the AI’s memory. From this memory, another map could be derived which shows points of weakness within the AI’s base. If a point of weakness is particularly dire, the AI will be able to calculate this in determining if building a turret is worthwhile.
Module – Rally Army
This module is tasked with building the largest army possible given anticipated future resources. This includes creating production facilities and keeping them at high utilization.
This module would likely use two primary tables to determine how many production facilities should be built. The first is the number of active workers, pictured below. The second is the amount of excess resources, or bank, built up.
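Combining the two tables might look like this. The worker thresholds and the bank-to-facility conversion rate are placeholders, not theorycrafted values:

```python
# Hypothetical table: production facilities justified by active worker count.
FACILITIES_BY_WORKERS = [(16, 1), (32, 2), (48, 3), (66, 4)]

def target_facilities(active_workers: int, banked_minerals: int) -> int:
    """How many production facilities Rally should aim for right now."""
    base = 1
    for workers, count in FACILITIES_BY_WORKERS:
        if active_workers >= workers:
            base = count
    # A large bank means income is outpacing spending: add production to burn it.
    extra = banked_minerals // 1000
    return base + extra
```

The bank term is what keeps utilization high: if resources pile up, the module knows its production capacity, not its income, is the bottleneck.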
Interestingly, Rally works synergistically with the Advance Technology module while also competing with it for in-game resources. Then again, nearly all of the modules are in some form of constant competition with one another, which is why Prioritization calls the shots.
Module – Advance Technology
This module has two primary tasks. The first is to upgrade the existing army's weaponry. This gives all units enhanced damage or defense and can be thought of as a multiplier on the number of units, which of course makes it more valuable the more units you have. There is a very slight exponential curve in the below chart to account for the benefit of critical-mass DPS. Aside from that slight curve, it's essentially a breakeven-point investment calculation, which is typically linear.
The second task is to create new threat vectors for the AI by granting access to new units or abilities. This is a primary countermeasure to scouted enemy technology, as each new opposing technology can be countered by the right AI choice. The tech tree should be charted in the AI’s mind, with scouted enemy tech leading to increased probabilities for their counters.
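A sketch of that charted tech tree reacting to scouted enemy tech. The unit names and the doubling boost are illustrative assumptions, not balance advice:

```python
# Hypothetical counter map: scouted enemy tech -> the answer to it.
COUNTERS = {"Banshee": "Turret", "Dark Templar": "Detection", "Mutalisk": "Thor"}

def update_tech_weights(weights: dict, scouted_tech: str, boost=2.0) -> dict:
    """Raise the probability of researching the counter to scouted enemy tech."""
    counter = COUNTERS.get(scouted_tech)
    updated = dict(weights)
    if counter in updated:
        updated[counter] *= boost
    # Renormalize so the weights remain a probability distribution.
    total = sum(updated.values())
    return {tech: w / total for tech, w in updated.items()}
```

Each Scout report thus tilts the tech tree's probabilities rather than hard-coding a single response.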
Module – Deploy Army
This module is the one which most drives the AI toward the explosive engagements which make Starcraft fun to watch. By posturing aggressively and using attack paths, the AI gets itself into trouble and forces trades. It formulates battalions in order to effectively combo units together, and also determines the paths of least resistance through which to apply pressure. Once armies meet the enemy, the Employ Tactics module takes over.
Module – Employ Tactics
This is what most players refer to as micro. AI is going to have terrifying micro, because it isn’t bound by the mouse and keyboard setup. Its APM might be capped, but it will be able to pull off a few maneuvers which humans find physically impossible (like splitting marines in a perfect divergence).
Maneuvers are behaviors, taught or learned, which execute a specific control pattern over a battalion of units. The AI will need a table detailing all possible matchups from allied to enemy units, and choose its maneuvers on a case-by-case basis using probabilities weighted by opposing unit mix.
When engaging enemy units, players make a split-second decision. Do they have large enough of an army to win? If yes, they engage and chase the enemy down. If not, they disengage and kite away. Implicit in this decision is a measurement of their units’ worth against the opposing force. This can be illustrated through a breakeven trade table, shown below. In the example, the interaction between Marines and Zerglings is nonlinear. This is because marines, as a ranged unit, can hit a critical mass where their numbers become overwhelmingly effective against melee units. The AI will need a breakeven trade table for every unit matchup.
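The nonlinear Marine-versus-Zergling curve might be sketched like this. The exponent and coefficient are made up purely to illustrate the critical-mass shape, not drawn from real balance data:

```python
def marines_needed(zerglings: int) -> int:
    """Hypothetical breakeven curve: past a critical mass, ranged units
    scale better than melee, so required marines grow sublinearly."""
    return max(1, round(1.8 * zerglings ** 0.75))

def should_engage(marines: int, zerglings: int) -> bool:
    """The split-second decision: chase down, or kite away."""
    return marines >= marines_needed(zerglings)
```

The sublinearity is the point: tripling the zergling count should less than triple the marines required, because each extra marine past critical mass is worth more.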
The above chart perhaps oversimplifies the decision being weighed. There are some instances where it’s actually beneficial to take a slightly unfavorable trade.
This is just like in chess. Let’s say you have 6 pieces and your opponent has 5, because you outplayed them earlier in the game. It would be in your best interest to trade knight-for-knight because then you would have 5 pieces to 4, which is an advantage of 25% instead of 20%.
In Starcraft, the same principle applies. A player with a lead should be forcing trades even if they would otherwise be considered slightly bad trades. Likewise, the AI shouldn’t take a slightly favorable trade if it’s otherwise behind. For this reason, the Prioritization module should be tracking the AI’s overall confidence of victory throughout the game in order to facilitate these more subtle choices. With a more nuanced approach, instead of consulting a simple breakeven trade table, the AI could consult the more advanced expected trade table shown below.
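One way to fold confidence of victory into the trade decision; the linear scaling and the 0.4 slope are assumptions for the sketch:

```python
def trade_threshold(confidence_of_victory: float) -> float:
    """Value ratio required to accept a trade, given confidence in [0, 1].

    At 0.5 confidence (even game), only breakeven-or-better trades pass.
    When ahead, slightly unfavorable trades become acceptable; when
    behind, even slightly favorable trades are declined.
    """
    return 1.0 - 0.4 * (confidence_of_victory - 0.5)

def take_trade(value_ratio: float, confidence_of_victory: float) -> bool:
    # value_ratio > 1.0 means we'd destroy more value than we'd lose.
    return value_ratio >= trade_threshold(confidence_of_victory)
```

This mirrors the chess example: the player with 6 pieces to 5 happily trades knight-for-knight, while the player behind avoids the same exchange.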
Module – Scout
The Scouting module has two primary functions both relating to threat assessment. First is determining how enemy technology is progressing and relaying that information to the other modules best suited to formulate a response. Second is discovering current enemy positions in order to better take a defensive posture.
The value of Scouting diminishes whenever the AI successfully gains information from the opponent. In this way the AI has an increasing anxiety about uncertainty, which is relieved only by validating its suspicions. Using units to scout can be costly in terms of unit utilization time and exposing that unit to danger, so often the net value of scouting is negative after obtaining the information needed to react.
Scouting is the module most likely to pass high-importance variables to the other modules. Often key pieces of information are enough to trigger a frenzied response, so other modules such as Advance Technology may be the sine to Scouting’s cosine wave.
Module – Deceive
This is perhaps the hardest module to conceptualize, in part because it might in fact be unnecessary. The simple truth is that in a game of Starcraft, you want your opponent to know as little as possible about what you're doing at any given point in time, period. There are particular actions where this is especially true, like going for hidden/proxy tech or a sneaky drop. However, all information is valuable to an opponent, which makes denying information or providing misinformation valuable at every opportunity.
Should deceptive tactics be employed by a discrete module, or by the other modules? For instance, the AI is going for a drop. Dropping is most effective when it’s totally unseen, and units leave the medivac in a patch of fog of war in the opposing base prior to charging in at full force. Should that maneuver be executed fully by the Deploy Army and Employ Tactics modules? Or should the Deceive module aid the other two to execute this move?
On the one hand, the three modules together may initially fight one another, with Deceive opting always to remain unseen (and ineffective) while the other two modules try to charge in. This would seem to support the notion that Deceive shouldn’t be a module at all, but rather the concept of deception should be built more subtly into every gesture the AI executes.
Alternatively, the Deceive module may be necessary to execute the advanced "meta" strategies regularly employed by pros. An example would be the proxy dark shrine. The AI might never come up with this novel strategy without a discrete Deception module, because it would understand the risk of losing an isolated structure as the downside but not quantify the upside of the hidden building never being scouted. This could also be true of taking a hidden base. Either way, the AI would probably benefit from assigning probabilities to major sections of the map according to how likely they are to be scouted by the opponent.
This gets to a deeper question of how to imitate the human process of lying effectively, which is way above my IQ.
The field of AI continues to evolve and existentially threaten the intellectual dominance that we take for granted as Earth’s apex species. As the technology industry continues to embrace this inevitability, new techniques are rapidly emerging to help push AI forward. Hopefully using modular AI will help us arrive at the singularity faster, because that would be pretty chill. Thanks for reading.
All non-original photos were labeled for reuse by copyright holders.
All non-captioned photos are original productions copyright John Rhinehart 2016.
John’s Starcraft background:
I got into SC:BW and did fairly well in high school, just prior to the release of SC2. In SC2, I've placed Master (admittedly only on the NA server) for 4+ seasons in 1v1, primarily playing Zerg from Brood War through today. I've also played mech and Protoss enough to be comfortable, but Zerg is where it's at.