What is WHAM?
Imagine a world where video games are created not just by human designers, but in collaboration with artificial intelligence. That’s the promise of systems like WHAM (World and Human Action Model), an AI developed by Microsoft researchers. It is trained on vast amounts of video game footage, learning to recognize patterns in player behavior and game mechanics. The system doesn’t just passively observe; it predicts what will happen next in a game scenario and generates new content based on those predictions. Think of it as an AI that can ‘imagine’ how a game might unfold and then bring that vision to life.
The Holy Grail of Game Development
The video presenter jokes that this finding is “the holy grail” for Microsoft scientists. After all, it involves finding a way to play video games at work and get paid for it!
While the initial applications may seem simple, the underlying technology is incredibly complex. The AI needs to understand game physics, character interactions, level design, and even player psychology to make meaningful predictions. This is no easy task, even with the advanced machine learning techniques available today.
How WHAM Works: Learning from Gameplay
The core of WHAM's approach lies in its ability to learn from existing gameplay footage.
The AI analyzes videos of players interacting with games, identifying actions, strategies, and common sequences. It then uses this knowledge to predict what might happen in new, unseen scenarios. This process is iterative, meaning the AI continuously refines its predictions based on feedback and new data.
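To make the idea of "learning common sequences and predicting what comes next" concrete, here is a deliberately tiny sketch. It is not WHAM's architecture: it swaps in a simple frequency count over pairs of consecutive actions, and the action labels and sessions are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_next_action_counts(sessions):
    """Count how often each action is followed by each other action."""
    counts = defaultdict(Counter)
    for actions in sessions:
        for current, following in zip(actions, actions[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequently observed follow-up, or None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical recorded sessions (invented action labels, not real data).
sessions = [
    ["move", "jump", "attack", "move", "jump", "attack"],
    ["move", "jump", "dodge", "move", "jump", "attack"],
]
counts = fit_next_action_counts(sessions)
print(predict_next(counts, "jump"))    # most common follow-up to "jump"
print(predict_next(counts, "crouch"))  # never observed, so None
```

A real system works on video frames and controller inputs rather than neat labels, but the principle is the same: observed sequences become the basis for guessing what follows.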
Iterative Tweaking:
This iterative process is key to improving the quality and relevance of the AI's output. By constantly comparing its predictions to actual gameplay, WHAM learns to better anticipate player behavior and create more engaging and realistic game scenarios.
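The predict, compare, adjust loop described above can be sketched in a few lines. This is a toy under stated assumptions: the "model" is a single number (how far a character moves per frame), the positions are invented, and the update rule is plain error correction rather than anything WHAM is known to use.

```python
def refine_step_estimate(observed_positions, learning_rate=0.5, epochs=20):
    """Fit one 'step size' by repeatedly comparing predictions to data."""
    step = 0.0
    history = []  # total squared error per pass over the data
    for _ in range(epochs):
        total_sq_error = 0.0
        for prev, actual in zip(observed_positions, observed_positions[1:]):
            predicted = prev + step          # predict the next position
            error = actual - predicted       # compare with what really happened
            step += learning_rate * error    # nudge the estimate toward the data
            total_sq_error += error ** 2
        history.append(total_sq_error)
    return step, history


# Hypothetical recorded positions: the character moves 2 units per frame.
positions = [0.0, 2.0, 4.0, 6.0, 8.0]
step, history = refine_step_estimate(positions)
print(round(step, 6))            # estimate approaches the true step of 2.0
print(history[0], history[-1])   # error shrinks from the first pass to the last
```

Each pass over the data leaves a smaller error than the one before, which is the sense in which repeated comparison against actual gameplay improves the predictions.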
The Challenge of Low-Resolution Videos:
Interestingly, even low-resolution videos can be used to train the AI. This suggests that the AI is focusing on high-level concepts and patterns rather than pixel-perfect details.
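One way to build intuition for why low resolution can be enough: shrinking a frame throws away fine detail but keeps the coarse layout. The sketch below averages blocks of pixels in a made-up 4x4 grayscale frame. It only illustrates the general idea and is not a description of how WHAM preprocesses video.

```python
def downsample(frame, factor):
    """Average non-overlapping factor x factor blocks of a 2D grayscale frame."""
    height, width = len(frame), len(frame[0])
    if height % factor or width % factor:
        raise ValueError("frame dimensions must be divisible by factor")
    small = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [
                frame[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


# Hypothetical frame: dark background on the left, bright object on the right.
frame = [
    [0, 0, 200, 255],
    [0, 10, 255, 200],
    [0, 0, 255, 255],
    [10, 0, 200, 255],
]
low_res = downsample(frame, 2)
print(low_res)  # 2x2 result: left column stays dark, right column stays bright
```

The pixel-level noise is gone, yet "there is a bright object on the right" survives, which is the kind of high-level information a model can still learn from.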
Predicting and Generating Game Sequences
Once trained, WHAM can be given a new game situation and tasked with predicting what will happen next.
This involves generating a sequence of events, including character movements, interactions with objects, and changes to the game environment. The AI essentially fills in the blanks, creating a plausible and engaging continuation of the game scenario. This is not a simple task. The AI must account for a myriad of factors, including player agency, game rules, and the overall narrative context. It needs to generate sequences that are both believable and fun to play.
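Generating a sequence this way usually means feeding each prediction back in as the starting point for the next one. Here is a minimal sketch of that loop. The `toy_predictor` function is a hypothetical stand-in for a trained model, and the grid coordinates are invented for the example.

```python
def rollout(predict_next_state, start_state, num_steps):
    """Generate a sequence by repeatedly predicting from the latest state."""
    states = [start_state]
    for _ in range(num_steps):
        states.append(predict_next_state(states[-1]))
    return states


def toy_predictor(state):
    """Hypothetical stand-in for a trained model.

    Walks right until reaching x == 3, then climbs upward.
    """
    x, y = state
    if x < 3:
        return (x + 1, y)
    return (x, y + 1)


trajectory = rollout(toy_predictor, (0, 0), 5)
print(trajectory)  # start state plus five predicted states
```

Because every step builds on the previous prediction, small mistakes can compound over a long sequence, which is one reason generating believable continuations is hard.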
Early Training Results:
Early in training, results can be unpredictable, with the AI generating sequences that quickly diverge from the original game. This highlights the challenges of teaching an AI to understand the nuances of game design and player behavior.
Improved Training:
However, with more training, the AI's predictions become more accurate and relevant. It learns to stay within the bounds of the game world, maintain consistent character behavior, and even create plausible interactions with objects. In footage from the fully trained model, objects such as the power cell are interacted with correctly during gameplay.
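A simple way to picture "diverging from the original game" is to measure, step by step, how far a generated sequence drifts from recorded gameplay. The sketch below uses invented character positions and a plain grid distance. The numbers are illustrative only and are not results from WHAM.

```python
def per_step_divergence(generated, reference):
    """Grid (Manhattan) distance between two trajectories at each step."""
    return [
        abs(gx - rx) + abs(gy - ry)
        for (gx, gy), (rx, ry) in zip(generated, reference)
    ]


# Hypothetical positions, made up for illustration.
reference = [(0, 0), (1, 0), (2, 0), (3, 0)]       # recorded gameplay
early_model = [(0, 0), (1, 1), (3, 2), (6, 4)]     # drifts further each step
trained_model = [(0, 0), (1, 0), (2, 1), (3, 0)]   # stays close throughout

print(per_step_divergence(early_model, reference))
print(per_step_divergence(trained_model, reference))
```

In this toy, the early model's distance keeps growing while the trained model's stays near zero, mirroring the difference between the early and later training footage.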
Human Input:
One of the most exciting aspects of this AI system is the ability for human players to interact with and influence the generation process. By using a game controller, players can choose different directions or actions, effectively branching the storyline and guiding the AI's creative process.
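Branching from player input can be sketched as a generator whose next state depends on both the current state and the action chosen. Everything here is a hypothetical toy: the `MOVES` table, the grid world, and the `step` function stand in for a learned model that would predict full game frames.

```python
# Hypothetical controller inputs mapped to grid movements.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


def step(state, action):
    """Toy stand-in for an action-conditioned model: apply the chosen move."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


def controlled_rollout(start_state, actions):
    """Generate a sequence where each step is guided by a player action."""
    states = [start_state]
    for action in actions:
        states.append(step(states[-1], action))
    return states


# Same starting point, different controller inputs, different outcomes.
branch_a = controlled_rollout((0, 0), ["right", "right", "up"])
branch_b = controlled_rollout((0, 0), ["right", "up", "up"])
print(branch_a)
print(branch_b)
```

Both branches share their opening moves and then split as soon as the inputs differ, which is the branching behavior described above in miniature.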