Microsoft's Muse Neural Network: What is it and can it help create video games?

Muse was trained using seven years of gameplay from the action game Bleeding Edge

Work on the Muse project began in late 2022, shortly after the launch of OpenAI's ChatGPT. Katja Hofmann, head of Microsoft Research Game Intelligence, noted that this text-based generative model made a strong impression on the team and raised the question of whether similar technology could be applied to video game development. To answer this question, Microsoft Research Game Intelligence teamed up with Teachable AI Experiences and Ninja Theory, a studio that is also part of Microsoft. The collaboration aims to bring current AI research into game development and player interaction.

Ninja Theory, known for its games DmC: Devil May Cry and Hellblade: Senua's Sacrifice, released the online arena action game Bleeding Edge in 2020. The game offers 4v4 multiplayer battles, but received lukewarm reviews from critics and users. Its limited popularity led the studio to discontinue support for Bleeding Edge less than a year after its release, highlighting the challenges of creating successful multiplayer titles.

Bleeding Edge played a vital role in the development of Muse. The game recorded and saved the gameplay of all users who agreed to the End User License Agreement. Since most players accept such agreements without reading them, it can be assumed that this covered virtually all game sessions. The recordings, which add up to seven years of gameplay in total, became the training data for the new neural network. Muse was trained on individual frames from the recorded gameplay rather than on videos; Hofmann says that over a billion images were used. The researchers also gave the neural network data on how players used their controllers in Bleeding Edge, including which buttons they pressed, so that it could learn how input relates to what happens on screen.

Image: Microsoft

The table above shows how Muse's output changed with the number of training iterations performed by Microsoft's AI specialists. The left column contains a short gameplay clip from Bleeding Edge. From it, the experts took one second of gameplay, consisting of several frames, and nine seconds of controller input, and used them as the prompt. The first successes became noticeable only after 10,000 iterations. After a million iterations, the AI was able to generate a gameplay video that matched the reference clip in the left column. This illustrates how complex and time-consuming AI training is, and how much it depends on large data sets.

Muse cannot create either interactive gameplay snippets or actual games.

For now, Muse can only generate short gameplay videos. Generation starts from an initial prompt that includes images and controller input. This lets users visualize gameplay quickly, and the resulting clips could be used for purposes such as game promotion or educational materials.

The Microsoft neural network can analyze the conditions specified in the prompt, including the 3D game world, the placement of characters and objects, player behavior, and the interface. From these it creates short videos that present gameplay as the neural network sees it, which makes it possible to preview potential player experiences.

You can provide Muse with several frames of actual gameplay and data on the buttons the player pressed during those frames. In response, the neural network will generate videos showing various gameplay scenarios, with the character acting as if a real person were playing. The result may differ from the actual gameplay, but that is precisely the goal of Muse: to visualize a variety of possible gameplay scenarios based on a few initial frames. Developers can use this to experiment with different approaches to game mechanics.

Image: Springer Nature

The creators of Muse have also introduced WHAM Demonstrator, a visual interface for interacting with the neural network. It allows users to enter initial prompts, change generation conditions while working with Muse, and add objects to an already generated scene. The neural network takes these elements into account during subsequent generation, which gives users more control over the final result.

In the context of WHAM Demonstrator, it is important to clarify one aspect. Generation can be adjusted using a controller, as demonstrated in an article on the Microsoft website using an Xbox gamepad. For example, if you want a character in the Muse system to move left instead of right, simply move the controller sticks in the desired direction, and the neural network will follow this instruction. This allows the user to more precisely control the character's behavior and tailor the interaction to their preferences.

Does Muse create interactive gameplay where the player controls the character? In fact, it doesn't, as there is no character control. By moving the stick left or performing another action on the controller, the user merely provides the neural network with an additional prompt, which it must take into account in the subsequent generation process. The artificial intelligence continues to generate videos of gameplay moments, but the gameplay itself is not the result of direct control. The principle can be illustrated as follows: an action on the gamepad, denoted a(t), interrupts the video generation, denoted z(t), so that the AI can take this prompt into account in the next generation step. Muse thus demonstrates a new approach to content creation, but it does not provide a traditional character control experience.
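
The a(t) and z(t) scheme described above can be sketched as a simple loop. This is a toy illustration, not Muse's actual interface: the `Frame` type, the `toy_model` function, and the action names are hypothetical stand-ins. A "frame" here is just the character's position, whereas the real system generates images.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # toy stand-in for an image: character position (x, y)
Action = str             # toy controller input: "left", "right" or "idle"

STEP = {"left": -1, "right": 1, "idle": 0}


def toy_model(frames: List[Frame], actions: List[Action]) -> Frame:
    """Hypothetical stand-in for the network: predicts the next frame
    from the history of frames z and actions a."""
    x, y = frames[-1]
    return (x + STEP[actions[-1]], y)


def generate(prompt_frames: List[Frame], actions: List[Action]) -> List[Frame]:
    """Interleave controller actions a(t) with generated frames z(t)."""
    frames = list(prompt_frames)
    history: List[Action] = []
    for action in actions:
        history.append(action)                      # a(t) joins the context
        frames.append(toy_model(frames, history))   # the next z is generated from it
    return frames[len(prompt_frames):]              # return only the generated part


video = generate([(0, 0)], ["right", "right", "left", "idle"])
print(video)  # [(1, 0), (2, 0), (1, 0), (1, 0)]
```

The point of the sketch is that the controller input never moves the character directly. It is only added to the context from which the next frame is predicted.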

Muse could potentially be used for gameplay iteration

Muse could make it possible to think through and test design ideas in the context of game mechanics. For example, a developer who wants to evaluate how a jump mechanic will work in a specific location currently needs to assemble the entire scene in the game engine and test it manually. With a neural network that generates various gameplay scenarios based on the behavior of real players, this process could become significantly faster and more convenient, leaving developers more time to refine the gameplay itself.

The creators of Muse built the project around three key principles for generating gameplay videos: consistency, variety, and immutability. Consistency keeps the generated behavior coherent, variety prevents monotony, and immutability preserves the conditions the user has set. Adherence to these principles allows the neural network to create content that is not only attractive but also meets user expectations.

Consistency here refers to the model's ability to remember the conditions presented during generation and to follow them when similar situations arise. For example, if Muse generates a gameplay video in which a character controlled by a non-existent player starts shooting when an enemy appears, then all other characters should behave similarly under similar circumstances. This property helps keep the generated gameplay coherent and realistic.

Variety refers to the system's ability to generate multiple gameplay scenarios from the initial information provided in the prompt. For example, if the neural network is given footage of a character facing a choice between three paths, Muse must generate at least three gameplay videos, one for each path. Ideally, it would generate more, demonstrating different types of character behavior on each route.
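
One way to picture a check for variety is to generate many continuations of the same prompt and count which paths they cover. The sketch below is only an illustration under stated assumptions: `sample_path` is a hypothetical stand-in that picks a path at random, whereas a real check would have to classify generated videos.

```python
import random
from collections import Counter

# Hypothetical fork with three paths, as in the example above.
PATHS = ["left", "middle", "right"]


def sample_path(rng: random.Random) -> str:
    """Toy stand-in for one generated video: the path the character takes."""
    return rng.choice(PATHS)


def path_coverage(num_videos: int, seed: int = 0) -> Counter:
    """Generate several continuations of one prompt and count the paths taken."""
    rng = random.Random(seed)
    return Counter(sample_path(rng) for _ in range(num_videos))


coverage = path_coverage(300)
# A path that never appears would signal a lack of variety.
missing = [path for path in PATHS if coverage[path] == 0]
print(coverage, "missing:", missing)
```

If `missing` is not empty, the generator is ignoring at least one option that the prompt made available.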

Immutability means that the system preserves user-defined conditions. For example, if you add a red barrel next to a character in a generated video, Muse shouldn't remove it during further generation. Ideally, the neural network should take this condition into account and make use of it, for example by showing the character shooting the barrel and causing it to explode. This makes content creation more predictable, which matters when users need precise control over the result.
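
The immutability requirement can be expressed as a simple check: once the user adds an object, it must appear in every later frame. The sketch below is a toy under stated assumptions: each frame is represented as a set of object labels, and the `persists` helper is hypothetical. Real frames are images, so a real check would need an object detector.

```python
from typing import List, Set

Frame = Set[str]  # toy stand-in for an image: labels of the objects visible in it


def persists(frames: List[Frame], obj: str, added_at: int) -> bool:
    """Return True if obj is present in every frame from index added_at onward."""
    return all(obj in frame for frame in frames[added_at:])


frames = [
    {"character"},
    {"character", "red barrel"},  # the user adds the barrel here (index 1)
    {"character", "red barrel"},
    {"character"},                # the barrel silently disappears
]
print(persists(frames[:3], "red barrel", added_at=1))  # True
print(persists(frames, "red barrel", added_at=1))      # False
```

This strict version would also flag a barrel that legitimately explodes, so a practical check would have to tell in-game removal apart from the model simply forgetting the object.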

Image: Springer Nature

Michael Cook, a game designer, AI researcher, and lecturer at King's College London, argues that even when these principles are strictly adhered to, Muse remains a difficult tool to use in game development. Despite its theoretical potential, its practical application is questionable.

One argument is that regular developers cannot replicate the conditions the Microsoft team observed when creating their neural network. Even if a large company can compile seven years of gameplay from its in-development game, what about indie studios just starting out? Such studios may never have enough data to provide the neural network with the necessary set of prompts. This creates significant obstacles for indie developers who need to adapt to modern technologies and use them in their games.

Cook also raised questions about how the neural network works, particularly with regard to variety in the generated content. This principle dictates that the neural network should account for the diverse behaviors of real players, based on the gameplay data provided to it. But what if the neural network was trained on recordings made during alpha, beta, or QA testing, because the game hasn't been released yet and no post-release recordings exist? It is unclear how relevant the generated content will be if the training footage shows QA specialists searching for bugs and trying to break the game. This casts doubt on the quality and variety of the results and points to the need for more diverse and relevant training data.

Microsoft representatives have not provided answers to many questions, including these. There's no guarantee that the situation will change anytime soon and that they'll be able to provide clarification.

Muse is a highly questionable tool for preserving old games, even in theory.

Another problem is that Microsoft is trying to present the neural network as a means of preserving video games. According to Phil Spencer, head of Xbox, there is a prospect that Muse could analyze gameplay videos of classic games and facilitate their porting to modern platforms. In theory, this would help preserve gaming heritage and make old games accessible to a new audience.

Michael Cook described Spencer's claims as "silly" and emphasized that Muse shouldn't be considered a means of preserving video games. This is because the process of "preservation" itself is not clearly defined or explained. Currently, Microsoft's neural network can only render gameplay videos based on real player behavior, using extensive data from relatively simple action games. It's unclear how exactly this model will evolve into technology capable of porting older games to modern platforms. Apparently, even Microsoft itself is unclear on this issue.

Microsoft's Muse is a significant step in the complex process of integrating neural networks into video game development. While this step currently raises more questions and challenges than it offers solutions, the technology has the potential to change how artificial intelligence is used in the gaming industry. It could help developers significantly reduce the time they spend on routine problems, leaving more room for creativity and innovation in game development.
