A Microsoft patent might pave the way for AI-created video game soundtracks

Microsoft has filed a patent application titled “ARTIFICIAL INTELLIGENCE MODELS FOR COMPOSING AUDIO SCORES”, which is listed on the WIPO IP Portal. It describes an intelligent audio composition engine that could create sound effects, music, and other audio components for a wide range of media, including movies, TV programmes, games, and even live recordings. The patent references dynamic moments in games, implying that it might produce scores that change based on the player’s actions. According to the patent abstract, parameters may be defined using visual, audio, and textual elements and prompts (together referred to as the ‘Dataset’) to train a variety of AI models to create audio scores.

The recent wave of generative AI has been groundbreaking, reaching into numerous areas of art and media. Although a number of AI tools for audio synthesis already exist, Microsoft’s patent suggests that its own ecosystem of AI models could be among the most extensive and complex systems for machine-aided music composition to date.

Artificial intelligence (AI) already plays a major role in video games. It is used throughout game development, from enemy behaviour and combat encounters to procedural level creation and interactions with NPCs and the environment. Many video games, including the recent Doom games, Metal Gear Rising: Revengeance, and Devil May Cry 5, also use adaptive (or dynamic) soundtracks. In Devil May Cry 5, for example, the tracks only bring in their energetic vocals once the player’s style rating rises.
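To make the idea of an adaptive soundtrack concrete, here is a minimal Python sketch of layer-based music that adds stems as a style rating climbs. The rank names, layer names, and thresholds are hypothetical and chosen purely for illustration; they are not taken from Devil May Cry 5 or from Microsoft’s patent.

```python
# Hypothetical style ranks, ordered from lowest to highest.
RANKS = ["D", "C", "B", "A", "S"]

# Hypothetical music layers and the lowest rank at which each one plays.
LAYER_THRESHOLDS = {"drums": "D", "guitars": "C", "vocals": "A"}


def active_layers(style_rank: str) -> list[str]:
    """Return the layers that should be audible at the given style rank."""
    level = RANKS.index(style_rank)
    return [
        name
        for name, min_rank in LAYER_THRESHOLDS.items()
        if RANKS.index(min_rank) <= level
    ]


if __name__ == "__main__":
    for rank in RANKS:
        print(rank, active_layers(rank))
```

In this sketch the vocal layer stays silent until the rating reaches the assumed threshold, which is the kind of hand-authored rule that existing adaptive soundtracks rely on.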

However, Microsoft’s new AI for audio has the potential to go well beyond the traditional use of adaptive music in games. Player actions could be scored in real time with relevant audio cues and music, so the audio experience would differ from player to player. Games that put a premium on sound and music stand to benefit most from this technology.

The patent description goes into depth about the many AI engines tasked with composing audio scores from the information they are given. According to the patent, they can evaluate human emotions and moods, gather location data, assess the tone of a situation, and much more. The AI can learn from images, videos, films, and live events and then generate a collection of audio files that overlay the visuals with relevant sound effects and music. This could open up a wealth of new opportunities for media production: creators could draw on a vast, ever-expanding collection of generated audio for films, games, and other media. Composing a grand symphonic piece for the hero’s arrival, producing a melancholy melody for the death of a pet, and generating sound effects for gunshots and explosions could all be left to the AI. As a result, composers and sound designers may face considerable competition.

Cloud computing will power the technology, and with such a rapidly growing dataset the AI system will need far more powerful infrastructure. It remains to be seen when the system will become operational. Even so, the future of audio design looks bright, and Microsoft might be at the forefront of a revolution in this field.