Google’s DeepMind has introduced a new AI technology that can generate background music and sound effects for silent videos. This “video-to-audio” system is designed to streamline the video editing process, particularly for content creators.

The technology is still under development, but it already offers several notable capabilities. Here’s a breakdown of the process:

  1. User Input: Creators upload their silent video and can provide keywords or phrases to guide the AI in generating the desired soundscape. For instance, a silent video of someone walking in the dark could be paired with prompts like “movies, horror films, music, tension, footsteps on concrete” to help the AI understand the mood and setting.
  2. AI in Action: DeepMind’s AI model first breaks the video down into an encoded representation so it can analyze the visuals. This encoded video data is then combined with the user’s text prompts. Using a diffusion model, the AI starts from random noise and refines it step by step, guided by both the visuals and the prompts, ultimately generating background sounds that complement the video content.
  3. Tailoring the Soundscape: The model can create various audio options for a single video, allowing creators to choose the best fit for their project. DeepMind’s system can also consider the emotional tone of the prompt words. For example, prompts emphasizing “tension” might result in suspenseful background music, while prompts like “joyful celebration” could lead to more upbeat sounds.
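
To make the three steps above more concrete, here is a toy Python sketch of prompt-guided iterative refinement. It is not DeepMind’s implementation: the functions `encode_video`, `encode_prompt`, `build_condition`, and `generate_audio` are hypothetical names invented for illustration, the "encoders" are trivial arithmetic stand-ins for neural networks, and the update rule is a simple pull toward a target rather than a learned denoiser. It only shows the general shape of the idea: start from noise, refine step by step under guidance from the video and the prompt, and vary the random seed to get several candidates.

```python
import random


def encode_video(frames):
    """Hypothetical visual encoder: one number per frame (mean pixel value)."""
    return [sum(frame) / len(frame) for frame in frames]


def encode_prompt(prompt):
    """Hypothetical text encoder: maps comma-separated keywords to one number."""
    words = [w.strip() for w in prompt.split(",") if w.strip()]
    return sum(len(w) for w in words) / max(len(words), 1) / 10.0


def build_condition(video_feats, text_feat):
    """Combine the video features and the prompt feature into one guidance signal."""
    return [0.5 * v + 0.5 * text_feat for v in video_feats]


def generate_audio(frames, prompt, steps=10, seed=0):
    """Toy refinement loop: begin with noise and move toward the guidance signal.

    Assumption: a real diffusion model would use a trained network to predict
    and remove noise at each step; here each step just closes 20% of the gap.
    """
    rng = random.Random(seed)
    cond = build_condition(encode_video(frames), encode_prompt(prompt))
    audio = [rng.gauss(0.0, 1.0) for _ in cond]  # start from pure noise
    for _ in range(steps):
        audio = [a + 0.2 * (c - a) for a, c in zip(audio, cond)]
    return audio


if __name__ == "__main__":
    # Three tiny "frames", each a list of made-up pixel brightness values.
    frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    prompt = "tension, footsteps on concrete"
    # Different seeds give different candidates for the same video and prompt.
    for seed in range(3):
        print(seed, generate_audio(frames, prompt, seed=seed))
```

Running the script prints three candidate "audio" sequences for the same video and prompt. In this toy version the candidates differ only because each starts from different noise and the loop stops after a few steps; it says nothing about how the real system produces or ranks its alternatives.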

Looking ahead, DeepMind is actively refining the technology. Future developments include enabling the AI to automatically generate sounds based solely on the video content, eliminating the need for user prompts. Additionally, they’re working on improving the system’s ability to synchronize generated dialogue with the lip movements of characters in the video.

This “video-to-audio” technology could make sound design much more accessible, especially for creators who lack access to professional audio tools or expertise.
