AI music generators could be a boon for artists — but also problematic

It was only five years ago that electronic punk band YACHT entered the recording studio with a daunting task: They would train an AI on 14 years of their music, then synthesize the results into the album “Chain Tripping.”

“I’m not interested in being a reactionary,” YACHT member and tech writer Claire L. Evans said in a documentary about the album. “I don’t want to return to my roots and play acoustic guitar because I’m so freaked out about the coming robot apocalypse, but I also don’t want to jump into the trenches and welcome our new robot overlords either.”

But our new robot overlords are making a whole lot of progress in the space of AI music generation. Even though the Grammy-nominated “Chain Tripping” was released in 2019, the technology behind it is already becoming outdated. Now, the startup behind the open source AI image generator Stable Diffusion is pushing us forward again with its next act: making music.

Creating harmony

Harmonai is an organization with financial backing from Stability AI, the London-based startup behind Stable Diffusion. In late September, Harmonai released Dance Diffusion, an algorithm and set of tools that can generate clips of music by training on hundreds of hours of existing songs.

“I started my work on audio diffusion around the same time as I started working with Stability AI,” Zach Evans, who heads development of Dance Diffusion, told TechCrunch in an email interview. “I was brought on to the company due to my development work with [the image-generating algorithm] Disco Diffusion and I quickly decided to pivot to audio research. To facilitate my own learning and research, and make a community that focuses on audio AI, I started Harmonai.”

Dance Diffusion remains in the testing stages — at present, the system can only generate clips a few seconds long. But the early results provide a tantalizing glimpse at what could be the future of music creation, while at the same time raising questions about the potential impact on artists.

The emergence of Dance Diffusion comes several years after OpenAI, the San Francisco-based lab behind DALL-E 2, detailed its grand experiment with music generation, dubbed Jukebox. Given a genre, artist and a snippet of lyrics, Jukebox could generate relatively coherent music complete with vocals. But the songs Jukebox produced lacked larger musical structures, like repeating choruses, and often contained nonsense lyrics.

Google’s AudioLM, detailed for the first time earlier this week, shows more promise, with an uncanny ability to generate piano music given a short snippet of playing. But it hasn’t been open sourced.

Dance Diffusion aims to overcome the limitations of previous open source tools by borrowing technology from image generators such as Stable Diffusion. The system is what’s known as a diffusion model, which generates new data (e.g., songs) by learning how to destroy and recover many existing samples of data: noise is gradually added to training samples until they are unrecognizable, and the model learns to reverse that corruption step by step. As it’s fed the existing samples (say, the entire Smashing Pumpkins discography), the model gets better at undoing the destruction, and running that learned recovery process on pure noise produces new works.

Kyle Worrall, a Ph.D. student at the University of York in the U.K. studying the musical applications of machine learning, explained the nuances of diffusion systems in an interview with TechCrunch:

“In the training of a diffusion model, training data such as the MAESTRO data set of piano performances is ‘destroyed’ and ‘recovered,’ and the model improves at performing these tasks as it works its way through the training data,” he said via email. “Eventually the trained model can take noise and turn that into music similar to the training data (i.e., piano performances in MAESTRO’s case). Users can then use the trained model to do one of three tasks: Generate new audio, regenerate existing audio that the user chooses or interpolate between two input tracks.”

It’s not the most intuitive idea. But as DALL-E 2, Stable Diffusion and other such systems have shown, the results can be remarkably realistic.
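The destroy-and-recover loop Worrall describes can be sketched on a toy one-second audio signal using only NumPy. This is purely illustrative: the `recover` step here pulls the noisy signal back toward a known reference, standing in for the learned denoiser a real diffusion model like Dance Diffusion would use, and none of the names below come from Dance Diffusion’s actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "song": one second of a 440 Hz sine wave sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)

def destroy(x, num_steps=50, noise_scale=0.05):
    """Forward process: gradually add Gaussian noise to the signal."""
    noisy = x.copy()
    for _ in range(num_steps):
        noisy = noisy + noise_scale * rng.standard_normal(len(x))
    return noisy

def recover(noisy, reference, num_steps=50, step_size=0.1):
    """Toy reverse process: nudge the noisy signal back toward the data.

    A trained diffusion model would instead predict (and subtract) the
    noise at each step, with no access to the clean reference."""
    x = noisy.copy()
    for _ in range(num_steps):
        x = x + step_size * (reference - x)  # stand-in for learned denoising
    return x

noisy = destroy(clean)
restored = recover(noisy, clean)

# The recovered signal ends up far closer to the original than the noisy one.
err_noisy = np.mean((noisy - clean) ** 2)
err_restored = np.mean((restored - clean) ** 2)
print(err_restored < err_noisy)  # True
```

The point of the exercise is the asymmetry: destroying data with noise is trivial, but a model that learns to reverse the process step by step can start from pure noise and walk it back into something that resembles its training data.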

For example, Harmonai has demonstrated a Disco Diffusion model fine-tuned on Daft Punk music.
