M2UGen

Dive into the world of AI-driven music creativity with M2UGen! A collaboration between Tencent and NUS, M2UGen uses large language models for both music understanding and music generation. Whether it's answering questions about music or creating new tunes from text, images, videos, or audio, M2UGen has got you covered.

Explore its functionality with a hands-on demo!

At its core, M2UGen combines multiple encoders, one per input modality:

MERT: Delving into music intricacies
ViT: Deciphering images
ViViT: Interpreting video content

Music generation is powered by the MusicGen/AudioLDM2 model, which serves as the music decoder, coupled with adapters and the LLaMA 2 model.
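As a rough illustration of how these pieces fit together, the sketch below lists the stages a request might pass through. It is not the code in m2ugen.py: the names `build_pipeline`, `Pipeline`, `ENCODERS`, and `DECODERS` are hypothetical, and the assumption that text input reaches the language model without a separate encoder is ours, not something the description above states.

```python
from dataclasses import dataclass

# Encoder per input modality, as named in the description above.
ENCODERS = {"music": "MERT", "image": "ViT", "video": "ViViT"}

# Music decoders named in the description above.
DECODERS = ("MusicGen", "AudioLDM2")


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical record of the ordered stages for one request."""
    stages: tuple


def build_pipeline(modalities, generate_music=False, decoder="MusicGen"):
    """Return the stages a request would pass through (illustrative only)."""
    if decoder not in DECODERS:
        raise ValueError(f"unknown decoder: {decoder}")
    stages = []
    for modality in modalities:
        if modality == "text":
            # Assumption: text is handled by the language model directly.
            continue
        if modality not in ENCODERS:
            raise ValueError(f"unsupported modality: {modality}")
        stages.append(f"{ENCODERS[modality]} encoder")
        stages.append(f"{modality} adapter")
    stages.append("LLaMA 2")
    if generate_music:
        stages.append(f"{decoder} decoder")
    return Pipeline(tuple(stages))


if __name__ == "__main__":
    # Image-to-music request: encode the image, adapt, reason, then decode music.
    for stage in build_pipeline(["text", "image"], generate_music=True).stages:
        print(stage)
```

A question about a music clip would use `build_pipeline(["text", "music"])` and stop at the language model, while a generation request adds the decoder stage at the end.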

For AI aficionados, the model's blueprint, m2ugen.py, defines the full multi-modal architecture.

We train M2UGen on datasets generated with the MU-LLaMA model, which handles music captioning and question answering. Keen on the nitty-gritty? The dataset construction procedure is documented in the Datasets folder.

Elevate your AI music experience with M2UGen, where melodies and machine intelligence harmonize. 🎵🤖

Official Website

The demo is here

