AudioCraft: a library for audio processing and generation with deep learning


AudioCraft is a PyTorch library for deep learning research on audio generation. It provides both inference and training code for a suite of state-of-the-art generative models: MusicGen for text-to-music generation, AudioGen for text-to-sound generation, MAGNeT, a masked non-autoregressive model for music and sound generation, the EnCodec neural audio codec, and Multi Band Diffusion, a diffusion-based decoder for EnCodec tokens.
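As a concrete illustration of the inference side, the sketch below generates a short clip with MusicGen following the pattern shown in the repository's documentation; the checkpoint name `facebook/musicgen-small`, the prompt, and the eight-second duration are example choices, not requirements.

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Download a pretrained MusicGen checkpoint from the Hugging Face Hub
# ('facebook/musicgen-small' is one of several available sizes).
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Generate roughly 8 seconds of audio per text prompt.
model.set_generation_params(duration=8)

prompts = ['lo-fi hip hop beat with a mellow piano melody']
wav = model.generate(prompts)  # tensor of shape [batch, channels, samples]

# Write each generated sample to disk with loudness normalization.
for idx, one_wav in enumerate(wav):
    audio_write(f'sample_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```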

To get started, AudioCraft requires Python 3.9 and PyTorch 2.1.0, and the repository provides installation instructions for both the stable release and the development version. Installing `ffmpeg` is also recommended.
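Since version mismatches are a common source of setup problems, a quick environment check before installing the package can save time; the snippet below only verifies the interpreter and PyTorch versions stated above and is not part of AudioCraft itself.

```python
import sys
import torch

# AudioCraft targets Python 3.9 and PyTorch 2.1.0 (see the requirements above).
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {sys.version.split()[0]}"
print(f"PyTorch version: {torch.__version__}")  # expected: 2.1.0
```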

Beyond inference, the library ships training pipelines for its models. Detailed documentation walks researchers and developers through building their own training pipelines or reproducing existing work.

AudioCraft’s API documentation is available online, along with training code for several of the models. Pretrained weights are hosted on Hugging Face and downloaded on first use, with the option to customize the cache directory.
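On the cache point, the project's FAQ describes an environment variable for relocating downloaded AudioCraft checkpoints; the sketch below assumes that variable is `AUDIOCRAFT_CACHE_DIR` and sets it before any model is loaded, so treat the variable name and path as illustrative rather than definitive.

```python
import os

# Assumed cache relocation: point AudioCraft's model cache at a custom
# directory before loading any model (variable name per the project's FAQ).
os.environ['AUDIOCRAFT_CACHE_DIR'] = '/data/audiocraft_models'

from audiocraft.models import MusicGen

# The checkpoint is fetched from the Hugging Face Hub and cached in the
# directory configured above.
model = MusicGen.get_pretrained('facebook/musicgen-small')
```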

The repository’s code is released under the MIT license, while the model weights are released under the CC-BY-NC 4.0 license. For academic use, the repository provides a citation for the AudioCraft framework as a whole, as well as citations for the individual models.
Read more at GitHub…