AudioCraft is a PyTorch library for deep learning research on audio generation. It provides both inference and training code for a suite of state-of-the-art generative models: AudioGen and MusicGen for text-to-audio and text-to-music generation, MAGNeT for fast non-autoregressive generation, the EnCodec neural audio codec, and the Multi Band Diffusion decoder.
Getting started requires Python 3.9 and PyTorch 2.1.0; installation instructions cover both the stable release and the development version. Installing `ffmpeg` is also recommended.
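Assuming a standard pip-based setup, installation follows the project README roughly as below (the exact version pins and package channels may change between releases, so treat this as a sketch):

```shell
# AudioCraft requires Python 3.9 and PyTorch 2.1.0; pin torch first.
python -m pip install 'torch==2.1.0'

# Stable release from PyPI:
python -m pip install -U audiocraft

# Or the development version straight from GitHub:
python -m pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft

# ffmpeg is recommended for audio I/O:
sudo apt-get install ffmpeg   # or: conda install 'ffmpeg<5' -c conda-forge
</imports>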
The library not only includes the necessary components for conducting deep learning research in audio but also offers training pipelines for its models. Researchers and developers can access detailed documentation to guide them through the creation of their own training pipelines or to reproduce existing work.
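As an illustration of what those training pipelines look like, AudioCraft's documentation drives experiments through the Dora experiment manager; a launch command is sketched below (the solver and scale names are taken from the MusicGen training docs and may differ in your version, so check `docs/MUSICGEN.md` before running):

```shell
# Launch a small-scale MusicGen training run via Dora, AudioCraft's
# experiment manager. Solver names are illustrative; dataset configuration
# must be set up separately as described in the training docs.
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small
```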
AudioCraft’s API documentation is available online, and training code is provided for several of the models. Pretrained weights are hosted on the Hugging Face Hub, with the option to customize the cache directory they are downloaded to.
The repository’s code is released under the MIT license, while the model weights are released under the CC-BY-NC 4.0 license. For academic referencing, the library provides a citation for the AudioCraft framework as a whole, as well as specific citations for the individual models.
Read more at GitHub…