Update: For most people, this should work: #13 (comment)
Any chance of getting help and/or updated instructions suitable for running audiocraft on MacOS and M1? At the very least, I think I need to know where to put the models I downloaded from Hugging Face. But, it's likely based on the errors I have some other issues too. My steps + errors follow. Thanks for any tips!
I adapted the instructions here for macOS: https://github.com/facebookresearch/audiocraft#installation
First, I ran each line in my terminal...
conda create -n audiocraft
conda activate audiocraft
pip install 'torch>=2.0'
pip install -U audiocraft
pip install ffmpeg
jupyter notebook
Second, I downloaded these two items from Hugging Face but wasn't sure where to put them: https://huggingface.co/facebook/musicgen-melody
- melody: 1.5B model, text to music and text+melody to music - 🤗 Hub
- large: 3.3B model, text to music only - 🤗 Hub
Third, when Jupyter opened in Safari I created a new notebook and ran this from here: https://github.com/facebookresearch/audiocraft#api
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8) # generate 8 seconds.
wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions) # generates 3 samples.
melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)
for idx, one_wav in enumerate(wav):
# Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
Fourth, I got these errors in Jupyter
AssertionError Traceback (most recent call last)
Cell In [2], line 5
2 from audiocraft.models import MusicGen
3 from audiocraft.data.audio import audio_write
----> 5 model = MusicGen.get_pretrained('melody')
6 model.set_generation_params(duration=8) # generate 8 seconds.
7 wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/audiocraft/models/musicgen.py:88, in MusicGen.get_pretrained(name, device)
86 else:
87 ROOT = 'https://dl.fbaipublicfiles.com/audiocraft/musicgen/v0/'
---> 88 compression_model = load_compression_model(ROOT + 'b0dbef54-37d256b525.th', device=device)
89 names = {
90 'small': 'ba7a97ba-830fe5771e',
91 'medium': 'aa73ae27-fbc9f401db',
92 'large': '9b6e835c-1f0cf17b5e',
93 'melody': 'f79af192-61305ffc49',
94 }
95 sig = names[name]
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/audiocraft/models/loaders.py:45, in load_compression_model(file_or_url, device)
43 cfg = OmegaConf.create(pkg['xp.cfg'])
44 cfg.device = str(device)
---> 45 model = builders.get_compression_model(cfg)
46 model.load_state_dict(pkg['best_state'])
47 model.eval()
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/audiocraft/models/builders.py:82, in get_compression_model(cfg)
79 renormalize = renorm is not None
80 warnings.warn("You are using a deprecated EnCodec model. Please migrate to new renormalization.")
81 return EncodecModel(encoder, decoder, quantizer,
---> 82 frame_rate=frame_rate, renormalize=renormalize, **kwargs).to(cfg.device)
83 else:
84 raise KeyError(f'Unexpected compression model {cfg.compression_model}')
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:1145, in Module.to(self, *args, **kwargs)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1145 return self._apply(convert)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
 802 # the current behavior is to change the tensor in-place using `.data =`,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
 802 # the current behavior is to change the tensor in-place using `.data =`,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.
[... skipping similar frames: Module._apply at line 797 (2 times)]
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
 802 # the current behavior is to change the tensor in-place using `.data =`,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:820, in Module._apply(self, fn)
816 # Tensors stored in modules are graph leaves, and we don't want to
 817 # track autograd history of `param_applied`, so we have to use
818 # with torch.no_grad():
819 with torch.no_grad():
--> 820 param_applied = fn(param)
821 should_use_set_data = compute_should_use_set_data(param, param_applied)
822 if should_use_set_data:
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:1143, in Module.to.<locals>.convert(t)
1140 if convert_to_format is not None and t.dim() in (4, 5):
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:239, in _lazy_init()
235 raise RuntimeError(
236 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
237 "multiprocessing, you must use the 'spawn' start method")
238 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 239 raise AssertionError("Torch not compiled with CUDA enabled")
240 if _cudart is None:
241 raise AssertionError(
242 "libcudart functions unavailable. It looks like you have a broken build?")
AssertionError: Torch not compiled with CUDA enabled