
jen-1-pytorch's Introduction

JEN-1-pytorch

Unofficial implementation of JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models (https://arxiv.org/abs/2308.04729)


📖 Quick Index

💻 Installation

git clone https://github.com/0417keito/JEN-1-pytorch.git
cd JEN-1-pytorch
pip install -r requirements.txt

🐍Usage

Sampling

import torch
from generation import Jen1

ckpt_path =  'your ckpt path'
jen1 = Jen1(ckpt_path)

prompt = 'a beautiful song'
samples = jen1.generate(prompt)

Training

torchrun train.py

Dataset format

JSON format. The name of each JSON file must match the name of its target music file.

{"prompt": "a beautiful song"}
How should the data_dir be created?

'''
dataset_dir
├── audios
|    ├── music1.wav
|    ├── music2.wav
|    .......
|    ├── music{n}.wav
|
├── metadata
|   ├── music1.json
|   ├── music2.json
|   ......
|   ├── music{n}.json
|
'''
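The layout above can be prepared and checked with a small script. This is a minimal sketch, not part of the repository: the helper names `write_metadata` and `find_unpaired` are hypothetical, and it assumes only what this README states (audio files under `audios`, one JSON file per audio file under `metadata`, same base name, a single `prompt` key).

```python
import json
from pathlib import Path


def write_metadata(dataset_dir, prompts):
    """Write metadata/<stem>.json for each entry of prompts ({stem: prompt})."""
    metadata_dir = Path(dataset_dir) / "metadata"
    metadata_dir.mkdir(parents=True, exist_ok=True)
    for stem, prompt in prompts.items():
        with open(metadata_dir / f"{stem}.json", "w", encoding="utf-8") as f:
            json.dump({"prompt": prompt}, f)


def find_unpaired(dataset_dir):
    """Return (audio stems without metadata, metadata stems without audio)."""
    root = Path(dataset_dir)
    audio = {p.stem for p in (root / "audios").glob("*.wav")}
    meta = {p.stem for p in (root / "metadata").glob("*.json")}
    return sorted(audio - meta), sorted(meta - audio)
```

Calling `find_unpaired("dataset_dir")` before training gives two lists that should both be empty when every audio file has a matching JSON file.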

About config

Please see config.py and conditioner_config.py.

🧠TODO

  • Extension to JEN-1-Composer
  • Extension to music generation with singing voice
  • Adaptation of Consistency Model
  • The paper uses a Diffusion Autoencoder, but with limited computing resources I used Encodec instead. If resources allow, I will implement the Diffusion Autoencoder.

🚀Demo

Coming soon!

🙏Appreciation

Dr Adam Fils - Thank you for providing the GPU. I really appreciate Adam giving me this opportunity.

⭐️Show Your Support

If you find this repo interesting and useful, give us a ⭐️ on GitHub! It encourages us to keep improving the model and adding exciting features. Please report any problems by opening an issue.

🙆Welcome Contributions

Contributions are always welcome.


jen-1-pytorch's Issues

Problem of the input_concat_cond

In the implementation of music inpainting and continuation tasks, I've noticed that the code concatenates the masked audio with the input.
However, when the masked audio is processed, only the first item of the batch is taken and then duplicated. What is the reason for this? The comments suggest that even the author is unsure of the rationale behind this approach. I'd like to know the reference or source that inspired this piece of code. Thank you!

if len(self.input_concat_ids) > 0:

Prepare dataset - wav/mp3 should be a full-length audio or chunked one

I want to prepare data.

'''
How should the data_dir be created?

dataset_dir
├── audios
| ├── music1.wav
| ├── music2.wav
| .......
| ├── music{n}.wav
|
├── metadata
| ├── music1.json
| ├── music2.json
| ......
| ├── music{n}.json
|
'''

What should the length of music1.wav, music2.wav, etc., be? Should it be a full song, perhaps five minutes long, which will then be automatically trimmed for us? Or do I need to segment it (e.g., into 10-second segments) and place it in the audios folder?
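The thread does not say whether the loader trims long files on its own. If you decide to segment the audio yourself, one possible approach is sketched below using only the standard library `wave` module. The function name `split_wav` and the 10-second default are illustrative assumptions, and the sketch handles uncompressed PCM WAV files only.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, chunk_seconds=10.0):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src_path).stem
    written = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        if frames_per_chunk < 1:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            out_path = out_dir / f"{stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                dst.setparams(params)    # same channels, width, and rate
                dst.writeframes(frames)  # frame count in header is patched
            written.append(out_path)
            index += 1
    return written
```

Because the README requires each JSON file to share its name with the audio file, every chunk written this way would presumably need its own metadata file (for example `music1_0000.json`).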

Thank you!

The loss converges, but when I try generation.py it only generates noisy audio

I followed the tutorial and trained my own model using approximately 300 hours of song accompaniment data. It converged well, but when I tried to generate a song from the best model, even using the same prompt input from the training set, it only generated noisy audio.

I checked the code and noticed that only the UNet1D is saved and loaded during inference, and the Diffusion model is not. Is there anyone who has successfully trained and can actually infer from their model who could offer me any tips? Thank you!

Great Progress!

Hey! I was going through the codebase and saw you've made amazing progress on replicating JEN-1. If you need support on the GPU side or datasets, please let me know and I'd be happy to provide access to some spare A100s, along with 200k copyright-free music files + descriptions.

RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

When running 'torchrun train.py' I get this error:
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

Traceback:
Traceback (most recent call last):
  File "E:\New folder\JEN-1-pytorch\train.py", line 129, in <module>
    main(config=Config)
  File "E:\New folder\JEN-1-pytorch\train.py", line 20, in main
    run(rank=0, n_gpus=1, config=config)
  File "E:\New folder\JEN-1-pytorch\train.py", line 124, in run
    trainer.train_loop()
  File "E:\New folder\JEN-1-pytorch\trainer.py", line 116, in train_loop
    for batch_idx, (audio_emb, metadata) in enumerate(data_iter):
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\utils\data\dataloader.py", line 631, in __next__
    data = self._next_data()
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\utils\data\dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\utils\data\dataset.py", line 399, in __getitems__
    return [self.dataset[self.indices[idx]] for idx in indices]
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\utils\data\dataset.py", line 399, in <listcomp>
    return [self.dataset[self.indices[idx]] for idx in indices]
  File "E:\New folder\JEN-1-pytorch\dataset\dataloader.py", line 95, in __getitem__
    chunk = convert_audio(chunk, sr, model.sample_rate, model.channels)
  File "C:\Users\jeroe\AppData\Roaming\Python\Python310\site-packages\encodec\utils.py", line 88, in convert_audio
    wav = torchaudio.transforms.Resample(sr, target_sr)(wav)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torchaudio\transforms\_transforms.py", line 979, in forward
    return _apply_sinc_resample_kernel(waveform, self.orig_freq, self.new_freq, self.gcd, self.kernel, self.width)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torchaudio\functional\functional.py", line 1462, in _apply_sinc_resample_kernel
    waveform = waveform.view(-1, shape[-1])
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
[2024-02-25 14:34:58,648] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 27492) of binary: D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\python.exe
Traceback (most recent call last):
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\Scripts\torchrun.exe\__main__.py", line 7, in <module>
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\distributed\run.py", line 812, in main
    run(args)
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\distributed\run.py", line 803, in run
    elastic_launch(
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "D:\Users\jeroe\pinokio\bin\miniconda\envs\videodiff\lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-02-25_14:34:58
host : J-Café
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 27492)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
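The traceback shows a zero-element tensor reaching the resampler inside `convert_audio`, called from `dataset/dataloader.py`. A plausible but unconfirmed cause is an audio file that is empty, unreadable, or shorter than the chunk the loader tries to take. The sketch below scans a folder for such files with the standard library `wave` module; the function name `find_short_wavs` and the one-second threshold are assumptions for illustration, and only PCM WAV files are covered.

```python
import wave
from pathlib import Path


def find_short_wavs(audio_dir, min_seconds=1.0):
    """Return [(path, duration_seconds)] for WAV files shorter than min_seconds.

    Files that cannot be parsed as WAV are reported with a duration of None.
    """
    flagged = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError):
            flagged.append((path, None))  # not a readable WAV file
            continue
        if duration < min_seconds:
            flagged.append((path, duration))
    return flagged
```

Running `find_short_wavs("dataset_dir/audios")` and removing or replacing the flagged files is one way to test this hypothesis before starting `torchrun train.py` again.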

About SpeechTokenizer_trainer

Hi,
I'm sorry to open this issue here; issues are not enabled on the speechtokenizer_trainer repo.
Thanks a lot for sharing such useful and meaningful work. I've carefully read and tried your training code for weeks, but some issues are still bothering me.
In my experiments, the code works well when '--do_distillation' is turned off. But when distillation is enabled, the gradient of speechtokenizer becomes NaN after the first backward step. I wonder whether you have successfully reproduced speechtokenizer, or whether there is still a problem in loss_distillation. Thanks again for sharing.
