nomadkaraoke / python-audio-separator

Easy to use vocal separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)

License: MIT License


python-audio-separator's Introduction

Audio Separator 🎶

PyPI version Conda Version Docker pulls codecov

Summary: Easy to use audio stem separation from the command line or as a dependency in your own Python project, using the amazing MDX-Net, VR Arch, Demucs and MDXC models available in UVR by @Anjok07 & @aufr33.

Audio Separator is a Python package that allows you to separate an audio file into various stems, using models trained by @Anjok07 for use with UVR (https://github.com/Anjok07/ultimatevocalremovergui).

The simplest (and probably most used) use case for this package is to separate an audio file into two stems, Instrumental and Vocals, which can be very useful for producing karaoke videos! However, the models available in UVR can separate audio into many more stems, such as Drums, Bass, Piano, and Guitar, and perform other audio processing tasks, such as denoising or removing echo/reverb.

Features

  • Separate audio into multiple stems, e.g. instrumental and vocals.
  • Supports all common audio formats (WAV, MP3, FLAC, M4A, etc.)
  • Ability to run inference using a pre-trained model in PTH or ONNX format.
  • CLI support for easy use in scripts and batch processing.
  • Python API for integration into other projects.

Installation 🛠️

๐Ÿณ Docker

If you're able to use Docker, you don't actually need to install anything - there are images published on Docker Hub for GPU (CUDA) and CPU inferencing, for both amd64 and arm64 platforms.

You probably want to volume-mount a folder containing whatever file you want to separate, which can then also be used as the output folder.

For instance, if your current directory has the file input.wav, you could execute audio-separator as shown below (see usage section for more details):

docker run -it -v `pwd`:/workdir beveradb/audio-separator input.wav

If you're using a machine with a GPU, you'll want to use the GPU specific image and pass in the GPU device to the container, like this:

docker run -it --gpus all -v `pwd`:/workdir beveradb/audio-separator:gpu input.wav

If the GPU isn't being detected, make sure your docker runtime environment is passing through the GPU correctly - there are various guides online to help with that.

🎮 Nvidia GPU with CUDA or 🧪 Google Colab

Supported CUDA Versions: 11.8 and 12.2

💬 If successfully configured, you should see this log message when running audio-separator --env_info: ONNXruntime has CUDAExecutionProvider available, enabling acceleration

Conda: conda install pytorch=*=*cuda* onnxruntime=*=*cuda* audio-separator -c pytorch -c conda-forge

Pip: pip install "audio-separator[gpu]"

Docker: beveradb/audio-separator:gpu

 Apple Silicon, macOS Sonoma+ with M1 or newer CPU (CoreML acceleration)

💬 If successfully configured, you should see this log message when running audio-separator --env_info: ONNXruntime has CoreMLExecutionProvider available, enabling acceleration

Pip: pip install "audio-separator[cpu]"

๐Ÿข No hardware acceleration, CPU only

Conda: conda install audio-separator -c pytorch -c conda-forge

Pip: pip install "audio-separator[cpu]"

Docker: beveradb/audio-separator

🎥 FFmpeg dependency

💬 To test if audio-separator has been successfully configured to use FFmpeg, run audio-separator --env_info. The log will show FFmpeg installed.

If you installed audio-separator using conda or docker, FFmpeg should already be available in your environment.

You may need to separately install FFmpeg. It should be easy to install on most platforms, e.g.:

๐Ÿง Debian/Ubuntu: apt-get update; apt-get install -y ffmpeg

 macOS: brew update; brew install ffmpeg

GPU / CUDA specific installation steps with Pip

In theory, all you should need to do to get audio-separator working with a GPU is install it with the [gpu] extra as above.

However, sometimes getting both PyTorch and ONNX Runtime working with CUDA support can be a bit tricky, so it may not work that easily.

You may need to reinstall both packages directly, allowing pip to calculate the right versions for your platform, for example:

  • pip uninstall torch onnxruntime
  • pip cache purge
  • pip install --force-reinstall torch torchvision torchaudio
  • pip install --force-reinstall onnxruntime-gpu

I generally recommend installing the latest version of PyTorch for your environment using the command recommended by the wizard here: https://pytorch.org/get-started/locally/
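
To confirm that both frameworks can actually see your GPU after reinstalling, a quick check from Python (a minimal sketch; assumes torch and onnxruntime-gpu are installed as above):

import torch
import onnxruntime as ort

# Both should report CUDA support if the reinstall worked
print("Torch CUDA available:", torch.cuda.is_available())
# Expect 'CUDAExecutionProvider' to appear in this list
print("ONNX Runtime providers:", ort.get_available_providers())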

Multiple CUDA library versions may be needed

Depending on your CUDA version and environment, you may need to install specific version(s) of CUDA libraries for ONNX Runtime to use your GPU.

🧪 Google Colab, for example, now uses CUDA 12 by default, but ONNX Runtime still needs CUDA 11 libraries to work.

If you see the error Failed to load library or cannot open shared object file when you run audio-separator, this is likely the issue.

You can install the CUDA 11 libraries alongside CUDA 12 like so: apt update; apt install nvidia-cuda-toolkit

Note: if anyone knows how to make this cleaner so we can support both different platform-specific dependencies for hardware acceleration without a separate installation process for each, please let me know or raise a PR!

Usage 🚀

Command Line Interface (CLI)

You can use Audio Separator via the command line, for example:

audio-separator /path/to/your/input/audio.wav --model_filename UVR-MDX-NET-Inst_HQ_3.onnx

This command will download the specified model file, process the audio.wav input file and generate two new files in the current directory, one containing vocals and one containing the instrumental.

Note: You do not need to download any files yourself - audio-separator does that automatically for you!

To see a list of supported models, run audio-separator --list_models

Any file listed in the --list_models output can be specified (with file extension) with the model_filename parameter (e.g. --model_filename UVR_MDXNET_KARA_2.onnx) and it will be automatically downloaded to the --model_file_dir (default: /tmp/audio-separator-models/) folder on first usage.
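
The same model selection works from the Python API too; here's a minimal sketch using the documented Separator and load_model parameters (the filenames match the CLI examples above):

from audio_separator.separator import Separator

# model_file_dir mirrors the CLI --model_file_dir flag; the model is downloaded there on first use
separator = Separator(model_file_dir="/tmp/audio-separator-models/")
separator.load_model(model_filename="UVR_MDXNET_KARA_2.onnx")
output_files = separator.separate("audio.wav")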

Full command-line interface options

usage: audio-separator [-h] [-v] [-d] [-e] [-l] [--log_level LOG_LEVEL] [-m MODEL_FILENAME] [--output_format OUTPUT_FORMAT] [--output_dir OUTPUT_DIR] [--model_file_dir MODEL_FILE_DIR] [--invert_spect]
                       [--normalization NORMALIZATION] [--single_stem SINGLE_STEM] [--sample_rate SAMPLE_RATE] [--mdx_segment_size MDX_SEGMENT_SIZE] [--mdx_overlap MDX_OVERLAP] [--mdx_batch_size MDX_BATCH_SIZE]
                       [--mdx_hop_length MDX_HOP_LENGTH] [--mdx_enable_denoise] [--vr_batch_size VR_BATCH_SIZE] [--vr_window_size VR_WINDOW_SIZE] [--vr_aggression VR_AGGRESSION] [--vr_enable_tta]
                       [--vr_high_end_process] [--vr_enable_post_process] [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--demucs_segment_size DEMUCS_SEGMENT_SIZE] [--demucs_shifts DEMUCS_SHIFTS]
                       [--demucs_overlap DEMUCS_OVERLAP] [--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED] [--mdxc_segment_size MDXC_SEGMENT_SIZE] [--mdxc_override_model_segment_size]
                       [--mdxc_overlap MDXC_OVERLAP] [--mdxc_batch_size MDXC_BATCH_SIZE] [--mdxc_pitch_shift MDXC_PITCH_SHIFT]
                       [audio_file]

Separate audio file into different stems.

positional arguments:
  audio_file                                             The audio file path to separate, in any common format.

options:
  -h, --help                                             show this help message and exit

Info and Debugging:
  -v, --version                                          Show the program's version number and exit.
  -d, --debug                                            Enable debug logging, equivalent to --log_level=debug
  -e, --env_info                                         Print environment information and exit.
  -l, --list_models                                      List all supported models and exit.
  --log_level LOG_LEVEL                                  Log level, e.g. info, debug, warning (default: info).

Separation I/O Params:
  -m MODEL_FILENAME, --model_filename MODEL_FILENAME     model to use for separation (default: UVR-MDX-NET-Inst_HQ_3.onnx). Example: -m 2_HP-UVR.pth
  --output_format OUTPUT_FORMAT                          output format for separated files, any common format (default: FLAC). Example: --output_format=MP3
  --output_dir OUTPUT_DIR                                directory to write output files (default: <current dir>). Example: --output_dir=/app/separated
  --model_file_dir MODEL_FILE_DIR                        model files directory (default: /tmp/audio-separator-models/). Example: --model_file_dir=/app/models

Common Separation Parameters:
  --invert_spect                                         invert secondary stem using spectrogram (default: False). Example: --invert_spect
  --normalization NORMALIZATION                          value by which to multiply the amplitude of the output files (default: 0.9). Example: --normalization=0.7
  --single_stem SINGLE_STEM                              output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. Example: --single_stem=Instrumental
  --sample_rate SAMPLE_RATE                              set the sample rate of the output audio (default: 44100). Example: --sample_rate=44100

MDX Architecture Parameters:
  --mdx_segment_size MDX_SEGMENT_SIZE                    larger consumes more resources, but may give better results (default: 256). Example: --mdx_segment_size=256
  --mdx_overlap MDX_OVERLAP                              amount of overlap between prediction windows, 0.001-0.999. higher is better but slower (default: 0.25). Example: --mdx_overlap=0.25
  --mdx_batch_size MDX_BATCH_SIZE                        larger consumes more RAM but may process slightly faster (default: 1). Example: --mdx_batch_size=4
  --mdx_hop_length MDX_HOP_LENGTH                        usually called stride in neural networks; only change if you know what you're doing (default: 1024). Example: --mdx_hop_length=1024
  --mdx_enable_denoise                                   enable denoising after separation (default: False). Example: --mdx_enable_denoise

VR Architecture Parameters:
  --vr_batch_size VR_BATCH_SIZE                          number of "batches" to process at a time. higher = more RAM, slightly faster processing (default: 4). Example: --vr_batch_size=16
  --vr_window_size VR_WINDOW_SIZE                        balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: 512). Example: --vr_window_size=320
  --vr_aggression VR_AGGRESSION                          intensity of primary stem extraction, -100 - 100. typically 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2
  --vr_enable_tta                                        enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta
  --vr_high_end_process                                  mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process
  --vr_enable_post_process                               identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process
  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD  threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1

Demucs Architecture Parameters:
  --demucs_segment_size DEMUCS_SEGMENT_SIZE              size of segments into which the audio is split, 1-100. higher = slower but better quality (default: Default). Example: --demucs_segment_size=256
  --demucs_shifts DEMUCS_SHIFTS                          number of predictions with random shifts, higher = slower but better quality (default: 2). Example: --demucs_shifts=4
  --demucs_overlap DEMUCS_OVERLAP                        overlap between prediction windows, 0.001-0.999. higher = slower but better quality (default: 0.25). Example: --demucs_overlap=0.25
  --demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED      enable segment-wise processing (default: True). Example: --demucs_segments_enabled=False

MDXC Architecture Parameters:
  --mdxc_segment_size MDXC_SEGMENT_SIZE                  larger consumes more resources, but may give better results (default: 256). Example: --mdxc_segment_size=256
  --mdxc_override_model_segment_size                     override model default segment size instead of using the model default value. Example: --mdxc_override_model_segment_size
  --mdxc_overlap MDXC_OVERLAP                            amount of overlap between prediction windows, 2-50. higher is better but slower (default: 8). Example: --mdxc_overlap=8
  --mdxc_batch_size MDXC_BATCH_SIZE                      larger consumes more RAM but may process slightly faster (default: 1). Example: --mdxc_batch_size=4
  --mdxc_pitch_shift MDXC_PITCH_SHIFT                    shift audio pitch by a number of semitones while processing. may improve output for deep/high vocals. (default: 0). Example: --mdxc_pitch_shift=2

As a Dependency in a Python Project

You can use Audio Separator in your own Python project. Here's how you can use it:

from audio_separator.separator import Separator

# Initialize the Separator class (with optional configuration properties, below)
separator = Separator()

# Load a machine learning model (if unspecified, defaults to 'model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt')
separator.load_model()

# Perform the separation on specific audio files without reloading the model
output_files = separator.separate('audio1.wav')

print(f"Separation complete! Output file(s): {' '.join(output_files)}")

Batch processing and processing with multiple models

You can process multiple files without reloading the model to save time and memory.

You only need to load a model when choosing or changing models. See example below:

from audio_separator.separator import Separator

# Initialize the Separator with other configuration properties, below
separator = Separator()

# Load a model
separator.load_model(model_filename='UVR-MDX-NET-Inst_HQ_3.onnx')

# Separate multiple audio files without reloading the model
output_file_paths_1 = separator.separate('audio1.wav')
output_file_paths_2 = separator.separate('audio2.wav')
output_file_paths_3 = separator.separate('audio3.wav')

# Load a different model
separator.load_model(model_filename='UVR_MDXNET_KARA_2.onnx')

# Separate the same files with the new model
output_file_paths_4 = separator.separate('audio1.wav')
output_file_paths_5 = separator.separate('audio2.wav')
output_file_paths_6 = separator.separate('audio3.wav')

Parameters for the Separator class

  • log_level: (Optional) Logging level, e.g., INFO, DEBUG, WARNING. Default: logging.INFO
  • log_formatter: (Optional) The log format. Default: None, which falls back to '%(asctime)s - %(levelname)s - %(module)s - %(message)s'
  • model_file_dir: (Optional) Directory to cache model files in. Default: /tmp/audio-separator-models/
  • output_dir: (Optional) Directory where the separated files will be saved. If not specified, uses the current directory.
  • output_format: (Optional) Format to encode output files, any common format (WAV, MP3, FLAC, M4A, etc.). Default: WAV
  • normalization_threshold: (Optional) The amount by which the amplitude of the output audio will be multiplied. Default: 0.9
  • output_single_stem: (Optional) Output only a single stem, such as 'Instrumental' or 'Vocals'. Default: None
  • invert_using_spec: (Optional) Flag to invert using spectrogram. Default: False
  • sample_rate: (Optional) Set the sample rate of the output audio. Default: 44100
  • mdx_params: (Optional) MDX Architecture Specific Attributes & Defaults. Default: {"hop_length": 1024, "segment_size": 256, "overlap": 0.25, "batch_size": 1}
  • vr_params: (Optional) VR Architecture Specific Attributes & Defaults. Default: {"batch_size": 16, "window_size": 512, "aggression": 5, "enable_tta": False, "enable_post_process": False, "post_process_threshold": 0.2, "high_end_process": False}
  • demucs_params: (Optional) Demucs Architecture Specific Attributes & Defaults. Default: {"segment_size": "Default", "shifts": 2, "overlap": 0.25, "segments_enabled": True} (see the configuration sketch below)
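
Putting several of these together, here's a minimal configuration sketch (the specific values are illustrative, not recommendations):

from audio_separator.separator import Separator

# Write MP3s of only the vocal stem into ./separated, with custom MDX parameters
separator = Separator(
    output_dir="separated",
    output_format="MP3",
    output_single_stem="Vocals",
    mdx_params={"hop_length": 1024, "segment_size": 512, "overlap": 0.25, "batch_size": 4},
)
separator.load_model()
output_files = separator.separate("song.flac")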

Requirements 📋

Python >= 3.10

Libraries: torch, onnx, onnxruntime, numpy, librosa, requests, six, tqdm, pydub

Developing Locally

This project uses Poetry for dependency management and packaging. Follow these steps to set up a local development environment:

Prerequisites

Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/YOUR_USERNAME/audio-separator.git
cd audio-separator

Replace YOUR_USERNAME with your GitHub username if you've forked the repository, or use the main repository URL if you have the permissions.

Create and activate the Conda Environment

To create and activate the conda environment, use the following commands:

conda env create
conda activate audio-separator-dev

Install Dependencies

Once you're inside the conda env, run the following command to install the project dependencies:

poetry install

Running the Command-Line Interface Locally

You can run the CLI command directly within the virtual environment. For example:

audio-separator path/to/your/audio-file.wav

Deactivate the Virtual Environment

Once you are done with your development work, you can exit the virtual environment by simply typing:

conda deactivate

Building the Package

To build the package for distribution, use the following command:

poetry build

This will generate the distribution packages in the dist directory - but for now only @beveradb will be able to publish to PyPI.

Contributing 🤝

Contributions are very much welcome! Please fork the repository and submit a pull request with your changes, and I'll try to review, merge and publish promptly!

  • This project is 100% open-source and free for anyone to use and modify as they wish.
  • If the maintenance workload for this repo somehow becomes too much for me, I'll ask for volunteers to share maintainership of the repo, though I don't think that is very likely.
  • Development and support for the MDX-Net separation models is part of the main UVR project; this repo is just a CLI/Python package wrapper to simplify running those models programmatically. So, if you want to try and improve the actual models, please get involved in the UVR project and look for guidance there!

License 📄

This project is licensed under the MIT License.

  • Please Note: If you choose to integrate this project into some other project using the default model or any other model trained as part of the UVR project, please honor the MIT license by providing credit to UVR and its developers!

Credits 🙏

  • Anjok07 - Author of Ultimate Vocal Remover GUI, which almost all of the code in this repo was copied from! Definitely deserving of credit for anything good from this project. Thank you!
  • DilanBoskan - Your contributions at the start of this project were essential to the success of UVR. Thank you!
  • Kuielab & Woosung Choi - Developed the original MDX-Net AI code.
  • KimberleyJSN - Advised and aided the implementation of the training scripts for MDX-Net and Demucs. Thank you!
  • Hv - Helped implement chunks into the MDX-Net AI code. Thank you!
  • zhzhongshi - Helped add support for the MDXC models in audio-separator. Thank you!

Contact 💌

For questions or feedback, please raise an issue or reach out to @beveradb (Andrew Beveridge) directly.


python-audio-separator's Issues

No hardware acceleration could be configured, running in CPU mode

I pulled the newest Docker image, beveradb/audio-separator:gpu-0.12.3, but when I run it in Docker, the GPU cannot be used.

2024-01-23 07:37:54.481 - INFO - separator - Separator version 0.13.0 instantiating with output_dir: /workdir/separated, output_format: WAV
2024-01-23 07:37:54.481 - INFO - separator - Checking hardware specifics to configure acceleration
2024-01-23 07:37:54.481 - INFO - separator - Operating System: Linux #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023
2024-01-23 07:37:54.481 - INFO - separator - System: Linux Node: 2268b32cf6d1 Release: 5.15.0-91-generic Machine: x86_64 Proc: x86_64
2024-01-23 07:37:54.481 - INFO - separator - Python Version: 3.10.12
2024-01-23 07:37:54.481 - INFO - separator - ONNX Runtime GPU package installed with version: 1.16.3
2024-01-23 07:37:54.490 - INFO - separator - Torch package installed with version: 2.1.2
2024-01-23 07:37:54.490 - INFO - separator - Torchvision package installed with version: 0.16.2
2024-01-23 07:37:54.490 - INFO - separator - Torchaudio package installed with version: 2.1.2
2024-01-23 07:37:54.513 - INFO - separator - No hardware acceleration could be configured, running in CPU mode

[Feature request] Add parameters for denoising and normalization

Request from Jibril, submitted by email:

I've been using the package you built and it's been a massive help.
I was wondering if you had any plans to add some post-processing options, especially the denoising stuff mentioned in UVR? I've noticed the output still has audio quality issues, and was wondering if this was related to this denoising stuff - is this the case?
[...]
Thanks for the fast response. Was just doing this exact thing now and managed to make it work.
The quality difference with normalization/denoising set to true is quite noticeable, I think. I've also just hard-coded it in my fork of the repo.

Error writing big Audio files

input file

format: mp4
size: 7.8G

error message

2024-03-16 15:31:02,587 - INFO - separator - Separator version 0.16.2 instantiating with output_dir: None, output_format: mp3
2024-03-16 15:31:02,587 - INFO - separator - Operating System: Linux #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-6 (2023-11-29T08:32Z)
2024-03-16 15:31:02,588 - INFO - separator - System: Linux Node: tbot Release: 6.5.11-6-pve Machine: x86_64 Proc:
2024-03-16 15:31:02,588 - INFO - separator - Python Version: 3.11.4
2024-03-16 15:31:02,588 - INFO - separator - PyTorch Version: 2.2.0+cu121
2024-03-16 15:31:02,680 - INFO - separator - FFmpeg installed: ffmpeg version 5.1.4-0+deb12u1 Copyright (c) 2000-2023 the FFmpeg developers
2024-03-16 15:31:02,681 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.1
2024-03-16 15:31:02,731 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-03-16 15:31:02,731 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-03-16 15:31:02,732 - INFO - separator - Loading model UVR-MDX-NET-Inst_HQ_3.onnx...
2024-03-16 15:31:04,463 - INFO - separator - Load model duration: 00:00:01
2024-03-16 15:31:04,463 - INFO - separator - Starting separation process for audio_file_path: /media/raid/twitch/papaplatte/papaplatte-stream-2024-02-01/16.50.mp4
 36%|████████████████████                                        | 2356/6573 [19:33<33:02,  2.13it/s]
100%|████████████████████████████████████████████████████████████| 6573/6573 [52:38<00:00,  2.08it/s]
100%|████████████████████████████████████████████████████████████| 5030/5030 [06:56<00:00, 12.08it/s]
2024-03-16 16:34:56,926 - INFO - mdx_separator - Saving Vocals stem to 16.50_(Vocals)_UVR-MDX-NET-Inst_HQ_3.mp3...
Traceback (most recent call last):
File "/home/tbot/twitchbot/test.py", line 50, in <module>
output_files = separator.separate('/media/raid/twitch/papaplatte/papaplatte-stream-2024-02-01/16.50.mp4')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/separator.py", line 660, in separate
output_files = self.model_instance.separate(audio_file_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/architectures/mdx_separator.py", line 181, in separate
self.final_process(self.secondary_stem_output_path, self.secondary_source, self.secondary_stem_name)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/common_separator.py", line 118, in final_process
self.write_audio(stem_path, source)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/audio_separator/separator/common_separator.py", line 255, in write_audio
audio_segment.export(stem_path, format=file_format)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/pydub/audio_segment.py", line 895, in export
wave_data.writeframesraw(pcm_for_wav)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 547, in writeframesraw
self._ensure_header_written(len(data))
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'L' format requires 0 <= number <= 4294967295
Exception ignored in: <function Wave_write.__del__ at 0x7f7352a5af20>
Traceback (most recent call last):
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 447, in __del__
self.close()
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 565, in close
self._ensure_header_written(0)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'L' format requires 0 <= number <= 4294967295

I am unable to use the GPU of my 3060 for inference.

Hello, I tried following the instructions and installed the necessary packages for GPU acceleration using the command pip install torch "optimum[onnxruntime-gpu]". However, when running it, I encountered an error and couldn't process with GPU acceleration.

2023-09-01 10:26:44.946 - INFO - cli - Separator beginning with input file: I:\02.当时的我们.flac
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\pypoetry\Cache\virtualenvs\audio-separator-NjHMt13R-py3.10\Scripts\audio-separator", line 6, in <module>
    sys.exit(main())
  File "I:\分離\python-audio-separator\audio_separator\utils\cli.py", line 98, in main
    separator = Separator(
  File "I:\分離\python-audio-separator\audio_separator\separator.py", line 119, in __init__
    raise Exception("CUDA requested but not available with current Torch installation. Do you have an Nvidia GPU?")
Exception: CUDA requested but not available with current Torch installation. Do you have an Nvidia GPU?

I have an Nvidia 3060 graphics card.
I also used poetry to deploy the virtual environment, but encountered the same error message. How can I solve this problem? I am using Windows with Python 3.10.10. Thank you!

Failed to load library libonnxruntime_providers_cuda.

I have this config:
base_image="docker.io/nvidia/cuda:12.3.1-devel-ubuntu22.04",
python_version="python3.10",
python_packages=[
"audio-separator[gpu]",
"pydub",
"boto3",
"pusher"
], # You can also add a path to a requirements.txt instead
# Anything that would normally go inside a Dockerfile is in the commands field
commands=[
"apt-get update && apt-get install -y ffmpeg",
]
I get these errors:
[E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
[W:onnxruntime:Default, onnxruntime_pybind_state.cc:747 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

Issue with Files Being Overwritten When Batch Processing Audio Files

I encountered a problem while using the batch processing feature of audio-separator. I attempted to process two audio files (1.wav and 2.wav) simultaneously, expecting to get separate audio files for each. However, I faced an issue where, regardless of the number of files processed, the output audio files seem to always get overwritten by the first processed audio file, leaving only a file named 1_(Vocals)_UVR-MDX-NET-Inst_HQ_3.wav.

Here is how I executed the command:

from audio_separator.separator import Separator

# Initialize the Separator with other configuration properties below
separator = Separator()

# Load a model
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

# Separate multiple audio files without reloading the model
output_file_paths_1 = separator.separate("1.WAV")

output_file_paths_2 = separator.separate("2.WAV")

I expected each original audio file to have its corresponding separated audio outputs, rather than being overwritten. Is this an issue with my operation, or is it a bug in the software? Has anyone encountered a similar issue and can offer some advice on how to resolve it?
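
One possible workaround until this is resolved, sketched below under the assumption (per the docs above) that separate() returns the paths of the files it wrote: move each input's stems into its own folder before processing the next file, so a later run can't clobber them.

import os
import shutil

from audio_separator.separator import Separator

separator = Separator()
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

for audio in ["1.WAV", "2.WAV"]:
    output_paths = separator.separate(audio)
    # Move this input's stems into a per-input folder immediately after separation
    dest = os.path.splitext(os.path.basename(audio))[0] + "_stems"
    os.makedirs(dest, exist_ok=True)
    for path in output_paths:
        shutil.move(path, os.path.join(dest, os.path.basename(path)))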

[CICD] Building a Docker image that includes RVC and audio-separator

I am building a Dockerfile from my GitHub Actions and I want it to use the CUDA version. It works with RVC now. I added audio-separator and added the lines from the readme for deleting onnxruntime and installing onnxruntime-gpu. But onnxruntime is already installed, and installing it after deleting onnxruntime throws an error about the audio-separator package: audio-separator 0.7.3 requires onnxruntime<2.0,>=1.15; sys_platform != "darwin" and platform_machine != "arm64", which is not installed. It doesn't work in Docker after that message: module 'onnxruntime' has no attribute 'InferenceSession'. It works when I don't uninstall onnxruntime and reinstall onnxruntime-gpu, but it runs really slowly, so I think it is not using CUDA. What do you suggest? How can I solve that?

Cannot install the latest PyPI package

Hi. I tried to install audio-separator for the MDX23C-8KFFT-InstVoc_HQ_2 model but ran into a package version issue.
I used the following commands:

conda create -n uvr python=3.10
conda activate uvr
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install onnxruntime-gpu
pip install "audio-separator[gpu]"

And the output of audio-separator --env_info is:

2024-03-20 02:14:17,873 - INFO - separator - Separator version 0.14.1 instantiating with output_dir: None, output_format: WAV
2024-03-20 02:14:17,874 - INFO - separator - Operating System: Linux #1 SMP Thu Dec 7 15:39:45 UTC 2023
2024-03-20 02:14:17,874 - INFO - separator - System: Linux Node: <my.server.ip> Release: 3.10.0-1160.105.1.el7.x86_64 Machine: x86_64 Proc: x86_64
2024-03-20 02:14:17,874 - INFO - separator - Python Version: 3.10.13
2024-03-20 02:14:17,946 - INFO - separator - FFmpeg installed: ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
2024-03-20 02:14:17,947 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-03-20 02:14:17,947 - DEBUG - separator - Python package: onnxruntime not installed
2024-03-20 02:14:17,949 - INFO - separator - ONNX Runtime GPU package installed with version: 1.16.3
2024-03-20 02:14:18,109 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-03-20 02:14:18,109 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration

Since I want to use the MDX23C-8KFFT-InstVoc_HQ_2 model, which is supported in version 0.16.2, how should I fix the installation? Thanks for any suggestions.

Randomly raising "librosa ParameterError: Audio buffer is not finite everywhere (cutting wav files)" 1 out 5 generations

Hi,
I deployed the package as part of a serverless RunPod endpoint, and it randomly returns silent vocals. When this happens, it raises "librosa ParameterError: Audio buffer is not finite everywhere (cutting wav files)" in vr_separator.py - spec_to_wav(), line 317.

It is completely random: sometimes the vocal separation is perfect, and other times it is silent and raises the error. Do you know how to fix this? I tried to fork the repo and convert the spec's infinite values to 0, but it did not work.

expose output format as argument

Hi, thanks for the great library/CLI tool!

One minor problem when running this on a lot of inputs is that WAV output files will eat up space very fast. To help alleviate this, supporting an output format/subtype argument/CLI param would be much appreciated.

First, I add output_format="WAV" and output_subtype as optional Separator constructor args.

Then, I add this to the Separator constructor:

# self.subtype replaces self.wav_type_set
self.subtype = output_subtype
self.format = output_format
if self.subtype is None and output_format == "WAV":
    self.subtype = "PCM_16"

And finally replace the sf.write() call with this:

sf.write(stem_path, stem_source, samplerate, subtype=self.subtype, format=self.format)

This maintains the existing behavior of 16-bit WAV output with no config, while allowing the user to do this:
separator = Separator(audio_file_path, output_format="FLAC")
or any other format soundfile supports (which is quite a few).

The current workaround would be importing soundfile and doing this manually, but that's not easy for CLI users, and it adds the overhead of reading/writing the files an extra time.
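
For reference, the manual workaround looks roughly like this (a sketch using soundfile's read/write API; the filenames are hypothetical):

import soundfile as sf

# Re-encode an existing WAV stem as FLAC, at the cost of an extra read/write pass
data, samplerate = sf.read("song_(Vocals)_UVR-MDX-NET-Inst_HQ_3.wav")
sf.write("song_(Vocals).flac", data, samplerate, format="FLAC")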

Add scrolling lyrics for Karaoke videos

I'm looking for a way to create karaoke videos with scrolling lyrics. This tool works well to remove the vocals, but it would be nice to have some way to transcribe the audio and create a video with scrolling lyrics. Any examples that use other tools like Whisper would be appreciated.
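
For the transcription half, OpenAI's open-source whisper package can produce segment-level timestamps that could drive scrolling lyrics; a rough sketch (assumes pip install openai-whisper, and that vocals.wav is a previously separated vocal stem):

import whisper

# Transcribe the separated vocal stem; each segment carries start/end times for lyric timing
model = whisper.load_model("base")
result = model.transcribe("vocals.wav")
for segment in result["segments"]:
    print(f"{segment['start']:7.2f} - {segment['end']:7.2f}  {segment['text']}")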

[Feature request] Docker image / "probable better method to build package for GPU support with clean install step, and testing steps"

Thanks for sharing 😄

I certainly like the idea of having a docker image ready to run for anyone who wants to use audio-separator without any installation required; that should also make it easier for people to get it up and running on different machines.

However, I gotta say that Dockerfile you linked is pretty wild 😅
Not sure I'd want to use it directly, but there are probably some learnings I could take from it and try to publish a Docker image from this repo for people to be able to conveniently run audio-separator in Docker.

My gut feeling is that should be possible with a much more minimal dockerfile/image - e.g. why are you compiling python from source?
Why not just start with a base image using the official pytorch images (e.g. the latest CUDA one 2.0.1-cuda11.7-cudnn8-runtime )?

Also, hope you don't mind but I've edited the title of this issue to remind me to add a docker image for audio-separator!

Originally posted by @beveradb in #10 (comment), which may be offline, so I'm opening another issue here. Sorry for the inconvenience.

GPU support?

Will there be GPU Torch CUDA demixing support?

UVR-DeEcho-DeReverb not really working (Amazon Linux)

I am trying to use the UVR-DeEcho-DeReverb Model and it's not really working -- producing no audio or some kind of garbled audio. The command options I am using can be seen below. With MDX-Net Model like Reverb_HQ_By_FoxJoy, things seem to work fine. Am I missing something?

audio-separator --single_stem=instrumental --output_format=wav --output_dir=output/UVR-DeEcho-DeReverb/ --model_file_dir=models/ --model_filename=UVR-DeEcho-DeReverb.pth --vr_window_size=256 --vr_aggression=5 [input_file]

Demixing 100% CPU Usage

Hello, the GPU is active but it uses 100% CPU during the demixing phase.

model_name: UVR_MDXNET_KARA_2, output_dir: None, use_cuda: True, output_format: WAV
2023-07-31 04:35:40,379 - DEBUG - separator - CUDA requested, checking Torch version and CUDA status
2023-07-31 04:35:40,379 - DEBUG - separator - Torch version: 2.0.0+cu118
2023-07-31 04:35:40,393 - DEBUG - separator - Is CUDA enabled? True
2023-07-31 04:35:40,393 - DEBUG - separator - Running in GPU mode
2023-07-31 04:35:40,393 - DEBUG - separator - Reading model settings...

[ci] workflow to build package with GPU support

if anyone has a way to make this cleaner so we can support both CPU and CUDA transcodes without separate installation processes, please let me know or submit a PR!

I made a Dockerfile of UVR5 the day before yesterday, and I've posted the Dockerfile in an issue, in case anyone should need it sometime. Anjok07/ultimatevocalremovergui#379 (comment)

This idea was raised when I took my first step into Docker (or, say, programming as a hobby), in order to avoid messing up or breaking my system so that I needn't reinstall it again and again, because I was doing it on my only computer, which was too powerless to spin up a virtual machine. But now that computer has finally broken, so I've changed to one a little more powerful. Anyway, that's all history.


Now, my thought is that you should take advantage of this image. Later, I'll push this giant 10 GB Docker image to Docker Hub for you to test out.

The computer I'm using now has no GPU. So, in other words, if an image built with CUDA on a machine without an Nvidia GPU installed works perfectly on a machine with an Nvidia GPU installed, that methodology can be applied to build the package.

If anyone with a GPU can do us a favor and test whether they can execute the commands listed below INSIDE THE CONTAINER and get the correct output, you may modify your workflow to use the nvidia Docker image to build your package.

# This should get the output
nvidia-smi

# This should spit out "True"
python -c "import torch; result = torch.cuda.is_available(); print(result)" 

So, if it is a success, maybe in theory, you could just change [tool.poetry.dependencies.onnxruntime] into [tool.poetry.dependencies.onnxruntime-gpu] in pyproject.toml and run your workflow using Docker, if the CPU and GPU versions of onnxruntime are compatible.

But it certainly needs volunteers for further testing, so this should not be pushed to PyPI yet, or should use an alpha tag.


There is another possibility to test out.

Could you just build your package manually from source?

First, do it on a virtual machine instance with CUDA installed; then, instead of pushing to and pulling from PyPI, download the package to your own computer and destroy the instance.

Second, spin up a similar one and upload the freshly built package to the new instance and test if it works properly.

If it does, then your program's source code has no bug and the only thing to worry about is the environment; if not, that's your code's problem.

Third, spin up a virtual machine on your own computer, which certainly has no GPU installed, and test whether the freshly built package can work properly without a GPU. This is a test to find out whether onnxruntime-gpu can run on a machine without a GPU installed.


I'm looking forward to your success.

best wishes

Issues pip installing in Colab (samplerate)

Getting this error when pip installing in Colab:

Building wheels for collected packages: samplerate
  error: subprocess-exited-with-error
  
  × Building wheel for samplerate (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for samplerate (pyproject.toml) ... error
  ERROR: Failed building wheel for samplerate
Failed to build samplerate
ERROR: Could not build wheels for samplerate, which is required to install pyproject.toml-based projects

And when I try 0.14.3, I'm unable to use any of the VR models as this error comes up, even though Colab runs an Ubuntu setup.

some popping sound occurred

Hi,
When I use the UVR_MDXNET_KARA_2 model to split the audio file, sometimes some popping sounds occur, but when I use the UVR app they don't. I use the model with parameters like below:
model_name="UVR_MDXNET_KARA_2",
model_file_dir="./models",
normalization_enabled=False,
denoise_enabled=False

When I set both normalization_enabled and denoise_enabled to true, the vocals include background music. Have you ever been in this situation?
The audio I used is in the attachment.

UnboundLocalError: local variable 'h' referenced before assignment

Hello, I was trying to load the "UVR-DeEcho-DeReverb" model, and this error occurred:

  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/separator.py", line 540, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 150, in separate
    y_spec, v_spec = self.inference_vr(self.loading_mix(), self.torch_device, self.aggressiveness)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 305, in inference_vr
    mask = _execute(X_mag_pad, roi_size)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 272, in _execute
    pred = self.model_run.predict_mask(X_batch)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets_new.py", line 141, in predict_mask
    mask = self.forward(input_tensor)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets_new.py", line 101, in forward
    l1 = self.stg1_low_band_net(l1_in)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
    input = module(input)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets_new.py", line 52, in __call__
    bottleneck = torch.cat([bottleneck, self.lstm_dec2(bottleneck)], dim=1)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/layers_new.py", line 142, in forward
    h, _ = self.lstm(h)
UnboundLocalError: local variable 'h' referenced before assignment

I checked the code in layers_new.py and found this:

def forward(self, input_tensor):
        N, _, nbins, nframes = input_tensor.size()

        # Extract features and prepare for LSTM
        hidden = self.conv(input_tensor)[:, 0]  # N, nbins, nframes
        hidden = hidden.permute(2, 0, 1)  # nframes, N, nbins
        h, _ = self.lstm(h)

        # Apply dense layer and reshape to match expected output format
        hidden = self.dense(h.reshape(-1, hidden.size()[-1]))  # nframes * N, nbins
        hidden = hidden.reshape(nframes, N, 1, nbins)
        hidden = hidden.permute(1, 2, 3, 0)

        return hidden

Maybe replacing self.lstm(h) with self.lstm(hidden) would fix the problem?
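
For clarity, here is the forward pass with that one-line change applied (a sketch of the suggested fix, not a confirmed patch):

def forward(self, input_tensor):
    N, _, nbins, nframes = input_tensor.size()

    # Extract features and prepare for LSTM
    hidden = self.conv(input_tensor)[:, 0]  # N, nbins, nframes
    hidden = hidden.permute(2, 0, 1)  # nframes, N, nbins
    h, _ = self.lstm(hidden)  # was: self.lstm(h), where h is not yet defined

    # Apply dense layer and reshape to match expected output format
    hidden = self.dense(h.reshape(-1, hidden.size()[-1]))  # nframes * N, nbins
    hidden = hidden.reshape(nframes, N, 1, nbins)
    hidden = hidden.permute(1, 2, 3, 0)

    return hidden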

Allows the model to be loaded into memory for repeated use!

Thank you for your excellent work. Please provide a class that allows the model to be loaded into memory for repeated use, instead of having to reload the model every time. This would facilitate creating an application service. Thank you.
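
This pattern is already supported by the API documented above: load the model once with load_model(), then call separate() repeatedly. A minimal sketch (filenames are placeholders):

from audio_separator.separator import Separator

# Load the model once, then reuse it across many files without reloading
separator = Separator()
separator.load_model(model_filename="UVR_MDXNET_KARA_2.onnx")

for path in ["a.wav", "b.wav", "c.wav"]:
    separator.separate(path)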

Huge VRAM consumption compared to UVR

Hi,

UVR doesn't use much VRAM (about 4 GB), but this code suddenly eats up all my available VRAM at inference (up to 15 GB).

I compared UVR and this repo using the same MDX models, and the same thing happens for all models.
That's preventing me from running other GPU-intensive tools at the same time.

Any idea why there are huge VRAM spikes with audio-separator?

thanks

How to increase speed?

I have a very large GPU (80 GB). I want to increase speed, but increasing the batch size doesn't help at all. Thanks!

AttributeError: module 'onnxruntime' has no attribute 'get_device'

When running the latest version of the separator class, I get:

  File "/home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages/audio_separator/separator.py", line 111, in __init__
    ort_device = ort.get_device()
AttributeError: module 'onnxruntime' has no attribute 'get_device'

Here's how I installed

(audio-separator) tobi:~$ pip uninstall audio-separator
WARNING: Skipping audio-separator as it is not installed.
(audio-separator) tobi:~$ pip uninstall audio-separator[gpu]
WARNING: Skipping audio-separator as it is not installed.
(audio-separator) tobi:~$ pip uninstall onnxruntime[gpu]
WARNING: Skipping onnxruntime as it is not installed.
(audio-separator) tobi:~$ pip uninstall onnxruntime
WARNING: Skipping onnxruntime as it is not installed.
(audio-separator) tobi:~$ pip install audio-separator[gpu]
Collecting audio-separator[gpu]
  Using cached audio_separator-0.9.6-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: librosa>=0.9 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (0.9.2)
Requirement already satisfied: numpy>=1.23 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (1.24.1)
Requirement already satisfied: onnx>=1.14 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (1.15.0)
Requirement already satisfied: pydub>=0.25 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (0.25.1)
Requirement already satisfied: six>=1.16 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (1.16.0)
Requirement already satisfied: torch>=2 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (2.1.2+cu118)
Requirement already satisfied: wget>=3 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (3.2)
Requirement already satisfied: onnxruntime-gpu>=1.15 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from audio-separator[gpu]) (1.16.3)
Requirement already satisfied: audioread>=2.1.9 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (3.0.1)
Requirement already satisfied: scipy>=1.2.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (1.11.4)
Requirement already satisfied: scikit-learn>=0.19.1 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (1.3.2)
Requirement already satisfied: joblib>=0.14 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (1.3.2)
Requirement already satisfied: decorator>=4.0.10 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (5.1.1)
Requirement already satisfied: resampy>=0.2.2 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (0.4.2)
Requirement already satisfied: numba>=0.45.1 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (0.58.1)
Requirement already satisfied: soundfile>=0.10.2 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (0.11.0)
Requirement already satisfied: pooch>=1.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (1.8.0)
Requirement already satisfied: packaging>=20.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from librosa>=0.9->audio-separator[gpu]) (23.2)
Requirement already satisfied: protobuf>=3.20.2 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from onnx>=1.14->audio-separator[gpu]) (4.25.1)
Requirement already satisfied: coloredlogs in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from onnxruntime-gpu>=1.15->audio-separator[gpu]) (15.0.1)
Requirement already satisfied: flatbuffers in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from onnxruntime-gpu>=1.15->audio-separator[gpu]) (23.5.26)
Requirement already satisfied: sympy in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from onnxruntime-gpu>=1.15->audio-separator[gpu]) (1.12)
Requirement already satisfied: filelock in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from torch>=2->audio-separator[gpu]) (3.9.0)
Requirement already satisfied: typing-extensions in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from torch>=2->audio-separator[gpu]) (4.4.0)
Requirement already satisfied: networkx in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from torch>=2->audio-separator[gpu]) (3.0)
Requirement already satisfied: jinja2 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from torch>=2->audio-separator[gpu]) (3.1.2)
Requirement already satisfied: fsspec in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from torch>=2->audio-separator[gpu]) (2023.4.0)
Requirement already satisfied: triton==2.1.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from torch>=2->audio-separator[gpu]) (2.1.0)
Requirement already satisfied: llvmlite<0.42,>=0.41.0dev0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from numba>=0.45.1->librosa>=0.9->audio-separator[gpu]) (0.41.1)
Requirement already satisfied: platformdirs>=2.5.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from pooch>=1.0->librosa>=0.9->audio-separator[gpu]) (4.1.0)
Requirement already satisfied: requests>=2.19.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from pooch>=1.0->librosa>=0.9->audio-separator[gpu]) (2.28.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from scikit-learn>=0.19.1->librosa>=0.9->audio-separator[gpu]) (3.2.0)
Requirement already satisfied: cffi>=1.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from soundfile>=0.10.2->librosa>=0.9->audio-separator[gpu]) (1.16.0)
Requirement already satisfied: humanfriendly>=9.1 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from coloredlogs->onnxruntime-gpu>=1.15->audio-separator[gpu]) (10.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from jinja2->torch>=2->audio-separator[gpu]) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from sympy->onnxruntime-gpu>=1.15->audio-separator[gpu]) (1.3.0)
Requirement already satisfied: pycparser in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from cffi>=1.0->soundfile>=0.10.2->librosa>=0.9->audio-separator[gpu]) (2.21)
Requirement already satisfied: charset-normalizer<3,>=2 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from requests>=2.19.0->pooch>=1.0->librosa>=0.9->audio-separator[gpu]) (2.1.1)
Requirement already satisfied: idna<4,>=2.5 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from requests>=2.19.0->pooch>=1.0->librosa>=0.9->audio-separator[gpu]) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from requests>=2.19.0->pooch>=1.0->librosa>=0.9->audio-separator[gpu]) (1.26.13)
Requirement already satisfied: certifi>=2017.4.17 in /home/tobi/miniconda3/envs/audio-separator/lib/python3.9/site-packages (from requests>=2.19.0->pooch>=1.0->librosa>=0.9->audio-separator[gpu]) (2022.12.7)
Using cached audio_separator-0.9.6-py3-none-any.whl (18 kB)
Installing collected packages: audio-separator
Successfully installed audio-separator-0.9.6

Some VR Arch models show the error: librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere

Hello, some VR Arch models (specifically: 3_HP-Vocal-UVR.pth, 4_HP-Vocal-UVR.pth, UVR-BVE-4B_SN-44100-1.pth) show the error:

Traceback (most recent call last):
  File "/usr/local/bin/audio-separator", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/audio_separator/utils/cli.py", line 177, in main
    output_files = separator.separate(args.audio_file)
  File "/usr/local/lib/python3.10/dist-packages/audio_separator/separator/separator.py", line 634, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File "/usr/local/lib/python3.10/dist-packages/audio_separator/separator/architectures/vr_separator.py", line 172, in separate
    self.primary_source = self.spec_to_wav(y_spec).T
  File "/usr/local/lib/python3.10/dist-packages/audio_separator/separator/architectures/vr_separator.py", line 324, in spec_to_wav
    wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, is_v51_model=self.is_vr_51_model)
  File "/usr/local/lib/python3.10/dist-packages/audio_separator/separator/uvr_lib_v5/spec_utils.py", line 380, in cmb_spectrogram_to_wave
    wave = librosa.resample(wave2, orig_sr=bp["sr"], target_sr=sr, res_type=wav_resolution)
  File "/usr/local/lib/python3.10/dist-packages/librosa/core/audio.py", line 627, in resample
    util.valid_audio(y, mono=False)
  File "/usr/local/lib/python3.10/dist-packages/librosa/util/utils.py", line 314, in valid_audio
    raise ParameterError("Audio buffer is not finite everywhere")
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere

bug
I have tried different audio files in different formats and sample rates and get the same result, so I don't think the audio file is the problem. I get the same error with both GPU and CPU.
This error also appears in the Colab notebook I made; if you could review it, that would be a great help, although it is basically just the installation and execution of the code, nothing more.
Here is the link: https://colab.research.google.com/drive/1WZO66n44O8CZJAQ6Na-Ozj0TWlOPd5KA?usp=sharing

Any help would be greatly appreciated, whether I'm doing something wrong or if there is a bug in the code.
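
A small sketch to back up the "it is not the audio file" claim: load the input directly with librosa and check for non-finite samples (standard librosa/numpy calls):

import librosa
import numpy as np

# Load without resampling or downmixing, then verify every sample is finite.
y, sr = librosa.load("input.wav", sr=None, mono=False)
print("Input is finite everywhere:", np.isfinite(y).all())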

UVR-DeEcho-DeReverb Support

It seems that only ONNX models are supported, but UVR-DeEcho-DeReverb is a PyTorch model. How can I convert UVR-DeEcho-DeReverb to an ONNX model, or could loading PyTorch models be supported?
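
For what it's worth, .pth (PyTorch) VR Arch models load through the same Separator API as ONNX models in recent releases, so a conversion should not be needed. A minimal sketch; the exact model filename is an assumption, so check the supported-model list for the canonical name:

from audio_separator.separator import Separator

# "UVR-DeEcho-DeReverb.pth" is an assumed filename -- verify it against
# the supported-model list before use.
separator = Separator()
separator.load_model("UVR-DeEcho-DeReverb.pth")
output_files = separator.separate("input.wav")
print(output_files)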

MDX23C-8KFFT-InstVoc_HQ_2.ckpt model

Hello, I found a model mapper in UVR:
{
"UVR_MDXNET_1_9703": "UVR-MDX-NET 1",
"UVR_MDXNET_2_9682": "UVR-MDX-NET 2",
"UVR_MDXNET_3_9662": "UVR-MDX-NET 3",
"UVR_MDXNET_KARA": "UVR-MDX-NET Karaoke",
"UVR_MDXNET_Main": "UVR-MDX-NET Main",
"UVR-MDX-NET-Inst_1": "UVR-MDX-NET Inst 1",
"UVR-MDX-NET-Inst_2": "UVR-MDX-NET Inst 2",
"UVR-MDX-NET-Inst_3": "UVR-MDX-NET Inst 3",
"UVR-MDX-NET-Inst_4": "UVR-MDX-NET Inst 4",
"UVR-MDX-NET-Inst_Main": "UVR-MDX-NET Inst Main",
"UVR-MDX-NET-Inst_Main_2": "UVR-MDX-NET Inst Main 2",
"UVR-MDX-NET-Inst_HQ_1": "UVR-MDX-NET Inst HQ 1",
"UVR-MDX-NET-Inst_HQ_2": "UVR-MDX-NET Inst HQ 2",
"UVR-MDX-NET-Inst_HQ_3": "UVR-MDX-NET Inst HQ 3",
"UVR_MDXNET_KARA_2": "UVR-MDX-NET Karaoke 2",
"Kim_Vocal_1": "Kim Vocal 1",
"Kim_Vocal_2": "Kim Vocal 2",
"Kim_Inst": "Kim Inst",
"MDX23C-8KFFT-InstVoc_HQ.ckpt": "MDX23C-InstVoc HQ",
"MDX23C-8KFFT-InstVoc_HQ_2.ckpt": "MDX23C-InstVoc HQ 2",
"MDX23C_D1581.ckpt": "MDX23C-InstVoc D1581",
"Reverb_HQ_By_FoxJoy": "Reverb HQ"
}
When trying to use MDX23C-8KFFT-InstVoc_HQ.ckpt, it doesn't work!
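
A hedged sketch, assuming a release of audio-separator with MDXC (.ckpt) support; the checkpoint models should load through the same API as the .onnx ones:

from audio_separator.separator import Separator

# Assumes a version of audio-separator that includes MDXC support;
# .ckpt checkpoints use the same load/separate flow as .onnx models.
separator = Separator()
separator.load_model("MDX23C-8KFFT-InstVoc_HQ.ckpt")
output_files = separator.separate("input.wav")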

When separating audio that contains erhu (a bowed Chinese string instrument), the erhu part does not get separated out, for example in the song 兰亭序 (Lanting Xu). Which model should I use for this? Is there a model that can remove it completely in one pass?

Vocal file still contains background music

I tried to run the command:
audio-separator test.mp3 --log_level debug --model_file_dir PATHTOMODELS --model_name UVR-MDX-NET-Inst_HQ_1
It saved two files: the vocals file still contains background music, while the instrumental file works perfectly.
But if I add --single_stem vocals, the vocals file doesn't contain background music. Is this a bug?
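
For anyone hitting the same behaviour, the workaround described above as a single command (all flags as used in this report):

audio-separator test.mp3 --log_level debug --model_file_dir PATHTOMODELS --model_name UVR-MDX-NET-Inst_HQ_1 --single_stem vocals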

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 6

Hi, when I run the separator on 24 mins long m4a file, I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/audio-separator", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/audio_separator/utils/cli.py", line 112, in main
    output_files = separator.separate()
  File "/usr/local/lib/python3.9/dist-packages/audio_separator/separator.py", line 193, in separate
    source = self.demix_base(mix)[0]
  File "/usr/local/lib/python3.9/dist-packages/audio_separator/separator.py", line 271, in demix_base
    mix_waves, pad = self.initialize_mix(mix_p, is_ckpt=is_ckpt)
  File "/usr/local/lib/python3.9/dist-packages/audio_separator/separator.py", line 254, in initialize_mix
    mix_p = np.concatenate((np.zeros((2, self.trim)), mix, np.zeros((2, pad)), np.zeros((2, self.trim))), 1)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 6

When I print out the mix value, I get:
mix =
[[ 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00
0.000000e+00 0.000000e+00]
[ 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00
0.000000e+00 0.000000e+00]
[ 9.408713e-09 -9.586301e-09 8.488125e-09 ... -2.402059e-06
1.561118e-06 0.000000e+00]
[ 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00
0.000000e+00 0.000000e+00]
[ 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00
0.000000e+00 0.000000e+00]
[ 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00
0.000000e+00 0.000000e+00]]
[[0. 0. 0. ... 0. 0. 0.]

It seems like mix is supposed to be a 2 x n matrix, but it is a 6 x n matrix instead.

Do you know how to resolve this issue? The script doesn't print any other messages in debug mode.
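
A hedged observation: the 6 x n shape suggests the m4a file carries six audio channels (e.g. 5.1 surround) rather than stereo. If so, one workaround sketch is to downmix to stereo with FFmpeg before separating:

ffmpeg -i input.m4a -ac 2 input_stereo.wav
audio-separator input_stereo.wav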

Fails to run when segment_size is set to anything except 256 in mdx_params of Separator

I'm trying to run the library similarly to how I use UVR5. I wanted to run a model with a larger segment size, but I can't get anything other than the default 256 to work on either of the two MDX models I've tried (Voc FT and Inst HQ 4).

Running the below code works normally:

from audio_separator.separator import Separator

sep = Separator(model_file_dir="./models", invert_using_spec=True)

# sep.load_model("UVR-MDX-NET-Inst_HQ_4.onnx")
# prim, _ = sep.separate("test.wav")

sep.load_model("UVR-MDX-NET-Voc_FT.onnx")
voc, _ = sep.separate("test.wav")

...outputs two stems as wav files in the parent directory, as expected.

But changing this line:

...
sep = Separator(model_file_dir="models", invert_using_spec=True, mdx_params={"segment_size": 1024})
...

gives the following output:

2024-03-14 15:57:19,547 - INFO - separator - Separator version 0.15.3 instantiating with output_dir: None, output_format: WAV
2024-03-14 15:57:19,547 - DEBUG - separator - Secondary step will be inverted using spectogram rather than waveform. This may improve quality, but is slightly slower.
2024-03-14 15:57:19,548 - INFO - separator - Operating System: Windows 10.0.19045
2024-03-14 15:57:19,548 - INFO - separator - System: Windows Node: realbox Release: 10 Machine: AMD64 Proc: AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
2024-03-14 15:57:19,548 - INFO - separator - Python Version: 3.11.7
2024-03-14 15:57:19,548 - INFO - separator - PyTorch Version: 2.1.2+cu121
2024-03-14 15:57:19,566 - INFO - separator - FFmpeg installed: ffmpeg version 2024-02-04-git-7375a6ca7b-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers  
2024-03-14 15:57:19,568 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-03-14 15:57:19,570 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.1
2024-03-14 15:57:19,570 - INFO - separator - ONNX Runtime CPU package installed with version: 1.16.3
2024-03-14 15:57:19,589 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-03-14 15:57:19,589 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-03-14 15:57:19,589 - INFO - separator - Loading model UVR-MDX-NET-Voc_FT.onnx...
2024-03-14 15:57:19,589 - DEBUG - separator - File already exists at ./models\download_checks.json, skipping download
2024-03-14 15:57:19,590 - DEBUG - separator - Model download list loaded
2024-03-14 15:57:19,590 - DEBUG - separator - Searching for model_filename UVR-MDX-NET-Voc_FT.onnx in supported_model_files_grouped
2024-03-14 15:57:19,590 - DEBUG - separator - Single file model identified: MDX-Net Model: UVR-MDX-NET Voc FT
2024-03-14 15:57:19,590 - DEBUG - separator - File already exists at ./models\UVR-MDX-NET-Voc_FT.onnx, skipping download
2024-03-14 15:57:19,591 - DEBUG - separator - Returning path for single model file: ./models\UVR-MDX-NET-Voc_FT.onnx
2024-03-14 15:57:19,591 - DEBUG - separator - Model downloaded, friendly name: MDX-Net Model: UVR-MDX-NET Voc FT
2024-03-14 15:57:19,591 - DEBUG - separator - Calculating MD5 hash for model file to identify model parameters from UVR data...
2024-03-14 15:57:19,591 - DEBUG - separator - Calculating hash of model file ./models\UVR-MDX-NET-Voc_FT.onnx
2024-03-14 15:57:19,605 - DEBUG - separator - Model ./models\UVR-MDX-NET-Voc_FT.onnx has hash 77d07b2667ddf05b9e3175941b4454a0
2024-03-14 15:57:19,606 - DEBUG - separator - VR model data path set to ./models\vr_model_data.json
2024-03-14 15:57:19,606 - DEBUG - separator - File already exists at ./models\vr_model_data.json, skipping download
2024-03-14 15:57:19,606 - DEBUG - separator - MDX model data path set to ./models\mdx_model_data.json
2024-03-14 15:57:19,606 - DEBUG - separator - File already exists at ./models\mdx_model_data.json, skipping download
2024-03-14 15:57:19,607 - DEBUG - separator - Loading MDX and VR model parameters from UVR model data files...
2024-03-14 15:57:19,607 - DEBUG - separator - Model data loaded from UVR JSON using hash 77d07b2667ddf05b9e3175941b4454a0: {'compensate': 1.021, 'mdx_dim_f_set': 3072, 'mdx_dim_t_set': 8, 'mdx_n_fft_scale_set': 7680, 'primary_stem': 'Vocals'}
2024-03-14 15:57:20,238 - DEBUG - common_separator - Common params: model_name=UVR-MDX-NET-Voc_FT, model_path=./models\UVR-MDX-NET-Voc_FT.onnx
2024-03-14 15:57:20,238 - DEBUG - common_separator - Common params: primary_stem_output_path=None, secondary_stem_output_path=None
2024-03-14 15:57:20,238 - DEBUG - common_separator - Common params: output_dir=None, output_format=WAV
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: normalization_threshold=0.9
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: enable_denoise=None, output_single_stem=None
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: invert_using_spec=True, sample_rate=44100
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: primary_stem_name=Vocals, secondary_stem_name=Instrumental
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: is_karaoke=False, is_bv_model=False, bv_model_rebalance=0
2024-03-14 15:57:20,239 - DEBUG - mdx_separator - MDX arch params: batch_size=1, segment_size=1024
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - MDX arch params: overlap=None, hop_length=None, enable_denoise=None
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - MDX arch params: compensate=1.021, dim_f=3072, dim_t=256, n_fft=7680
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - MDX arch params: config_yaml=None
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - Loading ONNX model for inference...
Traceback (most recent call last):
  File "t:\python-audio-separator-test\Untitled-1.py", line 10, in <module>
    sep.load_model("UVR-MDX-NET-Voc_FT.onnx")
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\audio_separator\separator\separator.py", line 605, in load_model
    self.model_instance = separator_class(common_config=common_params, arch_config=self.arch_specific_params[model_type])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\audio_separator\separator\architectures\mdx_separator.py", line 106, in __init__
    self.model_run = onnx2torch.convert(self.model_path)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx2torch\converter.py", line 72, in convert
    onnx_model = safe_shape_inference(onnx_model_or_path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx2torch\utils\safe_shape_inference.py", line 46, in safe_shape_inference
    return _shape_inference_by_model_path(onnx_model_or_path, output_path=tmp_model_file.name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx2torch\utils\safe_shape_inference.py", line 24, in _shape_inference_by_model_path
    return onnx.load(output_path)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx\__init__.py", line 208, in load_model
    model = _get_serializer(format, f).deserialize_proto(_load_bytes(f), ModelProto())
                                                         ^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx\__init__.py", line 145, in _load_bytes
    with open(f, "rb") as readable:
         ^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: 'T:\\python-audio-separator-test\\models\\tmptftn7wrq'

Kim Vocal 2 instrumental output contains vocal sound, but UVR GUI's doesn't

I tested with multiple songs using the same model "Kim Vocal 2"

The instrumental file from the UVR GUI does not have any vocals left, but the one from python-audio-separator contains a significant amount of residual vocals.

I tested it both locally and on my AWS server, but whenever I use python-audio-separator's Kim Vocal 2, I get the same result.

Has anyone else had the same issue?

Error Installing audio-separator with GPU Support: Dependency Conflicts and CUDA Library Issue

Problem Description:
I'm encountering an issue while setting up GPU support for the audio-separator package as per the README instructions. After following the installation steps, I'm faced with dependency conflicts, and attempting to run the separator defaults to CPU usage due to a missing CUDA library.

Steps to Reproduce:

conda create -n audio-separator python=3.9
conda activate audio-separator
pip install audio-separator
pip uninstall torch onnxruntime
pip cache purge
pip install torch "optimum[onnxruntime-gpu]"

Expected vs Actual Behavior:
Expected: Successful installation of the audio-separator with GPU support.
Actual: Encountered dependency conflicts and a missing CUDA library error.

Error Logs:

Downloading frozenlist-1.4.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (228 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 228.0/228.0 kB 4.0 MB/s eta 0:00:00
Downloading pytz-2023.3.post1-py2.py3-none-any.whl (502 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.5/502.5 kB 7.7 MB/s eta 0:00:00
Downloading yarl-1.9.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (304 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 304.3/304.3 kB 4.0 MB/s eta 0:00:00
Installing collected packages: sentencepiece, pytz, xxhash, tzdata, tqdm, safetensors, regex, pyyaml, python-dateutil, pyarrow-hotfix, pyarrow, psutil, multidict, fsspec, frozenlist, dill, attrs, async-timeout, yarl, responses, pandas, onnxruntime-gpu, multiprocess, huggingface-hub, aiosignal, torch, tokenizers, aiohttp, transformers, accelerate, datasets, optimum, evaluate
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2023.12.2
    Uninstalling fsspec-2023.12.2:
      Successfully uninstalled fsspec-2023.12.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
audio-separator 0.8.0 requires onnxruntime<2.0,>=1.15; sys_platform != "darwin" and platform_machine != "arm64", which is not installed.
Successfully installed accelerate-0.25.0 aiohttp-3.9.1 aiosignal-1.3.1 async-timeout-4.0.3 attrs-23.1.0 datasets-2.15.0 dill-0.3.7 evaluate-0.4.1 frozenlist-1.4.0 fsspec-2023.10.0 huggingface-hub-0.19.4 multidict-6.0.4 multiprocess-0.70.15 onnxruntime-gpu-1.16.3 optimum-1.15.0 pandas-2.1.4 psutil-5.9.6 pyarrow-14.0.1 pyarrow-hotfix-0.6 python-dateutil-2.8.2 pytz-2023.3.post1 pyyaml-6.0.1 regex-2023.10.3 responses-0.18.0 safetensors-0.4.1 sentencepiece-0.1.99 tokenizers-0.15.0 torch-2.1.1 tqdm-4.66.1 transformers-4.36.0 tzdata-2023.3 xxhash-3.4.1 yarl-1.9.4

Environment Details:

  • CUDA Version: 12.2
  • PyTorch Version: 2.1.1+cu121
  • Operating System: Ubuntu

Attempts to Resolve:
I tried manually installing onnxruntime, which seems to resolve the error; however, when I actually run audio-separator, I see 0% GPU utilisation.

Request for Specific Help:
Any guidance on resolving these dependency conflicts and the CUDA library issue would be greatly appreciated.

Reference to Documentation:
I followed the GPU setup instructions from the audio-separator README.
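
A minimal sketch of a cleaner install path, assuming a CUDA-capable machine: the [gpu] extra (used in other reports here) pulls in onnxruntime-gpu directly, so the optimum indirection should not be needed:

conda create -n audio-separator python=3.9
conda activate audio-separator
pip install "audio-separator[gpu]"
audio-separator --env_info

If acceleration is working, the --env_info log should report CUDAExecutionProvider being available.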

MDX23C compatibility

Hey, good job on your tool; it's exactly what I was looking for.

Will there be support for MDX23C?

It's rather new in UVR and seems to be a variation of MDX (it comes in .onnx format), but it appears to be really good.

UVR uses different classes for MDX and MDXC, as they call it.

Separator terminated with "Killed" message (Out of memory / OOM error on Linux)

def _vocal_instrumental_remover(name_file:str):
    from audio_separator.separator import Separator
    separator = Separator()
    separator.load_model('UVR-MDX-NET-Inst_HQ_3')
    primary_stem_path, secondary_stem_path = separator.separate(f"temp_file/{name_file}/{name_file}.wav")
    print(f'Primary stem saved at {primary_stem_path}')
    print(f'Secondary stem saved at {secondary_stem_path}')
    return {"primary": primary_stem_path, "secondary": secondary_stem_path}

if __name__ == "__main__":
    print(_vocal_instrumental_remover(name_file="1-1708443014-635058"))

2024-02-21 08:40:19,232 - DEBUG - separator - STFT applied on mix. Spectrum shape: torch.Size([1, 4, 3072, 256])
2024-02-21 08:40:32,414 - DEBUG - separator - Model run on the spectrum without denoising.
2024-02-21 08:40:32,778 - DEBUG - separator - Inverse STFT applied. Returning result with shape: (1, 2, 261120)
2024-02-21 08:40:32,780 - DEBUG - separator - Normalizing result by dividing result by divider.
/usr/local/lib/python3.10/dist-packages/audio_separator/separator/separator.py:657: RuntimeWarning: invalid value encountered in divide
  tar_waves = result / divider
Killed

Congrats on the tool. This is the problem with one-hour videos: after all the chunks are processed, the console displays the text "Killed" and that's it. How can I fix it? Operating system: Ubuntu, CPU only.
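
On Linux, "Killed" almost always means the kernel's OOM killer terminated the process, so memory is the likely bottleneck. A hedged mitigation sketch, assuming the mdx_params keys that appear elsewhere in these reports (segment_size, batch_size); smaller values should lower peak memory at the cost of speed:

from audio_separator.separator import Separator

# Illustrative values, not tuned recommendations; smaller
# segment_size/batch_size reduce peak memory per processed chunk.
separator = Separator(mdx_params={"segment_size": 256, "batch_size": 1})
separator.load_model('UVR-MDX-NET-Inst_HQ_3')
output_files = separator.separate("temp_file/example/example.wav")
print(output_files)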

GPU not being used on Colab

It seems like my separation pipeline is running in CPU mode on Colab, even after reinstalling torch: a 3-minute track takes 5 minutes to separate using Kim Vocal 2.

Steps used to install:
!pip install "audio-separator[gpu]==0.14.5"
!pip uninstall torch onnxruntime-gpu
!pip cache purge
!pip install --force-reinstall torch==2.1.0 torchvision torchaudio
!pip install --force-reinstall onnxruntime-gpu

And the info from the relevant run:
"INFO:audio_separator.separator.separator:Separator version 0.14.5 instantiating with output_dir: None, output_format: FLAC
INFO:audio_separator.separator.separator:Operating System: Linux #1 SMP PREEMPT_DYNAMIC Sat Nov 18 15:31:17 UTC 2023
INFO:audio_separator.separator.separator:System: Linux Node: 47eab71b5093 Release: 6.1.58+ Machine: x86_64 Proc: x86_64
INFO:audio_separator.separator.separator:Python Version: 3.10.12
INFO:audio_separator.separator.separator:FFmpeg installed: ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
DEBUG:audio_separator.separator.separator:Python package: onnxruntime-silicon not installed
DEBUG:audio_separator.separator.separator:Python package: onnxruntime not installed
INFO:audio_separator.separator.separator:ONNX Runtime GPU package installed with version: 1.17.0
INFO:audio_separator.separator.separator:CUDA is available in Torch, setting Torch device to CUDA
INFO:audio_separator.separator.separator:ONNXruntime has CUDAExecutionProvider available, enabling acceleration
INFO:audio_separator.separator.separator:Loading model Kim_Vocal_2.onnx..."
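
A quick sanity-check sketch for cases like this, verifying that both Torch and ONNX Runtime can actually see the GPU (both are standard calls in those packages):

import torch
import onnxruntime as ort

# Both checks should pass for GPU inferencing: Torch covers the PyTorch
# models, the ORT provider list covers the ONNX models.
print("Torch CUDA available:", torch.cuda.is_available())
print("ORT providers:", ort.get_available_providers())

If CUDAExecutionProvider is missing from the provider list, separation falls back to CPU regardless of what Torch reports.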

M4A support

The m4a audio format seems to be supported as input, but not as output, contrary to what is stated in the README ("output_format: (Optional) Format to encode output files, any common format (WAV, MP3, FLAC, M4A, etc.). Default: WAV").

When I try to output to m4a, I get Unknown format: 'M4A' (which stems from the soundfile package, which in turn relies on the libsndfile library). libsndfile already has an open issue for m4a support: libsndfile/libsndfile#389.
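
A workaround sketch until libsndfile gains m4a support: write WAV output and transcode with FFmpeg, which is already a dependency. The stem filename below is illustrative; the actual name depends on the model used:

audio-separator input.wav --output_format WAV
ffmpeg -i "input_(Vocals)_UVR-MDX-NET-Voc_FT.wav" -c:a aac vocals.m4a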

Large instrumental bleedthrough in some vocal tracks

I'm using your Python library with the MDX-Net Inst HQ 3 and 4 models. Sometimes vocal tracks have quite a bit of instrumental bleed-through in them. This doesn't happen in UVR with either of these models. I use the default settings; I've tried denoise, different normalization, etc., with the same result.

Some errors popping up; process seems slow

These are some errors I am seeing while running this on a Modal.run instance.

2024-01-24 03:16:06.043152838 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcurand.so.10: cannot open shared object file: No such file or directory

2024-01-24 03:16:06.043193312 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:747 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
DEBUG:audio_separator.separator.separator:Model loaded successfully using ONNXruntime inferencing session.
DEBUG:audio_separator.separator.separator:Loading model completed.
INFO:audio_separator.separator.separator:Load model duration: 00:00:00

DEBUG:audio_separator.separator.separator:Window applied to the chunk.
/usr/local/lib/python3.10/site-packages/audio_separator/separator/separator.py:630: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
  mix_part = torch.tensor([mix_part_], dtype=torch.float32).to(self.device)
DEBUG:audio_separator.separator.separator:Mix part split into batches. Number of batches: 1
DEBUG:audio_separator.separator.separator:Processing mix_wave batch 1/1
/usr/local/lib/python3.10/site-packages/torch/functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:863.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
  

I am processing a 5-minute audio file and running it twice. It usually took around 3-5 minutes, but now it takes around 10 minutes. I was using version 0.7.2 before.

I upgraded to a newer version, and this is how I am using it now.

Run these commands:

    "pip install pydub",
    "pip install moviepy",
    "pip uninstall torch onnxruntime -y",
    "pip install --force-reinstall torch torchvision torchaudio",
    "pip install --force-reinstall onnxruntime-gpu",
    'pip install --force-reinstall "optimum[onnxruntime-gpu]"',
    "pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", 
from audio_separator.separator import Separator

# Initialize the Separator class (with optional configuration properties below)
separator = Separator()

separator.load_model("UVR_MDXNET_KARA_2")

# Perform the separation
primary_stem_path, secondary_stem_path = separator.separate(input_file_name)

Is there anything I am doing wrong?
