blaise-tk / rvc_cli

RVC CLI enables seamless interaction with Retrieval-based Voice Conversion through commands or HTTP requests.

License: Other

Python 96.55% Batchfile 0.28% Jupyter Notebook 3.16% Shell 0.01%
ai cli rvc vits voice-conversion deep-learning pytorch vc voice voiceconversion


rvc_cli's People

Contributors

aitronssesin, blaise-tk, dedgar, github-actions[bot], lukaszliniewicz, poiqazwsx, vidalnt


rvc_cli's Issues

Inference fails on apple silicon

First of all, thank you, this is the first cli for rvc that actually works!! I've been trying all kinds of solutions. Below is a minor enhancement you could make.

The following error is experienced when inferencing on apple silicon:
The operator 'aten::_fft_r2c' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Voice conversion failed: cannot unpack non-iterable NoneType object

Setting the MPS fallback as mentioned works, but this could be handled in your code.
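A possible in-code fix (a sketch, not the project's actual code) is to set the fallback variable at the very top of the entry point; the placement matters because PyTorch reads it at import time:

```python
import os

# Hypothetical guard: enable CPU fallback for ops (like aten::_fft_r2c)
# that are not yet implemented on the MPS backend. Must run before
# `import torch` for PyTorch to pick it up.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
```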

[BUG] '<' not supported between instances of 'str' and 'float'

Describe the bug
It looks like the value of args.protect isn't being converted properly to a float. I've tried converting the arg to a float directly, but that produces errors related to the tensor size in torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats. I fiddled with it a bit but couldn't get anything worthwhile out of it.
To Reproduce
Steps to reproduce the behavior:
$ python3 main.py batch_infer --f0up_key 5 --filter_radius 4 --index_rate 0.9 --hop_length 128 --rms_mix_rate 1.0 --protect 0.4 --f0autotune True --f0method rmvpe --input_folder "/home/mb/Desktop/" --output_folder "/home/......................." --pth_path "/home/........................pth" --index_path "/home........................index" --export_format WAV

changing the value of protect doesn't seem to change the error.
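A minimal sketch of the kind of fix (assuming argparse builds the subcommand, which may differ from main.py's actual parser): declare numeric flags with `type=float` so downstream comparisons never see strings. The default values here are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(prog="batch_infer")
# Coerce numeric flags at parse time so downstream code never compares
# a str against a float.
parser.add_argument("--protect", type=float, default=0.33)
parser.add_argument("--index_rate", type=float, default=0.75)

args = parser.parse_args(["--protect", "0.4", "--index_rate", "0.9"])
```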

Expected behavior
For the inference to happen correctly


Desktop (please complete the following information):

  • Linux Mint 21 running kernel 6.5.0-26-generic
  • Firefox?


Error: 'config' when trying to infer from a model that I trained

Describe the bug
Inference from a model I trained isn't working; it just prints "Error: 'config'" instead.

To Reproduce
I used the Colab in the repo, with the change that I made the main directory on my Google Drive so I don't have to pull every time and so the models I make are saved automatically.

I trained a model with v2 and a 40k sample rate (maybe this is the problem). I later saw in the configs that there is no v2-40k config.

I then tried the inference there, and it didn't work; it just spat out "Error: 'config'".

I traced it to the vc pipeline where it loads the checkpoint and tries to access ckpt['config'].
Then outside of the code, I loaded the checkpoints saved from my training and they do not have a 'config' key.
So I looked at the code that saves the checkpoints, and there is no 'config' key there either.

I think I'm doing something wrong, but I'm not sure what.

  1. Should all of the saved G's be usable? I think the most likely issue is that I'm using the wrong model file. I'm using the ones in the log/model_name directory. RVC usually saves other weights in the weights directory, but there isn't one here.
  2. As mentioned above, I checked the training code where it saves, and there is no explicit 'config' in the saved pth when it's saving the epoch checkpoints. Are there different models being saved?
  3. Is V2 + 40k supported even though there is no config file for it?
  4. Could having the repo on Google drive cause issues? I know sometimes it causes Linux path issues.
  5. save_only_latest seems to not be a usable flag since the best model might not be the latest, and usually we need to go back and see the performance and pick the one that's best. How is this flag really used?

Expected behavior
Inference works
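For what it's worth, a guard like the following (purely a sketch; the real pipeline code differs) would turn the bare `Error: 'config'` into an actionable message. It assumes, per the report above, that raw training checkpoints under `logs/` lack the `'config'` key that exported weight files carry:

```python
def get_model_config(ckpt: dict):
    # Hypothetical guard: raw training checkpoints (G_*.pth under logs/)
    # may lack the 'config' key present in exported weight files.
    if "config" not in ckpt:
        raise KeyError(
            "checkpoint has no 'config' key; this looks like a raw training "
            "checkpoint, not an exported model file"
        )
    return ckpt["config"]
```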

[BUG] API infer does not work.

Code Invocation:

 curl --location 'http://127.0.0.1:8000/infer' \
--header 'Content-Type: application/json' \
--data '{
  "f0up_key": 0,
  "filter_radius": 2,
  "index_rate": 0.5,
  "hop_length": 256,
  "rms_mix_rate": 0.5,
  "protect": 0.5,
  "f0autotune": false,
  "f0method": "rmvpe",
  "input_path": "/home/.../RVC_CLI/input/018b3ee3-50a3-7b40-8b02-c99d3753a8a4.mp3",
  "output_path": "/home/.../RVC_CLI/output/1.wav",
  "pth_path": "/home/.../RVC_CLI/logs/Alisa/Alisa.pth",
  "index_path": "/home/.../RVC_CLI/logs/Alisa/added_IVF757_Flat_nprobe_1_Alisa_v2.index",
  "split_audio": false,
  "clean_audio": false,
  "clean_strength": 0.5,
  "export_format": "WAV"
}'

Result:


{
    "output": "<All keys matched successfully>\nConversion completed. Output file: '/home/.../RVC_CLI/output/1.wav' in 4.14 seconds.\n",
    "error": "/home/.../anaconda3/envs/rvc_cli/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\n  warnings.warn(\"torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\")\n"
}

However, the file does not appear in the output folder.

Moreover, to make it work, I changed the handler to:


@app.post("/infer")
async def infer(request: Request):
    command = ["python", "main.py", "infer"]

    json_data = await request.json()

    command += [f"--{key}={value}" for key, value in json_data.items()]

    return execute_command(command)

Also, it would be good to format the output as follows:

When the status code is 200:

{"audio_content": "path/name.wav", "message": "..."}

For other status codes:

{"message": "...", "error": "..."}

IP and PORT change for API

Hello, I didn't find any other contact channels, so I'll describe the problem here.

Question context:

I'm writing a C# (WinForms) application that uses your solution. I'm not a strong programmer outside of simple C# applications, but I was interested in the functionality of the RVC library. I want to run the API server on a machine that has a suitable GPU and send requests to it from clients over the network; the machine I develop on has no GPU. To reach the "server" where your solution is deployed, I tried to change the server launch parameters to a local IP (192.168.x.x) and a different port, but failed.

The main point of the question:

Can I somehow use launch parameters (for example main.py api --ip or --port) to change the parameters of the API server (the uvicorn server)?
If not, is it possible to add such functionality to the "main.py api" command?
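Not an official feature, but a sketch of what such flags could look like if added to the `api` subcommand (the flag names and the uvicorn invocation are assumptions, not current behavior):

```python
import argparse

parser = argparse.ArgumentParser(prog="main.py api")
# Hypothetical flags; default to uvicorn's usual localhost binding.
parser.add_argument("--host", default="127.0.0.1")
parser.add_argument("--port", type=int, default=8000)

args = parser.parse_args(["--host", "192.168.1.50", "--port", "9000"])

# The api subcommand could then forward these to uvicorn, either via
# uvicorn.run("api:app", host=args.host, port=args.port) or a subprocess:
command = ["uvicorn", "api:app", "--host", args.host, "--port", str(args.port)]
```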

Stuff to improve GUI-wise, or ya get what I mean :3

So first things first: the infer section looks alright. Moving on to training, you can definitely improve some stuff there:

  1. Preprocess Dataset: I believe the preset sample rate should be 32k, not 40k.
  2. Extract Features: drop v1; it's outdated and (I think) uses a smaller HuBERT, and regardless it lacks dynamics compared to the v2 arch we have for RVC, so you could remove the version option and default to v2.
    The rest looks fine for now; I really like it. One thing you could also add is model fusion and the other stuff the GUI has, like ckpt processing and so on. Other than that, it works wonderfully :3

split_audio does not seem to work?

Setting it to True or False always results in the same output.wav. How can I get the voice-swapped output combined with the original instrumental? Thanks. This has been the easiest tool to use so far.

Training problem

I was trying to train a model using the OV2 pretrained model, and I came across a strange thing: for every epoch only 1 step was generated.

Example:
model_10_epoch_10_steps
model_100_epoch_100_steps

I am using the code on Kaggle, which uses conda with python 3.10

Sorry for the bad English, it's not my first language
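One step per epoch usually just means the dataset yields a single batch. A rough sanity check, assuming one optimizer step per batch (RVC's exact segmenting may differ):

```python
import math

def steps_per_epoch(num_segments: int, batch_size: int) -> int:
    # One optimizer step per batch; a dataset smaller than the batch
    # size collapses an epoch to a single step.
    return max(1, math.ceil(num_segments / batch_size))
```

So a dataset that preprocesses into fewer segments than the batch size will always log one step per epoch, which matches the model_10_epoch_10_steps pattern above.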

feature request: enable GPU for inference

script works fine

feature requests
a) I use RVC with a GPU for inference; can you enable GPU inference in the CLI as well?
b) can the temporary files be kept inside a folder, say temp, in the project, making housekeeping easier?

thanks
Senthil
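A device-selection helper along these lines (a sketch, not the project's code) is the usual way to honor a GPU when one exists and fall back otherwise:

```python
import importlib.util

def pick_device() -> str:
    # Hypothetical helper: prefer CUDA, then Apple MPS, else CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```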

[BUG?] It is not possible to use more than one GPU for training on Kaggle

Describe the bug
When I try to use 2 GPUs on Kaggle, this error occurs:
[W socket.cpp:663] [c10d] The client socket has failed to connect to [localhost]:55292 (errno: 99 - Cannot assign requested address).

To Reproduce
Steps to reproduce the behavior:

  1. Simply put "0-1" in --gpu in training part on Kaggle

Expected behavior
Be able to use two GPUs for training.


Desktop (please complete the following information):

  • OS: Linux
  • Browser: chrome

Additional context
Yes, I was using the 2x T4, on the latest commit ("fix multi gpu").

[BUG] API won't start

Normal inference works, but when I try the API, with or without host/port arguments, I get an error:

\RVC_CLI> ./env/python.exe main.py api
Error: [WinError 2] The system cannot find the file specified
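[WinError 2] from a subprocess call usually means the child executable could not be found on PATH. A sketch of the usual workaround (assuming the `api` command spawns a python/uvicorn child via `subprocess`, which matches the tracebacks elsewhere in these issues): launch with the running interpreter's full path instead of a bare name.

```python
import sys

# Hypothetical fix sketch: sys.executable is the absolute path of the
# interpreter currently running, so the child process needs no PATH lookup.
command = [sys.executable, "main.py", "api"]
```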

[BUG] '<' not supported between instances of 'str' and 'float' / No output file

Describe the bug
During infer, an error/warning is reported: "'<' not supported between instances of 'str' and 'float'",
and no output file is written despite the output saying it is.

To Reproduce
Steps to reproduce the behavior:
Run the command

PS M:\LLMs\tts\RVC_CLI> .\env\python.exe main.py infer `
>> --index_path '.\rvcs\test0.index' `
>> --pth_path '.\rvcs\test0.pth' `
>> --input_path '.\output.wav' `
>> --output_path 'M:\LLMs\tts\RVC_CLI\output-rvc.wav'
<All keys matched successfully>
'<' not supported between instances of 'str' and 'float'
Conversion completed. Output file: 'M:\LLMs\tts\RVC_CLI\output-rvc.wav' in 2.22 seconds.
PS M:\LLMs\tts\RVC_CLI>

Expected behavior
An output file processed with the supplied pth/index


Desktop (please complete the following information):
Windows 11

Additional context
If I check out tag 1.1.2, it all works.

[BUG] Batch Conversion on Apple Silicon Mac

I've got Applio running on my M2 Max Mac Studio, but Batch Conversion is not working. To get more information I cloned this repo and tried the CLI batch conversion, which also does not work. Single conversion works fine with both the CLI and Applio.

This is my single conversion cmd, which results in a working file:
python main.py infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_path "/Users/liam/Music/RVC/city_of_angels/hmmmh.wav" --output_path "/Users/liam/Downloads/test/test.wav" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index"

This is the batch conversion, which results in an error no matter whether rms_mix_rate and other parameters are included in the cmd or not:
python main.py batch_infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_folder "/Users/liam/Music/RVC/love_me_down/ValYoung" --output_folder "/Users/liam/Downloads/test" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index" --rms_mix_rate "0.0"

The conversion fails with the following error:

Inferring /Users/liam/Music/RVC/love_me_down/ValYoung/Ladada_1.wav.wav...
No supported Nvidia GPU found
Traceback (most recent call last):
  File "/Users/liam/Downloads/RVC_CLI/rvc/infer/infer.py", line 229, in <module>
    rms_mix_rate = float(sys.argv[12])
ValueError: could not convert string to float: 'True'

Seems like rms_mix_rate=True is sneaking in somewhere and causing an error when converted to float. But where is it coming from? I removed all arguments that use True/False from the cmd, but it still ends up with this error.
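The traceback shows infer.py reading positional `sys.argv` slots (`sys.argv[12]`), so one extra or missing flag upstream shifts every later value, which is exactly how a `True` can land in `rms_mix_rate`. A sketch of the robust alternative, named flags via argparse (flag names taken from the commands above; defaults are assumptions):

```python
import argparse

def str2bool(value: str) -> bool:
    # Accept the "True"/"False" strings the CLI examples pass around.
    return value.strip().lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(prog="batch_infer")
parser.add_argument("--rms_mix_rate", type=float, default=1.0)
parser.add_argument("--split_audio", type=str2bool, default=False)

# Order no longer matters, and a missing flag falls back to its default
# instead of shifting other values into the wrong slot.
args = parser.parse_args(["--split_audio", "True", "--rms_mix_rate", "0.0"])
```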

Slight quality issues

Hi,
Now it's a lot better than before. The parameters work well and the quality is better than before.

However, in some places it makes the voice sound like an old grandmother struggling to speak.

Here's my command:

python main.py infer --f0up_key "2" --filter_radius 5 --index_rate "0.1" --hop_length "25" --f0method "dio" --input_path "input.wav" --output_path "output.wav" --pth_path "rvcfinalv4-harvest-1000epochs.pth" --index_path "rvcfinalv4-harvest-1000epochs.index" --split_audio "False" --f0autotune "False"

Let me know if we can do anything to improve the voice quality.

Once again, your work is great in the RVC commandline space. Yours is the best commandline tool for RVC, better than all the ones even released by the original RVC project. So, Thank you.

robotic sounding output

Static (robotic) noise in the generated output. I even tried up to 3500 epochs, but no success using the command line.

However, when I use the GUI, it works. I'm not sure what the issue is.

#!/bin/bash

# Define variables
MODEL_NAME="MyVoiceModel"
VOICE_DATA_PATH="voice/myvoice.wav"
SAMPLE_RATE=48000
RVC_VERSION="v2"
HOP_LENGTH=256
F0METHOD="rmvpe"
TOTAL_EPOCHS=1000
BATCH_SIZE=16
GPU=0 # Adjust if you have multiple GPUs
SAVE_EVERY_EPOCH=10
TEXT_TO_SYNTHESIZE="This is a sample text for voice conversion."

# Step 1: Preprocess Dataset
echo "Preprocessing Dataset..."
python main.py preprocess "$MODEL_NAME" "$VOICE_DATA_PATH" $SAMPLE_RATE

# Step 2: Extract Features
echo "Extracting Features..."
python main.py extract "$MODEL_NAME" $RVC_VERSION $F0METHOD $HOP_LENGTH $SAMPLE_RATE

# Step 3: Train the Model
echo "Training the Model..."
python main.py train "$MODEL_NAME" $RVC_VERSION $SAVE_EVERY_EPOCH False True $TOTAL_EPOCHS $SAMPLE_RATE $BATCH_SIZE $GPU True False False

# Step 4: Generate Index File
echo "Generating Index File..."
python main.py index "$MODEL_NAME" $RVC_VERSION

# Step 5: Voice Conversion Inference (Modify paths to the model and index files as needed)
echo "Performing Voice Conversion Inference..."
python main.py infer "$TEXT_TO_SYNTHESIZE" "$MODEL_NAME" 0 5 0.5 $HOP_LENGTH "$F0METHOD" "output_tts.wav" "output_rvc.wav" "path_to_trained_model/$MODEL_NAME.pth" "path_to_index_file/$MODEL_NAME.index"

echo "Voice Conversion Process Completed."

API

I'm having some issues with the API call (internal server error). I'm assuming it's the syntax of the JSON at this point; I've messed around a bit but it keeps returning an error. Here is how the JSON is formatted at the moment:
{
"f0up_key": 0,
"filter_radius": 5,
"index_rate": 0.5,
"hop_length": 256,
"f0method": "rmvpe",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output.wav",
"pth_file": "LB.pth",
"index_path": "LB.index",
"split_audio": false,
}

I have "LB.pth" and the index in the "RVC_CLI\models" folder currently.

Thanks for any help. I'm a total noob with this stuff >_<
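Two things in the payload above are invalid JSON and could explain the error on their own: the unescaped backslashes in the Windows paths and the trailing comma after `"split_audio": false`. Building the body with `json.dumps` sidesteps both (paths and values here are the ones from the report):

```python
import json

payload = {
    "f0up_key": 0,
    "filter_radius": 5,
    "index_rate": 0.5,
    "hop_length": 256,
    "f0method": "rmvpe",
    # Python raw strings keep the backslashes; json.dumps escapes them
    # into valid JSON (\\) automatically.
    "input_path": r"D:\Projects\VoiceChangerAI\TestFile\testa.wav",
    "output_path": r"D:\Projects\VoiceChangerAI\TestFile\output.wav",
    "pth_path": "LB.pth",
    "index_path": "LB.index",
    "split_audio": False,
}

body = json.dumps(payload)  # no trailing comma can sneak in
```

Note the payload above also uses the key "pth_file", while the working example in another issue here uses "pth_path"; the key name may matter too.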

API -

Still having issues with the API. The main infer worked with the same input as the JSON below. I tried messing around with the format a bit but no luck.

JSON:

{
"f0up_key": "2",
"filter_radius": "5",
"index_rate": "0.1",
"hop_length": "25",
"f0method": "dio",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output_API.wav",
"pth_path": "C:\Users\KCLEE\Documents\GitHub\models\LenvalBrown.pth",
"index_path": "C:\Users\KCLEE\Documents\GitHub\models\LenvalBrown.index",
"split_audio": "false",
"f0autotune": "false"
}

the console spits out this:

Traceback (most recent call last):
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 953, in <module>
    main()
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 947, in main
    run_api_script()
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 385, in run_api_script
    subprocess.run(command)
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 507, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1126, in communicate
    self.wait()
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1189, in wait
    return self._wait(timeout=timeout)
  File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1486, in _wait
    result = _winapi.WaitForSingleObject(self._handle,

Client-side I get an "error 400 - bad request".

Training args

The voice is getting distorted when chunks are made, resulting in a robotic voice in the output.

Does not work?

No matter what I try for the hop length value, with or without double or single quotes, it won't work. It has become very frustrating.

python main.py infer --f0up_key "0" --filter_radius "5" --index_rate "0.5" --hop_length "256" --f0method "dio" --input_path "input.wav" --output_path "output.wav" --pth_file "model.pth" --index_path "model.index" --split_audio "False" --f0autotune "False"

It was better before you changed the arguments. At least it worked, even if it sounded robotic; now I am totally unable to use it.

[BUG] Training Threshold set to incorrect value when no value is set in command

Describe the bug
I get the following error whether overtrain_detector is set to true or false, and whether or not overtrain_threshold is set to an integer value:
train.py: error: argument -ot/--overtraining_threshold: invalid int value: 'False'
To Reproduce
Steps to reproduce the behavior:
Run with the following options:
{python_path} main.py train --model_name {model_name} --sampling_rate 40000 --pitch_guidance True --gpu 1 --save_every_epoch 50 --save_only_latest True --overtraining_detector False

Expected behavior
Training runs and completes


Desktop (please complete the following information):

  • OS: Windows 11
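The error suggests the value of `--overtraining_detector` ends up in the `--overtraining_threshold` slot when no threshold is given. A sketch of a parser that cannot confuse the two (flag names from the error message; defaults are assumptions):

```python
import argparse

def str2bool(value: str) -> bool:
    # Parse the "True"/"False" strings the CLI passes around.
    return value.strip().lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(prog="train")
parser.add_argument("-od", "--overtraining_detector", type=str2bool, default=False)
parser.add_argument("-ot", "--overtraining_threshold", type=int, default=50)

# Omitting the threshold now falls back to the integer default instead
# of receiving the detector's 'False' string.
args = parser.parse_args(["--overtraining_detector", "False"])
```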
