This project forked from pabannier/bark.cpp


Port of Suno AI's Bark in C/C++ for fast inference

License: MIT License

C++ 10.86% Python 1.03% C 85.22% Makefile 0.88% CMake 2.00%

bark.cpp
Roadmap / encodec.cpp / ggml

Inference of Suno AI's Bark model in pure C/C++.

Description

The main goal of bark.cpp is to synthesize audio from textual input with the Bark model efficiently, using only a CPU.

  • Plain C/C++ implementation without dependencies
  • AVX, AVX2 and AVX512 support for x86 architectures
  • Optimized via ARM NEON, Accelerate and Metal frameworks
  • iOS on-device deployment using CoreML
  • Mixed F16 / F32 precision
  • 4-bit, 5-bit and 8-bit integer quantization

The initial implementation of bark.cpp targets Bark's 24kHz English model. We expect to support multiple languages in the future, as well as other vocoders (see this and this). This project is for educational purposes.

Supported platforms:

  • macOS
  • Linux
  • Windows

Supported models:

  • Bark's 24kHz model
  • Bark's 48kHz model
  • Multiple voices

Here is a typical run using Bark:

make -j && ./main -p "this is an audio"
I bark.cpp build info:
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I. -O3 -std=c11   -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX:      Apple clang version 14.0.0 (clang-1400.0.29.202)

bark_model_load: loading model from './ggml_weights'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_model_load: reading bark vocab

bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_model_load: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_model_load: total model size  =    74.64 MB

bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: ...........................................................................................................

bark_forward_text_encoder: mem per token =     4.80 MB
bark_forward_text_encoder:   sample time =     7.91 ms
bark_forward_text_encoder:  predict time =  2779.49 ms / 7.62 ms per token
bark_forward_text_encoder:    total time =  2829.35 ms

bark_forward_coarse_encoder: .................................................................................................................................................................
..................................................................................................................................................................

bark_forward_coarse_encoder: mem per token =     8.51 MB
bark_forward_coarse_encoder:   sample time =     3.08 ms
bark_forward_coarse_encoder:  predict time = 10997.70 ms / 33.94 ms per token
bark_forward_coarse_encoder:    total time = 11036.88 ms

bark_forward_fine_encoder: .....

bark_forward_fine_encoder: mem per token =     5.11 MB
bark_forward_fine_encoder:   sample time =    39.85 ms
bark_forward_fine_encoder:  predict time = 19773.94 ms
bark_forward_fine_encoder:    total time = 19873.72 ms



bark_forward_encodec: mem per token = 760209 bytes
bark_forward_encodec:  predict time =   528.46 ms / 528.46 ms per token
bark_forward_encodec:    total time =   663.63 ms

Number of frames written = 51840.


main:     load time =  1436.36 ms
main:     eval time = 34520.53 ms
main:    total time = 35956.92 ms

Usage

Here are the steps to run the Bark model.

Get the code

git clone https://github.com/PABannier/bark.cpp.git
cd bark.cpp

Build

To build bark.cpp you have two options. On Windows, we recommend using CMake.

  • Using make:

    • On Linux or macOS:

      make
  • Using CMake:

    mkdir build
    cd build
    cmake ..
    cmake --build . --config Release

Prepare data & Run

# obtain the original bark and encodec weights and place them in ./models
python3 download_weights.py --download-dir ./models

# install Python dependencies
python3 -m pip install -r requirements.txt

# convert the model to ggml format
python3 convert.py \
        --dir-model ./models \
        --codec-path ./models \
        --vocab-path ./models \
        --out-dir ./ggml_weights/

# run the inference
./main -m ./ggml_weights/ -p "this is an audio"

Seminal papers and background on models

Contributing

bark.cpp is a continuous endeavour that relies on community efforts to last and evolve. Your contributions are welcome and highly valuable. They can take several forms:

  • bug reports: you may encounter a bug while using bark.cpp. Don't hesitate to report it in the issues section.
  • feature requests: you want to add a new model or support a new platform. You can use the issues section to make suggestions.
  • pull requests: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation; you can submit a pull request and a reviewer will reach out to you.

Coding guidelines

  • Avoid adding third-party dependencies, extra files, extra headers, etc.
  • Always consider cross-compatibility with other operating systems and architectures
  • Avoid fancy looking modern STL constructs, keep it simple
  • Clean up any trailing whitespace, use 4 spaces for indentation, brackets on the same line, void * ptr, int & ref

Contributors

pabannier, green-sky, jmtatsch, jzeiber
