
The Qwen-Audio example demo produces no transcribed text when given a local audio file. Could you provide a working example? about qwen-audio (OPEN)

apple2333cream commented on June 8, 2024
The example demo provided with Qwen-Audio does not produce any transcribed text when I feed it a local audio file. Could you provide a corresponding example?

from qwen-audio.

Comments (3)

roydcai commented on June 8, 2024

This is my code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
import re
import os
import glob
import time

torch.manual_seed(1234)

model_path = "/home/wzp/.cache/modelscope/hub/qwen/Qwen-Audio"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Enable bf16 precision; recommended on A100, H100, RTX3060, RTX3070 and similar GPUs to save GPU memory
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, bf16=True).eval()

# Enable fp16 precision; recommended on V100, P100, T4 and similar GPUs to save GPU memory
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, fp16=True).eval()

# Run inference on CPU; needs about 32 GB of RAM
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cpu", trust_remote_code=True).eval()

# Default: run inference on GPU; needs about 24 GB of GPU memory
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True, bf16=True).eval()

# Optionally set the generation length, top_p, and other hyperparameters (not needed with transformers 4.32.0 or later)
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)

audio_url = "/home/wzp/project/yolov8/modelscope/output.wav"
sp_prompt = "<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|>"
query = f"{audio_url}{sp_prompt}"
audio_info = tokenizer.process_audio(query)
inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
inputs = inputs.to(model.device)
pred = model.generate(**inputs, audio_info=audio_info)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False, audio_info=audio_info)
print(response)

This is the terminal output:

Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:10<00:00, 1.17s/it]
/home/wzp/project/yolov8/modelscope/output.wav<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|><|notimestamps|><|itn|>Hello, please you need to to handle what business.<|endoftext|>

The language tag for Chinese is zh, not cn. See the configuration parameters in tokenization_qwen.py.
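As a minimal sketch of that fix, the prompt can be built from a language code so the tag is written only once. The helper name `build_asr_query` is hypothetical and not part of Qwen-Audio. Wrapping the path in `<audio>…</audio>` follows the query format in the Qwen-Audio README and is an assumption here, since the snippet above does not show those tags.

```python
# Hypothetical helper (not part of Qwen-Audio): builds the transcription
# query string with a configurable language tag.
def build_asr_query(audio_path: str, lang: str = "zh") -> str:
    # The language tag appears twice: once for the audio language and once
    # for the output language. For Chinese it is "zh", not "cn".
    sp_prompt = (
        f"<|startoftranscription|><|{lang}|><|transcribe|><|{lang}|>"
        "<|notimestamps|><|wo_itn|>"
    )
    # Assumption: the audio path is wrapped in <audio> tags, as in the
    # query format shown in the Qwen-Audio README.
    return f"<audio>{audio_path}</audio>{sp_prompt}"


query = build_asr_query("/home/wzp/project/yolov8/modelscope/output.wav")
print(query)
```

The resulting `query` would then be passed to `tokenizer.process_audio(query)` and `tokenizer(query, return_tensors='pt', audio_info=audio_info)` exactly as in the code above.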


xjturobocon commented on June 8, 2024

Same question here.


apple2333cream commented on June 8, 2024

Thanks, that solved it. However, on my local audio test set, Qwen-Audio's accuracy is 4 percentage points lower than that of the Qwen-Audio-chat model.
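The comment does not say how accuracy was measured. One common choice for Chinese ASR is character-level accuracy, i.e. 1 minus the character error rate. The sketch below assumes that metric; the function names and the toy strings are illustrative only and are not taken from the thread.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(refs: list[str], hyps: list[str]) -> float:
    """1 - CER over a test set: total edit errors / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 1.0 - errors / total


# Toy strings, not real transcripts: one substitution in four characters.
print(char_accuracy(["abcd"], ["abed"]))
```

Scoring both models with the same function on the same references makes a gap such as the reported 4 percentage points directly comparable.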

