Comments (3)
This is my code:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
import re
import os
import glob
import time
torch.manual_seed(1234)
model_path = "/home/wzp/.cache/modelscope/hub/qwen/Qwen-Audio"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Enable bf16 precision; recommended on A100, H100, RTX3060, RTX3070, etc. to save memory
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, bf16=True).eval()
# Enable fp16 precision; recommended on V100, P100, T4, etc. to save memory
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="auto", trust_remote_code=True, fp16=True).eval()
# Run inference on CPU; requires about 32 GB of RAM
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-Audio", device_map="cpu", trust_remote_code=True).eval()
# Run inference on GPU (the default here); requires about 24 GB of VRAM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True, bf16=True).eval()
# Optionally specify generation length, top_p, and other hyperparameters (not needed on transformers 4.32.0 and above)
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-Audio", trust_remote_code=True)
audio_url = "/home/wzp/project/yolov8/modelscope/output.wav"
sp_prompt = "<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|>"
query = f"{audio_url}{sp_prompt}"
audio_info = tokenizer.process_audio(query)
inputs = tokenizer(query, return_tensors='pt', audio_info=audio_info)
inputs = inputs.to(model.device)
pred = model.generate(**inputs, audio_info=audio_info)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False, audio_info=audio_info)
print(response)
This is the terminal output:
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 9/9 [00:10<00:00, 1.17s/it]
/home/wzp/project/yolov8/modelscope/output.wav<|startoftranscription|><|cn|><|transcribe|><|cn|><|notimestamps|><|wo_itn|><|notimestamps|><|itn|>Hello, please you need to to handle what business.<|endoftext|>
The Chinese language token is zh, not cn — see the special-token configuration in tokenization_qwen.py.
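Based on that answer, a minimal sketch of the corrected prompt string, assuming `<|zh|>` is the Chinese language token defined in tokenization_qwen.py (the audio path is the one from the original post):

```python
# Corrected transcription prompt: Chinese is tagged <|zh|>, not <|cn|>
# (per the special tokens in tokenization_qwen.py).
audio_url = "/home/wzp/project/yolov8/modelscope/output.wav"  # path from the post above
sp_prompt = (
    "<|startoftranscription|><|zh|><|transcribe|><|zh|>"
    "<|notimestamps|><|wo_itn|>"
)
query = f"{audio_url}{sp_prompt}"
print(query)
```

The rest of the pipeline (process_audio, tokenize, generate, decode) stays as in the original snippet; only the language tokens change.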
from qwen-audio.
Same question here.
Thanks, that solved it. However, on my local audio test set, Qwen-Audio's accuracy is 4 percentage points lower than the Qwen-Audio-Chat model's.
Related Issues (20)
- Are you sure the provided local model is OK?
- Error: requests.exceptions.HTTPError: Response details: 404 page not found, Request id: ab8a478639c847c6bbb41438e4d8606e
- Running the multi-task eval script under eval_audio, batch decoding performance degrades quickly — is there a problem with the attention mask or tokenizer padding handling during training?
- Few-shot Examples
- About Cantonese support
- What training speed does Alibaba officially achieve for Qwen-Audio?
- Does it support local API calls?
- allow_pickle=False
- The WeChat group is full
- Questions about training hyperparameters
- Is deployment via vLLM or similar APIs supported?
- Are Chinese prompts not supported?
- How can I chat in the demo?
- Is there an ONNX-format model?
- Is there a quantized version?
- With the chat model, the same text question with different audio files always returns the same ASR result
- The WeChat group is full
- How much compute is needed for local deployment?
- About the distribution of languages in the training data
- Discussion of questions about Qwen-Audio and LauraGPT