x-d-lab / langchain-chatglm-webui

Automated question answering over local knowledge bases, built on LangChain and the ChatGLM-6B family of LLMs

License: Apache License 2.0

Python 99.49% Dockerfile 0.51%
belle bilibili chatglm-6b chatglm-webui jina langchain langchain-serve llama llm minimax modelscope

langchain-chatglm-webui's People

Contributors

123456adwae2, aliscacl, barryyin, d-mahony-x, damon-ldl, godlockin, notandor, online2311, thomas-yanxin


langchain-chatglm-webui's Issues

Error on the first run

It reports a missing module, but pip cannot install that module:

from duckduckgo_search.utils import Session
ModuleNotFoundError: No module named 'duckduckgo_search.utils'
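
A likely cause, offered as a guess rather than a confirmed diagnosis: newer duckduckgo_search releases removed the duckduckgo_search.utils module that app.py imports, so pinning an older release (for example pip install "duckduckgo_search<3") may restore it. Alternatively, a small shim can provide the old name; the requests.Session fallback below is an assumption about how app.py uses SESSION:

import requests

try:
    from duckduckgo_search.utils import SESSION  # old duckduckgo_search API
except ImportError:
    # assumption: app.py only needs a requests.Session-like object here
    SESSION = requests.Session()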

Problems on both the AI Studio and OpenI (启智) platforms

On AI Studio there are conflicts between dependencies, and following the notebook does not install them correctly.

On OpenI, using the Docker image with gradio downgraded, the embedding model cannot be loaded no matter how it is selected or where it is placed (I tried putting it in the dataset and adjusting the path in config.py accordingly, and also leaving the dataset empty so it would download by itself).
[screenshot]

The new version doesn't match the video tutorial

The whole pipeline completed and the app starts normally, but the terminal log only shows 127.0.0.1 with no public URL, and inference runs on the CPU. Which step did I get wrong?
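
A sketch of two things worth checking, not a confirmed fix: Gradio only prints a public URL when launch() is called with share=True (or when server_name exposes the port beyond localhost), and CPU inference usually just means torch cannot see a GPU. The demo object below is a placeholder standing in for the gr.Blocks built in app.py:

import gradio as gr
import torch

print(torch.cuda.is_available())  # False here would explain why inference runs on the CPU

with gr.Blocks() as demo:  # placeholder UI standing in for app.py's interface
    gr.Markdown("placeholder")

# share=True requests a public *.gradio.live link; server_name binds beyond 127.0.0.1
demo.launch(server_name="0.0.0.0", share=True)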

Keeps saying the model cannot be found

[screenshot]

Where exactly are the model files? I can't find them, even though the log at startup shows a file of almost 7 GB finishing its download. I'm using Hugging Face.
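
For what it's worth, transformers downloads Hugging Face weights into a cache directory rather than the repository folder, so the ~7 GB file most likely lives under ~/.cache/huggingface. A quick way to confirm (the paths are library defaults, not project configuration):

import os
from pathlib import Path

cache = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
if cache.exists():
    for p in cache.rglob("*"):
        if p.is_file() and p.stat().st_size > 1e9:  # files larger than ~1 GB
            print(p, f"{p.stat().st_size / 1e9:.1f} GB")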

Various problems with model loading and knowledge-base upload

The code was pulled today on a dual-RTX-3090 machine. My network downloads from HF much faster than from OpenI (启智), which barely downloads at all, so I did not use the non-ChatGLM models the author provides:
1. This is the error (warning) after loading the vicuna-13b-1.1 downloaded from HF; the page reports that loading failed:

  • This IS expected if you are initializing LlamaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing LlamaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

2. The BELLE-LLaMA-13B-2M-enc downloaded from HF: the page reports loading failed, and the backend prints nothing.
Using my own fine-tuned vicuna-13b, loading succeeds but warns: You are probably using the old Vicuna-v0 model, which will generate unexpected results with the current fschat.
Inference then exhausts VRAM: torch.cuda.OutOfMemoryError: CUDA out of memory.

3. Selecting chatglm_6b again (the auto-download variant): loading and inference both succeed, but uploading a .doc file raises docx.opc.exceptions.PackageNotFoundError: Package not found at xxx.doc.
I then tried uploading a UTF-8 .txt, which raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position (see the sketch below the list).

4. Also, changing init_llm and init_embedding_model in config.py only seems to affect the UI defaults; the model is not actually loaded until it is selected manually in the interface.
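
On point 3, a guess at what is happening: python-docx can only open .docx files (zip-based OPC packages), so a legacy binary .doc raises PackageNotFoundError, and a text loader that falls back to the ascii/locale codec will fail on UTF-8 Chinese text. A workaround sketch; the helper name is illustrative, not project code, and .doc files would first be converted to .docx (e.g. in Word or LibreOffice):

def read_text(path: str) -> str:
    # try utf-8 first, then gb18030, which covers GBK-encoded Chinese files
    for enc in ("utf-8", "gb18030"):
        try:
            with open(path, encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path} as utf-8 or gb18030")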

Thanks for this great project. When you have a moment, could you check whether I have misconfigured something? Thank you!

No response after running python3 app.py

Running python3 app.py produces no response at all; running python app.py instead reports:

Traceback (most recent call last):
File "E:\langchain-chatGLM-webui\LangChain-ChatGLM-Webui\app.py", line 8, in
from duckduckgo_search.utils import SESSION
ModuleNotFoundError: No module named 'duckduckgo_search.utils'

Then I tried running py app.py, which fails with:
Traceback (most recent call last):
File "E:\langchain-chatGLM-webui\LangChain-ChatGLM-Webui\app.py", line 4, in
import gradio as gr
ModuleNotFoundError: No module named 'gradio'
But I have already installed gradio, and importing it in an interactive python session succeeds. I don't know what's going on.
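
One possibility, stated as a guess: on Windows, python, python3 and py can resolve to different interpreters, so pip may have installed gradio into a different interpreter than the one running app.py. Printing the interpreter path from each command makes any mismatch visible:

import sys

# run this with `python`, `python3` and `py` in turn;
# if the printed paths differ, the environments differ too
print(sys.executable)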

One more note: while installing the dependencies from requirements.txt, detectron2 would not install, so I installed it following this guide:
https://zhuanlan.zhihu.com/p/425631249
and only then ran LangChain-ChatGLM-Webui. I'm not sure whether that is related.

Can safetensors models be loaded?

I am compressing a model with GPTQ-for-LLaMa:

CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors llama7b-4bit-128g.safetensors

Can the project load safetensors models?
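
Not an answer from the maintainers, but for reference: a checkpoint written with --save_safetensors can be read back with the safetensors package and handed to whatever quantized-weight loader is in use. A minimal sketch, assuming the file from the command above:

from safetensors.torch import load_file

state_dict = load_file("llama7b-4bit-128g.safetensors")
print(list(state_dict)[:5])  # inspect a few tensor names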

Cannot run on a machine without a GPU

Setting CPU quantization kernel threads to 20
Using quantization cache
Applying quantization to glm layers
Traceback (most recent call last):
File "/home/wz/.local/lib/python3.9/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/wz/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/wz/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/wz/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/wz/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/wz/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/wz/LangChain-ChatGLM-Webui/app.py", line 150, in predict
resp = get_knowledge_based_answer(
File "/home/wz/LangChain-ChatGLM-Webui/app.py", line 108, in get_knowledge_based_answer
chatLLM.load_model(model_name_or_path=llm_model_dict[large_language_model])
File "/home/wz/LangChain-ChatGLM-Webui/chatglm_llm.py", line 114, in load_model
self.model = (AutoModel.from_pretrained(
File "/home/wz/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1811, in to
return super().to(*args, **kwargs)
File "/home/wz/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/wz/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/wz/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/wz/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/wz/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/wz/.local/lib/python3.9/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
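
The trace shows the model being moved to CUDA inside load_model, which cannot work without a GPU. For comparison, the upstream ChatGLM-6B documentation loads on CPU with .float() instead of .half().cuda(); the project's loading path would need something along these lines (a sketch, not the project's actual code):

import torch
from transformers import AutoModel, AutoTokenizer

name = "THUDM/chatglm-6b-int4"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
if torch.cuda.is_available():
    model = AutoModel.from_pretrained(name, trust_remote_code=True).half().cuda()
else:
    # CPU-only loading per the ChatGLM-6B README
    model = AutoModel.from_pretrained(name, trust_remote_code=True).float()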

Is multi-GPU supported?

A single card's VRAM is not enough; with multi-GPU support the model could be sharded across the memory of several GPUs.
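
One standard way to do this with transformers, offered as a sketch rather than a description of how this project dispatches models (device_map="auto" requires the accelerate package):

from transformers import AutoModel

# shards the layers across all visible GPUs
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True, device_map="auto"
)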

'NoneType' object has no attribute 'write'

Traceback (most recent call last):
File "C:\GLM\LangChain-ChatGLM-Webui-master\app.py", line 15, in
from chatllm import ChatLLM
File "C:\GLM\LangChain-ChatGLM-Webui-master\chatllm.py", line 7, in
from fastchat.serve.inference import load_model as load_fastchat_model
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\fastchat\serve\inference.py", line 9, in
from transformers import (
File "", line 1075, in _handle_fromlist
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\utils\import_utils.py", line 1137, in getattr
value = getattr(module, name)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\utils\import_utils.py", line 1136, in getattr
module = self._get_module(self._class_to_module[name])
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\utils\import_utils.py", line 1148, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
'NoneType' object has no attribute 'write'

I hit this error right after installing the libraries, and I can't tell what went wrong.
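
A guess at the cause: something writes to sys.stdout during import, and sys.stdout is None in processes without console streams (for example when launched via pythonw or from some IDEs). A defensive guard placed before the imports in app.py would confirm or work around it:

import io
import sys

# give the process writable stdout/stderr when it has no console streams
if sys.stdout is None:
    sys.stdout = io.StringIO()
if sys.stderr is None:
    sys.stderr = io.StringIO()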

Support for multiple files and folders

Suggestion: support uploading either a single file or an entire folder.
It would also be good to have per-user knowledge bases: each user could own several knowledge bases, and each knowledge base could consist of multiple files or folders (see the sketch below).
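
A minimal sketch of the folder half of this request, using LangChain's DirectoryLoader; the function name and layout are illustrative, not project API:

from langchain.document_loaders import DirectoryLoader

def load_knowledge_base(folder: str):
    # loads every file found under the folder into LangChain documents
    return DirectoryLoader(folder).load()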

Installing detectron2 on Win10 keeps failing

Errors while installing the requirements; the key output follows:
...
Building wheels for collected packages: unstructured-inference, sentence-transformers, detectron2, langchain-serve, jina, docarray, fvcore, antlr4-python3-runtime, jcloud, promise, pycocotools, future, python-docx, python-pptx, olefile
Building wheel for unstructured-inference (setup.py) ... done
Created wheel for unstructured-inference: filename=unstructured_inference-0.4.4-py3-none-any.whl size=36816 sha256=fd37dd8b1c4723d1206d7c4757dbe285f02cf4119396d0c72f43c935a0ea3e1b
Stored in directory: e:\pipcache\wheels\7f\0c\91\360ebd8b96f0acd20be6cf329c372911a4c01c05c16a8846d3
Building wheel for sentence-transformers (setup.py) ... done
Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125960 sha256=98195ec8c6a418f5085171098d4fa5bd7b2fb0c3e06e0105138b21f09a6aaeca
Stored in directory: e:\pipcache\wheels\71\67\06\162a3760c40d74dd40bc855d527008d26341c2b0ecf3e8e11f
Building wheel for detectron2 (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [1042 lines of output]
running bdist_wheel
E:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-39
creating build\lib.win-amd64-cpython-39\detectron2
copying detectron2\__init__.py -> build\lib.win-amd64-cpython-39\detectron2
creating build\lib.win-amd64-cpython-39\tools
copying tools\analyze_model.py -> build\lib.win-amd64-cpython-39\tools
copying tools\benchmark.py -> build\lib.win-amd64-cpython-39\tools
copying tools\convert-torchvision-to-d2.py -> build\lib.win-amd64-cpython-39\tools
copying tools\lazyconfig_train_net.py -> build\lib.win-amd64-cpython-39\tools
copying tools\lightning_train_net.py -> build\lib.win-amd64-cpython-39\tools
copying tools\plain_train_net.py -> build\lib.win-amd64-cpython-39\tools
copying tools\train_net.py -> build\lib.win-amd64-cpython-39\tools
copying tools\visualize_data.py -> build\lib.win-amd64-cpython-39\tools
copying tools\visualize_json_results.py -> build\lib.win-amd64-cpython-39\tools
copying tools\__init__.py -> build\lib.win-amd64-cpython-39\tools
...
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c logo /O2 /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\cc\AppData\Local\Temp\pip-install-2wjkw3vm\detectron2_1187d9a69c984854be83cbda608f18ff\detectron2\layers\csrc -IE:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include -IE:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include\torch\csrc\api\include -IE:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include\TH -IE:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include" -IE:\minconda_extra_env_folder\langchain-chatGLM-webui\include -IE:\minconda_extra_env_folder\langchain-chatGLM-webui\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\cppwinrt" /EHsc /TpC:\Users\cc\AppData\Local\Temp\pip-install-2wjkw3vm\detectron2_1187d9a69c984854be83cbda608f18ff\detectron2\layers\csrc\ROIAlignRotated\ROIAlignRotated_cpu.cpp /Fobuild\temp.win-amd64-cpython-39\Release\Users\cc\AppData\Local\Temp\pip-install-2wjkw3vm\detectron2_1187d9a69c984854be83cbda608f18ff\detectron2\layers\csrc\ROIAlignRotated\ROIAlignRotated_cpu.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
ROIAlignRotated_cpu.cpp
E:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include\c10/util/Optional.h(212): warning C4624: 'c10::constexpr_storage_t': destructor was implicitly defined as deleted
with
[
T=c10::SymInt
]
E:\minconda_extra_env_folder\langchain-chatGLM-webui\lib\site-packages\torch\include\c10/util/Optional.h(411): note: see reference to class template instantiation 'c10::constexpr_storage_t' being compiled
with
[
T=c10::SymInt
]
...

C:\Users\cc\AppData\Local\Temp\pip-install-2wjkw3vm\detectron2_1187d9a69c984854be83cbda608f18ff\detectron2\layers\csrc\ROIAlignRotated\ROIAlignRotated_cpu.cpp : fatal error C1083: Cannot open compiler generated file: '': Invalid argument
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> detectron2

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

ROIAlignRotated_cpu.cpp : fatal error C1083: Cannot open compiler generated file

Running pip install -r requirements.txt

fails with:
\AppData\Local\Temp\pip-install-pzphnpyf\detectron2_342601ed2f124809b3a4ec0ad331962a\detectron2\layers\csrc\ROIAlignRotated\ROIAlignRotated_cpu.cpp : fatal error C1083: Cannot open compiler generated file: '': Invalid argument
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\bin\HostX86\x64\cl.exe' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for detectron2
Running setup.py clean for detectron2
Failed to build detectron2
ERROR: Could not build wheels for detectron2, which is required to install pyproject.toml-based projects

Error: model was not reloaded successfully; please click 'reload model'

embedding_model_dict = {
    "ernie-base": "D:/AIL/workspace/LangChain-ChatGLM-Webui/models/ernie-3.0-base-zh",
    "simbert-base-chinese": "D:/AIL/workspace/LangChain-ChatGLM-Webui/models/simbert-base-chinese",
    "text2vec-base": "D:/AIL/workspace/LangChain-ChatGLM-Webui/models/text2vec-large-chinese"
}

llm_model_dict = {
    "ChatGLM-6B-int4": "D:/AIL/workspace/LangChain-ChatGLM-Webui/models/chatglm-6b-int4",
    "BELLE-LLaMA-7B-2M": "D:/AIL/workspace/LangChain-ChatGLM-Webui/models/BELLE-LLaMA-7B-2M"
}

These are the absolute paths configured in config.py.
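
A quick sanity check, as a sketch: confirm that every configured path exists and actually contains model files before suspecting the reload logic (this reuses the two dicts above):

from pathlib import Path

for name, path in {**embedding_model_dict, **llm_model_dict}.items():
    p = Path(path)
    files = sorted(f.name for f in p.iterdir()) if p.is_dir() else []
    print(name, "OK" if files else "MISSING", files[:3])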

Error when running locally

Traceback (most recent call last):
File "/Users/terry/Downloads/test/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/Users/terry/Downloads/test/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/Users/terry/Downloads/test/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/Users/terry/Downloads/test/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/Users/terry/Downloads/test/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/Users/terry/Downloads/test/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "app.py", line 143, in predict
print(file_obj.name)
AttributeError: 'NoneType' object has no attribute 'name'
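
What the trace suggests, as a guess: predict() reached print(file_obj.name) without any file having been uploaded, so Gradio passed None for the file component. A guard of roughly this shape avoids the crash (illustrative; not the project's actual predict signature):

def predict(file_obj, *args):
    # bail out early when no knowledge-base file has been uploaded yet
    if file_obj is None:
        return "Please upload a knowledge-base file first."
    print(file_obj.name)
    ...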

Why won't the service start properly when the host is offline?

Below is the output when starting with internet access: the webui starts, the service is reachable, models load, and chat works normally.

(chatGLM) a@b:~/chatGLM/LangChain-ChatGLM-Webui$ python app.py 
No sentence-transformers model found with name /home/a/.cache/torch/sentence_transformers/GanymedeNil_text2vec-base-chinese. Creating a new one with MEAN pooling.
No sentence-transformers model found with name /home/a/chatGLM/LangChain-ChatGLM-Webui/model_cache/GanymedeNil/text2vec-base-chinese/GanymedeNil_text2vec-base-chinese. Creating a new one with MEAN pooling.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /home/a/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/3218e92c957a036d2716fc2eaf86454841bcef18/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /home/a/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/3218e92c957a036d2716fc2eaf86454841bcef18/quantization_kernels_parallel.c -shared -o /home/a/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/3218e92c957a036d2716fc2eaf86454841bcef18/quantization_kernels_parallel.so
Load kernel : /home/a/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int8/3218e92c957a036d2716fc2eaf86454841bcef18/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 4
Using quantization cache
Applying quantization to glm layers
The dtype of attention mask (torch.int64) is not bool

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB
Running on local URL:  http://0.0.0.0:6006

To create a public link, set `share=True` in `launch()`.

Below is the output without internet access: the webui starts and can be opened in the browser, but clicking "load model" always reports that the model was not loaded successfully.

(chatGLM) a@b:~/chatGLM/LangChain-ChatGLM-Webui$ python app.py 
Running on local URL:  http://0.0.0.0:6006

To create a public link, set `share=True` in `launch()`.

The final deployment environment has no internet access, so I'd like to know how to start the service properly offline. Many thanks!
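
A sketch of the usual offline recipe with transformers, assuming the weights are already on disk: set the standard offline switches and point the loaders at local directories (the path below is hypothetical; config.py would also need to reference local paths):

import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"  # standard transformers switch
os.environ["HF_HUB_OFFLINE"] = "1"        # standard huggingface_hub switch

from transformers import AutoModel, AutoTokenizer

local_dir = "/home/a/models/chatglm-6b-int8"  # hypothetical local copy of the weights
tokenizer = AutoTokenizer.from_pretrained(local_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(local_dir, trust_remote_code=True).float()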

Keeps reporting: model was not reloaded successfully; please click 'reload model'

I deployed and started app.py following the documented steps, but it keeps reporting "模型未成功重新加载,请点击重新加载模型" (model was not reloaded successfully; please click reload). What could be the cause?
The relevant log is below:

Traceback (most recent call last) ──────────────────────╮
│ /content/drive/MyDrive/LangChain-ChatGLM-Webui/app.py:215 in │
│ │
│ 212 │ return '', history, history │
│ 213 │
│ 214 │
│ ❱ 215 model_status = init_model() │
│ 216 │
│ 217 if __name__ == "__main__": │
│ 218 │ block = gr.Blocks() │
│ │
│ /content/drive/MyDrive/LangChain-ChatGLM-Webui/app.py:160 in init_model │
│ │
│ 157 │
│ 158 def init_model(): │
│ 159 │ # try: │
│ ❱ 160 │ │ knowladge_based_chat_llm.init_model_config() │
│ 161 │ │ print(knowladge_based_chat_llm.llm.call("你好")) │
│ 162 │ │ return """初始模型已成功加载,可以开始对话""" │
│ 163 │ # except Exception as e: │
│ │
│ /content/drive/MyDrive/LangChain-ChatGLM-Webui/app.py:79 in │
│ init_model_config │
│ │
│ 76 │ │ │ model_name=embedding_model_dict[embedding_model], ) │
│ 77 │ │ self.embeddings.client = sentence_transformers.SentenceTransfo │
│ 78 │ │ │ self.embeddings.model_name, device=EMBEDDING_DEVICE) │
│ ❱ 79 │ │ self.llm.load_llm(llm_device=LLM_DEVICE, num_gpus=num_gpus) │
│ 80 │ │
│ 81 │ def init_knowledge_vector_store(self, filepath): │
│ 82 │
│ │
│ /content/drive/MyDrive/LangChain-ChatGLM-Webui/chatllm.py:126 in load_llm │
│ │
│ 123 │ │ │ │ device_map: Optional[Dict[str, int]] = None, │
│ 124 │ │ │ │ **kwargs): │
│ 125 │ │ if 'chatglm' in self.model_name_or_path.lower(): │
│ ❱ 126 │ │ │ self.tokenizer = AutoTokenizer.from_pretrained(self.model

│ 127 │ │ │ │ │ │ │ │ │ │ │ │ │ trust_remote_co │
│ 128 │ │ │ if torch.cuda.is_available() and llm_device.lower().starts │
│ 129 │ │ │ │ # 根据当前设备GPU数量决定是否进行多卡部署 │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization │
│ _auto.py:692 in from_pretrained │
│ │
│ 689 │ │ │ │ raise ValueError( │
│ 690 │ │ │ │ │ f"Tokenizer class {tokenizer_class_candidate} does │
│ 691 │ │ │ │ ) │
│ ❱ 692 │ │ │ return tokenizer_class.from_pretrained(pretrained_model_na │
│ 693 │ │ │
│ 694 │ │ # Otherwise we have to be creative. │
│ 695 │ │ # if model is an encoder decoder, the encoder tokenizer class │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/tokenization_utils_base. │
│ py:1812 in from_pretrained │
│ │
│ 1809 │ │ │ else: │
│ 1810 │ │ │ │ logger.info(f"loading file {file_path} from cache at │
│ 1811 │ │ │
│ ❱ 1812 │ │ return cls._from_pretrained( │
│ 1813 │ │ │ resolved_vocab_files, │
│ 1814 │ │ │ pretrained_model_name_or_path, │
│ 1815 │ │ │ init_configuration, │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/tokenization_utils_base. │
│ py:1878 in _from_pretrained │
│ │
│ 1875 │ │ │ # For backward compatibility with odl format. │
│ 1876 │ │ │ if isinstance(init_kwargs["auto_map"], (tuple, list)): │
│ 1877 │ │ │ │ init_kwargs["auto_map"] = {"AutoTokenizer": init_kwar │
│ ❱ 1878 │ │ │ init_kwargs["auto_map"] = add_model_info_to_auto_map( │
│ 1879 │ │ │ │ init_kwargs["auto_map"], pretrained_model_name_or_pat │
│ 1880 │ │ │ ) │
│ 1881 │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/utils/generic.py:563 in │
│ add_model_info_to_auto_map │
│ │
│ 560 │ """ │
│ 561 │ for key, value in auto_map.items(): │
│ 562 │ │ if isinstance(value, (tuple, list)): │
│ ❱ 563 │ │ │ auto_map[key] = [f"{repo_id}--{v}" if "--" not in v else v │
│ 564 │ │ else: │
│ 565 │ │ │ auto_map[key] = f"{repo_id}--{value}" if "--" not in value │
│ 566 │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/utils/generic.py:563 in │
│ │
│ │
│ 560 │ """ │
│ 561 │ for key, value in auto_map.items(): │
│ 562 │ │ if isinstance(value, (tuple, list)): │
│ ❱ 563 │ │ │ auto_map[key] = [f"{repo_id}--{v}" if "--" not in v else v │
│ 564 │ │ else: │
│ 565 │ │ │ auto_map[key] = f"{repo_id}--{value}" if "--" not in value │
│ 566 │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: argument of type 'NoneType' is not iterable

Request: add a similarity-threshold setting for vector retrieval

Local documents are recognized, cleaned, segmented, and stored in the vector store; I convert the question into a vector and query the store for the nearest entries. When the store contains nothing relevant, it still returns unrelated junk. Could a similarity-threshold setting be added?
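
A sketch of what such a threshold could look like on top of LangChain's FAISS wrapper; the cutoff is arbitrary and depends on the embedding model, and FAISS returns L2 distances here, so smaller means more similar:

def filter_relevant(vector_store, query: str, k: int = 3, max_distance: float = 300.0):
    # similarity_search_with_score returns (Document, distance) pairs
    docs_and_scores = vector_store.similarity_search_with_score(query, k=k)
    relevant = [doc for doc, score in docs_and_scores if score <= max_distance]
    # an empty list lets the caller answer "nothing relevant found" instead of guessing
    return relevant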

About detectron2

Is the detectron2 entry in the requirements really necessary? (It appears to be used for image detection and segmentation.) It doesn't support CUDA 12 yet; can it be left out?

API service: after loading multiple files, information from the earlier file appears to be lost

Model: BELLE-LLaMA-13B-2M
Embedding model: text2vec-base
Referenced file: https://github.com/LawRefBook/Laws/raw/master/%E5%88%91%E6%B3%95/%E5%88%91%E6%B3%95.md

Request:

{
    "input": "在战场上故意遗弃伤病军人会受到什么样的判决?", 
    "use_web": true, 
    "top_k": 3,  
    "history_len": 1, 
    "temperature": 0.01, 
    "top_p": 0.1, 
    "history": []
  }
Response:

{
    "result": "根据已知信息,在战场上故意遗弃伤病军人的行为属于徇私枉法、徇情枉法,对明知是无罪的人而使他受追诉、对明知是有罪的人而故意包庇不使他受追诉,或者在刑事审判活动中故意违背事实和法律作枉法裁判的,处五年以下有期徒刑或者拘役;情节严重的,处五年以上十年以下有期徒刑;情节特别严重的,处十年以上有期徒刑。因此,在战场上故意遗弃伤病军人的行为将受到相应的刑事惩罚。",
    "error": "",
    "stdout": "根据已知信息,在战场上故意遗弃伤病军人的行为属于徇私枉法、徇情枉法,对明知是无罪的人而使他受追诉、对明知是有罪的人而故意包庇不使他受追诉,或者在刑事审判活动中故意违背事实和法律作枉法裁判的,处五年以下有期徒刑或者拘役;情节严重的,处五年以上十年以下有期徒刑;情节特别严重的,处十年以上有期徒刑。因此,在战场上故意遗弃伤病军人的行为将受到相应的刑事惩罚。"
}

langchain-serve integration

Hey, I'm a dev from langchain-serve!

Does the LangChain part of this project have any hosted/online scenarios?

If so, you might consider our product, which makes it easy to deploy langchain in the cloud:

  • Exposes APIs from function definitions locally as well as on the cloud.
  • Very few lines of code change; development stays as easy as it is locally.
  • Supports both REST & WebSocket endpoints.
  • Serverless/autoscaling endpoints with automatic TLS certs.
  • Real-time streaming and human-in-the-loop support.

Thanks!

The model reports loading succeeded, but sending a question returns an error

Setting CPU quantization kernel threads to 6
Using quantization cache
Applying quantization to glm layers
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/gradio/routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1075, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "app.py", line 198, in predict
print(file_obj.name)
AttributeError: 'NoneType' object has no attribute 'name'
^CKeyboard interruption in main thread... closing server.
