Comments (7)
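@AnitaSherry Hi, thanks for your interest in and use of the project!
Because of limitations in the simhash library itself, its original codebase can only be installed and used with Python 3.8 and below. There are currently two ways to resolve the installation failure:
- Use a Python environment that meets this requirement, for example with the method suggested by @fuxuelinwudi. Thanks to @fuxuelinwudi for the help!
- Install from source, and change the installation source of simhash-py from the original repository to https://github.com/hylcool/simhash-py , a fork in which we made several modifications to the original code so that it is compatible with Python 3.9 and above.
Hope this helps. Feel free to ask if you have any further questions.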
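The choice between the two options above can be sketched in Python. This is only an illustration: the helper name `pick_simhash_requirement` is hypothetical, and the `git+https://` requirement form assumes pip can build the fork directly from its repository, which this thread does not confirm.

```python
import sys

# Fork mentioned above, described as compatible with Python 3.9 and later.
FORK_URL = "https://github.com/hylcool/simhash-py"


def pick_simhash_requirement(version=None):
    """Return the pip requirement string suggested for a Python version.

    `version` is a (major, minor) tuple; it defaults to the running interpreter.
    """
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) <= (3, 8):
        # The original simhash-py is reported to install on Python 3.8 and below.
        return "simhash-py"
    # Assumption: pip can build the fork straight from its git URL.
    return "git+" + FORK_URL


if __name__ == "__main__":
    print(pick_simhash_requirement())
```

The returned string is meant to be passed to `pip install`; on Python 3.8 the first option (a dedicated environment) needs no change to the requirement at all.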
from data-juicer.
Creating a conda environment with python=3.8 solves this.
from data-juicer.
Installing from source from https://github.com/hylcool/simhash-py also fails.
`
(sakura) kemove@kemove-Super-Server:/data/competition/competition_kit/data-juicer/simhash-py-master$ pip install .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing /data/competition/competition_kit/data-juicer/simhash-py-master
Preparing metadata (setup.py) ... done
Building wheels for collected packages: simhash-py
Building wheel for simhash-py (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [108 lines of output]
Building from Cython
/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/dist.py:723: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
warnings.warn(
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.9
creating build/lib.linux-x86_64-3.9/simhash
copying simhash/__init__.py -> build/lib.linux-x86_64-3.9/simhash
running build_ext
Compiling simhash/simhash.pyx because it changed.
[1/1] Cythonizing simhash/simhash.pyx
/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/Cython/Compiler/Main.py:381: FutureWarning: Cython directive 'language_level' not set, using '3str' for now (Py3). This has changed from earlier releases! File: /data/competition/competition_kit/data-juicer/simhash-py-master/simhash/simhash.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
Error compiling Cython file:
------------------------------------------------------------
...
import hashlib
import struct
from simhash cimport compute as c_compute
^
------------------------------------------------------------
simhash/simhash.pyx:4:0: 'simhash/compute.pxd' not found
Error compiling Cython file:
------------------------------------------------------------
...
import hashlib
import struct
from simhash cimport compute as c_compute
from simhash cimport find_all as c_find_all
^
------------------------------------------------------------
simhash/simhash.pyx:5:0: 'simhash/find_all.pxd' not found
warning: simhash/simhash.pyx:15:0: Overriding cdef method with def method.
warning: simhash/simhash.pyx:19:0: Overriding cdef method with def method.
Error compiling Cython file:
------------------------------------------------------------
...
# Unpacks the binary bytes in digest into a Python integer
return struct.unpack('>Q', digest)[0] & 0xFFFFFFFFFFFFFFFF
def compute(hashes):
'''Compute the simhash of a vector of hashes.'''
return c_compute(hashes)
^
------------------------------------------------------------
simhash/simhash.pyx:17:11: 'c_compute' is not a constant, variable or function identifier
Error compiling Cython file:
------------------------------------------------------------
...
Find the set of all matches within the provided vector of hashes.
The provided hashes are manipulated in place, but upon completion are
restored to their original state.
'''
cdef matches_t results_set = c_find_all(hashes, number_of_blocks, different_bits)
^
------------------------------------------------------------
simhash/simhash.pyx:26:33: 'c_find_all' is not a constant, variable or function identifier
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/data/competition/competition_kit/data-juicer/simhash-py-master/setup.py", line 37, in <module>
setup(
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 148, in setup
return run_commands(dist)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
dist.run_commands()
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
self.run_command(cmd)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 364, in run
self.run_command("build")
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
cmd_obj.run()
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/Cython/Distutils/build_ext.py", line 122, in build_extension
new_ext = cythonize(
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
cythonize_one(*args)
File "/home/kemove/anaconda3/envs/sakura/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 1301, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: simhash/simhash.pyx
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for simhash-py
Running setup.py clean for simhash-py
Failed to build simhash-py
ERROR: Could not build wheels for simhash-py, which is required to install pyproject.toml-based projects
WARNING: There was an error checking the latest version of pip.
`
from data-juicer.
Many projects now require Python 3.11. Why is this still stuck on Python 3.8? That is far too outdated.
from data-juicer.
In environments/science_requires.txt, remove simhash-py and replace it with simhash. After that, the installation succeeds.
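The edit described above can be sketched as a small Python helper. The function name `swap_simhash_requirement` is hypothetical, and the sketch assumes the requirement sits on a line of its own in the file. Note that `simhash` and `simhash-py` are separate packages on PyPI, so this also assumes the rest of the project works with the replacement, which the comment does not spell out.

```python
import re

# Matches a requirement line that names exactly `simhash-py`
# (optionally followed by a version specifier), not e.g. `simhash-pyx`.
_SIMHASH_PY = re.compile(r"^\s*simhash-py(?![\w.-])")


def swap_simhash_requirement(lines):
    """Replace a `simhash-py` requirement line with `simhash`, keeping the rest."""
    swapped = []
    for line in lines:
        if _SIMHASH_PY.match(line):
            # Any version specifier on the old line is dropped, since it
            # would not apply to a different package.
            swapped.append("simhash")
        else:
            swapped.append(line)
    return swapped


if __name__ == "__main__":
    example = ["numpy", "simhash-py", "tqdm"]
    print(swap_simhash_requirement(example))
```

To apply it to the real file, read the file with `splitlines()`, pass the lines through the helper, and write them back joined by newlines.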
from data-juicer.
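> Many projects now require Python 3.11. Why is this still stuck on Python 3.8? That is far too outdated.
Since this is caused by a third-party library, we do not have a good fix on our side at the moment; the author no longer maintains that library. Going forward, we will look into whether we can replace it with another dependency or implement this computation ourselves.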
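For reference, a self-implemented version of the computation could look roughly like the following pure-Python sketch. It is not the library's code and not the data-juicer implementation: it uses the common per-bit majority-vote formulation of simhash over 64-bit hashes, and the tie handling, the MD5-based `unsigned_hash`, and the helper name `num_differing_bits` are assumptions made for illustration.

```python
import hashlib
import struct

HASH_BITS = 64


def unsigned_hash(data: bytes) -> int:
    """Map bytes to an unsigned 64-bit integer (assumption: first 8 bytes of MD5)."""
    digest = hashlib.md5(data).digest()[:8]
    return struct.unpack(">Q", digest)[0]


def compute(hashes):
    """Combine 64-bit hashes into one fingerprint by per-bit majority vote."""
    tallies = [0] * HASH_BITS
    for h in hashes:
        for bit in range(HASH_BITS):
            # A set bit votes +1 for that position, a clear bit votes -1.
            tallies[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, tally in enumerate(tallies):
        if tally > 0:  # ties leave the bit unset (an assumption of this sketch)
            fingerprint |= 1 << bit
    return fingerprint


def num_differing_bits(a: int, b: int) -> int:
    """Hamming distance between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    tokens = [b"data", b"juicer", b"simhash"]
    print(hex(compute([unsigned_hash(t) for t in tokens])))
```

Two documents would then count as near-duplicates when `num_differing_bits` of their fingerprints falls below a chosen threshold. A pure-Python loop like this is much slower than a compiled extension, so it is a starting point rather than a drop-in replacement.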
from data-juicer.
I ran into this problem before as well. Try downgrading Cython: after switching to Cython==0.29.21, I was able to install simhash-py.
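The workaround above amounts to two pip invocations, sketched here in Python. The helper name `build_install_commands` is hypothetical, and the sketch assumes the simhash-py build uses the Cython already installed in the active environment, so Cython has to be pinned first.

```python
import shlex
import sys


def build_install_commands(cython_version="0.29.21", python=sys.executable):
    """Return the two pip invocations: pin Cython first, then build simhash-py."""
    pip = [python, "-m", "pip", "install"]
    return [
        pip + [f"Cython=={cython_version}"],
        pip + ["simhash-py"],
    ]


if __name__ == "__main__":
    # Only print the commands; run them yourself (for example with
    # subprocess.run(cmd, check=True)) once they look right.
    for cmd in build_install_commands():
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch free of side effects; the version 0.29.21 is the one reported to work in the comment above, and other versions are untested here.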
from data-juicer.