Comments (8)

uranusjr commented on July 30, 2024

Ah, right, I forgot about paths. Falling back with a deprecation warning sounds like the way to go.

from pip.

matthewhughes934 commented on July 30, 2024

Are you able to share the contents of the requirements.txt file you were using?

danerlt commented on July 30, 2024

@matthewhughes934
The contents of my requirements.txt are as follows:

# server
supervisor==4.2.5
gunicorn==21.2.0
gevent==23.9.1

# web
Werkzeug==2.3.7
celery==5.2.7
click==8.1.7
dataclasses_json==0.6.4
Flask==2.3.3
Flask_Cors==3.0.10
Flask_Login==0.6.2
Flask_Migrate==4.0.5
Flask_RESTful==0.3.9
flask_sqlalchemy==3.0.5
SQLAlchemy==2.0.0
minio==7.2.4
psycopg2-binary==2.9.9
python-dotenv==1.0.1
redis==5.0.2
requests==2.31.0

# rag
langchain==0.1.16
llama-index==0.10.30
llama-index-core==0.10.30  # 这个必须手指定,不然构建的时候会去获取最新的版本,可能会有bug。
llama-index-retrievers-bm25==0.1.3
llama-index-storage-index-store-redis==0.1.2
llama-index-storage-kvstore-redis==0.1.3
llama-index-storage-docstore-mongodb==0.1.3
llama-index-vector-stores-milvus==0.1.10
llama-index-vector-stores-qdrant==0.2.5
llama-parse==0.4.1
rank-bm25==0.2.2
ragas==0.1.1
qdrant-client==1.9.0
pymongo==4.6.3
motor==3.4.0
asyncpg==0.29.0
spacy==3.7.4
jieba==0.42.1
./zh_core_web_sm-3.7.0-py3-none-any.whl
scikit-learn==1.4.2


# data loader 相关依赖
pypdf==4.2.0
pdfminer-six==20231228
PyMuPDF==1.24.2
docx2txt==0.8
python-docx==1.1.0
openpyxl==3.1.2

# 评估相关
dashscope==1.19.2
zhipuai==2.1.0

danerlt commented on July 30, 2024

@matthewhughes934
I modified pip\_internal\utils\encoding.py and added the "ignore" error handler to its data.decode call, which resolved the issue.

uranusjr commented on July 30, 2024

It’s probably best to always use ascii with replace. We only allow ASCII in requirements, and anything else (e.g. comments) is ignored by the parser anyway.

A PR would be much welcomed.
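A minimal sketch of what "ascii with replace" does to one line of the file above. The helper name `decode_ascii_replace` is hypothetical, and splitting on `#` is a simplification of comment handling rather than pip's actual parser:

```python
# Illustrative only: this is not pip's auto_decode implementation.
def decode_ascii_replace(data: bytes) -> str:
    """Decode bytes as ASCII, turning every non-ASCII byte into U+FFFD."""
    return data.decode("ascii", errors="replace")


# A requirement followed by a two-character Chinese comment, stored as UTF-8.
raw = "llama-index-core==0.10.30  # 这个\n".encode("utf-8")
decoded = decode_ascii_replace(raw)

# Simplified comment stripping (assumption: a real parser is stricter).
requirement = decoded.split("#", 1)[0].strip()
print(requirement)
```

The requirement itself survives because it is pure ASCII; only the comment bytes are replaced.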

matthewhughes934 commented on July 30, 2024

> I modified pip\_internal\utils\encoding.py and added the ignore parameter to its data.decode method, which resolved the issue.

I guess the underlying issue is that the file is UTF-8 encoded, but you're working in an environment with a Simplified Chinese locale, so GBK is used for decoding. An alternative solution would be to run Python in UTF-8 mode (https://docs.python.org/3/using/windows.html#utf-8-mode).
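To see the mismatch in isolation, here is a small sketch that decodes the same UTF-8 bytes with both codecs. The helper `try_decode` is hypothetical, and the GBK result depends on the exact bytes: it can be either a decode error or silently wrong text (mojibake), so the code reports whichever happens instead of assuming one.

```python
def try_decode(data: bytes, encoding: str):
    """Return (True, text) on success or (False, error message) on failure."""
    try:
        return True, data.decode(encoding)
    except UnicodeDecodeError as exc:
        return False, str(exc)


# One line of the requirements file, saved as UTF-8.
line = "llama-index-core==0.10.30  # 这个必须手指定\n"
data = line.encode("utf-8")

print("utf-8:", try_decode(data, "utf-8"))
print("gbk:  ", try_decode(data, "gbk"))
```

UTF-8 mode can be enabled with `python -X utf8` or by setting the `PYTHONUTF8=1` environment variable (Python 3.7 and later).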

matthewhughes934 commented on July 30, 2024

> It’s probably best to always use ascii with replace. We only allow ASCII in requirements, and anything else (e.g. comments) is ignored by the parser anyway.
>
> A PR would be much welcomed.

👍 happy to get a PR up. I'm wondering two things:

  • If I change auto_decode: are there places where we want decoding to fail (per errors="strict"), or would it be OK to always replace? Or is there code elsewhere that should be changed instead?
  • 🤔 Is there any potential for issues with multi-byte or non-ASCII-extended encodings? I have no idea how common these might be, but one consequence could be that instead of getting a 'failed to decode' error you get an error about pip failing to install a package named "����".

pfmoore commented on July 30, 2024

> We only allow ASCII in requirements, and anything else (e.g. comments) is ignored by the parser anyway.

Unfortunately, requirements aren't the only things in a requirement file. --requirement <path to file to include> could include arbitrary Unicode characters, and for that matter a simple local pathname is valid (and could be Unicode).
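As a rough illustration of the path problem, the sketch below decodes a requirement line that is a local path with a non-ASCII directory. The directory name `本地包` is made up for the example; the wheel filename is the one from the file above.

```python
# Hypothetical requirement line: a local wheel inside a non-ASCII directory.
path_line = "./本地包/zh_core_web_sm-3.7.0-py3-none-any.whl"
data = path_line.encode("utf-8")

# Decoding as ASCII with "replace" never fails, but it destroys the path.
mangled = data.decode("ascii", errors="replace")

# Decoding as UTF-8 recovers the original path.
intact = data.decode("utf-8")

print(mangled)
print(intact)
```

With ASCII plus replace, the resulting path no longer names any real directory, so the install would fail later with a less obvious error.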

However, the documentation states that requirement files should be UTF-8 by default, so this seems like a simple bug in auto_decode - https://github.com/pypa/pip/blob/main/src/pip/_internal/utils/encoding.py#L35 should be using UTF-8. (And arguably the BOM detection in there is in violation of the spec, but IMO it's not worth changing).

Of course, even though this is technically a bug fix, it is still potentially a breaking change, so we need to consider how we handle that. (We could fall back to the system encoding if UTF-8 fails, with a deprecation warning. This won't avoid mojibake, but it will catch outright encoding failures.)
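The fallback idea could look roughly like the sketch below. This is not pip's code: the function name `decode_requirements`, the `fallback_encoding` parameter, and the use of `warnings.warn` in place of pip's own deprecation machinery are all assumptions made for illustration.

```python
import locale
import warnings
from typing import Optional


def decode_requirements(data: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode as UTF-8; on failure, warn and retry with the system encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the locale's preferred encoding stands in for
        # "the system encoding" unless the caller overrides it.
        encoding = fallback_encoding or locale.getpreferredencoding(False)
        warnings.warn(
            f"Requirements file is not valid UTF-8; falling back to {encoding}. "
            "This fallback is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        return data.decode(encoding)


# A UTF-8 file decodes directly, with no warning.
print(decode_requirements("# 评估相关\ndashscope==1.19.2\n".encode("utf-8")))
```

As noted above, the fallback only triggers on outright decode errors. Bytes in another encoding that happen to be valid UTF-8 would still decode to the wrong text without any warning.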
