Comments (7)
I have already changed the string decoding error handler into an argument on the dev branch, in commit ee3128b.
from torrent_parser.
Thanks for your idea.
I will finish the custom hash fields API tomorrow and release a new version.
Because of the breaking change and the many things being added, it will be 0.3.0.
(And yes, in 0.x.x a breaking change doesn't require bumping the major version... I'm still considering when to reach 1.0 ⌛)
from torrent_parser.
v0.3.0 has just been released.
In this version, there are several ways to deal with this problem:
import torrent_parser as tp
file = 'tests/test_files/utf8.encoding.error.torrent'
# way 1
data = tp.parse_torrent_file(file, errors='ignore')
print(data['magnet-info']['info_hash'])
data = tp.parse_torrent_file(file, errors='replace')
print(data['magnet-info']['info_hash'])
# way 2
data = tp.parse_torrent_file(file, hash_fields={'info_hash': (20, False)})
print(data['magnet-info']['info_hash'])
# way 3
data = tp.parse_torrent_file(file, hash_fields={'info_hash': (20, False)}, hash_raw=True)
print(data['magnet-info']['info_hash'])
# If you don't use any of the options above
try:
    data = tp.parse_torrent_file(file)
except tp.InvalidTorrentDataException as e:
    print(e)
The output:
jysL
�j��y�sL�
36fd06b595119b380df46ab2f2a0b579b1734ca8
b'6\xfd\x06\xb5\x95\x11\x9b8\r\xf4j\xb2\xf2\xa0\xb5y\xb1sL\xa8'
Fail to decode string at pos 16436 using encoding utf-8 when parser field "info_hash", maybe it is an hash field. You can use self.hash_field("info_hash") to let it be treated as hash value, so this error may disappear
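Why the first four output lines differ can be reproduced with the standard library alone. The following is a minimal sketch, not torrent_parser's own code: it takes the 20 raw bytes printed above and applies Python's built-in UTF-8 error handlers and hex conversion to them. The variable names are only illustrative.

```python
# Raw 20-byte value of magnet-info.info_hash, copied from the output above.
raw = b'6\xfd\x06\xb5\x95\x11\x9b8\r\xf4j\xb2\xf2\xa0\xb5y\xb1sL\xa8'

# 'ignore' silently drops every byte sequence that is not valid UTF-8.
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' substitutes U+FFFD for every invalid sequence.
replaced = raw.decode('utf-8', errors='replace')

# The default handler, 'strict', raises instead of guessing.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Treating the value as a hash means no decoding at all, only hex encoding.
hex_digest = raw.hex()

print(repr(ignored))
print(repr(replaced))
print(hex_digest)  # 36fd06b595119b380df46ab2f2a0b579b1734ca8
print(strict_failed)
```

Note that the decoded strings still contain a carriage return (`\r`, byte 0x0d), which is presumably why a terminal shows only the tail of each string (jysL for 'ignore'). Either way, only the hex or raw-bytes form preserves the actual hash.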
The hash_field("info_hash") method is added to the classes:
with open(file, 'rb') as f:
    data = tp.TorrentFileParser(f).hash_field('info_hash').parse()
print(data['magnet-info']['info_hash'])
# 36fd06b595119b380df46ab2f2a0b579b1734ca8
with open(file, 'rb') as f:
    data = tp.BDecoder(f.read()).hash_field('info_hash').decode()
print(data['magnet-info']['info_hash'])
# 36fd06b595119b380df46ab2f2a0b579b1734ca8
from torrent_parser.
I just merged your PR #5, but I came up with some ideas just now, and want to discuss them with you (and others).
- I noticed the error happens because there is a field, magnet-info.info_hash, which doesn't appear to be a string; it is a hash value. I'm wondering whether I should add it to the list of fields whose members are treated as hashes automatically. (see line 108 and 189)
- The decoding error handler will become an option of the TorrentFileParser class and the parse_torrent_file shortcut function. Its default behavior will not change; that is, its default value will be strict. You can use ignore or replace to avoid the exception if you wish. But if I add info_hash to that list, your error will disappear automatically. So I think using strict as the error handler and adding a try/except to skip REALLY invalid torrents is the best approach.
- I can add a method to TorrentFileParser and TorrentFileCreator to let users add their own hash value fields to that list. Meanwhile, the error message for a string decoding error will suggest using this method to add a custom hash field to the list. But I'm wondering whether it is worth doing. And if I decide to do this, your magnet-info.info_hash will not be added to the list by default.
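The "strict plus try/except" approach from the second point could look roughly like the sketch below. It is an illustration only: parse_all is a hypothetical helper, not part of torrent_parser, and it takes the parse function and the exception type as arguments so that it makes no assumptions about the library beyond an errors keyword.

```python
from typing import Any, Callable, Iterable, List, Tuple, Type


def parse_all(
    paths: Iterable[str],
    parse: Callable[..., Any],
    invalid_error: Type[Exception] = Exception,
    errors: str = "strict",
) -> Tuple[List[Tuple[str, Any]], List[Tuple[str, str]]]:
    """Parse every path, skipping the ones the parser rejects.

    Hypothetical helper: `parse` is called as parse(path, errors=errors),
    and any `invalid_error` it raises marks that file as skipped.
    """
    parsed: List[Tuple[str, Any]] = []
    skipped: List[Tuple[str, str]] = []
    for path in paths:
        try:
            parsed.append((path, parse(path, errors=errors)))
        except invalid_error as exc:
            # Keep the reason so rejected files can be inspected later.
            skipped.append((path, str(exc)))
    return parsed, skipped
```

With the released API shown earlier in this thread, this would presumably be called as parse_all(files, tp.parse_torrent_file, tp.InvalidTorrentDataException). A crawler that prefers to keep as many files as possible could pass errors='replace' instead of the strict default.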
Waiting for your thoughts. (Only for 1 day; after that I will do it the way I like.)
from torrent_parser.
- Yes. Maybe the field was created by an obscure client or private torrent index.
- For general use, I think your suggestion of passing 'strict' as the errors argument of .decode() is okay.
  But for my use case, giving me the option to pass my own argument would be perfect. Fault tolerance is a desirable quality in a crawler. I need the 'ignore' or 'replace' flag, as I wish to collect as many files as possible.
  Given the scale of my operation, such errors are bound to happen, and I might lose out on thousands of potentially working torrents. I have 87 torrents with the same magnet-info.info_hash issue right now. As long as the torrent works at the minimum, I add it.
- Yes, it might be useful to a small percentage of users. If it is not too much work, add it and document it.
To conclude, I think that if you add different options for achieving different goals (as long as you write good tests and documentation), your library will appeal to a broader audience.
Don't lock out a subset of users. If you need help, tag your Kanban board with "help wanted".
Thank you.
from torrent_parser.
No problem, take your time.
from torrent_parser.
Very good and much appreciated. I think we can go ahead and close the issue.
from torrent_parser.