We can close this now, as you helped us even more than I asked for!
Thanks so much!
from node-yencode.
I remember having a quick look at it in the past. The main blocker was that I don't know how the Python build system supports setting different compiler flags for different files. I'm guessing this might help; perhaps the code has to be split up into separate modules to get different compiler flags.
Some sort of compiler detection could be useful (so that it still builds on those using older compilers); at the very least, different flags need to be given to x86/ARM builds, though if it's a Python script, I suppose detection can easily be done there.
Essentially, translating binding.gyp to the Python equivalent (setuptools?) would be the main thing. After that, it's just writing wrappers to whatever decode flavour you're after (whether you need dot unstuffing or end-of-article detection).
Indeed, I understand that those things need to happen, but I feel lost in the compile errors I'm getting. They're probably obvious to you, since you've dealt with compiling a lot, but for a Python person it's a whole different universe.
So I'm hoping for your help!
If it matters in any way, we could also donate some of our advertising income to you for the effort! We would be very willing to do so, because this will be a core part of our application.
Thanks for the offer, but I don't accept those.
What are you ultimately looking for? A simple modification to your existing module, or something new?
How do you intend to process the data? Do you need dot unstuffing? Scanning for end of article? Does the module need to be redistributable in binary form?
I presumed you'd have more knowledge around creating a Python module? That's something I have no experience with yet.
I want to be able to call your do_decode function on a raw Python yEnc string that I got from Usenet. So it's not in chunks, but the whole string is present already. It does need all the processing indeed, but that's already in the function as far as I could tell.
My knowledge is not much beyond using Python C extension template projects found on Github.
As you may have figured already, it's not really as simple as that. Or rather, calling the function is fairly straightforward (call do_decode_end with the source data buffer, destination data buffer, length of source data, and current yEnc state). The difficulty is getting it to compile, as that requires the appropriate flags to be set for each file.
(you said the 'whole string' is present already, which I'm assuming doesn't represent the whole article, otherwise you'd need something which scans for where the article ends)
Do you already have some code/setup which tries to call the function?
I will gather my failed attempts and create 1 project to share so maybe you can see the problem.
I see - so this is a fresh project. I thought you may have been trying to adopt SABYenc.
I can take a look at a Python port of this module at some point.
Eventually it's intended to replace or be incorporated into Sabyenc, but trying to implement it in there directly was giving me unrelated errors that were too hard for me to separate.
As such I wanted to get back to basics and first make it compile at all for Python.
Eventually it's intended to replace or be incorporated into Sabyenc
If that's the case, do you intend to keep it working as it currently does? As in, there's something else which identifies the end of article, assembles an array of strings, then passes it to the decoder? Or would you rather have the yEnc decoder itself detect the end of article, which means the network receiver would pass data directly to it without aggregating into an array?
(Sorry if I'm approaching this differently to how you would - I mostly try to see the end goal and see how to get there instead of taking small steps)
Currently we can only have 1 thread that handles all connections, because the Python GIL only allows 1 active thread at a time. So we use a select to see which sockets have waiting data, read that with the 1 thread and check for end of article. If it's not the end of the article, we append the new raw data to an array of chunks. If we find the end of the article, we pass all the chunks of data to the decoder threads. In the decoder thread we call the C extension, which can release the GIL and thus decode multiple articles at once.
It's a bit of a long introduction to say: decoding while receiving data (to also detect end of article) would probably hit performance, since we then can decode only 1 article simultaneously.
To have multiple sockets being read at the same time, we would also have to move that to C extension. The benefit of Python is that we don't have to care about any of the platform specific socket and IO/OpenSSL stuff, so it's very convenient and safe to have it like this.
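The receive loop described here could be sketched roughly like this (a simplified, hypothetical stand-in: it uses a local socketpair instead of real NNTP connections, stops reading a socket after one article, and the end-of-article check is reduced to a naive substring test):

```python
import select
import socket

END_MARKER = b"\r\n.\r\n"  # NNTP end-of-article sequence

def receive_articles(socks):
    """Single-threaded select loop: append raw chunks per socket until the
    end-of-article marker is seen, then hand the chunk list over (in the
    real setup, to a pool of decoder threads)."""
    chunks = {s: [] for s in socks}
    articles = []
    while socks:
        readable, _, _ = select.select(socks, [], [], 1.0)
        for s in readable:
            data = s.recv(16384)
            if not data:            # connection closed
                socks.remove(s)
                continue
            chunks[s].append(data)
            # naive end-of-article test: misses a marker split over chunks
            if END_MARKER in data:
                articles.append(chunks[s])
                chunks[s] = []
                socks.remove(s)     # simplified: one article per socket
    return articles

a, b = socket.socketpair()
b.sendall(b"=ybegin ...=yend\r\n.\r\n")
result = receive_articles([a])
```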
NodeJS is also single threaded - the typical approach taken is that tasks can be processed asynchronously.
I believe Python also has async support, but am not sure of requirements or whether it's being used. But a possible approach would be to receive data on a socket, then send it to be asynchronously processed, whilst data from another socket is received. The async part can operate outside of the GIL because it doesn't need to interact with Python objects.
From the sounds of it, it's not the approach currently being used, but I don't know if there's any intent to adopt something like this.
The only downside of the current method is that the end-of-article scan is completely single threaded (and blocks the main thread), but presumably not much of an issue to really matter.
Am I safe to assume that we just need to replace the existing yEnc decode function in SABYenc as it is right now?
I did some digging last year and found that "async" code in Python is not really asynchronous; it only is if the work is written in C and the GIL is specifically released there. And there's almost no documentation on how to properly implement the C part so it can be used with the async keywords.
End of article is not really that hard to scan for in the chunked approach: https://github.com/sabnzbd/sabnzbd/blob/c3d826df8c41221e0bd57a5e978756e1e0e81e7e/sabnzbd/newswrapper.py#L204
As the data is delivered in chunks, we have to do some logic to jump between chunks: https://github.com/sabnzbd/sabyenc/blob/89ef55dd6d9175589d9dec25acdedd541ea62d75/src/sabyenc3.c#L239
But I think I can implement that, once there is a base project that compiles and can run on a whole string that's not in chunks. Didn't want to bother you with our specific implementation details :)
I see. It also looks like asyncio requires most of the project to be async as well, which might be a big change.
It can probably be done without native async, but probably not worth the effort.
I'll find some time to take a look at SABYenc to get a better idea of how to approach this.
End of article is not really that hard to scan for in the chunked approach
Quick scan of the code, so could be wrong here, but does it mean that the end sequence won't be detected if it's sent in three or more chunks? Also, if the server sends the end sequence, followed by something else (say, a quit message), it could get missed?
If I'm correct, and it's never really been an issue, I guess it doesn't matter.
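For what it's worth, catching a marker split across any number of chunks only needs a few bytes of carried-over state; a rough Python sketch (the helper name is made up):

```python
END_MARKER = b"\r\n.\r\n"

def find_end(prev_tail: bytes, chunk: bytes):
    """Check whether the end-of-article marker appears in this chunk,
    including when it straddles a chunk boundary. Returns (found, new_tail);
    carry new_tail into the next call."""
    window = prev_tail + chunk
    found = END_MARKER in window
    # keep the last len(marker)-1 bytes so a split marker is still caught,
    # no matter how many chunks it is spread over
    return found, window[-(len(END_MARKER) - 1):]

# marker split over three chunks, followed by more data (e.g. a quit message)
tail = b""
results = []
for chunk in (b"data\r\n", b".", b"\r\nQUIT\r\n"):
    found, tail = find_end(tail, chunk)
    results.append(found)
```

Because the scan runs over the carried tail plus the new chunk, it also finds a marker that is immediately followed by other data in the same chunk.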
As the data is delivered in chunks, we have to do some logic to jump between chunks
The decoder supports incremental processing - you just have to maintain a parsing state between calls (or chunks, in this case).
The decoder doesn't scan for null bytes though, but it looks like PyBytes_AsStringAndSize can be used to get the size.
Quick scan of the code, so could be wrong here, but does it mean that the end sequence won't be detected if it's sent in three or more chunks? Also, if the server sends the end sequence, followed by something else (say, a quit message), it could get missed?
Indeed, it has never happened that it's split between more than 2 chunks.
Currently the unit tests of sabyenc are broken, would it help if I fixed them?
The unit tests seem to run fine for me?
The only thing I found was the speed test fails on Python 3, but that's an easy fix.
I've been looking at implementing this, but a few things I noticed:
- it looks like a search for crc32= is done as opposed to pcrc32= (or probably more accurately, [space]pcrc32=). If the article contains both 'pcrc32' and 'crc32', wouldn't it be possible for this to be confused?
- I'm not sure where the article size is determined, but is there any reason to trust what's specified, and not just use the actual data size all the time? Decoding yEnc can never generate output larger than the input, so it seems like a sensible allocation strategy.
- might be related to above, but what's with the 10% extra reservation?
- the attempt to correct a bad part size doesn't make much sense to me - if it's bad, shouldn't it just simply be rejected?
- bit flipping the CRC here is weird (usually you do it on the calculated CRC, not the declared one) - do the upper layers require knowing the CRC specified by the article? Because if they do, they'd be getting an incorrect value (or 0 if CRC_CHECK isn't defined, or crc32= was never found in the line)
The unit tests seem to run fine for me?
They fail on some platforms when I tried to run them on GitHub Actions, but good thing they work on yours 👍
it looks like a search for crc32= is done as opposed to pcrc32= (or probably more accurately, [space]pcrc32=). If the article contains both 'pcrc32' and 'crc32', wouldn't it be possible for this to be confused?
Probably, I don't think we ever had those cases where both were present.
I'm not sure where the article size is determined, but is there any reason to trust what's specified, and not just use the actual data size all the time? Decoding yEnc can never generate output larger than the input, so it seems like a sensible allocation strategy.
might be related to above, but what's with the 10% extra reservation?
Since we have a per-article-size-indication from the NZB, we use that to save the extra looping over all chunks to find the "real" length. Think it made a few % performance difference to do this.
the attempt to correct a bad part size doesn't make much sense to me - if it's bad, shouldn't it just simply be rejected?
Good question, probably just a safe-guard "to be sure".
bit flipping the CRC here is weird (usually you do it on the calculated CRC, not the declared one) - do the upper layers require knowing the CRC specified by the article? Because if they do, they'd be getting an incorrect value (or 0 if CRC_CHECK isn't defined, or crc32= was never found in the line)
The upper layers only care if the CRC is correct, not what it was really.
As you can see, many things in this code are based more on real-world tweaks to squeeze out some extra performance than on strictly following the yEnc standard per se. Combine that with my fairly limited C knowledge and the result is not-so-nice code.
I wanted to note something specifically on the CRC: after much testing with real users, we found that it was never a single flipped bit; it was always whole chunks of data missing. So calculating the CRC was just an expensive extra step relative to just comparing the sizes.
So for a number of years already, the production release hasn't calculated the CRC and just uses the size comparison.
Probably, I don't think we ever had those cases where both were present.
Nyuu definitely writes both, but puts the pcrc32 first, so it'd work because of that ordering. The yEnc spec example also puts pcrc32 before crc32, so not likely an issue in reality, but I'll change it to pcrc32 as it should be more accurate with basically no downside.
I recall most uploaders not ever including 'crc32', so your experience makes sense.
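As an illustration of that pcrc32-first search with a crc32 fallback (a hypothetical helper, not the module's actual parser; note how the leading space keeps " crc32=" from matching inside "pcrc32="):

```python
import re

def parse_crc(yend_line: bytes):
    """Extract the declared part CRC from an =yend line: prefer pcrc32=,
    fall back to crc32= if no pcrc32 is present. Returns an int or None.
    Illustrative sketch only."""
    m = re.search(rb" pcrc32=([0-9a-fA-F]{1,8})", yend_line)
    if m is None:
        # the space before "crc32=" prevents matching inside "pcrc32="
        m = re.search(rb" crc32=([0-9a-fA-F]{1,8})", yend_line)
    return int(m.group(1), 16) if m else None

line = b"=yend size=128 part=1 pcrc32=abcdef12 crc32=12345678"
```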
Since we have a per-article-size-indication from the NZB, we use that to save the extra looping over all chunks to find the "real" length. Think it made a few % performance difference to do this.
Ah I see.
I'm surprised it'd make that much of a difference though, particularly since I'd generally expect most articles to arrive in a few chunks only.
Whilst it may seem slower on the surface, it actually allows you to remove this check, which would have a much greater impact on performance, since that's checked ~700K times per article.
This module similarly skips such a check (for performance reasons), requiring having sufficiently large buffers, so I'd need to remove trusting the NZB value here.
The upper layers only care if the CRC is correct, not what it was really.
I can see it being useful if you want to verify crc32 in addition to pcrc32.
But otherwise, I guess I can remove it then - I'll keep the interface the same, but always return 0 for the CRC (which already happens since CRC is disabled by default).
I wanted to note specifically on the CRC: after much testing with real users, it was found that never some char-bit was flipped but always just chunks of data was missing. So calculating the CRC was just an expensive extra step relative to just comparing the sizes.
That sounds very reasonable, though personally I feel uneasy about skipping the CRC check. What you decide to do is your decision in the end, though the CRC32 performance of this module should be better, so hopefully will encourage you to reconsider.
A few more things:
- it looks like this could fail to advance far enough, if there happens to be a chunk boundary in the middle. It's unlikely for ypart to ever include more than a begin/end property, so I suppose it's not an issue in reality
- this assumes the input is a bytearray, but everywhere else assumes bytes object. I assume the latter is correct?
- the CRC checked in the encoder test is bit-inverted. This actually means the encoder is generating the wrong CRC, though I doubt Sabnzbd really uses an encoder much. I did find this case though, so I don't know if fixing the CRC would break something
For handling multiple chunks, I'd have made a small flat buffer (say, around 4KB as the ybegin/ypart lines should fit within that) and filled that with data from the chunks, before processing. This saves you from having to implement complex logic to deal with data being split across chunks every time you try to parse something, and it also works in the rare case that something spans across three or more chunks.
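The flat-buffer idea might look something like this in Python (sizes and the example header lines are illustrative):

```python
def fill_header(chunks, size=4096):
    """Copy data from the front of a chunk list into one small flat buffer,
    so the =ybegin/=ypart lines can be parsed without any cross-chunk
    logic - even if a line spans three or more chunks. Sketch only."""
    buf = bytearray(size)
    pos = 0
    for chunk in chunks:
        take = min(len(chunk), size - pos)
        buf[pos:pos + take] = chunk[:take]
        pos += take
        if pos == size:
            break
    return bytes(buf[:pos])

chunks = [
    b"=ybegin part=1 line=128 ",
    b"size=500000 name=file.bin\r\n",
    b"=ypart begin=1 end=500000\r\n",
]
header = fill_header(chunks)
```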
I'm surprised it'd make that much of a difference though, particularly since I'd generally expect most articles to arrive in a few chunks only.
Here's another fun Python fact: when SSL is enabled for the Usenet connection, Python delivers each chunk at the TLS fragment size of 16k, resulting in a minimum of around 40 chunks for SSL connections. This is something that will hopefully be addressed in a future version of Python: python/cpython#25478
Whilst it may seem slower on the surface, it actually allows you to remove this check, which would have a much greater impact on performance, since that's checked ~700K times per article.
This module similarly skips such a check (for performance reasons), requiring having sufficiently large buffers, so I'd need to remove trusting the NZB value here.
👍
this assumes the input is a bytearray, but everywhere else assumes bytes object. I assume the latter is correct?
Indeed, should be bytes.
the CRC checked in the encoder test is bit-inverted. This actually means the encoder is generating the wrong CRC, though I doubt Sabnzbd really uses an encoder much. I did find this case though, so I don't know if fixing the CRC would break something
I should fix that then!
For handling multiple chunks, I'd have made a small flat buffer (say, around 4KB as the ybegin/ypart lines should fit within that) and filled that with data from the chunks, before processing. This saves you from having to implement complex logic to deal with data being split across chunks every time you try to parse something, and it also works in the rare case that something spans across three or more chunks.
Great! The current code is so clunky..
I've got a version of sabyenc with yencode integrated here, if you want to play around with it: https://github.com/animetosho/sabyenc/tree/integrate_yencode
There's one major issue remaining: I can't see how to define different compile options for particular files, or the ability to define dependent C libraries in setuptools. I've worked around this by invoking the compiler manually, which seems to work, though probably isn't the nicest solution.
In particular, it currently executes before setup, so compiling dependencies will always run whenever setup.py is invoked (even if not building). It may be possible to avoid this with the cmdclass directive.
I also can't find a reliable way to determine the current build architecture. The closest thing I can find in Python is platform.machine(), but this has some problems:
- it can return a bunch of different values for the same architecture
- it is based off the kernel, which may not be the same as the Python interpreter (e.g. 32-bit Python running on a 64-bit kernel)
Do you know of a way to determine Python's build architecture? Or better, the target architecture, if setuptools allows cross-arch compilation.
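Absent a better answer, a best-effort normalization might look like this (a sketch: it normalizes common platform.machine() spellings and uses the pointer size for bitness, since that reflects the interpreter rather than the kernel; it still wouldn't cover cross-compilation):

```python
import platform
import struct

def target_arch():
    """Best-effort guess of the interpreter's architecture. The returned
    labels are arbitrary choices for this sketch."""
    machine = platform.machine().lower()
    bits64 = struct.calcsize("P") * 8 == 64   # interpreter, not kernel
    if machine in ("x86_64", "amd64", "i386", "i686", "i86pc", "x86"):
        return "x86_64" if bits64 else "x86"
    if machine.startswith(("arm", "aarch64")):
        return "aarch64" if bits64 else "arm"
    return machine
```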
In my changes, I've removed the 10% overallocation, since the NZB size is now ignored, and the size calculation can be relied upon.
I've left the bad part size correction - you may wish to consider rejecting the article immediately, as I'm not sure it does anything useful.
The changes introduce three test failures:
- test_single_part: the test case contains a dot at the start of a line, which is invalid according to yEnc specs. SabYenc originally strips doubled dots only, whilst this decoder strips any dot following a new line. Since it's invalid, the spec doesn't really say which is better, but I felt the latter was, which is why I went with that.
- test_partial: the old decoder actually tries to decode the =yend segment as yEnc encoded data, so gives a longer output. I don't get the point of the partial test - does the decoder ever get fed *part* of an article?
- test_false_bytes: the new code completely ignores the passed in length, so passing in an invalid length won't stop the decoder
Enabling CRC check also affects the following cases:
- test_single_part: the supplied article doesn't have a pcrc32 value, which is mandatory under the yEnc specification, so it is immediately marked as invalid. crc32 is given, but 'crc32' is optional whilst 'pcrc32' is not. If you still want to allow such articles, you could fall back to searching for crc32 when pcrc32 is not found.
- test_bad_filename_pickle: fails on the old code. I'm guessing the CRC is correct, but the length is wrong.
- test_crc_pickles: likely same issue as with test_single_part
Here's another fun Python fact: when SSL is enabled for the usenet-connection, Python delivers each chunk in the TLS fragment size of 16k, resulting in a minimum of around 40 chunks in case of SSL-connections
Does Python have the ability to receive into a pre-allocated buffer/bytes object, rather than accept a new bytes object every receive?
Amazing! It works 😃
I had to do 1 tweak though: the crcutil-1.0\examples directory is not included, but the files in there are required to compile, so I manually copied them in from this repository.
Does Python have the ability to receive into a pre-allocated buffer/bytes object, rather than accept a new bytes object every receive?
Yes, but it still only does that 16k at a time.
Ah, excluded by .gitignore. Missed that, thanks.
Yes, but it still only does that 16k at a time.
Why not pre-allocate, say, a 1MB buffer, then receive into that? There'll still be some overhead with small receives, but everything subsequent will only have to deal with one buffer.
(it would also have other benefits, such as enabling a buffer pool to relieve allocation/GC pressure)
Taking a quick look, Python does have socket.recv_into, which should do the trick.
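A minimal sketch of that approach (assuming a blocking socket and the plain NNTP terminator; real code would need error handling and a growth strategy for oversized articles):

```python
import socket

def recv_article(sock, bufsize=1024 * 1024):
    """Receive into a single pre-allocated buffer with recv_into: each call
    may still return ~16k at a time over TLS, but everything lands in one
    bytearray instead of many small bytes objects."""
    buf = bytearray(bufsize)
    view = memoryview(buf)          # slicing a memoryview copies nothing
    pos = 0
    while pos < bufsize:
        n = sock.recv_into(view[pos:])
        if n == 0:                  # connection closed
            break
        pos += n
        if buf[:pos].endswith(b"\r\n.\r\n"):   # end of article
            break
    return bytes(view[:pos])

a, b = socket.socketpair()
b.sendall(b"line1\r\nline2\r\n.\r\n")
article = recv_article(a)
```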
Deleted my previous post, because it turns out my script wasn't optimal. Allocating the bytearray takes a lot of time, so picking a proper size (like we could get from the NZB) makes a big difference. Updated the linked gist below.
Now buffers are always faster.
@animetosho I looked into recv_into and it looks promising. When I checked it out a long time ago, it was slower than expected.
So just to be sure I made a speed test, using nserv (NZBGet's super-fast NNTP test server) and end-of-article checks.
https://gist.github.com/Safihre/12f9826631c0aaf47491c118511679c4
(lower = better)
EDIT: Some data updated, wrong copy-paste!
SSL:
RECV SIZE: 262144
Chunks: 828.125
Buffer: 781.25
RECV SIZE: 16384
Chunks: 640.625
Buffer: 609.375
NO SSL:
RECV SIZE: 262144
Chunks: 250.0
Buffer: 171.875
RECV SIZE: 16384
Chunks: 437.5
Buffer: 375.0
Allocating the bytearray takes a lot of time, so picking a proper size (like we could get from the NZB) makes a big difference
I'd persist the buffer in that case. Perhaps assign one to each connection, and always recv into that buffer.
This saves you from having to reallocate it for every article. You can use the NZB size as a hint (or use some hard-coded value) and perhaps add a 10% margin on top. If a recv would exceed the buffer size, resize it (say, make it 10% larger). Most of the time, a resize should never be needed, so you're basically just writing into a fixed buffer.
Also, mentioned in the pull request, but it would also make sense for the decoded output to occupy a fixed buffer, instead of creating a new one each time.
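The persistent-buffer idea could be sketched like so (the growth factor, margin, and API are illustrative choices, not a prescribed design):

```python
class ConnectionBuffer:
    """Persistent receive buffer per connection: sized from a hint (e.g.
    the NZB article size) plus 10%, grown by ~10% only when a write would
    overflow, so most articles are written into a fixed buffer with no
    reallocation. Sketch only."""
    def __init__(self, size_hint):
        self.buf = bytearray(int(size_hint * 1.10))
        self.pos = 0

    def write(self, data):
        needed = self.pos + len(data)
        while needed > len(self.buf):
            # grow by ~10% (at least 1 byte) until the data fits
            self.buf.extend(bytearray(int(len(self.buf) * 0.10) + 1))
        self.buf[self.pos:needed] = data
        self.pos = needed

    def reset(self):
        self.pos = 0            # reuse for the next article, no realloc

cb = ConnectionBuffer(100)
cb.write(b"x" * 100)    # within the hinted size
cb.write(b"y" * 50)     # exceeds it, triggers growth
```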
Thanks for testing it!