Comments (27)
ping @satazor ... is this possible to do with SparkMD5?
from js-spark-md5.
Yes you can use buffers or strings, check this spec for an example with a string: https://github.com/satazor/SparkMD5/blob/master/test/specs.js#L61
Oh sorry, I didn't quite understand your question at first. I'm afraid it's not possible at the moment, but it's definitely possible with some changes to the code and a way to get and set internal state.
@satazor any suggestions on how I could achieve this? I'm willing to give it a shot and submit a patch request if you aren't able to at the moment, but some advice from someone familiar with the code would help get me going in the right direction.
Hey @silverbucket
I've made it possible in the state branch. There are two new functions: getState() => state and setState(state). Please test it and see how it works. At the moment I'm short on free time, so I would appreciate it if you could add tests and documentation to the README on that feature branch.
Will wait for your feedback.
@satazor awesome! I will have a look at this today and let you know how it goes. I'm happy to add tests and docs assuming everything goes well.
Hi @satazor - I had a look at this, and perhaps there's still a misunderstanding of what I'm looking for. The functionality you added requires a binary Blob as one of its parameters, which means the entire file must be resident in memory.
What I was asking about is providing the md5 string of the latest bit of data, and computing the next bit without needing the entire file.
Let's say you have 1 file split into 5 chunks, and you want to continue to generate your md5 sum as the incoming chunks arrive.
- init SparkMD5.ArrayBuffer object
- begin transfer
- chunk 1 -> 38049dbc642b1f957563aabd5874450c
- chunk 2 -> c04347f6eebc69c5d787c4df247b42e8
- chunk 3 -> b79276e973a320c6b53cb09c69beb587
Now, let's say at this point the transfer terminates, the page reloads, or something else happens that aborts the file transfer. My question is: could we continue to generate the final md5 sum when the page reloads and we resume the transfer, by providing the latest md5 sum that we have (b79276e973a320c6b53cb09c69beb587)?
- init SparkMD5.ArrayBuffer
- provide existing md5 as starting point
- begin resume-able transfer
- chunk 4 -> 6f68b61c9bf25371bf65d4ee3cf646cc
- chunk 5 -> d85bdd9bc25a76431a7a17c13bdbc9fa
So the final md5 checksum (d85bdd9bc25a76431a7a17c13bdbc9fa) would match if we were to perform an md5 checksum on the completed downloaded file as a whole.
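For illustration, the resumable flow above can be demonstrated with a checksum whose running state is trivially serializable. The sketch below uses Adler-32 (the checksum family mentioned further down in this thread) purely as a stand-in for MD5; as the rest of the thread explains, resuming MD5 requires saving the hasher's internal state rather than the digest string of the previous chunk.

```javascript
// Illustrative only: Adler-32 stands in for MD5. The running state
// ({a, b}) can be saved after any chunk and passed back in to resume,
// and the final value matches a single pass over the whole input.
function adler32(bytes, state = { a: 1, b: 0 }) {
  let { a, b } = state;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return { a, b, sum: ((b << 16) | a) >>> 0 };
}

const whole = adler32([10, 20, 30, 40, 50]);
const saved = adler32([10, 20, 30]);      // ...transfer interrupted here
const resumed = adler32([40, 50], saved); // resume from saved state
console.log(resumed.sum === whole.sum);   // true
```

The key property is that the saved object is tiny and independent of how much data has already been hashed, which is exactly what getState()/setState() aim for.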
I've read this is possible (incremental checksumming), but it appears that currently SparkMD5 contains the entire file within its object, meaning we'd have 2 copies of the file being downloaded resident in memory (one for the file transfer itself, the other within SparkMD5's .append() behavior). Is this correct? It's also quite possible I misunderstand the capabilities of incremental checksumming.
I've heard people refer to Adler/Fletcher checksumming as doing this, mostly for error detection of corrupt packets, etc.
> Hi @satazor - I had a look at this, and perhaps there's still a misunderstanding of what I'm looking for. The functionality you added requires a binary Blob as one of its parameters, which means the entire file must be resident in memory.
No, the buffer is only 64 bits maximum. MD5 works by computing 64 bits each time. Every time .append() is called, it concatenates what you are passing into a buffer. Then, if the buffer has at least 64 bits, spark computes 64 bits per cycle and removes that chunk from the buffer (it actually repeats until the buffer has fewer than 64 bits left to consume). This means that the buffer will contain residual bits that will be used in the next append(). See: https://github.com/satazor/SparkMD5/blob/state/spark-md5.js#L344
I understood what you're asking for, but spark cannot compute the md5 after each chunk, because the tail of the whole input must be computed differently (that's what end() does).
For this to work as you want, you must call .getState() and store it somewhere, perhaps in local storage, and then call .setState() to resume the previous known state. Again, there's no problem storing it in local storage, because it will contain a maximum of 64 bits plus a bit more, since the object also contains the length and the current hash.
If you are having problems understanding what I'm trying to say, I can make you a quick example of getState() and setState(), plus local storage.
Aha, I see. So the binary object passed in during setState() is just the latest 64kb of the file, along with the length and array values?
Exactly, and it's not 64kb, it's 64 bits, which is really small.
@satazor aha, ok now I understand much better. And if the chunk is larger than 64 bits, does anything break, or will SparkMD5 just ignore the previous data?
If you call append with a chunk of 16000000 bits (2 megabytes), it will execute the for loop 250000 times and will leave behind a buffer of 0 bits.
If you call append with a chunk of 16000020 bits (2 megabytes plus 20 bits), it will execute the for loop 250000 times and will leave behind a buffer of 20 bits.
SparkMD5 cannot ignore data, otherwise the computed hash won't be correct.
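The arithmetic above can be sketched directly. residualAfterAppend is a hypothetical helper (not part of SparkMD5) that models how append() consumes whole blocks and buffers the remainder, using the thread's 64-unit block size:

```javascript
// Hypothetical helper, not part of SparkMD5: models how append()
// consumes whole 64-unit blocks and leaves the remainder buffered.
function residualAfterAppend(buffered, appended, blockSize = 64) {
  const total = buffered + appended;
  return {
    cycles: Math.floor(total / blockSize), // full blocks consumed
    residual: total % blockSize,           // amount left in the buffer
  };
}

console.log(residualAfterAppend(0, 16000000)); // { cycles: 250000, residual: 0 }
console.log(residualAfterAppend(0, 16000020)); // { cycles: 250000, residual: 20 }
```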
OK, so I'll make sure to slice it to be exactly the latest 64 bits. Thanks! I am still working on my own integration and testing, will submit a PR for tests and docs when I'm certain it's working.
Yes, that will always leave you with an empty buffer. Can I ask why it is an issue to store the buffer?
You should be able to just do a JSON.stringify(spark.getState()) and store it in local storage, and then, to resume: spark.setState(JSON.parse(storedvalue));
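A minimal sketch of that round trip, assuming the state branch's getState()/setState() API; the storage key and helper names are hypothetical, and storage can be window.localStorage or anything with the same getItem/setItem shape:

```javascript
// Minimal sketch (assuming the state branch's getState()/setState()):
// persist the hasher state so hashing can resume after a page reload.
const STATE_KEY = 'md5state'; // hypothetical storage key

function saveHashState(spark, storage) {
  storage.setItem(STATE_KEY, JSON.stringify(spark.getState()));
}

function restoreHashState(spark, storage) {
  const stored = storage.getItem(STATE_KEY);
  if (stored !== null) {
    spark.setState(JSON.parse(stored)); // resume the previous known state
  }
  return spark;
}
```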
That's true; if I've already got to store the other two properties, I may as well store the buffer as well. It's only replicating the latest 64 bits. The reason I didn't previously is that I incorrectly assumed the buffer needed to hold all the existing file data.
So, I will need to store the getState() output, along with the chunk number attributed to the latest state. That way, during page load, if there are more chunks stored than I have indicated I've performed the md5 checksum on, I can get those missing chunks out of IndexedDB and catch up.
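A rough sketch of that catch-up bookkeeping, with hypothetical names throughout (spark stands for a hasher with the state branch's setState()):

```javascript
// Hypothetical catch-up sketch: restore the saved state, then replay
// any chunks stored (e.g. in IndexedDB) after the last one hashed.
function catchUp(spark, savedState, lastHashedIndex, storedChunks) {
  spark.setState(savedState);
  for (let i = lastHashedIndex + 1; i < storedChunks.length; i++) {
    spark.append(storedChunks[i]);
  }
  return storedChunks.length - 1; // new last-hashed chunk index
}
```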
Yep. Let me know how it goes.
OK, so I gave this a shot, and when I load in previous state data via setState I get the error Uncaught RangeError: Source is too large. It points to https://github.com/satazor/SparkMD5/blob/state/spark-md5.js#L594
{
buff: {
0: 152,
1: 236,
2: 242,
3: 228,
4: 227,
5: 122,
6: 9,
7: 54,
8: 225,
9: 69,
10: 88,
11: 21,
12: 19,
13: 78,
14: 133,
15: 255,
16: 0,
17: 7,
18: 255,
19: 217
},
hash: [
1518490340,
1206410578,
-1075573721,
1028990893
],
length: 22356
}
Hmm, can you provide a working example where I can reproduce and fix it?
@silverbucket I've updated the branch, can you try it? It should be ok now; I've added tests for getState and setState and they are all passing.
Btw, I made a mistake in the explanation above: it's not 64 bits, it's 64 bytes (512 bits).
The README is also updated. I will wait for your feedback before merging.
That fixed it! Thanks. I'm still getting mismatched md5 checksums upon reload, but this could be an issue with my code; since you've already made passing tests, I assume it's an issue on my end. I'll keep working through it, and if I still have issues I'll try to create a simplified example. Thanks for your help! Since you've already done the tests and updated the README, let me know if there's anything else I can do to help.
Ok, I will be waiting. There's nothing important to be done, thanks!
Bump.
Literally just finished confirming everything is working now (I had a long weekend vacation). The issue I was having seems to be related to cases where the md5state wasn't being reset correctly (error in my code), so subsequent downloads of the same file started off with existing data. I fixed all that and am able to confirm that the state additions you made are working perfectly! So I think it's good to merge. Thanks again for your work!
Great!