
Comments (27)

silverbucket avatar silverbucket commented on May 30, 2024

ping @satazor ... is this possible to do with SparkMD5?

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Yes you can use buffers or strings, check this spec for an example with a string: https://github.com/satazor/SparkMD5/blob/master/test/specs.js#L61

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Oh sorry, I didn't quite understand your question at first. I'm afraid it's not possible at the moment, but it's definitely possible with some changes to the code and a way to set and get internal state.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

@satazor any suggestions on how I could achieve this? I'm willing to give it a shot and submit a patch request if you aren't able to at the moment, but some advice from someone familiar with the code would help get me going in the right direction.

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Hey @silverbucket

I've made it possible in the state branch.

There are two new functions, getState() => state and setState(state). Please test it and see how it works. At the moment I'm short on free time, so I would appreciate it if you could add tests and documentation to the README on that feature branch.

Will wait for your feedback.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

@satazor awesome! I will have a look at this today and let you know how it goes. I'm happy to add tests and docs assuming everything goes well.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

Hi @satazor - I had a look at this, and perhaps there's still a misunderstanding of what I'm looking for. The functionality you added requires a binary Blob as one of its parameters, which means that the entire file must be resident in memory.

What I was asking about is providing the md5 string of the latest bit of data, and computing the next bit without needing the entire file.

Let's say you have 1 file split into 5 chunks, and you want to continue to generate your md5 sum as the incoming chunks arrive.

  • init SparkMD5.ArrayBuffer object
  • begin transfer
  • chunk 1 -> 38049dbc642b1f957563aabd5874450c
  • chunk 2 -> c04347f6eebc69c5d787c4df247b42e8
  • chunk 3 -> b79276e973a320c6b53cb09c69beb587

Now, let's say at this point the transfer terminates, page reloads, or something else happens that aborts the file transfer. My question is, could we continue to generate the final md5 sum when the page reloads and we resume the transfer, by providing the latest md5 sum that we have (b79276e973a320c6b53cb09c69beb587) ?

  • init SparkMD5.ArrayBuffer
  • provide existing md5 as starting point
  • begin resume-able transfer
  • chunk 4 -> 6f68b61c9bf25371bf65d4ee3cf646cc
  • chunk 5 -> d85bdd9bc25a76431a7a17c13bdbc9fa

So the final md5 checksum (d85bdd9bc25a76431a7a17c13bdbc9fa) would match if we were to perform an md5 checksum on the completed downloaded file as a whole.

I've read this is possible (incremental checksumming), but it appears that currently SparkMD5 retains the entire file within its object. That would mean we have 2 copies of the file being downloaded resident in memory (one for the file transfer itself, the other inside SparkMD5 via its .append() behavior). Is this correct? It's also quite possible I misunderstand the capabilities of incremental checksumming.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

I've heard people refer to Adler/Fletcher checksumming as doing this, mostly for error detection of corrupt packets, etc.

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Hi @satazor - I had a look at this, and perhaps there's still a misunderstanding of what I'm looking for. The functionality you added requires a binary Blob as one of its parameters, which means that the entire file must be resident in memory.

No, the buffer is only 64 bits maximum. MD5 works by computing 64 bits at a time. Every time .append() is called, it concatenates what you pass in onto a buffer. Then, if the buffer has at least 64 bits, spark computes 64 bits per cycle and removes that chunk from the buffer (it actually repeats until the buffer has fewer than 64 bits left to consume). This means the buffer will contain residual bits that will be used in the next append(). See: https://github.com/satazor/SparkMD5/blob/state/spark-md5.js#L344
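The consume-and-carry behavior described here can be sketched as follows (a simplified model, not SparkMD5's actual code; BLOCK is 64 per this explanation, which a later comment in the thread corrects from bits to bytes):

```javascript
// Simplified model of SparkMD5's append() buffering: incoming data is
// concatenated onto a carry buffer, whole 64-unit blocks are consumed,
// and the remainder is carried over to the next append().
const BLOCK = 64;

function makeHasher() {
  return { carry: Buffer.alloc(0), blocksProcessed: 0 };
}

function append(state, chunk) {
  state.carry = Buffer.concat([state.carry, chunk]);
  while (state.carry.length >= BLOCK) {
    // The real implementation would run the MD5 compression
    // function over the first BLOCK units here.
    state.carry = state.carry.subarray(BLOCK);
    state.blocksProcessed += 1;
  }
  return state;
}

const h = makeHasher();
append(h, Buffer.alloc(150)); // 2 full blocks consumed, 22 carried over
console.log(h.blocksProcessed, h.carry.length); // 2 22
```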

I understood what you're asking for, but spark cannot compute the md5 after each chunk, because the tail of the whole input must be computed differently (that's what end() does).

For this to work as you want, you must call .getState() and store it somewhere, perhaps in local storage, and then call .setState() to resume from the previously known state. Again, there's no problem storing it in local storage, because it will contain a maximum of 64 bits plus a bit more, since the object also contains the length and the current hash.

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

If you are having problems understanding what I'm trying to say, I can make you a quick example of getState() and setState(), plus local storage.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

aha I see, so the binary object passed in during setState() is just the latest 64kb of the file? Along with the length and array values.


from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Exactly, and it's not 64kb, it's 64 bits, which is really small.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

@satazor aha, ok now I understand much better. And if the chunk is larger than 64 bits, does anything break, or will SparkMD5 just ignore the previous data?

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

If you call append with a chunk of 16000000 bits (2 megabytes) it will execute the for loop 250000 times and will leave behind a buffer of 0 bits.

If you call append with a chunk of 16000020 bits (2 megabytes plus 20 bits) it will execute the for loop 250000 times and will leave behind a buffer of 20 bits.

SparkMD5 cannot ignore data, otherwise the computed hash won't be correct.
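The arithmetic in those two examples comes down to integer division and remainder on the 64-unit block size (which a later comment in the thread corrects from bits to bytes):

```javascript
// Cycles executed and residual buffer left behind for a given
// chunk size, with a 64-unit block.
const BLOCK = 64;
const cycles = (n) => Math.floor(n / BLOCK);
const residual = (n) => n % BLOCK;

console.log(cycles(16000000), residual(16000000)); // 250000 0
console.log(cycles(16000020), residual(16000020)); // 250000 20
```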

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

OK, so I'll make sure to slice it to be exactly the latest 64 bits. Thanks! I am still working on my own integration and testing, will submit a PR for tests and docs when I'm certain it's working.

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Yes, that will always leave you with an empty buffer. Can I ask why storing the buffer is an issue?

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

You should be able to just do JSON.stringify(spark.getState()), store that in local storage, and then resume with spark.setState(JSON.parse(storedvalue));
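That round trip can be tried with a plain object shaped like the state pasted later in this thread (buff/hash/length). One detail worth noting: if buff is a typed array, JSON.stringify turns it into an index-keyed object rather than an array, so whatever restores the state has to tolerate that shape (or buff should be converted to a plain Array before serializing).

```javascript
// Round-trip a getState()-style object through JSON, the same way it
// would pass through localStorage. Values truncated for brevity.
const state = {
  buff: [152, 236, 242, 228],       // residual bytes awaiting the next append()
  hash: [1518490340, 1206410578, -1075573721, 1028990893],
  length: 22356,                     // total bytes hashed so far
};

const stored = JSON.stringify(state);   // e.g. localStorage.setItem('md5state', stored)
const restored = JSON.parse(stored);    // e.g. JSON.parse(localStorage.getItem('md5state'))

console.log(restored.length === state.length); // true
```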

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

That's true; if I already have to store the other two properties, I may as well store the buffer as well. It only replicates the latest 64 bits. The reason I didn't previously is that I incorrectly assumed the buffer needed to contain all of the existing file data.

So, I will need to store the getState() output, along with the chunk number associated with the latest state. That way, during page load, if there are more chunks stored than I've indicated I've performed the md5 checksum on, I can get those missing chunks out of IndexedDB and catch up.
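That catch-up step might look roughly like this (a sketch only: chunkStore stands in for IndexedDB, hasher for a SparkMD5 instance from the state branch, and all names here are hypothetical):

```javascript
// Hypothetical resume flow: restore the saved hashing state, then
// replay any chunks that were stored after the last one hashed.
function resumeHashing(hasher, saved, chunkStore) {
  hasher.setState(saved.md5State);
  for (let i = saved.lastHashedChunk + 1; chunkStore.has(i); i++) {
    hasher.append(chunkStore.get(i));
  }
}

// Tiny stub to illustrate the call pattern.
const appended = [];
const stubHasher = {
  setState(state) { this.state = state; },
  append(chunk) { appended.push(chunk); },
};
const chunkStore = new Map([
  [0, 'c0'], [1, 'c1'], [2, 'c2'], [3, 'c3'], [4, 'c4'],
]);

// State was last saved after chunk 2, so chunks 3 and 4 get replayed.
resumeHashing(stubHasher, { md5State: {}, lastHashedChunk: 2 }, chunkStore);
console.log(appended); // ['c3', 'c4']
```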

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Yep. Let me know how it goes.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

OK, so I gave this a shot and when I load in previous state data via setState I get the error Uncaught RangeError: Source is too large. It points to https://github.com/satazor/SparkMD5/blob/state/spark-md5.js#L594

{
  buff: {
    0: 152,
    1: 236,
    2: 242,
    3: 228,
    4: 227,
    5: 122,
    6: 9,
    7: 54,
    8: 225,
    9: 69,
    10: 88,
    11: 21,
    12: 19,
    13: 78,
    14: 133,
    15: 255,
    16: 0,
    17: 7,
    18: 255,
    19: 217
  },
  hash: [
    1518490340,
    1206410578,
    -1075573721,
    1028990893
  ],
  length: 22356
}

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Hmm, can you provide a working example where I can reproduce and fix it?

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

@silverbucket I've updated the branch, can you try it? It should be ok now, I've added tests for getState and setState and they are all passing.

Btw, I made a mistake in the explanation above: it's not 64 bits, it's 64 bytes (512 bits).

The README is also updated. I will wait for your feedback before merging.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

That fixed it, thanks! I'm still getting mismatched md5 checksums upon reload, but this could be an issue with my code. I'll keep working through it, and if I still have issues I'll try to create a simplified example. If you've already got passing tests, I assume it's an issue on my end. Thanks for your help! Since you've already done the tests and updated the README, let me know if there's anything else I can do to help.

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Ok, I will be waiting. There's nothing important to be done, thanks!

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Bump.

from js-spark-md5.

silverbucket avatar silverbucket commented on May 30, 2024

Literally just finished confirming everything is working now (I had a long weekend vacation). The issue I was having was related to cases where the md5 state wasn't being reset correctly (an error in my code), so subsequent downloads of the same file started off with existing data. I fixed all that and can confirm that the state additions you made are working perfectly! So I think it's good to merge. Thanks again for your work!

from js-spark-md5.

satazor avatar satazor commented on May 30, 2024

Great!

from js-spark-md5.
