Comments (14)

chylex commented on June 26, 2024

I'll check it out, sounds like a weird bug.

from discord-history-tracker.

chylex commented on June 26, 2024

Also, what version of the tracker do you have? It's visible in Settings.

ssokolow commented on June 26, 2024

I'm not sure if it's the same bug, but I ran into a similar problem (other party's nick getting replaced by mine) in my log.

Unfortunately, because of your userindex-based approach and the varying order of the userindex list, there are a lot of these in the diffs I tried between my older dumps, which makes it difficult to identify when the bug occurred with trivial shell scripting:

-                "u": 0
+                "u": 1
-                "u": 1
+                "u": 0
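One way to make such diffs readable is to resolve every "u" index to the user ID it points at before comparing dumps. The following is a minimal sketch, not part of DHT. It assumes a hypothetical layout inferred from this thread: a list of user ID strings at archive["meta"]["userindex"], and messages stored as archive["data"][channel_id][message_id] with an integer "u" field. The function names are made up for illustration.

```python
import json


def resolve_users(archive):
    """Return {channel_id: {message_id: message}} with "u" holding a user ID.

    Assumed (hypothetical) layout: archive["meta"]["userindex"] is a list of
    user ID strings, and each message's integer "u" indexes into that list.
    """
    userindex = archive["meta"]["userindex"]
    resolved = {}
    for channel_id, messages in archive["data"].items():
        resolved[channel_id] = {
            # Copy the message, replacing the index with the ID it refers to.
            message_id: {**message, "u": userindex[message["u"]]}
            for message_id, message in messages.items()
        }
    return resolved


def dump_for_diff(archive):
    """Serialize with sorted keys and one field per line so diffs stay stable."""
    return json.dumps(resolve_users(archive), indent=2, sort_keys=True)
```

Two dumps that differ only in the order of their userindex list then produce identical text, so any remaining "u" line in a diff points at a real change of author.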

ssokolow commented on June 26, 2024

OK, I whipped up a quick little Python script to find the pair of consecutive old copies of the log between which one of the two user IDs became less prevalent despite more messages being added. It looks like whatever went wrong happened between these two dumps:

-rw-rw-r-- 1 ssokolow ssokolow 1.8M Apr 28 19:14 dht.txt.old.13
-rw-rw-r-- 1 ssokolow ssokolow 2.1M Jul  3 21:59 dht.txt.old.14
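The script itself is not shown in the thread. A check of that kind might look like the sketch below, which counts messages per user ID in each dump and reports any user whose count fell while the total did not shrink. The archive layout (archive["meta"]["userindex"] and archive["data"][channel_id][message_id]["u"]) and the function names are assumptions for illustration.

```python
from collections import Counter


def user_counts(archive):
    """Count messages per user ID (assumed layout, see the lead-in)."""
    userindex = archive["meta"]["userindex"]
    counts = Counter()
    for messages in archive["data"].values():
        for message in messages.values():
            counts[userindex[message["u"]]] += 1
    return counts


def find_drops(dumps):
    """Report users whose message count fell between consecutive dumps.

    dumps is a list of (name, archive) pairs ordered oldest first. Each
    result is (older_name, newer_name, user_id, count_before, count_after).
    Pairs where the total message count shrank are skipped, since a drop
    there would not be suspicious.
    """
    drops = []
    previous = None
    for name, archive in dumps:
        counts = user_counts(archive)
        if previous is not None:
            prev_name, prev_counts = previous
            if sum(counts.values()) >= sum(prev_counts.values()):
                for user_id, before in sorted(prev_counts.items()):
                    after = counts[user_id]  # Counter yields 0 if absent
                    if after < before:
                        drops.append((prev_name, name, user_id, before, after))
        previous = (name, counts)
    return drops
```

Assuming each dump parses as JSON, the dumps list could be built by calling json.load on files such as dht.txt.old.13 and dht.txt.old.14 in order of age.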

That said, were I in your situation, I'd never have gone with the userindex-based approach, specifically because it's so easy to introduce this kind of data-loss bug. Not storing something like "u": "398493450724704277" in the first place feels to me like premature optimization at the cost of fragility. If space becomes an issue, either introduce gzip or implement chunking, depending on where it's becoming an issue.

chylex commented on June 26, 2024

File size would definitely be an issue and unfortunately there are no browser APIs for compression (and a third party compression library would've made DHT several times larger and possibly not fit within bookmarklet/URL limits). Browsers can already take a long time to generate the download and run out of memory while tracking messages, which I think is beyond the realm of premature optimizations.

I'll look into the issue more. It would be nice to have minimal reproduction steps, but I suspect the issue is somewhere in the archive combining code.

ssokolow commented on June 26, 2024

> Browsers can already take a long time to generate the download and run out of memory while tracking messages, which I think is beyond the realm of premature optimizations.

I only use DHT on a single private conversation per file, but that concern did come to mind. Have you considered support for chunked output as a complement to the default "pause tracking on encountering already-seen messages" behaviour?

ssokolow commented on June 26, 2024

As for reproduction code, if you can provide me with something more suitable to batch operation to test with (ideally, something that'll run on the command-line under Node.js), I kept every revision dht.txt went through to hedge against just this kind of thing.

If I can trigger the problem with any of those, I can pare down the chatlog to something I'm willing to share.

chylex commented on June 26, 2024

I haven't considered chunked output. DHT started as "save whatever your computer and browser can handle", so I tried to compact the JSON structure so that you could save a reasonable number of messages (i.e. all of one person's DMs at minimum, up to a few hundred thousand messages).

What the project could really use is unit tests, but I'm barely working on it nowadays because it's low priority and there appear to be much more user-friendly alternatives :P Anyway, I'll go over the code and try to find a reproducible example.

ssokolow commented on June 26, 2024

> What the project could really use is unit tests, but I'm barely working on it nowadays

That sounds familiar... though mine being low priority is far less my choice than I'd like. :)

ssokolow commented on June 26, 2024

Oh, speaking of which...

> and there appear to be much more user-friendly alternatives

Which ones are you thinking of? Yours was the only Linux-compatible solution that turned up last time I googled around.

chylex commented on June 26, 2024

> Which ones are you thinking of? Yours was the only Linux-compatible solution that turned up last time I googled around.

Fair enough, though Discord Chat Exporter has a multiplatform CLI version, which may still be more "user-friendly" than dealing with the mess I made :P. Haven't used it though, so I can't tell.

Anyway I found the bug, or at least one bug - the archive combining code has a safeguard in case a user was missing from the index, and it coerced "undefined" and "0" into the same thing, so a valid user was being considered invalid with a fallback to its original (and now wrong) ID from the other archive.
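The actual DHT code is not quoted in this thread, and since DHT runs in the browser it is presumably JavaScript. The sketch below is only a Python analog of the class of bug described, with None standing in for JavaScript's undefined. The index_map argument and both function names are hypothetical.

```python
def remap_buggy(index_map, old_index):
    """Map an index from the other archive to the combined archive.

    BUG: `or` treats a valid mapped index of 0 like a missing entry (None),
    so the stale index from the other archive is returned instead.
    """
    return index_map.get(old_index) or old_index


def remap_fixed(index_map, old_index):
    """Fall back to the old index only when the user is truly missing."""
    new_index = index_map.get(old_index)
    return old_index if new_index is None else new_index
```

For example, suppose the other archive's user list is ordered ["222", "111"] and the combined one is ["111", "222"], so index_map is {0: 1, 1: 0}. A message with "u": 1 (user "111") should become "u": 0. The buggy version keeps 1, which in the combined list now means "222", so the message is attributed to the wrong user.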

Stupid mistake on my part, but at least this could only happen when combining archives after tracking messages. That is probably why barely anyone noticed: the recommended steps are to upload the archive first and only then start tracking new messages.

Do you remember uploading the archive after tracking, or combining multiple archives together, around the time when the diffs show the changes? Otherwise there may be more than one issue.

ssokolow commented on June 26, 2024

I think I remember doing "upload, track, re-upload" (with the same dht.txt) at some point in time for some reason that now escapes me, but my memory for dates and times is terrible.

chylex commented on June 26, 2024

Well, the diff looks like what happened in my test case and what happened to OP, but the OP mentioned renamed user accounts, which didn't make sense.

I'll push the fix and close the issue, then. Unfortunately the only reliable way to fix corrupted archives is to load the archive and re-track all messages. Even with your full revision history, it'd probably take less time to re-track than to script a fix based on the diffs.

ssokolow commented on June 26, 2024

I already re-dumped but, just in case, I might try a little script in the future to be sure.

It shouldn't be too difficult or time-consuming for me to whip up a little Python script which walks through from oldest to newest revision, building its own list of message dicts with user IDs rather than indexes, and then raises the alarm if, after the process is finished, there are any mismatches between the first appearance of a given message ID and the most recent dump's copy.
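A rough sketch of such a check follows. It is not the script referred to in this comment, and it assumes the same hypothetical layout as before: user ID strings in archive["meta"]["userindex"] and messages at archive["data"][channel_id][message_id] with an integer "u". It also assumes that the first appearance of a message carries the correct author.

```python
def author_mismatches(archives):
    """Compare each message's first-seen author with the newest dump.

    archives is a list of parsed dumps ordered oldest first. Returns
    {(channel_id, message_id): (first_seen_user_id, latest_user_id)} for
    every message in the newest dump whose author differs from the author
    recorded the first time that message appeared.
    """
    if not archives:
        return {}

    # Record the user ID attached to each message the first time it is seen.
    first_seen = {}
    for archive in archives:
        userindex = archive["meta"]["userindex"]
        for channel_id, messages in archive["data"].items():
            for message_id, message in messages.items():
                key = (channel_id, message_id)
                first_seen.setdefault(key, userindex[message["u"]])

    # Check the newest dump against those first appearances.
    latest = archives[-1]
    latest_index = latest["meta"]["userindex"]
    mismatches = {}
    for channel_id, messages in latest["data"].items():
        for message_id, message in messages.items():
            key = (channel_id, message_id)
            current = latest_index[message["u"]]
            if first_seen[key] != current:
                mismatches[key] = (first_seen[key], current)
    return mismatches
```

An empty result means every message in the newest dump still has the author it had when it first appeared. Any entry names a message worth re-tracking.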
