sanqui / discard2

Stars: 37, Watchers: 3, Forks: 1, Size: 38.18 MB

Discard2 is a high fidelity archival tool for the Discord chat platform

License: MIT License

Languages: TypeScript 95.16%, JavaScript 0.72%, Dockerfile 1.03%, Lua 0.68%, Python 2.41%
Topics: archiver, discord

discard2's People

Contributors

sanqui, thetechrobo

Forkers

salman-irfan

discard2's Issues

Caught error while performing task: TypeError: Cannot read properties of undefined (reading 'click')

*** Task: ProfileDiscordTask (0 more)
Caught error while performing task: TypeError: Cannot read properties of undefined (reading 'click')
Saved screenshot to out/20230419T130738-profile/error.png
Closing dummy capture tool
/app/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
                                                                                                  ^

TypeError: Cannot read properties of undefined (reading 'click')
    at ProfileDiscordTask._getEmail (/app/src/crawler/projects/discord/profile.ts:43:37)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ProfileDiscordTask.perform (/app/src/crawler/projects/discord/profile.ts:64:23)
    at async Crawler.run (/app/src/crawler/crawl.ts:312:28)
    at async crawler (/app/src/cli.ts:83:5)
    at async Command.<anonymous> (/app/src/cli.ts:89:9)
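The crash is a bare TypeError: at profile.ts:43, something expected to be an element was undefined, most likely because the query that should have found it returned nothing. A minimal sketch of a guard that turns this into a descriptive error, assuming the element comes from an array of query results such as Puppeteer's page.$$; firstOrThrow, clickFirst and the Clickable interface are hypothetical names, not discard2 code:

```typescript
// Hypothetical helper, not discard2's actual code: return the first query
// result, or throw an error naming what was missing, instead of letting a
// later `.click()` fail with "Cannot read properties of undefined".
function firstOrThrow<T>(results: readonly T[], description: string): T {
  const first = results[0];
  if (first === undefined) {
    throw new Error(`Expected at least one match for ${description}, found none`);
  }
  return first;
}

// Minimal stand-in for a Puppeteer ElementHandle, so the sketch is self-contained.
interface Clickable {
  click(): Promise<void>;
}

// Click the first match, failing with a readable message if there is none.
async function clickFirst(results: readonly Clickable[], description: string): Promise<void> {
  await firstOrThrow(results, description).click();
}
```

The error screenshot would then be paired with a message saying which element was missing, rather than a generic TypeError.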

Huge channels sometimes have slow search results

Not sure if this is just my Internet or the channel being huge, but figured I'd report it either way (if it's my Internet, you should add a way to increase the timeout).

In the #bot channel of https://discord.gg/2cujRs9K, scraping fails with:

*** Task: ChannelDiscordTask (7 more)
Channel 514225574051446795 opened
Caught error while performing task: Error: Did not get search results after 10 seconds.
Saved screenshot to out/20220607T020524-resume/error.png
Stopping mitmdump
/home/thetechrobo/discard2/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
                                                                                                                                                                                                                              ^

Error: Did not get search results after 10 seconds.
    at performAndWaitForSearchResults (/home/thetechrobo/discard2/src/crawler/projects/discord/channel.ts:131:27)
    at async ChannelDiscordTask._searchAndClickFirstResult (/home/thetechrobo/discard2/src/crawler/projects/discord/channel.ts:136:29)
    at async ChannelDiscordTask.perform (/home/thetechrobo/discard2/src/crawler/projects/discord/channel.ts:253:24)
    at async Crawler.run (/home/thetechrobo/discard2/src/crawler/crawl.ts:258:28)
    at async crawler (/home/thetechrobo/discard2/src/cli.ts:71:5)
    at async Command.<anonymous> (/home/thetechrobo/discard2/src/cli.ts:148:9)

Here's the error screenshot:
[image: error]
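Per the trace, the 10-second limit comes from performAndWaitForSearchResults in channel.ts. A minimal sketch of making that limit configurable, assuming the wait can be expressed as a promise; withTimeout, searchTimeoutMs and the idea of feeding the value from a command-line option are hypothetical, not existing discard2 APIs:

```typescript
// Hypothetical default, matching the 10 seconds in the error message above.
const DEFAULT_SEARCH_TIMEOUT_MS = 10000;

// Parse a user-supplied timeout (e.g. from a hypothetical CLI option),
// falling back to the default for missing or invalid values.
function searchTimeoutMs(option?: string): number {
  const parsed = option === undefined ? NaN : Number(option);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_SEARCH_TIMEOUT_MS;
}

// Race the wait against a timer; clear the timer either way so it cannot
// keep the process alive after the wait has finished.
function withTimeout<T>(work: Promise<T>, ms: number, what: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Did not get ${what} after ${ms / 1000} seconds.`)),
      ms,
    );
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err: unknown) => {
      clearTimeout(timer);
      throw err;
    },
  );
}
```

A caller would wrap the existing wait, for example withTimeout(waitForResults(), searchTimeoutMs(opts.searchTimeout), "search results"), where waitForResults and opts.searchTimeout are placeholders.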

Getting tons of Trakt error messages on Kodi

I installed the latest version of Kodi. Since then I keep getting multiple Trakt error messages, although it still works. Trakt said it was a Kodi problem. The messages I'm getting are: Trakt error 502, Remote communication server failed to start, and Trakt wait limit reached 429. I am mainly using scrubs v2. Anyone?

I don't do builds and I'm a newbie, so please dumb it down for me. Thanks!

Add a way to do "explain" à la ArchiveBot

There are three ways I can think of implementing this:

  1. Add an explain for jobs
  2. Add an explain for individual tasks
  3. Add an "Explain" task (as suggested in #6 (comment))

Options 1 and 2 aren't mutually exclusive, and I think both are good ideas. The job explain could simply be provided via --explain on the command line. I can't think of a good reason for option 3, but maybe @TheTechRobo can give an example use case?
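A per-task explain and a job explain fit together naturally: if every task can describe itself, the job-level explain is just the ordered list of those descriptions. A minimal sketch under that assumption; ExplainableTask, the *TaskSketch classes and explainJob are hypothetical, and the task names only mirror those that appear in the logs on this page:

```typescript
// Hypothetical interface for illustration; discard2's real task classes may differ.
interface ExplainableTask {
  readonly name: string;
  // Per-task explain: describe what the task would do, without doing it.
  explain(): string;
}

class LoginTaskSketch implements ExplainableTask {
  readonly name = "LoginDiscordTask";
  explain(): string {
    return "Log in to Discord";
  }
}

class ChannelTaskSketch implements ExplainableTask {
  readonly name = "ChannelDiscordTask";
  constructor(private readonly channelId: string) {}
  explain(): string {
    return `Crawl all messages in channel ${this.channelId}`;
  }
}

// Job-level explain (e.g. behind --explain): list every queued task's
// explanation in order, without launching a browser.
function explainJob(tasks: readonly ExplainableTask[]): string {
  return tasks.map((task, i) => `${i + 1}. ${task.name}: ${task.explain()}`).join("\n");
}
```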

Create "result" structure for tasks

Individual tasks should report on their runtime results: in particular, names of servers and channels, IDs of first and last messages encountered, perhaps total number of messages seen.

This is a logical way of implementing #3, and it would also pave the way for incremental jobs ("get all messages up to the newest messages in this finished job").
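A minimal sketch of such a result structure, assuming tasks see message IDs as strings; TaskResult, compareSnowflakes and recordMessage are hypothetical names. Discord IDs are snowflakes that increase over time and exceed Number.MAX_SAFE_INTEGER, so they are compared as decimal strings rather than converted to numbers:

```typescript
// Hypothetical result shape; field names are illustrative, not discard2's.
interface TaskResult {
  serverName?: string;
  channelName?: string;
  firstMessageId?: string; // lowest (oldest) snowflake seen
  lastMessageId?: string; // highest (newest) snowflake seen
  messageCount: number;
}

// Snowflakes can exceed Number.MAX_SAFE_INTEGER, so compare the decimal
// strings: a shorter string is smaller; equal lengths compare character-wise.
function compareSnowflakes(a: string, b: string): number {
  if (a.length !== b.length) return a.length - b.length;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Update the running result for one message the task encountered.
function recordMessage(result: TaskResult, messageId: string): void {
  result.messageCount += 1;
  if (result.firstMessageId === undefined || compareSnowflakes(messageId, result.firstMessageId) < 0) {
    result.firstMessageId = messageId;
  }
  if (result.lastMessageId === undefined || compareSnowflakes(messageId, result.lastMessageId) > 0) {
    result.lastMessageId = messageId;
  }
}
```

For an incremental job, the lastMessageId recorded by a finished job would be the natural point to continue from.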

Unable to login

Error: Captcha detected on login.  It is recommended you log into this account manually in a browser from the same IP.

No further workaround is suggested. I have already logged into the account in a browser, with no success.

Add a crawler to capture all DMs

When you have a lot of DMs, it's not practical to crawl them one at a time. I see DMs as channels rather than servers, so there should be a way to load all of them at once.
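One way to read this request, sketched under assumptions: treat each DM as an ordinary channel under Discord's @me pseudo-server (DM URLs have the form /channels/@me/<channel id>) and expand a single "all DMs" request into one channel task per DM. How the DM channel IDs are collected, for example from the DM sidebar, is left open; ChannelTaskSpec, expandDmTasks and channelUrl are hypothetical:

```typescript
// Hypothetical task descriptor; discard2's actual task queue may differ.
interface ChannelTaskSpec {
  kind: "channel";
  serverId: string; // "@me" for DMs, as in Discord's /channels/@me/<id> URLs
  channelId: string;
}

// Expand one "all DMs" request into one channel task per DM (duplicates
// dropped, order kept), so DMs reuse the per-channel crawling logic.
function expandDmTasks(dmChannelIds: readonly string[]): ChannelTaskSpec[] {
  const unique = Array.from(new Set(dmChannelIds));
  return unique.map((channelId): ChannelTaskSpec => ({ kind: "channel", serverId: "@me", channelId }));
}

// URL the crawler would open for each task.
function channelUrl(task: ChannelTaskSpec): string {
  return `https://discord.com/channels/${task.serverId}/${task.channelId}`;
}
```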

Periodically save the message ID to the state.json

This would provide a way to resume the crawl without using the date filters, which are a poor way to resume because the messages loaded when the channel is first clicked on also end up in the dump.

The message ID fetched should preferably be from the top of the loaded messages so it doesn't accidentally skip surrounding messages.

Unlike #6, this would fit well inside the task listed as "current" in state.json, so crashed jobs could be resumed quickly and easily.
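A minimal sketch of the checkpointing, assuming the crawler can list the IDs of the currently loaded messages and that the oldest loaded message is the topmost one. CurrentTaskState, resumeFromMessageId, makeCheckpointer and the injected write callback (which would write state.json) are hypothetical:

```typescript
// Hypothetical shape of the "current" task in state.json; real fields may differ.
interface CurrentTaskState {
  type: string;
  channelId: string;
  resumeFromMessageId?: string;
}

// Snowflakes are decimal strings too large for a JS number: compare by
// length first, then lexicographically.
function compareSnowflakes(a: string, b: string): number {
  if (a.length !== b.length) return a.length - b.length;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Oldest loaded message, assumed to be the topmost one on screen.
function topmostMessageId(loadedIds: readonly string[]): string | undefined {
  let oldest: string | undefined;
  for (const id of loadedIds) {
    if (oldest === undefined || compareSnowflakes(id, oldest) < 0) oldest = id;
  }
  return oldest;
}

// Returns a function to call as messages load; it saves at most once per
// intervalMs and reports whether it wrote a checkpoint.
function makeCheckpointer(
  state: CurrentTaskState,
  intervalMs: number,
  write: (json: string) => void,
  now: () => number = Date.now,
): (loadedIds: readonly string[]) => boolean {
  let lastSave = -Infinity;
  return (loadedIds) => {
    const t = now();
    if (t - lastSave < intervalMs) return false;
    const id = topmostMessageId(loadedIds);
    if (id === undefined) return false;
    state.resumeFromMessageId = id;
    write(JSON.stringify(state, null, 2));
    lastSave = t;
    return true;
  };
}
```

Injecting write and now keeps the throttling logic testable without touching the filesystem or the real clock.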

TimeoutError when loading a server if server is low in the server list

~/d/o/c/krita git:master ❯❯❯ npm run start -- crawler server 435123295550046218 -c mitmdump --headless

> start
> ts-node ./src/cli.ts "crawler" "server" "435123295550046218" "-c" "mitmdump" "--headless"

Using mitmdump at /usr/bin/mitmdump
Initiated new job state name 20220623T214047-server
Starting mitmdump
mitmdump stderr:  /home/thetechrobo/.local/lib/python3.9/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 1.14.0-unknown is an invalid version and will not be supported in a future release
  warnings.warn(

*** Task: LoginDiscordTask (2 more)
Filling in login form
Logged in: [redacted]
*** Task: ProfileDiscordTask (1 more)
Email read as  [edit: redacted]
*** Task: ServerDiscordTask (0 more)
Caught error while performing task: TimeoutError: waiting for selector `#channels ul li a[href^="/channels/435123295550046218"]` failed: timeout 30000ms exceeded
Saved screenshot to out/20220623T214047-server/error.png
Stopping mitmdump
/media/thetechrobo/2tb/discard2/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
                                                                                                                                                                                                                              ^

TimeoutError: waiting for selector `#channels ul li a[href^="/channels/435123295550046218"]` failed: timeout 30000ms exceeded
    at new WaitTask (/media/thetechrobo/2tb/discard2/node_modules/puppeteer/src/common/DOMWorld.ts:813:28)
    at DOMWorld.waitForSelectorInPage (/media/thetechrobo/2tb/discard2/node_modules/puppeteer/src/common/DOMWorld.ts:656:22)
    at Object.internalHandler.waitFor (/media/thetechrobo/2tb/discard2/node_modules/puppeteer/src/common/QueryHandler.ts:78:19)
    at DOMWorld.waitForSelector (/media/thetechrobo/2tb/discard2/node_modules/puppeteer/src/common/DOMWorld.ts:511:25)
    at Frame.waitForSelector (/media/thetechrobo/2tb/discard2/node_modules/puppeteer/src/common/FrameManager.ts:1290:47)
    at Page.waitForSelector (/media/thetechrobo/2tb/discard2/node_modules/puppeteer/src/common/Page.ts:3210:29)
    at openServer (/media/thetechrobo/2tb/discard2/src/crawler/projects/discord/server.ts:22:20)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ServerDiscordTask.perform (/media/thetechrobo/2tb/discard2/src/crawler/projects/discord/server.ts:64:9)

I think I know why this is happening. It has happened multiple times in different servers, and the common factor is that the server is near the bottom of the server list. Indeed, when I moved the server to the top of the list, it worked.
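If the cause is that Discord only renders the part of the server list that is on screen (an assumption consistent with moving the server to the top fixing it), the crawler could scroll the list until the target appears instead of waiting on a fixed selector. A minimal sketch with the DOM access injected so it does not depend on Discord's markup; scrollUntilFound is hypothetical. With Puppeteer, find could be () => page.$(selector), since page.$ resolves to null when nothing matches, and scrollOnce a page.evaluate call that scrolls the server list container:

```typescript
// Hypothetical helper: look for an element; if it isn't rendered yet, scroll
// the lazily rendered list one step and try again.
async function scrollUntilFound<T>(
  find: () => Promise<T | null>,
  scrollOnce: () => Promise<boolean>, // resolves false once the list can't scroll further
  maxSteps = 50,
): Promise<T> {
  for (let step = 0; step < maxSteps; step++) {
    const found = await find();
    if (found !== null) return found;
    if (!(await scrollOnce())) break;
  }
  throw new Error(`Element not found after scrolling (max ${maxSteps} steps)`);
}
```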

Can't add trailing slashes when resuming

npm run start -- crawler resume 20220618T213621-server/ -c mitmdump --headless results in:

[Error: ENOENT: no such file or directory, open '20220618T213621-server//state.json'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'open',
  path: '20220618T213621-server//state.json'
}
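A minimal sketch of tolerating the trailing slash by normalizing the argument before it is used; normalizeJobArg and stateFilePath are hypothetical. On POSIX systems a doubled slash resolves to the same file, so the ENOENT may also depend on which directory the job name is resolved against; the sketch only makes the argument handling consistent. Node's path.join("20220618T213621-server/", "state.json") would likewise collapse the separator.

```typescript
// Hypothetical helper: drop trailing "/" (or "\" on Windows) from the job
// argument, but leave a bare root like "/" alone.
function normalizeJobArg(arg: string): string {
  const trimmed = arg.replace(/[\\/]+$/, "");
  return trimmed === "" ? arg : trimmed;
}

// Build the state file path from the normalized job argument.
function stateFilePath(jobArg: string): string {
  return `${normalizeJobArg(jobArg)}/state.json`;
}
```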

Add the server name to the state file

This would make it much easier to see which server I scraped without having to look up its ID (I leave a server after I archive it, so looking it up is really annoying).

Broken in Firefox

Chrome's RAM usage grows steadily when using it with discard2. I wanted to see whether Firefox does any better, since Puppeteer supports it, but it's broken: it never gets past the login page because it never fills in the form. I'm not sure why.

Discard2 reader fails when Python 2 is default on system

~/discard2 git:master ❯❯❯ npm run --silent start -- reader -f raw-jsonl $JOB_DIRECTORY > $JOB_DIRECTORY/jsonl.jsonl
mitmproxy read stderr: Traceback (most recent call last):
  File "mitmproxy/read.py", line 9, in <module>
    from mitmproxy import io, http
ModuleNotFoundError: No module named 'mitmproxy'

/home/thetechrobo/discard2/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
                                                                                                                                                                                                                              ^

Error: mitmproxy read exited with code 1
    at ChildProcess.<anonymous> (/home/thetechrobo/discard2/src/reader/mitmproxy.ts:22:19)
    at ChildProcess.emit (node:events:394:28)
    at ChildProcess.emit (node:domain:475:12)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

I've got mitmproxy installed via both pip (--user) and apt.
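A minimal sketch of making the interpreter explicit, assuming the reader currently launches mitmproxy/read.py with whatever python resolves to; the DISCARD2_PYTHON variable and pythonCommand are hypothetical. Whichever interpreter is chosen must be the one mitmproxy is installed for:

```typescript
// Environment lookup, passed in (e.g. process.env) so the sketch stays pure.
type Env = Record<string, string | undefined>;

// Hypothetical: DISCARD2_PYTHON overrides the interpreter; otherwise use
// "python3" rather than a bare "python", which is Python 2 on some systems.
function pythonCommand(env: Env, script: string, args: readonly string[] = []): [string, string[]] {
  const override = env["DISCARD2_PYTHON"]?.trim();
  const interpreter = override ? override : "python3";
  return [interpreter, [script, ...args]];
}
```

The caller would pass process.env in and hand the resulting pair to Node's child_process.spawn(cmd, args).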

Get server emojis

This is listed in the README, but I figured I'd report it as an issue here for discussion.

Maybe this would work: https://discord.com/developers/docs/resources/emoji#list-guild-emojis

But if you don't want to manually make requests to the Discord API, I think this system would work:

  • For every channel encountered, see if you can type in it.
  • If so, click the emoji button.
  • If not, check if you can add a reaction to the latest message. If so, click the button to add one.
  • This opens the emoji picker. Scroll until you've scrolled through all emojis for that server.
  • If you couldn't open the emoji picker, try again in the next channel. Chances are, you'll eventually find a channel that meets either criterion.
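If the API route is used, the List Guild Emojis endpoint returns emoji objects whose id, name and animated fields are enough to build image URLs on Discord's CDN (https://cdn.discordapp.com/emojis/<id>.png, or .gif for animated emoji). A minimal sketch of that mapping, leaving out the authenticated request itself; listGuildEmojisUrl, emojiImageUrls and EmojiImage are hypothetical, and pinning API version 10 is an assumption:

```typescript
// Subset of Discord's emoji object; id is null for Unicode emoji.
interface DiscordEmoji {
  id: string | null;
  name: string | null;
  animated?: boolean;
}

interface EmojiImage {
  id: string;
  name: string | null;
  url: string;
}

// GET endpoint from the "List Guild Emojis" docs; the v10 prefix is an assumption.
function listGuildEmojisUrl(guildId: string): string {
  return `https://discord.com/api/v10/guilds/${guildId}/emojis`;
}

// Map custom emoji to CDN image URLs, using GIF for animated ones.
function emojiImageUrls(emojis: readonly DiscordEmoji[]): EmojiImage[] {
  const images: EmojiImage[] = [];
  for (const emoji of emojis) {
    if (emoji.id === null) continue; // no image to archive for Unicode emoji
    const ext = emoji.animated ? "gif" : "png";
    images.push({ id: emoji.id, name: emoji.name, url: `https://cdn.discordapp.com/emojis/${emoji.id}.${ext}` });
  }
  return images;
}
```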
