

Reddit-User-Media-Downloader-Public

Download all the picture/video posts from a particular user on Reddit, for uh... reasons. Duplicates are removed automatically ⭐.

QuickStart with Docker

The simplest way to run this tool is with Docker, because the image already bundles its many overlapping dependencies.

  1. Install Docker from this link - https://docs.docker.com/engine/install/ (make sure virtualization is enabled in your BIOS)
  2. Pull the image from Docker Hub with
docker pull monkeymaster64/reddit-media-downloader:latest
  3. To run the tool, run the following command from a shell (CMD on Windows, bash on Linux/macOS)
docker run -v "[Path to folder to store output]:/usr/src/app/Reddit-User-Media-Downloader-Public/output" monkeymaster64/reddit-media-downloader --user [Reddit username] --limit [maximum number of posts to download from user]

For example:

docker run -v "C:\Users\User\Downloads:/usr/src/app/Reddit-User-Media-Downloader-Public/output" monkeymaster64/reddit-media-downloader --user monkeymaster64 --limit 10
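If you launch the container from a Python script, a minimal sketch like this one can assemble the arguments. `build_docker_command` is a hypothetical helper, not part of this project; the image name and the container output path are copied from the command above. Passing the arguments as a list avoids shell-quoting problems with paths that contain spaces.

```python
from typing import List, Optional

IMAGE = "monkeymaster64/reddit-media-downloader"
# Output folder inside the container, as used in the README's docker run command.
CONTAINER_OUTPUT = "/usr/src/app/Reddit-User-Media-Downloader-Public/output"


def build_docker_command(output_dir: str, user: str, limit: Optional[int] = None) -> List[str]:
    """Assemble docker run arguments; each list element is one argv entry, so no shell quoting is needed."""
    # -v expects "<host path>:<container path>"; without the container half,
    # your folder is not mapped onto the tool's output directory.
    cmd = ["docker", "run", "-v", f"{output_dir}:{CONTAINER_OUTPUT}", IMAGE, "--user", user]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_docker_command("/home/user/reddit", "monkeymaster64", limit=10)))
```

Passing the returned list to `subprocess.run(..., check=True)` would then start the container.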

Usage from Command Prompt

To run the tool natively, call it with Python from your shell (the --limit flag is optional):

python3 reddit-media-downloader.py --user [case-sensitive username] --limit [maximum number of posts to download from user]
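The two options above could be parsed with `argparse` roughly as follows. This is only a sketch: the real script appears to accept more options (its tracebacks reference `args.subreddit` and `args.pushshift_params`), and `parse_args` here is a hypothetical name.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Download picture/video posts from a Reddit user.")
    # --user is required and used exactly as typed (the README says it is case-sensitive).
    parser.add_argument("--user", required=True, help="Reddit username (case-sensitive)")
    # --limit is optional; None means no cap on the number of posts.
    parser.add_argument("--limit", type=int, default=None, help="maximum number of posts to download")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example with an explicit argument list instead of sys.argv.
    example = parse_args(["--user", "monkeymaster64", "--limit", "10"])
    print(example.user, example.limit)
```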

Requirements

  • Python 3.8
  • Microsoft Visual Studio Build Tools (C++ Build Tools) <--- only required if you're on Windows

Python libraries

  • youtube_dl
  • imagededup
  • OpenCV2
  • Cython
  • Requests

How to Install Python Requirements

pip install -r requirements.txt

How to Install Visual C++ Build Tools

  1. Download the executable from this link - https://visualstudio.microsoft.com/visual-cpp-build-tools/
  2. Select the "C++ Build Tools" workload under "Desktop and Mobile"
  3. When it's finished downloading and installing, restart your PC


reddit-user-media-downloader-public's People

Contributors

derwana, kdknigga, monkeymaster64

reddit-user-media-downloader-public's Issues

Error thrown by running script on 'empty' account

When the script is run on an account with no image or video posts, find_duplicates() throws an error that should be caught.

0
[]
2021-05-25 01:01:00,306: INFO Start: Calculating hashes...
0it [00:00, ?it/s]
2021-05-25 01:01:00,384: INFO End: Calculating hashes!
Traceback (most recent call last):
  File "path/to/Reddit-User-Media-Downloader-Public/reddit-media-downloader.py", line 199, in <module>
    main()
  File "path/to/Reddit-User-Media-Downloader-Public/reddit-media-downloader.py", line 189, in main
    duplicates = phasher.find_duplicates(encoding_map=encodings)
  File "path/to/Reddit-User-Media-Downloader-Public/venv/lib/python3.9/site-packages/imagededup/methods/hashing.py", line 338, in find_duplicates
    raise ValueError('Provide either an image directory or encodings!')
ValueError: Provide either an image directory or encodings!
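One possible guard, assuming `phasher` and `encodings` are the objects from the traceback above: skip the duplicate search when the encoding map is empty, since that empty map is what imagededup rejects here. `find_duplicates_safely` is a hypothetical wrapper, not the project's code.

```python
from typing import Any, Dict, List


def find_duplicates_safely(hasher: Any, encodings: Dict[str, str]) -> Dict[str, List[str]]:
    """Return no duplicates when nothing was hashed (e.g. the user posted no images)."""
    if not encodings:
        # An empty encoding map is what triggers imagededup's
        # "Provide either an image directory or encodings!" ValueError.
        return {}
    return hasher.find_duplicates(encoding_map=encodings)
```

In `main()`, the failing line would then become `duplicates = find_duplicates_safely(phasher, encodings)`.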

Code cannot handle two dashes in a username

Tested with the username "wheatbread-and-toes" on Reddit. Instead of downloading that user's posts, the code searches for random usernames containing multiple dashes (-) and downloads several profiles one after another.
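A defensive sketch, assuming each post is a dict with an `author` field (the script reads `post['author']`): drop any post whose author is not exactly the requested user before downloading. The username pattern encodes an assumption about Reddit's rules (3 to 20 letters, digits, underscores or dashes). Both helper names are hypothetical, and this does not claim to identify the root cause.

```python
import re
from typing import Dict, Iterable, List

# Assumed Reddit username rules: 3-20 characters from letters, digits, "_" and "-".
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{3,20}")


def is_valid_username(name: str) -> bool:
    return USERNAME_RE.fullmatch(name) is not None


def posts_by_exact_author(posts: Iterable[Dict], user: str) -> List[Dict]:
    """Keep only posts whose author matches the requested user exactly (case-sensitive, like the CLI)."""
    return [p for p in posts if p.get("author") == user]
```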

Overengineered nonsense

99% of dedup use cases would have been covered by a simple hash compare instead of all the crap you're forcing me to install.
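For comparison, the simple approach this issue describes might look like this: hash each file's bytes with SHA-256 and delete any later file whose hash was already seen. It only catches byte-identical copies; resized or re-encoded reposts need perceptual hashing, which is what imagededup provides. `remove_exact_duplicates` is a hypothetical helper, not the project's code.

```python
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large videos are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_exact_duplicates(folder: str) -> List[str]:
    """Delete byte-identical files in folder, keeping the first in sorted order; return removed paths."""
    seen: Dict[str, str] = {}
    removed: List[str] = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        file_hash = sha256_of(path)
        if file_hash in seen:
            os.remove(path)
            removed.append(path)
        else:
            seen[file_hash] = path
    return removed
```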

Error when writing file to Windows when URL contains `?`

Traceback (most recent call last):
  File "reddit-media-downloader.py", line 184, in <module>
    main()
  File "reddit-media-downloader.py", line 164, in main
    get_posts('submission', {**json.loads(args.pushshift_params), 'subreddit':args.subreddit, 'author':args.user}, submission_callback)
  File "reddit-media-downloader.py", line 57, in get_posts
    cb(data)
  File "reddit-media-downloader.py", line 69, in submission_callback
    process_submission(post)
  File "reddit-media-downloader.py", line 81, in process_submission
    with open(f"{post['author']}{datetime.datetime.now().strftime('%Y-%m-%dT%H%M%S')}-{post['url'].split('/')[-1]}", "wb+") as f:
OSError: [Errno 22] Invalid argument: 'theawesomekate\2021-05-25T102048-qSkDhpk.jpg?1'
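The filename comes from `post['url'].split('/')[-1]`, which keeps the `?1` query string, and `?` is not allowed in Windows file names. A minimal sketch of a safer name: `urllib.parse.urlsplit` drops the query and fragment, and any remaining characters Windows rejects are replaced. `filename_from_url` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# Printable characters that Windows does not allow in file names.
INVALID_WINDOWS_CHARS = re.compile(r'[<>:"/\\|?*]')


def filename_from_url(url: str) -> str:
    """Last path segment of the URL, without ?query or #fragment, made safe for Windows."""
    path = urlsplit(url).path
    name = path.rsplit("/", 1)[-1] or "download"
    return INVALID_WINDOWS_CHARS.sub("_", name)
```

It would stand in for `post['url'].split('/')[-1]` inside the `open(...)` call.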

524 server error

Hi, I am running Linux Mint 21 and I installed all the required dependencies but I'm still getting this error.

sudo docker run -v "~/home/user/reddit/" monkeymaster64/reddit-media-downloader --user *******

Traceback (most recent call last):
  File "reddit-media-downloader.py", line 199, in <module>
    main()
  File "reddit-media-downloader.py", line 182, in main
    get_posts('submission', {**json.loads(args.pushshift_params), 'subreddit':args.subreddit, 'author':args.user}, submission_callback)
  File "reddit-media-downloader.py", line 56, in get_posts
    res.raise_for_status()
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 524 Server Error:  for url: https://api.pushshift.io/reddit/submission/search?author=******&size=100&before=1672410115

Is there anything I can do to make it run or is it a server-side error?
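HTTP 524 is a Cloudflare status meaning the origin server (here, the Pushshift API) took too long to respond, so the error is on the server side. If it is intermittent, retrying after a delay may get through. A sketch, assuming the request is wrapped in a zero-argument callable that returns an object with `status_code`; `get_with_retries` is hypothetical and not part of the script.

```python
import time
from typing import Any, Callable


def get_with_retries(do_request: Callable[[], Any], max_attempts: int = 4,
                     base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call do_request(), retrying with exponential backoff while the server answers 5xx."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = do_request()
        # 4xx errors such as 403 Forbidden will not fix themselves, so only 5xx is retried.
        if response.status_code < 500 or attempt == max_attempts - 1:
            return response
        sleep(base_delay * 2 ** attempt)
```

With requests this might read `res = get_with_retries(lambda: requests.get(url, params=params))`, followed by the existing `res.raise_for_status()`; `url` and `params` are placeholder names.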

imagededup cloning?

For the Linux setup: why clone imagededup when it's already in requirements.txt?

Doesn't run any more

Hi, I'm getting this error when running it in Docker and have not been able to fix it:

[Errno 17] File exists: '/usr/src/app/Reddit-User-Media-Downloader-Public/output/Lydiagh0st'
Traceback (most recent call last):
  File "reddit-media-downloader.py", line 199, in <module>
    main()
  File "reddit-media-downloader.py", line 180, in main
    get_posts('submission', {**json.loads(args.pushshift_params), 'subreddit':args.subreddit, 'author':args.user}, submission_callback, int(args.limit))
  File "reddit-media-downloader.py", line 56, in get_posts
    res.raise_for_status()
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.pushshift.io/reddit/submission/search?author=Lydiagh0st&size=100&before=1701661767

Thanks
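Two separate things appear in this output. The `403 Forbidden` from api.pushshift.io is what stops the run; that is the API refusing the request and is likely outside the script's control. The `[Errno 17] File exists` line appears to come from creating an output folder that already exists, which `os.makedirs(..., exist_ok=True)` avoids. A minimal sketch; `ensure_output_dir` is a hypothetical helper.

```python
import os


def ensure_output_dir(base: str, user: str) -> str:
    """Create <base>/<user> if needed; an existing folder is not treated as an error."""
    path = os.path.join(base, user)
    # exist_ok=True suppresses FileExistsError ([Errno 17]) when the folder is already there.
    os.makedirs(path, exist_ok=True)
    return path
```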

Code does not gracefully handle 404 pages

For some Imgur pages that return a 404, a file is still downloaded, but it is not an image or video; it seems to contain only HTML:

Example

Filename: 2021-05-25T102021-EDGEi8B
Contents:
<!doctype html> <title>I'll scrunch my nose, you make me scrunch my toes? - Album on Imgur</title> <script>dataLayer=[];var pbjs=pbjs||{};pbjs.que=pbjs.que||[]</script> <script async="true">!function(){var e=document.createElement("script"),t=document.getElementsByTagName("script")[0],n="https://quantcast.mgr.consensu.org".concat("/choice/","f8oruOqDFlMeI","/","imgur.com","/choice.js"),i=0;e.async=!0,e.type="text/javascript",e.src=n,e.onload=function(){var e=document.createEvent("Event");e.initEvent("cmpLoaded",!0,!0),window.dispatchEvent(e)},t.parentNode.insertBefore(e,t),function(){for(var e,a="__tcfapiLocator",i=[],o=window;o;){try{if(o.frames[a]){e=o;break}}catch(e){}if(o===window.top)break;o=o.parent}e||(function e(){var t=o.document,n=!!o.frames[a];if(!n)if(t.body){var i=t.createElement("iframe");i.style.cssText="display:none",i.name=a,t.body.appendChild(i)}else setTimeout(e,5);return!n}(),o.__tcfapi=function(){var e,t=arguments;if(!t.length)return i;if("setGdprApplies"===t[0])3<t.length&&2===t[2]&&"boolean"==typeof t[3]&&(e=t[3],"function"==typeof t[2]&&t2);else if("ping"===t[0]){var n={gdprApplies:e,cmpLoaded:!1,cmpStatus:"stub"};"function"==typeof t[2]&&t2}else i.push(t)},o.addEventListener("message",function(i){var a="string"==typeof i.data,e={};try{e=a?JSON.parse(i.data):i.data}catch(e){}var o=e.__tcfapiCall;o&&window.__tcfapi(o.command,o.version,function(e,t){var n={__tcfapiReturn:{returnValue:e,success:t,callId:o.callId}};a&&(n=JSON.stringify(n)),i.source.postMessage(n,"*")},o.parameter)},!1))}();var a=function(){var e=arguments;typeof window.__uspapi!==a&&setTimeout(function(){void 0!==window.__uspapi&&window.__uspapi.apply(window.__uspapi,e)},500)};if(void 0===window.__uspapi){window.__uspapi=a;var o=setInterval(function(){i++,window.__uspapi===a&&i<3?console.warn("USP is not accessible"):clearInterval(o)},6e3)}}(),"function"==typeof 
window.__uspapi&&window.__uspapi("uspPing",1,function(e,t){t&&e.mode.includes("USP")&&e.jurisdiction.includes(e.location.toUpperCase())&&window.__uspapi("setUspDftData",1,function(e,t){t||console.log("Error: USP string not updated!")})})</script> If you're seeing this message, that means JavaScript has been disabled on your browser, please enable JS to make Imgur work.

<script class="abp" src="https://s.imgur.com/min/px.js?ch=1"></script> <script class="abp" src="https://s.imgur.com/min/px.js?ch=2"></script> <script>!function(e,t,n,o,c,a,f){e.fbq||(c=e.fbq=function(){c.callMethod?c.callMethod.apply(c,arguments):c.queue.push(arguments)},e._fbq||(e._fbq=c),(c.push=c).loaded=!0,c.version="2.0",c.queue=[],(a=t.createElement(n)).async=!0,a.src="//connect.facebook.net/en_US/fbevents.js",(f=t.getElementsByTagName(n)[0]).parentNode.insertBefore(a,f))}(window,document,"script"),fbq("init","742377892535530"),fbq("track","PageView"),"?reg"===document.location.search&&fbq("track","CompleteRegistration")</script>

<script src="https://s.imgur.com/desktop-assets/js/main.8864cd67e6ebb60dd112.js"></script>
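The saved file is an Imgur album page (its title ends in "Album on Imgur"), so it is HTML rather than media. One way to avoid writing such files, assuming the response's status code and `Content-Type` header are checked before saving: keep only successful responses whose media type starts with `image/` or `video/`. `looks_like_media` is a hypothetical helper.

```python
from typing import Optional


def looks_like_media(status_code: int, content_type: Optional[str]) -> bool:
    """Accept only 2xx responses whose Content-Type is an image or video."""
    if not 200 <= status_code < 300 or not content_type:
        return False
    # Strip parameters such as "; charset=utf-8" before comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith(("image/", "video/"))
```

With requests this might be called as `looks_like_media(res.status_code, res.headers.get("Content-Type"))` just before the file is opened for writing.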

conflicting tensorflow requirements make installing imagededup fail

snip of CMD output:

ERROR: Cannot install -r requirements.txt (line 1) because these package versions have conflicting dependencies.

The conflict is caused by:
    imagededup 0.1.0 depends on tensorflow==2.0.0
    imagededup 0.0.4 depends on tensorflow==2.0.0
    imagededup 0.0.3 depends on tensorflow==1.13.1
    imagededup 0.0.2 depends on tensorflow==1.13.1
    imagededup 0.0.1 depends on tensorflow==1.13.1