
redditdownloader's Introduction

Reddit Media Downloader


Update as of July, 2023

Due to the Reddit API changes, RMD will no longer function. First the Reddit admins destroyed PushShift, the platform RMD heavily relied on for historical data, and now they have completely restricted access to their direct API.

Together, these actions make it impossible for RMD to operate. It would have been possible to require users to register their own application keys in order to continue, but that process has now been restricted by Reddit as well. Additionally, I (the primary dev of RMD) no longer wish to work with or support anything involving the company. While this was already a discouraging factor before the latest changes, this latest debacle has made it clear where Reddit stands.

I'll leave the project up for a little while longer, then likely archive it all and move on. I'll preserve as much as possible when I do - including the original README below. Other users are welcome to fork the project and carry it onward, if anybody is willing to continue in spite of the Admin actions pushing against them.

Thanks for all the support for the project over the years. It's been great. Catch you at the next one.

~ ShadowMoose

Original text below:

Let's face it: In this day and age, the internet is ephemeral. Anything anybody posts might be here one day, and gone the next. Websites take down or move their images and videos, posts get hidden or removed, and their content is lost. Even more so on Reddit, where accounts are constantly springing into existence - and then vanishing as quickly without a trace. How's anybody supposed to keep their cherished cat picture collection around with all this change? Well fear not, my data-hoarding friend, because Reddit Media Downloader is here for you!

RMD Preview Image

What is this?

Reddit Media Downloader is a program built to scan Comments & Submissions from multiple sources on Reddit, and download any media they contain or link to. It can handle scanning posts from your personal Upvoted/Saved lists, subreddits of your choice, user-curated multireddits, and more! When it finds a Comment or Submission from wherever you specify, it will parse the text and links within to find any media linked to. It then uses multiple downloaders to save this media locally onto your disk.

RMD comes equipped with a suite of options to let it scan just about anywhere you can find media on Reddit. Coupled with its powerful baked-in Filter options, which let you specify exactly what type of posts and comments you're looking for ("I only want Submissions with 'Unicorn' in the title, and no fewer than 1000 upvotes!"), RMD makes automatically saving things a simple process. Built in Python, RMD can run headless (without a GUI) or as its own server, so you can launch this program anywhere - and it's built from the ground up to make automation a breeze.

Check out the different places RMD can scan!

Things RMD Can Do:

  • Extract links inside comments, links in Submissions, and links within selfpost text.
  • Work with links to most video sites, image hosts, and image blogs.
  • Avoid saving duplicates of the same file, by using image recognition to compare similar pictures.
  • Automatically seek to - and resume - where it last left off downloading.
  • Launch a web-based UI (locally or remotely) to make even the most complex configuration setups simple.
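The duplicate-avoidance bullet above can be illustrated with a difference-hash ("dHash") sketch in plain Python. This is only one common perceptual-hashing technique, shown here over a bare grayscale grid; the grid size and helpers are illustrative, and RMD's actual comparison code is not shown in this README.

```python
def dhash_bits(gray, hash_w=8):
    """Difference-hash sketch: one bit per pixel, set when a pixel is brighter
    than its right-hand neighbour. Rows must be hash_w + 1 pixels wide."""
    bits = []
    for row in gray[:hash_w]:
        for x in range(hash_w):
            bits.append(1 if row[x] > row[x + 1] else 0)
    return bits

def hamming(a, b):
    # Small distances between two hashes mean visually similar images.
    return sum(x != y for x, y in zip(a, b))
```

Two near-identical downloads produce hashes with a small Hamming distance, so the second file can be skipped instead of saved twice.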

Getting Started:

Click here to visit the documentation site!

New Feature Requests or Issues?

If you hit any links which RMD does not support, find anything broken, need assistance, or just want to suggest new features, please open a new request here.

Updates?

RMD is currently under a large-scale rewrite in a new language, which will greatly expand its capabilities. If you don't see much recent activity in the master branch, or in the bug reports, it's probably because the rewrite already fixes the problems or adds the requested features. In the meantime, the current release of RMD is considered extremely stable, and should work exactly as expected.

redditdownloader's People

Contributors

akaecho, b13rg, dependabot[bot], jgaruti, powerm4, shadowmoose, suhkotos, the-compiler


redditdownloader's Issues

Feeding a list of subreddits

Hi, I have a list of nearly 500 subreddits I'm subscribed to. Is there a way to feed it directly into the app, or do you have to add them manually, one by one? (I normally use cat + parallel to feed a list of subreddits into a script that takes arguments, but yours doesn't seem to offer a way to add a subreddit by argument, e.g. script.py --subreddit X.) Thanks
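For reference, the workflow the reporter describes could be sketched as below. The --subreddit flag is the reporter's hypothetical, not an option RMD actually exposes; the sketch only builds the per-subreddit command lines.

```python
def build_commands(listing: str):
    # Expand a newline-separated subreddit list into one command per subreddit.
    # The --subreddit flag is hypothetical (the reporter's proposal).
    subs = [line.strip() for line in listing.splitlines() if line.strip()]
    return [["python", "main.py", "--subreddit", sub] for sub in subs]

cmds = build_commands("aww\nEarthPorn\n")
```

Each entry in `cmds` could then be handed to `subprocess.run` or printed for use with GNU parallel.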

2FA not supported

For the initial login I used <password>:<auth-code> to login. That was successful but it seems every time the script is invoked, Reddit is expecting a new auth code on the password. I'd rather not disable my 2FA for this.

[parker@inspiron15 RedditDownloader]$ python main.py

====================================
    Reddit Media Downloader 2.0
====================================
    (By ShadowMoose @ Github)

Loading all settings from file.
Loaded Source:  <source>
Authenticating via OAuth...
Traceback (most recent call last):
  File "main.py", line 227, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 116, in __init__
    reddit.login()
  File "./classes/reddit/reddit.py", line 36, in login
    _user = _reddit.user.me()
  File "/usr/lib/python3.6/site-packages/praw/models/user.py", line 95, in me
    user_data = self._reddit.get(API_PATH['me'])
  File "/usr/lib/python3.6/site-packages/praw/reddit.py", line 371, in get
    data = self.request('GET', path, params=params)
  File "/usr/lib/python3.6/site-packages/praw/reddit.py", line 486, in request
    params=params)
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 182, in request
    params=params, url=url)
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 113, in _request_with_retries
    data, files, json, method, params, retries, url)
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 98, in _make_request
    params=params)
  File "/usr/lib/python3.6/site-packages/prawcore/rate_limit.py", line 32, in call
    kwargs['headers'] = set_header_callback()
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 142, in _set_header_callback
    self._authorizer.refresh()
  File "/usr/lib/python3.6/site-packages/prawcore/auth.py", line 328, in refresh
    password=self._password)
  File "/usr/lib/python3.6/site-packages/prawcore/auth.py", line 142, in _request_token
    payload.get('error_description'))
prawcore.exceptions.OAuthException: invalid_grant error processing request

Propagate User-Agent through all Handlers

All Handlers, wherever it's possible to set, should use the supplied User-Agent string - not just the Reddit client.
This will prevent some sites from potentially blocking API calls due to common User-Agent strings.

This is halfway implemented in testing, but needs completion.
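A minimal standard-library sketch of the idea: route every handler's requests through one builder so the shared User-Agent is always attached. The helper name is illustrative, not RMD's actual code.

```python
from urllib.request import Request

def build_request(url: str, user_agent: str) -> Request:
    # One shared builder: every handler identifies with the same
    # user-supplied User-Agent, rather than a library default.
    return Request(url, headers={"User-Agent": user_agent})

req = build_request("https://example.com/media.jpg", "RMD-test-agent")
```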

RMD backend keeps disconnecting

Hi.
I have really been liking RMD. One particular issue I have faced since the first time I used it is that the browser integration (the WebUI) keeps disconnecting (as shown in the attached screenshot). It becomes very difficult to add sources using the WebUI as it keeps quitting on me. I don't think the port has anything to do with it. I have tweaked my settings file to set the "keep it open" parameter to "true", but the UI still quits. I have tried switching from the default browser to Chrome, but the same thing happens there too.
I hope you can show me what I may be doing wrong here. Thank you.
issue-1

Windows Directory Filename Bug

There appears to be an interesting issue with Windows directory names: Windows isn't able to properly handle any directory ending with a trailing space. Despite this, the OS will allow you to create these directories without complaining - but will then be unable to rename/delete the folder through typical means.

Due to the way duplicate file names were handled, this made it possible to end up with directories that Windows secretly can't work with through the UI. The only solution is to delete the problematic directory using its Windows short path, instead of the normal file name (see here).

With the default RMD file pattern this issue is probably unlikely to happen, but I've pushed out a fix (02eb467) to be sure it won't be capable of generating an offending directory from any dynamically-inserted values. This will be rolled out soon alongside the new Threading patch.
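The kind of sanitizer the fix implies can be sketched as below. This is illustrative only, not the actual patched code; Windows cannot manage names ending in spaces (or dots), so they are stripped before the directory is created.

```python
def safe_dirname(name: str) -> str:
    # Windows will create, but then cannot rename or delete through the UI,
    # directories whose names end in spaces or dots - strip them before
    # calling os.makedirs().
    cleaned = name.rstrip(" .")
    return cleaned if cleaned else "_"
```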

Hangs on long file names in Ubuntu

Thanks for the application. I'm having an issue on Ubuntu where a long post title causes RMD to hang with "Errno 36: filename too long" and do nothing further. I'm downloading to a share on my network from an Ubuntu host.

Is there a fix for this?
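One common workaround, sketched here as an assumption rather than an existing RMD feature, is trimming the generated name to the filesystem's per-component limit before writing:

```python
import os

def fit_filename(name: str, max_bytes: int = 255) -> str:
    # Most Linux filesystems (and many network shares) cap a single path
    # component at 255 bytes; trim the stem and keep the extension intact.
    stem, ext = os.path.splitext(name)
    budget = max_bytes - len(ext.encode("utf-8"))
    trimmed = stem.encode("utf-8")[:budget].decode("utf-8", "ignore")
    return trimmed + ext
```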

RedditDownloader crashes when trying to rip a post with a long title

Unfortunately, right in the middle of a 17,000+ post rip, RedditDownloader failed out with this error:

Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 105, in __init__
    self.processor.run()
  File "./classes\elementprocessor.py", line 35, in run
    self.process_ele(ele)
  File "./classes\elementprocessor.py", line 60, in process_ele
    file_path = self.process_url(url, file_info)
  File "./classes\elementprocessor.py", line 73, in process_url
    ret = h.handle(url, info)
  File "./classes/handlers\imgur.py", line 276, in handle
    downloader.save_images(targ_dir)
  File "./classes/handlers\imgur.py", line 135, in save_images
    os.makedirs(album_folder)
  File "E:\Program Files\Python\Python36\lib\os.py", line 220, in makedirs
    mkdir(name, mode)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect

The post that it was trying to scrape before it crashed had a very long name, and I suspect that this was the problem.

Tried to run on a Raspberry Pi... AttributeError: 'NoneType' object has no attribute 'erase_screen'

Hi @shadowmoose ,

What could be causing the following error? It seems to be Python-related. (Colorama is at the newest version.)

Thanks a lot, and keep up the amazing work.


Downloading from Source: foo
Element loading complete.

Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Traceback (most recent call last):
  File "main.py", line 227, in <module>
    p.run()
  File "main.py", line 124, in run
    self.processor.run()
  File "./classes/processing/elementprocessor.py", line 40, in run
    self.redraw(clear, q)
  File "./classes/processing/elementprocessor.py", line 92, in redraw
    print(out.rstrip(), end='')
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 40, in write
    self.__convertor.write(text)
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 141, in write
    self.write_and_convert(text)
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 167, in write_and_convert
    self.convert_ansi(*match.groups())
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 181, in convert_ansi
    self.call_win32(command, params)
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 212, in call_win32
    winterm.erase_screen(params[0], on_stderr=self.on_stderr)
AttributeError: 'NoneType' object has no attribute 'erase_screen'

RMD is only downloading first page of user submitted posts

I'm almost certain the post limit on Reddit is 1000, yet RMD only finds 100 or 200 source posts at most when downloading from a username.
There are no issues downloading 1000 posts using other options. The "bug" (?) only occurs when using the "A User's Submission and/or Comment History" setting.

All results are not shown

I am not getting all posts when downloading. According to the Reddit Search page, the limit of the search is 1000 results. Is RedditDownloader using the default Reddit search results? If so, that explains why I am not getting all posts downloaded.

The UI never loads

I run the command to start the app and it opens the Chrome browser, but the page is just blank. I downloaded a fresh version from the repo, did the same, and got the same thing - even with a fresh settings file. Is there an issue with my Python or something?

RMD not downloading images from single-image imgur pages

Getting the following error when RMD tries to download from Imgur if it's not a direct image link or an album:

 URL: https://imgur.com/nw33ENa
 Checking handler: imgur
         Imgur Error: URL must be a valid Imgur Album
 Checking handler: github
 Checking handler: reddit
 Checking handler: ytdl
         YTDL :: ERROR: No sources found for video nw33ENa. Maybe an image?
 Checking handler: newspaper
         "Newspaper" Generic handler failed. Configuration object being passed incorrectly as title or source_url! Please verify `Article`s __init__() fn.
 !No handlers were able to accept this URL.

It should be noted that I'm running an earlier version so if this was fixed in a later version, let me know what specific changes I need to make to the code.

Download logic seems to expect Windows?

(To get this running I quickly updated my 2FA code in the setting.json, saved, and ran the script)

[parker@inspiron15 RedditDownloader]$ python main.py 

====================================
    Reddit Media Downloader 2.0
====================================
    (By ShadowMoose @ Github)

Loading all settings from file.
Loaded Source:  Source1
Authenticating via OAuth...
Authenticated as [parkerlreed]

Downloading from Source: Source1
Element loading complete.

Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Traceback (most recent call last):
  File "main.py", line 228, in <module>
    p.run()
  File "main.py", line 124, in run
    self.processor.run()
  File "./classes/processing/elementprocessor.py", line 40, in run
    self.redraw(clear, q)
  File "./classes/processing/elementprocessor.py", line 92, in redraw
    print(out.rstrip(), end='')
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 40, in write
    self.__convertor.write(text)
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 141, in write
    self.write_and_convert(text)
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 167, in write_and_convert
    self.convert_ansi(*match.groups())
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 181, in convert_ansi
    self.call_win32(command, params)
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 212, in call_win32
    winterm.erase_screen(params[0], on_stderr=self.on_stderr)
AttributeError: 'NoneType' object has no attribute 'erase_screen'

Headless support

I went through the process of setting up RedditDownloader on a Raspberry Pi and hit a few roadblocks. I run it headless without an X server, so browser auth was a bit of an issue. I'll go over it here so others who hit the same issues can find a way to set the app up. First, I want to preface that this is an absolutely stellar tool; I'm glad to have it working and love the work done on it. I can see that some issues I found probably won't be easily solvable.

The first issue was that I can't use localhost as a webserver as it will prevent any remote connections from accessing it. That's fine, I just changed it to my local network IP of the Pi in settings.json.

I found that port 7505 conflicts with OpenVPN's management interface, which is unfortunate, so I switched to port 7506. However, the current React OAuth launch logic doesn't support using any other port (just got a blank page) so I disabled OpenVPN just for setup.

Having launched the webserver and navigated to the site, I attempted auth but got some issues from Reddit regarding an invalid URL. It turns out that changing localhost to your local IP causes the redirect_uri parameter to mismatch against RMD's reddit app redirect_uri setting (which is set up for localhost).

This meant that I had to re-use an old developer app I created and change its redirect_uri to my local IP and change the client_key to my app's client key in settings.json. With this I was able to obtain a 302 redirect from Reddit containing the authorization code.

From here, the second step of OAuth was attempted but because I had changed the client key to my own app, and RMD doesn't include a client secret, reddit returned a 401 Unauthorized response. So instead I completed the second step manually via Postman with the client key, client secret, authorization code, state and redirect URI parameters to retrieve a refresh token.

Finally, after plugging in the refresh token into settings.json, further requests were failing as the client secret was still needed. So I modified classes/static/praw_wrapper.py and classes/static/settings.py to accept a new key in settings.json defined as auth.rmd_client_secret which would allow a client secret to be specified and sent through with each request. This allowed full authentication to complete successfully.

From this, I learnt that increasing headless support is going to be very challenging due to the redirect_uri parameter not being manipulatable from the app itself. Much of the work was required as I was using my own developer app to run RMD due to the control required over the redirect_uri parameter.

Thinking through the process now, it would be far more robust to simply set up RMD on another machine with a browser, then move settings.json over to the headless machine. I'm aware rclone supports this kind of behaviour and suggests it in its wizard. Perhaps an idea could be to suggest this to the user during setup. Given they've thought this through and implemented it, it may be the best solution. I don't think reverting to the old RMD behaviour, where the user was required to create their own app, is sustainable, and I appreciate the move away from it.
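The manual token-exchange step described above (done via Postman in the report) corresponds to the second leg of Reddit's documented OAuth2 authorization-code flow: POST the code to the token endpoint with HTTP Basic client credentials. The sketch below only builds the request object; it does not send anything.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

def token_request(client_id, client_secret, code, redirect_uri):
    # Build the access_token exchange request per Reddit's OAuth2 docs.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode()
    return Request(
        "https://www.reddit.com/api/v1/access_token",
        data=body,
        headers={"Authorization": "Basic " + creds},
    )
```

Sending it (with a real client id/secret and the code from the 302 redirect) returns JSON containing the refresh token the reporter plugged into settings.json.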

Long paths on Windows

RedditDownloader fails to download any posts that would result in a file being created that has a full path of over 260 characters in length on Windows 7.

While this issue is entirely Microsoft's fault, I would still appreciate a workaround. Either some way of shortening names or using APIs that allow the use of long paths, whatever works.
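Of the two workarounds asked for, the API-side one can be sketched as below: Windows' extended-length path prefix tells the Unicode file APIs to skip the 260-character MAX_PATH limit. This is an assumption about a possible fix, not RMD's actual behavior.

```python
def extended_length(abs_path: str) -> str:
    # Prefix an absolute Windows path with \\?\ so Unicode file APIs accept
    # paths longer than MAX_PATH (260 characters). Idempotent.
    if abs_path.startswith("\\\\?\\"):
        return abs_path
    return "\\\\?\\" + abs_path
```

The other option - shortening generated names before the path is assembled - avoids the prefix's quirks (no relative paths, no forward slashes).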

Transform Handlers into better Class module

Now that we're beyond the "get it working" stage:

All handler objects should be encapsulated into a "Handler" class module, to supply more generic functionality down the road. This shouldn't significantly impact the main logic, but will enable generic functionality - such as a generic file download function.

Error generating user agent on wizard.

I wasn't able to update through the app, so I deleted everything and downloaded again.
Here's what I got:

Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 66, in __init__
    self.settings = Settings(settings_file, can_save=(c_settings is None), can_load=(not args.test))
  File "./classes\settings.py", line 42, in __init__
    wizard.run(file)
  File "./classes/wizards\wizard.py", line 70, in run
    "user_agent": "RMD-"+random.random(),
TypeError: must be str, not float
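The crash comes from concatenating a str with a float; casting first is the likely one-line fix (a sketch, not the actual patch):

```python
import random

# "RMD-" + random.random() raises TypeError; cast the float to str first.
user_agent = "RMD-" + str(random.random())
```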

Clean up main reddit class

The whole main loop was written for personal use initially, and now that it's live this code is entirely sub-par.
Split the main logic into a better class module, or at least clean it up a bit.

Build Tests

Implement build testing, probably through TravisCI, once major structure changes are complete.

Shouldn't be very difficult to generate some dummy data and make sure it passes tests.

Automated downloading

Is it possible to check for new submissions in subreddits every x (adjustable) minutes and download those new submissions automatically?

Thank you.
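RMD doesn't advertise a built-in scheduler here, but the request could be approximated by any interval loop around a headless invocation (or a cron entry). A minimal sketch - the run_every helper and its arguments are hypothetical, and `job` would wrap a headless RMD run:

```python
import time

def run_every(minutes: float, job, max_runs=None):
    # Naive polling loop: call `job`, then sleep for the interval.
    # max_runs exists so the loop can terminate (e.g. in tests).
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        time.sleep(minutes * 60)
```

RMD only downloads media it hasn't already saved, so re-running it on an interval effectively picks up only new submissions.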

Main method entry point

There should be a Main method implemented, in order to allow the user to override settings by passing their own custom params in-line rather than needing to build a settings file.

Expand the custom params to enable features such as less-verbose logging, custom output directory, custom output format, etc.

RMD not downloading posts >2000 in any subreddit

Describe the bug
RMD does not discover all posts in a subreddit, stopping somewhere between 1800 and 2000 posts and not finding any more.

Environment Info (please complete the following information):

  • OS: Archlinux
  • RMD Version: 3.0

Automatic Handler detection & updating

With the current method of loading Handlers, it's dead-simple to check the existing handler.py files against an online list of updated and new ones.

Since the main logic won't often need to change once finalized, this would allow dynamic updating of sorts, with large potential for down the road.

Once #3 is taken care of, this will probably be the next step.

Error during setup after Authentication

After going through the setup, it said: Authenticated as [username]

And then the following error:

Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 66, in __init__
    self.settings = Settings(settings_file, can_save=(c_settings is None), can_load=(not args.test))
  File "./classes\settings.py", line 42, in __init__
    wizard.run(file)
  File "./classes/wizards\wizard.py", line 69, in run
    "user_agent": "RMD-"+random.random(),
TypeError: must be str, not float

After running python main.py again after this error:

Using file values.
Authenticating via OAuth...
Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 95, in __init__
    reddit.login()
  File "./classes\reddit.py", line 36, in login
    _user = _reddit.user.me()
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\models\user.py", line 99, in me
    user_data = self._reddit.get(API_PATH['me'])
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\reddit.py", line 408, in get
    data = self.request('GET', path, params=params)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\reddit.py", line 534, in request
    params=params)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 185, in request
    params=params, url=url)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 116, in _request_with_retries
    data, files, json, method, params, retries, url)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 101, in _make_request
    params=params)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\rate_limit.py", line 35, in call
    kwargs['headers'] = set_header_callback()
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 145, in _set_header_callback
    self._authorizer.refresh()
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\auth.py", line 328, in refresh
    password=self._password)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\auth.py", line 138, in _request_token
    response = self._authenticator._post(url, **data)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\auth.py", line 31, in _post
    raise ResponseException(response)
prawcore.exceptions.ResponseException: received 401 HTTP response

Thanks

PushShift doesn't work

It appears PushShift has been deprecated. The Reddit tools on its website no longer work.

SyntaxError - OS X

Hi, I'm hitting a snag over here.

"main.py", line 252
print( ("\t%3d:%-"+padding_len+"s -> ") % (i, name) , end='')
^
SyntaxError: invalid syntax

I'm running OS X 10.13.2 with Python 2.7.

Many thanks in advance for any assistance!
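The print(..., end='') form is Python 3 syntax, which Python 2.7 cannot parse - hence the SyntaxError. A guard near the top of an entry point (a sketch, not RMD's actual code) can fail fast with a clearer message, provided the guard itself only uses syntax both versions accept:

```python
import sys

# Fail fast on unsupported interpreters instead of dying with a SyntaxError.
if sys.version_info < (3, 5):
    sys.exit("RedditDownloader requires Python 3.5+; found " + sys.version.split()[0])
```

Running the script with `python3 main.py` instead of `python main.py` is the immediate workaround.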

Add some form of output manifest

Some programs may want to piggyback on the file structure this creates (say, a personal browser or something).
We should dump a full JSON manifest of whatever appropriate information about the post - and that post's files - we can assemble. Consider including, and updating with each run, things like comment counts, upvotes, and other metrics.

This likely goes hand-in-hand with #1
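A sketch of what such a manifest record might look like. Every field name here is hypothetical and illustrative only, not RMD's actual schema:

```python
import json

# One record per post, with its downloaded files and per-run metrics.
record = {
    "id": "t3_abc123",
    "title": "Example post",
    "num_comments": 56,
    "upvotes": 1234,
    "files": ["aww/example-post.jpg"],
}
manifest = json.dumps({"posts": [record]}, indent=2)
```

A personal browser (or any other program) could then map downloaded files back to their posts without re-querying Reddit.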

[Feature] Reddit Saved Posts with Gold

As you may know, Reddit only allows you to see 1000 saved posts. But if you buy gold, this goes up to 1000 per subreddit. I used your program to download the 1000 posts, then learned of the limit and bought gold to be able to download more images. I unsaved the first 1000 posts.

Now the problem is, normally saved posts are accessible from "www.reddit.com/user/USERNAME/saved", but when you have gold, this section is still empty (in my case at least, because I unsaved the first 1000 posts). Now I have to go to "www.reddit.com/user/USERNAME/saved?sr=SUBREDDITNAME", and this again allows me to see 1000 saved posts PER subreddit. But I can't seem to access this using your program.

I don't know if the Reddit API allows you to access saved posts like this, so I was thinking you could also add a new source option: one that allows you to add LINKS from which it would download the posts. So I could give links, let's say "www.reddit.com/user/vargas/saved?sr=aww", and get posts from each subreddit one by one.

I would really appreciate it, thanks!

new ways to sort

One of my sources is a user's submissions. The downloads are sorted based on the subreddit they were posted to. The ability to sort these into a separate folder would be cool.

For example if my source is u/tom and he posts in subreddits r/gaming and r/math, then his posts would be divided into the two folders gaming and math. Is it feasible to store all of his posts into a separate folder titled tom?

Thanks in advance

i.redd.it images not downloading

Hey there. Just wanted to double-check: is it me, or do images uploaded to i.redd.it fail to download? I'm using the WebUI version and successfully getting downloads hosted on imgur, but not i.redd.it. Thank you!

Exceptions when installing requirements (RedditDownloader)

First of all thanks for putting together this script! This is exactly what I am looking to use for a school project.

When I ran the pip install -r requirements.txt command, I got the following output toward the bottom of the list:

Exception:
Traceback (most recent call last):
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\wheel.py", line 316, in clobber
    ensure_dir(destdir)
  File "c:\program files (x86)\python36-32\lib\site-packages\pip\utils\__init__.py", line 83, in ensure_dir
    os.makedirs(path)
  File "c:\program files (x86)\python36-32\lib\os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [WinError 5] Access is denied: 'c:\program files (x86)\python36-32\Lib\site-packages\certifi'

All of this text is in red, and the main.py script will not run because it can't locate a module.
Any ideas on what is causing the requirements to not install properly?
Thanks, Steve

Authorize an Account doesn't do anything

Hi all. Fresh install on a Ubuntu 18.04 system. Erased my prior settings and manifest files.
On first run, the web interface opens and I go to the "settings" tab, then click "Authorize an account", and nothing seems to happen. The account doesn't seem to authorize, because subsequent downloading doesn't find anything to download.

Thanks.

Syntax Error: Invalid Syntax

I am on Mac OSX High Sierra and downloaded and unzipped the folder. I ran your command "pip install -r requirements.txt" after downloading Python. At first it didn't work, so I ran "sudo easy_install-3.7 pip", which worked. But then when I tried running "python main.py" it gives me
"File "main.py", line 201
print(("\t%3d:%-" + padding_len + "s -> ") % (i, name), end='')
^
SyntaxError: invalid syntax
"

Can't run the script at all

I did pip install -r requirements.txt but still get this:

PS C:\Users\Ali\git\@shadowmoose\RedditDownloader> python .\main.py
Traceback (most recent call last):
  File ".\main.py", line 56, in <module>
    from classes.webserver import eelwrapper
  File "C:\Users\Ali\git\@shadowmoose\RedditDownloader\classes\webserver\eelwrapper.py", line 1, in <module>
    import eel
ModuleNotFoundError: No module named 'eel'

Hangs on download

Well, everything was working. I now get the following output on stderr, and the program hangs:

Downloading from Source: default-downloader
Exception in thread Handler - 3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 42, in run
    self.process_ele(item)
  File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 95, in process_ele
    manifest.insert_post(reddit_element) # Update Manifest with completed ele.
  File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 101, in insert_post
    direct_insert_post(ele['id'], ele['author'], ele['source_alias'], ele['subreddit'], ele['title'], ele['type'], ele['files'], ele['parent'], ele['body'])
  File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 90, in direct_insert_post
    (_id, author, source_alias, subreddit, title, _type, parent, body)
sqlite3.OperationalError: table posts has no column named parent

(The identical traceback then repeats for Handler threads 5, 4, 1, and 2, each ending in the same sqlite3.OperationalError.)
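The error suggests a manifest database created by an older version of the program, before a `parent` column was added to the `posts` table. A hedged sketch of how such a missing column could be detected and patched with sqlite3 (the table name and schema here are assumptions modelled on the traceback, and `:memory:` stands in for the real manifest file):

```python
import sqlite3

# Simulate an old-schema manifest database lacking the 'parent' column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id TEXT, author TEXT)")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk);
# column names are at index 1.
cols = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]
if "parent" not in cols:
    conn.execute("ALTER TABLE posts ADD COLUMN parent TEXT")

cols = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]
print(cols)  # ['id', 'author', 'parent']
```

Deleting or regenerating the manifest after upgrading may be the simpler fix in practice, at the cost of re-checking previously downloaded posts.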

Filters created through the web gui are incorrect

Hi,
I wanted to set up a few filters through the web GUI, but the filters it creates are wrong and result in no files being downloaded (because no posts match the malformed filters).

For example, creating a "Title matches: 123" filter writes the following line to settings.json:
"title": "123"

However, the line should read:
"title.match": "123"

The same thing happens with "Score: minimum" (it writes "score": when it should write "score.min":), and I would assume other filters are affected as well.
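Until the GUI is fixed, the broken keys could be rewritten in place. A hypothetical workaround sketch, where the key mapping and the nested "filters" layout are assumptions taken from this report rather than RMD's actual settings schema:

```python
import json

# Map the keys the GUI reportedly writes to the keys the filter
# loader expects (per the report above; hypothetical).
KEY_FIXES = {"title": "title.match", "score": "score.min"}

# Stand-in for the parsed contents of settings.json.
settings = {"filters": {"title": "123", "score": 50}}
settings["filters"] = {KEY_FIXES.get(k, k): v
                       for k, v in settings["filters"].items()}
print(json.dumps(settings["filters"], sort_keys=True))
```

In practice one would load the real settings.json, apply the same renaming, and write it back.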
