nschrading / redditdataextractor

The reddit Data Extractor is a cross-platform GUI tool for downloading almost any content posted to reddit. It supports downloads from specific users, specific subreddits, and users by subreddit, with optional filters on the content. Some intelligence is built in to avoid downloading duplicate external content.

License: GNU General Public License v3.0

Python 100.00%

redditdataextractor's Issues

Is it possible to add the date/time field in the extracted data?

Just wondering if the posted time and date for each reddit thread discussion and comments can be extracted too.
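
For reference, Reddit's API exposes a `created_utc` field (seconds since the Unix epoch) on both submissions and comments. A minimal sketch of turning that value into a readable timestamp, assuming it has already been fetched; `format_created` is a hypothetical helper name and is not part of the tool:

```python
from datetime import datetime, timezone


def format_created(created_utc):
    """Convert an epoch-seconds value (as in `created_utc`) to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Epoch zero, used here only as a fixed example input.
    print(format_created(0))
```

Whether the extractor currently writes this field to its output is not stated in the thread.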

Thanks!

Not able to download

I am trying to download files and get the error "to attempt to redownload this file, uncheck "restrict retrieved submissions to creation dates after the last downloaded submission" in the settings". I have this setting unchecked but it will not download the files.

Allow saving by the poster in sub folders

Is it possible to store files under subreddit/author/ rather than labeling photos with the reddit ID?

For some reason, some of the images are in the root of the subreddit folder and others are in folders.
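
The requested layout could be sketched as follows. This is only an illustration of the subreddit/author/ idea, not how the tool currently names files; `build_save_path` and its parameters are hypothetical:

```python
from pathlib import Path


def build_save_path(root, subreddit, author, submission_id, ext):
    """Build root/subreddit/author/<submission_id><ext> without touching the disk."""
    return Path(root) / subreddit / author / (submission_id + ext)


if __name__ == "__main__":
    target = build_save_path("downloads", "pics", "alice", "abc123", ".jpg")
    # Create the per-author folder only when a file is actually about to be saved.
    print(target)
```

Routing every save through one helper like this would also avoid the mix of files in the subreddit root and files in subfolders.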

Fails to run, when installed on OSX.

Installed all the required bits, and it doesn't seem to work:

natasha:john(130:355)$ pwd
/Users/john/workarea/bin/redditDataExtractor-master
natasha:john(131:356)$ python3.4 main.py
Traceback (most recent call last):
  File "main.py", line 125, in <module>
    main()
  File "main.py", line 98, in main
    rddtDataExtractor._r.http.validate_certs = 'RedditDataExtractor/cacert.pem'
AttributeError: 'Reddit' object has no attribute 'http'
natasha:john(132:357)$

If I comment out line 98 of main.py, it runs, but every subreddit in the default lists fails with a message saying it does not exist:

The subreddit movies does not exist. Remove from list?
With a Yes|No dialog popup for each reddit list, I am using the defaults list to test.

I also had problems getting it to run without a praw.ini that included the client_id. Once I had that, it ran, but it didn't read the client_id, and I needed to re-enter it in the extractor.

Any ideas? I am not a Python guy, so I am operating outside my talent envelope here. I have successfully installed dozens of other Python apps without too many issues, apart from a file-locking issue on one that I was able to fix and commit. This one has me stumped.

natasha:john(138:363)$ uname -a
Darwin natasha.wrongcrowd.net 16.3.0 Darwin Kernel Version 16.3.0: Thu Nov 17 20:23:58 PST 2016; root:xnu-3789.31.2~1/RELEASE_X86_64 x86_64
natasha:john(139:364)$ python3.4 -V
Python 3.4.6
natasha:john(139:364)$ pip3.4 list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
beautifulsoup4 (4.5.3)
pathlib (1.0.1)
pip (9.0.1)
praw (4.3.0)
prawcore (0.7.0)
requests (2.13.0)
setuptools (32.3.1)
update-checker (0.16)
youtube-dl (2015.7.4)
natasha:john(138:365)$ port installed | grep -E 'sip|pyqt'
  py-sip @4.19_0 (active)
  py27-sip @4.19_0 (active)
  py34-pyqt4 @4.12.0_0 (active)
  py34-sip @4.19_0 (active)
natasha:john(139:366)$

Is there anything I can do to gather more relevant info? The only runtime message I get when I comment out line 98 of main.py is:

libpng warning: iCCP: known incorrect sRGB profile

Which would seem to me not relevant to the issues at hand.
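
A note on the traceback above: the package list shows praw 4.3.0, and the failing line reaches for an `http` attribute that this `Reddit` object does not have. Another report on this page mentions running the source "with the praw 3.5.0 downgrade", which suggests (but does not prove) that the code targets the older PRAW 3 series. A minimal sketch of guarding that one assignment so the mismatch is detected instead of crashing; `set_cert_bundle` is a hypothetical helper, and it does not make the rest of the program compatible with newer PRAW:

```python
def set_cert_bundle(reddit, cert_path):
    """Apply the CA bundle path only if the client exposes an `http` attribute.

    Returns True when the setting was applied, and False when the attribute is
    missing (as in the AttributeError shown in the traceback).
    """
    http = getattr(reddit, "http", None)
    if http is None:
        return False
    # Mirrors the assignment from main.py line 98 in the traceback.
    http.validate_certs = cert_path
    return True
```

A False return is a hint to check the installed PRAW version before looking anywhere else.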

Home folder hardcoded in the premade executable

I tried downloading the premade executable for Linux, and I'm getting the following error when trying to run it:
/exe.linux-x86_64-3.4$ ./redditDataExtractor
/mnt/plaintext_data/Downloads/ubuntu/exe.linux-x86_64-3.4/library.zip/imp.py:32: PendingDeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
Traceback (most recent call last):
  File "/home/jschradi/anaconda3/lib/python3.4/site-packages/cx_Freeze/initscripts/Console.py", line 27, in <module>
  File "main.py", line 125, in <module>
  File "main.py", line 89, in main
  File "main.py", line 84, in loadState
  File "/home/jschradi/anaconda3/lib/python3.4/shelve.py", line 141, in close
  File "/home/jschradi/anaconda3/lib/python3.4/shelve.py", line 168, in sync
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 113, in _commit
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 257, in _chmod
PermissionError: [Errno 1] Operation not permitted: 'RedditDataExtractor/saves/settings.db.dir'
Exception ignored in: <bound method DbfilenameShelf.__del__ of <shelve.DbfilenameShelf object at 0x7fc0433c92b0>>
Traceback (most recent call last):
  File "/home/jschradi/anaconda3/lib/python3.4/shelve.py", line 158, in __del__
  File "/home/jschradi/anaconda3/lib/python3.4/shelve.py", line 141, in close
  File "/home/jschradi/anaconda3/lib/python3.4/shelve.py", line 168, in sync
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 113, in _commit
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 257, in _chmod
PermissionError: [Errno 1] Operation not permitted: 'RedditDataExtractor/saves/settings.db.dir'
Exception ignored in: <bound method _Database.close of <dbm.dumb._Database object at 0x7fc0433c92e8>>
Traceback (most recent call last):
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 250, in close
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 113, in _commit
  File "/home/jschradi/anaconda3/lib/python3.4/dbm/dumb.py", line 257, in _chmod
PermissionError: [Errno 1] Operation not permitted: 'RedditDataExtractor/saves/settings.db.dir'
Looks like a bunch of paths are hardcoded, and don't get ported to a different system.
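
One caveat on that reading: the /home/jschradi/... entries may simply be source locations recorded when the executable was built, while the path that actually fails is the relative 'RedditDataExtractor/saves/settings.db.dir', on which a chmod is refused. Either way, keeping the settings file in a per-user directory that is known to be writable would sidestep both concerns. A minimal sketch, assuming a hidden folder under the user's home directory is acceptable; `settings_path` and the folder name are hypothetical, not what the tool does:

```python
import shelve
from pathlib import Path


def settings_path(home=None, app_name="RedditDataExtractor"):
    """Return <home>/.<app_name>/saves/settings.db, creating the folders if needed."""
    base = Path(home) if home is not None else Path.home()
    save_dir = base / ("." + app_name) / "saves"
    save_dir.mkdir(parents=True, exist_ok=True)
    return save_dir / "settings.db"


def load_setting(path, key, default=None):
    """Read a single value from the shelve file at `path`."""
    with shelve.open(str(path)) as db:
        return db.get(key, default)
```

This does not help if the home directory itself sits on a filesystem that rejects chmod, so it is a mitigation rather than a confirmed fix.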

Only allowed to download max 1000 posts?

In settings there is "Max Posts Retrieved in Subreddit Content Download[1-1000]". What if I want to download more than 1000 posts? 1000 posts is far from enough for data analysis.

Which setting will allow me to extract the next posts (1001-2000, 2001-3000, etc.)? Is there an automatic mechanism to download ALL posts for one reddit topic?

Thanks!
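
Some background on the question above: Reddit listings are paged with an `after` cursor, and the listing endpoints are generally reported to stop returning results at roughly 1000 items, which is probably why the setting is capped at 1000. The sketch below shows the cursor loop itself under that assumption. `fetch_page` is a hypothetical callable standing in for whatever performs the request, and the loop does not lift any server-side limit:

```python
def fetch_all(fetch_page, page_size=100, max_items=None):
    """Collect items by following an `after` cursor until it runs out.

    `fetch_page(after, page_size)` must return `(items, next_after)`, where
    `next_after` is None once there are no further pages.
    """
    items = []
    after = None
    while max_items is None or len(items) < max_items:
        page, after = fetch_page(after, page_size)
        items.extend(page)
        if not page or after is None:
            break
    return items if max_items is None else items[:max_items]
```

Getting "posts 1001-2000" would need a different data source or a different slicing of the subreddit, for example several sort orders merged and de-duplicated by ID.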

Could you add an option to allow NSFW content?

Unrelated issue

Hey there mate, the guy who created this beautiful tool and posted it on GitHub.
You're the reason why I signed up to GitHub to learn how to code.
Just yesterday I couldn't sleep; I was thinking about a tool I wanted to create that crawls a website, gathers links, and downloads them. I found yours to be the perfect one to start learning with. The thing is, I'm kind of a noob, new to this whole programming thing, and I want to learn Python.
I've installed Python and the PyQt4 that you provided, but it freezes on line 24.
So I want to change that to PyQt5, because I had so many problems with PyQt4.
If I do that, will it affect the rest of the code?
Edit: you know, on second thought this shit is fucking nerve-breaking, mate. You fix a problem and another one pops out of nowhere: install fucking sip, oh wait, sip doesn't work, it needs qt; install qt, oh shit, qt needs cs, cs needs pv, pv needs dc, dc needs ls; and when you do all that shit the program still won't work at the end. Excuse my language, I don't have time for this shit... I need my smoke.
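
On the PyQt4 versus PyQt5 question: `QApplication` lives in `PyQt4.QtGui` but moved to `PyQt5.QtWidgets`, along with the other widget classes, so switching bindings does affect more than one import line. A minimal sketch of probing which binding is importable before committing to either; `first_available` is a hypothetical helper, and it only locates the class, it does not port the rest of the code:

```python
import importlib


def first_available(candidates):
    """Return (module_name, attribute) for the first candidate that imports.

    `candidates` is a list of (module_name, attribute_name) pairs, tried in
    order. Returns None when none of them can be imported.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attr):
            return module_name, getattr(module, attr)
    return None


# Where QApplication is found in each binding.
QT_CANDIDATES = [
    ("PyQt5.QtWidgets", "QApplication"),
    ("PyQt4.QtGui", "QApplication"),
]

if __name__ == "__main__":
    print(first_available(QT_CANDIDATES))
```

A None result means neither binding is installed for the interpreter being used, which is worth ruling out before editing any source.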

Not working for some images

Using both the latest source version with the praw 3.5.0 downgrade and the precompiled binaries for 64-bit Linux Mint, some imgur images are not being downloaded properly. I keep getting a corrupted .gif file which can't be opened by any image viewer or VLC.

For example, it doesn't work for the image in this post (a corrupted .gif is downloaded):

https://www.reddit.com/r/reddit_data_extractor/comments/3c53r1/imgur_gifv_page/
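
One plausible cause, not confirmed in this thread, is that the .gifv link returns an HTML page (or a video) rather than GIF data, and the response is saved under a .gif name anyway. Real GIF files begin with the bytes `GIF87a` or `GIF89a`, so the first few bytes of the download are enough to tell. A minimal sketch; `sniff_download` is a hypothetical helper:

```python
def sniff_download(data):
    """Classify downloaded bytes as 'gif', 'html', or 'unknown' from their header."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # Ignore leading whitespace and letter case when checking for an HTML page.
    head = data[:16].lstrip().lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Running this on the corrupted file would show whether the problem is the URL being fetched or the data being written.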

pre-made Linux executable crashes, probably because of a hardcoded home folder

@NSchrading, that's probably related to #12. When I try to launch the pre-made Linux executable, it crashes:

exe.linux-x86_64-3.4$ /home/myusername/rde/exe.linux-x86_64-3.4/library.zip/imp.py:32: PendingDeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
Traceback (most recent call last):
  File "/home/jschradi/anaconda3/lib/python3.4/site-packages/cx_Freeze/initscripts/Console.py", line 27, in <module>
  File "main.py", line 24, in <module>
ImportError: cannot import name 'QApplication'

[5]-  Exit 1                  ./redditDataExtractor

Supported Downloader

Since this project is no longer supported, I would like to make everyone aware of a similar project that is still under active development and support.

https://github.com/MalloyDelacroix/DownloaderForReddit

The Downloader For Reddit has a very similar UI, with many of the features requested here, and is updated in many ways.

Sadly even with Pull Request #15 your RDE tool does not work - downloads empty .txt files (no comment body) and complains about AttributeError: '<class 'praw.objects.Submission'>' has no attribute 'comments'

@NSchrading Sadly even with Pull Request #15 your RDE tool does not work - downloads empty .txt files (no comment body) and complains about AttributeError: '<class 'praw.objects.Submission'>' has no attribute 'comments' . Please reply if you still care about your tool, I think I did like 90% of work but got stuck a bit

Videos not downloading

Videos do not seem to download. Instead, the "Uncheck Restrict retrieved submissions to creation dates after the last downloaded submission" error is logged, even though I have verified that this setting is unchecked. Also, does this support gifs hosted on redgifs.com, and gifv, as well?

Possible to archive everything?

I'm failing to find the option to archive absolutely everything, is it possible?

name of file

Is it possible to change the name of the file to the Title of the post?
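
Post titles can contain characters that are not valid in file names, and two posts can share a title, so a title-based scheme needs some cleanup plus a unique suffix. A minimal sketch, assuming the submission ID is kept in the name to avoid collisions; `title_to_filename` is a hypothetical helper, not an existing option in the tool:

```python
import re


def title_to_filename(title, submission_id, ext, max_len=80):
    """Turn a post title into a file name that is safe on common filesystems."""
    # Replace path separators, reserved punctuation, and control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title)
    # Collapse runs of whitespace, then trim spaces and dots from both ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    cleaned = cleaned[:max_len].rstrip(" .") or "untitled"
    return "{} [{}]{}".format(cleaned, submission_id, ext)


if __name__ == "__main__":
    print(title_to_filename("What is this? A/B test", "abc123", ".jpg"))
```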

extractor not validating any subreddits

Hi -

I'm hoping to download a specific subreddit's data (r/PoliticalDiscussion) but receive an error message each time I try to use the extractor. Even the default subreddits produce a message "The subreddit does not exist." Am I doing something wrong?

Thanks!

Create flatpak and publish on Flathub

Would be nice to be able to install this app directly via Flathub. https://flathub.org/

Top does not get top of all time

When choosing top as the sorting option, it only gets top posts from the past year. Is there a way to expand that to top from "all time"? I would like to archive a few now-dead subreddits, and I don't want to pick up all the spam and waste the 1000-post limit with "new".
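
For context, Reddit's "top" listing takes a time-window parameter `t` whose accepted values are hour, day, week, month, year, and all. Whether the tool lets the user set it is not stated here. A minimal sketch of building such a listing URL, assuming the public `.json` URL form; `top_listing_url` is a hypothetical helper, and real requests may additionally need authentication and a descriptive User-Agent:

```python
from urllib.parse import urlencode

VALID_TIME_FILTERS = ("hour", "day", "week", "month", "year", "all")


def top_listing_url(subreddit, time_filter="all", limit=100):
    """Build the URL for a subreddit's top listing over the given time window."""
    if time_filter not in VALID_TIME_FILTERS:
        raise ValueError("time_filter must be one of {}".format(VALID_TIME_FILTERS))
    query = urlencode({"t": time_filter, "limit": limit})
    return "https://www.reddit.com/r/{}/top.json?{}".format(subreddit, query)


if __name__ == "__main__":
    print(top_listing_url("movies"))
```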
