fletchowns / awfulutils
A set of utilities for the Something Awful forums
License: Other
From evol262:
AwfulClient.userinfo is a huge mess of shit that should be in UserInfo, which should inherit from AwfulClient and set all that in __init__. AwfulClient.userinfo can then just return UserInfo(userid), which sets it all up.
Everything you're doing with "contacts_elem.find('dt'..." could (and arguably should, so it's testable) be done in a private helper method that takes **kwargs if necessary to pass extra shit in. How many times is "elem.find(...).get_text()" in there?
Same for "[tag.extract() for tag in soup.findAll(...)]"
And the same goes for __process_paginators.
On SA, spoiler tags are powered by JavaScript, but the thread export strips out all the JavaScript. Need to figure out how to handle them.
The script is designed for Python 3; it should specify this in the shebang line.
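The usual way to pin Python 3 in a script's shebang is #!/usr/bin/env python3. The version check below is an optional extra that fails fast with a readable message if the script is started with an older interpreter anyway. It is a sketch: check_python_version and MIN_VERSION are hypothetical names, and (3, 0) simply mirrors "designed for Python 3" rather than a tested minimum.

```python
#!/usr/bin/env python3
"""Hypothetical header for awful_export_thread.py: pin Python 3 and fail fast otherwise."""
import sys

# Assumed minimum; "designed for Python 3" is all the issue states.
MIN_VERSION = (3, 0)


def check_python_version(version_info=sys.version_info, minimum=MIN_VERSION):
    """Exit with a readable message if (major, minor) is below minimum."""
    if tuple(version_info[:2]) < minimum:
        raise SystemExit("awfulutils requires Python %d.%d or newer" % minimum)


if __name__ == "__main__":
    check_python_version()
```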
There is actually JSON support for user profiles, just by appending &json=1 to the url (for example). This should be easier and more reliable than scraping the HTML manually. The only potential issue I see is that the avatar and title text are stored in the same field, so that would still have to be parsed out if desired.
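A minimal sketch of requesting the JSON variant with only the standard library follows. It assumes, as the issue says, that adding a json=1 query parameter to a profile URL is enough. The profile URL in the usage line (member.php?action=getinfo&userid=...) is an assumption, and with_json_flag and parse_profile are hypothetical names. The combined avatar/title field is left alone because its name and format are not given here.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_flag(url):
    """Return url with json=1 added to (or overriding) its query string.

    Note: dict(parse_qsl(...)) keeps only the last value of a repeated key,
    which is fine for simple profile URLs.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["json"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


def parse_profile(body):
    """Decode a JSON profile response body (bytes or str) into a Python object."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)
```

For example, with_json_flag("https://forums.somethingawful.com/member.php?action=getinfo&userid=123") returns "https://forums.somethingawful.com/member.php?action=getinfo&userid=123&json=1". The body from opener.open(...).read() can then go through parse_profile instead of the HTML scraper.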
Currently in the thread export, quoted images appear full-sized instead of as thumbnails.
I managed to get a new error:
Traceback (most recent call last):
File "/usr/local/bin/awful_export_thread.py", line 33, in <module>
awful_client.export_thread(args.threadid)
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 86, in export_thread
thread_export.save()
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 167, in save
data = future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 201, in __save_page
downloaded_images_count = self.__process_images(page_soup, page_number)
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 293, in __process_images
with open(output_filename, 'wb') as output_file, self.opener.open(original_src) as response:
File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 1378, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/local/lib/python3.7/urllib/request.py", line 1350, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/usr/local/lib/python3.7/http/client.py", line 1262, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.7/http/client.py", line 1273, in _send_request
self.putrequest(method, url, **skips)
File "/usr/local/lib/python3.7/http/client.py", line 1116, in putrequest
self._validate_path(url)
File "/usr/local/lib/python3.7/http/client.py", line 1207, in _validate_path
raise InvalidURL(f"URL can't contain control characters. {url!r} "
http.client.InvalidURL: URL can't contain control characters. '/de28e6e57e891eb66aa0d111bc570c552d86bdac/michael cera- awkward 5.jpg' (found at least ' ')
It seems like a simple misparse, though?
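The failing path contains literal spaces ('michael cera- awkward 5.jpg'). http.client rejects any URL containing spaces or control characters. That suggests the image URL needs percent-encoding before it is opened, rather than a misparse. The sketch below is one way to do that with urllib.parse.quote. quote_url_path is a hypothetical helper, and the host in the usage line is a placeholder because the log only shows the path.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode characters such as spaces in the path component of url."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and existing %XX escapes unchanged,
    # so an already-encoded URL passes through untouched.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))
```

For example, quote_url_path("http://example.com/de28e6e57e891eb66aa0d111bc570c552d86bdac/michael cera- awkward 5.jpg") gives "http://example.com/de28e6e57e891eb66aa0d111bc570c552d86bdac/michael%20cera-%20awkward%205.jpg". In __process_images this would wrap original_src before it is passed to self.opener.open.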
There are no timeouts set on any of the requests, and by default the requests library will wait indefinitely. Need to add a configurable timeout value.
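The tracebacks in this tracker show page and image downloads going through self.opener.open from urllib.request. OpenerDirector.open accepts a timeout argument, so a configurable timeout could be threaded through from the command line as sketched here. The --timeout flag, DEFAULT_TIMEOUT value, build_parser and fetch are all hypothetical names, not existing awfulutils options.

```python
import argparse

DEFAULT_TIMEOUT = 30.0  # seconds; an assumed default, not taken from awfulutils


def build_parser():
    """CLI parser with a hypothetical --timeout option for the export script."""
    parser = argparse.ArgumentParser(description="Export a Something Awful thread")
    parser.add_argument("threadid")
    parser.add_argument(
        "--timeout",
        type=float,
        default=DEFAULT_TIMEOUT,
        help="seconds to wait for each HTTP request (default: %(default)s)",
    )
    return parser


def fetch(opener, url, timeout=DEFAULT_TIMEOUT):
    """Read url via a urllib.request.OpenerDirector, giving up after timeout seconds."""
    # Without an explicit timeout, the socket default applies, which is None
    # (no timeout) unless changed, so a stalled server blocks forever.
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

The parsed args.timeout would then be passed to every opener.open call. A stalled server then raises a timeout error that the existing per-page error handling can log, instead of hanging the export.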
We should add the ability to export private messages.
Per feedback from evol262, tests should use mocked objects.
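A sketch of what a mocked test could look like, using unittest.mock from the standard library. A MagicMock stands in for the urllib opener, so no network traffic happens and both success and failure paths can be exercised. download_image is a hypothetical stand-in for the image-download step, not the existing awfulutils method.

```python
import unittest
from unittest import mock


def download_image(opener, url, output_file):
    """Hypothetical image-download step: copy the body of url into output_file."""
    with opener.open(url) as response:
        data = response.read()
    output_file.write(data)
    return len(data)


class DownloadImageTest(unittest.TestCase):
    def test_copies_response_body_without_network(self):
        # MagicMock supports the context-manager protocol, so it can replace
        # the object returned by opener.open() inside a with statement.
        opener = mock.MagicMock()
        opener.open.return_value.__enter__.return_value.read.return_value = b"\x89PNG"
        output_file = mock.MagicMock()

        written = download_image(opener, "http://example.com/a.png", output_file)

        self.assertEqual(written, 4)
        opener.open.assert_called_once_with("http://example.com/a.png")
        output_file.write.assert_called_once_with(b"\x89PNG")

    def test_http_error_propagates_and_nothing_is_written(self):
        opener = mock.MagicMock()
        opener.open.side_effect = OSError("HTTP Error 404: Not Found")
        output_file = mock.MagicMock()

        with self.assertRaises(OSError):
            download_image(opener, "http://example.com/missing.png", output_file)
        output_file.write.assert_not_called()
```

These run with python -m unittest. OSError stands in for urllib.error.HTTPError, which is a subclass of it.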
Occasionally, while grabbing pages, it'll throw an error to stderr that looks something like this:
Starting page 32/453
--- Logging error ---
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 166, in save
data = future.result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 201, in __save_page
downloaded_images_count = self.__process_images(page_soup, page_number)
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 306, in __process_images
with open(output_filename, 'wb') as output_file, self.opener.open(original_href) as response:
File "/usr/local/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/lib/python3.7/urllib/request.py", line 563, in error
result = self._call_chain(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 755, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/local/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
msg = self.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
return fmt.format(record)
File "/usr/local/lib/python3.7/logging/__init__.py", line 608, in format
record.message = record.getMessage()
File "/usr/local/lib/python3.7/logging/__init__.py", line 369, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/usr/local/bin/awful_export_thread.py", line 32, in <module>
awful_client.export_thread(args.threadid)
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 85, in export_thread
thread_export.save()
File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 168, in save
logger.exception('Error saving page %d' % page_number, e)
Message: 'Error saving page 32'
Arguments: (<HTTPError 404: 'Not Found'>,)
I've snipped out all the other threads that were running at the same time, but lemme know if you need a full log file, as I have that available.
As far as I can tell, it seems to coincide with those files ending up 0 bytes in size, so it's definitely something that needs a little TLC if it's to be used for archiving.
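Two things in the log above look fixable.

The "Logging error" is raised by the call logger.exception('Error saving page %d' % page_number, e). The message is already formatted, so the extra e argument has no placeholder left to fill, and logging's own %-formatting fails with TypeError. logger.exception already records the active exception and its traceback, so e doesn't need to be passed at all.

Separately, __process_images opens the output file and the HTTP response in one with statement, file first. When opener.open raises the 404 HTTPError, the file has already been created and is closed empty. That is one plausible source of the 0-byte files.

The sketch below addresses both. download_to_file and collect_results are hypothetical names, and the .part temporary suffix is an illustrative choice.

```python
import logging
import os

logger = logging.getLogger(__name__)


def download_to_file(opener, url, output_filename):
    """Fetch url, then write it to output_filename; a failed request creates no file."""
    # Open the HTTP response before touching the filesystem, so an HTTPError
    # (such as a 404) propagates without leaving a 0-byte file behind.
    with opener.open(url) as response:
        data = response.read()
    # Write under a temporary name and rename at the end, so an interrupted
    # write never leaves a truncated file under the final name.
    tmp_filename = output_filename + ".part"
    with open(tmp_filename, "wb") as output_file:
        output_file.write(data)
    os.replace(tmp_filename, output_filename)
    return len(data)


def collect_results(futures_by_page):
    """Wait on each page's future, logging (not raising) failures with tracebacks."""
    results = {}
    for page_number, future in futures_by_page.items():
        try:
            results[page_number] = future.result()
        except Exception:
            # page_number goes in as a lazy %-style argument; logger.exception()
            # attaches the active exception itself, so it is not passed here.
            logger.exception("Error saving page %d", page_number)
    return results
```

With this call, the log line reads "Error saving page 32" followed by the HTTPError traceback, instead of the "--- Logging error ---" block.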
img.waffleimages.com seems to be long gone, which is unfortunate, as it hosted a huge number of image uploads for the Something Awful forums. We used to be able to find these images at 46.59.2.17, but that no longer seems to be up either.
Don't have a good solution for this unless somebody has another copy of these pictures somewhere.
Reported on the forums by Takes No Damage:
I got it installed with the pip command, but then realized I was still on Python 2.7 or whatever. Upgraded to 3.8 and tried running the install command again but the files still just show up in .local/lib/python2.7 or similar. What do I need to do to reinstall under Python 3.8?
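Files landing in .local/lib/python2.7 suggest the pip on PATH belongs to Python 2.7. Running pip as a module of the target interpreter avoids that mismatch, for example python3.8 -m pip install --user awfulutils (this assumes the package is published under the name awfulutils). The sketch below prints which interpreter and user site directory a script actually sees, and builds such a command list. interpreter_report and pip_install_command are hypothetical helpers.

```python
import site
import sys


def interpreter_report():
    """Describe the running interpreter and where `pip install --user` puts packages."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % tuple(sys.version_info[:2]),
        "user_site": site.getusersitepackages(),
    }


def pip_install_command(package, python=sys.executable, user=True):
    """Build a pip command pinned to one interpreter by running pip as `python -m pip`."""
    cmd = [python, "-m", "pip", "install"]
    if user:
        cmd.append("--user")
    cmd.append(package)
    return cmd


for key, value in interpreter_report().items():
    print("%s: %s" % (key, value))
```

Running this file with python3.8 should report a python3.8 user_site. The list from pip_install_command can be handed to subprocess.run if the install should be scripted.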