awfulutils's People

Contributors

fletchowns

awfulutils's Issues

Refactor UserInfo

From evol262:

AwfulClient.userinfo is a huge mess of shit that should be in UserInfo, which should inherit from AwfulClient and set all that in init. AwfulClient.userinfo can return UserInfo(userid) which sets it all

Everything you're doing with "contacts_elem.find('dt'..." could (and arguably should, so it's testable) be done from a private helper method that takes **kwargs if necessary to pass extra shit in. How many times is "elem.find(...).get_text()" in there?

Same for "[tag.extract() for tag in soup.findAll(...)]"

And in __process_paginators
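
A minimal sketch of the kind of helper suggested above, assuming the elements are BeautifulSoup tags (anything with `find` and `get_text` works). The name `find_text` and its `default` parameter are hypothetical and not part of awfulutils:

```python
def find_text(elem, *args, default=None, **kwargs):
    """Return the stripped text of the first match, or default if nothing matches.

    Positional and keyword arguments are passed straight through to
    elem.find(), so callers can still filter by tag name, class_, attrs, etc.
    """
    found = elem.find(*args, **kwargs)
    if found is None:
        # Avoids AttributeError on None when a profile field is missing.
        return default
    return found.get_text(strip=True)
```

With something like this, each `contacts_elem.find('dt', ...).get_text()` call collapses to `find_text(contacts_elem, 'dt', ...)`, and the missing-element case can be unit tested in one place.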

Improved handling of spoiler tags

On SA these are powered by JavaScript, but the thread export strips out all the JavaScript. Need to figure out how to handle spoiler tags.
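
One possible approach, sketched under assumptions: replace the JavaScript behaviour with a plain CSS hover rule injected into each exported page. The class name `bbc-spoiler` is an assumption about the exported markup and should be checked against a real export; `add_spoiler_css` is a hypothetical helper.

```python
import re

# Assumed class name; adjust to match the exported markup.
SPOILER_CSS = """<style>
.bbc-spoiler { background-color: #000; color: #000; }
.bbc-spoiler:hover { background-color: transparent; color: inherit; }
</style>
"""


def add_spoiler_css(html):
    """Insert the hover rule just before </head>, or prepend it if there is no head."""
    match = re.search(r'</head\s*>', html, re.IGNORECASE)
    if match is None:
        return SPOILER_CSS + html
    return html[:match.start()] + SPOILER_CSS + html[match.start():]
```

This keeps spoilers hidden until hovered without shipping any script, at the cost of not working on touch-only devices.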

Use JSON for user profiles

There is actually JSON support for user profiles, just by appending &json=1 to the URL. This should be easier and more reliable than scraping the HTML manually. The only potential issue I see is that the avatar and title text are stored in the same field, so that would still have to be parsed out if desired.
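
A small sketch of adding the flag to an existing profile URL with the standard library. `with_json_flag` is a hypothetical helper, and the URL in the usage note is a placeholder rather than the real profile endpoint:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_flag(url):
    """Return url with json=1 set in its query string."""
    parts = urlsplit(url)
    # Drop any existing json parameter so the flag is never duplicated.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'json']
    query.append(('json', '1'))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_json_flag('https://example.invalid/member.php?action=getinfo&userid=1')` yields the same URL with `&json=1` appended, and the response body can then be handed to `json.loads`.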

URL can't contain control characters

I managed to get a new error

Traceback (most recent call last):
  File "/usr/local/bin/awful_export_thread.py", line 33, in <module>
    awful_client.export_thread(args.threadid)
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 86, in export_thread
    thread_export.save()
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 167, in save
    data = future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 201, in __save_page
    downloaded_images_count = self.__process_images(page_soup, page_number)
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 293, in __process_images
    with open(output_filename, 'wb') as output_file, self.opener.open(original_src) as response:
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1378, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1350, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/local/lib/python3.7/http/client.py", line 1262, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1273, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/local/lib/python3.7/http/client.py", line 1116, in putrequest
    self._validate_path(url)
  File "/usr/local/lib/python3.7/http/client.py", line 1207, in _validate_path
    raise InvalidURL(f"URL can't contain control characters. {url!r} "
http.client.InvalidURL: URL can't contain control characters. '/de28e6e57e891eb66aa0d111bc570c552d86bdac/michael cera- awkward 5.jpg' (found at least ' ')

It seems like a simple misparse, though?
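
The path in the error contains literal spaces, so this looks less like a misparse and more like an image URL that was never percent-encoded. A minimal sketch of escaping the URL before handing it to the opener; `escape_url` is a hypothetical helper, not existing awfulutils code:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url(url):
    """Percent-encode unsafe characters in the path and query of url."""
    parts = urlsplit(url)
    # "%" stays in the safe set so already-encoded URLs are not double-encoded.
    path = quote(parts.path, safe='/%')
    query = quote(parts.query, safe='=&%+')
    return urlunsplit(parts._replace(path=path, query=query))
```

Calling `self.opener.open(escape_url(original_src))` would then send `michael%20cera-%20awkward%205.jpg` instead of the raw filename.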

Thread export can hang indefinitely

There are no timeouts set on any of the requests, and by default the requests library will wait indefinitely. Need to add a configurable timeout value.
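
A sketch of what a configurable timeout could look like. The `--timeout` flag, `DEFAULT_TIMEOUT`, and `fetch` are hypothetical names; only the `threadid` argument appears in the tracebacks on this page. The tracebacks show image downloads going through a `urllib` opener, whose `open` accepts a `timeout` in seconds; with the requests library the equivalent is passing `timeout=` to `requests.get`.

```python
import argparse

DEFAULT_TIMEOUT = 30.0  # seconds; an arbitrary value chosen for illustration


def build_parser():
    """Command-line parser with a hypothetical --timeout option."""
    parser = argparse.ArgumentParser(description='Export a thread')
    parser.add_argument('threadid', type=int)
    parser.add_argument('--timeout', type=float, default=DEFAULT_TIMEOUT,
                        help='seconds to wait on each request before giving up')
    return parser


def fetch(opener, url, timeout=DEFAULT_TIMEOUT):
    """Read url through a urllib OpenerDirector, failing after timeout seconds."""
    # The timeout applies to each blocking socket operation, not the whole download.
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Because the timeout covers individual socket operations, a slow but steady download can still take longer than the configured value in total.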

Error saving page..

Occasionally, while grabbing pages, it'll throw an error to stderr which looks something like this:

Starting page 32/453
--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 166, in save
    data = future.result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 201, in __save_page
    downloaded_images_count = self.__process_images(page_soup, page_number)
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 306, in __process_images
    with open(output_filename, 'wb') as output_file, self.opener.open(original_href) as response:
  File "/usr/local/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.7/logging/__init__.py", line 608, in format
    record.message = record.getMessage()
  File "/usr/local/lib/python3.7/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/usr/local/bin/awful_export_thread.py", line 32, in <module>
    awful_client.export_thread(args.threadid)
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 85, in export_thread
    thread_export.save()
  File "/usr/local/lib/python3.7/site-packages/awfulutils/awfulclient.py", line 168, in save
    logger.exception('Error saving page %d' % page_number, e)
Message: 'Error saving page 32'
Arguments: (<HTTPError 404: 'Not Found'>,)

I've snipped out all the other threads that were running at the same time, but lemme know if you need a full log file, as I have that available.

As far as I can tell, it seems to coincide with those files ending up as 0 bytes, so it definitely needs a little TLC if it's to be used for archiving.
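
Two things appear to be going on in the log above. The `TypeError` comes from `logger.exception('Error saving page %d' % page_number, e)`: the message is already formatted, so the extra `e` argument has no placeholder left to fill. The 0-byte files are consistent with the output file being opened before the request is made, so a 404 leaves an empty file behind. A minimal sketch of one way to address both, assuming a urllib opener as in the traceback; `save_image` is a hypothetical function, not the existing `__process_images`:

```python
import logging
import urllib.error

logger = logging.getLogger(__name__)


def save_image(opener, url, output_filename, page_number):
    """Download url to output_filename; return True on success, False on failure."""
    try:
        # Fetch the body first so a failed request never creates the file.
        with opener.open(url) as response:
            data = response.read()
    except urllib.error.URLError:
        # page_number is passed as a logging argument rather than pre-formatted
        # with %, and logger.exception attaches the active traceback itself.
        logger.exception('Error saving image on page %d', page_number)
        return False
    with open(output_filename, 'wb') as output_file:
        output_file.write(data)
    return True
```

This reads the whole image into memory before writing; for very large files, writing to a temporary file and renaming it on success would be an alternative.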

Images from img.waffleimages.com cannot be downloaded

img.waffleimages.com seems to be long gone, which is unfortunate as it had a huge amount of image uploads for Something Awful forums. We used to be able to find these images @ 46.59.2.17 but that no longer seems to be up either.

Don't have a good solution for this unless somebody has another copy of these pictures somewhere.

Installation instructions do not work if Python 2.7 is your system default

Reported on the forums by Takes No Damage

I got it installed with the pip command, but then realized I was still on Python 2.7 or whatever. Upgraded to 3.8 and tried running the install command again but the files still just show up in .local/lib/python2.7 or similar. What do I need to do to reinstall under Python 3.8?
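
The usual cause is that a bare `pip` still belongs to the Python 2.7 installation. Running pip as a module of the interpreter you actually want ties the install to that interpreter. A small sketch follows; `pip_install_command` is a hypothetical helper, and `awfulutils` as the package name is an assumption based on the module path in the tracebacks:

```python
import sys


def pip_install_command(package, python=sys.executable):
    """Build a pip command bound to a specific interpreter.

    Invoking pip with "-m" guarantees the package lands in that
    interpreter's site-packages rather than the system default's.
    """
    return [python, '-m', 'pip', 'install', '--user', package]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; typed by hand, the equivalent is running `-m pip install --user` with the Python 3.8 executable in place of the bare `pip` command.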
