Comments (24)

remagio avatar remagio commented on July 17, 2024

Addendum: the archive.py and hydrate Python processes exit immediately when it happens.

Feature request: would it make sense to add an error.log, so that this kind of error leaves a trace when running in the background without 'screen' or 'tmux'?
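
A minimal sketch of what such an error log could look like, assuming Python 3 and only the standard library. It is not part of twarc: the function name `install_error_log`, the logger name, and the default `error.log` path are all hypothetical. It installs a `sys.excepthook` so that an uncaught exception is written to a file with its traceback before the process exits.

```python
import logging
import sys


def install_error_log(path="error.log"):
    """Write uncaught exceptions, with their traceback, to a log file."""
    logger = logging.getLogger("hydrate_errors")  # hypothetical logger name
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR)

    def log_uncaught(exc_type, exc_value, exc_tb):
        # Record the full traceback in the file ...
        logger.error(
            "uncaught exception", exc_info=(exc_type, exc_value, exc_tb)
        )
        # ... and still print it to stderr as Python normally would.
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = log_uncaught
    return logger
```

Calling `install_error_log()` once at the top of a script would be enough for a job started with `nohup` or cron to leave a trace behind.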

from twarc.

edsu avatar edsu commented on July 17, 2024

So are you seeing this error when using utils/archive.py ?

edsu avatar edsu commented on July 17, 2024

Can you upgrade to v0.3.1 and see if that fixes your hydrate problem? It did look like the exception handling I added previously wasn't around the post that is only used by --hydrate.

20cb45a

remagio avatar remagio commented on July 17, 2024

Thx @edsu, I'm launching a hydrate now. I'll update you asap.

As for archive.py, I don't have a saved trace of it, but it looks the same. The ending part matches the hydrate trace for sure.

remagio avatar remagio commented on July 17, 2024

The same error still comes up. If it helps, it is always raised when the hydrate starts again after a "rate limit exceeded" wait.

edsu avatar edsu commented on July 17, 2024

Do you have the stack trace?

remagio avatar remagio commented on July 17, 2024

2015-07-04 21:22:16,451 WARNING rate limit exceeded: sleeping 833.548175097 secs
Traceback (most recent call last):
  File "/usr/local/bin/twarc.py", line 4, in <module>
    __import__('pkg_resources').run_script('twarc==0.3.1', 'twarc.py')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 729, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1649, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 340, in <module>

  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 109, in main

  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 298, in hydrate

  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 172, in new_f

  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 324, in post

  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 507, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
    httplib_response = conn.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 188, in recv
    data = self.connection.recv(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 995, in recv
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 862, in _raise_ssl_error
    raise SysCallError(errno, errorcode[errno])
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')

edsu avatar edsu commented on July 17, 2024

Oh interesting. I'm just noticing now that you are getting an OpenSSL.SSL.SysCallError, not a requests.exceptions.ConnectionError; I guess I can catch that too.

For some context, see https://github.com/kennethreitz/requests/issues/2543

It's a little bit tricky because pyOpenSSL is optionally installed in requests. I don't have it installed in my twarc environment and everything seems fine. So I guess I need to optionally handle OpenSSL.SSL.SysCallError somehow for those people that do have pyOpenSSL installed.

If you want to route around your problem for the moment you can remove pyOpenSSL from your twarc environment (easy to do if you are using a virtualenv) and try again.
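
One way the optional handling could look, as a rough sketch rather than the actual twarc fix: build the tuple of exceptions to catch at runtime, adding `requests.exceptions.ConnectionError` and `OpenSSL.SSL.SysCallError` only when those packages import cleanly. This assumes Python 3 (the built-in `ConnectionError` does not exist on 2.7), and the helper names `optional_exception`, `connection_errors`, and `call_with_retry` are hypothetical.

```python
import importlib
import time


def optional_exception(module_name, attr):
    """Return an exception class if its module is installed, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


def connection_errors():
    """Collect the connection-level exceptions available in this environment."""
    found = [ConnectionError]  # built-in; covers ConnectionResetError
    candidates = [
        ("requests.exceptions", "ConnectionError"),
        ("OpenSSL.SSL", "SysCallError"),  # only present with pyOpenSSL
    ]
    for module_name, attr in candidates:
        exc = optional_exception(module_name, attr)
        if exc is not None:
            found.append(exc)
    return tuple(found)


def call_with_retry(func, retries=3, wait=1.0, errors=None, sleep=time.sleep):
    """Call func(), retrying on connection errors with a growing pause."""
    if errors is None:
        errors = connection_errors()
    for attempt in range(1, retries + 1):
        try:
            return func()
        except errors:
            if attempt == retries:
                raise  # give up and surface the last error
            sleep(wait * attempt)
```

A POST could then be wrapped as `call_with_retry(lambda: session.post(url, data=data))`, where `session`, `url`, and `data` stand in for whatever the caller already has.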

edsu avatar edsu commented on July 17, 2024

@remagio what version of requests do you have installed? If you're not sure one way to find out is:

% python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.__version__
'2.7.0'

remagio avatar remagio commented on July 17, 2024

It's weird, but the requests version is '2.6.0'. I think it's related to other dependencies, not directly to requests. Does it call ssl through urllib, remapped via urllib.request, to use a recent ssl lib?
urllib.version = '1.17'
urllib3.version = '1.10.2'
I upgraded requests to 2.7 and urllib3 to 1.10.4 and launched a hydrate again to test.
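
For checking several of these packages at once, a small sketch assuming Python 3.8 or later, where `importlib.metadata` is in the standard library. The function name `installed_versions` and the default package list are hypothetical.

```python
from importlib import metadata


def installed_versions(names=("requests", "urllib3", "pyOpenSSL")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

A `None` for pyOpenSSL would mean requests is using the standard library's ssl module rather than the pyOpenSSL code path shown in the traceback.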

edsu avatar edsu commented on July 17, 2024

Can you try to upgrade to latest requests (2.7.0) and test again?

remagio avatar remagio commented on July 17, 2024

Yes, I confirmed the upgrade in my previous comment, and the hydrate is running!

remagio avatar remagio commented on July 17, 2024

@edsu in my experience, only a few months ago I had no problems running multiple hydrates daily. But in your experience, does this mean that people could run into trouble with --hydrate if I upload the current ID sets to Archive.org?

edsu avatar edsu commented on July 17, 2024

Why don't you upload them so I can test with them? I used twarc to hydrate 15 million tweets with no problem a few months ago.

remagio avatar remagio commented on July 17, 2024

I'm still waiting for a solution to "this" issue (hydrate->requests->urllib3->pyopenssl<-Twitter), since I have to hydrate something weekly, but I'll do it.
When I'm ready with a post describing it, I'll upload the IDs archive and email you.

edsu avatar edsu commented on July 17, 2024

I believe if you remove pyOpenSSL from your python environment things will just work, if you are looking for a workaround.

edsu avatar edsu commented on July 17, 2024

@remagio would you be able to test this fix I have on the pyopenssl branch? It should catch these pyOpenSSL errors that aren't being wrapped properly by requests and urllib3.

remagio avatar remagio commented on July 17, 2024

@edsu I submitted the job, feedback asap!

edsu avatar edsu commented on July 17, 2024

Given the radio silence it sounds like it might be working longer?

remagio avatar remagio commented on July 17, 2024

Haha, not really: there was some other nightly scheduled stuff I forgot to stop. I switched to a dedicated instance for testing because it requires ~15-18 hours to hydrate the first part of the IDs. I relaunched it late this morning, local time.

remagio avatar remagio commented on July 17, 2024

@edsu it works fine, at least with sets of 750k IDs.

edsu avatar edsu commented on July 17, 2024

Yay!

remagio avatar remagio commented on July 17, 2024

@edsu the branch you released for this issue has worked fine so far. If you merge it to master, you could close this issue too.

edsu avatar edsu commented on July 17, 2024

Thanks for the reminder!
