Comments (24)
Addendum: both the archive.py and hydrate Python processes exit immediately when this happens.
Feature request: would it be possible to write an error.log, so that a trace of this kind of error is kept when running in the background without 'screen' or 'tmux'?
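To illustrate the request, here is a minimal sketch of how a long-running script could keep the traceback of an uncaught exception in a file. It is not something twarc provides: `install_error_log`, the logger name `hydrate_errors`, and the default path `error.log` are all hypothetical names chosen for the example.

```python
import logging
import sys


def install_error_log(path="error.log"):
    """Write uncaught exceptions, with full traceback, to a log file.

    Hypothetical helper for illustration; not part of twarc.
    """
    logger = logging.getLogger("hydrate_errors")
    logger.setLevel(logging.ERROR)
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)

    def log_uncaught(exc_type, exc_value, exc_tb):
        # Record the traceback in the file...
        logger.error(
            "uncaught exception", exc_info=(exc_type, exc_value, exc_tb)
        )
        # ...and still print it to stderr as Python normally would.
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = log_uncaught
    return logger
```

Calling `install_error_log()` once at the start of the script would be enough. Redirecting stderr when launching the process (for example with `2>> error.log`) should capture the same trace without any code change.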
from twarc.
So are you seeing this error when using utils/archive.py?
from twarc.
Can you upgrade to v0.3.1 and see if that fixes your hydrate problem? It did look like the exception handling I added previously wasn't around the post that is only used by --hydrate.
from twarc.
Thanks @edsu, I'm launching a hydrate now. I'll update you ASAP.
For archive.py I don't have a saved trace, but it looks the same. The ending part matches the hydrate trace for sure.
from twarc.
The same error still comes up. If it helps, it is always raised when the hydrate starts again after a "rate limit exceeded" wait.
from twarc.
Do you have the stack trace?
from twarc.
2015-07-04 21:22:16,451 WARNING rate limit exceeded: sleeping 833.548175097 secs
Traceback (most recent call last):
  File "/usr/local/bin/twarc.py", line 4, in <module>
    __import__('pkg_resources').run_script('twarc==0.3.1', 'twarc.py')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 729, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1649, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 340, in <module>
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 109, in main
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 298, in hydrate
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 172, in new_f
  File "/usr/local/lib/python2.7/dist-packages/twarc-0.3.1-py2.7.egg/EGG-INFO/scripts/twarc.py", line 324, in post
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 507, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 370, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 372, in _make_request
    httplib_response = conn.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
    line = self.fp.readline()
  File "/usr/lib/python2.7/socket.py", line 447, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 188, in recv
    data = self.connection.recv(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 995, in recv
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python2.7/dist-packages/pyOpenSSL-0.14-py2.7.egg/OpenSSL/SSL.py", line 862, in _raise_ssl_error
    raise SysCallError(errno, errorcode[errno])
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')
from twarc.
Oh interesting. I'm just noticing now that you are getting an OpenSSL.SSL.SysCallError, not a requests.exceptions.ConnectionError; I guess I can catch that too.
For some context, see https://github.com/kennethreitz/requests/issues/2543
It's a little bit tricky because pyOpenSSL is optionally installed in requests. I don't have it installed in my twarc environment and everything seems fine. So I guess I need to optionally handle OpenSSL.SSL.SysCallError somehow for those people that do have pyOpenSSL installed.
If you want to route around your problem for the moment you can remove pyOpenSSL from your twarc environment (easy to do if you are using a virtualenv) and try again.
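One way the optional handling described above could look is sketched below. This is not the fix that later landed on twarc's pyopenssl branch; `RETRY_ERRORS` and `call_with_retry` are hypothetical names, the retry count and delay are arbitrary, and the code is written for Python 3 rather than the Python 2.7 shown in the trace. Both requests and pyOpenSSL are treated as optional imports, so the tuple of exceptions to catch only grows when a library is actually installed.

```python
import time

# Start with the built-in ConnectionError (Python 3), which covers
# ConnectionResetError, then add library-specific errors when available.
_errors = [ConnectionError]
try:
    import requests
    _errors.append(requests.exceptions.ConnectionError)
except ImportError:
    pass  # requests is not installed in this environment
try:
    from OpenSSL.SSL import SysCallError
    _errors.append(SysCallError)
except ImportError:
    pass  # pyOpenSSL is not installed, so there is nothing extra to catch
RETRY_ERRORS = tuple(_errors)


def call_with_retry(func, *args, retries=3, delay=1.0, **kwargs):
    """Call func, retrying when a connection-level error is raised.

    Hypothetical helper: re-raises the last error once retries run out.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except RETRY_ERRORS:
            if attempt == retries:
                raise
            # Back off a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

A call such as `call_with_retry(session.post, url, data=data)` would then survive a dropped connection whether or not pyOpenSSL is present, assuming `session` is a `requests.Session`.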
from twarc.
@remagio what version of requests do you have installed? If you're not sure one way to find out is:
% python
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests.__version__
'2.7.0'
from twarc.
It's weird, but my requests version is '2.6.0'. I think the problem is related to other dependencies, not directly to requests. Does it call ssl through urllib, remapped via urllib.request, to use a recent ssl lib?
urllib.version = '1.17'
urllib3.version = '1.10.2'
I upgraded requests to 2.7 and urllib3 to 1.10.4, and launched a hydrate again to test.
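For checking all the relevant versions in one go, a small sketch along these lines may help. It assumes Python 3.8 or later (for `importlib.metadata`), which is newer than the Python 2.7 used in this thread, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata  # available from Python 3.8


def installed_versions(names=("requests", "urllib3", "pyOpenSSL")):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

Note that the trace above goes through requests/packages/urllib3, a copy bundled inside requests, so the version of a separately installed urllib3 is not necessarily the one in use.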
from twarc.
Can you try to upgrade to latest requests (2.7.0) and test again?
from twarc.
Yes, I confirmed the upgrade in my previous comment, and the hydrate is running!
from twarc.
@edsu until a few months ago I had no problems running multiple hydrates daily. In your experience, does this mean that people could run into trouble with --hydrate if I upload the current ID sets to Archive.org?
from twarc.
Why don't you upload them so I can test with them? I used twarc to hydrate 15 million tweets with no problem a few months ago.
from twarc.
I'll do that. I'm still waiting for a solution to "this" issue (hydrate -> requests -> urllib3 -> pyOpenSSL <- Twitter) anyway, since I have to hydrate something weekly.
When I have a post describing it ready, I'll upload the IDs archive and email you.
from twarc.
I believe if you remove pyOpenSSL from your python environment things will just work, if you are looking for a workaround.
from twarc.
@remagio would you be able to test this fix I have on the pyopenssl branch? It should catch these pyOpenSSL errors that aren't being wrapped properly by requests and urllib3.
from twarc.
@edsu I submitted the job; feedback ASAP!
from twarc.
Given the radio silence it sounds like it might be working longer?
from twarc.
Haha, not really: there was some other nightly scheduled stuff I forgot to stop. I switched to a dedicated instance for testing because it requires ~15-18 hours to hydrate the first part of the IDs. I relaunched it late this morning, local time.
from twarc.
@edsu it works fine, at least with sets of 750k IDs.
from twarc.
Yay!
from twarc.
@edsu the branch you released for this issue has worked fine so far. If you merge it to master, you could close this issue too.
from twarc.
Thanks for the reminder!
from twarc.