Comments (9)
I was working on a similar issue.
Can you please try a smoke test with this branch? https://github.com/taganaka/polipus/tree/proxy_no_cache
A connection is then refreshed correctly after 3 attempts.
New parameters have been added for finer-grained control over HTTP timeouts:
# HTTP open connection timeout in seconds
:open_timeout => 10,
# Mark a connection as stale after connection_max_hits requests
:connection_max_hits => nil
Let me know how it goes
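The two options above could be wired up roughly as follows. This is a minimal sketch assuming plain Net::HTTP; the class name `RefreshingConnection`, its keyword names and the `read_timeout` default are hypothetical and are not the code in the proxy_no_cache branch.

```ruby
require 'net/http'

# Hypothetical wrapper (NOT polipus's implementation): hands out a Net::HTTP
# object and swaps in a fresh one after it has served `max_hits` requests.
class RefreshingConnection
  attr_reader :hits

  # Defaults are illustrative only.
  def initialize(host, port, open_timeout: 10, read_timeout: 30, max_hits: nil)
    @host = host
    @port = port
    @open_timeout = open_timeout
    @read_timeout = read_timeout
    @max_hits = max_hits # nil means "never mark the connection as stale"
    @hits = 0
    @http = build
  end

  # Returns a usable connection, rebuilding it once it has gone stale.
  def connection
    if @max_hits && @hits >= @max_hits
      @http.finish if @http.started? # close the old socket, if one is open
      @http = build
      @hits = 0
    end
    @hits += 1
    @http
  end

  private

  def build
    http = Net::HTTP.new(@host, @port) # does not connect yet
    http.open_timeout = @open_timeout  # seconds to wait for the TCP connect
    http.read_timeout = @read_timeout  # seconds to wait on each socket read
    http
  end
end
```

With `max_hits: 2`, the first two calls to `connection` return the same object and the third returns a freshly built one.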
from polipus.
Hey Francesco,
Thanks for the response. I just tested your branch, and the thread still seems to hang. I looked into it, and it must be stuck somewhere in the following frames:
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/protocol.rb:155:in `select'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/protocol.rb:155:in `rescue in rbuf_fill'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/protocol.rb:152:in `rbuf_fill'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/protocol.rb:134:in `readuntil'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/protocol.rb:144:in `readline'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/http/response.rb:39:in `read_status_line'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/http/response.rb:28:in `read_new'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/http.rb:1406:in `block in transport_request'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/http.rb:1403:in `catch'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/http.rb:1403:in `transport_request'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/2.0.0/net/http.rb:1376:in `request'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/bundler/gems/polipus-3a8e7de1a245/lib/polipus/http.rb:167:in `get_response'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/bundler/gems/polipus-3a8e7de1a245/lib/polipus/http.rb:141:in `get'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/bundler/gems/polipus-3a8e7de1a245/lib/polipus/http.rb:33:in `fetch_pages'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/bundler/gems/polipus-3a8e7de1a245/lib/polipus.rb:188:in `block (3 levels) in takeover'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/redis-queue-0.0.3/lib/redis/queue.rb:56:in `block in process'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/redis-queue-0.0.3/lib/redis/queue.rb:54:in `loop'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/gems/redis-queue-0.0.3/lib/redis/queue.rb:54:in `process'
/home/deployer/.rbenv/versions/2.0.0-p353/lib/ruby/gems/2.0.0/bundler/gems/polipus-3a8e7de1a245/lib/polipus.rb:163:in `block (2 levels) in takeover'
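The backtrace shows the thread waiting in `rbuf_fill` while `read_status_line` reads the first line of the response, i.e. the server accepted the connection but never answered. A minimal sketch, assuming plain Net::HTTP rather than polipus's own `get_response` wrapper, of bounding that wait with `read_timeout` and turning the resulting `Net::ReadTimeout` into a value the caller can handle. The helper name `get_or_timeout` is hypothetical; the demo uses a local server that accepts connections but never responds.

```ruby
require 'net/http'
require 'socket'

# Hypothetical helper: issue a GET, but report a timeout instead of blocking.
# Net::ReadTimeout is raised when no bytes arrive within read_timeout seconds,
# which is the wait visible in rbuf_fill / read_status_line above.
def get_or_timeout(host, port, path, open_timeout: 10, read_timeout: 30)
  Net::HTTP.start(host, port, open_timeout: open_timeout, read_timeout: read_timeout) do |http|
    http.request(Net::HTTP::Get.new(path))
  end
rescue Net::OpenTimeout, Net::ReadTimeout
  :timeout
end

# Demo: a listening socket that never answers, mimicking a stalled site.
# (Net::HTTP may retry an idempotent GET once, so this takes about 2 x 0.5s.)
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
result = get_or_timeout('127.0.0.1', port, '/', read_timeout: 0.5)
server.close
puts result.inspect
```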
Hi @hendricius
I can't reproduce this, even after weeks of running.
Is that the full backtrace?
Hi,
Yep. The problem happens when we crawl websites that block us by letting our requests time out. Some of them also just return a 500 server error. The crawler seems to keep requesting the same URL again and again and hangs there.
-hendrik
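One way to keep a stalling or erroring site from tying up a worker is to cap the number of attempts per URL. A minimal sketch, assuming the fetch is passed in as a block; `fetch_with_retries`, `ServerError` and the backoff values are hypothetical and not part of polipus. Timeouts, connection resets and 5xx responses are retried with exponential backoff, and after `max_attempts` the helper gives up and returns `nil`.

```ruby
require 'net/http'

# Hypothetical error used to treat 5xx responses as retryable failures.
class ServerError < StandardError; end

RETRYABLE_ERRORS = [Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNRESET].freeze

# Hypothetical helper: call the block up to max_attempts times, sleeping
# base_delay, 2*base_delay, ... between attempts, then give up with nil.
def fetch_with_retries(max_attempts: 3, base_delay: 0.5)
  attempts = 0
  begin
    attempts += 1
    response = yield
    raise ServerError, response.code if response.code.to_i >= 500
    response
  rescue ServerError, *RETRYABLE_ERRORS
    if attempts < max_attempts
      sleep(base_delay * 2**(attempts - 1))
      retry
    end
    nil # give up; the caller can log the URL and move on
  end
end
```

Usage: `fetch_with_retries { Net::HTTP.get_response(URI('http://example.com/')) }` returns the response, or `nil` if every attempt failed.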
Update:
I think it is a problem with Ruby's HTTP library (Net::HTTP); it does not seem to be thread-safe.
We hit this randomly every 3-4 weeks. If I find a solution, I will post an update here.
The issue is here:
https://github.com/ruby/ruby/blob/ruby_2_1/lib/net/http.rb#L879
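Net::HTTP is not documented as thread-safe, and a single instance owns one socket and its read buffer, so a common precaution is never to share an instance between threads. A minimal sketch of that precaution, assuming each crawler thread may lazily build its own connection; the helper name `thread_local_http` and the timeout values are hypothetical, and this does not claim to address whatever happens at the linked line.

```ruby
require 'net/http'

# Hypothetical helper: one Net::HTTP object per (thread, host, port), stored in
# a fiber/thread-local slot, so no two threads ever read from the same socket.
def thread_local_http(host, port)
  key = :"http_#{host}_#{port}"
  Thread.current[key] ||= Net::HTTP.new(host, port).tap do |http|
    http.open_timeout = 10 # illustrative values
    http.read_timeout = 30
  end
end
```

Within a thread, repeated calls return the same object; different threads each get their own.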
We once considered using Excon instead (see #37 (comment), item 5).
Would this help?
@tmaier that might work. I think the best option would be to let users plug in whichever HTTP library they want to use. What do you think?
I have read reports from a few people about thread-safety issues with Ruby's default HTTP library.
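Letting users plug in their own HTTP library mostly comes down to having the crawler depend on a small interface instead of on Net::HTTP directly. A minimal sketch of that idea; the interface (`#get(url)` returning `[status, body]`) and the class names `NetHttpAdapter` and `Fetcher` are hypothetical, not polipus's API. An Excon-based or other adapter would just need to implement the same `#get`.

```ruby
require 'net/http'
require 'uri'

# Hypothetical default adapter backed by Net::HTTP.
class NetHttpAdapter
  def initialize(open_timeout: 10, read_timeout: 30)
    @open_timeout = open_timeout
    @read_timeout = read_timeout
  end

  # Returns [status_code, body].
  def get(url)
    uri = URI(url)
    Net::HTTP.start(uri.host, uri.port,
                    use_ssl: uri.scheme == 'https',
                    open_timeout: @open_timeout,
                    read_timeout: @read_timeout) do |http|
      res = http.request(Net::HTTP::Get.new(uri.request_uri))
      [res.code.to_i, res.body]
    end
  end
end

# Hypothetical crawler component that depends only on the #get interface.
class Fetcher
  def initialize(adapter: NetHttpAdapter.new)
    @adapter = adapter
  end

  # Returns the body for a 200 response, nil otherwise.
  def fetch(url)
    status, body = @adapter.get(url)
    status == 200 ? body : nil
  end
end
```

Any object with a matching `#get`, including a test double, can be passed as `adapter:`.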
@hendricius @tmaier it's definitely on our to-do list.
I think we should go with Faraday: https://github.com/lostisland/faraday
That would make things a lot easier, since people could just wire in their own library.
Related Issues (20)
- #queue_overflow_adapter and #overflow_adapter; Same thing?
- Robots.txt option
- Change logging format
- Internet connection lost; Page still stored and processed
- Show code coverage and results of quality analysis
- Gzip decoded body not used anywhere
- Fails when response["Set-Cookie"] is nil
- invalid byte sequence in US-ASCII (ArgumentError)
- Whitelist start urls?
- Kill s3 entirely, use Fog, yo!
- Cannot install on JRuby 1.7.13. Error with bson_ext-1.9.2
- SocketError could mean, domain is gone or no internet connection
- Cannot use with mongoid ~> 4.0.0
- RethinkDB Storage
- Anchor links converted to %23 causing 404 errors
- Support for other charsets than UTF-8
- [Question] Using MongoDB as 'cache' aside of Redis
- Support for headless crawling
- Make it work with Mongo 2.x
- Unicode pages does not work anymore on 0.5.0