Comments (3)
Thanks. Hopefully they've fixed Ruby's URI implementation; it was badly broken some years ago. Since then I've used Addressable, as it's been the most reliable, but it would be good to add some options or even switch to Ruby's URI if it is now true to the RFC.
from cobweb.
Do you know a good way to test the URI implementation in 2.1 against the RFC?
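One possible approach, sketched here as an assumption rather than an established test suite, is to run the reference-resolution examples from RFC 3986 section 5.4.1 through the standard library's URI.join and report any mismatches. The helper name check_reference_resolution and the choice of examples are illustrative only, and the list covers just a handful of the RFC's cases:

```ruby
require "uri"

# Base URI and a few "normal" examples from RFC 3986, section 5.4.1.
base = "http://a/b/c/d;p?q"
examples = {
  "g"       => "http://a/b/c/g",
  "./g"     => "http://a/b/c/g",
  "g/"      => "http://a/b/c/g/",
  "/g"      => "http://a/g",
  "?y"      => "http://a/b/c/d;p?y",
  "../g"    => "http://a/b/g",
  "../../g" => "http://a/g"
}

# Hypothetical helper: resolves each reference against the base and records
# whether the result matches what the RFC lists. Nothing is assumed about
# which cases pass; errors are captured in the report instead of raised.
def check_reference_resolution(base, examples)
  examples.each_with_object({}) do |(ref, expected), report|
    actual = begin
      URI.join(base, ref).to_s
    rescue URI::Error => e
      "#{e.class}: #{e.message}"
    end
    report[ref] = { expected: expected, actual: actual, ok: actual == expected }
  end
end

report = check_reference_resolution(base, examples)
report.each do |ref, result|
  status = result[:ok] ? "ok  " : "FAIL"
  puts "#{status} #{ref.inspect}: expected #{result[:expected]}, got #{result[:actual]}"
end
```

Running the same table against another parser would show where the two implementations disagree; extending it to the RFC's "abnormal" examples is left open.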
from cobweb.
Just closed sporkmonger/addressable#160 as "won't fix", but wanted to comment here because I suspect cobweb is actually misusing Addressable, given that this issue came up.
The uri.normalize method's output should not generally be used directly to query a web server. Instead, you want to use it as more of a lookup key for caches, previously crawled URLs, and so on. So you'd use uri to query your web server and uri.normalize to record the output the web server gives you back. But you'd never want to make a request to the web server with the output of uri.normalize, because it's an intentionally lossy operation that's primarily meant for lookups and equality testing (where both sides of the URI equality being tested get normalized and then compared). Probably 95+% of the time it'll work fine to do it the wrong way, but every once in a while it'll bite you.
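A minimal sketch of that split, under stated assumptions: it uses the standard library's URI#normalize so that it runs without extra gems (with Addressable, the normalize method on a parsed URI would play the same role), and CrawlCache and fake_http_get are hypothetical names that are not part of cobweb or Addressable.

```ruby
require "uri"

# Hypothetical crawl cache: keys are normalized URIs, values are response bodies.
class CrawlCache
  def initialize
    @pages = {}
  end

  # Normalization is used only to build the lookup key.
  def key_for(uri)
    uri.normalize.to_s
  end

  def seen?(uri)
    @pages.key?(key_for(uri))
  end

  # Yields the original, un-normalized URI to the block that performs the
  # request, and stores the result under the normalized key.
  def fetch(uri)
    key = key_for(uri)
    return @pages[key] if @pages.key?(key)

    @pages[key] = yield(uri)
  end
end

# Stand-in for a real HTTP request; it only records what it was asked for.
requested = []
fake_http_get = lambda do |uri|
  requested << uri.to_s
  "body for #{uri}"
end

cache  = CrawlCache.new
first  = URI.parse("http://Example.COM/a?b=1")
second = URI.parse("http://example.com/a?b=1")

cache.fetch(first)  { |u| fake_http_get.call(u) }
cache.fetch(second) { |u| fake_http_get.call(u) } # same key, so no second request
```

The request always goes out with the URI as given, while the two spellings of the host collapse to one cache entry because both sides are normalized before comparison.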
Sorry about getting to this so late after the issue was opened.
from cobweb.