Comments (7)
It's actually doing the right thing. IRIs (Unicode-friendly URIs) use Unicode Normalization Form KC (NFKC) to limit phishing. NFKC tends to do perceptual codepoint conversions, like converting '？' (U+FF1F, fullwidth question mark) to '?'. If this is causing a problem, the solution is either not to normalize the URI or to normalize its components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.
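To see the NFKC conversion in isolation, here is a minimal sketch that uses only the Ruby standard library and does not involve Addressable at all. The variable names are illustrative; it assumes the escape sequence is UTF-8, as in the URLs above.

```ruby
require "uri"

# %EF%BC%9F is the UTF-8 encoding of U+FF1F (fullwidth question mark).
fullwidth = URI.decode_www_form_component("%ef%bc%9f")

# NFKC replaces compatibility characters with their plain equivalents.
normalized = fullwidth.unicode_normalize(:nfkc)

puts fullwidth.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")  # prints U+FF1F
puts normalized.codepoints.map { |cp| format("U+%04X", cp) }.join(" ") # prints U+003F
```

This only shows what NFKC does to the character itself; whether a URI normalizer should apply it to the path is the open question in this thread.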
from addressable.
This issue probably requires a check-in with the IETF URI mailing list before deciding one way or the other.
I understand it's been a long time, but I wanted to check in on this issue. We've hit this bug in a slightly different context and aren't sure how to deal with it. Any chance this is going to be fixed?
Could you elaborate on the issue you're hitting? A test case would be awesome.
Actually, now I'm not sure if our issue is related to this one. Here is our problem:
irb(main):001:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f"))
=> #<Addressable::URI:0x5648890 URI:http://foo.com/blah？>
irb(main):002:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f")).normalize!
=> #<Addressable::URI:0x564ed08 URI:http://foo.com/blah%3F>
The normalize! call takes a perfectly valid (AFAIU) Unicode character, the fullwidth question mark U+FF1F, and replaces it with a percent-encoded ASCII question mark.
Some more context: %2E is the percent-encoding of "."
irb(main):038:0> CGI.unescapeURIComponent "%2E"
=> "."
Addressable::URI.parse("/%2E/").normalize.to_str.should == "/%2E/"
Not sure why this should be true? If you want to compare URIs, shouldn't you normalize both before comparing?
Hmm, from https://www.rfc-editor.org/rfc/rfc3986#section-2.3:
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
URIs that differ in the replacement of an unreserved character with
its corresponding percent-encoded US-ASCII octet are equivalent: they
identify the same resource. However, URI comparison implementations
do not always perform normalization prior to comparison (see Section 6).
For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
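As a rough illustration of the rule quoted above, the sketch below decodes only those percent-encoded octets that map to unreserved characters and leaves every other escape alone. `decode_unreserved` is a hypothetical helper written for this comment, not part of Addressable's API.

```ruby
# Hypothetical helper: decode percent-encoded octets only when they
# correspond to an RFC 3986 unreserved character.
def decode_unreserved(uri_string)
  unreserved = /\A[A-Za-z0-9\-._~]\z/
  uri_string.gsub(/%([0-9A-Fa-f]{2})/) do |escape|
    char = Regexp.last_match(1).hex.chr
    # Keep the original escape for reserved or non-ASCII octets.
    char.match?(unreserved) ? char : escape
  end
end

puts decode_unreserved("/%2E/")          # prints /./
puts decode_unreserved("/a%2Fb%7Euser")  # prints /a%2Fb~user
```

Note that %2F stays encoded because "/" is reserved, while %2E and %7E are decoded.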
Does this mean that Addressable::URI.parse("/%2E/") should be turned into Addressable::URI.parse("/./") directly at #parse?
Normalization removes the dot segment and the trailing slash:
irb(main):042:0> Addressable::URI.parse("/%2E/").normalize.to_s
=> "/"
irb(main):044:0> Addressable::URI.parse("/./").normalize.to_s
=> "/"
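For reference, the "/./" to "/" result is consistent with the remove_dot_segments algorithm in RFC 3986 section 5.2.4. Below is a minimal standalone sketch of that algorithm; it is my own transcription for illustration and makes no claim about how Addressable implements normalization internally.

```ruby
# Sketch of RFC 3986 section 5.2.4 (remove_dot_segments).
def remove_dot_segments(path)
  input = path.dup
  output = []
  until input.empty?
    if input.start_with?("../")
      input = input[3..]
    elsif input.start_with?("./")
      input = input[2..]
    elsif input.start_with?("/./")
      input = input[2..]            # "/./" becomes "/"
    elsif input == "/."
      input = "/"
    elsif input.start_with?("/../")
      input = input[3..]            # "/../" becomes "/"
      output.pop                    # and drops the previous segment
    elsif input == "/.."
      input = "/"
      output.pop
    elsif input == "." || input == ".."
      input = ""
    else
      # Move the first path segment (with its leading "/") to the output.
      segment = input[%r{\A/?[^/]*}]
      output << segment
      input = input[segment.length..]
    end
  end
  output.join
end

puts remove_dot_segments("/./")               # prints /
puts remove_dot_segments("/a/b/c/./../../g")  # prints /a/g
```

So once "%2E" has been decoded to ".", dot-segment removal collapses the path to "/", which matches both irb results above.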
> Does this mean that Addressable::URI.parse("/%2E/") should be turned into Addressable::URI.parse("/./") directly at #parse?
That would go against what's suggested in #477