Giter Site home page Giter Site logo

Comments (33)

mtarnovan avatar mtarnovan commented on September 3, 2024 5

A bit off-topic, but why does strip_tags convert special chars to html entities - & to &, quotes to " etc? This is not the documented behaviour, which just states:

Strips all HTML tags from the html, including comments.

from rails-html-sanitizer.

lacco avatar lacco commented on September 3, 2024 5

Not sure if this refers to @mtarnovan or @mattt416 problem, but strip_tags is calling loofag's text method which encodes special chars by default:

Loofah.fragment("H&M").text
#=> "H&M"
Loofah.fragment("H&M").text(encode_special_chars: false)
#=> "H&M"

Is this something that should be added to https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L25 ? Not sure about security implications....

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

The internal parsing is done by Nokogiri, so the version affect your results.

Here's a test using Nokogiri 1.6.5, the code is equivalent to strip_tags:

Rails::Html::FullSanitizer.new.sanitize('&') # => &

What Nokogiri version are you using?

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

Oh, and thanks for filing ❤️

from rails-html-sanitizer.

mattt416 avatar mattt416 commented on September 3, 2024

Hi @kaspth,

I can't seem to replicate this via the console with rails-deprecated_sanitizer commented in my Gemfile:

irb(main):001:0> helper.strip_tags('&')
=> "&"
irb(main):002:0> Rails::Html::FullSanitizer.new.sanitize('&')
=> "&"
irb(main):003:0>

According to my Gemfile.lock, I am using nokogiri 1.6.5.

--Matt

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

Ok, try leaving your bundle out of it for now. Can you try requiring Rails Html Sanitizer in irb and then running the code?

from rails-html-sanitizer.

mattt416 avatar mattt416 commented on September 3, 2024

Hi @kaspth,

Apologies for the delay. I just spun up a new Ubuntu 14.10 VM and installed ruby (2.1.2p95), rails 4.2.0 (which pulled in nokogiri 1.6.5 somewhere along the way), etc. to rule out my local environment being the cause here. Unfortunately, I still see the correct behaviour when popping into irb and doing require 'rails-html-sanitizer', but when I create a generic view w/ <%= strip_tags('&') %> in it I still see &amp;amp; in the page source.

--Matt

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

Hold on a second, when you're viewing page source you see it escaped, but what does it look like when you view it on the page?

Signs like '&' should be escaped, be '&', in the source code.

Kasper

Den 15/01/2015 kl. 09.14 skrev Matt Thompson [email protected]:

Hi @kaspth,

Apologies for the delay. I just spun up a new Ubuntu 14.10 VM and installed ruby (2.1.2p95), rails 4.2.0 (which pulled in nokogiri 1.6.5 somewhere along the way), etc. to rule out my local environment being the cause here. Unfortunately, I still see the same thing when popping into irb and doing require 'rails-html-sanitizer', but when I create a generic view w/ <%= strip_tags('&') %>' in it I still see&` in the page source.

--Matt


Reply to this email directly or view it on GitHub.

from rails-html-sanitizer.

mattt416 avatar mattt416 commented on September 3, 2024

Hi @kaspth,

If I drop this into a template:

<p>This is a test <%= strip_tags('&') %></p>

I see this in the page source:

<p>This is a test &amp;amp;</p>

... and this on the page itself:

This is a test &amp;

Hope that helps. :)

--Matt

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

Does the double escaping happen on the Linux machine you set up as well?

Kasper

Den 15/01/2015 kl. 09.34 skrev Matt Thompson [email protected]:

Hi @kaspth,

If I drop this into a template:

This is a test <%= strip_tags('&') %>

I see this in the page source:

This is a test &amp;

... and this on the page itself:

This is a test &

Hope that helps. :)

--Matt


Reply to this email directly or view it on GitHub.

from rails-html-sanitizer.

mattt416 avatar mattt416 commented on September 3, 2024

Hi @kaspth,

I see the same behaviour on all. When I reported this bug I was seeing this issue on my dev laptop (Mac OS X) and production (Ubuntu) both running ruby 1.9.3. I then created a test Ubuntu VM (ruby 2.1.2) and created a dummy rails app to eliminate it being something in my app itself. The issue appears the same across all three.

--Matt

from rails-html-sanitizer.

kommen avatar kommen commented on September 3, 2024

@mtarnovan this is what rails/rails#18527 was about, which got closed as a duplicate of this. I've noted there what I use a workaround for now.

from rails-html-sanitizer.

mtarnovan avatar mtarnovan commented on September 3, 2024

@kommen Thanks. If I understood the issues correctly they don't seem to be duplicates.

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

We're closer to a solution on #31, so I'll close this in favor of that. Thanks for your help and sorry it took so long ❤️

from rails-html-sanitizer.

mattt416 avatar mattt416 commented on September 3, 2024

thanks @kaspth!

from rails-html-sanitizer.

mahemoff avatar mahemoff commented on September 3, 2024

Just to be clear, does #31 fix ampersand encoding? It seems to from the description, but the problem is stated only in terms of newlines.

from rails-html-sanitizer.

lacco avatar lacco commented on September 3, 2024

As far as I see #31 is still open. But as soon as encode_special_chars: false is used internally, it will also fix ampersand encoding, at least it did work for me.

from rails-html-sanitizer.

optimum-dulopin avatar optimum-dulopin commented on September 3, 2024

hi, I still have this bug. How to get rid of it ? I need to strip html tag, but to also to keep & and normal < and > which are not html tag.
thanks

from rails-html-sanitizer.

lulalalalistia avatar lulalalalistia commented on September 3, 2024

Hi @kaspth and @lacco
After testing, #31 does not actually fix this issue. Here is me using Rails 4.2.5.2
and ampersand still gets escaped.
Can you confirm and reopen this issue please? :P

Loading development environment (Rails 4.2.5.2)
irb(main):001:0> ActionController::Base.helpers.strip_tags("test\r\n\r\ntest")
=> "test\r\n\r\ntest"
irb(main):002:0> ActionController::Base.helpers.strip_tags("a & b")
=> "a &amp; b"
irb(main):003:0>

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

We don't bump rails-html-sanitizer versions on Rails upgrades. You're required to upgrade that yourselves. So are you on the latest version of this library?

from rails-html-sanitizer.

lulalala avatar lulalala commented on September 3, 2024

@kaspth I am on 1.0.3, so should be the latest right?

Loading development environment (Rails 4.2.5.2)
irb(main):001:0> ActionController::Base.helpers.strip_tags("test\r\n\r\ntest")
=> "test\r\n\r\ntest"
irb(main):002:0> ActionController::Base.helpers.strip_tags("a & b")
=> "a &amp; b"
irb(main):003:0> Rails::Html::Sanitizer::VERSION
=> "1.0.3"

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

@lulalala why are you calling this on Action Controller?

It works fine on Action View:

kaspers-macbook-pro:~ kasperhansen$ irb -r action_view
irb(main):001:0> include ActionView::Helpers::SanitizeHelper
=> Object
irb(main):002:0> strip_tags('a & b')
=> "a & b"
irb(main):003:0> 

However, it could also have to do with your Nokogiri version or even your Libxml version.

from rails-html-sanitizer.

lulalala avatar lulalala commented on September 3, 2024

Because the other issue used ActionController::Base.helpers.strip_tags("test\r\n\r\ntest") as the example :)

irb(main):001:0> include ActionView::Helpers::SanitizeHelper
=> Object
irb(main):002:0> strip_tags('a & b')
=> "a &amp; b"
irb(main):003:0> Nokogiri::VERSION
=> "1.6.7.2"

My libxml is:

xsltproc --version
Using libxml 20900, libxslt 10128 and libexslt 817

I believe those would be libxml2 2.9.0, libxslt 1.1.28, and libexslt 0.8.17.

What are yours?

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

I remember now. This was removed because it introduced a security issue. Read more here: 49dfc15

cc @rafaelfranca

from rails-html-sanitizer.

mattt416 avatar mattt416 commented on September 3, 2024

Hi @kaspth , so what exactly is the solution here? :) I thought this problem went away but now I'm seeing this &amp;amp; again. Any info appreciated!

--Matt

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

Nothing more to say than my comment just above did :)

from rails-html-sanitizer.

lulalala avatar lulalala commented on September 3, 2024

So would you consider reopen this issue, so more people can discussion how to deal with it? Because otherwise this method remains unusable.

from rails-html-sanitizer.

kaspth avatar kaspth commented on September 3, 2024

Sure

from rails-html-sanitizer.

rafaelfranca avatar rafaelfranca commented on September 3, 2024

there is no way to fix it. It will generate escaped HTML that will be displayed correctly in the HTML page. If you escaping &amp; it will be &amp;amp;. If you are escaping & it will be &amp;.

from rails-html-sanitizer.

lulalala avatar lulalala commented on September 3, 2024

@rafaelfranca then can it be marked as html_safe??

from rails-html-sanitizer.

dimitrypo avatar dimitrypo commented on September 3, 2024

That's looks like a major issue!

I've thought as per best practice a function does one thing and one thing only and used strip_tags to remove tags from user input BEFORE saving it into DB

Now I have DB full of stuff like "Bank Name" = "JPMorgan Chase &amp; Co"

Moreover, if you do

link_to(strip_tags("JPMorgan Chase & Co"))

You'll get link to JPMorgan Chase &amp; Co

How come it's not a major bug

IMO it should either JUST strip tags (without escaping it, without replacing new lines with <br> or I don't know replacing * ... * with <b> ... </b>)
Or perhaps to deprecate it and call new function strip_tags_and_escape

Or AT LEAST we need to put warnings in docs so people will know that it can be used only in narrow use case (only direct output into HTML, not even in link_to etc)

from rails-html-sanitizer.

flavorjones avatar flavorjones commented on September 3, 2024

@dutgcom Lots to unpack here, I'll try to help.

First: this issue has been closed for almost six years. My advice is that you should open a new issue if you'd like to discuss something. I'll be locking this issue after this response, but I'd be happy to continue the conversation in a new issue!

This library is intended to be used in the view layer at page render time, and not by Active Record to sanitize database records. Though I can point you at Loofah::ActiveRecord which is intended precisely for that.

Finally, though, if you've html-sanitized a string before putting it into your database, I'm not sure why you're surprised to see HTML entities present in the string. Replacing < with &lt;, as an example, is an important part of HTML sanitization.

If you want finer-grained control over sanitization (say, stripping but not replacing entities) then please take a look at Loofah which provides a feature that does this (see https://github.com/flavorjones/loofah/blob/main/lib/loofah/instance_methods.rb#L84-L87).

from rails-html-sanitizer.

flavorjones avatar flavorjones commented on September 3, 2024

I'll update the README today with some of this info as well.

from rails-html-sanitizer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.