
rails-html-sanitizer's Introduction

Rails HTML Sanitizers

This gem is responsible for sanitizing HTML fragments in Rails applications. Specifically, this is the set of sanitizers used to implement the Action View SanitizerHelper methods sanitize, sanitize_css, strip_tags and strip_links.

Rails HTML Sanitizer is only intended to be used with Rails applications. If you need similar functionality but aren't using Rails, consider using the underlying sanitization library Loofah directly.

Usage

Sanitizers

All sanitizers respond to sanitize, and are available in variants that use either HTML4 or HTML5 parsing, under the Rails::HTML4 and Rails::HTML5 namespaces, respectively.

NOTE: The HTML5 sanitizers are not supported on JRuby. Users may programmatically check for support by calling Rails::HTML::Sanitizer.html5_support?.

FullSanitizer

full_sanitizer = Rails::HTML5::FullSanitizer.new
full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
# => Bold no more!  See more here...

or, if you insist on parsing the content as HTML4:

full_sanitizer = Rails::HTML4::FullSanitizer.new
full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
# => Bold no more!  See more here...

LinkSanitizer

link_sanitizer = Rails::HTML5::LinkSanitizer.new
link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
# => Only the link text will be kept.

or, if you insist on parsing the content as HTML4:

link_sanitizer = Rails::HTML4::LinkSanitizer.new
link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
# => Only the link text will be kept.

SafeListSanitizer

This sanitizer is also available as an HTML4 variant, but for simplicity we'll document only the HTML5 variant below.

safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new

# sanitize via an extensive safe list of allowed elements
safe_list_sanitizer.sanitize(@article.body)

# sanitize only the supplied tags and attributes
safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td), attributes: %w(id class style))

# sanitize via a custom scrubber
safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)

# prune nodes from the tree instead of stripping tags and leaving inner content
safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)

# the sanitizer can also sanitize css
safe_list_sanitizer.sanitize_css('background-color: #000;')

Scrubbers

Scrubbers are objects responsible for removing nodes or attributes you don't want in your HTML document.

This gem includes two scrubbers: Rails::HTML::PermitScrubber and Rails::HTML::TargetScrubber.

Rails::HTML::PermitScrubber

This scrubber allows you to permit only the tags and attributes you want.

scrubber = Rails::HTML::PermitScrubber.new
scrubber.tags = ['a']

html_fragment = Loofah.fragment('<a><img/ ></a>')
html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a></a>"

By default, the inner content of a removed node is kept, but it can be removed as well by passing prune: true.

scrubber = Rails::HTML::PermitScrubber.new
scrubber.tags = ['a']

html_fragment = Loofah.fragment('<a><span>text</span></a>')
html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a>text</a>"

scrubber = Rails::HTML::PermitScrubber.new(prune: true)
scrubber.tags = ['a']

html_fragment = Loofah.fragment('<a><span>text</span></a>')
html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a></a>"

Rails::HTML::TargetScrubber

Where PermitScrubber picks out tags and attributes to permit in sanitization, Rails::HTML::TargetScrubber targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.

Note: by default, it will scrub anything that is not part of the permitted tags from Loofah's HTML5::Scrub.allowed_element?.

scrubber = Rails::HTML::TargetScrubber.new
scrubber.tags = ['img']

html_fragment = Loofah.fragment('<a><img/ ></a>')
html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a></a>"

Similarly to PermitScrubber, nodes can be fully pruned.

scrubber = Rails::HTML::TargetScrubber.new
scrubber.tags = ['span']

html_fragment = Loofah.fragment('<a><span>text</span></a>')
html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a>text</a>"

scrubber = Rails::HTML::TargetScrubber.new(prune: true)
scrubber.tags = ['span']

html_fragment = Loofah.fragment('<a><span>text</span></a>')
html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a></a>"
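To make the strip-versus-prune distinction concrete, here is a toy model in plain Ruby that reproduces the four span results above. It is a sketch under stated assumptions, not the gem's implementation: Element, to_html and scrub_nodes are hypothetical helpers, text nodes are bare Strings, and nothing is escaped. In this model a permit-style and a target-style scrubber are the same walk with the keep predicate inverted.

```ruby
# Toy model only: elements are Structs, text nodes are plain Strings.
# This is NOT how Loofah/Nokogiri represent documents.
Element = Struct.new(:name, :children)

# Serialize a list of nodes (no entity escaping; this is only a sketch).
def to_html(nodes)
  nodes.map do |n|
    n.is_a?(String) ? n : "<#{n.name}>#{to_html(n.children)}</#{n.name}>"
  end.join
end

# Return a scrubbed copy of `nodes`. `keep` decides whether an element survives.
# A rejected element is stripped (its scrubbed children take its place) or,
# with prune: true, dropped together with everything inside it.
def scrub_nodes(nodes, keep:, prune: false)
  nodes.flat_map do |node|
    next [node] if node.is_a?(String) # text always survives in this model
    inner = scrub_nodes(node.children, keep: keep, prune: prune)
    if keep.call(node.name)
      [Element.new(node.name, inner)]
    elsif prune
      []
    else
      inner
    end
  end
end

fragment = [Element.new("a", [Element.new("span", ["text"])])]

# Permit-style: keep only the listed tags.
permit = ->(name) { %w[a].include?(name) }
# Target-style: remove only the listed tags (the inverse predicate).
target = ->(name) { !%w[span].include?(name) }

puts to_html(scrub_nodes(fragment, keep: permit))              # <a>text</a>
puts to_html(scrub_nodes(fragment, keep: permit, prune: true)) # <a></a>
puts to_html(scrub_nodes(fragment, keep: target))              # <a>text</a>
puts to_html(scrub_nodes(fragment, keep: target, prune: true)) # <a></a>
```

The real scrubbers walk Nokogiri nodes through Loofah and also deal with attributes, comments and CDATA, none of which this model covers.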

Custom Scrubbers

You can also create custom scrubbers in your application, for example by subclassing Rails::HTML::PermitScrubber:

class CommentScrubber < Rails::HTML::PermitScrubber
  def initialize
    super
    self.tags = %w( form script comment blockquote )
    self.attributes = %w( style )
  end

  def skip_node?(node)
    node.text?
  end
end

See Rails::HTML::PermitScrubber documentation to learn more about which methods can be overridden.

Custom Scrubber in a Rails app

You can use the CommentScrubber from above in a Rails view like so:

<%= sanitize @comment, scrubber: CommentScrubber.new %>

A note on HTML entities

Rails HTML sanitizers are intended to be used by the view layer, at page-render time. They are not intended to sanitize persisted strings that will be sanitized again at page-render time.

Proper HTML sanitization will replace some characters with HTML entities. For example, text containing a < character will be updated to contain &lt; to ensure that the markup is well-formed.

This is important to keep in mind because HTML entities will render improperly if they are sanitized twice.

A concrete example showing the problem that can arise

Imagine the user is asked to enter their employer's name, which will appear on their public profile page. Then imagine they enter "JPMorgan Chase & Co."

If you sanitize this before persisting it in the database, the stored string will be JPMorgan Chase &amp; Co.

When the page is rendered, if this string is sanitized a second time by the view layer, the HTML will contain JPMorgan Chase &amp;amp; Co. which will render as "JPMorgan Chase &amp; Co.".
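The effect can be reproduced in plain Ruby with ERB::Util.html_escape standing in for the entity-encoding step. It is an escaper, not one of this gem's sanitizers, so treat this as an illustration of the double-encoding problem only. Run it outside Rails: inside Rails, ActiveSupport marks the escaped result html_safe and would not re-escape it in memory, whereas a string read back from the database has lost that mark.

```ruby
require "erb"

# ERB::Util.html_escape stands in for entity encoding; it is not
# rails-html-sanitizer itself.
raw = "JPMorgan Chase & Co."

stored   = ERB::Util.html_escape(raw)    # encoded before persisting
rendered = ERB::Util.html_escape(stored) # encoded again at render time

puts stored   # JPMorgan Chase &amp; Co.
puts rendered # JPMorgan Chase &amp;amp; Co.
```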

Another problem that can arise is rendering the sanitized string in a non-HTML context (for example, if it ends up being part of an SMS message). In this case, it may contain inappropriate HTML entities.

Suggested alternatives

You might simply choose to persist the untrusted string as-is (the raw input), and then ensure that the string will be properly sanitized by the view layer.
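A minimal sketch of that approach, assuming a hypothetical Profile struct in place of a real model: the raw input is stored unchanged and encoded exactly once, when the HTML is built. ERB::Util.html_escape stands in for the view layer's escaping; in a Rails view, <%= %> or sanitize would perform this step.

```ruby
require "erb"

# Hypothetical record type; the raw, untrusted input is persisted as-is.
Profile = Struct.new(:employer)
profile = Profile.new("JPMorgan Chase & Co.")

# Encode exactly once, when building the HTML for the page.
html = "<p>#{ERB::Util.html_escape(profile.employer)}</p>"

puts profile.employer # JPMorgan Chase & Co.
puts html             # <p>JPMorgan Chase &amp; Co.</p>
```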

That raw string, if rendered in a non-HTML context (like SMS), must also be sanitized by a method appropriate for that context. You may wish to look into using Loofah or Sanitize to customize how this sanitization works, including omitting HTML entities in the final string.

If you really want to sanitize the string that's stored in your database, you may wish to look into Loofah::ActiveRecord rather than use the Rails HTML sanitizers.

A note on module names

In versions < 1.6, the only module defined by this library was Rails::Html. Starting in 1.6, we define three additional modules:

  • Rails::HTML for general functionality (replacing Rails::Html)
  • Rails::HTML4 containing sanitizers that parse content as HTML4
  • Rails::HTML5 containing sanitizers that parse content as HTML5 (if supported)

The following aliases are maintained for backwards compatibility:

  • Rails::Html points to Rails::HTML
  • Rails::HTML::FullSanitizer points to Rails::HTML4::FullSanitizer
  • Rails::HTML::LinkSanitizer points to Rails::HTML4::LinkSanitizer
  • Rails::HTML::SafeListSanitizer points to Rails::HTML4::SafeListSanitizer
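Here "points to" means the old constant names refer to the very same module and class objects, so code using either spelling behaves identically. The plain-Ruby sketch below shows that aliasing pattern with empty stand-in classes; it illustrates constant aliasing and is not the gem's actual source.

```ruby
# Stand-in namespaces; the real gem's classes do actual sanitizing.
module Rails
  module HTML4
    class FullSanitizer; end
  end

  module HTML
    # Old name assigned to the existing class object: an alias, not a copy.
    FullSanitizer = HTML4::FullSanitizer
  end

  # Pre-1.6 module name pointing at the new module.
  Html = HTML
end

puts Rails::Html.equal?(Rails::HTML)                                # true
puts Rails::Html::FullSanitizer.equal?(Rails::HTML4::FullSanitizer) # true
puts Rails::HTML::FullSanitizer.name # Rails::HTML4::FullSanitizer
```

Because the aliases are the same objects, an existing Rails::Html::FullSanitizer.new call keeps parsing as HTML4 after upgrading; switching to HTML5 parsing means naming Rails::HTML5::FullSanitizer explicitly.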

Installation

Add this line to your application's Gemfile:

gem 'rails-html-sanitizer'

And then execute:

$ bundle

Or install it yourself as:

$ gem install rails-html-sanitizer

Support matrix

branch   Ruby support   actively maintained   security support
1.6.x    >= 2.7         yes                   yes
1.5.x    >= 2.5         no                    while Rails 6.1 is in security support
1.4.x    >= 1.8.7       no                    no
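If you want Bundler to stay on a maintained branch from the table, a pessimistic version constraint in the Gemfile (for example gem 'rails-html-sanitizer', '~> 1.6.0') does that. The check below uses RubyGems' Gem::Requirement and Gem::Version, which are available by default in Ruby, to show what ~> 1.6.0 and the looser ~> 1.6 admit. The version numbers in the loop are sample inputs, not a list of actual releases.

```ruby
# Gem::Requirement and Gem::Version ship with RubyGems, loaded by default.
branch_1_6 = Gem::Requirement.new("~> 1.6.0") # >= 1.6.0 and < 1.7
any_1_x    = Gem::Requirement.new("~> 1.6")   # >= 1.6 and < 2.0

# Sample version numbers only, not a list of actual releases.
%w[1.5.0 1.6.0 1.6.2 1.7.0 2.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-6s ~> 1.6.0: %-5s ~> 1.6: %s",
              v, branch_1_6.satisfied_by?(version), any_1_x.satisfied_by?(version))
end

# Per the table, the 1.6.x branch requires Ruby >= 2.7.
ruby_ok = Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("2.7")
puts "Ruby #{RUBY_VERSION} meets the 1.6.x requirement: #{ruby_ok}"
```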

Read more

Loofah is what underlies the sanitizers and scrubbers of rails-html-sanitizer.

The node argument passed to some methods in a custom scrubber is an instance of Nokogiri::XML::Node.

Contributing to Rails HTML Sanitizers

Rails HTML Sanitizers is the work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.

See CONTRIBUTING.

Security reports

Trying to report a possible security vulnerability in this project? Please check out the Rails project's security policy for instructions.

License

Rails HTML Sanitizers is released under the MIT License.

rails-html-sanitizer's People

Contributors

akhilgkrishnan, amatsuda, chancancode, dependabot[bot], dogweather, flavorjones, fschwahn, georgeclaghorn, gogainda, inopinatus, jbampton, juanitofatas, jweir, kaspth, kyoshidajp, m-nakamura145, maclover7, neoelit, nicolasleger, olleolleolle, orien, paul-mesnilgrente, pvalena, rafaelfranca, rwojnarowski, seyerian, tebs, tenderlove, trevorrjohn, yui-knk

rails-html-sanitizer's Issues

Data URI's get sanitized

When I sanitize an HTML string with an image whose src points to a data URI, its src attribute is removed (even when src is whitelisted):

unclean_html = "A test
<img src=\"http://placehold.it/400x300\">
<img src=''/>"
sanitizer = Rails::Html::WhiteListSanitizer.new
clean_html = sanitizer.sanitize(unclean_html, tags: %w{img}, attributes: %w{src})
clean_html
# => "A test\n    <img src=\"http://placehold.it/400x300\">\n    <img>"

I presume this happens because of overly strict JS-prevention measures (or perhaps the data URI is discarded because the sanitizer does not understand the protocol?).

rails-html-sanitizer 1.0.3 crashes with both ruby 2.3.0p0 & ruby 2.2.2p95

Hi,

I'm having an issue where rails-html-sanitizer seems to be causing a VM crash for both ruby 2.2.2p95 and ruby 2.3.0p0 when sending email via ActionMailer. I've reproduced this crash with Rails 4.2.5, 4.2.5.1 and 4.2.6.

In each case I can fix the crash by specifying rails-html-sanitizer 1.0.2 in our Gemfile. With everything else being equal (no other differences in the Gemfile.lock file) 1.0.3 will reliably cause a crash when sending email via ActionMailer and 1.0.2 will not. Our fix, for the moment, is to lock rails-html-sanitizer at version 1.0.2.

Any suggestions as to why 1.0.3 might be consistently causing or enabling Ruby VM Crashes when sending email?

I've attached two Ruby VM dumps from Apache's error log.
ruby-crash-2_3_0p0-apache_error.log.gz

ruby-crash-2_2_2p95-apache_error.log.gz

Sanitize method adds line breaks where they do not exist

Running Rails 4.2.0, I get weird behavior when running sanitize on the following HTML:

<div>
  <div>
    <input>
  </div><div></div>
</div>

The method seems to add a line break between the two inner divs, resulting in this:

<div>
  <div>
    <input>
  </div>
<div></div>
</div>

I noticed this because that extra \n broke one of my layouts (due to inline-block divs with width 50%; a single piece of whitespace causes wrapping).

The extra line break only happens when there's an input element (AKA an element that doesn't need a closing tag), so I assume it is not expected behavior. I don't know what could be causing this, but thought I'd point it out.

Environment-based sanitizer difference with open lt tags

On heroku:

Rails::Html::WhiteListSanitizer.new.sanitize("stuff < things")
=> "stuff "

Locally (all rails environments)

Rails::Html::WhiteListSanitizer.new.sanitize("stuff < things")
=> "stuff &lt; things"

I'd expect the local version to be correct, though I'm not sure what the developer intended.

We are using ruby 2.5.0 with the following locked dependencies shared across environments:
rails (4.2.10)
rails-html-sanitizer (1.0.3)
loofah (2.2.2)
crass (1.0.3)
nokogiri (1.8.2)

We all use MacBooks on Sierra or High Sierra; the behavior is the same on both.

It seems like the production case must affect more people than just us, so thought I'd report it. Please let me know if there's any other info I can provide, and thanks for all your hard work.

Issue w/ upgrade to rails 4.2.0

Apologies if I'm reporting this in the wrong place, or if I'm doing something incorrectly, but I noticed the following changed behaviour upon upgrading to rails 4.2.0. If I put this in an ERB template:

<p><%= strip_tags("&") %></p>

... and then view page source, I see:

<p>&amp;amp;</p>

If I add rails-deprecated_sanitizer to my Gemfile and try again, I see:

<p>&amp;</p>

Please let me know if I can provide any further information.

Thanks!

--Matt

Defining the Rails module prevents non-Rails use alongside libraries that check for that module (like ActiveRecord)

Some libraries use the conditional if defined?(Rails) to determine if the running app is a Rails one. If it finds the module exists, such libraries then go on to assume that methods like Rails.env are available.

The most prominent example we've seen is ActiveRecord, which uses the check in multiple places. For example:

module ActiveRecord
  module ConnectionHandling
    RAILS_ENV   = -> { Rails.env if defined?(Rails) }
    DEFAULT_ENV = -> { RAILS_ENV.call || "default_env" }
    # ...
  end
end

Because rails-html-sanitizer defines a Rails module, the above will result in the exception undefined method 'env' for Rails:Module whenever using both rails-html-sanitizer and ActiveRecord together in a non-Rails app.

An older version of New Relic's monitoring gem also had this problem, but they solved it by checking for the specific constant they needed (Rails::VERSION) instead of just the module.

A possible workaround would be to manually define the needed methods (like Rails.env) within the non-Rails app before ActiveRecord is loaded. Our use case isn't urgent, so we haven't gotten around to trying this yet.

A solution on rails-html-sanitizer's end would probably require changing the gem's entire namespace. Since that's not minor, I don't know if you'll actually want to change anything for this, especially if non-Rails support isn't a priority. But I thought I'd at least highlight it for anyone else trying to use the gem outside of Rails.
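The difference between checking for the bare module and checking for a specific constant, as New Relic's gem did, can be seen in plain Ruby. This sketch defines a stand-in Rails module the way a library like this one would; it does not load ActiveRecord or the gem.

```ruby
# A library (like this gem) defines a namespace module, nothing more.
module Rails
  module HTML; end
end

# The broad check is fooled: the module exists, so it looks like a Rails app.
puts defined?(Rails).inspect          # "constant"

# Narrower checks ask for what the calling code actually needs.
puts defined?(Rails::VERSION).inspect # nil
puts Rails.respond_to?(:env)          # false
```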

How to migrate to new sanitizer

Currently, after upgrading gems in my app, I see the following (with config.active_support.deprecation = :raise to get a stack trace):

How do I set the new sanitizer? I'm just calling simple_format in the email.

  2) Comment creates email
     Failure/Error: = simple_format @enquiry.message
     
     ActionView::Template::Error:
       DEPRECATION WARNING: warning: white_list_sanitizer isdeprecated, please use safe_list_sanitizer instead. (called from _app_views_user_mailer_new_enquiry_html_slim___3814535579015863922_148706880 at /root/projects/platforma/app/views/user_mailer/new_enquiry.html.slim:4)
     # ./app/views/user_mailer/new_enquiry.html.slim:4:in `_app_views_user_mailer_new_enquiry_html_slim___3814535579015863922_148706880'
     # ./app/mailers/application_mailer.rb:16:in `mail'
     # ./app/mailers/user_mailer.rb:56:in `new_enquiry'
     # ./app/models/enquiry.rb:129:in `send_notification'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/evaluation.rb:18:in `create'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/strategy/create.rb:12:in `block in result'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/strategy/create.rb:9:in `tap'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/strategy/create.rb:9:in `result'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/factory.rb:43:in `run'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/factory_runner.rb:29:in `block in run'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/factory_runner.rb:28:in `run'
     # /usr/local/rvm/gems/ruby-2.6.1/gems/factory_bot-5.0.2/lib/factory_bot/strategy_syntax_method_registrar.rb:20:in `block in define_singular_strategy_method'
     # ./spec/models/comment_spec.rb:109:in `block (2 levels) in <main>'
     # ------------------
     # --- Caused by: ---
     # ActiveSupport::DeprecationException:
     #   DEPRECATION WARNING: warning: white_list_sanitizer isdeprecated, please use safe_list_sanitizer instead. (called from _app_views_user_mailer_new_enquiry_html_slim___3814535579015863922_148706880 at /root/projects/platforma/app/views/user_mailer/new_enquiry.html.slim:4)
     #   ./app/views/user_mailer/new_enquiry.html.slim:4:in `_app_views_user_mailer_new_enquiry_html_slim___3814535579015863922_148706880'

sanitize doesn't use sanitize_css for style tags

From SanitizeHelper about sanitize_css:

# Sanitizes a block of CSS code. Used by +sanitize+ when it comes across a style attribute.

sanitize never uses sanitize_css.

I haven't checked whether there are any tests for this, i.e. using sanitize on HTML with a style attribute.

What's the plan for JRuby?

I recently had a look into Loofah's (and by extension this gem's) test failures on JRuby. While basic sanitization through Loofah / Nokogiri-J still appears to do what it should, such a massive number of test failures might turn off potential users of Rails in a Java / JRuby environment, and is IMHO a no-go for anything remotely security-relevant.

Are there any plans to support sanitizers not based on Loofah through this gem? I'm asking because I recently came up with a sanitizer for JRuby that wraps the OWASP Java HTML Sanitizer which I think is at least at the moment a much better solution than relying on Loofah on top of the apparently somewhat flaky Java-Implementation of Nokogiri.

WhitelistSanitizer manipulating URLs

I've found that a WhitelistSanitizer instance will manipulate the values of an allowed attribute.

Rails::Html::WhiteListSanitizer.new.sanitize('<img src="https://example/$/example.jpg">', tags: %w(img), attributes: %w(src))
=> "<img src=\"https://example/%24/example.jpg\">"

The conversion of $ to %24 can cause some urls to 404. Is this intentional? Is there a way to configure it to leave the values of attributes as is?

Custom Scrubber documentation

Not an issue, just a question about the documentation. I know I can whitelist a group of tags like this:
<%= sanitize comment.content, tags: %w(b i) %>

Now if I want to use a custom scrubber to do the exact same thing, it's not clear to me what that would look like. I'd like to use this in the views:
<%= sanitize comment.content, scrubber: CommentScrubber.new %>

... but what would CommentScrubber look like? I've tried a few different ways using the example in the 'Custom Scrubber' section of the README but nothing seems to work.

`sanitize` inserts unintended whitespace

By executing the following test.rb file via rails runner test.rb,

test_str="<li><span>foo</span>bar</li>"

sanitizer = Rails::Html::WhiteListSanitizer.new

sanitized = sanitizer.sanitize(test_str)
puts `bash -c 'diff -u <(echo "#{test_str}") <(echo "#{sanitized}")'`

I got this output:

Running via Spring preloader in process 73791
--- /dev/fd/63	2017-04-26 17:58:44.000000000 +0900
+++ /dev/fd/62	2017-04-26 17:58:44.000000000 +0900
@@ -1 +1,2 @@
-<li><span>foo</span>bar</li>
+<li>
+<span>foo</span>bar</li>

Is this expected behavior?
Unintended new line breaks my html layout.

Encoding errors on ASCII-8BIT strings (eg: any string from the mysql adapter)

The sanitizer seems to have issues when its input is a string in ASCII-8BIT encoding:

irb(main):006:0* Rails::Html::WhiteListSanitizer.new.sanitize("tooth".encode('ASCII-8BIT'))
output error : unknown encoding ASCII-8BIT
=> ""
irb(main):007:0>

While ASCII-8BIT isn't the default encoding these days, it seems that strings coming from the mysql adapter (but not the mysql2 adapter) are always in ASCII-8BIT encoding, even when the table is using charset utf8:

irb(main):004:0> Day.connection.charset
=> "utf8"
irb(main):005:0> Day.last.notes.encoding
=> #<Encoding:ASCII-8BIT>

This means that using the sanitizer on any string from the database when using the mysql adapter will result in errors. I chased the error down to Nokogiri's NodeSet#to_s method, but wasn't sure what the right approach was for addressing the issue.

Switching to the mysql2 adapter makes the issue go away, since it produces all strings in UTF-8. However, folks who've been using the mysql gem (for legacy reasons or whatever) could run into headaches trying to upgrade to Rails 4.2 because of this (it hit me by way of the highlight method in ActionView::Helpers::TextHelper).

Tests failing with libxml2-2.9.3

When I update to libxml2-2.9.3, these tests fail:

SanitizersTest#test_strip_links_with_tags_in_tags
SanitizersTest#test_strip_nested_tags
SanitizersTest#test_should_sanitize_script_tag_with_multiple_open_brackets
SanitizersTest#test_strip_tags_with_many_open_quotes
SanitizersTest#test_strip_invalid_html

http://paste.fedoraproject.org/315402/17785145/

Tests do not fail with libxml2-2.9.2.
Maybe this is related to issue #44, but I cannot reproduce it with either libxml2 library.
I am using rails 4.2.5.

Workaround for encode_special_chars

I'm trying to take an HTML fragment and output it in a non-HTML context (CSV, to be specific). As such, I want to "slurp out" the text unchanged but strip tags completely, which I used to be able to do but in 1.0.3 I now have to decode HTML entities outside the sanitizer.

Before 1.0.3, you could do:

Rails::Html::FullSanitizer.new.sanitize(%{"I would like," <a href="etc">John</a> said, "a Black & Tan."}, encode_special_chars: false)
=> %{"I would like," John said, "a Black & Tan."}

But now in 1.0.3, there is no way to strip tags without also HTML-encoding the special characters:

Rails::Html::FullSanitizer.new.sanitize(%{"I would like," <a href="etc">John</a> said, "a Black & Tan."}, encode_special_chars: false)
=> "\"I would like,\" John said, \"a Black &amp; Tan.\""

(That encode_special_chars is silently ignored, without any warning, is also concerning; a deprecation message or an error would be nice.)

I understand that this was because we were un-escaping already-escaped special characters, but I don't understand why this means we now have no way to keep our not-escaped special characters not-escaped.

Deprecation warnings with Loofah 2.3.0

When using this with the newest Loofah 2.3.0, it gives this deprecation warning:

rails-html-sanitizer-1.2.0/lib/rails/html/scrubbers.rb:148: warning: constant Loofah::HTML5::WhiteList is deprecated

Problem trying to whitelist rgb color within style attribute

I have a basic comment scrubber:

class CommentScrubber < Rails::Html::PermitScrubber
  def initialize
    super
    self.tags = %w( strong em u div span br h1 h2 h3 h4 ul ol li table thead tbody th tr td img hr a )
    self.attributes = %w( style colspan rowspan text-align class href target src href )
  end

  def skip_node?(node)
    node.text?
  end
end

The problem is if a comment is styled with a color using an rgb value, the color gets stripped out.

For example, this doesn't work:

<span style="color: rgb(226, 80, 65);">PRODUCTS</span>

But this does work:

<span style="color: red;">PRODUCTS</span>

Any suggestions on how to get the rgb color working properly?

removing HTML comments

I am having a hard time removing HTML comments using a custom scrubber. I would appreciate some help.

Following the documentation here is what I tried:

scrubber = Rails::Html::TargetScrubber.new
scrubber.tags = ['comment']
white_list_sanitizer = Rails::Html::WhiteListSanitizer.new
white_list_sanitizer.sanitize(@article.body, scrubber: scrubber)

However, the HTML comment is not removed in the output string. Am I missing something?
Thank you very much.

FullSanitizer de-escapes escaped HTML entities besides &, <, >

If you use FullSanitizer on a string containing an escaped &, <, or >, it remains escaped, as discussed here and added here:

> Rails::Html::FullSanitizer.new.sanitize("Read more &amp; &lt; &gt;")
=> "Read more &amp; &lt; &gt;"

(using version 1.0.3 with Ruby 2.3.4)

But if you use it on strings containing other escaped characters, they become de-escaped:

> Rails::Html::FullSanitizer.new.sanitize("Read more&hellip;")
=> "Read more…"
> Rails::Html::FullSanitizer.new.sanitize("Save&nbsp;8&euro; &amp; 5&cent;")
=> "Save 8€ &amp; 5¢"

Is this intended behavior? Thanks!

< (less than bracket) is stripped

I'm using the white list sanitizer, and it looks like it strips < even if it doesn't form a true html tag.

helper.sanitize("the event happened < 2 weeks ago")
=> "the event happened "

Is this expected behavior? Doesn't seem quite right.

Thanks in advance!

CDATA contents could be left behind

I was going through tests and changing expected values when I came across this one:

# Expected: "&lt;![CDATA[&lt;span&gt;neverending...]]&gt;"
# Actual: "neverending..."
assert_sanitized "<![CDATA[<span>neverending...", "&lt;![CDATA[&lt;span>neverending...]]>"

If we change the expected value to the one in the actual comment, would that leave us open to people having contents of a CDATA left behind?
And is that a potential security hole?

https://github.com/rafaelfranca/rails-html-sanitizer/blob/master/test/sanitizer_test.rb#L465-L471

Stripping of comments

I just ran into an issue where something unexpected was removed. I also found a test, but it is rather confusing: test_strip_comments. It contains this assertion:

assert_equal "This is ", full_sanitize("This is <-- not\n a comment here.")

This assertion was different, but was changed 3 years ago, without explanation why this behaviour changed.

assert_equal "This is <-- not\n a comment here.", full_sanitize("This is <-- not\n a comment here.")

I think the old assertion is what is expected, not that everything after <-- is stripped.

This came up because I ran simple_format on user input, and everything after the user entered <- was removed.

Unexpected WhiteListSanitizer behavior when allowed_attributes is set

When sanitizing strings with WhiteListSanitizer, if neither allowed_attributes nor allowed_tags are set:

  • all HTML comments are removed
  • all <script> and <form> tags are removed, including their contents

> t = '<p id="harmless_p" class="important_p"><script>alert("pwned!");</script>I am totally harmless :)</p><form action="/malicious.php"><!--you are pwned, friend --><input type="submit"></form>'
> sanitize t
 => "<p id=\"harmless_p\" class=\"important_p\">I am totally harmless :)</p>"

However if allowed_attributes is set:

  • HTML comments are still removed
  • <script> tags are stripped, but their content remains in the document
  • <form> tags and their contents are unaffected (except for attributes not present in the allowed_attributes array, which are removed)

# in application.rb:
config.action_view.sanitized_allowed_attributes = ['id']

---
> t = '<p id="harmless_p" class="important_p"><script>alert("pwned!");</script>I am totally harmless :)</p><form action="/malicious.php"><!--you are pwned, friend --><input type="submit"></form>'
> sanitize t
 => "<p id=\"harmless_p\">alert(\"pwned!\");I am totally harmless :)</p><form><input></form>"

I think this is an undesirable behavior that will surprise users. If I set allowed_attributes, I expect it to affect only the behavior regarding attributes, and the behavior regarding tags to be unaffected.

I propose changing WhiteListSanitizer so that the stripping of <script> and <form> tags and their contents is done regardless of allowed_attributes and allowed_tags being set or not. If you agree I will submit a PR.

Allow "tel:" links

Hello,

Is it intentional that tel: links are not allowed? It seems like a harmless protocol that should be allowed.

>> sanitize("<a href='http://123'>click</a>")
=> "<a href=\"http://123\">click</a>"
>> sanitize("<a href='tel:123'>click</a>")
=> "<a>click</a>"

As a workaround, I am currently using:

# config/initializers/sanitize.rb
Loofah::HTML5::WhiteList::ALLOWED_PROTOCOLS.add('tel')

As proposed by this StackOverflow answer.

Sanitize removes negative value in style

Sanitize removes a style like "margin:-4px".
This test fails:

input = '<p style="color: #000; margin: -9px;"></p>'
assert_equal input, white_list_sanitize(input, attributes: %w(style))

XSS vulnerability v1.2.0

bundle audit check --update

+ bundle audit check --update
Updating ruby-advisory-db ...
Skipping update
ruby-advisory-db: 287 advisories
Name: rails-html-sanitizer
Version: 1.2.0
Advisory: CVE-2015-7578
Criticality: Unknown
URL: https://groups.google.com/forum/#!topic/rubyonrails-security/uh--W4TDwmI
Title: Possible XSS vulnerability in rails-html-sanitizer
Solution: upgrade to ~> 1.0.3

Name: rails-html-sanitizer
Version: 1.2.0
Advisory: CVE-2015-7579
Criticality: Unknown
URL: https://groups.google.com/forum/#!topic/rubyonrails-security/OU9ugTZcbjc
Title: XSS vulnerability in rails-html-sanitizer
Solution: upgrade to ~> 1.0.3

Name: rails-html-sanitizer
Version: 1.2.0
Advisory: CVE-2015-7580
Criticality: Unknown
URL: https://groups.google.com/forum/#!topic/rubyonrails-security/uh--W4TDwmI
Title: Possible XSS vulnerability in rails-html-sanitizer
Solution: upgrade to ~> 1.0.3

Vulnerabilities found!

Unfinished open tag being escaped

Hi there

Moving from the built-in Rails 3 sanitizer to the one supplied with 4.2 yields a difference in the way strings featuring the less-than character are escaped. An example:

# Pre-Rails 4.2: 
> sanitize("good<better") # => "good&lt;better" 

# Rails 4.2:
> sanitize("good<better") # => "good" 

To me, the old behavior seems more correct; there are plenty of valid uses for the less-than symbol touching another character. For example, a simple ASCII emoji:

# Pre-Rails 4.2: 
> sanitize("<:)") # => "&lt;:)" 

# Rails 4.2:
> sanitize("<:)") # => "" 

I'd write a failing test, but there seems to be tests already for the opposite behavior. https://github.com/rails/rails-html-sanitizer/blob/master/test/sanitizer_test.rb#L131-L133

Looking at the code, it seems like it may not even be possible to implement this behavior anymore, because we're now using Nokogiri instead of regexp matching to detect tags.

Is that the case? What's the go here?

WhiteListSanitizer hangs on long/specific style attributes

WhiteListSanitizer hangs on (for example) the following input:

b2="<li class=\"my_next\" style=\"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 5px; padding-right: 10px; padding-bottom: 5px; padding-left: 10px; border-top-width: 1px; border-right-width: 0px; border-bottom-width: 0px; border-left-width: 0px; border-style: initial; border-color: initial; outline-width: 0px; outline-style: initial; outline-color: initial; line-height: normal; font-style: inherit; font-size: 11px; font-family: inherit; word-spacing: normal; vertical-align: baseline; list-style-type: none; list-style-position: initial; list-style-image: initial; border-top-style: solid; border-top-color: rgb(51, 51, 51); font: normal normal bold 11px/normal Arial, Helvetica, sans-serif; color: rgb(255, 255, 255); white-space: nowrap; cursor: pointer; text-indent: 0px; \">Play Next</li>"
white_list_sanitizer = Rails::Html::WhiteListSanitizer.new
white_list_sanitizer.sanitize(b2)

Tested with Rails 4.2.3.
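The report doesn't isolate which part of the style attribute triggers the hang. A small timing harness could help narrow it down: it builds style attributes of increasing length and times any callable on them. style_input and time_growth are hypothetical helpers, and the repeated declaration is just one property taken from the report; the real trigger may be a different one (for example the rgb(...) values), so swap it in as needed.

```ruby
# Build an <li> whose style attribute repeats one declaration +n+ times.
def style_input(n, declaration = "margin-top: 0px; ")
  %(<li style="#{declaration * n}">Play Next</li>)
end

# Time a sanitizer (anything responding to #call) on growing inputs and
# return [size, seconds] pairs, using a monotonic clock.
def time_growth(sanitize, sizes)
  sizes.map do |n|
    input = style_input(n)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sanitize.call(input)
    [n, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end
end
```

In a console with the gem loaded, something like time_growth(Rails::Html::WhiteListSanitizer.new.method(:sanitize), [1, 2, 4, 8]) would show whether the time grows roughly in proportion to the input or much faster. Start with small sizes, since a real hang never returns.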

Debian packaging 1.2.0: test issues

Hi,

I'm preparing 1.2.0 for Debian and running into the following failures, which look quite minor, possibly related to encoding:

SanitizersTest#test_uri_escaping_of_href_attr_in_a_tag_in_safe_list_sanitizer = 0.00 s = F

Failure:
SanitizersTest#test_uri_escaping_of_href_attr_in_a_tag_in_safe_list_sanitizer [/<<PKGBUILDDIR>>/test/sanitizer_test.rb:490]:
--- expected
+++ actual
@@ -1 +1 @@
-"<a href=\"examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com\">test</a>"
+"<a href=\"examp<!--%22%20unsafeattr=foo()>-->le.com\">test</a>"


SanitizersTest#test_uri_escaping_of_src_attr_in_a_tag_in_safe_list_sanitizer = 0.00 s = F

Failure:
SanitizersTest#test_uri_escaping_of_src_attr_in_a_tag_in_safe_list_sanitizer [/<<PKGBUILDDIR>>/test/sanitizer_test.rb:500]:
--- expected
+++ actual
@@ -1 +1 @@
-"<a src=\"examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com\">test</a>"
+"<a src=\"examp<!--%22%20unsafeattr=foo()>-->le.com\">test</a>"


SanitizersTest#test_uri_escaping_of_name_action_in_a_tag_in_safe_list_sanitizer = 0.00 s = F

Failure:
SanitizersTest#test_uri_escaping_of_name_action_in_a_tag_in_safe_list_sanitizer [/<<PKGBUILDDIR>>/test/sanitizer_test.rb:520]:
--- expected
+++ actual
@@ -1 +1 @@
-"<a action=\"examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com\">test</a>"
+"<a action=\"examp<!--%22%20unsafeattr=foo()>-->le.com\">test</a>"


SanitizersTest#test_uri_escaping_of_name_attr_in_a_tag_in_safe_list_sanitizer = 0.00 s = F


Failure:
SanitizersTest#test_uri_escaping_of_name_attr_in_a_tag_in_safe_list_sanitizer [/<<PKGBUILDDIR>>/test/sanitizer_test.rb:510]:
--- expected
+++ actual
@@ -1 +1 @@
-"<a name=\"examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com\">test</a>"
+"<a name=\"examp<!--%22%20unsafeattr=foo()>-->le.com\">test</a>"


Finished in 0.128160s, 2387.6355 runs/s, 2520.2819 assertions/s.
306 runs, 323 assertions, 4 failures, 0 errors, 0 skips
rake aborted!

I'm happy to provide more details if needed.

Thanks for your work,
cheers!
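All four diffs differ only in whether "<" and ">" inside the attribute value are serialized as entities; both strings parse to the same attribute value, so the change is likely in the underlying serializer rather than in the sanitizer (an assumption, not confirmed in the report). A minimal sketch of a more tolerant comparison, using hypothetical helper names, decodes the basic entities before comparing. It is lossy: it also equates escaped text with real markup, so it only fits tests where that distinction is not what is being checked.

```ruby
# Hypothetical test helper: decode the four basic entities in a single pass,
# so "&amp;lt;" becomes "&lt;" rather than "<".
BASIC_ENTITIES = {
  "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&amp;" => "&"
}.freeze

def decode_basic_entities(html)
  html.gsub(/&(?:lt|gt|quot|amp);/, BASIC_ENTITIES)
end

# True when two serialized fragments match once basic entities are decoded.
def same_after_decoding?(expected, actual)
  decode_basic_entities(expected) == decode_basic_entities(actual)
end
```

In a Minitest case this would be used as assert same_after_decoding?(expected, actual) in place of assert_equal.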

4.2.5.1 #sanitize whitelist changes vs Rails::Html::WhiteListSanitizer docs

Rails 4.2.5.1 broke some of my tests involving RedCloth output, and I'm trying to figure out how to get them passing again quickly.

I noticed that commit 297161e changed the whitelist to the one defined at https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L107

However, the documentation implies that it's using Loofah's HTML5 whitelist, which contains a lot more tags (tables, svgs, kitchen-sinks).

https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L61 and https://github.com/rails/rails-html-sanitizer/blob/master/lib/rails/html/sanitizer.rb#L75

I don't know what work went into CVE-2015-7578, but if it's mostly about data attributes, isn't shrinking the whitelist a separate change that wasn't needed for the fix? Or is there another security concern that we'd reintroduce by using Loofah's HTML5 whitelist in our app, which should then be fixed in Loofah?

Full sanitizer does not escape quotes

hey,

I've found the following behaviour: <, > and & are escaped by the full sanitizer, while " is not.

irb -r rails-html-sanitizer.rb
Rails::Html::Sanitizer.full_sanitizer.new.sanitize("lt < and something")
=> "lt &lt; and something"
Rails::Html::Sanitizer.full_sanitizer.new.sanitize("gt > and something")
=> "gt &gt; and something"
Rails::Html::Sanitizer.full_sanitizer.new.sanitize("amp & and something")
=> "amp &amp; and something"
Rails::Html::Sanitizer.full_sanitizer.new.sanitize("quote \" and something")
=> "quote \" and something"

Before commit 49dfc15, the full sanitizer used Loofah's text method, which does escape quotes; now it uses to_html, which does not.

Loofah.fragment("quote \" ").to_html
=> "quote \" "
Loofah.fragment("quote \" ").text
=> "quote &quot; "
Loofah.fragment("amp & ").to_html
=> "amp &amp; "
Loofah.fragment("amp & ").text
=> "amp &amp; "

Is this a bug or expected behaviour? Should sanitize escape quotes or not? Or am I missing something altogether?

Thanks
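Until the intended behaviour is settled, callers who rely on quotes being escaped could post-process the full sanitizer's output. A minimal sketch with a hypothetical helper name: it is only safe because the full sanitizer's output contains no tags or attributes and already has <, > and & escaped, so replacing the double quote cannot double-escape anything. A general-purpose escaper such as ERB::Util.html_escape would not fit here, because it would turn the existing &amp; into &amp;amp;.

```ruby
# Hypothetical post-processing step for full-sanitizer output: escape the
# double quote that the to_html-based output currently leaves as-is.
def escape_quotes(sanitized_text)
  sanitized_text.gsub('"', "&quot;")
end

escape_quotes('quote " and something')   # => "quote &quot; and something"
escape_quotes("amp &amp; and something") # => "amp &amp; and something"
```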

Strip_tags regression in Rails 4.2.4

Rails 4.2.1

2.2.0 :001 > include ActionView::Helpers::SanitizeHelper
 => Object
2.2.0 :002 > strip_tags('&lt;<title>&gt;')
 => "&lt;&gt;"

Rails 4.2.4

2.2.0 :004 > include ActionView::Helpers::SanitizeHelper
 => Object
2.2.0 :006 > strip_tags('&lt;<title>&gt;')
 => "<>"

In other words, in Rails 4.2.4 strip_tags NOW also decodes entities in the text instead of only stripping out tags. The &lt; and &gt; are TEXT and should be left alone.
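The 4.2.4 output is what you get when the parsed text nodes, in which &lt; and &gt; have already been decoded to < and >, are returned as plain text. If callers need the entity-encoded form back, one option is to re-escape the result. A minimal sketch, assuming a hypothetical wrapper name and using ERB::Util.html_escape from Ruby's standard library; the stripping step is passed in as a callable so the sketch runs without Rails.

```ruby
require "erb"

# Hypothetical wrapper: run any tag-stripping callable, then re-encode the
# plain-text result so literal "<", ">", "&" and quotes become entities again.
def strip_tags_keeping_entities(html, strip_tags)
  ERB::Util.html_escape(strip_tags.call(html))
end

# Stub standing in for Rails 4.2.4's strip_tags, which returned "<" + ">".
strip = ->(_html) { "<>" }
strip_tags_keeping_entities("&lt;<title>&gt;", strip) # => "&lt;&gt;"
```

Note that html_escape also escapes & and both quote characters, and that the re-escaped string will be escaped a second time if it is later rendered in a template without being marked html_safe.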

This gem breaks ActionMailer + ActionView + ActiveRecord - Rails projects

I have a project that uses ActiveRecord and ActionView (for ActionMailer templates) without Rails.

At https://github.com/rails/rails/blob/v4.2.0/activerecord/lib/active_record/connection_handling.rb#L3, ActiveRecord makes a (reasonable) assumption that if Rails is defined, Rails responds to env. This gem violates that contract in https://github.com/rails/rails-html-sanitizer/blob/v1.0.1/lib/rails/html/sanitizer.rb#L1, which is immediately loaded when you require "action_view/helpers".

Minimal implementation: https://gist.github.com/betesh/f080b8a38ba7f476b2a0
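A minimal sketch of the defensive check a caller could use instead of assuming that a defined Rails constant responds to env. The module below only simulates the Rails::Html namespace that loading the gem defines, and current_env is a hypothetical name; this is not the actual code in ActiveRecord.

```ruby
# Stand-in for what loading the gem does: it defines a `Rails` namespace
# (Rails::Html) without defining Rails.env.
module Rails
  module Html; end
end

# Hypothetical defensive lookup: only trust Rails.env when it actually exists.
def current_env(default = "development")
  if defined?(Rails) && Rails.respond_to?(:env)
    Rails.env.to_s
  else
    default
  end
end

current_env # => "development" (Rails is defined but has no env method)
```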

test failures with new loofah (2.2.1)

Hi,

In newer versions, Loofah switched from homemade regexps to the crass gem for CSS parsing/scrubbing.
As a consequence, the scrubbed output changed slightly: the spaces that used to follow each ":" and ";" are now removed, causing test failures:

SanitizersTest#test_should_sanitize_illegal_style_properties [/home/boutil/debian/ruby-team/ruby-rails-html-sanitizer/test/sanitizer_test.rb:392]:
--- expected
+++ actual
@@ -1 +1 @@
-"display: block; width: 100%; height: 100%; background-color: black; background-x: center; background-y: center;"
+"display:block;width:100%;height:100%;background-color:black;background-x:center;background-y:center;"

SanitizersTest#test_should_sanitize_with_trailing_space [/home/boutil/debian/ruby-team/ruby-rails-html-sanitizer/test/sanitizer_test.rb:398]:
Expected: "display: block;"
  Actual: "display:block;"

Of course, it is just cosmetic and doesn't really break any functionality. Could you adapt the tests to the new output? Maybe these tests could be made more robust to such minor changes of output?
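One way to make such assertions robust to this cosmetic change is to normalize whitespace around ":" and ";" on both sides before comparing. A minimal sketch with a hypothetical helper name; it would also collapse whitespace around colons inside quoted values (for example in a content string), so it only suits tests that don't depend on that.

```ruby
# Hypothetical test helper: drop whitespace around ":" and ";" so that
# "display: block;" and "display:block;" compare equal.
def normalize_css(css)
  css.gsub(/\s*([:;])\s*/, '\1').strip
end

normalize_css("display: block; width: 100%;") # => "display:block;width:100%;"
```

In the failing tests this would become assert_equal normalize_css(expected), normalize_css(actual).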

sanitize without encode special chars

Currently, when we sanitize text (using the Rails strip_tags helper, for example), \r characters get encoded. For example:

ActionController::Base.helpers.strip_tags("test\r\n\r\ntest") #=> "test&#13;\n&#13;\ntest"

We should pass encode_special_chars: false to the text method of the Loofah fragment to avoid this:

Loofah.fragment("test\r\n\r\ntest").text(encode_special_chars: false) #=> "test\r\n\r\ntest"

Text with encoded \r causes problems when it is transformed further, for example with the Rails textile helper (Rails uses RedCloth for it): it stops producing paragraphs. Example:

RedCloth.new("test&#13;\n&#13;\ntest").to_html #=> "<p>test&#13;<br />\n&#13;<br />\ntest</p>"
RedCloth.new("test\r\n\r\ntest").to_html #=> "<p>test</p>\n<p>test</p>"

By the way, sanitize from Rails::Html::WhiteListSanitizer doesn't encode special chars :)

ActionController::Base.helpers.sanitize("test\r\n\r\ntest") #=> "test\r\n\r\ntest"
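Until something like the encode_special_chars: false change lands, a caller could undo just the carriage-return encoding after strip_tags. A minimal sketch with a hypothetical helper name: it decodes only the decimal and hex numeric references for \r and leaves every other entity alone, so "&amp;#13;" (an escaped literal) is not touched.

```ruby
# Hypothetical post-processing step: turn numeric references for carriage
# return ("&#13;" or "&#xD;", any case, optional leading zeros) back into "\r".
CR_REFERENCE = /&#(?:0*13|x0*d);/i

def decode_carriage_returns(text)
  text.gsub(CR_REFERENCE, "\r")
end

decode_carriage_returns("test&#13;\n&#13;\ntest") # => "test\r\n\r\ntest"
```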
