
jitbit / htmlsanitizer

Stars: 134, Watchers: 16, Forks: 31, Size: 63 KB

Fast JavaScript HTML sanitizer, client-side only (i.e. it needs a browser and won't work in Node or other backends)

Home Page: https://www.jitbit.com

License: MIT License

Languages: JavaScript 46.41%, HTML 53.59%
Topics: sanitize, sanitize-html, sanitizer

htmlsanitizer's People

Contributors

alex-jitbit


htmlsanitizer's Issues

Minified and/or dist version

It's not a big deal if neither version is provided, since we can build them ourselves. Still, including them would be a welcome gesture, and many developers would benefit from it.

Sensitive to DOM clobbering

Calling SanitizeHtml on HTML that contains DOM-clobbering markup leads to unexpected results:

HtmlSanitizer.SanitizeHtml("<img name=createElement>");
// Result: TypeError: iframedoc.createElement is not a function
// Expected: "<img>"
HtmlSanitizer.SanitizeHtml("<p>Hello world!</p><img name=body>")
// Result: ""
// Expected: "<p>Hello world!</p><img>"
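One possible mitigation, sketched here under the assumption that it would be patched into HtmlSanitizer.js (it is not the library's current code): named elements such as <img name=createElement> are exposed on the document object itself and shadow its real members, so look those members up on the prototype chain instead. The helper names lookupOnPrototype, safeGet and safeCall are hypothetical.

```javascript
// Hypothetical helpers: resolve a property on the prototype chain, skipping
// the object itself, where DOM clobbering (e.g. <img name="body">) shadows
// the real member with a named element.
function lookupOnPrototype(obj, prop) {
  for (let proto = Object.getPrototypeOf(obj); proto !== null; proto = Object.getPrototypeOf(proto)) {
    const desc = Object.getOwnPropertyDescriptor(proto, prop);
    if (desc) return desc;
  }
  return undefined;
}

// Read a property, usually an accessor such as Document.prototype.body.
function safeGet(obj, prop) {
  const desc = lookupOnPrototype(obj, prop);
  if (!desc) return undefined;
  return desc.get ? desc.get.call(obj) : desc.value;
}

// Call a method such as createElement with `obj` as `this`.
function safeCall(obj, method, ...args) {
  const desc = lookupOnPrototype(obj, method);
  if (!desc || typeof desc.value !== "function") {
    throw new TypeError(method + " is not a method on the prototype chain");
  }
  return desc.value.apply(obj, args);
}
```

In the browser, the two failing lookups would become safeGet(iframedoc, "body") and safeCall(iframedoc, "createElement", "DIV"). The same lookup could also guard element properties such as attributes and childNodes, which named inputs inside a <form> can clobber in the same way.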

SVG Support

If you want to add SVG support, you need to add svg, as well as path, to the whitelist in lower case.

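A better solution is to change makeSanitizedCopy to upper-case the tag name (HTML elements already report an upper-case tagName, but SVG elements such as svg and path report a lower-case one), so that the whitelist can use SVG and PATH: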

function makeSanitizedCopy(node) {
  let newNode, nodeTagName = (node.tagName||"").toUpperCase();

  if (node.nodeType == Node.TEXT_NODE) {
    newNode = node.cloneNode(true);
  } else if (node.nodeType == Node.ELEMENT_NODE && (tagWhitelist_[nodeTagName] || contentTagWhiteList_[nodeTagName])) {
    //remove useless empty spans (lots of those when pasting from MS Outlook)
    if ((nodeTagName == "SPAN" || nodeTagName == "B" || nodeTagName == "I" || nodeTagName == "U")
      && node.innerHTML.trim() == "") {
      return document.createDocumentFragment();
    }

    if (contentTagWhiteList_[nodeTagName])
      newNode = iframedoc.createElement('DIV'); //convert to DIV
    else
      newNode = iframedoc.createElement(nodeTagName);

    for (let i = 0; i < node.attributes.length; i++) {
      let attr = node.attributes[i];
      if (attributeWhitelist_[attr.name]) {
        if (attr.name == "style") {
          for (let s = 0; s < node.style.length; s++) {
            let styleName = node.style[s];
            if (cssWhitelist_[styleName])
              newNode.style.setProperty(styleName, node.style.getPropertyValue(styleName));
          }
        }
        else {
          if (uriAttributes_[attr.name]) { //if this is a "uri" attribute, that can have "javascript:" or something
            if (attr.value.indexOf(":") > -1 && !startsWithAny(attr.value, schemaWhiteList_))
              continue;
          }
          newNode.setAttribute(attr.name, attr.value);
        }
      }
    }
    for (let i = 0; i < node.childNodes.length; i++) {
      let subCopy = makeSanitizedCopy(node.childNodes[i]);
      newNode.appendChild(subCopy); // appendChild takes a single argument
    }
  } else {
    newNode = document.createDocumentFragment();
  }
  return newNode;
}

Don't forget to also add the SVG attributes to the attribute whitelist (like xmlns, viewBox, d, fill, …).
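A minimal sketch of the whitelist side, assuming the tag whitelist is keyed by upper-case names as in the patched makeSanitizedCopy above. The object names svgTags and svgAttributes and their exact entries are illustrative assumptions. Attribute names are not upper-cased by that patch, and the HTML parser keeps SVG attribute casing, so the key has to be viewBox rather than viewbox.

```javascript
// Illustrative entries for simple SVG icons; extend to match your own markup.
const svgTags = { SVG: true, PATH: true, G: true, CIRCLE: true, RECT: true };

// Attribute names are looked up as-is, and SVG attributes keep their casing
// (an <svg viewbox="..."> in HTML is reported as "viewBox").
const svgAttributes = { xmlns: true, viewBox: true, d: true, fill: true, stroke: true, "stroke-width": true };

// Own-property check, so inherited keys such as "constructor" never match.
function isWhitelisted(whitelist, key) {
  return Object.prototype.hasOwnProperty.call(whitelist, key) && whitelist[key] === true;
}

// HTML elements report an upper-case tagName, SVG elements a lower-case one,
// so normalize before looking the tag up.
function isTagAllowed(tagName, tagWhitelist) {
  return isWhitelisted(tagWhitelist, (tagName || "").toUpperCase());
}
```

To take effect, these entries would have to be merged into the library's own whitelist objects (for example Object.assign(tagWhitelist_, svgTags)), which assumes you are editing HtmlSanitizer.js, where objects such as tagWhitelist_ are defined.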

Adding input to iframe flagged by Veracode as XSS flaw

We use HtmlSanitizer.js for XSS protection on the front end, but in a recent scan Veracode flagged the sanitizer itself as an XSS flaw, specifically line 44, where the input is written into the sandboxed iframe:

iframedoc.body.innerHTML = input;

So I'm wondering: is setting iframe['sandbox'] = 'allow-same-origin'; enough to keep the iframe itself from becoming vulnerable, so that I can tell Veracode "I know, it's safe"?

license question

Given that this code is heavily inspired by this SO answer, and that code on SO is licensed under CC BY-SA 3.0, which requires attribution, should this library's license change as well, probably to CC BY-SA 3.0?

Performance issue with many images

When sanitizing HTML that contains many image tags, this code causes the browser to load all the images (presumably because the markup is inserted into a live DOM). This makes it a good deal slower than a pure-JS solution such as sanitize-html.

I don't know if there's actually anything you can do about this (maybe you can turn off image loading for the iframe in question?)
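One option worth testing, and a different technique from the library's iframe approach: parse the input with DOMParser. Documents it creates have no browsing context, so in current browsers their scripts don't run and their images are not fetched. The function name parseInert and the injectable Parser parameter (there only so the sketch can be exercised outside a browser) are assumptions; the returned body would then be walked by makeSanitizedCopy.

```javascript
// Sketch of an alternative to the sandbox iframe: DOMParser builds a document
// without a browsing context, so scripts are not executed and <img> sources
// are not fetched while the markup is parsed.
function parseInert(html, Parser = globalThis.DOMParser) {
  if (typeof Parser !== "function") {
    throw new Error("DOMParser is not available; this is a browser-only API");
  }
  const doc = new Parser().parseFromString(html, "text/html");
  // Caution: doc.body can still be clobbered (e.g. by <img name="body">),
  // so a hardened version should read it through Document.prototype.
  return doc.body;
}
```

In a browser the call is simply parseInert(input); the DOM clobbering issue above applies to this document just as it does to the iframe's.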

ES6 module?

It would be great if this were published as an ES6 module, with proper exports, so I could simply do:

import HtmlSanitizer from '@jitbit/htmlsanitizer'

Possible to whitelist any data attributes?

I'm trying to sanitize a small piece of HTML, built from a JSON response from a third-party API, that includes data attributes.

Is there a way to whitelist all data attributes (e.g. data-{attribute-name}), or do I need to add every data attribute I use to the attribute whitelist?
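A minimal sketch of one answer, assuming you patch the attribute check inside makeSanitizedCopy: accept any attribute that is either explicitly whitelisted or shaped like a data-* attribute. The helper name isAttributeAllowed and the regular expression (deliberately stricter than the HTML spec, which allows more characters after data-) are assumptions.

```javascript
// Any lower-case data-* name: "data-" plus at least one character.
// Stricter than the HTML spec, which also allows other XML name characters.
const DATA_ATTRIBUTE = /^data-[a-z0-9][a-z0-9_.-]*$/;

// True for attributes in the whitelist object or matching the data-* pattern.
function isAttributeAllowed(name, attributeWhitelist) {
  // Own-property check, so inherited keys like "constructor" are not accepted.
  if (Object.prototype.hasOwnProperty.call(attributeWhitelist, name) && attributeWhitelist[name]) {
    return true;
  }
  return DATA_ATTRIBUTE.test(name);
}
```

Inside makeSanitizedCopy, if (attributeWhitelist_[attr.name]) would become if (isAttributeAllowed(attr.name, attributeWhitelist_)). The values are still copied verbatim, so any script of your own that reads these attributes should keep treating them as untrusted.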

Produces error when the first node / root container is not whitelisted

I get the following error:

ERROR TypeError: Cannot read property 'replace' of undefined
    at Object.HtmlSanitizer.SanitizeHtml (HtmlSanitizer.js:92)

The line number may be off by one, because I removed the log statement at the top.

var resultElement = makeSanitizedCopy(iframedoc.body);
document.body.removeChild(iframe);
return resultElement.innerHTML
	.replace(/<br[^>]*>(\S)/g, "<br>\n$1")
	.replace(/div><div/g, "div>\n<div"); //replace is just for cleaner code

resultElement is not null, but its .innerHTML property doesn't exist. Debugging in Chrome shows that resultElement is a #document-fragment, which doesn't have an innerHTML property.

This happens because, for the first (root) node, the script takes the else branch of makeSanitizedCopy():

			} else {
				newNode = document.createDocumentFragment();
			}

I'm guessing this is because I emptied all the whitelists and left only my custom tags, so the HTML that goes in starts with a tag (or something similar) that is no longer whitelisted.

Error when _tagWhitelist is missing BODY

I want a very restrictive tag list:

const _tagWhitelist = { 'A': true, 'B': true, 'BODY': true, 'BR': true, };

but if I remove 'BODY': true, I get a JavaScript error in the console:

Uncaught TypeError: resultElement.innerHTML is undefined
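Both of the last two issues come from the same place: whenever BODY is in neither whitelist, makeSanitizedCopy(iframedoc.body) returns a DocumentFragment, which has no innerHTML. A minimal sketch of one possible fix, assuming you patch HtmlSanitizer.js: never run the root through the whitelist, and copy only its children into a fresh container. The function name sanitizeChildren and its copyNode and createContainer parameters are hypothetical stand-ins for makeSanitizedCopy and something like () => document.createElement("div").

```javascript
// Hypothetical patch sketch: sanitize the children of the parsed body into a
// fresh container, so the root element is never checked against the whitelist.
//   root            - e.g. iframedoc.body
//   copyNode        - stands in for the library's makeSanitizedCopy
//   createContainer - e.g. () => document.createElement("div")
function sanitizeChildren(root, copyNode, createContainer) {
  const container = createContainer();
  // Array.from takes a snapshot of the live NodeList before iterating.
  for (const child of Array.from(root.childNodes)) {
    // Non-whitelisted nodes come back as empty DocumentFragments,
    // and appending an empty fragment adds nothing.
    container.appendChild(copyNode(child));
  }
  return container.innerHTML;
}
```

The existing .replace(...) clean-up calls could then be applied to the returned string unchanged, and a whitelist without BODY no longer breaks the result.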
