htmlawed's Introduction

htmLawed

License: LGPL-3.0

A composer wrapper for the htmLawed library to purify & filter HTML. Tested with PHPUnit and PhantomJS.

Why use htmLawed?

If your website has any user-generated content, then you need to worry about cross-site scripting (XSS). htmLawed takes a piece of potentially malicious HTML, removes the malicious code, and leaves the rest of the HTML intact.

Beyond the base htmLawed library, this package makes htmLawed a Composer package and wraps it in a class so that it can be autoloaded.

Installation

htmLawed requires PHP 5.4 or higher.

htmLawed is PSR-4 compliant and can be installed using Composer. Just add vanilla/htmlawed to your composer.json:

"require": {
    "vanilla/htmlawed": "~1.0"
}

Example

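// Load Composer's autoloader (this path assumes the script sits in the project root).
require __DIR__ . '/vendor/autoload.php';

echo Htmlawed::filter('<h1>Hello world!');
// Outputs: '<h1>Hello world!</h1>'

echo Htmlawed::filter('<i>nothing to see</i><script>alert("xss")</script>');
// Outputs: '<i>nothing to see</i>alert("xss")'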

Configs and Specs

The htmLawed filter takes two optional parameters: $config and $spec. This library provides sensible defaults for these parameters, but you can override them by passing your own values to Htmlawed::filter().

$xss = '<i>nothing to see <script>alert("xss")</script>';

// Pass an empty config and spec for no filtering of malicious code.
echo Htmlawed::filter($xss, [], []);
// Outputs: '<i>nothing to see <script type="text/javascript">alert("xss")</script></i>'

// Pass safe=1 to turn on all the safe options.
echo Htmlawed::filter($xss, ['safe' => 1]);
// Outputs: '<i>nothing to see alert("xss")</i>'

// We provide a convenience method that strips all tags that aren't supposed to be in RSS feeds.
echo Htmlawed::filterRSS('<html><body><h1>Hello world!</h1></body></html>');
// Outputs: '<h1>Hello world!</h1>'

See the htmLawed documentation for the full list of options.

Differences in Vanilla's version of Htmlawed

We try to use the most recent version of htmLawed with as few changes as possible, so that bug fixes and security releases can be merged in from the upstream project. However, we've made a few changes to the source code:

  • Balance tags (hl_bal) before validating tags (hl_tag). We found some cases where an unbalanced script tag would not get removed and this addresses that issue.
  • Don't add an extra <div> inside of <blockquote> tags.
  • Remove naked <span>.
  • Change indentation from 1 space to 4 spaces.

If the original author of htmLawed wants to make any of these changes upstream please get in contact with [email protected].

htmlawed's People

Contributors: acharron-hl, bleistivt, charrondev, daazku, initvector, linc, tburry

htmlawed's Issues

Function create_function() is deprecated

When running tests on PHP nightlies in Travis CI (PHP 7.3.0-dev), we get:

Function create_function() is deprecated
vendor/vanilla/htmlawed/src/htmLawed/htmLawed.php:1010

I will also submit an upstream issue.

Edit: Fixed in 1.2.4 - 31 August 2017. Removes use of PHP 'create_function' function and '$php_errormsg' reserved variable (deprecated in PHP 7.2)

Upgrade Htmlawed to 1.2.1

1.2.1 - 15 May 2017. Fix for a potential security vulnerability in transformation of deprecated attributes

Stripping span tags appears too aggressive

If I pass

<span style="expression(alert('XSS')">foo</span>

into version 1.2.3 I end up getting back

foo</span>

This appears to be related to the following code:

        if ($e !== 'span' || !empty($a)) {
            if (!isset($cE[$e])) {
                $q[] = $e;
            }
            echo '<', $e, $a, '>';
        }

If I remove the if ($e !== 'span' || !empty($a)) check, it works as expected, returning

<span>foo</span>
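Judging from the reported output, the excerpt above decides whether to drop a naked <span> only when emitting the opening tag, while the closing </span> is still written out, which leaves the unbalanced "foo</span>". The sketch below is a hypothetical illustration, not htmLawed's actual code: it keeps the package's "remove naked <span>" policy but records each keep/drop decision on a stack so the matching closing tag follows the same decision. The token format and the emitTags() name are assumptions made for this example, and it needs PHP 7.1+ for the array destructuring in foreach.

```php
<?php
// Hypothetical sketch, not htmLawed's code: emit a token stream while dropping
// attribute-less ("naked") <span> elements. Each opening <span> pushes its
// keep/drop decision onto a stack and the matching </span> pops it, so the
// opening and closing tags are always kept or dropped together.
//
// Assumed token shape: [type, name-or-text, attribute-string], where type is
// 'open', 'close' or 'text', and the attribute string has already been
// filtered (e.g. ' class="x"', or '' when every attribute was removed).
function emitTags(array $tokens): string
{
    $out = '';
    $spanKept = []; // stack: one bool per currently open <span>

    foreach ($tokens as [$type, $value, $attrs]) {
        if ($type === 'text') {
            $out .= $value;
        } elseif ($value === 'span' && $type === 'open') {
            $keep = $attrs !== '';
            $spanKept[] = $keep;
            if ($keep) {
                $out .= '<span' . $attrs . '>';
            }
        } elseif ($value === 'span' && $type === 'close') {
            // array_pop() returns null for a stray </span>, so it is dropped too.
            if (array_pop($spanKept)) {
                $out .= '</span>';
            }
        } elseif ($type === 'open') {
            $out .= '<' . $value . $attrs . '>';
        } else {
            $out .= '</' . $value . '>';
        }
    }

    return $out;
}

// The reported case: the unsafe style attribute was stripped, leaving a naked span.
echo emitTags([['open', 'span', ''], ['text', 'foo', ''], ['close', 'span', '']]), "\n";
// Prints: foo
```

Note the design choice: the reporter expected <span>foo</span>, whereas this sketch drops both tags, which matches the "Remove naked <span>" change listed for this package. Either outcome is balanced; the bug is emitting one tag without the other.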

Question: initial indentation?

Hello,

Yesterday I found your repository while searching for an htmLawed repo.

Do you know if it's possible to add an initial indentation? I didn't find anything about it on http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/htmLawed_README.htm

My need / question:

Using the code below

$html = '<div><p>...</p></div>';
echo htmLawed($html, array('indent_size' => 4));

I receive

<div>
    <p>...</p>
</div>

i.e. the first tag has been put at column 0.

Is it possible to say, for instance, "the first tag should be indented by two levels", like this:

        <div>
            <p>...</p>
        </div>

(This is because when I view my page's HTML source, my html and body tags, for instance, are properly indented, but the output of htmLawed is placed at column 0 rather than at column 8 as expected.)

(htmLawed is only used for the body content, not for the full HTML page.)

Thanks!
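One possible workaround, sketched here under the assumption that htmLawed itself has no starting-indent option (none is mentioned in this thread), is to post-process the filtered output and prefix every line with a fixed indent. The indentHtml() helper below is hypothetical and not part of htmLawed or this package. Two levels of 4 spaces gives the column-8 start shown in the question.

```php
<?php
// Hypothetical helper, not part of htmLawed or this package: shift an
// already-indented HTML fragment right by a fixed number of indent levels.
// Every non-empty line gets the same prefix, so the relative indentation
// produced by htmLawed is preserved. Assumes "\n" line endings.
function indentHtml(string $html, int $levels, int $indentSize = 4): string
{
    $pad = str_repeat(' ', $levels * $indentSize);

    // With the /m modifier, ^ matches at the start of every line; the
    // (?=.) lookahead skips empty lines so they don't gain trailing spaces.
    return preg_replace('/^(?=.)/m', $pad, $html);
}

$filtered = "<div>\n    <p>...</p>\n</div>"; // e.g. htmLawed's indented output
echo indentHtml($filtered, 2), "\n";
// Prints the fragment starting at column 8, with <p> at column 12.
```

Because this shifts every line, it also changes whitespace inside <pre> or <textarea> elements, where whitespace is significant; fragments containing those would need more careful handling.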

Upgrade Htmlawed to 1.2.1.1

1.2.1.1 - 17 May 2017. Fix for a potential security vulnerability in transformation of deprecated attributes

Customized blockquote handling removes elements

It looks like the changes made to htmLawed's <blockquote> handling in this package have caused a regression in 2.2.

Steps to Reproduce

  1. Pass the following HTML through Htmlawed::filter() with default configuration:
<blockquote>
  <p>Line 1</p>
  <p>Line 2</p>
  <p>Line 3</p>
</blockquote>

Expected Result

The markup is left unchanged.

Actual Result

The <p> tags are stripped from all but the first <p> within the blockquote.

<blockquote>
  <p>Line 1</p>
  Line 2
  Line 3
</blockquote>