lullabot / amp-library Goto Github PK
View Code? Open in Web Editor NEWConvert HTML to AMP HTML and report HTML compliance with the AMP HTML specification
License: Other
Convert HTML to AMP HTML and report HTML compliance with the AMP HTML specification
License: Other
We keep getting this pesky PHP notice. It needs to hunted down and made to go away.
Notice: Undefined property: DOMElement::$data in
Masterminds\HTML5\Serializer\OutputRules->element() (line 222 of
/var/www/amp/docroot/vendor/masterminds/html5/src/HTML5/Serializer/OutputRules.php
As far as I can tell, this specific notice does not cause any problems in the AMP HTML output but we still need to try to figure out why it occurs and how to make it go away. Its probably something simple like the code of the html5 library like not doing an isset
/ !empty
check before using the variable.
Add continuous integration.
Most probably we can use TravisCI as its hosted and works well with composer and phpunit
As a note to future developers please write a note (maybe at DEVELOPERS.md) on how the AMP specification generation representation (validator-generated.php
) works.
( We have a fork at https://github.com/Lullabot/amphtml
which does this )
This is small, subtle bug. Make sure we return the same value AMPDOMTreeBuilder::startTag()
that DOMTreeBuilder::startTag()
returns
(This ticket is part of the omnibus ticket #35 )
In CSS validation we validate the CSS in the amp page. Some of the things that are not allowed are: @imports
in your CSS, various rules around url()
in your CSS and so forth. Please add functionality to support this CSS validation. See https://github.com/ampproject/amphtml/blob/master/spec/amp-html-format.md#stylesheets for a longer list.
Here is an example of a css_spec
section that we have to use:
https://github.com/ampproject/amphtml/blob/dfaa8e1ba556165b24177aa4ec3c1aebdfcbd4f5/validator/validator-main.protoascii#L470
Note: CDATA validation as part of #75 already supports some CSS validation via regexes (e.g. !important
is dissallowed and so forth). In this ticket we need to actually parse the CSS and report any problems with it.
attribute-name="attribute-value"
style when attribute-name=''
e.g. noloading="noloading"
As there are some new embed code conversions, please update documentation. Also add basic instructions for running tests.
Add support for Template validation
(This is a subticket of #35)
phpunit.xml
configuration file to whitelist the files we want to do code coverage onParsedUrlSpecStylesheetErrorAdapter.php
does not get any code coverage statistics when it is very obviously being calledSample embed code
<iframe src="https://player.vimeo.com/video/18352872" width="640" height="360" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p><a href="https://vimeo.com/18352872">Drupal 7 Marketing Video</a> from <a href="https://vimeo.com/lullabot">Lullabot</a> on <a href="https://vimeo.com">Vimeo</a>.</p>
Sample amp-vimeo tag see https://github.com/ampproject/amphtml/blob/master/extensions/amp-vimeo/amp-vimeo.md
When a tag matches a specification then its is correct -- no problems.
When a tag does not satisfy any specifications, then there are competing tag specifications (e.g. amp-audio > track
vs audio > track
) from which errors must be presented.
The amp php library should select the errors in a similar way that the canonical validator does. Otherwise, the output of the php validator may be less useful because its has chosen a longer set of errors or a tagspec thats not really the best fit etc.
Update the code so that we match canonical validator.
We discussed this on the call. We should run the Full HTML page through the library on amp paths on Drupal 8.
While the library will strip out any bad attributes, bad tags and bad property value pairs it does not know how to "add" things that would fix any potential problems. So you have two situations:
It may still be worthwhile to do this exercise and see if we're net positive by running the full HTML through the library. And hopefully we should.
cc: @sirkitree @Mdrummond
It would be nice to see ci and some tests in this library.
Convert Vine embed code to <amp-vine>
tag
Add a small amount of documentation so that people know how to run tests
Add Layout validation to our library by porting over relevant code from the amphtml project.
To know what layouts in AMP are, here is a useful link:
https://github.com/ampproject/amphtml/blob/master/spec/amp-html-layout.md
(This is subticket of #35)
In Drupal 7, I'm seeing an error where the script tags for component JS are being added a second time when full HTML processing is turned on. I'm not seeing this error in Drupal 8. Examples include amp-iframe, amp-twitter, and amp-analytics. In Drupal 7, the tags are initially added to $head which appears towards the top of the head element. In Drupal 8, the tags are output by the JS placeholder, which is just above the main amp script tag. Maybe that has something to do with it? In any case, when full HTML processing is turned off, this doesn't happen, so it's something to do with the AMP library. I don't see any code in the D7 module that is doing this in conjunction with the full HTML pass, so I can only assume it's the library itself.
The AMP class is getting unwieldy. There is a lot of scope for refactoring code into smaller functions. Do this. Additionally, we have a facility to be able to diff
the initial HTML input with the AMPized output HTML. This feature has some issues -- address them.
While porting a subset of the canonical validator to PHP comparisons were made between the output given by the canonical validator and our (ported) validator.
This exercise should happen again with greater rigour and with more sample HTML documents to ensure that the AMP HTML warnings that we report are correct.
Please note that the output will not be the same between the two validators (and that is OK).
Our validator will give:
We just need to ensure that our warnings are correct.
This ticket is partially related to #11 but not fully.
Implement video to amp-video tag conversion
Hi,
I have a non-Drupal PHP website. It basically generates HTML files on the fly fetching from mysql database. In addition I have over 500 million HTML files that are generated on the fly as well. In case if you wondering what website is he talking about, head over to XYZCalledYou.com.
I am looking for an automated AMP HTML file creation integration.
I was this close to finishing, but converting so many pages can be a nightmare that can't end at all.
In case if you are still wondering, the site never breaches, abuses or violates any of Google's terms of conduct.
I can send my PHP code over to you if you need to examine it.
Thank you,
Esh
I started trying to set this up on Packagist and got the following error when I tried:
"Uncaught Exception: [RuntimeException] The HOME or COMPOSER_HOME environment variable must be set for composer to run correctly"
I think we need something added to composer.json to make packagist happy.
Implement audio to amp-audio tag conversion
Attribute trigger specs is a new feature that was implemented in our PHP validator recently.
In protoascii language it looks like this
...
attrs: {
name: "on"
trigger: {
if_value_regex: "tap:.*"
also_requires_attr: "role"
also_requires_attr: "tabindex"
}
}
...
Essentially, if there is an attribute "on" on a tag, and its value starts with "tap:" then we should also require attributes role
and tabindex
on that tag.
Now:
validator-generated.php
and copy it into this repo; this will bring in some attribute specs that use this feature in the validatorAdd support for pinterest embed code conversion to <amp-pinterest>
See
https://www.ampproject.org/docs/reference/extended/amp-pinterest.html
Also see
https://developers.pinterest.com/tools/widget-builder/
Corresponds to https://github.com/Lullabot/amp-library-issues/issues/12
Currently the AMP warnings displayed have line numbers displayed beside them. Line numbers don't have much relevance if the HTML is coming from a rich text editor.
It might be a good idea to provide the HTML of the tag as a "context". The formatting of the message can be made better and more user friendly.
There are a lot more things that can be done here / possibly need to be done here. This will require refactoring of the warnings display/action taken related code.
Here are some general ideas mentioned by marc https://github.com/Lullabot/amp-module/issues/77#issuecomment-189687121 that can serve as an additional background for this ticket.
While working on #62 I encountered problems related to the AMP library's support for HTML5. The main issue is that we're using the inbuilt HTML parser of PHP DOM (libxml version is 2.9.0).
The following does not parse properly for instance:
<audio controls autoplay>
<p>Your browser does not support the <strong>audio</strong> tag</p>
<source src="abc.wav" type="audio/wav">
<source src="xyz.mp3" type="audio/mpeg">
</audio>
(The above HTML actually parses without any fatal errors but the parse tree is not correct).
This HTML needs to be changed to:
<audio controls autoplay>
<p>Your browser does not support the <strong>audio</strong> tag</p>
<source src="abc.wav" type="audio/wav" />
<source src="xyz.mp3" type="audio/mpeg" />
</audio>
Notice that <source>
was changed to <source />
so that the PHP dom parser knows that the tag has ended.
We can't expect users to do this while entering HTML. Users should be able to provide valid HTML5 in additional to traditional HTML.
The solution is to not use the default parser but the mastermind/html5-php
library to parse the HTML5 to build the DOMDocument
. (We're already using this library for HTML output)
The use of mastermind for input is a seemingly small change but causes a huge problem: We cannot point out line numbers when we find validation issues. DOMNode::getLineNo()
always returns zero.
I filed an issue on the mastermind/html5-php
. Sadly there has been no response on it (yet).
I have been able to make my own workaround though (See filed issue for workaround details).
Task for this ticket:
When the AMP library is run, it would be a good idea to see for performance tracking and understanding (for cache purposes) when a page was processed by the AMP library
memory_get_peak_usage()
different at the start of the script vs at the end of the script?Our AMP HTML validator does not currently validate all aspects of the HTML to make sure it conforms to the AMP HTML standard.
CSS, Layout, Template, CDATA validations are still pending and should be implemented.
This is large task but can be listed under one ticket for now. (I'll create separate tickets if required as we go along).
This came to light when run test on:
https://github.com/ampproject/amphtml/blob/27ee29ffc3d809fcc8143044d22df9d176ad8169/extensions/amp-accordion/0.1/test/validator-amp-accordion.html
And comparing output against:
https://github.com/ampproject/amphtml/blob/27ee29ffc3d809fcc8143044d22df9d176ad8169/extensions/amp-accordion/0.1/test/validator-amp-accordion.out
Fix this bug and add this test to the full-html tests directory
Per https://www.drupal.org/node/2702125, we need to change LICENSE.txt to GPLv2
The library should be able to handle conversion of <img>
tags to <amp-img>
tags even if the URLs are non-absolute. E.g. <img src="/abc/xyz.jpg" />
in that case the image dimensions should be attempted to be retrieved from the http://server-url.com/abc/xyz.jpg
. The server url can be ascertained from the $_SERVER
super globals.
Similarly it should be able to handle relative paths e.g. <img src="abc/xyz.jpg" />
(note that there is no leading /
.
The AMP HTML specification is continuously changing. New tags are being introduced and the way existing tags are being validated is changing too.
Our PHP validator is based on the canonical javascript validator. We need to need to update our port to tackle non-trivial changes in the way the specification is validated (simple changes should automatically work when validator-generated.php is updated). (Examples of non-trivial changes are changes to the internal classes in validator-generated.php and so forth).
The soundcloud embed code looks something like this:
<iframe width="100%" height="450" scrolling="no"
frameborder="no"
src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/253267058&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&visual=true"></iframe>
The specifications of the <amp-soundcloud>
tag can be found here:
https://www.ampproject.org/docs/reference/extended/amp-soundcloud.html
Write a pass that will detect a soundcloud iframe and convert it into a <amp-soundcloud>
tag with the appropriate javascript include.
As discussed in #78 (comment)
The canonical validator points out issues in the CSS on the exact line of the HTML document. In our case the line number we provide is the line number of the containing
<style>
tag. To provide exact line number within the CSS we might need to make changes to the the PHP parser library ( https://packagist.org/packages/sabberworm/php-css-parser ).
So we need to make two changes:
sabberworm/php-css-parser
to support line numbers. (This could later be submitted as a pull request upstream; but composer allows us to point to our repo if we need to also)Ideally, the README here says what the code here does. Any further detail can be learned via the links already there.
I am using the amp module for Drupal 7 and i'm facing an encoding issue when using that in my hosting environment. I did a lot of research and debugging and i narrowed down the problem to the native DomDocument - implementation.
I already created an issue on d.o for that: https://www.drupal.org/node/2712895
AMP::convertToAmpHtml()
calls $qp = QueryPath::withHTML($document, NULL, ['convert_to_encoding' => 'UTF-8']);
that results (simplified) to an equivalent of this:
// see QueryPath\DOMQuery::parseXMLString(...)
$content = mb_convert_encoding($content, 'UTF-8', 'auto');
$document->loadHTML($content);
This is where my umlauts get broken in my hosting environment. When i change the lines to
// see QueryPath\DOMQuery::parseXMLString(...)
$content = mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8');
$document->loadHTML($content);
Everything works fine.
I think that might be an issue with PHP/libxml? Is that library too old to work with AMP?
In #22 there was a subtle workaround for <noscript>
tags due to a problem with the mastermind/html5
library versions < 2.2.0
Now that the upstream library has fixed its bug as of 2.2.0 we need to remove this workaround. (See release 2.2.0 https://github.com/Masterminds/html5-php/releases/tag/2.2.0 . Also Masterminds/html5-php#98 ).
Not removing the workaround and running with 2.2.0 results in the noscript tag having html tags within <noscript>
escaped:
i.e. instead of:
<noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style>
We get:
<noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript>
In other words with the later versions of the masterminds/html5
library the workaround results in a html output issue.
The section on Composer starts as follows:
If you're familiar with composer then you should not have any problems in getting this to work for your PHP project. If you're not familiar with composer then please read up on it before trying to set this up!
Perhaps change that to something like:
Use Composer to get this library working with your PHP project. Not familiar with Composer? Read on to learn more about how to get started.
Saying that setup steps are easy can trip people up when they run into unexpected problems. Not only can a setup step be frustrating, but now they are wondering why this wasn't easy. Better to just simply state what is needed to get started.
Other than that, great README!
When I installed this via Composer and uploaded the files to a server, I received an error about a git submodule being within my repo. I looked into the amp library files in the repo and found a .git directory within the amp php library directory, which seemed unlike the other libraries in my vendors directory. Once I removed the .git directory, I was able to upload the library to the server without an issue. Not sure if something should be changed in the setup, but wanted to note this.
There are some advanced features for embed code conversion that are supported by the <amp-dailymotion>
tag. These have not been implemented in #42
See:
https://developer.dailymotion.com/player#player-parameters
and
https://www.ampproject.org/docs/reference/extended/amp-dailymotion.html
e.g. start time, ui hightlight, ui logo etc.
We should support these.
height
) copied from this Instagram post:<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-version="7" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:8px;"> <div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;"> <div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div></div> <p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/BFUN97QpznO/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">We've still got #VR headsets at the #Lullabot #DrupalCon booth. Stop by and enter a virtual world!</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A photo posted by Lullabot (@lullabot) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2016-05-12T17:40:05+00:00">May 12, 2016 at 10:40am PDT</time></p></div></blockquote>
<script async defer src="//platform.instagram.com/en_US/embeds.js"></script>
<amp-instagram layout="responsive" width="400" height="600" data-shortcode="BFUN97QpznO"></amp-instagram>
(Corresponds to https://github.com/Lullabot/amp-library-issues/issues/13 )
Add support for CDATA validation with tests. This type of validation supports validation of the contents of tags e.g. this would catch scenarios like the use of !important
in css within style tags (which is not allowed). This uses things like a allowed data regex and a blacklist regex
This does not cover all of the scenarios for CSS validation. Some CSS validation will need to wait for more advanced CSS parsing validation (for another ticket).
This is related to the omnibus ticket #35
One of the objectives of this library is to find and automatically correct AMP HTML standard non-compliance automatically.
So if a style
attribute is illegal for a tag, we remove it. If a tag cannot exist in a particular location in the document we remove it and so on.
As mentioned in the README.md
this correction is currently basic and can be made a bit more sophisticated.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.