The line that causes the crash is a JavaScript tag:
<script type="text/javascript">window.NREUM||(NREUM={}),__nr_require=function(t,n,e){function r(e){if(!n[e]){var o=n[e]={exports:{}};t[e][0].call(o.exports,function(n){var o=t[e][1][n];return r(o?o:n)},o,o.exports)}return n[e].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<e.length;o++)r(e[o]);return r}({D5DuLP:[function(t,n){function e(t,n){var e=r[t];return e?e.apply(this,n):(o[t]||(o[t]=[]),void o[t].push(n))}var r={},o={};n.exports=e,e.queues=o,e.handlers=r},{}],handle:[function(t,n){n.exports=t("D5DuLP")},{}],G9z0Bl:[function(t,n){function e(){var t=l.info=NREUM.info;if(t&&t.agent&&t.licenseKey&&t.applicationID&&p&&p.body){l.proto="https"===f.split(":")[0]||t.sslForHttp?"https://":"http://",i("mark",["onload",a()]);var n=p.createElement("script");n.src=l.proto+t.agent,p.body.appendChild(n)}}function r(){"complete"===p.readyState&&o()}function o(){i("mark",["domContent",a()])}function a(){return(new Date).getTime()}var i=t("handle"),u=window,p=u.document,s="addEventListener",c="attachEvent",f=(""+location).split("?")[0],l=n.exports={offset:a(),origin:f,features:[]};p[s]?(p[s]("DOMContentLoaded",o,!1),u[s]("load",e,!1)):(p[c]("onreadystatechange",r),u[c]("onload",e)),i("mark",["firstbyte",a()])},{handle:"D5DuLP"}],loader:[function(t,n){n.exports=t("G9z0Bl")},{}]},{},["G9z0Bl"]);</script>
The cause, as you've already figured out, is that xmlpath currently uses the xml package to parse the HTML. Although we can loosen some bolts so that it accepts more than strict XML, that's still not enough to parse general HTML without errors.
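For reference, Go's standard encoding/xml Decoder exposes a few such "bolts": the Strict, AutoClose and Entity fields. The sketch below is only an illustration of those standard-library knobs, not xmlpath's own code; countElements and lenientDecoder are hypothetical helper names. Lenient mode copes with things like an unclosed <br> or &nbsp;, but it is still an XML tokenizer, so raw script bodies full of `<` characters remain a problem.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// lenientDecoder returns an encoding/xml Decoder with its HTML-friendly
// options switched on. All three fields are part of the standard library.
func lenientDecoder(r io.Reader) *xml.Decoder {
	d := xml.NewDecoder(r)
	d.Strict = false                // tolerate some malformed markup
	d.AutoClose = xml.HTMLAutoClose // treat <br>, <img>, ... as self-closing
	d.Entity = xml.HTMLEntity       // accept &nbsp; and other HTML entities
	return d
}

// countElements (hypothetical helper) walks the token stream, counts start
// elements, and returns the first decoding error it encounters, if any.
func countElements(src string) (int, error) {
	d := lenientDecoder(strings.NewReader(src))
	n := 0
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		if _, ok := tok.(xml.StartElement); ok {
			n++
		}
	}
}

func main() {
	// An unclosed <br> and an HTML entity: fine in lenient mode.
	n, err := countElements("<p>foo <br> bar&nbsp;</p>")
	fmt.Println(n, err)
}
```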
The good news is that I've already been working on xmlpath v2, which will use a real HTML parser to avoid such issues. The bad news is that it will take a couple of weeks before I can finish this.
If you want to solve the problem right away, one hack I've used before is to run the regexp package over the document to strip the content of such script tags before handing it off to xmlpath.
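A minimal sketch of that hack, assuming the page is already in memory as a string; stripScripts is a hypothetical name. It empties every script element but keeps the tags. It is a workaround, not a parser: it will misbehave if a script contains the literal text "</script>" inside a string, or if an attribute value contains ">".

```go
package main

import (
	"fmt"
	"regexp"
)

// scriptBody matches a <script ...> element, capturing the opening and
// closing tags separately so the body in between can be dropped.
// (?i) makes it case-insensitive, (?s) lets . match newlines, and the lazy
// .*? stops each match at the first closing tag.
var scriptBody = regexp.MustCompile(`(?is)(<script\b[^>]*>).*?(</script\s*>)`)

// stripScripts empties every <script> element, keeping the tags themselves.
func stripScripts(html string) string {
	return scriptBody.ReplaceAllString(html, "${1}${2}")
}

func main() {
	page := `<html><body><script type="text/javascript">for(var o=0;o<e.length;o++){}</script><p>hi</p></body></html>`
	fmt.Println(stripScripts(page))
}
```

The cleaned string can then be passed to xmlpath as before.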
I'll leave this issue open and report here once the problem is solved.
Using regexes on HTML reminds me of this:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
I will give it a try.
Can't wait for v2 :) Keep up the good work 👍
I have one more question. In the future version of xmlpath, will it be possible to get the raw content of a node?
For instance, if I have:
<p>
foo <br> bar
</p>
I would like to get the raw contents of p, HTML tags and all (in this case, I'd like the <br> tag left unstripped so the formatting is retained). That is, for the path
//p
I'd like to get:
foo <br> bar
It's unlikely that you'll be able to get the unmodified raw content. The parser will generally alter it to make it well-formed.
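To see why, here is a small sketch using Go's standard encoding/xml in lenient mode (not xmlpath's API; innerMarkup is a hypothetical helper). It re-serializes the tokens inside the first matching element. Because the decoder auto-closes <br>, what comes back is the parser's normalized view, "foo <br></br> bar", rather than the original bytes. A different parser, such as a real HTML one, may normalize differently, but it would still not hand back the untouched input.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// innerMarkup finds the first element named tag and re-encodes the tokens
// between its start and end tags. The result reflects the parser's
// normalized token stream, not the original source text.
func innerMarkup(src, tag string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(src))
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity

	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	depth := 0 // 0 = outside the target, 1 = directly inside it
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return "", fmt.Errorf("no <%s> element found", tag)
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth == 0 {
				if t.Name.Local == tag {
					depth = 1 // entered the target; don't emit its own tag
				}
				continue
			}
			depth++
		case xml.EndElement:
			if depth == 1 {
				// Closing tag of the target: flush and return the content.
				if err := enc.Flush(); err != nil {
					return "", err
				}
				return buf.String(), nil
			}
			if depth > 1 {
				depth--
			}
		}
		if depth > 0 {
			if err := enc.EncodeToken(xml.CopyToken(tok)); err != nil {
				return "", err
			}
		}
	}
}

func main() {
	s, err := innerMarkup("<div><p>foo <br> bar</p></div>", "p")
	fmt.Println(s, err)
}
```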
xmlpath.v2 is out, fixing this issue: http://goo.gl/a6d0MG