Introduction

Xenon is a library that parses loose or incomplete XML, including HTML, and builds a DOM for your application. You can use it in almost the same way as JAXP.

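// HTMLDocumentBuilderFactory is Xenon's replacement for the JAXP factory
DocumentBuilderFactory factory = new HTMLDocumentBuilderFactory();
DocumentBuilder builder = factory.newDocumentBuilder();
// htmlFile is a java.io.File; a DOM is built even from loose HTML
Document doc = builder.parse(htmlFile);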

This library includes a SAX 2 parser and a DOM Level 3 builder for loose XML and HTML. It also parses well-formed XML just as JAXP does.
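Because the parser is exposed through the standard SAX 2 interfaces, callback-based code is written the same way as for any JAXP parser. The sketch below uses the JDK's default SAXParserFactory, since this README does not name Xenon's SAX factory class; ElementCounter and countElements are hypothetical names chosen for illustration, and plugging in Xenon's parser instead is an assumption about its API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: counts startElement callbacks delivered via SAX 2.
class ElementCounter extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
    }

    // Parses the markup with the JDK's default SAX parser (a stand-in for Xenon's)
    // and returns how many start-element callbacks were received.
    static int countElements(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter counter = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), counter);
            return counter.elements;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, countElements("<a><b/><c>t</c></a>") counts the a, b and c elements.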

  • Runtime Environment: Java2 SE 5.0 or later
  • Additional Library: none
  • Project Properties: Eclipse 3.6; the source file encoding is UTF-8 and the line separator is LF.
  • License: Apache License, Version 2.0

Features

  • Compatible with JAXP 1.3.
  • Build DOM from HTML or loose, incomplete XML.
  • Callback-based parsing works the same way as SAX.
  • Guesses the character set from the <?xml?> declaration or a <meta> element in XML/HTML.
  • No SAXException is thrown during parsing. When HTMLDocumentBuilderFactory is used, however, ErrorHandler.warning() is called to report incomplete XML.
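A minimal sketch of receiving those warnings, assuming they arrive through the standard org.xml.sax.ErrorHandler registered with DocumentBuilder.setErrorHandler(). WarningCollector and parseAndCollect are hypothetical names. The JDK's default factory is used so the example runs without Xenon on the classpath; with Xenon, the HTMLDocumentBuilderFactory from the introduction would replace DocumentBuilderFactory.newInstance(). Note that the JDK's parser, unlike Xenon, reports malformed input as a fatal error rather than a warning.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical handler: records warnings, rethrows errors and fatal errors.
class WarningCollector implements ErrorHandler {
    final List<String> warnings = new ArrayList<String>();

    @Override
    public void warning(SAXParseException e) {
        warnings.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;
    }

    // Builds a DOM from the string and returns any warnings collected on the way.
    // The JDK default factory stands in for Xenon's HTMLDocumentBuilderFactory.
    static List<String> parseAndCollect(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            WarningCollector handler = new WarningCollector();
            builder.setErrorHandler(handler);
            // The returned Document is ignored in this sketch.
            builder.parse(new InputSource(new StringReader(xml)));
            return handler.warnings;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```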

Note that the SAX interface faithfully reflects the loose structure of the input, so its callbacks do not always follow the JAXP specification. For example, start and end callbacks for an element may be unbalanced, and text or comments may be reported outside the root element.
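Code that consumes this SAX output therefore should not assume balanced callbacks. The following is a sketch of one defensive approach, not Xenon's own code: TolerantHandler (a hypothetical name) keeps a stack of open elements, treats an end callback as implicitly closing anything opened inside it, counts end callbacks with no matching start, and collects text that arrives while no element is open. The exact event sequences Xenon emits are an assumption here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that tolerates unbalanced start/end callbacks
// and text reported outside any element.
class TolerantHandler extends DefaultHandler {
    final Deque<String> open = new ArrayDeque<String>();
    int unmatchedEnds = 0;
    final StringBuilder textOutsideElements = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!open.contains(qName)) {
            unmatchedEnds++;  // end callback with no corresponding open element
            return;
        }
        // Pop until the matching start is removed, implicitly closing
        // any elements that were left open inside it.
        while (!open.pop().equals(qName)) {
            // keep popping
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (open.isEmpty()) {
            textOutsideElements.append(ch, start, length);
        }
    }
}
```

For example, the sequence "start p, start b, end p, end b" leaves no element open and counts one unmatched end callback.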

DTDs, XML Schema, and validation of any kind are not supported during parsing. If you need these features, use a validating XML parser, or validate the DOM after building it.
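One way to validate after building is the standard javax.xml.validation API, which accepts a DOMSource. The sketch below assumes an XML Schema supplied as a string; DomValidation and isValid are hypothetical names, and the JDK's default parser stands in for Xenon. Whether a Xenon-built DOM is namespace-aware enough for the schema validator is an assumption worth checking.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomValidation {
    // Builds a DOM, then validates it against the given XML Schema.
    // Returns true if valid, false if the validator reports a violation.
    static boolean isValid(String xml, String xsd) {
        try {
            // Stand-in for a DOM built by Xenon.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            try {
                // With no ErrorHandler set, validation errors are thrown.
                validator.validate(new DOMSource(doc));
                return true;
            } catch (SAXException invalid) {
                return false;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```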

History

2010/02/25

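  • Added support for Java2 SE 5.0.
  • Added a workaround for a bug in Xerces 2.9.
  • Fixed handling of an unquoted attribute value that ends with a slash immediately followed by the end of the tag: the slash, which was previously interpreted as marking an empty element, is now recognized as part of the attribute value. Specifically, <a href=/foo/bar/> was previously interpreted as <a href="/foo/bar"/> and is now interpreted as <a href="/foo/bar/">. Because early HTML frequently omitted quotes, and the <a/> notation for empty elements was not used at that time, we judged it more correct to treat the slash as part of the attribute value whenever an unquoted value is followed by "/>".
  • Fixed attribute values so that a double quote inside a single-quoted value, or a single quote inside a double-quoted value, is recognized as part of the value.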
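The two attribute-value rules in this entry can be illustrated with a small sketch. AttrValueSketch.readValue is a hypothetical helper written for illustration, not Xenon's tokenizer: it reads a value starting just after the '=' and applies both rules.

```java
// Hypothetical sketch of the two attribute-value rules above.
// Reads the attribute value that starts at index pos of tag (just after '=').
final class AttrValueSketch {
    static String readValue(String tag, int pos) {
        char first = tag.charAt(pos);
        if (first == '"' || first == '\'') {
            // Quoted: only the same quote character ends the value, so the
            // other kind of quote is kept as part of the value.
            int end = tag.indexOf(first, pos + 1);
            return end < 0 ? tag.substring(pos + 1) : tag.substring(pos + 1, end);
        }
        // Unquoted: the value runs until whitespace or '>'. A '/' right
        // before '>' stays in the value instead of marking an empty element.
        int end = pos;
        while (end < tag.length()
                && !Character.isWhitespace(tag.charAt(end))
                && tag.charAt(end) != '>') {
            end++;
        }
        return tag.substring(pos, end);
    }
}
```

For <a href=/foo/bar/> this returns /foo/bar/, and for <a title='say "hi"'> it returns say "hi".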

Contributors

  • torao