yegor256 / xembly

Assembly for XML: an imperative language for modifying XML documents

Home Page: https://www.xembly.org

License: Other

Java 99.03% HTML 0.97%
xml-editor java xml-documents xpath xml-builder xml

Xembly

Xembly is an Assembly-like imperative programming language for data manipulation in XML documents. It is a much simpler alternative to DOM, XSLT, and XQuery. Read this blog post for a more detailed explanation: Xembly, an Assembly for XML. You may also want to watch this webinar.

You need this dependency:

<dependency>
  <groupId>com.jcabi.incubator</groupId>
  <artifactId>xembly</artifactId>
  <version>0.31.1</version>
</dependency>

Here is a command line implementation (as Ruby gem): xembly-gem

For example, you have an XML document:

<orders>
  <order id="553">
    <amount>$45.00</amount>
  </order>
</orders>

Then, you want to change the amount of order #553 from $45.00 to $140.00. The Xembly script would look like this:

XPATH "orders/order[@id=553]";
XPATH "amount";
SET "$140.00";

As you see, it's much simpler and more compact than DOM, XSLT, or XQuery.
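For comparison, here is a rough sketch of the same change done with nothing but the JDK's DOM and XPath APIs. It is not part of Xembly: the class name DomAmountUpdate and its update() method are made up for this illustration, and the comments only map each step to the directive it roughly corresponds to.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK only: no Xembly classes are used here.
final class DomAmountUpdate {
  static String update(String xml, int id, String amount) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
      // Roughly: XPATH "orders/order[@id=553]"; XPATH "amount";
      NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
        "/orders/order[@id=" + id + "]/amount", doc, XPathConstants.NODESET
      );
      // Roughly: SET "$140.00"; applied to every node that was found
      for (int i = 0; i < nodes.getLength(); i++) {
        nodes.item(i).setTextContent(amount);
      }
      // Serialize the modified document back to text
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      transformer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    String xml = "<orders><order id=\"553\"><amount>$45.00</amount></order></orders>";
    System.out.println(update(xml, 553, "$140.00"));
  }
}
```

The three Xembly directives replace the parsing, XPath evaluation, loop, and serialization boilerplate above.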

This Java package implements Xembly:

Document document = DocumentBuilderFactory.newInstance()
  .newDocumentBuilder().newDocument();
new Xembler(
  new Directives(
    "ADD 'orders'; ADD 'order'; ATTR 'id', '553'; SET '$140.00';"
  )
).apply(document);

Since version 0.9 you can directly transform directives to XML:

String xml = new Xembler(
  new Directives()
    .add("root")
    .add("order")
    .attr("id", "553")
    .set("$140.00")
).xml();

This code will produce the following XML document:

<root>
  <order id="553">$140.00</order>
</root>

Directives

This is a full list of supported directives, in the current version:

  • ADD: adds new node to all current nodes
  • ADDIF: adds new node, if it's absent
  • ATTR: sets an attribute on all current nodes
  • SET: sets text value of current node
  • XSET: sets text value, calculating it with XPath
  • XATTR: sets attribute value, calculating it with XPath
  • CDATA: same as SET, but makes CDATA
  • UP: moves cursor one node up
  • XPATH: moves cursor to the nodes found by XPath
  • REMOVE: removes all current nodes
  • STRICT: throws an exception if cursor is missing nodes
  • PI: adds processing instruction
  • PUSH: saves cursor in stack
  • POP: retrieves cursor from stack
  • NS: sets namespace of all current nodes
  • COMMENT: adds XML comment

The "cursor" or "current nodes" is where we're currently located in the XML document. When Xembly script starts, the cursor is empty: it simply points to the highest level in the XML hierarchy. Pay attention, it doesn't point to the root node. It points to one level above the root. Remember, when a document is empty, there is no root node.

Then, we start executing directives one by one. After each directive, the cursor may move. There may be many nodes under the cursor, or just one, or none. For example, let's assume we're starting with this simple document <car/>:

ADD 'hello';        // Nothing happens, since the cursor is empty
XPATH '/car';       // There is one node <car> under the cursor
ADD 'make';         // The result is "<car><make/></car>",
                    // the cursor has one node "<make/>"
ATTR 'name', 'BMW'; // The result is "<car><make name='BMW'/></car>",
                    // the cursor still points to one node "<make/>"
UP;                 // The cursor has one node "<car>"
ADD 'mileage';      // The result is "<car><make name='BMW'/><mileage/></car>",
                    // the cursor has one node "<mileage/>"
XPATH '/car/*';     // The cursor has two nodes "<make name='BMW'/>"
                    // and "<mileage/>"
REMOVE;             // The result is "<car/>", since all nodes under
                    // the cursor are removed

You can create a collection of directives either from text or via supplementary methods, one per directive. In both cases, you need to use the Directives class:

import org.xembly.Directives;
new Directives("XPATH '//car'; REMOVE;");
new Directives().xpath("//car").remove();

The second option is preferable, because it is faster — there is no parsing involved.

ADD

The ADD directive adds a new node to every node in the current node set. ADD expects exactly one mandatory argument, which is the name of a new node to be added (case sensitive):

ADD 'orders';
ADD 'order';

Even if a node with the same name already exists, a new node will be added. Use ADDIF if you need to add only if the same-name node is absent.

After the execution, the ADD directive moves the cursor to the nodes just added.

ADDIF

The ADDIF directive adds a new node to every node of the current set, only if it is absent. ADDIF expects exactly one argument, which is the name of the node to be added (case sensitive):

ADD 'orders';
ADDIF 'order';

After the execution, the ADDIF directive moves the cursor to the nodes just added.
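The difference between ADD and ADDIF can be sketched in plain DOM terms. This is only an illustration of the semantics described above, not Xembly's implementation: the class AddIfSketch and its methods are hypothetical, and returning the already existing child from addIf() is an assumption of this sketch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, JDK only: mimics ADD and ADDIF for a single parent node.
final class AddIfSketch {
  // ADD: always appends a new child, even if a same-name child exists
  static Element add(Element parent, String name) {
    Element child = parent.getOwnerDocument().createElement(name);
    parent.appendChild(child);
    return child;
  }

  // ADDIF: appends a child only when no direct child has this name
  // (assumption: the existing child is returned when one is found)
  static Element addIf(Element parent, String name) {
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
        return (Element) n;
      }
    }
    return add(parent, name);
  }

  // Creates a new document with a single root element
  static Element newRoot(String name) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().newDocument();
      Element root = doc.createElement(name);
      doc.appendChild(root);
      return root;
    } catch (ParserConfigurationException ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    Element orders = newRoot("orders");
    add(orders, "order");
    add(orders, "order");
    addIf(orders, "order");
    // Two children: ADD added both, ADDIF found an existing one
    System.out.println(orders.getChildNodes().getLength());
  }
}
```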

SET

The SET directive changes text content of all current nodes, and expects exactly one argument, which is the text content to set:

ADD "employee";
SET "John Smith";

SET doesn't move the cursor anywhere.

XSET

The XSET directive changes text content of all current nodes to a value calculated with the provided XPath expression:

ADD "product-1";
ADD "price";
XSET "sum(/products/price) div count(/products)";

XSET doesn't move the cursor anywhere.
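As a minimal sketch of the idea behind XSET, the snippet below evaluates an XPath expression relative to each selected node and writes the result back as that node's text. It uses the JDK's built-in XPath 1.0 engine rather than Xembly, so results may differ from what Xembly computes; the class XsetSketch, its method, and the sample expression are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK only: mimics "XPATH target; XSET expression;".
final class XsetSketch {
  static String xset(String xml, String target, String expression) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
      XPath xpath = XPathFactory.newInstance().newXPath();
      // Select the nodes that play the role of the cursor
      NodeList nodes = (NodeList) xpath.evaluate(target, doc, XPathConstants.NODESET);
      for (int i = 0; i < nodes.getLength(); i++) {
        Node node = nodes.item(i);
        // Evaluate the expression with this node as the context,
        // then store the string result as the node's text
        node.setTextContent(xpath.evaluate(expression, node));
      }
      return doc.getDocumentElement().getTextContent();
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    // The price node holds 50, the expression subtracts 30 from it
    System.out.println(xset("<price>50</price>", "/price", ". - 30"));
  }
}
```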

XATTR

The XATTR directive changes the value of an attribute of all current nodes to a value calculated with the provided XPath expression:

ADD "product-1";
ADD "price";
XATTR "s", "sum(/products/price) div count(/products)";

XATTR doesn't move the cursor anywhere.

UP

The UP directive moves the cursor from all current nodes to their parents.

XPATH

The XPATH directive re-points the cursor to the nodes found by the provided XPath expression:

XPATH "//employee[@id='234' and name='John Smith']/name";
SET "John R. Smith";

REMOVE

The REMOVE directive removes current nodes under the cursor and moves the cursor to their parents:

ADD "employee";
REMOVE;

STRICT

The STRICT directive checks that there is a certain number of current nodes:

XPATH "//employee[name='John Doe']";  // Move the cursor to the employee
STRICT "1";                           // Throw an exception if there
                                      // is not exactly one node under
                                      // the cursor

This is a very effective mechanism for validating your script in production mode. It is similar to the assert statement in Java. It is recommended to use STRICT regularly, to make sure your cursor has the correct number of nodes and to avoid unexpected modifications.

STRICT doesn't move the cursor anywhere.

PI

The PI directive adds a new processing instruction to the XML:

PI "xsl-stylesheet" "href='http://example.com'";

PI doesn't move the cursor anywhere.

PUSH and POP

The PUSH and POP directives save current DOM position to stack and restore it from there.

Let's say you start your Xembly manipulations from a place in the DOM whose location is unknown to you. After your manipulations are done, you want to get back to exactly the same place. Use PUSH to save your current location and POP to restore it when the manipulations are finished, for example:

PUSH;                        // Doesn't matter where we are
                             // We just save the location to stack
XPATH '//user[@id="123"]';   // Move the cursor to a completely
                             // different location in the XML
ADD 'name';                  // Add "<name/>" to all nodes under the cursor
SET 'Jeff';                  // Set text value to the nodes
POP;                         // Get back to where we were before the PUSH

PUSH basically saves the cursor onto the stack and POP restores it from there. This technique is very similar to the PUSH/POP instructions in Assembly. The stack has no size limit: you can push multiple times and pop the saved cursors back. It is a stack, which is why it is First-In-Last-Out (FILO).

This operation is fast and it is highly recommended to use it everywhere, to be sure you're not making unexpected changes to the XML document.
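The stack behaviour can be modelled in a few lines. This is a toy model, not Xembly's code: the generic class CursorStack and its method names are invented here, and plain strings stand in for DOM nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of a cursor with PUSH and POP.
final class CursorStack<T> {
  private final Deque<List<T>> saved = new ArrayDeque<>();
  private List<T> cursor = Collections.emptyList();

  // Stands in for any directive that moves the cursor, e.g. XPATH
  void moveTo(List<T> nodes) {
    this.cursor = new ArrayList<>(nodes);
  }

  // PUSH: remember the current cursor, leave it where it is
  void push() {
    this.saved.push(this.cursor);
  }

  // POP: restore the most recently saved cursor
  void pop() {
    this.cursor = this.saved.pop();
  }

  List<T> current() {
    return Collections.unmodifiableList(this.cursor);
  }

  public static void main(String[] args) {
    CursorStack<String> xml = new CursorStack<>();
    xml.moveTo(Arrays.asList("garage"));
    xml.push();                               // saves [garage]
    xml.moveTo(Arrays.asList("car", "car"));
    xml.push();                               // saves [car, car]
    xml.moveTo(Arrays.asList("user"));
    xml.pop();
    System.out.println(xml.current());        // the cursor saved last
    xml.pop();
    System.out.println(xml.current());        // the cursor saved first
  }
}
```

Cursors come back in the reverse order of saving, so nested PUSH/POP pairs behave like nested scopes.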

NS

The NS directive adds a namespace attribute to a node:

XPATH '/garage/car';                // Move the cursor to "<car/>" node(s)
NS "http://www.w3.org/TR/html4/";   // Set the namespace over there

If an original document was like this:

<garage>
  <car>BMW</car>
  <car>Toyota</car>
</garage>

After applying those two directives, it will look like this:

<garage xmlns:a="http://www.w3.org/TR/html4/">
  <a:car>BMW</a:car>
  <a:car>Toyota</a:car>
</garage>

The namespace prefix may not necessarily be a:.

NS doesn't move the cursor anywhere.

XML Collections

Let's say you want to build an XML document with a collection of names:

package org.xembly.example;
import org.xembly.Directives;
import org.xembly.Xembler;
public class XemblyExample {
  public static void main(String[] args) throws Exception {
    String[] names = new String[] {
      "Jeffrey Lebowski",
      "Walter Sobchak",
      "Theodore Donald 'Donny' Kerabatsos",
    };
    Directives directives = new Directives().add("actors");
    for (String name : names) {
      directives.add("actor").set(name).up();
    }
    System.out.println(new Xembler(directives).xml());
  }
}

The standard output will contain this text:

<?xml version="1.0" encoding="UTF-8"?>
<actors>
  <actor>Jeffrey Lebowski</actor>
  <actor>Walter Sobchak</actor>
  <actor>Theodore Donald &apos;Donny&apos; Kerabatsos</actor>
</actors>

Merging Documents

When you need to add an entire XML document, you can convert it first into Xembly directives and then add them all together:

Iterable<Directive> dirs = new Directives()
  .add("garage")
  .append(Directives.copyOf(node))
  .add("something-else");

This static utility method copyOf() converts an instance of class org.w3c.dom.Node into a collection of Xembly directives. Then, the append() method adds them all together to the main list.

Unfortunately, not every valid XML document can be parsed by copyOf(). For example, this one will lead to a runtime exception: <car>2015<name>BMW</name></car>. Read more about Xembly limitations, a few paragraphs below.
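If you want to find out up front whether a document mixes text and elements the way <car>2015<name>BMW</name></car> does, a check along the following lines may help. It is a sketch built on the JDK's DOM API only; the class MixedContent is hypothetical, and whether copyOf() rejects exactly the documents this check flags has not been verified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK only: detects elements with mixed content.
final class MixedContent {
  // True if this element, or any descendant, has both
  // non-blank text and child elements as direct children
  static boolean hasMixedContent(Element element) {
    boolean text = false;
    boolean elements = false;
    for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
      short type = n.getNodeType();
      if (type == Node.ELEMENT_NODE) {
        elements = true;
        if (hasMixedContent((Element) n)) {
          return true;
        }
      } else if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
        && !n.getTextContent().trim().isEmpty()) {
        // Whitespace used for indentation is not counted as text
        text = true;
      }
    }
    return text && elements;
  }

  static boolean hasMixedContent(String xml) {
    try {
      return hasMixedContent(
        DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement()
      );
    } catch (Exception ex) {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(hasMixedContent("<car>2015<name>BMW</name></car>"));
    System.out.println(hasMixedContent("<car><name>BMW</name></car>"));
  }
}
```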

Escaping Invalid XML Text

XML, as a standard, doesn't allow certain characters in its body. For example, this code will throw an exception:

String xml = new Xembler(
  new Directives().add("car").set("\u0000")
).xml();

The character \u0000 is not allowed in XML. Actually, these ranges are also not allowed: \u0000..\u0008, \u000B..\u000C, \u000E..\u001F, \u007F..\u0084, and \u0086..\u009F.

This means that you should validate everything and make sure you're setting only "valid" text values to your XML nodes. Sometimes, it's not feasible to always check them. Sometimes you may simply need to save whatever is possible and call it a day. There is a static utility method, Xembler.escape(), to help you do that:

String xml = new Xembler(
  new Directives().add("car").set(Xembler.escape("\u0000"))
).xml();

This code won't throw an exception. The Xembler.escape() method will replace the illegal character with the literal text "\u0000" (a backslash, the letter u, and four hex digits). It is recommended to use this method everywhere, if you are not sure about the quality of the content.
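To make the idea concrete, here is a minimal stand-alone sketch of this kind of escaping. It is not the code of Xembler.escape(): the class EscapeSketch is hypothetical, the ranges are the ones listed above, and the lowercase four-digit hex output format is an assumption.

```java
// Hypothetical stand-in for an escaping utility, JDK only.
final class EscapeSketch {
  // True for characters in the ranges this section lists as not allowed
  static boolean isIllegal(char c) {
    return c <= 0x08
      || c == 0x0B || c == 0x0C
      || (c >= 0x0E && c <= 0x1F)
      || (c >= 0x7F && c <= 0x84)
      || (c >= 0x86 && c <= 0x9F);
  }

  // Replaces every illegal character with a visible six-character
  // sequence: backslash, the letter u, and four hex digits
  static String escape(String text) {
    StringBuilder out = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (isIllegal(c)) {
        out.append(String.format("\\u%04x", (int) c));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // The NUL character in the middle becomes plain, printable text
    System.out.println(escape("BMW" + (char) 0 + "X5"));
  }
}
```

Tab, line feed, and carriage return fall outside the listed ranges, so this sketch leaves them untouched.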

Shaded Xembly JAR With Dependencies

Usually, you're supposed to use this dependency in your pom.xml:

<dependency>
  <groupId>com.jcabi.incubator</groupId>
  <artifactId>xembly</artifactId>
</dependency>

However, if you have conflicts between dependencies, you can use our "shaded" JAR, that includes all dependencies:

<dependency>
  <groupId>com.jcabi.incubator</groupId>
  <artifactId>xembly</artifactId>
  <classifier>jar-with-dependencies</classifier>
</dependency>

Known Limitations

Xembly is not intended to be a replacement of XSL or XQuery. It is a lightweight (!) instrument for XML manipulations. There are a few things that can't be done by means of Xembly:

  • DTD section can't be modified

  • Elements and text content can't be mixed, e.g. this structure is not supported: <test>hello <b>friend</b></test>

Some of these limitations may be removed in the next versions. Please, submit an issue.

How To Contribute

Fork repository, make changes, send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, before sending us your pull request, please run full Maven build:

$ mvn clean install -Pqulice

You must fix all static analysis issues, otherwise we won't be able to merge your pull request. The build must be "clean".

Delivery Pipeline

The Git master branch is our cutting edge of development. It always contains the latest version of the product, always in a -SNAPSHOT suffixed version. Nobody is allowed to commit directly to master: this branch is basically read-only. Everybody contributes changes via pull requests. We are using rultor, a hosted chatbot, to merge pull requests into master. Only our architect is allowed to send pull requests to @rultor for merge, using the merge command. Before that happens, a mandatory code review must be performed for the pull request.

After each successful merge of a pull request, our project manager gives deploy command to @rultor. The code from master branch is tested, packaged, and deployed to Sonatype, in version *-SNAPSHOT.

Every once in a while, the architect may decide that it's time to release a new minor/major version of the product. When it happens, he gives release command to @rultor. The code from master branch is tested, versioned, packaged, and deployed to Sonatype and Maven Central. A new Git tag is created. A new GitHub release is created and briefly documented. All this is done automatically by @rultor.

Got questions?

If you have questions or general suggestions, don't hesitate to submit a new GitHub issue. But keep these Five Principles of Bug Tracking in mind.

xembly's People

Contributors

bitdeli-chef, carlosmiranda, dependabot[bot], llorllale, maurezen, maxonfjvipon, mentiflectax, nowheresly, renovate[bot], romankisilenko, rultor, volodya-lombrozo, yegor256

xembly's Issues

PUSH/POP

Let's implement two new directives: PUSH and POP.

PUSH should save current nodes to stack and POP should get the latest location from stack and make it current.

multiple current nodes

Would be nice to have an ability to modify multiple nodes at once:

XPATH "//employee"
ADD "salary"
SET "$100"

This script should add salary node to all employees and set $100 as their text contents

ADD/ADDIF directives convert node names to lowercase.

The following code:

Directives directives = new Directives().add("Hello").add("World");
System.out.println(new Xembler(directives).xml());

...generates the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<hello>
<world/>
</hello>

I realize that in Xembly, case insensitivity is a feature, not a bug. However, XML itself and languages such as XPath are case sensitive. As a result, converting this into an XML document and using the XPath /Hello/World will return no results.

Perhaps the decision to make Xembly case-insensitive should be reconsidered, or at least there should be a way to reconcile that with XML's case-sensitive nature.

Example collection code is wrong

From the Example Collection page:

    String[] names = new String[] {
      "Jeffrey Lebowski",
      "Walter Sobchak",
      "Theodore Donald 'Donny' Kerabatsos",
    };
    Directives directives = new Directives().add("actors");
    for (String name : names) {
      directives.add("actor").set(name).up();
    }
    System.out.println(new Xembler(directives));

The expected output is

<?xml version="1.0" encoding="UTF-8"?>
<actors>
  <actor>Jeffrey Lebowski</actor>
  <actor>Walter Sobchak</actor>
  <actor>Theodore Donald &apos;Donny&apos; Kerabatsos</actor>
</actors>

The actual output is
Xembler(directives=ADD "actors", ADD "actor", SET "Jeffrey Lebowski", UP, ADD "actor", SET "Walter Sobchak", UP, ADD "actor", SET "Theodore Donald &apos;Donny&apos; Kerabatsos", UP)

I believe that last System.out.println line is missing the .xml() invocation. Instead it should be

System.out.println(new Xembler(directives).xml());

backslash is not escaped correctly

for example:

org.xembly.SyntaxException: ADD "html";ATTR "xmlns", "http://www.w3.org/1999/xhtml";ADD "body";ADD "p";SET "€ \";
    at org.xembly.Directives.parse(Directives.java:373)
    at org.xembly.Directives.<init>(Directives.java:112)
    at org.xembly.DirectivesTest.performsFullScaleModifications(DirectivesTest.java:142)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:258)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.xembly.ParsingException: line 1:85 mismatched character '<EOF>' expecting '"'
    at org.xembly.XemblyLexer.emitErrorMessage(XemblyLexer.java:31)
    at org.antlr.runtime.BaseRecognizer.displayRecognitionError(BaseRecognizer.java:194)
    at org.antlr.runtime.Lexer.reportError(Lexer.java:275)
    at org.antlr.runtime.Lexer.nextToken(Lexer.java:99)
    at org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:143)
    at org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:137)
    at org.antlr.runtime.CommonTokenStream.consume(CommonTokenStream.java:68)
    at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:106)
    at org.xembly.XemblyParser.directive(XemblyParser.java:245)
    at org.xembly.XemblyParser.directives(XemblyParser.java:118)
    at org.xembly.Directives.parse(Directives.java:369)
    ... 21 more

XSET to update node value with XPath expression

Let's introduce a new directive XSET. It has to make possible updates of XML node values with XPath 2.0 expressions, for example:

ADD "price"; SET "50";
XSET ". - 30";

Should set the value to 20.

Xembler.dom() and Xembler.print()

Let's introduce two supplementary static methods Xembler.dom() and Xembler.print() which should create an empty DOM and print it to an output stream

performance tests

let's test performance of Xembler and publish some results (maybe in comparison with XQuery and XSLT)

standalone=no

Let's make it possible to create documents with standalone=no in the XML declaration, for example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<page/>

Need to update documentation to reflect case sensitivity

In the site's index page, it indicates that the ADD/ADDIF directives are case-insensitive:

Line 57 of index.md.vm

ADD directive adds a new node to every node in the current node set. ADD expects exactly one mandatory argument, which is the name of a new node to be added (case insensitive):

Line 73 of index.md.vm

ADDIF directive adds a new node to every node of the current set, only if it's absent. ADDIF expects exactly one argument, which is the name of the node to be added (case insensitive):

This has to be updated to reflect that the names given to these directives are, in fact, case-sensitive, as it has been since the 0.15 release.

Directives.copyOf(Node)

Let's add a static method copyOf(Node), which should convert existing Node to a collection of directives

new Directives(""): line 1:0 no viable alternative at input '<EOF>'

Empty list of directives gives a runtime exception:

org.xembly.ParsingException: line 1:0 no viable alternative at input '<EOF>'
    at org.xembly.XemblyParser.emitErrorMessage(XemblyParser.java:85)
    at org.antlr.runtime.BaseRecognizer.displayRecognitionError(BaseRecognizer.java:194)
    at org.antlr.runtime.BaseRecognizer.reportError(BaseRecognizer.java:186)
    at org.xembly.XemblyParser.directive(XemblyParser.java:360)
    at org.xembly.XemblyParser.directives(XemblyParser.java:105)
    at org.xembly.Directives.parse(Directives.java:363)

ADDIF is redundant

ADDIF directive is redundant. For example:

XPATH "/root";
ADDIF "user";

Can be replaced with:

XPATH "/root[not(user)]";
ADD "user";

Missing license config in pom.xml

Please put <license>file:${basedir}/LICENSE.txt</license> into the pom.xml of xembly, otherwise we'll be getting checkstyle errors (no header) upon fresh checkout.

NPE in AddIfDirective

java.lang.NullPointerException
    at org.xembly.AddIfDirective.exec(AddIfDirective.java:81)
    at org.xembly.Xembler.apply_aroundBody0(Xembler.java:132)
    at org.xembly.Xembler$AjcClosure1.run(Xembler.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.jcabi.aspects.aj.MethodLogger.wrap(MethodLogger.java:204)
    at com.jcabi.aspects.aj.MethodLogger.ajc$inlineAccessMethod$com_jcabi_aspects_aj_MethodLogger$com_jcabi_aspects_aj_MethodLogger$wrap(MethodLogger.java:1)
    at com.jcabi.aspects.aj.MethodLogger.wrapMethod(MethodLogger.java:160)
    at org.xembly.Xembler.apply_aroundBody2(Xembler.java:128)
    at org.xembly.Xembler$AjcClosure3.run(Xembler.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.jcabi.aspects.aj.MethodLogger.wrapClass(MethodLogger.java:130)
    at org.xembly.Xembler.apply(Xembler.java:128)

I have no idea why this happens..

NPE in XpathDirective

java.lang.NullPointerException
    at org.xembly.XpathDirective.rootOnly(XpathDirective.java:112)
    at org.xembly.XpathDirective.exec(XpathDirective.java:95)
    at org.xembly.Xembler.apply_aroundBody0(Xembler.java:131)
    at org.xembly.Xembler$AjcClosure1.run(Xembler.java:1)
