deijin27 / wren-xsequence

XML parser/writer for the Wren programming language

License: MIT License

Wren 100.00%

wren wren-language xml xml-parser

wren-xsequence's Issues

Utility methods for converting values into common types

I'm considering adding utility methods for auto parsing common types.

I don't want to overdo it, so for simple types probably just:

Bool
Num

Strings are already covered by

Attribute.value
Element.value
Element.attributeValue(name)

Something like

Attribute.boolValue
Element.boolValue
Element.attributeBool(name)

Do we want a default value argument on these methods?

How clunky would it be to be on a utility class instead?
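
For discussion, here is a rough sketch of the utility-class variant with a default value argument. XConvert, toBool and toNum are made-up names and not part of xsequence; only Num.fromString is core Wren. Accepting "1" and "0" as booleans is an assumption borrowed from the XML Schema boolean forms.

```wren
// Hypothetical helper class, not part of xsequence.
class XConvert {
  // Returns true/false for a recognised boolean string, otherwise `fallback`.
  static toBool(value, fallback) {
    if (value == "true" || value == "1") return true
    if (value == "false" || value == "0") return false
    return fallback
  }

  // Returns the parsed number, or `fallback` when `value` is not a numeric string.
  static toNum(value, fallback) {
    if (!(value is String)) return fallback
    var num = Num.fromString(value)
    return num == null ? fallback : num
  }
}

System.print(XConvert.toBool("true", false)) // true
System.print(XConvert.toNum("12.5", 0))      // 12.5
System.print(XConvert.toNum("abc", 0))       // 0
```

With this shape the call site would read something like XConvert.toNum(element.attributeValue("Age"), 0), which is a bit clunkier than element.attributeNum("Age") but keeps the element and attribute classes small.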

Allow non-string passed to element value

Should convert it with toString

Parsing problems

I've now turned my attention to XML parsing and was trying to parse the following fragment, with a view to extracting the 'Name' attributes from all 'Student' elements, when I ran into some problems.

import "./xsequence" for XDocument

var xml = """
<Students>
  <Student Name="April" Gender="F" DateOfBirth="1989-01-02" />
  <Student Name="Bob" Gender="M"  DateOfBirth="1990-03-04" />
  <Student Name="Chad" Gender="M"  DateOfBirth="1991-05-06" />
  <Student Name="Dave" Gender="M"  DateOfBirth="1992-07-08">
    <Pet Type="dog" Name="Rover" />
  </Student>
  <Student DateOfBirth="1993-09-10" Gender="F" Name="&#x00C9;mily" />
</Students>
"""
var doc = XDocument.parse(xml)
var names = doc.root.elements("Student").map { |el| el.attribute("Name").value }.toList
System.print(names)

I traced these problems back to the following lines in xsequence.wren:

line 738: return parser.parserDocument(text) - this method shouldn't take any parameters.
(similar problem at lines 548 and 582)
line 540: unexpected(c, "Invalid escape sequence") - this method should take one rather than two parameters.
(similar problem at line 536)

If I alter those method calls to take the correct number of arguments, and also get rid of the acute accent so the last name is simply "Emily" rather than "Émily", then it works fine and I get the expected output:

[April, Bob, Chad, Dave, Emily]

So the first two problems should be easy to fix.

On the last problem, I think it would be worthwhile supporting Unicode character references (perhaps in a future version) as accented characters crop up quite a lot in my experience.
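
As a starting point, decoding a numeric character reference could look roughly like the sketch below. decodeCharRef is a hypothetical helper, not the xsequence implementation; it assumes the parser has already isolated the text between '&' and ';'. String.fromCodePoint is core Wren.

```wren
// Hypothetical helper, not part of xsequence.
// `body` is the text between '&' and ';', e.g. "#x00C9" or "#201".
// Returns the decoded character, or null if `body` is not a numeric reference.
var decodeCharRef = Fn.new { |body|
  if (body.bytes.count < 2 || body[0] != "#") return null
  var hex = body[1] == "x"
  var digits = hex ? body[2..-1] : body[1..-1]
  if (digits.isEmpty) return null
  var base = hex ? 16 : 10
  var table = "0123456789abcdefABCDEF"
  var code = 0
  for (c in digits) {
    var d = table.indexOf(c)
    if (d == -1) return null
    if (d > 15) d = d - 6            // map 'A'..'F' onto 10..15
    if (d >= base) return null       // e.g. a hex digit in a decimal reference
    code = code * base + d
    if (code > 1114111) return null  // beyond U+10FFFF
  }
  return String.fromCodePoint(code)
}

System.print(decodeCharRef.call("#x00C9") + "mily") // Émily
System.print(decodeCharRef.call("#201"))            // É
System.print(decodeCharRef.call("amp"))             // null
```

Returning null for anything that is not a numeric reference would let the caller fall back to the named escapes (amp, lt, and so on) before reporting an error.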

Newlines in attributes

I think newlines in attributes are meant to be treated as single spaces. Investigate this, and if true implement it.
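
For reference, the XML 1.0 attribute-value normalization rules replace each literal whitespace character (space, tab, carriage return, line feed) in an attribute value with a single space, and a CR LF pair counts as one line break. A minimal sketch of that step, assuming it runs on the raw attribute text before character references are expanded (normalizeAttributeValue is a hypothetical name):

```wren
// Hypothetical helper, not part of xsequence.
// Replaces literal line breaks and tabs in a raw attribute value with spaces.
var normalizeAttributeValue = Fn.new { |raw|
  return raw.replace("\r\n", " ").replace("\r", " ").replace("\n", " ").replace("\t", " ")
}

System.print(normalizeAttributeValue.call("a\nb\r\nc\td")) // a b c d
```

Whitespace written as a character reference such as &#10; is not covered by this replacement, which is why the sketch should run before references are decoded.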

Verify whitespace between attribute name

I noticed that C# allows whitespace between the attribute name and the equals sign, and between the equals sign and the attribute value.

TODO: check the xml spec to see if this should be allowed. If so, implement it.
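
For what it's worth, the XML 1.0 grammar defines an attribute as Name Eq AttValue with Eq ::= S? '=' S?, so optional whitespace on both sides of the equals sign is allowed. A rough sketch of how a parser could skip it, assuming a byte-indexed position into the source text (skipWhitespace is a hypothetical helper, not an existing xsequence method):

```wren
// Hypothetical helper, not part of xsequence.
// Returns the index of the first non-whitespace byte at or after `pos`.
var skipWhitespace = Fn.new { |text, pos|
  while (pos < text.bytes.count && " \t\r\n".contains(text[pos])) pos = pos + 1
  return pos
}

var text = "Name  =  \"April\""
var pos = 4                               // just past the attribute name
pos = skipWhitespace.call(text, pos)
if (text[pos] != "=") Fiber.abort("Expected '=' after attribute name")
pos = skipWhitespace.call(text, pos + 1)  // now at the opening quote
System.print(text[pos] == "\"")           // true
```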

Namespace support

A summary of my thoughts on namespace support.

Inversion of control

Xml namespaces are defined in two ways:

Explicit namespaces: xmlns:name="...", where "name" is an arbitrary name you choose.
You use something from the "name" namespace as follows: name:something.
This form can be used on both elements and attributes.
The default namespace: xmlns="...". This is the namespace that elements without an explicit one implicitly belong to.
So if you create an element in the scope of an xmlns, its namespace is that xmlns.
Unlike elements, attributes do not use this; attributes without an explicit namespace always have no namespace.

The current way of dealing with namespaces puts the responsibility on the user to manage which namespace names correspond to which namespace values.

// if the user creates an element, the trust is put in them to ensure it is in the scope of the correct xmlns definition.
var element = XElement.new("p:thing") 
// namespace "p" must be defined manually, else the document is invalid

This alone is not much of an issue; it's easy to think about this when constructing a document.

The problem arises when interpreting a document that you've loaded from a file. The way namespaces can be any arbitrary name which maps to a value means that in order to correctly interpret the namespace of an element, you must search for the scoped namespace manually, and this management is unreasonably complex.

<forest xmlns:b="blue">
  <tree>
    <b:bird/>
    <!-- What is "b:" in this element? You have to search up the tree for "xmlns:b" yourself. 
    The more deeply nested, the harder it becomes -->
  </tree>
</forest>

To address this, an approach which I'm referring to as "Inversion of control" is employed.
By storing the namespace value, rather than the namespace name, in elements, you lift the responsibility of its management from the user:

Instead of b:bird, they are presented with {blue}bird.

This is what is used in C#'s XLinq, and it works nicely. It can be frustrating having to always define the default namespace for every element, but there are various methods to make this a negligible issue, which I intend to explain via a collection of examples at some point.

The class-based approach and why it's not viable in wren

In C#, {blue}bird is represented via the classes XNamespace and XName, but they rely on a feature exclusive to statically typed languages to be useful.

I'll start with a basic implementation to talk about

class XNamespace {
    construct new(name) {
        _name = name
    }
    name { _name }

    +(str) { XName.new(this, str) }

    ==(other) {
        // Compare against a plain string, or against another XNamespace.
        if (other is String) return other == _name
        return other is XNamespace && other.name == _name
    }
}

class XName {
    construct new(name) {
        _name = name
    }
    construct new(namespace, name) {
        _namespace = namespace
        _name = name
    }
    namespace { _namespace }
    name { _name }

    // Only equal to another XName; a plain string is never equal.
    ==(other) { other is XName && other.name == name && other.namespace == namespace }
}

An XNamespace has only its name. An XName can have a namespace or not, along with its "local name".

But when you're creating an element, you don't want to have to explicitly construct these every time.

// In C# you can pass a string, and it is implicitly converted to
// an XName.
var element = new XElement("bird");

// However, it also helps combine a namespace with a string to 
// make an XName
var ns = XNamespace.Get("blue");
var name = ns + "bird";

// and equality checks too
var areSame = name == "bird";

But of course, wren is a dynamic programming language. You can see the equality check in the wren XNamespace I implemented is only going to work with strings if the namespace comes first, and equality checks should not be reliant on order of operands. You can modify every method to convert a string input to an XName, but there are many situations, especially equality checks, where this will not work.

The result would be a library internally cluttered with string conversions, and an API with many subtle sources of bugs and unintuitive behaviour.

The string-based approach

So what is the better solution? My idea is to use strings to represent XName. That is, an XName is a string with a part of it representing the namespace, and a part the localName.

var namespace = "{blue}"
var localName = "bird"
var nameWithNamespace = namespace + localName // == "{blue}bird"

While the addition operation is still order-dependent, it's obvious to the user since they are working with strings, and managing the namespace themselves. And the equality operator works great.

The {} notation is used in C#, and converted to XNamespace and XName appropriately in the implicit conversion, so it has precedent. But the use of curly braces specifically isn't important; the important part is to have a marker both before and after the namespace.

The first marker makes it quick to check if a namespace is present
The second marker shows where the namespace ends and local name begins

var name = "{blue}bird"
var localName = null
var namespace = null
if (name[0] != "{") {
  // does not have namespace
  localName = name
} else {
  // has namespace
  var idx = name.indexOf("}") + 1
  namespace = name[0...idx] // "{blue}"
  localName = name[idx..-1] // "bird"
}

With this, the code for the api doesn't need to be changed, since all the names are strings, just as they are currently.

What will need to change is the parser and writer. They will need to track the scope of namespaces, to know at any point which namespace name corresponds to which namespace value.
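
To make the scope tracking concrete, here is a minimal sketch of what the parser side could look like. NamespaceScopes and all of its methods are hypothetical names, not existing xsequence code. It assumes the parser collects each element's xmlns declarations into a map of prefix to value, using "" as the key for the default namespace.

```wren
// Hypothetical helper, not part of xsequence.
class NamespaceScopes {
  construct new() {
    _scopes = []
  }

  // Call when entering an element; `declarations` maps prefix -> namespace value.
  push(declarations) { _scopes.add(declarations) }

  // Call when leaving the element.
  pop() { _scopes.removeAt(-1) }

  // Searches from the innermost scope outwards; returns null if undeclared.
  lookup(prefix) {
    var i = _scopes.count - 1
    while (i >= 0) {
      if (_scopes[i].containsKey(prefix)) return _scopes[i][prefix]
      i = i - 1
    }
    return null
  }

  // Converts "b:bird" to "{blue}bird". Unprefixed names use the default namespace.
  expandElementName(qname) {
    var idx = qname.indexOf(":")
    var prefix = idx == -1 ? "" : qname[0...idx]
    var local = idx == -1 ? qname : qname[(idx + 1)..-1]
    var ns = lookup(prefix)
    if (ns == null && prefix != "") Fiber.abort("Undeclared namespace prefix '%(prefix)'")
    if (ns == null || ns == "") return local
    return "{%(ns)}%(local)"
  }
}

var scopes = NamespaceScopes.new()
scopes.push({"b": "blue"})                       // <forest xmlns:b="blue">
scopes.push({})                                  // <tree>
System.print(scopes.expandElementName("b:bird")) // {blue}bird
System.print(scopes.expandElementName("tree"))   // tree
scopes.pop()                                     // </tree>
```

Attribute names would need a slightly different rule, since an unprefixed attribute never picks up the default namespace. The writer would do the reverse lookup, from namespace value back to a prefix that is in scope.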

Minor bug in XElement.add(child)

Firstly, thanks for writing this which should be a useful addition to the Wren armory :)

When trying to add a list of children to an XElement, I noticed that there was a bug in line 706 of xsequence.wren where:

for (i in Sequence)

should, of course, be:

for (i in child)

Perhaps you could fix this when you have time.
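
For illustration, the intended behaviour might look like the sketch below. MiniElement is a stand-in class written for this example, not the real XElement. One thing to watch for: a String is also a Sequence in Wren, so it has to be excluded before iterating.

```wren
// Stand-in class for illustration only; not the real XElement.
class MiniElement {
  construct new(name) {
    _name = name
    _children = []
  }

  name { _name }
  children { _children }

  // Adds a single child, or every item of a sequence of children.
  add(child) {
    if (child is Sequence && !(child is String)) {
      for (i in child) add(i)
    } else {
      _children.add(child)
    }
  }
}

var el = MiniElement.new("Students")
el.add(["April", ["Bob", "Chad"]])
System.print(el.children) // [April, Bob, Chad]
```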

Improve error info

The parse error messages could give better context information.

Currently they just say "unexpected 's' at line 2 col 3".

That could be improved with things like:

Was expecting '<' to start document node
'&' must be followed by a valid escape character name such as 'amp'
Expected the opening tag of attribute value following attribute name 'name'
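
One possible shape for this is to pass a hint along with the position and append it to the existing message. The sketch below uses a hypothetical describeError function, not the current xsequence API. It counts bytes rather than characters, so the reported column is only accurate for ASCII input.

```wren
// Hypothetical helper, not part of xsequence.
// Builds an error message for the byte at `offset`, followed by `hint`.
var describeError = Fn.new { |text, offset, hint|
  var line = 1
  var col = 1
  var i = 0
  while (i < offset) {
    if (text[i] == "\n") {
      line = line + 1
      col = 1
    } else {
      col = col + 1
    }
    i = i + 1
  }
  return "Unexpected '%(text[offset])' at line %(line) col %(col). %(hint)"
}

System.print(describeError.call("<a>\n  s", 6, "Was expecting '<' to start a node"))
// Unexpected 's' at line 2 col 3. Was expecting '<' to start a node
```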

Allow passing value to XElement constructor at the same time as attribute

This would allow one line constructors for an element like this:

<TextBlock Name="hello">Hello World</TextBlock>

Breaks with utf-8 bom

I was using this with the DOME game engine, and found that a UTF-8 BOM at the start of the text stops the parser from working. I have a fix that I will merge across soon.
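
Until that lands, a workaround along these lines can be applied to the text before parsing. This is only a sketch and not necessarily what the fix does; stripBom is a made-up name. The BOM is the code point U+FEFF, which is three bytes in UTF-8, and Wren string ranges index by byte.

```wren
// Hypothetical helper, not part of xsequence.
// Removes a leading UTF-8 byte order mark, if there is one.
var stripBom = Fn.new { |text|
  var bom = "\uFEFF"
  return text.startsWith(bom) ? text[3..-1] : text
}

System.print(stripBom.call("\uFEFF<root/>")) // <root/>
System.print(stripBom.call("<root/>"))       // <root/>
```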

Mixed content

Should support mixed content.

<letter>
  Dear Mr. <name>John Smith</name>.
  Your order <orderid>1032</orderid>
  will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>

This probably means having an XText node, and element "Value" sets the nodes to be one of those?

Remove a hack from the doc gen

Instead of

#doc = "Description"
#args(arg1="Desc", arg2="desc")
method(arg1, arg2)

Where the args are in an inconsistent order because they are dictionary keys, use:

#doc = "Description"
#arg(name=arg1, desc="Desc")
#arg(name=arg2, desc="Desc")
method(arg1, arg2)

Here the args are consistently in the declared order, because they are stored in lists.

It also gives the flexibility to add more things, such as type options:

#doc = "Description"
#arg(name=arg1, desc="Desc", type=String)
#arg(name=arg2, desc="Desc", type=Num)
method(arg1, arg2)

Error message tests

Make sure that the error messages are reporting the correct position in the file with tests.

XComment

Thanks for adding support for XML comments which is working great :)

Incidentally, in the example on the main page, you need to tag XComment onto the end of the import statement for it to work.
