libxml2's Introduction

libxml2

Interface to libxml2, with DOM interface.

Why?

I needed to write go-xmlsec. That means building trees with libxml2 and then manipulating them in xmlsec. Because these are two separate Go packages, we cannot (safely) pass C.xmlFooPtr objects between them (you also pay a penalty for pointer types). This package carefully avoids references to C.xmlFooPtr types and uses uintptr to pass data around, so other libraries that need to interact with libxml2 can do so safely.
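
The idea can be sketched in plain Go. The names below (PtrSource, nodeHandle, describe) are hypothetical stand-ins chosen for illustration and are not necessarily this package's API. The sketch only shows that a uintptr can cross a package boundary without either side having to name the other's C types:

```go
package main

import "fmt"

// PtrSource is a hypothetical interface: anything that can expose the
// address of its underlying C structure as a plain integer.
type PtrSource interface {
	Pointer() uintptr
}

// nodeHandle stands in for a wrapper around a libxml2 node. In a real
// binding the value would come from a C pointer; here it is just a number.
type nodeHandle struct {
	ptr uintptr
}

func (n nodeHandle) Pointer() uintptr { return n.ptr }

// describe plays the role of a second package (for example an xmlsec
// binding). It only sees a uintptr, so it never has to mention a C type
// that belongs to another package.
func describe(src PtrSource) string {
	if src.Pointer() == 0 {
		return "invalid (nil) handle"
	}
	return fmt.Sprintf("handle 0x%x", src.Pointer())
}

func main() {
	fmt.Println(describe(nodeHandle{ptr: 0x10}))
	fmt.Println(describe(nodeHandle{}))
}
```

In a real cgo package the receiving side would convert the integer back to its own C pointer type before calling into C. That step is omitted here so the sketch runs without cgo.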

Status

  • This library should be considered alpha grade. API may still change.
  • Most of the commonly used libxml2 functionality that I rely on is already implemented and known to work.

Package Layout:

Name     Description
libxml2  Globally available utility functions, such as ParseString
types    Common data types, such as types.Node
parser   Parser routines
dom      DOM-like manipulation of XML document/nodes
xpath    XPath related tools
xsd      XML Schema related tools
clib     Wrapper around C libxml2 library - DO NOT TOUCH IF UNSURE

Features

Create XML documents using DOM-like interface:

  d := dom.CreateDocument()
  e, err := d.CreateElement("foo")
  if err != nil {
    println(err)
    return
  }
  d.SetDocumentElement(e)
  ...

Parse documents:

  d, err := libxml2.ParseString(xmlstring)
  if err != nil {
    println(err)
    return
  }

Use XPath to extract node values:

  text := xpath.String(node.Find("//xpath/expression"))

Examples

Basic XML Example

import (
  "log"
  "net/http"

  "github.com/lestrrat-go/libxml2"
  "github.com/lestrrat-go/libxml2/parser"
  "github.com/lestrrat-go/libxml2/types"
  "github.com/lestrrat-go/libxml2/xpath"
)

func ExampleXML() {
  res, err := http.Get("http://blog.golang.org/feed.atom")
  if err != nil {
    panic("failed to get blog.golang.org: " + err.Error())
  }

  defer res.Body.Close()

  p := parser.New()
  doc, err := p.ParseReader(res.Body)
  if err != nil {
    panic("failed to parse XML: " + err.Error())
  }
  defer doc.Free()

  doc.Walk(func(n types.Node) error {
    log.Printf("%s", n.NodeName())
    return nil
  })

  root, err := doc.DocumentElement()
  if err != nil {
    log.Printf("Failed to fetch document element: %s", err)
    return
  }

  ctx, err := xpath.NewContext(root)
  if err != nil {
    log.Printf("Failed to create xpath context: %s", err)
    return
  }
  defer ctx.Free()

  ctx.RegisterNS("atom", "http://www.w3.org/2005/Atom")
  title := xpath.String(ctx.Find("/atom:feed/atom:title/text()"))
  log.Printf("feed title = %s", title)
}

Basic HTML Example

func ExampleHTML() {
  res, err := http.Get("http://golang.org")
  if err != nil {
    panic("failed to get golang.org: " + err.Error())
  }

  defer res.Body.Close()

  doc, err := libxml2.ParseHTMLReader(res.Body)
  if err != nil {
    panic("failed to parse HTML: " + err.Error())
  }
  defer doc.Free()

  doc.Walk(func(n types.Node) error {
    log.Printf("%s", n.NodeName())
    return nil
  })

  nodes := xpath.NodeList(doc.Find(`//div[@id="menu"]/a`))
  for i := 0; i < len(nodes); i++ {
    log.Printf("Found node: %s", nodes[i].NodeName())
  }
}

XSD Validation

import (
  "io/ioutil"
  "log"
  "os"
  "path/filepath"

  "github.com/lestrrat-go/libxml2"
  "github.com/lestrrat-go/libxml2/xsd"
)

func ExampleXSD() {
  xsdfile := filepath.Join("test", "xmldsig-core-schema.xsd")
  f, err := os.Open(xsdfile)
  if err != nil {
    log.Printf("failed to open file: %s", err)
    return
  }
  defer f.Close()

  buf, err := ioutil.ReadAll(f)
  if err != nil {
    log.Printf("failed to read file: %s", err)
    return
  }

  s, err := xsd.Parse(buf)
  if err != nil {
    log.Printf("failed to parse XSD: %s", err)
    return
  }
  defer s.Free()

  d, err := libxml2.ParseString(`<foo></foo>`)
  if err != nil {
    log.Printf("failed to parse XML: %s", err)
    return
  }
  defer d.Free()

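  if err := s.Validate(d); err != nil {
    // Guard the assertion so an unexpected error type cannot panic.
    if serr, ok := err.(xsd.SchemaValidationError); ok {
      for _, e := range serr.Errors() {
        log.Printf("error: %s", e.Error())
      }
    } else {
      log.Printf("validation failed: %s", err)
    }
    return
  }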

  log.Printf("validation successful!")
}

Caveats

Other libraries

Many similar libraries exist. I want speed, I want DOM, and I want XPath. When another library meets all of these, I'd be happy to switch to it.

For now, the closest contender is xmlpath, but as of this writing it lags a bit in XPath speed:

shoebill% go test -v -run=none -benchmem -benchtime=5s -bench .
PASS
BenchmarkXmlpathXmlpath-4     500000         11737 ns/op         721 B/op          6 allocs/op
BenchmarkLibxml2Xmlpath-4    1000000          7627 ns/op         368 B/op         15 allocs/op
BenchmarkEncodingXMLDOM-4    2000000          4079 ns/op        4560 B/op          9 allocs/op
BenchmarkLibxml2DOM-4        1000000         11454 ns/op         264 B/op          7 allocs/op
ok      github.com/lestrrat-go/libxml2  37.597s

FAQ

"It won't build"

The very first thing you need to be aware of is that this is a C binding to libxml2. You should understand how to build C programs, how to debug them, or at least be able to ask the right questions and deal with a great deal more than Go alone.

Having said that, the most common causes for build errors are:

  1. You have not installed libxml2 / You installed it incorrectly

The first one is obvious, but I get this a lot. You have to install libxml2. If you are installing via some sort of package manager like apt/apk, remember that you need to install the "development" files as well. The name of the package differs in each environment, but it's usually something like "libxml2-dev".

The second is more subtle, and tends to happen when you install libxml2 in a non-standard location. This causes problems for other tools such as your C compiler or pkg-config. See more below.

  2. Your header files are not in the search path

If you don't understand what header files are or how they work, this is where you should either look for your local C-guru, or study how these things work before filing an issue on this repository.

Your C compiler, which is invoked via Go, needs to be able to find the libxml2 header files. If you installed them in a non-standard location, that is, outside of /usr/include and /usr/local/include, you may have to configure the search path yourself.

How to configure them depends greatly on your environment, and again, if you don't understand how you can fix it, you should consult your local C-guru about it, not this repository.

  3. Your pkg-config files are not in the search path

If you don't understand what pkg-config does, this is where you should either look for your local sysadmin friend, or study how these things work before filing an issue on this repository.

pkg-config provides metadata about installed components, such as the build flags they require. Go uses it to figure out how to build and link Go programs that need to interact with things written in C.

However, pkg-config is merely a thin frontend to extract information from file(s) that each component provided upon installation. pkg-config itself needs to know where to find these files.

Make sure that the output of the following command contains libxml-2.0. If it does not, and you don't understand how to fix this yourself, you should consult your local sysadmin friend about it, not this repository.

pkg-config --list-all

"Fatal error: 'libxml/HTMLparser.h' file not found"

See the first FAQ entry.

I can't statically link this module to libxml2

Use the static_build tag when building this module, for example:

go build -tags static_build

See Also

Credits

libxml2's People

Contributors

andy-miracl, bigshahan, chrisnovakovic, dobegor, emou, fiveside, galdor, gutweiler, hellodword, johnnybubonic, khasanovbi, lestrrat, matiasinsaurralde, mattn, samwhited, syohex, zapisanchez

libxml2's Issues

Capturing parser errors programmatically

Hi,

I can currently see detailed errors generated by libxml in stdout, but I need to suppress this output and capture the errors programmatically. Is there a way to do this at present?

e.g. I want to capture error detail such as the following in the error object returned from Parse:

Entity: line 1: parser error : Start tag expected, '<' not found
invalid
^

OS X latest Xcode - fatal error: 'libxml/HTMLparser.h' file not found

Thank you so much for this package!

I am getting:

# github.com/lestrrat/go-libxml2/clib

../github.com/lestrrat/go-libxml2/clib/clib.go:28:10: fatal error: 'libxml/HTMLparser.h' file not found
#include <libxml/HTMLparser.h>

on the latest Xcode 9.2
Mac OS X 10.13.3

I don't get the error on Xcode 8.2.1
Mac OS X 10.13.3

Any ideas?

the coding is error.

coding is error! the case is here.

test.html is in test.zip

test.zip

func TestCase(t *testing.T) {
	f, _ := os.Open("./test.html")
	data, _ := ioutil.ReadAll(f)

	doc, err := libxml2.ParseHTML(data)
	if err != nil {
		panic(err)
	}
	xr, err := doc.Find("//h1[ contains(@class, 'MovieTitle__Title')]")
	if err != nil {
		panic(err)
	}
	t.Error(xr)
}

SetNamespace does not work for default namespace

Let's say I want to build the following XML:

<pfx:root xmlns:pfx="http://some.uri">
  <elem xmlns="http://other.uri"/>
</pfx:root>

With go-libxml2 I would do this:

package main
import "github.com/lestrrat/go-libxml2/dom"

func main() {
    d := dom.CreateDocument()
    r, _ := d.CreateElement("root")
    r.SetNamespace("http://some.uri", "pfx", true)
    d.SetDocumentElement(r)
    e, _ := d.CreateElement("elem")
    e.SetNamespace("http://other.uri", "", true)
    r.AddChild(e)
    println(d.ToString(1, true))
}   

But that does not produce the expected XML. Instead I get the following, which is missing the xmlns declaration on <elem>:

<pfx:root xmlns:pfx="http://some.uri">
  <elem/>
</pfx:root>

If I write the equivalent code in Perl with XML::LibXML, which go-libxml2 is inspired by, I get the correct result with an xmlns declaration on <elem>:

#!/usr/bin/perl
use XML::LibXML;

$d = XML::LibXML::Document->new();
$r = $d->createElement('root');
$r->setNamespace('http://some.uri', 'pfx', 1);
$d->setDocumentElement($r);
$e = $d->createElement('elem');
$e->setNamespace('http://other.uri', '', 1);
$r->addChild($e);
print $d->toString(1, 1);

XSD validation XML path

I saw a previous issue regarding getting the line number of the error when doing XSD validation. Since that doesn't seem to be possible, is there a way we can get the XML path of the element with the error? That would help with troubleshooting XML errors.

Possible problem with the example

In the schema validation example, there is this code:

d, err := libxml2.ParseString(`<foo></foo>`)
if err != nil {
  log.Printf("failed to parse XML: %s", err)
  return
}

Should we call defer d.Free() after it?

XML_PARSE_RECOVER is ignored when using xmlParseDocument

Here's the xml file I'm trying to parse:

<?xml version=1.0?>
<rootnode>
    <greeting>Hello</greeting>
    <goodbye>Goodbye!</goodbye>
</rootnode>

The xml version declaration here is invalid, there should be quotes around the 1.0 in order to make it valid. libxml2 will continue attempting to parse anyway if you use the XML_PARSE_RECOVER parser flag. However, when I pass this flag to go-libxml2, the flag is ignored and the parse fails. I tracked down the problem a bit and it looks like the problem is with xmlParseDocument. Here's a simple C program to demonstrate the issue:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/xmlreader.h>
#include <libxml/parserInternals.h>

void fails(xmlParserCtxtPtr ctx, char *xmlfile) {
    int success = xmlParseDocument(ctx);
    if (success == 0) {
        printf("Successfully parsed document using xmlParseDocument\n");
    } else {
        printf("Failed to parse document using xmlParseDocument\n");
    }
}

void works(xmlParserCtxtPtr ctx, char *xmlfile, int options) {
    xmlDoc *doc = xmlCtxtReadMemory(ctx, xmlfile, strlen(xmlfile), NULL, NULL, options);
    if (doc == NULL) {
        printf("Failed to parse document using xmlCtxtReadMemory\n");
    } else {
        printf("Successfully parsed document using xmlCtxtReadMemory\n");
    }
}

int main(int argc, char **argv) {
    LIBXML_TEST_VERSION;
    char *xmlfile = "<?xml version=1.0?><rootnode><greeting>Hello</greeting><goodbye>Goodbye!</goodbye></rootnode>";

    int options =  XML_PARSE_RECOVER;
    xmlParserCtxtPtr ctx = xmlCreateMemoryParserCtxt(xmlfile, strlen(xmlfile));
    xmlCtxtUseOptions(ctx, options);

    works(ctx, xmlfile, options);
    fails(ctx, xmlfile);
}

And its output:

Entity: line 1: parser error : String not started expecting ' or "
<?xml version=1.0?><rootnode><greeting>Hello</greeting><goodbye>Goodbye!</goodby
              ^
Entity: line 1: parser error : Malformed declaration expecting version
<?xml version=1.0?><rootnode><greeting>Hello</greeting><goodbye>Goodbye!</goodby
              ^
Entity: line 1: parser error : Blank needed here
<?xml version=1.0?><rootnode><greeting>Hello</greeting><goodbye>Goodbye!</goodby
              ^
Entity: line 1: parser error : parsing XML declaration: '?>' expected
<?xml version=1.0?><rootnode><greeting>Hello</greeting><goodbye>Goodbye!</goodby
              ^
Successfully parsed document using xmlCtxtReadMemory
Failed to parse document using xmlParseDocument

There seems to be some error caching going on inside libxml2, because those errors (normally squelched by XML_PARSE_NOERROR and XML_PARSE_NOWARNING) only show up once, regardless of which method (or both) is used.

Alternatively, you could do away with the xmlCreateMemoryParserCtxt call entirely and just use xmlReadMemory.

node.String() doesn't return the closing HTML tags when the HTML contains a script tag

func TestNodeStringWithScriptTag(t *testing.T){
    scirptTag:=`<script type="text/x-template" title="searchResultsGrid">
            <table class="aui">
                <thead>
                <tr class="header">
                    <th class="search-result-title">Page Title</th>
                    <th class="search-result-space">Space</th>
                    <th class="search-result-date">Updated</th>
                </tr>
                </thead>
            </table>
        </script>`

    doc, err := ParseHTMLString(scirptTag)
    if !assert.NoError(t, err, "ParseHTMLString should succeed") {
        return
    }

    nodes := xpath.NodeList(doc.Find(`.//script`))
    if !assert.NotEmpty(t, nodes, "Xpath Find should succeed") {
        return
    }

    v:= nodes.String()

    if !assert.NotEmpty(t, v, "Literal() should return some string") {
        return
    }
    if !assert.Equal(t,scirptTag,v, "String() and   var scirptTag   should equal") {
        return
    }
    t.Logf("v = '%s'", v)
}

nodes.String() loses the following tags:

  </th>       </tr>     </thead>    </table>

I forked the repository and added a test file here:
wxf4150@27db593

Large XML - out of memory - C14NSerialize

I am trying to parse and canonicalize a fairly large XML document, about 100 MB.

The code is simple:
func _ProcessCanonicalizationC14n(inputXML []byte, transformXML string, withComments bool) ([]byte, error) {
	doc, err := libxml2.Parse(inputXML)
	if doc != nil {
		defer doc.Free()
	}
	if err != nil {
		return nil, myerror.New("4444", errM, "libxml2.Parse()", "")
	}
	outputXML, err := dom.C14NSerialize{Mode: dom.C14NExclusive1_0, WithComments: withComments}.Serialize(doc)
	if err != nil {
		return nil, myerror.New("4444", errM, "dom.C14NSerialize()", "")
	}
	return []byte(outputXML), nil
}

When I benchmark this code in parallel (8 cores), I always get an error after about 100 repetitions:
FAIL
parser error : Memory allocation failed
C14N error : Memory allocation failed : coping canonicanized document
internal buffer error : Memory allocation failed : growing buffer

Benchmark code:
b.RunParallel(func(pb *testing.PB) {
	for pb.Next() {
		_, err := _ProcessCanonicalizationC14n(testXML, "", false)
		if err != nil {
			b.Errorf("\n _ProcessCanonicalizationC14n error: %v", fmt.Sprintf("%+v", err))
			return
		}
	}
})

static build problems

When building with
CGO_ENABLED=0

I'm getting

vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:40:30: undefined: clib.XMLXPathObjectType
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:45:9: undefined: clib.XMLXPathObjectFloat64
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:50:9: undefined: clib.XMLXPathObjectBool
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:74:13: undefined: clib.XMLXPathObjectNodeList
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:91:13: undefined: clib.XMLXPathObjectNodeList
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:124:2: undefined: clib.XMLXPathFreeObject
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:129:14: undefined: clib.XMLXPathCompile
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:149:2: undefined: clib.XMLXPathFreeCompExpr
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:163:17: undefined: clib.XMLXPathNewContext
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:179:9: undefined: clib.XMLXPathContextSetContextNode
vendor/github.com/lestrrat-go/libxml2/xpath/xpath.go:179:9: too many errors

# github.com/18F/e-QIP-prototype/api/vendor/github.com/lestrrat-go/libxml2/xsd

vendor/github.com/lestrrat-go/libxml2/xsd/xsd.go:29:15: undefined: clib.XMLSchemaParse
vendor/github.com/lestrrat-go/libxml2/xsd/xsd.go:40:15: undefined: clib.XMLSchemaParseFromFile
vendor/github.com/lestrrat-go/libxml2/xsd/xsd.go:55:12: undefined: clib.XMLSchemaFree
vendor/github.com/lestrrat-go/libxml2/xsd/xsd.go:65:10: undefined: clib.XMLSchemaValidateDocument

Missing pkg-config on OSX

Trying to run this on a mac gives:

$ go get github.com/lestrrat/go-libxml2
# pkg-config --cflags libxml-2.0
pkg-config: exec: "pkg-config": executable file not found in $PATH

so you have to install pkg-config:

$ brew install pkg-config

Trouble installing this package

Hey there,

I am running MacOS, Go 1.7, running "go get github.com/lestrrat/go-libxml2" I get

'libxml/HTMLparser.h' file not found

#include <libxml/HTMLparser.h>

I have done sudo brew install libxml2 and etc

I have made an alias to "libxml" inside my /usr/local/include/

I can't seem to figure out how to check where the default include path is when using go get, I can compile fine under GCC (not the Go library though)

I have worked with libxml in C before and it has worked, but I can't figure it out using Go; any help would be appreciated.

HOMEBREW_VERSION: 0.9.9
ORIGIN: https://github.com/Homebrew/brew
HEAD: 7c7e2d00af9d8a05f4736c368804713fba61a422
Last commit: 9 hours ago
Core tap ORIGIN: https://github.com/Homebrew/homebrew-core
Core tap HEAD: 12f6a5e8a13f05bb045638deb0032882012af514
Core tap last commit: 10 hours ago
HOMEBREW_PREFIX: /usr/local
HOMEBREW_REPOSITORY: /usr/local
HOMEBREW_CELLAR: /usr/local/Cellar
HOMEBREW_BOTTLE_DOMAIN: https://homebrew.bintray.com
CPU: octa-core 64-bit haswell
Homebrew Ruby: 2.0.0-p648
Clang: 7.3 build 703
Git: 2.6.4 => /Applications/Xcode.app/Contents/Developer/usr/bin/git
Perl: /usr/bin/perl
Python: /usr/bin/python
Ruby: /usr/bin/ruby => /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby
Java: 1.8.0_45, 1.7.0_79
OS X: 10.11.5-x86_64
Xcode: 7.3
CLT: N/A
X11: 2.7.7 => /opt/X11

Edit:

It seems like Go expects my include path to be /usr/include/ - GCC was fine.

It wouldn't accept it as an alias so I had to copy the whole of /usr/local/include/ into /usr/include/

PS before copying my include folder into /usr/include - I tried go get -gccgoflags -I/usr/local/include,-I/usr/local/include/libxml2 ... and had the same issue.

Do you happen to know if I can reconfigure go to look in /usr/local/include instead ? Apart from that - library works great now 👯

[Feature Request] Populate Defaults from Schema

XML Schema supports defaults in certain contexts. e.g.

<!-- ... -->
<xs:attribute name="someAttr" type="xs:string" default="this attribute was not specified" use="optional"/>
<!-- ... -->

With the above example, let's say we have a document:

<foo>
  <bar someAttr="this is populated"/>
  <bar/>
</foo>

This is possible with python lxml's etree.XMLParser() (which uses libxml2) (example here). (source for lxml.etree is here).

I think it'd be a really handy thing to have.

(Along similar lines, any ideas on how to strip namespaces from documents? I can do it with lxml like this, but I wasn't sure what struct fields there are in each element and whether the tag could be changed on the object or not. I'm still really new to golang and non-OOP, so my apologies if there's an obvious way to find this out. WHOA, I'm a dummy. Looks like it automatically separates out "Space" vs. "Local", at least when I unmarshal and marshal to JSON. That's pretty cool.)
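
For reference, "Space" and "Local" are the fields of xml.Name in Go's standard encoding/xml package. A namespace-stripping pass can be sketched with that package alone by re-encoding the token stream. This uses the standard library rather than this libxml2 binding, stripNamespaces is a hypothetical helper, and attributes that differ only by namespace would collide after stripping:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// stripNamespaces re-encodes src with every element and attribute reduced
// to its local name, dropping all xmlns declarations.
func stripNamespaces(src string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			t.Name.Space = ""
			kept := t.Attr[:0]
			for _, a := range t.Attr {
				// Skip xmlns:prefix="..." and default xmlns="..." declarations.
				if a.Name.Space == "xmlns" || (a.Name.Space == "" && a.Name.Local == "xmlns") {
					continue
				}
				a.Name.Space = ""
				kept = append(kept, a)
			}
			t.Attr = kept
			tok = t
		case xml.EndElement:
			t.Name.Space = ""
			tok = t
		}
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := stripNamespaces(`<pfx:root xmlns:pfx="http://some.uri"><elem xmlns="http://other.uri"/></pfx:root>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```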

Using catalog files

Hi I found your library recently. I need to validate xml (OGC CSW) in golang using multiple xsd. The xmllint tool allows me to do so but I'd like to use something more integrated with the go language.
Can you provide me some tips on how to do it with your library ? Maybe I will work on top of it

Literal() method returns "unknown node" with a style tag

nodes, err := doc.Find(xpath) ///html/body/div/div
if err != nil {
    panic("failed to evaluate xpath: " + err.Error())
}
nlist:=nodes.NodeList()
if len(nlist)>0{
    v,err:=nodes.NodeList().Literal()
    if err!=nil{
        fmt.Println("Literal:",err)
    }else{
        *str=v
    }
}

I get the error "Literal: unknown node".
Using NodeList().NodeValue or NodeList().String() returns the XML, not the HTML content.

is this a bug?

ubuntu 15.10 go1.5.2 libxml2.9.2

Cross compile from macos

Ciao,

when I try to cross-compile with this env from macos,

GOARCH=amd64
GOOS=freebsd
CGO_ENABLED=0

PATH=$GOROOT/bin:$PATH:/usr/local/bin

I get these errors:

# github.com/lestrrat-go/libxml2/xsd
../../../github.com/lestrrat-go/libxml2/xsd/xsd.go:29:15: undefined: clib.XMLSchemaParse
../../../github.com/lestrrat-go/libxml2/xsd/xsd.go:40:15: undefined: clib.XMLSchemaParseFromFile
../../../github.com/lestrrat-go/libxml2/xsd/xsd.go:55:12: undefined: clib.XMLSchemaFree
../../../github.com/lestrrat-go/libxml2/xsd/xsd.go:65:10: undefined: clib.XMLSchemaValidateDocument
# github.com/lestrrat-go/libxml2/xpath
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:40:30: undefined: clib.XMLXPathObjectType
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:45:9: undefined: clib.XMLXPathObjectFloat64
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:50:9: undefined: clib.XMLXPathObjectBool
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:74:13: undefined: clib.XMLXPathObjectNodeList
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:91:13: undefined: clib.XMLXPathObjectNodeList
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:124:2: undefined: clib.XMLXPathFreeObject
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:129:14: undefined: clib.XMLXPathCompile
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:149:2: undefined: clib.XMLXPathFreeCompExpr
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:163:17: undefined: clib.XMLXPathNewContext
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:179:9: undefined: clib.XMLXPathContextSetContextNode
../../../github.com/lestrrat-go/libxml2/xpath/xpath.go:179:9: too many errors

with this env:

GOARCH=amd64
GOOS=freebsd
CGO_ENABLED=1

PATH=$GOROOT/bin:$PATH:/usr/local/bin

I get these errors:

# runtime/cgo
gcc_freebsd_amd64.c:46:2: error: implicit declaration of function 'SIGFILLSET' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
gcc_freebsd_amd64.c:46:13: error: variable 'ign' is uninitialized when used here [-Werror,-Wuninitialized]
gcc_freebsd_amd64.c:41:14: note: initialize the variable 'ign' to silence this warning

Any idea?

T.I.A. Franco

Unhandled exception

Hi!
Please, add handler for this exception:

[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f18bcdfb049 pc=0xa6c9d0]

runtime stack:
runtime.throw(0xbdce18, 0x2a)
        /usr/local/go/src/runtime/panic.go:605 +0x95
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:351 +0x2b8

goroutine 236 [syscall, locked to thread]:
runtime.cgocall(0xa6bc50, 0xc4207bc578, 0xc400a6c880)
        /usr/local/go/src/runtime/cgocall.go:132 +0xe4 fp=0xc4207bc548 sp=0xc4207bc508 pc=0x40ee34
github.com/lestrrat/go-libxml2/clib._Cfunc_xmlSchemaValidateDoc(0x7f18801a2510, 0x7f188c52fae0, 0x0)
        github.com/lestrrat/go-libxml2/clib/_obj/_cgo_gotypes.go:1458 +0x4d fp=0xc4207bc578 sp=0xc4207bc548 pc=0x8c12ad
github.com/lestrrat/go-libxml2/clib.XMLSchemaValidateDocument.func5(0x7f18801a2510, 0x7f188c52fae0, 0x7f1880190d70)
        /srv/rrload/src/github.com/lestrrat/go-libxml2/clib/clib.go:2122 +0xa3 fp=0xc4207bc5b0 sp=0xc4207bc578 pc=0x8cdab3
github.com/lestrrat/go-libxml2/clib.XMLSchemaValidateDocument(0x114f6a0, 0xc4207efaf0, 0x114f460, 0xc420674010, 0x0, 0x0, 0x0)
        /srv/rrload/src/github.com/lestrrat/go-libxml2/clib/clib.go:2122 +0x2a8 fp=0xc4207bc668 sp=0xc4207bc5b0 pc=0x8c9d48
github.com/lestrrat/go-libxml2/xsd.(*Schema).Validate(0xc4207efaf0, 0x115f480, 0xc420674010, 0xc4207efaf0, 0x0)
        /srv/rrload/src/github.com/lestrrat/go-libxml2/xsd/xsd.go:54 +0x73 fp=0xc4207bc6c8 sp=0xc4207bc668 pc=0x979503
gitlab.*****.ru/EKryukov/registry-fileserver/app/jobs.validateXSD(0xbbd458, 0x3, 0x115f480, 0xc420674010, 0xc42062e870, 0x4e, 0x1, 0x0, 0x0, 0x0, ...)

ERROR pkg-config --cflags -- libxml-2.0

When I execute go run main.go, the console shows this:

# pkg-config --cflags  -- libxml-2.0
pkg-config: exec: "pkg-config": executable file not found in $PATH

  1. MacOS Catalina 10.15.7;
  2. Already installed libxml2 using brew install libxml2;
  3. Other Go programs run fine, business as usual;

What should I do?

Relative includes and HTTP(S) includes

Similar to #50, I also use imports in my schema ("include"s).

However, I use relative imports (where the "root" schema may be at /foo/bar/baz.xsd, and may include /foo/bar/baz2.xsd via <xs:include schemaLocation="./baz2.xsd"/>, /foo/quux/bar.xsd via <xs:include schemaLocation="../quux/bar.xsd"/>, etc.).

It seems that relative includes are not currently supported. Other language library implementations typically handle this via a baseURI.

I propose that if we can specify a baseURI, relative includes would then be possible by resolving the baseURI + relative path = absolute path. Note also that this would allow for improved handling of recursive includes (as each included instance would then have its own baseURI as well).

Along those lines, parsing from an HTTP URL would be recommended and desired as well, as libxml2's xmllint does handle both remote URIs (HTTP only, not HTTPS) and local URIs, with relative includes:

$ xmllint -noout -schema http://schema.xml.r00t2.io/projects/aif.xsd aif.xml 
aif.xml validates
$ grep schemaLocation aif.xml 
     xsi:schemaLocation="https://aif-ng.io/ http://schema.xml.r00t2.io/projects/aif.xsd"
$ curl -sL "http://schema.xml.r00t2.io/projects/aif.xsd" | grep 'xs:include'
    <xs:include schemaLocation="../lib/types/aif.xsd"/>
$ curl -sL "http://schema.xml.r00t2.io/lib/types/aif.xsd" | grep 'xs:include'
    <xs:include schemaLocation="./gpg.xsd"/>
    <!-- <xs:include schemaLocation="./linux.xsd"/> --><!-- Included by the linux elements XSD. -->
    <xs:include schemaLocation="./net.xsd"/>
    <xs:include schemaLocation="./std.xsd"/>
    <xs:include schemaLocation="./unix.xsd"/>
    <xs:include schemaLocation="../elements/linux.xsd"/>
# (etc.)
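
The proposed resolution rule (baseURI + relative path = absolute path) can be sketched with Go's standard net/url package. The helper resolveInclude is hypothetical and not part of this library; it only demonstrates how each included schema's own location becomes the base for its nested includes, using the URLs from the transcript above:

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveInclude resolves a schemaLocation value against the URI of the
// schema document that contains it.
func resolveInclude(baseURI, schemaLocation string) (string, error) {
	base, err := url.Parse(baseURI)
	if err != nil {
		return "", fmt.Errorf("invalid base URI: %w", err)
	}
	ref, err := url.Parse(schemaLocation)
	if err != nil {
		return "", fmt.Errorf("invalid schemaLocation: %w", err)
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	root := "http://schema.xml.r00t2.io/projects/aif.xsd"

	// First level: an include found in the root schema.
	types, err := resolveInclude(root, "../lib/types/aif.xsd")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(types)

	// Second level: the included schema is now the base for its own includes.
	gpg, err := resolveInclude(types, "./gpg.xsd")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(gpg)
}
```

The same function also works for plain filesystem-style paths such as /foo/bar/baz.xsd, since a path-only string parses as a URL reference.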

cannot use nptr.children

# github.com/lestrrat/go-libxml2/clib
../github.com/lestrrat/go-libxml2/clib/clib.go:1034: cannot use nptr.children (type *_Ctype_struct__xmlNode) as type *_Ctype_xmlNode in function argument
../github.com/lestrrat/go-libxml2/clib/clib.go:1037: cannot use _Cfunc_xmlNewText(cvalue) (type C.xmlNodePtr) as type *_Ctype_struct__xmlNode in assignment
../github.com/lestrrat/go-libxml2/clib/clib.go:1038: cannot use nptr (type *_Ctype_xmlNode) as type *_Ctype_struct__xmlNode in assignment
../github.com/lestrrat/go-libxml2/clib/clib.go:1062: cannot use nptr.doc (type *_Ctype_struct__xmlDoc) as type *_Ctype_xmlDoc in function argument
../github.com/lestrrat/go-libxml2/clib/clib.go:1065: cannot use nptr.doc (type *_Ctype_struct__xmlDoc) as type *_Ctype_xmlDoc in function argument
../github.com/lestrrat/go-libxml2/clib/clib.go:1083: cannot use nptr.doc (type *_Ctype_struct__xmlDoc) as type *_Ctype_xmlDoc in function argument
../github.com/lestrrat/go-libxml2/clib/clib.go:1103: cannot use nptr.doc (type *_Ctype_struct__xmlDoc) as type *_Ctype_xmlDoc in function argument
../github.com/lestrrat/go-libxml2/clib/clib.go:1502: cannot use ns.next (type *_Ctype_struct__xmlNs) as type *_Ctype_xmlNs in assignment
../github.com/lestrrat/go-libxml2/clib/clib.go:1526: cannot use nsdef.next (type *_Ctype_struct__xmlNs) as type *_Ctype_xmlNs in assignment
../github.com/lestrrat/go-libxml2/clib/clib.go:1552: cannot use nptr.doc (type *_Ctype_struct__xmlDoc) as type *_Ctype_xmlDoc in function argument

Ubuntu 14.04.
I have installed libxml2:
~$ pkg-config --cflags --libs libxml-2.0
-I/usr/include/libxml2 -lxml2

About HTMLParser encoding

1. The HTML charset may be utf-8, gbk, or gb2312. If the HTML charset is gbk or gb2312, the terminal shows this: encoding error : input conversion failed due to input error, bytes 0x20 0xE9 0x95 0xBF.

I tried using doc, err := libxml2.ParseHTML(content, parser.HTMLParseIgnoreEnc), but the results were even worse.

What do I need to do to make the program parse HTML whose charset is gbk or gb2312?

2. After reading the libxml2 source code, I see this:

// This code is from ParseHTMLString(content string, options ...parser.HTMLOption) (types.Document, error)
docptr, err := clib.HTMLReadDoc(content, "", "", int(option))

Why is the encoding set to "" rather than "utf8"?

Parse HTML get attributes

Hi,
I'm looking for a way to get an attribute/parameter from an HTML node. As described in #4, I get nodes with nodes := xpath.NodeList(node.Find("/your/xpath")), but I can't find any way to get parameters from these nodes.

I can parse node.String() to get the parameters, but is there a more direct way that I have missed?

Thanks

docker build found clib undefined

I'm not sure if it was caused by libxml-2.0, but there was an error prompt when building:

github.com/lestrrat-go/[email protected]/xpath/xpath.go:179:9: undefined: clib.XMLXPathContextSetContextNode
github.com/lestrrat-go/[email protected]/xpath/xpath.go:149:2: undefined: clib.XMLXPathFreeCompExpr
github.com/lestrrat-go/[email protected]/xpath/xpath.go:91:13: undefined: clib.XMLXPathObjectNodeList
too many errors

When I execute pkg-config --list-all (in the Docker container), the terminal outputs:

pkg-config --list-all
systemd          systemd - systemd System and Service Manager
icu-io           icu-io - International Components for Unicode: Stream and I/O Library
icu-i18n         icu-i18n - International Components for Unicode: Internationalization library
icu-uc           icu-uc - International Components for Unicode: Common and Data libraries
libxml-2.0       libXML - libXML library version2.
icu-lx           icu-lx - International Components for Unicode: Paragraph Layout library
shared-mime-info shared-mime-info - Freedesktop common MIME database
icu-le           icu-le - International Components for Unicode: Layout library

So libxml-2.0 is installed.

My Dockerfile is here:

FROM ubuntu:16.04 as builder
MAINTAINER asyncins
RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
RUN sed -i 's/security.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
RUN apt-get update -y
RUN apt-get install gcc g++ libxml2 libxml2-dev libxslt-dev wget pkg-config -y
RUN wget https://dl.google.com/go/go1.15.6.linux-amd64.tar.gz
RUN tar -C /usr/lib/ -xzf go1.15.6.linux-amd64.tar.gz 
ENV GOROOT=/usr/lib/go
ENV PATH=$PATH:/usr/lib/go/bin
ENV GOPATH=/root/go
ENV PATH=$GOPATH/bin/:$PATH
ENV GO111MODULE=on \
    GOPROXY=https://goproxy.cn,direct

ADD . /go/src/application
WORKDIR /go/src/application
RUN go get all

RUN GOOS=linux GOARCH=386 go build -o MyApp ./main.go

FROM alpine
COPY --from=builder /go/src/application/main /usr/local/bin/MAP

CMD ["MAP"]

Can you help me?

How to compile with CGO_ENABLED=0 flag?

$ CGO_ENABLED=0 GOOS=linux go build

../../vendor/github.com/lestrrat/go-libxml2/xpath/xpath.go:40: undefined: clib.XMLXPathObjectType
../../vendor/github.com/lestrrat/go-libxml2/xpath/xpath.go:45: undefined: clib.XMLXPathObjectFloat64
../../vendor/github.com/lestrrat/go-libxml2/xpath/xpath.go:50: undefined: clib.XMLXPathObjectBool
...

CGO_ENABLED=0 is needed to build an Alpine-based Docker container.

Is there any solution?

HTMLparser.h' file not found

# github.com/lestrrat-go/libxml2/clib
../../../../github.com/lestrrat-go/libxml2/clib/clib.go:28:10: fatal error: 'libxml/HTMLparser.h' file not found
#include <libxml/HTMLparser.h>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.

What should I do?

The Walk method issue

I tried to Walk all the nodes of a Document starting from root, print all node names, and attributes for the Element nodes. There's however a problem:

root.Walk(func(n types.Node) error {
	fmt.Fprintln(os.Stdout, n.NodeName())
	if n.NodeType() == dom.ElementNode {
		e := n.(types.Element)
		attrs, err := e.Attributes()
		if err != nil {
			return err
		}
		for _, a := range attrs {
			fmt.Fprintln(os.Stdout, "\t"+a.NodeName())
		}
	}
	return nil
})

panics at runtime with panic: interface conversion: *dom.XMLNode is not types.Element: missing method AppendText. I've also tried e := n.(*dom.Element) only to get panic: interface conversion: types.Node is *dom.XMLNode, not *dom.Element.

I've debugged a bit, and it seems that the receiver gets converted to *dom.XMLNode because of the Walk method's receiver type. I can work around this by calling doc.Walk..., but that won't work if I want to start from an arbitrary node.

Sorry I won't fix it myself; I've just started with Go and don't feel I could do it right.
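
The panic itself comes from the single-value form of a type assertion. A minimal sketch of the two-value ("comma ok") form follows, using hypothetical stand-in types (Node, Element, plainNode, elemNode, and attributeNames are illustrations, not the library's types). It only avoids the panic; it does not address the receiver-type problem described above.

```go
package main

import "fmt"

// Node and Element are hypothetical stand-ins for types.Node and types.Element.
type Node interface {
	NodeName() string
}

type Element interface {
	Node
	AttributeNames() []string
}

// plainNode satisfies Node only.
type plainNode struct{ name string }

func (n plainNode) NodeName() string { return n.name }

// elemNode satisfies both Node and Element.
type elemNode struct {
	plainNode
	attrs []string
}

func (e elemNode) AttributeNames() []string { return e.attrs }

// attributeNames returns the attribute names when n is an Element.
// The second result is false otherwise, and nothing panics.
func attributeNames(n Node) ([]string, bool) {
	e, ok := n.(Element)
	if !ok {
		return nil, false
	}
	return e.AttributeNames(), true
}

func main() {
	nodes := []Node{
		plainNode{name: "#text"},
		elemNode{plainNode{name: "div"}, []string{"id", "class"}},
	}
	for _, n := range nodes {
		if names, ok := attributeNames(n); ok {
			fmt.Println(n.NodeName(), names)
		} else {
			fmt.Println(n.NodeName(), "(not an element)")
		}
	}
}
```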

disable Ctrl+C

There are no errors, but this disables Ctrl+C in the console. I have to close the console, open a new one, and restart the server, which is very tedious.

go get github.com/lestrrat/go-xmlsec returning error

go get github.com/lestrrat/go-xmlsec
# github.com/lestrrat/go-libxml2/xpath
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:40: undefined: clib.XMLXPathObjectType
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:45: undefined: clib.XMLXPathObjectFloat64
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:50: undefined: clib.XMLXPathObjectBool
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:74: undefined: clib.XMLXPathObjectNodeList
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:91: undefined: clib.XMLXPathObjectNodeList
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:124: undefined: clib.XMLXPathFreeObject
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:129: undefined: clib.XMLXPathCompile
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:149: undefined: clib.XMLXPathFreeCompExpr
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:163: undefined: clib.XMLXPathNewContext
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:179: undefined: clib.XMLXPathContextSetContextNode
..\..\github.com\lestrrat\go-libxml2\xpath\xpath.go:179: too many errors

Bug in LastChild()

In go-libxml2/dom/node.go, lines 144-150, LastChild() calls the wrong clib function.

Get namespaces for node

I see that the Element type has a GetNamespaces() method, but this method isn't available on the Node type. How can I traverse a document and discover the namespaces on each node?

For example:

root, err := doc.DocumentElement()
fmt.Println(root.NodeType())     // ElementNode
root.GetNamespaces()   // node.GetNamespaces undefined (type types.Node has no field or method GetNamespaces)

I'm at a bit of a loss for converting the root node (or any node) into something that I can call get namespaces on...

How to remove node

<html>
<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>

Hello, I've come to learn from you again.

I want to remove the h1 node. What should the code be?

XSD Validation Line Number

Hi,

Is there a way to get the line numbers of the errors in the XML file when validating against an XSD?

Thanks,
Charles

Getting the result of an XPath function

How is one supposed to get the result of an XPath function, i.e.

var doc *xpath.Context
...
v, err := doc.Find(`concat("foo", "bar")`)
fmt.Printf("%s %v", v.String(), v.Type())

outputs xpath.Object XPathString, which is "expected" in a way, as per the documentation:

String returns the stringified value of the nodes included in this Object. If the Object is anything other than a NodeSet, then we fallback to using fmt.Sprintf to generate some sort of readable output

The result of the query is an XPathString, not a NodeSet, so how is one supposed to get the string value out of it (i.e. "foobar")?
And why does the current behaviour not process anything but a NodeSet?

Problem with using Find method on Node

Hello, I have the following situation.
Here is a working example:

  1. Get a successfully parsed document.
  2. Extract a list of nodes with an XPath expression.
  3. Parse the string form of each ODNode into a new document and get nodes with the Find method.

DocumentPtr := *DocumentPtrRaw.(*types.Document)
odNodes := xpath.NodeList(DocumentPtr.Find("//*[local-name()='OriginDestination']"))

for _, ODNode := range odNodes {
    // Didn't succeed to use find on main document pointer so parsed xml part again. Very strange error
    doc, _ := libxml2.ParseString(ODNode.String())
    nodes := xpath.NodeList(doc.Find("./*/AirportCode"))
    codes := []string{}
    for _, airportCodeNode := range nodes {
        codes = append(codes, airportCodeNode.NodeValue())
    }
    ODList = append(ODList, codes)
}

Before that, I tried calling ODNode.Find("./*/AirportCode") directly, but it doesn't work. Could you help me understand why? I found that both .Find calls end up in the same Node.Find method, yet ODNode.Find didn't work with any expression except ODNode.Find(".").

Is this a bug, or am I doing something wrong?

Throws error while debugging

OK, so after many hurdles I got this library working on Windows and was able to host it in Docker too, but I am now facing new problems:

  1. I cannot debug my code. It fails with a dependency error:

github.com/lestrrat-go/libxml2/clib
In file included from C:/msys64/usr/include/libxml2/libxml/parser.h:810:0,
                 from C:/msys64/usr/include/libxml2/libxml/HTMLparser.h:16,
                 from ....\github.com\lestrrat-go\libxml2\clib\clib.go:28:
C:/msys64/usr/include/libxml2/libxml/encoding.h:28:19: fatal error: iconv.h: No such file or directory
compilation terminated.
exit status 2
Process exiting with code: 1

  2. Ctrl+C does not terminate the terminal session. It is very annoying.

Using GC hooks to free memory

The current pattern that go-libxml2 advertises involves manually freeing memory after we're done using it. Can we use object finalizers to free allocated memory instead? I've taken a quick look at converting the library to do so, but it looks like there was a conscious effort to allow external access to the low-level clib stuff.

Does automatic garbage collection still make sense given the public access to the low level clib interface?
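
To make the question concrete, here is a minimal sketch of the finalizer pattern using runtime.SetFinalizer from the standard library. Everything else is an assumption for illustration: handle, newHandle, and the free callback are hypothetical stand-ins, not the library's types or its C free routine.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// handle is a hypothetical wrapper around a C-allocated resource. ptr stands
// in for the uintptr values the package passes around.
type handle struct {
	mu   sync.Mutex
	ptr  uintptr
	free func(uintptr) // stand-in for the C free routine
}

// newHandle wraps ptr and registers a finalizer as a safety net.
func newHandle(ptr uintptr, free func(uintptr)) *handle {
	h := &handle{ptr: ptr, free: free}
	// The finalizer may run at some point after h becomes unreachable.
	runtime.SetFinalizer(h, (*handle).Free)
	return h
}

// Free releases the resource once; later calls are no-ops.
func (h *handle) Free() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.ptr == 0 {
		return
	}
	h.free(h.ptr)
	h.ptr = 0
	runtime.SetFinalizer(h, nil) // already freed, so drop the finalizer
}

func main() {
	freed := 0
	h := newHandle(0x1, func(uintptr) { freed++ })
	h.Free()
	h.Free() // second call does nothing
	fmt.Println("free calls:", freed)
}
```

The finalizer only tracks reachability of the Go wrapper. If the raw uintptr has been copied out through a low-level interface, the wrapper can be collected and the memory freed while that copy is still in use, which is the tension the question above raises.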

How to remove a node?

When I want to remove a node, I do this:

parent, err := node.ParentNode()
if err != nil {
	return err
}
parent.RemoveChild(node)
return nil

1 - This way, memory keeps rising until the process runs out of memory (usage goes 20M, 150M, 800M, 2G, 7G).

2 - This happens inside a loop such as for i := 0; i < 10000; i++ {}, for example when consuming from RabbitMQ or Kafka.

3 - If I use node.Free(), the removal does not take effect, but memory usage doesn't rise.

How can I write code to remove a node?

Please help me, thanks.

Memory leak in doc.Find()/ctx.Find()?

Hello!

This is a minimal repro I came up with. Basically, doing a Find() once for a brand new document, repeated many times, would leak memory. I tried with doc.Find() as well as ctx.Find() with the same result.

package main

import (
	"github.com/lestrrat/go-libxml2/parser"
)

func main() {
	data := []byte("<parent><child>The quick brown fox jumps over the lazy dog</child></parent>")
	p := parser.New()

	for i := 0; i < 10000000; i++ {
		_ = read(p, data)
	}
}

func read(p *parser.Parser, data []byte) string {
	doc, _ := p.Parse(data)
	defer doc.Free()

	result, _ := doc.Find("//child/text()")
	value := result.String()
	result.Free()
	return value
}

Memory usage keeps growing without going down. Am I missing a .Free() somewhere? If I don't do Find(), it does not leak. Ideas?

XPath overhaul

  • FindValue() isn't doing what I want it to do
  • There are a few things that I punted before and had forgotten to fix

This issue is here mainly to remind others that I am aware of the problem, and am thinking/working on it.

SAML v2.0 schema hangs in C.xmlSchemaParse()

I am trying to use the SAML v2.0 schema for validation, e.g.

protocolSchema, err = xsd.ParseFromFile("http://docs.oasis-open.org/security/saml/v2.0/saml-schema-protocol-2.0.xsd")

Unfortunately the function hangs when calling C.xmlSchemaParse. Any idea why this is (some circularity perhaps?), or is there a workaround?
