Comments (8)
Based on some quick research, I don't think this is a bug in PyXB. PyXB is intended to operate on XML documents that are validated against XML schemas. XAML is an XML-based language that uses different validation semantics, in particular allowing individual documents to change which namespaces are validated. PyXB is not a XAML processor and won't ignore those namespaces, so you will get validation errors if they are referenced but not validatable.
You might first convert the document to DOM format, then run a preprocessing step that removes elements and attributes that have a prefix that appears in an {http://schemas.openxmlformats.org/markup-compatibility/2006}Ignorable attribute. PyXB should be able to process what's left.
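The preprocessing step described above could look roughly like the following. This is a minimal sketch using only the standard library's xml.dom.minidom; the function name strip_ignorable and its parameters are illustrative assumptions, not part of PyXB. It treats an Ignorable attribute as applying to the element that carries it and to all of that element's descendants.

```python
from xml.dom import minidom, Node

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def strip_ignorable(element, bindings=None, ignorable=frozenset()):
    """Remove, in place, elements and attributes in namespaces named by Ignorable.

    Hypothetical helper: `bindings` maps prefixes to namespace URIs in scope,
    `ignorable` is the set of namespace URIs to drop.
    """
    # Prefix bindings in scope: inherited ones plus this element's xmlns:* declarations.
    bindings = dict(bindings or {})
    for attr in list(element.attributes.values()):
        if attr.name.startswith("xmlns:"):
            bindings[attr.name[len("xmlns:"):]] = attr.value

    # Ignorable holds a whitespace-separated list of prefixes.
    if element.hasAttributeNS(MC_NS, "Ignorable"):
        listed = element.getAttributeNS(MC_NS, "Ignorable").split()
        ignorable = ignorable | {bindings[p] for p in listed if p in bindings}
        element.removeAttributeNS(MC_NS, "Ignorable")

    # Drop attributes that live in an ignorable namespace.
    for attr in list(element.attributes.values()):
        if attr.namespaceURI in ignorable:
            element.removeAttributeNode(attr)

    # Drop ignorable child elements entirely; recurse into the rest.
    for child in list(element.childNodes):
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        if child.namespaceURI in ignorable:
            element.removeChild(child)
        else:
            strip_ignorable(child, bindings, ignorable)
```

After calling strip_ignorable(doc.documentElement) on a parsed document, doc.toxml() gives the cleaned text that can be handed to the bindings. The namespace declarations themselves are left in place, since an unused declaration is harmless.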
That the exception this produces doesn't have a nice text representation is a reasonable complaint, though. I've added that as issue #31.
from pyxb.
Hi Peter,
I appreciate the thorough response. It looks like I can just extend the default SAX handler used by PyXB to filter out these elements and attributes. I don't quite have a working implementation yet, but I'll post it when I'm done.
Thanks,
-Kyle
from pyxb.
Hi Peter,
I have a SAX handler that overrides the default PyXB SAX handler to strip out the ignorable attributes. This appears to cause PyXB to raise ContentNondeterminismExceededError: Nondeterminism exceeded validating. The code in Configuration.candidateTransitions and AutomatonConfiguration.step is fairly complex, so I am struggling to resolve the issue. If there are any pointers, advice, or references you could share, I would appreciate it.
I prefer to avoid having to pre-process the XML before passing it to PyXB.
Thanks,
-Kyle
from pyxb.
"Override" or "extend"? You probably shouldn't discard PyXB's SAX handler in favor of your own, but you could subclass it and overload some of the methods to strip out the attributes (and elements) that are in ignorable namespaces.
It may also simply be that the documents you're using are nondeterministic and exceed the configured threshold. You could try increasing PermittedNondeterminism slowly to see if there's a reasonable threshold that makes it pass. Be aware that the larger the value you use, the more memory PyXB may require to validate the document, and the longer it will take.
from pyxb.
Sorry, I mean extend. I'm subclassing pyxb.binding.saxer.PyXBSAXHandler and overloading the startElementNS method. That part seems to be doing exactly what I want.
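For readers following along, the filtering idea can be sketched without touching PyXB at all, as a standard-library SAX filter that sits between the parser and whatever content handler consumes the events. This is not the handler subclass described above: the class IgnorableFilter and the helper parse_filtered are hypothetical names, and wiring the filter in front of PyXB's own handler is not shown and would need to be verified.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import XMLFilterBase
from xml.sax.xmlreader import AttributesNSImpl

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"


class IgnorableFilter(XMLFilterBase):
    """Hypothetical filter that drops content in namespaces named by Ignorable."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._bindings = {}               # prefix -> stack of URIs in scope
        self._ignorable = [frozenset()]   # one entry per forwarded open element
        self._skipping = 0                # depth inside a dropped element

    def startPrefixMapping(self, prefix, uri):
        self._bindings.setdefault(prefix, []).append(uri)
        super().startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        self._bindings[prefix].pop()
        super().endPrefixMapping(prefix)

    def startElementNS(self, name, qname, attrs):
        if self._skipping:
            self._skipping += 1
            return
        listed = attrs.get((MC_NS, "Ignorable"), "").split()
        uris = {self._bindings[p][-1] for p in listed if self._bindings.get(p)}
        ignorable = self._ignorable[-1] | uris
        if name[0] in ignorable:
            self._skipping = 1            # drop this element and its subtree
            return
        kept = {k: v for k, v in attrs.items()
                if k[0] not in ignorable and k != (MC_NS, "Ignorable")}
        qnames = {k: attrs.getQNameByName(k) for k in kept}
        self._ignorable.append(ignorable)
        super().startElementNS(name, qname, AttributesNSImpl(kept, qnames))

    def endElementNS(self, name, qname):
        if self._skipping:
            self._skipping -= 1
            return
        self._ignorable.pop()
        super().endElementNS(name, qname)

    def characters(self, content):
        if not self._skipping:
            super().characters(content)


def parse_filtered(data, handler):
    """Parse bytes through the filter, delivering events to `handler`."""
    reader = IgnorableFilter(xml.sax.make_parser())
    reader.setFeature(feature_namespaces, True)
    reader.setContentHandler(handler)
    reader.parse(io.BytesIO(data))
```

Keeping the filter separate from the consuming handler means the stripping logic can be tested on its own, independently of any validation that happens downstream.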
My hypothesis was that the ContentNondeterminismExceededError exception was being caused by my SAX handler. To test that, I manually removed all of the ignorable attributes from the XML document and attempted to load it using PyXB. I got the same ContentNondeterminismExceededError exception. I increased PermittedNondeterminism to 1024, and the exception still occurs. I've encountered this issue on almost all of my sample documents thus far; the one exception is a very simple document containing only a single word. It's not yet clear to me what's special about my other samples that is causing this problem.
I also don't understand the purpose of the determinism check. It's not clear to me what nondeterminism means in this context, or why (or whether) it's a problem. If there are any references or advice you could share, I would appreciate it.
Thanks,
-Kyle
from pyxb.
Try this stackoverflow question, this PyXB test case, and possibly the technical references in the PyXB FAC documentation. More generally, a google for "nondeterminism in xml" might be fruitful, or the more common nondeterministic finite automata.
PyXB "resolves" nondeterminism by executing multiple candidate parses in parallel until only one succeeds or the number of potential candidates exceeds the limit. In grossly nondeterministic languages this can happen with pretty small documents.
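To make the idea concrete, here is a toy model of advancing several candidate parses in lockstep and giving up once too many are alive. It is not PyXB's implementation: the transition table, the run function, and the NondeterminismExceeded exception are all hypothetical, and each candidate is simply the path of states it has taken so far.

```python
class NondeterminismExceeded(Exception):
    """Raised when more candidate parses are alive than the limit allows."""


def run(transitions, start, accepting, symbols, limit):
    """Advance every candidate parse one symbol at a time.

    `transitions` maps (state, symbol) to a tuple of possible next states.
    Returns the candidate paths that end in an accepting state.
    """
    candidates = [(start,)]
    for sym in symbols:
        # Each live candidate forks once per transition that accepts `sym`.
        candidates = [path + (nxt,)
                      for path in candidates
                      for nxt in transitions.get((path[-1], sym), ())]
        if not candidates:
            raise ValueError(f"no transition accepts {sym!r}")
        if len(candidates) > limit:
            raise NondeterminismExceeded(
                f"{len(candidates)} candidates after {sym!r}")
    return [path for path in candidates if path[-1] in accepting]


# An ambiguous content model: every "r" could be matched by either of two
# equivalent particles, so the number of candidates doubles per element.
AMBIGUOUS = {
    ("s", "r"): ("a", "b"),
    ("a", "r"): ("a", "b"),
    ("b", "r"): ("a", "b"),
}

# A deterministic model for comparison: exactly one way to match each "r".
DETERMINISTIC = {("s", "r"): ("s",)}
```

With the ambiguous table and a limit of 20, four elements leave 16 candidates alive, while a fifth pushes the count to 32 and raises the exception. The deterministic table never holds more than one candidate, however long the input.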
from pyxb.
Thanks so much for your help Peter, it is greatly appreciated.
After some reading and testing, it appears that I will not be able to utilize PyXB generated bindings to open and interact with ECMA-376 (v2008 transitional) documents due to this issue with nondeterminism.
For example, the following document requires a PermittedNondeterminism of 12288, and takes about 5 seconds on my system (quad core, 16 GB RAM) to process:
<?xml version="1.0"?>
<w:document xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
<w:body>
<w:p>
<w:pPr>
<w:pStyle w:val="Normal"/>
<w:rPr/>
</w:pPr>
<w:ins w:id="1" w:author="Foo" w:date="2013-01-29T14:31:00Z">
<w:r>
<w:rPr>
<w:b/>
</w:rPr>
<w:t>This is an insertion</w:t>
</w:r>
</w:ins>
<w:ins w:id="2" w:author="Foo" w:date="2013-01-29T14:31:00Z">
<w:r>
<w:rPr/>
<w:t xml:space="preserve">. </w:t>
</w:r>
</w:ins>
<w:r>
<w:rPr>
<w:b/>
</w:rPr>
<w:t xml:space="preserve">This is </w:t>
</w:r>
<w:del w:id="3" w:author="Foo" w:date="2013-02-05T18:50:00Z">
<w:r>
<w:rPr>
<w:b/>
</w:rPr>
<w:delText>the</w:delText>
</w:r>
</w:del>
<w:r>
<w:rPr>
<w:b/>
</w:rPr>
<w:t xml:space="preserve"> end</w:t>
</w:r>
<w:r>
<w:rPr/>
<w:t xml:space="preserve"> of the</w:t>
</w:r>
<w:ins w:id="4" w:author="Foo" w:date="2013-01-29T14:31:00Z">
<w:r>
<w:rPr/>
<w:t xml:space="preserve"> inserted</w:t>
</w:r>
</w:ins>
<w:r>
<w:rPr/>
<w:t xml:space="preserve"> </w:t>
</w:r>
<w:commentRangeStart w:id="0"/>
<w:r>
<w:rPr/>
<w:t>paragraph</w:t>
</w:r>
<w:commentRangeEnd w:id="0"/>
<w:r>
<w:rPr/>
</w:r>
<w:r>
<w:rPr/>
<w:commentReference w:id="0"/>
</w:r>
<w:r>
<w:rPr/>
<w:t>.</w:t>
</w:r>
</w:p>
</w:body>
</w:document>
PyXB includes the ECMA-376 generating script, and while it can generate the bindings without issue, actually using them in practice doesn't appear reliable. Is that your experience with ECMA-376?
from pyxb.
I have no personal experience using the ECMA-376 bindings; they were added primarily as an example after another user had problems generating them. To my knowledge that user was able to accomplish hir task with them, but may have been using a namespace that wasn't as generic.
from pyxb.