mattt / ono
A sensible way to deal with XML & HTML for iOS & macOS
License: MIT License
Hi,
When trying to parse the following:
<elementType>
<multiCheckBox>
<checkBoxes>
<checkBox>
<text>Text 0</text>
<answerCode>Code 0</answerCode>
</checkBox>
</checkBoxes>
</multiCheckBox>
</elementType>
Using:
NSString *XPath = @"/elementType/multiCheckBox/checkBoxes/checkBox";
ONOXMLElement *checkBoxElement = [self.document firstChildWithXPath:XPath];
ONOXMLElement *textElement = [checkBoxElement firstChildWithTag:@"text"];
ONOXMLElement *answerCodeElement = [checkBoxElement firstChildWithTag:@"answerCode"];
textElement is nil, but answerCodeElement is not nil!
However, using firstChildWithXPath: instead of firstChildWithTag: works!
Bit of a weird issue! I've created a few failing test cases in the associated pull request...
Any ideas? I've gotten around it for now using firstChildWithXPath:, but I can't see anything going wrong :(
Cheers,
Rich
Hello Matt,
Are you going to implement this library natively in Swift?
I use this library in a Swift project, and there are several common Swift-to-Objective-C inconveniences, e.g.
for object in element.childrenWithTag("td") {
let child = object as! ONOXMLElement
...
}
NSFastEnumeration cannot be used in for-in directly
- (id <NSFastEnumeration>)CSS:(NSString *)CSS;
Most methods return an ImplicitlyUnwrappedOptional.
I can update the code base to at least use Objective-C nullability annotations.
Please comment.
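As a rough illustration of what nullability annotations could look like, here is a minimal header sketch. ExampleElement is a hypothetical stand-in and not Ono's actual interface; the annotation macros and lightweight generics are standard Clang and Foundation features.

```objc
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

// Hypothetical stand-in for an element class; not Ono's real header.
@interface ExampleElement : NSObject

// Non-null by default inside the NS_ASSUME_NONNULL region,
// so Swift imports this as String rather than String!.
@property (readonly, nonatomic, copy) NSString *tag;

// Explicitly nullable: Swift imports the result as an optional.
- (nullable ExampleElement *)firstChildWithTag:(NSString *)tag;

// Lightweight generics expose a typed array to Swift,
// which removes the need for an `as!` cast inside a for-in loop.
- (NSArray<ExampleElement *> *)childrenWithTag:(NSString *)tag;

@end

NS_ASSUME_NONNULL_END
```

With declarations shaped like this, the loop shown above would iterate over typed elements directly.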
- (NSDictionary *)attributes {
if (!_attributes) {
NSMutableDictionary *mutableAttributes = [NSMutableDictionary dictionary];
for (xmlAttrPtr attribute = self.xmlNode->properties; attribute != NULL; attribute = attribute->next) {
NSString *key = @((const char *)attribute->name);
// valueForAttribute is nil for 'description'
[mutableAttributes setObject:[self valueForAttribute:key] forKey:key];
}
self.attributes = [NSDictionary dictionaryWithDictionary:mutableAttributes];
}
return _attributes;
}
Sample:
<outline text="BBC Persian" description="<div style="direction:rtl;text-align:right">این صفحه دیگر به روخ.." htmlUrl="http://www.bbc.co.uk/blogs/persian/editors/" xmlUrl="http://www.bbc.co.uk/blogs/persian/editors/rss.xml" subscribe="false" content_type="text/xml" site_icon="http://www.bbc.co.uk/favicon.ico"/>
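One defensive option, sketched below, is to skip attributes whose value comes back nil, since -setObject:forKey: throws when the object is nil. ExampleDictionarySkippingNil and its lookup block are hypothetical names used for illustration only; this is not the fix that was applied in Ono.

```objc
#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

// Hypothetical helper, not part of Ono: builds a dictionary from `keys`,
// leaving out every key for which `lookup` returns nil.
static NSDictionary<NSString *, id> *ExampleDictionarySkippingNil(
    NSArray<NSString *> *keys, id _Nullable (^lookup)(NSString *key)) {
    NSMutableDictionary<NSString *, id> *result = [NSMutableDictionary dictionary];
    for (NSString *key in keys) {
        id value = lookup(key);
        if (value != nil) {
            // Only non-nil values are inserted, so no exception is raised.
            [result setObject:value forKey:key];
        }
    }
    return [result copy];
}

NS_ASSUME_NONNULL_END
```

In the attributes getter above, the lookup would correspond to valueForAttribute:, and an attribute such as 'description' would simply be left out instead of crashing.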
Thanks for the very helpful library!
While using this with HTML, I found some CSS selector queries don't work. For example, when I tried to find elements under a specific element with a query like "#price .a-color-price", it generates the error below:
XPath error : Invalid expression
.//[@id = 'price']/descendant::[@id = 'a-col']*[contains(concat(' ',normalize-space(@Class),' '),' a-color-price ')]
Could you please fix this error, unless I'm using the CSS selector incorrectly?
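Until the selector translation is fixed, a hand-written XPath expression may serve as a workaround. The sketch below assumes the intent of "#price .a-color-price" is "any descendant of the element with id price whose class list contains a-color-price". ExampleFirstPriceElement is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: XPath equivalent of the CSS selector "#price .a-color-price".
static ONOXMLElement *ExampleFirstPriceElement(ONOXMLDocument *document) {
    // The concat/normalize-space idiom matches one class name inside a
    // space-separated class attribute without matching substrings of other names.
    NSString *XPath = @"//*[@id='price']//*[contains(concat(' ', normalize-space(@class), ' '), ' a-color-price ')]";
    return [document firstChildWithXPath:XPath];
}
```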
I want to parse http://www.99kubo.com/vod-play-id-94480-sid-1-pid-1.html
This page has a player URL, but it is inside an iframe.
How can I use Ono to parse it?
Fix by adding "xmlXPathFreeObject(XPath)" under line 345 of ONOXMLDocument.m
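For context, libxml2 expects every object returned by xmlXPathEvalExpression to be released with xmlXPathFreeObject. The sketch below is not Ono's code; ExampleCountNodes is a hypothetical function that only illustrates how each allocation is paired with its free call.

```objc
#import <Foundation/Foundation.h>
#import <libxml/parser.h>
#import <libxml/xpath.h>

// Hypothetical illustration: counts the nodes matching an XPath expression and
// releases every libxml2 object it creates. Returns -1 if parsing fails.
static NSInteger ExampleCountNodes(NSData *XMLData, NSString *XPath) {
    xmlDocPtr doc = xmlReadMemory(XMLData.bytes, (int)XMLData.length, NULL, NULL, 0);
    if (!doc) {
        return -1;
    }

    NSInteger count = -1;
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    if (context) {
        xmlXPathObjectPtr result = xmlXPathEvalExpression((const xmlChar *)XPath.UTF8String, context);
        if (result) {
            count = xmlXPathNodeSetIsEmpty(result->nodesetval) ? 0 : result->nodesetval->nodeNr;
            // Without this call the result object is leaked.
            xmlXPathFreeObject(result);
        }
        xmlXPathFreeContext(context);
    }
    xmlFreeDoc(doc);
    return count;
}
```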
I have tested using Ono with Swift, and there seems to be a problem with getting the stringValue of elements. I have created a minimal example here to try:
https://github.com/tkrajacic/OnoTest.git
Calling stringValue on an element produces (Function) in Swift.
Could it be that stringValue has a namespace problem in Swift?
Adding a breakpoint to my test project directly inside its implementation shows that it doesn't even get called!
In another project I use it with Alamofire to parse XML from a server, and it produces nil.
The function stringValue does get called there and returns the correct value, but that value somehow ends up nil after the call anyway... (No idea how that happens)
Scratch that, I'm an idiot. Of course stringValue is a function in Swift and needs to be called as stringValue()
I'm unable to use Carthage to add Ono to my project. I'm using Carthage 0.11.0.
The following build commands failed:
CompileC /Users/daniel/Library/Developer/Xcode/DerivedData/Ono-algafnmzvwnwysbsctdxjjediapq/Build/Intermediates/Ono.build/Release-iphoneos/Ono\ iOS\ Tests.build/Objects-normal/armv7/ONOHTMLTests.o Tests/ONOHTMLTests.m normal armv7 objective-c com.apple.compilers.llvm.clang.1_0.compiler
(1 failure)
- (void)enumerateElementsWithXPath:(NSString *)XPath block:(void (^)(ONOXMLElement *element))block
works totally fine on iOS 7, but when I try to find elements with class='art_show_page',
it crashes on the iOS 6.1 simulator. The code is as follows:
[body enumerateElementsWithXPath:@"//div[@class='art_show_page']"
block:^(ONOXMLElement *element) {
NSLog(@"%@:%@",element.tag,element.class);
}];
Not sure what's going on... It crashes at
dyld`dyld_fatal_error:
0x8fe0b0b4: int3
0x8fe0b0b5: nop
Suppose I have this
<food> text before<br /> text after </food>
When I call [ONOXMLElement stringValue], I receive "text before text after"
But I want to receive "text before<br /> text after"
How can I get that?
I'm working on a pod called Spectra that uses Ono 1.2.2 to parse XML for a scene graph and some other things.
I upgraded to Xcode 7.1 yesterday and I haven't been able to resolve this build error at all. I've found a lot of people online asking about this, but for the most part they're not resolving the issue for a pod they're building, so many of the solutions I've found don't work for me.
include of non-modular header inside framework module Ono.Ono
I've found that changing the following line in Ono.h resolves the Ono build issues for me.
// changed from this
#import <Ono/ONOXMLDocument.h>
// to this
#import "ONOXMLDocument.h"
I'm using Ono as an EPUB parser, but I can't get values from tags like "dc:title".
Here is the code I am using.
NSString *title = [[metaElement firstChildWithTag:@"dc:title"] stringValue];
NSString *autherName = [[metaElement firstChildWithTag:@"dc:creator"] stringValue];
Here is the XML file I am using.
<?xml version="1.0" encoding="utf-8"?>
<package version="2.0" unique-identifier="uuid_id" xmlns="http://www.idpf.org/2007/opf">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata" xmlns:opf="http://www.idpf.org/2007/opf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title>礼记(一)</dc:title>
<dc:creator opf:role="aut" opf:file-as="李明哲">李明哲</dc:creator>
<dc:contributor opf:role="bkp">calibre (2.83.0) [https://calibre-ebook.com]</dc:contributor>
<dc:date>2006-12-02T00:00:00+00:00</dc:date>
<dc:publisher>**青少年出版社</dc:publisher>
<dc:identifier opf:scheme="uuid" id="uuid_id">bbf6f12e-e2cc-4e55-b10a-51218b8524af</dc:identifier>
<dc:language>zh</dc:language>
<dc:identifier opf:scheme="calibre">bbf6f12e-e2cc-4e55-b10a-51218b8524af</dc:identifier>
<meta content="礼记(一)" name="calibre:title_sort" />
<meta name="cover" content="cover" />
<meta content="2017-05-08T02:00:56.663000+00:00" name="calibre:timestamp" />
<meta content="{"李明哲": ""}" name="calibre:author_link_map" />
<meta content="0.9.7" name="Sigil version" />
<dc:date opf:event="modification" xmlns:opf="http://www.idpf.org/2007/opf">2017-05-28</dc:date>
</metadata>
<manifest>
<item id="titlepage" href="Text/titlepage.xhtml" media-type="application/xhtml+xml"/>
<item id="a_cover_xhtml" href="Text/a_cover.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter_00001_xhtml" href="Text/chapter_00001.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter_00002_xhtml" href="Text/chapter_00002.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter_00003_xhtml" href="Text/chapter_00003.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter_00004_xhtml" href="Text/chapter_00004.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter_00005_xhtml" href="Text/chapter_00005.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter_00006_xhtml" href="Text/chapter_00006.xhtml" media-type="application/xhtml+xml"/>
<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
<item id="page_css" href="Styles/page_styles.css" media-type="text/css"/>
<item id="css" href="Styles/stylesheet.css" media-type="text/css"/>
<item id="cover" href="Images/cover.jpeg" media-type="image/jpeg"/>
</manifest>
<spine toc="ncx">
<itemref idref="titlepage"/>
<itemref idref="a_cover_xhtml"/>
<itemref idref="chapter_00001_xhtml"/>
<itemref idref="chapter_00002_xhtml"/>
<itemref idref="chapter_00003_xhtml"/>
<itemref idref="chapter_00004_xhtml"/>
<itemref idref="chapter_00005_xhtml"/>
<itemref idref="chapter_00006_xhtml"/>
</spine>
<guide>
<reference type="cover" title="Cover" href="Text/titlepage.xhtml"/>
</guide>
</package>
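One thing that may help here is passing the local name and the prefix separately through firstChildWithTag:inNamespace:, rather than the combined "dc:title". A minimal sketch, where ExampleDublinCoreFields is a hypothetical helper and metadataElement is assumed to be the <metadata> element from the file above:

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: reads <dc:title> and <dc:creator> by local name plus prefix.
static NSDictionary<NSString *, NSString *> *ExampleDublinCoreFields(ONOXMLElement *metadataElement) {
    NSString *title = [[metadataElement firstChildWithTag:@"title" inNamespace:@"dc"] stringValue];
    NSString *author = [[metadataElement firstChildWithTag:@"creator" inNamespace:@"dc"] stringValue];
    // Fall back to empty strings so the dictionary literal never receives nil.
    return @{ @"title": title ?: @"", @"author": author ?: @"" };
}
```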
If I have a child element and I run an Xpath enumeration on it, I'd like to only have results from inside that element. Instead results are returned from the entire document. Is this intended behavior?
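In XPath 1.0 an expression that starts with // is absolute and always searches from the document root, whatever the context node is. A relative expression starting with .// searches below the context node only. The sketch below assumes Ono evaluates the expression with the receiving element as the context node; ExampleDescendantValues and the tag name b are illustrative.

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: collects the text of every <b> element below `element` only.
static NSArray<NSString *> *ExampleDescendantValues(ONOXMLElement *element) {
    NSMutableArray<NSString *> *values = [NSMutableArray array];
    // ".//b" is relative to the context node; "//b" would search the whole document.
    [element enumerateElementsWithXPath:@".//b"
                             usingBlock:^(ONOXMLElement *match, NSUInteger idx, BOOL *stop) {
        [values addObject:[match stringValue] ?: @""];
    }];
    return [values copy];
}
```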
If the project contains .mm files, the following line in ONOXMLDocument.h will cause compile errors. (For example: Expected member name or ';' after declaration specifiers)
@property (readonly, nonatomic, copy) NSString *namespace;
- (id)valueForAttribute:(NSString *)attribute
inNamespace:(NSString *)namespace;
- (ONOXMLElement *)firstChildWithTag:(NSString *)tag
inNamespace:(NSString *)namespace;
- (NSArray *)childrenWithTag:(NSString *)tag
inNamespace:(NSString *)namespace;
I think it's because namespace is a reserved word in C++.
Since namespace is also a keyword in XML, is there a workaround besides modifying the property and parameter names?
Ono does not work with <![CDATA[xxxx]]>
Hi,
I’ve had problems with CocoaPods before, so I like the look of the more straightforward Carthage. I was wondering if I have to do anything special to get Ono working with Carthage.
I get the error:
Project "Ono.xcodeproj" has no shared schemes
I will try building Ono manually and adding that to my Xcode project. Thanks.
For the following XML document(SOAP service), XPath can't find the elements.
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<ServiceResponse xmlns="http://sub.domain.com">
<ServiceReturn>
<a>US</a>
<b>
<b>
<e>Herrod</e>
<f>[email protected]</f>
<g>Et Commodo LLC</g>
<i>07/14/2014</i>
</b>
<b>
<e>Armand</e>
<f>[email protected]</f>
<g>Vel Convallis In Consulting</g>
<i>04/18/2015</i>
</b>
<b>
<e>Bernard</e>
<f>[email protected]</f>
<g>Cursus Nunc Mauris PC</g>
<i>11/12/2015</i>
</b>
<b>
<e>Dante</e>
<f>[email protected]</f>
<g>Sit Amet Ltd</g>
<i>01/19/2016</i>
</b>
</b>
<c>0</c>
<k>12345678</k>
</ServiceReturn>
</ServiceResponse>
</soapenv:Body>
</soapenv:Envelope>
Here are the results of some calls:
document.rootElement.firstChildWithTag("Body", inNamespace: "soapenv") //some
document.rootElement.firstChildWithTag("//soapenv:Body/ServiceResponse") //nil
document.rootElement.firstChildWithTag("//Body//b", inNamespace: "soapenv") //nil
document.firstChildWithXPath("//ServiceResponse") //nil
document.firstChildWithXPath("//ServiceReturn") //nil
document.firstChildWithXPath("//*/ServiceReturn/b/b") //nil
document.firstChildWithXPath("//ServiceReturn/b/b") //nil
document.firstChildWithXPath("//b/b") //nil
Another example
<?xml version="1.0" encoding="UTF-8"?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns2:OperationHistoryData xmlns:ns2="http://www.domain.org/operation" xmlns:ns3="http://www.domain.org/info" xmlns:ns5="http://www.domain.org/data">
<ns2:historyRecord>
<ns2:DestinationAddress>
<ns2:Index>190000</ns2:Index>
<ns2:Description>Info 1</ns2:Description>
</ns2:DestinationAddress>
</ns2:historyRecord>
</ns2:OperationHistoryData>
</S:Body>
</S:Envelope>
Whenever there are nested namespaces, Ono can't query the document. I don't know, maybe it is a limitation of libxml. The only workaround I found is to strip namespaces and other attributes.
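An unprefixed name in XPath 1.0 only matches elements in no namespace, so //ServiceResponse cannot match an element that inherits a default namespace. One workaround that avoids stripping anything is to compare local names with the standard local-name() function. A minimal sketch, with ExampleFirstElementNamed as a hypothetical helper:

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: finds the first element with the given local name,
// regardless of its prefix or namespace URI.
// Assumes `localName` contains no single quote, which would break the expression.
static ONOXMLElement *ExampleFirstElementNamed(ONOXMLDocument *document, NSString *localName) {
    NSString *XPath = [NSString stringWithFormat:@"//*[local-name()='%@']", localName];
    return [document firstChildWithXPath:XPath];
}
```

The trade-off is that local-name() ignores the namespace entirely, so elements with the same local name in different namespaces are not distinguished.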
When I request a URL, I parse the response like this:
[doc enumerateElementsWithXPath:@"//*[@class='res-list']" usingBlock:^(ONOXMLElement * _Nonnull element, NSUInteger idx, BOOL * _Nonnull stop) {
//parse
}];
Inside the block, the parsed contents are always those of the first element. When I take a single element out and parse it on its own, it works perfectly.
Tests/ONOHTMLTests.m file
- (void)testRootElementChildren {
NSArray *children = [self.document.rootElement children];
XCTAssertNotNil(children, @"children should not be nil");
XCTAssertTrue([children count] == 2, @"root element has more than two children");
XCTAssertEqualObjects([[children firstObject] tag], @"head", @"head not first child of html");
XCTAssertEqualObjects([[children lastObject] tag], @"body", @"body not last child of html");
}
The last two lines of that method produce the following error:
"Multiple methods named 'tag' found with mismatched result, parameter type or attributes..."
Is there any way to fix this? Please help me.
Thanks.
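This kind of error usually appears when a message is sent to an untyped (id) receiver and more than one visible class declares the selector with a different return type; UIKit's UIView, for example, declares an integer tag. Giving the receiver a static type tends to resolve it. A minimal sketch, where ExampleRootHasHeadAndBody is a hypothetical helper rather than the project's test:

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: the same checks as the test above, with typed locals.
static BOOL ExampleRootHasHeadAndBody(ONOXMLDocument *document) {
    NSArray *children = [document.rootElement children];
    // Typed variables tell the compiler which -tag declaration is meant.
    ONOXMLElement *first = [children firstObject];
    ONOXMLElement *last = [children lastObject];
    return children.count == 2
        && [first.tag isEqualToString:@"head"]
        && [last.tag isEqualToString:@"body"];
}
```

Inside the test itself, an inline cast such as [(ONOXMLElement *)[children firstObject] tag] has the same effect.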
The CSS selector is not working for the UPS web site with 1ZA829420304772945 as the tracking code. I put the result on http://pastebin.com/7wj6AMTm
I tried to use the table.dataTable and .dataTable selectors. Both of them return nil. However, the XPath .//*/table[@class='dataTable'] works correctly.
Let's say a document contains the following JavaScript. Ono stops whenever it sees the closing html tag inside it. However, I would guess Ono shouldn't parse JavaScript at all.
function PrintDiv() {
//Print Content of the Div in a New Blank Window
var disp_setting = "toolbar=no,location=no,directories=no,menubar=no,";
var content_vlue = document.getElementById('DivBlaBla').innerHTML;
var docprint = window.open("", "", disp_setting);
docprint.document.open();
docprint.document.write('<html><head><title></title>');
docprint.document.write('</head><body onLoad="self.print();window.close();">');
docprint.document.write(content_vlue);
docprint.document.write("</body></html>");
docprint.document.close();
docprint.focus();
}
</script>
If there are some non-printable ASCII characters inside a CDATA section in an XML document, the document object will miss some content.
Suppose I get this from the server.
<blog>
<title><![CDATA[Example]]></title>
<commentCount>0</commentCount>
<body>
<![CDATA[
XXX
This �� line �� contains� non-printable ascii characters.
XXX
]]>
</body>
<author><![CDATA[Author]]></author>
<pubDate>2015-07-21 20:24:36</pubDate>
</blog>
The response ONOXMLDocument object will be
<blog>
<title><![CDATA[Example]]></title>
<commentCount>0</commentCount>
<body>
line contains</body></blog>
This case happens only when non-printable characters are inside a CDATA section.
Hi Mattt,
Any chance you can support watchOS 2?
Cocoapods updated their implementation, more information can be found here. I can take a look later if you want.
all of these individual warnings come up when compiling ONOXMLDocument.m with -Weverything
-Wshadow
-Wsign-conversion
-Wfloat-conversion
-Wconversion
-Wnullable-to-nonnull-conversion
-Wcast-qual
First of all, thanks for the wonderful libraries. I am using Ono with Alamofire via the following extension:
extension Request {
class func HTMLResponseSerializer() -> Serializer {
return { (request, response, data) in
if data == nil {
return (nil, nil)
}
var HTMLSerializationError: NSError?
let HTML = ONOXMLDocument.HTMLDocumentWithData(data, error: &HTMLSerializationError)
return (HTML, HTMLSerializationError)
}
}
func responseHTMLDocument(completionHandler: (NSURLRequest, NSHTTPURLResponse?, ONOXMLDocument?, NSError?) -> Void) -> Self {
return response(serializer: Alamofire.Request.XMLResponseSerializer(), completionHandler: { (request, response, XML, error) in
completionHandler(request, response, XML as? ONOXMLDocument, error)
})
}
}
It works very well. However, for some HTML pages that contain &nbsp;, it prints
parser error : Entity 'nbsp' not defined
How can I prevent/silence this error? Thanks again for your wonderful libraries.
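The message "Entity 'nbsp' not defined" comes from an XML parser, which knows only the five predefined XML entities. Note that responseHTMLDocument in the extension above passes XMLResponseSerializer() rather than the HTMLResponseSerializer() defined just before it, which may be why the data is parsed as XML. The Objective-C sketch below shows the HTML entry point, which recognises HTML named entities; ExampleParseHTML is a hypothetical wrapper.

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical wrapper: parses data with the HTML parser instead of the XML parser,
// so named entities such as &nbsp; are recognised.
static ONOXMLDocument *ExampleParseHTML(NSData *data, NSError **error) {
    return [ONOXMLDocument HTMLDocumentWithData:data error:error];
}
```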
NSError *xmlError;
ONOXMLDocument *document = [ONOXMLDocument HTMLDocumentWithData:data error:&xmlError];
ONOXMLElement *element = [document.rootElement firstChildWithCSS:@"table:nth-of-type(2)"];
XPath error : Invalid expression
.//table:nth-of-type(2)
^
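The error output suggests the :nth-of-type() pseudo-class is passed through to XPath untranslated. As a possible workaround, the positional predicate table[2] expresses the same idea in XPath: a table that is the second table child of its parent. ExampleSecondTableOfType is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: XPath counterpart of the CSS selector "table:nth-of-type(2)".
static ONOXMLElement *ExampleSecondTableOfType(ONOXMLDocument *document) {
    // table[2] counts only <table> siblings under the same parent.
    return [document firstChildWithXPath:@"//table[2]"];
}
```

By contrast, (//table)[2] would select the second table in the whole document, which is a different question.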
I am trying to do an XPath query on an Atom document, something like /feed/title (this currently returns nil), and eventually enumerate over the entries. The default namespace is set with xmlns="http://www.w3.org/2005/Atom" in the document, but no prefix is set. Is there a correct way to use XPath with Atom, or is there a way to set the prefix (so the XPath would be something like /atom:title/atom:title)?
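A possible approach, assuming the Ono version in use provides -definePrefix:forDefaultNamespace: on ONOXMLDocument (later versions do, as far as I know, but check your headers), is to bind a prefix of your choice to the default namespace and use it in the query. ExampleAtomFeedTitle and the prefix atom are illustrative names.

```objc
#import <Foundation/Foundation.h>
#import "Ono.h"

// Hypothetical helper: reads the feed title from an Atom document.
static NSString *ExampleAtomFeedTitle(ONOXMLDocument *document) {
    // Assumption: this method exists in the Ono version being used.
    [document definePrefix:@"atom" forDefaultNamespace:@"http://www.w3.org/2005/Atom"];
    // Every step needs the prefix, because every element inherits the default namespace.
    return [[document firstChildWithXPath:@"/atom:feed/atom:title"] stringValue];
}
```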
When I use #import "Ono.h" in a .mm file, it raises the error below:
Use of '@import' when C++ modules are disabled, consider using -fmodules and -fcxx-modules
But if I replace @import Foundation; with #import <Foundation/Foundation.h> in Ono.h, it's fine.
jsoup can do it:
String absHref = link.attr("abs:href");
How can I do this with Ono?
ONOXMLDocument's rootElement property creates a retain cycle, keeping the entire XML doc in memory.
Perhaps the reference to ONOXMLDocument should be weak instead?
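If the cycle is as described, the usual pattern is a strong reference from owner to child and a weak back-reference from child to owner. The sketch below uses hypothetical classes that mirror only the ownership shape, not Ono's implementation, and assumes ARC.

```objc
#import <Foundation/Foundation.h>

@class ExampleDocument;

// Hypothetical classes; not Ono's code.
@interface ExampleElement : NSObject
// Weak back-reference: the element does not keep its document alive.
@property (nonatomic, weak) ExampleDocument *document;
@end

@interface ExampleDocument : NSObject
// Strong reference: the document owns its root element.
@property (nonatomic, strong) ExampleElement *rootElement;
@end

@implementation ExampleElement
@end

@implementation ExampleDocument
- (instancetype)init {
    self = [super init];
    if (self) {
        _rootElement = [[ExampleElement alloc] init];
        _rootElement.document = self;
    }
    return self;
}
@end
```

The trade-off is that an element held on its own no longer keeps the document, and whatever parsed tree the document owns, alive, so callers would need to retain the document for as long as they use its elements.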
I would like to have a method similar to stringValue that doesn't recursively print everything under the nodes matched by an XPath query. Below are the full code, the HTML, the output produced by Ono, and the output I'd like to have.
My XPath query: //div[@class='thread']
Ono code:
document = [ONOXMLDocument HTMLDocumentWithData:file error:&error];
[document enumerateElementsWithXPath:xPath usingBlock:^(ONOXMLElement *element, NSUInteger idx, BOOL *stop) {
NSLog(@"%@", [element stringValue]);
}];
Which prints:
FirstName LastName, SecondNameFirst SecondNameLast
FirstName LastName
Wednesday, December 24, 2014 at 6:57pm UTC+01
This is a dummy text
SecondNameFirst SecondNameLast
Wednesday, December 24, 2014 at 6:56pm UTC+01
And a 2nd one just to show off
Another, User
Another
Monday, April 27, 2015 at 10:54pm UTC+02
Text: 2.1
User
Thursday, February 26, 2015 at 5:41pm UTC+01
Text: 2.2
Another
Thursday, February 26, 2015 at 4:25pm UTC+01
Text: 2.3
I would prefer to have an output similar to hpple which is:
FirstName LastName, SecondNameFirst SecondNameLast
Another, User
hpple code:
tutorialsParser = [TFHpple hppleWithHTMLData:file];
tutorialsNodes = [tutorialsParser searchWithXPathQuery:xPath];
for (TFHppleElement *element in tutorialsNodes) {
NSLog(@"%@", [[element firstChild] content].trim);
}
And I don't want to use hpple since it is too slow.
Here is my input HTML file:
<!DOCTYPE html>
<html>
<head><title/></head>
<body>
<div class="thread">FirstName LastName, SecondNameFirst SecondNameLast
<div class="message">
<div class="message_header">
<span class="user">FirstName LastName</span>
<span class="meta">Wednesday, December 24, 2014 at 6:57pm UTC+01 </span>
</div>
</div>
<p>This is a dummy text</p>
<div class="message">
<div class="message_header">
<span class="user">SecondNameFirst SecondNameLast</span>
<span class="meta">Wednesday, December 24, 2014 at 6:56pm UTC+01</span>
</div>
</div>
<p>And a 2nd one just to show off</p>
</div>
<div class="thread">Another, User
<div class="message">
<div class="message_header">
<span class="user">Another</span>
<span class="meta">Monday, April 27, 2015 at 10:54pm UTC+02</span>
</div>
</div>
<p>Text: 2.1</p>
<div class="message">
<div class="message_header">
<span class="user">User</span>
<span class="meta">Thursday, February 26, 2015 at 5:41pm UTC+01</span>
</div>
</div>
<p>Text: 2.2</p>
<div class="message">
<div class="message_header">
<span class="user">Another</span>
<span class="meta">Thursday, February 26, 2015 at 4:25pm UTC+01</span>
</div>
</div>
<p>Text: 2.3</p>
</div>
</body>
</html>
-(NSArray *)childrenAtIndexes:(NSIndexSet *)indexes {
NSMutableArray *mutableChildren = [NSMutableArray array];
xmlNodePtr cursor = self.xmlNode->children;
NSUInteger idx = 0;
while (cursor) {
if ([indexes containsIndex:idx] && cursor->type == XML_ELEMENT_NODE) {
[mutableChildren addObject:[self.document elementWithNode:cursor]];
}
cursor = cursor->next;
idx++;
}
return [NSArray arrayWithArray:mutableChildren];
}
According to the code above, only element nodes are returned as children, but I think text nodes should be included as well.
Namely:
if ([indexes containsIndex:idx] && (cursor->type == XML_ELEMENT_NODE || cursor->type == XML_TEXT_NODE)) {
[mutableChildren addObject:[self.document elementWithNode:cursor]];
}