sax-machine's Introduction

SAX Machine

Description

A declarative SAX parsing library backed by Nokogiri, Ox or Oga.

Installation

Add this line to your application's Gemfile:

gem 'sax-machine'

And then execute:

$ bundle

Usage

SAX Machine can use Nokogiri, Ox, or Oga as its XML SAX handler.

To use Nokogiri add this line to your Gemfile:

gem 'nokogiri', '~> 1.6'

To use Ox add this line to your Gemfile:

gem 'ox', '>= 2.1.2'

To use Oga add this line to your Gemfile:

gem 'oga', '>= 0.2.0'

You can also specify which handler to use manually, like this:

SAXMachine.handler = :nokogiri

Examples

Include SAXMachine in any class and define properties to parse:

class AtomContent
  include SAXMachine
  attribute :type
  value :text
end

class AtomEntry
  include SAXMachine
  element :title
  # The :as argument makes this available through entry.author instead of .name
  element :name, as: :author
  element "feedburner:origLink", as: :url
  # The :default argument specifies default value for element when it's missing
  element :summary, class: String, default: "No summary available"
  element :content, class: AtomContent
  element :published
  ancestor :ancestor
end

class Atom
  include SAXMachine
  # Use a block to modify the returned value.
  # Blocks work with pretty much everything,
  # except `elements` declared with the `class` option
  element :title do |title|
    title.strip
  end
  # The :with argument means that you only match a link tag
  # that has an attribute of type: "text/html"
  element :link, value: :href, as: :url, with: {
    type: "text/html"
  }
  # The :value argument means that instead of setting the value
  # to the text between the tag, it sets it to the attribute value of :href
  element :link, value: :href, as: :feed_url, with: {
    type: "application/atom+xml"
  }
  elements :entry, as: :entries, class: AtomEntry
end

Then parse any XML with your class:

feed = Atom.parse(xml_text)

feed.title # Whatever the title of the blog is
feed.url # The main URL of the blog
feed.feed_url # The URL of the blog feed

feed.entries.first.title # Title of the first entry
feed.entries.first.author # The author of the first entry
feed.entries.first.url # Permalink on the blog for this entry
feed.entries.first.summary # Returns "No summary available" if summary is missing
feed.entries.first.ancestor # The Atom ancestor
feed.entries.first.content # Instance of AtomContent
feed.entries.first.content.text # Entry content text

You can also use the elements method without specifying a class:

class ServiceResponse
  include SAXMachine
  elements :message, as: :messages
end

response = ServiceResponse.parse("
  <response>
    <message>hi</message>
    <message>world</message>
  </response>
")
response.messages.first # hi
response.messages.last  # world

To limit conflicts in the class used for mapping, you can use the alternate SAXMachine.configure syntax:

class X < ActiveRecord::Base
  # This way no element, elements or ancestor method will be added to X
  SAXMachine.configure(X) do |c|
    c.element :title
  end
end

Multiple elements can be mapped to the same alias:

class RSSEntry
  include SAXMachine
  # ...
  element :pubDate, as: :published
  element :pubdate, as: :published
  element :"dc:date", as: :published
  element :"dc:Date", as: :published
  element :"dcterms:created", as: :published
end

If more than one of these elements exists in the source, the value from the last one is used. The order of the element declarations in the code is unimportant. The order they are encountered while parsing the document determines the value assigned to the alias.

If an element is defined in the source but is blank (e.g., <pubDate></pubDate>), it is ignored, and the non-empty one is picked.
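
The two rules above (the last non-blank occurrence wins, and declaration order is irrelevant) can be modeled in plain Ruby. This is only a sketch of the described behavior, not SAX Machine's implementation; `resolve_alias` and the hand-written event list are hypothetical.

```ruby
# Element names that all map to the same alias, as in RSSEntry above.
PUBLISHED_SOURCES = ["pubDate", "pubdate", "dc:date", "dc:Date", "dcterms:created"].freeze

# events: [element_name, text] pairs in the order the parser encounters them.
# Returns the value the alias would end up with under the rules described above.
def resolve_alias(events, sources)
  events.reduce(nil) do |current, (name, text)|
    next current unless sources.include?(name)
    next current if text.nil? || text.strip.empty? # blank elements are ignored
    text # later non-blank values overwrite earlier ones
  end
end

events = [
  ["dc:date", "2014-01-02"],
  ["title", "Hello"],
  ["pubDate", "2014-01-03"],
  ["dcterms:created", ""]
]

puts resolve_alias(events, PUBLISHED_SOURCES) # prints 2014-01-03
```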

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

LICENSE

The MIT License

Copyright (c) 2009-2014:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

sax-machine's People

Contributors

abrandoned, aderyabin, anarchivist, archiloque, brynary, domestika, emilford, ezkl, gaffneyc, johnf, jonasnielsen, jordimassaguerpla, joshk, julien51, krasnoukhov, krobertson, loren, mje113, papipo, pauldix, prodis, spiegela, tebayoso, trekdemo, underpantsgnome, volontarian

sax-machine's Issues

Set class of attribute

First of all: Awesome gem!

For an element I can do:

element :id, class: Integer

Why can't I do

attribute :id, class: Integer

It doesn't throw an error; it just returns the value as a String. Is there a way to have an attribute returned as an instance of the desired class?

Missing "default" method causes error with AR 3.2.5

NoMethodError: undefined method `default' for #<SAXMachine::SAXConfig::ElementConfig:0x007ffdf278ee00>
from /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/activerecord-3.2.5/lib/active_record/model_schema.rb:243:in `block in column_defaults'

Change the context of blocks passed with configuration

It might be worthwhile to note in the documentation that the blocks operate not on SAXMachine objects, but at the class level. This means that self inside a block is the class, not the instance as a newbie might expect.

Here's some code that demonstrates what I mean and how I was thinking of working with this.

require 'sax-machine'
require 'pry'


Objects = []

class Message
   include SAXMachine
   value :text
   attribute :saveself do |m| #Save self
      Objects.push(self)
      m
   end
end

class Selves
  include SAXMachine
  elements :message, as: :messages, class: Message
end

response = Selves.parse("
    <message saveself='true'>hi</message>
    <message>world</message>
")

binding.pry

puts response.inspect
puts response.messages.first.inspect
puts Objects.inspect
puts Objects[0].inspect

And here's the pry session where I finally put two and two together 😂

[2] pry(main)> Objects[0]
=> Message
[4] pry(main)> response.messages.first.class
=> Message
[6] pry(main)> Objects[0].class
=> Class
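
The observation above follows from ordinary Ruby closure semantics: a block keeps the self of the place where it was written, which here is the class body. The sketch below uses a hypothetical `MiniDSL` module (not SAX Machine's code) to show the difference between calling a stored block directly and running it with `instance_exec`.

```ruby
# A hypothetical, stripped-down class macro; it is not SAX Machine's code.
module MiniDSL
  def transform(&block)
    @transform = block
  end

  def stored_transform
    @transform
  end
end

class Message
  extend MiniDSL

  # The block is written in the class body, so it closes over self == Message.
  transform { |value| [self, value] }

  def via_call(value)
    # Plain call: self inside the block stays the class.
    self.class.stored_transform.call(value)
  end

  def via_instance_exec(value)
    # instance_exec: self inside the block becomes this instance.
    instance_exec(value, &self.class.stored_transform)
  end
end

message = Message.new
p message.via_call("hi").first                          # prints Message
p message.via_instance_exec("hi").first.equal?(message) # prints true
```

A library that wants blocks to see the instance has to opt in with `instance_exec` (or pass the instance as a block argument); whether SAX Machine should do so is the question this issue raises.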

Attributes are nil

Somehow the attributes can't get displayed …

require 'sax-machine';

class AtomContent
  include SAXMachine
  attribute :type
  value :text
  element :test, :value => :type
end

node = AtomContent.parse('
    <content type="text">
        sample
    </content>
');
puts node;
puts node.type;
puts node.text;

Provide access to parent elements

First, thanks for this great library (and also FeedZirra) Paul. It makes importing 3rd party xml a lot easier and faster.

One way to improve it for me would be to give collection elements access to their parents. In the code below the declaration 'elements' in Toplevel would also pass a reference to 'self' so that an Entry instance can access its parent.

class Entry
  include SAXMachine
  # ...
  def credentials
    parent_element.credentials
  end
  # ...
end

class Toplevel
  include SAXMachine
  element :credentials
  elements :entry, :class => Entry
end

I have tried to monkey patch it, but I didn't succeed. Any hints where to begin would be welcome as well :)

Cheers, Jeroen

Is there a way to Proc/Lambda process element after declared?

Assuming we have an element:

<Dates PickUpDateTime="2015-04-22T11:00:00-05:00" ReturnDateTime="2015-04-23T11:00:00-05:00"></Dates>

Is there a way to turn the attributes into DateTime instances?

Assuming we need to do DateTime.parse as a block to the element instead of DateTime.new
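
Whatever hook ends up performing it, the conversion itself needs only the standard library. The sketch below takes the attribute values as plain strings (the hash is a hand-written stand-in for what a parser would hand back, not SAX Machine output) and turns them into DateTime instances. The README's block form would be the natural place to call it, though that is untested here for attribute mappings.

```ruby
require "date"

# Attribute values as they would arrive from the XML: plain strings.
raw = {
  "PickUpDateTime" => "2015-04-22T11:00:00-05:00",
  "ReturnDateTime" => "2015-04-23T11:00:00-05:00"
}

# DateTime.iso8601 is stricter than DateTime.parse and raises on malformed input.
dates = raw.transform_values { |text| DateTime.iso8601(text) }

pick_up = dates["PickUpDateTime"]
rental_days = (dates["ReturnDateTime"] - pick_up).to_i

puts pick_up.hour # prints 11
puts rental_days  # prints 1
```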

Inheritance for cleaner definition of parsers

Hi Pauldix

It could be useful to add inheritance for the SAX config, to allow cleaner parser definitions. I've tried to add inheritance to sax-machine, but I didn't succeed.

Thanks, sax-machine works great.

Elements with a hyphen in them fail to populate

I'm a relatively new sax-machine user, so please forgive my ignorance if I'm overlooking something obvious, but I think there might be an issue (or lack of support) when parsing class-defined elements that contain a hyphen in their name.

Example A (barbaz)

require 'sax-machine'

class Bar
  include SAXMachine
  attribute :id
end

class Foo
  include SAXMachine
  attribute :id
  elements "barbaz", as: :bars, class: Bar
end

class Document
  include SAXMachine
  elements :foo, as: :foos, class: Foo
end

xml = "<?xml version='1.0' encoding='UTF-8'?>
<foo id='1'>
  <barbaz id='1'>test</barbaz>
  <barbaz id='2'>test</barbaz>
</foo>"

doc = Document.parse(xml)
doc.foos.first

When run in IRB results in:

=> #<Foo:0x007ff2a38cd988 @id="1", @bars=[#<Bar:0x007ff2a38cd190 @id="1">, #<Bar:0x007ff2a38cc6c8 @id="2">]>

Example B (bar-baz)

require 'sax-machine'

class Bar
  include SAXMachine
  attribute :id
end

class Foo
  include SAXMachine
  attribute :id
  elements "bar-baz", as: :bars, class: Bar
end

class Document
  include SAXMachine
  elements :foo, as: :foos, class: Foo
end

xml = "<?xml version='1.0' encoding='UTF-8'?>
<foo id='1'>
  <bar-baz id='1'>test</bar-baz>
  <bar-baz id='2'>test</bar-baz>
</foo>"

doc = Document.parse(xml)
doc.foos.first

When run in IRB results in:

=> #<Foo:0x007f9c0c084a30 @id="1">

What I would like to know is whether this is expected behavior or whether this is something that would be of interest in addressing in sax-machine. Like I said above, I'm pretty new at it, so would be glad to find I'm just doing it wrong.

Thanks!

Impossible to use relative path

I am looking for something like

elements "parent/child"

Here is an example:

require 'sax-machine'

class Location
  include SAXMachine
  element :country
  element :province
end

klass =
Class.new do
  include SAXMachine
  elements :location, as: :locations, class: Location
  ancestor :ancestor
end
document = klass.new
document.parse("
<location>
  <country>UA</country>
  <province>SUMY</province>
</location>
<locations>
  <location>
    <country>CH</country>
    <province>GE</province>
  </location>
  <location>
    <country>DE</country>
    <province>DE</province>
  </location>
</locations>
")

puts document.locations.size # returns 3
# but I want to parse every "location" under "locations" while ignoring
# "location" tags at other levels of the document.

For now I use a workaround that specifies a container, which looks ugly:

document.location_container.locations
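
The requested `parent/child` matching amounts to tracking the stack of open tags during the SAX walk and comparing its tail with the wanted path. The sketch below does that over a hand-written event list; `count_matches` and the event format are hypothetical and do not exist in SAX Machine.

```ruby
# SAX-style events for the document above, reduced to tag names.
EVENTS = [
  [:start, "location"], [:end, "location"],
  [:start, "locations"],
  [:start, "location"], [:end, "location"],
  [:start, "location"], [:end, "location"],
  [:end, "locations"]
].freeze

# Counts start tags whose chain of open tags ends with the given relative path.
def count_matches(events, relative_path)
  wanted = relative_path.split("/")
  stack = []
  events.count do |type, name|
    if type == :start
      stack.push(name)
      stack.last(wanted.size) == wanted
    else
      stack.pop
      false
    end
  end
end

puts count_matches(EVENTS, "location")           # prints 3
puts count_matches(EVENTS, "locations/location") # prints 2
```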

Skipping all the child tags without raising any errors

This happens when the underlying XML is not valid. In my case, it just has an unescaped &.

Gemfile:

source "https://rubygems.org"

gem "sax-machine"
gem "nokogiri"

test.rb:

require "sax-machine"

class Child
  include SAXMachine
end

class Parent
  include SAXMachine
  elements :child, as: :children, class: Child
end

p Parent.parse(File.read("xml"))

xml:

<?xml version="1.0"?>
<parent>
  <child>
    one &
  </child>
  <child>
    two
  </child>
</parent>

The result is #<Parent:0x000055fef4541cc8>, with neither child elements nor an error raised.
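
Until parse errors are surfaced, a cheap pre-check can at least catch this particular cause. The sketch below flags ampersands that do not start an entity or character reference. The regular expression is a heuristic (it does not account for CDATA sections or comments), and `unescaped_ampersand?` is a hypothetical helper, not part of sax-machine or Nokogiri.

```ruby
# Matches "&" that is not followed by a named entity (&amp;), a decimal
# reference (&#38;), or a hexadecimal reference (&#x26;).
BARE_AMPERSAND = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/

def unescaped_ampersand?(xml)
  xml.match?(BARE_AMPERSAND)
end

broken = "<parent><child>one &</child><child>two</child></parent>"
fixed  = "<parent><child>one &amp;</child><child>two</child></parent>"

puts unescaped_ampersand?(broken) # prints true
puts unescaped_ampersand?(fixed)  # prints false
```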

Parse groups of elements with no container elements?

I was wondering how one might parse a document that looks like this:

<root>
  <key>A</key><value>1</value>
  <key>B</key><value>2</value>
  <key>C</key><value>3</value>
  <key>D</key><value>4</value>
</root>

Each key/value pair is a single entity and I'd like to populate a hash. I haven't found a way to do it. Any advice appreciated.
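
One possible approach, assuming the keys and the values can each be collected into their own list in document order (for example with two separate `elements` declarations), is to pair them up after parsing. The arrays below are written by hand to stand in for parser output; nothing here is SAX Machine API.

```ruby
# Stand-ins for what two separate collections would contain after parsing.
keys   = %w[A B C D]
values = %w[1 2 3 4]

# Pairing by position only works if every <key> has exactly one <value>.
raise ArgumentError, "key/value count mismatch" unless keys.size == values.size

pairs = keys.zip(values).to_h

puts pairs["C"]  # prints 3
puts pairs.size  # prints 4
```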

parent conflict with load_missing_constant

A conflict occurs when trying to initialize a class that has multiple elements of another class... This happens if the other class hasn't been required/loaded yet.

class Example
    include SAXMachine
    element :works_fine
    elements :multiple, :as => :items, :class => ExampleItem

end

wrong number of arguments (0 for 1)
  # /Users/christos/.rvm/gems/ruby-1.9.2-p180/gems/sax-machine-0.0.20/lib/sax-machine/sax_document.rb:44:in `parent'
  # /Users/christos/.rvm/gems/ruby-1.9.2-p180/gems/activesupport-3.1.0/lib/active_support/dependencies.rb:494:in `load_missing_constant'

To make this go away you can require ExampleItem but that doesn't seem like a good solution.

Can we rename the parent method to something else to avoid conflicts? Maybe a block style definition would work better?

Element nesting depth ignored

I have the following case:

<root>
  <foo id="1">
    <bar>
      <meta>
        <value id="a"/>
        <value id="b"/>
      </meta>
    </bar>
    <meta>
      <value id="x"/>
      <value id="y"/>
    </meta>
  </foo>
  <foo id="2">
    <meta>
      <value id="z"/>
    </meta>
  </foo>
</root>

And the following mapping:

class Root
  include SAXMachine
  elements :foo, as: :children, class: Foo
end

class Meta
  include SAXMachine
  elements :values, as: :values, class: String
end

class Foo
  include SAXMachine
  element :meta, class: Meta
end

Now if I get the meta of the foo with id 1, I get the meta node of bar instead of the meta node of foo.

How do I solve this?

Maybe it would be great to have a depth constraint, so I could define something like:

class Foo
  include SAXMachine
  element :meta, class: Meta, depth: 1
  element :other, class: Other, depth: ->(depth) { depth > 2 && depth < 5 }
end

Parsing as integer doesn't seem to be working.

I have this XML Node:

<Warning Type="1" ShortText="MIN AGE 20  " RecordID="613"/>

And I'm trying to parse it with:

  class Warning < HertzSearch
    element :Warning, value: :Type,       as: :warning_type, class: Integer
    element :Warning, value: :ShortText,  as: :short_text
    element :Warning, value: :RecordID,   as: :record_id, class: Integer
  end

The expected output should be integers in both values. But I still get Strings.
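
As a stopgap, the strings can be converted after parsing. The sketch below works on a hand-written hash that stands in for the parsed values (it is not SAX Machine output) and uses `Kernel#Integer`, which raises on non-numeric input instead of silently returning 0.

```ruby
# Values as the parser reportedly returns them: all strings.
parsed = { warning_type: "1", short_text: "MIN AGE 20  ", record_id: "613" }

INTEGER_FIELDS = %i[warning_type record_id].freeze

typed = parsed.to_h do |key, value|
  # Integer(value, 10) raises ArgumentError on input such as "abc".
  [key, INTEGER_FIELDS.include?(key) ? Integer(value, 10) : value.strip]
end

puts typed[:record_id] + 1 # prints 614
puts typed[:short_text]    # prints MIN AGE 20
```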

Root element not being parsed properly

Root elements don't get parsed properly. Here's an example:

#!/usr/bin/env ruby
require 'sax-machine'

class A
  include SAXMachine

  attribute :attr
  element :thing
end

a = A.parse <<-EOXML
<?xml version="1.0" encoding="UTF-8"?>
<a attr="this does not work">
  <thing>neither does this</things>
</a>
EOXML

p a.attr
# => nil

p a.thing
# => nil

The parsed model should get both the attribute and the element, but it gets neither.

Should cast into the correct class without a collection element

  it "should cast into the correct class" do
    document = @klass.parse("<item type=\"Bar\"><title>Bar title</title></item><item type=\"Foo\"><title>Foo title</title></item>")
    document.items.size.should == 2
    document.items.first.should be_a(Bar)
    document.items.first.title.should == "Bar title"
    document.items.last.should be_a(Foo)
    document.items.last.title.should == "Foo title"
  end

'SAXMachine elements when using the with and class options should cast into the correct class' FAILED
expected: 2,
got: 1 (using ==)

Diff:
@@ -1,2 +1,2 @@
-2
+1

RuntimeError: can't add a new key into hash during iteration

I'm randomly seeing this exception.

RuntimeError: can't add a new key into hash during iteration
org/jruby/RubyHash.java:1002→ []=
[GEM_ROOT]/bundler/gems/sax-machine-332cafa62ed6/lib/sax-machine/sax_config.rb:17→ initialize
org/jruby/RubyProc.java:255→ call
org/jruby/RubyHash.java:690→ default
org/jruby/RubyHash.java:1118→ []
[GEM_ROOT]/bundler/gems/sax-machine-332cafa62ed6/lib/sax-machine/sax_config.rb:55→ collection_config
[GEM_ROOT]/bundler/gems/sax-machine-332cafa62ed6/lib/sax-machine/sax_handler.rb:44→ start_element
nokogiri-1.5.10-java/lib/nokogiri/xml/sax/document.rb:116→ start_element_namespace
nokogiri/XmlSaxParserContext.java:241→ parse_with
$ jruby -v
jruby 1.7.4 (1.9.3p392) 2013-05-16 2390d3b on Java HotSpot(TM) 64-Bit Server VM 1.7.0_21-b12 [darwin-x86_64]

Access XMLNS prefix definitions

In the case of <rss xmlns:content="http://purl.org/rss/1.0/modules/content/">, I can't figure out how to work with that attribute (either the value, or the attribute itself). Doing attribute :"xmlns:content" gives an invalid attribute name error. Is there a workaround?

Inheritance Issues

I am working on creating a Rails plugin for Feedzirra by trying to make an ActiveRecord::Base#acts_as_parser class method to make any ActiveRecord object behave as though it is a Feedzirra::Parser so it can be added as a feed class to Feedzirra::Feed.

Anyways, in order to accomplish this I have written this RSpec test. If I can get this to pass then it should be possible to implement the acts_as_parser class method, any help would be appreciated.

class BaseClass; end
module Something
  def self.included(base)
    base.send :extend, ClassMethods
  end

  module ClassMethods
    def acts_as_sax
      self.class.send :extend, Parser
    end
  end
end

module Parser
  include SAXMachine
  element :title
end

# Gives BaseClass#acts_as_sax
BaseClass.send :include, Something

class ExtendsClassMethods < BaseClass
  acts_as_sax
end

xml = "<top><title>Test</title><b>Matched!</b><c>And Again</c></top>"
@class_methods = ExtendsClassMethods.new
@class_methods.parse xml 

@class_methods.title.should == "Test"

Is it possible to get the parse results lazily?

I'd like to use sax-machine to parse pretty big XML documents. Its SAX nature is appealing but it seems that it builds all the output at once in memory, instead of "streaming" the results out. For example:

class Document
  include SAXMachine
  # etc
end

# other classes etc

records = Document.parse File.read(large_xml_file)
records.each do |record|
  # etc
end

At the moment, if I understand sax-machine correctly, the parsing step parses the entire document there and then. From a memory point of view, this seems to negate the benefit of using a SAX parser.

Instead I would like to keep memory down by parsing one record at a time in the enumeration. Is this possible?

Cannot associate block with elements?

Perhaps this is a newbie problem, but it seems odd to me that a block cannot be associated with an elements list...

require 'sax-machine'

class ServiceResponse
  include SAXMachine
  elements :message, as: :messages do |message|
     puts "Got message: #{message}"
  end
end

response = ServiceResponse.parse("
  <response>
    <message>hi</message>
    <message>world</message>
  </response>
")

puts response.inspect

(The 'got message' puts never happens)

Issue with same xml tag name nested

So I am running the latest version of the gem with Ruby 2.1.5 on Linux.

The following script should run out of the box. See the inline comments where I print what is wrong. I am not sure how to get it to fill the count properly without also filling the entries incorrectly.

require 'sax-machine'

class TestEntry
 include SAXMachine
 element :ContentId
end

class Test
 include SAXMachine

 element :content
 element :meta
 elements :content, as: :entries, class: TestEntry
end

xml =<<-EOF
<content>
 <meta><count>0</count></meta>
</content>
EOF
res = Test.parse xml
p res.entries  # it should have no entries, yet it thinks it does
p res.meta  # I expected meta to be 0, not nil

xml =<<-EOF
<content>
 <content><ContentId>1</ContentId></content>
 <content><ContentId>2</ContentId></content>
 <meta><count>2</count></meta>
</content>
EOF

res = Test.parse xml
p res.entries # expected to have 2 entries and works
p res.meta  # expected to be 2 and works

ActiveRecord/SAXMachine method conflicts

I don't think this is actually working as expected. If I understand correctly, this should not work.

class Z < ActiveRecord::Base
  SAXMachine.configure(Z) do |config|
    config.element :title
  end
end

z = Z.new
z.title

However, when I try it it succeeds. I believe it should return NoMethodError.

Further, if I call:

z = Z.parse(text)

I get the following:

Exception: NoMethodError: undefined method `default' for #<SAXMachine::SAXConfig::ElementConfig:0x007fcb3db816d0>
0: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/activerecord-3.2.5/lib/active_record/model_schema.rb:243:in `block in column_defaults'
1: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/activerecord-3.2.5/lib/active_record/model_schema.rb:243:in `map'
2: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/activerecord-3.2.5/lib/active_record/model_schema.rb:243:in `column_defaults'
3: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/activerecord-3.2.5/lib/active_record/base.rb:482:in `initialize'
4: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/sax-machine-0.2.0.rc1/lib/sax-machine/sax_document.rb:23:in `new'
5: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/sax-machine-0.2.0.rc1/lib/sax-machine/sax_document.rb:23:in `parse'
6: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/sax-machine-0.2.0.rc1/lib/sax-machine/sax_configure.rb:19:in `block in configure'
7: (pry):12:in'
8: /Users/brian/.rvm/gems/ruby-1.9.3-p194@dmarc-web/gems/pry-0.9.9.6/lib/pry/pry_instance.rb:249:in `eval'

Can't parse elements with hyphens in name

For example, the in-reply-to element of ATOM threads. SAXMachine::SAXHandler#normalize_name seems to squish the hyphens into underscores, meaning the element configs aren't found.

I'm at the end of my day and my energy, otherwise this would be a pull request rather than an issue ;) I hope to look at it more tomorrow.
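
The failure mode described here can be reproduced in miniature. In the sketch below, `normalize_name` is a hypothetical stand-in that only mimics the reported hyphen-to-underscore behavior; it is not copied from `SAXMachine::SAXHandler`, and `CONFIGS` is an invented lookup table.

```ruby
# Element configs keyed by the name exactly as declared.
CONFIGS = { "in-reply-to" => :in_reply_to, "title" => :title }.freeze

# Hypothetical normalizer mimicking the behavior reported above.
def normalize_name(name)
  name.tr("-", "_")
end

incoming = "in-reply-to"

p CONFIGS[normalize_name(incoming)] # prints nil: "in_reply_to" is not a declared key
p CONFIGS[incoming]                 # prints :in_reply_to
p CONFIGS[normalize_name("title")]  # prints :title
```

Either the declared names or the incoming names can be normalized, as long as both sides of the lookup go through the same transformation.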

SAXMachine and ActiveRecord::Base Mixin Conflicts

require 'spec_helper'

class X < ActiveRecord::Base
  include SAXMachine
  element :title
end

class Y
  include SAXMachine
  element :title
end

describe "SAXMachine interaction with ActiveRecord::Base" do
  context "with ActiveRecord::Base" do
    before do
      xml = "<top><title>Test</title><b>Matched!</b><c>And Again</c></top>"
      @x = X.new
      @x.parse xml
    end
    it { @x.title.should == "Test" }
  end

  context "without ActiveRecord::Base" do
    before do
      xml = "<top><title>Test</title><b>Matched!</b><c>And Again</c></top>"
      @y = Y.new
      @y.parse xml
    end
    it { @y.title.should == "Test" }
  end
end

  1) SAXMachine interaction with ActiveRecord::Base with ActiveRecord::Base
     Failure/Error: @x = X.new
     ArgumentError:
       wrong number of arguments (0 for 1)
     # ./lib/sax-machine/sax_document.rb:49:in `parent'
     # ./spec/sax-machine/active_record_sax_machine_spec.rb:17:in `new'
     # ./spec/sax-machine/active_record_sax_machine_spec.rb:17:in `block (3 levels) in <top (required)>'

Any suggestions on how to work around/fix this issue?

Expected behavior when multiple elements map to same accessor using ":as"

I noticed this block over in Feedjira rss_entry.rb:

         element :pubDate, :as => :published
         element :pubdate, :as => :published
         element :"dc:date", :as => :published
         element :"dc:Date", :as => :published
         element :"dcterms:created", :as => :published

I'm trying to figure out what is supposed to happen when, say, both :pubDate and dc:Date are defined in the XML entry, or if :pubDate is not defined but some later one is.

Can someone shed some light on this?

Unable to parse elements with minus "-" in the element name

Hi,

When the element name contains a minus sign, the XML is not parsed.
For example:

class RootElement
   include SAXMachine 
   elements "sub-element", :as => :subElements
end

root = RootElement.parse("<root><sub-element>1</sub-element><sub-element>2</sub-element></root>")

Using ruby 1.9.3p286 and sax-machine 1.0.0

Issue with hyphen in element name

sax-machine will not parse elements containing hyphens.
Example:

class Test
  include SAXMachine

  element 'test:test-element', as: :test
end

using the following xml:

<test>
  <test:test-element>This won't work</test:test-element>
</test>

Am I doing something wrong?
Any help is greatly appreciated.

Large XML file seems not to be "streaming", eating GBs of RAM

I have a 1.6gb xml file, and when I parse it with Sax Machine it does not seem to be streaming or eating the file in chunks - rather it appears to be loading the whole file into memory (or maybe there is a memory leak somewhere?) because my ruby process climbs upwards of 2.5gb of ram. I don't know where it stops growing because I ran out of memory.

On a smaller file (50mb) it also appears to be loading the whole file. My task iterates over the records in the xml file and saves each record to a database. It takes about 30 seconds of "idling" and then all of a sudden the database queries start executing.

I thought SAX was supposed to allow you to work with large files like this without loading the whole thing in memory.

Is there something I am overlooking?

Many thanks,

@jakeonrails

SAXMachine Inheritance

The issue I am proposing a fix for is best expressed by this code snippet:

class A
  include SAXMachine
  element :a
end

class B < A
  element :b
end

class C < B
  element :c
end

xml = "<top><a>Test</a><b>Matched!</b><c>And Again</c></top>"
a = A.new
a.parse xml
b = B.new
b.parse xml
c = C.new
c.parse xml

a.a.should == "Test"

b.a.should == "Test"
b.b.should == "Matched!"

c.a.should == "Test"
c.b.should == "Matched!"
c.c.should == "And Again"

Basically, element(s) cannot be extended as the base class is extended.
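
One common Ruby pattern for making class-level declarations inheritable is to copy the parent's configuration in the `inherited` hook. The sketch below uses a hypothetical `Declarative` module whose `element` macro only records names; it illustrates the pattern and is not the fix proposed for SAX Machine.

```ruby
# Hypothetical mini DSL: records declared element names per class.
module Declarative
  def element_names
    @element_names ||= []
  end

  def element(name)
    element_names << name
    attr_accessor name
  end

  # Give each subclass its own copy of the parent's declarations.
  def inherited(subclass)
    super
    subclass.instance_variable_set(:@element_names, element_names.dup)
  end
end

class A
  extend Declarative
  element :a
end

class B < A
  element :b
end

class C < B
  element :c
end

p A.element_names # prints [:a]
p B.element_names # prints [:a, :b]
p C.element_names # prints [:a, :b, :c]
```

Copying with `dup` keeps later declarations in a subclass from leaking back into the parent.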

oga 0.3.0 not compatible with sax-machine 1.3.1

Reproducer:

require 'feedjira';
require 'open-uri';
SAXMachine.handler = :oga
Feedjira::Feed.parse(open('http://feedjira.com/blog/feed.xml').read)

NoMethodError: undefined method `name' for ["xmlns", "http://www.w3.org/2005/Atom"]:Array
from /usr/lib64/ruby/gems/2.2.0/gems/sax-machine-1.3.1/lib/sax-machine/handlers/sax_oga_handler.rb:17:in `block in on_element'

yorick mentioned the following links as part of the discussion:

18:03:52 < yorickpeterse> darix: https://github.com/YorickPeterse/oga/issues/92
18:06:44 < yorickpeterse> https://github.com/YorickPeterse/oga/commit/d8b9725b82f93d92b10170612446fbbef6190fda actual commit  
18:06:50 < yorickpeterse> basically attributes are now hashes in the SAX API
18:07:04 < yorickpeterse> Although you can override whatever it should be
18:15:39 < yorickpeterse> Basically in the past on_element's signature would be on_element(String, String, Array<Oga::XML::Attribute>) IIRC
18:15:47 < yorickpeterse> That has been changed to on_element(String, String, Hash) 

Switching to nokogiri works as a workaround for now.
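
A handler that has to survive both API shapes could normalize the attributes before using them. In this sketch `Attr` is a hypothetical stand-in for an attribute object that responds to `name` and `value`, and `normalize_attributes` is an invented helper; neither is taken from oga or sax-machine.

```ruby
# Hypothetical stand-in for an attribute object that responds to name and value.
Attr = Struct.new(:name, :value)

# Accepts either the old shape (array of attribute objects) or the new one
# (hash of name => value) and always returns a hash.
def normalize_attributes(attrs)
  case attrs
  when Hash  then attrs
  when Array then attrs.to_h { |attr| [attr.name, attr.value] }
  else raise ArgumentError, "unsupported attributes: #{attrs.class}"
  end
end

old_style = [Attr.new("xmlns", "http://www.w3.org/2005/Atom")]
new_style = { "xmlns" => "http://www.w3.org/2005/Atom" }

puts normalize_attributes(old_style) == normalize_attributes(new_style) # prints true
```

The error in the report is consistent with iterating a Hash as if it were an array of attribute objects: each iteration yields a `[name, value]` pair, which has no `name` method.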

Is it me, or are the selectors buggy?

<vehicle lang="de-DE">
  <mainData>
  </mainData>
  <options>
    <option id="1">
      <value>5</value>
      <name>5 Türen</name>
      <description>Türen</description>
    </option>
    <option id="2">
      <value>5</value>
      <name>5 Sitze</name>
      <description>Sitze</description>
    </option>
  </options>
  <equipment/>
  <colors>
    <bodyColors>
      <bodyColor id="12">
        <name>silber</name>
        <description>Platin-Grau (D69)</description>
        <paint id="2">metallic</paint>
      </bodyColor>
    </bodyColors>
  </colors>
  <dates>
    <date id="2">2016-06-01</date>
    <date id="3">2019-04-01</date>
  </dates>
  <reservations/>
  <description>Dealer-of-the-Year-Sonderpreis</description>
  <package/>
</vehicle>

class Vehicle
  include SAXMachine
  element "description", as: :main_description, within: '.'
  element "description", as: :color, within: 'bodyColor'
end

class XmlParser
  include SAXMachine
  elements "vehicle", as: :vehicles, class: Vehicle
end

p = XmlParser.parse(xml)
p.vehicles.first
#=> #<Vehicle:0x007fc5213ec608 @main_description="Türen">

As you can see, neither the color nor the main_description is found correctly.

Getting an attribute value and the content of a node at the same time doesn't work

I have the following XML:

    <status feed="http://domain.tld/path/to/feed.xml">
      <http code="200">9718 bytes fetched in 1.462708s : 2 new entries.</http>
      <next_fetch>2009-05-10T11:19:38-07:00</next_fetch>
    </status>

I have defined a SAXMachine class with:

    element :http, :as => :message_status
    element :http, :as => :http_status, :value => :code

But message_status is always empty when I parse the XML, unless I remove the line that gets the :code attribute.
