serpapi / nokolexbor

High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.

Ruby 7.00% C 68.71% HTML 23.97% CMake 0.32%
c-extension css html5 parser ruby web-scraping xpath serpapi

nokolexbor's People

Contributors

hartator, katafrakt, zyc9012

nokolexbor's Issues

Provide the line number of a node? (particularly an attribute)

This may not be straightforward (I'm not familiar with the underlying parser engine), but I would love to be able to get the original line number of a node as it had been parsed. (And in my particular case, I would love the line number of an attribute.) This is useful in cases where content within the source document is being processed programmatically, and if there's an issue the user can be notified where exactly in the source document the problem originates.

For reference, Nokogiri provides this feature: https://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri%2FXML%2FNode:line
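
Independently of any parser support, a caller that still holds the original source text can approximate this by searching the raw string. The sketch below is only an illustration of that idea: `line_of_offset` and `line_of_attribute` are hypothetical helper names, not part of Nokolexbor or Nokogiri, and the plain text search will report the first match only, so it is unreliable when the same attribute text appears more than once.

```ruby
# Hypothetical helper: map a character offset in the source text
# to a 1-based line number by counting the newlines before it.
def line_of_offset(source, offset)
  source[0, offset].count("\n") + 1
end

# Hypothetical helper: line of the first occurrence of an attribute's
# literal text (e.g. 'data-id="42"'); returns nil when it is not found.
def line_of_attribute(source, attr_text)
  offset = source.index(attr_text)
  offset && line_of_offset(source, offset)
end

SOURCE = <<~HTML
  <ul>
    <li class="a">one</li>
    <li data-id="42">two</li>
  </ul>
HTML

puts line_of_attribute(SOURCE, 'data-id="42"') # prints 3
```

A parser-level `line` method would avoid the ambiguity of text search, which is why the request above asks for it on the node itself.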

`<template>` tags in cloned node get messed up

Thanks for the great library!

I discovered one issue: when cloning a node whose tree contains one or more <template> tags, the template elements don't clone properly. The data is still there, but there is more than one DocumentFragment and the element gets serialized as <template></template>. I came up with a workaround: create a new template element, pull the right document fragment out, and swap the bad element for the new one:

example item_node HTML:

<li :class="{ 'high-light': item.name == 'xyz' }">
  <blockquote>
    <span v-text="index"></span>: Item! <strong><span v-text="text"></span> <span v-html="item.name"></span></strong>
  </blockquote>
  <ul>
    <template v-for="(subitem, index2) in item.subitems" :key="[subitem.name, index2]">
      <li :class="{ 'high-light': bigCount(count) }"><span v-text="index"></span> <span v-text="index2"></span>: <span v-text="subitem.name"></span></li>
    </template>
  </ul>
</li>

trying to clone that node:

new_node = item_node.clone # any template elements will be "empty" now

# workaround:
new_node.css("template").each do |bad_tmpl|
  frag = bad_tmpl.children.last
  new_tmpl = item_node.document.create_element("template")
  bad_tmpl.attributes.each do |k, v|
    new_tmpl[k] = v
  end
  new_tmpl.children[0].children = frag
  bad_tmpl.swap(new_tmpl)
end

Environment

  • OS: macOS Ventura 13.2.1
  • Ruby version: 3.1.0
  • Nokolexbor version: 0.3.7

undefined symbol: xmlFreeNodeList

require "nokolexbor"
doc = Nokolexbor::HTML("")
/var/lib/gems/3.0.0/gems/nokolexbor-0.2.0/lib/nokolexbor.rb:3:in `require': /var/lib/gems/3.0.0/gems/nokolexbor-0.2.0/lib/nokolexbor/nokolexbor.so: undefined symbol: xmlFreeNodeList - /var/lib/gems/3.0.0/gems/nokolexbor-0.2.0/lib/nokolexbor/nokolexbor.so (LoadError)
        from /var/lib/gems/3.0.0/gems/nokolexbor-0.2.0/lib/nokolexbor.rb:3:in `<top (required)>'

7x slower than Nokogiri with small documents

Not sure if I've missed something here, but even though Nokolexbor is faster with large documents, when you give it something simple, it's actually quite a lot slower. In the case below, it's over 7x slower!

ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [arm64-darwin22]
Warming up --------------------------------------
    Nokolexbor parse     2.064k i/100ms
      Nokogiri parse    13.850k i/100ms
Calculating -------------------------------------
    Nokolexbor parse     18.963k (± 5.8%) i/s -    379.776k in  20.097018s
      Nokogiri parse    139.865k (± 3.9%) i/s -      2.798M in  20.032824s

Comparison:
      Nokogiri parse:   139864.8 i/s
    Nokolexbor parse:    18963.3 i/s - 7.38x  slower

require "benchmark/ips"
require "nokolexbor"
require "nokogiri"

content = %(<h1 class="hello">Hello World</h1>)

Benchmark.ips do |x|
  x.warmup = 5
  x.time = 20

  x.report('Nokolexbor parse') do
    Nokolexbor::HTML(content)
  end
  x.report('Nokogiri parse') do
    Nokogiri::HTML(content)
  end
  x.compare!
end

Is there a reason for this, and is there anything we can do to speed it up? I would hate to have to use both libraries: one for small documents and the other for larger ones.

thx

Low-level API for creating a document/fragment without parsing an HTML string

I maintain a Ruby-based view component library called Phlex, which takes a Ruby structure and turns it into an HTML String. I’m wondering if it might be possible (and reasonable) to alternatively turn it into a Nokolexbor syntax tree directly, skipping the HTML rendering and parsing steps for performance.

Basically, instead of returning an HTML string, a Phlex component could optionally return a Nokolexbor::DocumentFragment, which could then be used for testing or further DOM manipulation.

It may not work since we’d have to spend a lot of time in Ruby land calling lots of Ruby methods to build this document (they’d have to be faster than String#<< for this to make sense). And if it does work, it may not be worth it since it sounds like Nokolexbor is already really fast at parsing HTML. I thought it might be an interesting idea to explore anyway.

What are your thoughts?

Compilation errors with 0.5.2

Hi,

Thanks for building the library. When I tried to build Bridgetown to fix issue bridgetownrb/bridgetown#852, Bundler tried to install and compile nokolexbor.

My compiler is Xcode's clang on macOS 14.4.1 (arm).

Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I skimmed through the code and the compiler complained about a few type signature problems.

I understand that clang can be stricter about implicit type conversions. Making the type conversions explicit would help macOS users.

Details:

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

    current directory: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/bin/ruby extconf.rb
checking for whether -DLEXBOR_STATIC is accepted as CFLAGS... yes
checking for whether -DLIBXML_STATIC is accepted as CFLAGS... yes
checking for gmake... no
checking for make... yes
checking for cmake... yes
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Project name: lexbor
-- Build without Threads
-- Lexbor version: 2.1.0
-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Append module: core (1.5.0)
-- Append module: css (0.3.0)
-- Append module: dom (1.4.0)
-- Append module: html (2.2.0)
-- Append module: ns (1.2.0)
-- Append module: selectors (0.1.0)
-- Append module: tag (1.2.0)
-- Append module: utils (0.3.0)
-- CFLAGS:  -O2 -Wall -pedantic -pipe -std=c99 -fPIC
-- CXXFLAGS:  -O2
-- Feature ASAN: disable
-- Feature Fuzzer: enabled
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/vendor/lexbor/build
-- /usr/bin/make install
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HAVE_ATTRIBUTE_DESTRUCTOR
-- Performing Test HAVE_ATTRIBUTE_DESTRUCTOR - Success
-- Looking for include file inttypes.h
-- Looking for include file inttypes.h - found
-- Looking for rand_r
-- Looking for rand_r - found
-- Looking for include file stdint.h
-- Looking for include file stdint.h - found
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Configuring done (0.7s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor/build
checking for -llexbor_static... yes
checking for lexbor/html/html.h... yes
creating Makefile

current directory: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor
make DESTDIR\= sitearchdir\=./.gem.20240420-75863-8t3ezg sitelibdir\=./.gem.20240420-75863-8t3ezg clean

current directory: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor
make DESTDIR\= sitearchdir\=./.gem.20240420-75863-8t3ezg sitelibdir\=./.gem.20240420-75863-8t3ezg
compiling nl_attribute.c
nl_attribute.c:62:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *name = lxb_dom_attr_qualified_name(attr, &len);
              ^      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:64:10: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique
plain 'char' type and the other is not [-Wpointer-sign]
  return rb_utf8_str_new(name, len);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/include/ruby-3.2.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_attribute.c:102:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *value = lxb_dom_attr_value(attr, &len);
              ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:104:10: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique
plain 'char' type and the other is not [-Wpointer-sign]
  return rb_utf8_str_new(value, len);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/include/ruby-3.2.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_attribute.c:144:28: warning: incompatible pointer types passing 'lxb_dom_element_t *' (aka 'struct lxb_dom_element *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node
*') [-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->owner, nl_rb_document_get(self));
                           ^~~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:161:28: warning: incompatible pointer types passing 'lxb_dom_attr_t *' (aka 'struct lxb_dom_attr *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *')
[-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->prev, nl_rb_document_get(self));
                           ^~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:178:28: warning: incompatible pointer types passing 'lxb_dom_attr_t *' (aka 'struct lxb_dom_attr *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *')
[-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->next, nl_rb_document_get(self));
                           ^~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:188:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *attr_value = lxb_dom_attr_value(attr, &len);
              ^            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:192:40: warning: pointer type mismatch ('const char *' and 'lxb_char_t *' (aka 'unsigned char *')) [-Wpointer-type-mismatch]
                    attr_value == NULL ? "" : attr_value);
                                       ^ ~~   ~~~~~~~~~~
nl_attribute.c:192:21: warning: format specifies type 'char *' but the argument has type 'void *' [-Wformat]
                    attr_value == NULL ? "" : attr_value);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10 warnings generated.
compiling nl_cdata.c
compiling nl_comment.c
compiling nl_document.c
nl_document.c:23:9: error: incompatible function pointer types initializing 'RUBY_DATA_FUNC' (aka 'void (*)(void *)') with an expression of type 'void (lxb_html_document_t *)' (aka 'void
(struct lxb_html_document *)') [-Wincompatible-function-pointer-types]
        free_nl_document,
        ^~~~~~~~~~~~~~~~
nl_document.c:107:45: warning: incompatible pointer types passing 'lxb_dom_document_t *' (aka 'struct lxb_dom_document *') to parameter of type 'lxb_html_document_t *' (aka 'struct
lxb_html_document *') [-Wincompatible-pointer-types]
  lxb_char_t *str = lxb_html_document_title(nl_rb_document_unwrap(self), &len);
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor/../../vendor/lexbor/dist/include/lexbor/html/interfaces/document.h:100:46:
note: passing argument to parameter 'document' here
lxb_html_document_title(lxb_html_document_t *document, size_t *len);
                                             ^
nl_document.c:107:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *str = lxb_html_document_title(nl_rb_document_unwrap(self), &len);
              ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_document.c:108:44: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique
plain 'char' type and the other is not [-Wpointer-sign]
  return str == NULL ? rb_str_new("", 0) : rb_utf8_str_new(str, len);
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/include/ruby-3.2.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_document.c:129:49: warning: incompatible pointer types passing 'lxb_dom_document_t *' (aka 'struct lxb_dom_document *') to parameter of type 'lxb_html_document_t *' (aka 'struct
lxb_html_document *') [-Wincompatible-pointer-types]
  lxb_char_t *str = lxb_html_document_title_set(nl_rb_document_unwrap(self), (const lxb_char_t *)c_title, len);
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor/../../vendor/lexbor/dist/include/lexbor/html/interfaces/document.h:103:50:
note: passing argument to parameter 'document' here
lxb_html_document_title_set(lxb_html_document_t *document,
                                                 ^
nl_document.c:129:15: error: incompatible integer to pointer conversion initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'lxb_status_t' (aka 'unsigned int')
[-Wint-conversion]
  lxb_char_t *str = lxb_html_document_title_set(nl_rb_document_unwrap(self), (const lxb_char_t *)c_title, len);
              ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_document.c:129:15: warning: unused variable 'str' [-Wunused-variable]
5 warnings and 2 errors generated.
make: *** [nl_document.o] Error 1

make failed, exit code 2

Gem files will remain installed in /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2 for inspection.
Results logged to /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/extensions/arm64-darwin-23/3.2.0/nokolexbor-0.5.2/gem_make.out

  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:119:in `run'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:53:in `block in make'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:45:in `each'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:45:in `make'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/ext_conf_builder.rb:42:in `build'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:187:in `build_extension'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:221:in `block in build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:218:in `each'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:218:in `build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/installer.rb:846:in `build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/rubygems_gem_installer.rb:76:in `build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/rubygems_gem_installer.rb:28:in `install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/source/rubygems.rb:205:in `install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/gem_installer.rb:54:in `install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/gem_installer.rb:16:in `install_from_spec'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/parallel_installer.rb:132:in `do_install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/parallel_installer.rb:123:in `block in worker_pool'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:62:in `apply_func'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:57:in `block in process_queue'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:54:in `loop'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:54:in `process_queue'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:90:in `block (2 levels) in create_threads'

`Nokolexbor::HTML` doesn't force conversion to String

Getting an error when trying Nokogiri's quickstart:

doc = Nokolexbor::HTML(URI.open('https://github.com/serpapi/nokolexbor'))
Exception: TypeError: no implicit conversion of Tempfile into String
--
 0: /Users/user/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/nokolexbor-0.2.4/lib/nokolexbor.rb:15:in `parse'
 1: /Users/user/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/nokolexbor-0.2.4/lib/nokolexbor.rb:15:in `HTML'
 2: (pry):1:in `<main>'

Nokolexbor::HTML probably just doesn't force conversion to String like Nokogiri::HTML does.
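
Until the input is coerced inside the library, the caller can normalize it first. The following is a minimal sketch under that assumption: `html_source` is a hypothetical helper name, not a Nokolexbor method, and it simply reads anything IO-like (such as the Tempfile or StringIO that `URI.open` returns) into a String.

```ruby
require "stringio"

# Hypothetical helper (not part of Nokolexbor): return a String for either
# a String or an IO-like object that responds to #read.
def html_source(input)
  return input if input.is_a?(String)
  return input.read if input.respond_to?(:read)

  input.to_s
end

io = StringIO.new("<h1>Hello</h1>")
puts html_source(io)             # prints <h1>Hello</h1>
puts html_source("<p>plain</p>") # prints <p>plain</p>

# Assumed usage with the call from the report above:
#   doc = Nokolexbor::HTML(html_source(URI.open('https://github.com/serpapi/nokolexbor')))
```

Calling `URI.open(...).read` directly at the call site achieves the same thing; the helper only matters when the input type is not known in advance.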

Compilation error while installing 0.5.3 version

Describe the bug
I am facing a compilation error while installing the gems (bundle install) on my Mac with Ruby 3.3.0.

Environment

  • OS: [macOS Sonoma 14.4.1]
  • Ruby version: [3.3.0]
  • Nokolexbor version: [2.2.33]

Additional context

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

    current directory: /Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/ext/nokolexbor
/Users/user1/.rvm/rubies/ruby-3.3.0/bin/ruby extconf.rb --with-cflags\=-Wno-error\=incompatible-function-pointer-types
checking for whether -DLEXBOR_STATIC is accepted as CFLAGS... yes
checking for whether -DLIBXML_STATIC is accepted as CFLAGS... yes
checking for gmake... yes
checking for cmake... yes
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Project name: lexbor
-- Build without Threads
-- Lexbor version: 2.1.0
-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Append module: core (1.5.0)
-- Append module: css (0.3.0)
-- Append module: dom (1.4.0)
-- Append module: html (2.2.0)
-- Append module: ns (1.2.0)
-- Append module: selectors (0.1.0)
-- Append module: tag (1.2.0)
-- Append module: utils (0.3.0)
-- CFLAGS:  -O2 -Wall -pedantic -pipe -std=c99 -fPIC
-- CXXFLAGS:  -O2
-- Feature ASAN: disable
-- Feature Fuzzer: enabled
-- Configuring done (0.6s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/vendor/lexbor/build
-- /opt/homebrew/bin/gmake install
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HAVE_ATTRIBUTE_DESTRUCTOR
-- Performing Test HAVE_ATTRIBUTE_DESTRUCTOR - Success
-- Looking for include file inttypes.h
-- Looking for include file inttypes.h - found
-- Looking for rand_r
-- Looking for rand_r - found
-- Looking for include file stdint.h
-- Looking for include file stdint.h - found
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/ext/nokolexbor/build
checking for -llexbor_static... yes
checking for lexbor/html/html.h... yes
creating Makefile

current directory: /Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/ext/nokolexbor
make DESTDIR\= sitearchdir\=./.gem.20240503-92175-cj7kz3 sitelibdir\=./.gem.20240503-92175-cj7kz3 clean

current directory: /Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/ext/nokolexbor
make DESTDIR\= sitearchdir\=./.gem.20240503-92175-cj7kz3 sitelibdir\=./.gem.20240503-92175-cj7kz3
compiling nl_attribute.c
nl_attribute.c:62:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *name = lxb_dom_attr_qualified_name(attr, &len);
              ^      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:64:10: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]
  return rb_utf8_str_new(name, len);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/user1/.rvm/rubies/ruby-3.3.0/include/ruby-3.3.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_attribute.c:102:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *value = lxb_dom_attr_value(attr, &len);
              ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:104:10: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]
  return rb_utf8_str_new(value, len);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/user1/.rvm/rubies/ruby-3.3.0/include/ruby-3.3.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_attribute.c:144:28: warning: incompatible pointer types passing 'lxb_dom_element_t *' (aka 'struct lxb_dom_element *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *') [-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->owner, nl_rb_document_get(self));
                           ^~~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:161:28: warning: incompatible pointer types passing 'lxb_dom_attr_t *' (aka 'struct lxb_dom_attr *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *') [-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->prev, nl_rb_document_get(self));
                           ^~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:178:28: warning: incompatible pointer types passing 'lxb_dom_attr_t *' (aka 'struct lxb_dom_attr *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *') [-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->next, nl_rb_document_get(self));
                           ^~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:188:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *attr_value = lxb_dom_attr_value(attr, &len);
              ^            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:192:40: warning: pointer type mismatch ('char *' and 'lxb_char_t *' (aka 'unsigned char *')) [-Wpointer-type-mismatch]
                    attr_value == NULL ? "" : attr_value);
                                       ^ ~~   ~~~~~~~~~~
nl_attribute.c:192:21: warning: format specifies type 'char *' but the argument has type 'void *' [-Wformat]
                    attr_value == NULL ? "" : attr_value);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10 warnings generated.
compiling nl_cdata.c
compiling nl_comment.c
compiling nl_document.c
nl_document.c:23:9: warning: incompatible function pointer types initializing 'RUBY_DATA_FUNC' (aka 'void (*)(void *)') with an expression of type 'void (lxb_html_document_t *)' (aka 'void (struct lxb_html_document *)') [-Wincompatible-function-pointer-types]
        free_nl_document,
        ^~~~~~~~~~~~~~~~
nl_document.c:107:45: warning: incompatible pointer types passing 'lxb_dom_document_t *' (aka 'struct lxb_dom_document *') to parameter of type 'lxb_html_document_t *' (aka 'struct lxb_html_document *') [-Wincompatible-pointer-types]
  lxb_char_t *str = lxb_html_document_title(nl_rb_document_unwrap(self), &len);
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/ext/nokolexbor/../../vendor/lexbor/dist/include/lexbor/html/interfaces/document.h:100:46: note: passing argument to parameter 'document' here
lxb_html_document_title(lxb_html_document_t *document, size_t *len);
                                             ^
nl_document.c:107:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *str = lxb_html_document_title(nl_rb_document_unwrap(self), &len);
              ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_document.c:108:44: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique plain 'char' type and the other is not [-Wpointer-sign]
  return str == NULL ? rb_str_new("", 0) : rb_utf8_str_new(str, len);
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~
/Users/user1/.rvm/rubies/ruby-3.3.0/include/ruby-3.3.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_document.c:129:49: warning: incompatible pointer types passing 'lxb_dom_document_t *' (aka 'struct lxb_dom_document *') to parameter of type 'lxb_html_document_t *' (aka 'struct lxb_html_document *') [-Wincompatible-pointer-types]
  lxb_char_t *str = lxb_html_document_title_set(nl_rb_document_unwrap(self), (const lxb_char_t *)c_title, len);
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3/ext/nokolexbor/../../vendor/lexbor/dist/include/lexbor/html/interfaces/document.h:103:50: note: passing argument to parameter 'document' here
lxb_html_document_title_set(lxb_html_document_t *document,
                                                 ^
nl_document.c:129:15: error: incompatible integer to pointer conversion initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'lxb_status_t' (aka 'unsigned int') [-Wint-conversion]
  lxb_char_t *str = lxb_html_document_title_set(nl_rb_document_unwrap(self), (const lxb_char_t *)c_title, len);
              ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 warnings and 1 error generated.
make: *** [nl_document.o] Error 1

make failed, exit code 2

Gem files will remain installed in /Users/user1/.rvm/gems/ruby-3.3.0/gems/nokolexbor-0.5.3 for inspection.
Results logged to /Users/user1/.rvm/gems/ruby-3.3.0/extensions/arm64-darwin-23/3.3.0/nokolexbor-0.5.3/gem_make.out

  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:125:in `run'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:51:in `block in make'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:43:in `each'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:43:in `make'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/ext_conf_builder.rb:42:in `build'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:193:in `build_extension'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:227:in `block in build_extensions'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:224:in `each'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/ext/builder.rb:224:in `build_extensions'
  /Users/user1/.rvm/rubies/ruby-3.3.0/lib/ruby/3.3.0/rubygems/installer.rb:852:in `build_extensions'
  /Users/user1/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/rubygems_gem_installer.rb:71:in `build_extensions'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/rubygems_gem_installer.rb:28:in `install'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/source/rubygems.rb:204:in `install'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/installer/gem_installer.rb:54:in `install'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/installer/gem_installer.rb:59:in `block in install_with_settings'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/rubygems_integration.rb:575:in `install_with_build_args'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/installer/gem_installer.rb:59:in `install_with_settings'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/installer/gem_installer.rb:16:in `install_from_spec'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/installer/parallel_installer.rb:186:in `do_install'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/installer/parallel_installer.rb:177:in `block in worker_pool'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/worker.rb:62:in `apply_func'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/worker.rb:57:in `block in process_queue'
  <internal:kernel>:187:in `loop'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/worker.rb:54:in `process_queue'
  /Users/.rvm/gems/ruby-3.3.0/gems/bundler-2.2.33/lib/bundler/worker.rb:91:in `block (2 levels) in create_threads'

An error occurred while installing nokolexbor (0.5.3), and Bundler cannot continue.

In Gemfile:
  nokolexbor
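
The one hard error in the log above (nl_document.c:129) occurs because the return value of `lxb_html_document_title_set` is an `lxb_status_t` (an `unsigned int`, per the diagnostic) but is stored in an `lxb_char_t *`. Below is a minimal sketch of the usual pattern, keeping the status in a status-typed variable and comparing it with the success value. It is not the project's actual fix. Every `demo_*` and `DEMO_*` name is a hypothetical stand-in so the snippet compiles without lexbor, and treating 0 as the success value is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the log only tells us the real status type is an
 * unsigned int. The names and the "0 means OK" convention are assumptions. */
typedef unsigned int demo_status_t;
#define DEMO_STATUS_OK 0u

typedef struct demo_document {
  char title[64];
} demo_document_t;

/* Mimics a setter that reports success or failure through a status code
 * instead of returning a pointer. */
static demo_status_t demo_title_set(demo_document_t *doc,
                                    const unsigned char *title, size_t len) {
  if (doc == NULL || title == NULL || len >= sizeof(doc->title)) {
    return 1u; /* any non-zero value signals failure in this sketch */
  }
  memcpy(doc->title, title, len);
  doc->title[len] = '\0';
  return DEMO_STATUS_OK;
}

/* Stores the result in a status-typed variable, not a pointer.
 * Returns 1 on success and 0 on failure. */
static int demo_update_title(demo_document_t *doc, const char *c_title) {
  demo_status_t status =
      demo_title_set(doc, (const unsigned char *)c_title, strlen(c_title));
  return status == DEMO_STATUS_OK;
}
```

With the variable typed as a status, no integer is converted to a pointer, so `-Wint-conversion` is not triggered.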

Build error on clang 15

nl_node.c:256:14: error: incompatible integer to pointer conversion assigning to 'void *' from 'int' [-Wint-conversion]
  root->user = count;
             ^ ~~~~~
nl_node.c:263:18: error: incompatible integer to pointer conversion assigning to 'void *' from 'int' [-Wint-conversion]
      node->user = ++count;
                 ^ ~~~~~~~
nl_node.c:278:18: error: incompatible integer to pointer conversion assigning to 'void *' from 'int' [-Wint-conversion]
      node->user = ++count;
                 ^ ~~~~~~~
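
The three errors above come from assigning an `int` directly to a `void *` field, which this clang version reports as an error rather than a warning. One common workaround, shown here as a sketch and not as the maintainers' fix, is to round-trip the integer through `intptr_t` with explicit casts. `demo_node_t` and the `demo_*` helpers are hypothetical. Only the `void *user` field mirrors what the diagnostics show.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a node type that exposes a 'void *user' slot. */
typedef struct demo_node {
  void *user;
} demo_node_t;

/* Store a small integer in the pointer-sized slot. The explicit casts make
 * the conversion intentional, so -Wint-conversion does not fire. */
static void demo_set_index(demo_node_t *node, int index) {
  node->user = (void *)(intptr_t)index;
}

/* Recover the integer by reversing the casts. */
static int demo_get_index(const demo_node_t *node) {
  return (int)(intptr_t)node->user;
}
```

Applied to the lines in the log, the assignment would take the form `node->user = (void *)(intptr_t)(++count);`, assuming `count` is an `int` as the diagnostic states.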
