Comments (18)

jorendorff avatar jorendorff commented on April 27, 2024 2

@aphillipo Yes, there are two tests that do this.

Threaded NIFs work like this:

  • you call the NIF; it spawns a thread, then returns some useless value like nil or :ok immediately

  • the thread works

  • when the thread is finished, it sends a message back to the process that originally called the NIF
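The three steps above can be sketched in plain Rust. This is only an analogy built on the standard library, not Rustler's API: `parse_async`, `call_and_wait`, and the channel standing in for the calling process's mailbox are all hypothetical names, and counting `<` characters is a placeholder for real parsing.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Stand-in for a threaded NIF: start the work on a new thread and return
/// a placeholder value right away. `reply_to` plays the role of the calling
/// process's mailbox (an assumption for this sketch, not a Rustler type).
fn parse_async(input: String, reply_to: Sender<usize>) -> &'static str {
    thread::spawn(move || {
        // The "real work": counting '<' is only a placeholder for parsing.
        let tag_count = input.matches('<').count();
        // When finished, send the result back to the caller.
        // The error is ignored in case the caller has already gone away.
        let _ = reply_to.send(tag_count);
    });
    "ok" // returned immediately, usually before the thread has finished
}

/// Caller side: make the call, then wait for the message to arrive.
fn call_and_wait(input: &str) -> usize {
    let (tx, rx) = channel();
    let immediate = parse_async(input.to_string(), tx);
    assert_eq!(immediate, "ok");
    rx.recv().expect("worker thread dropped the channel")
}
```

The value returned by the call itself carries no information; the result only arrives as a message, which is why the caller has to wait on its mailbox afterwards.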

Here are two example NIFs: https://github.com/hansihe/Rustler/blob/master/test/src/test_thread.rs

And here is what it looks like to call a threaded NIF: https://github.com/hansihe/Rustler/blob/master/test/test/thread_test.exs#L5-L8

from rustler.

hansihe avatar hansihe commented on April 27, 2024 1

This works now.

https://github.com/hansihe/ex_html5ever

hansihe avatar hansihe commented on April 27, 2024

I have not had a go at it, but it would probably be interesting to do so.

The main issue is that we would have to deal with a parsing operation that lasts longer than the 1ms that is recommended for NIF execution time. To deal with this we could either:

  • Not care. Easiest, but could also create problems in the BEAM if we are trying to parse a large chunk of HTML.
  • Use dirty schedulers. They are experimental for now, and require a special VM build.
  • Find a way to do manual rescheduling. This would probably be difficult to do since we would be calling a monolithic parsing function in html5ever.
  • Use a thread pool in Rust. Unfortunately Rustler does not support owned environments at the moment. This should probably be a priority to implement.
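To make the manual rescheduling option more concrete, here is a minimal sketch of a job that does a bounded amount of work per call and reports whether it needs to be called again. Everything in it is hypothetical (`TagCounter`, `Step`, the byte budget standing in for a time budget), and it only works because counting `<` bytes is trivially splittable, which is exactly what a monolithic parsing function does not offer. In a real NIF the "call me again" step would go through the VM's rescheduling facility rather than a loop.

```rust
/// Result of one bounded slice of work.
#[derive(Debug, PartialEq)]
enum Step {
    /// More input remains; call `step` again later.
    Yield,
    /// All input consumed; carries the final result.
    Done(usize),
}

/// Hypothetical resumable job: counts '<' bytes, at most `budget` bytes per call.
struct TagCounter<'a> {
    input: &'a [u8],
    pos: usize,
    count: usize,
    budget: usize,
}

impl<'a> TagCounter<'a> {
    fn new(input: &'a str, budget: usize) -> Self {
        assert!(budget > 0, "budget must be non-zero or the job never finishes");
        TagCounter { input: input.as_bytes(), pos: 0, count: 0, budget }
    }

    /// Process one slice of at most `budget` bytes and report progress.
    fn step(&mut self) -> Step {
        let end = (self.pos + self.budget).min(self.input.len());
        self.count += self.input[self.pos..end]
            .iter()
            .filter(|&&b| b == b'<')
            .count();
        self.pos = end;
        if self.pos == self.input.len() {
            Step::Done(self.count)
        } else {
            Step::Yield
        }
    }
}
```

A caller would keep invoking `step` until it returns `Step::Done`, giving the scheduler a chance to run other work between slices.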

It might be nice to implement it without a rescheduling strategy for now, and then later convert to either a thread pool when implemented or dirty schedulers when they stabilize. It would be a very nice demo of what this project would be capable of, and would also provide more examples on how to do stuff in Rustler.

aphillipo avatar aphillipo commented on April 27, 2024

Could we stay under the 1ms limit by splitting up the parsing/HTML somehow?

I'll just do some timing to see how quickly html5ever can parse documents and how much HTML fits in 1ms (I know the number will change under load, so this might not be a good measure). Seems 1ms isn't very long 👎

hansihe avatar hansihe commented on April 27, 2024

If splitting up the html is possible, that would indeed be a nice solution.

It would be nice to know the performance of html5ever; I am interested in any results you might have.

aphillipo avatar aphillipo commented on April 27, 2024

Right... I have some basic tests running, using the html2html example in html5ever but sending output to std::io::sink() so that output isn't a factor. I'm surprised that it's this slow, but I guess the Daily Mail have quite a big, horrid homepage (I chose them because they are a good test of what I want to do).

It takes about 950ms on average to tokenise, parse, then serialise their homepage back to HTML.

It takes 725ms on average to just tokenise and parse. I'm guessing the cost of converting to an Elixir AST will be similar to the cost of serialising in html5ever.

Splitting HTML is actually going to be very difficult.

I'm starting to think writing an Elixir parser might be worth a stab, or just writing the bits where I need to parse HTML in pure Rust for my app... Actually that is the right idea ;-)!

I'm actually pretty new to Rust, so if there are magical production flags that make things faster, let me know. Happy to share the rubbish code I wrote to make this work ;-)

hansihe avatar hansihe commented on April 27, 2024

This was compiled in release mode, right?

aphillipo avatar aphillipo commented on April 27, 2024

Aha, okay I guess this new version compiled correctly in release mode. It goes down to a startlingly better 40ms and 60ms, for parse only and parse+serialize respectively.

This is measured using time::precise_time_ns().
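For anyone wanting to repeat this kind of measurement, a rough sketch using only the standard library could look like the following. It swaps `std::time::Instant` in for `time::precise_time_ns()`, and `average_time` is a hypothetical helper, not the code used for the numbers above. As the debug versus release difference in this thread shows, the result only means something for an optimised build (for example `cargo build --release`).

```rust
use std::time::{Duration, Instant};

/// Run `work` `runs` times and return the mean wall-clock time per run.
fn average_time<F: FnMut()>(runs: u32, mut work: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let start = Instant::now();
    for _ in 0..runs {
        work();
    }
    // Total elapsed time divided by the number of runs.
    start.elapsed() / runs
}
```

The closure passed as `work` would wrap the parse call being measured; averaging over several runs smooths out some of the noise from other load on the machine.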

I played around with a few other things, but the numbers stay around that.

aphillipo avatar aphillipo commented on April 27, 2024

Would it be better to write a bridge that sends HTML to a command line tool/zeromq/rust mailbox/queue, with said tool producing Elixir code/tuples that can be run with Code.eval_string? Then there's no need to worry about NIFs and we have a robust means of importing most HTML.

Will need to think about security for this...

hansihe avatar hansihe commented on April 27, 2024

If you are considering calling out to another program, you should really look at using the External Term Format instead of producing code like what you were considering. I believe there are libraries for dealing with this format in many languages, including Rust.

You should also look at ports and port drivers.

aphillipo avatar aphillipo commented on April 27, 2024

I would probably use https://github.com/alco/porcelain which uses ports, and I'd just produce some Elixir code from the output. Clearly it's not as good but I don't trust mochiweb_html all that much...

hansihe avatar hansihe commented on April 27, 2024

You can absolutely use the external term format with porcelain. You could use something like https://github.com/seriyps/rust-erl-ext on the Rust side, and erlang:term_to_binary / erlang:binary_to_term on the Erlang side. This would be much easier and a ton faster than producing and parsing Elixir code like what you were thinking about doing.
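To show roughly what ends up on the wire, here is a hand-rolled sketch that encodes a tiny subset of terms (binaries, small tuples, proper lists) in the external term format. It does not use rust-erl-ext and is not its API; `Term`, `encode_into`, and `term_to_binary` are hypothetical names for illustration, and a real project would more likely rely on an existing library.

```rust
/// A tiny subset of Erlang terms, enough for a mochiweb-style HTML tree.
enum Term {
    Binary(Vec<u8>),
    Tuple(Vec<Term>),
    List(Vec<Term>),
}

// Tags from the external term format.
const VERSION: u8 = 131;
const SMALL_TUPLE_EXT: u8 = 104;
const NIL_EXT: u8 = 106;
const LIST_EXT: u8 = 108;
const BINARY_EXT: u8 = 109;

fn encode_into(term: &Term, out: &mut Vec<u8>) {
    match term {
        Term::Binary(bytes) => {
            // Tag, 4-byte big-endian length, then the raw bytes.
            out.push(BINARY_EXT);
            out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
            out.extend_from_slice(bytes);
        }
        Term::Tuple(items) => {
            // SMALL_TUPLE_EXT stores the arity in a single byte.
            assert!(items.len() <= 255, "this sketch only handles small tuples");
            out.push(SMALL_TUPLE_EXT);
            out.push(items.len() as u8);
            for item in items {
                encode_into(item, out);
            }
        }
        // The empty list is just NIL_EXT.
        Term::List(items) if items.is_empty() => out.push(NIL_EXT),
        Term::List(items) => {
            // Tag, 4-byte element count, the elements, then the tail.
            out.push(LIST_EXT);
            out.extend_from_slice(&(items.len() as u32).to_be_bytes());
            for item in items {
                encode_into(item, out);
            }
            out.push(NIL_EXT); // tail of a proper list
        }
    }
}

/// Encode a term with the leading version byte, ready for binary_to_term.
fn term_to_binary(term: &Term) -> Vec<u8> {
    let mut out = vec![VERSION];
    encode_into(term, &mut out);
    out
}
```

Bytes produced this way can be written to the port and decoded on the other side with erlang:binary_to_term, with no code evaluation involved.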

scrogson avatar scrogson commented on April 27, 2024

This can be done pretty easily now that Rustler supports threaded NIFs in master.

aphillipo avatar aphillipo commented on April 27, 2024

Is there an example? 🤔

aphillipo avatar aphillipo commented on April 27, 2024

Okay this looks fairly trivial, I will take a stab at this on the weekend! Will let you guys know how I get on with:

a) calling html5ever in a thread with spawn (should be easy)
b) creating erlang/elixir lists/tuples/terms in the same format Floki/Mochiweb expects (should also be easy)

Thanks!

hansihe avatar hansihe commented on April 27, 2024

@aphillipo
I started working on this a little bit yesterday actually. Right now it works and returns terms to Erlang, but I still want to see how it behaves with both rescheduling and threading.

If you want to contribute as well, I can put it up on github. We are in the #rustler channel on the elixir slack if you want to talk.

aphillipo avatar aphillipo commented on April 27, 2024

@hansihe Cool sounds great to me - if you are doing it that's a big win.

Do you agree with the Floki/mochiweb term format? I'll join #rustler now...

hansihe avatar hansihe commented on April 27, 2024

Yep, I am following that format fairly closely, with the exception of some small changes I discussed with @philss:

  • The root element returned from a parse function is always a list.
  • The addition of a doctype node type.
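As a rough picture of that shape, the sketch below models the node types in Rust and renders them as Elixir-looking text. The type and function names are hypothetical, and the exact layout of the doctype tuple is an assumption made for illustration; only the `{tag, attributes, children}` element shape, the list at the root, and the existence of a doctype node come from the discussion above.

```rust
/// One node of a parsed document.
enum Node {
    /// `{tag, attributes, children}` as in the Floki/mochiweb format.
    Element {
        tag: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
    /// A text node, a plain binary on the Elixir side.
    Text(String),
    /// The added doctype node; its field layout here is an assumption.
    Doctype { name: String },
}

/// Render a list of nodes as Elixir-looking text, only to make the shape visible.
/// The root of a parse result is always passed through this function.
fn render_list(nodes: &[Node]) -> String {
    let parts: Vec<String> = nodes.iter().map(render).collect();
    format!("[{}]", parts.join(", "))
}

fn render(node: &Node) -> String {
    match node {
        Node::Element { tag, attrs, children } => {
            let attrs: Vec<String> = attrs
                .iter()
                .map(|(k, v)| format!("{{{:?}, {:?}}}", k, v))
                .collect();
            format!("{{{:?}, [{}], {}}}", tag, attrs.join(", "), render_list(children))
        }
        Node::Text(text) => format!("{:?}", text),
        Node::Doctype { name } => format!("{{:doctype, {:?}}}", name),
    }
}
```

Because the root always goes through `render_list`, even a document with a single top-level element comes back wrapped in a list.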
