Comments (18)
@aphillipo Yes, there are two tests that do this.
Threaded NIFs work like this:
- you call the NIF; it spawns a thread, then returns some useless value like nil or :ok immediately
- the thread works
- when the thread is finished, it sends a message back to the process that originally called the NIF
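The three steps above can be sketched in plain Rust without any Rustler APIs. This is only an illustration of the shape of the pattern: the `std::sync::mpsc` channel stands in for the calling process's mailbox, and `threaded_sum` and `Immediate` are made-up names, not anything from Rustler.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Stand-in for the throwaway value the call returns right away (like `:ok`).
#[derive(Debug, PartialEq)]
enum Immediate {
    Ok,
}

/// Hypothetical "NIF": starts the work on a new thread and returns at once.
/// `reply_to` plays the role of the calling process's mailbox.
fn threaded_sum(input: Vec<u64>, reply_to: Sender<u64>) -> Immediate {
    thread::spawn(move || {
        // The slow part runs here, off the caller's thread.
        let total: u64 = input.iter().sum();
        // When finished, send the result back to the original caller.
        // A send error only means the receiver is gone, so it is ignored.
        let _ = reply_to.send(total);
    });
    Immediate::Ok
}

fn main() {
    let (tx, rx) = channel();
    // Step 1: the call returns immediately with a placeholder value.
    let returned = threaded_sum(vec![1, 2, 3, 4], tx);
    println!("call returned {:?}", returned);
    // Steps 2 and 3: the thread works, then its message arrives.
    let result = rx.recv().expect("worker thread dropped the sender");
    println!("message received: {}", result);
}
```

The real tests linked below do the equivalent with Rustler's own thread support and an Elixir `receive` on the calling side.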
Here are two example NIFs: https://github.com/hansihe/Rustler/blob/master/test/src/test_thread.rs
And here is what it looks like to call a threaded NIF: https://github.com/hansihe/Rustler/blob/master/test/test/thread_test.exs#L5-L8
from rustler.
This works now.
https://github.com/hansihe/ex_html5ever
from rustler.
I have not had a go at it, but it would probably be interesting to do so.
The main issue is that we would have to deal with a parsing operation that lasts longer than the 1ms that is recommended for NIF execution time. To deal with this we could either:
- Not care. Easiest, but could also create problems in the BEAM if we are trying to parse a large chunk of HTML.
- Use dirty schedulers. They are experimental for now, and require a special VM build.
- Find a way to do manual rescheduling. This would probably be difficult to do since we would be calling a monolithic parsing function in html5ever.
- Use a thread pool in rust. Unfortunately Rustler does not support owned environments at the moment. This should probably be a priority to implement.
It might be nice to implement it without a rescheduling strategy for now, and then later convert to either a thread pool when implemented or dirty schedulers when they stabilize. It would be a very nice demo of what this project would be capable of, and would also provide more examples on how to do stuff in Rustler.
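To make the manual rescheduling option a bit more concrete, here is a rough sketch in plain Rust of work done in time-boxed slices, where each slice either finishes or hands back a position to resume from. Everything in it is an assumption for illustration: `count_tags_slice` just counts `<` bytes as a stand-in for a parser, and nothing here uses html5ever or the NIF rescheduling API. It only works because this toy workload can stop at any byte, which is exactly what a monolithic parsing function does not give us.

```rust
use std::time::{Duration, Instant};

/// Outcome of one time-boxed slice of work.
#[derive(Debug, PartialEq)]
enum Step {
    /// Budget ran out; resume from this byte offset on the next call.
    Yield(usize),
    /// All input consumed; carries the final count.
    Done(usize),
}

/// Hypothetical stand-in for a parser: counts '<' bytes, `chunk` bytes at a
/// time, and checks the time budget after every chunk.
fn count_tags_slice(
    input: &[u8],
    start: usize,
    count: &mut usize,
    budget: Duration,
    chunk: usize,
) -> Step {
    let began = Instant::now();
    let chunk = chunk.max(1); // guard against a zero-sized chunk
    let mut pos = start;
    while pos < input.len() {
        let end = (pos + chunk).min(input.len());
        *count += input[pos..end].iter().filter(|&&b| b == b'<').count();
        pos = end;
        if pos < input.len() && began.elapsed() >= budget {
            return Step::Yield(pos);
        }
    }
    Step::Done(*count)
}

/// Drives slices until the work is done.
/// Returns (final count, number of slices that were needed).
fn run_to_completion(input: &[u8], budget: Duration, chunk: usize) -> (usize, usize) {
    let mut count = 0;
    let mut pos = 0;
    let mut slices = 0;
    loop {
        slices += 1;
        match count_tags_slice(input, pos, &mut count, budget, chunk) {
            Step::Yield(next) => pos = next, // a NIF would reschedule itself here
            Step::Done(total) => return (total, slices),
        }
    }
}

fn main() {
    let html = b"<p><b>hi</b></p>";
    let (tags, slices) = run_to_completion(html, Duration::from_millis(1), 4);
    println!("{} tags in {} slice(s)", tags, slices);
}
```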
from rustler.
Could it be that we implement the < 1ms thing by splitting up the parsing/html somehow?
I'll just do some timing to see how quickly html5ever can parse documents and what size 1ms represents (I know that any load will cause it to change so this might not be good). Seems 1ms isn't very long 👎
from rustler.
If splitting up the html is possible, that would indeed be a nice solution.
It would be nice to know the performance of html5ever, I am interested in any results you might have.
from rustler.
Right... I have some basic tests running (using the html2html example in html5ever but sending output to std::io::sink() to avoid output being an issue) and I'm surprised that it's this slow but I guess the Daily Mail have quite a big horrid homepage (I chose them because they are a good test of what I want to do).
Takes about 950ms average to tokenise, parse then serialise to html again their homepage.
Takes 725ms average to just tokenise and parse. I'm guessing the conversion price to Elixir AST will be of similar cost to serialise in html5ever.
Splitting HTML is actually going to be very difficult.
I'm starting to think writing an Elixir parser might be worth a stab at or just writing the bits where I need to parse html in pure rust for my app... Actually that is the right idea ;-)!
I'm actually pretty new to rust so if there are magical production flags that make things faster let me know. Happy to share the rubbish code I wrote to make this work ;-)
from rustler.
This was compiled in release mode, right?
from rustler.
Aha, okay I guess this new version compiled correctly in release mode. It goes down to a startlingly better 40ms and 60ms, for parse only and parse+serialize respectively.
This is measured using time::precise_time_ns().
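For anyone wanting to repeat this kind of measurement, a minimal sketch of an averaging timer follows. It uses `std::time::Instant` from the standard library rather than `time::precise_time_ns()`, and the workload is a made-up tag counter, not html5ever; `average_time` is a hypothetical helper name. The debug-build check at the end is there because of the release-mode question above (build with `cargo build --release`).

```rust
use std::time::{Duration, Instant};

/// Runs `work` `runs` times and returns the mean wall-clock time per run.
fn average_time<F: FnMut()>(runs: u32, mut work: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let start = Instant::now();
    for _ in 0..runs {
        work();
    }
    start.elapsed() / runs
}

fn main() {
    // Made-up workload standing in for "tokenise and parse".
    let html = "<p>hello</p>".repeat(10_000);
    let mut tags = 0usize;
    let avg = average_time(20, || {
        tags = html.bytes().filter(|&b| b == b'<').count();
    });
    println!("{} tags, average {:?} per run", tags, avg);

    // Debug builds can be an order of magnitude slower than release builds.
    if cfg!(debug_assertions) {
        println!("note: this is a debug build; timings are not representative");
    }
}
```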
Played around with a few other things but it looks around that.
from rustler.
Would it be better to write a bridge that sends html to a command line tool/zeromq/rust mailbox/queue and said tool produces Elixir code/tuples that can be run with Code.eval_string? Then no need to worry about nifs and we have a robust means of importing most html.
Will need to think about security for this...
from rustler.
If you are considering calling out to another program, you should really look at using the External Term Format instead of producing code like you were considering. I believe there are libraries for dealing with this format in many languages, including Rust.
You should also look at ports and port drivers.
from rustler.
I would probably use https://github.com/alco/porcelain which uses ports and I'd just produce some elixir code from the output. Clearly it's not as good but I don't trust mochiweb_html all that much...
from rustler.
You can absolutely use the external term format with porcelain. You could use something like https://github.com/seriyps/rust-erl-ext on the rust side, and erlang:term_to_binary / erlang:binary_to_term on the erlang side. This would be much easier and a ton faster than producing and parsing elixir code like what you were thinking about doing.
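To give a feel for what the external term format looks like on the wire, here is a small hand-rolled encoder in plain Rust for a Floki-style node such as `{"p", [], ["hi"]}`. It is only a sketch: the `Term` enum and `encode` function are made up for this example and cover just binaries, small tuples and proper lists. The tag bytes (131 for the version, 104 `SMALL_TUPLE_EXT`, 106 `NIL_EXT`, 108 `LIST_EXT`, 109 `BINARY_EXT`) are the ones from the Erlang external term format documentation. A real bridge would more likely use a library like the one linked above.

```rust
/// Tiny subset of Erlang terms, just enough for a Floki-style node.
enum Term {
    Binary(Vec<u8>),
    Tuple(Vec<Term>),
    List(Vec<Term>),
}

/// Appends the encoding of `term` (without the version byte) to `out`.
fn encode_into(term: &Term, out: &mut Vec<u8>) {
    match term {
        Term::Binary(bytes) => {
            out.push(109); // BINARY_EXT: u32 big-endian length, then the bytes
            out.extend_from_slice(&(bytes.len() as u32).to_be_bytes());
            out.extend_from_slice(bytes);
        }
        Term::Tuple(items) => {
            // Only the small-tuple form is handled in this sketch.
            assert!(items.len() <= 255, "large tuples are not handled here");
            out.push(104); // SMALL_TUPLE_EXT: u8 arity, then the elements
            out.push(items.len() as u8);
            for item in items {
                encode_into(item, out);
            }
        }
        Term::List(items) => {
            if items.is_empty() {
                out.push(106); // NIL_EXT: the empty list
            } else {
                out.push(108); // LIST_EXT: u32 length, elements, then the tail
                out.extend_from_slice(&(items.len() as u32).to_be_bytes());
                for item in items {
                    encode_into(item, out);
                }
                out.push(106); // a proper list ends with NIL_EXT
            }
        }
    }
}

/// Encodes a whole term, prefixed with the format version byte.
fn encode(term: &Term) -> Vec<u8> {
    let mut out = vec![131];
    encode_into(term, &mut out);
    out
}

fn main() {
    // {"p", [], ["hi"]}
    let node = Term::Tuple(vec![
        Term::Binary(b"p".to_vec()),
        Term::List(vec![]),
        Term::List(vec![Term::Binary(b"hi".to_vec())]),
    ]);
    println!("{:?}", encode(&node));
}
```

The resulting bytes are what one would hand to `erlang:binary_to_term` on the other side of the port.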
from rustler.
This can be done pretty easily now that Rustler supports threaded NIFs in master.
from rustler.
Is there an example? 🤔
from rustler.
Okay this looks fairly trivial, I will take a stab at this on the weekend! Will let you guys know how I get on with:
a) calling html5ever in a thread with spawn (should be easy)
b) creating erlang/elixir lists/tuples/terms in the same format Floki/Mochiweb expects (should also be easy)
Thanks!
from rustler.
@aphillipo
I started working on this a little bit yesterday actually. Right now it works and returns terms to erlang, but I still want to see how it works with both rescheduling and threading.
If you want to contribute as well, I can put it up on github. We are in the #rustler channel on the elixir slack if you want to talk.
from rustler.
@hansihe Cool sounds great to me - if you are doing it that's a big win.
Do you agree with the Floki/mochiweb term format? I'll join #rustler now...
from rustler.
Yep, I am following that format fairly closely, with the exception of some small changes I discussed with @philss:
- The root element returned from a parse function is always a list.
- The addition of a doctype node type.
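On the Rust side, the shape described above could be modelled roughly like this before it is turned into terms. This is a guess at a convenient in-memory representation, not the actual ex_html5ever code: the `Node` enum, the `Document` alias and especially the fields of the `Doctype` variant are hypothetical, since the thread does not say what the doctype node contains.

```rust
/// Rough in-memory model of a Floki/mochiweb-style tree.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Corresponds to a `{tag, attrs, children}` tuple.
    Element {
        tag: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
    /// Text nodes are plain strings in the tree.
    Text(String),
    /// The added doctype node type; the `name` field is hypothetical.
    Doctype { name: String },
}

/// The root returned from a parse is always a list of nodes.
type Document = Vec<Node>;

/// Collects all text in document order, skipping doctype nodes.
fn text_content(nodes: &[Node]) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Node::Element { children, .. } => out.push_str(&text_content(children)),
            Node::Text(text) => out.push_str(text),
            Node::Doctype { .. } => {}
        }
    }
    out
}

fn main() {
    let doc: Document = vec![
        Node::Doctype { name: "html".to_string() },
        Node::Element {
            tag: "p".to_string(),
            attrs: vec![("class".to_string(), "intro".to_string())],
            children: vec![
                Node::Text("hi ".to_string()),
                Node::Element {
                    tag: "b".to_string(),
                    attrs: vec![],
                    children: vec![Node::Text("there".to_string())],
                },
            ],
        },
    ];
    println!("{}", text_content(&doc));
}
```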
from rustler.