Comments (5)
I can do this but not earlier than 2 weeks, sorry. Ping me in case I forget.
from etl-language-comparison.
Ok, sounds good
Ping @josevalim
Sorry, I can't provide a full README, but I don't want to hold this up any longer. So I will send some text and I hope you can complement it as you see fit. Sorry for the delay!
The Elixir implementation provides a `mapreduce` mix task that receives either "binary" or "regex" as its argument. The argument chooses which module maps over each file. Each file is mapped in a separate Elixir process, and each mapper's result is returned to a single reducer process, which sums up all counts.
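To make the shape of that pipeline concrete, here is a minimal sketch in Python (not Elixir, and not the code in this repository). It uses threads where the Elixir version uses processes, in-memory strings where the real task reads files, and the names `count_word` and `map_reduce` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_word(text, word):
    """Mapper: count case-insensitive occurrences of `word` in one input."""
    return text.lower().count(word.lower())


def map_reduce(texts, word):
    """Run one mapper per input concurrently, then sum the counts in one place."""
    with ThreadPoolExecutor() as pool:
        # Each input is mapped independently of the others.
        counts = pool.map(lambda text: count_word(text, word), texts)
        # Single reducer step: add up every mapper's result.
        return sum(counts)


if __name__ == "__main__":
    inputs = ["Knicks win", "no match here", "KNICKS knicks"]
    print(map_reduce(inputs, "knicks"))
```

The point is only the structure: many independent mappers, one reducer that sums their results.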
The "binary" argument selects `BinaryMapActor`, which uses the binary matching provided by the Erlang VM to look for matches. Because binary matches are case-sensitive, the first step of the mapper algorithm is to generate all case permutations of the word being counted.
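The permutation step can be sketched as follows. This is a Python illustration under assumptions, not the `BinaryMapActor` code: `case_variants` is a hypothetical helper, and the search word "knicks" is only an example value.

```python
from itertools import product


def case_variants(word):
    """Return every upper/lower-case combination of `word`.

    Characters without a distinct upper/lower form (digits, punctuation)
    contribute a single option, so they do not multiply the result.
    """
    options = [sorted({ch.lower(), ch.upper()}) for ch in word]
    return ["".join(chars) for chars in product(*options)]


if __name__ == "__main__":
    variants = case_variants("knicks")
    text = "Knicks fans love the KNICKS"
    # A case-sensitive search for each variant stands in for a
    # case-insensitive search for the original word.
    total = sum(text.count(variant) for variant in variants)
    print(len(variants), total)
```

A word of n letters yields up to 2^n variants, so this approach is only practical for short search terms.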
The "regex" argument selects `RegexMapActor`, which uses regular expressions; these are often slower than binary matches.
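For comparison, a regex-based mapper needs no permutations because the pattern itself can ignore case. A minimal Python sketch (again an illustration, not the `RegexMapActor` code; `count_regex` is a hypothetical name):

```python
import re


def count_regex(text, word):
    """Count case-insensitive, non-overlapping occurrences of `word` in `text`."""
    # re.escape keeps any special characters in `word` literal.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return len(pattern.findall(text))


if __name__ == "__main__":
    print(count_regex("Knicks fans love the KNICKS", "knicks"))
```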
Generally speaking, both solutions could be optimized further, although we believe the current code strikes a good trade-off between clarity and low-level optimization. Beyond that, we expect future Elixir versions to yield even better results. For example, Elixir 1.2 introduces support for large maps, which will perform faster than HashDict.
This will do just fine, thanks @josevalim