zseder / hunmisc
Miscellaneous tools/scripts for different NLP related tasks
License: GNU Lesser General Public License v3.0
@zseder I accidentally pushed my changes to master. Could you check them out and see if I have to change anything (HEAD vs HEAD~2)?
Enable the user to assign weights to the three basic operations (insertion, deletion and replacement) in LSD. The defaults must be the current, hard-coded weights (even though they differ from the original case, where the weight of replacement was 2).
Enable the user to customize the LSD further by giving the function three maps:
insert_map and delete_map would contain character - weight pairs. Their meaning is: if we insert/delete said character, the cost should grow by the specified weight;
replace_map would store character x character -> weight pairs; I hope its role is obvious.
I would create a new function so that the original levenshtein() could remain as simple as possible. I would even go as far as to create a Levenshtein class that takes as parameters the above (and all parameters the original has) and that has a distance(s1, s2) method.
Actually, thinking about it now, the new functionality does not make the implementation that much more complex (only a few dict.get's), but would probably make it a bit slower. What do you think?
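A minimal sketch of what such a Levenshtein class could look like, just to make the proposal concrete. It is not the existing hunmisc implementation: the parameter names insert_weight, delete_weight and replace_weight are made up for illustration, and the default of 1 for each is a placeholder, since the currently hard-coded weights are not stated here. The per-character maps override the global weights through dict.get, as suggested above.

```python
class Levenshtein:
    """Weighted Levenshtein distance (illustrative sketch only)."""

    def __init__(self, insert_weight=1, delete_weight=1, replace_weight=1,
                 insert_map=None, delete_map=None, replace_map=None):
        # Assumption: defaults of 1 stand in for the real hard-coded weights.
        self.insert_weight = insert_weight
        self.delete_weight = delete_weight
        self.replace_weight = replace_weight
        # insert_map / delete_map: character -> weight
        self.insert_map = insert_map or {}
        self.delete_map = delete_map or {}
        # replace_map: (character, character) -> weight
        self.replace_map = replace_map or {}

    def distance(self, s1, s2):
        # First row: cost of building each prefix of s2 from the empty string.
        prev = [0]
        for c2 in s2:
            prev.append(prev[-1] + self.insert_map.get(c2, self.insert_weight))

        for c1 in s1:
            del_cost = self.delete_map.get(c1, self.delete_weight)
            # First column: cost of deleting every character of s1 seen so far.
            curr = [prev[0] + del_cost]
            for j, c2 in enumerate(s2, 1):
                ins_cost = self.insert_map.get(c2, self.insert_weight)
                if c1 == c2:
                    rep_cost = 0
                else:
                    rep_cost = self.replace_map.get((c1, c2), self.replace_weight)
                curr.append(min(prev[j] + del_cost,        # delete c1
                                curr[j - 1] + ins_cost,    # insert c2
                                prev[j - 1] + rep_cost))   # replace c1 by c2
            prev = curr
        return prev[-1]
```

For example, Levenshtein().distance("kitten", "sitting") gives 3 with unit weights, while Levenshtein(replace_weight=2) gives 5 for the same pair. The extra cost compared to a plain implementation is indeed only the dict.get lookups in the inner loop.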
These algorithms should not really be written in Python, and calling them many times can make the program rather slow, so any method that helps to speed them up is welcome.
The selection of the main category should be rewritten so that it doesn't depend on the order of the entity's lines in the dump.
Since pylzma doesn't seem to be working, we should create a wrapper around "7zr", the way we did with gzip in xzip.py.
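A rough sketch of such a wrapper, assuming the "7zr" executable is on the PATH. The function names sevenzip_command and open_7z are hypothetical, and the interface of the existing gzip wrapper in xzip.py is not shown here, so this does not necessarily match it. It relies on the 7-Zip command-line convention that "e" extracts and "-so" writes the extracted data to standard output.

```python
import subprocess


def sevenzip_command(path, executable="7zr"):
    """Build the command line that decompresses `path` to stdout."""
    # "e" = extract, "-so" = write extracted data to standard output
    return [executable, "e", "-so", path]


def open_7z(path, executable="7zr"):
    """Return a binary file-like object with the decompressed contents.

    Hypothetical helper: the caller reads from the pipe and closes it
    when done.
    """
    proc = subprocess.Popen(sevenzip_command(path, executable),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    return proc.stdout
```

Reading the dump would then be a matter of iterating over open_7z("dump.7z") line by line, without unpacking the archive to disk first.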
Since ALL cells in the matrix are computed (which might not actually be necessary, I have to check), an intermediate cell can reach max_distance even though the final result is lower than that. So now max_distance has a new meaning: if max_distance < final result, return max_distance.
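The new meaning of max_distance can be sketched as follows. This is a simplified stand-in with unit weights, not the actual hunmisc function; the signature is an assumption. The whole matrix is computed (two rows at a time), and the cap is applied only to the final value, never to intermediate cells.

```python
def levenshtein(s1, s2, max_distance=None):
    """Plain Levenshtein distance with an optional cap on the result."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (c1 != c2)))  # replacement
        prev = curr
    result = prev[-1]
    # Intermediate cells may exceed max_distance while the final result
    # stays below it, so only the final value is compared and capped.
    if max_distance is not None and max_distance < result:
        return max_distance
    return result
```

With this reading, levenshtein("kitten", "sitting", max_distance=2) returns 2, while max_distance=5 leaves the true distance of 3 untouched.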