yahoo / FEL
Fast Entity Linker Toolkit for training models to link entities to a knowledge base (Wikipedia) in documents and queries.
License: Apache License 2.0
Can I run FEL without Hadoop? If yes, how do I run it?
This could be a follow-up from #7:
I'm wondering if it's possible to do the same for CoherentEntityLinker. Here is where I'm currently at:
- changed EntityResult to add a Span variable
- included the variable in the map like so:
```java
List<EntityResult> candidates = felCandidates.stream().map(felResult -> {
    String wikiId = ...;
    return new EntityResult(wikiId, felResult.score, felResult.type, felResult.s);
}).collect(Collectors.toList());
```
However, the Span variable seems to hold the entire query instead of only the surface form. Is that the expected behaviour, or is there a step I'm missing?
Thanks!
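For reference, if the Span does expose character offsets into the original query (the accessor signature below is assumed for illustration and is not necessarily FEL's actual Span API), the surface form should simply be a substring of the query rather than the whole string. A minimal sketch:

```java
public class SpanSurfaceForm {
    // Hypothetical helper: given offsets that cover only the mention,
    // the surface form is the corresponding substring of the query.
    static String surfaceForm(String query, int startOffset, int endOffset) {
        return query.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        // Offsets covering only the mention should yield "Yahoo",
        // not the entire query.
        String query = "Yahoo is a company";
        System.out.println(surfaceForm(query, 0, 5)); // Yahoo
    }
}
```

If the span's offsets instead equal 0 and the query length, that would explain seeing the whole query in the Span variable.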
Dear aasish:
Thanks for your contribution to entity linking; it is a good tool.
After reading your paper and the code, I see that the FastEntityLinker class can get candidate mentions and entities, and the CoherentEntityLinkerWrapper class takes in mentions and returns the coherent entities. However, I can't find an integrated entity-linking pipeline in the code. For example: given the input sentence "Yahoo is a company headquartered in Sunnyvale, CA with Marissa Mayer as CEO", return the linked Wikipedia entities "Yahoo", "Sunnyvale, California" and "Marissa_Mayer". Could you please provide the integrated entity linking?
Thank you very much.
I have read the related paper from WSDM 2017 and the code. When using the CoherentEntityLinker class, the args given in README.md are "en/enwiki.wiki2vec.d300.compressed en/english-nov15.hash", and I have the wiki2vec and hash files. However, there are other models to load as well: ENTITIES.PHRASE.model and PHRASE.model. Could you please provide the ENTITIES.PHRASE.model and PHRASE.model files for English and Chinese?
Dear Authors,
When I try to run your tool, I encounter a "java.lang.NoClassDefFoundError: it/unimi/dsi/fastutil/io/BinIO".
It looks like the code for this class is missing from your GitHub repository. I wonder if I am using the wrong command to compile the project (mvn compile).
Regards,
zphuang
Is `mvn exec:java -Dexec.mainClass=com.yahoo.semsearch.fastlinking.FastEntityLinker -Dexec.args="zh/chinese-dec15.hash"` the right command to do fast linking for Chinese?
I ran that command and got into the interactive shell, but when I input a sentence, it does not show any entities. I tried Spanish, and the same thing happened. What could be the problem? Thanks a lot!
Hi,
I want to know: are the "Word vectors file" and "Entities word vectors file" arguments of the EntityContextFastEntityLinker class the same file?
```java
new FlaggedOption( "hash", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'h', "hash", "quasi succint hash" ),
new FlaggedOption( "vectors", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'v', "vectors", "Word vectors file" ),
new FlaggedOption( "entities", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'e', "entities", "Entities word vectors file" )
```
Thanks!
Hi
I am following the steps provided here to train my model.
I have pre-processed the datapack, but when I try to "Build Data Structures and extract anchor text", I run into a GC overhead limit.
I have even increased the MAPRED and HADOOP memory to 15 GB and provided opts for -Dmapreduce.reduce.java.opts and -Dmapreduce.reduce.memory.mb.
My system has 8 cores and 32 GB of RAM, running Java 8. This is the snippet of the command I am using:
```
hadoop \
jar target/FEL-0.1.0-fat.jar \
com.yahoo.semsearch.fastlinking.io.ExtractWikipediaAnchorText \
-Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dmapreduce.reduce.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dmapred.job.map.memory.mb=15144 \
-Dmapreduce.map.memory.mb=15144 \
-Dmapreduce.reduce.memory.mb=15144 \
-Dmapred.child.java.opts="-Xmx15g" \
-Dmapreduce.map.java.opts='-Xmx15g -XX:NewRatio=8 -XX:+UseSerialGC' \
-Dmapreduce.reduce.java.opts="-Xmx15g -XX:NewRatio=8 -XX:+UseSerialGC" \
-input wiki/${WIKI_MARKET}/${WIKI_DATE}/pages-articles.block \
-emap wiki/${WIKI_MARKET}/${WIKI_DATE}/entities.map \
-amap wiki/${WIKI_MARKET}/${WIKI_DATE}/anchors.map \
-cfmap wiki/${WIKI_MARKET}/${WIKI_DATE}/alias-entity-counts.map \
-redir wiki/${WIKI_MARKET}/${WIKI_DATE}/redirects
```
Could you please suggest why this might be happening?
Pardon me, as I am a novice to Hadoop and Java.
Thank you for providing the code.
We were trying to mine Wikipedia for our entity linker using this shell script, with the dump from 2018/05/01. We were able to generate the hash file, but surprisingly its size was only 284 MB. In contrast, the pre-trained English hash you provide (trained on the November 2015 Wikipedia) is 1.3 GB.
@aasish, could you suggest what might be going wrong? Is it because of the compression, or are we missing some entities? Is there a way to combine both hash files so that the recent entities are taken into account?
I'm fairly new to the Java/Maven ecosystem, so I'm sorry if this is completely unrelated to FEL itself.
After cloning the repository and running mvn install, running java -Xmx10G com.yahoo.semsearch.fastlinking.FastEntityLinker --help always returns the same error:
Error: Could not find or load main class com.yahoo.semsearch.fastlinking.FastEntityLinker
I tried this on both a Linux machine and a Mac.
Is there any step I'm missing?
Thanks!
P.S.: In my local Maven repository, I can find both /it/unimi/dsi/fastutil/ and /com/yahoo/FEL/FEL/.
Hey, dear aasish:
I am very curious about how to link the candidates in the result list back to their surface forms.
Currently, my stand-alone result looks like this:
```
>Trump had dinner with me tonight.
Trump had dinner with me tonight. Donald_Trump -3.6968905396651404 1331552
Trump had dinner with me tonight. Dinner -3.811408020070193 1300672
Trump had dinner with me tonight. Come_Dine_with_Me -3.871200967594029 1082062
.......
```
However, it seems that we cannot get the surface form of Donald_Trump directly from your code. In other words, could I get output like this:
Trump(Donald_Trump) had dinner with me tonight.
If that is possible, it would be very helpful for our research.
Thank you very much in advance.
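This is not a built-in FEL feature, but as a workaround sketch: once the surface form of a candidate is known (e.g. recovered from the linker's span offsets), rewriting the query is plain string manipulation. The helper below is hypothetical glue code, not part of FEL:

```java
public class SurfaceFormAnnotator {
    // Insert the linked entity label after its surface form in the query.
    // Assumes the surface form ("Trump") is already known; FEL's stand-alone
    // output shown above only prints the entity name, score, and id.
    static String annotate(String query, String surfaceForm, String entity) {
        int start = query.indexOf(surfaceForm);
        if (start < 0) return query; // mention not found; leave query untouched
        int end = start + surfaceForm.length();
        return query.substring(0, end) + "(" + entity + ")" + query.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(annotate("Trump had dinner with me tonight.",
                "Trump", "Donald_Trump"));
        // Trump(Donald_Trump) had dinner with me tonight.
    }
}
```

Note that indexOf only finds the first occurrence; for repeated mentions you would need the actual character offsets from the linker.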
How can I use FEL with my own ontology rather than Wikipedia?
Dear Author,
```
mvn clean compile exec:java -Dexec.mainClass=com.yahoo.semsearch.fastlinking.CoherentEntityLinkerWrapper -Dexec.args="en/enwiki.wiki2vec.d300.compressed en/english-nov15.hash test.txt" -Dexec.classpathScope=compile
```
When I execute this command, I find there is no class named CoherentEntityLinkerWrapper.
I also downloaded your models english-nov15.hash and enwiki.wiki2vec.d300.compressed, but I can't find the entities file. How can I use EntityContextFastEntityLinker with your trained model?
Thank you!
See: https://issues.apache.org/jira/browse/PIG-4175
As the post above says, a PIG CROSS operation followed by STORE produces non-deterministic results, and the portion of code below has the same problem:
https://github.com/yahoo/FEL/blob/master/src/main/pig/compute-graph-alias-entity-counts.pig#L137-L152
I think FEL doesn't run correctly under Pig version 0.12.0 or lower, and should require version 0.14.0 or higher.
Hello:
I want to do some entity embedding with the command below, but I don't understand the --files parameter:
```
hadoop jar FEL-0.1.0-fat.jar com.yahoo.semsearch.fastlinking.w2v.EntityEmbeddings -Dmapreduce.job.queuename=adhoc -files word_vectors#vectors E2W entity.embeddings
```
I thought it might be the input or output of the word embedding command, but it is not:
```
java com.yahoo.semsearch.fastlinking.w2v.Quantizer -i <word_embeddings> -o -h
```
Can you give me some suggestions? Thanks!
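For context: `-files` is one of Hadoop's generic command-line options, not something specific to FEL. `-files word_vectors#vectors` ships the local file `word_vectors` to every task and symlinks it in the task's working directory under the alias after the `#`, so the job can open it simply as `vectors`. A small sketch of the `local#alias` convention (the helper class is illustrative, not part of Hadoop or FEL):

```java
public class FilesFlag {
    // Split a Hadoop "-files local#alias" argument: the part before '#'
    // is the local path to ship, the part after is the symlink name the
    // task sees in its working directory. Without '#', the file name
    // itself is used as the alias.
    static String[] parse(String spec) {
        int hash = spec.indexOf('#');
        if (hash < 0) {
            return new String[] { spec, new java.io.File(spec).getName() };
        }
        return new String[] { spec.substring(0, hash), spec.substring(hash + 1) };
    }

    public static void main(String[] args) {
        String[] p = parse("word_vectors#vectors");
        System.out.println(p[0] + " -> ./" + p[1]); // word_vectors -> ./vectors
    }
}
```

So in the EntityEmbeddings command above, the mappers read the shipped word vectors through the local name `vectors`.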
Hi,
I would like to use the entity linker and I am trying to get the datasets from the links given in the readme file.
If I log in with my Yahoo account and try to submit a request for the L30 dataset, nothing happens: I am redirected to the "My dataset selection" page again, and the request does not seem to be sent.
How can I get the dataset?
Hi,
I've come up with the following question: when running FastEntityLinker on a question, it returns entity mentions with a score and an id. What does the id refer to?
I've tested Wikidata QIDs and Wikipedia page ids, and found that the id matches neither. Thanks!
Dear author:
Thank you for your contribution. I used the Chinese hash model to get entities with the com.yahoo.semsearch.fastlinking.FastEntityLinker class and got results; however, the results are not good enough. Could you do me a favor and help improve the model? I have two questions: 1. Is the result quality related to the hash model? 2. Can I train a model using my own data? Thank you very much!