Hello! Great question.
We do have a contextual candidate generator that we used for kore50 and rss500. It takes into account the contextual similarity between an entity's Wikipedia page and the sentence itself, so because the sentences are different, the candidate lists are different.
The score for a candidate is based on the features we used for this contextual generation: the similarity between the mention and the candidate entity, the overall entity popularity, and the similarity between the sentence and the entity's Wikipedia page. We only use this score for filtering the candidate lists.
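To make the three features concrete, here is a minimal sketch of a contextual candidate filter. It is not Bootleg's actual generator: the equal weights, the `difflib` string similarity for the mention feature, the log-scaled popularity, and the bag-of-words cosine for sentence/page similarity are all assumptions chosen for illustration, and the candidate dict fields (`qid`, `title`, `page_text`, `count`) are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical weights; the thread does not say how the features are combined.
WEIGHTS = {"mention": 1.0, "popularity": 1.0, "context": 1.0}


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with counts (a stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_score(mention: str, sentence: str, cand: dict, max_count: int) -> float:
    # Feature 1: string similarity between the mention and the entity title.
    mention_sim = SequenceMatcher(None, mention.lower(), cand["title"].lower()).ratio()
    # Feature 2: overall entity popularity, log-scaled into [0, 1].
    popularity = math.log1p(cand["count"]) / math.log1p(max_count) if max_count else 0.0
    # Feature 3: similarity between the sentence and the entity's Wikipedia page.
    context_sim = cosine(bag_of_words(sentence), bag_of_words(cand["page_text"]))
    return (WEIGHTS["mention"] * mention_sim
            + WEIGHTS["popularity"] * popularity
            + WEIGHTS["context"] * context_sim)


def filter_candidates(mention: str, sentence: str, candidates: list, k: int = 30) -> list:
    """Return the QIDs of the top-k candidates for this mention in this sentence."""
    max_count = max((c["count"] for c in candidates), default=0)
    ranked = sorted(candidates,
                    key=lambda c: candidate_score(mention, sentence, c, max_count),
                    reverse=True)
    return [c["qid"] for c in ranked[:k]]


# Toy candidates for the mention "Jordan" (page texts and counts are made up).
candidates = [
    {"qid": "Q810", "title": "Jordan",
     "page_text": "country in western asia", "count": 2000},
    {"qid": "Q41421", "title": "Michael Jordan",
     "page_text": "american basketball player who scored many points for the chicago bulls",
     "count": 1500},
]

print(filter_candidates("Jordan", "Jordan scored 30 points for the Bulls", candidates))
print(filter_candidates("Jordan", "", candidates))
```

With the basketball sentence, the context feature lifts Michael Jordan above the more popular, exact-title match; with an empty sentence the ranking falls back to popularity and name similarity. This is why the same mention can get different candidate lists in different sentences.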
@lorr1 Thanks for the answer! What about the score in the data/wiki_entity_data/entity_mappings/alias2qids.json file? Is it generated by averaging the scores of the same alias-entity pairs in Wikipedia anchor texts, using the same contextual candidate generator?
Thanks.
That one is used only for training, so it is not contextual. We could certainly make it contextual (and are exploring these ideas!), but we didn't use a contextual generator for training. That score is based on how often the entity occurs overall in Wikipedia, so it's a ranking by entity popularity. Note that it is not conditioned on a specific alias; it's just overall entity popularity. We found this was necessary when incorporating aliases from Wikidata that may never have been seen in Wikipedia yet are still valid aliases.
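A minimal sketch of what an alias-independent popularity score could look like when building an alias-to-candidates mapping. The input shapes, the `build_alias2qids` helper, and the `[[qid, score], ...]` output layout are assumptions for illustration, not Bootleg's actual preprocessing code. The key point it shows: every candidate's score is the entity's global Wikipedia link count, so an alias that only comes from Wikidata still gets meaningful scores.

```python
import json
from collections import Counter


def build_alias2qids(anchor_links, alias_to_candidate_qids, top_k=30):
    """Hypothetical helper.

    anchor_links: iterable of (anchor_text, qid) pairs harvested from Wikipedia links.
    alias_to_candidate_qids: alias -> set of candidate QIDs (from Wikipedia anchors
    and/or Wikidata aliases).
    """
    # Global popularity: how often each entity is linked anywhere in Wikipedia,
    # regardless of which anchor text (alias) was used for the link.
    popularity = Counter(qid for _, qid in anchor_links)
    alias2qids = {}
    for alias, qids in alias_to_candidate_qids.items():
        # Rank by popularity (descending), breaking ties by QID for determinism.
        ranked = sorted(qids, key=lambda q: (-popularity[q], q))
        alias2qids[alias] = [[q, popularity[q]] for q in ranked[:top_k]]
    return alias2qids


# Toy data: the Wikidata alias "hashemite kingdom of jordan" never appears as an anchor.
anchors = [
    ("Jordan", "Q810"),
    ("Michael Jordan", "Q41421"),
    ("MJ", "Q41421"),
    ("Jordan", "Q41421"),
    ("His Airness", "Q41421"),
]
candidates = {
    "jordan": {"Q810", "Q41421"},
    "hashemite kingdom of jordan": {"Q810"},
}

print(json.dumps(build_alias2qids(anchors, candidates), indent=2))
```

Because the score ignores which anchor text produced a link, Q810 gets the same score under the unseen Wikidata alias as under "jordan"; an alias-conditioned count would have given that alias nothing to rank with.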
Great. Got it. Well understood. Thanks!