Comments (5)
I went ahead and added your function as an example in the branch here. If you use the annotator with the extract method set to custom, it should trigger your extractor. I haven't tested it, but it should get you started.
from bootleg.
Hi!
Yes, you can do this. I have a list of possible extractors here. If you implement your own extractor function and add it to that list, you should be able to select it via this argument here.
As long as you have the same inputs/outputs, it should be possible.
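The "list of extractors plus an argument that picks one" idea can be sketched roughly as below. This is only an illustration of the pattern, not bootleg's actual code: `EXTRACTORS`, `extract`, `capitalized_word_extractor`, `my_extractor`, and the (mention, start, end) return shape are all hypothetical names and assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shared signature: text in, list of (mention, char_start, char_end) out.
Mention = Tuple[str, int, int]
Extractor = Callable[[str], List[Mention]]


def capitalized_word_extractor(text: str) -> List[Mention]:
    """Toy built-in: every capitalized whitespace-separated word is a mention."""
    mentions = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        if word[0].isupper():
            mentions.append((word, start, pos))
    return mentions


def my_extractor(text: str) -> List[Mention]:
    """Toy custom extractor: the whole text is a single mention."""
    return [(text, 0, len(text))] if text else []


# The registry: adding a new extractor is one extra dictionary entry.
EXTRACTORS: Dict[str, Extractor] = {
    "capitalized": capitalized_word_extractor,
    "custom": my_extractor,
}


def extract(text: str, method: str = "capitalized") -> List[Mention]:
    """Look up the extractor by name and run it."""
    if method not in EXTRACTORS:
        raise ValueError(f"Unknown extract method: {method!r}")
    return EXTRACTORS[method](text)
```

Because every entry shares one signature, the caller only needs the name; that is why keeping the same inputs and outputs matters.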
Hi,
Thanks a lot for the quick response. :)
My extractor function uses flair. It takes a string as input and returns the extracted entities in a pandas dataframe.
def entity_recognition(text):
    """Given a text document, run a NER on it using flair and return a dataframe with the following columns
    text: actual raw text input
    entity: identified entity text
    entity_start: character start position of entity in raw text
    entity_end: character end position of entity in raw text
    """
    import pandas as pd
    from flair.data import Sentence
    from flair.models import SequenceTagger
    from tqdm import tqdm

    tagger_fast = SequenceTagger.load('ner-ontonotes-fast')
    sentence = Sentence(text)
    tagger_fast.predict(sentence, mini_batch_size=16)
    # Serialize the tagged sentence once instead of on every field access.
    found = sentence.to_dict(tag_type='ner')['entities']
    keep_labels = {'ORG', 'PERSON', 'GPE'}
    entities = []
    for ent in tqdm(found):
        # The first whitespace-separated token of the label string is the tag name.
        label = str(ent['labels'][0]).split()[0]
        # Exact match against the wanted tags (`label in 'ORG'` was a substring test).
        if label in keep_labels:
            entities.append([str(ent['text']), ent['start_pos'], ent['end_pos']])
    entities = pd.DataFrame(entities, columns=['entity', 'entity_start', 'entity_end'])
    entities['text'] = text
    return entities
Can you please help me with the changes I need to make to this function so that it can work with bootleg?
Thanks in advance.
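One change that may be needed is converting the character offsets in `entity_start`/`entity_end` into word-level spans, in case the target extractor interface expects word indices. That interface is not shown in this thread, so the sketch below is an assumption: `char_spans_to_word_spans` and `rows_to_mentions` are hypothetical helpers, words are taken to be whitespace-separated tokens, and rows are plain (entity, start, end) tuples such as those the dataframe holds.

```python
def char_spans_to_word_spans(text, char_spans):
    """Map [start, end) character offsets to [start, end) word indices.

    Words are the whitespace-separated tokens of `text`. A character span
    that touches no word maps to None.
    """
    # Record the character range of every whitespace-separated word.
    word_bounds = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        word_bounds.append((start, pos))

    word_spans = []
    for c_start, c_end in char_spans:
        # Indices of all words overlapping the character range.
        hits = [i for i, (w_start, w_end) in enumerate(word_bounds)
                if w_start < c_end and w_end > c_start]
        word_spans.append((hits[0], hits[-1] + 1) if hits else None)
    return word_spans


def rows_to_mentions(text, rows):
    """Turn (entity, char_start, char_end) rows into (entity, word_span) pairs."""
    spans = char_spans_to_word_spans(text, [(s, e) for _, s, e in rows])
    return [(entity, span) for (entity, _, _), span in zip(rows, spans)
            if span is not None]
```

With a dataframe `df` returned by `entity_recognition`, the rows could be built with `df[['entity', 'entity_start', 'entity_end']].itertuples(index=False, name=None)`. Whether the result then needs further normalization depends on the extractor contract in the library.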
Hi @lorr1, thanks a lot for your help. You are so nice and awesome :)
I am able to run this code using the Flair NER engine.
However, if I need to make more changes, can I push them directly to the branch you created, or do I need to raise a PR?
How about you raise PRs? I'll pretty much approve everything, but I'd like to keep track of what you're finding difficult/useful to implement.
Thanks!