
Comments (5)

lorr1 commented on September 14, 2024

I went ahead and added your function as an example in the branch here. If you use the annotator and choose custom as the extract method, it should trigger your extractor. I haven't tested it, but it should get you started.

from bootleg.

lorr1 commented on September 14, 2024

Hi!

Yes, you can do this. I have a list of possible extractors here. If you implement your own extractor function and add it there, you should be able to select it via this argument here.

As long as you have the same inputs/outputs, it should be possible.

coolcoder001 commented on September 14, 2024

Hi,
Thanks a lot for the quick response. :)
My extractor function uses flair. It takes a string as input and returns the extracted entities in a pandas dataframe.

def entity_recognition(text):
    """Given a text document, run NER on it using flair and return a dataframe with the following columns
    text: actual raw text input
    entity: identified entity text
    entity_start: character start position of entity in raw text
    entity_end: character end position of entity in raw text
    """
    import pandas as pd
    from flair.data import Sentence
    from flair.models import SequenceTagger
    from tqdm import tqdm  # used for the progress bar below; was missing

    # Loading the tagger on every call is slow; consider loading it once
    # outside the function if this is called repeatedly.
    tagger_fast = SequenceTagger.load('ner-ontonotes-fast')
    sentence = Sentence(text)
    tagger_fast.predict(sentence, mini_batch_size=16)

    # Keep only these entity types. Exact set membership replaces the original
    # `in 'ORG'` checks, which were substring tests on the string 'ORG'.
    keep_labels = {'ORG', 'PERSON', 'GPE'}
    entities = []
    # Convert the sentence once instead of calling to_dict() per field access.
    for ent in tqdm(sentence.to_dict(tag_type='ner')['entities']):
        # As in the original, the tag is the first token of str(label).
        label = str(ent['labels'][0]).split()[0]
        if label in keep_labels:
            entities.append([str(ent['text']), ent['start_pos'], ent['end_pos']])

    entities = pd.DataFrame(entities, columns=['entity', 'entity_start', 'entity_end'])
    entities['text'] = text
    return entities

Can you please help me with the changes I need to make to this function so that it can work with bootleg?

Thanks in advance.
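
One likely adaptation is converting the character offsets above into word-level spans. The following is only a minimal sketch: this thread does not show the inputs and outputs bootleg's extractors expect, so the return shape (a list of mention strings plus a list of [start, end) word spans over whitespace-split tokens) and the function name to_mentions_and_word_spans are assumptions for illustration, not bootleg's API.

```python
import re
from typing import List, Tuple


def to_mentions_and_word_spans(
    text: str, entities: List[Tuple[str, int, int]]
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Hypothetical adapter: (entity, char_start, char_end) rows -> mentions and word spans.

    Word spans are [start, end) indices over whitespace-split words of `text`.
    """
    # Character extent of every whitespace-delimited word in the text.
    word_bounds = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    mentions, spans = [], []
    for entity, c_start, c_end in entities:
        # Indices of the words that overlap the entity's character range.
        hits = [
            i for i, (w_start, w_end) in enumerate(word_bounds)
            if w_start < c_end and w_end > c_start
        ]
        if hits:  # skip entities that do not overlap any word
            mentions.append(entity)
            spans.append((hits[0], hits[-1] + 1))
    return mentions, spans


example_text = "Barack Obama visited Microsoft in Seattle"
example_rows = [("Barack Obama", 0, 12), ("Microsoft", 21, 30), ("Seattle", 34, 41)]
example_mentions, example_spans = to_mentions_and_word_spans(example_text, example_rows)
```

With the dataframe returned by entity_recognition, the rows could be passed in as tuples of its entity, entity_start and entity_end columns. Whether bootleg wants word spans, character spans, or something else should be checked against the existing extractors before relying on this.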

coolcoder001 commented on September 14, 2024

Hi @lorr1, thanks a lot for your help. You are so nice and awesome :)

I am able to run this code using the Flair NER engine.

However, if I need to make more changes, can I push them directly to the branch you created, or do I need to raise a PR?

lorr1 commented on September 14, 2024

How about you raise PRs? I'll pretty much approve everything, but I'd like to keep track of what you're finding difficult/useful to implement.

Thanks!
