asyml / fortehealth

The project is in the incubation stage and still under development. ForteHealth is a flexible and powerful ML workflow builder for biomedical and clinical scenarios. This is part of the CASL project: http://casl-project.ai/

License: Apache License 2.0

Python 100.00%

biomedical-named-entity-recognition clinical-nlp clinical-text-processing data-processing deep-learning information-retrieval machine-learning natural-language natural-language-processing python

fortehealth's People

piyush13y kialan leolty bhaskar2443053 nikhilranjan7

fortehealth's Issues

Some bugs in Quick Start Guide in README.md

Describe the bug
I followed the Python code in the Quick Start Guide and found some bugs that need to be fixed.
In the following code:

pl.add(SpacyProcessor(), config={
    processors: ["sentence", "tokenize", "pos", "ner", "umls_link"],
    medical_onto_type: "ftx.medical.clinical_ontology.MedicalEntityMention"
    umls_onto_type: "ftx.medical.clinical_ontology.UMLSConceptLink"
    lang: "en_ner_bc5cdr_md"
    })

The keys processors, medical_onto_type, umls_onto_type, and lang need double quotation marks,
and commas are needed between the entries.

I will open a pull request with the code that I debugged successfully.
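
For reference, here is a sketch of the corrected configuration with the keys quoted and the commas added. The dictionary is built on its own so the snippet runs without ForteHealth installed; the variable name `spacy_config` is only illustrative, and the `pl.add(...)` call from the README appears as a comment.

```python
# Corrected config: every key is a quoted string and entries are comma-separated.
spacy_config = {
    "processors": ["sentence", "tokenize", "pos", "ner", "umls_link"],
    "medical_onto_type": "ftx.medical.clinical_ontology.MedicalEntityMention",
    "umls_onto_type": "ftx.medical.clinical_ontology.UMLSConceptLink",
    "lang": "en_ner_bc5cdr_md",
}

# In the README pipeline this dictionary would be passed as:
# pl.add(SpacyProcessor(), config=spacy_config)
print(sorted(spacy_config))
```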

To Reproduce
Steps to reproduce the behavior:

follow all the instructions in README.md (copy)
run

Desktop (please complete the following information):

OS: Windows
Version: 10
Compiler: Spyder

Add and integrate visualization to the demo pipeline

Is your feature request related to a problem? Please describe.
We want to be able to visualize the findings of our medical pipeline, and use UIMA Cas Viewer to show annotations for input data in a graphical interface instead of building a GUI from scratch. Stave isn't stable at this point either for our use case.

Describe the solution you'd like
We might need third-party adapters to leverage UIMA's Annotation Viewer (written in Java) for our use case.

Describe alternatives you've considered
@drishtiramesh looked into Jython, Python annotator, and dkpro-cassis as possible options to implement this translation from Python to Java.

Multiple occurrence of same abbreviation

Describe the bug
The Scispacy abbreviation detector processor outputs multiple occurrences of the same abbreviation.

To Reproduce
Run medical_text_understanding example

Expected behavior
An abbreviation should not be repeated within the same sentence.
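
A minimal sketch of one possible fix, assuming the duplicates are repeated entries for the same character span. The `Abbrev` record and `dedupe_abbreviations` helper are hypothetical names used for illustration; they are not part of ForteHealth or scispaCy.

```python
from typing import List, NamedTuple


class Abbrev(NamedTuple):
    """Hypothetical record for one detected abbreviation."""
    begin: int
    end: int
    short_form: str
    long_form: str


def dedupe_abbreviations(found: List[Abbrev]) -> List[Abbrev]:
    """Keep one entry per (begin, end) span, preserving first-seen order."""
    seen = set()
    unique = []
    for item in found:
        key = (item.begin, item.end)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


found = [
    Abbrev(35, 39, "SBMA", "Spinal and bulbar muscular atrophy"),
    Abbrev(35, 39, "SBMA", "Spinal and bulbar muscular atrophy"),  # duplicate
    Abbrev(159, 161, "AR", "androgen receptor"),
]
unique = dedupe_abbreviations(found)
print(len(unique))
```

If the intended rule is instead one entry per short form per sentence, the key could be changed to a sentence index plus `short_form`.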

Add support for context analysis - Status of a named entity

Is your feature request related to a problem? Please describe.
As part of our medical NLP pipeline, we have implemented one part of the context analysis module, i.e. NegationContextAnalyzer, which detects negation of named entities. Now we need the other aspect of it, StatusContextAnalyzer, which should work analogously to its CTakes counterpart. This processor should get the status of every identified named entity in the input text.

To begin with, status of a named entity could be one of these 3 probable values (with reference to this)

history_status
family_history_status
probable_status

Describe the solution you'd like
CTakes leverages FSMs (finite state machines) for exactly this problem. However, any reliable algorithm can be used for this purpose. You can go through this PPT for better understanding of the working of CTakes pipeline that we are trying to adopt and improve. You can further read up here to gain more insights into the functionality of this processor.

This file can also be looked at to get more examples of what's expected from status context analysis.

Add support for Temporal Mentions and Normalizer

Huggingface has pretrained models to detect Temporal mentions in a chunk of text, which can be leveraged in a processor for our use case.
Normalizer: To incorporate different types of date formats and ensure consistency, etc.

You can look at some temporal taggers here
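
A minimal sketch of the normalizer idea, using only the standard library rather than a Hugging Face model. The list of accepted formats and the function name `normalize_date` are assumptions for illustration; ambiguous inputs such as 03/04/2021 are read as month/day/year here.

```python
from datetime import datetime
from typing import Optional

# Illustrative formats only; a real normalizer would need a broader list.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(text: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


print(normalize_date("03/14/2021"))
print(normalize_date("14.03.2021"))
print(normalize_date("sometime last week"))
```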

remove python 3.6 from workflow

Describe the bug
Due to the recent bug with Python 3.6 and Ubuntu, we have decided to remove the test for Python 3.6 from the ForteHealth workflow.

Add support for Coreference Resolution

Is your feature request related to a problem? Please describe.
Issues with coreference resolution are one of the most frequently mentioned challenges for information extraction from the biomedical literature. We plan to add support for coreferencing into our pipeline through a CoreferenceProcessor and this issue will help you get the implementation kickstarted.

Describe the solution you'd like
We will be developing a wrapper around Huggingface's NeuralCoref library to suit our use case and leverage their pre trained model for coreference resolution purposes. It uses spaCy with Neural Networks in the backend. The following is the link to the GitHub repo for the NeuralCoref project:
https://github.com/huggingface/neuralcoref

This is the blogpost by Huggingface to better describe their coreference resolution:
https://medium.com/huggingface/state-of-the-art-neural-coreference-resolution-for-chatbots-3302365dcf30

Please give the GitHub repository Readme file and the blogpost a read as it would help implementing the wrapper around NeuralCoref.
Their model is trained on English language (non biomedical corpus). The ontologies pertaining to this issue have already been defined, i.e. CoreferenceGroup, EntityMention, MedicalEntityMention. CoreferenceGroup currently works with EntityMention members, and we might have to translate/merge those as MedicalEntityMention for our medical pipeline.

doc._.coref_clusters <=> CoreferenceGroup
doc._.coref_clusters[1].mentions <=> EntityMentions

(Building and Generating Ontologies documentation)

As is clear from the GitHub repository, if doc._.has_coref is True, doc._.coref_clusters returns a list of all coref clusters, each of which would in turn define a CoreferenceGroup. NeuralCoref mentions are all Span objects, which implies it's straightforward to define EntityMentions/MedicalEntityMentions from these. These in turn can then be used to define a CoreferenceGroup.

Regarding config for the processor, the user can provide values for greedyness, max_dist, blacklist, etc. These parameters are described in the GitHub repository README, which can be consulted for more details.

Example call:

pl.add(
    CoreferenceProcessor(),
    {
        "lang": "en_core_web_sm",
        "greedyness": 0.75,
        "max_dist": 50,
        "max_dist_match": 500,
    },
)

Another thing that we will have to ensure is that we must install neuralcoref along with forte-medical. Hence, it will have to be added to the setup.py and requirements files.

Also, make sure you add unit test cases for the processor. You can use any of the test files in https://github.com/asyml/ForteHealth/tree/master/tests/forte_medical/processors as a reference.

P.S. You can follow NegationContextAnalyzer processor for the structure and code design. It can be used as the template processor to refer to when implementing a new one.

Describe alternatives you've considered
Several papers and a couple of GitHub repositories were consulted. E2E was another alternative for coreference resolution, but Huggingface's NeuralCoref seems to be straightforward to implement, and since we already have spaCy-based processors in our code base, it should be easier to write this wrapper.

Add separated example for ICDCodingProcessor

Related problem
As my starter project, adding a separated example for ICDCodingProcessor.

Solution
Modifying the integrated example for mimic-iii dataset.

Support Bio NER using Stanza processor

Is your feature request related to a problem? Please describe.
We currently have stanza_processor implemented in forte-wrappers repo, which supports these components - tokenize, pos, lemma, depparse. However, we want to incorporate NER functionality to our processor as well. Stanza, by itself does support Bio NER and we can just leverage that for our use case as well.

You can refer to the following link on tutorial to use bioNER with stanza. On understanding these examples, all you have to do is incorporate that into Forte structure. If you go through stanza_processor, you will see how the other functionalities work through stanza.Pipeline(). You can follow the same design to add NER component to the processor.

Link: https://stanfordnlp.github.io/stanza/biomed_model_usage

Also, you can refer to the NegationContextAnalyzer processor to help you write the code according to Forte principles.

Lastly, stanza_processor is defined in the forte-wrappers repository and hence the changes will effectively be in that repo and the PR can link this issue.

mimic-iii pipeline

Is your feature request related to a problem? Please describe.
Adapt the clinical pipeline from the Forte example into this package.

Describe the solution you'd like
Place the components (ontology, dataset reader, processor, etc.) into the corresponding folders in forte_medical/, and provide the pipeline example in examples/mimic-iii/.

Describe alternatives you've considered
Given that it's the first example in this package, feel free to change the code framework if necessary.

Fix ICD processor for forte version 0.3.0.dev3

Describe the bug
The ICD processor test started failing with version forte==0.3.0.dev3

CT image reader

Is your feature request related to a problem? Please describe.

This issue is to develop a CT image reader, which will use pydicom lib.

Describe the solution you'd like
This reader will output the CT image as a NumPy array in Hounsfield units (HU).
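
A minimal sketch of the HU conversion step, assuming the usual DICOM linear rescale (HU = stored value * slope + intercept). It uses plain Python lists so it runs without pydicom or NumPy; the function name `to_hounsfield` is hypothetical. In an actual reader the same formula would be applied to the dataset's `pixel_array` using its `RescaleSlope` and `RescaleIntercept` values.

```python
from typing import List


def to_hounsfield(pixels: List[List[int]], slope: float, intercept: float) -> List[List[float]]:
    """Apply the linear rescale HU = value * slope + intercept to every pixel."""
    return [[value * slope + intercept for value in row] for row in pixels]


# Toy 2x2 "image" of stored values; slope and intercept are typical CT values.
raw = [[0, 1024], [2048, 3072]]
hu = to_hounsfield(raw, slope=1.0, intercept=-1024.0)
print(hu)
```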

Xray_image_reader

This issue is created to add an x-ray image reader for the x-ray classification task.

This feature will read an image from a given path and add it to the datapack.

Add ICD coding support to pipeline

Is your feature request related to a problem? Please describe.
ICD coding is a process of assigning the International Classification of Disease diagnosis codes to clinical/medical notes documented by health professionals (e.g. clinicians).
We currently do not support automatically detecting the ICD codes given a clinical excerpt. This issue explains what is expected from the to-be-developed ICDCodingProcessor for the pipeline.

Describe the solution you'd like
Huggingface has a few models that can be used for this particular use case and we will be leveraging those into our processor. Along with that, new ontologies will have to be defined, that will then be used by the processor and data packs to process and store the ICD codes for any clinical excerpt.

To begin with, we should define the ontologies that will be required for this processor. A new parent ontology has to be defined under forte.data.ontology.top.Annotation. MedicalArticle would be the parent ontology, which would represent the whole text of a discharge note, etc. ICDCode would be the child ontology, with a couple of attributes: code (a string) and version (an int). These can then be used to store coding information as such:
example input: "Patient has been diagnosed with lung tuberculosis"
example output ontology:

- MedicalArticle
  - ICDCode
      - code "A15.0"
      - version 10

(Building and Generating Ontologies documentation)

Now, moving on to the processor and its implementation. We want to keep it configurable in the same way as our NER processors, so the actual model used by the processor will be passed through the config rather than hardcoded, to keep our processors modular and configurable.

pl.add(
    ICDCodingProcessor(),
    {
        "model": "AkshatSurolia/ICD-10-Code-Prediction",
    },
)

This pretrained ICD coding model can be used as one of the models for ICD coding. The link shows how results are fetched from the model for a given input. What's important here is to ensure that the processor can work with different models, in case we extend support to multiple models going forward.

P.S. You can follow NegationContextAnalyzer processor for the structure and code design. It can be used as the template processor to refer to when implementing a new one.

Describe alternatives you've considered
A few other models and research papers were considered. This particular approach seems to be the one to go with given the simplicity of implementation.

Create an example for MIMIC-III clinical note pipeline.

I had this idea because I wanted to have a pipeline that had the ability to cover all of our processors (in the NLP field) as much as possible. And I think the mimic-iii data satisfies that.

In this example, we should try to use all the processors we have. For example, suppose our sample data is selected from a patient's self-report, query, or clinical diagnosis records (maybe a COVID-19 patient) that describe their physical condition, e.g., with A symptoms and without B symptoms (Negation Context Detection); we then give a diagnosis based on the symptom description (ICD Coding). The user description may include more specific times, such as how it was last night or last month, so the example can be extended to the temporal domain. (I know the temporal processors may not be complete; we can just work with what we currently have.)

But it may be hard to find a piece of data that covers all the processors, for this issue, maybe we can just concatenate them to achieve what we want.

Possible included components:

Sentence Segmenter
Tokenizer
Bio NER Tagger
Negation Context
ICD Coding
Temporal Mention Tagging
Temporal Relation Extraction
Deidentification

(Just ignore the processors we do not have currently)

Add a search engine linking with Stave using Streamlit

Add an example with the following features:

a search engine using streamlit.
link the search results to relevant stave documents.

create a new branch for storing data samples

Is your feature request related to a problem? Please describe.
This issue is to create a new branch to store the sample data used by the test code.

Describe the solution you'd like
This will keep the main branch lightweight.

Resolve ontology depth issue

Describe the bug
Make sure that __init__.py is not present at any depth of the ftx/ folder that contains no ontologies; otherwise an error is thrown due to a conflict with Forte's ftx/.

Also, update the config file of the demo pipeline for the config-driven SpacyProcessor.

Fix abbreviation detector in Scispacy processor

Describe the bug
Incorrect index for begin and end in abbreviation detection processor

To Reproduce
Steps to reproduce the behavior:
Run Scispacy processor for abbreviation detection

Expected behavior
Example:

"Spinal and bulbar muscular atrophy (SBMA) is an
inherited motor neuron disease caused by the expansion
of a polyglutamine tract within the androgen receptor (AR)."

long_form = Spinal and bulbar muscular atrophy
The long_form stored in tmp_abrv.text is just a single character due to incorrect indexing.

Create an example for building bio NER pipeline

Describe the solution you'd like

In ForteHealth, we incorporate ScispaCy for bio NER annotation. As the very first example, I think we can simply create a pipeline for bio NER annotation. The demo from scispacy is here

In scispacy, the model en_ner_bc5cdr_md annotates Disease and Chemical, and the model en_ner_bionlp13cg_md annotates Cancer, Organ, etc. We can also show this by using different configurations to build the pipeline.

Possible included components:

Sentence Segmenter
Tokenizer
Bio NER Tagger

processor for CT image windowing task

Is your feature request related to a problem? Please describe.
A new processor to support windowing of CT images.

Describe the solution you'd like
Windowing of CT images.
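
A minimal sketch of what windowing could look like, assuming the common center/width formulation: values are clipped to [center - width/2, center + width/2] and rescaled to the range 0 to 1. The function name `apply_window` and the example window (center 40, width 400) are illustrative assumptions, and plain lists are used instead of NumPy arrays so the snippet runs on its own.

```python
from typing import List


def apply_window(hu_rows: List[List[float]], center: float, width: float) -> List[List[float]]:
    """Clip HU values to the window and rescale them linearly to [0.0, 1.0]."""
    if width <= 0:
        raise ValueError("width must be positive")
    low = center - width / 2.0
    high = center + width / 2.0
    windowed = []
    for row in hu_rows:
        # Clip each value into [low, high], then map low -> 0.0 and high -> 1.0.
        windowed.append([(min(max(v, low), high) - low) / (high - low) for v in row])
    return windowed


# Air, soft tissue, and dense bone under an illustrative window.
result = apply_window([[-1000.0, 40.0, 1000.0]], center=40.0, width=400.0)
print(result)
```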

numpy issue with xray processor and scispacy processor

Describe the bug
NumPy throws the error "AttributeError: module 'numpy' has no attribute 'object'" (the np.object alias was removed in NumPy 1.24).

Downgrading NumPy to version 1.21.6 resolved the issue.

Create an example for medical text understanding

The idea is mainly for Abbreviation detection and Hyponym detection.

For example, the sample text is from Wikipedia. Source:

Sample text:
Appropriately treating underlying illnesses (such as HIV/AIDS, diabetes mellitus, and malnutrition) (HYPONYM) can decrease the risk of pneumonia.[24][84][87] In children less than 6 months of age, exclusive breast feeding reduces both the risk and severity of disease.[24] In people with HIV/AIDS and a CD4 count of less than 200 cells/uL the antibiotic trimethoprim/sulfamethoxazole decreases the risk of Pneumocystis pneumonia [88] and is also useful for prevention in those that are immunocompromised but do not have HIV.[89]

Abbreviation includes HIV, AIDS, CD4, …

Possible included components:

Abbr detection
hyponym detection
Coreference resolution

(Other processors that can help text understanding can be added as well)

Add support for Temporal Relations (To be updated)

Is your feature request related to a problem? Please describe.

Events are linked together through a variety of temporal structures. The temporal relations are expressed both explicitly, through words like after, and implicitly through inference. Extracting these sorts of temporal structures is crucial for an understanding of the text. Machine reasoning requires an explicit representation of the temporal structure. Such an explicit representation can be formed by identifying specific words or phrases as the event anchors of the structure, and then drawing explicit temporal relation links between the various events. Examples are given below:

Describe the solution you'd like
CTakes used SVM-based temporal relation annotators, which achieved an F-score of 0.589. State-of-the-art results for event-time relations were achieved with our neural network approaches. All the annotators were trained and tested on colon cancer notes from the THYME data set. A similar module is expected, using any reliable algorithm. Please find some resources to refer to below.

Additional Resources

Apache CTakes Summary PPT
Temporal Relations Module in CTakes
Temporal Relations CTakes Github
Savova, Guergana et al. “Towards temporal relation discovery from the clinical narrative.” AMIA ... Annual Symposium proceedings. AMIA Symposium vol. 2009 568-72. 14 Nov. 2009
Lin, Chen et al. “Multilayered temporal modeling for the clinical domain.” Journal of the American Medical Informatics Association : JAMIA vol. 23,2 (2016): 387-95. doi:10.1093/jamia/ocv113

Support Medical Note Deidentification

Is your feature request related to a problem? Please describe.
Deidentification task: detect and mask out the private information (e.g., names, dates) in a document.

Describe the solution you'd like
Apply a NER-style model to detect spans that include private information.
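
A minimal sketch of the masking step only, assuming some NER-style model has already produced non-overlapping character spans with labels. The function name `mask_spans`, the span format, and the example note are hypothetical.

```python
from typing import List, Tuple


def mask_spans(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (begin, end, label) span with a "[label]" placeholder."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for begin, end, label in sorted(spans, reverse=True):
        text = text[:begin] + "[" + label + "]" + text[end:]
    return text


note = "John Smith was admitted on 2020-01-05."
detected = [(0, 10, "NAME"), (27, 37, "DATE")]  # stand-in for model output
masked = mask_spans(note, detected)
print(masked)
```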

Merging clinical ontologies

Is your feature request related to a problem? Please describe.
There are two clinical ontology files- clinical_onto.json and clinical_ontology.json.

Describe the solution you'd like
Merging the two files into one.

Some top level bug fixes

There are a few top-level things in ForteHealth that need to be fixed:

Several names still contain “forte_medical”, such as the package, the ontologies, the test folder, and the cli folder.
The entries in “install_required” are not actually required: https://github.com/asyml/ForteHealth/blob/master/setup.py#L24

Xray processor example

Is your feature request related to a problem? Please describe.
This issue is to add an Xray image classification example using the XrayProcessor and Xrayreader.

Fix hyponym detector in Scispacy processor

Describe the bug
The parent and child strings are not stored correctly in the processor ontology. Only the relation phrase, e.g., "such as", can be tested.

To Reproduce
Steps to reproduce the behavior:
Run Scispacy processor for hyponym detection

Expected behavior
Example:

Keystone plant species such as fig trees are good for the soil.
Real parent = Keystone plant species
Real child = fig trees
Real Relation = such as

But the scispacy processor stores:
Stored parent = Key
Stored child = fi
Stored relation = such as
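
The truncated values are consistent with a phrase's length in tokens being used as its length in characters (3 tokens gives "Key", 2 tokens gives "fi"). That is an inference from the example, not a confirmed diagnosis. A minimal sketch of converting a token span to a character span, using whitespace tokenization and hypothetical helper names:

```python
from typing import List, Tuple


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Return (begin, end) character offsets for each whitespace-separated token."""
    offsets = []
    cursor = 0
    for token in text.split():
        begin = text.index(token, cursor)
        end = begin + len(token)
        offsets.append((begin, end))
        cursor = end
    return offsets


def token_span_text(text: str, offsets: List[Tuple[int, int]], start_tok: int, end_tok: int) -> str:
    """Slice the text covered by tokens start_tok (inclusive) to end_tok (exclusive)."""
    begin = offsets[start_tok][0]
    end = offsets[end_tok - 1][1]
    return text[begin:end]


sentence = "Keystone plant species such as fig trees are good for the soil."
offsets = token_offsets(sentence)
parent = token_span_text(sentence, offsets, 0, 3)
child = token_span_text(sentence, offsets, 5, 7)
print(parent, "|", child)
```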

documentation error correction

Describe the bug
fortex/health/readers/xray_image_reader.py (https://github.com/asyml/ForteHealth/pull/81/files#diff-1768718f2d2c35efa3726cac628cf25780923862b95313b0ccd2d9f33126f14b)

There are a few minor comment errors in the documentation of the above file.

NER Demo

Is your feature request related to a problem? Please describe.
Implement a demo for NER with Streamlit.

Reference

Update the legends in Stave to show Disease, Medical, etc.

As mentioned in the meeting.

Add an example for clinical pipeline and stave

Is your feature request related to a problem? Please describe.
We need an example in ForteHealth to show how we incorporate the clinical pipeline and Stave.

Describe the solution you'd like
Here is the reference: asyml/forte#793

Missing "setup.py" when pip install forte-wrappers from git

I am trying the example in examples/mimic_iii/. When I follow the README and run this command:

pip install git+https://[email protected]/asyml/forte-wrappers#egg=forte-wrappers[elastic,spacy]

pip shows this error:

ERROR: File "setup.py" not found for legacy project forte-wrappers[elastic,spacy] from git+https://****@github.com/asyml/forte-wrappers#egg=forte-wrappers[elastic,spacy].

It seems that setup.py is missing in the forte-wrappers repo.

However, installing from PyPI works well. Maybe we need to replace the above command with this one:

pip install forte.spacy forte.elastic

Add ontology definitions for temporal tagger & normalizer

These definitions are required for the new processor to be implemented for temporal tagging and normalizing.
With reference to #48

Implement a scispaCy processor as wrapper

Is your feature request related to a problem? Please describe.
Two new tasks to be incorporated into medical pipeline are Abbreviation Detection and Hyponym Detection. We use scispaCy to implement these 2 tasks. Hence, we need a processor to wrap all the scispaCy functionalities that we might require in our pipeline.

Describe the solution you'd like
The task is to develop a processor called ScispacyProcessor which wraps all the scispaCy methods that will be used for these 2 tasks and any future task that wraps scispaCy functionalities. Other than just writing the wrapper for this, we will also need to generate new ontologies which the wrappers will be utilizing. Please ping me on slack, or just add a comment here, once you start working on this so I can update the ontologies that we might require for these tasks here.
(Ontology generation will be handled in #25 )

Annotation
- Abbreviation
  - long_form

Link
- Hyponym
  - hyponym_link

While implementing this processor, you can refer to the NegationContextAnalyzer processor for the practices and design principles that we should follow. All the processors in Forte have a similar design since they are all modular components of Forte as a pipeline. For any queries, feel free to add comments in this issue.

You can also refer to the following link for help on how scispacy's abbreviation detection and hyponym detection are used out of the box.
https://pythonlang.dev/repo/allenai-scispacy/

Update README.md and other files

Is your feature request related to a problem? Please describe.
Update the readme to be informative for its users, so the repository can go public.
Also add .md files for Contributing and code of conduct just like Forte.

xray_image_classification_processor

This issue is created to write a processor for x-ray image classification.

We will be using the HuggingFace-based ViT pretrained model https://huggingface.co/nickmuchi/vit-finetuned-chest-xray-pneumonia for this task.

We have verified the accuracy claim of 95.51% for this model on https://data.mendeley.com/datasets/jctsfj2sfn/1

This processor and the reader from issue #59 will help create an x-ray image classification application using the ForteHealth pipeline.

Create ontology for ICDCoding

Is your feature request related to a problem? Please describe.
According to #14, we should define the ontologies that will be required for this processor. A new parent ontology has to be defined under forte.data.ontology.top.Generics. MedicalArticle would be the parent ontology, which would represent the whole text of a discharge note, etc. ICDCode would be the child ontology, with a couple of attributes: code (a string) and version (an int). These can then be used to store coding information as such:
example input: "Patient has been diagnosed with lung tuberculosis"
example output ontology:

- MedicalArticle
  - ICDCode
      - code "A15.0"
      - version 10

Add ontology config for med pipeline demo

Describe the bug
The default for Spacy was medical repo's medical ontology. However, now we are making the change to use forte main repo's medical ontologies as default. Hence, this change is required for our demo pipeline to work without exceptions. We need to pass the correct medical ontology to spacy through config.

Rectify ICD Coding ontologies

Describe the bug
Current ICD Coding ontologies aren't as they're supposed to be and need fixing. This will also ensure proper working in tandem with Stave visualizer.

MedicalArticle should be inherited from Annotation
ICDCode should be an attribute of the article.
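
A rough sketch of the intended shape, using a plain dataclass as a stand-in rather than Forte's generated ontology classes. The attribute names `icd_code` and `icd_version` are assumptions for illustration; the example values come from the earlier ICD coding issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MedicalArticle:
    """Stand-in for an Annotation-like span that carries its ICD code as attributes."""
    begin: int
    end: int
    icd_code: Optional[str] = None     # hypothetical attribute name
    icd_version: Optional[int] = None  # hypothetical attribute name


text = "Patient has been diagnosed with lung tuberculosis"
article = MedicalArticle(begin=0, end=len(text), icd_code="A15.0", icd_version=10)
print(article)
```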

Search engine demo

Is your feature request related to a problem? Please describe.
Implement a search engine demo for the MIMIC-III example with Streamlit.

Reference

A tutorial

Add support for Negation Context analysis to pipelines

Is your feature request related to a problem? Please describe.
We want to incorporate negation context analysis into our pipeline so as to infer the context in which a particular EntityMention is being discussed. It's vital that the polarity of negation is extracted because, in clinical notes, it decides the eventual diagnosis.

Describe the solution you'd like
The negspacy library could be used to run negation context analysis (easier to plug into Forte wrappers). Alternatively, something along the lines of using FSMs to detect negation could be developed (analogous to ctakes), or NegEx, which uses regular expressions, could be used.
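
To illustrate the trigger-based idea behind NegEx-style approaches, here is a heavily simplified sketch. It is not negspacy or the actual NegEx algorithm; the trigger list, the terminator list, the window size, and the function name `is_negated` are all assumptions chosen for the example.

```python
import re

NEGATION_TRIGGERS = ("no", "not", "denies", "without")  # tiny illustrative list
TERMINATORS = ("but", "however")  # words that end a negation scope


def is_negated(sentence: str, entity: str, window: int = 5) -> bool:
    """Return True if a trigger occurs within `window` tokens before the entity."""
    tokens = re.findall(r"\w+", sentence.lower())
    target = re.findall(r"\w+", entity.lower())
    n = len(target)
    if n == 0:
        return False
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != target:
            continue
        # Walk backwards from the entity until the window or a terminator ends the scope.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            if tokens[j] in TERMINATORS:
                break
            if tokens[j] in NEGATION_TRIGGERS:
                return True
    return False


sentence = "Patient denies chest pain but reports fever."
print(is_negated(sentence, "chest pain"), is_negated(sentence, "fever"))
```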

Optimize index finding in scispacy processor

Is your feature request related to a problem? Please describe.
Optimize the find_index function in the scispacy processor, as it currently scans the whole sentence for each item.

Describe the solution you'd like
Accumulate all the items from hearst_patterns, then iterate over them as the input pack text is traversed. This way we don't have to traverse the whole input pack text for each item.
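
A minimal sketch of that idea, assuming the accumulated items are already in the order in which they appear in the text. The function name `find_all_in_one_pass` is hypothetical and is not the processor's existing find_index.

```python
from typing import List, Tuple


def find_all_in_one_pass(text: str, phrases: List[str]) -> List[Tuple[str, int, int]]:
    """Locate phrases in text order with a moving cursor instead of rescanning from 0."""
    results = []
    cursor = 0
    for phrase in phrases:
        begin = text.find(phrase, cursor)
        if begin == -1:
            continue  # not found after the cursor; keep the cursor where it is
        end = begin + len(phrase)
        results.append((phrase, begin, end))
        cursor = end  # the next search starts after this match
    return results


text = "Keystone plant species such as fig trees are good for the soil."
items = ["Keystone plant species", "such as", "fig trees"]
spans = find_all_in_one_pass(text, items)
print(spans)
```

Because each search resumes where the previous match ended, the text is traversed roughly once overall when the items are in order.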

Hyponym Ontology update

Update the existing ontology to include the commented attributes.

Hyponym ontology update

Describe the bug
Currently, the parent and child of Hyponym link are both ft.onto.base_ontology.Token. We want to update that to
ft.onto.base_ontology.Phrase, since it need not be a single token but several of those, and hence could be phrases.

Add support for De-identification

Generate ontology for AbbreviationDetection and HyponymDetection

Is your feature request related to a problem? Please describe.
According to #14, we should be defining the ontologies that will be required for this processor. A new parent ontology has to be defined under Token. Abbreviation would be the ontology name and there will be an attribute within this, namely long_form (as string). These can then be used to store coding information as such:

example input:
"Spinal and bulbar muscular atrophy (SBMA) is an
inherited motor neuron disease caused by the expansion
of a polyglutamine tract within the androgen receptor (AR).
SBMA can be caused by this easily."

example output ontology:

- Abbreviation (span 'SBMA')
  - long_form "Spinal and bulbar muscular atrophy"

- Abbreviation (span 'AR')
  - long_form "androgen receptor"

HyponymDetection....

- Link
  - Hyponym
    - hyponym_link

For example: "Keystone plant species such as fig trees are good for the soil."

Hyponym:
hyponym_link: such_as
Parent: Keystone plant species
Child: fig trees
