jenojp / negspacy

spaCy pipeline object for negating concepts in text

License: MIT License

Python 100.00%

python nlp spacy negation negex negation-phrases spacy-pipeline spacy-extension

negspacy's Issues

Negation of a wrong dependency

I have a custom NER pipeline. When I pair it with negspacy, I see the wrong part of the sentence being negated.

e.g. in these cases, "home" is detected as negated:

"he has not been able to walk much outside his home"
"He does not have any help at home"

In these examples I am trying to determine whether the person is housed or homeless.

With multiprocessing pool.map, word._.negex raises "Can't retrieve unregistered extension attribute 'negex'"

Describe the bug
I am processing medical texts written by nurses and doctors using the spaCy English() model and Negex to find negations. The code works fine in a single process, but when I use multiprocessing to process texts in parallel it raises the exception below.

File "../code/process_notes.py", line 154, in multiprocessing_finding_negation
  pool_results = pool.map(self.process, split_dfs)
File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
  return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
  raise self._value
AttributeError: ("[E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?", 'occurred at index 2')

To Reproduce

def load_spacy_model(self):
	nlp = English()
	nlp.add_pipe(self.nlp.create_pipe('sentencizer'))
	ruler = EntityRuler(self.nlp)
	# adding labels and patterns in the entity ruler
	ruler.add_patterns(self.progressnotes_constant_objects.return_pattern_label_list(self.client))
	# adding entityruler in spacy pipeline
	nlp.add_pipe(ruler)
	preceding_negations = en_clinical['preceding_negations']
	following_negations = en_clinical['following_negations']
	# adding custom preceding negations with the default preceding negations.
	preceding_negations += self.progressnotes_constant_objects.return_custom_preceding_negations()
	# adding custom following negations with the default following negations.
	following_negations += self.progressnotes_constant_objects.return_custom_following_negations()
	# negation words to see if a noun chunk has negation or not.
	negation_words = self.progressnotes_constant_objects.return_negation_words()
	negex = Negex(nlp, language='en_clinical', chunk_prefix=negation_words,
	              preceding_negations=preceding_negations,
	              following_negations=following_negations)
	# adding negex in the spacy pipeline
	# input----->|entityruler|----->|negex|---->entities
	nlp.add_pipe(negex, last=True)
      
def process(self, split_dfs):
    split_dfs = split_dfs
    # this function is run under multiprocessing.
    split_dfs = self.split_dfs.apply(self.lambda_func, axis=1)

def lambda_func(self, row):
    """
    This function runs inside a multiprocessing pool.
    It reads a single row of the dataframe.
    Applies basic cleanup using the replace dict.
    Finds positive words, their respective start-end indices, and negative words.
    Positive words are the words mentioned in the keyword patterns.
    """
    row['clean_note'] = row['notetext']
    
    # passing the sentence from NLP pipeline.
    doc = self.nlp(row['clean_note'])
    neg_list = list()
    pos_list, pos_index_list = list(), list()
    for word in doc.ents:
        # segregating positive and negative words.
        if not word._.negex:
            # populating positive and respective positive index list.
            pos_list.append(word.text)
            pos_index_list.append((word.start_char, word.end_char))
        else:
            neg_list.append(word.text)

p = os.cpu_count() - 1
pool = mp.Pool(processes=p)
split_dfs = np.array_split(notes_df, 25)  # notes_df is a panda dataframe
pool_results = pool.map(self.process, split_dfs)
pool.close()
pool.join()

Expected behavior
pos_list and neg_list should be populated.

Screenshots

Desktop (please complete the following information):

OS: macOS Catalina, 8 GB RAM, 1.6 GHz dual-core

Error with the termset

Hi,

With negspacy 1.0.0 and spaCy 3.0.1, I get the following error when using termsets:

'negex -> neg_tersmset extra fields not permitted'

    nlp = spacy.load("en_core_web_sm")
    ts = termset("en")
    nlp.add_pipe(
        "negex",
        config={
            "neg_tersmset": ts.get_patterns()
        }
    )

Can you help?
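
The key in the snippet above is spelled `neg_tersmset`, while the error message and other reports in this list use `neg_termset`, so the misspelling looks like the cause. A minimal sketch of a pre-flight check for config keys follows. `KNOWN_KEYS` is an assumption assembled from names that appear in these reports, not an authoritative list of negspacy options, and `check_config_keys` is a hypothetical helper.

```python
import difflib

# Assumed key names, collected from the reports in this list.
KNOWN_KEYS = ["neg_termset", "ent_types", "extension_name", "chunk_prefix"]


def check_config_keys(config):
    """Return {unknown_key: closest_known_key_or_None} for a config dict."""
    problems = {}
    for key in config:
        if key not in KNOWN_KEYS:
            match = difflib.get_close_matches(key, KNOWN_KEYS, n=1)
            problems[key] = match[0] if match else None
    return problems


print(check_config_keys({"neg_tersmset": {}}))
```

Running the check before `nlp.add_pipe` surfaces the typo with a suggested spelling instead of a validation error from the pipeline.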

negspacy pairing example

Thank you for creating negspacy! This is extremely helpful. I was wondering if you can provide more detail on "Consider pairing with scispacy to find UMLS concepts in text and process negations."

I searched Google and was not able to find any examples. How do you pair it with other spaCy components?

Spacy 3.2 support

Spacy extension error

When I use this code, I can generate negex as a separate column:
doc=nlp_negex(d)
labels=["ENTITY", "FAMILY"]
df_test = pd.DataFrame(columns=["ent","label", "negex", "sent"])
attrs = ["text", "label_", "sent"]
for e in doc.ents:
    if e.label_ in labels:
        df_test = df_test.append({'ent': e.text,'label': e.label_, 'negex': e._.negex, "sent":str(e.sent) }, ignore_index=True)

But when I use the following code:

doc=nlp_negex(d)
labels=['ENTITY', 'FAMILY']
attrs = ["text", "label_", "sent", "._.negex"]
data = [[str(getattr(ent, attr)) for attr in attrs] for ent in doc.ents if ent.label in labels]
df2 = pd.DataFrame(data, columns=attrs)

I get the following error:
AttributeError: 'spacy.tokens.span.Span' object has no attribute '._.negex'

How can I access negex as a registered extension?

Thank you
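
The built-in `getattr` treats its string argument as a single attribute name, so a dotted path cannot be resolved that way. `operator.attrgetter` does accept dotted names, which is sketched below with `SimpleNamespace` objects as hypothetical stand-ins for spaCy spans. Note that the filter compares `label_` (the string label) rather than `label`, which in spaCy is an integer ID.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical stand-ins for spaCy spans; a real Span exposes custom
# attributes through the `_` namespace in the same dotted way.
ents = [
    SimpleNamespace(text="fever", label_="ENTITY", _=SimpleNamespace(negex=True)),
    SimpleNamespace(text="mother", label_="FAMILY", _=SimpleNamespace(negex=False)),
    SimpleNamespace(text="Monday", label_="DATE", _=SimpleNamespace(negex=False)),
]

labels = ["ENTITY", "FAMILY"]
attrs = ["text", "label_", "_.negex"]

# attrgetter("_.negex") resolves ent._ first and then .negex on the result.
getters = [attrgetter(name) for name in attrs]

rows = [[str(get(ent)) for get in getters] for ent in ents if ent.label_ in labels]
print(rows)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=attrs)` as in the original snippet.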

pyproject.toml

To avoid issues with pip 23, use pyproject.toml instead of the legacy setup.py method.

How can I get this to work?

I think I'm missing something here and can't seem to resolve it.

The code works with the example texts provided in much of the documentation (e.g. "She does not like Steve Jobs but likes Apple products."), and the term 'cannot' appears in the termset - how can I identify these simple negations? Please note the print is indented in the original code.

Here's my code:

pip install negspacy

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

doc = nlp("Men cannot play football.")
for e in doc.ents:
    print(e.text, e._.negex)

adding on custom patterns in negspacy

My code currently looks like -

import en_core_sci_lg
from negspacy.negation import Negex
nlp = en_core_sci_lg.load()

negex = Negex(nlp, language = "en_clinical_sensitive")
nlp.add_pipe(negex, last=True)

doc = nlp(""" patient has no signs of shortness of breath. """)

for word in doc.ents:
    print(word, word._.negex)

The output is -

patient False
shortness True

I want the output to be -

patient False
shortness of breath True

How can I treat phrases like "shortness of breath", "sore throat", and "respiratory distress" as a single entity?

I was thinking of adding these custom phrases in negation.py at line 81. How can I do that? Is there any other approach that would resolve this issue?

Applying Negex to Adjectives

Is there a straightforward way to apply Negex to adjectives? I already incorporated Negex into my pipeline with my own custom component, but I didn't realize until after the fact that it seems to only search for negations in relation to named entities. For example, I was hoping to apply it so that I'd get positive matches on something like:

doc = nlp("Eve is not nice. Eve is friendly. Eve is not chill.")
for s in doc.sents:
    for t in s.tokens:
      print(t._.negex, t.text)

True nice
False friendly
True chill

It seems like this is not supported at the moment, but if anyone has any advice on how to customize Negex to achieve this it would be much appreciated. Also, if there's a good reason to not bother trying to do this at all, would love to understand that too.

Thanks!

Documentation at negspacy's PyPI webpage needs to be updated

Describe the bug
Same as issues #31 and #34, the example here describing how to add negspacy object to pipeline needs to be updated.

P.S. Thank you for releasing and maintaining such a useful library!

Is it possible to apply Negspacy over token?

Hi,

How can I apply negation to tokens? Is it possible?

I tried the following, but it ended in an error.

for token in doc:
    print(token.text, token._.negex)

The error is:

AttributeError Traceback (most recent call last)
in
1 for token in doc:
----> 2 print(token.text, token._.negex)

~/anaconda3/envs/python3/lib/python3.6/site-packages/spacy/tokens/underscore.py in __getattr__(self, name)
33 def __getattr__(self, name):
34 if name not in self._extensions:
---> 35 raise AttributeError(Errors.E046.format(name=name))
36 default, method, getter, setter = self._extensions[name]
37 if getter is not None:

AttributeError: [E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?

Tagging 'possible' terms

Is your feature request related to a problem? Please describe.
An additional functionality of tagging terms as 'possible': It's a feature in one of the original negex implementations as well as in pyConTextNLP. Also, some important negation corpora include such annotation (i.e. speculated/possible terms).

Describe the solution you'd like
An example could look like this:

doc = nlp("breast cancer may be ruled out")
for e in doc.ents:
    print(e.text, e._.negex)

Output:

breast cancer possible

Obviously, this would require changing the return value of e._.negex from a bool to, for example, a string.
This implementation could help when considering the logic behind this feature. In case anyone wants to run this negex with possible tagging enabled, check this issue here.
The "possible" pre and post triggers ([PREP] and [POSP]) can also be added easily from the same implementation's negex_triggers.txt file.

Describe alternatives you've considered
None other than using this mentioned negex code separately (combined with spacy, without negspacy).

Additional context
I can refer to the README as well as negex.py files in here. I imagine, step 2. is the only one that would require more work and having good understanding of negspacy.
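
To make the request concrete, here is a minimal sketch of a three-way status (affirmed, negated, possible) driven by phrases that follow an entity. The trigger lists and the `status_after` helper are hypothetical and chosen only for illustration; this is not how negspacy or the referenced negex implementation works internally.

```python
# Hypothetical trigger lists for illustration only; not negspacy's termsets.
FOLLOWING_POSSIBLE = ["may be ruled out", "could be ruled out"]
FOLLOWING_NEGATED = ["was ruled out", "has been ruled out"]


def status_after(entity, sentence):
    """Classify an entity by the text that directly follows it."""
    lowered = sentence.lower()
    idx = lowered.find(entity.lower())
    if idx == -1:
        raise ValueError("entity not found in sentence")
    tail = lowered[idx + len(entity):].strip()
    # "Possible" triggers are checked first so they win over plain negations.
    if any(tail.startswith(trigger) for trigger in FOLLOWING_POSSIBLE):
        return "possible"
    if any(tail.startswith(trigger) for trigger in FOLLOWING_NEGATED):
        return "negated"
    return "affirmed"


print("breast cancer", status_after("breast cancer", "breast cancer may be ruled out"))
```

Returning a string rather than a bool is what lets a third state exist, which is the change to `e._.negex` that the request describes.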

Negation detection for :No terms

Hi,

For cases like "Blood Transfusion: No", negation detection fails. I tried adding ":No" to the termset, but nothing changed.

nlp = spacy.load("en_core_sci_lg")
ts = termset("en_clinical")
nlp.add_pipe(
    "negex",
    config={
        "chunk_prefix": ["no"]
    },
    last=True,
)

doc = nlp("Blood Transfusion: No")
for e in doc.ents:
    print(e.text, e._.negex)

Output:
Blood Transfusion False
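
Form-style lines such as "Field: No" put the negation after a colon rather than in running text. One possible workaround, sketched here as a pre-processing step outside negspacy, is to detect such lines with a regular expression before the pipeline runs. The set of negative values and the `negated_field` helper are assumptions for illustration.

```python
import re

# Assumed set of "negative" answers; extend it to match the forms in use.
FIELD_NO = re.compile(
    r"^\s*(?P<field>[^:]+?)\s*:\s*(?P<value>no|none|negative)\s*\.?\s*$",
    re.IGNORECASE,
)


def negated_field(line):
    """Return the field name if the line is a 'Field: No' style entry, else None."""
    match = FIELD_NO.match(line)
    return match.group("field") if match else None


print(negated_field("Blood Transfusion: No"))
```

Lines that match can be recorded as negated directly, and everything else can still go through the regular pipeline.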

Get the List of Corresponding Negation Terms for a Set of Negated Lexicons

Hello, I am working on a project with a clinical dataset. So far I have been able to detect all the diagnoses and whether they are negated or not. I would also like to get the negation term that triggered each negated entity. For example:

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_sci_lg")
nlp.add_pipe("negex")

doc = nlp("She has neither fever nor cough.")

for e in doc.ents:
    print(e.text, e._.negex)

fever True
cough True

What I would additionally like to get:
The negation term for each negated entity detected: neither, nor

I would appreciate any help.
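
As an illustration of the requested output, the sketch below finds the closest single-word trigger that precedes an entity in a tokenized sentence. The trigger set and the `nearest_preceding_trigger` helper are hypothetical, multi-word triggers are not handled, and this is not negspacy's own matching logic.

```python
def nearest_preceding_trigger(tokens, ent_start, triggers):
    """Return the closest trigger token before index ent_start, or None."""
    for i in range(ent_start - 1, -1, -1):
        if tokens[i].lower() in triggers:
            return tokens[i]
    return None


tokens = "She has neither fever nor cough .".split()
triggers = {"neither", "nor", "no", "not"}  # assumed single-word triggers

for entity in ("fever", "cough"):
    start = tokens.index(entity)
    print(entity, nearest_preceding_trigger(tokens, start, triggers))
```

With spaCy, `ent.start` would play the role of `ent_start` and the token texts of the sentence would replace the whitespace split.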

KeyError: 'es_clinical'

I tested the pipeline on Spanish text as follows:

ts = termset("es_clinical")

nlp = spacy.load("es_core_news_sm")

nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)

and got the error

File "/root/negex_spacy/negex.py", line 5, in <module>
    ts = termset("es_clinical")
File "/usr/local/lib/python3.10/dist-packages/negspacy/termsets.py", line 212, in __init__
    self.terms = LANGUAGES[termset_lang]
KeyError: 'es_clinical'

Update: I installed from the cloned repository and it works. It seems that the version installed via pip is not the latest one.
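
A quick way to confirm which release pip actually installed is to query the package metadata from the standard library. This is a small sketch; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("negspacy:", installed_version("negspacy"))
```

Comparing the printed version with the repository's release notes shows whether a termset such as "es_clinical" should be available in the installed copy.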

Can negspacy be used with already identified Entities from scispacy

Is your feature request related to a problem? Please describe.
Can negspacy be used with already identified Entities and their spans through scispacy, by providing them somehow?

Describe the solution you'd like

For instance, scispacy has already been run with its EntityLinker, and UMLS entities with their indices have been obtained and stored somewhere.

It would be computationally expensive to run the whole scispacy with negspacy again. Is there a way to only run (sci)spacy with only base spacy functionality like the tokenizer, and provide the full text string, the entities and their indices somehow, so that negspacy can determine the negation status?

Compatibility with Scispacy

I am using the versions
spaCy 3.0.3
negspacy 1.0.0
scispacy 0.4.0

I think the current version of negspacy is not compatible with scispacy. I already read the issue but I think it works with the previous negspacy version. I also tried other models of scispacy like en_core_sci_sm but got the same error:

ValueError: [E002] Can't find factory for 'negex' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer

My code is as follows:

    nlp = spacy.load("en_core_sci_md")
    ts = termset("en_clinical")
    ts.add_patterns({'preceding_negations': ["nor","non"]})
    nlp.add_pipe(
        "negex",
        config={
            "neg_termset":ts.get_patterns(),
            "chunk_prefix": ["no"],
        },
        last=True,
    )

    for text in df['Cons']:
        doc = nlp(str(text))
        ..........

Can you help?

Context-based medical concepts extraction from the given text in python spacy

Hi ,

Thanks for this awesome negation tool, it's amazing.

Motivated by negspacy, I would like to implement a similar task and would appreciate your suggestions.

I have a medical text that contains a diagnosis, past history, negations (like no headache, no fever), and the cases in which the patient should consult a doctor in the future or in an emergency.

Example of text

The patient admitted to the hospital with hypertension and chronic kidney disease. The patient had a past history of diabetes mellitus and coronary artery disease. When the patient admitted to the hospital, no symptoms of fever, giddiness, and headache found. The patient is asked to consult a doctor in case of vomiting and nausea.

The above text has a present illness (sentence 1), past history (sentence 2), negations (sentence 3), and future consultation (sentence 4). I have been using scispacy for medical concept extraction and negspacy for negations, and both are working fine.

My next task is:
How do I separate present illness, past history, and future consultations using NLP techniques?

I am thinking of adding "past history of", "in case of emergency", and "history of" to the chunk_prefix. Is that a good approach?

Can I create a copy of negspacy, add my own terms, and add it as a separate pipeline component in spaCy?

Pseudo negations are not being handled properly.

Describe the bug
Pseudo negations are not being handled properly.

Termset choices

Thanks for enabling negex in the spaCy ecosystem -- this is incredibly helpful.

I noticed your termsets.py file is a subset of the trigger words/phrases historically used by negex (see here)

Was this for performance issues? Or to make negspacy more generalizable in non-healthcare domains? Some other reason?

I'm aware you can override negspacy's default termsets (nice feature), so this is more of a general question.

Thanks again for making this available.

spacy-2.2.3

Describe the bug
Installing negspacy uninstalls the latest spaCy (2.2.3) and reinstalls an older version (2.1.8).

To Reproduce
pip install negspacy

Error while processing the example

Hi, I am getting the following error while trying to process the example with UmlsEntityLinker():

File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
[Sun Sep 20 16:41:24.954031 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[Sun Sep 20 16:41:24.954035 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
[Sun Sep 20 16:41:24.954041 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] raise JSONDecodeError("Expecting value", s, err.value) from None
[Sun Sep 20 16:41:24.954045 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073]

json.decoder.JSONDecodeError: Expecting value: line 1 column 219152385 (char 219152384)

I am using Python 3.8.2

Please help.

Regards

Prabhat

Can negspacy be also used to detect family mentions?

Is your feature request related to a problem? Please describe.
With the architecture to detect negations, could negspacy also detect if named-entities are within the scope of family mentions? This would be especially useful in combination with found UMLS concepts. Instead of negation words, it would probably have to use words like brother, sister, etc.

Describe the solution you'd like
In addition to the negation status of UMLS concepts, a family mention status could be reported.

spaCy issue #4267 will make negspacy believe a doc has not been processed for NER

Describe the bug
See explosion/spaCy#4267

To Reproduce
See explosion/spaCy#4267

Expected behavior
Whether using model based NER or EntityRuler, negspacy should know a document was NERed.

Allow for negation algorithm to consider first token of a noun chunk

Is your feature request related to a problem? Please describe.
Some models may fold a negation word into the entity span when noun chunking. For example, for the sentence "There is no headache", doc.ents will include "no headache".

This would cause the negation algorithm to miss an obvious negation. This does not seem to happen with spacy out of the box models but more so in scispacy's biomedical language models.
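
The check being requested can be sketched in a few lines: look at the first word of the entity text and see whether it is one of the configured prefix terms. `starts_with_prefix` is a hypothetical helper for illustration and is not claimed to match negspacy's implementation.

```python
def starts_with_prefix(ent_text, chunk_prefix):
    """True if the first word of the entity text is one of the prefix terms."""
    words = ent_text.lower().split()
    prefixes = {prefix.lower() for prefix in chunk_prefix}
    return bool(words) and words[0] in prefixes


print(starts_with_prefix("no headache", ["no"]))
```

Comparing whole words rather than using a plain string prefix avoids flagging entities such as "nothing" or "normal" when the prefix list contains "no".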

Wrongly picking negations in negspacy?

Hi,

I am trying to find negations in a sentence using negspacy. But, it's printing first negation (no headache) as False which is supposed to be True and picking the second negation correctly. Should I fine-tune any parameters to get the first negation correctly?

Here is my code.

nlp = spacy.load("en_core_sci_md")
negex = Negex(nlp, language = "en_clinical")
doc = nlp('I am having Hypertension with no headache and fever')
for ent in doc.ents:
    print(ent.text, ent._.negex)

Output:

Hypertension False
no headache False
fever True

negspacy docs on spacy universe are out of date

Describe the bug
If you go to the docs here you'll see the following code example:

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON','ORG"])
nlp.add_pipe(negex, last=True)

doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)

This is out of date and should be replaced with the contents of the current readme.

Example from README is not working

First of all, thanks for this awesome library.

Describe the bug
When running the example from the README with the version 1.0.0 I get the following error:

TypeError: __init__() missing 4 required positional arguments: 'name', 'neg_termset', 'extension_name', and 'chunk_prefix'

To Reproduce

Steps to reproduce the behavior:

Go to a Python terminal and run the following example:

import spacy
from negspacy.negation import Negex


nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON","ORG"])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 4 required positional arguments: 'name', 'neg_termset', 'extension_name', and 'chunk_prefix'

Expected behavior

Expecting it to work as in the README example with the latest version.

Desktop (please complete the following information):

OS: Linux PopOS
Spacy: 3.0.1

Install error on negspacy (pip install)

Describe the bug
Getting the error below when running pip install negspacy in Anaconda Prompt.

To Reproduce
Steps to reproduce the behavior:
Run Anaconda Prompt as Administrator
Type pip install negspacy and press Enter
Expected behavior
The negspacy package is installed and present in C:\ProgramData\Anaconda3\Lib\site-packages

Screenshots
(base) C:\WINDOWS\system32>pip install C:\negspacy-0.1.0a0.tar.gz
Processing c:\negspacy-0.1.0a0.tar.gz
ERROR: Command errored out with exit status 1:
command: 'c:\programdata\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py'"'"'; file='"'"'C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: C:\Users\xxAppData\Local\Temp\pip-req-build-4b2hz8em
Complete output (7 lines):
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py", line 10, in
long_description=open("README.md").read(),
File "c:\programdata\anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 274: character maps to
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Desktop (please complete the following information):

OS: Windows
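
The traceback points at `open("README.md").read()` in setup.py, which was decoded with cp1252 on this machine. Byte 0x9d is the last byte of the UTF-8 encoding of a right double quotation mark and has no mapping in Python's cp1252 codec. The sketch below reproduces the failure with a temporary file and shows that passing `encoding="utf-8"` avoids it. The sample text is an assumption, since the README contents are not shown in the report.

```python
import os
import tempfile

# The closing curly quote (U+201D) encodes to bytes E2 80 9D in UTF-8.
text = "A \u201cquoted\u201d word"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "README.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Reading with cp1252 mimics the default used in the failing install.
    try:
        with open(path, encoding="cp1252") as handle:
            handle.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading with an explicit UTF-8 encoding round-trips the text.
    with open(path, encoding="utf-8") as handle:
        roundtrip = handle.read()

print(cp1252_failed, roundtrip == text)
```

In a setup.py, the equivalent change would be to pass an explicit encoding when reading the long description.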


Allow user updates to different negation dictionaries

Is your feature request related to a problem? Please describe.
Currently, a user cannot modify negation dictionaries.

Describe the solution you'd like
When initializing NegEx object, allow a user to add custom terminology lists or keep defaults.

Describe alternatives you've considered
N/A

Additional context
N/A

extract patterns out the negated entities

Hi, how do we extract the patterns of the negations identified? Are there any standard negex methods we can use?

For example, I would like to know which of the negation patterns below was matched for a negated entity.

NegEx Patterns
pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
preceding_negations - negation phrases that precede an entity
following_negations - negation phrases that follow an entity
termination - phrases that cut a sentence in parts, for purposes of negation detection (e.g., "but")

Support for Spanish language

I used the Negex algorithm to deal with Spanish text. With the help of a Spanish Language expert, a list of negex terms based on the lists provided in the original paper was created. I'd like to contribute those to this repository. To that effect, I've created a fork of this repository. I opened this issue so there could be a discussion as to how this extension can happen.

When running example it returns "Steve Jobs False"

Describe the bug
When I run this example code:

import spacy
from negspacy.negation import Negex

nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON','ORG"])
nlp.add_pipe(negex, last=True)

doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)

It outputs:
Steve Jobs False
Apple False

Expected behavior
Steve Jobs True

Desktop (please complete the following information):

OS: Windows 10
