jenojp / negspacy
spaCy pipeline object for negating concepts in text
License: MIT License
I have a custom NER pipeline. When I pair it with negspacy, the wrong part of the sentence is negated.
For example, in these cases "home" is detected as negated:
"he has not been able to walk much outside his home"
"He does not have any help at home"
In these examples I am trying to determine whether the person is housed or homeless.
Describe the bug
I am processing medical texts written by nurses and doctors using the spaCy English() model and Negex to find the appropriate negations. The code works fine when I run it in a single thread, but when I use multiprocessing to process texts simultaneously it raises the exception given below.
File "../code/process_notes.py", line 154, in multiprocessing_finding_negation
    pool_results = pool.map(self.process, split_dfs)
File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: ("[E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?", 'occurred at index 2')
To Reproduce
def load_spacy_model(self):
    nlp = English()
    nlp.add_pipe(self.nlp.create_pipe('sentencizer'))
    ruler = EntityRuler(self.nlp)
    # adding labels and patterns in the entity ruler
    ruler.add_patterns(self.progressnotes_constant_objects.return_pattern_label_list(self.client))
    # adding entityruler in spacy pipeline
    nlp.add_pipe(ruler)
    preceding_negations = en_clinical['preceding_negations']
    following_negations = en_clinical['following_negations']
    # adding custom preceding negations with the default preceding negations.
    preceding_negations += self.progressnotes_constant_objects.return_custom_preceding_negations()
    # adding custom following negations with the default following negations.
    following_negations += self.progressnotes_constant_objects.return_custom_following_negations()
    # negation words to see if a noun chunk has negation or not.
    negation_words = self.progressnotes_constant_objects.return_negation_words()
    negex = Negex(nlp, language='en_clinical', chunk_prefix=negation_words,
                  preceding_negations=preceding_negations,
                  following_negations=following_negations)
    # adding negex in the spacy pipeline
    # input----->|entityruler|----->|negex|---->entities
    nlp.add_pipe(negex, last=True)
def process(self, split_dfs):
    split_dfs = split_dfs
    # this function is run under multiprocessing.
    split_dfs = self.split_dfs.apply(self.lambda_func, axis=1)
def lambda_func(self, row):
    """
    This is a lambda function which runs inside a multiprocessing pool.
    It reads a single row of the dataframe.
    Applies basic cleanup using replace dict.
    Finds positive words, their respective start-end indices, and negative words.
    Positive words are the words mentioned in the keyword patterns.
    """
    row['clean_note'] = row['notetext']
    # passing the sentence through the NLP pipeline.
    doc = self.nlp(row['clean_note'])
    neg_list = list()
    pos_list, pos_index_list = list(), list()
    for word in doc.ents:
        # segregating positive and negative words.
        if not word._.negex:
            # populating positive and respective positive index list.
            pos_list.append(word.text)
            pos_index_list.append((word.start_char, word.end_char))
        else:
            neg_list.append(word.text)
p = os.cpu_count() - 1
pool = mp.Pool(processes=p)
split_dfs = np.array_split(notes_df, 25) # notes_df is a pandas DataFrame
pool_results = pool.map(self.process, split_dfs)
pool.close()
pool.join()
Expected behavior
pos_list and neg_list should be populated.
Desktop (please complete the following information):
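A possible explanation for the E046 error above, offered as an assumption rather than a confirmed diagnosis: the traceback shows Python 3.8 on macOS, where worker processes are started fresh instead of being forked, so an extension registered while building the pipeline in the parent process may not exist in the workers. One common pattern is to build the pipeline once per worker with a Pool initializer. The sketch below shows only that pattern; build_pipeline is a hypothetical stand-in for the English() + EntityRuler + Negex setup, and its toy logic is not negspacy.

```python
import multiprocessing as mp

_PIPELINE = None  # one pipeline object per worker process


def build_pipeline():
    """Hypothetical stand-in for the real pipeline construction.

    In the reporter's code this is where English(), the EntityRuler and
    Negex would be created, so that the 'negex' extension is registered
    inside the same process that later reads it.
    """
    def pipeline(text):
        # Toy behaviour only: report whether the text contains the word "no".
        return {"text": text, "has_no": "no" in text.lower().split()}
    return pipeline


def init_worker():
    """Pool initializer: runs once in every worker process."""
    global _PIPELINE
    _PIPELINE = build_pipeline()


def process_text(text):
    """Work function executed in the workers; uses the per-process pipeline."""
    return _PIPELINE(text)


def run_parallel(texts, workers=2):
    """Map texts over a pool whose workers each build their own pipeline."""
    with mp.Pool(processes=workers, initializer=init_worker) as pool:
        return pool.map(process_text, texts)
```

Calling run_parallel(["no fever", "fever present"]) from a script guarded by if __name__ == "__main__": returns one result per input text.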
Hi,
With negspacy 1.0.0 and spacy 3.0.1, I get the error for using termsets:
'negex -> neg_tersmset extra fields not permitted'
nlp = spacy.load("en_core_web_sm")
ts = termset("en")
nlp.add_pipe(
    "negex",
    config={
        "neg_tersmset": ts.get_patterns()
    }
)
Can you help?
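The error message above points at the config key: the snippet passes neg_tersmset, whereas other configs in this thread use neg_termset. A small standard-library check can catch such misspellings before calling nlp.add_pipe. The list of accepted keys below is an assumption collected from the snippets and error messages in this thread, and check_config_keys is a hypothetical helper, not part of negspacy or spaCy.

```python
import difflib

# Config keys for the "negex" factory as they appear in this thread
# (assumed, not verified against any particular negspacy release).
KNOWN_KEYS = ["neg_termset", "ent_types", "extension_name", "chunk_prefix"]


def check_config_keys(config, known_keys=KNOWN_KEYS):
    """Return {unknown_key: closest_known_key_or_None} for a config dict."""
    problems = {}
    for key in config:
        if key not in known_keys:
            close = difflib.get_close_matches(key, known_keys, n=1)
            problems[key] = close[0] if close else None
    return problems


print(check_config_keys({"neg_tersmset": {}}))
# {'neg_tersmset': 'neg_termset'}
```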
Thank you for creating negspacy! This is extremely helpful. I was wondering if you can provide more detail on "Consider pairing with scispacy to find UMLS concepts in text and process negations"?
I searched Google and was not able to find any examples. How do you pair it with other spaCy elements?
When I use this code, I can generate negex as a separate column:
doc=nlp_negex(d)
labels=["ENTITY", "FAMILY"]
df_test = pd.DataFrame(columns=["ent","label", "negex", "sent"])
attrs = ["text", "label_", "sent"]
for e in doc.ents:
    if e.label_ in labels:
        df_test = df_test.append({'ent': e.text,'label': e.label_, 'negex': e._.negex, "sent":str(e.sent) }, ignore_index=True)
But, when I use the following code
doc=nlp_negex(d)
labels=['ENTITY', 'FAMILY']
attrs = ["text", "label_", "sent", "._.negex"]
data = [[str(getattr(ent, attr)) for attr in attrs] for ent in doc.ents if ent.label in labels]
df2 = pd.DataFrame(data, columns=attrs)
I am getting the following error:
AttributeError: 'spacy.tokens.span.Span' object has no attribute '._.negex'
How can I access negex as a registered extension?
Thank you
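For the question above: getattr treats its second argument as a single attribute name, so a dotted string is never split into "_" followed by "negex". The standard library's operator.attrgetter does follow dotted names. The sketch below uses a SimpleNamespace as a stand-in for a spaCy Span (an assumption made so the example runs without spaCy); with a real entity the same "_.negex" path should resolve through the extension. Note also that the filter in the snippet compares ent.label, which is an integer ID in spaCy, with strings; ent.label_ is probably what was intended.

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in for a spaCy Span that carries a custom "negex" extension.
ent = SimpleNamespace(
    text="fever",
    label_="ENTITY",
    _=SimpleNamespace(negex=True),
)

# attrgetter splits on dots, so "_.negex" means ent._.negex.
attrs = ["text", "label_", "_.negex"]
row = [str(attrgetter(attr)(ent)) for attr in attrs]
print(row)  # ['fever', 'ENTITY', 'True']
```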
To avoid issues in pip 23, utilize pyproject.toml over legacy setup.py method.
I think I'm missing something here and can't seem to resolve it.
The code works with the example texts provided in much of the documentation (e.g. "She does not like Steve Jobs but likes Apple products."), and the term 'cannot' appears in the termset - how can I identify these simple negations? Please note the print is indented in the original code.
Here's my code:
pip install negspacy
import spacy
from negspacy.negation import Negex
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})
ts = termset("en")
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})
doc = nlp("Men cannot play football.")
for e in doc.ents:
    print(e.text, e._.negex)
My code currently looks like -
import en_core_sci_lg
from negspacy.negation import Negex
nlp = en_core_sci_lg.load()
negex = Negex(nlp, language = "en_clinical_sensitive")
nlp.add_pipe(negex, last=True)
doc = nlp(""" patient has no signs of shortness of breath. """)
for word in doc.ents:
    print(word, word._.negex)
The output is -
patient False
shortness True
I want the output to be -
patient False
shortness of breath True
How can I treat phrases like "shortness of breath", "sore throat", and "respiratory distress" as a single entity?
I was thinking of adding these custom phrases in negation.py at line 81. How can I do that? Is there any other approach that would resolve this issue?
Is there a straight forward way to apply Negex to adjectives? I already incorporated Negex into my pipeline with my own custom component, but I didn't realize until after the fact that it seems to only be searching for negations in relation to Named Entities. For example, I was hoping to apply it, so I'd get positive matches on something like:
doc = nlp("Eve is not nice. Eve is friendly. Eve is not chill.")
for s in doc.sents:
    for t in s.tokens:
        print(t._.negex, t.text)
True nice
False friendly
True chill
It seems like this is not supported at the moment, but if anyone has any advice on how to customize Negex to achieve this it would be much appreciated. Also, if there's a good reason to not bother trying to do this at all, would love to understand that too.
Thanks!
Hi,
How can I apply negation to tokens? Is it possible?
I tried the following, but it ended in an error.
for token in doc:
    print(token.text, token._.negex)
The error is
AttributeError Traceback (most recent call last)
in
      1 for token in doc:
----> 2     print(token.text, token._.negex)
~/anaconda3/envs/python3/lib/python3.6/site-packages/spacy/tokens/underscore.py in __getattr__(self, name)
     33     def __getattr__(self, name):
     34         if name not in self._extensions:
---> 35             raise AttributeError(Errors.E046.format(name=name))
     36         default, method, getter, setter = self._extensions[name]
     37         if getter is not None:
AttributeError: [E046] Can't retrieve unregistered extension attribute 'negex'. Did you forget to call the set_extension method?
Is your feature request related to a problem? Please describe.
An additional functionality of tagging terms as 'possible': It's a feature in one of the original negex implementations as well as in pyConTextNLP. Also, some important negation corpora include such annotation (i.e. speculated/possible terms).
Describe the solution you'd like
An example could look like this:
doc = nlp("breast cancer may be ruled out")
for e in doc.ents:
    print(e.text, e._.negex)
Output:
breast cancer possible
e._.negex to be type of i.e. string instead of bool.
possible tagging enabled, check this issue here.
negex_triggers.txt file.
Describe alternatives you've considered
None other than using this mentioned negex code separately (combined with spacy, without negspacy).
Additional context
I can refer to the README as well as negex.py files in here. I imagine, step 2. is the only one that would require more work and having good understanding of negspacy.
Hi,
For cases like "Blood Transfusion: No", negation detection fails. I tried adding ":No" to the termset, but nothing changed.
nlp = spacy.load("en_core_sci_lg")
ts = termset("en_clinical")
nlp.add_pipe(
    "negex",
    config={
        "chunk_prefix": ["no"]
    },
    last=True,
)
doc = nlp("Blood Transfusion: No")
for e in doc.ents:
    print(e.text, e._.negex)
Output:
Blood Transfusion False
Hello, I am working on a project with a clinical dataset. So far I have been able to detect all the diagnoses and whether they are negated. I would also like to get the negation term that triggered each negated entity. For example:
import spacy
from negspacy.negation import Negex
nlp = spacy.load("en_core_sci_lg")
nlp.add_pipe("negex")
doc = nlp("She has neither fever nor cough.")
for e in doc.ents:
    print(e.text, e._.negex)
fever True
cough True
What more I expect to get:
The negation term for each negated entity detected: neither, nor
Any help would be much appreciated.
I tested the pipeline on Spanish text as follows
ts = termset("es_clinical")
nlp = spacy.load("es_core_news_sm")
nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)
and got the error
File "/root/negex_spacy/negex.py", line 5, in <module>
    ts = termset("es_clinical")
File "/usr/local/lib/python3.10/dist-packages/negspacy/termsets.py", line 212, in __init__
    self.terms = LANGUAGES[termset_lang]
KeyError: 'es_clinical'
Update: installing from the cloned repository works. It seems the version installed via pip is not the latest.
Is your feature request related to a problem? Please describe.
Can negspacy be used with already identified Entities and their spans through scispacy, by providing them somehow?
Describe the solution you'd like
For instance, scispacy has already been run with its EntityLinker, and UMLS entities with their indices have been obtained and stored somewhere.
It would be computationally expensive to run the whole scispacy with negspacy again. Is there a way to only run (sci)spacy with only base spacy functionality like the tokenizer, and provide the full text string, the entities and their indices somehow, so that negspacy can determine the negation status?
I am using the versions
spaCy 3.0.3
negspacy 1.0.0
scispacy 0.4.0
I think the current version of negspacy is not compatible with scispacy. I already read the issue, but I think it works with the previous negspacy version. I also tried other scispacy models such as en_core_sci_sm but got the same error:
ValueError: [E002] Can't find factory for 'negex' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
My code is as follows:
nlp = spacy.load("en_core_sci_md")
ts = termset("en_clinical")
ts.add_patterns({'preceding_negations': ["nor","non"]})
nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns(),
        "chunk_prefix": ["no"],
    },
    last=True,
)
for text in df['Cons']:
    doc = nlp(str(text))
    ..........
Can you help?
Hi ,
Thanks for this awesome negation tool, it's amazing.
Motivated by negspacy, I would like to implement a similar task and would appreciate your suggestions.
I have medical text that contains a diagnosis, past history, negations (such as no headache or fever), and the cases in which the patient should consult a doctor in the future or in an emergency.
Example of text
The patient admitted to the hospital with hypertension and chronic kidney disease. The patient had a past history of diabetes mellitus and coronary artery disease. When the patient admitted to the hospital, no symptoms of fever, giddiness, and headache found. The patient is asked to consult a doctor in case of vomiting and nausea.
The above sentence has a present illness (sentence 1), past history (sentence 2), negations (sentence 3), and future consultation (sentence 4). I have been using scispacy for medical concept extraction and negspacy for negations, both of them are working fine.
Now my next task is,
How do I separate present illness, past history, and future consultations using NLP techniques?
I am thinking of adding "past history of", "in case of emergency", and "history of" to the chunk_prefix. Is that a good approach?
Can I create a copy of negspacy, add my own terms, and add it to spaCy as a separate pipeline component?
Describe the bug
Pseudo negations are not being handled properly.
Thanks for enabling negex in the spaCy ecosystem -- this is incredibly helpful.
I noticed your termsets.py file is a subset of the trigger words/phrases historically used by negex (see here)
Was this for performance issues? Or to make negspacy more generalizable in non-healthcare domains? Some other reason?
I'm aware you can override negspacy's default termsets (nice feature), so this is more of a general question.
Thanks again for making this available.
Describe the bug
Installing negspacy uninstalls the latest spaCy 2.2.3 and reinstalls the older spaCy 2.1.8.
To Reproduce
Steps to reproduce the behavior:
pip install negspacy
Hi, I am getting the following error while trying to process the example with UmlsEntityLinker():
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
[Sun Sep 20 16:41:24.954031 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[Sun Sep 20 16:41:24.954035 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
[Sun Sep 20 16:41:24.954041 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073] raise JSONDecodeError("Expecting value", s, err.value) from None
[Sun Sep 20 16:41:24.954045 2020] [wsgi:error] [pid 458:tid 140686318831360] [remote 122.176.234.216:63073]
json.decoder.JSONDecodeError: Expecting value: line 1 column 219152385 (char 219152384)
I am using Python 3.8.2
Please help.
Regards
Prabhat
Is your feature request related to a problem? Please describe.
With the architecture to detect negations, could negspacy also detect if named-entities are within the scope of family mentions? This would be especially useful in combination with found UMLS concepts. Instead of negation words, it would probably have to use words like brother, sister, etc.
Describe the solution you'd like
In addition to the negation status of UMLS concepts, a family mention status could be reported.
Describe the bug
See explosion/spaCy#4267
To Reproduce
See explosion/spaCy#4267
Expected behavior
Whether using model based NER or EntityRuler, negspacy should know a document was NERed.
Is your feature request related to a problem? Please describe.
Some models may noun chunk a negation into an entity span. For example, in "There is no headache", doc.ents will include "no headache".
This would cause the negation algorithm to miss an obvious negation. This does not seem to happen with spacy out of the box models but more so in scispacy's biomedical language models.
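Several configs elsewhere in this thread pass a chunk_prefix list to negex for this situation. The idea can be sketched without spaCy: if the entity text itself begins with a negation term, treat the entity as negated. The function below is a hypothetical illustration of that idea only, not negspacy's implementation, and the default prefix tuple is an assumption.

```python
def split_chunk_prefix(ent_text, chunk_prefix=("no",)):
    """Return (negated, remainder) for an entity's text.

    negated is True when the text starts with one of the prefix terms
    followed by a space (or consists of the term alone).
    """
    lowered = ent_text.lower()
    for prefix in chunk_prefix:
        p = prefix.lower()
        if lowered == p or lowered.startswith(p + " "):
            # Drop the prefix and surrounding whitespace.
            return True, ent_text[len(p):].strip()
    return False, ent_text


print(split_chunk_prefix("no headache"))  # (True, 'headache')
print(split_chunk_prefix("nose bleed"))   # (False, 'nose bleed')
```

Requiring a space after the prefix keeps words that merely start with the same letters, such as "nose", from being flagged.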
Hi,
I am trying to find negations in a sentence using negspacy. However, it prints the first negation (no headache) as False when it should be True, and picks up the second negation correctly. Should I tune any parameters to get the first negation right?
Here is my code.
nlp = spacy.load("en_core_sci_md")
negex = Negex(nlp, language = "en_clinical")
doc = nlp('I am having Hypertension with no headache and fever')
for ent in doc.ents:
    print(ent.text, ent._.negex)
Output:
Hypertension False
no headache False
fever True
Describe the bug
If you go to the docs here you'll see the following code example:
import spacy
from negspacy.negation import Negex
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON','ORG"])
nlp.add_pipe(negex, last=True)
doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)
This is out of date and should be replaced with the contents of the current readme.
First of all, thanks for this awesome library.
Describe the bug
When running the example from the README with the version 1.0.0 I get the following error:
TypeError: __init__() missing 4 required positional arguments: 'name', 'neg_termset', 'extension_name', and 'chunk_prefix'
To Reproduce
Steps to reproduce the behavior:
import spacy
from negspacy.negation import Negex
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON","ORG"])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() missing 4 required positional arguments: 'name', 'neg_termset', 'extension_name', and 'chunk_prefix'
Expected behavior
Expecting it to work as in the README example with the latest version.
Desktop (please complete the following information):
Describe the bug
Getting the below error with the pip install negspacy command in Anaconda Prompt.
To Reproduce
Steps to reproduce the behavior:
Run Anaconda Prompt as Administrator
Type pip install negspacy and press Enter
Expected behavior
The negspacy package is installed and is present in C:\ProgramData\Anaconda3\Lib\site-packages
Screenshots
(base) C:\WINDOWS\system32>pip install C:\negspacy-0.1.0a0.tar.gz
Processing c:\negspacy-0.1.0a0.tar.gz
ERROR: Command errored out with exit status 1:
command: 'c:\programdata\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py'"'"'; file='"'"'C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: C:\Users\xxAppData\Local\Temp\pip-req-build-4b2hz8em
Complete output (7 lines):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\xx\AppData\Local\Temp\pip-req-build-4b2hz8em\setup.py", line 10, in <module>
    long_description=open("README.md").read(),
  File "c:\programdata\anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 274: character maps to <undefined>
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Desktop (please complete the following information):
Is your feature request related to a problem? Please describe.
Currently, a user cannot modify negation dictionaries.
Describe the solution you'd like
When initializing NegEx object, allow a user to add custom terminology lists or keep defaults.
Describe alternatives you've considered
N/A
Additional context
N/A
Hi, how do we extract the negation patterns that were identified? Are there any standard negex methods we can use?
For example, I want to know which of the negation patterns below was matched for a negated entity.
NegEx Patterns
pseudo_negations - phrases that are false triggers, ambiguous negations, or double negatives
preceding_negations - negation phrases that precede an entity
following_negations - negation phrases that follow an entity
termination - phrases that cut a sentence in parts, for purposes of negation detection (e.g., "but")
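To make the four categories concrete, here is a minimal stand-alone sketch that reports which category and term would negate an entity. It is a toy reimplementation under stated assumptions, not negspacy's API: find_trigger is a hypothetical helper, the terms in TERMSET are illustrative rather than the library's defaults, tokens are plain whitespace-split strings, and only single-word triggers (plus two-word pseudo negations) are handled.

```python
# Toy termset using the four categories listed above (illustrative terms only).
TERMSET = {
    "pseudo_negations": ["not only"],
    "preceding_negations": ["not", "no", "neither", "nor"],
    "following_negations": ["unlikely"],
    "termination": ["but", "however"],
}


def find_trigger(tokens, ent_start, ent_end, termset=TERMSET):
    """Return (category, term) negating tokens[ent_start:ent_end], or None."""
    words = [t.lower() for t in tokens]
    # Scan left of the entity; a termination term ends the negation scope.
    for i in range(ent_start - 1, -1, -1):
        if words[i] in termset["termination"]:
            break
        if words[i] in termset["preceding_negations"]:
            bigram = " ".join(words[i:i + 2])
            if bigram in termset["pseudo_negations"]:
                continue  # false trigger such as "not only"
            return ("preceding_negations", words[i])
    # Scan right of the entity for a following negation.
    for i in range(ent_end, len(words)):
        if words[i] in termset["termination"]:
            break
        if words[i] in termset["following_negations"]:
            return ("following_negations", words[i])
    return None


tokens = "She does not like Steve Jobs but likes Apple products".split()
print(find_trigger(tokens, 4, 6))  # ('preceding_negations', 'not')
print(find_trigger(tokens, 8, 9))  # None
```

In the second call the scan reaches the termination term "but" before any negation, so "Apple" is left unnegated.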
I used the Negex algorithm to deal with Spanish text. With the help of a Spanish Language expert, a list of negex terms based on the lists provided in the original paper was created. I'd like to contribute those to this repository. To that effect, I've created a fork of this repository. I opened this issue so there could be a discussion as to how this extension can happen.
Describe the bug
When I run this example code:
import spacy
from negspacy.negation import Negex
nlp = spacy.load("en_core_web_sm")
negex = Negex(nlp, ent_types=["PERSON','ORG"])
nlp.add_pipe(negex, last=True)
doc = nlp("She does not like Steve Jobs but likes Apple products.")
for e in doc.ents:
    print(e.text, e._.negex)
It outputs:
Steve Jobs False
Apple False
Expected behavior
Steve Jobs True
Desktop (please complete the following information):
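One thing worth checking in the example code above: the quotes in the ent_types argument do not pair up as two labels, so Python parses the list as containing a single string. The check below is plain Python and independent of spaCy. If negspacy only evaluates entities whose label appears in ent_types (an assumption based on the option's name), no entity would be evaluated with this list, which would be consistent with every result printing False.

```python
# The list as written in the snippet: one string, not two labels.
ent_types = ["PERSON','ORG"]
print(len(ent_types))         # 1
print(ent_types[0])           # PERSON','ORG
print("PERSON" in ent_types)  # False

# Two separate labels, which is presumably what was meant.
fixed = ["PERSON", "ORG"]
print("PERSON" in fixed)      # True
```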