arshsekhon / pubtator_loader
A Python package to load PubTator documents, tokenize them, and convert them to BILUO format.
License: GNU General Public License v3.0
Hi,
May I ask what input format the tool expects? I only see a reference to a file named 'sample_pubator_reader_input', but the file itself is not available.
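For orientation, here is a minimal sketch of the plain-text PubTator layout, assuming the tool reads the standard format that the NCBI Disease Corpus sample quoted further down this page uses: a `PMID|t|title` line, a `PMID|a|abstract` line, then tab-separated annotation lines (`PMID`, start, end, mention, type, concept ID), with a blank line between documents. The function `parse_pubtator` and the dictionary keys are hypothetical names used for illustration only; this is not the library's own reader.

```python
import re
from typing import Dict, List, Optional

# Matches "PMID|t|..." (title) and "PMID|a|..." (abstract) lines.
TEXT_LINE = re.compile(r"^(\d+)\|([ta])\|(.*)$")


def parse_pubtator(text: str) -> List[Dict]:
    """Parse PubTator-style text into a list of plain dicts (illustrative only)."""
    documents: List[Dict] = []
    current: Optional[Dict] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the document being read.
            if current is not None:
                documents.append(current)
                current = None
            continue
        match = TEXT_LINE.match(line)
        if match:
            pmid, kind, content = match.groups()
            if current is None:
                current = {"pmid": pmid, "title": "", "abstract": "", "entities": []}
            current["title" if kind == "t" else "abstract"] = content
            continue
        # Anything else is treated as a tab-separated entity annotation.
        fields = line.split("\t")
        if current is None or len(fields) < 5:
            raise ValueError(f"Unrecognized line: {line!r}")
        current["entities"].append({
            "start": int(fields[1]),
            "end": int(fields[2]),
            "mention": fields[3],
            "type": fields[4],
            "id": fields[5] if len(fields) > 5 else None,
        })
    if current is not None:
        documents.append(current)
    return documents


# Abstract shortened here; only the title-based offsets are meaningful in this sample.
sample = (
    "10192393|t|A common human skin tumour is caused by activating mutations in beta-catenin.\n"
    "10192393|a|WNT signalling orchestrates...\n"
    "10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878\n"
    "10192393\t567\t570\tAPC\tSpecificDisease\tD011125\n"
)
docs = parse_pubtator(sample)
print(docs[0]["title"][15:26])
```

The sketch handles entity annotations only; relation lines, which some PubTator corpora include, would need a separate branch.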
Hi,
While trying to use your library, I failed at the first step: importing it. Can you spot what I could be doing wrong?
The traceback is below.
ModuleNotFoundError Traceback (most recent call last)
in
----> 1 from pubtator_loader.pubtator_corpus_reader import PubTatorCorpusReader
~/anaconda3/envs/know-nlp-tf2/lib/python3.6/site-packages/pubtator_loader-0.1.1-py3.6.egg/pubtator_loader/__init__.py in
----> 1 from .models import PubTatorEntity, PubTatorDocument # noqa
ModuleNotFoundError: No module named 'pubtator_loader.models'
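The traceback shows that the top-level `pubtator_loader` package was found but its `models` submodule was not. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the installed egg does not contain that submodule. A small diagnostic sketch using only the standard library can check whether a dotted module name is resolvable in the current environment; `submodule_available` is a hypothetical helper name.

```python
import importlib.util


def submodule_available(name: str) -> bool:
    """Return True if the dotted module name can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of `name` is itself missing.
        return False


# Demonstrated on the standard library; substitute "pubtator_loader.models" locally.
print(submodule_available("json.decoder"))
print(submodule_available("json.no_such_submodule_xyz"))
```

If the check returns False for `pubtator_loader.models` while `pubtator_loader` itself resolves, inspecting the contents of the installed egg would be a reasonable next step.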
When I try to install the package, I get the following error:
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [164 lines of output]
Error compiling Cython file:
------------------------------------------------------------
...
int length
cdef class Vocab:
cdef Pool mem
cpdef readonly StringStore strings
^
------------------------------------------------------------
spacy/vocab.pxd:28:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Error compiling Cython file:
------------------------------------------------------------
...
cdef class Vocab:
cdef Pool mem
cpdef readonly StringStore strings
cpdef public Morphology morphology
^
------------------------------------------------------------
spacy/vocab.pxd:29:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Error compiling Cython file:
------------------------------------------------------------
...
cdef class Vocab:
cdef Pool mem
cpdef readonly StringStore strings
cpdef public Morphology morphology
cpdef public object vectors
^
------------------------------------------------------------
spacy/vocab.pxd:30:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Error compiling Cython file:
------------------------------------------------------------
...
cdef class Vocab:
cdef Pool mem
cpdef readonly StringStore strings
cpdef public Morphology morphology
cpdef public object vectors
cpdef public object _lookups
^
------------------------------------------------------------
spacy/vocab.pxd:31:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Error compiling Cython file:
------------------------------------------------------------
...
cdef Pool mem
cpdef readonly StringStore strings
cpdef public Morphology morphology
cpdef public object vectors
cpdef public object _lookups
cpdef public object writing_system
^
------------------------------------------------------------
spacy/vocab.pxd:32:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Error compiling Cython file:
------------------------------------------------------------
...
cpdef readonly StringStore strings
cpdef public Morphology morphology
cpdef public object vectors
cpdef public object _lookups
cpdef public object writing_system
cpdef public object get_noun_chunks
^
------------------------------------------------------------
spacy/vocab.pxd:33:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Error compiling Cython file:
------------------------------------------------------------
...
cdef float prior_prob
cdef class KnowledgeBase:
cdef Pool mem
cpdef readonly Vocab vocab
^
------------------------------------------------------------
spacy/kb.pxd:31:10: Variables cannot be declared with 'cpdef'. Use 'cdef' instead.
Copied /tmp/pip-install-2b1stxhf/spacy_240f1e06418342eb88af85a39eba4527/setup.cfg -> /tmp/pip-install-2b1stxhf/spacy_240f1e06418342eb88af85a39eba4527/spacy/tests/package
Copied /tmp/pip-install-2b1stxhf/spacy_240f1e06418342eb88af85a39eba4527/pyproject.toml -> /tmp/pip-install-2b1stxhf/spacy_240f1e06418342eb88af85a39eba4527/spacy/tests/package
Cythonizing sources
Compiling spacy/training/example.pyx because it changed.
Compiling spacy/parts_of_speech.pyx because it changed.
Compiling spacy/strings.pyx because it changed.
Compiling spacy/lexeme.pyx because it changed.
Compiling spacy/vocab.pyx because it changed.
Compiling spacy/attrs.pyx because it changed.
Compiling spacy/kb.pyx because it changed.
Compiling spacy/ml/parser_model.pyx because it changed.
Compiling spacy/morphology.pyx because it changed.
Compiling spacy/pipeline/dep_parser.pyx because it changed.
Compiling spacy/pipeline/morphologizer.pyx because it changed.
Compiling spacy/pipeline/multitask.pyx because it changed.
Compiling spacy/pipeline/ner.pyx because it changed.
Compiling spacy/pipeline/pipe.pyx because it changed.
Compiling spacy/pipeline/trainable_pipe.pyx because it changed.
Compiling spacy/pipeline/sentencizer.pyx because it changed.
Compiling spacy/pipeline/senter.pyx because it changed.
Compiling spacy/pipeline/tagger.pyx because it changed.
Compiling spacy/pipeline/transition_parser.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/arc_eager.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/ner.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/nonproj.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/_state.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/stateclass.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/transition_system.pyx because it changed.
Compiling spacy/pipeline/_parser_internals/_beam_utils.pyx because it changed.
Compiling spacy/tokenizer.pyx because it changed.
Compiling spacy/training/align.pyx because it changed.
Compiling spacy/training/gold_io.pyx because it changed.
Compiling spacy/tokens/doc.pyx because it changed.
Compiling spacy/tokens/span.pyx because it changed.
Compiling spacy/tokens/token.pyx because it changed.
Compiling spacy/tokens/span_group.pyx because it changed.
Compiling spacy/tokens/graph.pyx because it changed.
Compiling spacy/tokens/morphanalysis.pyx because it changed.
Compiling spacy/tokens/_retokenize.pyx because it changed.
Compiling spacy/matcher/matcher.pyx because it changed.
Compiling spacy/matcher/phrasematcher.pyx because it changed.
Compiling spacy/matcher/dependencymatcher.pyx because it changed.
Compiling spacy/symbols.pyx because it changed.
Compiling spacy/vectors.pyx because it changed.
[ 1/41] Cythonizing spacy/attrs.pyx
[ 2/41] Cythonizing spacy/kb.pyx
Traceback (most recent call last):
File "/mnt/home/lotrecks/anaconda3/envs/graphs/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/mnt/home/lotrecks/anaconda3/envs/graphs/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/mnt/home/lotrecks/anaconda3/envs/graphs/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-ytg7xk32/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 355, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-ytg7xk32/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-ytg7xk32/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 224, in <module>
File "<string>", line 211, in setup_package
File "/tmp/pip-build-env-ytg7xk32/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1154, in cythonize
cythonize_one(*args)
File "/tmp/pip-build-env-ytg7xk32/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1321, in cythonize_one
raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: spacy/kb.pyx
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
This looks like it may be related to spaCy rather than to pubtator_loader directly, but I have never gotten this error when installing spaCy alongside other packages. Have you seen this error before?
I am trying to parse the NCBI Disease Corpus training set, but I get an error for mentions that include multiple MeSH terms (e.g. "colon and some other cancers" -> D003110|D009369). Are there any suggestions for handling this, other than removing the lines that include "CompositeMention"?
Dataset
10192393|t|A common human skin tumour is caused by activating mutations in beta-catenin.
10192393|a|WNT signalling orchestrates... but a small percentage of colon and some other cancers harbour...
10192393 15 26 skin tumour DiseaseClass D012878
10192393 443 449 cancer DiseaseClass D009369
10192393 483 496 colon cancers DiseaseClass D003110
10192393 539 565 adenomatous polyposis coli SpecificDisease D011125
10192393 567 570 APC SpecificDisease D011125
10192393 670 698 colon and some other cancers CompositeMention D003110|D009369
10192393 855 867 skin tumours DiseaseClass D012878
10192393 879 893 pilomatricomas SpecificDisease D018296
10192393 1021 1035 pilomatricomas SpecificDisease D018296
10192393 1210 1221 skin tumour DiseaseClass D012878
10192393 1262 1268 tumour Modifier D009369
10192393 1312 1326 pilomatricomas SpecificDisease D018296
10192393 1385 1392 tumours DiseaseClass D009369
10192393 1615 1622 tumours DiseaseClass D009369
Error
77 prev_line_type = curr_line_type
78 except Exception as e:
---> 79 raise Exception('ERROR occured when parsing line'
80 f' #{line_number}. Exception {e}')
82 if self.__document_being_read is not None:
83 self.corpus.append(self.__document_being_read)
Exception: ERROR occured when parsing line #8. Exception Unexpected content received on line #8, the line/data may have been corrupted. Content: '10192393 670 698 colon and some other cancers CompositeMention D003110|D009369
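One possible workaround, sketched under the assumption that the reader rejects the `|`-separated identifier field (the thread does not confirm the cause), is to pre-process the corpus so that each composite annotation becomes one line per MeSH ID. The helper `expand_composite_ids` is a hypothetical name, and the sketch assumes annotation lines are tab-separated with six fields, as in the standard PubTator layout. Whether the reader accepts several annotations over the same span is also not verified here.

```python
from typing import List


def expand_composite_ids(line: str, sep: str = "|") -> List[str]:
    """Split an annotation line with several concept IDs into one line per ID."""
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    # Title/abstract lines and single-ID annotations pass through unchanged.
    if len(fields) != 6 or sep not in fields[5]:
        return [stripped]
    return [
        "\t".join(fields[:5] + [concept_id])
        for concept_id in fields[5].split(sep)
        if concept_id
    ]


composite = "10192393\t670\t698\tcolon and some other cancers\tCompositeMention\tD003110|D009369"
for out_line in expand_composite_ids(composite):
    print(out_line)
```

This keeps the mention text and offsets intact and duplicates them for each ID, so no annotation is dropped; the trade-off is that downstream code sees two entities covering the same span.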