When running spacy-udpipe for Romanian, I get the following error:

Error running model for Romanian about spacy-udpipe HOT 10 CLOSED

takelab commented on May 18, 2024

Error running model for Romanian

from spacy-udpipe.

Comments (10)

luckytoilet commented on May 18, 2024 1

Thanks, I'll take a look. In the meantime, is there nothing that can be done on this project, at least fail more gracefully? For me, I'm only looking to use it as a part-of-speech tagger and I don't need to extract the case markings, but it fails to run at all. Maybe it would be better to ignore unrecognized morphological features rather than crashing.
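A minimal sketch of the "ignore unrecognized features" idea, in plain Python. It is not the code from spacy-udpipe or spaCy: `SUPPORTED_FEATURES` and `keep_supported` are hypothetical names, and the set contents are placeholders rather than spaCy's real feature list.

```python
# Hypothetical set of supported feature names. The real list is defined in
# spaCy and is not reproduced here.
SUPPORTED_FEATURES = {"Number_plur", "Tense_pres", "VerbForm_fin"}


def keep_supported(features, supported=SUPPORTED_FEATURES):
    """Split a feature dict into (kept, dropped) instead of raising on unknown keys."""
    kept = {name: value for name, value in features.items() if name in supported}
    dropped = sorted(name for name in features if name not in supported)
    return kept, dropped


# "Case" is not in the assumed supported set, so it is dropped rather than
# causing an exception. The value "Nom" is a placeholder.
kept, dropped = keep_supported({"Number_plur": True, "Case": "Nom"})
print(kept)     # {'Number_plur': True}
print(dropped)  # ['Case']
```

Returning the dropped names lets a caller log them, so the missing features stay visible even though tagging continues.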

asajatovic commented on May 18, 2024 1

I've enabled a quick fix in #11. After some discussion, I am fairly confident this should remain in a separate branch (as the underlying issue is in spaCy). For now, you can use
pip install git+https://github.com/TakeLab/spacy-udpipe.git@feature/soft-morph-fail
to install the quick-fix version.

rahonalab commented on May 18, 2024 1

Hello @asajatovic and hvala 🙏 for your quick response :-)
As far as I understand, the two Italian models as well as the Croatian one don't have the morphological features, right? The link you sent me explains how to add the tag map to an existing model, so I'd probably have to write the whole set of morphological features for Italian to get it to work. But I thought there was already a set of morphological features, since the KeyError contains something...

asajatovic commented on May 18, 2024

Thanks for reporting this. After some code digging, I am confident this happens because of the way the tag maps for Romanian and Polish are defined. For the code snippet you provided, a morphology feature "Case" is extracted from "Pw3--r", the XPOS (language-specific part-of-speech tag) of the word Ce. As "Case" is not in the supported FEATURES for the Morphology class (see this and this), an exception occurs. The same problem happens again for the word Ce with the features "Person" and "PronType". The equivalent occurs for the word faci, whose XPOS value "Vmip2s" maps to "Person", which again is not in FEATURES (link). You can access the xpostag attribute if you process the text using the 'raw' UDPipe model (nlp.udpipe(text)).

Since this library is only a wrapper for the UDPipe models and as tag maps are specific to each language, to solve the issue(s), I suggest you update the tag maps for the problematic languages. A good start would be https://spacy.io/usage/adding-languages#tag-map and making sure the tag map features are compliant with the ones defined in spaCy. 😄
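Before updating a tag map, it can help to list which entries reference unsupported features. The following is a rough sketch under stated assumptions: `audit_tag_map` is a hypothetical helper, the toy tag map is modelled on the tags named above (the feature values are placeholders, and "Ncms" is an invented entry), and the supported set stands in for spaCy's FEATURES.

```python
def audit_tag_map(tag_map, supported):
    """Return {tag: [unsupported feature names]} for entries that need updating."""
    problems = {}
    for tag, attrs in tag_map.items():
        # "pos" holds the coarse part of speech, not a morphological feature.
        bad = sorted(name for name in attrs if name != "pos" and name not in supported)
        if bad:
            problems[tag] = bad
    return problems


# Toy data only: feature values are placeholders, not the real Romanian tag map.
toy_tag_map = {
    "Pw3--r": {"pos": "PRON", "Case": "x", "Person": "x", "PronType": "x"},
    "Vmip2s": {"pos": "VERB", "Person": "x"},
    "Ncms": {"pos": "NOUN", "Number_sing": True},
}
toy_supported = {"Number_sing", "Number_plur"}  # assumed stand-in for FEATURES

report = audit_tag_map(toy_tag_map, toy_supported)
print(report)  # {'Pw3--r': ['Case', 'Person', 'PronType'], 'Vmip2s': ['Person']}
```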

rahonalab commented on May 18, 2024

Hi!
I don't know whether this is related, but I cannot print out morphological features for Italian. I have tried both the standard isdt model and the vit model.

I have also tried tag_map:

>>> nlp = spacy_udpipe.load("it")
>>> for token in nlp("Il bello di questo mestiere è che ti fa crescere."): nlp.vocab.morphology.tag_map[token.tag_]
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'RD'

The function works with other languages, for instance English:

>>> nlp = spacy_udpipe.load("en")
>>> for token in nlp("Dogs are friendly."): nlp.vocab.morphology.tag_map[token.tag_]
... 
{74: 92, 'Number_plur': True}
{74: 100, 'Tense_pres': True, 'VerbForm_fin': True}
{74: 84, 'Degree_pos': True}
{74: 97, 'PunctType_peri': True}

but fails for others too, for instance, Croatian:

>>> nlp = spacy_udpipe.load("hr")
>>> for token in nlp("Magdalena već godinama radi u Državnom Restauratorskom Zavodu."): nlp.vocab.morphology.tag_map[token.tag_]
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Npfsn'

I am using the latest version of spacy (2.2.3) and spacy-udpipe (0.1.0), both with the soft-morph-fail fix and without.
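The KeyError in the Italian and Croatian sessions comes from indexing the tag map with a tag it does not contain. A minimal sketch of a tolerant lookup, using a plain dict in place of `nlp.vocab.morphology.tag_map`: `split_known_tags` is a hypothetical helper, and the "NNS" key is an assumption (only the entry value is copied from the English output above).

```python
def split_known_tags(tag_map, tags):
    """Return (found, missing): entries for known tags, and the tags with no entry."""
    found = {tag: tag_map[tag] for tag in tags if tag in tag_map}
    missing = [tag for tag in tags if tag not in tag_map]
    return found, missing


# Stand-in for the real tag map; the key "NNS" is assumed for illustration.
toy_tag_map = {"NNS": {74: 92, "Number_plur": True}}

found, missing = split_known_tags(toy_tag_map, ["NNS", "RD", "Npfsn"])
print(found)    # {'NNS': {74: 92, 'Number_plur': True}}
print(missing)  # ['RD', 'Npfsn']
```

With a real pipeline, the list of tags would come from `token.tag_` for each token, as in the sessions above.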

asajatovic commented on May 18, 2024

@rahonalab Hi! It does not work because of the tag map for the Italian language (link).
Regarding the tag map for Croatian in spaCy, it doesn't yet exist.
Both are inherently related to spaCy and if you want to use morphological features, the tag map for a specific language should be updated in the spaCy repo. For more details see https://spacy.io/usage/adding-languages#tag-map.
All of this will be documented with some workarounds in a new spacy-udpipe release which is currently WIP. 😄
Edit: You can now install the latest package version (with the mentioned update ^) directly from the master branch!

asajatovic commented on May 18, 2024

@rahonalab You are welcome! :)
You are right, morphological features for Italian already exist; however, spaCy recently changed the (language-agnostic) values in morphological FEATURES. The feature names used in TAG_MAP (from tag_map.py) must match the morphological FEATURES exactly. Regarding Italian, you should ideally only update the TAG_MAP, whereas for Croatian it can only be done from scratch (no existing TAG_MAP).
Also, the TAG_MAP for a specific language is and should be independent of any model for the same language.

rahonalab commented on May 18, 2024

Thank you, now I start to understand something :-)
The Italian tag_map which is currently employed in the UD model has numbers in place of POS:XPOS

nlp.vocab.morphology.tag_map
{'AP__Gender=Fem|Number=Plur|Poss=Yes|PronType=Prs': {74: 90},

whereas the Italian tag map in spaCy 2.2.4 has:

(/usr/local/lib/python3.7/site-packages/spacy/lang/it)

TAG_MAP = {
    "AP__Gender=Fem|Number=Plur|Poss=Yes|PronType=Prs": {POS: DET},

I saw your workaround to stop importing the 'wrong' TAG_MAP:

nlp = spacy_udpipe.load("it", ignore_tag_map=True)

Why don't you include an option to automatically import the tagmap from spacy?

asajatovic commented on May 18, 2024

If available, a language-specific TAG MAP is automatically loaded for every spacy-udpipe and spaCy language model. Keep in mind that TAG MAP is defined in spaCy, specifically for each language, and is loaded only from spaCy.

The workaround is simply there to enable proper POS tagging by ignoring morphological features if they are outdated (in other words, if the TAG_MAP values don't exactly match FEATURES values).

I hope this clears the confusion! :)

Edit: Regarding the numbers in place of XPOS:POS, that is fine as this also happens when you load a 'pure' spaCy model.
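To read such entries more easily, the integer IDs can be translated back to names. This is only an illustrative sketch: `ID_TO_NAME` and `readable` are hypothetical, and the two mappings are inferred by comparing `{74: 90}` with `{POS: DET}` in the snippets quoted above, not taken from spaCy's symbol table.

```python
# Assumed ID-to-name table, inferred from the two snippets in this thread.
ID_TO_NAME = {74: "POS", 90: "DET"}


def _name(value, names):
    """Translate an integer ID if it is known; leave everything else untouched."""
    if isinstance(value, int) and not isinstance(value, bool):
        return names.get(value, value)
    return value


def readable(entry, names=ID_TO_NAME):
    """Return a copy of a tag map entry with known integer IDs replaced by names."""
    return {_name(key, names): _name(value, names) for key, value in entry.items()}


print(readable({74: 90}))                       # {'POS': 'DET'}
print(readable({74: 92, "Number_plur": True}))  # {'POS': 92, 'Number_plur': True}
```

IDs missing from the table (92 in the second call) pass through unchanged, so the helper never hides information it cannot decode.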

asajatovic commented on May 18, 2024

Closing this issue as it is fixed in #12.
