Comments (7)
Hi,
The problem isn't caused by the Stanza sentiment processor, as it is not used to create the CoNLL-U files. I'll have a look at this today, see if I can replicate it, and then find the cause.
from semeval22_structured_sentiment.
Hi again,
Just reran the code using stanza 1.1.1, and the output looks fine (see below). Which stanza version are you running?
In fact, it looks like token 14 ('-') was deleted in your conllu and I imagine that could be the root of the problem. I wonder why it was deleted though?
# text = The opposition Movement for Democratic Change ( MDC ) complained that the set - up was deliberately confusing in a ploy to discourage the urban vote , which is thought to favor Mugabe 's challenger Morgan Tsvangirai .
1 The the DET _ _ 3 det _ _ 9:holder
2 opposition opposition NOUN _ _ 3 compound _ _ 9:holder
3 Movement Movement PROPN _ _ 10 nsubj _ _ 9:holder
4 for for ADP _ _ 6 case _ _ 9:holder
5 Democratic Democratic PROPN _ _ 6 compound _ _ 9:holder
6 Change Change PROPN _ _ 3 nmod _ _ 9:holder
7 ( ( PUNCT _ _ 8 punct _ _ 9:holder
8 MDC MDC PROPN _ _ 6 appos _ _ 9:holder
9 ) ) PUNCT _ _ 8 punct _ _ 10:holder
10 complained complain VERB _ _ 0 root _ _ 0:exp-negative
11 that that SCONJ _ _ 18 mark _ _ _
12 the the DET _ _ 15 det _ _ 15:targ
13 set set NOUN _ _ 15 compound _ _ 15:targ
14 - - PUNCT _ _ 15 punct _ _ 15:targ
15 up up NOUN _ _ 18 nsubj _ _ 10:targ
16 was be AUX _ _ 18 cop _ _ _
17 deliberately deliberately ADV _ _ 18 advmod _ _ _
18 confusing confusing ADJ _ _ 10 ccomp _ _ _
19 in in ADP _ _ 21 case _ _ _
20 a a DET _ _ 21 det _ _ _
21 ploy ploy NOUN _ _ 18 obl _ _ _
22 to to PART _ _ 23 mark _ _ _
23 discourage discourage VERB _ _ 21 acl _ _ _
24 the the DET _ _ 26 det _ _ _
25 urban urban ADJ _ _ 26 amod _ _ _
26 vote vote NOUN _ _ 23 obj _ _ _
27 , , PUNCT _ _ 26 punct _ _ _
28 which which PRON _ _ 30 nsubj:pass _ _ _
29 is be AUX _ _ 30 aux:pass _ _ _
30 thought think VERB _ _ 26 acl:relcl _ _ _
31 to to PART _ _ 32 mark _ _ _
32 favor favor VERB _ _ 30 xcomp _ _ 0:exp-positive
33 Mugabe Mugabe PROPN _ _ 35 nmod:poss _ _ 37:targ
34 's 's PART _ _ 33 case _ _ 37:targ
35 challenger challenger NOUN _ _ 32 obj _ _ 37:targ
36 Morgan Morgan PROPN _ _ 32 obj _ _ 37:targ
37 Tsvangirai Tsvangirai PROPN _ _ 36 flat _ _ 32:targ
38 . . PUNCT _ _ 10 punct _ _ _
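For anyone who wants to check their own output for a dropped token like this, here is a minimal sketch. It assumes the `# text` line is whitespace-tokenized, as in the sentence above, and compares it with the FORM column of the token rows. The function name `first_mismatch` is hypothetical and not part of the repository.

```python
from itertools import zip_longest
from typing import Optional


def first_mismatch(block: str) -> Optional[int]:
    """Return the 1-based position of the first token where the `# text`
    line and the FORM column disagree, or None if they match."""
    text_tokens, forms = [], []
    for line in block.splitlines():
        line = line.strip()
        if line.startswith("# text ="):
            # Assumption: the text is already tokenized on whitespace.
            text_tokens = line.split("=", 1)[1].split()
        elif line and not line.startswith("#"):
            cols = line.split()
            # Skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if cols[0].isdigit():
                forms.append(cols[1])
    for i, (tok, form) in enumerate(zip_longest(text_tokens, forms), start=1):
        if tok != form:
            return i
    return None


sample = """# text = the set - up was
1 the the DET
2 set set NOUN
3 up up NOUN
4 was be AUX"""

print(first_mismatch(sample))  # prints 3: '-' is missing from the token rows
```

This only compares surface forms, so it would not flag a sentence where the `# text` line itself was altered.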
Hi,
Good to know that the dataset annotation does not depend on the Stanza sentiment processor.
I will switch to Stanza 1.1.1, the version you used, to avoid any errors.
I think the problem is related to this issue: stanfordnlp/stanza#804
Ok, great! Let me know if using 1.1.1 works and if so, I'll close the issue. If you still have problems and it is the sentiment module that removes the token, we could also always remove that element from the stanza pipeline.
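If it comes to that, dropping a processor could look roughly like the sketch below. The processor list in `ASSUMED_PROCESSORS` is an assumption for illustration and is not taken from the repository's scripts; `processors_without` and `build_pipeline` are hypothetical helpers. The only Stanza call used is `stanza.Pipeline` with its `lang` and `processors` arguments.

```python
# Assumed processor list for illustration only; check the actual script.
ASSUMED_PROCESSORS = "tokenize,pos,lemma,depparse,sentiment"


def processors_without(processors: str, drop: str) -> str:
    """Return a comma-separated processor list with `drop` removed."""
    kept = [p.strip() for p in processors.split(",")]
    return ",".join(p for p in kept if p and p != drop)


def build_pipeline(lang: str = "en"):
    """Build a Stanza pipeline that skips the sentiment processor."""
    # Imported lazily so the helper above can be used without Stanza installed.
    import stanza

    return stanza.Pipeline(
        lang=lang,
        processors=processors_without(ASSUMED_PROCESSORS, "sentiment"),
    )
```

Calling `build_pipeline()` requires the corresponding Stanza models to be downloaded first.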
The number of sentences in the *.json files generated by process_mpqa.py differs between Stanza v1.1.1 and Stanza v1.2.3. There are also some minor differences in the number of holders.
How many sentences and holders should I expect?
Hi,
I've just tried rerunning process_mpqa.py with both Stanza v1.1.1 and v1.2.3. I get two small differences in tokenization due to how they deal with some punctuation marks on the following two sentences ('temp_fbis/21.50.57-15245-29' and 'ula/118CWL050-40'):
1.1.1 : 'Image- 2.gif'
1.2.3 : 'Image - 2.gif'
1.1.1: 'To receive an application form , check the NAP box on the enclosed pledge card or call us , ( 317 ) 634-6102 , ext. 20 .'
1.2.3: 'To receive an application form , check the NAP box on the enclosed pledge card or call us , ( 317 ) 634-6102 , ext. 20.'
However, all the annotations and number of sentences in train (5873) are the same. Silly question, but just to be safe, have you pulled all the recent changes to the code?
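A comparison like the one above can be sketched as follows. It assumes each generated JSON file is a list of objects with `sent_id` and `text` fields; the function name `compare_runs` and the sample ids are hypothetical.

```python
from typing import Dict, List


def compare_runs(run_a: List[dict], run_b: List[dict]) -> Dict[str, object]:
    """Compare two lists of sentence records by sentence count and text."""
    texts_a = {s["sent_id"]: s["text"] for s in run_a}
    texts_b = {s["sent_id"]: s["text"] for s in run_b}
    shared = texts_a.keys() & texts_b.keys()
    return {
        "count_a": len(run_a),
        "count_b": len(run_b),
        # Sentences present in only one of the two runs.
        "only_a": sorted(texts_a.keys() - texts_b.keys()),
        "only_b": sorted(texts_b.keys() - texts_a.keys()),
        # Sentences present in both runs whose tokenized text differs.
        "changed": sorted(i for i in shared if texts_a[i] != texts_b[i]),
    }


# Toy records with hypothetical ids; real data would come from json.load().
old = [{"sent_id": "s1", "text": "Image- 2.gif"}, {"sent_id": "s2", "text": "ok ."}]
new = [{"sent_id": "s1", "text": "Image - 2.gif"}, {"sent_id": "s2", "text": "ok ."}]
print(compare_runs(old, new)["changed"])  # prints ['s1']
```

Extending the same idea to holders would mean comparing the `opinions` entries per sentence, which this sketch leaves out.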
Great! Those are the same as my results. Thanks for the clarification.