Hi, I encounter an error. This is a sample from mp

The conllu format's last column seems to be index-shifted. (semeval22_structured_sentiment issue, closed)

jerbarnes commented on September 26, 2024

The conllu format's last column seems to be index-shifted.

from semeval22_structured_sentiment.

Comments (7)

jerbarnes commented on September 26, 2024

Hi,

The problem isn't caused by the stanza sentiment processor, as it is not used to create the conllu. I'll have a look at this today, see if I can replicate it, and then find the cause.

jerbarnes commented on September 26, 2024

Hi again,

Just reran the code using stanza 1.1.1, and the output looks fine (see below). Which stanza version are you running?

In fact, it looks like token 14 ('-') was deleted in your conllu and I imagine that could be the root of the problem. I wonder why it was deleted though?

```
# text = The opposition Movement for Democratic Change ( MDC ) complained that the set - up was deliberately confusing in a ploy to discourage the urban vote , which is thought to favor Mugabe 's challenger Morgan Tsvangirai .
1	The	the	DET	_	_	3	det	_	_	9:holder
2	opposition	opposition	NOUN	_	_	3	compound	_	_	9:holder
3	Movement	Movement	PROPN	_	_	10	nsubj	_	_	9:holder
4	for	for	ADP	_	_	6	case	_	_	9:holder
5	Democratic	Democratic	PROPN	_	_	6	compound	_	_	9:holder
6	Change	Change	PROPN	_	_	3	nmod	_	_	9:holder
7	(	(	PUNCT	_	_	8	punct	_	_	9:holder
8	MDC	MDC	PROPN	_	_	6	appos	_	_	9:holder
9	)	)	PUNCT	_	_	8	punct	_	_	10:holder
10	complained	complain	VERB	_	_	0	root	_	_	0:exp-negative
11	that	that	SCONJ	_	_	18	mark	_	_	_
12	the	the	DET	_	_	15	det	_	_	15:targ
13	set	set	NOUN	_	_	15	compound	_	_	15:targ
14	-	-	PUNCT	_	_	15	punct	_	_	15:targ
15	up	up	NOUN	_	_	18	nsubj	_	_	10:targ
16	was	be	AUX	_	_	18	cop	_	_	_
17	deliberately	deliberately	ADV	_	_	18	advmod	_	_	_
18	confusing	confusing	ADJ	_	_	10	ccomp	_	_	_
19	in	in	ADP	_	_	21	case	_	_	_
20	a	a	DET	_	_	21	det	_	_	_
21	ploy	ploy	NOUN	_	_	18	obl	_	_	_
22	to	to	PART	_	_	23	mark	_	_	_
23	discourage	discourage	VERB	_	_	21	acl	_	_	_
24	the	the	DET	_	_	26	det	_	_	_
25	urban	urban	ADJ	_	_	26	amod	_	_	_
26	vote	vote	NOUN	_	_	23	obj	_	_	_
27	,	,	PUNCT	_	_	26	punct	_	_	_
28	which	which	PRON	_	_	30	nsubj:pass	_	_	_
29	is	be	AUX	_	_	30	aux:pass	_	_	_
30	thought	think	VERB	_	_	26	acl:relcl	_	_	_
31	to	to	PART	_	_	32	mark	_	_	_
32	favor	favor	VERB	_	_	30	xcomp	_	_	0:exp-positive
33	Mugabe	Mugabe	PROPN	_	_	35	nmod:poss	_	_	37:targ
34	's	's	PART	_	_	33	case	_	_	37:targ
35	challenger	challenger	NOUN	_	_	32	obj	_	_	37:targ
36	Morgan	Morgan	PROPN	_	_	32	obj	_	_	37:targ
37	Tsvangirai	Tsvangirai	PROPN	_	_	36	flat	_	_	32:targ
38	.	.	PUNCT	_	_	10	punct	_	_	_
```
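
For anyone hitting the same thing, a dropped token can be spotted by comparing the token rows against the `# text` line and checking that every head index in the last column points at an existing row. This is only a rough sketch: it assumes tab-separated rows, a whitespace-tokenized `# text` comment, and a last column that is either `_` or `head:label` entries separated by `|`. `check_sentence` and `make_row` are hypothetical helpers, not part of the repository, and the sentences are toy data.

```python
def make_row(idx, form, last="_"):
    # Build an 11-column row; columns we do not care about are "_" placeholders.
    return "\t".join([str(idx), form, form, "_", "_", "_", "_", "_", "_", "_", last])


def check_sentence(block):
    """Return a list of problems found in one conllu sentence block."""
    problems = []
    text_tokens = None
    rows = []
    for line in block.strip().splitlines():
        if line.startswith("# text = "):
            text_tokens = line[len("# text = "):].split()
        elif line and not line.startswith("#"):
            rows.append(line.split("\t"))

    # Token ids should run 1..n without gaps.
    ids = [int(r[0]) for r in rows]
    if ids != list(range(1, len(rows) + 1)):
        problems.append("token ids are not consecutive from 1")

    # The forms should match the pre-tokenized text exactly.
    forms = [r[1] for r in rows]
    if text_tokens is not None and forms != text_tokens:
        problems.append(
            "token rows do not match '# text' (%d rows vs %d tokens)"
            % (len(forms), len(text_tokens))
        )

    # Every head index in the last column must be 0 or an existing token id.
    for r in rows:
        if r[-1] == "_":
            continue
        for arc in r[-1].split("|"):
            head = int(arc.split(":", 1)[0])
            if not 0 <= head <= len(rows):
                problems.append("token %s points at missing head %d" % (r[0], head))
    return problems


# Toy sentence with consistent annotations.
ok = "\n".join([
    "# text = the set - up failed",
    make_row(1, "the", "4:targ"),
    make_row(2, "set", "4:targ"),
    make_row(3, "-", "4:targ"),
    make_row(4, "up", "5:targ"),
    make_row(5, "failed", "0:exp-negative"),
])

# Same sentence with "-" dropped and the rows renumbered, but the
# last column left untouched.
broken = "\n".join([
    "# text = the set - up failed",
    make_row(1, "the", "4:targ"),
    make_row(2, "set", "4:targ"),
    make_row(3, "up", "5:targ"),
    make_row(4, "failed", "0:exp-negative"),
])

ok_problems = check_sentence(ok)
broken_problems = check_sentence(broken)
```

On the toy `broken` sentence this reports both the row/text mismatch and the head index that no longer exists.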

congchan commented on September 26, 2024

Hi,
Good to know that the dataset annotation does not depend on stanza.
I will switch to your stanza version (1.1.1) to avoid any errors.
I think the problem comes from this issue: stanfordnlp/stanza#804

jerbarnes commented on September 26, 2024

Ok, great! Let me know if using 1.1.1 works and if so, I'll close the issue. If you still have problems and it is the sentiment module that removes the token, we could also always remove that element from the stanza pipeline.
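
If it comes to that, dropping a processor is just a matter of the `processors` argument to `stanza.Pipeline`. A minimal sketch, assuming the pipeline is configured with a comma-separated processor string; the starting list below is an assumption for illustration and not necessarily what `process_mpqa.py` uses, and `drop_processor` and `build_pipeline` are hypothetical helpers.

```python
def drop_processor(processors, name):
    """Return a comma-separated processor string without `name`."""
    kept = [p.strip() for p in processors.split(",")]
    return ",".join(p for p in kept if p and p != name)


# Assumed starting configuration (hypothetical, for illustration only).
processors = drop_processor("tokenize,pos,lemma,depparse,sentiment", "sentiment")


def build_pipeline(processors):
    # Imported lazily so the helper above can be used without stanza installed.
    import stanza

    return stanza.Pipeline(lang="en", processors=processors)
```

Calling `build_pipeline(processors)` needs the English models to be downloaded first.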

congchan commented on September 26, 2024

> Ok, great! Let me know if using 1.1.1 works and if so, I'll close the issue. If you still have problems and it is the sentiment module that removes the token, we could also always remove that element from the stanza pipeline.

The number of sentences in the *.json files generated by process_mpqa.py with Stanza v1.1.1 is different from the number I get with Stanza v1.2.3. There are also some minor differences in the number of holders.

What sentence and holder counts are expected?

jerbarnes commented on September 26, 2024

Hi,

I've just tried rerunning process_mpqa.py with both Stanza v1.1.1 and v1.2.3. I get two small differences in tokenization due to how they deal with some punctuation marks on the following two sentences ('temp_fbis/21.50.57-15245-29' and 'ula/118CWL050-40'):

1.1.1: 'Image- 2.gif'
1.2.3: 'Image - 2.gif'

1.1.1: 'To receive an application form , check the NAP box on the enclosed pledge card or call us , ( 317 ) 634-6102 , ext. 20 .'
1.2.3: 'To receive an application form , check the NAP box on the enclosed pledge card or call us , ( 317 ) 634-6102 , ext. 20.'

However, all the annotations and number of sentences in train (5873) are the same. Silly question, but just to be safe, have you pulled all the recent changes to the code?
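
In case it helps with comparing your own runs, the two outputs can be diffed sentence by sentence. This sketch assumes each JSON file is a list of objects with `sent_id` and `text` fields; that layout, the `tokenization_diffs` helper, and the ids in the toy data are assumptions for illustration.

```python
def tokenization_diffs(old, new):
    """Return (sent_id, old_text, new_text) for sentences whose text differs."""
    new_by_id = {s["sent_id"]: s["text"] for s in new}
    diffs = []
    for s in old:
        other = new_by_id.get(s["sent_id"])  # None if the sentence is missing
        if other != s["text"]:
            diffs.append((s["sent_id"], s["text"], other))
    return diffs


# Toy stand-ins for the two runs; in practice these would be loaded with
# json.load() from the files written by each Stanza version.
run_111 = [
    {"sent_id": "sent-1", "text": "Image- 2.gif"},
    {"sent_id": "sent-2", "text": "No change here ."},
]
run_123 = [
    {"sent_id": "sent-1", "text": "Image - 2.gif"},
    {"sent_id": "sent-2", "text": "No change here ."},
]

diffs = tokenization_diffs(run_111, run_123)
same_length = len(run_111) == len(run_123)
```

Comparing `len()` of the two lists as well gives the sentence count check.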

congchan commented on September 26, 2024

Great! That is the same as my results. Thanks for the clarification.
