
Comments (7)

jerbarnes commented on September 26, 2024

Hi,

The problem isn't caused by the stanza sentiment processor, as this is not used to create the conllu. I'll have a look at this today and see if I can replicate it and then find the cause.

from semeval22_structured_sentiment.

jerbarnes commented on September 26, 2024

Hi again,

Just reran the code using stanza 1.1.1, and the output looks fine (see below). Which stanza version are you running?

In fact, it looks like token 14 ('-') was deleted in your conllu and I imagine that could be the root of the problem. I wonder why it was deleted though?

```
# text = The opposition Movement for Democratic Change ( MDC ) complained that the set - up was deliberately confusing in a ploy to discourage the urban vote , which is thought to favor Mugabe 's challenger Morgan Tsvangirai .
1	The	the	DET	_	_	3	det	_	_	9:holder
2	opposition	opposition	NOUN	_	_	3	compound	_	_	9:holder
3	Movement	Movement	PROPN	_	_	10	nsubj	_	_	9:holder
4	for	for	ADP	_	_	6	case	_	_	9:holder
5	Democratic	Democratic	PROPN	_	_	6	compound	_	_	9:holder
6	Change	Change	PROPN	_	_	3	nmod	_	_	9:holder
7	(	(	PUNCT	_	_	8	punct	_	_	9:holder
8	MDC	MDC	PROPN	_	_	6	appos	_	_	9:holder
9	)	)	PUNCT	_	_	8	punct	_	_	10:holder
10	complained	complain	VERB	_	_	0	root	_	_	0:exp-negative
11	that	that	SCONJ	_	_	18	mark	_	_	_
12	the	the	DET	_	_	15	det	_	_	15:targ
13	set	set	NOUN	_	_	15	compound	_	_	15:targ
14	-	-	PUNCT	_	_	15	punct	_	_	15:targ
15	up	up	NOUN	_	_	18	nsubj	_	_	10:targ
16	was	be	AUX	_	_	18	cop	_	_	_
17	deliberately	deliberately	ADV	_	_	18	advmod	_	_	_
18	confusing	confusing	ADJ	_	_	10	ccomp	_	_	_
19	in	in	ADP	_	_	21	case	_	_	_
20	a	a	DET	_	_	21	det	_	_	_
21	ploy	ploy	NOUN	_	_	18	obl	_	_	_
22	to	to	PART	_	_	23	mark	_	_	_
23	discourage	discourage	VERB	_	_	21	acl	_	_	_
24	the	the	DET	_	_	26	det	_	_	_
25	urban	urban	ADJ	_	_	26	amod	_	_	_
26	vote	vote	NOUN	_	_	23	obj	_	_	_
27	,	,	PUNCT	_	_	26	punct	_	_	_
28	which	which	PRON	_	_	30	nsubj:pass	_	_	_
29	is	be	AUX	_	_	30	aux:pass	_	_	_
30	thought	think	VERB	_	_	26	acl:relcl	_	_	_
31	to	to	PART	_	_	32	mark	_	_	_
32	favor	favor	VERB	_	_	30	xcomp	_	_	0:exp-positive
33	Mugabe	Mugabe	PROPN	_	_	35	nmod:poss	_	_	37:targ
34	's	's	PART	_	_	33	case	_	_	37:targ
35	challenger	challenger	NOUN	_	_	32	obj	_	_	37:targ
36	Morgan	Morgan	PROPN	_	_	32	obj	_	_	37:targ
37	Tsvangirai	Tsvangirai	PROPN	_	_	36	flat	_	_	32:targ
38	.	.	PUNCT	_	_	10	punct	_	_	_
```
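
A deleted token like the one described above shows up as a gap in the token ID column. The following is a minimal sketch of such a check, not code from the repository; the helper name `find_id_gaps` and the three-token sample are made up for illustration.

```python
from typing import List


def find_id_gaps(conllu_sentence: str) -> List[int]:
    """Return token IDs missing from one CoNLL-U sentence block."""
    ids = []
    for line in conllu_sentence.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        first_column = line.split()[0]
        # Only plain integer IDs are counted; multiword ranges ("3-4")
        # and empty nodes ("5.1") are ignored.
        if first_column.isdigit():
            ids.append(int(first_column))
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(1, max(ids) + 1) if i not in present]


# Hypothetical sample in which token 3 ('-') has been dropped.
broken = "\n".join([
    "# text = the set - up",
    "1\tthe\tthe\tDET",
    "2\tset\tset\tNOUN",
    "4\tup\tup\tNOUN",
])
print(find_id_gaps(broken))
```

Running this over each sentence block of a generated conllu file would list the sentences where a token went missing.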


congchan commented on September 26, 2024

Hi,
Good to know that the dataset annotation does not depend on stanza.
I will switch to your stanza version 1.1.1 to avoid any errors.
I think the problem comes from this issue: stanfordnlp/stanza#804


jerbarnes commented on September 26, 2024

Ok, great! Let me know if using 1.1.1 works and if so, I'll close the issue. If you still have problems and it is the sentiment module that removes the token, we could also always remove that element from the stanza pipeline.
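
For reference, dropping a processor from a stanza pipeline comes down to leaving it out of the `processors` string. This is only a sketch: the processor list below is an assumption about what the pipeline might load, not the list actually used in this repository, and `processors_without` and `build_pipeline` are hypothetical helpers.

```python
# Assumed processor list, for illustration only.
ASSUMED_PROCESSORS = ["tokenize", "pos", "lemma", "depparse", "sentiment"]


def processors_without(excluded, processors=ASSUMED_PROCESSORS):
    """Build the comma-separated processors string, skipping excluded names."""
    return ",".join(p for p in processors if p not in excluded)


def build_pipeline(lang="en", excluded=("sentiment",)):
    """Create a stanza pipeline without the excluded processors."""
    # Imported lazily so the helper above works without stanza installed.
    import stanza

    return stanza.Pipeline(lang=lang, processors=processors_without(excluded))


print(processors_without(["sentiment"]))
```

Calling `build_pipeline()` requires stanza and the English models to be installed.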


congchan commented on September 26, 2024

> Ok, great! Let me know if using 1.1.1 works and if so, I'll close the issue. If you still have problems and it is the sentiment module that removes the token, we could also always remove that element from the stanza pipeline.

The number of sentences in the *.json files generated by process_mpqa.py with Stanza v1.1.1 is different from the number with Stanza v1.2.3. There are also some minor differences in the number of holders.

How many sentences and holders are expected?


jerbarnes commented on September 26, 2024

Hi,

I've just tried rerunning process_mpqa.py with both Stanza v1.1.1 and v1.2.3. I get two small differences in tokenization due to how they deal with some punctuation marks on the following two sentences ('temp_fbis/21.50.57-15245-29' and 'ula/118CWL050-40'):

1.1.1 : 'Image- 2.gif'
1.2.3 : 'Image - 2.gif'

1.1.1: 'To receive an application form , check the NAP box on the enclosed pledge card or call us , ( 317 ) 634-6102 , ext. 20 .'
1.2.3: 'To receive an application form , check the NAP box on the enclosed pledge card or call us , ( 317 ) 634-6102 , ext. 20.'

However, all the annotations and number of sentences in train (5873) are the same. Silly question, but just to be safe, have you pulled all the recent changes to the code?
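
A comparison like the one described here can be sketched as follows. It assumes each generated *.json file is a list of sentence objects with `sent_id` and `text` keys; the function name `compare_outputs` and the sentence IDs in the sample are hypothetical.

```python
def compare_outputs(old, new):
    """Compare two lists of sentence dicts keyed by 'sent_id'."""
    old_text = {s["sent_id"]: s["text"] for s in old}
    new_text = {s["sent_id"]: s["text"] for s in new}
    shared = old_text.keys() & new_text.keys()
    return {
        "n_old": len(old),
        "n_new": len(new),
        "only_in_old": sorted(old_text.keys() - new_text.keys()),
        "only_in_new": sorted(new_text.keys() - old_text.keys()),
        # Sentences present in both runs whose tokenized text differs.
        "text_differs": sorted(i for i in shared if old_text[i] != new_text[i]),
    }


# Hypothetical IDs; the first text pair is the tokenization difference quoted above.
run_111 = [
    {"sent_id": "sent-a", "text": "Image- 2.gif"},
    {"sent_id": "sent-b", "text": "An unchanged sentence ."},
]
run_123 = [
    {"sent_id": "sent-a", "text": "Image - 2.gif"},
    {"sent_id": "sent-b", "text": "An unchanged sentence ."},
]
report = compare_outputs(run_111, run_123)
print(report["n_old"], report["n_new"], report["text_differs"])
```

In practice the two lists would be read with `json.load` from the files produced by each Stanza version.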


congchan commented on September 26, 2024

Great! That is the same as my results. Thanks for the clarification.
