
SemEval-2022 Shared Task 10: Structured Sentiment Analysis

This Github repository hosts the data and baseline models for the SemEval-2022 shared task 10 on structured sentiment. In this repository you will find the datasets, baselines, and other useful information on the shared task.

LATEST NEWS

20.02.22: The OpenReview instance for SemEval is now available. You can submit your system description papers there by February 28. Remember that for SemEval 2022 all deadlines are 23:59 UTC-12 ("anywhere on Earth").

4.02.22: The evaluation period has finished. We encourage all participants to submit a system description paper (more info to come); the deadline is February 28th. Guidelines are available from SemEval. If you do submit, please cite the shared task description paper and the papers for the datasets you used.

11.01.2022: Added test data for the evaluation phase. Additionally, we corrected the problems associated with issue #9 in train, dev, and test splits. We've also updated the example_submission.zip in the evaluation subrepo for the evaluation phase. Therefore, we recommend you pull the changes.

09.12.2021: Updated the evaluation script on codalab, as previous versions gave incorrectly low results when run on codalab due to floor division in python 2.7 (codalab still runs on python 2, see issue #16).

29.11.2021: Updated MPQA processing script to remove annotations with no polarity (see issue #20). If you have the data from before this date, you will need to rerun the preprocessing scripts.

15.11.2021: Updated the Darmstadt and MPQA processing scripts to remove annotations containing polar expressions with incorrect offsets, which caused errors during evaluation (see issue #17). If you have the data from before this date, you will need to rerun the preprocessing scripts.

15.10.2021: Updated the Darmstadt processing script to remove annotations containing polar expressions with no offset (see issue #9).

6.10.2021: Updated MPQA and Darmstadt dev data on the codalab. You may need to check your data to make sure that you're working with the newest version in order to submit.

Table of contents:

  1. Problem description
  2. Subtasks
    1. Monolingual
      1. Data
    2. Cross-lingual
  3. Evaluation
  4. Data format
  5. Resources
  6. Submission via Codalab
  7. Baselines
  8. Important dates
  9. Frequently Asked Questions
  10. Task organizers
  11. Citation

Problem description

The task is to predict all structured sentiment graphs in a text (see the examples below). We can formalize this as finding all the opinion tuples O = O_1, ..., O_n in a text. Each opinion O_i is a tuple (h, t, e, p)

where h is a holder who expresses a polarity p towards a target t through a sentiment expression e, implicitly defining the relationships between the elements of a sentiment graph.

The two examples below (first in English, then in Basque) give a visual representation of these sentiment graphs.

multilingual example

Participants can then either approach this as a sequence-labelling task, or as a graph prediction task.
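
To make the tuple structure concrete, here is a minimal sketch in Python of the abstract (h, t, e, p) representation, using the two opinions from the English example in the Data format section below (the Opinion class and its field names are illustrative only; the official distribution format is the JSON described later):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Opinion:
    """One opinion tuple (h, t, e, p); the holder and target may be implicit."""
    holder: Optional[str]   # h: who expresses the sentiment
    target: Optional[str]   # t: what the sentiment is about
    expression: str         # e: the polar sentiment expression in the text
    polarity: str           # p: "Positive", "Negative" or "Neutral"

# "Even though the price is decent for Paris , I would not recommend this hotel ."
opinions = [
    Opinion(holder="I", target="this hotel",
            expression="would not recommend", polarity="Negative"),
    Opinion(holder=None, target="the price",
            expression="decent", polarity="Positive"),
]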

Subtasks

Monolingual

This track assumes that you train and test on the same language. Participants will need to submit results for seven datasets in five languages.

The datasets can be found in the data directory.

Data

Dataset Language # sents # holders # targets # expr.
NoReC_fine Norwegian 11437 1128 8923 11115
MultiBooked_eu Basque 1521 296 1775 2328
MultiBooked_ca Catalan 1678 235 2336 2756
OpeNER_es Spanish 2057 255 3980 4388
OpeNER_en English 2494 413 3850 4150
MPQA English 10048 2279 2452 2814
Darmstadt_unis English 2803 86 1119 1119

Cross-lingual

This track will explore how well models can generalize across languages. The test data will be the MultiBooked Datasets (Catalan and Basque) and the OpeNER Spanish dataset. For training, you can use any of the other datasets, as well as any other resource that does not contain sentiment annotations in the target language.

This setup is often known as zero-shot cross-lingual transfer, and we will assume that all submissions follow it.

Evaluation

The two subtasks will be evaluated separately. In both tasks, the evaluation will be based on Sentiment Graph F1.

This metric defines a true positive as an exact match at graph level, weighting the overlap between the predicted and gold spans for each element, averaged across all three spans.

For precision, we divide the number of correctly predicted tokens by the total number of predicted tokens (for recall, we divide by the number of gold tokens instead), allowing for the empty holders and targets which exist in the gold standard.

There is a leaderboard for each dataset, as well as for the average over all seven. The winning submission will be the one with the highest average Sentiment Graph F1.
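
To make the weighting concrete, below is a simplified sketch of the precision side of this computation (this is not the official scoring script in the evaluation directory; the function names and the token-set representation of spans are ours):

def span_overlap(pred_tokens, gold_tokens):
    """Fraction of predicted tokens that also appear in the gold span.

    Two empty spans (e.g. an implicit holder) count as a perfect match,
    which is how empty holders and targets in the gold standard are allowed.
    """
    if not pred_tokens and not gold_tokens:
        return 1.0
    if not pred_tokens or not gold_tokens:
        return 0.0
    pred, gold = set(pred_tokens), set(gold_tokens)
    return len(pred & gold) / len(pred)

def weighted_precision(pred_opinion, gold_opinion):
    """Average the token overlap over the holder, target and expression spans."""
    elements = ["holder", "target", "expression"]
    return sum(span_overlap(pred_opinion[e], gold_opinion[e]) for e in elements) / len(elements)

# For recall, the same computation divides by the number of gold tokens
# instead of the number of predicted tokens.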

Data format

We provide the data in JSON format.

Each entry is an annotated sentence, represented as a dictionary with the following keys and values:

  • 'sent_id': unique sentence identifiers

  • 'text': raw text version of the previously tokenized sentence

  • 'opinions': list of all opinions (dictionaries) in the sentence

Additionally, each opinion in a sentence is a dictionary with the following keys and values:

  • 'Source': a list of text and character offsets for the opinion holder

  • 'Target': a list of text and character offsets for the opinion target

  • 'Polar_expression': a list of text and character offsets for the opinion expression

  • 'Polarity': sentiment label ('Negative', 'Positive', 'Neutral')

  • 'Intensity': sentiment intensity ('Average', 'Strong', 'Weak')

{
    "sent_id": "../opener/en/kaf/hotel/english00164_c6d60bf75b0de8d72b7e1c575e04e314-6",

    "text": "Even though the price is decent for Paris , I would not recommend this hotel .",

    "opinions": [
                 {
                    "Source": [["I"], ["44:45"]],
                    "Target": [["this hotel"], ["66:76"]],
                    "Polar_expression": [["would not recommend"], ["46:65"]],
                    "Polarity": "Negative",
                    "Intensity": "Average"
                  },
                 {
                    "Source": [[], []],
                    "Target": [["the price"], ["12:21"]],
                    "Polar_expression": [["decent"], ["25:31"]],
                    "Polarity": "Positive",
                    "Intensity": "Average"}
                ]
}

You can import the data by using the json library in python:

>>> import json
>>> with open("data/norec/train.json") as infile:
...     norec_train = json.load(infile)
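
Once loaded, the character offsets (e.g. "25:31") index directly into the raw text, so the annotated spans can be recovered by slicing. A minimal sketch (the helper get_spans is ours, not part of the repository):

def get_spans(sentence, element):
    """Return the substrings covered by one opinion element, e.g. "Target".

    Offsets are character-based "begin:end" strings into sentence["text"];
    empty lists (implicit holders or targets) simply yield no spans.
    """
    spans = []
    for opinion in sentence["opinions"]:
        _texts, offsets = opinion[element]
        for offset in offsets:
            begin, end = map(int, offset.split(":"))
            spans.append(sentence["text"][begin:end])
    return spans

for sentence in norec_train[:3]:
    print(sentence["sent_id"], get_spans(sentence, "Polar_expression"))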

Resources

The task organizers provide training data, but participants are free to use other resources (word embeddings, pretrained models, sentiment lexicons, translation lexicons, translation datasets, etc). We do ask that participants document and cite their resources well.

We also provide some links to what we believe could be helpful resources:

  1. pretrained word embeddings
  2. pretrained language models
  3. translation data
  4. sentiment resources
  5. syntactic parsers

Submission via Codalab

Submissions will be handled through our codalab competition website.
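
As a rough sketch of how a submission archive can be assembled: the directory and file names below (monolingual/<dataset>/predictions.json and crosslingual/<dataset>/predictions.json) are inferred from the error messages quoted in the issues further down, so always check them against the official example_submission.zip before submitting:

import json
import zipfile

# Dataset lists assumed from the task description; verify against example_submission.zip.
monolingual = ["norec", "multibooked_ca", "multibooked_eu",
               "opener_en", "opener_es", "mpqa", "darmstadt_unis"]
crosslingual = ["multibooked_ca", "multibooked_eu", "opener_es"]

with zipfile.ZipFile("submission.zip", "w") as zf:
    for dataset in monolingual:
        predictions = []  # replace with your predicted sentences (same format as the gold json)
        zf.writestr(f"monolingual/{dataset}/predictions.json", json.dumps(predictions))
    for dataset in crosslingual:
        predictions = []  # replace with your cross-lingual predictions
        zf.writestr(f"crosslingual/{dataset}/predictions.json", json.dumps(predictions))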

Baselines

The task organizers provide two baselines: one that takes a sequence-labelling approach and a second that converts the problem to a dependency graph parsing task. You can find both of them in baselines.

Important dates

  • Training data ready: September 3, 2021
  • Evaluation data ready: December 3, 2021
  • Evaluation start: January 10, 2022
  • Evaluation end: by January 31, 2022
  • Paper submissions due: February 28, 2022
  • Notification to authors: March 31, 2022

Frequently asked questions

Q: How do I participate?

A: Sign up at our codalab website, download the data, train the baselines and submit the results to the codalab website.

Q: Can I run the graph parsing baseline on GPU?

A: The code is not set up to run on GPU right now, but if you want to implement it and issue a pull request, we'd be happy to incorporate that into the repository.

Task organizers

Mailing list (Google group) for the task

Citation

If you use the baselines or data from this shared task, please cite the following paper, as well as the papers for the specific datasets that you use (see the BibTeX entries below):

@inproceedings{barnes-etal-2022-semeval,
    title = "{S}em{E}val-2022 Task 10: Structured Sentiment Analysis",
    author = "Barnes, Jeremy and
              Oberl{\"a}nder, Laura Ana Maria and
              Troiano, Enrica and
              Kutuzov, Andrey and
              Buchmann, Jan and
              Agerri, Rodrigo and
              {\O}vrelid, Lilja  and
              Velldal, Erik",
    booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
    month = jul,
    year = "2022",
    address = "Seattle",
    publisher = "Association for Computational Linguistics"
}

For the specific datasets, use the following:

  1. NoReC
@inproceedings{ovrelid-etal-2020-fine,
    title = "A Fine-grained Sentiment Dataset for {N}orwegian",
    author = "{\O}vrelid, Lilja  and
      M{\ae}hlum, Petter  and
      Barnes, Jeremy  and
      Velldal, Erik",
    booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.lrec-1.618",
    pages = "5025--5033",
    abstract = "We here introduce NoReC{\_}fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed description of this annotation effort. We provide an overview of the developed annotation guidelines, illustrated with examples and present an analysis of inter-annotator agreement. We also report the first experimental results on the dataset, intended as a preliminary benchmark for further experiments.",
    language = "English",
    ISBN = "979-10-95546-34-4",
}
  2. MultiBooked
@inproceedings{barnes-etal-2018-multibooked,
    title = "{M}ulti{B}ooked: A Corpus of {B}asque and {C}atalan Hotel Reviews Annotated for Aspect-level Sentiment Classification",
    author = "Barnes, Jeremy  and
      Badia, Toni  and
      Lambert, Patrik",
    booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
    month = may,
    year = "2018",
    address = "Miyazaki, Japan",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://aclanthology.org/L18-1104",
}
  3. OpeNER
@inproceedings{Agerri2013,
    author = {Agerri, Rodrigo and Cuadros, Montse and Gaines, Sean and Rigau, German},
    booktitle = {Sociedad Espa{\~{n}}ola para el Procesamiento del Lenguaje Natural},
    pages = {215--218},
    title = {{OpeNER: Open polarity enhanced named entity recognition}},
    volume = {51},
    year = {2013}
}
  4. MPQA
@article{Wiebe2005b,
    author = {Wiebe, Janyce and Wilson, Theresa and Cardie, Claire},
    journal = {Language Resources and Evaluation},
    number = {2-3},
    pages = {165--210},
    title = {{Annotating expressions of opinions and emotions in language}},
    volume = {39},
    year = {2005}
}
  5. Darmstadt Service Reviews
@inproceedings{toprak-etal-2010-sentence,
    title = "Sentence and Expression Level Annotation of Opinions in User-Generated Discourse",
    author = "Toprak, Cigdem  and
      Jakob, Niklas  and
      Gurevych, Iryna",
    booktitle = "Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2010",
    address = "Uppsala, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P10-1059",
    pages = "575--584",
}

semeval22_structured_sentiment's Issues

Request for requirements.txt for baseline codes

I keep encountering bugs caused by version mismatches (torch, torchtext & numpy) while reproducing the baseline results. Could you please provide a package version config file (such as requirements.txt) so that we can run the baseline code more smoothly?

Error when getting the baselines with graph_parser

Hi, I'm trying to run the get_baseline.sh inside baselines/graph_parser but I get this error during the execution of the script:

##################################################
Predicting data from sentiment_graphs/opener_en/head_final/train.conllu
Loading data from sentiment_graphs/opener_en/head_final/train.conllu
......
Train F1 on epoch 100 is 94.96%
Have not seen any improvement for 3 epochs
Best F1 was 95.16% seen at epoch #97
Loading External Vectors
8
Restoring model from experiments/opener_en/head_final/best_model.save
Traceback (most recent call last):
  File "./src/main.py", line 399, in <module>
    run_parser(get_args())
  File "./src/main.py", line 394, in run_parser
    predict(model, args, args.val, args.elmo_dev, vocabs)
  File "./src/main.py", line 300, in predict
    pred_path = settings.dir + to_predict.split("/")[-1] + ".pred"
AttributeError: 'NoneType' object has no attribute 'split'

Regards.

Some details about data

In the monolingual subsection in README, there's a sentence:

This track assumes that you train and test on the same language.

Does this mean we can use extra training data in the same language other than the given training set? For example, can we use MPQA+Opener_en training set (or some other mined data in English) to train models and test on MPQA?

Mistake in darmstadt_unis annotation

Hi,
darmstadt_unis train sample 1666 is seemingly wrong. It is:

{'opinions': [{'Intensity': 'Strong',
   'Polar_expression': [['All', 'care money'], ['0:3', '9:28']],
   'Polarity': 'Negative',
   'Source': [[], []],
   'Target': [['they'], ['4:8']]},
  {'Intensity': 'Average',
   'Polar_expression': [['not', 'care education'], ['31:34', '9:44']],
   'Polarity': 'Negative',
   'Source': [[], []],
   'Target': [['they'], ['4:8']]}],
 'sent_id': 'Colorado_Technical_University_Online_1_07-10-2008-17',
 'text': 'All they care about is money , not education .'}

In the polar expression, "care education" is not a contiguous expression in the sentence. Maybe "care" should be replaced with "not" and vice versa.

The same thing happens in train sample 2015.

{'opinions': [{'Intensity': 'Strong',
   'Polar_expression': [['great', 'made sense'], ['28:33', '19:49']],
   'Polarity': 'Positive',
   'Source': [[], []],
   'Target': [['Online Program'], ['4:18']]}],
 'sent_id': 'University_of_Phoenix_Online_112_01-09-2006-5',
 'text': "The Online Program made for great practical sense given my wife's pregnancy and time constraints ."}

In the polar expression, "made sense" is not contiguous in the sentence; it should have been split.

Finally, the same thing happens in dev sample 55:

{'opinions': [{'Intensity': 'Weak',
   'Polar_expression': [['some', 'deserves improvement'], ['35:39', '26:51']],
   'Polarity': 'Negative',
   'Source': [[], []],
   'Target': [['instruction'], ['14:25']]}],
 'sent_id': 'University_of_Maryland_University_College_3_01-03-2008-2',
 'text': 'The medium of instruction deserves some improvement , but otherwise , I think it has all the pros and cons of any regular F2F university .'}

In the polar expression, "deserves improvement" is not contiguous in the sentence; it should have been split or combined with "some".

Processing Darmstadt on OSX

Hey there,

Thanks for the great repo! Just wanted to point out a little issue with processing the Darmstadt files on OSX. On OSX the sed command works a little differently so line 20 of process_darmstadt.sh should be:

grep -rl "&" universities/basedata | xargs sed -i '' -e 's/&/and/g'

Here's an explanation on StackOverflow: https://stackoverflow.com/questions/19456518/error-when-using-sed-with-find-command-on-os-x-invalid-command-code

Otherwise the script fails with the following error due to the rogue ampersands in the XML file:

...
  inflating: universities/customization/SentenceOpinionAnalysisResult_customization.xml  
sed: 1: "universities/basedata/U ...": invalid command code u
Traceback (most recent call last):
  File "/Users/amith/Documents/columbia/phd/sourceid/corpora/semeval22_structured_sentiment/data/darmstadt_unis/process_darmstadt.py", line 475, in <module>
    o = get_opinions(bfile, mfile)
  File "/Users/amith/Documents/columbia/phd/sourceid/corpora/semeval22_structured_sentiment/data/darmstadt_unis/process_darmstadt.py", line 113, in get_opinions
    text += token + " "
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Happy to open a PR with the change (tried pushing a branch but I think the repo is restricted).

Gaps between local evaluation and online evaluation

Hello, I have evaluated my predictions locally with "./evaluation/evaluates.py" and "dev.json" (the latest version) as the gold file before submitting them to Codalab. However, the results from the online evaluation are much lower than my local results (especially for norec, a 9.4 drop in SF1). Am I using the wrong gold file? Thank you!

darmstadt_unis head_first got F1 0.00%

Hi,

Is this dataset not suited for the head_first setup, or are there some details I have missed?

bash ./sentgraph.sh darmstadt_unis head_first 17181920

Done
Primary Dev F1 on epoch 13 is 0.00%
Have not seen any improvement for 12 epochs
Best F1 was 0.00% seen at epoch #1

Mistake in Multibooked_eu annotation

Hi,
There is a mistake in train sample 832:

{'opinions': [{'Intensity': 'Strong',
   'Polar_expression': [['ez zebilen bat ere ondo'], ['6:29']],
   'Polarity': 'Negative',
   'Source': [['Nik', 'nuen'], ['0:3', '7:11']],
   'Target': [['WiFia'], ['0:5']]}],
 'sent_id': 'multibooked/corpora/eu/kaype-quintamar-llanes_1-1',
 'text': 'WiFia ez zebilen bat ere ondo hala ere , kontuan izanik oporrak egunerokotasunetik deskonektatzeko direla ......'}

'Nik' in the source expression is not at characters 0:3 of the sentence.

problems in submitting zip file

I create a correctly structured zip file and send it to CodaLab, but get this error:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
  File "/tmp/codalab/tmpSeGUFE/run/program/evaluate.py", line 272, in <module>
    main()
  File "/tmp/codalab/tmpSeGUFE/run/program/evaluate.py", line 236, in main
    with open(submission_answer_file) as infile:
IOError: [Errno 2] No such file or directory: '/tmp/codalab/tmpSeGUFE/run/input/res/monolingual/norec/predictions.json'

I compared it with your example_submission.zip and everything matches: the same file naming, the same structure. Submitting your example works, but when I unzip it and zip it again, I get the same error.

What am I doing wrong?

Missing sentences in test

Hi, we submitted a result to the evaluation and got the "missing sentences" error below. However, the missing ids are not included in the test data. Could you please help us deal with this?

assert g.issubset(p), "missing some sentences: {}".format(g.difference(p))
AssertionError: missing some sentences: set([u'000778-54-01', u'000335-01-01', u'108702-07-01', u'700798-09-01', u'108702-07-03', u'108702-07-02', u'701679-14-01', u'000227-14-01', u'500718-05-02', u'500718-05-03', u'500718-05-01', u'301571-18-01', u'000778-11-02', u'700798-09-02', u'000778-11-03', u'700798-09-03', u'200102-06-01', u'200102-06-02', u'104926-03-05', u'500718-09-04', u'500718-09-05', u'500718-09-02', u'500718-09-03', u'700798-15-01', u'500718-09-01', u'104926-03-06', u'202263-10-01', u'200102-02-01', u'200102-02-02', u'002907-08-02', u'002907-08-03', u'002907-08-01', u'000227-17-01', u'000286-10-02', u'000286-10-01', u'000227-17-02', u'200942-10-02', u'200942-10-01', u'304685-06-02', u'304685-06-03', u'304685-06-01', u'000735-05-02', u'000735-05-01', u'105373-03-02', u'000778-42-03', u'105373-03-01', u'000778-42-06', u'000778-42-04', u'004567-08-01', u'400295-09-01', u'300002-02-01', u'202263-05-01', u'300433-07-04', u'300433-07-01', u'300433-07-02', u'300433-07-03', u'304825-01-02', u'304825-01-01', u'000227-11-02', u'000778-04-01', u'000778-04-02', u'000778-04-03', u'700798-15-09', u'200102-20-01', u'200102-20-02', u'600614-02-01', u'000227-06-03', u'000227-06-02', u'000227-06-01', u'000778-21-01', u'000778-21-03', u'000778-21-02', u'104987-18-01', u'102329-24-01', u'102329-24-03', u'102329-24-02', u'102329-24-05', u'102329-24-04', u'102329-24-07', u'102329-24-06', u'104987-14-06', u'104987-14-04',

Accuracy of baseline values

Hi,

I am using the latest code available in the repo and have trained both baselines (graph parser and sequence labeling). Then I measured the Sentiment Tuple F1 against the dev.json file and I am getting these values:

Dataset            Graph Parsing   Sequence Labeling
Darmstadt Unis     0.077           0.107
MPQA               0.141           0.016
Multibooked (CA)   0.534           0.286
Multibooked (EU)   0.545           0.372
Norec              0.296           0.190
Opener (EN)        0.546           0.339
Opener (ES)        0.536           0.328

For graph parsing, the values are around 0.5xx except for Darmstadt Unis, MPQA and Norec, which are lower, especially Darmstadt Unis.
For sequence labeling, the values are around 0.3xx except for Darmstadt Unis, MPQA and Norec, which are lower, especially MPQA.

Are those values correct to take as a reference, or am I doing something wrong when training the models or running inference to get the scores?

Regards.

the test set of norec

Dear Jeremy,

I have not found the test set of norec.
May I ask whether it has not been released yet, or whether I have missed some important information?

Best,
Ren Mengjie

Intensity labels

Dear organizers,

The vocabulary for the intensity label is confusing: there seem to be some cases with None. Can we consider those to be standard/average (for example: ula/Article247_500-9)? Also, is there a difference between the labels standard and average? The neutral class seems to contain no instances labeled with standard.

Best,
Rob

Difference in MPQA test set

Hi, I used the provided script process_mpqa.sh to process the MPQA dataset and got 2058 sentences. However, I found that the test set of MPQA in the example submission has 2111 sentences. Several sentences are missing:

(screenshot of the missing sentence ids omitted)

How can I fix this?

Error when getting the baselines with sequence_labeling

Hi, I'm trying to run the get_baselines.sh inside baselines/sequence_labeling but I get this error during the execution of the script:

Namespace(ANNOTATION='targets', BATCH_SIZE=50, DATADIR='darmstadt_unis', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, NUM_LAYERS=1, OUTDIR='saved_models', TRAIN_EMBEDDINGS=False, save_all=False)
Traceback (most recent call last):
  File "extraction_module.py", line 198, in <module>
    annotation=annotation
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 215, in get_split
    return Split(self.open_split(filename, lower_case, annotation=annotation))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 201, in open_split
    torch.LongTensor(self.label2idx.labels2idxs(item.targets, annotation="targets"))) for item in data]
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 201, in <listcomp>
    torch.LongTensor(self.label2idx.labels2idxs(item.targets, annotation="targets"))) for item in data]
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 98, in labels2idxs
    return [self.label2idx[annotation][label] for label in labels]
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 98, in <listcomp>
    return [self.label2idx[annotation][label] for label in labels]
KeyError: 'B-targ-positive'
Namespace(ANNOTATION='expressions', BATCH_SIZE=50, DATADIR='darmstadt_unis', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, NUM_LAYERS=1, OUTDIR='saved_models', TRAIN_EMBEDDINGS=False, save_all=False)
Traceback (most recent call last):
  File "extraction_module.py", line 198, in <module>
    annotation=annotation
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 215, in get_split
    return Split(self.open_split(filename, lower_case, annotation=annotation))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 205, in open_split
    torch.LongTensor(self.label2idx.labels2idxs(item.expressions, annotation="expressions"))) for item in data]
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 205, in <listcomp>
    torch.LongTensor(self.label2idx.labels2idxs(item.expressions, annotation="expressions"))) for item in data]
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 98, in labels2idxs
    return [self.label2idx[annotation][label] for label in labels]
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 98, in <listcomp>
    return [self.label2idx[annotation][label] for label in labels]
KeyError: 'B-exp-negative'
Namespace(BATCH_SIZE=50, DATADIR='darmstadt_unis', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, LEARNING_RATE=0.001, NUM_LAYERS=1, OUTDIR='saved_models', POOLING='max', TRAIN_EMBEDDINGS=False, save_all=False)

And:

Namespace(ANNOTATION='sources', BATCH_SIZE=50, DATADIR='mpqa', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, NUM_LAYERS=1, OUTDIR='saved_models', TRAIN_EMBEDDINGS=False, save_all=False)
Traceback (most recent call last):
  File "extraction_module.py", line 198, in <module>
    annotation=annotation
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 215, in get_split
    return Split(self.open_split(filename, lower_case, annotation=annotation))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 193, in open_split
    data = torchtext.data.TabularDataset(data_file, format="json", fields={"sent_id": ("sent_id", sent_id), "text": ("text", text), "sources": ("sources", sources), "targets": ("targets", targets), "expressions": ("expressions", expressions)})
  File "/home/iago/anaconda3/envs/syncap/lib/python3.6/site-packages/torchtext/data/dataset.py", line 251, in __init__
    with io.open(os.path.expanduser(path), encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/extraction/mpqa/train.json'
Namespace(ANNOTATION='targets', BATCH_SIZE=50, DATADIR='mpqa', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, NUM_LAYERS=1, OUTDIR='saved_models', TRAIN_EMBEDDINGS=False, save_all=False)
Traceback (most recent call last):
  File "extraction_module.py", line 198, in <module>
    annotation=annotation
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 215, in get_split
    return Split(self.open_split(filename, lower_case, annotation=annotation))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 193, in open_split
    data = torchtext.data.TabularDataset(data_file, format="json", fields={"sent_id": ("sent_id", sent_id), "text": ("text", text), "sources": ("sources", sources), "targets": ("targets", targets), "expressions": ("expressions", expressions)})
  File "/home/iago/anaconda3/envs/syncap/lib/python3.6/site-packages/torchtext/data/dataset.py", line 251, in __init__
    with io.open(os.path.expanduser(path), encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/extraction/mpqa/train.json'
Namespace(ANNOTATION='expressions', BATCH_SIZE=50, DATADIR='mpqa', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, NUM_LAYERS=1, OUTDIR='saved_models', TRAIN_EMBEDDINGS=False, save_all=False)
Traceback (most recent call last):
  File "extraction_module.py", line 198, in <module>
    annotation=annotation
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 215, in get_split
    return Split(self.open_split(filename, lower_case, annotation=annotation))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 193, in open_split
    data = torchtext.data.TabularDataset(data_file, format="json", fields={"sent_id": ("sent_id", sent_id), "text": ("text", text), "sources": ("sources", sources), "targets": ("targets", targets), "expressions": ("expressions", expressions)})
  File "/home/iago/anaconda3/envs/syncap/lib/python3.6/site-packages/torchtext/data/dataset.py", line 251, in __init__
    with io.open(os.path.expanduser(path), encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/extraction/mpqa/train.json'
Namespace(BATCH_SIZE=50, DATADIR='mpqa', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/18.zip', HIDDEN_DIM=100, LEARNING_RATE=0.001, NUM_LAYERS=1, OUTDIR='saved_models', POOLING='max', TRAIN_EMBEDDINGS=False, save_all=False)
loading embeddings from ../graph_parser/embeddings/18.zip
Traceback (most recent call last):
  File "relation_prediction_module.py", line 269, in <module>
    "train.json"))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 154, in get_split
    return RelationSplit(self.open_split(filename, lower_case))
  File "/home/iago/Escritorio/SemEval-2022 Shared Task 10: Structured Sentiment Analysis/baselines/sequence_labeling/utils.py", line 145, in open_split
    data = torchtext.data.TabularDataset(data_file, format="json", fields={"sent_id": ("sent_id", sent_id), "text": ("text", text), "e1": ("e1", e1), "e2": ("e2", e2), "label": ("label", label)})
  File "/home/iago/anaconda3/envs/syncap/lib/python3.6/site-packages/torchtext/data/dataset.py", line 251, in __init__
    with io.open(os.path.expanduser(path), encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/relations/mpqa/train.json'
Namespace(ANNOTATION='sources', BATCH_SIZE=50, DATADIR='multibooked_ca', DEVDATA=False, EMBEDDINGS='../graph_parser/embeddings/34.zip', HIDDEN_DIM=100, NUM_LAYERS=1, OUTDIR='saved_models', TRAIN_EMBEDDINGS=False, save_all=False)

Regards.

Doubts about the cross-lingual task

Hi,

Reading the details of the cross-lingual task, it is indicated that the test languages will be Catalan, Basque and Spanish. I'm wondering whether the dev files for Catalan, Basque and Spanish can be used for model selection during training or not.

Regards.

null values in MPQA dataset

Null values exist in both the Intensity and Polarity keys of the MPQA dataset.
This bug occurs in the newest commit 99f2f885aba188a52afeb9523cc9b9d194465d85.

Doubt about scripts for data file conversion

Hi,

Looking at the code I see that there are some scripts to convert data formats.

  • convert_to_conllu.py in baselines/graph_parser.
  • convert_to_bio.py in baselines/sequence_labeling.
  • convert_to_rels.py in baselines/sequence_labeling.

With convert_to_conllu.py I can convert the JSON files to CoNLL-U, but is there a script to convert from CoNLL-U back to JSON, or does that script have to be written as part of the shared task?

In the case of convert_to_bio.py and convert_to_rels.py I have some doubts: I can see that convert_to_bio.py is used within relation_prediction_module.py, but I cannot find where convert_to_rels.py is used or what it is needed for. What is each one used for?

Regards.

Missing samples in MPQA dataset

When I am trying to submit submission.zip on CodaLab, this error is raised by the online scoring script:

Traceback (most recent call last):
  File "/tmp/codalab/tmpuNjjC5/run/program/evaluate.py", line 269, in 
    main()
  File "/tmp/codalab/tmpuNjjC5/run/program/evaluate.py", line 246, in main
    assert g.issubset(p), "missing some sentences: {}".format(g.difference(p))
AssertionError: missing some sentences: set([u'20011204/21.34.10-25509-3', u'20020113/03.19.43-8352-7', u'ula/110CYL068-30', u'xbank/wsj_0266-8', u'20020113/03.20.33-11983-4', u'ula/sw2078-UTF16-ms98-a-trans-211', u'ula/sw2078-UTF16-ms98-a-trans-210', u'non_fbis/16.01.33-12919-50', u'ula/110CYL067-46', u'ula/110CYL067-45', u'ula/110CYL067-44', u'20020507/17.55.53-20579-19', u'xbank/wsj_0189-15', u'20020316/20.37.48-18053-23', u'20020316/20.37.48-18053-24', u'xbank/wsj_0679-11', u'non_fbis/16.01.33-12919-49', u'20020411/22.23.02-12197-18', u'non_fbis/06.12.31-26764-20', u'ula/sw2078-UTF16-ms98-a-trans-208', u'xbank/wsj_0176-7', u'20020318/20.48.00-11907-39', u'ula/115CVL035-14', u'xbank/wsj_0557-20', u'xbank/wsj_0173-10', u'20020517/22.08.22-24562-17', u'xbank/wsj_0144-9', u'xbank/wsj_0136-7', u'20020513/21.31.14-23484-9', u'20020517/22.08.22-24562-15', u'20020517/22.08.22-24562-16', u'20020320/12.04.40-21590-29', u'xbank/wsj_0762-6', u'20011130/12.33.55-762-11', u'20011204/21.34.10-25509-4', u'non_fbis/09.35.06-27851-12', u'20020302/21.01.08-20603-7', u'non_fbis/09.35.06-27851-11', u'20020517/22.08.22-24562-18', u'xbank/wsj_1038-9', u'ula/sw2078-UTF16-ms98-a-trans-205', u'ula/sw2078-UTF16-ms98-a-trans-206', u'ula/sw2078-UTF16-ms98-a-trans-207', u'ula/sw2015-ms98-a-trans-41', u'ula/sw2015-ms98-a-trans-40', u'ula/115CVL035-17', u'ula/115CVL035-16', u'ula/110CYL068-31', u'xbank/wsj_0679-10', u'20020516/22.23.24-9583-11', u'ula/sw2078-UTF16-ms98-a-trans-209', u'20011206/21.18.24-28147-9', u'xbank/wsj_0189-17', u'20011221/20.54.40-10484-15', u'20011221/20.54.40-10484-14', u'xbank/wsj_0991-6', u'ula/110CYL200-23', u'20010620/13.40.05-15087-17', u'20020409/22.17.52-18926-11', u'20020123/21.21.45-6259-23', u'xbank/wsj_0551-4', u'non_fbis/06.12.31-26764-19', u'20010706/02.01.27-21386-7', u'xbank/wsj_0068-7', u'xbank/wsj_0189-16', u'ula/115CVL035-15', u'20020206/20.31.05-16359-20', u'xbank/wsj_0811-11', u'non_fbis/08.06.09-13335-9'])

After searching for these sent_id values in data/mpqa/dev.json, I found that these samples are missing, but the ids can still be found in example_submission.zip. I am wondering whether it is a bug caused by commit e63e80140d8673def09f3471c95790c988d8acd5, which modified the MPQA preprocessing script, or whether I did something wrong while preprocessing this dataset.

Conflicting data usage rules about crosslingual task

In ./README.md the data usage rule about subtask 2 is:

For training, you can use any of the other datasets, as well as any other resource that does not contain sentiment annotations in the target language.

In ./data/README.md the data usage rule about subtask 2 is:

This track will instead train only on a high-resource language (English) and test on several languages.

From my perspective, these two descriptions are quite different. We have spent a lot of time optimizing our cross-lingual model under the rule in ./README.md. Now that it is the evaluation phase, we hope there is a solution that fairly merges results obtained under the different rules.

The conllu format's last column seems to be index-shifted.

Hi,

I encounter an error.

This is a sample from mpqa.

Two possible errors:

  1. The last column indicates "challenger Morgan Tsvangirai" rooted at 38.
  2. Token id=33 indicates 's as the root 0:exp-positive; I guess it should instead be id=31 that is the 0:exp-positive.
# sent_id = non_fbis/03.47.06-11142-11
# text = The opposition Movement for Democratic Change ( MDC ) complained that the set   up was deliberately confusing in a ploy to discourage the urban vote , which is thought to favor Mugabe 's challenger Morgan Tsvangirai .
1	The	the	DET	_	_	3	det	_	_	9:holder
2	opposition	opposition	NOUN	_	_	3	compound	_	_	9:holder
3	Movement	Movement	PROPN	_	_	10	nsubj	_	_	9:holder
4	for	for	ADP	_	_	6	case	_	_	9:holder
5	Democratic	Democratic	ADJ	_	_	6	amod	_	_	9:holder
6	Change	Change	PROPN	_	_	3	nmod	_	_	9:holder
7	(	(	PUNCT	_	_	8	punct	_	_	9:holder
8	MDC	MDC	PROPN	_	_	6	appos	_	_	9:holder
9	)	)	PUNCT	_	_	8	punct	_	_	10:holder
10	complained	complain	VERB	_	_	0	root	_	_	0:exp-negative
11	that	that	SCONJ	_	_	17	mark	_	_	_
12	the	the	DET	_	_	13	det	_	_	15:targ
13	set	set	NOUN	_	_	17	nsubj	_	_	15:targ
14	up	up	ADP	_	_	17	nsubj	_	_	15:targ
15	was	be	AUX	_	_	17	cop	_	_	10:targ
16	deliberately	deliberately	ADV	_	_	17	advmod	_	_	_
17	confusing	confusing	ADJ	_	_	10	ccomp	_	_	_
18	in	in	ADP	_	_	20	case	_	_	_
19	a	a	DET	_	_	20	det	_	_	_
20	ploy	ploy	NOUN	_	_	17	obl	_	_	_
21	to	to	PART	_	_	22	mark	_	_	_
22	discourage	discourage	VERB	_	_	20	acl	_	_	_
23	the	the	DET	_	_	25	det	_	_	_
24	urban	urban	ADJ	_	_	25	amod	_	_	_
25	vote	vote	NOUN	_	_	22	obj	_	_	_
26	,	,	PUNCT	_	_	29	punct	_	_	_
27	which	which	PRON	_	_	29	nsubj:pass	_	_	_
28	is	be	AUX	_	_	29	aux:pass	_	_	_
29	thought	think	VERB	_	_	25	acl:relcl	_	_	_
30	to	to	PART	_	_	31	mark	_	_	_
31	favor	favor	VERB	_	_	29	xcomp	_	_	_
32	Mugabe	Mugabe	PROPN	_	_	34	nmod:poss	_	_	_
33	's	's	PART	_	_	32	case	_	_	0:exp-positive
34	challenger	challenger	NOUN	_	_	31	obj	_	_	38:targ
35	Morgan	Morgan	PROPN	_	_	34	appos	_	_	38:targ
36	Tsvangirai	Tsvangirai	PROPN	_	_	35	flat	_	_	38:targ
37	.	.	PUNCT	_	_	10	punct	_	_	38:targ

The problem was possibly caused by the stanza sentiment processor (stanfordnlp/stanza#804).
It would be great if you could help verify the stanza version, and whether the dataset processing depends on the stanza sentiment processor or not.

Mistake in MPQA annotation

Hi,
Thanks for organizing this task.

The target expression of the third opinion in MPQA train sample 1898 is seemingly wrong: it is the "it" inside "with"!

   'Polar_expression': [['expressed the hope'], ['27:45']],
   'Polarity': 'Positive',
   'Source': [['Iranian president'], ['4:21']],
   'Target': [['it'], ['52:54']]}

Sentence is:
The Iranian president also expressed the hope that with the return of President Chavez the Venezuelan government would be able to achieve its exalted objectives with the support of the people .

The target expression of the second opinion in MPQA train sample 2293 is wrong as well: it is the "American" inside "Americans".

{'Intensity': 'Weak',
   'Polar_expression': [['avowed'], ['244:250']],
   'Polarity': 'Neutral',
   'Source': [['generals'], ['264:272']],
   'Target': [['American'], ['25:33']]}

Sentence is:
In direct response , the Americans , who preach democracy to the entire planet , recognizing that they could no longer tolerate someone like Chavez reverted to their old bullying and dictatorial tactics and arranged for an army coup by several avowed pro-American generals , who managed to remove Chavez from power in a few short hours .

Furthermore, MPQA dev sample 2007 seems to be totally wrong:

{'opinions': [{'Intensity': 'Average',
   'Polar_expression': [[','], ['17:18']],
   'Polarity': 'Positive',
   'Source': [['sa'], ['12:14']],
   'Target': [[','], ['7:8']]}],
 'sent_id': 'xbank/wsj_0583-27',
 'text': 'Sansui , he said , is a perfect fit for Polly Peck \'s electronics operations , which make televisions , videocassette recorders , microwaves and other products on an " original equipment maker " basis for sale under other companies \' brand names .'}

Inconsistencies in dataset annotations

Hi!
I've just found an inconsistency in the Darmstadt dataset dev split. I haven't checked whether this also occurs in different datasets or in different splits.

Two back-to-back examples in the dev split look like this:

{
  "sent_id": "DeVry_University_95_05-16-2004-6",
  "text": "I can't overemphasize that enough .",
  "opinions": [
    {
      "Source": [
        [],
        []
      ],
      "Target": [
        [
          "that"
        ],
        [
          "22:26"
        ]
      ],
      "Polar_expression": [
        [
          "can't overemphasize enough"
        ],
        [
          "2:33"
        ]
      ],
      "Polarity": "Positive",
      "Intensity": "Strong"
    }
  ]
},
{
  "sent_id": "DeVry_University_95_05-16-2004-7",
  "text": "The school gives students a knowledge base that makes them extremely competitive in the corporate world .",
  "opinions": [
    {
      "Source": [
        [],
        []
      ],
      "Target": [
        [
          "students"
        ],
        [
          "17:25"
        ]
      ],
      "Polar_expression": [
        [
          "extremely",
          "competitive"
        ],
        [
          "59:68",
          "69:80"
        ]
      ],
      "Polarity": "Positive",
      "Intensity": "Strong"
    }
  ]
},

Usually the datapoints are handled like in the second sentence: polar expressions (as well as source and target fields, for that matter) are whitespace-separated, even if the words are directly back-to-back. In the first sentence, though, the whole polar expression is listed as one string, and its span ("2:33") even includes the target word ("that" | "22:26") although it is not present in the string ("can't overemphasize enough").
I'm actually unsure whether this issue stems from the provided preprocessing function or the underlying dataset.

I also noticed that for this example sentence both splitting methods are applied for polar_expression:

{
  "sent_id": "Capella_University_50_12-09-2005-3",
  "text": "I have found the course work and research more challenging and of higher quality at Capella than at any of the other institutions I graduated from .",
  "opinions": [
    {
      "Source": [
        [],
        []
      ],
      "Target": [
        [
          "course work research"
        ],
        [
          "17:41"
        ]
      ],
      "Polar_expression": [
        [
          "higher quality"
        ],
        [
          "66:80"
        ]
      ],
      "Polarity": "Positive",
      "Intensity": "Average"
    },
    {
      "Source": [
        [],
        []
      ],
      "Target": [
        [
          "course work research"
        ],
        [
          "17:41"
        ]
      ],
      "Polar_expression": [
        [
          "more",
          "challenging"
        ],
        [
          "42:46",
          "47:58"
        ]
      ],
      "Polarity": "Positive",
      "Intensity": "Strong"
    }
  ]
},

Sometimes the character offsets for the Polar_expression strings are also missing:

{
  "sent_id": "St_Leo_University_4_04-16-2004-5",
  "text": "The teachers are very helpful , and the staff is , as well .",
  "opinions": [
    {
      "Source": [
        [],
        []
      ],
      "Target": [
        [
          "teachers"
        ],
        [
          "4:12"
        ]
      ],
      "Polar_expression": [
        [
          "very",
          "helpful"
        ],
        []
      ],
      "Polarity": "Positive",
      "Intensity": "Strong"
    },
    {
      "Source": [
        [],
        []
      ],
      "Target": [
        [
          "staff"
        ],
        [
          "40:45"
        ]
      ],
      "Polar_expression": [
        [
          "very",
          "helpful"
        ],
        []
      ],
      "Polarity": "Positive",
      "Intensity": "Strong"
    }
  ]
},

python evaluate_single_dataset.py dev.json dev.json does not result in 100% F1

Hi,
I'm a bit unsure whether I'm misunderstanding the tool, but I believe using this script to compare a file to itself should always result in 100% F1 score:
https://github.com/jerbarnes/semeval22_structured_sentiment/blob/master/evaluation/evaluate_single_dataset.py
but e.g. on both darmstadt splits this is not the case:

$ python evaluate_single_dataset.py data/darmstadt_unis/dev.json data/darmstadt_unis/dev.json
Sentiment Tuple F1: 0.942
$ python evaluate_single_dataset.py data/darmstadt_unis/train.json data/darmstadt_unis/train.json
Sentiment Tuple F1: 0.964

am I using the tool wrong?
Thanks!

AssertionError: missing some sentences

I'm trying to submit some results, but I get an assertion error telling me that some sentences are missing.
I tried it with a previously successful submission and I still got the same error. Is there a problem with the codalab evaluation?

Error processing mpqa

Hi,

I'm trying to do Step 1, but I'm getting this error for the MPQA 2.0 corpus:

2021-09-07 14:50:30 INFO: Loading these models for language: en (English):
========================
| Processor | Package  |
------------------------
| tokenize  | combined |
========================

2021-09-07 14:50:30 INFO: Use device: gpu
2021-09-07 14:50:30 INFO: Loading: tokenize
2021-09-07 14:50:38 INFO: Done loading processors!
  0%|                                                                                                                                                                           | 0/287 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "process_mpqa.py", line 355, in <module>
    main()
  File "process_mpqa.py", line 343, in main
    new = process_file(fname, nlp)
  File "process_mpqa.py", line 315, in process_file
    sents = get_sents(text, fname, nlp)
  File "process_mpqa.py", line 189, in get_sents
    sent_bidx = int(sentence.tokens[0].misc.split("|")[0].split("=")[1])
AttributeError: 'NoneType' object has no attribute 'split'

For the Darmstadt Service Review Corpus I have no problems.

MPQA dev split has no opinions

Hey there! Sorry to spam you -- just wanted to call out that the dev split of MPQA that process_mpqa.sh produces contains no opinions. Not sure if that's expected -- I can just use part of the training split as my dev set so not a big issue but just wanted to draw your attention to it. Thanks!

>>> import json
>>> from collections import Counter
>>> with open('dev.json', 'r') as f:
...   dev = json.load(f)
... 
>>> c = Counter()
>>> for sentence in dev:
...   c[len(sentence['opinions'])] += 1
... 
>>> c
Counter({0: 2024})

Error when uploading the file with the predictions

Hi,

I've uploaded the submission file and after a while I've seen that the status changed to "Failed".

Looking at the output file I can see in the "Scoring output log" something like this:

monolingual
########################################
SF1 on norec: XXXX
SF1 on multibooked_ca: XXXX
SF1 on multibooked_eu: XXXX
SF1 on opener_en: XXXX
SF1 on opener_es: XXXX

And this is what I see in the "Scoring error log":

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
  File "/tmp/codalab/tmpA_k0h4/run/program/evaluate.py", line 269, in <module>
    main()
  File "/tmp/codalab/tmpA_k0h4/run/program/evaluate.py", line 246, in main
    assert g.issubset(p), "missing some sentences: {}".format(g.difference(p))
AssertionError: missing some sentences: set([u'20020507/22.11.06-28210-40', u'xbank/wsj_0189-15', u'xbank/wsj_0266-8', u'ula/sw2078-UTF16-ms98-a-trans-211', u'ula/sw2078-UTF16-ms98-a-trans-210', u'non_fbis/16.01.33-12919-50', u'20020507/17.55.53-20579-19', u'xbank/wsj_0762-6', u'xbank/wsj_1033-10', u'20020316/20.37.48-18053-23', u'20020316/20.37.48-18053-24', u'xbank/wsj_0679-11', u'non_fbis/16.01.33-12919-49', u'non_fbis/16.01.33-12919-48', u'non_fbis/06.12.31-26764-20', u'non_fbis/12.15.47-5091-24', u'xbank/wsj_0176-7', u'xbank/wsj_0189-17', u'xbank/wsj_0557-20', u'xbank/wsj_0173-10', u'xbank/wsj_0144-9', u'xbank/wsj_0136-7', u'20020517/22.08.22-24562-15', u'20020517/22.08.22-24562-16', u'20020517/22.08.22-24562-17', u'20011130/12.33.55-762-10', u'20011130/12.33.55-762-11', u'20011204/21.34.10-25509-4', u'non_fbis/09.35.06-27851-12', u'xbank/wsj_0068-6', u'20020320/12.04.40-21590-29', u'20020517/22.08.22-24562-18', u'20020302/21.01.08-20603-7', u'ula/sw2078-UTF16-ms98-a-trans-207', u'ula/sw2015-ms98-a-trans-41', u'ula/sw2015-ms98-a-trans-40', u'ula/115CVL035-17', u'ula/115CVL035-16', u'20020411/22.23.02-12197-18', u'ula/110CYL068-31', u'xbank/wsj_0679-10', u'ula/sw2078-UTF16-ms98-a-trans-208', u'ula/sw2078-UTF16-ms98-a-trans-209', u'xbank/wsj_1038-9', u'xbank/wsj_0991-6', u'xbank/wsj_0679-9', u'ula/110CYL200-23', u'20010620/13.40.05-15087-17', u'20020123/21.21.45-6259-23', u'xbank/wsj_0551-4', u'20010706/02.01.27-21386-7', u'xbank/wsj_0068-7', u'xbank/wsj_0189-16', u'20020507/22.11.06-28210-39', u'ula/115CVL035-15', u'xbank/wsj_0173-9', u'xbank/wsj_0811-11', u'ula/115CVL035-14'])

I see that the MPQA and darmstadt_unis datasets are missing, but I updated the data and executed the pre-processing scripts after the changes made on 29.11.2021, before training and predicting.

Before the submission I obtained the scores locally with evaluate_single_dataset.py, and I can get scores for all datasets in the monolingual task and for the cross-lingual task.

What am I doing wrong?

Regards.
