

TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings

GitHub repository for TATA: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings

Full Paper: https://aclanthology.org/2023.emnlp-main.694/

Stance detection is important for understanding different attitudes and beliefs on the Internet. However, given that a passage's stance toward a given topic is often highly dependent on that topic, building a stance detection model that generalizes to unseen topics is difficult. In this work, we propose using contrastive learning as well as an unlabeled dataset of news articles that cover a variety of different topics to train topic-agnostic (TAG) and topic-aware (TAW) embeddings for use in downstream stance detection. Combining these embeddings in our full TATA model, we achieve state-of-the-art performance across several public stance detection datasets (0.771 F1-score on the Zero-shot VAST dataset).

Topic-Aware (TAW) Dataset

In this work, using a dataset of news articles from 3,074 news websites, the MPNet model, the Parrot paraphraser, and Flan-T5-XL, we extract and pair paragraphs with similar topics from different websites for use in training a Topic-Aware (TAW) encoding model. We supply both an extended dataset of 238,228 (with no more than 1,000 paragraphs from any one website) and an unfiltered dataset of 984,539 (with an unrestricted number of paragraphs from any given website). To request either dataset (or both), please fill out this Google form. This dataset may only be used for research purposes; the copyright of the articles within this dataset belongs to the respective websites.
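
The per-website cap that separates the two releases can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `(website, paragraph)` tuple layout, the function name `cap_per_website`, and the choice to keep the first paragraphs encountered are all assumptions made for the example.

```python
from collections import defaultdict


def cap_per_website(rows, max_per_site=1000):
    """Keep at most `max_per_site` paragraphs per website.

    `rows` is an iterable of (website, paragraph) tuples. This layout is
    hypothetical and may differ from the released files.
    """
    counts = defaultdict(int)
    kept = []
    for website, paragraph in rows:
        # Keep the row only while this website is still under the cap.
        if counts[website] < max_per_site:
            counts[website] += 1
            kept.append((website, paragraph))
    return kept


# Toy example with a cap of 2 instead of 1,000.
rows = [("a.com", "p1"), ("a.com", "p2"), ("a.com", "p3"), ("b.com", "p4")]
capped = cap_per_website(rows, max_per_site=2)
```

Keeping the first rows seen is only one possible selection rule; random sampling per website would satisfy the same cap.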

Topic-Agnostic (TAG) Dataset

To initially train a topic-agnostic encoding layer for use in our stance detection model, we extended the original VAST dataset using the Dipper Paraphraser. You can download the extended VAST/TAG dataset at the following link. As in the original VAST dataset, 0=against, 1=pro, 2=neutral.
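
When working with the extended VAST/TAG data, the integer labels can be mapped to readable names as in the sketch below. Only the 0/1/2 encoding comes from the description above; the names `STANCE_LABELS` and `label_name` are illustrative and not part of the released code.

```python
# Label encoding stated above: 0=against, 1=pro, 2=neutral.
STANCE_LABELS = {0: "against", 1: "pro", 2: "neutral"}


def label_name(label):
    """Return the stance name for an integer label, rejecting unknown values."""
    try:
        return STANCE_LABELS[int(label)]
    except KeyError:
        raise ValueError(f"unexpected stance label: {label!r}") from None


names = [label_name(x) for x in (0, 1, 2)]
```

Failing loudly on an unknown label helps catch files that use a different encoding.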

Request TATA Model Weights

In this work, we benchmark three models: a Topic-Agnostic model (TAG), a Topic-Aware model (TAW), and TATA, a model that incorporates both the TAG and TAW models. To request the weights for these models, please fill out the following Google form.

Citing the paper

If you use the code or datasets from this paper, you can cite us with the following BibTeX entry:

@inproceedings{hanley2023tata,
    title={{TATA}: Stance Detection via Topic-Agnostic and Topic-Aware Embeddings},
    author={Hanley, Hans W. A. and Durumeric, Zakir},
    booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
    year={2023},
    url={https://openreview.net/forum?id=J9Vx7eTuWb}
}

License and Copyright

Copyright 2024 The Board of Trustees of The Leland Stanford Junior University

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
