Dissertation

Research Question: Can Chatbots Truly Be 'Unbiased'?

This project aims to answer the above question by subjecting multiple chatbots, trained on many different types of dataset, to the Implicit Association Test (IAT), which can be found here. This repository contains the code used to load the datasets, train the chatbots and test them through a Textual User Interface (TUI).

Note: The original research question for this project was "Can Adding Bias to a Machine Make it more believable?". As part of this, a basic Flask server was started, intended to host online Turing Tests that would measure the believability of the chatbots. However, the research question changed before I had figured out how to use Flask properly, and thus well before this server was completed. The code has been left in place as an insight into how the project was planned to be structured, and how I was learning Flask, before the change.

Because of this change, the only files of note are main.py and the contents of the chatbot folder. All other files were discontinued for this project!

Usage

First, download the repository:

git clone https://github.com/jopokemine/Dissertation.git

Next, you will need to download the datasets, which can be found under the datasets heading. They should be placed in the folder chatbot/data.
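
Before training, it can help to confirm that the datasets actually landed in chatbot/data. The following is a minimal sketch, not part of the repository: the function name `missing_datasets` and the dataset names passed to it are hypothetical, and it assumes each dataset sits in the data folder as a file or folder whose name (ignoring any extension) matches the dataset name.

```python
from pathlib import Path


def missing_datasets(data_dir, expected):
    """Return the names in `expected` that have no matching entry in `data_dir`.

    Assumption (not confirmed by the repository): each dataset is stored as a
    file or folder whose name, minus any extension, equals the dataset name.
    """
    root = Path(data_dir)
    # A missing data folder is treated as "nothing present" rather than an error.
    present = {entry.stem.lower() for entry in root.iterdir()} if root.is_dir() else set()
    return [name for name in expected if name.lower() not in present]
```

For example, `missing_datasets("chatbot/data", ["cornell", "squad"])` would return whichever of the two hypothetical names is absent from the folder.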

Once the datasets are installed into the data folder, run the following to train a chatbot:

python3 main.py -tr -d [datasets]

And the following to test a chatbot:

python3 main.py -te -d [datasets]
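
The two commands above share the same shape: a mode flag (-tr or -te) followed by -d and one or more dataset names. As a rough illustration only, a parser for that shape could be sketched with argparse as below. This is not the code in main.py: the function names, the choice to make -tr and -te mutually exclusive, and the dataset names in the usage example are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags shown in the usage commands."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: exactly one of -tr / -te must be given per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-tr", dest="train", action="store_true",
                      help="train a chatbot on the given datasets")
    mode.add_argument("-te", dest="test", action="store_true",
                      help="test a chatbot trained on the given datasets")
    # -d takes one or more dataset names (the exact accepted names are not
    # stated in this README).
    parser.add_argument("-d", dest="datasets", nargs="+", required=True,
                        metavar="DATASET", help="one or more dataset names")
    return parser


def parse_cli(argv):
    """Return a (mode, datasets) pair for a list of command-line arguments."""
    args = build_parser().parse_args(argv)
    mode = "train" if args.train else "test"
    return mode, args.datasets
```

With this sketch, `parse_cli(["-tr", "-d", "cornell", "squad"])` yields the mode `"train"` and the list `["cornell", "squad"]`; the dataset names here are placeholders.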

Datasets

The available datasets, and where to get them, are:

  • Amazon link
    • Credit: Henderson, M., Budzianowski, P., Casanueva, I., Coope, S., Gerz, D., Kumar, G., Mrkšić, N., Spithourakis, G., Su, P.-H., Vulic, I., & Wen, T.-H. (2019). A repository of conversational datasets [Data available at github.com/PolyAI-LDN/conversational-datasets]. Proceedings of the Workshop on NLP for Conversational AI. https://arxiv.org/abs/1904.06472. License: Apache License, Version 2.0.
  • Convai link
    • Credit: Aliannejadi, M., Kiseleva, J., Chuklin, A., Dalton, J., & Burtsev, M. (2020). ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ). https://arxiv.org/abs/2009.11352.
  • Cornell link
    • Credit: Danescu-Niculescu-Mizil, C., & Lee, L. (2011). Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011.
  • OpenSubtitles link
    • Credit: Henderson, M., Budzianowski, P., Casanueva, I., Coope, S., Gerz, D., Kumar, G., Mrkšić, N., Spithourakis, G., Su, P.-H., Vulic, I., & Wen, T.-H. (2019). A repository of conversational datasets [Data available at github.com/PolyAI-LDN/conversational-datasets]. Proceedings of the Workshop on NLP for Conversational AI. https://arxiv.org/abs/1904.06472. License: Apache License, Version 2.0.
  • QA link
    • Credit: Smith, N. A., Heilman, M., & Hwa, R. (2008). Question generation as a competitive undergraduate course project. Proceedings of the NSF Workshop on the Question Generation Shared Task and Evaluation Challenge, 4–6.
  • Reddit link
    • Credit: Henderson, M., Budzianowski, P., Casanueva, I., Coope, S., Gerz, D., Kumar, G., Mrkšić, N., Spithourakis, G., Su, P.-H., Vulic, I., & Wen, T.-H. (2019). A repository of conversational datasets [Data available at github.com/PolyAI-LDN/conversational-datasets]. Proceedings of the Workshop on NLP for Conversational AI. https://arxiv.org/abs/1904.06472. License: Apache License, Version 2.0.
  • SQuAD link
    • Credit: Rajpurkar, P., Jia, R., & Liang, P. (2018). Know What You Don’t Know: Unanswerable Questions for SQuAD. CoRR, abs/1806.03822. https://arxiv.org/abs/1806.03822.
  • Twitter link

Note: Due to difficulties sensibly creating sentence pairs from the data available, the Reddit dataset remains unfinished!

A shared Google Drive folder containing the code and the datasets can be found here.
