
robspringles / automated-health-responses


This project forked from aus10powell/automated-health-responses


A prototype project for automated, physician-like responses to medical questions

License: MIT License

Jupyter Notebook 99.02% Python 0.98%


Automated-Health-Responses

Beyond the catch-all classification of "chatbot", there are several different flavors: sentence completion, Q/A, goal-oriented dialogue, VQA (or visual dialogue), negotiation, and machine translation.

Description: A prototype project for automated, physician-like responses to medical questions

Data/Resources:

Notes About Approaches

  • Dialogue systems (which include chatbots) generally can be classified under three categories:
    • The back-and-forth dialogue between algorithm and human
    • The frame-based, goal-oriented (think online help or call-routing)
    • The interactive Q/A system.
  • The mechanism that produces the machine response in these systems can be generative (the machine comes up with its own response) or responsive (it returns a pre-determined answer based on a classification). Most successful systems seem to combine the two.
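As a rough illustration of combining the two mechanisms, the sketch below tries a responsive path first (classify the question, return a canned answer) and falls back to a generative path. Everything here is hypothetical and not taken from this project: the topic keywords, the placeholder answers, and the `generate` stub that stands in for a trained model.

```python
import re

# Hypothetical topics and placeholder answers, for illustration only.
TOPIC_KEYWORDS = {
    "throat": {"throat", "strep", "tonsils"},
    "skin": {"rash", "itchy", "skin"},
}
CANNED_RESPONSES = {
    "throat": "<pre-written answer about throat complaints>",
    "skin": "<pre-written answer about skin complaints>",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(question):
    """Return the topic with the largest keyword overlap, or None if no keyword matches."""
    tokens = tokenize(question)
    best_topic, best_overlap = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    return best_topic


def generate(question):
    """Stand-in for a trained generative model (assumption: not implemented here)."""
    return "<generated answer>"


def respond(question):
    """Responsive path when a topic is recognised, generative path otherwise."""
    topic = classify(question)
    if topic is not None:
        return CANNED_RESPONSES[topic]
    return generate(question)


print(respond("I think I have Strep Throat."))
print(respond("pleomorphic adenoma and a little scared"))
```

A real system would replace the keyword overlap with a trained classifier, but the control flow (classify, then fall back) would look much the same.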

Notes about the dataset

Warning: Data from this forum may have more than its share of topics of a sexual nature. (This is plausibly a consequence of Reddit's anonymity.)

The data spans from when the subreddit was started (2014) to early 2018. There are approximately 30k threads and 109k responses.

Data Journal

  • 1st Iteration:

    • Decided on the data-prep architecture for the first conversation model. We frame the problem as bootstrapping responses to conversations in a general sense: someone has a health-related question, and someone else has some sort of knowledge on the subject. Given that there are potentially multiple responses to the same question, the first pass is: someone asks a question in a Reddit thread, and every post in that thread not written by the author is encoded as a response. This strongly shapes what we can reasonably expect from a trained network. We are obviously over-sampling questions, which may give the network an incentive to learn the most generic response to a random question.
    • Found out that the reference code for the TensorFlow Seq2Seq model was deprecated because it uses static unrolling:
      • Static unrolling constructs the computation graph with a fixed number of time steps. Such a graph can only handle sequences of one specific length. One solution for handling sequences of varying lengths is to create multiple graphs with different time lengths and separate the dataset into these buckets.
      • Action: use dynamic unrolling. Dynamic unrolling instead uses control flow ops to process the sequence step by step. In TF this is supposed to be more space efficient and just as fast, and it is now a recommended way to implement RNNs.
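The bucketing workaround for static unrolling can be sketched in plain Python. This is a minimal illustration, not the project's code: the `PAD` id, the bucket boundaries, and the choice to drop over-long sequences are all assumptions made for the example.

```python
from collections import defaultdict

PAD = 0  # assumed padding token id


def bucket_by_length(sequences, boundaries):
    """Pad each sequence up to the smallest bucket length that fits it.

    Returns a dict mapping bucket length -> list of padded sequences.
    Each bucket would correspond to one statically unrolled graph.
    """
    boundaries = sorted(boundaries)
    buckets = defaultdict(list)
    for seq in sequences:
        # Smallest bucket that can hold the sequence, if any.
        fit = next((b for b in boundaries if len(seq) <= b), None)
        if fit is None:
            continue  # longer than the largest bucket: dropped in this sketch
        buckets[fit].append(list(seq) + [PAD] * (fit - len(seq)))
    return dict(buckets)


token_ids = [[1, 2], [3, 4, 5, 6], [7], [1] * 12]
for length, batch in bucket_by_length(token_ids, boundaries=[2, 5, 10]).items():
    print(length, batch)
```

With dynamic unrolling (for example `tf.nn.dynamic_rnn` in TensorFlow 1.x, which accepts a `sequence_length` argument), a single graph handles every length, so bucketing becomes an optional way to reduce padding rather than a requirement.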
  • 2nd Iteration:

    • So far, a character-level Seq2Seq model trained with teacher forcing to predict one step ahead is doing reasonably well (judged so far primarily by how reasonable its responses to the training set are). It currently serves as the baseline when deciding which directions to pursue further.
      • There are some issues with the current data approach, since the model tends to collapse to a single, politically phrased response: "I'm not a doctor but"
        • Q: Husband deteriorating before my eyes, doctors at a loss, no one will help; Reddit docs, I need you.
        • A: I don't think this is a single pain is not a doctor but I have a similar symptoms and the story
        • Q: pleomorphic adenoma and a little scared
        • A: I don't think this is a single pain is not a doctor but I have a similar symptoms and the story
        • Q: I think I have Strep Throat. I do not have insurance and I cannot afford to go to the doctor.
        • A: I don't think this is a single pain is not a doctor but I have a similar symptoms and the story
    • As suspected, even with seq2seq at a word level, the results are not great. Although I have not trained on the full dataset yet, there is a decided improvement when responses are limited to fewer than 30 words. One option would be to change the pipeline to limit words and sentences. However, I suspect the bigger issue is that many replies in a thread are not direct responses to the initial post. Structuring the data as parent/reply pairs might be the right approach to try first.
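For reference, the character-level, one-step-ahead teacher forcing used for the baseline in this iteration amounts to feeding the decoder the true response shifted by one position. The sketch below shows only that data preparation step; the start and end markers and the helper names are assumptions for illustration, not the notebook's actual code.

```python
START, END = "\t", "\n"  # assumed start-of-sequence and end-of-sequence markers


def build_vocab(texts):
    """Map every character seen in the texts (plus the markers) to an integer id."""
    chars = sorted(set("".join(texts)) | {START, END})
    return {ch: i for i, ch in enumerate(chars)}


def teacher_forcing_pair(response, vocab):
    """Return (decoder_input, decoder_target) for one response.

    The decoder input is the true response prefixed with START; the target is
    the same response shifted one step ahead and terminated with END, so at
    every step the model sees the true previous character, not its own guess.
    """
    decoder_input = [vocab[ch] for ch in START + response]
    decoder_target = [vocab[ch] for ch in response + END]
    return decoder_input, decoder_target


vocab = build_vocab(["abc"])
dec_in, dec_tgt = teacher_forcing_pair("abc", vocab)
print(dec_in)
print(dec_tgt)
```

At inference time there is no true response to feed in, so the decoder consumes its own previous prediction instead, which is one reason training-set responses can look better than responses to unseen questions.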
  • 3rd Iteration:

    • Altered the dataset so that each comment posted as a reply to a post is treated as a direct response to that post. So occasionally one comment may be both a query and a response. Test training at a word level without any cleansing of the data led to very poor results, as expected.
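The difference between the first-pass pairing and the parent/reply pairing can be sketched as follows. The post fields (`id`, `parent_id`, `author`, `body`) and both function names are hypothetical and chosen for the example; the actual dataset schema may differ.

```python
def first_pass_pairs(thread):
    """1st-iteration framing: every post not written by the thread author
    is treated as a response to the opening question."""
    root = thread[0]
    return [
        (root["body"], post["body"])
        for post in thread[1:]
        if post["author"] != root["author"]
    ]


def parent_reply_pairs(thread):
    """3rd-iteration framing: each post is paired with the post it replies to,
    so a comment can appear both as a response and as a query."""
    by_id = {post["id"]: post for post in thread}
    pairs = []
    for post in thread:
        parent = by_id.get(post["parent_id"])
        if parent is not None:
            pairs.append((parent["body"], post["body"]))
    return pairs


# Toy thread: the first element is the opening post.
thread = [
    {"id": "t1", "parent_id": None, "author": "op", "body": "Q"},
    {"id": "c1", "parent_id": "t1", "author": "a", "body": "R1"},
    {"id": "c2", "parent_id": "c1", "author": "op", "body": "F"},
    {"id": "c3", "parent_id": "c2", "author": "a", "body": "R2"},
]
print(first_pass_pairs(thread))
print(parent_reply_pairs(thread))
```

In the toy thread, the first-pass framing pairs the late reply `R2` with the opening question even though it answers the follow-up `F`, which is the kind of mismatch the parent/reply structure is meant to avoid.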
  • 4th Iteration:

    • Working on the 3rd-iteration data for a response-type system.

Contributors

  • aus10powell
