Giter Site home page Giter Site logo

teams-channel-content-export's Introduction

Microsoft Teams Channel Data Export for RAG-Enhanced Chatbot

This repository contains three Python scripts that facilitate the extraction of data from Microsoft Teams channels and its transformation into question-answer pairs for use in a Retrieval-Augmented Generation (RAG) enhanced chatbot.

To learn how to use Teams channel data to power a smart chatbot, check out the related blog post.

Overview

The channel_query.py script fetches and formats messages and their replies from Microsoft Teams using the Microsoft Graph API. The convert_channel_data_json.py script takes the JSON output file produced by the channel_query.py script and extracts question-answer pairs, creating a new JSON file for each pair using Azure OpenAI. The convert_channel_data_markdown.py script performs a similar function but generates the question-answer pairs as markdown, with the question set as a heading and the answer as content following the heading.

Prerequisites

  • Python 3
  • Required Python packages: requests, json, html, re, bs4, python-dotenv, openai, argparse, asyncio
  • Access to Microsoft Graph API and Azure OpenAI

Setup

  1. Clone the repository to your local machine.
  2. Install the required Python packages.
  3. Obtain an access token from the Microsoft Graph Explorer.
  4. Replace the values in the .env file with your actual ACCESS_TOKEN, GROUP_ID, and CHANNEL_ID, as well as your Azure OpenAI endpoint, API key, deployment, and API version.
  5. Save the .env file in the same directory as the scripts.

Usage

channel_query.py

This script fetches and formats messages and their replies from Microsoft Teams using the Microsoft Graph API. It cleans the HTML content of the messages and formats them into a JSON structure.

To run the script, use the command: python channel_query.py <output_file.json> <date_from as YYYY-MM-DD>

convert_channel_data_json.py

This script extracts question-answer pairs from a given JSON data file and creates a new JSON file for each pair. It uses the OpenAI API to generate questions and answers based on the input data.

To run the script, use the command: python convert_channel_data.py <input_file.json> <output_dir>

convert_channel_data_markdown.py

This script extracts question-answer pairs from a given JSON data file and creates a new markdown file for each pair. It uses the OpenAI API to generate questions and answers based on the input data. The question is set as a heading and the answer as content following the heading in the markdown file.

To run the script, use the command: python convert_channel_data_markdown.py <input_file.json> <output_dir>

License

MIT

teams-channel-content-export's People

Contributors

mario-guerra avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.