
tonic-ai / datatonic

🌟 DataTonic: A data-capable, AGI-style agent builder of agents that creates swarms, runs commands, and securely processes and creates datasets, databases, visualisations, and analyses.

Home Page: https://www.tonic-ai.com

License: MIT License

Python 28.07% Jupyter Notebook 71.93%
agent-builder agi autogen azure chroma data data-science data-visualization database memgpt semantic-kernel semantic-memory taskweaver

datatonic's People

Contributors

dependabot[bot], josephrp, jsaluja, mn-noor, zochory


datatonic's Issues

trulens eval: weaviate vs neo4j w/ imagebind multimodal vector db to ground Gemini

this sounded like a fun one
opened this for my evals plan so ppl can see what i'm doing, i'll solo this unless someone actively wants to help

https://github.com/weaviate-tutorials/multimodal-workshop/blob/main/2-multimodal/1-multimedia-search-complete.ipynb
https://github.com/tomasonjo/blogs/blob/master/llm/neo4j_llama_multimodal.ipynb
are the notebooks
@MN-Noor i might have q's for ya if i get stuck trying to implement the eval pipeline on these puppies, but i'll take a crack at it myself first unless something looks real fun to jump in on :P

# Open Tasks : Semantic Kernel Planner


  • refactor the Semantic Kernel module to adapt to TaskWeaver, with analysis performed using TaskWeaver

  • refactor the plan to adapt to AutoGen and the use case

  • major refactor based on recent updates to plugin configuration in Semantic Kernel

  • improve the Semantic Kernel Planner

Originally posted by @Josephrp in #16 (comment)

MongoDB for Vector Database / Database provider choice

I would go for MongoDB

  • Used in AgentCloud (tested)
  • I'm familiar with it
  • Also handles vectors through Atlas Vector Search, plus memory
  • Is compatible with, or at least complementary to, our stack
  • Cost-effective and scalable with vCore, and we have some credits
  • Easy to use
  • Connectivity with Retool
  • Is compatible "natively" with Azure solutions
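For concreteness, a retrieval query against Atlas Vector Search is a single aggregation stage. The sketch below only builds the pipeline document; the index name (`embedding_index`), collection, and field names are hypothetical, and actually running it requires an Atlas cluster with a vector index configured.

```python
# Sketch: an Atlas Vector Search aggregation pipeline for grounding retrieval.
# All names (index, fields) are hypothetical placeholders.
query_embedding = [0.12, -0.07, 0.33]  # would come from your embedding model

pipeline = [
    {
        "$vectorSearch": {
            "index": "embedding_index",   # hypothetical Atlas search index name
            "path": "embedding",          # field holding the stored vectors
            "queryVector": query_embedding,
            "numCandidates": 100,         # candidates considered before ranking
            "limit": 5,                   # top-k documents returned
        }
    },
    # keep only the fields the agent needs for grounding
    {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
]

# Against a live Atlas cluster this would run roughly as:
# from pymongo import MongoClient
# client = MongoClient("mongodb+srv://...")
# results = client.datatonic.chunks.aggregate(pipeline)
```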

Evaluation metrics through feedback function criteria

Use feedback functions to evaluate and log the quality of LLM app results

The third step is to run feedback functions on the prompt and responses from the app and to log the evaluation results. Note that as a developer you only need to add a few lines of code to start using feedback functions in your apps (see Figure 4(a)). You can also easily add functions tailored to the needs of your application.

Our goal with feedback functions is to programmatically check the app for quality metrics.

The first feedback function checks for language match between the prompt and the response. It’s a useful check since a natural user expectation is that the response is in the same language as the prompt. It is implemented with a call to a HuggingFace API that programmatically checks for language match.

The next feedback function checks how relevant the answer is to the question, using an OpenAI LLM that is prompted to produce a relevance score.

Finally, the third feedback function checks how relevant the individual chunks retrieved from the vector database are to the question, again using an OpenAI LLM in a similar manner. This is useful because the retrieval step from a vector database may surface chunks that are not relevant to the question, and the quality of the final response improves when those chunks are filtered out before the response is produced.
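The wiring of these three checks can be sketched in plain Python. The scorers below are deliberate stand-ins (string heuristics instead of the Hugging Face language detector and OpenAI graders described above), so treat this as an illustration of how feedback functions plug together, not as any library's actual API.

```python
# Minimal sketch of feedback functions over a prompt/response pair.
# The scoring logic is stubbed: a real pipeline would call a language-detection
# API and LLM graders as described above.

def language_match(prompt: str, response: str) -> float:
    """Return 1.0 if prompt and response appear to share a language (stub)."""
    # Stand-in heuristic: compare ASCII-ness; a real check calls a detector.
    return 1.0 if prompt.isascii() == response.isascii() else 0.0

def answer_relevance(prompt: str, response: str) -> float:
    """Stub for an LLM-graded relevance score in [0, 1]."""
    overlap = set(prompt.lower().split()) & set(response.lower().split())
    return min(1.0, len(overlap) / 3)

def run_feedback(prompt: str, response: str) -> dict:
    """Run all feedback functions and collect their scores for logging."""
    return {
        "language_match": language_match(prompt, response),
        "answer_relevance": answer_relevance(prompt, response),
    }

scores = run_feedback("What is a vector database?",
                      "A vector database stores embeddings for similarity search.")
```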

SQL Database Design

Based on the provided semantic_kernel_module.py and its integration with TaskWeaverSQLIntegration, we can design a database schema that effectively supports the data processing and storage requirements for generating a Statement of Work (SoW) document. The schema should accommodate storing results from TaskWeaver's processing steps and potentially other relevant data for the SoW.

# Database Schema Design

Results Table:

This table will store the outputs from each processing step in the TaskWeaver.

```sql
CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    section TEXT NOT NULL,
    details TEXT,
    processed_content TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

  • id: A unique identifier for each record.
  • section: The section of the SoW document this record belongs to (e.g., 'introduction', 'project_objectives_scope').
  • details: The raw details or inputs provided for processing this section.
  • processed_content: The output from the TaskWeaver processing step.
  • timestamp: The date and time when the record was created or processed.
Project Details Table (Optional):

If you want to store the initial project details separately:

```sql
CREATE TABLE IF NOT EXISTS project_details (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id TEXT NOT NULL,
    section TEXT NOT NULL,
    details TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

  • project_id: Identifier for the project.
  • section: Corresponding section in the SoW.
  • details: Raw details of the project for this section.

Implementing the Schema in TaskWeaverSQLIntegration

In the initialize_database method of TaskWeaverSQLIntegration, implement the SQL commands to create these tables:

```python
def initialize_database(self):
    self.db_connection.execute('''
        CREATE TABLE IF NOT EXISTS results (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            section TEXT NOT NULL,
            details TEXT,
            processed_content TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        );
    ''')
    # Uncomment the code below to also create the optional project_details table
    # self.db_connection.execute('''
    #     CREATE TABLE IF NOT EXISTS project_details (
    #         id INTEGER PRIMARY KEY AUTOINCREMENT,
    #         project_id TEXT NOT NULL,
    #         section TEXT NOT NULL,
    #         details TEXT,
    #         timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    #     );
    # ''')
    self.db_connection.commit()
```

Data Insertion and Retrieval

  • Modify the process_and_store_data and retrieve_data_for_planner methods to insert and fetch data according to this schema.
  • Ensure that every insertion into the results table includes the section name, raw details, processed content, and timestamp.
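A minimal self-contained sketch of those two methods, using sqlite3 against the results schema above. The class and method bodies here are illustrative, not the project's actual TaskWeaverSQLIntegration implementation.

```python
import sqlite3

class SQLIntegrationSketch:
    """Illustrative stand-in for TaskWeaverSQLIntegration's storage methods."""

    def __init__(self, db_path=":memory:"):
        self.db_connection = sqlite3.connect(db_path)
        self.db_connection.execute('''
            CREATE TABLE IF NOT EXISTS results (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                section TEXT NOT NULL,
                details TEXT,
                processed_content TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            );
        ''')
        self.db_connection.commit()

    def process_and_store_data(self, section, details, processed_content):
        # Every insertion records the section, raw details, and processed
        # output; the timestamp column fills itself in via its DEFAULT.
        self.db_connection.execute(
            "INSERT INTO results (section, details, processed_content) "
            "VALUES (?, ?, ?)",
            (section, details, processed_content),
        )
        self.db_connection.commit()

    def retrieve_data_for_planner(self, section):
        # Fetch the most recent processed content for one SoW section.
        cur = self.db_connection.execute(
            "SELECT processed_content FROM results "
            "WHERE section = ? ORDER BY id DESC LIMIT 1",
            (section,),
        )
        row = cur.fetchone()
        return row[0] if row else None

db = SQLIntegrationSketch()
db.process_and_store_data("introduction", "raw notes", "Polished intro text")
intro = db.retrieve_data_for_planner("introduction")
```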

Conclusion

This database schema provides a structured way to store and manage the data required for generating a SoW document. It accommodates the storage of both raw inputs and processed outputs, ensuring that all necessary information for each section of the SoW is readily available and well-organized.

Open Issues : DataTonic

Please make PRs to the Dev Branch :-)

Dev Branch

This issue is for the "dev branch" of the major refactor required for DataTonic.

When DataTonic was first designed, it was organised like an educational repository, with separations between modules and naming conventions to reflect that. It was then refactored to include the Semantic Kernel and AutoGen orchestration frameworks.

Other orchestration frameworks seem unnecessary and might be better produced as plugins.

Open Issues

  • refactor datatonic to be taskweaver-first
  • include plugins for taskweaver

Guidance

  • select a minimal and expandable architecture and codebase organisation to promote contributions and ease of maintenance
  • fix the imports and names
  • run TaskWeaver as a process

Data Tonic - Coding Tasks

Mainly debug + recompile.

The Tasks:

  • maybe add planning agents

  • maybe add taskweaver plugins

  • maybe add autogen agents

  • maybe add semantic kernel plugins

  • consider using facebook/seamless (on-device) to add a multilingual + accessible feature to the user interface

Improve Autogen Module

Summary

We're replicating the use case described here: #14,
adding data processing and deployment capabilities with TaskWeaver and Semantic Kernel,
then producing a Statement of Work executed by AutoGen.

Tasks

  1. Produce one or more variations of agents and teams to cover a wide range of actions.
  2. Move the MemGPTMemoryManager class to a separate file :-)

I want to participate in the Hackathon with your team.

👋 I'm Muhammad Arham, a MERN Stack Developer. I build things on the internet and love building large-scale applications with system design in mind.
Skills:
Frontend Development: 💻 JavaScript (ES6+), HTML5, CSS3, SCSS, Tailwind CSS, Material UI, Ant Design, Bootstrap 5, React JS, React Native, Context API, Redux
Backend Development: 🚀 Node.js, Express.js, Firebase
Programming Languages: 📝 JavaScript (ES6+), PHP
Version Control: 🧭 Git, GitHub
API Knowledge: 🌐 REST APIs
Databases: 📂 MongoDB, Firestore, MySQL

My Portfolio 💻 : https://muhammad-arham.netlify.app/

I am looking for a team to participate in the Hackathon. I am dedicated to working with your team and building things with a kind and pure heart.

Improve Task Weaver Module

Summary

Take the attached use case and align the current TaskWeaver planner for each data task.

Requirements

  1. one or more additional plugins from TaskWeaver, for example SQL

  2. one or more new TaskWeaver plugins or classes in the taskweaver_module, per phase or per data task below

  3. integrate TaskWeaver into Semantic Kernel
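As an illustration of requirement 1, a SQL plugin boils down to a callable that runs a query and returns rows in a structured form. The sketch below omits TaskWeaver's plugin registration machinery and uses sqlite3; the class and plugin name are hypothetical.

```python
import sqlite3

class SQLQueryPlugin:
    """Sketch of what a TaskWeaver-style SQL plugin would do: accept a query,
    run it against a database, and return rows as dicts. In TaskWeaver this
    callable would be wrapped by the framework's plugin interface."""

    name = "sql_query"  # hypothetical plugin name

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)

    def __call__(self, query, params=()):
        cur = self.conn.execute(query, params)
        cols = [c[0] for c in cur.description] if cur.description else []
        return [dict(zip(cols, row)) for row in cur.fetchall()]

plugin = SQLQueryPlugin()
plugin.conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
plugin.conn.executemany("INSERT INTO sales VALUES (?, ?)",
                        [("EU", 120.0), ("US", 340.0)])
rows = plugin("SELECT region, amount FROM sales WHERE amount > ?", (200,))
```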

# Data Driven Advisory (Use Case)

Phase 1: Engagement Setup (1-2 weeks)

Client Background Information: Company history, mission, vision, and strategic objectives.
Industry Data: Market size, trends, competitors, and regulatory environment.
Stakeholder Information: Key stakeholders, organizational structure, and decision-makers.
Phase 2: Data Gathering and Analysis (3-6 weeks)

Operational Data: Sales figures, production data, supply chain details, employee information.
Financial Data: Profit and loss statements, balance sheets, cash flow statements, budgets.
Customer Data: Customer demographics, satisfaction surveys, purchase history.
Internal Documents: Previous strategy documents, reports, internal analyses.
Phase 3: In-Depth Analysis and Hypothesis Testing (4-8 weeks)

Segmented Data: More detailed operational and financial data broken down by business unit, geography, product line, etc.
Competitive Intelligence: Detailed competitor analysis, market share, business models.
Benchmarking Data: Industry benchmarks, best practices, case studies.
Qualitative Data: Interviews, focus groups, expert opinions.

make a dev branch

We need a dev branch, or branches for each open issue.

Currently:

  • trulens evaluation
  • interface
  • app + coding tasks

Open Tasks : PR TruEra/TruLens

Open Tasks : Make a PR to TruEra/TruLens

  • link this issue to "add vectara" pull request on TruEra

  • re-factor the connector

  • write documentation

  • publish notebook

  • write truera blog (?)

big thank you to 🏆😎 @MN-Noor for producing the first TruLens with Gemini on RAG using OpenAI!

...

we'll all work on this together; normally, if everyone does one, or at least contributes to a good one, we will have secured this task.

Originally posted by @Josephrp in #1 (comment)

Improve Semantic Kernel Module

Summary

We need to align or produce a Semantic Kernel module/planner that takes the TaskWeaver outputs and produces something aligned with the Statement of Work for AutoGen to use.

Tasks

  1. Produce one or more planners that will use TaskWeaver plugins and generate the below as an output.
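A framework-agnostic sketch of that task: a planner step that collects per-section TaskWeaver outputs and assembles them into a Statement of Work for AutoGen to consume. The section names mirror the outline below; the input dict and rendering format are assumptions for illustration.

```python
# Sketch: assemble TaskWeaver per-section outputs into a SoW document.
# The input dict stands in for real TaskWeaver plugin outputs.

SOW_SECTIONS = [
    "Introduction",
    "Project Objectives and Scope",
    "Project Approach and Methodology",
    "Deliverables",
    "Timeline",
    "Roles and Responsibilities",
    "Pricing and Payment Terms",
    "Confidentiality, Legal, and Ethical Considerations",
    "Terms and Conditions",
    "Signatures",
]

def build_sow(taskweaver_outputs):
    """Render a SoW, flagging sections TaskWeaver has not produced yet."""
    parts = []
    for section in SOW_SECTIONS:
        body = taskweaver_outputs.get(section, "[pending TaskWeaver output]")
        parts.append(f"## {section}\n{body}")
    return "\n\n".join(parts)

sow = build_sow({"Introduction": "Overview of the client engagement."})
```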

(Pre Phase 1) - Statement of Work:
Introduction

Overview: Brief description of the client's organization and the context of the engagement.
Purpose of the SoW: Clarification of the document's intent and its role as a guiding agreement.

Project Objectives and Scope

Objectives: Clear and specific goals the project aims to achieve.
Scope of Work: Detailed description of the services and tasks to be performed. This section delineates what is included and, just as importantly, what is not included in the engagement.

Project Approach and Methodology

Methodology: Explanation of the methodologies, frameworks, or strategies the consulting team will employ.
Phases of Work: Breakdown of the project into phases or milestones, each with specific tasks and objectives.

Deliverables

List of Deliverables: Detailed list of expected outputs, reports, presentations, tools, or models to be provided.
Quality Standards: Description of the standards or criteria against which the deliverables will be assessed. 

Timeline

Project Timeline: Detailed timeline of the project, including start and end dates, phase durations, and key milestones.
Review Points: Scheduled points for reviewing progress and adjusting plans as necessary.

Roles and Responsibilities

Consulting Team Composition: Names and roles of the consultants involved.
Client Responsibilities: Specific tasks or inputs required from the client, such as data provision, key personnel involvement, etc.

Pricing and Payment Terms

Fee Structure: Details on how the consulting fees are structured - fixed fee, time and materials, etc.
Payment Schedule: Timeline and conditions for payments.

Confidentiality, Legal, and Ethical Considerations

Confidentiality Clauses: Terms ensuring the confidentiality of shared information.
Legal and Compliance Aspects: Adherence to relevant laws and industry regulations.
Ethical Standards: Commitment to maintaining high ethical standards during the engagement.

Terms and Conditions

Contractual Terms: General terms including contract duration, termination conditions, dispute resolution mechanisms, etc.
Amendment Process: Process for making changes to the SoW.

Signatures

Sign-off: Signatures from authorized representatives of both the consulting firm and the client.

Interface : Retool as a choice for the front-end solution

Reason :

  • Very easy to go live, perfect for an MVP at minimum
  • Has a self-hosting option
  • I have a Business Account, so pretty much all the features
  • Has a built-in database and third-party providers (Postgres, Mongo, OpenAI, etc.)
  • Styling is possible
  • Can handle developer logic, workflows, APIs, ...
  • Is a fit for the project in terms of back-end <> LLM <> front-end

Example of an interface (more of a wireframe, since it's just for showing you):
https://www.loom.com/share/385defc10f7b48db96471175902f983d?sid=f02cc527-2008-4811-9940-709cb07f19a1


# Open Tasks : EDIT README


  • add the above to the README.md
  • add the file path for taskweaver persist memory
  • add the file path for semantic kernel memory
  • add the file path for user "add docs to embeddings"

Originally posted by @Josephrp in #49 (comment)

Brainstorming concept : DataTonic, help and find the most optimized LLM model for an user usecase

Problem :

Today there is a large choice of models, and picking one is often inaccessible, complex, and time-consuming: you either pay a subscription to some overpriced AI solution, pay a fair subscription price, or pay an expensive freelancer or agency to do it for you.

Concept :

DataTonic uses AutoGen, Truegen, its own prompt engineering, its agents...

It is able to evaluate and run tests in order to find the proper LLM model for a given scenario,
taking multimodality (sound, image, text) into account using Gemini and Truegen.

Benefits for end users/companies :

  • Finds a cost-efficient "bundle" for anyone, with or without prior knowledge
  • Transparent, with personalization and flexibility
  • Always up to date, so always useful (not a temporary concept that will lose relevancy)
  • Gives clear documentation for the parts DataTonic can't do for security reasons
  • No lock-in: the user is free to own their bundles, prebuilt or just documented

Benefit for us :

  • Will probably always be useful, and gives us additional data/insight over time
  • Business model with multiple income streams (pay as you go, agency service) (?)
  • Probably more

Limitation:
Seems complex, maybe too unrealistic?

Scenario example

User :

↳ Add a credential

  • Pay as you go
  • Bring your own key

↳ Explains what they want to accomplish

  • e.g.: I love monkeys, and I want a model able to ingest PDF books about monkeys that may contain images of monkeys, or YouTube videos from which it can also extract sound, or photos.

System (still needs more information, so it will ask follow-ups until it has everything it needs):

↳ Max budget per token

  • Dollar amount giving a token count (maybe the system could make an a-priori estimate of the number of tokens that would be consumed per month/week?)

↳ Currently used solutions to add as integrations:

  • Integration CTA
  • Postgres credential

↳ Specific demands, e.g.:

  • "Must use Cohere"
  • "OSS LLM"...
  • "Intensive and often ingestion or From time to time ?"
  • "Chat history ? Conversation ? Persistant memory ? "

↳ When the user is done and the system has all the info it needs, open a draggable list with each component from the user

  • Budget
  • Must use Cohere
  • OSS LLM
  • ...

↳ Enterprise extra security needed ?

↳ ...

↳ ...

↳ The user can re-order the list based on importance
↳ Or by another kind of measure, like a typical five-level Likert item (trivial, not important, important, very important)

↳ DataTonic, through its datasets (which can include benchmarks of transformers, embedding/chunking strategies...), existing evaluations based on the proper metrics, and its own evaluations, already made and frequently updated, across language models and multimodal models (which would be more complex?)

↳ Provides a few possible bundles (without having to create each one (possible?))
↳ The user chooses one of these, or asks for adjustments through chat input

Final step 🕺🏿 ↳ When the user has chosen their bundle, DataTonic starts the work, and the user will be notified when it's done

### The final step seems like too much: it adds a lot of complexity and may not make sense, when we could provide detailed documentation instead and give the user the choice to have us build it for them/their company, as an agency with support and onboarding

Supabase for User Authentication provider

I would suggest Supabase, as it's easy, fast, free, and combos well with any framework/tool.
It offers email + password login, Google OAuth, GitHub OAuth, etc.
Since it's Postgres (with Python, JS, ... clients) and open source, it is very flexible.

Use Case : Data Driven Advisory

# Data Driven Advisory (Use Case)

Phase 1: Engagement Setup (1-2 weeks)

Client Background Information: Company history, mission, vision, and strategic objectives.
Industry Data: Market size, trends, competitors, and regulatory environment.
Stakeholder Information: Key stakeholders, organizational structure, and decision-makers.

Phase 2: Data Gathering and Analysis (3-6 weeks)

Operational Data: Sales figures, production data, supply chain details, employee information.
Financial Data: Profit and loss statements, balance sheets, cash flow statements, budgets.
Customer Data: Customer demographics, satisfaction surveys, purchase history.
Internal Documents: Previous strategy documents, reports, internal analyses.

Phase 3: In-Depth Analysis and Hypothesis Testing (4-8 weeks)

Segmented Data: More detailed operational and financial data broken down by business unit, geography, product line, etc.
Competitive Intelligence: Detailed competitor analysis, market share, business models.
Benchmarking Data: Industry benchmarks, best practices, case studies.
Qualitative Data: Interviews, focus groups, expert opinions.

Phase 4: Solution Development and Validation (2-4 weeks)

Scenario Analysis Data: For testing different strategic options and their potential outcomes.
Risk Assessment Data: Data related to potential risks and mitigation strategies.
Feedback Data: Initial feedback on proposed solutions from a small group of stakeholders or pilot tests.

Phase 5: Final Recommendations and Implementation Planning (2-3 weeks)

Consolidated Analysis: Summarized data and analysis that support the final recommendations.
Stakeholder Feedback: Comprehensive feedback on proposed recommendations.
Implementation Data: Resources required for implementation, timelines, and milestones.

Phase 6: Implementation Support and Closure (Variable)

Performance Data: Metrics and KPIs to track the implementation progress.
Adjustment Data: Ongoing data collection for adjusting strategies as needed.
Final Outcome Data: Data reflecting the impact of the implemented solutions.

Post-Engagement (Optional)

Long-term Impact Data: Data collected over time to assess the long-term impact of the engagement.
Follow-up Feedback: Stakeholder feedback on the effectiveness and outcomes of the project.

Statement Of Work:

(example of a single fixed output usable by AutoGen)

Overview: Brief description of the client's organization and the context of the engagement.
Purpose of the SoW: Clarification of the document's intent and its role as a guiding agreement.

Project Objectives and Scope

Objectives: Clear and specific goals the project aims to achieve.
Scope of Work: Detailed description of the services and tasks to be performed. This section delineates what is included and, just as importantly, what is not included in the engagement.

Project Approach and Methodology

Methodology: Explanation of the methodologies, frameworks, or strategies the consulting team will employ.
Phases of Work: Breakdown of the project into phases or milestones, each with specific tasks and objectives.

Deliverables

List of Deliverables: Detailed list of expected outputs, reports, presentations, tools, or models to be provided.
Quality Standards: Description of the standards or criteria against which the deliverables will be assessed. 

Timeline

Project Timeline: Detailed timeline of the project, including start and end dates, phase durations, and key milestones.
Review Points: Scheduled points for reviewing progress and adjusting plans as necessary.

Roles and Responsibilities

Consulting Team Composition: Names and roles of the consultants involved.
Client Responsibilities: Specific tasks or inputs required from the client, such as data provision, key personnel involvement, etc.

Pricing and Payment Terms

Fee Structure: Details on how the consulting fees are structured - fixed fee, time and materials, etc.
Payment Schedule: Timeline and conditions for payments.

Confidentiality, Legal, and Ethical Considerations

Confidentiality Clauses: Terms ensuring the confidentiality of shared information.
Legal and Compliance Aspects: Adherence to relevant laws and industry regulations.
Ethical Standards: Commitment to maintaining high ethical standards during the engagement.

Terms and Conditions

Contractual Terms: General terms including contract duration, termination conditions, dispute resolution mechanisms, etc.
Amendment Process: Process for making changes to the SoW.

Signatures

Sign-off: Signatures from authorized representatives of both the consulting firm and the client.

Originally posted by @Josephrp in #3 (comment)

Priority Task : start using trulens to evaluate Gemini
