
gmpetrov / databerry


The no-code platform for building custom LLM Agents

Home Page: https://chaindesk.ai

License: GNU Affero General Public License v3.0

TypeScript 88.71% JavaScript 3.23% CSS 1.10% Dockerfile 0.10% Lua 2.70% Shell 0.02% PLpgSQL 0.06% PHP 0.47% HTML 0.15% Smarty 0.03% MDX 3.45%
ai chatgpt llm no-code openai qdrant semantic-search typescript chatbot aichatbot

databerry's Introduction


Chaindesk

Chaindesk provides a user-friendly way to quickly set up semantic search over your personal data, with no technical knowledge required.

Features

  • Load data from anywhere
    • Raw text
    • Web page
    • Files
      • Word
      • Excel
      • Powerpoint
      • PDF
      • Markdown
      • Plain Text
    • Web Site (coming soon)
    • Notion (coming soon)
    • Airtable (coming soon)
  • No-code: User-friendly interface to manage your datastores and chat with your data
  • Secured API endpoint for querying your data
  • Auto sync data sources (coming soon)
  • Auto generates a ChatGPT Plugin for each datastore

Semantic Search Specs

  • Vector Database: Qdrant
  • Embeddings: OpenAI's text-embedding-ada-002
  • Chunk size: 1024 tokens
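
To make the chunk size concrete, here is a minimal sketch of fixed-size chunking. It is not the project's splitter: it treats each whitespace-separated word as one "token", which is an assumption made for illustration, whereas a real pipeline would count tokens with a tokenizer. The function name `chunkByTokens` is hypothetical.

```typescript
// Approximate chunker: each whitespace-separated word counts as one "token".
// Assumption: a real pipeline would measure length with a proper tokenizer.
function chunkByTokens(text: string, chunkSize: number = 1024): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  // Drop empty strings produced by leading or trailing whitespace.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(" "));
  }
  return chunks;
}
```

Each resulting chunk would then be embedded and stored as one vector, so the chunk size bounds how much text a single search hit can return.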

Stack

  • Next.js
  • Joy UI
  • LangchainJS
  • PostgreSQL
  • Prisma
  • Qdrant

Inspired by the ChatGPT Retrieval Plugin.

Run the project locally

Without docker compose

Minimum requirements to run the project locally

  • Node.js v18
  • Postgres Database
  • Redis
  • Qdrant
  • GitHub App (NextAuth)
  • Email Provider (NextAuth)
  • OpenAI API Key
  • AWS S3 Credentials

Run locally (Docker required)

cp .env.example .env.local
# Add your own OPENAI_API_KEY

pnpm dev

# Local Chromium browser for Puppeteer
brew install chromium --no-quarantine

# Dev emails inbox (maildev)
# visit http://localhost:1080

databerry's People

Contributors

benoitgdb, eltociear, fbossiere, gmpetrov, klaudioz, mohs3n71, odapx, omahs, rhijjawi


databerry's Issues

How do you run the worker process?

When I type
npx tsc --watch workers/datasource-loader.ts
I get many errors:

`npx tsc --watch workers/datasource-loader.ts
[4:52:48 PM] Starting compilation in watch mode...

node_modules/@types/react-is/node_modules/@types/react/index.d.ts:3077:14 - error TS2300: Duplicate identifier 'LibraryManagedAttributes'.

3077         type LibraryManagedAttributes<C, P> = C extends React.MemoExoticComponent<infer T> | React.LazyExoticComponent<infer T>
                  ~~~~~~~~~~~~~~~~~~~~~~~~

  node_modules/@types/react/index.d.ts:3135:14
    3135         type LibraryManagedAttributes<C, P> = C extends React.MemoExoticComponent<infer T> | React.LazyExoticComponent<infer T>
                      ~~~~~~~~~~~~~~~~~~~~~~~~
    'LibraryManagedAttributes' was also declared here.

node_modules/@types/react-is/node_modules/@types/react/index.d.ts:3088:13 - error TS2717: Subsequent property declarations must have the same type.  Property 'a' must be of type 'DetailedHTMLProps<AnchorHTMLAttributes<HTMLAnchorElement>, HTMLAnchorElement>', but here has type 'DetailedHTMLProps<AnchorHTMLAttributes<HTMLAnchorElement>, HTMLAnchorElement>'.

3088             a: React.DetailedHTMLProps<React.AnchorHTMLAttributes<HTMLAnchorElement>, HTMLAnchorElement>;
                 ~

  node_modules/@types/react/index.d.ts:3146:13
    3146             a: React.DetailedHTMLProps<React.AnchorHTMLAttributes<HTMLAnchorElement>, HTMLAnchorElement>;
                     ~
    'a' was also declared here.

node_modules/@types/react-is/node_modules/@types/react/index.d.ts:3089:13 - error TS2717: Subsequent property declarations must have the same type.  Property 'abbr' must be of type 'DetailedHTMLProps<HTMLAttributes<HTMLElement>, HTMLElement>', but here has type 'DetailedHTMLProps<HTMLAttributes<HTMLElement>, HTMLElement>'.

3089             abbr: React.DetailedHTMLProps<React.HTMLAttributes<HTMLElement>, HTMLElement>;
                 ~~~~

  node_modules/@types/react/index.d.ts:3147:13
    3147             abbr: React.DetailedHTMLProps<React.HTMLAttributes<HTMLElement>, HTMLElement>;
                     ~~~~
    'abbr' was also declared here.`

...

`workers/datasource-loader.ts:7:20 - error TS2307: Cannot find module '@app/utils/prisma-client' or its corresponding type declarations.

7 import prisma from '@app/utils/prisma-client';
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~

workers/datasource-loader.ts:8:32 - error TS2307: Cannot find module '@app/utils/task-load-datasource' or its corresponding type declarations.

8 import taskLoadDatasource from '@app/utils/task-load-datasource';
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[4:59:50 PM] Found 184 errors. Watching for file changes.`

This is what I get when I type:
npx tsx workers/datasource-loader.ts

BullMQ: DEPRECATION WARNING! Your redis options maxRetriesPerRequest must be null. On the next versions having this settings will throw an exception
BullMQ: DEPRECATION WARNING! Your redis options maxRetriesPerRequest must be null. On the next versions having this settings will throw an exception
(node:93790) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
Error: connect ECONNREFUSED 127.0.0.1:6379
    at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
    at __node_internal_exceptionWithHostPort (node:internal/errors:642:12)
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1161:16) {
  errno: -61,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 6379
}

Website crawling speed

How can I speed up adding website pages?
On Chatbase it is very fast, but it takes a very long time with Databerry.

Unauthorized at getDatasource

Hello!

When trying to upload a file (such as a PDF), I get this:

Error: Unauthorized
at getDatasource (webpack-internal:///(api)/./pages/api/datasources/[id]/index.ts:32:15)
at async Array.eval (webpack-internal:///(api)/./utils/createa-api-handler.ts:27:28)

Any idea what this means and how to fix it?

Notion Datasource

Requirements

  • Public/private Notion notebook as datasource
  • auto sync on change

maxRetriesPerRequest must be null

Hi! Thank you for your interesting project. Could you tell me what this error might be related to? I saw a similar question but did not understand how it was solved.

(venv) (base) USER@user@MacBook-Pro databerry % npm run worker:datasource-loader

> [email protected] worker:datasource-loader
> dotenv -e .env.local -- npx tsx --watch workers/datasource-loader.ts

Need to install the following packages:
  [email protected]
Ok to proceed? (y) y
(node:2237) ExperimentalWarning: Watch mode is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
BullMQ: DEPRECATION WARNING! Your redis options maxRetriesPerRequest must be null. On the next versions having this settings will throw an exception
BullMQ: DEPRECATION WARNING! Your redis options maxRetriesPerRequest must be null. On the next versions having this settings will throw an exception
(node:2238) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
Error: connect ECONNREFUSED 127.0.0.1:6379
    at __node_internal_captureLargerStackTrace (node:internal/errors:490:5)
    at __node_internal_exceptionWithHostPort (node:internal/errors:668:12)
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16) {
  errno: -61,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 6379
}
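
The log above contains two separate signals. `ECONNREFUSED 127.0.0.1:6379` means nothing is accepting connections on the default Redis port, so Redis is most likely not running or not reachable at that address. The BullMQ warning is independent of that and asks for `maxRetriesPerRequest` to be `null` in the Redis options. How the project builds its connection is not shown here, so the following is only a sketch; `buildRedisOptions`, `REDIS_HOST`, and `REDIS_PORT` are hypothetical names.

```typescript
// Shape of the options a Redis client would receive (hypothetical type).
interface RedisConnectionOptions {
  host: string;
  port: number;
  // BullMQ's warning asks for this to be null.
  maxRetriesPerRequest: null;
}

// Builds connection options from environment variables.
// Assumption: REDIS_HOST and REDIS_PORT are illustrative variable names,
// not ones confirmed by the project's .env.example.
function buildRedisOptions(
  env: Record<string, string | undefined>
): RedisConnectionOptions {
  const host = env.REDIS_HOST ?? "127.0.0.1";
  const port = Number(env.REDIS_PORT ?? "6379");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid Redis port: ${env.REDIS_PORT}`);
  }
  return { host, port, maxRetriesPerRequest: null };
}
```

With options like these passed to the Redis client, the deprecation warning should go away, but the worker will still fail until a Redis server is actually listening on the configured host and port.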

Error when uploading a pdf

For info, I had to modify aws.ts because the PUT request failed with the original config:

import { S3 } from 'aws-sdk'; // AWS SDK for JavaScript v2

export const s3 = new S3({
  signatureVersion: 'v4',
  accessKeyId: process.env.APP_AWS_ACCESS_KEY,
  secretAccessKey: process.env.APP_AWS_SECRET_KEY,
  region: 'eu-west-3', // I had to add this to specify the region
});

I think it is similar to this: aws/aws-sdk-ruby#690

Open source license?

I'm interested in this project, but currently also very focused on open source software. Could you be convinced to license this? I personally prefer copy-neutral (Apache, MIT), but I can work with anything OSI approved.

Error when submitting a question to the chat

Hi,

This is what I see when I ask something in the chat:
AxiosError: Request failed with status code 404

This is what I get in the terminal:

event - compiled successfully in 509 ms (619 modules)
wait  - compiling /api/agents/[id]/query (client and server)...
event - compiled successfully in 493 ms (785 modules)
AxiosError: Request failed with status code 404
    at settle (webpack-internal:///(api)/./node_modules/.pnpm/[email protected]/node_modules/axios/lib/core/settle.js:24:12)
    at Unzip.handleStreamEnd (webpack-internal:///(api)/./node_modules/.pnpm/[email protected]/node_modules/axios/lib/adapters/http.js:584:71)
    at Unzip.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1359:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ERR_BAD_REQUEST',
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [ 'xhr', 'http' ],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    env: { FormData: [Function], Blob: [class Blob] },
    validateStatus: [Function: validateStatus],
    headers: AxiosHeaders {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',

When I go here:
http://localhost:3000/api/agents/clh9olhpl0001vnvlf5ay39sa/query
This is what I get:
not found

When I go here:
http://localhost:3000/api/agents
This is what I get:
[{"id":"clh9olhpl0001vnvlf5ay39sa","name":"vivien","description":"vivien","prompt":"As a customer support agent, please provide a helpful and professional response to the user's question or issue.","visibility":"private","ownerId":"clh7nm4fn0000vlnhf75ienzd","nbQueries":0,"interfaceConfig":{},"createdAt":"2023-05-04T22:10:54.829Z","updatedAt":"2023-05-04T22:10:54.829Z","tools":[{"id":"clh9olhpp0003vlvntpt9rf4e","type":"datastore","datastoreId":"clh7q9ipr0002vlbmda6d41i8"}]}]

Do you know why I get this error?

Integrations to Search API: SerpAPI or GoogleSearchAPI (please)

SerpAPI:
  • Install requirements with pip install google-search-results
  • Get a SerpAPI API key and either set it as an environment variable (SERPAPI_API_KEY) or pass it to the LLM constructor as serpapi_api_key.
GoogleSearchAPI:
  • Install requirements with pip install google-api-python-client
  • Get a Google API key and either set it as an environment variable (GOOGLE_API_KEY) or pass it to the LLM constructor as google_api_key. You will also need to set the GOOGLE_CSE_ID environment variable to your custom search engine ID. You can pass it to the LLM constructor as google_cse_id as well.

Crisp: Use like Magic Replay

My idea was to write a Chrome plugin that accesses your API.
However, I cannot pass the existing chat history in a reasonable way. Do you have an idea?
I don't like how your plugin behaves in the original Crisp chat client: the first question works fine, but the second one has to be entered in a box, and users don't understand that.

Agent > Deploy tab

  • Add Deploy tab between "Chat" and "Settings"
  • Move the Crisp & Slack integrations here

Add metadata and query multiple data sources

How can this be done?

  • Attach metadata to each document
  • Query across multiple data source names in LangChain. This would help with organising multiple documents under one data source and with having multiple data sources, e.g. topics and subtopics
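
One way to picture the request is a filter that accepts several data source names plus required metadata values. The sketch below works on an in-memory array and is not LangChain's or the project's API; the `Doc` shape and the `filterDocs` function are hypothetical. In a real setup the same conditions would more likely be pushed down to the vector store as a filter on stored metadata.

```typescript
// Hypothetical document shape: text plus the data source it came from
// and free-form string metadata.
interface Doc {
  datasource: string;
  text: string;
  metadata: Record<string, string>;
}

// Keeps documents that belong to any of the given data sources and whose
// metadata matches every key/value pair in `required`.
function filterDocs(
  docs: Doc[],
  datasources: string[],
  required: Record<string, string> = {}
): Doc[] {
  const allowed = new Set(datasources);
  const keys = Object.keys(required);
  return docs.filter(
    (doc) =>
      allowed.has(doc.datasource) &&
      keys.every((key) => doc.metadata[key] === required[key])
  );
}
```

For example, `filterDocs(docs, ["billing", "shipping"], { lang: "en" })` would keep only English documents from those two data sources before any similarity ranking is applied.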

SQL Databases as source

Is there any way to get my data from an SQL database source (MSSQL or MySQL), or is there a workaround to achieve this?

My goal is to train the chat agent on my data and get valuable insights from the agent as a replacement for my BI tool.

This is the way.

R&D: Agent Multi Tools

Today, agents can connect to only one datastore.

We need to make agents capable of handling multiple tools (tools can be datastores, APIs, or other agents).

Self Hosting Question

Hello,

I wanted to try self hosting databerry. I understand most of the env file. However, I'm confused regarding this:

# Analyze JS bundle
ANALYZE=false

# Next Auth
NEXTAUTH_SECRET=XXX

GITHUB_ID=XXX
GITHUB_SECRET=XXX

NEXT_PUBLIC_DASHBOARD_URL=http://localhost:3000

What do I put for NEXTAUTH_SECRET? Can it be whatever I want? And what do I put for GITHUB_ID and GITHUB_SECRET? I understand most of the rest. Is there a way to manually add user accounts to the database? I can't seem to sign in or create any sort of account.
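
As general NextAuth background rather than anything specific to this project: `NEXTAUTH_SECRET` can be any sufficiently random string, and `GITHUB_ID` / `GITHUB_SECRET` are normally the client ID and client secret of a GitHub OAuth app that you register yourself. A minimal sketch for generating a secret with Node.js follows; the helper name `generateSecret` is hypothetical.

```typescript
import { randomBytes } from "node:crypto";

// Returns `byteLength` cryptographically random bytes encoded as base64.
// 32 bytes is a common choice for a signing secret.
function generateSecret(byteLength: number = 32): string {
  return randomBytes(byteLength).toString("base64");
}

// Print a line that can be pasted into .env.local.
console.log(`NEXTAUTH_SECRET=${generateSecret()}`);
```

The value only needs to be generated once and kept stable, since changing it would invalidate existing sessions.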

Sign in with email and password

It seems there's already a password field placeholder, but it doesn't seem to be used in the code or the sign-in flow. At the moment, sign-in works only through an email link.

Invalid encoding

This seems to happen with Arabic UTF-8 text:

Error: Invalid encoding
      at module.exports.__wbindgen_error_new (/databerry/node_modules/.pnpm/@[email protected]/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:410:17)
      at wasm://wasm/00b5f812:wasm-function[23]:0x167bb
      at wasm://wasm/00b5f812:wasm-function[192]:0x4ce7d
      at module.exports.get_encoding (/databerry/node_modules/.pnpm/@[email protected]/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:159:14)
      at TokenTextSplitter.splitText (/databerry/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected]/node_modules/langchain/dist/text_splitter.js:231:39)
      at TokenTextSplitter.createDocuments (/databerry/node_modules/.pnpm/[email protected]_@[email protected][email protected][email protected][email protected][email protected]/node_modules/langchain/dist/text_splitter.js:31:33)
      at DatastoreManager.handleSplitDocument (/databerry/utils/datastores/index.ts:64:19)
      at DatastoreManager.upload (/databerry/utils/datastores/index.ts:29:20)
      at taskLoadDatasource (/databerry/utils/task-load-datasource.ts:208:18)
      at WorkerPro.import_bullmq_pro.WorkerPro.connection.connection (/databerry/workers/datasource-loader.ts:19:7)
  datasource {
    id: 'cliah5f0u00lxejtn6y399g64',
    type: 'web_page',
    name: 'https://www.tamm.abudhabi/ar-AE/life-events/Business/Manage-your-Business/Constructions/ManagetheHandoverofLandPlotCorners',
    status: 'running',
    config: {
      source: 'https://www.tamm.abudhabi/ar-AE/life-events/Business/Manage-your-Business/Constructions/ManagetheHandoverofLandPlotCorners',
      sitemap: 'https://www.tamm.abudhabi/sitemap.xml'

Databerry API Down?

Hello all!

I've been trying out Databerry and this thing is absolutely amazing! However, I've been trying to get the API to work so that I can query my agent but I keep getting 404s. Is querying agents only a premium feature?

import json
import requests

def get_agent_response(agent_id, api_key, query):
    url = f'https://api.databerry.ai/agents/query/{agent_id}'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {api_key}',
    }
    data = {
        'query': query,
    }
    response = requests.post(url, headers=headers, data=json.dumps(data))
    return response

def main():
    agent_id = 'xxxxx'  
    api_key = 'xxxx'  
    query = 'xxx'

    response = get_agent_response(agent_id, api_key, query)
    if response.status_code == 200:
        print(response.json())
    else:
        print(f"Error: received status code {response.status_code}")

if __name__ == '__main__':
    main()
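
One thing worth checking in the script above is the order of the path segments. The self-hosted route that appears in another issue on this page is `/api/agents/<id>/query`, with the agent ID before `query`, while the script calls `/agents/query/<id>`. Whether the hosted API follows the same ordering is an assumption, not something confirmed here. The sketch below only builds the request so the URL can be inspected before sending; `buildAgentQueryRequest` and `AgentQueryRequest` are hypothetical names.

```typescript
// Plain description of an HTTP request; nothing is sent by this code.
interface AgentQueryRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a query request with the agent ID placed before "query".
// Assumption: the hosted API mirrors the self-hosted /agents/<id>/query route.
function buildAgentQueryRequest(
  baseUrl: string,
  agentId: string,
  apiKey: string,
  query: string
): AgentQueryRequest {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/agents/${encodeURIComponent(agentId)}/query`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ query }),
  };
}
```

The returned object can be handed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.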

Banned Dependency Detected on Railway

What is Railway?

Railway is an infrastructure platform where you can provision infrastructure, develop with that infrastructure locally, and then deploy to the cloud.

Context

Build used: commit 6dcde89

railway.json

{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "Dockerfile"
  },
  "deploy": {
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10
  }
}

Build Log

====================================
Banned Dependency Detected!
====================================
 
torrent
 
Please remove this dependency from your project to use it on Railway

R&D: Structured Data

Semantic search does not perform well with structured data (e.g., a CSV file).

I don't think it makes sense to store this kind of data in a vector database.

Should we create an SQL table on the fly?

How to make it work seamlessly for the end user? Can it work in combination with Datastores or should we treat it as another tool?

Solution requirements

  • At runtime, the full CSV or SQL table should be loaded in memory
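
To illustrate the in-memory requirement, here is a deliberately naive sketch that loads a CSV string into row objects and answers an exact-match lookup. It is an illustration under stated assumptions, not a proposed design: it does not handle quoted fields or embedded commas, and `parseCsv` and `selectWhere` are hypothetical names.

```typescript
type Row = Record<string, string>;

// Parses a simple CSV string (first line is the header) into row objects.
// Assumption: no quoted fields and no commas inside values.
function parseCsv(csv: string): Row[] {
  const lines = csv.split(/\r?\n/).filter((line) => line.trim().length > 0);
  if (lines.length === 0) {
    return [];
  }
  const header = lines[0].split(",").map((name) => name.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: Row = {};
    header.forEach((name, i) => {
      row[name] = (cells[i] ?? "").trim(); // missing cells become ""
    });
    return row;
  });
}

// Returns the rows whose `column` equals `value` exactly.
function selectWhere(rows: Row[], column: string, value: string): Row[] {
  return rows.filter((row) => row[column] === value);
}
```

Exact lookups like this are the kind of question that embedding-based search tends to answer poorly, which is the motivation for treating structured data separately.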

Resources:
