context-labs / autodoc

Experimental toolkit for auto-generating codebase documentation using LLMs

License: MIT License

TypeScript 91.00% JavaScript 9.00%
documentation-generator language-model cli-tool typescript

Autodoc

⚡ Toolkit for auto-generating codebase documentation using LLMs ⚡

What is this?

Autodoc is an experimental toolkit for auto-generating codebase documentation for git repositories using Large Language Models, like GPT-4 or Alpaca. Autodoc can be installed in your repo in about 5 minutes. It indexes your codebase through a depth-first traversal of all repository contents and calls an LLM to write documentation for each file and folder. These documents can be combined to describe the different components of your system and how they work together.
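
As a rough illustration of the traversal described above, here is a minimal sketch of a post-order depth-first walk over an in-memory tree. The `RepoNode` types, the `documentTree` function, and the `Summarize` callback (standing in for the LLM call) are all hypothetical names for this example and are not taken from Autodoc's source.

```typescript
// Hypothetical in-memory model of a repository tree; Autodoc's real types may differ.
type FileNode = { kind: "file"; name: string; content: string };
type FolderNode = { kind: "folder"; name: string; children: RepoNode[] };
type RepoNode = FileNode | FolderNode;

// Stand-in for an LLM call: receives a path plus the text to summarize.
type Summarize = (path: string, input: string) => string;

// Post-order depth-first traversal: every child is documented before its
// parent folder, so the folder summary can be built from the child summaries.
function documentTree(
  node: RepoNode,
  summarize: Summarize,
  parentPath = "",
  docs: Map<string, string> = new Map()
): Map<string, string> {
  const path = parentPath ? `${parentPath}/${node.name}` : node.name;
  if (node.kind === "file") {
    docs.set(path, summarize(path, node.content));
    return docs;
  }
  for (const child of node.children) {
    documentTree(child, summarize, path, docs);
  }
  const childSummaries = node.children
    .map((child) => docs.get(`${path}/${child.name}`) ?? "")
    .join("\n");
  docs.set(path, summarize(path, childSummaries));
  return docs;
}

// Usage with a fake summarizer that records the order in which paths are visited.
const repo: RepoNode = {
  kind: "folder",
  name: "src",
  children: [
    { kind: "file", name: "a.ts", content: "export const a = 1;" },
    {
      kind: "folder",
      name: "lib",
      children: [{ kind: "file", name: "b.ts", content: "export const b = 2;" }],
    },
  ],
};
const visited: string[] = [];
const docs = documentTree(repo, (path, input) => {
  visited.push(path);
  return `summary of ${path} (${input.length} chars)`;
});
```

Documenting children first is what lets a folder-level summary draw on the summaries of everything inside it.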

The generated documentation lives in your codebase, and travels where your code travels. Developers who download your code can use the doc command to ask questions about your codebase and get highly specific answers with reference links back to code files.

In the near future, documentation will be re-indexed as part of your CI pipeline, so it is always up-to-date. If you're interested in contributing to this work, see this issue.

Status

Autodoc is in the early stages of development. It is functional, but not ready for production use. Things may break, or not work as expected. If you're interested in working on the core Autodoc framework, please see contributing. We would love to have your help!

FAQs

Question: I'm not getting good responses. How can I improve response quality?

Answer: Autodoc is in the early stages of development. As such, the response quality can vary widely based on the type of project you're indexing and how questions are phrased. A few tips for writing good queries:

  1. Be specific with your questions. Ask things like "What are the different components of authorization in this system?" rather than "explain auth". This will help Autodoc select the right context to get the best answer for your question.
  2. Use GPT-4. GPT-4 is substantially better at understanding code compared to GPT-3.5 and this understanding carries over into writing good documentation as well. If you don't have access, sign up here.

Examples

Below are a few examples of how Autodoc can be used.

  1. Autodoc - This repository contains documentation for itself, generated by Autodoc. It lives in the .autodoc folder. Follow the instructions here to learn how to query it.
  2. TolyGPT.com - TolyGPT is an Autodoc chatbot trained on the Solana validator codebase and deployed to the web for easy access. In the near future, Autodoc will support a web version in addition to the existing CLI tool.

Get Started

Requirements

Autodoc requires Node v18.0.0 or greater. v19.0.0 or greater is recommended. Make sure you're running the proper version:

$ node -v

Example output:

v19.8.1

Install the Autodoc CLI tool as a global NPM module:

$ npm install -g @context-labs/autodoc

This command installs the Autodoc CLI tool that will allow you to create and query Autodoc indexes.

Run doc to see the available commands.

Querying

You can query a repository that has Autodoc installed via the CLI. We'll use the Autodoc repository itself as an example to demonstrate how querying in Autodoc works, but this could be your own repository that contains an index.

Clone Autodoc and change directory to get started:

$ git clone https://github.com/context-labs/autodoc.git
$ cd autodoc

Right now Autodoc only supports OpenAI. Make sure you have your OpenAI API key exported in your current session:

$ export OPENAI_API_KEY=<YOUR_KEY_HERE>

To start the Autodoc query CLI, run:

$ doc q

If this is your first time running doc q, you'll get a screen that prompts you to select which GPT models you have access to. Select whichever is appropriate for your level of access. If you aren't sure, select the first option:

[Screenshot: model selection prompt]

You're now ready to query documentation for the Autodoc repository:

[Screenshot: querying the Autodoc repository with doc q]

This is the core querying experience. It's very basic right now, with plenty of room for improvement. If you're interested in improving the Autodoc CLI querying experience, check out this issue.

Indexing

Follow the steps below to generate documentation for your own repository using Autodoc.

Change directory into the root of your project:

$ cd $PROJECT_ROOT

Make sure your OpenAI API key is available in the current session:

$ export OPENAI_API_KEY=<YOUR_KEY_HERE>

Run the init command:

$ doc init

You will be prompted to enter the name of your project, GitHub URL, and select which GPT models you have access to. If you aren't sure which models you have access to, select the first option. You can also specify your own GPT file/directory prompts that will be used to summarize/analyze the code repo. This command will generate an autodoc.config.json file in the root of your project to store the values. This file should be checked in to git.

Note: Do not skip entering these values or indexing may not work.

Prompt Configuration: You'll find prompt directions specified in prompts.ts, with some snippets customizable in the autodoc.config.json. The current prompts are developer focused and assume your repo is code focused. We will have more reference templates in the future.

Run the index command:

$ doc index

You should see a screen like this:

[Screenshot: indexing cost estimate]

This screen estimates the cost of indexing your repository. You can also access this screen via the doc estimate command. If you've already indexed once, subsequent runs of doc index will only reindex files that have changed.

For every file in your project, Autodoc calculates the number of tokens in the file based on the file content. The more lines of code, the larger the number of tokens. Using this number, it determines which model to use on a per-file basis, always choosing the cheapest model whose context length supports the number of tokens in the file. If you're interested in helping make model selection configurable in Autodoc, check out this issue.
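
The selection rule described above can be sketched as follows. This is an illustration only: the `ModelSpec` shape, the function names, and in particular the context lengths and prices in `MODELS` are placeholder assumptions, not Autodoc's actual table or current OpenAI pricing.

```typescript
// Illustrative model table. Context lengths and prices are placeholder
// assumptions for this sketch, not current OpenAI figures.
interface ModelSpec {
  name: string;
  maxTokens: number; // context length the model supports
  costPer1KTokens: number; // USD, hypothetical
}

const MODELS: ModelSpec[] = [
  { name: "gpt-3.5-turbo", maxTokens: 4000, costPer1KTokens: 0.002 },
  { name: "gpt-4", maxTokens: 8000, costPer1KTokens: 0.03 },
  { name: "gpt-4-32k", maxTokens: 32000, costPer1KTokens: 0.06 },
];

// Pick the cheapest model whose context length fits the file; null if none fits.
function selectModel(tokenCount: number, models: ModelSpec[] = MODELS): ModelSpec | null {
  const candidates = models
    .filter((m) => m.maxTokens >= tokenCount)
    .sort((a, b) => a.costPer1KTokens - b.costPer1KTokens);
  return candidates[0] ?? null;
}

// Rough input-side cost estimate: each file is billed at its selected model's
// rate. Output tokens are ignored here, so real costs would be higher.
function estimateCost(tokenCounts: number[], models: ModelSpec[] = MODELS): number {
  let total = 0;
  for (const tokens of tokenCounts) {
    const model = selectModel(tokens, models);
    if (model !== null) total += (tokens / 1000) * model.costPer1KTokens;
  }
  return total;
}
```

Under these assumed numbers, a 1,200-token file would go to `gpt-3.5-turbo` and a 6,000-token file to `gpt-4`, which is the behaviour the note below warns about.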

Note: This naive model selection strategy means that files under ~4000 tokens will be documented using GPT-3.5, which will result in less accurate documentation. We recommend using GPT-4 8K at a minimum. Indexing with GPT-4 results in significantly better output. You can apply for access here.

For large projects, the cost can be several hundred dollars. View OpenAI pricing here.

In the near future, we will support self-hosted models, such as Llama and Alpaca. Read this issue if you're interested in contributing to this work.

When your repository is done being indexed, you should see a screen like this:

[Screenshot: indexing complete]

You can now query your application using the steps outlined in querying.

Community

There is a small group of us that are working full time on Autodoc. Join us on Discord, or follow us on Twitter for updates. We'll be posting regularly and continuing to improve the Autodoc application. Want to contribute? Read below.

Contributing

As an open source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infra, or better documentation.

For detailed information on how to contribute, see here.

autodoc's People

Contributors

0xturboblitz, andrewhong5297, eabdelmoneim, fionnachan, klaudioz, nilotaviano, samheutmaker, yangeok

autodoc's Issues

Dialog with Autodoc using SshNet

Hello all,
I'm trying to build a web app to communicate with my server, which is running Autodoc.
On the server side I have cloned, initialized, and indexed some repos.

I want to build a portal that holds a dialog with my server using SshNet.

I'm running into many issues
and can't see why it's failing.

Any help?

This is the code I have been trying:

        public void NavigateToFolder(string folder)
        {
            try 
            {
                ShellStream stream = _client.CreateShellStream("commands", 0, 0, 0, 0, 1024);
                sendCommand("cd autodoc-poc/<My-Repo>", stream);
                sendCommand("doc q", stream);
            }
            catch (SshException sshException)
            {
            }
        }

        public StringBuilder sendCommand(string customCMD, ShellStream stream)
        {
            StringBuilder answer;

            var reader = new StreamReader(stream);
            var writer = new StreamWriter(stream);
            writer.AutoFlush = true;
            WriteStream(customCMD, writer, stream);
            answer = ReadStream(reader);
            return answer;
        }

        private void WriteStream(string cmd, StreamWriter writer, ShellStream stream)
        {
            writer.WriteLine(cmd);
            while (stream.Length == 0)
            {
                Thread.Sleep(500);
            }
        }

        private StringBuilder ReadStream(StreamReader reader)
        {
            StringBuilder result = new StringBuilder();

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                result.AppendLine(line);
            }
            return result;
        }

Create a `doc faq` script, that takes in a list of questions and generates an FAQ.md file (after the repo has been indexed)

A user has requested to be able to generate an FAQ file given a list of questions easily. This could be implemented as a new CLI command, and the list of questions can be added to the config as an array or something.

"The idea is to process the repo with autodoc, the generated MD files have some auto generated questions about the code, we would love to be able to add questions from our community and get them answered if answer to that question can be generated from the code."

This is a great beginner issue.
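
One way the proposed command could work is sketched below. Everything here is hypothetical: the `FaqConfig` shape, the `renderFaq` function, and the `AnswerFn` callback (which stands in for querying an existing Autodoc index) are names made up for this example.

```typescript
// Hypothetical shape: a list of community questions stored in the Autodoc config.
interface FaqConfig {
  questions: string[];
}

// Stand-in for querying the index; returns null when no answer can be
// generated from the code.
type AnswerFn = (question: string) => string | null;

// Render answered questions as markdown; unanswered ones are skipped.
function renderFaq(config: FaqConfig, answer: AnswerFn): string {
  const sections: string[] = ["# FAQ"];
  for (const question of config.questions) {
    const response = answer(question);
    if (response === null) continue;
    sections.push(`## ${question}\n\n${response}`);
  }
  return sections.join("\n\n") + "\n";
}
```

The returned string could then be written to FAQ.md by the CLI command. Skipping unanswered questions matches the request that questions only be answered when the code supports an answer.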

404s back from OpenAI

Any idea why I'd be seeing these 404s?

Failed to get summary for file TokenLib.sol
⠹ Processing 26 files...Error: Request failed with status code 404
    at createError (file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:302:19)
    at settle (file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:24:16)
    at file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:185:19
    at new Promise (<anonymous>)
    at fetchAdapter (file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:177:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [AsyncFunction: fetchAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],
    headers: {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',
      'User-Agent': 'OpenAI/NodeJS/3.2.1',
      Authorization: 'Bearer <REDACTED>'
    },
    method: 'post',
    data: '{"model":"gpt-4","temperature":0.1,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"\\n    You are acting as a code documentation expert for a project called stm_sol.\\n    Below is the code from a file located at `stm_sol`. \\n    Write a detailed technical explanation of what this code does. \\n      Focus on the high-level purpose of the code and how it may be used in the larger project.\\n      Include code examples where appropriate. Keep you response between 100 and 300 words. \\n      DO NOT RETURN MORE THAN 300 WORDS.\\n      Output should be in markdown format.\\n      Do not just list the methods and classes in this file.\\n    Do not say \\"this file is a part of the stm_sol project\\".\\n\\n    code:\\n    // SPDX-License-Identifier: AGPL-3.0-only - (c) AirCarbon Pte Ltd - see /LICENSE.md for Terms\\n// Author: https://github.com/7-of-9\\n// Certik (AD): locked compiler version\\npragma solidity 0.8.5;\\n\\nimport \\"../Interfaces/StructLib.sol\\";\\n\\nimport \\"../StMaster/StMaster.sol\\";\\n\\nlibrary TransferLib {\\n    event TransferedFullSecToken(address indexed from, address indexed to, uint256 indexed stId, uint256 mergedToSecTokenId, uint256 qty, StructLib.TransferType transferType);\\n    event

Error during traversal: The text contains a special token that is not allowed

When I run doc index on the langchain repository, I receive the following error:

⠇ Processing 494 files...Error during traversal: The text contains a special token that is not allowed: <|endoftext|>
Failed to find `autodoc.config.json` file. Did you run `doc init`?
Error: The text contains a special token that is not allowed: <|endoftext|>
    at module.exports.__wbindgen_error_new (/usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:398:17)
    at wasm://wasm/00b63e2e:wasm-function[15]:0xebb8
    at wasm://wasm/00b63e2e:wasm-function[154]:0x48af5
    at Tiktoken.encode (/usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:257:18)
    at processFile (file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/processRepository.js:24:40)
    at async file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/utils/traverseFileSystem.js:42:21
    at async Promise.all (index 2)
    at async dfs (file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/utils/traverseFileSystem.js:38:13)
    at async file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/utils/traverseFileSystem.js:25:21
    at async Promise.all (index 0)

I believe this is an issue with autodoc, rather than the langchain repository, as I have followed the instructions in the README file and run doc init in the langchain repository before running doc index.

Here is some information about my environment:

  • Operating system: macOS Monterey 12.6.3 (21G419)
  • Node.js version: v19.8.1

Please let me know if there is any additional information I can provide or steps I can take to resolve this issue.
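
A possible workaround, sketched under assumptions: the trace above shows the tokenizer rejecting the literal string `<|endoftext|>` inside a source file, so one option is to alter such strings before the text is tokenized. The `SPECIAL_TOKENS` list and the `neutralizeSpecialTokens` function are hypothetical and not part of Autodoc; a tokenizer-level option for permitting special tokens, where the library offers one, may be a cleaner fix.

```typescript
// Special-token strings that a tokenizer may reject in plain input.
// This list is an assumption for the sketch and is not exhaustive.
const SPECIAL_TOKENS = ["<|endoftext|>", "<|endofprompt|>"];

// Insert a space after "<|" so the text no longer matches a special token
// exactly, while staying readable to a human or a model.
function neutralizeSpecialTokens(text: string): string {
  let result = text;
  for (const token of SPECIAL_TOKENS) {
    const harmless = token.replace("<|", "<| ");
    result = result.split(token).join(harmless);
  }
  return result;
}
```

The trade-off is that the documented text no longer matches the file byte for byte, which matters for repositories such as langchain that mention these tokens on purpose.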

no summary

I tried autodoc on multiple repositories including autodoc itself.

Somehow it's not producing a summary per folder. I tried the default settings (gpt-3.5), I tried all 3 positions, and I even tried manually editing autodoc.config.json to leave only gpt-4, as you have in the autodoc repo. Still, I get md files for most source files, but no summary per folder.

Tried on two machines, Ubuntu 20.04 and 22.04, with both Node.js 18 and 19.

When Indexing, [TOO MANY REQUESTS] Keeps being thrown

Indexing as usual, after estimation, it runs for a bit then keeps throwing this error:

Failed to get summary for file github.py
⠹ Processing 724 files...Error: Request failed with status code 429
    at createError (file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:302:19)
    at settle (file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:24:16)
    at file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:185:19
    at new Promise (<anonymous>)
    at fetchAdapter (file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:177:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [AsyncFunction: fetchAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],
    headers: {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',
      'User-Agent': 'OpenAI/NodeJS/3.2.1',
      Authorization: 'Bearer <REDACTED>'
    },
    method: 'post',
    data: '{"model":"gpt-3.5-turbo","temperature":0.1,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"\\n    You are acting as a code documentation expert for a project called RADAR.\\n    Below is the code from a file located at `RADAR`. \\n    Write a detailed technical explanation of what this code does. \\n    Focus on the high-level purpose of the code and how it may be used in the larger project.\\n    Include code examples where appropriate. Keep you response between 100 and 300 words. \\n    DO NOT RETURN MORE THAN 300 WORDS.\\n    Output should be in markdown format. \\n    Do not say \\"this file is a part of the RADAR project\\".\\n    Do not just list the methods and classes in this file.\\n\\n    Code:\\n    import json\\nfrom fastapi.encoders import jsonable_encoder\\nfrom typing import Optional\\nimport aiohttp\\nfrom fastapi import APIRouter, Query, Response\\n\\nfrom utils.lunaris import Lunaris\\n\\nAPI_KEY = \\"819a8443-a2fb-433f-83cd-7a47257bd548\\"\\n\\nrouter = APIRouter()\\n\\n\\[email protected](\\"/collection/find\\")\\nasync def find_collection_post(\\n    helloMoonCollectionId: str = None,\\n    collectionName: str = None,\\n):\\n    if all([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You can only provide one of helloMoonCollectionId or collectionName\\"\\n        )\\n    if not any([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You must provide either helloMoonCollectionId or collectionName\\"\\n        )\\n    return await Lunaris().find_collection(\\n        helloMoonCollectionId=helloMoonCollectionId, collectionName=collectionName\\n    )\\n\\n\\n    Response:\\n\\n  "}]}',
    url: 'https://api.openai.com/v1/chat/completions'
  },
  request: Request {
    [Symbol(realm)]: { settingsObject: [Object] },
    [Symbol(state)]: {
      method: 'POST',
      localURLsOnly: false,
      unsafeRequest: false,
      body: [Object],
      client: [Object],
      reservedClient: null,
      replacesClientId: '',
      window: 'client',
      keepalive: false,
      serviceWorkers: 'all',
      initiator: '',
      destination: '',
      priority: null,
      origin: 'client',
      policyContainer: 'client',
      referrer: 'client',
      referrerPolicy: '',
      mode: 'cors',
      useCORSPreflightFlag: false,
      credentials: 'same-origin',
      useCredentials: false,
      cache: 'default',
      redirect: 'follow',
      integrity: '',
      cryptoGraphicsNonceMetadata: '',
      parserMetadata: '',
      reloadNavigation: false,
      historyNavigation: false,
      userActivation: false,
      taintedOrigin: false,
      redirectCount: 0,
      responseTainting: 'basic',
      preventNoCacheCacheControlHeaderModification: false,
      done: false,
      timingAllowFailed: false,
      headersList: [HeadersList],
      urlList: [Array],
      url: [URL]
    },
    [Symbol(signal)]: AbortSignal { aborted: false },
    [Symbol(headers)]: HeadersList {
      cookies: null,
      [Symbol(headers map)]: [Map],
      [Symbol(headers map sorted)]: null
    }
  },
  response: {
    ok: false,
    status: 429,
    statusText: 'Too Many Requests',
    headers: HeadersList {
      cookies: null,
      [Symbol(headers map)]: [Map],
      [Symbol(headers map sorted)]: null
    },
    config: {
      transitional: [Object],
      adapter: [AsyncFunction: fetchAdapter],
      transformRequest: [Array],
      transformResponse: [Array],
      timeout: 0,
      xsrfCookieName: 'XSRF-TOKEN',
      xsrfHeaderName: 'X-XSRF-TOKEN',
      maxContentLength: -1,
      maxBodyLength: -1,
      validateStatus: [Function: validateStatus],
      headers: [Object],
      method: 'post',
      data: '{"model":"gpt-3.5-turbo","temperature":0.1,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"\\n    You are acting as a code documentation expert for a project called RADAR.\\n    Below is the code from a file located at `RADAR`. \\n    Write a detailed technical explanation of what this code does. \\n    Focus on the high-level purpose of the code and how it may be used in the larger project.\\n    Include code examples where appropriate. Keep you response between 100 and 300 words. \\n    DO NOT RETURN MORE THAN 300 WORDS.\\n    Output should be in markdown format. \\n    Do not say \\"this file is a part of the RADAR project\\".\\n    Do not just list the methods and classes in this file.\\n\\n    Code:\\n    import json\\nfrom fastapi.encoders import jsonable_encoder\\nfrom typing import Optional\\nimport aiohttp\\nfrom fastapi import APIRouter, Query, Response\\n\\nfrom utils.lunaris import Lunaris\\n\\nAPI_KEY = \\"819a8443-a2fb-433f-83cd-7a47257bd548\\"\\n\\nrouter = APIRouter()\\n\\n\\[email protected](\\"/collection/find\\")\\nasync def find_collection_post(\\n    helloMoonCollectionId: str = None,\\n    collectionName: str = None,\\n):\\n    if all([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You can only provide one of helloMoonCollectionId or collectionName\\"\\n        )\\n    if not any([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You must provide either helloMoonCollectionId or collectionName\\"\\n        )\\n    return await Lunaris().find_collection(\\n        helloMoonCollectionId=helloMoonCollectionId, collectionName=collectionName\\n    )\\n\\n\\n    Response:\\n\\n  "}]}',
      url: 'https://api.openai.com/v1/chat/completions'
    },
    request: Request {
      [Symbol(realm)]: [Object],
      [Symbol(state)]: [Object],
      [Symbol(signal)]: [AbortSignal],
      [Symbol(headers)]: [HeadersList]
    },
    data: { error: [Object] }
  },
  isAxiosError: true,
  toJSO
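
HTTP 429 responses like the one logged above are commonly handled by retrying with exponential backoff. The following is a minimal sketch of that general technique, not Autodoc's behaviour: `backoffDelays`, `withRetry`, and the `HttpError` shape are hypothetical names, and the delay values are arbitrary defaults.

```typescript
// Delay before each retry: baseMs * 2^attempt, capped at maxMs.
function backoffDelays(retries: number, baseMs = 1000, maxMs = 30000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(baseMs * 2 ** attempt, maxMs));
  }
  return delays;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical error shape: only the HTTP status is inspected, mirroring the
// `response.status` field visible in the log above.
interface HttpError {
  response?: { status?: number };
}

// Retry `request` when it fails with HTTP 429; rethrow anything else, and
// rethrow the 429 itself once the retries are used up.
async function withRetry<T>(request: () => Promise<T>, retries = 5): Promise<T> {
  const delays = backoffDelays(retries);
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (error) {
      const status = (error as HttpError).response?.status;
      if (status !== 429 || attempt >= delays.length) throw error;
      await sleep(delays[attempt] ?? 0);
    }
  }
}
```

Limiting how many files are summarized concurrently would reduce the number of 429s in the first place; backoff only handles the ones that still occur.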

[Error: ENOENT: no such file or directory] while indexing

I followed the steps to index my directory, which contains subfolders that also have git repos in them, but they are in the .gitignore file.

While running index, after 5 seconds I get this:

Failed to find `autodoc.config.json` file. Did you run `doc init`?
[Error: ENOENT: no such file or directory, stat 'niftypay/mypython/bin/python'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'stat',
  path: 'niftypay/mypython/bin/python'
}

Not sure what's wrong; niftypay is a subdirectory and it's in .gitignore.
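
One possible mitigation, assuming ignore patterns are checked before a path is ever passed to `stat`, is sketched below. The `globToRegExp` and `isIgnored` helpers are hypothetical and support only the `*` wildcard; they are not Autodoc's matching logic, and whether Autodoc reads .gitignore at all is not established here.

```typescript
// Convert a simple glob (only `*` is supported in this sketch) to a RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// A path is ignored if any pattern matches the full path or any one segment.
function isIgnored(path: string, patterns: string[]): boolean {
  const candidates = [path, ...path.split("/")];
  return patterns.some((pattern) => {
    const re = globToRegExp(pattern);
    return candidates.some((candidate) => re.test(candidate));
  });
}
```

With a check like this placed ahead of the file-system call, a dangling symlink such as `niftypay/mypython/bin/python` would be skipped as long as `niftypay` is listed among the patterns.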

Allow for configuration of indexing strategy.

Autodoc currently only supports indexing a file using the most affordable models available in a project's autodoc.config.json. Ideally, we should allow for different types of indexing strategies. At the very least, there should be an option to use the "best available" models, which would tell Autodoc to always choose the most powerful models that have been configured.

This is an evolving issue. Please reach out on Discord if you're interested in contributing.

Related #8.

ReferenceError: Headers is not defined

Tried running autodoc on a fresh repository and got this error at indexing time.

Steps to reproduce:

  1. doc init to create an autodoc.config.json file (contents pasted below)
  2. Run doc index and answer yes at the prompt

Stacktrace:

ReferenceError: Headers is not defined
    at createRequest (file:///home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:234:21)
    at fetchAdapter (file:///home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:164:21)
    at dispatchRequest (/home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/dispatchRequest.js:58:10)
    at Axios.request (/home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/Axios.js:108:15)
    at Function.wrap [as request] (/home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/helpers/bind.js:9:15)
    at /home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/common.js:149:22
    at /home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/api.js:1738:133
    at runNextTicks (node:internal/process/task_queues:61:5)
    at listOnTimeout (node:internal/timers:528:9)
    at processTimers (node:internal/timers:502:7)
Failed to get summary for file turbo.py

autodoc.config.json:

{
  "name": "turbo-chat",
  "repositoryUrl": "https://github.com/creatorrr/turbo-chat",
  "root": ".",
  "output": "./.autodoc",
  "llms": [
    "gpt-3.5-turbo",
    "gpt-4"
  ],
  "ignore": [
    ".*",
    "*package-lock.json",
    "*package.json",
    "node_modules",
    "*dist*",
    "*build*",
    "*test*",
    "*.svg",
    "*.md",
    "*.mdx",
    "*.toml",
    "*autodoc*"
  ]
}
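
The paths in the stack trace show Node v17.9.1, which is below the v18.0.0 minimum stated in the README. The global `Headers` class belongs to the fetch API that Node enables by default starting with v18, so an older runtime is a likely cause. Below is a minimal sketch of a startup check; `meetsMinimumNode` and `MIN_NODE_MAJOR` are hypothetical names, not part of Autodoc.

```typescript
// Minimum major version taken from the README's requirement (Node v18.0.0+).
const MIN_NODE_MAJOR = 18;

// Parse strings such as "v17.9.1" or "19.8.1" and compare the major version.
// Unparseable input is treated as not meeting the requirement.
function meetsMinimumNode(version: string, minMajor: number = MIN_NODE_MAJOR): boolean {
  const match = /^v?(\d+)\./.exec(version);
  if (match === null) return false;
  return Number(match[1]) >= minMajor;
}
```

A CLI could call this with `process.version` at startup and exit with a clear message instead of failing on every file.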

Failed to get Summary, Headers is not defined

Just ran this on a directory with the .gitignore folders excluded:

I get this on each file traversed:

ReferenceError: Headers is not defined
    at createRequest (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:234:21)
    at fetchAdapter (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:164:21)
    at dispatchRequest (/home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/dispatchRequest.js:58:10)
    at Axios.request (/home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/Axios.js:108:15)
    at Function.wrap [as request] (/home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/helpers/bind.js:9:15)
    at /home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/common.js:149:22
    at /home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/api.js:1738:133

at the end of it all:

Failed to get summary for file whitelists.py
✔ Processing 724 files...
⠋ Processing 168 folders... The provided folder path does not exist.
✔ Processing 168 folders... 
✔ Processing repository...
⠋ Creating markdown files...The provided folder path does not exist.
The provided folder path does not exist.
✔ Created 0 mardown files...
⠋ Create vector files...Error: ENOENT: no such file or directory, scandir '.autodoc/docs/markdown/'
    at Module.readdirSync (node:fs:1438:3)
    at processDirectory (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:29:20)
    at RepoLoader.load (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:57:22)
    at createVectorStore (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:62:34)
    at index (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/index.js:39:11) {
  errno: -2,
  syscall: 'scandir',
  code: 'ENOENT',
  path: '.autodoc/docs/markdown/'
}
Error: Could not read directory: .autodoc/docs/markdown/. Did you run `sh download.sh`?
    at processDirectory (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:33:15)
    at RepoLoader.load (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:57:22)
    at createVectorStore (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:62:34)
    at index (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/index.js:39:11)
✖ Create vector files...

Incremental re-indexing

Autodoc should support only indexing files and folders that have changed since the last index. At a high level, I think it looks something like this:

  1. Track the git sha at time of index.
  2. When indexing, compare files at last sha to current repository state.
  3. Calculate which branches of the file tree have changes.
  4. Re-index the changed branches.
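
The steps above can be sketched as follows, assuming each index run stores a snapshot that maps file paths to content hashes (git blob shas would also work). The `Snapshot` type and both function names are hypothetical, and "branches" is read here as branches of the file tree rather than git branches.

```typescript
// Map from file path to a content hash (or git blob sha) at a point in time.
type Snapshot = Map<string, string>;

// Files that were added, modified, or deleted between two snapshots.
function changedFiles(previous: Snapshot, current: Snapshot): string[] {
  const changed = new Set<string>();
  current.forEach((hash, path) => {
    if (previous.get(path) !== hash) changed.add(path);
  });
  previous.forEach((_hash, path) => {
    if (!current.has(path)) changed.add(path);
  });
  return Array.from(changed).sort();
}

// Every ancestor folder of a changed file needs its summary regenerated.
// The repository root itself is not listed; it is affected by any change.
function foldersToReindex(files: string[]): string[] {
  const folders = new Set<string>();
  for (const file of files) {
    const parts = file.split("/");
    for (let depth = parts.length - 1; depth >= 1; depth--) {
      folders.add(parts.slice(0, depth).join("/"));
    }
  }
  return Array.from(folders).sort();
}
```

Folders outside the returned list keep their existing summaries, which is where the cost saving would come from.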

If you're interested in working on this, please reach out.

Database can be shared?

Maybe a stupid question. So all content generated by autodoc, and everything autodoc requires for querying, is stored under the .autodoc folder, right? No other hidden local cache, right? As long as my server pushes this folder to my remote repo, I can share the indexed database so that other developers don't have to index manually again?

Thanks!

OpenAI Base url?

Is it possible to set my own OpenAI base URL?

Is there a plan to support the Azure API?

Azure OpenAI Support

Hey there.

I've noticed the langchain version being used is a bit old and doesn't support the properties needed for Azure OpenAI. Will there be an update in the near future for this support?

Support Alpaca and Llama models

Autodoc is currently reliant on OpenAI for access to cutting-edge language models. Going forward, we would like to support models running locally or at providers other than OpenAI, like Llama, or Alpaca. This gives developers more control over how their code is indexed, and allows indexing of private code that cannot be shared with OpenAI.

This is a big undertaking that will be an ongoing process. A few thoughts for someone who wants to get started hacking on this:

  1. It would be nice to be able to configure Autodoc with a LangChain LLM via the Autodoc config file. This would allow for complete control over how an LLM is configured.
  2. It seems like a lot of people are using llama.cpp to run Llama locally. It may be worth using this as a starting point to support other models.
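
One way to picture the first idea is a small provider registry keyed by a name in the config file. Everything in this sketch is hypothetical: the `CompletionModel` interface, the `LlmConfig` shape, and the `echo` provider are illustrative stand-ins, not Autodoc's or LangChain's actual API.

```typescript
// Hypothetical minimal interface; a real integration would wrap a LangChain LLM.
interface CompletionModel {
  complete(prompt: string): Promise<string>;
}

type ModelFactory = (options: Record<string, unknown>) => CompletionModel;

const registry = new Map<string, ModelFactory>();

function registerProvider(name: string, factory: ModelFactory): void {
  registry.set(name, factory);
}

// Hypothetical shape of an "llm" section in the Autodoc config file.
interface LlmConfig {
  provider: string;
  options?: Record<string, unknown>;
}

function createModel(config: LlmConfig): CompletionModel {
  const factory = registry.get(config.provider);
  if (!factory) {
    throw new Error(`Unknown LLM provider: ${config.provider}`);
  }
  return factory(config.options ?? {});
}

// Stand-in provider so the sketch runs without any model installed.
registerProvider("echo", () => ({
  complete: async (prompt: string) => `echo: ${prompt}`,
}));
```

A local backend such as llama.cpp could then be added by registering another factory, without touching the indexing code that calls `createModel`.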

This issue is high priority. If you're interested in working on it, please reach out.

incorrect links in references

I've noticed sometimes the reference links given are wrong for the source page, not sure how to address this 🤔

It's pulling the right resource, but all of these should link to that first link given. When I go to the generated markdown (see here https://github.com/duneanalytics/docs/blob/rework/.autodoc/docs/markdown/docs/api/FAQ/other.md), I don't see "All Ethereum and SQL Basics" referenced. So I don't know how it pulled it in as a link somehow.

Maybe the prompt "Always include a list of reference links to GitHub from the context. Links should ONLY come from the context." should be adjusted somehow? I can't follow how context is injected into the createChatChain prompt, maybe the context is mixing up content and source.

Error: Could not read directory: .autodoc/docs/markdown/. Did you run `sh download.sh`?

I followed the instructions as laid out. Everything should be set up correctly.

I keep getting `Failed to get summary for file` while indexing files, and at the end I get:

Error: Could not read directory: .autodoc/docs/markdown/. Did you run `sh download.sh`?
    at processDirectory (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:33:15)
    at RepoLoader.load (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:57:22)
    at createVectorStore (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:62:34)
    at index (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/index.js:39:11)
✖ Create vector files...

Improve querying experience

Right now the CLI querying experience is functional, but the UX is bad. Below are a few ideas for improvements. If you have your own ideas, please share them!

  1. Currently, output is streamed token by token as plaintext. When the response is complete, we output the entire response as markdown using marked-terminal. It would be nice if we could figure out how to stream the response token by token as markdown. I'm not sure if this is possible using marked-terminal, but further investigation is required.
  2. When querying, the cursor sometimes flickers. I'm not sure what is causing this, but it needs to be investigated and fixed.
  3. Querying responses currently wrap in the middle of a word. This makes them hard to read. We should fix this.
  4. It would be really cool to have a k9s style querying console for Autodoc. This would be a big undertaking that would 10x UX IMHO.
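
For the third item, a greedy word-wrap is one possible starting point. This is a minimal sketch, not the current Autodoc output code: the function name `wrapAtWords` is hypothetical, and it assumes plain text with no ANSI escape codes, which would otherwise count toward the line width.

```typescript
/** Wrap text so lines break between words rather than inside them. */
function wrapAtWords(text: string, width: number): string {
  const lines: string[] = [];
  // Treat each existing line as a paragraph so blank lines are preserved.
  for (const paragraph of text.split("\n")) {
    let current = "";
    for (const word of paragraph.split(/\s+/).filter((w) => w.length > 0)) {
      if (current.length === 0) {
        // A word longer than `width` is left unbroken on its own line.
        current = word;
      } else if (current.length + 1 + word.length <= width) {
        current += " " + word;
      } else {
        lines.push(current);
        current = word;
      }
    }
    lines.push(current);
  }
  return lines.join("\n");
}
```

In a terminal, the width could come from `process.stdout.columns`, falling back to a fixed value such as 80 when output is not a TTY.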

Error: TypeError: Cannot read properties of undefined (reading 'length')

I did a fresh install of the project with yarn and cloned a repo we had with the indices precomputed. Then I imported the env var and ran `doc q`.

Error:

TypeError: Cannot read properties of undefined (reading 'length')
    at makeQAPrompt (file:///Users/alfongj/.config/yarn/global/node_modules/@context-labs/autodoc/dist/cli/commands/query/createChatChain.js:26:14)
    at makeChain (file:///Users/alfongj/.config/yarn/global/node_modules/@context-labs/autodoc/dist/cli/commands/query/createChatChain.js:47:23)
    at query (file:///Users/alfongj/.config/yarn/global/node_modules/@context-labs/autodoc/dist/cli/commands/query/index.js:25:19)

Add support for querying multiple autodoc indexes at the same time.

Currently, Autodoc can only query a package in which it has been directly installed. We would like to support querying dependencies and peer packages that have been distributed with an Autodoc index.

Dependency packages are fairly easy. For example, in my autodoc.config.json, I could specify that I want Autodoc to look in node_modules for packages that have an .autodoc folder and include them when querying. This allows for composability of documentation across the dependency graph of any given project.
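
Discovering such packages might look like the sketch below. The function name `findIndexedPackages` is hypothetical, and it only checks top-level and scoped packages in a single node_modules folder; symlinked packages (as created by some package managers) and nested node_modules are not followed.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Return the paths of installed packages that ship an .autodoc folder. */
function findIndexedPackages(projectRoot: string): string[] {
  const modulesDir = path.join(projectRoot, "node_modules");
  if (!fs.existsSync(modulesDir)) return [];

  const found: string[] = [];
  const visit = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (!entry.isDirectory()) continue; // skips files and symlinks
      const full = path.join(dir, entry.name);
      if (dir === modulesDir && entry.name.startsWith("@")) {
        visit(full); // scoped packages live one level deeper
      } else if (fs.existsSync(path.join(full, ".autodoc"))) {
        found.push(full);
      }
    }
  };
  visit(modulesDir);
  return found.sort();
}
```

The returned paths could then be passed to whatever loads the vector store, so a query runs against the project's own index plus those of its dependencies.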

It's slightly tougher for peer packages. You would need some way to define peers and then have Autodoc pull in their .autodoc indexes from somewhere external.
