context-labs / autodoc

Experimental toolkit for auto-generating codebase documentation using LLMs

License: MIT License

TypeScript 91.28% JavaScript 8.72%

cli-tool documentation-generator language-model typescript

autodoc's Issues

no summary

I tried autodoc on multiple repositories including autodoc itself.

Somehow it is not producing a summary per folder. I tried the default settings (gpt-3.5), I tried all 3 positions, and I even edited autodoc.config.json manually to leave only gpt-4, as you have in the autodoc repo. Still, I get md files for most source files, but no summary per folder.

Tried on two machines, Ubuntu 20.04 and 22.04, with both Node.js 18 and 19.

Illegal instruction error log

https://bun.report/1.1.15/B_1b23ba1fA+gigoQ24o6kE+xyQ4imwkD4rhwkDs4gl8Cogs76C08q2gE2pzr0Cg9zp0CA2AizB

LOG:

Bun v1.1.15 (b23ba1fe) Linux x64 (baseline)
Args: "kuzco" "worker" "start"
Features: jsc Bun.stderr Bun.stdin(2) Bun.stdout fetch(69120) spawn(994534) WebSocket(5)
Builtins: "bun:main" "node:assert" "node:async_hooks" "node:buffer" "node:child_process" "node:crypto" "node:events" "node:fs" "node:fs/promises" "node:http" "node:https" "node:module" "node:net" "node:os" "node:path" "node:perf_hooks" "node:process" "node:readline" "node:stream" "node:string_decoder" "node:timers" "node:tls" "node:tty" "node:url" "node:util" "node:util/types" "node:zlib" "node:worker_threads" "node:diagnostics_channel"
Elapsed: 199011231ms | User: 18345921ms | Sys: 1248191ms
RSS: 0.02ZB | Peak: 6.24GB | Commit: 0.02ZB | Faults: 42454

panic(main thread): Segmentation fault at address 0x331
oh no: Bun has crashed. This indicates a bug in Bun, not your code.

To send a redacted crash report to Bun's team,
please file a GitHub issue using the link below:

https://bun.report/1.1.15/B_1b23ba1fA+gigoQ24o6kE+xyQ4imwkD4rhwkDs4gl8Cogs76C08q2gE2pzr0Cg9zp0CA2AizB

Illegal instruction
root@LAPTOP-G0N03UML:~/kuzco# time=2024-06-25T02:47:09.354+08:00 level=WARN source=sched.go:575 msg="gpu VRAM usage didn't recover within timeout" seconds=5.099512273 model=/root/.kuzco/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
time=2024-06-25T02:47:09.657+08:00 level=WARN source=sched.go:575 msg="gpu VRAM usage didn't recover within timeout" seconds=5.402685842 model=/root/.kuzco/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
time=2024-06-25T02:47:10.015+08:00 level=WARN source=sched.go:575 msg="gpu VRAM usage didn't recover within timeout" seconds=5.760262256 model=/root/.kuzco/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29

root@LAPTOP-G0N03UML:~/kuzco#

Incremental re-indexing

Autodoc should support indexing only the files and folders that have changed since the last index. At a high level, I think it looks something like this:

Track the git sha at time of index.
When indexing, compare files at last sha to current repository state.
Calculate which branches have changes.
Re-index changed branches.
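
The "calculate which branches have changes" step can be sketched as a pure function. This is only an illustration, not Autodoc's implementation: `staleFolders` is a hypothetical name, and it assumes the list of changed files has already been obtained, for example from `git diff --name-only <last-sha> HEAD`.

```typescript
// Hypothetical helper: given the files that changed since the last indexed
// commit, return every folder ("branch") whose summary is now stale.
function staleFolders(changedFiles: string[]): string[] {
  const folders = new Set<string>();
  for (const file of changedFiles) {
    const parts = file.split("/");
    // Drop the file name, then walk up through each ancestor folder.
    for (let depth = parts.length - 1; depth >= 1; depth--) {
      folders.add(parts.slice(0, depth).join("/"));
    }
    // Any change also makes the repository root summary stale.
    folders.add(".");
  }
  // Root has depth 0; every other folder is counted by its path segments.
  const depthOf = (p: string): number => (p === "." ? 0 : p.split("/").length);
  // Deepest folders first, so children are re-summarized before their parents.
  return Array.from(folders).sort(
    (a, b) => depthOf(b) - depthOf(a) || a.localeCompare(b)
  );
}
```

Returning the deepest folders first matters because a folder summary is built from the summaries of its children.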

If you're interested in this, please reach out.

incorrect links in references

I've noticed that the reference links given are sometimes wrong for the source page. I'm not sure how to address this 🤔

It's pulling the right resource, but all of these should link to that first link given. When I go to the generated markdown (see here https://github.com/duneanalytics/docs/blob/rework/.autodoc/docs/markdown/docs/api/FAQ/other.md), I don't see "All Ethereum and SQL Basics" referenced. So I don't know how it pulled it in as a link somehow.

Maybe the prompt "Always include a list of reference links to GitHub from the context. Links should ONLY come from the context." should be adjusted somehow? I can't follow how context is injected into the createChatChain prompt. Maybe the context is mixing up content and source.

Support Alpaca and Llama models

Autodoc is currently reliant on OpenAI for access to cutting-edge language models. Going forward, we would like to support models running locally or at providers other than OpenAI, such as Llama or Alpaca. This gives developers more control over how their code is indexed, and allows indexing of private code that cannot be shared with OpenAI.

This is a big undertaking that will be an ongoing process. A few thoughts for someone who wants to get started hacking on this.

It would be nice to be able to configure Autodoc with a LangChain LLM via the Autodoc config file. This would allow for complete control over how an LLM is configured.
It seems like a lot of people are using llama.cpp to run Llama locally. It may be worth using this as a starting point to support other models.
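
As a rough sketch of what a configurable LLM entry might look like, the snippet below validates a provider block read from the config file. Everything here is hypothetical: the `LLMConfig` shape, the field names, and the provider identifiers are illustrative assumptions, not part of Autodoc or LangChain.

```typescript
// Hypothetical config shape; none of these field names exist in Autodoc today.
type LLMProvider = "openai" | "llama.cpp";

interface LLMConfig {
  provider: LLMProvider;
  model: string; // model name or local model file
  maxTokens: number; // context window to budget prompts against
  baseUrl?: string; // where a locally hosted model server listens
}

const PROVIDERS: readonly string[] = ["openai", "llama.cpp"];

function isProvider(value: unknown): value is LLMProvider {
  return typeof value === "string" && PROVIDERS.indexOf(value) !== -1;
}

// Validate an untyped object (e.g. parsed JSON) before using it as a config.
function parseLLMConfig(raw: Record<string, unknown>): LLMConfig {
  const provider = raw["provider"];
  if (!isProvider(provider)) {
    throw new Error(`Unsupported provider: ${String(provider)}`);
  }
  const model = raw["model"];
  if (typeof model !== "string" || model.length === 0) {
    throw new Error("`model` must be a non-empty string");
  }
  const maxTokens = raw["maxTokens"];
  if (typeof maxTokens !== "number" || maxTokens <= 0) {
    throw new Error("`maxTokens` must be a positive number");
  }
  const baseUrl = raw["baseUrl"];
  if (baseUrl !== undefined && typeof baseUrl !== "string") {
    throw new Error("`baseUrl` must be a string when present");
  }
  // A local provider is useless without an address to reach it at.
  if (provider === "llama.cpp" && baseUrl === undefined) {
    throw new Error("`baseUrl` is required for local providers");
  }
  return baseUrl === undefined
    ? { provider, model, maxTokens }
    : { provider, model, maxTokens, baseUrl };
}
```

Validating up front keeps provider-specific failures out of the indexing loop, where they would otherwise surface once per file.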

This issue is high priority. If you're interested in working on it, please reach out.

ReferenceError: Headers is not defined

Tried running autodoc on a fresh repository and got this error at indexing time.

Steps to reproduce:

Run doc init to create an autodoc.config.json file (contents pasted below).
Run doc index and answer yes at the prompt.

Stacktrace:

ReferenceError: Headers is not defined
    at createRequest (file:///home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:234:21)
    at fetchAdapter (file:///home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:164:21)
    at dispatchRequest (/home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/dispatchRequest.js:58:10)
    at Axios.request (/home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/Axios.js:108:15)
    at Function.wrap [as request] (/home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/helpers/bind.js:9:15)
    at /home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/common.js:149:22
    at /home/diwank/.fnm/node-versions/v17.9.1/installation/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/api.js:1738:133
    at runNextTicks (node:internal/process/task_queues:61:5)
    at listOnTimeout (node:internal/timers:528:9)
    at processTimers (node:internal/timers:502:7)
Failed to get summary for file turbo.py

autodoc.config.json:

{
  "name": "turbo-chat",
  "repositoryUrl": "https://github.com/creatorrr/turbo-chat",
  "root": ".",
  "output": "./.autodoc",
  "llms": [
    "gpt-3.5-turbo",
    "gpt-4"
  ],
  "ignore": [
    ".*",
    "*package-lock.json",
    "*package.json",
    "node_modules",
    "*dist*",
    "*build*",
    "*test*",
    "*.svg",
    "*.md",
    "*.mdx",
    "*.toml",
    "*autodoc*"
  ]
}
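
One plausible cause, offered as a guess rather than a confirmed diagnosis: the stack trace above comes from Node v17.9.1, and Node.js only exposes `fetch`, `Headers`, `Request`, and `Response` as globals by default starting with version 18. A small check along these lines could fail early with a clearer message. `hasGlobalFetch` is a hypothetical helper, not something Autodoc provides.

```typescript
// Returns true when the given Node.js version string (e.g. "v17.9.1")
// is new enough to expose fetch and Headers as globals by default.
function hasGlobalFetch(nodeVersion: string): boolean {
  const match = /^v?(\d+)\./.exec(nodeVersion);
  if (match === null) {
    throw new Error(`Unrecognized Node.js version: ${nodeVersion}`);
  }
  // Only the major version matters for this check.
  return Number(match[1]) >= 18;
}
```

In a CLI entry point this could be called as `hasGlobalFetch(process.version)` before any request is made.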

Azure OpenAI Support

Hey there.

I've noticed the langchain version being used is a bit old and doesn't support the properties needed for Azure OpenAI. Will there be an update in the near future to add this support?

Database can be shared?

Maybe a stupid question. All content generated by autodoc, and everything autodoc requires for querying, is stored under the .autodoc folder, right? There is no other hidden local cache, right? As long as my server pushes this folder to my remote repo, can I share the indexed database so that other developers don't have to index again manually?

Thanks!

folder summary.json url seems to include the `.autodoc/docs/json` in the path

This should be an easy issue to solve later, but for some reason FolderSummary carries this path when it is saved. It just needs to be removed with a substring or regex when the folder is processed.

Easy beginner issue
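
A minimal sketch of the regex approach, assuming the unwanted `.autodoc/docs/json` segment can appear either at the start of a relative path or in the middle of a URL. `stripJsonRoot` is a hypothetical name, and the exact shape of the stored value is an assumption.

```typescript
// Remove the ".autodoc/docs/json" segment from a path or URL, matching it
// only on path boundaries so that look-alike names are left untouched.
function stripJsonRoot(path: string): string {
  return path.replace(
    /(^|\/)\.autodoc\/docs\/json(\/|$)/,
    (_match, before: string, after: string) =>
      // Keep a single separator only when the segment sat between two parts.
      before !== "" && after !== "" ? "/" : ""
  );
}
```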

Could it be used to index documents like PDFs and talk to them?

Leverage LangChain Git Repo Loader

Given the dependency on LangChain, it would be great to see this repo use the latest document loader for GithubRepoLoader.

Add support for querying multiple autodoc indexes at the same time.

Currently Autodoc can only query a package in which it has been directly installed. We would like to support querying dependencies and peer packages that have been distributed with an Autodoc index.

Dependency packages are fairly easy. For example, in my autodoc.config.json, I could specify that I want Autodoc to look into node_modules for packages that have an .autodoc folder and include them when querying. This allows for composability of documentation across the dependency graph of any given project.

It's slightly tougher for peer packages. You would need some way to define peers and then have Autodoc pull in their .autodoc indexes from somewhere external.
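
The dependency case could be sketched roughly as follows. This is an illustration under assumptions: `findDependencyIndexes` is a hypothetical name, the package names are taken as input, and the directory check is injected so the sketch does not commit to a particular file system API.

```typescript
// Return the .autodoc folders of the listed dependencies that ship an index.
// `hasDirectory` is supplied by the caller (e.g. a thin wrapper over fs).
function findDependencyIndexes(
  packageNames: string[],
  hasDirectory: (path: string) => boolean
): string[] {
  return packageNames
    .map((name) => `node_modules/${name}/.autodoc`)
    .filter((dir) => hasDirectory(dir));
}
```

The resulting list could then be loaded alongside the project's own `.autodoc` folder at query time.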

OpenAI Base url?

Is it possible to set my own OpenAI base URL?

Plan to support Azure API?

autodoc-ker: Repo for us to share OSS [Autodoc](https://github.com/context-labs/autodoc) indexes.

I threw something together to help share indexes with the team (for OSS repos that we don't maintain). I was wondering if you'd be interested in letting me park this in your org?

https://github.com/dahifi/autodoc-ker

Improve querying experience

Right now the CLI querying experience is functional, but the UX is bad. Below are a few of my ideas for improvements. If you have your own, please share them!

Currently, output is streamed token by token as plaintext. When the response is complete, we output the entire response as markdown using marked-terminal. It would be nice if we could figure out how to stream the response token by token as markdown. I'm not sure if this is possible using marked-terminal, so further investigation is required.
When querying, the cursor sometimes flickers. I'm not sure what is causing this, but it needs to be investigated and fixed.
Query responses currently wrap in the middle of a word. This makes them hard to read. We should fix this.
It would be really cool to have a k9s-style querying console for Autodoc. This would be a big undertaking that would 10x the UX IMHO.
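
For the mid-word wrapping item, one simple approach is to wrap on whitespace before printing. The sketch below is a hypothetical helper, not Autodoc's code. It handles a single paragraph, collapses runs of whitespace, and leaves words longer than the width on their own line rather than splitting them.

```typescript
// Greedy word wrap: break lines only at whitespace, never inside a word.
function wrapWords(text: string, width: number): string {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    if (current.length > 0 && current.length + 1 + word.length > width) {
      // The next word would overflow, so start a new line with it.
      lines.push(current);
      current = word;
    } else {
      current = current.length > 0 ? `${current} ${word}` : word;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines.join("\n");
}
```

In a terminal, the width could come from `process.stdout.columns`, with a fallback for when output is piped.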

Error: this.callbackManager.handleLLMError is not a function

After successfully indexing my repo, I keep getting the error message
"this.callbackManager.handleLLMError is not a function"

What can I do?

When Indexing, [TOO MANY REQUESTS] Keeps being thrown

Indexing as usual, after estimation, it runs for a bit then keeps throwing this error:

Failed to get summary for file github.py
⠹ Processing 724 files...Error: Request failed with status code 429
    at createError (file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:302:19)
    at settle (file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:24:16)
    at file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:185:19
    at new Promise (<anonymous>)
    at fetchAdapter (file:///home/bewinxed/.nvm/versions/node/v18.15.0/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:177:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [AsyncFunction: fetchAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],
    headers: {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',
      'User-Agent': 'OpenAI/NodeJS/3.2.1',
      Authorization: 'Bearer sk-<redacted>'
    },
    method: 'post',
    data: '{"model":"gpt-3.5-turbo","temperature":0.1,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"\\n    You are acting as a code documentation expert for a project called RADAR.\\n    Below is the code from a file located at `RADAR`. \\n    Write a detailed technical explanation of what this code does. \\n    Focus on the high-level purpose of the code and how it may be used in the larger project.\\n    Include code examples where appropriate. Keep you response between 100 and 300 words. \\n    DO NOT RETURN MORE THAN 300 WORDS.\\n    Output should be in markdown format. \\n    Do not say \\"this file is a part of the RADAR project\\".\\n    Do not just list the methods and classes in this file.\\n\\n    Code:\\n    import json\\nfrom fastapi.encoders import jsonable_encoder\\nfrom typing import Optional\\nimport aiohttp\\nfrom fastapi import APIRouter, Query, Response\\n\\nfrom utils.lunaris import Lunaris\\n\\nAPI_KEY = \\"819a8443-a2fb-433f-83cd-7a47257bd548\\"\\n\\nrouter = APIRouter()\\n\\n\\[email protected](\\"/collection/find\\")\\nasync def find_collection_post(\\n    helloMoonCollectionId: str = None,\\n    collectionName: str = None,\\n):\\n    if all([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You can only provide one of helloMoonCollectionId or collectionName\\"\\n        )\\n    if not any([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You must provide either helloMoonCollectionId or collectionName\\"\\n        )\\n    return await Lunaris().find_collection(\\n        helloMoonCollectionId=helloMoonCollectionId, collectionName=collectionName\\n    )\\n\\n\\n    Response:\\n\\n  "}]}',
    url: 'https://api.openai.com/v1/chat/completions'
  },
  request: Request {
    [Symbol(realm)]: { settingsObject: [Object] },
    [Symbol(state)]: {
      method: 'POST',
      localURLsOnly: false,
      unsafeRequest: false,
      body: [Object],
      client: [Object],
      reservedClient: null,
      replacesClientId: '',
      window: 'client',
      keepalive: false,
      serviceWorkers: 'all',
      initiator: '',
      destination: '',
      priority: null,
      origin: 'client',
      policyContainer: 'client',
      referrer: 'client',
      referrerPolicy: '',
      mode: 'cors',
      useCORSPreflightFlag: false,
      credentials: 'same-origin',
      useCredentials: false,
      cache: 'default',
      redirect: 'follow',
      integrity: '',
      cryptoGraphicsNonceMetadata: '',
      parserMetadata: '',
      reloadNavigation: false,
      historyNavigation: false,
      userActivation: false,
      taintedOrigin: false,
      redirectCount: 0,
      responseTainting: 'basic',
      preventNoCacheCacheControlHeaderModification: false,
      done: false,
      timingAllowFailed: false,
      headersList: [HeadersList],
      urlList: [Array],
      url: [URL]
    },
    [Symbol(signal)]: AbortSignal { aborted: false },
    [Symbol(headers)]: HeadersList {
      cookies: null,
      [Symbol(headers map)]: [Map],
      [Symbol(headers map sorted)]: null
    }
  },
  response: {
    ok: false,
    status: 429,
    statusText: 'Too Many Requests',
    headers: HeadersList {
      cookies: null,
      [Symbol(headers map)]: [Map],
      [Symbol(headers map sorted)]: null
    },
    config: {
      transitional: [Object],
      adapter: [AsyncFunction: fetchAdapter],
      transformRequest: [Array],
      transformResponse: [Array],
      timeout: 0,
      xsrfCookieName: 'XSRF-TOKEN',
      xsrfHeaderName: 'X-XSRF-TOKEN',
      maxContentLength: -1,
      maxBodyLength: -1,
      validateStatus: [Function: validateStatus],
      headers: [Object],
      method: 'post',
      data: '{"model":"gpt-3.5-turbo","temperature":0.1,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"\\n    You are acting as a code documentation expert for a project called RADAR.\\n    Below is the code from a file located at `RADAR`. \\n    Write a detailed technical explanation of what this code does. \\n    Focus on the high-level purpose of the code and how it may be used in the larger project.\\n    Include code examples where appropriate. Keep you response between 100 and 300 words. \\n    DO NOT RETURN MORE THAN 300 WORDS.\\n    Output should be in markdown format. \\n    Do not say \\"this file is a part of the RADAR project\\".\\n    Do not just list the methods and classes in this file.\\n\\n    Code:\\n    import json\\nfrom fastapi.encoders import jsonable_encoder\\nfrom typing import Optional\\nimport aiohttp\\nfrom fastapi import APIRouter, Query, Response\\n\\nfrom utils.lunaris import Lunaris\\n\\nAPI_KEY = \\"819a8443-a2fb-433f-83cd-7a47257bd548\\"\\n\\nrouter = APIRouter()\\n\\n\\[email protected](\\"/collection/find\\")\\nasync def find_collection_post(\\n    helloMoonCollectionId: str = None,\\n    collectionName: str = None,\\n):\\n    if all([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You can only provide one of helloMoonCollectionId or collectionName\\"\\n        )\\n    if not any([helloMoonCollectionId, collectionName]):\\n        raise Exception(\\n            \\"You must provide either helloMoonCollectionId or collectionName\\"\\n        )\\n    return await Lunaris().find_collection(\\n        helloMoonCollectionId=helloMoonCollectionId, collectionName=collectionName\\n    )\\n\\n\\n    Response:\\n\\n  "}]}',
      url: 'https://api.openai.com/v1/chat/completions'
    },
    request: Request {
      [Symbol(realm)]: [Object],
      [Symbol(state)]: [Object],
      [Symbol(signal)]: [AbortSignal],
      [Symbol(headers)]: [HeadersList]
    },
    data: { error: [Object] }
  },
  isAxiosError: true,
  toJSO
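
The log above is cut off, but it shows `status: 429` on the response, which is the usual signal to slow down and retry. A common mitigation is exponential backoff. The sketch below is a generic illustration, not how Autodoc or LangChain handle rate limits. `withRetry`, `backoffDelayMs`, and `isRateLimited` are hypothetical names, and the `err.response.status` shape is taken from the error object printed above.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// True when the error looks like the HTTP 429 response shown in the log.
function isRateLimited(err: unknown): boolean {
  const status = (err as { response?: { status?: number } } | null)?.response
    ?.status;
  return status === 429;
}

// Retry `call` while it fails with a rate-limit error, sleeping in between.
// `sleep` is injected by the caller so the sketch stays runtime-agnostic.
async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxRetries = 5
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on other errors, or once the retry budget is spent.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Limiting how many files are summarized concurrently would reduce the number of 429 responses in the first place; backoff only handles the ones that still occur.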

ReferenceError: Headers is not defined

When running doc index, I get an error like:

Failed to get summary for file index.js.map
ReferenceError: Headers is not defined

Failed to find `autodoc.config.json` file. (first time running doc index)

Running doc index for the first time gave:

Failed to find autodoc.config.json file. Are you in the right directory?
[Error: ENOENT: no such file or directory, open './autodoc.config.json'] {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: './autodoc.config.json'
}

[Error: ENOENT: no such file or directory] while indexing

I followed the steps to index my directory, which contains subfolders that also have git repos in them, but they are listed in the .gitignore file.

While running index, after 5 seconds I get this:

Failed to find `autodoc.config.json` file. Did you run `doc init`?
[Error: ENOENT: no such file or directory, stat 'niftypay/mypython/bin/python'] {
  errno: -2,
  code: 'ENOENT',
  syscall: 'stat',
  path: 'niftypay/mypython/bin/python'
}

Not sure what's wrong. niftypay is a subdirectory and it's in .gitignore.

Recent commit breaks the build

@sts07142 the 2138bbcdc3071e9e2413a125d107332723a3890a change breaks the build:

$ npm install && npm run build                         

up to date, audited 419 packages in 632ms

133 packages are looking for funding
  run `npm fund` for details

9 vulnerabilities (1 low, 5 moderate, 3 high)

To address issues that do not require attention, run:
  npm audit fix

To address all issues (including breaking changes), run:
  npm audit fix --force

Run `npm audit` for details.

> @context-labs/[email protected] build
> tsc

src/cli/commands/index/processRepository.ts:124:41 - error TS2345: Argument of type 'LLMModels' is not assignable to parameter of type 'TiktokenModel'.

124     const encoding = encoding_for_model(model.name);
                                            ~~~~~~~~~~

src/cli/commands/index/selectModel.ts:78:41 - error TS2345: Argument of type 'LLMModels' is not assignable to parameter of type 'TiktokenModel'.

78     const encoding = encoding_for_model(model);
                                           ~~~~~


Found 2 errors in 2 files.

Errors  Files
     1  src/cli/commands/index/processRepository.ts:124
     1  src/cli/commands/index/selectModel.ts:78

Ingesting/Parsing Multiple Libraries

It would be great to have Autodoc ingest/parse two or more repositories/libraries, then respond in a way that combines docs in a single chat experience.

For example:

Failed to get Summary, Headers is not defined

Just ran this on a directory with the .gitignore folders excluded:

I get this on each file traversed:

ReferenceError: Headers is not defined
    at createRequest (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:234:21)
    at fetchAdapter (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:164:21)
    at dispatchRequest (/home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/dispatchRequest.js:58:10)
    at Axios.request (/home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/core/Axios.js:108:15)
    at Function.wrap [as request] (/home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/axios/lib/helpers/bind.js:9:15)
    at /home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/common.js:149:22
    at /home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/node_modules/openai/dist/api.js:1738:133

at the end of it all:

Failed to get summary for file whitelists.py
✔ Processing 724 files...
⠋ Processing 168 folders... The provided folder path does not exist.
✔ Processing 168 folders... 
✔ Processing repository...
⠋ Creating markdown files...The provided folder path does not exist.
The provided folder path does not exist.
✔ Created 0 mardown files...
⠋ Create vector files...Error: ENOENT: no such file or directory, scandir '.autodoc/docs/markdown/'
    at Module.readdirSync (node:fs:1438:3)
    at processDirectory (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:29:20)
    at RepoLoader.load (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:57:22)
    at createVectorStore (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:62:34)
    at index (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/index.js:39:11) {
  errno: -2,
  syscall: 'scandir',
  code: 'ENOENT',
  path: '.autodoc/docs/markdown/'
}
Error: Could not read directory: .autodoc/docs/markdown/. Did you run `sh download.sh`?
    at processDirectory (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:33:15)
    at RepoLoader.load (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:57:22)
    at createVectorStore (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:62:34)
    at index (file:///home/bewinxed/.nvm/versions/node/v16.19.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/index.js:39:11)
✖ Create vector files...

`checksum` is not being computed

This PR helped implement indexing of only modified files. However, the checksum is not being generated in the JSON files.

Dialog with Autodoc using SshNet

Hello all,
I'm trying to build a web app to communicate with my server, which is running Autodoc.
On the server side I have cloned, initialized, and indexed some repos.

I wish to build a portal to have a dialog with my server, using SshNet.

I have many issues, and I can't see why I'm failing.

Any help?

This is the code I have been trying:

        // Assumes these usings at the top of the file: System, System.IO, System.Text,
        // System.Threading, Renci.SshNet, Renci.SshNet.Common.
        public void NavigateToFolder(string folder)
        {
            try
            {
                // Use a single shell stream so the `cd` still applies when `doc q` runs.
                ShellStream stream = _client.CreateShellStream("commands", 0, 0, 0, 0, 1024);
                sendCommand("cd autodoc-poc/<My-Repo>", stream);
                sendCommand("doc q", stream);
            }
            catch (SshException sshException)
            {
                // Report the failure instead of silently swallowing it.
                Console.Error.WriteLine(sshException.Message);
            }
        }

        public StringBuilder sendCommand(string customCMD, ShellStream stream)
        {
            var reader = new StreamReader(stream);
            var writer = new StreamWriter(stream) { AutoFlush = true };
            WriteStream(customCMD, writer, stream);
            return ReadStream(reader);
        }

        private void WriteStream(string cmd, StreamWriter writer, ShellStream stream)
        {
            writer.WriteLine(cmd);
            // Poll until the remote side has produced some output.
            while (stream.Length == 0)
            {
                Thread.Sleep(500);
            }
        }

        private StringBuilder ReadStream(StreamReader reader)
        {
            var result = new StringBuilder();

            // Collect every line that is currently available on the stream.
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                result.AppendLine(line);
            }
            return result;
        }

Create a `doc faq` script, that takes in a list of questions and generates an FAQ.md file (after the repo has been indexed)

A user has requested an easy way to generate an FAQ file from a list of questions. This could be implemented as a new CLI command, with the list of questions added to the config as an array or something similar.

"The idea is to process the repo with autodoc, the generated MD files have some auto generated questions about the code, we would love to be able to add questions from our community and get them answered if answer to that question can be generated from the code."

This is a great beginner issue.
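
As a starting point, the rendering half of such a command could look like the sketch below. It assumes each question has already been answered by querying the index. `FaqEntry` and `renderFaq` are hypothetical names, and the output layout is only one possible choice.

```typescript
// One answered question, ready to be written to FAQ.md.
interface FaqEntry {
  question: string;
  answer: string;
}

// Render the entries as a markdown document with one section per question.
function renderFaq(projectName: string, entries: FaqEntry[]): string {
  const sections = entries.map(
    ({ question, answer }) => `## ${question.trim()}\n\n${answer.trim()}\n`
  );
  return [`# ${projectName} FAQ\n`, ...sections].join("\n");
}
```

The remaining work would be reading the question list from the config and running each question through the existing query chain.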

Error during traversal: The text contains a special token that is not allowed

When I run doc index on the langchain repository, I receive the following error:

⠇ Processing 494 files...Error during traversal: The text contains a special token that is not allowed: <|endoftext|>
Failed to find `autodoc.config.json` file. Did you run `doc init`?
Error: The text contains a special token that is not allowed: <|endoftext|>
    at module.exports.__wbindgen_error_new (/usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:398:17)
    at wasm://wasm/00b63e2e:wasm-function[15]:0xebb8
    at wasm://wasm/00b63e2e:wasm-function[154]:0x48af5
    at Tiktoken.encode (/usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/node_modules/@dqbd/tiktoken/tiktoken_bg.cjs:257:18)
    at processFile (file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/processRepository.js:24:40)
    at async file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/utils/traverseFileSystem.js:42:21
    at async Promise.all (index 2)
    at async dfs (file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/utils/traverseFileSystem.js:38:13)
    at async file:///usr/local/Cellar/node/19.8.1/lib/node_modules/@context-labs/autodoc/dist/cli/utils/traverseFileSystem.js:25:21
    at async Promise.all (index 0)

I believe this is an issue with autodoc, rather than the langchain repository, as I have followed the instructions in the README file and run doc init in the langchain repository before running doc index.

Here is some information about my environment:

Operating system: macOS Monterey 12.6.3 (21G419)
Node.js version: v19.8.1

Please let me know if there is any additional information I can provide or steps I can take to resolve this issue.
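
One possible workaround, sketched under assumptions, is to neutralize the special token in file contents before they are passed to the tokenizer. `neutralizeSpecialTokens` is a hypothetical helper, and the default token list contains only the token named in the error above, so it may be incomplete. Note that this slightly alters the text that gets summarized.

```typescript
// Break up special-token strings so a tokenizer treats them as plain text.
function neutralizeSpecialTokens(
  text: string,
  tokens: string[] = ["<|endoftext|>"]
): string {
  let result = text;
  for (const token of tokens) {
    // Insert a space after "<" so the literal token no longer appears.
    result = result.split(token).join(token.replace("<|", "< |"));
  }
  return result;
}
```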

Thinking...Something went wrong: this.callbackManager.handleLLMError is not a function

After executing doc q, I received the error message "Thinking...Something went wrong: this.callbackManager.handleLLMError is not a function". It was working fine before.

404s back from OpenAI

Any idea why I'd be seeing these 404s?

Failed to get summary for file TokenLib.sol
⠹ Processing 26 files...Error: Request failed with status code 404
    at createError (file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:302:19)
    at settle (file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:24:16)
    at file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:185:19
    at new Promise (<anonymous>)
    at fetchAdapter (file:///home/dom/src/_AC/autodoc/node_modules/langchain/dist/util/axios-fetch-adapter.js:177:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [AsyncFunction: fetchAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],
    headers: {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',
      'User-Agent': 'OpenAI/NodeJS/3.2.1',
      Authorization: 'Bearer sk-<redacted>'
    },
    method: 'post',
    data: '{"model":"gpt-4","temperature":0.1,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"\\n    You are acting as a code documentation expert for a project called stm_sol.\\n    Below is the code from a file located at `stm_sol`. \\n    Write a detailed technical explanation of what this code does. \\n      Focus on the high-level purpose of the code and how it may be used in the larger project.\\n      Include code examples where appropriate. Keep you response between 100 and 300 words. \\n      DO NOT RETURN MORE THAN 300 WORDS.\\n      Output should be in markdown format.\\n      Do not just list the methods and classes in this file.\\n    Do not say \\"this file is a part of the stm_sol project\\".\\n\\n    code:\\n    // SPDX-License-Identifier: AGPL-3.0-only - (c) AirCarbon Pte Ltd - see /LICENSE.md for Terms\\n// Author: https://github.com/7-of-9\\n// Certik (AD): locked compiler version\\npragma solidity 0.8.5;\\n\\nimport \\"../Interfaces/StructLib.sol\\";\\n\\nimport \\"../StMaster/StMaster.sol\\";\\n\\nlibrary TransferLib {\\n    event TransferedFullSecToken(address indexed from, address indexed to, uint256 indexed stId, uint256 mergedToSecTokenId, uint256 qty, StructLib.TransferType transferType);\\n    event

Error: Could not read directory: .autodoc/docs/markdown/. Did you run `sh download.sh`?

I followed the instructions as they're laid out. Everything should be set up correctly.

I keep getting "Failed to get summary for file" while indexing files, and at the end I get:

Error: Could not read directory: .autodoc/docs/markdown/. Did you run `sh download.sh`?
    at processDirectory (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:33:15)
    at RepoLoader.load (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:57:22)
    at createVectorStore (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/createVectorStore.js:62:34)
    at index (file:///opt/homebrew/lib/node_modules/@context-labs/autodoc/dist/cli/commands/index/index.js:39:11)
✖ Create vector files...

Add support for GPT4ALL models

https://github.com/nomic-ai/gpt4all

Use local working directory instead of a GitHub URL

Hi,

It would be great to be able to specify the local git repo in the current working directory instead of having to use a GitHub repository.

Thanks!

Error: TypeError: Cannot read properties of undefined (reading 'length')

I did a fresh install of the project with yarn and cloned a repo we had with the indices precomputed. Then I imported the env var and ran doc q.

Error:

TypeError: Cannot read properties of undefined (reading 'length')
    at makeQAPrompt (file:///Users/alfongj/.config/yarn/global/node_modules/@context-labs/autodoc/dist/cli/commands/query/createChatChain.js:26:14)
    at makeChain (file:///Users/alfongj/.config/yarn/global/node_modules/@context-labs/autodoc/dist/cli/commands/query/createChatChain.js:47:23)
    at query (file:///Users/alfongj/.config/yarn/global/node_modules/@context-labs/autodoc/dist/cli/commands/query/index.js:25:19)

Allow for configuration of indexing strategy.

Autodoc currently only supports indexing a file using the most affordable models available in a project's autodoc.config.json. Ideally, we should allow for different types of indexing strategies. At the very least, there should be an option to use the "best available" models, which would tell Autodoc to always choose the most powerful models that have been configured.
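
A strategy switch of this kind might be sketched as below. The names `IndexingStrategy`, `ModelInfo`, and `selectModel` are hypothetical, as are the `quality` ranking and the cost field. Autodoc's real model metadata may look different.

```typescript
type IndexingStrategy = "most-affordable" | "best-available";

// Hypothetical per-model metadata used to choose between configured models.
interface ModelInfo {
  name: string;
  maxLength: number; // context window in tokens
  costPer1KTokens: number; // illustrative unit cost
  quality: number; // made-up ranking, higher is better
}

// Pick a model whose context window fits the prompt, ordered by strategy.
function selectModel(
  models: ModelInfo[],
  promptTokens: number,
  strategy: IndexingStrategy
): ModelInfo | null {
  const fits = models.filter((m) => m.maxLength >= promptTokens);
  // Copy before sorting so the caller's array is left untouched.
  const ranked = fits.slice().sort((a, b) =>
    strategy === "most-affordable"
      ? a.costPer1KTokens - b.costPer1KTokens
      : b.quality - a.quality
  );
  return ranked[0] ?? null;
}
```

With "most-affordable", a large file still falls through to a bigger model when it does not fit the cheaper one, and `null` signals that no configured model can handle the prompt.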

This is an evolving issue. Please reach out on Discord if you're interested in contributing.

Related #8.