
Website for the Weaviate vector database

Home Page: https://weaviate.io

JavaScript 10.72% Shell 0.48% SCSS 13.61% CSS 0.03% Python 14.11% TypeScript 7.41% MDX 51.71% Go 0.75% Java 1.12% Jupyter Notebook 0.06%
vector-database vector-search vector-search-engine weaviate website generative-search hybrid-search

weaviate-io's Introduction

How to Build this Website

Weaviate uses Docusaurus 2 to build our documentation. Docusaurus is a static website generator that runs under Node.js. We use a Node.js project management tool called yarn to install Docusaurus and to manage project dependencies.

If you do not have Node.js and yarn installed on your system, install them first.

Node.js Installation

Use nvm, the Node Version Manager, to install Node.js. The nvm project page provides an installation script.

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash

After you install nvm, use nvm to install Node.js.

nvm install node

By default, nvm installs the most recent version of Node.js. Install Node.js 19.9.0 as well. Version 19.9.0 is more compatible with the current weaviate.io project dependencies.

nvm install 19.9.0
nvm use 19.9.0

yarn Installation

Node.js includes the npm package manager. Use npm to install yarn.

npm install --global yarn

Get the Code

To contribute to this web site, first fork this repository and create a local copy to work on.

  1. Log in to your GitHub account.

  2. Fork the upstream repository, https://github.com/weaviate/weaviate-io.

  3. Clone the repository to your local system.

    git clone git@github.com:YOUR-GITHUB-HANDLE/weaviate-io.git
    

    For details on cloning a repository, including setting up an SSH key, see the GitHub documentation.

  4. Add the upstream repository as a remote.

    git remote add upstream https://github.com/weaviate/weaviate-io.git
    
  5. Check the remotes.

    git remote -v
    
    # The output resembles:
    
    origin	https://github.com/YOUR-GITHUB-HANDLE/weaviate-io.git (fetch)
    origin	https://github.com/YOUR-GITHUB-HANDLE/weaviate-io.git (push)
    upstream	https://github.com/weaviate/weaviate-io.git (fetch)
    upstream	https://github.com/weaviate/weaviate-io.git (push)
    
  6. Configure a tracking branch.

    This step lets you track upstream changes while you work on your update. When you are ready to contribute your changes, create a pull request against the upstream/main branch.

    Get the upstream branches.

    git fetch upstream
    

    Add upstream/main as a tracking branch when you create a new project branch. You can use git checkout to set the tracking branch, or choose an alternative method that fits your workflow.

    git checkout -b your-update-branch-name upstream/main
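
The remote check in step 5 can also be scripted. The following is a minimal sketch, assuming output in the shape shown in step 5; `parse_remotes` is a hypothetical helper written for this illustration, not part of git or of this repository.

```python
def parse_remotes(remote_output):
    """Parse `git remote -v` output into {name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and anything that is not "name url (kind)"
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


# Sample text copied from step 5; in practice, capture the real command output.
sample_output = """\
origin\thttps://github.com/YOUR-GITHUB-HANDLE/weaviate-io.git (fetch)
origin\thttps://github.com/YOUR-GITHUB-HANDLE/weaviate-io.git (push)
upstream\thttps://github.com/weaviate/weaviate-io.git (fetch)
upstream\thttps://github.com/weaviate/weaviate-io.git (push)
"""

remotes = parse_remotes(sample_output)
if "upstream" not in remotes:
    print("No upstream remote configured; repeat step 4.")
else:
    print("upstream fetch URL:", remotes["upstream"]["fetch"])
```

If `upstream` is missing from the result, repeat step 4 before you fetch.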
    

Update Dependencies

Once you have a local copy of the repository, you need to install Docusaurus and the other project dependencies.

Switch to the project directory, then use yarn to update the dependencies.

cd weaviate-io
yarn install

You may see some warnings during the installation.

Start the yarn Server

When the installation completes, start the yarn server to test your build.

yarn start &

yarn builds the project as a static web site and starts a server to host it. yarn also opens a browser window connected to http://localhost:3000/ where you can see your changes.

Most changes are reflected live without having to restart the server.

If you run yarn start in the foreground (without the "&"), you have to open a second terminal to continue working on the command line. When you open a second terminal, be sure to set the correct Node.js version before running additional yarn commands.

nvm use 19.9.0

Build the Web Site

This command generates static content into the build directory. You can use a hosting service to serve the static content.

yarn build

The build command is useful when you are finished editing. If you ran yarn start to start a local web server, you do not need to use yarn build to see your changes while you are editing.

The build command runs a link checker. If you are having trouble with temporarily broken links, you can update the URL_IGNORES variable to disable checking for that link.

To disable link checking, add the broken URL to the URL_IGNORES lists in these scripts:

Check the link again before you submit a pull request. If the link works, remove it from the URL_IGNORES list. If the link doesn't work, tell us about it in the pull request.
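
The idea behind an ignore list can be sketched as follows. This is an illustration only: the real link checker scripts are not reproduced here, the example URLs are placeholders, `links_to_check` is a hypothetical helper, and exact-match comparison is an assumption about how entries are matched.

```python
# Hypothetical ignore list; the real URL_IGNORES entries live in the build scripts.
URL_IGNORES = [
    "https://example.com/temporarily-down",
]


def links_to_check(links, ignores=URL_IGNORES):
    """Return the links that are not on the ignore list (exact match assumed)."""
    return [link for link in links if link not in ignores]


found_links = [
    "https://weaviate.io",
    "https://example.com/temporarily-down",
]
remaining = links_to_check(found_links)
print(remaining)
```

Only the links that remain after filtering would be handed to the checker, which is why a stale entry in the list can hide a link that is permanently broken.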

Deployment

Using SSH:

USE_SSH=true yarn deploy

Not using SSH:

GIT_USER=<Your GitHub username> yarn deploy

If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the gh-pages branch.

Documentation

Code examples

Code examples in the documentation are in one of two formats:

New format

In many files, you will see a format similar to:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import FilteredTextBlock from '@site/src/components/Documentation/FilteredTextBlock';
import PyCode from '!!raw-loader!/_includes/code/howto/manage-data.create.py';
import TSCode from '!!raw-loader!/_includes/code/howto/manage-data.create.ts';

<Tabs groupId="languages">
  <TabItem value="py" label="Python">
    <FilteredTextBlock
      text={PyCode}
      startMarker="# ValidateObject START"
      endMarker="# ValidateObject END"
      language="py"
    />
  </TabItem>

  <TabItem value="js" label="JavaScript/TypeScript">
    <FilteredTextBlock
      text={TSCode}
      startMarker="// ValidateObject START"
      endMarker="// ValidateObject END"
      language="ts"
    />
  </TabItem>
</Tabs>

This makes use of our custom FilteredTextBlock JSX component.

Here, the FilteredTextBlock component loads lines between the startMarker and endMarker from the imported scripts. This allows us to write complete scripts, which may include tests to reduce occurrences of erroneous code examples.

For more information about tests, please see README-tests.md.
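
To make the marker mechanism concrete, here is a minimal Python sketch of the same idea. It is not the component's implementation (the component itself is JSX); `filter_text_block` is a hypothetical name, and the real component may handle details such as indentation or repeated markers differently.

```python
def filter_text_block(text, start_marker, end_marker):
    """Return the lines that sit between start_marker and end_marker."""
    kept = []
    inside = False
    for line in text.splitlines():
        if end_marker in line:
            inside = False  # stop before the end marker line itself
        if inside:
            kept.append(line)
        if start_marker in line:
            inside = True  # start after the start marker line itself
    return "\n".join(kept)


# A stand-in for an imported script: setup and test lines surround the snippet.
script = """\
print("setup that the docs do not show")

# ValidateObject START
print("shown in the docs")
# ValidateObject END

assert True  # test line that stays out of the docs
"""

snippet = filter_text_block(script, "# ValidateObject START", "# ValidateObject END")
print(snippet)
```

Only the line between the two markers ends up in the rendered example, while the surrounding setup and test lines keep the script runnable on its own.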

Legacy format

In some code examples, the code will be written directly inside the TabItem component, as shown below.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs groupId="languages">
<TabItem value="py" label="Python">

    ```python
    import weaviate

    client = weaviate.Client("http://localhost:8080")
    ```

</TabItem>
<TabItem value="js" label="JavaScript/TypeScript">

    ```js
    import weaviate from 'weaviate-ts-client';

    const client = weaviate.client({
      scheme: 'http',
      host: 'localhost:8080',
    });
    ```

</TabItem>

... and any other tabs

</Tabs>

Your IDE will not pick up any errors in these examples, so please make sure to test the code in your preferred environment before editing or adding them here.

weaviate-io's People

Contributors

antas-marcin, ayushpattnaik, bobvanluijt, byronvoorbach, cshorten, dandv, databyjp, daveatweaviate, dirkkul, dudanogueira, erika-cardenas, etiennedi, glockenbeat, hsm207, iamleonie, ibilalkayy, itsajchan, malgamves, marionnehring, parkerduckworth, roschler, sebawita, shan-weaviate, svitlana-sm, thomashacker, trengrj, tsmith023, victorialslocum, weroiko, zainhas


weaviate-io's Issues

Automated image creation for OG images

We currently have OG images for the documentation. These are now created manually but it would be great to have them automated.

Conceptually it's simple:

  1. OG location is set to something like: https://API-URL/?={{ page.og }} where page.og is set on individual pages in the docs.
  2. The Cloud Functions endpoint renders a PNG or JPG like the ones Donna currently creates manually.

Improve Documentation on Filters page

This page: https://weaviate.io/developers/weaviate/current/graphql-references/filters.html

Can be improved:

  • There should be an explanation about different behavior of string and text fields on this page.
  • There should be an explanation of word-boundary token splitting and the effects it has. Also with a link to weaviate/weaviate#1821 which will give the user full control over this process
  • The page calls everything a filter, even things that aren't filters. This includes nearObject, nearVector, etc., but also limit and similar arguments. Those need to be called parameters or operators to reduce confusion, since they do not filter anything. Especially in the case of a near<> operator, the result set is exactly the same as the unfiltered one, just in a different order.

Installation page docker-compose.yml URL points to inappropriate version format

Code of Conduct

What part of document/web-page on weaviate.io is affected?

Referring to:
https://weaviate.io/developers/weaviate/current/getting-started/installation.html
https://github.com/semi-technologies/weaviate-io/blob/main/developers/weaviate/current/getting-started/installation.md


The curl link seems to point to Weaviate v15.0.0 (the Helm version) rather than v1.15.0 (which I think is the intended behavior). As a result, the curl command below results in an error.
curl -o docker-compose.yml "https://configuration.semi.technology/v2/docker-compose/docker-compose.yml?enterprise_usage_collector=false&modules=standalone&runtime=docker-compose&weaviate_version=v15.0.0"

Propose to fix this by:

Changing the weaviate_version=v{{ site.weaviate_versions[current_page_version].helm_version }} reference to weaviate_version={{ current_page_version }}.
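
As an illustration of what the corrected link should resolve to, the sketch below rebuilds the download URL with the query parameters taken from the curl command above. `compose_download_url` is a hypothetical helper for this example; the site itself generates the link from a template.

```python
from urllib.parse import urlencode

CONFIGURATOR_URL = "https://configuration.semi.technology/v2/docker-compose/docker-compose.yml"


def compose_download_url(weaviate_version):
    """Build the docker-compose.yml download URL for a given Weaviate version."""
    params = {
        "enterprise_usage_collector": "false",
        "modules": "standalone",
        "runtime": "docker-compose",
        "weaviate_version": weaviate_version,
    }
    return f"{CONFIGURATOR_URL}?{urlencode(params)}"


# The Weaviate release version (v1.15.0), not the Helm chart version (v15.0.0).
url = compose_download_url("v1.15.0")
print(url)
```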

Additional comments?

No response

Better batch documentation

Code of Conduct

What part of document/web-page on weaviate.io is affected?

https://weaviate.io/developers/weaviate/current/restful-api-references/batch.html

a) Mention how references within a single batch are handled. Currently this raises an error (and @etiennedi said that it's not a bug).
(I've also created a feature request: weaviate/weaviate#1951)

b) Explain better how batching works.

c) Add a link to the Jupyter notebook with more examples and explanation: https://github.com/semi-technologies/weaviate-examples/blob/main/getting-started-with-python-client-colab/Getting_Started_With_Weaviate_Python_Client.ipynb
... however, even this notebook doesn't really tackle error handling in batches, and it contains lots of topics unrelated to batch processing, so adding more complete examples to the docs may be preferable.

I'd also strongly consider merging the page with https://weaviate.io/developers/weaviate/current/tutorials/how-to-import-data.html

... The two pages seem to discuss the same concepts. There is strong overlap, but each page adds some information to the topic.

A minimal-effort solution would be to link the two pages to each other.

Additional comments?

No response

Add contributor guidelines to readme

For example how to open issues (make a template?), how to write commit messages (start with gh-xx to refer to issue the commit is tackling), how and when to make PR, etc

Missing /objects in uri in example

While going through this tutorial, in the final part of the "explore graphql" section I believe there's a missing part of the uri in here:
$ curl -s http://localhost:8080/v1/{id}
This didn't work for me (told me the path was not found) so I tweaked it, adding the '/objects' in the middle like this:
$ curl -s http://localhost:8080/v1/objects/{id}
which did work for me. Perhaps this happens due to a misconfiguration during my installation, or it's just a silly docs issue 🙃.
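
The corrected path can be sketched as a small helper. The base URL comes from the commands above; `object_url` and the sample ID are hypothetical and only show where the `/objects` segment belongs.

```python
def object_url(object_id, base_url="http://localhost:8080"):
    """Build the URL for a single object, including the /objects path segment."""
    return f"{base_url}/v1/objects/{object_id}"


# Hypothetical ID for illustration; use an ID returned by your own instance.
example_url = object_url("00000000-0000-0000-0000-000000000000")
print(example_url)
```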

Suboptimal docs menu UX on mobile

Code of Conduct

What part of document/web-page on weaviate.io is affected?

The menu on mobile looks like this:

Screenshot 2022-07-07 at 22 09 25

If I now want to open (for example) "architecture", the page refreshes and I need to re-open the menu.

Additional comments?

No response

[Documentation Feedback]: Define the PersistentVolumeClass needed/recommended for Weaviate

Page URL

https://github.com/semi-technologies/weaviate-io/blob/main/developers/weaviate/current/configuration/backups-and-persistence.md#kubernetes

User feedback

We had to dig quite deep into the problems I encountered, only to find out that my k8s setup was using a PersistentVolumeClass that combined a hard drive with an SSD cache, which made performance vary (and might result in Weaviate timeouts).

You might want to note this here, since the only way to discover it (if you don't use the standard Helm chart) is by analysing the Helm chart itself.

Just add a note here:

https://github.com/semi-technologies/weaviate-io/blob/main/developers/weaviate/current/configuration/backups-and-persistence.md#kubernetes

that you need
spec:
  storageClassName: premium-rwo

(at least for GCP... you might want to check it for AWS/Azure too)

Docker Compose page refers to outdated `docker-compose`

Code of Conduct

What part of document/web-page on weaviate.io is affected?

https://weaviate.io/developers/weaviate/current/installation/docker-compose.html

Additional comments?

I followed the current Docker install steps on Ubuntu, which now recommend Docker Desktop. That installation brings compose as a docker command, so the docker-compose up -d command from the Weaviate instructions will fail.
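
One way to cope with both variants in a script is to detect which command is available. The sketch below is an assumption-laden illustration: `compose_command` is a hypothetical helper, it prefers the standalone `docker-compose` binary when it exists, and it does not verify that the Compose plugin is actually installed for the `docker` CLI.

```python
import shutil


def compose_command(which=shutil.which):
    """Pick the Compose invocation that appears to be available on this machine."""
    if which("docker-compose"):
        return ["docker-compose"]
    if which("docker"):
        # Assumes the Compose plugin ships with the docker CLI (not verified here).
        return ["docker", "compose"]
    raise RuntimeError("Neither docker-compose nor docker was found on PATH")


# Simulate a machine that only has the docker CLI, as in the report above.
fake_which = {"docker": "/usr/bin/docker"}.get
command = compose_command(fake_which) + ["up", "-d"]
print(" ".join(command))
```

On such a machine the sketch prints `docker compose up -d`, which is the form the page would need to mention alongside `docker-compose up -d`.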

Clarify nomenclature of modules

Make the distinction clear between modules:

  • 'Retrievers' - which can be sparse (tf-idf or bm25), dense (using ML, DPR), or combination
  • 'Readers' (e.g. QnA, which finds an answer in a given context, or NER) and 'Generators' (e.g. a QnA model which generates an answer in a full sentence given a context, or a summarization model, etc.)

Remove unused images

@bobvanluijt can we remove images that are not used currently? (There are a lot of images still in the repo that were previously used in the Playbook, but that isn't available on weaviate.io right?)

v1.14 Release Checklist

Pre-release

  • update default version to v1.14.0
    • docs
    • contributor guide
  • Monitoring
    • Add page about monitoring
    • Add weaviate-examples example
    • update list of all env vars
  • Custom distances
    • Add dedicated page about distances
    • Explain how to set distance in schema page
    • Replace certainty with distance (still list certainty for backward compatibility, but mark as deprecated)
  • API Namespacing
    • Update page with new beacon format
    • On all CRUD /v1/objects/ operations, update to the new format. Still list the old format as deprecated for compatibility's sake
    • Update all client examples

Post-release

  • update config gen
  • update WCS
  • update docker hash in examples repo

[Documentation Feedback]: Add words to Glossary

Page URL

https://weaviate.io/developers/weaviate/current/more-resources/glossary.html

We should decide if this page should explain only Weaviate-specific terms, or general vector search/ML terms. Right now it includes both, e.g. WCS and HNSW.

Terms to add

  • ANN
  • BERT (also in the image here)
  • Bi-encoder
  • centroid
  • cosine similarity
  • Cross-encoder
  • dense vector embedding
  • embedding (and what other types exist besides vector embeddings)
  • feature projection
  • loss function
  • pooling (also in the image here)
  • rank fusion
  • sparse vector
  • vector (emphasized here)

Improvement suggestion ref2vec

Code of Conduct

What part of document/web-page on weaviate.io is affected?

ref2vec docs

Additional comments?

Based on feedback from user.

I do think some bits of documentation are missing or incomplete. I think there could be more documentation on references and the ref2vec component. Some specific examples would be examples of queries (like above). More documentation and examples around using batches and multithreading. And lastly more documentation and examples on querying with the JS library

Link: https://weaviate.slack.com/archives/C017EG2SL3H/p1670279360834049?thread_ts=1669719437.324829&channel=C017EG2SL3H&message_ts=1670279360.834049

[Minor] Javascript in "how to query" example correct?

Thanks for doing such an extensive job on documentation in different languages! Still having some trouble setting everything up, but that's more because of my weird setup.

In the examples for "How to query", I couldn't help but feel that the JavaScript example is either "wrong" or requires some explanation: why, in the JavaScript implementation, does the query string seem to be "embedded" through string concatenation inside another part? That doesn't seem to be the case in any of the query examples in the other languages.

I'm talking about the following line in https://weaviate.io/developers/weaviate/current/tutorials/how-to-query-data.html

path: ['inPublication", "Publication", "name'],

Is the usage of ' and " on purpose?
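
The effect of the mixed quotes is easy to check. A single-quoted string may contain double quotes, so the line above defines one long string rather than three path elements. The sketch below shows this in Python, whose quoting rules match JavaScript's for this case; the three-element form is an assumption about what the example intended.

```python
# Copied from the docs line above: the outer single quotes swallow the inner double quotes.
mixed_quotes = ['inPublication", "Publication", "name']

# Assumed intent: three separate path elements.
intended = ["inPublication", "Publication", "name"]

print(len(mixed_quotes))  # 1
print(len(intended))      # 3
print(mixed_quotes[0])
```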

README

I think this repo needs a readme, especially given that it's public

It should

  • Tell you how to run the docs locally
  • Point to where they are deployed
  • Provide any other additional information that contributors may need

Automatically set desired language in docs

The website contains code blocks in different languages. For example, you can see one here

If a user selects a language, they have to do this for all individual drop-down menus.

Goal

  1. When a user clicks a language in the accordion, store the chosen language in a cookie
  2. On the current page, open up all accordions to that same language
  3. When a user visits another page, open the accordions to the chosen language
  4. Repeat steps 2 and 3 if a user clicks another language

The script can be added

total number of objects in class

Code of Conduct

What part of document/web-page on weaviate.io is affected?

I want to get the total number of objects in a class, but I could not find an API for this in the guide.
I only found the RESTful "GET /v1/objects?class={className}&limit={limit}", whose totalCount only reflects the limit.

Additional comments?

No response
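
One possible answer, offered as a sketch rather than an official recommendation, is the GraphQL Aggregate function with its meta count field. The code below only builds the query and the JSON body for a POST to /v1/graphql; `count_query` is a hypothetical helper and `Article` is a placeholder class name.

```python
import json


def count_query(class_name):
    """Build a GraphQL Aggregate query that asks for the object count of a class."""
    return f"{{ Aggregate {{ {class_name} {{ meta {{ count }} }} }} }}"


# "Article" is a placeholder; substitute your own class name.
query = count_query("Article")
payload = json.dumps({"query": query})
print(query)
print(payload)
```

Sending this payload to the GraphQL endpoint of a running instance should return the count for the class without retrieving the objects themselves.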

object insert failed

Code of Conduct

What part of document/web-page on weaviate.io is affected?

Error Message:
insert data failed: [WeaviateErrorMessage(message=store: import into index statistics_09_33_302: shard statistics_09_33_302_E3GKYnbdHurc: update vector index: insert doc id 22054596 to vector index: find and connect neighbors: entrypoint was deleted in the object store, it has been flagged for cleanup and should be fixed in the next cleanup cycle)]

When I insert about 1000W (10 million) objects continuously, I get the log above. The total count for the class statistics_09_33_302 is still increasing, but the search result is empty.

CPU: 48 Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
Memory: 64GB
objects storage: 60GB
weaviate use memory: 42GB
objects total count: 17292274

Additional comments?

No response

Website takes really long to generate

Using bundle exec jekyll serve to show the website during development takes really long (around 250 seconds). Is there a way to optimize this?

(I'm using bundle exec jekyll serve --incremental to quickly refresh page content, but of course this doesn't regenerate the site structure)

Issue with menu in non-current docs

Code of Conduct

What part of document/web-page on weaviate.io is affected?

When I'm using the documentation and I'm using a non-current page (e.g., developers/weaviate/v1.12.2) I see the following menu:

Screenshot 2022-07-06 at 15 01 20

When using current (i.e., developers/weaviate/current) it works as desired.

Screenshot 2022-07-06 at 15 01 12

Additional comments?

No response

[Documentation Feedback]: Proposal for improving the Getting Started in 10 Minutes Doc

Page URL

https://weaviate.io/developers/weaviate/current/getting-started/quick-start.html

User feedback

Hi everyone!
I've been playing around with Weaviate on a cloud VM and would like to offer some feedback on the Getting Started in 10 minutes guide to, hopefully, get future users up to speed faster.

What currently works and what does not

The current guide relies a lot on dockerization to get the user up and running quickly, which I think is good, but at the risk of abstracting too much, so that the user, after having completed the Getting Started tutorial, doesn't really know enough about ingesting data or how to create a schema to then be able to immediately take what they learnt from the tutorial and start playing on their own. The links at the end of the tutorial do list further resources such as ingesting your own data and creating your own schema, but many of them are quite involved (which is fair enough, given it's a complicated piece of software) and not in the spirit of "get me up and running with my own cool stuff as fast as possible" 🏃‍♀️

I think what you would ideally want to cover in a Getting Started guide are the basics of 1) installation 2) ingesting your own data 3) querying it/doing something cool with it in such a way that after the tutorial is done, the user can take this basic scaffold and plug their own data into it/start building on their own.

Proposal

Create a Getting Started guide that takes a simple external dataset, such as the IMDB reviews dataset from Hugging Face, creates a simple schema for it and then ingests it using the Python client. This way, we could distill and integrate some of the knowledge in this Hackernoon article https://hackernoon.com/what-is-weaviate-and-how-to-create-data-schemas-in-it-7hy3460 into the tutorial instead of having to redirect the users there.
I'm happy to take a first stab at this in a PR.

Add more info on How to get the website working locally.

The readme section on getting the website working locally should list every step and dependency required to build the site.
Some suggestions

  1. Add a basic description on what the website is based on - ruby for example and what packages are required for it.
    Example(From the web - Only for Demonstration purposes)
    A short description on bundler - Bundler provides a consistent environment for Ruby projects by tracking and installing the exact gems and versions that are needed.

  2. Steps on how to get bundler on your system.
    For example for ubuntu run

  • gem install bundler [Add: gem is a package manager for Ruby]

The list goes on.

The point is to have all the steps required to make the website work on any system (assuming the system is fresh and the user doesn't know technical terms like bundle, gem, etc.).

Target change page - README.md

[Documentation Feedback]: Wikipedia search demos fail in Firefox due to HTTPS-only mode

Page URL

https://weaviate.io/developers/weaviate/current/getting-started/query.html

User feedback

Firefox has HTTPS-only mode enabled by default. This causes query examples like https://link.semi.technology/3vEV5dD to fail with "NetworkError when attempting to fetch resource."

image

Disabling HTTPS-only works around this issue. Would help to remove this point of friction for Firefox users who may not realize what's going on.
