
datastax / astrapy

AstraPy is a Pythonic interface for DataStax Astra DB and the Data API

Home Page: https://github.com/datastax/astrapy

License: Apache License 2.0

Python 99.82% Shell 0.02% Makefile 0.16%
python datastax stargate astradb

astrapy's People

Contributors

bjchambers, bradfordcp, caseyclements, cbornet, clun, erichare, hemidactylus, jimdickinson, johnsmartco, kidrecursive, nicoloboschi, synedra, ykdojo

astrapy's Issues

"Beta" milestone

Add a CHANGES.txt and move to a release cadence where several PRs are accumulated in each release.
Also move the status on setup.py (=> PyPI) to Beta

Check and bump requests_toolbelt

I would like to thoroughly test if the requests_toolbelt dependency can be bumped to ^1.0.0 (I'm almost sure it can).

This is critical for the LangChain integration (otherwise there are version conflicts).

Align to paging parameter rename in API

The API has merged a fix for a naming inconsistency. In short, once the fix is deployed (i.e. post API v 1.0.0-BETA-3), the pagination calls will return pageState instead of pagingState.

For astrapy, this means adapting the pagination method to look for the new name (or, perhaps even better, to fall back to the old name before concluding there is no next page at all).
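A minimal sketch of the fallback described above. The helper name `next_page_state` is hypothetical, and it assumes the caller passes in whichever part of the response body holds the paging token; only the two key names come from this issue.

```python
from typing import Any, Dict, Optional


def next_page_state(response_data: Dict[str, Any]) -> Optional[str]:
    """Return the paging token, trying the new key before the legacy one."""
    # "pageState" is the post-fix name, "pagingState" the pre-fix one.
    for key in ("pageState", "pagingState"):
        value = response_data.get(key)
        if value:
            return value
    # Neither key carries a token: there is no next page.
    return None
```

With this shape, the pagination method keeps working against both old and new API deployments without a version check.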

error handling after each API call

Check each API response for the right shape (perhaps with automation to define the expected ok-shape)
and raise a verbose error if not

E.g. create collection can currently fail silently when the limit of 50 indexes (free tier) is reached.
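One way such a check could look, as a sketch only. `APIResponseError` and `check_response` are hypothetical names, and the idea of passing a list of expected top-level keys per command is an assumption about how the "expected ok-shape" might be declared; the `errors` key is the one already inspected in `_request`.

```python
from typing import Any, Dict, Iterable


class APIResponseError(ValueError):
    """Raised when an API response does not have the expected shape."""


def check_response(body: Any, expected_keys: Iterable[str]) -> Dict[str, Any]:
    """Return the body unchanged if it looks fine, else raise a verbose error."""
    if not isinstance(body, dict):
        raise APIResponseError(f"Expected a JSON object, got: {body!r}")
    errors = body.get("errors")
    if errors:
        raise APIResponseError(f"API returned errors: {errors}")
    missing = [key for key in expected_keys if key not in body]
    if missing:
        raise APIResponseError(
            f"Response is missing expected keys {missing}; full body: {body}"
        )
    return body
```

Each method that maps 1:1 to an HTTP call could then declare its own expected keys and route the parsed response through this function.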

Package "cassio" among the dependencies upon install

This makes it into the desired "one pip install" goal. Users would be instructed to "pip install astrapy" and they would get all required packages to go the CQL/CassIO route if they so wish.

Note for py 3.12: CassIO requires cassandra-driver. The latter would get pip-installed all right on 3.12 but then when imported there would be errors. We can ignore these (going to be addressed in ~1 month or so presumably). The important thing is that this added dependency does not break the main astrapy route, and it does not.

However, as there is no ready wheel, the pip install process takes a bit longer on 3.12 (it may be 2-3 minutes). This should be resolved by a newer cassandra-driver, supposedly going out soon.

Offer full support for customized/ non-prod DevOps API URL root

General problem: assuming prod in the devops API (url) vs keeping it simple for the happy path of prod usage.

Idea from a conversation:

There is no way to "spawn" a DBOps object from the general AstraDB. If there were one, then I would tell you:

regular usage would have users "spawn" an Ops from their main AstraDB instance, which would have an .ops() method spawning an AstraDBOps. That method would sound like this:


    def ops(self, dev_ops_url=None):
        # Fall back to the production DevOps URL only when this instance targets production.
        if dev_ops_url is None and self.base_url == <production api url>:
            dev_ops_url = <prod OPS url>
        return AstraDBOps(self.api_key, dev_ops_url=dev_ops_url)

String repr of classes

Instead of <astrapy.db.AstraDBCollection at 0x79205e617c70> we would like to repr as e.g.

Astra DB Collection[name="Blabla", dimension=123]

or whatever.
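A sketch of what this could look like, using a stand-in class. `CollectionInfo` and its attributes are hypothetical; the real class would build the string from whatever it actually stores.

```python
from typing import Optional


class CollectionInfo:
    """Stand-in for a collection class, only to illustrate __repr__."""

    def __init__(self, name: str, dimension: Optional[int] = None) -> None:
        self.name = name
        self.dimension = dimension

    def __repr__(self) -> str:
        # Mirrors the format proposed in this issue.
        return f'Astra DB Collection[name="{self.name}", dimension={self.dimension}]'
```

`repr(CollectionInfo("Blabla", 123))` then gives `Astra DB Collection[name="Blabla", dimension=123]`.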

Connection example when connecting to K8ssandra

The current example in the README assumes the developer connects to AstraDB.

It would be nice to also have an example of connecting to plain Cassandra deployed by K8ssandra.

Or maybe this is not a goal of this library? The name astrapy suggests it is targeted at AstraDB.

Introduce "dev requirements" and tie the setup.py reqs to the main reqfile

Split into requirements.txt and requirements-dev.txt

The latter with pytest, ruff, etc

The former with the reqs we want to bundle as package deps ... plus the special -e . to self-install in a local dev env.

Solves the risk of misaligning setup.py and requirements.txt (also keeping dev stuff out of the build).
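To tie setup.py to the main requirements file, setup.py could parse it instead of repeating the list. A rough sketch, where `parse_requirements` is a hypothetical helper and the skipping of option lines (such as the `-e .` entry) is an assumption about how the file would be laid out:

```python
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Extract package requirement lines from the contents of a requirements file."""
    requirements = []
    for raw_line in text.splitlines():
        # Drop inline comments (pip expects whitespace before the '#').
        line = raw_line.split(" #", 1)[0].strip()
        # Skip blanks, full-line comments, and pip options such as "-e ." or "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements
```

In setup.py this would be used along the lines of `install_requires=parse_requirements(open("requirements.txt").read())`.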

create_collection, *and everything on it*, should support nonvector collections

As of 0.6.0,

astra_db.create_collection("novectors")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefano/personal/WORK_Datastax/Code/other_github/astrapy/astrapy/db.py", line 798, in create_collection
    self._request(
  File "/home/stefano/personal/WORK_Datastax/Code/other_github/astrapy/astrapy/db.py", line 725, in _request
    raise ValueError(json.dumps(responsebody["errors"]))
ValueError: [{"message": "Cannot invoke \"java.lang.Integer.intValue()\" because the return value of \"io.stargate.sgv2.jsonapi.api.model.command.impl.CreateCollectionCommand$Options$VectorSearchConfig.dimension()\" is null"}]

the reason is that the method creates an incomplete options = {"vector": {}}, which makes the API angry.

  1. Support for nonvectors in create_collection
  2. go through all methods on collection and make sure they are ok with there not being a vector
  3. specific tests
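For point 1, the gist is to omit the vector options entirely when no dimension is given, rather than sending an empty `{"vector": {}}`. A sketch under assumptions: the function name is hypothetical, and the `createCollection` command name and the `dimension` / `metric` key names are not taken from this issue and should be checked against the API spec.

```python
from typing import Any, Dict, Optional


def create_collection_payload(
    name: str,
    dimension: Optional[int] = None,
    metric: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a create-collection payload, with vector options only if requested."""
    definition: Dict[str, Any] = {"name": name}
    if dimension is not None:
        vector: Dict[str, Any] = {"dimension": dimension}  # key name is an assumption
        if metric is not None:
            vector["metric"] = metric  # key name is an assumption
        definition["options"] = {"vector": vector}
    elif metric is not None:
        # A similarity metric makes no sense without a vector dimension.
        raise ValueError("A metric was given without a dimension.")
    return {"createCollection": definition}
```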

Vector-table creation name alignment

Per the latest JSON api specs (picture),

  • "size" => "dimensions"
  • "function" => "metric"

The current names are still supported but this is a good time to adapt to the new and final naming scheme.

postman-vectors

Remove support for Python 3.7

Features/packages that depend on 3.8 and higher are present.


Final init/auth params naming

Suggestion: db_id => database_id everywhere.

Also (check with a broader set of folks) token => api_key (that was being discussed at length)

Name inconsistency `name` vs `collection`

Fix name inconsistency between name and collection (name is probably preferred, even better collection_name IMO):

      astra_client_vectordb1.create_collection(name="collection_test", size=5)
      astra_db_collection = AstraDbCollection(collection="collection_test")

None-safe handling of default namespace

Defaulting the namespace to "default_namespace" in the __init__ params is tricky.

One might use a pattern such as passing namespace = os.environ.get("NAMESPACE") and, when the variable is unset, end up passing None and getting errors.
Better to have a None in the init signature and convert Nones to "default_namespace" in the code.
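A minimal sketch of the None-safe handling. The constant and function names are hypothetical; only the "default_namespace" value comes from this issue.

```python
import os
from typing import Optional

DEFAULT_NAMESPACE = "default_namespace"


def resolve_namespace(namespace: Optional[str] = None) -> str:
    """Map a missing namespace to the default, leaving explicit values alone."""
    return DEFAULT_NAMESPACE if namespace is None else namespace


# The pattern from this issue then works even when the variable is unset:
namespace = resolve_namespace(os.environ.get("NAMESPACE"))
```

The `__init__` signature would take `namespace=None` and call this conversion once, so callers never see the sentinel.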

Improve error handling for requests

Errors in the requests.request call in utils.py are not handled as intended: the try/except block there makes the make_request method return a dict instead of a response object, so that callers (such as _request of AstraDBCollection) fail when they call the .json() method on it.

I just saw this thanks to some Astra API hiccup:

    create_collection_response = self.astra_db.create_collection(
  File "[...]/astrapy/db.py", line 360, in create_collection
    response = self._request(
  File "[...]/astrapy/db.py", line 329, in _request
    responsebody = response.json()
AttributeError: 'dict' object has no attribute 'json'

Option 1: remove the try/except altogether in make_request. Onus is on the client.

Option 2:

  • make_request returns the response.json() when this works, otherwise returns {"errors": [...]}, ready to be used by the caller just like the errors genuinely coming from the API
  • callers would do something like responsebody = make_request(...) and call it a day
  • Except: the DevOps case, which also uses make_request, is different - that one is based on HTTP status codes and headers, the response body being the empty string. Perhaps the safest thing is a flag to make make_request behave as required there?
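Option 2 could be sketched roughly as follows. `request_json` is a hypothetical name, and `send` stands for a zero-argument callable that performs the HTTP call (for instance a `functools.partial` around `requests.request`); the `{"message": ...}` shape of each error entry mirrors what the API returns in the tracebacks above. The DevOps case is not covered here.

```python
from typing import Any, Callable, Dict


def request_json(send: Callable[[], Any]) -> Dict[str, Any]:
    """Run the request and always hand back a dict, even on failure."""
    try:
        response = send()
        return response.json()
    except Exception as exc:  # broad on purpose in this sketch
        # Same shape as errors genuinely coming from the API.
        return {"errors": [{"message": f"{type(exc).__name__}: {exc}"}]}
```

Callers would then do `responsebody = request_json(...)` and check `responsebody.get("errors")` in one place.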

Pagination as a higher-level method

Building on top of the methods that are 1:1 to HTTP calls.

Would dissolve the responses and present a smooth iterable of individual Documents, lazily managing usage of the underlying calls as the iterable gets exhausted.
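A sketch of such a lazy iterable, written as a generator. `paginate` and `fetch_page` are hypothetical: `fetch_page` stands for any wrapper around a 1:1 HTTP method that takes the current page state (None for the first page) and returns the documents plus the next page state.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[Dict[str, Any]], Optional[str]]


def paginate(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield documents one by one, fetching the next page only when needed."""
    page_state: Optional[str] = None
    while True:
        documents, page_state = fetch_page(page_state)
        yield from documents
        if not page_state:
            # No token for a further page: the iteration is complete.
            return
```

Because this is a generator, a caller that stops after a few documents never triggers the remaining HTTP calls.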

Emulate single-doc upsert

Emulate a single-document upsert logic:

  1. try to insert
  2. if "already existing ID" error => find one and replace (wholly)

Name upsert, signature TBD.

(no need for "single" or "one" in the name, all methods without _many are implicitly one-doc methods).

This would build on existing methods that are 1:1 to HTTP calls (and can assume error handling is in place)
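The two steps could be sketched like this. Everything here is hypothetical: how the API signals an "already existing ID" is not specified in this issue, so the sketch assumes the insert wrapper raises a dedicated `DuplicateIdError`, and the two callables stand in for the existing insert and find-one-and-replace methods.

```python
from typing import Any, Callable, Dict

Document = Dict[str, Any]


class DuplicateIdError(Exception):
    """Assumed to be raised by the insert wrapper when the _id already exists."""


def upsert(
    document: Document,
    insert_one: Callable[[Document], Any],
    replace_one: Callable[[Document, Document], Any],
) -> str:
    """Insert the document, or wholly replace the existing one with the same _id."""
    try:
        insert_one(document)
        return "inserted"
    except DuplicateIdError:
        # Step 2: find by _id and replace the whole document.
        replace_one({"_id": document["_id"]}, document)
        return "replaced"
```

Note that the two calls are not atomic, so a concurrent delete between them would make the replace a no-op.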

Logging of all API requests/responses

Optional, maybe with a wrapper class or Logger.

Opt-in to log the vector values: a mechanism (on by default) would purge them from outputs for readability.
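A sketch of the purge mechanism. The function names and the logger name are hypothetical, and it assumes vectors travel under the `$vector` key as a list, which should be confirmed against the payloads actually sent.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("astrapy.sketch")  # hypothetical logger name


def redact_vectors(payload: Any, placeholder: str = "<vector omitted>") -> Any:
    """Return a copy of the payload with every "$vector" list replaced."""
    if isinstance(payload, dict):
        return {
            key: placeholder
            if key == "$vector" and isinstance(value, list)
            else redact_vectors(value, placeholder)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact_vectors(item, placeholder) for item in payload]
    return payload


def log_payload(label: str, payload: Any, log_vectors: bool = False) -> None:
    """Log a request or response; vectors are purged unless log_vectors is set."""
    shown = payload if log_vectors else redact_vectors(payload)
    logger.debug("%s: %s", label, json.dumps(shown))
```

The redaction works on a copy, so the payload that is actually sent is never altered.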

More flexible test fixtures for single-method testing

The db and collection are independent fixtures right now, which makes it hard to e.g. select a single test method because it may happen that the collection is not created or similar (e.g. test_collection with -k test_insert_many or similar).

Make fixtures module-scoped and perhaps nest them or something, so that each method runs by itself all right
(also do not assume any order between test function calls)

Adapt the AstraDBOps to dev/prod/xxx environments

Currently the DEFAULT_HOST for the DevOps API is hardcoded, and it is the prod one.

Dev should have DEFAULT_HOST = "https://api.dev.cloud.datastax.com".
So possibly a dict from the base_url to the corresponding DevOps URL could be handy?
Unless there's a chance that the latter will change. In any case, leaving a way for the user to specify a custom DevOps base host+url would be necessary.

Protect against empty "filter" clause not accepted by the API

Currently the vector API breaks on find if empty dicts are passed for example for sort.
In other words, ..."sort": {}... in the payload causes errors, while omitting it works all right.
While this will be addressed on the API side, I suggest cleaning out empty dicts from such places before issuing requests. I don't think it would hurt, and it would get the whole stack to work sooner.
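The cleanup could be as small as the sketch below. `drop_empty_clauses` is a hypothetical helper, and the default list of clause names beyond "sort" and "filter" is an assumption about which places are affected.

```python
from typing import Any, Dict, Iterable


def drop_empty_clauses(
    command_body: Dict[str, Any],
    clauses: Iterable[str] = ("filter", "sort", "projection", "options"),
) -> Dict[str, Any]:
    """Return a copy of the command body without clauses that are empty dicts."""
    clause_names = set(clauses)
    return {
        key: value
        for key, value in command_body.items()
        if not (key in clause_names and value == {})
    }
```

It deliberately stays at the top level of the command body, so an empty dict nested inside a filter (which might be meaningful) is left untouched.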
