The astrapy from datastax

Adapt ops.py to the `client` param in make_request

A faulty commit has made it into main.

Travis-based (or similar) discipline to ensure tests pass on commits

"Beta" milestone

Add a CHANGES.txt and move to a release cadence where several PRs are accumulated in each release.
Also move the status on setup.py (=> PyPI) to Beta

Check and bump requests_toolbelt

I would like to thoroughly test if the requests_toolbelt dependency can be bumped to ^1.0.0 (I'm almost sure it can).

This is critical for the LangChain integration (otherwise there are version conflicts).

Add user-agent header for python client requests

Better handling of endpoint URL and futureproofing

Switch to new "Token" header name in API calls

The new JSON API has renamed "X-Cassandra-Token" to "Token". astrapy should update accordingly

(the old name is supported for compatibility but better to stay up to date)

Investigate httpx and concurrency issues (i.e., langchain integration)

Raw response return in utils / make_requests

In:

    try:
        if return_type == "json":
            return r.json()
        else:
            return r

shouldn't it be r.text at the end?

Align to paging parameter rename in API

The API has merged the fix to a naming inconsistency. In short, the pagination calls (once the fix is deployed, i.e. post API v 1.0.0-BETA-3) ~~will return pageState instead of pagingState.~~

For astrapy, this means adapting the pagination method to look for the new name (or, perhaps even better, to fall back to the old name before stating there's no next page altogether)
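The fallback idea could look roughly like the sketch below. The function name, the `"data"` wrapper, and the two key names are assumptions for illustration (the key names are taken from the struck-out sentence above and may not match what the deployed API returns), so they are passed in rather than hardcoded.

```python
from typing import Any, Dict, Optional, Sequence


def next_page_state(
    response: Dict[str, Any],
    # Assumed key names, newest first; adjust to whatever the API really returns.
    candidate_keys: Sequence[str] = ("pageState", "pagingState"),
) -> Optional[str]:
    """Return the paging token of a response, or None if there is no next page."""
    # Assumption: the paging token sits under a top-level "data" object.
    data = response.get("data") or {}
    for key in candidate_keys:
        state = data.get(key)
        if state:
            return state
    # Neither the new nor the old name carried a token: no next page.
    return None
```

Trying the new name first and the old one second keeps the client working against both older and newer API deployments.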

error handling after each API call

Check each API response for the right shape (perhaps with automation to define the expected ok-shape)
and raise a verbose error if not

e.g. create collection can currently fail silently if there are already 50 indexes (free tier)

Package "cassio" among the dependencies upon install

This achieves the desired "one pip install" goal. Users would be instructed to "pip install astrapy" and they would get all required packages to go the CQL/CassIO route if they so wish.

Note for py 3.12: CassIO requires cassandra-driver. The latter would get pip-installed all right on 3.12 but then when imported there would be errors. We can ignore these (going to be addressed in ~1 month or so presumably). The important thing is that this added dependency does not break the main astrapy route, and it does not.

However, as there is no ready wheel, the pip install process is a bit longer on 3.12 (it may be 2-3 minutes). But this would be resolved with a newer cassandra-driver, supposedly going out soon.

Makefile/linter/codestyle support

Something like a ruff/black/mypy suite as an easy-to-use make command would encourage good writing style IMO

[core] methods mapped to HTTP calls should expose uniform options

There are some irregularities in which methods expose options, filter and so on. Check, e.g., which methods would admit options.

Offer full support for customized/ non-prod DevOps API URL root

General problem: assuming prod in the devops API (url) vs keeping it simple for the happy path of prod usage.

Idea from a conversation:

There is no way to "spawn" a DBOps object from the general AstraDB. If there were one, then I would tell you:

regular usage would have users "spawn" an Ops from their main AstraDB instance, which would have an .ops() method spawning an AstraDBOps. That method would sound like this:


    def ops(self, dev_ops_url=None):
        # Default to the prod DevOps URL only when the main client targets prod.
        if dev_ops_url is None and self.base_url == <production api url>:
            dev_ops_url = <prod OPS url>
        return AstraDBOps(self.api_key, dev_ops_url=dev_ops_url)

Update README for newest client

Fix small formatting issues

HTTPX Issues with largely concurrent document inserts

Back out to requests for now until time for further investigation

[core] Most JSON-y parameters have "null-like" defaults despite no API support

Remove defaults, require the params be passed if required by API

Implement a high-level "Truncate" table operation

Grab the dimension of the existing vector table
Delete the existing collection
Create a new collection with the specified name and dimension

String repr of classes

Instead of <astrapy.db.AstraDBCollection at 0x79205e617c70> we would like to repr as e.g.

Astra DB Collection[name="Blabla", dimension=123]

or whatever.
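A minimal sketch of such a repr, using a stand-in class rather than the real `AstraDBCollection` (the class name and its two attributes are assumptions for illustration):

```python
from typing import Optional


class CollectionSketch:
    """Stand-in for a collection class; only the attributes shown in the repr."""

    def __init__(self, name: str, dimension: Optional[int] = None) -> None:
        self.name = name
        self.dimension = dimension

    def __repr__(self) -> str:
        # Human-readable form instead of the default "<... at 0x...>".
        return f'Astra DB Collection[name="{self.name}", dimension={self.dimension}]'
```

With this in place, `repr(CollectionSketch("Blabla", 123))` gives the string proposed above.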

Connection example when connecting to K8ssandra

The current example in the README assumes the developer connects to AstraDB.

It would be nice to also have an example of connecting to plain Cassandra deployed by K8ssandra.

Or maybe this is not in the goal of this library? The name astrapy looks like it is targeted at AstraDB?

REGRESSION: Upsert currently fails with existing doc id

Regression seems due to the propagation of a ValueError in the API response, when we were expecting a success response that we could parse.

Introduce "dev requirements" and tie the setup.py reqs to the main reqfile

Split into requirements.txt and requirements-dev.txt

The latter with pytest, ruff, etc

The former with the reqs we want to bundle as package deps ... plus the special -e . to self-install in a local dev env.

Solves the risk of misaligning setup.py and requirements.txt (also keeping dev stuff out of the build)

create_collection, and everything on it, should support nonvector collections

As of 0.6.0,

astra_db.create_collection("novectors")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefano/personal/WORK_Datastax/Code/other_github/astrapy/astrapy/db.py", line 798, in create_collection
    self._request(
  File "/home/stefano/personal/WORK_Datastax/Code/other_github/astrapy/astrapy/db.py", line 725, in _request
    raise ValueError(json.dumps(responsebody["errors"]))
ValueError: [{"message": "Cannot invoke \"java.lang.Integer.intValue()\" because the return value of \"io.stargate.sgv2.jsonapi.api.model.command.impl.CreateCollectionCommand$Options$VectorSearchConfig.dimension()\" is null"}]

the reason is that the method creates an incomplete options = {"vector": {}}, which makes the API angry.

Support for nonvectors in create_collection
go through all methods on collection and make sure they are ok with there not being a vector
specific tests

Vector-table creation name alignment

Per the latest JSON api specs (picture),

"size" => "dimensions"
"function" => "metric"

The current names are still supported but this is a good time to adapt to the new and final naming scheme.

Remove support for Python 3.7

Features/packages that depend on 3.8 and higher are present.

Create Collection should return the collection object

So we don't have to make two separate calls

Bump version to 0.5.1 to prepare for next release

Final init/auth params naming

Suggestion: db_id => database_id everywhere.

Also (check with a broader set of folks) token => api_key (that was being discussed at length)

Name inconsistency `name` vs `collection`

Fix name inconsistency between name and collection (name is probably preferred, even better collection_name IMO):

      astra_client_vectordb1.create_collection(name="collection_test", size=5)
      astra_db_collection = AstraDbCollection(collection="collection_test")

None-safe handling of default namespace

Defaulting the namespace to "default_namespace" in the __init__ params is tricky.

One might use a pattern such as passing namespace = os.environ.get("NAMESPACE") and end up with errors when the variable is unset, since None then overrides the default.
Better to have a None in the init signature and convert Nones to "default_namespace" in the code.
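A minimal sketch of the None-safe pattern, with a stand-in class (the class and constant names are hypothetical, not the library's):

```python
from typing import Optional

DEFAULT_NAMESPACE = "default_namespace"


class ClientSketch:
    """Stand-in for the client class, showing only the namespace handling."""

    def __init__(self, namespace: Optional[str] = None) -> None:
        # None (e.g. from an unset environment variable) falls back to the default.
        self.namespace = DEFAULT_NAMESPACE if namespace is None else namespace
```

This way `ClientSketch(namespace=os.environ.get("NAMESPACE"))` works whether or not the variable is set.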

Add vector_find() method with clearer parameter naming

Wrapper around find() with more "pythonic" syntactic sugar.

Revamp README

Improve error handling for requests

Error handling around the requests.request call in utils.py does not work as intended: the try/except block there makes the make_request method return a JSON (dict) instead of a response object, so that callers (such as _request of AstraDBCollection) fail when they try the .json() method on it.

I just saw this thanks to some Astra API hiccup:

    create_collection_response = self.astra_db.create_collection(
  File "[...]/astrapy/db.py", line 360, in create_collection
    response = self._request(
  File "[...]/astrapy/db.py", line 329, in _request
    responsebody = response.json()
AttributeError: 'dict' object has no attribute 'json'

Option 1: remove the try/except altogether in make_request. Onus is on the client.

Option 2:

make_request returns the request.json() when this works, otherwise returns {"errors": [...]}, ready to be used by the caller just like the errors genuinely coming from the API
callers would do something like responsebody = make_request(...) and call it a day
Except: the DevOps case, which also uses make_request, is different - that one is based on HTTP status codes and headers, the response being the empty string. Perhaps the safest thing is a flag to make make_request behave as required there?
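Option 2 could be sketched as follows. This is only an illustration: `body_or_errors`, the `send` callable, and the `raw` flag are hypothetical names, and the HTTP call itself is passed in so the sketch does not depend on a specific library. Catching `OSError` and `ValueError` relies on requests' exceptions deriving from `IOError` and on JSON decoding errors being `ValueError`s.

```python
from typing import Any, Callable


def body_or_errors(send: Callable[[], Any], raw: bool = False) -> Any:
    """Run an HTTP call and return its JSON body, or an API-shaped error dict.

    `send` performs the request and returns a response object with .json().
    With raw=True (the DevOps case) the response object is returned untouched
    and exceptions propagate to the caller.
    """
    try:
        response = send()
        if raw:
            return response
        return response.json()
    except (OSError, ValueError) as exc:
        if raw:
            raise
        # Same shape as errors genuinely coming from the API.
        return {"errors": [{"message": f"{type(exc).__name__}: {exc}"}]}
```

Callers in the JSON API path would then always receive a dict and could check for an "errors" key in one place.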

Support HTTP2 with the JSON API

This will involve a switch from requests, which is currently used, to httpx, which has a number of other advantages as well.

Support a deleteMany operation in the Python client

Rename api_key parameter to token

Based on JSON api developments

Pagination as a higher-level method

Building on top of the methods that are 1:1 to HTTP calls.

Would dissolve the responses and present a smooth iterable of individual Documents, lazily managing usage of the underlying calls as the iterable gets exhausted.
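A lazy iterable of this kind could be built with a generator, roughly as sketched here. The `fetch_page` callable stands in for whichever 1:1 method issues the HTTP call, and the response layout (`"data"`, `"documents"`, and the paging key name) is an assumption for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    state_key: str = "pageState",  # assumed name of the paging token field
) -> Iterator[Dict[str, Any]]:
    """Yield documents one by one, fetching the next page only when needed."""
    page_state: Optional[str] = None
    while True:
        response = fetch_page(page_state)
        data = response.get("data") or {}
        # Hand out the documents of the current page individually.
        yield from data.get("documents") or []
        page_state = data.get(state_key)
        if not page_state:
            break
```

Because it is a generator, no request is made until the first document is asked for, and later pages are fetched only if the consumer keeps iterating.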

Emulate single-doc upsert

Emulate a single-document upsert logic:

try to insert
if "already existing ID" error => find one and replace (wholly)

Name upsert, signature TBD.

(no need for "single" or "one" in the name, all methods without _many are implicitly one-doc methods).

This would build on existing methods that are 1:1 to HTTP calls (and can assume error handling is in place)
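The two steps above can be sketched as below. Everything about the `collection` object is an assumption: it is expected to offer `insert_one` and `find_one_and_replace` methods that return the parsed API response as a dict, with failures listed under "errors". The duplicate-ID error code is a parameter because its exact value is not stated here.

```python
from typing import Any, Dict


def upsert(
    collection: Any,
    document: Dict[str, Any],
    duplicate_code: str = "DOCUMENT_ALREADY_EXISTS",  # assumed error code
) -> str:
    """Insert a document, or wholly replace it if its _id already exists."""
    response = collection.insert_one(document)
    errors = response.get("errors") or []
    if not errors:
        return "inserted"
    if any(err.get("errorCode") == duplicate_code for err in errors):
        # Same _id already present: replace the stored document entirely.
        collection.find_one_and_replace(
            filter={"_id": document["_id"]},
            replacement=document,
        )
        return "replaced"
    # Any other error is not something upsert can recover from.
    raise ValueError(f"insert failed: {errors}")
```

If the underlying methods raise on API errors instead of returning them, the duplicate check would move into an `except` clause, but the flow stays the same.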

Push and Pop have identical code

logging of all API requests/responses

optional, maybe with a wrapper class or Logger.

Opt-in to log the vector values; a mechanism (on by default) to purge them from outputs for readability
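One possible shape for the purge mechanism is sketched here. The function names and the logger name are hypothetical, and the sketch assumes vectors travel under a `"$vector"` key in request and response payloads.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("astrapy_sketch")  # hypothetical logger name


def redact_vectors(payload: Any, placeholder: str = "[...]") -> Any:
    """Return a copy of the payload with every "$vector" value replaced."""
    if isinstance(payload, dict):
        return {
            key: placeholder if key == "$vector" else redact_vectors(value, placeholder)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact_vectors(item, placeholder) for item in payload]
    return payload


def format_for_log(direction: str, payload: Any, log_vectors: bool = False) -> str:
    """Build and emit the log line; vectors are purged unless opted in."""
    shown = payload if log_vectors else redact_vectors(payload)
    message = f"{direction}: {json.dumps(shown)}"
    logger.debug(message)
    return message
```

The original payload is never modified, so redaction affects only what is logged, not what is sent.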

More flexible test fixtures for single-method testing

The db and collection are independent fixtures right now, which makes it hard to e.g. select a single test method because it may happen that the collection is not created or similar (e.g. test_collection with -k test_insert_many or similar).

Make fixtures module-scoped and perhaps nest them or something, so that each method runs by itself all right
(also do not assume any order between test function calls)

Rename `size` parameter to `dimension`

Adapt the AstraDBOps to dev/prod/xxx environments

Currently the DEFAULT_HOST for the devops is hardcoded and it's the prod one.

Dev should have DEFAULT_HOST = "https://api.dev.cloud.datastax.com".
So possibly a dict from the base_url to the devops corresponding url could be handy?
Unless there's a chance that the latter will even change. Anyway leaving a way for the user to specify a custom devops base host+url would be necessary.

remove python-dotenv from setup.py deps

It is used only in testing, no need to install it in the package

docstrings for user-facing methods

Protect against empty "filter" clause not accepted by the API

Currently the vector API breaks on find if empty dicts are passed for example for sort.
In other words, ..."sort": {}... in the payload causes errors, while omitting it works all right.
While this will be addressed, I suggest cleaning out empty dicts from such places before issuing requests. I don't think it would hurt and it would get the whole stack to work sooner.
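A minimal sketch of that cleanup, assuming the command body is a plain dict. The function name and the list of clause names are illustrative only.

```python
from typing import Any, Dict, Sequence


def drop_empty_clauses(
    command_body: Dict[str, Any],
    clause_names: Sequence[str] = ("filter", "sort", "options", "projection"),
) -> Dict[str, Any]:
    """Return a copy of the command body without empty-dict clauses."""
    return {
        key: value
        for key, value in command_body.items()
        # Only the named clauses are dropped, and only when they are exactly {}.
        if not (key in clause_names and value == {})
    }
```

For example, `drop_empty_clauses({"filter": {"a": 1}, "sort": {}})` keeps only the `filter` clause, so the payload never contains `"sort": {}`.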
