ai-northstar-tech / vector-io

The only Vector tooling you'll need. Use the universal VDF format for vector datasets to easily export and import data from all vector databases

Home Page: https://vector-io.com

License: Apache License 2.0

Languages: Python 7.81%, Jupyter Notebook 91.42%, Shell 0.03%, HTML 0.74%, Ruby 0.01%
Topics: parquet, vector-database, vector-search-engine, chromadb, data-backup, data-export, data-import, datastax, huggingface, huggingface-datasets

vector-io's People

Contributors

allcontributors[bot], anush008, dhruv-anand-aintech, flashblaze, horcruxno13, jaelgu, jswortz, maghams62, pre-commit-ci[bot], qynikos, rajeshthallam, sweep-ai[bot], tottenjordan


vector-io's Issues

Sweep: Add Support for Turbopuffer

Documentation for the Turbopuffer SDK: https://turbopuffer.com/docs/

  1. Add turbopuffer[fast] to requirements.txt

  2. Upsert code:

import turbopuffer as tpuf

ns = tpuf.Namespace('namespace-name')
# If an error occurs, this call raises a tpuf.APIError if a retry was not successful.
ns.upsert(
  ids=[1, 2, 3, 4],
  vectors=[[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]],
  attributes={
    'my-string': ['one', None, 'three', 'four'],
    'my-uint': [12, None, 84, 39],
    'my-string-array': [['a', 'b'], ['b', 'd'], [], ['c']]
  },
  distance_metric='cosine_distance'
)
  3. Export code:
import turbopuffer as tpuf

ns = tpuf.Namespace('namespace-name')

# Cursor paging is handled automatically by the Python client
# If an error occurs, this call raises a tpuf.APIError if a retry was not successful.
for row in ns.vectors():
  print(row)
# VectorRow(id=1, vector=[0.1, 0.1], attributes={'key1': 'one', 'key2': 'a'})
# VectorRow(id=2, vector=[0.2, 0.2], attributes={'key1': 'two', 'key2': 'b'})
# VectorRow(id=3, vector=[0.3, 0.3], attributes={'key1': 'three', 'key2': 'c'})
# VectorRow(id=4, vector=[0.4, 0.4], attributes={'key1': 'four', 'key2': 'd'})

Follow the guidelines at https://github.com/AI-Northstar-Tech/vector-io#adding-a-new-vector-database to implement support for Turbopuffer in Vector-io.

Follow the additions in PR: #77

Checklist of features for completion

  • Add mapping of distance metric names
  • Support local and cloud instances
  • Automatically create Python classes for the index being exported
  • Export
    • Get all indexes by default
    • Option to specify index names to export
    • DB-specific command-line options (make_parser; see the sketch after this list)
    • Allow input on the terminal for each option above (via input() in Python) in export_vdb
    • Handle multiple vectors per row
  • Import
    • DB-specific command-line options (make_parser)
    • Handle multiple vectors per row
    • Allow input on the terminal for each option above (via input() in Python) in export_vdb
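
A minimal sketch of the export-side make_parser hook mentioned in the checklist above. The option names here are hypothetical, modeled on the DB-specific parser pattern other vector-io modules use, not the final interface:

def make_parser(subparsers):
    # Hypothetical DB-specific options for the Turbopuffer exporter.
    parser_turbopuffer = subparsers.add_parser(
        "turbopuffer", help="Export data from Turbopuffer"
    )
    parser_turbopuffer.add_argument(
        "--api_key", type=str, default=None, help="Turbopuffer API key"
    )
    parser_turbopuffer.add_argument(
        "--namespaces", type=str, default=None,
        help="Comma-separated list of namespaces to export (default: all)",
    )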

Export from Pinecone seems a bit hacky

Thanks team for the great work!

When I go through Pinecone's exporter, it seems there is no guarantee that the exporter can retrieve all data.
Are there any experiments on how fast and how accurately we can export from Pinecone?

Qdrant import / collection not working

Details

curl -L -X POST 'http://localhost:6333/collections/my_imported_collection/points/search' \
  -H 'Content-Type: application/json' \
  --data-raw '{ "vector": [0.69, 0.69, 0.59, 0.74, 0.18, 0.44, 0.91, 0.76, ...], "top": 3 }'
(the full 1536-dimensional query vector is elided here for readability)
{"status":{"error":"Wrong input: Vector params for  are not specified in config"},"time":0.001418866}

The config.json of the imported collection looks strange to me:


{
    "params": {
        "vectors": {
            "vector": {
                "size": 1536,
                "distance": "Cosine"
            }
        },
        "shard_number": 1,
        "replication_factor": 1,
        "write_consistency_factor": 1,
        "on_disk_payload": true
    },
    "hnsw_config": {
        "m": 16,
        "ef_construct": 100,
        "full_scan_threshold": 10000,
        "max_indexing_threads": 0,
        "on_disk": false
    },
    "optimizer_config": {
        "deleted_threshold": 0.2,
        "vacuum_min_vector_number": 1000,
        "default_segment_number": 0,
        "max_segment_size": null,
        "memmap_threshold": null,
        "indexing_threshold": 20000,
        "flush_interval_sec": 5,
        "max_optimization_threads": null
    },
    "wal_config": {
        "wal_capacity_mb": 32,
        "wal_segments_ahead": 0
    },
    "quantization_config": null
}

The reason for this issue is probably the named-vector config "vectors": { "vector": { ... } } instead of:

{
    "params": {
        "vectors": {
                "size": 1536,
                "distance": "Cosine"
        },
...}
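
To confirm the diagnosis: with a named-vector config, a search must reference the vector by name. A minimal sketch with qdrant-client (collection and vector names taken from the config above; the query vector is illustrative):

import random
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
# With "vectors": {"vector": {...}}, the query must address the vector by name;
# a bare list only works for collections with a single unnamed vector.
hits = client.search(
    collection_name="my_imported_collection",
    query_vector=("vector", [random.random() for _ in range(1536)]),
    limit=3,
)
print(hits)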

Branch

No response

Checklist
  • Modify src/vdf_io/import_vdf/qdrant_import.py (01405c0)

Sweep: The export from Pinecone fails due to a data type error

Details

Fetching namespaces: 0% 0/1 [02:54<?, ?it/s]
Error: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 89, in main
    run_export(span)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf_cli.py", line 149, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 164, in export_vdb
    pinecone_export.get_data()
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 481, in get_data
    index_meta = self.get_data_for_index(index_name)
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/pinecone_export.py", line 575, in get_data_for_index
    total_size += self.save_vectors_to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/vdf_io/export_vdf/vdb_export_cls.py", line 87, in save_vectors_to_parquet
    df.to_parquet(parquet_file)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 2970, in to_parquet
    return to_parquet(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 483, in to_parquet
    impl.write(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parquet.py", line 189, in write
    table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
  File "pyarrow/table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 624, in dataframe_to_arrays
    arrays[i] = maybe_fut.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 598, in convert_column
    raise e
  File "/usr/local/lib/python3.10/dist-packages/pyarrow/pandas_compat.py", line 592, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 340, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 86, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ("Could not convert '1719697028.0' with type str: tried to convert to double", 'Conversion failed for column created_at with type object')
Exporting fluidaigpt-dev: 0% 0/1 [02:56<?, ?it/s]
Final Step: Fetching vectors: 100% 14404/14404 [02:39<00:00, 90.24it/s]
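
The failing column is created_at, which apparently mixes strings like '1719697028.0' with numbers, so pyarrow cannot infer a single Arrow type. A minimal workaround sketch (not the library's actual fix): normalize mixed-type object columns before the to_parquet call in save_vectors_to_parquet:

import pandas as pd

def normalize_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Cast object columns holding more than one Python type to plain strings,
    # so pyarrow can infer a single Arrow type per column.
    for col in df.columns:
        if df[col].dtype == object:
            value_types = df[col].dropna().map(type).unique()
            if len(value_types) > 1:
                df[col] = df[col].astype(str)
    return df

# df = normalize_object_columns(df)
# df.to_parquet(parquet_file)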

Branch

No response

Add support for pgvector

Follow the guidelines at https://github.com/AI-Northstar-Tech/vector-io#adding-a-new-vector-database to implement support for PGVector in Vector-io

Join the Discord server for the library at https://discord.gg/RZbXha62Fg, and ask any questions on the #vector-io-dev channel.

Checklist of features for completion

  • Add mapping of distance metric names
  • Support local and cloud instances
  • Automatically create Python classes for the index being exported
  • Export
    • Get all indexes by default
    • Option to specify index names to export
    • DB-specific command-line options (make_parser)
    • Allow input on the terminal for each option above (via input() in Python) in export_vdb
    • Handle multiple vectors per row
  • Import
    • DB-specific command-line options (make_parser)
    • Handle multiple vectors per row
    • Allow input on the terminal for each option above (via input() in Python) in export_vdb

PGVector python client documentation: https://github.com/pgvector/pgvector-python
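
As a starting point, a minimal sketch of insert and nearest-neighbor query with pgvector-python's psycopg integration (the table, dimensions, and connection details are hypothetical):

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect(dbname="vdf_test", autocommit=True)  # hypothetical database
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # teaches psycopg to adapt numpy arrays to the vector type

conn.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))")
conn.execute("INSERT INTO items (embedding) VALUES (%s)", (np.array([1.0, 2.0, 3.0]),))

# <-> is L2 distance, <=> is cosine distance, <#> is negative inner product —
# the distance-metric mapping the checklist above asks for.
row = conn.execute(
    "SELECT id FROM items ORDER BY embedding <=> %s LIMIT 5",
    (np.array([1.0, 2.0, 3.0]),),
).fetchone()
print(row)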

For any questions, contact @dhruv-anand-aintech on Linkedin (https://www.linkedin.com/in/dhruv-anand-ainorthstartech/)

Is it possible to skip namespaces with errors and move on to the next ones?

Upon running the export script, I ran into this error and the script stopped. Is there any way to move on to the next namespace in case an error occurs while exporting?

Error: 1 validation error for NamespaceMeta
metric
  Field required [type=missing, input_value={'namespace': 'dev-clsika...ey96iff1y30/i1.parquet'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
Traceback (most recent call last):
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/export_vdf_cli.py", line 56, in main
    run_export(span)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/export_vdf_cli.py", line 123, in run_export
    export_obj = slug_to_export_func[args["vector_database"]](args)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/export_vdf/pinecone_export.py", line 118, in export_vdb
    pinecone_export.get_data()
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/export_vdf/pinecone_export.py", line 443, in get_data
    index_meta = self.get_data_for_index(index_name)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/export_vdf/pinecone_export.py", line 540, in get_data_for_index
    namespace_meta = NamespaceMeta(
  File "/home/neeraj/.local/lib/python3.10/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for NamespaceMeta
metric
  Field required [type=missing, input_value={'namespace': 'dev-clsika...ey96iff1y30/i1.parquet'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
Final Step: Fetching vectors: 172it [00:02, 64.60it/s]
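
There is no skip flag today as far as I can tell; conceptually, the fix is to wrap the per-namespace export in a try/except and continue. A hypothetical sketch of the control flow (the function names are illustrative, not the actual pinecone_export.py internals):

import logging

for namespace in namespaces:  # namespaces as enumerated by the exporter
    try:
        export_namespace(namespace)  # stand-in for the per-namespace export logic
    except Exception as e:
        logging.warning("Skipping namespace %s due to error: %s", namespace, e)
        continue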

KDB insert problem

Facing this issue while trying to upsert some data to kdb.ai

@qynikos, could you have a look? It's on v0.1.101

with this command:

import_vdf \
  --id_column PMID \
  --subset \
  --max_num_rows 200 \
  --hf_dataset somewheresystems/dataclysm-pubmed \
  --vector_columns title_embedding,abstract_embedding \
  kdbai \
  --url https://cloud.kdb.ai/instance/n6qap7ddvz
Error: Error inserting chunk: Failed to insert data in table named: dataclysm_pubmed, because of: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>openresty/1.19.9.1</center>
</body>
</html>
.
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/kdbai_client/api.py", line 536, in insert
    return self.session._rest_post_qipc(Session.INSERT_PATH, self.name, data, True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/kdbai_client/api.py", line 341, in _rest_post_qipc
    res = request.urlopen(req)
          ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Bad Gateway

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/vdf_io/import_vdf/kdbai_import.py", line 203, in upsert_data
    table.insert(chunk)
  File "/opt/homebrew/lib/python3.11/site-packages/kdbai_client/api.py", line 538, in insert
    raise KDBAIException(f'Failed to insert data in table named: {self.name}.', e=e)
kdbai_client.api.KDBAIException: Failed to insert data in table named: dataclysm_pubmed, because of: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>openresty/1.19.9.1</center>
</body>
</html>
.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/vdf_io/import_vdf_cli.py", line 53, in main
    run_import(span)
  File "/opt/homebrew/lib/python3.11/site-packages/vdf_io/import_vdf_cli.py", line 131, in run_import
    import_obj = slug_to_import_func[args["vector_database"]](args)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/vdf_io/import_vdf/kdbai_import.py", line 47, in import_vdb
    kdbai_import.upsert_data()
  File "/opt/homebrew/lib/python3.11/site-packages/vdf_io/import_vdf/kdbai_import.py", line 213, in upsert_data
    raise RuntimeError(f"Error inserting chunk: {e}")
RuntimeError: Error inserting chunk: Failed to insert data in table named: dataclysm_pubmed, because of: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>openresty/1.19.9.1</center>
</body>
</html>
.
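
The 502 comes from the KDB.AI cloud gateway rather than the client, so retrying with backoff around the insert may be enough to ride out transient failures. A minimal sketch, assuming the table and chunk objects from kdbai_import.py in the traceback above:

import time

def insert_with_retry(table, chunk, max_attempts=5):
    # Retry transient gateway errors (e.g. 502) with exponential backoff.
    for attempt in range(max_attempts):
        try:
            return table.insert(chunk)
        except Exception as e:
            if attempt == max_attempts - 1:
                raise
            wait = 2 ** attempt
            print(f"Insert failed ({e}); retrying in {wait}s")
            time.sleep(wait)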

Error: 'str' object has no attribute 'starts_with' while importing to Pinecone serverless

Hi, I exported data from a pod-based index and replaced a key and value in VDF_META.json so I can import it into a specific serverless index:

Before

    "indexes": {
        "namespace1": [
            {
                "namespace": "dev-clsix3wzg00052eeep6bdgvcl",
                "index_name": "namespace1",
                "total_vector_count": 84,
                "exported_vector_count": 84,
                "dimensions": 1536,
                "model_name": "text-embedding-ada-002",
                "vector_columns": [
                    "vector"
                ],
                "data_path": "namespace1_dev-clsix3wzg00052eeep6bdgvcl/i1.parquet",
                "metric": "Cosine"
            }
        ]
    },

After

    "indexes": {
        "namespace1-serverless": [
            {
                "namespace": "dev-clsix3wzg00052eeep6bdgvcl",
                "index_name": "namespace1-serverless",
                "total_vector_count": 84,
                "exported_vector_count": 84,
                "dimensions": 1536,
                "model_name": "text-embedding-ada-002",
                "vector_columns": [
                    "vector"
                ],
                "data_path": "namespace1_dev-clsix3wzg00052eeep6bdgvcl/i1.parquet",
                "metric": "Cosine"
            }
        ]
    },

When I then tried to import by running import_vdf pinecone --serverless, I ran into this error:

Error: 'str' object has no attribute 'starts_with'
Traceback (most recent call last):
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/import_vdf_cli.py", line 52, in main
    run_import(span)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/import_vdf_cli.py", line 129, in run_import
    import_obj = slug_to_import_func[args["vector_database"]](args)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/import_vdf/pinecone_import.py", line 80, in import_vdb
    pinecone_import.upsert_data()
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/import_vdf/pinecone_import.py", line 172, in upsert_data
    parquet_files = self.get_parquet_files(final_data_path)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/import_vdf/vdf_import_cls.py", line 77, in get_parquet_files
    return get_parquet_files(data_path, self.args)
  File "/home/neeraj/.local/lib/python3.10/site-packages/vdf_io/util.py", line 196, in get_parquet_files
    if args.get("hf_dataset", None) or data_path.starts_with("hf://"):
AttributeError: 'str' object has no attribute 'starts_with'. Did you mean: 'startswith'?
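
The traceback pinpoints the bug: Python strings have a startswith method, not starts_with. The offending line in vdf_io/util.py (line 196 above) should read:

# vdf_io/util.py, line 196 — str has no starts_with method
if args.get("hf_dataset", None) or data_path.startswith("hf://"):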

Add Support for Weaviate

Follow the guidelines at https://github.com/AI-Northstar-Tech/vector-io#adding-a-new-vector-database to implement support for Weaviate in Vector-io

Join the Discord server for the library at https://discord.gg/RZbXha62Fg, and ask any questions on the #vector-io-dev channel.

Checklist of features for completion

  • Add mapping of distance metric names
  • Support local and cloud instances
  • Automatically create Python classes for the index being exported
  • Export
    • Get all indexes by default
    • Option to specify index names to export
    • DB-specific command-line options (make_parser)
    • Allow input on the terminal for each option above (via input() in Python) in export_vdb
    • Handle multiple vectors per row
  • Import
    • DB-specific command-line options (make_parser)
    • Handle multiple vectors per row
    • Allow input on the terminal for each option above (via input() in Python) in export_vdb
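
As a starting point for the export side, a minimal sketch that pages through a collection with the weaviate-client v4 API (the collection name is hypothetical):

import weaviate

client = weaviate.connect_to_local()  # assumes a local instance on localhost:8080
try:
    collection = client.collections.get("MyCollection")  # hypothetical name
    # iterator() pages through every object; include_vector=True also returns
    # the stored vectors, which is what an exporter needs.
    for obj in collection.iterator(include_vector=True):
        print(obj.uuid, obj.properties, obj.vector)
finally:
    client.close()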

Pinecone Import: Multiple matches for FieldRef.Name(__filename) in id: string

Attached is one of the parquet files generated from a Pinecone export. When I try to re-import it, I get these errors about duplicate fields:

Multiple matches for FieldRef.Name(__filename) in
  id: string
  vector: list<element: double>
  __filename: string
  __ingested_at: string
  content_id: string
  filename: string
  ingested_at: string
  text: string
  __fragment_index: int32
  __batch_index: int32
  __last_in_fragment: bool
  __filename: string

i2.parquet.zip
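
A possible workaround until the exporter is fixed: drop the duplicated column before re-importing. A minimal sketch with pyarrow (file names taken from the attachment above):

import pyarrow.parquet as pq

table = pq.read_table("i2.parquet")

# Keep only the first occurrence of each column name (__filename appears twice).
seen, keep = set(), []
for i, name in enumerate(table.column_names):
    if name not in seen:
        seen.add(name)
        keep.append(i)

pq.write_table(table.select(keep), "i2.dedup.parquet")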

"keep-alive" script for each cloud VDB

Free trials/free tiers of cloud VDB offerings have a fixed idle period after which they take your free cluster/index offline. This script would perform some no-op actions on the index to keep it alive, and could be run as a daily cron job. See the sketch below.
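
For example, a minimal sketch for a Qdrant cloud instance (the env var names are hypothetical; other VDBs would get an analogous cheap read-only call):

import os
from qdrant_client import QdrantClient

# Run daily via cron, e.g.: 0 9 * * * python keepalive_qdrant.py
client = QdrantClient(
    url=os.environ["QDRANT_URL"],            # hypothetical env vars
    api_key=os.environ.get("QDRANT_API_KEY"),
)
# A cheap read is usually enough to register activity on a free tier.
print(client.get_collections())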
