
streaming-client's Introduction

DagsHub Client



What is DagsHub?

DagsHub is a platform where machine learning and data science teams can build, manage, and collaborate on their projects. With DagsHub you can:

  1. Version code, data, and models in one place. Use the free DagsHub-provided storage or connect it to your cloud storage
  2. Track experiments using Git, DVC, or MLflow to provide a fully reproducible environment
  3. Visualize pipelines, data, and notebooks in an interactive, diff-able, and dynamic way
  4. Label your data directly on the platform using Label Studio
  5. Share your work with your team members
  6. Stream and upload your data in an intuitive and easy way, while preserving versioning and structure.

DagsHub is built firmly around open, standard formats for your project, in particular Git, DVC, MLflow, and Label Studio. Therefore, you can work with DagsHub regardless of your chosen programming language or frameworks.

DagsHub Client API & CLI

This client library is meant to help you get started quickly with DagsHub. It is made up of Experiment tracking and Direct Data Access (DDA), a component to let you stream and upload your data.

For more details on the different functions of the client, check out the docs segments:

  1. Installation & Setup
  2. Data Streaming
  3. Data Upload
  4. Experiment Tracking
    1. Autologging
  5. Data Engine

Some functionality is supported only in Python.

To read about some of the awesome use cases for Direct Data Access, check out the relevant doc page.

Installation

pip install dagshub

Direct Data Access (DDA) functionality requires authentication, which you can easily do by running the following command in your terminal:

dagshub login

Quickstart for Data Streaming

The easiest way to start using DagsHub is via the Python Hooks method. To do this:

  1. Clone your DagsHub project,
  2. Copy the following 2 lines of code into your Python code which accesses your data:
    from dagshub.streaming import install_hooks
    install_hooks()
  3. That’s it! You now have streaming access to all your project files (see the sketch below).
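Once the hooks are installed, plain Python file access works on repo files that only exist remotely, fetching them on demand. A minimal sketch, assuming you run it from the root of a cloned DagsHub repo (the data/train.csv path is illustrative):

from dagshub.streaming import install_hooks

install_hooks()

# Regular built-in open() now transparently fetches tracked files from
# DagsHub on first read if they are not present locally:
with open("data/train.csv") as f:
    print(f.readline())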

🀩 Check out the example Colab notebook to see Data Streaming work end to end.

Next Steps

You can dive into the expanded documentation to learn more about data streaming, data upload, and experiment tracking with DagsHub.


Analytics

To improve your experience, we collect analytics on client usage. If you want to disable analytics collection, set the DAGSHUB_DISABLE_ANALYTICS environment variable to any value.
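For example, a minimal sketch of opting out from Python, setting the variable before the client is imported (the assumption being that it may be read at import time):

import os

# Any value disables analytics collection
os.environ["DAGSHUB_DISABLE_ANALYTICS"] = "1"

import dagshub  # imported only after the variable is set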

Made with 🐢 by DagsHub.

streaming-client's People

Contributors

arjvik, deanp70, guysmoilov, idonov8, jacob-zietek, jinensetpal, kbolashev, krishnaduttpanchagnula, martintali, mohithg, nirbarazida, pyup-bot, sdafni, simonlsk, talmalka123


streaming-client's Issues

Develop CLI Interface

Ensure we fix:
[ ] Mounting / Unmounting interface.
[ ] Identify if FUSE / monkeypatching should be used for interfacing

Metadata field can't handle strings longer than 255 characters

I tried to upload image captions as metadata, the idea being that I could then filter the dataset based on the contents of the captions. I ran into an error:

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 542247/542247 [00:00<00:00, 2294043.12it/s]
  0%|                                                                                                                                                                                                                                                                   | 0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TransportQueryError                       Traceback (most recent call last)
Cell In[13], line 15
     12 for start in tqdm(range(0, total, batch)):
     13     data = all_metadata[start:start+batch]
---> 15     with ds.metadata_context() as ctx, open(annotations_file) as f:
     16         for image, metadata in data:
     17             ctx.update_metadata(image, metadata)

File ~/.miniforge3/envs/dagstest/lib/python3.10/contextlib.py:142, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    140 if typ is None:
    141     try:
--> 142         next(self.gen)
    143     except StopIteration:
    144         return False

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource.py:118, in Datasource.metadata_context(self)
    116 ctx = MetadataContextManager(self)
    117 yield ctx
--> 118 self._upload_metadata(ctx.get_metadata_entries())

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource.py:183, in Datasource._upload_metadata(self, metadata_entries)
    182 def _upload_metadata(self, metadata_entries: List[DatapointMetadataUpdateEntry]):
--> 183     self.source.client.update_metadata(self, metadata_entries)

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/client/data_client.py:109, in DataClient.update_metadata(self, datasource, entries)
    102 assert len(entries) > 0
    104 params = GqlMutations.update_metadata_params(
    105     datasource_id=datasource.source.id,
    106     datapoints=[e.to_dict() for e in entries]
    107 )
--> 109 return self._exec(q, params)

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/client/data_client.py:82, in DataClient._exec(self, query, params)
     80     logger.debug(f"Params: {params}")
     81 q = gql.gql(query)
---> 82 resp = self.client.execute(q, variable_values=params)
     83 return resp

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:403, in Client.execute(self, document, variable_values, operation_name, serialize_variables, parse_result, get_execution_result, **kwargs)
    400     return data
    402 else:  # Sync transports
--> 403     return self.execute_sync(
    404         document,
    405         variable_values=variable_values,
    406         operation_name=operation_name,
    407         serialize_variables=serialize_variables,
    408         parse_result=parse_result,
    409         get_execution_result=get_execution_result,
    410         **kwargs,
    411     )

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:221, in Client.execute_sync(self, document, variable_values, operation_name, serialize_variables, parse_result, get_execution_result, **kwargs)
    219 """:meta private:"""
    220 with self as session:
--> 221     return session.execute(
    222         document,
    223         variable_values=variable_values,
    224         operation_name=operation_name,
    225         serialize_variables=serialize_variables,
    226         parse_result=parse_result,
    227         get_execution_result=get_execution_result,
    228         **kwargs,
    229     )

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:860, in SyncClientSession.execute(self, document, variable_values, operation_name, serialize_variables, parse_result, get_execution_result, **kwargs)
    858 # Raise an error if an error is returned in the ExecutionResult object
    859 if result.errors:
--> 860     raise TransportQueryError(
    861         str(result.errors[0]),
    862         errors=result.errors,
    863         data=result.data,
    864         extensions=result.extensions,
    865     )
    867 assert (
    868     result.data is not None
    869 ), "Transport returned an ExecutionResult without data or errors"
    871 if get_execution_result:

TransportQueryError: {'message': 'pq: value too long for type character varying(255)', 'path': ['updateMetadata']}

This was the code:

annotations_file = 'labels.tsv'

all_metadata = []
with open(annotations_file) as f:
    for row in tqdm(f.readlines()):
        image, caption, score = row.split('\t')[:3]
    all_metadata.append((image, {'caption': caption, 'score': score}))

total = len(all_metadata)

batch = 1000
for start in tqdm(range(0, total, batch)):
    data = all_metadata[start:start+batch]

    with ds.metadata_context() as ctx, open(annotations_file) as f:
        for image, metadata in data:
            ctx.update_metadata(image, metadata)
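Until longer values are supported, a possible client-side workaround is to clip string metadata to the 255-character limit reported by the server (a sketch building on the code above; clip is a hypothetical helper):

MAX_LEN = 255  # matches 'character varying(255)' from the error message

def clip(value, limit=MAX_LEN):
    """Truncate string metadata values so the backend column accepts them."""
    return value[:limit] if isinstance(value, str) else value

with ds.metadata_context() as ctx:
    for image, metadata in data:
        ctx.update_metadata(image, {k: clip(v) for k, v in metadata.items()})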

Import metadata from dataframe

Ingesting metadata should be available with a syntax like the following:

import pandas as pd
import dagshub.dataengine

df = pd.read_csv("metadata.csv")
ds = dagshub.dataengine.get_datasource("images-ds")
ds.upload_metadata_from_dataframe(df, path_column="file_path")
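Until that lands, roughly the same behavior can be sketched as a shim over the existing metadata_context API (upload_metadata_from_dataframe is the proposed name above, not a shipped function):

import pandas as pd

def upload_metadata_from_dataframe(ds, df: pd.DataFrame, path_column: str = "file_path"):
    """Sketch of the proposed call on top of the existing metadata_context API."""
    metadata_columns = [c for c in df.columns if c != path_column]
    with ds.metadata_context() as ctx:
        for _, row in df.iterrows():
            ctx.update_metadata(row[path_column], {c: row[c] for c in metadata_columns})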

List datasources API

Function to list all my available datasources for a repository.
Example:

sources = datasources.get_datasources(repo="simon/baby-yoda")
ds = sources[0]
ds...

Create a DagsHub plugin for Voxel to send to annotation

Being able to select a list of samples from Voxel, go to the dagshub plugin, then click "send to annotation"

  • A panel named dagshub that can see the selected samples
  • Has a "send to annotation" button
  • Creates a new LS project and workspace if needed
  • Creates tasks (hook onto the existing Voxel flow for that or not)
  • Has a select box to choose the ML backend for the project (placeholder for now)

Adding large amounts of metadata does not work

As a stress test, I have a repo with 542,247 images in it and wanted to add metadata to a data source. I ran the following code from a Jupyter notebook:

# Set up DagsHub
import os
os.environ["DAGSHUB_CLIENT_HOST"] = "https://test.dagshub.com"

from dagshub.data_engine.model import datasources

repo = "yonomitt/LAION-Aesthetics-V2-6.5plus"
image_root = "data"
try:
    ds = datasources.get_datasource(repo=repo, name="images")
except:
    ds = datasources.create_from_repo(repo=repo, name="images", path=image_root)


# Imports
from tqdm import tqdm


# Add metadata
annotations_file = 'labels.tsv'

with ds.metadata_context() as ctx, open(annotations_file) as f:
    for row in tqdm(f.readlines()):
        image, caption, score = row.split('\t')[:3]
        ctx.update_metadata(image, {'caption': caption, 'score': score})

The first time I ran this, it never returned (I waited several hours). The second time, I got a 502:

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 542247/542247 [00:01<00:00, 342020.81it/s]
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    970 try:
--> 971     return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File ~/.miniforge3/envs/dagstest/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~/.miniforge3/envs/dagstest/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~/.miniforge3/envs/dagstest/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/transport/requests.py:243, in RequestsHTTPTransport.execute(self, document, variable_values, operation_name, timeout, extra_args, upload_files)
    242 try:
--> 243     result = response.json()
    245     if log.isEnabledFor(logging.INFO):

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    972 except JSONDecodeError as e:
    973     # Catch JSON-related errors and raise as requests.JSONDecodeError
    974     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)
File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/transport/requests.py:231, in RequestsHTTPTransport.execute.<locals>.raise_response_error(resp, reason)
    229 try:
    230     # Raise a HTTPError if response status is 400 or higher
--> 231     resp.raise_for_status()
    232 except requests.HTTPError as e:

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 502 Server Error: Bad Gateway for url: https://test.dagshub.com/api/v1/repos/yonomitt/LAION-Aesthetics-V2-6.5plus/data-engine/graphql

The above exception was the direct cause of the following exception:

TransportServerError                      Traceback (most recent call last)
Cell In[10], line 3
      1 annotations_file = 'labels.tsv'
----> 3 with ds.metadata_context() as ctx, open(annotations_file) as f:
      4     for row in tqdm(f.readlines()):
      5         image, caption, score = row.split('\t')[:3]

File ~/.miniforge3/envs/dagstest/lib/python3.10/contextlib.py:142, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    140 if typ is None:
    141     try:
--> 142         next(self.gen)
    143     except StopIteration:
    144         return False

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource.py:118, in Datasource.metadata_context(self)
    116 ctx = MetadataContextManager(self)
    117 yield ctx
--> 118 self._upload_metadata(ctx.get_metadata_entries())

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource.py:183, in Datasource._upload_metadata(self, metadata_entries)
    182 def _upload_metadata(self, metadata_entries: List[DatapointMetadataUpdateEntry]):
--> 183     self.source.client.update_metadata(self, metadata_entries)

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/client/data_client.py:109, in DataClient.update_metadata(self, datasource, entries)
    102 assert len(entries) > 0
    104 params = GqlMutations.update_metadata_params(
    105     datasource_id=datasource.source.id,
    106     datapoints=[e.to_dict() for e in entries]
    107 )
--> 109 return self._exec(q, params)

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/client/data_client.py:82, in DataClient._exec(self, query, params)
     80     logger.debug(f"Params: {params}")
     81 q = gql.gql(query)
---> 82 resp = self.client.execute(q, variable_values=params)
     83 return resp

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:403, in Client.execute(self, document, variable_values, operation_name, serialize_variables, parse_result, get_execution_result, **kwargs)
    400     return data
    402 else:  # Sync transports
--> 403     return self.execute_sync(
    404         document,
    405         variable_values=variable_values,
    406         operation_name=operation_name,
    407         serialize_variables=serialize_variables,
    408         parse_result=parse_result,
    409         get_execution_result=get_execution_result,
    410         **kwargs,
    411     )

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:221, in Client.execute_sync(self, document, variable_values, operation_name, serialize_variables, parse_result, get_execution_result, **kwargs)
    219 """:meta private:"""
    220 with self as session:
--> 221     return session.execute(
    222         document,
    223         variable_values=variable_values,
    224         operation_name=operation_name,
    225         serialize_variables=serialize_variables,
    226         parse_result=parse_result,
    227         get_execution_result=get_execution_result,
    228         **kwargs,
    229     )

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:849, in SyncClientSession.execute(self, document, variable_values, operation_name, serialize_variables, parse_result, get_execution_result, **kwargs)
    829 """Execute the provided document AST synchronously using
    830 the sync transport.
    831 
   (...)
    845 
    846 The extra arguments are passed to the transport execute method."""
    848 # Validate and execute on the transport
--> 849 result = self._execute(
    850     document,
    851     variable_values=variable_values,
    852     operation_name=operation_name,
    853     serialize_variables=serialize_variables,
    854     parse_result=parse_result,
    855     **kwargs,
    856 )
    858 # Raise an error if an error is returned in the ExecutionResult object
    859 if result.errors:

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/client.py:758, in SyncClientSession._execute(self, document, variable_values, operation_name, serialize_variables, parse_result, **kwargs)
    748         if serialize_variables or (
    749             serialize_variables is None and self.client.serialize_variables
    750         ):
    751             variable_values = serialize_variable_values(
    752                 self.client.schema,
    753                 document,
    754                 variable_values,
    755                 operation_name=operation_name,
    756             )
--> 758 result = self.transport.execute(
    759     document,
    760     variable_values=variable_values,
    761     operation_name=operation_name,
    762     **kwargs,
    763 )
    765 # Unserialize the result if requested
    766 if self.client.schema:

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/transport/requests.py:249, in RequestsHTTPTransport.execute(self, document, variable_values, operation_name, timeout, extra_args, upload_files)
    246         log.info("<<< %s", response.text)
    248 except Exception:
--> 249     raise_response_error(response, "Not a JSON answer")
    251 if "errors" not in result and "data" not in result:
    252     raise_response_error(response, 'No "data" or "errors" keys in answer')

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/gql/transport/requests.py:233, in RequestsHTTPTransport.execute.<locals>.raise_response_error(resp, reason)
    231     resp.raise_for_status()
    232 except requests.HTTPError as e:
--> 233     raise TransportServerError(str(e), e.response.status_code) from e
    235 result_text = resp.text
    236 raise TransportProtocolError(
    237     f"Server did not return a GraphQL result: "
    238     f"{reason}: "
    239     f"{result_text}"
    240 )

TransportServerError: 502 Server Error: Bad Gateway for url: https://test.dagshub.com/api/v1/repos/yonomitt/LAION-Aesthetics-V2-6.5plus/data-engine/graphql
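A possible workaround until very large mutations are handled server-side is to split the same loop into smaller metadata_context batches, so each GraphQL request stays small (a sketch; the batch size of 1,000 is a guess, not a documented limit):

# Read all rows first, then upload in batches
with open(annotations_file) as f:
    rows = [row.split('\t')[:3] for row in f.readlines()]

batch = 1000
for start in tqdm(range(0, len(rows), batch)):
    with ds.metadata_context() as ctx:  # one upload per context exit
        for image, caption, score in rows[start:start + batch]:
            ctx.update_metadata(image, {'caption': caption, 'score': score})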

KeyError exception when trying to open a file that doesn't exist

Example trace for `img = Image.open("data/raw/images/199.png")`:

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

--- Logging error ---
--- Logging error ---
--- Logging error ---
--- Logging error ---
--- Logging error ---
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/dagshub/streaming/filesystem.py", line 208, in stat
    return self.__stat(relative_path, dir_fd=self.project_root_fd)
FileNotFoundError: [Errno 2] No such file or directory: '<ipython-input-30-4f1112fdef07>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/linecache.py", line 74, in checkcache
    stat = os.stat(fullname)
  File "/usr/local/lib/python3.7/dist-packages/dagshub/streaming/filesystem.py", line 210, in stat
    if str(relative_path.name) not in self.dirtree[str(relative_path.parent)]:
KeyError: '.'

Original exception was:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/dagshub/streaming/filesystem.py", line 181, in open
    return self.__open(relative_path, mode, *args, **kwargs, opener=project_root_opener)
FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/images/199.png'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-30-4f1112fdef07>", line 6, in <module>
  File "/usr/local/lib/python3.7/dist-packages/PIL/Image.py", line 2843, in open
    fp = builtins.open(filename, "rb")
  File "/usr/local/lib/python3.7/dist-packages/dagshub/streaming/filesystem.py", line 193, in open
    raise FileNotFoundError(f'Error finding {relative_path} in repo or on DagsHub')
FileNotFoundError: Error finding data/raw/images/199.png in repo or on DagsHub

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2040, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'FileNotFoundError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/dagshub/streaming/filesystem.py", line 208, in stat
    return self.__stat(relative_path, dir_fd=self.project_root_fd)
FileNotFoundError: [Errno 2] No such file or directory: '<ipython-input-30-4f1112fdef07>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/ultratb.py", line 1101, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/ultratb.py", line 319, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/ultratb.py", line 353, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "/usr/lib/python3.7/inspect.py", line 1502, in getinnerframes
    frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
  File "/usr/lib/python3.7/inspect.py", line 1460, in getframeinfo
    filename = getsourcefile(frame) or getfile(frame)
  File "/usr/lib/python3.7/inspect.py", line 693, in getsourcefile
    if os.path.exists(filename):
  File "/usr/lib/python3.7/genericpath.py", line 19, in exists
    os.stat(path)
  File "/usr/local/lib/python3.7/dist-packages/dagshub/streaming/filesystem.py", line 210, in stat
    if str(relative_path.name) not in self.dirtree[str(relative_path.parent)]:
KeyError: '.'

self.dirtree[str(relative_path.parent)] should probably change to self.dirtree.get(str(relative_path.parent))
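A minimal sketch of that guard, assuming dirtree maps directory paths to lists of child names as the traceback suggests:

from pathlib import Path

def path_in_dirtree(dirtree: dict, relative_path: Path) -> bool:
    """Return True iff relative_path is listed in the cached directory tree.

    Using .get() with a default avoids the KeyError: '.' raised when the
    parent directory was never cached.
    """
    siblings = dirtree.get(str(relative_path.parent), [])
    return str(relative_path.name) in siblings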

`InvalidPathFormatError` when calling `ds.all()`

My code:

len(ds.all().dataframe)

My repo:

https://test.dagshub.com/yonomitt/LAION-Aesthetics-V2-6.5plus/src/main/data

My error:

---------------------------------------------------------------------------
InvalidPathFormatError                    Traceback (most recent call last)
Cell In[15], line 1
----> 1 len(ds.all().dataframe)

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/client/dataclasses.py:77, in QueryResult.dataframe(self)
     75 for e in self.entries:
     76     names.append(e.path)
---> 77     urls.append(e.download_url(self.datasource))
     78     metadata_keys.update(e.metadata.keys())
     80 res = pd.DataFrame({"name": names, "dagshub_download_url": urls})

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/client/dataclasses.py:21, in Datapoint.download_url(self, ds)
     20 def download_url(self, ds: "Datasource"):
---> 21     return ds.source.raw_path(self)

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource_state.py:93, in DatasourceState.raw_path(self, path)
     89 """
     90 Returns the url for the download path of a specified path
     91 """
     92 path = self._extract_path(path).strip("/")
---> 93 return self.root_raw_path + "/" + path

File ~/.miniforge3/envs/dagstest/lib/python3.10/functools.py:981, in cached_property.__get__(self, instance, owner)
    979 val = cache.get(self.attrname, _NOT_FOUND)
    980 if val is _NOT_FOUND:
--> 981     val = self.func(instance)
    982     try:
    983         cache[self.attrname] = val

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource_state.py:111, in DatasourceState.root_raw_path(self)
    104 @cached_property
    105 def root_raw_path(self):
    106     """
    107     Returns the root raw path of the dataset for downloading files
    108     This is just a "prefix" of the datasource relative to the repo.
    109     In order to build a path of an entity you need to concatenate the path to this root
    110     """
--> 111     return self._root_path("raw")

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource_state.py:115, in DatasourceState._root_path(self, path_type)
    113 def _root_path(self, path_type):
    114     assert path_type in ["raw", "content"]
--> 115     parts = self.path_parts()
    116     if self.source_type == DatasourceType.BUCKET:
    117         path_elems = [parts["schema"], parts["bucket"]]

File ~/.miniforge3/envs/dagstest/lib/python3.10/site-packages/dagshub/data_engine/model/datasource_state.py:145, in DatasourceState.path_parts(self)
    143 match = regex.fullmatch(self.path)
    144 if match is None:
--> 145     raise InvalidPathFormatError(f"{self.path} is not valid path format for type {self.source_type}.\n"
    146                                  f"Expected format: {expected_formats[self.source_type]}")
    147 return match.groupdict()

InvalidPathFormatError: repo://yonomitt/LAION-Aesthetics-V2-6.5plus/data is not valid path format for type DatasourceType.REPOSITORY.
Expected format: repo://owner/reponame/prefix
