
Intake Integration, jupyterlab-data-explorer issue (8 comments, closed)

jupyterlab commented on September 28, 2024

Intake Integration

from jupyterlab-data-explorer.

Comments (8)

martindurant commented on September 28, 2024

> Create new repo for this work

I have no preference about where this lives: the jupyterlab org, another related org, or Intake are all fine.

saulshanabrook commented on September 28, 2024

I am chatting with @danielballan about this issue. We have come up with a plan!

Intake discovers catalogues on the system by looking for .yml files in certain directories. There is an open issue (intake/intake#404) to also discover catalogues in Python packages via an intake.catalogues entry point.
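A minimal sketch of the two discovery paths, assuming catalogue files sit directly inside a list of search directories and that packages would advertise catalogues under the proposed intake.catalogues entry-point group (intake/intake#404 is still open, so that group name is an assumption). The helper names are hypothetical, and this is not Intake's own discovery code.

```python
from importlib import metadata
from pathlib import Path
from typing import Iterable, List


def find_yaml_catalogues(search_dirs: Iterable[str]) -> List[Path]:
    """Return .yml/.yaml files found directly inside each search directory."""
    found: List[Path] = []
    for directory in search_dirs:
        path = Path(directory)
        if not path.is_dir():
            continue  # skip search paths that do not exist
        for pattern in ("*.yml", "*.yaml"):
            found.extend(sorted(path.glob(pattern)))
    return found


def find_entry_point_catalogues(group: str = "intake.catalogues") -> List[str]:
    """Return entry-point names registered under `group` (proposed group name)."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        selected = eps.select(group=group)  # Python 3.10+
    else:
        selected = eps.get(group, [])  # Python 3.8/3.9 return a plain dict
    return sorted(ep.name for ep in selected)
```

The data explorer could then show one node per discovered file or package, or merge them as Intake already does.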

So what we can do is launch an Intake server behind jupyter-server-proxy to serve Intake's HTTP API. We can connect to it from a JupyterLab plugin and register a top-level Intake dataset. Users should be able to see the catalogues within this dataset and expand them recursively. For data sources, users should be able to insert a snippet into their notebook that loads the data source with Intake, such as import intake followed by intake.cat.abcd.
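A sketch of how the "insert a snippet" action could build its code from the path the user expanded in the tree. It assumes Intake catalogues accept both attribute access (cat.abcd) and item access (cat["my data"]); item access is used for names that are not valid identifiers. The optional catalog_url covers the remote case martindurant raises later in the thread and is whatever you would pass to intake.open_catalog. The function name make_snippet is hypothetical.

```python
import keyword
from typing import Optional, Sequence


def _access(name: str) -> str:
    """Attribute access for identifier-like names, item access otherwise."""
    if name.isidentifier() and not keyword.iskeyword(name):
        return f".{name}"
    return f"[{name!r}]"


def make_snippet(path: Sequence[str], catalog_url: Optional[str] = None) -> str:
    """Build notebook code that loads the entry at `path` with intake."""
    if not path:
        raise ValueError("path must name at least one catalogue entry")
    access = "".join(_access(part) for part in path)
    if catalog_url is None:
        # Entry from the builtin catalogue, e.g. intake.cat.abcd
        return f"import intake\nintake.cat{access}"
    # Entry from another catalogue or server opened explicitly
    return f"import intake\ncat = intake.open_catalog({catalog_url!r})\ncat{access}"
```

For example, make_snippet(["abcd"]) yields the two lines import intake and intake.cat.abcd.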

On the client side, this requires implementing an Intake client API in JavaScript, which will use MessagePack. It will also require writing a JupyterLab extension that registers data converters for these Intake URLs that hit the API.
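To make the converter idea concrete, here is a small Python model of it; the real extension would be TypeScript against the data registry's own API, which is not reproduced here. The intake:/ URL prefix, the MIME type, and the shape of the decoded server response (a "sources" list whose items carry "name" and "container") are all assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import quote, unquote

PREFIX = "intake:/"  # hypothetical URL prefix for registry entries


def path_to_url(path: List[str]) -> str:
    """Encode a catalogue path such as ["a", "b"] as "intake:/a/b"."""
    return PREFIX + "/".join(quote(part, safe="") for part in path)


def url_to_path(url: str) -> List[str]:
    """Inverse of path_to_url."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not an Intake URL: {url!r}")
    return [unquote(part) for part in url[len(PREFIX):].split("/") if part]


# Converters keyed by the (hypothetical) MIME type of the response they consume
Converter = Callable[[str, Dict[str, Any]], List[Tuple[str, str]]]
CONVERTERS: Dict[str, Converter] = {}


def converter(mimetype: str) -> Callable[[Converter], Converter]:
    def register(func: Converter) -> Converter:
        CONVERTERS[mimetype] = func
        return func
    return register


@converter("application/x.intake.info")
def list_children(url: str, info: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a decoded catalogue listing into (child URL, kind) pairs."""
    parent = url_to_path(url)
    children = []
    for source in info.get("sources", []):
        kind = "catalog" if source.get("container") == "catalog" else "source"
        children.append((path_to_url(parent + [source["name"]]), kind))
    return children
```

Children of kind "catalog" become expandable nodes; "source" children get the notebook snippet action.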

We can then extend that, if we like, to actually request the contents of the data sources and display them in some way on the client. For example, we can display a numpy array in a datagrid. This will require writing custom logic for each intake driver to know how to request a chunk and parse the resulting data.
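One piece of that per-driver logic is working out which chunks a datagrid viewport needs. A sketch, assuming the driver can report the number of rows in each partition and read partitions individually (Intake data sources expose read_partition, but partition lengths are not always known up front, so that part is an assumption). It maps the visible row window onto (partition, local_start, local_stop) triples using cumulative offsets and a binary search.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


def partitions_for_rows(
    partition_lengths: Sequence[int], start: int, stop: int
) -> List[Tuple[int, int, int]]:
    """Map the row window [start, stop) to (partition, local_start, local_stop)."""
    if start < 0 or stop < start:
        raise ValueError("need 0 <= start <= stop")
    # offsets[i] is the global index of the first row of partition i
    offsets = [0, *accumulate(partition_lengths)]
    stop = min(stop, offsets[-1])
    pieces: List[Tuple[int, int, int]] = []
    row = start
    while row < stop:
        i = bisect_right(offsets, row) - 1  # partition containing `row`; skips empty ones
        local_stop = min(stop, offsets[i + 1]) - offsets[i]
        pieces.append((i, row - offsets[i], local_stop))
        row = offsets[i] + local_stop
    return pieces
```

With partitions of 100, 100 and 50 rows, a viewport over rows 90 to 210 only touches partitions 0, 1 and 2, and each is sliced locally before being handed to the grid.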

We can also display metadata that Intake provides about data sources, such as their shape and dtype. We should register this with the metadata service so that users can see it in the right-hand side panel as they navigate their catalogue. Intake also allows data sources to provide arbitrary metadata. If a driver returns this metadata as JSON-LD, we can display that in the metadata explorer too.
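A sketch of routing a data source's description to the right pane, assuming the description arrives as a dict with optional "shape", "dtype" and "metadata" keys. Detecting JSON-LD by the presence of an "@context" key is a heuristic, not something Intake guarantees, and the function name is hypothetical.

```python
from typing import Any, Dict


def build_metadata_panel(description: Dict[str, Any]) -> Dict[str, Any]:
    """Split a data-source description into what each UI pane would show."""
    metadata = dict(description.get("metadata") or {})
    # Structural fields such as shape and dtype, when the driver reports them
    structure = {k: description[k] for k in ("shape", "dtype") if k in description}
    # Heuristic: treat metadata that carries an "@context" key as JSON-LD
    if "@context" in metadata:
        return {"structure": structure, "jsonld": metadata, "other": {}}
    return {"structure": structure, "jsonld": None, "other": metadata}
```

Anything under "jsonld" would go to the metadata service; "other" still needs some generic display.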

We also discussed letting users discover catalogues by finding intake.yml files in the file system and expanding them, as well as showing the catalogues provided by different Python packages. That way, when users explore the data registry, they see which source each catalogue came from, instead of seeing all catalogues flattened at the top level. Authors could also write datasets.yml files that collate these separate catalogues for a single repo. We decided against this approach for now, since Intake already has a discovery mechanism that merges all the catalogues available to users.

cc @martindurant @gwbischof

martindurant commented on September 28, 2024

Thanks for starting this discussion, I am actively thinking about it!

saulshanabrook commented on September 28, 2024

@ian-r-rose has a DCAT dataset Intake driver that exposes metadata, so we should also try the pipeline of getting metadata from a driver into the data explorer, and then into the metadata explorer: https://twitter.com/IanRRose/status/1182660959413784576

martindurant commented on September 28, 2024

OK, I think I have got over my initial reservations: the frontend is much better off talking to a REST service than to a Python kernel, so we may as well use the Intake server. Serving the "builtin" items is something we want to allow anyway, rather than always exposing a given cat. It may be useful (but not necessary) to expose connections to other servers too, in which case instead of intake.cat.abc, you would need cat = intake.open_catalog("..."); cat.abc.

The server likes to talk msgpack rather than JSON; I hope that everything translates to the JS side. I suppose, if the metadata can be displayed in something like YAML blocks (i.e., as they would be in the catalog), that's enough.
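A sketch of the "YAML blocks" display, assuming the metadata has already been decoded from msgpack into plain dicts, lists and scalars. It avoids a YAML library by writing block-style mappings and sequences and emitting scalars as JSON, which YAML 1.2 also accepts. It is meant for display rather than guaranteed round-tripping (for example, a bare key like yes may be read as a boolean by YAML 1.1 parsers), and the function names are hypothetical.

```python
import json
import re
from typing import Any, List

_PLAIN_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _key(key: Any) -> str:
    text = str(key)
    return text if _PLAIN_KEY.fullmatch(text) else json.dumps(text)


def _lines(value: Any, depth: int) -> List[str]:
    pad = "  " * depth
    if isinstance(value, dict) and value:
        out: List[str] = []
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                out.append(f"{pad}{_key(key)}:")  # nested block on following lines
                out.extend(_lines(item, depth + 1))
            else:
                out.append(f"{pad}{_key(key)}: {json.dumps(item, default=str)}")
        return out
    if isinstance(value, list) and value:
        out = []
        for item in value:
            if isinstance(item, (dict, list)) and item:
                out.append(f"{pad}-")
                out.extend(_lines(item, depth + 1))
            else:
                out.append(f"{pad}- {json.dumps(item, default=str)}")
        return out
    # Scalars and empty containers ({} / []) are written in JSON flow style
    return [pad + json.dumps(value, default=str)]


def to_yaml_block(value: Any) -> str:
    """Render decoded metadata as a block-style, YAML-like string."""
    return "\n".join(_lines(value, 0))
```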

So what needs to happen to make progress here?

saulshanabrook commented on September 28, 2024

> It may be useful (but not necessary) to expose connections to other servers too, in which case instead of intake.cat.abc, you would need cat = intake.open_catalog("..."); cat.abc.

I agree. I think this would be good to allow after initial work exposing the default server.

> The server likes to talk msgpack, rather than JSON, I hope that everything translates to the JS side.

There is a msgpack client for JavaScript, so this should be fine.

> I suppose, if the metadata can be displayed in something like YAML blocks (i.e., as they would be in the catalog), that's enough.

Is there a standard for the metadata a driver exposes? Or is it up to them to expose whatever they want? If it is JSON-LD we could expose it in the metadata service. Otherwise, we could expose it however we like with whatever UI makes sense for it.

> So what needs to happen to make progress here?

- Create a new repo for this work
- Create a Python package that exposes the Intake API through jupyter-server-proxy
- Create a JS package that exposes the Intake API in JS
- Create a JS package with a JupyterLab extension that connects to the Intake API through the JS API package and adds it to the data registry
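For the Python package item, a sketch of the server definition jupyter-server-proxy would need. It assumes the intake-server CLI takes a --port option plus one or more catalogue files (check intake-server --help for your version) and relies on jupyter-server-proxy substituting the allocated port for "{port}" in the command. The zero-argument setup function is the kind of callable the jupyter_serverproxy_servers entry-point group expects; the INTAKE_PROXY_CATALOGS environment variable and both function names are hypothetical.

```python
import os
from typing import Any, Dict, List, Sequence

# Hypothetical environment variable listing catalogue files to serve
CATALOG_ENV = "INTAKE_PROXY_CATALOGS"


def intake_server_config(catalog_files: Sequence[str]) -> Dict[str, Any]:
    """Build a jupyter-server-proxy server definition for intake-server."""
    if not catalog_files:
        raise ValueError("intake-server needs at least one catalogue file")
    # jupyter-server-proxy replaces "{port}" with the port it allocates
    command: List[str] = ["intake-server", "--port", "{port}", *catalog_files]
    return {
        "command": command,
        "timeout": 30,  # seconds to wait for the server to start listening
        # The proxied server is an API, so keep it out of the JupyterLab launcher
        "launcher_entry": {"enabled": False},
    }


def setup_intake_server() -> Dict[str, Any]:
    """Zero-argument hook for the jupyter_serverproxy_servers entry point."""
    files = [f for f in os.environ.get(CATALOG_ENV, "").split(os.pathsep) if f]
    return intake_server_config(files)
```

Once "builtin" serving is supported by the server, the explicit file list could go away.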

martindurant commented on September 28, 2024

> Is there a standard for the metadata a driver exposes?

There are standard things that every entry has (name, description, driver, arguments), but the general metadata is totally arbitrary.
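A sketch of separating the standard fields from the free-form metadata, assuming the catalogue has already been loaded from YAML into a dict (for example with yaml.safe_load). In Intake's YAML format, entries live under a top-level sources mapping keyed by name, with description, driver, args (the arguments) and an optional metadata block; the function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Standard per-entry keys in catalogue YAML; the entry name is the mapping key
STANDARD_KEYS = ("description", "driver", "args")


def split_entries(catalog: Dict[str, Any]) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Return (standard fields, free-form metadata) for each entry in a catalogue."""
    rows = []
    for name, entry in (catalog.get("sources") or {}).items():
        standard: Dict[str, Any] = {"name": name}
        standard.update({k: entry[k] for k in STANDARD_KEYS if k in entry})
        rows.append((standard, dict(entry.get("metadata") or {})))
    return rows
```

The standard part can get a fixed layout in the UI, while the metadata part falls back to a generic view.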

martindurant commented on September 28, 2024

@saulshanabrook - this dropped off the table at some point. Are you still interested?
