jupyterlab / jupyterlab-data-explorer

First-class datasets in JupyterLab

License: BSD 3-Clause "New" or "Revised" License

Shell 0.19% TypeScript 62.64% Jupyter Notebook 32.91% CSS 1.43% Python 0.17% JavaScript 2.67%
jupyterlab jupyterlab-extension data-registry rxjs observable

jupyterlab-data-explorer's Introduction

Installation | Documentation | Contributing | License | Team | Getting help

An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and its architecture.

JupyterLab is the next-generation user interface for Project Jupyter, offering all the familiar building blocks of the classic Jupyter Notebook (notebook, terminal, text editor, file browser, rich outputs, etc.) in a flexible and powerful user interface.

JupyterLab can be extended using npm packages that use our public APIs. The prebuilt extensions can be distributed via PyPI, conda, and other package managers. The source extensions can be installed directly from npm (search for jupyterlab-extension) but require an additional build step. You can also find JupyterLab extensions by exploring the GitHub topic jupyterlab-extension. To learn more about extensions, see the user documentation.

Read the current JupyterLab documentation on ReadTheDocs.

Important

JupyterLab 3 will reach its end of maintenance date on May 15, 2024, anywhere on Earth. To help us make this transition, fixes for critical issues will still be backported until December 31, 2024. If you are still running JupyterLab 3, we strongly encourage you to upgrade to JupyterLab 4 as soon as possible. For more information, see JupyterLab 3 end of maintenance on the Jupyter Blog.


Getting started

Installation

If you use conda, mamba, or pip, you can install JupyterLab with one of the following commands.

  • If you use conda:
    conda install -c conda-forge jupyterlab
  • If you use mamba:
    mamba install -c conda-forge jupyterlab
  • If you use pip:
    pip install jupyterlab
    If installing using pip install --user, you must add the user-level bin directory to your PATH environment variable in order to launch jupyter lab. If you are using a Unix derivative (e.g., FreeBSD, GNU/Linux, macOS), you can do this by running export PATH="$HOME/.local/bin:$PATH". If you are using a macOS version that comes with Python 2, run pip3 instead of pip.

For more detailed instructions, consult the installation guide. Project installation instructions from the git sources are available in the contributor documentation.

Installing with Previous Versions of Jupyter Notebook

When using a version of Jupyter Notebook earlier than 5.3, the following command must be run after installing JupyterLab to enable the JupyterLab server extension:

jupyter serverextension enable --py jupyterlab --sys-prefix

Running

Start up JupyterLab using:

jupyter lab

JupyterLab will open automatically in the browser. See the documentation for additional details.

If you encounter an error like "Command 'jupyter' not found", please make sure the PATH environment variable is set correctly. Alternatively, you can start up JupyterLab using ~/.local/bin/jupyter lab without changing the PATH environment variable.

Prerequisites and Supported Browsers

The latest versions of the following browsers are currently known to work:

  • Firefox
  • Chrome
  • Safari

See our documentation for additional details.


Getting help

We encourage you to ask questions on the Discourse forum. A question answered there can become a useful resource for others.

Bug report

To report a bug, please read the guidelines and then open a GitHub issue. To keep resolved issues self-contained, the lock bot will lock closed issues as resolved after a period of inactivity. If a related discussion is still needed after an issue is locked, please open a new issue and reference the old issue.

Feature request

We also welcome suggestions for new features as they help make the project more useful for everyone. To request a feature please use the feature request template.


Development

Extending JupyterLab

To start developing an extension for JupyterLab, see the developer documentation and the API docs.

Contributing

To contribute code or documentation to JupyterLab itself, please read the contributor documentation.

JupyterLab follows the Jupyter Community Guides.

License

JupyterLab uses a shared copyright model that enables all contributors to maintain the copyright on their contributions. All code is licensed under the terms of the revised BSD license.

Team

JupyterLab is part of Project Jupyter and is developed by an open community. The maintenance team is assisted by a much larger group of contributors to JupyterLab and Project Jupyter as a whole.

JupyterLab's current maintainers are listed in alphabetical order, with affiliation, and main areas of contribution:

  • Mehmet Bektas, Netflix (general development, extensions).
  • Alex Bozarth, IBM (general development, extensions).
  • Eric Charles, Datalayer (general development, extensions).
  • Frédéric Collonval, WebScIT (general development, extensions).
  • Martha Cryan, Mito (general development, extensions).
  • Afshin Darian, QuantStack (co-creator, application/high-level architecture, prolific contributions throughout the code base).
  • Vidar T. Fauske, JPMorgan Chase (general development, extensions).
  • Brian Granger, AWS (co-creator, strategy, vision, management, UI/UX design, architecture).
  • Jason Grout, Databricks (co-creator, vision, general development).
  • Michał Krassowski, Quansight (general development, extensions).
  • Max Klein, JPMorgan Chase (UI Package, build system, general development, extensions).
  • Gonzalo Peña-Castellanos, Quansight (general development, i18n, extensions).
  • Fernando Perez, UC Berkeley (co-creator, vision).
  • Isabela Presedo-Floyd, Quansight Labs (design/UX).
  • Steven Silvester, MongoDB (co-creator, release management, packaging, prolific contributions throughout the code base).
  • Jeremy Tuloup, QuantStack (general development, extensions).

Maintainer emeritus:

  • Chris Colbert, Project Jupyter (co-creator, application/low-level architecture, technical leadership, vision, PhosphorJS).
  • Jessica Forde, Project Jupyter (demo, documentation).
  • Tim George, Cal Poly (UI/UX design, strategy, management, user needs analysis).
  • Cameron Oelsen, Cal Poly (UI/UX design).
  • Ian Rose, Quansight/City of LA (general core development, extensions).
  • Andrew Schlaepfer, Bloomberg (general development, extensions).
  • Saul Shanabrook, Quansight (general development, extensions).

This list is provided to give the reader context on who we are and how our team functions. To be listed, please submit a pull request with your information.


Weekly Dev Meeting

We have videoconference meetings every week where we discuss what we have been working on and get feedback from one another.

Anyone is welcome to attend, if they would like to discuss a topic or just listen in.

Notes are archived on GitHub Jupyter Frontends team compass.

jupyterlab-data-explorer's People

Contributors

acu192, dependabot[bot], ellisonbg, fcollonval, kgryte, saulshanabrook, telamonian, tgeorgeux, vidartf


jupyterlab-data-explorer's Issues

Quilt Data Integration

I wanted to write down some thoughts on possible ways Quilt Data could integrate with the data registry, to let JupyterLab users explore Quilt Data more easily.

The basic entity is a package, like this one: https://quiltdata.com/package/uciml/iris. It is in the form <user>/<name>. As a URL, it could look like:

quilt://quiltdata.com/uciml/iris. So if a user could add a dataset with that, what would they want to do with it?

  • See associated metadata provided on the Quilt website
  • Insert a snippet in a Python notebook to read this data (assuming the kernel has quilt installed)
  • View the files in it? It might be useful to expose each of these as a dataset. I also wonder whether we could access them client side to throw into a grid viewer or something.
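
A minimal sketch of taking such a URL apart, assuming the quilt:// scheme proposed above (the function name and return shape are illustrative; nothing here exists yet):

// Sketch: split a quilt:// dataset URL into host, user, and package name.
function parseQuiltURL(raw: string): { host: string; user: string; name: string } | null {
  const url = new URL(raw);
  if (url.protocol !== "quilt:") {
    return null;
  }
  const [user, name] = url.pathname.replace(/^\//, "").split("/");
  return user && name ? { host: url.host, user, name } : null;
}

// parseQuiltURL("quilt://quiltdata.com/uciml/iris")
// => { host: "quiltdata.com", user: "uciml", name: "iris" }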

Structured URLs

Currently, we have to rely on manually parsing and creating different types of URLs which represent different dataset locations. For example, the notebook URL looks like this:

// 'file:///{path}.ipynb/#/cells/{cellid}/outputs/{outputid}/data/{mimetype}'

It would be nice if we could just write a format string that looks like that, and get a way to both generate notebook URLs and extract the data from them. Currently, we have to do something like this instead:

const result = decodeURIComponent(url.hash).match(
  /^[#]([/]cells[/]\d+[/]outputs[/]\d+)[/]data[/](.*)$/
);
if (
  url.protocol !== "file:" ||
  !url.pathname.endsWith(".ipynb") ||
  !result
) {
  return null;
}
const [, outputHash, type] = result;

This is error-prone and requires duplicating code.

Luckily, there is a standard just for this use case: URI Templates (RFC 6570)!

We should add support for this, using an existing URI template library or writing our own; a few existing libraries look like they might work.
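
As a rough illustration of the API we want, here is a hand-rolled sketch that handles only bare {name} placeholders, not the full RFC 6570 operator syntax:

// Sketch only: supports bare {name} placeholders and assumes parameter
// values contain no "/" or "#".
class UrlTemplate {
  private regex: RegExp;
  private params: string[] = [];

  constructor(readonly template: string) {
    // Escape regex metacharacters in the literal parts of the template...
    const escaped = template.replace(/[.*+?^$()[\]\\|]/g, "\\$&");
    // ...then turn each {name} placeholder into a capture group.
    const pattern = escaped.replace(/\{(\w+)\}/g, (_, name: string) => {
      this.params.push(name);
      return "([^/#]+)";
    });
    this.regex = new RegExp(`^${pattern}$`);
  }

  // Extract named parameters from a concrete URL, or null on no match.
  parse(url: string): { [name: string]: string } | null {
    const match = url.match(this.regex);
    if (!match) {
      return null;
    }
    const result: { [name: string]: string } = {};
    this.params.forEach((name, i) => (result[name] = match[i + 1]));
    return result;
  }

  // Generate a concrete URL by filling the placeholders back in.
  expand(args: { [name: string]: string }): string {
    return this.template.replace(/\{(\w+)\}/g, (_, name: string) => args[name]);
  }
}

const cellTemplate = new UrlTemplate("file:///{path}#/cells/{cellid}");
// cellTemplate.parse("file:///nb.ipynb#/cells/3")
// => { path: "nb.ipynb", cellid: "3" }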


Design

This is similar to how we created a type-safe abstraction over different mimetypes, some of them with arguments:

export abstract class DataType<T, U> {
  abstract parseMimeType(mimeType: MimeType_): T | typeof INVALID;
  abstract createMimeType(typeData: T): MimeType_;

  createDataset(data: U, typeData: T) {
    return createDataset(this.createMimeType(typeData), data);
  }

  createDatasets(url: URL_, data: U, typeData: T) {
    return createDatasets(url, this.createMimeType(typeData), data);
  }

  /**
   * Filter dataset for mimetypes of this type.
   */
  filterDataset(dataset: Dataset<any>): Map<T, U> {
    const res = new Map<T, U>();
    for (const [mimeType, [, data]] of dataset) {
      const typeData_ = this.parseMimeType(mimeType);
      if (typeData_ !== INVALID) {
        res.set(typeData_, data as any);
      }
    }
    return res;
  }
}

It lets us define a mimetype once, like this:

const cellModelDataType = new DataTypeNoArgs<Observable<ICellModel>>(
  "application/x.jupyterlab.cell-model"
);

and use it in converters to go to/from that mimetype:

return createConverter(
  { from: resolveDataType, to: cellModelDataType },
  ({ url }) => {
    const result = url.hash.match(/^[#][/]cells[/](\d+)$/);
    if (
      url.protocol !== "file:" ||
      !url.pathname.endsWith(".ipynb") ||
      !result
    ) {
      return null;
    }
    const cellID = Number(result[1]);
    // Create the original notebook URL and get the cells from it
    url.hash = "";
    const notebookURL = url.toString();
    return defer(() =>
      notebookCellsDataType
        .getDataset(registry.getURL(notebookURL))!
        .pipe(map(cells => cells[cellID]))
    );
  }
);

In a similar fashion, we should be able to create an object that refers to a certain URL template once, and then use it in converters. So we could add optional fromURL and toURL parameters to createConverter that take in a URL template; then, instead of getting/returning an actual URL, you would just get/return the parameters extracted from the template.

The URLTemplate type you pass in would have to hold both the URL template string and type information mapping parameter names to types, so it would probably be an object. Possibly something like this:

const notebookTemplate = new TemplateURL<"path" | "cellID">(
  'file://{/path}.ipynb#/cells/{cellID}',
)

Dark theme support

None of the components support the dark theme. We need to use the JupyterLab CSS variables or the theme signal to track which theme is active and color things appropriately.
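
For example, a component could read JupyterLab's theme CSS variables at runtime instead of hard-coding colors (--jp-layout-color1 is one of the variables defined by JupyterLab's theme CSS):

// Read a theme-dependent color so components follow the active theme.
const layoutColor = getComputedStyle(document.documentElement)
  .getPropertyValue("--jp-layout-color1")
  .trim();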

Dummy: For photo uploads

Drag photos into comments on this issue to upload them and get a URL. Then you can link to them from the docs/readme

Allow querying relative URL

If we output a vega spec in a notebook that refers to another cell, the URL referencing it will be relative to the vega notebook, like ./other-data. When we render the vega spec, we should take the spec's current URL and resolve the data URLs in it relative to that.
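
A minimal sketch of that resolution, relying on the WHATWG URL constructor's built-in relative resolution (the function name and the notebook path are illustrative):

// Resolve a relative data URL against the URL of the spec it appears in.
function resolveRelative(dataUrl: string, specUrl: string): string {
  return new URL(dataUrl, specUrl).toString();
}

// resolveRelative("./other-data", "file:///notebooks/analysis.ipynb")
// => "file:///notebooks/other-data"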

Named outputs from notebooks

We should be able to name the outputs of cells in a notebook, so that we can refer to them by their name. For example, we should be able to output a dataframe with a name, then refer to it by that name in a vega output later on.

Probably depends on #16 first

User Needs Analysis & User Stories

We had an in-person meeting with contributors from NYU and Project Jupyter to define the user needs for the Data Explorer project. Please keep in mind that this is still a work in progress; I'll be updating this thread as more is finalized, but I wanted to open it up for wider input in the meantime. The working document is available at: Dataset Explorer: User Needs Analysis

Why would somebody use a data explorer?

  • To see a list of the datasets they have opened in JupyterLab or which someone has made available to them through the data registry.
  • Organize and collate groups of data sets for multi-user projects.
  • Social usages of data.
  • Cross-reference related usages of data sets.
  • Provide suggestions/recommendations of other datasets that are relevant to their work.
  • To see different things they can do with those data sets.
  • To render the data sets using different visual representations (visualization, table, graph, etc.).
  • To view and edit metadata on datasets.
  • To encourage people to discuss and produce code snippets for data sets.
  • To explore data sets available to them for a pre-specified project.
  • Give a ‘sneak peek’ of data sets.
  • Browsing datasets by metadata (such as publisher, related datasets, dataset catalogue, etc.).
  • So they can push cleaned data back to the administrator to update the registry.
  • To share insights between projects.

Touch Points

Users will often use the Data Explorer:

  • At the beginning of a project to explore relevant data sets.
  • After they’ve imported (registered) the data sets relevant to the project. (Phase II)
  • During work on the project, as new areas of interest arise, researchers will be looking for data to operationalize and integrate these additional areas of interest.
  • After they’ve been working on the project for a while and want to review data sets.
  • At project wrap up, to confirm their data sets have been used properly.
  • While they are working with a data set.

Relevant Personas

Joe Data Scientist (PI) - Has domain expertise, writes a lot of code.

Jane Data Set Administrator - Works with her developers to register correct datasets along with relevant metadata/comments/code snippets.

Mike Business Intelligence Analyst - Does tasks similar to the Data Scientist's but doesn’t write as much code.

Project Phases

Phase I

  • Extensions should register datasets that the user already has approved access to, and which are considered “actively used” by the user.
  • Assume the number of registered datasets is small (<10).

Phase II

  • Data registry gains notion of different classes of datasets, such as active, available, request access.
  • Along with this, the UI/UX would need to be modified to address the many-datasets case (100s-1000s). Search and namespaces would become important.

Nested Datasets / Data providers

Our current model for the data registry is a global set of datasets that anyone can add datasets to and query for all the available datasets (pretty much what is in issue #3).

There are a few new ideas floating around that would potentially change this fundamental layout:

  • Adding the ability for datasets to have "children"
  • Adding the concept of "data providers"

First, I wanted to lay out my current conceptual models to let them guide this change in API:

  • URLs already have an inherent nested structure to them. For example, file:///a/ is a "parent" of file:///a/b.csv. Also, both file:///a/ and file:///b/ share the same protocol of "file". We could make use of these semantics in the UI, to turn a flat list of URLs into a nested interactive tree view of datasets.
  • Our current converter registry stores how to get to new ways of understanding existing datasets. We could define a converter that is of mimetype xxx.jupyter.datasets that returns a list of "children" datasets.

These conceptual models could be conflicting. Do you want the "tree view" of datasets to be determined by URL or by child/parent status? Basically, I am asking if the child of a dataset has to be related to its parent as a subpath in the URL.


Let's take a step back from these conceptual mind games and make a list of different use cases we are trying to target with these concepts:

  • Files that have valid conversions in the file browser show up in the data explorer. Here, the explorer becomes an alternative view on top of the existing file API which exposes how these different files can be used as datasets. Users should be able to expand/collapse directories to see available datasets.
  • Notebooks should expose all their relevant outputs as datasets. So you should be able to see "inside" a notebook (at least those that are open) to view its outputs.
  • If you have a datasets.yml file that contains references to other datasets, you should be able to explore all of these in the data explorer.
  • A Quilt data plugin would allow you to explore a Quilt server: find users, find packages, and find datasets inside of them. It would be nice to be able to search for datasets with certain attributes (like all Vega Lite type datasets), and we need a way to paginate/filter because packages could have millions of datasets in them.

Already in these use cases, I have changed the fundamental framing of how datasets end up in the data explorer. Instead of having an imperative style "register/publish" function to add a dataset, there are different "providers" (we can call them) that users can explore and find datasets within. It is moving from a push (user adds datasets explicitly) to a pull (data registry queries provider to see what datasets are available) model.

This is nice, because then we move the state management into these providers. They can figure out when/how to deregister datasets and can implement whatever algorithms they want to list all the available datasets.

A few questions remain:

  • How do "providers" interact with URLs? Is there any correspondence or can any provider return any URL?
  • What is the API for a provider? How does it relate to nesting?

To answer these, I think it would be helpful to sketch out some different possible user experiences for the data registry with nested data / data providers that cover the use cases we care about. This will inform what API makes sense to require for the provider and how providers relate to URLs. For example, if we have a nested structure that allows users to fold/unfold at parts of the URL path, then the provider should have something like queryChildren(basePath: string): Datasets. If we need pagination, then this response also should be paginated.
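
To make this concrete, here is one possible shape for a provider API. It is a sketch only; all of the names (DataProvider, handles, queryChildren) are illustrative and undecided:

// Sketch of a pull-model provider API. Nothing here exists yet.
interface DatasetEntry {
  url: string;
  mimeTypes: string[];
}

interface DataProvider {
  // Which URLs this provider is responsible for, e.g. a protocol prefix.
  handles(url: string): boolean;
  // The registry asks the provider what lives "under" a URL; pagination
  // could be layered on with an opaque cursor if needed.
  queryChildren(baseURL: string): Promise<DatasetEntry[]>;
}

// A toy provider exposing a fixed directory listing:
const fileProvider: DataProvider = {
  handles: url => url.startsWith("file://"),
  queryChildren: async baseURL => [
    { url: new URL("a.csv", baseURL).toString(), mimeTypes: ["text/csv"] }
  ]
};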

View Notebook Outputs in Data Explorer

In order to support the previous integration we had with nteract's data explorer, we should be able to output a table mimetype from a notebook and view it with their data explorer by first registering it in ours. So we should try enabling notebook outputs as nested datasets.

I think they will be of the form file://filename.ipynb#cells/123.

Filtering and Searching

We should investigate filtering and searching, both how these things should be done on the data registry level and how they should be displayed in the UX.

We need to be careful how specific we get on these interfaces, to both support a wide set of use cases and keep the UI manageable.

@ellisonbg mentioned we might want to provide some built-in filtering/searching just for in-memory data, like possibly everything we show in the UX. And then more advanced filtering/searching could be implemented in an alternative UI if that is desired.

So in the default UI, maybe just show "activities" and nesting, and allow searching/filtering over visible URLs and activity names.

But we should explore this further; maybe there is some lazy filtering/searching we should support built in.

Reference from @pacoid: https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9. It would be good to think about how to support this kind of data registry.

Add documentation

Before the first release, we should write some documentation. It should focus on three possible audiences:

  • Users of data explorer
    • How to install
    • How to use
    • Built in supported types and viewers
  • Using data registry outside JupyterLab
    • How to depend on package
    • How to register converter
    • How to query data registry to get conversions, caching
    • How to create typed data type
    • Examples of different data types and conversions:
      • Observable based conversion with caching
      • Nested datasets
    • Built in data types and their relations
  • Extenders of data registry in JL
    • How plugins grab the data registry and register conversions

Currently I am thinking this can just go in markdown.

Cell metadata use case

@tonyfast brought up an idea today about how to view metadata about the cell you are looking at. As I understand it, a user would run a cell whose output changes the output metadata to put some JSON-LD about the cell in there. Then they bring up the linked data browser (jupyterlab/jupyterlab-metadata-service#27 (comment)) and, as they scroll through their cells, they see the metadata about each cell.

One possible way to achieve this:

  • Register the cell output metadata in the data registry, like we are registering the cell outputs, as a "json-ld linked data" mimetype (#45)
  • Define a linked data provider that looks up the URL in the data registry and returns the data from the "json-ld linked data" mimetype, if it exists for that URL (a minimal sketch of this step follows the list)
  • Update the active dataset when a user navigates in a notebook, to be the cell the user is on
  • Change the linked data browser in response to the active dataset changing
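
A minimal sketch of the lookup in the second step, with the registry stubbed out as a plain Map from URL to mimetype/data pairs (the real registry API is richer, and the concrete mimetype string below is an assumption):

// Stub of the data registry as a Map from URL to mimetype/data pairs.
type StubRegistry = Map<string, Map<string, unknown>>;

// Assumed mimetype name; the actual "json-ld linked data" mimetype is
// being defined in #45.
const JSONLD_MIMETYPE = "application/ld+json";

// Return the JSON-LD registered for a URL, or null if there is none.
function lookupLinkedData(registry: StubRegistry, url: string): unknown {
  return registry.get(url)?.get(JSONLD_MIMETYPE) ?? null;
}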

Create initial release

The two packages here are ready for an initial release on NPM. I believe @ellisonbg needs to do the honors, since I am not part of the jupyterlab npm org.

Voyager Converter

We should rewrite the voyager extension to use the conversion system. This will simplify its data ingestion logic.

Adding support for HDF5: feasible?

I'm looking into whether I can build on this extension in order to implement a long-held ambition: an HDF5 file viewer for Jlab.

An HDF5 file is kind of like its own mini filesystem: there's a tree of groups (equivalent to directories), and each group may contain datasets and/or other groups. My basic idea would be to expose the group tree in the dataset browser, and then be able to open/view a given dataset (assuming it's 2D or less) as a grid in the main area.
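
For instance, the group tree of an HDF5 file might map onto nested dataset URLs along these lines (the fragment layout is purely illustrative):

./example.hdf5#/group1
./example.hdf5#/group1/subgroup
./example.hdf5#/group1/subgroup/dataset1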

Along those lines, I have a bunch of questions:

  • Does this seem feasible/in line with your vision of this extension?
  • What exactly would I need to add to this extension in order to support HDF5? It looks like I'd at least need to add an appropriate converter in dataregistry-extension/src/files.tsx. Would anything else be required?
  • Does this extension have any stuff to help deal with large files?
  • It seems that the text in the grid view is not selectable, which is a little bit killer. The grid is implemented via the datagrid stuff from Phosphor, right? Is this an upstream issue?

Intake Integration

Intake is a "lightweight package for finding, investigating, loading and disseminating data." It would be nice to figure out how the JupyterLab data registry could integrate with this package.

Catalogs

Having JupyterLab be aware of Intake's "Data catalogs" is probably a good place to start. They "provide an abstraction that allows you to externally define, and optionally share, descriptions of datasets, called catalog entries."

Local

For example, if you have a catalog on disk in a catalog.yaml file, we might want to be able to see the datasets it defines in the data registry. This is similar to how, currently, if you have a .ipynb file, you can view the datasets in its cell outputs. To do this, we would have to be able to parse its YAML format in JavaScript, and map the different entries to URLs.

For example, this catalog.yml file:

metadata:
  version: 1
sources:
  example:
    description: test
    driver: random
    args: {}

  entry1_full:
    description: entry1 full
    metadata:
      foo: 'bar'
      bar: [1, 2, 3]
    driver: csv
    args: # passed to the open() method
      urlpath: '{{ CATALOG_DIR }}/entry1_*.csv'

  entry1_part:
    description: entry1 part
    parameters: # User parameters
      part:
        description: section of the data
        type: str
        default: "stable"
        allowed: ["latest", "stable"]
    driver: csv
    args:
      urlpath: '{{ CATALOG_DIR }}/entry1_{{ part }}.csv'

Might map to a number of nested URLs:

./dataset.yml#/sources/example
./dataset.yml#/sources/entry1_full
./dataset.yml#/sources/entry1_part

And the ones that point to CSV files would also point to some nested URLs; for example, dataset.yml#/sources/entry1_part would point to:

./entry1_latest.csv
./entry1_stable.csv

This basically requires re-implementing the logic of all the drivers, so that they can work client side.
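
As a sketch of the easier first step (listing entries, before any driver logic), mapping a parsed catalog to fragment URLs could look like this, assuming the YAML has already been parsed (e.g. with the js-yaml package; the types are illustrative):

// Map each catalog source to a nested URL under the catalog file.
interface IntakeCatalog {
  sources: { [name: string]: unknown };
}

function catalogEntryURLs(catalogPath: string, catalog: IntakeCatalog): string[] {
  return Object.keys(catalog.sources).map(
    name => `${catalogPath}#/sources/${name}`
  );
}

// catalogEntryURLs("./dataset.yml", parsedCatalog)
// => ["./dataset.yml#/sources/example", "./dataset.yml#/sources/entry1_full", ...]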

Remote

We could also support loading a remote Intake data catalog. If you loaded a URL like intake://catalog1:5000 in the data registry, you would want to be able to see the datasets available. Here, the proxy mode might be useful:

Proxied access: In this mode, the catalog server uses its local drivers to open the data source and stream the data over the network to the client. The client does not need any special drivers to read the data, and can read data from files and data servers that it cannot access, as long as the catalog server has the required access.

If we implement a client API for this server protocol, then we can let it handle all the data parsing and just expose the results it returns to the user. We would have to look more in depth at its specification.

Show mimeType graph

At the meeting today @ellisonbg and @tonyfast mentioned that it might be nice to be able to actually see the graph of the mimetypes that is generated during the conversion process.

To do this, we would need to save some extra context.

Improve expand/collapse UI

Right now there is a "show"/"hide" button to deal with nested datasets. Instead, this should probably be an arrow on the right side.

Add converters for tabular data

@ellisonbg mentioned that it would be good to support some default tabular data formats, to convert between them.

For each of these, we should define a data type, and define converters between them. Then we should make sure they work on some test datasets.

Some pipelines that should work after this:

  1. Open CSV files with nteract data viewer, by first converting to JSON table schema
  2. View pandas dataframe output in datagrid, by going from JSON table schema to datagrid model
  3. If we create a Vega Lite spec that refers to a dataset by url like file:///notebooks/Table.ipynb#/cells/4/outputs/0/data/application/vnd.dataresource+json, then this should use the pandas output from that cell in the notebook as an input to the vega spec. Depends on #20
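
As a rough sketch of the conversion behind the first pipeline, ignoring quoting and type inference (a real converter would use a proper CSV parser and emit real JSON table schema):

// Naive CSV -> table-schema-style conversion (no quoting, no types).
interface TableData {
  schema: { fields: { name: string }[] };
  data: { [column: string]: string }[];
}

function csvToTable(csv: string): TableData {
  const [header, ...rows] = csv.trim().split("\n");
  const fields = header.split(",").map(name => ({ name }));
  const data = rows.map(row => {
    const cells = row.split(",");
    return Object.fromEntries(
      fields.map((f, i) => [f.name, cells[i]] as [string, string])
    );
  });
  return { schema: { fields }, data };
}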

Handle renaming datasets

@hoo761 has been doing some work tracking when cells are moved around in https://github.com/jupyterlab/jupyterlab-commenting. Ideally, if you are looking at the output for a cell, and someone moves that cell, then your view should be renamed as well to reflect the new cell number.

To do this, we need to add a concept of renaming to the data registry, to register when one URL should be moved to another and make sure the views handle this properly.

Closing widget and reopening fails

Once we close a view, the widget is disposed and we cannot re-add it. We should change how the widget views work so that they are either not disposed when they close or a new one is created.

Initial repository setup

This is an issue to track the initial repository setup:

  • Add official Jupyter copyright and license.
  • Add standard sections to README, including team section.
  • Create skeleton for single npm package
  • Create JLab style labels.

Data explorer overview

This issue provides an overview of the roadmap of the data explorer.

Background

The JupyterLab data registry will enable extensions to 1) register abstract datasets with a central service and 2) monitor the registry for datasets. The dataset abstraction in the data registry includes:

  • Text-based MIME type
  • Optional URI to point at datasets that are persistent
  • Abstract dataset pointer

The data registry also includes a converter architecture that can convert datasets from one MIME type to another.

Conceptually, the data registry will make datasets a first-class entity or noun in JupyterLab.

Data explorer UI

The Data explorer is a proposed user interface that enables users to explore datasets that different extensions have registered with the registry, and then do interesting things with those datasets, such as:

  • Render them using MIME renderers.
  • Comment on and annotate the datasets.
  • Create and view metadata attached to the datasets.

Conceptually, the data explorer UI will provide a user interface for the verbs related to a dataset, or the actions and activities a user can perform with the dataset, such as "render this as a table".

Initial design thoughts

  • Probably a left-sidebar-based UI, as this is similar to other panels currently there with an "overview" or "explore" feel.
  • A list of datasets.
  • For each dataset, a discoverable list of things you can do with the dataset:
    • MIME renderers.
    • Create/edit metadata.
    • Open comments for the dataset.
  • The metadata and commenting/annotation UIs will likely rely on another extension being developed separately.
  • We may also want extension points to register new "things you can do" for a given MIME type.
  • We will want to take into account the MIME type of the dataset, but also the different MIME types that can be created through the converter API.

The visual representation of the list of datasets, and the things you can do with them is still a core design question.

@saulshanabrook @tgeorgeux

Icons for common MIME Types available in Data Registry

I'm looking to make a list of common MIME types we'll need icons for.

  • text/csv
  • application/rdf+xml
  • text/richtext
  • application/rss+xml
  • application/sparql-query
  • application/json
  • application/x-latex

Do we have any need to represent data or video files at this point? Do any of those in the list not make sense to include for now? Are there any obvious types missing?

I don't have a good feel for what MIME types will be considered 'common' in this use case; please help me populate this list.

UI/UX design of data explorer

This is a standing issue to prototype, discuss, and review the user interface and user experience design of the data explorer.
