datopian / datapub

React-based framework for building data publishing workflows (esp. for CKAN)

Home Page: https://tech.datopian.com/publish/

License: MIT License

HTML 3.60% CSS 17.43% JavaScript 78.97%
ckan datahub data reactjs

datapub's People

Contributors

amercader, anuveyatsu, kmanaseryan, mariorodeghiero, mbeilin, risenw, rufuspollock

datapub's Issues

[epic] Resource Schema (data dictionary) editor w/ infer

A Data Dictionary (or Table Schema) provides information about the data in a resource such as the names, types and description of the fields.

https://tech.datopian.com/publish/#adding-a-schema-data-dictionary-for-a-resource

UX

For more, see https://tech.datopian.com/publish/#adding-a-schema-data-dictionary-for-a-resource

Horizontal (spreadsheet like)

Vertical layout

Acceptance

Working table schema editor as react component

UX is along lines of https://tech.datopian.com/publish/#adding-a-schema-data-dictionary-for-a-resource (but feel free to adapt/innovate)

Basic

  • Initialize with Data Resource including a Table Schema
    • Can assume sample data is already inlined in sample attribute as array of dicts
  • Can edit and save
  • Basic UI: Show fields/headers and then key attributes for table schema
    • title
    • description (markdown?)
    • type (string, int etc)
    • format (optional)
  • Show the sample of the data either with the headers or separately from the headers
    • Probably with the headers is easier

Infer

  • Given Data Resource without Table Schema auto infer table schema

Bonus / future

  • Show validation errors as you change types
  • (?) If no sample present auto generate the sample

Tasks

  • Research https://github.com/datopian/import-ui
  • Stub a Data Resource fixture (with Table Schema)
  • Implement the basic editor (not sure about the order of tasks, so choose what you need)
    • React component with key fields
    • Create a story using storybook for displaying the schema editor
    • Jest test

Analysis

JSON Schema editors

Research editors for json schema to see if we could reuse for our purposes ...

Design

Here's a rough code sketch of how this component could work:

<TableSchema
  // props for initialization
  // http://frictionlessdata.io/table-schema
  // https://specs.frictionlessdata.io/table-schema/
  // Frictionless (f11s) Table Schema object
  schema={schema}
  // sample of the data for displaying etc
  sample={[{ a: 1, b: 2 }, { a: 5, b: 10 }]}
/>

Schema example for this case:

schema = {
  fields: [
    {
      name: 'a',
      type: 'string',
      description: 'column a is about X',
      format: '...'
    },
    {
      name: 'b',
      type: 'integer'
    }
  ]
}

Inside TableSchema component:

TableSchema.props.schema = make a copy of passed in schema
As you edit in UI the internal schema object is updated
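
A minimal sketch of the "copy, then update on edit" idea described above. The helper name `updateField` is hypothetical and not part of datapub; it only illustrates editing a field without mutating the schema that was passed in.

```javascript
// Hypothetical helper (not part of datapub): returns a new schema in which the
// named field has the given attributes merged in. The input is never mutated.
function updateField(schema, fieldName, changes) {
  return {
    ...schema,
    fields: schema.fields.map((field) =>
      field.name === fieldName ? { ...field, ...changes } : field
    ),
  };
}

const original = {
  fields: [
    { name: 'a', type: 'string', description: 'column a is about X' },
    { name: 'b', type: 'integer' },
  ],
};

// The editor would keep `edited` in its own state and leave `original` alone.
const edited = updateField(original, 'b', { type: 'number', title: 'Column B' });
```

Because untouched fields keep their identity, a React component could compare fields by reference to decide what to re-render.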

LATER/FUTURE: For inference and having this work with data files ...

resource = open(HTML5 file object)

TableSchema(resource.schema, getSample(resource))

Use https://github.com/datopian/data.js for opening files ... (why? gives you all the metadata)

No debugging information in console if upload fails

When trying to upload a file, if an error occurs somewhere in the process that is not originally expected, the error is suppressed and will not appear in logs. This makes debugging needlessly harder.

We should just add a console.error() call to the catch block of onClickHandler in src/components/Upload/index.jsx.
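
A rough sketch of what such a catch block could look like. The function below is a stand-in, not the actual handler: `upload` and `onError` are hypothetical parameters used to keep the example self-contained.

```javascript
// Hypothetical stand-in for the click handler in src/components/Upload/index.jsx.
// `upload` performs the upload and returns a Promise; `onError` updates the UI.
async function onClickHandler(upload, onError) {
  try {
    return await upload();
  } catch (error) {
    // Log the full error so unexpected failures are visible in the console.
    console.error('Upload failed:', error);
    onError(error);
    return null;
  }
}
```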

Use Typescript

In portal.js we agreed to move to typescript and I think we probably want this here.

Infer Table Schema from CSV (when uploading) and use in schema editor

If a file is selected we want to infer a frictionless style object that we can use for the metadata. Suggest using data.js and fixing this issue frictionlessdata/frictionless-js#49

Acceptance

  • When a CSV file is selected and it is a tabular file, the resource object in the state is updated with the correct metadata and inferred schema
    • Plus sample data, e.g. the first 20-30 rows, is put in the sample attribute.
  • This works for xlsx (first sheet)
  • When a linked file is chosen we infer core metadata from the URL and try to load the data (if there is a CORS issue then just give up, maybe flagging an error). MOVED to #32
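
To make the idea of inference concrete, here is a small hand-rolled sketch that guesses a Table Schema from sample rows. It is not how data.js does it; `inferType` and `inferSchema` are hypothetical names, and only four Table Schema types (string, integer, number, boolean) are considered.

```javascript
// Convert a cell to a number, or NaN when it is not numeric.
function toNumber(value) {
  if (typeof value === 'number') return value;
  if (typeof value === 'string' && value.trim() !== '') return Number(value);
  return NaN;
}

// Hypothetical helper: guess a Table Schema type for one column from its values.
function inferType(values) {
  const present = values.filter((v) => v !== null && v !== undefined && v !== '');
  if (present.length === 0) return 'string';
  const isNumeric = (v) => Number.isFinite(toNumber(v));
  if (present.every((v) => isNumeric(v) && Number.isInteger(toNumber(v)))) return 'integer';
  if (present.every(isNumeric)) return 'number';
  if (present.every((v) => typeof v === 'boolean' || v === 'true' || v === 'false')) return 'boolean';
  return 'string';
}

// Build a schema from sample rows given as an array of objects keyed by column name.
function inferSchema(sample) {
  const names = Object.keys(sample[0] || {});
  return {
    fields: names.map((name) => ({
      name,
      type: inferType(sample.map((row) => row[name])),
    })),
  };
}
```

Inferring from a 20-30 row sample keeps the cost low, at the price of occasionally guessing a type that later rows contradict.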

Cache file hash

It seems that there are cases in which we might calculate the hash of the same type for the same file more than once. I suggest caching the hash on the File object by type once it has been calculated, as this is a fairly expensive operation. Once a hash has been cached, and File.hash(<type>) is called again, it should just be returned (assuming we have hash of <type> cached).
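
One possible shape for this cache, as a sketch only. `HashCache` is a hypothetical wrapper, and `computeHash(type)` stands in for whatever expensive function the File object uses; it is assumed to return a Promise.

```javascript
// Hypothetical wrapper: `computeHash(type)` is the expensive function and is
// assumed to return a Promise resolving to the digest for that hash type.
class HashCache {
  constructor(computeHash) {
    this.computeHash = computeHash;
    this.cache = new Map(); // hash type -> Promise of digest
  }

  hash(type) {
    if (!this.cache.has(type)) {
      // Cache the Promise itself so concurrent callers share one computation.
      const pending = this.computeHash(type).catch((error) => {
        this.cache.delete(type); // do not keep failed computations
        throw error;
      });
      this.cache.set(type, pending);
    }
    return this.cache.get(type);
  }
}
```

Caching the Promise rather than the resolved value means a second call made while the first is still running does not start another computation.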

Stub the React application

Stub the react app for the data publishing app focused on adding a resource.

Acceptance

  • Create a repository on github
  • Create a simple react application
  • A basic README

Tasks

  • create a repository on github : https://github.com/datopian/datapub
  • create a simple react application
  • Add basic README
  • Add a basic component to upload a file to some storage and return the response

Edit metadata whilst data is uploading

When the upload is in progress the metadata fields (Title, description, encode, format) the metadata information should be editable at the same moment of the upload.

Acceptance

  • Metadata fields (title, description, encode, and format) editable when the file is uploading.

Tasks

  • UI - Enable metadata fields(title, description, encode, and format) when the file is uploading.
  • JS - Get the new values from the metadata fields and update the information.

Error when trying to upload a binary file

Expected Behavior

I expect to be able to cleanly upload, with no errors in the console, files of any type whether they can be parsed by datapub or not.

Actual Behavior

When trying to upload a binary file (extension is .bin, file is actually a random binary file but I suspect this will happen for many types of files that are not tabular data...), I get the following error in the js console:

Error: We do not have a parser for that format: bin
    _rows file-base.js:137
    rows file-base.js:116
    onChangeHandler index.jsx:33
    React 17
    unstable_runWithPriority scheduler.development.js:653
    React 4
index.js:1
    e index.js:1
    onChangeHandler index.jsx:37
    React 17
    unstable_runWithPriority scheduler.development.js:653
    React 4

This doesn't block the upload process, but this should not be an error level message.
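
One way to avoid the error would be to check the format before asking for rows. The sketch below assumes a fixed list of tabular extensions; the real list depends on which parsers are available, and `isTabular` is a hypothetical helper.

```javascript
// Assumption: the set of parseable formats. The real list depends on the parser in use.
const TABULAR_FORMATS = new Set(['csv', 'tsv', 'xls', 'xlsx']);

// Lower-cased extension of a file name, or '' when there is none.
function extensionOf(fileName) {
  const dot = fileName.lastIndexOf('.');
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : '';
}

// True when it is worth trying to read rows and infer a schema.
function isTabular(fileName) {
  return TABULAR_FORMATS.has(extensionOf(fileName));
}
```

The change handler could then skip sampling and schema inference when `isTabular(file.name)` is false, and still proceed with the upload.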

Editable Metadata for file (whilst uploading) and default values are inferred from supplied file (e.g. title ...)

When the user selects a file and starts uploading, they are presented with key metadata fields (e.g. title, description, encode, and format) that they can edit. These metadata fields should be auto-populated where possible. The user can then explicitly save (once the file has finished uploading).

Bonus would be extracting the table schema info: I want to extract the table schema info from the file and save it in a JSON object. See #13

Acceptance

Given a file by a user

  • Present a form with key metadata fields:
  • Auto-populate with the information extracted from the file where possible
    • path: from file name
    • title: either empty OR (tbd) a human version of name (e.g. _ => " ")
    • description - should be an empty and editable field
    • format - should be automatically filled with the extension of the file. The user can edit and select a new format from a format list (format list is passed to the app/component on initialisation)
    • encoding: should be empty and the user can select an option from a list of encodings (this encoding list should be passed to the app/component on initialisation)
  • This information should be serialized internally to a frictionless data Data Resource based structure
  • On Save this information will be used to update CKAN MetaStore
    • This assumes package id for this resource has been passed in.
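
The auto-population rules above could be sketched as follows. `defaultMetadata` is a hypothetical helper; it takes the "human version of name" option for the title, which the issue leaves as tbd.

```javascript
// Hypothetical helper: derive default Data Resource metadata from a file name.
function defaultMetadata(fileName) {
  const dot = fileName.lastIndexOf('.');
  const hasExtension = dot > 0;
  const base = hasExtension ? fileName.slice(0, dot) : fileName;
  return {
    path: fileName,
    // Assumption: the title is the file name with underscores turned into spaces.
    title: base.replace(/_+/g, ' ').trim(),
    description: '',
    format: hasExtension ? fileName.slice(dot + 1).toLowerCase() : '',
    encoding: '',
  };
}
```

For example, `defaultMetadata('my_data_file.csv')` gives a title of `my data file` and a format of `csv`, both of which the user can still edit.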

Tasks

  • UI - Create metadata fields (title, description, encode, and format) with a unique ID for each field.
  • JS - Get the ID's and auto-populate the metadata fields with the file information.
    • I think we have all info we need from HTML5 FileReader ...
    • If we are doing table schema we can/should use data.js https://github.com/datopian/data.js to load the file and infer information
  • TODO: saving process
    • need to check we don't save before file has finished uploading

Bonus

  • check and test more how the import-ui works
  • install and test react-table
  • create a variable "schema" type object
  • iterate over the data (table schema) from the file and save it in the variable.
  • Show this data with react table

Research and setup storybook for this app

Storybook is a tool that provides a development environment for UI components. It lets you develop and design your interfaces quickly, in isolation and independently of the app. It also makes it possible to define different states for components, thus documenting those states.

Acceptance

  • Short summary for future of what storybook is, why to use it, and how we integrate it
  • Have a story for the component
  • Instructions in README for devs on how to use it

Tasks

  • create a summary about storybook
  • create a story for the uploader component
  • add instructions in README how to use storybook

[resource] Upload or Link Chooser

When the user opens the upload app, they want to choose whether to upload a file or provide a link.

Acceptance

  • The user can choose what type of upload

  • The metadata will be inferred when the user chooses a link (?)

Tasks

  • UI - Create a new input to be able to choose to upload a link
  • JS/UI - Show both the drag and drop area and the input link
  • JS/UI - After the user chooses an upload link.
    • Should the drag and drop area be hidden?
  • JS/UI - After the user chooses to upload a link and inserts a link.
    • Should the metadata be auto-populated, as already happens when uploading a file?

Bonus:

  • Refactor
    • Upload component
    • state structure
    • add more test

Componentize and declare the usage

At the moment, we have the default setup created by the CRA tool. This auto-initializes the app without any arguments being passed:

// src/index.js
import React from 'react';
import ReactDOM from 'react-dom';
import App from './App';

ReactDOM.render(
  <React.StrictMode>
    <App />
  </React.StrictMode>,
  document.getElementById('root')
);

What we want is to be able to choose the components that are needed for a project and construct the UI from them:

For example, in a specific project app, you may use only FileUploader and MetadataEditor:

import React from 'react';
import { FileUploader, MetadataEditor } from 'datapub';

function MyApp() {
  return (
    // A fragment is needed because a component must return a single root element
    <>
      <FileUploader />
      <MetadataEditor />
    </>
  );
}

export default MyApp;

or you can just load the entire app with all default components:

import React from 'react';
import { ResourceManager } from 'datapub';

function MyApp() {
  return (
    <ResourceManager resource={resource} />
  );
}

export default MyApp;

same for multi resource manager:

import React from 'react';
import { MultiResourceManager } from 'datapub';

function MyApp() {
  return (
    <MultiResourceManager resources={resources} />
  );
}

export default MyApp;

Acceptance criteria

  • I can use specific components of the app in my project
  • I have clear docs about the usage

Tasks

  • Refactor /src/index.js according to analysis
  • Change the structure of the app including file names etc. to align with the analysis
  • Sync with @mariorodeghiero and create a PR so that we are on the same page

Analysis

How to refactor /src/index.js:

As we need to componentize this app, we need to export each component so that it can be imported and used from the client app:

// src/index.js
export { ResourceEditor, MultiResourceEditor } from './App';
export { FileUploader } from './components/FileUploader';
export { MetadataEditor } from './components/MetadataEditor';
export { SchemaEditor } from './components/SchemaEditor';
export { ProgressBar } from './components/ProgressBar';

We can also have an option for users who just want to mount this app into the HTML:

// src/index.js
import React from 'react';
import ReactDOM from 'react-dom';
import App from './App';

function mountResourceEditorApp([elementId, datasetId] = ['root', undefined]) {
  if (!datasetId) {
    throw new Error('datasetId is required');
  }

  ReactDOM.render(
    <React.StrictMode>
      <App ... />
    </React.StrictMode>,
    document.getElementById(elementId)
  );
}

[resource] Resource edit page integration

Have DataPub work for resource edit as well as resource create

This issue will probably require changes both here and in ckanext-blob-storage to pass in the details of the existing resource (either just the resource id (and dataset id?) and then DataPub code looks up the resource object from the API or by passing in complete JSON object).

Acceptance

Resource edit page works correctly

  • You can visit the page, edit metadata or schema and save
  • You can remove existing data file or link
  • You can then add a new one (including re-inferring schema and other metadata (??) TODO: work out correct behaviour)

Tasks

  • Have DataPub take over resource edit page (ckanext-blob-storage)
    • Have DataPub passed the right info by CKAN v2 system
  • Add state attribute to DataPub to indicate whether in create or edit mode (create is default)
  • Call resource_update rather than resource create
  • Pass in the resource object
  • Support for removing existing resource
  • Redirect to dataset page on save (like for resource create)

Default config values are not being used

If I don't provide any config values the default values are not being used.

Expected result

Use the following config values:

{
    authToken: "be270cae-1c77-4853-b8c1-30b6cf5e9878",
    api: "http://localhost:5000",
    lfs: "http://localhost:5001", // Feel free to modify this
    organizationId: "myorg",
    datasetId: "data-test-2",
}

Actual result

{
    api: null,
    authToken: null,
    datasetId: "test-id",
    lfs: null,
    organizationId: null,
}
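
A possible fix, sketched under the assumption that missing values arrive as `null` or `undefined`: merge the provided config over the defaults and ignore empty values. `resolveConfig` is a hypothetical name; the default values are the ones listed under "Expected result".

```javascript
const DEFAULT_CONFIG = {
  authToken: 'be270cae-1c77-4853-b8c1-30b6cf5e9878',
  api: 'http://localhost:5000',
  lfs: 'http://localhost:5001',
  organizationId: 'myorg',
  datasetId: 'data-test-2',
};

// Hypothetical helper: provided values win, but null/undefined fall back to defaults.
function resolveConfig(provided = {}) {
  const resolved = { ...DEFAULT_CONFIG };
  for (const [key, value] of Object.entries(provided)) {
    if (value !== null && value !== undefined) {
      resolved[key] = value;
    }
  }
  return resolved;
}
```

A plain `{ ...DEFAULT_CONFIG, ...provided }` would not be enough here, since explicit `null` values would still overwrite the defaults.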

Table schema scroll + css

The table schema component renders without scroll(x) and scroll(y).

Acceptance

  • add scroll(x) and scroll(y)

Tasks

  • add max-width and max-height
  • add scroll(x) and scroll(y)
  • removed margin-top from app
  • centralize the tableschema
  • remove background-color from app

File upload progress reporting

When the user uploads a file to the storage, the upload progress should be tracked and shown to the user in a progress bar with percentage done, time remaining and transfer speed (KB/s).

Acceptance

  • I can track the progress when uploading the file to the blob storage
  • Show a circular progress bar to the user with time remaining, percentage done and transfer speed (KB/s).

Tasks

  • UI - Create a circular progress bar
  • JS - Track the progress and populate the progress bar
  • JS/UI - Show percentage done, time remaining and transfer speed (KB/s)
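
The three numbers can be derived from the bytes sent so far, the total size and the elapsed time. A minimal sketch, with `progressStats` as a hypothetical helper and a simple average speed (no smoothing):

```javascript
// Hypothetical helper: compute progress figures from upload counters.
function progressStats(loadedBytes, totalBytes, elapsedMs) {
  const percent = totalBytes > 0 ? Math.round((loadedBytes / totalBytes) * 100) : 0;
  // Average speed since the upload started.
  const bytesPerSecond = elapsedMs > 0 ? loadedBytes / (elapsedMs / 1000) : 0;
  // Remaining time is unknown until some data has been sent.
  const remainingSeconds =
    bytesPerSecond > 0 ? Math.ceil((totalBytes - loadedBytes) / bytesPerSecond) : null;
  return { percent, kbPerSecond: bytesPerSecond / 1024, remainingSeconds };
}
```

In a browser, `loadedBytes` and `totalBytes` would typically come from the `loaded` and `total` fields of an upload progress event.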

Ability to copy table schema from existing resource in the same dataset

When creating a new resource in a Dataset I want to be able to copy a table schema from an existing resource (if there is an existing resource) so that I don't have to re-enter the whole table schema again

Flow

In terms of flow i imagine something like:

  • Click: New resource
  • Select file to upload
    • File starts uploading, various editors including schema editor appear
  • There is a control (outside of schema editor) "Copy Schema over from Existing Resource" with select box with existing resources to choose from (resources for this dataset - gray out those resources (maybe with hover saying "no schema to copy") which do not have a schema attribute!)
    • If the user chooses a resource from the list, the schema of the current new resource is overwritten with the schema of the selected resource
    • Schema editor updates
    • [Perhaps] some flash message like "Schema updated"

NB: we don't need this feature to show up on resource editing (though no harm in it being there i suppose so could have it)
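
A sketch of the logic behind such a control, assuming the dataset object carries a `resources` array in which each resource has a `name` and an optional `schema`. Both function names are hypothetical.

```javascript
// Options for the select box; resources without a schema are marked disabled.
function copySchemaOptions(dataset) {
  return (dataset.resources || []).map((resource) => ({
    name: resource.name,
    disabled: !resource.schema,
  }));
}

// Returns a copy of `target` whose schema is taken from the named resource.
// If the source is missing or has no schema, `target` is returned unchanged.
function copySchema(dataset, sourceName, target) {
  const source = (dataset.resources || []).find((r) => r.name === sourceName);
  if (!source || !source.schema) return target;
  // Deep copy so later edits do not leak back into the source resource.
  return { ...target, schema: JSON.parse(JSON.stringify(source.schema)) };
}
```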

Acceptance

  • When creating a table schema for a new resource I want to be able to start from an existing one so that I don't have to write everything out again
    • Default one at the dataset level OR copy from a previous resource [that you can copy over at creation time ...]
    • [ ]

Tasks

  • Mock this up in e.g. figma
  • Mock a dataset object with say 2 existing resources
    • Initiating the ResourceEditor app with a mock
    • Handle case (or check if happens) that ResourceEditor not initiated with a Dataset - this would involve retrieving the dataset from the API using the ckan-client-js retrieve method
  • Implement the widget and add to ResourceEditor app
  • (Test??)

[epic] React Resource editor (+ multiple resources at once, table schema, CKAN integration)

New React-based app/component for creating and editing one or more resources and adding them to a Dataset. This replaces CKAN classic resource create and edit but ONLY for CKAN instances using ckanext-external-storage (which we hope will soon be most or all of our CKAN instances).

NB: we can assume dataset already exists. In later evolution we will extend to include dataset creation etc.

Design features:

  • Use React
  • (Resource) Schema will now be stored as attribute schema on a resource (as per Frictionless pattern) rather than in Data Dictionary and associated with DataStore.
  • Inside React we use Frictionless spec for describing resources, table schemas etc (and convert to CKAN format at boundary)

Job Stories

TODO - see https://tech.datopian.com/publish/

Mockups

Figma link

Acceptance

In a CKAN classic instance using ckanext-external-storage the resource editor is a React app and:

  • I can upload resources (direct to cloud)
  • I can edit resource metadata
  • I can edit the table schema
  • This is all saved to CKAN MetaStore
  • This works for editing existing resources i.e. the React app can also load / get given an existing resource
  • I have a migration script for migrating existing CKAN instances (i.e moving data dictionary info to schema attribute)

Leftovers

  • Fixing up DataStore to retrieve table schema info from resource object Is this needed? If people who upgrade to ckanext-external-storage also upgrade to AirCan there is no need. => Probably no need

Status

graph TD

choose[Select File or Link File]

choose --> select[Select file to upload]
choose --> link[Link file to upload]

select --> openfile(Open File => data.js Resource/File object)
openfile --> upload[Upload in Background + Upload UI]
openfile --> metaeditor[Metadata Editor]
openfile -.if tabular.-> schemaeditor[Schema Editor]
upload --completes--> save
metaeditor --no errors--> save
schemaeditor --no errors--> save

link -. if you can open or just create default.-> openfile

cancel[Cancel this resource]

classDef done fill:#21bf73,stroke:#333,stroke-width:1px;
classDef nearlydone fill:lightgreen,stroke:#333,stroke-width:1px;
classDef inprogress fill:orange,stroke:#333,stroke-width:1px;
classDef next fill:lightblue,stroke:#333,stroke-width:1px;

class choose,upload,metaeditor,select done;
class select,nearlydone nearlydone;
class schemaeditor,inprogress inprogress;
class openfile next;

Tasks

v0.1 JS app (macro component) for adding a single resource to a dataset with file upload

  • Stub the React application 1d #2
  • Set up tests 1d #3
  • Setup storybook #4
  • Refactor current upload code into React setup so resource upload working 3d #6
  • Resource metadata editor #12

v0.2

v0.3

  • React Table Schema editor component #13 [2d]
  • Link or File Upload chooser #18 [1d]
  • Link a file component #21 [2d]

v0.4: Integrate this all together into a Resource editor

  • React App combining several lower level components into an app / flow: Upload, Edit metadata, Edit table schema, Save [1.5d] @mariorodeghiero #16
  • Saving metadata #29
  • Integrate resource editor into CKAN (should not be too bad as done via ckanext-external-storage though more "surgery" may be needed) [2d] #38

v0.5 Add multiple resources at once

  • Multiple files add #23 [3d]
  • UI adjustments for multiple files #24 [2d]

v0.6

  • Make this work when editing an existing resource [2d] @??

v??

  • Pause / resume (?) - depends on multipart support in giftless

Analysis

  • What is pattern for storing and sharing state amongst components in DataPub apps and workflows?
    • Redux
    • Apollo
  • Are we only replacing resource create and edit stage in the CKAN Classic? Eg, in Vanilla CKAN, admin creates a dataset by creating a package first, then moves to resource create step.
    • What are we doing eventually (and where?)? Hypotheses
      • We are doing everything
      • We are just doing resource editor (and dropping into CKAN UI) <-- RIGHT NOW
      • We are doing several components (e.g. resource editor, dataset metadata editor) but not changing the flow

Components

Low level

  • FileOrLinkChooser => SharedState: FileOrLinkChosen
  • Select File => SharedState: RawFile
    • uses data.js to open the file and put that Resource object in your internal state system (apollo)
  • MetaEditor
  • UploadProgress
  • SchemaEditor
  • Save

Parent level components and State

  • ResourceManager
GsaApp = DatasetManager
GatesResourcEditorApp = Multi

DatasetManager
  dataset
  ui
    ...

ResourceManager(resource={}, dataset_id=None)

  state: {
    // only needed if this component saves the resource
    // (usually parent component e.g. DatasetManager will deal with saving ...)
    dataset_id: ...
    // follows frictionless Data Resource
    // if existing this will have lot of content o/w empty
    resource: {...}
    ui
      fileOrLink: file | link
      uploadComplete
      ...
  }
    
MultiResourceManager(dataset)

DatasetMetadataManager

Open method - from data.js

open(HTML File object or URL) => if you can't open the url just give { path: url }

resource

{
  name
  title
  format
  path
  sample: // first 10/20 rows ...
  schema
}

Integrate React app into CKAN Classic UI

Adding JS module - https://docs.ckan.org/en/2.8/theming/fanstatic.html - but this is for adding custom JS module. What we need is to replace current "resource editor" UI/UX with our React app.

Current flow:

sequenceDiagram

  Client->>CKAN: Request `/dataset/new`
  CKAN->>Client: HTML form + JS modules (jquery etc.)
  Client->>CKAN: Post form data to `/dataset/new`
  CKAN-->>CKAN: Create dataset in the db with state as draft
  CKAN->>Client: 302 to `/dataset/new_resource/{dataset-name}`
  Client->>CKAN: Request `/dataset/new_resource/{dataset-name}`
  CKAN->>Client: HTML form + jquery module for file uploader
  Client->>CKAN: Post form data to `/dataset/new_resource/{dataset-name}`
  CKAN-->>CKAN: Create a resource in the db + update dataset's state to active
  CKAN->>Client: 302 to `/dataset/{dataset-name}`
  Client->>CKAN: Request `/dataset/{dataset-name}`
  CKAN->>Client: HTML of dataset page

Note that when uploading a file, you should post multipart/form-data.

What we need to do to replace the "Resource Editor":

  • Build a JS module from the React app
  • Make sure you have fanstatic directory registered in the CKAN extension
  • Copy the JS module(s) to the fanstatic directory
  • Override 'new_resource.html' in the extension and init the React module from there: {% resource 'my_extension/bundle.js' %}

Integrated resource editor app in React

React App combining several lower level components into an app / flow: Upload, Edit metadata, Edit table schema, Save

Need to mock this up a bit.

  • Upload
  • Metadata editor
  • Schema editor

Could be stepped or all in one page (maybe hide show for some of it so it would work with multiple resources).

Acceptance criteria

  • I can use this app (with some config) to add new resources into a CKAN DMS

Tasks

  • Wire components together into an overall component
  • Mock up wireframe
  • Implement the wireframe

Upload of large files is failing

This issue is related to datopian/datapub-nhs#14

After some analysis and tests uploading large files, the root cause of the failure appears to be the code below.
Note: After removing this part, I tested with 600 MB and 1 GB files and it worked very well.

try {
  const rowStream = await resource.rows({ size: 20, keyed: true })
  resource.descriptor.sample = await toArray(rowStream)
  await resource.addSchema()
} catch (e) {
  console.error(e)
}

Why does the try-catch above exist? The code is needed to render data in the table schema component, but for some reason it consumes a lot of memory and sometimes crashes the browser with no errors in the dev tools console.

Another error: in one of our tests we also tried to upload a file in a different format; it failed and showed a parser error, as you can see in the screenshot below.

image

UI adjustments for multiple files

When the user drags and drops multiple files, they want to see every file with its own status. When the user clicks on a file's status card, the metadata editor should show that file's information.

Acceptance

  • User can drag & drop multiple files
  • User can see the upload info for all files uploading independently
  • User can edit the metadata information for each file independently

Tasks

  • Draft design and UI flow in figma and check with client
  • Implement

e.g. prob something like ...

  • UI - Render all files in the upload status area
    • Render all files in the upload info
    • Show a maximum of 4 files
    • When the user uploads more than 4 files, a scroll bar should appear in the upload area
    • In the upload info area show the total number of files
    • When the upload request succeeds or fails, change the progress bar circle to success or failed
  • Be able to select each file uploading and edit each metadata information independently

[resource] Integrating DataPub into existing CKAN v2 resource create and edit flow

We need to insert DataPub resource editor into the pre-existing flow of CKAN resource editing (using https://github.com/datopian/ckanext-blob-storage to override CKAN templates).

Acceptance

  • Discovery of existing system and recommended intervention
  • Resource create flow done
  • Resource edit flow done
  • Documentation of any changes from current behaviour

Tasks

  • Analyse how current system works in CKAN (w/o DataPub) DONE: see below "current system"
  • Create flow showing how system with DataPub will work DONE: see below "new system"
    • Flag any differences
  • Implement resource create flow
  • Implement resource edit flow #37

Analysis

Current system

graph TD

dnew[Click Dataset New]
dmeta[Dataset Metadata]

dnew --> dmeta

rnew[New Resource]

dmeta --saves dataset as draft--> rnew
rnew --finish--> dpage[Dataset Page]
rnew --add another--> rnew
rnew --previous-->dmeta


dedit[Click Edit Dataset] --> dupdate[Update Dataset Metadata]
dupdate --save/update--> dpage
dedit --resources--> res[Resources page]
res --add new--> rnew
res --click on existing resource --> redit[Resource Edit]
redit --delete--> dpage
redit --update--> resview[Resource view page]

New system

graph TD

dnew[Click Dataset New]
dmeta[Dataset Metadata]

dnew --> dmeta

rnew[New Resource]

dmeta --saves dataset as draft--> rnew
rnew --save and publish + redirect--> dpage[Dataset Page]

dedit[Click Edit Dataset] --> dupdate[Update Dataset Metadata]
dupdate --save/update--> dpage
dedit --resources--> res[Resources page]
res --add new--> rnew
res --click on existing resource --> redit[Resource Edit]
redit --delete--> dpage
redit --update--> resview[Resource view page]

Resource editor

graph TD

start --> remove
remove --> showupload[Show Upload/Link options]

[resource] Component for Linking a Resource

Have a component for "Link a file" in Resource add

Acceptance

  • a form field providing a link
  • When you type something in the url component and it loses focus we should bring up metadata editor and schema editor with info filled in if we can
  • We should attempt to retrieve that link in the browser (using data.js)
    • show 404 if link does not work
    • Handle CORs errors gracefully
    • Use the info from data.js to populate meta and schema editors
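
The acceptance criteria call for data.js; as a simplified illustration of the error handling only, the sketch below uses a plain fetch-style function instead. `checkLink` is a hypothetical helper, and `fetchFn` is injected (in a browser it would be the global `fetch`).

```javascript
// Hypothetical helper: classify a link as ok, not found, erroring or unreachable.
async function checkLink(url, fetchFn) {
  try {
    const response = await fetchFn(url, { method: 'HEAD' });
    if (response.status === 404) return { path: url, status: 'not-found' };
    if (!response.ok) return { path: url, status: 'error' };
    return { path: url, status: 'ok' };
  } catch (error) {
    // fetch rejects on network failures and on requests blocked by CORS.
    // In both cases we keep the link and fall back to a bare { path } resource.
    return { path: url, status: 'unreachable' };
  }
}
```

With this shape the UI can show a 404 message for `not-found` and still let the user save a link whose status is `unreachable`.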

Tasks

Refactor due to recent updates in ckan-client-js

Refactor due to recent updates in ckan-client-js so we can create/update resources + push to blob.

Acceptance

  • Refactor react code to use the recent updates in the SDK

Tasks

  • Refactor react code to use create resource
  • Refactor code to update resource
  • Refactor code to push to blob

[resource] Save (updated) resource metadata to DMS backend i.e. CKAN

Save resource metadata (including sha-256 field which indicates this is stored via giftless) into backend (CKAN).

Acceptance

  • On hitting save in resource manager resource metadata is saved to CKAN backend
  • And the dataset is published (change from draft state)
  • And you are redirected to dataset page

Tasks

Preliminaries in external libs

Work here

  • On resource create page call relevant methods in JS SDK to save resource to backend DONE. But turns out there are issues
  • Resource Create page: Stop calling package_update and call resource_create, and only do that when the user hits save
    • Have an internal method createResource(f11sResourceMeta, datasetIdOrName)
  • Rename "Save Metadata" to "Save and Publish". After saving then redirect to: /dataset/{dataset.name}
    • Rename button
    • (Continue doing resource create) AND Dataset state to active from draft and call push (package_update)
    • Redirect to dataset page
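
The "Save and Publish" sequence could be sketched as below. This is an assumption-heavy illustration: `client.action(name, payload)` and `redirect(path)` are hypothetical injected dependencies, and the sketch uses CKAN's `package_patch` action to flip the state instead of the `package_update` call named in the task, so that only the changed field is sent.

```javascript
// Hypothetical flow: create the resource, publish the dataset, then redirect.
// `client.action(name, payload)` is assumed to return a Promise.
async function saveAndPublish(client, dataset, resource, redirect) {
  await client.action('resource_create', {
    ...resource,
    package_id: dataset.id,
  });
  // Assumption: package_patch is swapped in for package_update here.
  await client.action('package_patch', { id: dataset.id, state: 'active' });
  redirect(`/dataset/${dataset.name}`);
}
```

Because each step is awaited, the redirect only happens after both API calls have succeeded; a failure in either one rejects the returned Promise and leaves the user on the page.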

[resource] Multiple files add

When the user drags and drops multiple files, the upload should start automatically for all files and progress should be tracked independently for each file.

See #1 for background.

Wireframe

Acceptance

  • Be able to upload multiple files
  • Be able to track the upload progress independently for each file
  • Be able to edit metadata for each of the uploaded files
  • Be able to hit save (and save everything) once all uploads complete
    • Send each resource independently and remove it from the UI list, so the user can edit/save the other resources; after the last resource is sent, redirect the user to the dataset page.
  • Be able to use the edit resource page

Tasks

  • Render all files in the upload info area
    • create an array with all files
    • insert this array of files in the state
    • each file should be identified with a unique key
    • the unique key should also be passed when the user clicks on an uploading file's card, so an action can be applied to that file.
  • Use all the steps above for edit resource page
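
A sketch of the state handling these tasks describe: an array of file entries, each with a unique key that is used to update one file without touching the others. The function names and the shape of each entry are assumptions.

```javascript
let nextKey = 0; // simple counter used to generate unique keys

// Add newly dropped files to the state, one entry per file.
function addFiles(state, fileNames) {
  const added = fileNames.map((name) => ({
    key: `file-${nextKey++}`,
    name,
    progress: 0,
    status: 'uploading',
  }));
  return { ...state, files: [...state.files, ...added] };
}

// Update a single file, identified by its key (e.g. when its card is clicked
// or when its upload reports progress).
function updateFile(state, key, changes) {
  return {
    ...state,
    files: state.files.map((file) =>
      file.key === key ? { ...file, ...changes } : file
    ),
  };
}
```

Keys are generated rather than derived from file names, so two files with the same name can still be tracked separately.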

Questions:

  • Do we need to limit the file size?
  • Do we need to set the type of these files? ie. only accept CSV

DataPub as component library and trialling first external DataPub based apps

Data Publishing workflows are by their nature highly custom, though with many common components. For example, on one site the resource editor may have one set of fields, and on another a different set (as per the discussion in http://tech.datopian.com/publish/).

Our approach to this is for DataPub to provide a rich set of common components for building those flows and a rich set of template applications that show how to combine those components.

Finally, to integrate a DataPub based app into CKAN v2 one needs a CKAN extension that would insert the React app into CKAN templates in right locations. For that we are creating https://github.com/datopian/ckanext-datapub

Where we are and what we do next

Right now the only app we have is a "demo" app in DataPub that shows off most of the components. That app is then embedded into ckanext-blob-storage

Where we want to go:

  • Separate apps datapub-xxx for each use case
  • DataPub.js has a clear set of components with instructions/documentation for use (e.g. 1) run create-react-app, 2) change your App.js in this way, etc.). It also has demo (storybooked?) example apps that users of DataPub can use as a basis for their own apps
  • Separate extension ckanext-datapub that can be customized to integrate a given custom uploader (ckanext-blob-storage can probably revert to its original simplest state of just having a different upload)

Acceptance

  • At least one external app
  • DataPub code updated (if needed) to support reuse in that app
  • DataPub "vanilla" app that does what CKAN v2.8 does as resource editor (modulo direct to cloud uploading) WONTFIX - IMO not worth this vs just using the improved editor we have
  • DataPub README update with instructions on reuse
  • ckanext-datapub exists and is functional for integrating datapub based apps into CKAN v2. https://github.com/datopian/ckanext-datapub
  • ckanext-blob-storage is back in its original state (? - we may want to always use DataPub going forward and not worth maintaining blob storage resource editor )
    • Update to tech.datopian.com/blob-storage/ to reflect ckanext-datapub approach

Tasks

  • External app trial
  • Update DataPub documentation and code #64
    • Code updates to make this a reusable library
    • README updates
    • making the core app a demo/example
  • New CKAN extension for integration ckanext-datapub #63

New CKAN extension `ckanext-datapub` for integrating DataPub based apps into CKAN v2 workflow

There are two areas of integration:

  • Dataset creator (and editor)
  • Resource editor

A DataPub based app could replace one or both of these. (Note: a DataPub based app may have other steps that CKAN does not currently have, but we are just focusing on the current CKAN v2 end.)

We need a CKAN extension that can replace one or both of Dataset editing and Resource editing by handing off to a React app. The React app source code will likely be located elsewhere (we need to specify its build state for integration).

Acceptance

Refactor current file upload code into React setup for resource creation

Integrate the ckan3-js-sdk with this code and refactor so that we can upload a single file to the storage.

Acceptance

  • I can upload a file to (azure) storage using the react app by dragging and dropping a file or selecting it
  • success/completion (or failure) is reported in some basic manner
  • (?) There is a test for this (? using mocks) and cypress (?)

Tasks

  • require SDK in datapub
  • check and provide the configuration necessary to authenticate in ckan-authz ~rufus not sure what this means
  • refactor react code and make it work with the SDK uploading one file to the storage
  • show the request response to the user

Trial uppy as library for upload SDK (including adjustment for Azure)

We want to research if we can use uppy for our file upload. https://uppy.io/ is a fully featured client library for managing uploads with many adapters. Features include:

  • support for many cloud storage sources out of the box
  • import direct from google drive, dropbox etc
  • resumable file uploads
  • i18n
  • recovery from crashes
  • etc

Acceptance

Answers to questions like:

  • Is uppy good?
  • How easy is it to use the features?
  • ❗ Can we integrate uppy with what we are doing? In particular, can we integrate a bespoke function for signing files for cloud storage upload?

Tasks

Discovery: Pause and resume on file uploading

This is a discovery ticket to research how pause/resume works and what is necessary to implement this functionality.

Acceptance

  • Research and write about how to implement pause/resume functionality.

Tasks

  • Research and take note of what we need to implement the pause/resume functionality.

Update the code to make it reusable

Acceptance

  • This can be installed in my React app using git repo in the package.json
  • I can import { Upload } from 'datapub'; in my app and this will not throw error.
  • README is updated accordingly

Tasks

  • Update library so that we can use this in any React based project.
    • Metadata
    • Choose
    • InputFile
    • InputUrl
    • ProgressBar
    • Upload
      • ResourceEditor
    • TableSchema
  • Import doesn't work. Fix this and update docs accordingly

[resource] Remove 2 step process and show schema editor

Current setup

2 stages and you have to click between.

image

Issue is that UX is bad as people don't see schema editor.

Wanted

Show both metadata editor and schema editor at same time one below the other. Something like this without the 1 and 2:

image
image
image

Pattern for creating custom publish flows on top of DataPub

We already have an example where we need to customize the publish flow:

A resource can be restricted (e.g. public vs private), but this is something custom and it is missing in DataPub. E.g.: "restricted": "{\"level\": \"public\"}"

We need custom publish flows per client. e.g. this feature is specific to a given ckan deployment. Rather than modify ckanext-blob-storage we basically need to be able to install a custom React app.

ckanext-blob-storage should have NO dependency on datapub.js. What we want is to pass a React app into CKAN or rather ckanext-blob-storage.

You create

ckanext-publish-{client-name}

And inside of this is a React app using components from DataPub.js.

OR we have a way inside ckanext-blob-storage to point to a react app.

Basic tests working for the app with CI

Set up tests with Jest in the React application to test the upload flow. We use Jest because it is now default for React and has nice documentation, support, and tutorials.

2 kinds of tests:

  • Component tests via jest
  • Integration tests (via cypress) - may not be needed for simple components ...

Acceptance

  • We have one or more tests of basic system
  • We have continuous integration with github actions i.e. tests are run on every push etc

Tasks

  • Have a basic test for your first component with jest
  • Instructions on how to run the tests in the developer section of README
  • Set up github actions for jest

[epic] React Resource Editor (+ table schema editor + CKAN integration)

New React-based app for creating and editing resources. This replaces CKAN classic resource create and edit, but ONLY for CKAN instances using ckanext-external-storage (which we hope will soon be most or all of our CKAN instances).

Design features:

  • Use React
  • (Resource) Schema will now be stored as the attribute schema on a resource (as per the Frictionless pattern) rather than in the Data Dictionary associated with the DataStore.
  • Inside React we use Frictionless spec for describing resources, table schemas etc (and convert to CKAN format at boundary)
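As a rough sketch of the "convert at the boundary" idea, the helpers below rename keys between a Frictionless Data Resource and a CKAN resource dict. The field mapping (path/url, bytes/size, mediatype/mimetype) and the function names are assumptions for illustration, not DataPub's actual conversion code; all other keys, including `schema`, are passed through unchanged.

```javascript
// Assumed key mapping between Frictionless and CKAN resource fields.
const FRICTIONLESS_TO_CKAN = { path: "url", bytes: "size", mediatype: "mimetype" };
const CKAN_TO_FRICTIONLESS = { url: "path", size: "bytes", mimetype: "mediatype" };

// Copy an object, renaming any key that appears in the mapping.
function renameKeys(obj, mapping) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    const hasMapping = Object.prototype.hasOwnProperty.call(mapping, key);
    out[hasMapping ? mapping[key] : key] = value;
  }
  return out;
}

// Called just before sending a resource to CKAN.
function frictionlessToCkan(resource) {
  return renameKeys(resource, FRICTIONLESS_TO_CKAN);
}

// Called when loading an existing CKAN resource into the React app.
function ckanToFrictionless(resource) {
  return renameKeys(resource, CKAN_TO_FRICTIONLESS);
}
```

Keeping the conversion in two small functions means components inside the app only ever see the Frictionless shape.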

Acceptance

In a CKAN classic instance using ckanext-external-storage the resource editor is a React app and:

  • I can upload resources (direct to cloud)
  • I can edit resource metadata
  • I can edit the table schema
  • This is all saved to CKAN MetaStore
  • This works for editing existing resources i.e. the React app can also load / get given an existing resource
  • I have a migration script for migrating existing CKAN instances (i.e moving data dictionary info to schema attribute)

Leftovers

  • Fixing up DataStore to retrieve table schema info from the resource object. Is this needed? If people who upgrade to ckanext-external-storage also upgrade to AirCan there is no need. => Probably no need

Tasks

Est (remaining): 9d

  • Resource upload and metadata editor #1
    • Resource upload
    • Resource metadata editor
    • Multiple resources
  • Developing a React Table Schema editor component #13 [2d]
  • Integrate this component into a Resource editor @anuveyatsu
  • Make this work when editing an existing resource [2d] @??
  • Integrate resource editor into CKAN (should not be too bad as done via ckanext-external-storage though more "surgery" may be needed) [1d]

Analysis

  • What is pattern for storing and sharing state amongst components in DataPub apps and workflows?
    • Redux
    • Apollo
  • Are we only replacing resource create and edit stage in the CKAN Classic? Eg, in Vanilla CKAN, admin creates a dataset by creating a package first, then moves to resource create step.
    • What are we doing eventually (and where?)? Hypotheses
      • We are doing everything
      • We are just doing resource editor (and dropping into CKAN UI) <-- RIGHT NOW
      • We are doing several components (e.g. resource editor, dataset metadata editor) but not changing the flow

Integrate React app into CKAN Classic UI

Adding a JS module - https://docs.ckan.org/en/2.8/theming/fanstatic.html - but this is for adding a custom JS module. What we need is to replace the current "resource editor" UI/UX with our React app.

Current flow:

sequenceDiagram

  Client->>CKAN: Request `/dataset/new`
  CKAN->>Client: HTML form + JS modules (jquery etc.)
  Client->>CKAN: Post form data to `/dataset/new`
  CKAN-->>CKAN: Create dataset in the db with state as draft
  CKAN->>Client: 302 to `/dataset/new_resource/{dataset-name}`
  Client->>CKAN: Request `/dataset/new_resource/{dataset-name}`
  CKAN->>Client: HTML form + jquery module for file uploader
  Client->>CKAN: Post form data to `/dataset/new_resource/{dataset-name}`
  CKAN-->>CKAN: Create a resource in the db + update dataset's state to active
  CKAN->>Client: 302 to `/dataset/{dataset-name}`
  Client->>CKAN: Request `/dataset/{dataset-name}`
  CKAN->>Client: HTML of dataset page

Note that when uploading a file, you should post multipart/form-data.

What we need in order to replace the "Resource Editor":

  • Build a JS module from the React app
  • Make sure you have fanstatic directory registered in the CKAN extension
  • Copy the JS module(s) to the fanstatic directory
  • Override 'new_resource.html' in the extension and init the React module from there: {% resource 'my_extension/bundle.js' %}

Report: bugs after trial of DataPub in the existing project

Tasks

  • Icon for upload doesn't load @mariorodeghiero
  • Progress bar - doesn't work if file already exists - stays at 0% but says upload succeeded @mariorodeghiero
  • 'Save' button is disabled so I cannot edit the metadata and save @mariorodeghiero
  • After resource is created following properties are missing:
    • URL for resource
    • Name
    • Size
    • Mimetype
    • See example of resource_show response below
  • Also, a resource can be restricted (eg, public vs private) but it is something custom and it is missing in DataPub. Eg: "restricted": "{\"level\": \"public\"}"
{
"help": ".../api/3/action/help_show?name=resource_show",
"success": true,
"result": {
"cache_last_updated": null,
"package_id": "433aa98f-9dec-4052-a4b4-0b2590a9e5e8",
"datastore_active": false,
"id": "8ef6662d-5052-437c-9f5d-7704f8b5d88a",
"size": null,
"restricted": "",
"encoding": "utf-8",
"state": "active",
"schema": "{u'fields': [{u'type': u'yearmonth', u'name': u'YEAR_MONTH', u'format': u'default'}, {u'type': u'string', u'name': u'ODS_CODE', u'format': u'default'}, {u'type': u'integer', u'name': u'VMP_SNOMED_CODE', u'format': u'default'}, {u'type': u'string', u'name': u'VMP_PRODUCT_NAME', u'format': u'default'}, {u'type': u'integer', u'name': u'UNIT_OF_MEASURE_IDENTIFIER', u'format': u'default'}, {u'type': u'string', u'name': u'UNIT_OF_MEASURE_NAME', u'format': u'default'}, {u'type': u'integer', u'name': u'TOTAL_QUANITY_IN_VMP_UNIT', u'format': u'default'}], u'missingValues': [u'']}",
"dialect": "{u'quoteChar': u'\"', u'delimiter': u','}",
"hash": "c252dc3247713d7bd83c9b7a7729168273092231f736422c1a599e8dbf58a014",
"description": "",
"format": "CSV",
"last_modified": null,
"url_type": null,
"mimetype": null,
"cache_url": null,
"created": "2020-09-17T06:41:28.013104",
"url": "",
"mimetype_inner": null,
"position": 19,
"revision_id": "d6716aed-38e0-4126-9bde-ff6c244346e0",
"resource_type": null
}
}
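A small helper along these lines could flag the missing properties in a resource_show result during testing. This is a sketch: the function name and the rule that null, undefined and empty strings count as missing are assumptions.

```javascript
// Properties the report above lists as missing after resource creation.
const REQUIRED_PROPERTIES = ["url", "name", "size", "mimetype"];

// Return the required properties that are absent, null or empty
// in the `result` object of a resource_show response.
function missingProperties(resource) {
  return REQUIRED_PROPERTIES.filter((key) => {
    const value = resource[key];
    return value === undefined || value === null || value === "";
  });
}

// Values taken from the response above (other fields omitted).
const example = { url: "", size: null, mimetype: null, format: "CSV" };
console.log(missingProperties(example));
// → [ 'url', 'name', 'size', 'mimetype' ]
```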

Table schema column empty

When the user adds a resource with empty headers, the table schema component fails to render.

Acceptance

  • render the table schema component when the resource has empty headers

Tasks

  • set column_${index} when the header and accessor are empty
  • render without breaking the application
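The fallback naming in the first task might be sketched as follows. The helper names are hypothetical, the index is assumed to be zero-based, and the `{ Header, accessor }` column shape is assumed from the wording of the task rather than taken from the component's code.

```javascript
// Replace empty, null or whitespace-only headers with column_${index}.
function normalizeHeaders(headers) {
  return headers.map((header, index) => {
    const name =
      header === undefined || header === null ? "" : String(header).trim();
    return name === "" ? `column_${index}` : name;
  });
}

// Build column definitions so header and accessor are never empty.
function toColumns(headers) {
  return normalizeHeaders(headers).map((name) => ({
    Header: name,
    accessor: name,
  }));
}

console.log(normalizeHeaders(["id", "", null, " name "]));
// → [ 'id', 'column_1', 'column_2', 'name' ]
```

Note that the sample rows would need to be keyed by the same fallback names for the accessors to find their values.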

error:

Screenshot from 2020-10-05 12-32-27
