universaldatatool / universal-data-tool

Collaborate & label any type of data, images, text, or documents, in an easy web interface or desktop app.

Home Page: https://universaldatatool.com

License: MIT License

JavaScript 98.34% HTML 1.31% CSS 0.03% Dockerfile 0.03% Shell 0.12% Singularity 0.17%
computer-vision annotate-images entity-recognition desktop classification dataset annotation-tool deep-learning text-annotation named-entity-recognition

universal-data-tool's Introduction

Universal Data Tool

Platform Support: Web/Win/Linux/Mac

Try it out at udt.dev, download the desktop app or run on-premise.

Docs | Website | Playground | Library Usage | On-Premise

The Universal Data Tool is a web/desktop app for editing and annotating images, text, audio, and documents, and for viewing and editing any data defined in the extensible .udt.json and .udt.csv standards.

Supported Data

  • Image Segmentation
  • Image Classification
  • Text Classification
  • Named Entity Recognition
  • Named Entity Relations / Part of Speech Tagging
  • Audio Transcription
  • Data Entry
  • Video Segmentation
  • Landmark / Pose Annotation

Recent Updates

Follow our development on YouTube!

Features

Sponsors

wao.ai, Momentum, Enabled Intelligence

Installation

Web App

Just visit universaldatatool.com!

Trying to run the web app locally? Clone this repository, then run npm install followed by npm run start to start the web server.

Desktop Application

Download the latest release from the releases page and run the executable you downloaded.

Contributing

Contributors ✨

Thanks go to these wonderful people (emoji key):


  • Severin Ibarluzea: 💻 📖 👀
  • Puskuruk: 💻 👀
  • CedricJean: 💻
  • beru: 💻
  • Marc: 💻 📖
  • Wafaa-arbash: 📖
  • Pierre Grimaud: 📖
  • sreevardhanreddi: 💻
  • Mohammed Eldadah: 💻
  • x213212: 💻
  • hysios: 💻
  • Cong Dao: 💻
  • Renato Junior: 🌍
  • Rick: 🌍 💻
  • anaplian: 💻
  • Miguel Carvalho: 🌍
  • Kyle OBrien: 💻
  • Hakkı Yağız ERDİNÇ: 💻
  • João Victor Davim: 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

universal-data-tool's People

Contributors

allcontributors[bot], beru, cedricjean, cedricprofessionnel, congdv, davebulaval, foldblade, glebshulga, hakkiyagiz, hysios, jvdavim, miguelcarvalho13, mrdadah, mrjunato, obrien-k, ownmarc, pgrimaud, puskuruk, rickstaa, semantic-release-bot, seveibar, wafaa-arbash, wuubi, x213212


universal-data-tool's Issues

Exclusive image classification Output should be string, not array

Steps to reproduce:

  1. Create an image classification task with some image samples (use the cat toy dataset) and answer "No" to allowing multiple classifications
  2. Complete some samples
  3. Examine the JSON via the "Edit JSON" button on the settings page. The taskOutput contains a single-item array for each sample when it should contain a string
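
To make the expected shape concrete, here is a minimal sketch of the unwrapping step. The helper name normalizeClassificationOutput and the sample labels are made up for illustration; this is not code from the UDT codebase, only one way the exclusive case could be normalized.

```javascript
// Hypothetical helper: when multiple classifications are NOT allowed,
// unwrap single-item arrays so each sample's output is a plain string.
function normalizeClassificationOutput(taskOutput, allowMultiple) {
  if (allowMultiple) return taskOutput
  return taskOutput.map((output) =>
    Array.isArray(output) && output.length === 1 ? output[0] : output
  )
}

// Observed (buggy) shape on the left, expected shape after normalizing.
const observed = [["cat"], ["dog"], null] // null: sample not labeled yet
console.log(normalizeClassificationOutput(observed, false))
```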

NER splits on letters with accents

I do not think the NER interface should split on letters with accents. Here are some that are really common in French: "ùûàâçéèêëïîô"
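
I don't know how the NER interface currently tokenizes text, but one common cause of this symptom in JavaScript is splitting with ASCII-only patterns such as \w or \b. A minimal sketch of a Unicode-aware alternative, assuming a hypothetical tokenize helper:

```javascript
// Splits text into tokens. Any Unicode letter, combining mark, or digit
// counts as part of a word, so accented letters such as "é" or "ç" stay
// inside their word. Every other non-space character is its own token.
function tokenize(text) {
  return text.match(/[\p{L}\p{M}\p{N}]+|[^\s\p{L}\p{M}\p{N}]/gu) || []
}

console.log(tokenize("Où êtes-vous, François ?"))
```

The \p{...} classes require the u flag, which is available in current browsers and Node.js.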


Bugs

Played with your tool a little, here are a few things I found:

  • Settings won't pop up when in full screen mode
  • Resizing a box so that the right side becomes the left side (or the opposite) is not possible; it simply redraws the original box. Same for top/bottom. I think this should not be the expected behavior.

Here are some suggestions:

  • Make Task description optional (ReactImageAnnotate component)
  • Make the "selector" always present (it should be the default cursor, I think). Pressing a key like "W" from the cursor should get us into drawing mode and, once completed, back to selector mode.
  • Add a divider between the always-present and optional tools on the left side bar
  • Center the image by default in the Pan
  • For bounding boxes, a single click shouldn't create a 1 pixel box. Either make it so we can draw a box (click -> drag -> click) or simply do nothing, since it is probably a misclick
  • Put the label inside the annotation if the bounding box is 100% of the image height
  • 1 color per class. I think this should be the default. It could be optional to have different colors for all.
  • Make it possible to have a "default class" just like Labelimg. All Labelimg is missing is being able to change the default class using hotkeys!
  • Make it impossible to put the corners outside of the image, in the gray area.
  • Make it possible to go to the next image without coming back to the selector every time
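
Several of the box-related points above (flipped resizing, 1 pixel boxes, corners outside the image) could be handled in a single normalization step. The sketch below is only an illustration: boxFromCorners and MIN_BOX_SIZE are hypothetical names, and it assumes coordinates are expressed as fractions of the image size (0 to 1), which may not match how the annotation component stores them.

```javascript
// Hypothetical threshold below which a box is treated as a misclick.
const MIN_BOX_SIZE = 0.005

// Keeps a coordinate inside the image (0 to 1).
const clamp = (value) => Math.min(1, Math.max(0, value))

// Takes two opposite corners in ANY order and returns { x, y, w, h },
// or null when the resulting box is too small to be intentional.
function boxFromCorners(a, b) {
  const x1 = clamp(Math.min(a.x, b.x))
  const y1 = clamp(Math.min(a.y, b.y))
  const x2 = clamp(Math.max(a.x, b.x))
  const y2 = clamp(Math.max(a.y, b.y))
  const w = x2 - x1
  const h = y2 - y1
  if (w < MIN_BOX_SIZE || h < MIN_BOX_SIZE) return null
  return { x: x1, y: y1, w, h }
}

// Dragging the right edge past the left edge still yields a valid box.
console.log(boxFromCorners({ x: 0.75, y: 0.5 }, { x: 0.25, y: 0.25 }))
```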

Feature: 3D Bounding Boxes

We believe that people may need "3D Bounding Boxes" and we want to work on that. 🎉

If you'd like this functionality, please let us know by leaving a thumbs up 🤘

Feature: Hotkeys for Composite Tasks + "Automatic Next" on Composite Tasks

Composite tasks currently take a lot of clicking to complete. This update should introduce hotkeys for each subinterface in a composite task and allow the user to configure a composite task so that completing one subtask automatically opens the next. That is, after completing the first subtask within the composite task, you go straight on to the next.

feature: import of NER JSON

Hi there. Thank you for a great tool.
I am curious whether you are considering support for importing pre-annotated text for NER?
This is a very common task in an active learning setup or as a post-regex clean-up step.

SQLite Collaboration Server

First reported in #16. The collaboration server is currently written with a scalable serverless architecture hosted on zeit now. We want to have a different codebase for the local one. Because the zeit now code was built for a commercial project, we can't open-source the code. But we can build a new version that implements the API.

Here is the full specification:

Universal Data Tool Collaborative Editing Server

Goals

  • Users should be able to collaborate with other users to complete the labeling of a dataset together
  • Users should receive notifications as work is completed or started by other users
  • Users should receive "updates" from other users in less than 500ms
  • The "Settings" should be able to be edited by any user
  • Any user should be able to upload new data
  • Collaborative links should be shareable
  • The first time someone enters collaboration mode a dialog should explain how to share the link etc.

Out of Scope

  • Should not require any login
  • Collaborative editing on a per-sample basis
    • Collisions should take "last person who submitted edit"
  • Completion time estimate

Key Technologies

  • fast-json-patch is used to send patches
  • object-hash is used to hash objects to produce hashOfLatestState
  • micro is used for endpoints
  • ava is used for testing
  • sqlite is used as the database
  • better-sqlite3 is an npm module that makes the connection to sqlite very fast and simple

Architecture

The following endpoints are used...

  • POST /udt/session: Creates a link to a UDT session. Whoever initiates collaboration mode calls this. It is called exactly once to start a session. A session lasts indefinitely. Returns the url to the session.
  • GET /udt/session/<session_id>: Gets the latest version of the UDT JSON file by getting the latest session_state (see DB Architecture)
  • GET /udt/session/<session_id>/diffs: Gets recent diffs for the JSON file
    • The requestor must provide the querystring parameter since=<ISODATE> indicating that they would like the diffs since the last time they polled.
    • The UDT will poll this every 250-500ms. Most of the time it'll return an empty array of patches.
    • Responds with { patches: Array<JSONDiffPatch>, hashOfLatestState, latestVersion }
  • PATCH /udt/session/<session_id>: Sends a JSONDiffPatch object with changes
    • Request contains { patch, mySessionStateId }
      • patch is applied against the latest session state to generate a new session state.
      • mySessionStateId isn't used (for now)
    • Should return { hashOfLatestState, latestVersion }
  • PATCH /udt/session/<session_id>/sample: Creates, modifies, or deletes a sample
    • This endpoint should be used instead of the /udt/session/<session_id> endpoint for updating, creating or deleting samples because it can handle certain edge cases better.
    • A request contains { operation, sampleIndex, [newInput], [newOutput], [previousInput] }
      • operation can be "DELETE", "CREATE", "UPDATE"
      • newInput is the taskData[sampleIndex] that the UDT observes when it sends the request
        • If "UPDATE" or "DELETE", use previousInput to find the true sample index. (i.e. do a deep comparison to find the sampleIndex using the latest version of the state).
      • newOutput is the new output for "UPDATE" operations. It is optional because the user may not want
      • The sampleIndex provided by the requestor should not be used.
    • Should return { hashOfLatestState, latestVersion }
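
As a rough illustration of the deep-comparison lookup described for the /sample endpoint, here is a minimal in-memory sketch. The function names (deepEqual, applySampleOperation), the { taskData, taskOutput } state shape, and the imageUrl field in the example are assumptions for illustration, not the server's actual code. Note that identical duplicate samples would resolve to the first match.

```javascript
// Minimal deep equality for JSON-compatible values.
function deepEqual(a, b) {
  if (a === b) return true
  if (typeof a !== "object" || typeof b !== "object" || !a || !b) return false
  if (Array.isArray(a) !== Array.isArray(b)) return false
  const keysA = Object.keys(a)
  const keysB = Object.keys(b)
  if (keysA.length !== keysB.length) return false
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  )
}

// Applies a sample request against the LATEST state and returns a new state.
// The requestor's sampleIndex is ignored; previousInput locates the sample.
function applySampleOperation(state, request) {
  const { operation, newInput, newOutput, previousInput } = request
  const taskData = [...state.taskData]
  const taskOutput = [...(state.taskOutput || [])]
  if (operation === "CREATE") {
    taskData.push(newInput)
    taskOutput[taskData.length - 1] = newOutput === undefined ? null : newOutput
    return { ...state, taskData, taskOutput }
  }
  const index = taskData.findIndex((sample) => deepEqual(sample, previousInput))
  if (index === -1) throw new Error("Sample not found in latest state")
  if (operation === "DELETE") {
    taskData.splice(index, 1)
    taskOutput.splice(index, 1)
  } else if (operation === "UPDATE") {
    if (newInput !== undefined) taskData[index] = newInput
    if (newOutput !== undefined) taskOutput[index] = newOutput
  } else {
    throw new Error(`Unknown operation: ${operation}`)
  }
  return { ...state, taskData, taskOutput }
}

// Another user deleted the first sample, so the requestor's index 1 is stale.
const latestState = {
  taskData: [{ imageUrl: "b.jpg" }, { imageUrl: "c.jpg" }],
  taskOutput: [null, null],
}
const updated = applySampleOperation(latestState, {
  operation: "UPDATE",
  sampleIndex: 1, // stale and ignored
  previousInput: { imageUrl: "b.jpg" },
  newOutput: "cat",
})
console.log(updated.taskOutput)
```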

Example

Let's look at a typical collaborative workflow to see how these endpoints work:

  1. After User1 engages collaboration mode, an API request is sent to POST /udt/session. User1's editor parses the response and creates a link for them to share.
  2. User1 shares the link with their team (only User2) and begins to edit.
  3. User2 uses the link to join the session. They get the latest version of the UDT JSON by calling GET /udt/session/<session_id>. They know the session_id because it's embedded in the link.
  4. User2 edits something in the settings. The UDT makes a request to PATCH /udt/session/<session_id> with a JSONDiffPatch containing their changes.
  5. User1 polls GET /udt/session/<session_id>/diffs?since=<last_version> to get the latest patches. User1's editor sees that there's a patch to apply from User2. They apply the patch, and display a notification for the user.
  6. User1 begins to edit a sample. This triggers a request to PATCH /udt/session/<session_id>/sample changing the taskData[sampleIndex].isBeingEdited to true.
  7. User1 finishes editing a sample. This triggers a request to PATCH /udt/session/<session_id>/sample changing the taskData[sampleIndex].isBeingEdited to false and taskOutput[sampleIndex] to their newOutput.
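
Steps 1 to 5 of the workflow above can be simulated in memory to check that both editors end up with the same state. This is only a sketch: the functions stand in for the HTTP endpoints, hashState is a simple stand-in for object-hash, applyPatch is a tiny stand-in for fast-json-patch that handles only "add" and "replace" on object keys (no JSON Pointer escaping), and all function names are hypothetical. It follows the since=<ISODATE> form from the endpoint description.

```javascript
// Stand-in for object-hash: a small non-cryptographic hash of the JSON text.
function hashState(state) {
  const text = JSON.stringify(state)
  let hash = 2166136261
  for (let i = 0; i < text.length; i++) {
    hash = Math.imul(hash ^ text.charCodeAt(i), 16777619)
  }
  return (hash >>> 0).toString(16)
}

// Stand-in for fast-json-patch: returns a patched deep copy of doc.
function applyPatch(doc, patch) {
  const next = JSON.parse(JSON.stringify(doc))
  for (const { op, path, value } of patch) {
    if (op !== "add" && op !== "replace") throw new Error(`Unsupported op: ${op}`)
    const keys = path.split("/").slice(1)
    const lastKey = keys.pop()
    let target = next
    for (const key of keys) target = target[key]
    target[lastKey] = value
  }
  return next
}

const sessions = new Map()
let nextSessionNumber = 1

// POST /udt/session
function createSession(udtJson) {
  const sessionId = `session${nextSessionNumber++}`
  sessions.set(sessionId, { state: udtJson, version: 0, patches: [] })
  return sessionId
}

// PATCH /udt/session/<session_id>
function patchSession(sessionId, patch, now = new Date()) {
  const session = sessions.get(sessionId)
  session.state = applyPatch(session.state, patch)
  session.version += 1
  session.patches.push({ patch, createdAt: now })
  return { hashOfLatestState: hashState(session.state), latestVersion: session.version }
}

// GET /udt/session/<session_id>/diffs?since=<ISODATE>
function getDiffs(sessionId, since) {
  const session = sessions.get(sessionId)
  const sinceDate = new Date(since)
  return {
    patches: session.patches.filter((p) => p.createdAt > sinceDate).map((p) => p.patch),
    hashOfLatestState: hashState(session.state),
    latestVersion: session.version,
  }
}

const initial = { interface: { type: "image_segmentation" }, taskData: [] }
const sessionId = createSession(initial) // step 1: User1 starts the session
let user1Copy = initial
const lastPoll = new Date(0).toISOString() // User1 has not polled yet
const user2Patch = [{ op: "add", path: "/interface/description", value: "Outline each cat" }]
patchSession(sessionId, user2Patch) // step 4: User2 edits the settings
const diffs = getDiffs(sessionId, lastPoll) // step 5: User1 polls
for (const patch of diffs.patches) user1Copy = applyPatch(user1Copy, patch)
console.log(hashState(user1Copy) === diffs.hashOfLatestState) // true when both copies match
```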

Database Architecture

One table called session_state representing each state of the JSON file. It contains the following columns:

  • session_state_id uuid randomly generated
  • short_id text randomly generated: represents the session id
  • udt_json jsonb: The state of the UDT file
  • patch jsonb: The patch that created this version from the previous version
  • previous_session_state_id uuid: Identifier for previous state
  • version integer: Integer identifying the revision number
  • created_at timestamptz: Timestamp on creation

The database will have the following constraints applied

  • UNIQUE previous_session_state_id
    • Each session state can only have one subsequent state. This prevents certain race conditions.

The database will have the following SQL triggers:

  • Delete session_states that are older than 1 hour AND not the latest state
    • Triggered when a session state is inserted.
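
To show how the UNIQUE constraint and the cleanup trigger interact, here is a small in-memory model of the session_state table. It is a sketch, not the real schema: SessionStateTable is a hypothetical class, created_at is a millisecond number rather than a timestamp column, and pruning is applied only to the session being written to. As in SQLite, rows with a NULL previous_session_state_id do not conflict with each other.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000

class SessionStateTable {
  constructor() {
    this.rows = []
  }

  insert(row) {
    // UNIQUE previous_session_state_id: only one state may follow a given state.
    const previousId = row.previous_session_state_id
    const isTaken =
      previousId !== null &&
      this.rows.some((r) => r.previous_session_state_id === previousId)
    if (isTaken) {
      throw new Error("UNIQUE constraint failed: previous_session_state_id")
    }
    this.rows.push(row)
    // "Trigger": runs whenever a session state is inserted.
    this.prune(row.short_id, row.created_at)
  }

  latest(shortId) {
    return this.rows
      .filter((r) => r.short_id === shortId)
      .reduce((a, b) => (b.version > a.version ? b : a))
  }

  // Deletes states that are older than 1 hour AND not the latest state.
  prune(shortId, now) {
    const latest = this.latest(shortId)
    this.rows = this.rows.filter(
      (r) =>
        r.short_id !== shortId ||
        r === latest ||
        now - r.created_at <= ONE_HOUR_MS
    )
  }
}

const table = new SessionStateTable()
const base = { short_id: "abc123", udt_json: {}, patch: null }
table.insert({ ...base, session_state_id: "s0", previous_session_state_id: null, version: 0, created_at: 0 })
table.insert({ ...base, session_state_id: "s1", previous_session_state_id: "s0", version: 1, created_at: 1000 })
try {
  // A second writer that also built on s0 loses the race.
  table.insert({ ...base, session_state_id: "s1b", previous_session_state_id: "s0", version: 1, created_at: 1001 })
} catch (error) {
  console.log(error.message)
}
```

A writer that loses the race would be expected to fetch the latest state and retry its patch against it.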

Feature: Import Samples from UDT CSV

As discussed in #32, we should add an import dialog for CSV data. Basically it would import the sample portion of a *.udt.csv file, which is structured as shown below....

| path | . | document | output |
|---|---|---|---|
| interface | { ... } | | |
| samples.0 | | This strainer makes a great... | { "entities": [ { "label": "hat", "start": ... } ]} |
| samples.1 | | Boy spaghetti is sure tasty... | {"entities": [ { "label": "food", "start": ... } ]} |
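
A minimal sketch of the import step, assuming the CSV has already been parsed into one object per row keyed by the header (path, ., document, output). The helper names parseCell and samplesFromUdtCsvRows, the JSON-detection rule, and the example rows are assumptions for illustration, not the actual import dialog code.

```javascript
// Cells that look like JSON objects or arrays are parsed, others stay text.
function parseCell(cell) {
  if (cell === undefined || cell === null || cell === "") return undefined
  const trimmed = String(cell).trim()
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      return JSON.parse(trimmed)
    } catch (error) {
      return cell // not valid JSON after all, keep the raw text
    }
  }
  return cell
}

// Builds the samples array from rows whose path is "samples.<index>".
function samplesFromUdtCsvRows(rows) {
  const samples = []
  for (const row of rows) {
    const match = /^samples\.(\d+)$/.exec(row.path)
    if (!match) continue // skips the "interface" row and anything else
    const sample = {}
    for (const [column, cell] of Object.entries(row)) {
      if (column === "path" || column === ".") continue
      const value = parseCell(cell)
      if (value !== undefined) sample[column] = value
    }
    samples[Number(match[1])] = sample
  }
  return samples
}

// Made-up rows in the same layout as the table above.
const exampleRows = [
  { path: "interface", ".": "{}", document: "", output: "" },
  {
    path: "samples.0",
    ".": "",
    document: "Pasta is tasty",
    output: '{"entities":[{"label":"food","start":0,"end":5}]}',
  },
  { path: "samples.1", ".": "", document: "Nice hat", output: "" },
]
console.log(JSON.stringify(samplesFromUdtCsvRows(exampleRows), null, 2))
```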

Add Separate Dependency List for React Library

We need to create a separate dependency list for React usage. Many React users won't need the youtube-dl or ffmpeg libraries, and we don't want things to be super bloated if they're using it as an npm module.

Paste URLs doesn't recognize images with GET Parameters

Steps to reproduce:

  1. Paste image url like the one below:
    https://scontent-lga3-1.cdninstagram.com/v/t51.2885-15/e35/c0.180.1440.1440a/s480x480/92319440_155591232435058_5851057296458538518_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=110&_nc_ohc=pbVh20DKP50AX-QKGFD&oh=fd6b34bf2605f170104db599b1131d84&oe=5EB4AD12

  2. Sample is not imported
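
I haven't checked how the paste dialog detects images, but if it tests the end of the whole URL, a query string would hide the extension. A minimal sketch that checks only the path component instead; looksLikeImageUrl and the extension list are hypothetical:

```javascript
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"]

// Returns true when the URL's path (ignoring ?query and #hash) ends
// with a known image extension.
function looksLikeImageUrl(rawUrl) {
  let pathname
  try {
    pathname = new URL(rawUrl).pathname.toLowerCase()
  } catch (error) {
    return false // not an absolute URL
  }
  return IMAGE_EXTENSIONS.some((extension) => pathname.endsWith(extension))
}

console.log(looksLikeImageUrl("https://example.com/photos/cat.jpg?width=480&token=abc"))
```

Image URLs without any extension would still need another check, such as the response's Content-Type.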

Should support composite GUI configuration

There should be a recursive GUI configuration for composite type tasks. The following story would show a basic interface...

import React from "react"

import { storiesOf } from "@storybook/react"
import { action } from "@storybook/addon-actions"

import CompositeConfiguration from "./"

storiesOf("CompositeConfiguration", module).add("Basic", () => (
  <CompositeConfiguration
    onSaveTaskOutputItem={action("CompositeConfiguration")}
    interface={{
      type: "composite",
      // Each field wraps the interface of one subtask
      fields: [
        {
          fieldName: "Field 1",
          interface: { type: "image_segmentation" },
        },
        {
          fieldName: "Field 2",
          interface: { type: "audio_transcription" },
        },
      ],
      description: "This is an **audio transcription** description.",
    }}
  />
))
  • I'm not sure what taskData or taskOutput is supposed to look like, other than the keys of each taskOutput should be the field name e.g. taskOutput[0]["Field 1"] is the output from Field 1.

README Images

This issue is just used to upload images for usage in the README.


Collaborative Session should Sync with Recent Item

When people exit a collaborative session, they expect the file to retain the same name it had before, when it was stored in local storage.

We should link the session the user is working in to a local storage item. If they created a collaborative session while working in a local storage item it should be linked to that item.

Empty Labeling State

The empty labeling state should tell the user what to do (i.e. if the interface is "empty")

Feature: Sample Colors / States

Many people have a pipeline where a machine learning sample can be in an "incomplete", "in review", or "complete" state. This feature should give the user an efficient way to set the sample's state when saving.

Editing Settings in Collaborative Session is annoying

This is because changes are immediately sent to the server, then during the reconciliation some immediate user changes are reverted.

Try editing a text field for a setting while in a collaborative session. You'll see that many patches are sent and it's difficult to type.

The easiest solution is to implement a "Save" button on the Settings page. This will also make it more difficult for a team member to accidentally mess up the settings. The Save button will appear only when in a collaborative session.

Why only allow file uploads from desktop version?

I launched the React app (since I would like to deploy this as a service), however I can't seem to be able to upload a file with samples. The button is greyed out and reads "DESKTOP ONLY".

Why add this arbitrary limitation to a web app?

Feature: Video Annotation

We've heard some people ask for this, please thumbs up if you're interested!

Video annotation has some unique challenges. Labels should automatically interpolate between frames, e.g. if the user annotates frames 1 and 5 of a moving object, the bounding box in frames 2, 3 and 4 should automatically move to follow the movement.
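
The simplest version of this is linear interpolation between the two nearest annotated frames. The sketch below assumes keyframes are stored as a map from frame number to an { x, y, w, h } box; interpolateBox and that data shape are hypothetical, and holding the nearest box before the first or after the last keyframe is just one possible choice.

```javascript
// keyframes: { [frameNumber]: { x, y, w, h } } for frames the user annotated.
function interpolateBox(keyframes, frame) {
  const frames = Object.keys(keyframes).map(Number).sort((a, b) => a - b)
  if (frames.length === 0) return null
  const first = frames[0]
  const last = frames[frames.length - 1]
  // Outside the annotated range, hold the nearest keyframe.
  if (frame <= first) return keyframes[first]
  if (frame >= last) return keyframes[last]
  // Find the keyframes on either side of the requested frame.
  const nextIndex = frames.findIndex((f) => f >= frame)
  const f0 = frames[nextIndex - 1]
  const f1 = frames[nextIndex]
  const t = (frame - f0) / (f1 - f0)
  const a = keyframes[f0]
  const b = keyframes[f1]
  const lerp = (from, to) => from + (to - from) * t
  return {
    x: lerp(a.x, b.x),
    y: lerp(a.y, b.y),
    w: lerp(a.w, b.w),
    h: lerp(a.h, b.h),
  }
}

// The user annotated frames 1 and 5; frames 2, 3 and 4 are filled in.
const annotated = {
  1: { x: 0, y: 10, w: 40, h: 40 },
  5: { x: 100, y: 30, w: 40, h: 40 },
}
for (const frame of [2, 3, 4]) {
  console.log(frame, interpolateBox(annotated, frame))
}
```

Linear motion is often a poor fit for real objects, so the user would still need to add keyframes wherever the box drifts.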

bug: unable to import text on Mac App

I am not able to import Directories or Text Snippets.

When I go the Directory route, I can select a directory, but next it takes me to the "Grid" view with nothing there.

If I press "Text Snippets", I can type, but I cannot paste any text from the system clipboard (Cmd+V does not work, and there is no contextual menu).

I am using the latest version, downloaded 2020-03-23
