tandembank / data-science.dataset-labeller

Web-based tool for labelling datasets

License: MIT License

Languages: Python 40.81%, HTML 2.76%, JavaScript 42.78%, CSS 12.04%, Dockerfile 1.61%
Topics: datascience, python, javascript, django, react, datasets, dataset-generation

Dataset labeller

This is a web-based tool we developed at Tandem to label datasets quickly. It is built with Python, Django and React, and runs through Docker Compose.

Installation and Running

The easiest way to get this application running is via Docker Compose. Once you have Docker Compose working, run the following commands to install the application:

git clone https://github.com/tandembank/data-science.dataset-labeller.git
cd data-science.dataset-labeller
cp docker-compose.example.yml docker-compose.yml
docker-compose build

You should now have the Docker image built. To run it, along with its database server, run this from the same location:

docker-compose up

After a few seconds you should be able to access it via http://localhost:8080/ in your browser.

Usage

This is the process for labelling a new dataset:

  1. Upload a CSV file containing rows that you want to label.
  2. Give it a name.
  3. Select the columns that should be displayed to a person labelling.
  4. Define the possible category labels and keyboard shortcuts to make things faster.
  5. Decide how many people need to label each datapoint – this is useful if you want to get a consensus.
  6. Save dataset.
  7. Get your team to log in and label it.
  8. View job progress on the dashboard.
  9. Download the labelled dataset as a CSV – it'll have an extra column with the labels.
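
As a rough illustration of the last step, the exported file can be thought of as the uploaded CSV plus one extra column. The sketch below is not the tool's actual export code; the function name `add_label_column`, the `label` column name and the row-index mapping are all assumptions made for the example.

```python
import csv
import io


def add_label_column(csv_text, labels, column_name="label"):
    """Return csv_text with one extra column holding each row's label.

    `labels` maps a zero-based data-row index to its label; rows that
    have not been labelled yet get an empty cell. All names here are
    hypothetical and chosen only for this sketch.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    # Copy the header row and append the name of the new column.
    header = next(reader)
    writer.writerow(header + [column_name])

    # Copy every data row unchanged and append its label, if any.
    for index, row in enumerate(reader):
        writer.writerow(row + [labels.get(index, "")])
    return out.getvalue()
```

Because the original columns are copied through untouched, the labelled file can be fed back into whatever process produced the upload.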

Features

  • Import and export data in the format that you're comfortable with – no need to pre-process data, just select the columns to display for labelling.
  • Each user has their own account so you can see who labelled what.
  • Labellers can access the tool remotely or within a corporate network using just their web browser.
  • Slick and quick user interface while you're labelling – the next few datapoints are already loaded in your browser so they're ready to show as soon as you've labelled the current one.
  • Multiple users can be labelling at once as we use locks to avoid collisions.
  • If some datapoints are tricky to label or your team is going at breakneck speed, you can choose to get a consensus from an odd number of users, say 3 or 5.
  • Database included and configured in the Docker Compose file.
  • Cell content such as JSON lists gets displayed nicely formatted. We aim to extend this to identify other formats and image URLs.
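
The consensus feature can be pictured as a majority vote over the labels submitted for one datapoint. The following is a minimal sketch under that assumption, not the tool's own implementation; `consensus_label` and its parameters are hypothetical names.

```python
from collections import Counter


def consensus_label(votes, required):
    """Return the majority label once `required` votes are in, else None.

    `votes` is the list of labels submitted so far for one datapoint and
    `required` is the number of labellers configured for the dataset.
    """
    if not votes or len(votes) < required:
        return None  # Still waiting for more people to label this row.

    label, count = Counter(votes).most_common(1)[0]
    # Require a strict majority. An odd number of labellers rules out a
    # tie between two categories, but with three or more categories the
    # votes can still split with no winner.
    return label if count > len(votes) / 2 else None
```

For example, `consensus_label(["spam", "ham", "spam"], required=3)` returns `"spam"`, while three different labels from three users return `None`.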

Contributors

damianmoore

Issues

Support displaying images from URLs in datasets

If a field shown during labelling contains a URL that ends in a JPEG or PNG file extension, load the image in the background before the item is displayed, preloading images in the order in which the items will be shown. Render the image, scaled down if needed.
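
The detection half of this request could look something like the sketch below. It only decides whether a cell value is an image URL; preloading and rendering would happen in the browser and are not shown. The helper name `is_image_url` and the exact extension list are assumptions, not part of the project.

```python
from urllib.parse import urlparse

# Extensions treated as images for this sketch (JPEG and PNG only).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def is_image_url(value):
    """Return True if `value` is an http(s) URL whose path ends in a
    JPEG or PNG extension. Query strings and fragments are ignored."""
    parsed = urlparse(value.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)
```

Checking the parsed path rather than the raw string means a URL such as `https://example.com/cat.PNG?size=2` is still recognised, while a bare filename like `cat.png` is not.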
