
This project is forked from zhongyi-zhou/gestureimt.



License: Creative Commons Zero v1.0 Universal



Gesture-aware Interactive Machine Teaching with In-situ Object Annotations (UIST 22)

[Demo video]


Gesture-aware Interactive Machine Teaching with In-situ Object Annotations
Zhongyi Zhou, Koji Yatani
The University of Tokyo
UIST 2022
Abstract: Interactive Machine Teaching (IMT) systems allow non-experts to easily create Machine Learning (ML) models. However, existing vision-based IMT systems either ignore annotations on the objects of interest or require users to annotate in a post-hoc manner. Without the annotations on objects, the model may misinterpret the objects using unrelated features. Post-hoc annotations cause additional workload, which diminishes the usability of the overall model building process. In this paper, we develop LookHere, which integrates in-situ object annotations into vision-based IMT. LookHere exploits users' deictic gestures to segment the objects of interest in real time. This segmentation information can be additionally used for training. To achieve the reliable performance of this object segmentation, we utilize our custom dataset called HuTics, including 2040 front-facing images of deictic gestures toward various objects by 170 people. The quantitative results of our user study showed that participants were 16.3 times faster in creating a model with our system compared to a standard IMT system with a post-hoc annotation process while demonstrating comparable accuracies. Additionally, models created by our system showed a significant accuracy improvement ($\Delta mIoU=0.466$) in segmenting the objects of interest compared to those without annotations.

Getting Started

This code has been tested on PyTorch 1.12 with CUDA 11.6 and PyTorch 1.10 with CUDA 11.3.

To install PyTorch 1.12 with CUDA 11.6,

chmod +x ./install/init_cuda_11_6.sh
./install/init_cuda_11_6.sh

To install PyTorch 1.10 with CUDA 11.3,

chmod +x ./install/init_cuda_11_3.sh
./install/init_cuda_11_3.sh

If you are using other versions

(Not necessary if the scripts above succeed.) This project may also work with other versions of PyTorch. You can examine the required packages under ./install and install them yourself. You also need to download two checkpoint files from Google Drive:

  • put resnet18_adam.pth.tar under ./demo_app/src/ckpt/ and ./object_highlights/ckpt/
  • put unet-b0-bgr-100epoch.pt under ./demo_app/src/ckpt/
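Before launching the demo, it is easy to misplace one of these files. The following is a small sketch (not part of the repo) that checks the checkpoint locations listed above, relative to the repo root:

```python
# Sketch: verify the two checkpoint files from the README are in place
# before launching the demo app. Paths come from the list above.
from pathlib import Path

REQUIRED_CKPTS = [
    Path("demo_app/src/ckpt/resnet18_adam.pth.tar"),
    Path("object_highlights/ckpt/resnet18_adam.pth.tar"),
    Path("demo_app/src/ckpt/unet-b0-bgr-100epoch.pt"),
]

def missing_checkpoints(required=REQUIRED_CKPTS):
    """Return the required checkpoint files that are not present on disk."""
    return [p for p in required if not p.is_file()]
```

If `missing_checkpoints()` returns a non-empty list, download and place the listed files before continuing.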

Website Demo

Initialization

conda activate lookhere
cd demo_app
./gen_keys.sh

Run the server

python app.py

Teaching

Then you can access the teaching interface via

You can also access this website through LAN:
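To open the interface from another device on the same network, you need this machine's LAN address (the port depends on how app.py is configured). A common best-effort way to find it, sketched here as a standalone helper (not part of the repo):

```python
# Sketch: find this machine's LAN IP so other devices can reach the server.
# Opening a UDP socket toward a public address sends no packets; it only
# makes the OS pick the outgoing local address, which we then read back.
import socket

def lan_ip():
    """Best-effort LAN address; falls back to loopback when offline."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()
```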

Check demo_app/README.md for more details on how to use the app.

Training

All your teaching data will be stored at ./tmp/000_test/. You can start training using

./src/trainer/train.sh ./tmp/000_test/ours/ 1

This project does not include a function for automatically triggering training from the system. Please implement this yourself by referring to the command above.
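One way to implement such a hook is to shell out to the same train.sh invocation shown above. This is a hypothetical sketch, not the repo's code; the `runner` parameter is an illustration-only injection point so the call can be exercised without a GPU:

```python
# Hypothetical automatic-training hook: launches the training script for
# one teaching session, mirroring the command shown in the README.
import subprocess

def start_training(teach_dir="./tmp/000_test/ours/", gpu_id=1,
                   runner=subprocess.run):
    """Run ./src/trainer/train.sh on a teaching directory and GPU id."""
    cmd = ["./src/trainer/train.sh", teach_dir, str(gpu_id)]
    return runner(cmd)
```

In the real system you would call `start_training()` once a teaching session's data has been saved under ./tmp/.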

Model Assessment

Once the training process finishes, you can assess your model via this link:

HuTics: Human Deictic Gestures Dataset

HuTics covers four kinds of deictic gestures to objects. Note that we only annotate the segmentation masks of the objects. The hand segmentation masks are generated from this work.

This dataset is under the license of [CC-BY-NonCommercial].

Download: [google drive]

The four gesture categories: Exhibiting, Pointing, Presenting, and Touching.

Gesture-aware Object-Agnostic Segmentation

You need to first download the HuTics dataset above.

Start training the network, replacing the path below with your dataset location.

cd object_highlights
conda activate lookhere
./trainer/train.sh PATH_TO_HUTICS

After the training process finishes, you need to convert the RGB-based checkpoint into a BGR-based one.

python utils/ckpt_rgb2bgr.py --input ${YOUR_INPUT_RGB_MODEL.pt} --output ${YOUR_OUTPUT_BGR_MODEL.pt}
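The real conversion logic lives in utils/ckpt_rgb2bgr.py; the toy sketch below only illustrates the usual idea behind such a conversion: reversing the input-channel order of the first convolution's weights so the network accepts BGR frames (e.g. from OpenCV) without a per-frame color swap. Plain nested lists stand in for weight tensors here:

```python
# Illustration only: an RGB->BGR checkpoint conversion typically reverses
# the input-channel axis of the first conv layer's weights.
def flip_input_channels(first_conv_weight):
    """first_conv_weight: list of filters, each a list of per-channel
    kernels ordered [R, G, B]. Returns filters reordered to [B, G, R]."""
    return [filt[::-1] for filt in first_conv_weight]
```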

The model is now ready, and you can use it for inference.

python demo_video.py --objckpt ${YOUR_OUTPUT_BGR_MODEL.pt} 

The output video will be at vids/tissue_out.mp4

Related Work

Citations

@misc{zhou2022gesture,
  doi = {10.48550/ARXIV.2208.01211},
  url = {https://arxiv.org/abs/2208.01211},
  author = {Zhou, Zhongyi and Yatani, Koji},
  title = {Gesture-aware Interactive Machine Teaching with In-situ Object Annotations},
  publisher = {arXiv},
  year = {2022}
}

@inproceedings{zhou2021enhancing,
author = {Zhou, Zhongyi and Yatani, Koji},
title = {Enhancing Model Assessment in Vision-Based Interactive Machine Teaching through Real-Time Saliency Map Visualization},
year = {2021},
isbn = {9781450386555},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3474349.3480194},
doi = {10.1145/3474349.3480194},
pages = {112–114},
numpages = {3},
keywords = {Visualization, Saliency Map, Interactive Machine Teaching},
location = {Virtual Event, USA},
series = {UIST '21}
}

