3d_skeleton_conversion's Introduction

2D_to_3D_conversion

The goal of this repository is to predict the depth coordinate of body keypoints extracted from 2D videos.

  • Data
    The videos in the dataset are recordings of people communicating in sign language. For each video, 26 body keypoints of the signer have been extracted to obtain a stick-figure representation. The depth coordinate of each keypoint has been estimated using a multi-camera setup. The data is stored in the following two NumPy arrays:

    • body_data.npy stores the 2D coordinates of each keypoint ([x_1, y_1, x_2, y_2, ..., x_26, y_26]) for every frame of every video in the dataset. It has shape [NUM_VIDEOS, NUM_FRAMES, 52].
    • body_ground.npy stores the depth coordinate of each keypoint ([z_1, ..., z_26]) for every frame of every video in the dataset. It has shape [NUM_VIDEOS, NUM_FRAMES, 26].

    The Data folder is not included in the repository.
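
    The two arrays can be combined into per-frame xyz skeletons. The following minimal NumPy sketch uses random stand-in arrays with hypothetical sizes, since the Data folder is not available; with the real files, np.load("body_data.npy") and np.load("body_ground.npy") would take their place. It assumes every video is stored with the same NUM_FRAMES, as the shapes above suggest.

```python
import numpy as np

NUM_VIDEOS, NUM_FRAMES, NUM_JOINTS = 2, 5, 26  # hypothetical sizes for illustration

# Random stand-ins for np.load("body_data.npy") and np.load("body_ground.npy").
rng = np.random.default_rng(0)
body_data = rng.standard_normal((NUM_VIDEOS, NUM_FRAMES, 2 * NUM_JOINTS))
body_ground = rng.standard_normal((NUM_VIDEOS, NUM_FRAMES, NUM_JOINTS))

# [x_1, y_1, x_2, y_2, ...] -> one (x, y) pair per joint: [V, F, 26, 2]
xy = body_data.reshape(NUM_VIDEOS, NUM_FRAMES, NUM_JOINTS, 2)
# Append the depth as a third coordinate: [V, F, 26, 3]
xyz = np.concatenate([xy, body_ground[..., np.newaxis]], axis=-1)
# Stack all frames of all videos: [TOTAL_NUM_FRAMES, 26, 3]
frames = xyz.reshape(-1, NUM_JOINTS, 3)
print(frames.shape)  # (10, 26, 3)
```

    Because the coordinates are interleaved as x, y pairs, a row-major reshape to [..., 26, 2] recovers one (x, y) pair per joint without any copying or reordering.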

  • Preprocessing
    The Preprocessing folder normalizes the data and prepares it to be fed into the neural network. It contains the following files:

    • skeleton_parts.py contains a dictionary that maps keypoint numbers to body parts and vice versa.
    • plot_3D_skeleton.py contains the function plot_3D_skeleton, which renders animated 3D plots of the keypoint figures.
    • rotate_skeleton.py contains the function rotate_skeleton, which centers the skeleton at the origin and rotates it so that its spine (the Mid-Hip to Neck vector) lies along the Y axis and the figure faces forwards (that is, the Nose to Neck vector lies in the XY plane).
    • scale_axes.py contains the function scale_axes, which computes the 2D length of the skeleton's spine and normalizes all three axes by this length. This normalization is inspired by the paper Can 3D Pose be Learned from 2D Projections Alone?, Dylan Drover, Rohith MV, Ching-Hang Chen et al., ECCVW 2018.
    • main.py applies all of the normalization steps above and produces the final array xyz_data, of shape [TOTAL_NUM_FRAMES, 26, 3]. Important remark: after the rotation, the depth coordinate is dimension 0 (x).

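
    A minimal NumPy sketch of the rotation and scaling steps for a single [26, 3] frame. It is not the repository's code: the joint indices NOSE, NECK and MID_HIP are hypothetical (OpenPose BODY_25-style numbering; the real mapping is in skeleton_parts.py), Mid-Hip is assumed to be the point moved to the origin, and the forward direction is taken as +X. The rotation builds an orthonormal basis whose Y axis is the spine and whose X axis is the Neck-to-Nose direction with its spine component removed, so the face vector ends up in the XY plane and X becomes the depth axis. The scale factor is assumed to be the spine length measured on the original 2D (x, y) coordinates.

```python
import numpy as np

# Hypothetical joint indices (OpenPose BODY_25-style); the real mapping
# lives in skeleton_parts.py and may differ.
NOSE, NECK, MID_HIP = 0, 1, 8


def _unit(v):
    return v / np.linalg.norm(v)


def rotate_skeleton(frame):
    """frame: [26, 3] xyz coordinates of one frame. Returns a rotated copy."""
    centered = frame - frame[MID_HIP]                  # assumed origin: Mid-Hip
    y_axis = _unit(centered[NECK])                     # spine (Mid-Hip -> Neck) becomes +Y
    face = centered[NOSE] - centered[NECK]             # Neck -> Nose vector
    x_axis = _unit(face - face.dot(y_axis) * y_axis)   # forward, orthogonal to the spine
    z_axis = np.cross(x_axis, y_axis)                  # completes a right-handed basis
    rotation = np.stack([x_axis, y_axis, z_axis])      # rows are the new axes
    return centered @ rotation.T


def scale_axes(frame, frame_2d):
    """Divide all three axes by the 2D (x, y) length of the spine in frame_2d."""
    spine_2d = frame_2d[NECK, :2] - frame_2d[MID_HIP, :2]
    return frame / np.linalg.norm(spine_2d)


rng = np.random.default_rng(1)
frame = rng.standard_normal((26, 3))  # one random stand-in frame (x, y, z)
normalized = scale_axes(rotate_skeleton(frame), frame)
```

    After this transform Mid-Hip sits at the origin, Neck lies on the positive Y axis, and Nose has a zero Z coordinate. Because the same factor divides all three axes, the scaling commutes with the rotation, so the order in which the two steps are applied does not change the result.
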
  • Model
    This folder contains the neural network class DepthLSTM. It consists of an LSTM layer with input size 52 (two coordinates for each joint) and configurable hidden_size and num_layers, followed by a Linear layer with input size hidden_size and output size 26 (one depth coordinate for each joint). LSTM neural networks are explained here: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
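
    A minimal sketch of a model with the structure described above, assuming PyTorch (the parameter names hidden_size and num_layers match torch.nn.LSTM). The batch_first layout, the default sizes, and returning the LSTM state so that it can be passed to the next window during stateful training are assumptions, not details taken from the repository.

```python
import torch
from torch import nn


class DepthLSTM(nn.Module):
    """Sketch: LSTM over 52 2D coordinates per frame, then a per-frame Linear head."""

    def __init__(self, hidden_size=128, num_layers=2):  # defaults are hypothetical
        super().__init__()
        self.lstm = nn.LSTM(input_size=52, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, 26)  # one depth value per joint

    def forward(self, x, state=None):
        # x: [batch, seq_len, 52]; state: optional (h, c) carried over from a previous window
        out, state = self.lstm(x, state)   # out: [batch, seq_len, hidden_size]
        return self.linear(out), state     # depth: [batch, seq_len, 26]


model = DepthLSTM(hidden_size=64, num_layers=2)
depth, (h, c) = model(torch.randn(4, 10, 52))
print(depth.shape, h.shape)  # torch.Size([4, 10, 26]) torch.Size([2, 4, 64])
```

    The Linear layer is applied to every time step of the LSTM output, so the model predicts 26 depth values for each input frame rather than one prediction per sequence.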

  • Test_Train
    This folder contains the scripts for training and testing the model:

    • train_epoch.py: trains the model for one epoch using stateful training. The dataset is divided into batches, and each batch is divided into consecutive windows of fixed length SEQ_LEN. Each window is forwarded through the model, and the hidden state from its last frame is saved and used to initialize the following window.
    • test_epoch.py: given a model and test data, it predicts the depth coordinate from the 2D coordinates alone and returns the average MSE loss with respect to the ground-truth depth, together with the predicted depth coordinates.
    • main.py: trains and tests the model for NUM_EPOCHS epochs, plots the training and testing MSE loss, and visualizes the ground-truth and predicted skeletons.

3d_skeleton_conversion's People

Contributors

mireiahernandez