Comments (9)
Hey, I am not really a developer (I am a designer), but I tried to implement open-hand recognition.
Basically, I sum up the angles between the wrist and each fingertip. I am assuming that the wrist is landmark 0 and the fingertips are landmarks 4, 8, 12, 16 and 20 (I could be completely wrong about this). I am really sorry for my coding; I am just trying things out here and there, fast and dirty.
Here is what I did in landmark_letterbox_removal_calculator.cc:
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <array>
#include <cmath>
#include <iostream>
#include <vector>

#include "absl/memory/memory.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/port/ret_check.h"
namespace mediapipe {
namespace {
constexpr char kLandmarksTag[] = "LANDMARKS";
constexpr char kOpen[] = "OPENHAND";
constexpr char kLetterboxPaddingTag[] = "LETTERBOX_PADDING";
} // namespace
// Adjusts landmark locations on a letterboxed image to the corresponding
// locations on the same image with the letterbox removed. This is useful to map
// the landmarks inferred from a letterboxed image, for example, output of
// the ImageTransformationCalculator when the scale mode is FIT, back to the
// corresponding input image before letterboxing.
//
// Input:
// LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks on an
// letterboxed image.
//
// LETTERBOX_PADDING: An std::array<float, 4> representing the letterbox
// padding from the 4 sides ([left, top, right, bottom]) of the letterboxed
// image, normalized to [0.f, 1.f] by the letterboxed image dimensions.
//
// Output:
// LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks with
// their locations adjusted to the letterbox-removed (non-padded) image.
//
// Usage example:
// node {
// calculator: "LandmarkLetterboxRemovalCalculator"
// input_stream: "LANDMARKS:landmarks"
// input_stream: "LETTERBOX_PADDING:letterbox_padding"
// output_stream: "LANDMARKS:adjusted_landmarks"
// }
class LandmarkLetterboxRemovalCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc) {
    RET_CHECK(cc->Inputs().HasTag(kLandmarksTag) &&
              cc->Inputs().HasTag(kLetterboxPaddingTag))
        << "Missing one or more input streams.";
    cc->Inputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();
    cc->Inputs().Tag(kLetterboxPaddingTag).Set<std::array<float, 4>>();
    cc->Outputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();
    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Open(CalculatorContext* cc) override {
    cc->SetOffset(TimestampDiff(0));
    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Process(CalculatorContext* cc) override {
    // Only process if there are input landmarks.
    if (cc->Inputs().Tag(kLandmarksTag).IsEmpty()) {
      return ::mediapipe::OkStatus();
    }
    const auto& input_landmarks =
        cc->Inputs().Tag(kLandmarksTag).Get<std::vector<NormalizedLandmark>>();
    const auto& letterbox_padding =
        cc->Inputs().Tag(kLetterboxPaddingTag).Get<std::array<float, 4>>();
    const float left = letterbox_padding[0];
    const float top = letterbox_padding[1];
    const float left_and_right = letterbox_padding[0] + letterbox_padding[2];
    const float top_and_bottom = letterbox_padding[1] + letterbox_padding[3];
    auto output_landmarks =
        absl::make_unique<std::vector<NormalizedLandmark>>();

    // Wrist is landmark 0; the fingertips are landmarks 4, 8, 12, 16 and 20.
    constexpr int kFingertips[] = {4, 8, 12, 16, 20};
    float wrist[3] = {0.f, 0.f, 0.f};
    float tips[5][3] = {};
    int i = 0;
    for (const auto& landmark : input_landmarks) {
      NormalizedLandmark new_landmark;
      const float new_x = (landmark.x() - left) / (1.0f - left_and_right);
      const float new_y = (landmark.y() - top) / (1.0f - top_and_bottom);
      new_landmark.set_x(new_x);
      new_landmark.set_y(new_y);
      // Keep z-coord as is.
      new_landmark.set_z(landmark.z());
      output_landmarks->emplace_back(new_landmark);
      // Collect the wrist and fingertip coordinates for the open-hand check.
      if (i == 0) {
        wrist[0] = landmark.x();
        wrist[1] = landmark.y();
        wrist[2] = landmark.z();
      }
      for (int t = 0; t < 5; ++t) {
        if (i == kFingertips[t]) {
          tips[t][0] = landmark.x();
          tips[t][1] = landmark.y();
          tips[t][2] = landmark.z();
        }
      }
      ++i;
    }

    // Sum the angles between the wrist->fingertip vectors of adjacent
    // fingers. An open hand spreads the fingertips apart, so the sum grows.
    float angle = 0.f;
    for (int t = 0; t + 1 < 5; ++t) {
      float dot = 0.f, len1 = 0.f, len2 = 0.f;
      for (int d = 0; d < 3; ++d) {
        const float v1 = tips[t][d] - wrist[d];
        const float v2 = tips[t + 1][d] - wrist[d];
        dot += v1 * v2;
        len1 += v1 * v1;
        len2 += v2 * v2;
      }
      angle += std::acos(dot / std::sqrt(len1 * len2));
    }
    const bool openhand = std::fabs(angle) > 1.1f;
    if (openhand) {
      std::cout << "hand is open" << std::endl;
    }
    // TODO: surface `openhand` on a dedicated output stream, e.g. under the
    // kOpen ("OPENHAND") tag declared above, instead of printing it.

    cc->Outputs()
        .Tag(kLandmarksTag)
        .Add(output_landmarks.release(), cc->InputTimestamp());
    return ::mediapipe::OkStatus();
  }
};
REGISTER_CALCULATOR(LandmarkLetterboxRemovalCalculator);
} // namespace mediapipe
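For quick experimentation outside MediaPipe, the same angle-sum heuristic can be tried as a standalone sketch. The `Point` struct, the function names, and the sample coordinates below are made up for illustration; the 1.1 rad threshold matches the value used in the calculator above:

```cpp
#include <cmath>

struct Point { float x, y, z; };

// Angle in radians between the wrist->a and wrist->b vectors.
float AngleAtWrist(const Point& wrist, const Point& a, const Point& b) {
  const float v1x = a.x - wrist.x, v1y = a.y - wrist.y, v1z = a.z - wrist.z;
  const float v2x = b.x - wrist.x, v2y = b.y - wrist.y, v2z = b.z - wrist.z;
  const float dot = v1x * v2x + v1y * v2y + v1z * v2z;
  const float len1 = v1x * v1x + v1y * v1y + v1z * v1z;
  const float len2 = v2x * v2x + v2y * v2y + v2z * v2z;
  return std::acos(dot / std::sqrt(len1 * len2));
}

// Sums the angles between adjacent fingertips as seen from the wrist; an
// open hand spreads the fingertips, so the sum grows. 1.1 rad is an
// arbitrary threshold.
bool IsOpenHand(const Point& wrist, const Point tips[5]) {
  float sum = 0.f;
  for (int t = 0; t + 1 < 5; ++t) {
    sum += AngleAtWrist(wrist, tips[t], tips[t + 1]);
  }
  return std::fabs(sum) > 1.1f;
}
```

With fingertips fanned out roughly 90 degrees apart in total the sum lands well above the threshold, while fingertips bunched together stay well below it.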
@mgyong Is this a good direction? Now I have to figure out how to surface this info in Objective-C. Is there a simple way to catch it in Objective-C to work with native iOS stuff?
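One possible way to surface it (a sketch, not verified): declare a second output stream in GetContract using the kOpen tag that is already defined (`cc->Outputs().Tag(kOpen).Set<bool>()`), emit a bool packet in Process, and expose the stream in the graph config so the iOS side can observe it (e.g. via the MPPGraph delegate in mediapipe/objc). The node might then look like the usage example in the file, plus one extra line; the stream name `open_hand` here is made up:

```
node {
  calculator: "LandmarkLetterboxRemovalCalculator"
  input_stream: "LANDMARKS:landmarks"
  input_stream: "LETTERBOX_PADDING:letterbox_padding"
  output_stream: "LANDMARKS:adjusted_landmarks"
  output_stream: "OPENHAND:open_hand"
}
```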
from mediapipe.
@psykhon We do not have plans to release the gesture recognition, as the code is very basic and not production-ready. It is just a series of rules mapping to a gesture.
We encourage our users to write their own gesture recognition based on the hand tracking example, which outputs the 21 landmarks of the hand.
I understand, but at the same time I guess that if you want this framework to be adopted, you should not tease with examples that cannot be reproduced later; otherwise it just feels like clickbait.
Any guide to the right data for the task, @mgyong? I could not find a labelled 21-landmark dataset for this.
Eagerly looking forward to the resolution of the above issue!
@DebankurS Gesture as detailed in the Google Hand tracking AI blog post is not available in the open source example. @fanzhanggoogle
@DebankurS Gesture as detailed in the Google Hand tracking AI blog post is not available in the open source example. @fanzhanggoogle
- And do you plan to make it available?
- Can you please explain why some parts of the code are open and some, like this case, are not?
@psykhon We have full examples for hand tracking (released model + open sourced pipeline). Definitely not teasing in any way :-)
Hello,
It is really interesting to see so many people looking for a matching algorithm. Looking at what MediaPipe outputs for the hand, I feel the same idea as in the following article could be used:
The hand landmarks are not that far from body landmarks, so I do not see why the cosine-similarity method or the weighted-matching method would not work.
Finally, they use a vantage-point tree to find the closest match among all the other landmark sets.
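As a toy illustration of the cosine-similarity idea mentioned above (the function name and the flattened vector layout are my own; a real matcher would first normalize the landmarks for translation and scale):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two flattened landmark vectors
// (x0, y0, z0, x1, y1, z1, ...); 1.0 means identical direction.
float CosineSimilarity(const std::vector<float>& a,
                       const std::vector<float>& b) {
  float dot = 0.f, norm_a = 0.f, norm_b = 0.f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

A vantage-point tree could then index many such pose vectors, using a distance derived from this similarity (e.g. the angular distance, since raw cosine similarity itself is not a metric).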