Comments (9)

DokRaphael avatar DokRaphael commented on April 27, 2024 6

Hey, I am not really a developer (I am a designer), but I tried to implement open-hand recognition.
Basically, I sum the angles between adjacent fingertips as seen from the wrist. I am assuming that the wrist is landmark 0 and the fingertips are landmarks 4, 8, 12, 16 and 20 (I could be completely wrong about this). Sorry for the code quality; I am just trying things out here and there, fast and dirty.

Here is what I did in landmark_letterbox_removal_calculator.cc:

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <array>
#include <cmath>
#include <iostream>
#include <vector>

#include "absl/memory/memory.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/port/ret_check.h"

namespace mediapipe {

namespace {

constexpr char kLandmarksTag[] = "LANDMARKS";
constexpr char kOpen[] = "OPENHAND";
constexpr char kLetterboxPaddingTag[] = "LETTERBOX_PADDING";

}  // namespace

// Adjusts landmark locations on a letterboxed image to the corresponding
// locations on the same image with the letterbox removed. This is useful to map
// the landmarks inferred from a letterboxed image, for example, output of
// the ImageTransformationCalculator when the scale mode is FIT, back to the
// corresponding input image before letterboxing.
//
// Input:
//   LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks on an
//   letterboxed image.
//
//   LETTERBOX_PADDING: An std::array<float, 4> representing the letterbox
//   padding from the 4 sides ([left, top, right, bottom]) of the letterboxed
//   image, normalized to [0.f, 1.f] by the letterboxed image dimensions.
//
// Output:
//   LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks with
//   their locations adjusted to the letterbox-removed (non-padded) image.
//
// Usage example:
// node {
//   calculator: "LandmarkLetterboxRemovalCalculator"
//   input_stream: "LANDMARKS:landmarks"
//   input_stream: "LETTERBOX_PADDING:letterbox_padding"
//   output_stream: "LANDMARKS:adjusted_landmarks"
// }
class LandmarkLetterboxRemovalCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc) {
    RET_CHECK(cc->Inputs().HasTag(kLandmarksTag) &&
              cc->Inputs().HasTag(kLetterboxPaddingTag))
        << "Missing one or more input streams.";

    cc->Inputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();
    cc->Inputs().Tag(kLetterboxPaddingTag).Set<std::array<float, 4>>();

    cc->Outputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();

    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Open(CalculatorContext* cc) override {
    cc->SetOffset(TimestampDiff(0));

    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Process(CalculatorContext* cc) override {
    // Only process if there's input landmarks.
    if (cc->Inputs().Tag(kLandmarksTag).IsEmpty()) {
      return ::mediapipe::OkStatus();
    }

    const auto& input_landmarks =
        cc->Inputs().Tag(kLandmarksTag).Get<std::vector<NormalizedLandmark>>();
    const auto& letterbox_padding =
        cc->Inputs().Tag(kLetterboxPaddingTag).Get<std::array<float, 4>>();

    const float left = letterbox_padding[0];
    const float top = letterbox_padding[1];
    const float left_and_right = letterbox_padding[0] + letterbox_padding[2];
    const float top_and_bottom = letterbox_padding[1] + letterbox_padding[3];

    auto output_landmarks =
        absl::make_unique<std::vector<NormalizedLandmark>>();
    int i = 0;
    float x0 = 0;
    float y0 = 0;
    float z0 = 0;

    float x2 = 0;
    float y2 = 0;
    float z2 = 0;

    float x3 = 0;
    float y3 = 0;
    float z3 = 0;

    float x4 = 0;
    float y4 = 0;
    float z4 = 0;

    float x5 = 0;
    float y5 = 0;
    float z5 = 0;

    float x6 = 0;
    float y6 = 0;
    float z6 = 0;

    auto openhand = false;
    for (const auto& landmark : input_landmarks) {

      NormalizedLandmark new_landmark;
      const float new_x = (landmark.x() - left) / (1.0f - left_and_right);
      const float new_y = (landmark.y() - top) / (1.0f - top_and_bottom);

      new_landmark.set_x(new_x);
      new_landmark.set_y(new_y);
      // Keep z-coord as is.
      new_landmark.set_z(landmark.z());
      // std::cout << new_landmark.x();
      
      output_landmarks->emplace_back(new_landmark);
      if(i==0){
        x0 = landmark.x();
        y0 = landmark.y();
        z0 = landmark.z();
      }
      if(i==4){
        x2 = landmark.x();
        y2 = landmark.y();
        z2 = landmark.z();
      }
      if(i==8){
        x3 = landmark.x();
        y3 = landmark.y();
        z3 = landmark.z();
      }
      if(i==12){
        x4 = landmark.x();
        y4 = landmark.y();
        z4 = landmark.z();
      }
      if(i==16){
        x5 = landmark.x();
        y5 = landmark.y();
        z5 = landmark.z();
      }
      if(i==20){
        x6 = landmark.x();
        y6 = landmark.y();
        z6 = landmark.z();
      }
      i++;
    }
    // Vectors from the wrist (landmark 0) to the thumb tip (4) and the index
    // fingertip (8).
    float vx1 = x2 - x0;
    float vy1 = y2 - y0;
    float vz1 = z2 - z0;
    float vx2 = x3 - x0;
    float vy2 = y3 - y0;
    float vz2 = z3 - z0;

    // Angle between two vectors: acos(a.b / (|a| |b|)). The lenv* values are
    // squared lengths, so one sqrt of their product gives |a| |b|.
    float dot1 = vx1 * vx2 + vy1 * vy2 + vz1 * vz2;
    float lenv1 = vx1 * vx1 + vy1 * vy1 + vz1 * vz1;
    float lenv2 = vx2 * vx2 + vy2 * vy2 + vz2 * vz2;
    float angle1 = std::acos(dot1 / std::sqrt(lenv1 * lenv2));
    // float det1 = vx1*vy2 - vy1*vx2;
    // float angle1 = atan2(det1, dot1);

    // Wrist to the middle fingertip (12).
    float vx3 = x4 - x0;
    float vy3 = y4 - y0;
    float vz3 = z4 - z0;

    float dot2 = vx2 * vx3 + vy2 * vy3 + vz2 * vz3;
    float lenv3 = vx3 * vx3 + vy3 * vy3 + vz3 * vz3;
    float angle2 = std::acos(dot2 / std::sqrt(lenv2 * lenv3));
    // float det2 = vx2*vy3 - vy2*vx3;
    // float angle2 = atan2(det2, dot2);

    // Wrist to the ring fingertip (16).
    float vx4 = x5 - x0;
    float vy4 = y5 - y0;
    float vz4 = z5 - z0;

    float dot3 = vx3 * vx4 + vy3 * vy4 + vz3 * vz4;
    float lenv4 = vx4 * vx4 + vy4 * vy4 + vz4 * vz4;
    float angle3 = std::acos(dot3 / std::sqrt(lenv3 * lenv4));
    // float det3 = vx3*vy4 - vy3*vx4;
    // float angle3 = atan2(det3, dot3);

    // Wrist to the pinky fingertip (20).
    float vx5 = x6 - x0;
    float vy5 = y6 - y0;
    float vz5 = z6 - z0;

    float dot4 = vx4 * vx5 + vy4 * vy5 + vz4 * vz5;
    float lenv5 = vx5 * vx5 + vy5 * vy5 + vz5 * vz5;
    float angle4 = std::acos(dot4 / std::sqrt(lenv4 * lenv5));
    // float det4 = vx4*vy5 - vy4*vx5;
    // float angle4 = atan2(det4, dot4);

    // Total spread: sum of the angles between adjacent fingertips.
    float angle = angle1 + angle2 + angle3 + angle4;

    // float dot = a[0]*b[0] + a[1]*b[1];
    // float det = a[0]*b[1] - a[1]*b[0];
    // float angle = atan2(det, dot);

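    // std::abs selects the floating-point overload; an unqualified abs() can
    // resolve to the int version and truncate the value. Threshold in radians.
    if (std::abs(angle) > 1.1f) {
      std::cout << "hand is open" << std::endl;
      openhand = true;
    } else {
      openhand = false;
    }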

      // int k = abs( sqrt( (double)x + (double)y ) );
      
      
//    cc->Outputs()
//        .Tag(kOpen).Add(openhand, cc->InputTimestamp());
    cc->Outputs()
        .Tag(kLandmarksTag)
        .Add(output_landmarks.release(), cc->InputTimestamp());
    return ::mediapipe::OkStatus();
  }
};
REGISTER_CALCULATOR(LandmarkLetterboxRemovalCalculator);

}  // namespace mediapipe

@mgyong Is this a good direction? Now I have to figure out how to surface this information in Objective-C. Is there a simple way to catch it in Objective-C so I can use it with native iOS code?
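
The angle-summing idea in the calculator above can also be pulled out into a small standalone helper, which makes it easier to test without MediaPipe. This is only a sketch: `Point3`, `AngleAt`, `FingerSpread` and `IsOpenHand` are hypothetical names, the landmark indices are the same assumption as in the comment (wrist = 0, fingertips = 4, 8, 12, 16, 20), and the 1.1 rad threshold is simply carried over from the code.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical plain 3D point; stands in for NormalizedLandmark.
struct Point3 {
  float x, y, z;
};

// Angle in radians between the vectors (a - origin) and (b - origin).
// Returns 0 when either vector has zero length.
float AngleAt(const Point3& origin, const Point3& a, const Point3& b) {
  const float ax = a.x - origin.x, ay = a.y - origin.y, az = a.z - origin.z;
  const float bx = b.x - origin.x, by = b.y - origin.y, bz = b.z - origin.z;
  const float dot = ax * bx + ay * by + az * bz;
  const float norm = std::sqrt((ax * ax + ay * ay + az * az) *
                               (bx * bx + by * by + bz * bz));
  if (norm == 0.0f) return 0.0f;
  // Clamp so rounding cannot push the cosine outside [-1, 1].
  const float cosine = std::max(-1.0f, std::min(1.0f, dot / norm));
  return std::acos(cosine);
}

// Sum of the angles between adjacent fingertips as seen from the wrist.
// Assumed layout: wrist = 0, fingertips = 4, 8, 12, 16, 20.
float FingerSpread(const std::array<Point3, 21>& lm) {
  constexpr std::array<std::size_t, 5> kTips = {4, 8, 12, 16, 20};
  float sum = 0.0f;
  for (std::size_t i = 0; i + 1 < kTips.size(); ++i) {
    sum += AngleAt(lm[0], lm[kTips[i]], lm[kTips[i + 1]]);
  }
  return sum;
}

// True when the total spread exceeds the threshold (radians).
bool IsOpenHand(const std::array<Point3, 21>& lm, float threshold = 1.1f) {
  return FingerSpread(lm) > threshold;
}
```

Clamping the cosine before `std::acos` avoids a NaN when rounding pushes it slightly outside [-1, 1], and the zero-length check avoids dividing by zero if a fingertip coincides with the wrist.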

from mediapipe.

lisbravo avatar lisbravo commented on April 27, 2024 5

@psykhon We do not have plans to release the gesture as the code is very basic and not production ready. It is just a series of rules mapping to a gesture.
We encourage our users to write their own gesture recognition based on the hand tracking example that outputs 21 landmarks of the hand.

I understand, but at the same time I guess that if you want this framework to be accepted, you shouldn't tease with examples that cannot be reproduced later; otherwise it just feels like clickbait.

DebankurS avatar DebankurS commented on April 27, 2024 3

@mgyong Is there any guide to the right data for this task? I could not find a labelled 21-landmark dataset for it.

Suraj520 avatar Suraj520 commented on April 27, 2024

Eagerly looking forward to the resolution of the above issue!

mgyong avatar mgyong commented on April 27, 2024

@DebankurS The gesture recognition detailed in the Google hand tracking AI blog post is not available in the open source example. @fanzhanggoogle

lisbravo avatar lisbravo commented on April 27, 2024

@DebankurS The gesture recognition detailed in the Google hand tracking AI blog post is not available in the open source example. @fanzhanggoogle

- And do you plan to make it available?
- Can you please explain why some parts of the code are open and some, like this one, are not?

mgyong avatar mgyong commented on April 27, 2024

@psykhon We do not have plans to release the gesture as the code is very basic and not production ready. It is just a series of rules mapping to a gesture.
We encourage our users to write their own gesture recognition based on the hand tracking example that outputs 21 landmarks of the hand.
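
As one possible illustration of "a series of rules mapping to a gesture", the sketch below counts extended fingers from the 21 landmarks and maps the count to a label. It is not the unreleased code mentioned above, only a minimal example. Assumptions: the wrist is landmark 0, fingertips are 4, 8, 12, 16 and 20, and the joint two indices below each tip is a middle joint of the same finger. `Point3`, `CountExtendedFingers`, `ClassifyGesture` and the labels are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical plain 3D point; stands in for a hand landmark.
struct Point3 {
  float x, y, z;
};

// Euclidean distance between two points.
float Distance(const Point3& a, const Point3& b) {
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Rule: a finger counts as extended when its tip is farther from the wrist
// than the joint two indices below the tip (assumed to be a middle joint).
int CountExtendedFingers(const std::array<Point3, 21>& lm) {
  constexpr std::array<std::size_t, 5> kTips = {4, 8, 12, 16, 20};
  int count = 0;
  for (std::size_t tip : kTips) {
    if (Distance(lm[tip], lm[0]) > Distance(lm[tip - 2], lm[0])) ++count;
  }
  return count;
}

// Maps the finger count to a hypothetical gesture label.
std::string ClassifyGesture(const std::array<Point3, 21>& lm) {
  switch (CountExtendedFingers(lm)) {
    case 0:
      return "fist";
    case 5:
      return "open_hand";
    default:
      return "other";
  }
}
```

More rules (for example, which specific fingers are extended) could be added the same way. The distance test is a crude heuristic and will likely misfire for hands pointing toward the camera.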

mgyong avatar mgyong commented on April 27, 2024

@psykhon We have full examples for hand tracking (released model + open sourced pipeline). Definitely not teasing in any way :-)

gabrielstuff avatar gabrielstuff commented on April 27, 2024

Hello,
It is really interesting to see so many people looking for the matching algorithm. Looking at what MediaPipe outputs for the hand, I feel that the same idea as in the following article could be used:

https://medium.com/tensorflow/move-mirror-an-ai-experiment-with-pose-estimation-in-the-browser-using-tensorflow-js-2f7b769f9b23#3965

The landmarks of the hand are not that different from those of a body, so I do not see why the cosine similarity or the weighted matching methodology would not work.

Finally, they use a vantage-point tree to find the closest match among all the other sets of landmarks.
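
To make the cosine similarity idea concrete, here is a minimal sketch that compares a flattened landmark vector against a list of stored gesture templates. It uses a plain linear scan instead of a vantage-point tree, which should be fine for a handful of templates. Assumptions: each hand is already flattened to one vector (for example x0, y0, x1, y1, ...) and made translation-invariant beforehand; `CosineSimilarity` and `ClosestTemplate` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two feature vectors of equal size.
// Returns 0 when the sizes differ, the vectors are empty, or one has zero
// length.
float CosineSimilarity(const std::vector<float>& a,
                       const std::vector<float>& b) {
  if (a.size() != b.size() || a.empty()) return 0.0f;
  float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  if (norm_a == 0.0f || norm_b == 0.0f) return 0.0f;
  return dot / std::sqrt(norm_a * norm_b);
}

// Index of the template most similar to the query, or -1 when there are no
// templates. Linear scan; a vantage-point tree would replace this loop for
// large template sets.
int ClosestTemplate(const std::vector<float>& query,
                    const std::vector<std::vector<float>>& templates) {
  int best = -1;
  float best_similarity = -2.0f;  // Below the minimum possible value of -1.
  for (std::size_t i = 0; i < templates.size(); ++i) {
    const float similarity = CosineSimilarity(query, templates[i]);
    if (similarity > best_similarity) {
      best_similarity = similarity;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Because cosine similarity ignores vector length, the comparison is insensitive to uniform scaling of the hand once the landmarks are centered, which is part of why the article's approach seems transferable.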
