Comments (9)
Hey, I am not really a developer (I am a designer), but I tried to implement open-hand recognition.
Basically, I sum up the angles between the wrist and each fingertip. I am assuming that the wrist is landmark 0 and the fingertips are landmarks 4, 8, 12, 16 and 20 (I could be completely wrong about this). I am really sorry for my coding; I am just trying things out here and there, fast and dirty.
Here is what I did in landmark_letterbox_removal_calculator.cc:
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <array>
#include <cmath>
#include <iostream>
#include <vector>

#include "absl/memory/memory.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/port/ret_check.h"
namespace mediapipe {
namespace {
constexpr char kLandmarksTag[] = "LANDMARKS";
constexpr char kOpen[] = "OPENHAND";
constexpr char kLetterboxPaddingTag[] = "LETTERBOX_PADDING";
} // namespace
// Adjusts landmark locations on a letterboxed image to the corresponding
// locations on the same image with the letterbox removed. This is useful to map
// the landmarks inferred from a letterboxed image, for example, output of
// the ImageTransformationCalculator when the scale mode is FIT, back to the
// corresponding input image before letterboxing.
//
// Input:
// LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks on an
// letterboxed image.
//
// LETTERBOX_PADDING: An std::array<float, 4> representing the letterbox
// padding from the 4 sides ([left, top, right, bottom]) of the letterboxed
// image, normalized to [0.f, 1.f] by the letterboxed image dimensions.
//
// Output:
// LANDMARKS: An std::vector<NormalizedLandmark> representing landmarks with
// their locations adjusted to the letterbox-removed (non-padded) image.
//
// Usage example:
// node {
// calculator: "LandmarkLetterboxRemovalCalculator"
// input_stream: "LANDMARKS:landmarks"
// input_stream: "LETTERBOX_PADDING:letterbox_padding"
// output_stream: "LANDMARKS:adjusted_landmarks"
// }
class LandmarkLetterboxRemovalCalculator : public CalculatorBase {
 public:
  static ::mediapipe::Status GetContract(CalculatorContract* cc) {
    RET_CHECK(cc->Inputs().HasTag(kLandmarksTag) &&
              cc->Inputs().HasTag(kLetterboxPaddingTag))
        << "Missing one or more input streams.";
    cc->Inputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();
    cc->Inputs().Tag(kLetterboxPaddingTag).Set<std::array<float, 4>>();
    cc->Outputs().Tag(kLandmarksTag).Set<std::vector<NormalizedLandmark>>();
    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Open(CalculatorContext* cc) override {
    cc->SetOffset(TimestampDiff(0));
    return ::mediapipe::OkStatus();
  }

  ::mediapipe::Status Process(CalculatorContext* cc) override {
    // Only process if there are input landmarks.
    if (cc->Inputs().Tag(kLandmarksTag).IsEmpty()) {
      return ::mediapipe::OkStatus();
    }
    const auto& input_landmarks =
        cc->Inputs().Tag(kLandmarksTag).Get<std::vector<NormalizedLandmark>>();
    const auto& letterbox_padding =
        cc->Inputs().Tag(kLetterboxPaddingTag).Get<std::array<float, 4>>();
    const float left = letterbox_padding[0];
    const float top = letterbox_padding[1];
    const float left_and_right = letterbox_padding[0] + letterbox_padding[2];
    const float top_and_bottom = letterbox_padding[1] + letterbox_padding[3];
    auto output_landmarks =
        absl::make_unique<std::vector<NormalizedLandmark>>();

    // Wrist is landmark 0; the fingertips are landmarks 4, 8, 12, 16 and 20.
    constexpr int kFingertips[] = {4, 8, 12, 16, 20};
    float wrist[3] = {0.f, 0.f, 0.f};
    float tips[5][3] = {};
    int i = 0;
    for (const auto& landmark : input_landmarks) {
      NormalizedLandmark new_landmark;
      const float new_x = (landmark.x() - left) / (1.0f - left_and_right);
      const float new_y = (landmark.y() - top) / (1.0f - top_and_bottom);
      new_landmark.set_x(new_x);
      new_landmark.set_y(new_y);
      // Keep z-coord as is.
      new_landmark.set_z(landmark.z());
      output_landmarks->emplace_back(new_landmark);
      // Collect the wrist and fingertip coordinates for the open-hand check.
      if (i == 0) {
        wrist[0] = landmark.x();
        wrist[1] = landmark.y();
        wrist[2] = landmark.z();
      }
      for (int t = 0; t < 5; ++t) {
        if (i == kFingertips[t]) {
          tips[t][0] = landmark.x();
          tips[t][1] = landmark.y();
          tips[t][2] = landmark.z();
        }
      }
      ++i;
    }

    // Sum the angles between the wrist->fingertip vectors of adjacent
    // fingers. An open hand spreads the fingertips apart, so the sum grows.
    float angle = 0.f;
    for (int t = 0; t + 1 < 5; ++t) {
      float dot = 0.f, len1 = 0.f, len2 = 0.f;
      for (int d = 0; d < 3; ++d) {
        const float v1 = tips[t][d] - wrist[d];
        const float v2 = tips[t + 1][d] - wrist[d];
        dot += v1 * v2;
        len1 += v1 * v1;
        len2 += v2 * v2;
      }
      angle += std::acos(dot / std::sqrt(len1 * len2));
    }
    const bool openhand = std::fabs(angle) > 1.1f;
    if (openhand) {
      std::cout << "hand is open" << std::endl;
    }
    // TODO: surface `openhand` on a dedicated output stream, e.g. under the
    // kOpen ("OPENHAND") tag declared above, instead of printing it.

    cc->Outputs()
        .Tag(kLandmarksTag)
        .Add(output_landmarks.release(), cc->InputTimestamp());
    return ::mediapipe::OkStatus();
  }
};
REGISTER_CALCULATOR(LandmarkLetterboxRemovalCalculator);
} // namespace mediapipe
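For quick experimentation outside MediaPipe, the same angle-sum heuristic can be tried as a standalone sketch. The `Point` struct, the function names, and the sample coordinates below are made up for illustration; the 1.1 rad threshold matches the value used in the calculator above:

```cpp
#include <cmath>

struct Point { float x, y, z; };

// Angle in radians between the wrist->a and wrist->b vectors.
float AngleAtWrist(const Point& wrist, const Point& a, const Point& b) {
  const float v1x = a.x - wrist.x, v1y = a.y - wrist.y, v1z = a.z - wrist.z;
  const float v2x = b.x - wrist.x, v2y = b.y - wrist.y, v2z = b.z - wrist.z;
  const float dot = v1x * v2x + v1y * v2y + v1z * v2z;
  const float len1 = v1x * v1x + v1y * v1y + v1z * v1z;
  const float len2 = v2x * v2x + v2y * v2y + v2z * v2z;
  return std::acos(dot / std::sqrt(len1 * len2));
}

// Sums the angles between adjacent fingertips as seen from the wrist; an
// open hand spreads the fingertips, so the sum grows. 1.1 rad is an
// arbitrary threshold.
bool IsOpenHand(const Point& wrist, const Point tips[5]) {
  float sum = 0.f;
  for (int t = 0; t + 1 < 5; ++t) {
    sum += AngleAtWrist(wrist, tips[t], tips[t + 1]);
  }
  return std::fabs(sum) > 1.1f;
}
```

With fingertips fanned out roughly 90 degrees apart in total the sum lands well above the threshold, while fingertips bunched together stay well below it.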
@mgyong Is this a good direction? Now I have to figure out how to surface this info in Objective-C. Is there a simple way to catch it in Objective-C to work with native iOS stuff?
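One possible way to surface it (a sketch, not verified): declare a second output stream in GetContract using the kOpen tag that is already defined (`cc->Outputs().Tag(kOpen).Set<bool>()`), emit a bool packet in Process, and expose the stream in the graph config so the iOS side can observe it (e.g. via the MPPGraph delegate in mediapipe/objc). The node might then look like the usage example in the file, plus one extra line; the stream name `open_hand` here is made up:

```
node {
  calculator: "LandmarkLetterboxRemovalCalculator"
  input_stream: "LANDMARKS:landmarks"
  input_stream: "LETTERBOX_PADDING:letterbox_padding"
  output_stream: "LANDMARKS:adjusted_landmarks"
  output_stream: "OPENHAND:open_hand"
}
```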
from mediapipe.
@psykhon We do not have plans to release the gesture recognition, as the code is very basic and not production-ready. It is just a series of rules mapping to a gesture.
We encourage our users to write their own gesture recognition based on the hand tracking example, which outputs the 21 landmarks of the hand.
I understand, but at the same time I guess that if you want this framework to be adopted, you should not tease with examples that cannot be reproduced later; otherwise it just feels like clickbait.
Any guide to the right data for the task, @mgyong? I could not find a labelled 21-landmark dataset for this.
Eagerly looking forward to the resolution of the above issue!
@DebankurS Gesture as detailed in the Google Hand tracking AI blog post is not available in the open source example. @fanzhanggoogle
@DebankurS Gesture as detailed in the Google Hand tracking AI blog post is not available in the open source example. @fanzhanggoogle
- And do you plan to make it available?
- Can you please explain why some parts of the code are open and some, like this case, are not?
@psykhon We have full examples for hand tracking (released model + open sourced pipeline). Definitely not teasing in any way :-)
Hello,
It is really interesting to see so many people looking for a matching algorithm. Looking at what MediaPipe outputs for the hand, I feel the same idea as in the following article could be used:
The hand landmarks are not that far from body landmarks, so I do not see why the cosine-similarity method or the weighted-matching method would not work.
Finally, they use a vantage-point tree to find the closest match among all the other landmark sets.
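As a toy illustration of the cosine-similarity idea mentioned above (the function name and the flattened vector layout are my own; a real matcher would first normalize the landmarks for translation and scale):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two flattened landmark vectors
// (x0, y0, z0, x1, y1, z1, ...); 1.0 means identical direction.
float CosineSimilarity(const std::vector<float>& a,
                       const std::vector<float>& b) {
  float dot = 0.f, norm_a = 0.f, norm_b = 0.f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

A vantage-point tree could then index many such pose vectors, using a distance derived from this similarity (e.g. the angular distance, since raw cosine similarity itself is not a metric).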