Comments (6)
After running a few tests, with lots of failures, I would like to share some observations for others trying to run inference with this model on videos in the wild (some points may not be related to your problem Michalis):
- the current code stops if there is not strictly 1 person detected in a given frame, so transitions, text screens or crowds will stop everything. That's understandable, but the code needs to be tweaked if you don't want the computation to abort abruptly.
- it seems to me that it fails if at least one hand with fingers is not detected, but I'm happy to be corrected. At a minimum, the model is currently very sensitive to hand and finger detection. All clips in the paper's video show characters with their hands visible at all times and in crisp/sharp conditions (no or little blur).
- the quality of videos in the wild is quite often limited by compression, so the OpenPose detection may not recover the full face/finger details, in which case this pose estimation model will not work or will not converge properly.
- the model does not necessarily work for body poses that are not facing the camera, e.g. people showing their profile or their back to the camera.
- in order to run the model fit on the full skeleton, the character must be fully in frame at all times, while in most videos the camera will crop the lower body at some point, forcing you to edit the video and run the full fit on one part and the upper-body model on the other.
So you can still get the code to work on videos in the wild, but the constraints above discard a ton of candidates, and for the remaining ones you may need to edit heavily to strip out sequences that make the code fail (intro/transition screens, sequences with no people or more than one person).
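Since the single-person constraint is the first thing that trips up in-the-wild videos, one workaround is to pre-filter the frames before fitting. A minimal sketch, assuming OpenPose was run with `--write_json` so that each frame has a `*_keypoints.json` file containing the standard top-level `people` list (the directory layout and filtering policy here are my own choices, not part of the repo):

```python
# Sketch: keep only frames whose OpenPose JSON contains exactly one person,
# so the fitting stage never sees transitions, text screens, or crowds.
# Assumes OpenPose's standard --write_json output: one JSON per frame with
# a top-level "people" list.
import json
from pathlib import Path

def usable_frames(json_dir):
    """Yield paths of frame JSONs where exactly one person was detected."""
    for path in sorted(Path(json_dir).glob("*_keypoints.json")):
        with open(path) as f:
            data = json.load(f)
        if len(data.get("people", [])) == 1:
            yield path
```

You could then copy or symlink only the matching frames (and their images) into a clean input directory, instead of letting the fit abort mid-sequence.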
Btw, great piece of code anyway, love the comments, and so impressed by what you guys are doing with OpenPose as a backbone. When it works, the face and hand fitting is amazing!
Regarding performance, on my Core i5 with a GTX 980 Ti it takes approximately the following per 1080p frame:
- OpenPose: 0.3 s
- PAF: 5 s
- raw full body/face/hands fit: 6.7 s
- tracked full body/face/hands fit: 27 s
(I did not time the ffmpeg video-to-frame extraction because it is fast and not very relevant here)
So the total render time per frame on my machine is approximately 12 s for the non-tracked model and 32 s for the tracked version.
Below is a render where everything works: great face and hands/arms fit. The body pose is not ideal (it should be turned by approx 70 degrees to the left instead of facing the camera), but I had to run the upper-body model, so that is understandable:
from monoculartotalcapture.
Also, when I run OpenPose on this specific video it seems to work well, detecting the pose correctly for the whole body.
I can think of 2 possible problems with this output, both related to the resolution of your input video. This input video seems to be very low-resolution, so:
(1) The person under estimation would be placed extremely far away from the camera (a very large z value), outside the rendering range of OpenGL, given the current way we estimate the absolute translation of the person in 3D space. This means the person is there but won't be rendered by OpenGL. To fix this, instead of putting the image in the top-left corner, you can try resizing the video to 1080 in height or 1920 in width (whichever fits), and then feed it into our pipeline.
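The resize suggested above needs to preserve the aspect ratio while hitting one of the two 1080p bounds. A minimal sketch of that computation (the function name and defaults are mine, not from the repo):

```python
# Sketch: compute a target resolution that scales a low-resolution video up to
# 1080 in height or 1920 in width, whichever bound is reached first, while
# preserving the aspect ratio. Feed the resized frames into the pipeline
# instead of padding the small image into a corner of a larger canvas.
def fit_to_1080p(width, height, max_w=1920, max_h=1080):
    """Return (new_w, new_h) scaled so one dimension hits the 1080p bound."""
    scale = min(max_w / width, max_h / height)
    return int(round(width * scale)), int(round(height * scale))
```

In practice the actual resize can be done once at the video level, e.g. with ffmpeg's `scale` filter (`-vf scale=-2:1080` for a height-limited case), before extracting frames.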
(2) The input resolution definitely has an influence on the performance of our method, which is in general true for any computer vision algorithm. It is very unlikely that our method will be able to estimate hand pose at this resolution (we mention this in the discussion section of our paper), and our body network possibly won't work perfectly in this case either (but you should still see a person there; the current problem is definitely the distance, as explained in point 1).
Please refer here for what I mean by the rendering range of OpenGL.
Assuming there is some minimum decent resolution, if one increases Z_Max, would that better capture people a bit farther from the camera without compromising the tracking?
from monoculartotalcapture.
Thank you for your interest in our code and your great analysis of the results. Your observations are in general very true. Our code works only when the details of the hands are clearly visible in the images (a good test is to see whether OpenPose correctly produces its output). Trying to predict reasonable output in the blurry case is beyond the scope of this paper, as an optimization-based method will never be able to handle that scenario.