This project forked from cleardusk/3ddfa_v2

The official PyTorch implementation of Towards Fast, Accurate and Stable 3D Dense Face Alignment, ECCV 2020.

License: MIT License

Towards Fast, Accurate and Stable 3D Dense Face Alignment

By Jianzhu Guo, Xiangyu Zhu, Yang Yang, Fan Yang, Zhen Lei and Stan Z. Li. The code repo is owned and maintained by Jianzhu Guo.

[Figure: gif of a webcam demo of the tracking result]

[Updates]

  • 2021.1.15: Borrowed the implementation from Dense-Head-Pose-Estimation for faster mesh rendering (about 3x speedup, 15ms -> 4ms); see utils/render_ctypes.py for details.
  • 2020.10.7: Added latency evaluation of the full pipeline in latency.py; run python3 latency.py --onnx, and see Latency for details.
  • 2020.10.6: Added onnxruntime support for FaceBoxes to reduce face detection latency; append the --onnx flag to enable it, and see FaceBoxes_ONNX.py for details.
  • 2020.10.2: Added onnxruntime support to greatly reduce the latency of 3DMM parameter inference; append the --onnx flag when running demo.py, and see TDDFA_ONNX.py for details.
  • 2020.9.20: Added pose estimation and serialization to .ply and .obj; see the pose, ply and obj options in demo.py.
  • 2020.9.19: Added PNCC (Projected Normalized Coordinate Code) and UV texture mapping; see the pncc and uv_tex options in demo.py.

Introduction

This work, 3DDFA_V2, extends 3DDFA. The paper, titled Towards Fast, Accurate and Stable 3D Dense Face Alignment, was accepted by ECCV 2020. The supplementary material is here. The gif above shows a webcam demo of the tracking result, recorded in my lab. This repo is the official implementation of 3DDFA_V2.

Compared to 3DDFA, 3DDFA_V2 achieves better performance and stability. In addition, 3DDFA_V2 uses the fast face detector FaceBoxes instead of Dlib. A simple 3D renderer written in C++ and Cython is also included. This repo supports onnxruntime, and the latency of regressing 3DMM parameters with the default backbone is about 1.35ms/image on CPU (4 threads) with a single image as input. If you are interested in this repo, just try it on this Google Colab! Issues, PRs and discussions are welcome 😄

Getting started

Requirements

See requirements.txt. The code is tested on macOS and Linux. Windows users may refer to the FAQ for build issues. Note that this repo uses Python 3. The major dependencies are PyTorch, numpy, opencv-python and onnxruntime. If you run the demos with the --onnx flag for acceleration, you may need to install libomp first, e.g., brew install libomp on macOS.

Usage

  1. Clone this repo
git clone https://github.com/cleardusk/3DDFA_V2.git
cd 3DDFA_V2
  2. Build the Cython version of NMS, Sim3DR, and the faster mesh renderer
sh ./build.sh
  3. Run demos
# 1. running on still image, the options include: 2d_sparse, 2d_dense, 3d, depth, pncc, pose, uv_tex, ply, obj
python3 demo.py -f examples/inputs/emma.jpg --onnx # -o [2d_sparse, 2d_dense, 3d, depth, pncc, pose, uv_tex, ply, obj]

# 2. running on videos
python3 demo_video.py -f examples/inputs/videos/214.avi --onnx

# 3. running on videos smoothly by looking ahead by `n_next` frames
python3 demo_video_smooth.py -f examples/inputs/videos/214.avi --onnx

# 4. running on webcam
python3 demo_webcam_smooth.py --onnx
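The look-ahead smoothing mentioned in example 3 can be pictured as a sliding-window average over per-frame landmarks. The block below is a minimal sketch under that assumption, not the code in demo_video_smooth.py: the function name `smooth_lookahead`, the `n_pre` parameter and the plain mean are all hypothetical choices made for illustration.

```python
import numpy as np


def smooth_lookahead(frames, n_pre=1, n_next=1):
    """Average each frame's landmarks with its neighbours.

    frames: array of shape (T, N, 2) holding N landmarks for T frames.
    n_pre / n_next: how many past / future frames enter the window
    (hypothetical parameters for this sketch).
    """
    frames = np.asarray(frames, dtype=np.float64)
    n_frames = frames.shape[0]
    smoothed = np.empty_like(frames)
    for t in range(n_frames):
        # Clamp the window at the start and end of the sequence.
        start = max(0, t - n_pre)
        stop = min(n_frames, t + n_next + 1)
        smoothed[t] = frames[start:stop].mean(axis=0)
    return smoothed


# Toy sequence: one landmark moving along x with a jitter spike at frame 2.
track = np.array([[[0.0, 0.0]], [[1.0, 0.0]], [[5.0, 0.0]], [[3.0, 0.0]], [[4.0, 0.0]]])
result = smooth_lookahead(track, n_pre=1, n_next=1)
print(result[:, 0, 0])
```

Looking ahead means frame t can only be emitted once frame t + n_next has been processed, so a larger window trades extra delay for a steadier result.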

Tracking is implemented simply by alignment. If the head pose exceeds 90° or the motion is too fast, the alignment may fail. A threshold is used as a rough check of the tracking state, but it is unstable.

You can refer to demo.ipynb or the Google Colab for a step-by-step tutorial on running on a still image.
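To make the idea of tracking by alignment concrete, here is a minimal sketch: the box for the next frame is derived from the previous frame's landmarks, and a threshold on the box area decides whether to fall back to face detection. The helper names, the margin and the value of `MIN_BOX_AREA` are assumptions for illustration; the actual check in the demo scripts may differ.

```python
import numpy as np

MIN_BOX_AREA = 2000.0  # hypothetical threshold in square pixels


def box_from_landmarks(landmarks, margin=0.1):
    """Return [x_min, y_min, x_max, y_max] around (N, 2) landmarks, padded by margin."""
    pts = np.asarray(landmarks, dtype=np.float64)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    return [x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y]


def tracking_lost(box, min_area=MIN_BOX_AREA):
    """Heuristic check: a collapsed box suggests the alignment has failed."""
    width = abs(box[2] - box[0])
    height = abs(box[3] - box[1])
    return width * height < min_area


def next_box(prev_landmarks, detect_fn):
    """Reuse the previous landmarks if tracking holds, else run the detector."""
    box = box_from_landmarks(prev_landmarks)
    if tracking_lost(box):
        return detect_fn()  # fall back to full face detection
    return box


# A healthy track keeps its own box; a collapsed one triggers re-detection.
detector_box = [50.0, 50.0, 250.0, 250.0]  # stand-in for a detector result
kept = next_box([[100, 100], [200, 220]], lambda: detector_box)
redetected = next_box([[100, 100], [110, 105]], lambda: detector_box)
```

A single area threshold is cheap but crude, which is consistent with the instability noted above.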

For example, running python3 demo.py -f examples/inputs/emma.jpg -o 3d will give the result below:

[Figure: 3D result for examples/inputs/emma.jpg]

Another example:

[Figure: result for another example input]

Running on a video will give:

[Figure: result on a video]

For more results and demos, see: Hathaway.

Features (up to now)

Feature                  Option (-o)
2D sparse                2d_sparse
2D dense                 2d_dense
3D                       3d
Depth                    depth
PNCC                     pncc
UV texture               uv_tex
Pose                     pose
Serialization to .ply    ply
Serialization to .obj    obj
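As an illustration of what serialization to .obj involves, the sketch below writes vertices and triangles in the Wavefront .obj text format. It is not the repo's own serializer: `write_obj` is a hypothetical helper, and the input is a toy triangle rather than a reconstructed face.

```python
import os
import tempfile


def write_obj(path, vertices, triangles):
    """Write a mesh as Wavefront .obj text.

    vertices: iterable of (x, y, z); triangles: iterable of 0-based index triples.
    """
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
        for a, b, c in triangles:
            # .obj face indices are 1-based.
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2)]
obj_path = os.path.join(tempfile.mkdtemp(), "triangle.obj")
write_obj(obj_path, verts, tris)
with open(obj_path) as f:
    obj_lines = f.read().splitlines()
```

The resulting file can be opened in common mesh viewers, which makes it a convenient way to inspect a dense reconstruction.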

Configs

The default backbone is MobileNet_V1 with input size 120x120, and the default pre-trained weight is weights/mb1_120x120.pth, as specified in configs/mb1_120x120.yml. This repo provides another config, configs/mb05_120x120.yml, with a width multiplier of 0.5, which is smaller and faster. You can specify the config with the -c or --config option. The released models are shown in the table below. Note that the CPU inference time reported in the paper was evaluated using TensorFlow.
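The config selection described above amounts to a command-line option with a default path. The following is a minimal sketch of that pattern using argparse; it is not the argument parser from demo.py, and `build_parser` is a hypothetical helper. Only the two config paths and the -c, --config and --onnx names come from this README.

```python
import argparse

# Config paths named in this README.
DEFAULT_CONFIG = "configs/mb1_120x120.yml"
SMALL_CONFIG = "configs/mb05_120x120.yml"


def build_parser():
    """Build a parser with a config option and an onnxruntime switch."""
    parser = argparse.ArgumentParser(description="Config selection sketch")
    parser.add_argument("-c", "--config", type=str, default=DEFAULT_CONFIG,
                        help="path to the model config file")
    parser.add_argument("--onnx", action="store_true", default=False,
                        help="use onnxruntime for inference")
    return parser


# No arguments: fall back to the default backbone config.
args_default = build_parser().parse_args([])
# Explicitly pick the smaller model and enable onnxruntime.
args_small = build_parser().parse_args(["-c", SMALL_CONFIG, "--onnx"])
print(args_default.config, args_small.config, args_small.onnx)
```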

Model            Input     #Params   #MACs    Inference (TF)
MobileNet        120x120   3.27M     183.5M   ~6.2ms
MobileNet x0.5   120x120   0.85M     49.5M    ~2.9ms

Surprisingly, the latency of onnxruntime is much lower. The inference time on CPU with different thread counts is shown below. The results were measured on my MBP (13-inch MacBook Pro, i5-8259U CPU @ 2.30GHz) with onnxruntime 1.5.1. The number of threads is set via os.environ["OMP_NUM_THREADS"]; see speed_cpu.py for more details.

Model            THREAD=1   THREAD=2   THREAD=4
MobileNet        4.4ms      2.25ms     1.35ms
MobileNet x0.5   1.37ms     0.7ms      0.5ms
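Per-call latencies like those in the table are usually obtained by discarding a few warm-up runs and averaging many timed runs. The helper below is a minimal sketch of that procedure with the standard library only; it is not speed_cpu.py, and `measure_latency_ms` and the stand-in workload are hypothetical.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, repeat=100):
    """Return the mean and standard deviation of fn() latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are discarded
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


# Stand-in workload; replace with a single-image inference call.
mean_ms, std_ms = measure_latency_ms(lambda: sum(range(10_000)), warmup=5, repeat=50)
print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms")
```

Reporting the spread alongside the mean helps show whether a sub-millisecond difference between two settings is meaningful on a given machine.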

Latency

The onnx option greatly reduces the overall CPU latency, but face detection still accounts for most of it, e.g., 15ms for a 720p image. 3DMM parameter regression takes about 1-2ms per face, and the dense reconstruction (38,365 points) takes about 1ms per face. Tracking applications may benefit from the fast 3DMM regression, since detection is not needed for every frame. The latency was measured on my 13-inch MacBook Pro (i5-8259U CPU @ 2.30GHz).
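The benefit for tracking can be estimated with simple arithmetic: if detection runs only once every k frames, its cost is spread over those frames. The sketch below uses the figures quoted above for a single face; the function name and the choice of detecting every 30 frames are assumptions for illustration, not measured settings.

```python
def amortized_latency_ms(detect_ms, regress_ms, recon_ms, detect_every):
    """Average per-frame latency when detection runs once every `detect_every` frames."""
    if detect_every < 1:
        raise ValueError("detect_every must be >= 1")
    return detect_ms / detect_every + regress_ms + recon_ms


# Figures quoted above: ~15 ms detection (720p), 1-2 ms regression, ~1 ms reconstruction.
every_frame = amortized_latency_ms(15.0, 2.0, 1.0, detect_every=1)
tracked = amortized_latency_ms(15.0, 2.0, 1.0, detect_every=30)
print(every_frame, tracked)
```

Under these assumptions the per-frame cost drops from about 18ms to about 3.5ms, which is why skipping detection matters more than further speeding up the regression.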

The default OMP_NUM_THREADS is 4. You can change it by setting os.environ['OMP_NUM_THREADS'] = '$NUM' or by running export OMP_NUM_THREADS=$NUM before launching the Python script.
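A minimal sketch of the in-script variant is shown below. It assumes that the variable has to be set before the libraries that read it are imported, which is the usual pattern for OpenMP-backed packages but is not confirmed here for every dependency; `NUM_THREADS` is a hypothetical name.

```python
import os

NUM_THREADS = 4  # example value; matches the default stated above

# Set the variable before importing libraries that may read it at load time.
os.environ["OMP_NUM_THREADS"] = str(NUM_THREADS)

# Heavy imports (e.g. onnxruntime) would go below this line.
print(os.environ["OMP_NUM_THREADS"])
```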

FAQ

  1. What is the training data?

    We use 300W-LP for training. You can refer to our paper for more details about the training. Since few images in 300W-LP have closed eyes, the eye landmarks are not accurate when the eyes are closed. For the same reason, the eye region in the webcam demo is not well aligned.

  2. Running on Windows.

    You can refer to this comment for building NMS on Windows.

Acknowledgement

Other implementations or applications

Citation

If your work or research benefits from this repo, please cite the two bibs below : ) and 🌟 this repo.

@inproceedings{guo2020towards,
    title =        {Towards Fast, Accurate and Stable 3D Dense Face Alignment},
    author =       {Guo, Jianzhu and Zhu, Xiangyu and Yang, Yang and Yang, Fan and Lei, Zhen and Li, Stan Z},
    booktitle =    {Proceedings of the European Conference on Computer Vision (ECCV)},
    year =         {2020}
}

@misc{3ddfa_cleardusk,
    author =       {Guo, Jianzhu and Zhu, Xiangyu and Lei, Zhen},
    title =        {3DDFA},
    howpublished = {\url{https://github.com/cleardusk/3DDFA}},
    year =         {2018}
}

Contact

Jianzhu Guo (郭建珠) [Homepage, Google Scholar]: [email protected] or [email protected].
