hou-yz / mvdet

[ECCV 2020] Codes and MultiviewX dataset for "Multiview Detection with Feature Perspective Transformation".

Home Page: https://hou-yz.github.io/publication/2020-eccv2020-mvdet

Topics: pytorch, multiview, pedestrian-detection, dataset, detection, detector

mvdet's Introduction

Multiview Detection with Feature Perspective Transformation [Website] [arXiv]

@inproceedings{hou2020multiview,
  title={Multiview Detection with Feature Perspective Transformation},
  author={Hou, Yunzhong and Zheng, Liang and Gould, Stephen},
  booktitle={ECCV},
  year={2020}
}

Please visit this link for our new work MVDeTr, a transformer-powered multiview detector that achieves a new state of the art!

Overview

We release the PyTorch code for MVDet, a state-of-the-art multiview pedestrian detector, and the MultiviewX dataset, a novel synthetic multiview pedestrian detection dataset.

(Figure: side-by-side example views from Wildtrack and MultiviewX.)

Content

MultiviewX dataset

Using pedestrian models from PersonX, we build the novel synthetic dataset MultiviewX in Unity.


The MultiviewX dataset covers a ground area of 16 meters by 25 meters, which we quantize into a 640x1000 grid. MultiviewX has 6 cameras with overlapping fields of view, each outputting images at 1080x1920 resolution. We also generate annotations for 400 frames at 2 fps (the same as Wildtrack). On average, 4.41 cameras cover each location.
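
For concreteness, a minimal sketch of the grid quantization, assuming a uniform cell size of 16/640 = 25/1000 = 0.025 m per cell (inferred from the numbers above; the helper name is hypothetical, not from the official toolkit):

import numpy as np

# MultiviewX quantizes a 16m x 25m ground plane into a 640x1000 grid,
# i.e. 0.025 m per grid cell (inferred from the dataset description above).
CELL_SIZE_M = 0.025

def worldgrid_to_worldcoord(grid_xy):
    # Map integer grid indices to metric ground-plane coordinates.
    return np.asarray(grid_xy, dtype=float) * CELL_SIZE_M

print(worldgrid_to_worldcoord([640, 1000]))  # [16. 25.] -> the full area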

Download MultiviewX

Please refer to this link for download.

Build your own version

Please refer to this repo for a detailed guide & toolkits you might need.

MVDet Code

This repo is dedicated to the code for MVDet.


Dependencies

This code uses the following libraries:

  • python 3.7+
  • pytorch 1.4+ & torchvision
  • numpy
  • matplotlib
  • pillow
  • opencv-python
  • kornia
  • matlab & the matlab engine for python (required for evaluation; see this link for a detailed guide)

Data Preparation

By default, all datasets are in ~/Data/. We use MultiviewX and Wildtrack in this project.

Your ~/Data/ folder should look like this (a quick sanity check follows the tree):

Data
├── MultiviewX/
│   └── ...
└── Wildtrack/ 
    └── ...
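
A small convenience sketch to verify this layout (the paths simply mirror the tree above):

from pathlib import Path

# Check that the default ~/Data/ layout expected by this repo exists.
data_root = Path.home() / 'Data'
for name in ('MultiviewX', 'Wildtrack'):
    print(name, 'found' if (data_root / name).is_dir() else 'MISSING')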

Training

In order to train the detector, please run the following:

CUDA_VISIBLE_DEVICES=0,1 python main.py -d wildtrack

This should automatically return evaluation results similar to the reported 88.2% MODA on the Wildtrack dataset.

Pre-trained models

You can download the checkpoints at this link.
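
A minimal sketch for inspecting a downloaded checkpoint before loading it (the file name follows the issue reports below; whether the keys match your model depends on the code revision):

import torch

# Load the checkpoint onto the CPU and list its parameter keys; comparing
# them against model.state_dict().keys() catches version mismatches early.
state_dict = torch.load('MultiviewDetector.pth', map_location='cpu')
print(len(state_dict), 'tensors; first keys:', list(state_dict)[:5])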

mvdet's People

Contributors

hou-yz, zichengduan

mvdet's Issues

Lightweight MVDet.

Thank you for this article; it has been a great reference for me! Now I want to apply it to a project using the Jetson SUB DEVELOPER KIT. Its CPU and GPU share a combined 16GB of memory, which obviously does not meet MVDet's requirement of two graphics cards with more than 11GB of memory each. So I would like to ask: is there a lightweight version of MVDet, or a lightweight approach to optimize it? Hope to hear from you soon. Best wishes.

Question about visualizing overlapping camera views

Hello author!
Let me ask you a question. I used the test() function in your datasets/frameDatasets.py to visualize the inverse projective transformation. Overlaying the visualization results from the 7 cameras of the WILDTRACK dataset gives the result in the first attached screenshot, but the overlap shown in the WILDTRACK dataset paper looks like the second attached screenshot. The two images are not similar. I checked each person's position projected into the images and found those projections to be correct (third attached screenshot).
So my question is: why does the overlap region I render differ so much from the one in the paper? It makes me doubt whether the inverse perspective transformation is correct...
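
For readers hitting the same question, a hedged sketch of the overlay operation itself, assuming per-camera image-to-ground-plane homographies H are already computed (an illustration, not the repo's test() code):

import cv2
import numpy as np

def overlay_on_ground(images, homographies, grid_hw=(480, 1440)):
    # Warp every view onto the ground-plane grid and average the overlap;
    # a wrong homography makes the composite look nothing like the paper's.
    acc = np.zeros((*grid_hw, 3), np.float32)
    cnt = np.zeros((*grid_hw, 1), np.float32)
    for img, H in zip(images, homographies):
        warp = cv2.warpPerspective(img.astype(np.float32), H, grid_hw[::-1])
        mask = (warp.sum(axis=2, keepdims=True) > 0).astype(np.float32)
        acc += warp * mask
        cnt += mask
    return (acc / np.maximum(cnt, 1)).astype(np.uint8)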

Unity-related questions

Hello, I am experiencing some difficulty getting from the PersonX tutorial on the official PersonX repo to achieving some likeness of your MultiviewX dataset.

Could you perhaps shed some light on how you managed all of the individual people (game objects) and their movement? I am new to Unity and somehow there doesn't seem to be that much documentation on such uses.

For example, the default PersonX code sequentially loads, moves around, and destroys the models. I tried loading multiple models using a for loop and such but they seemed to malfunction and stay in one fixed position.

Any help would be much appreciated. Thank you.

Visualization of the large convolution kernel

Is the "Output: pedestrian occupancy map" visualization reflected anywhere in the MVDet source code?
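
In case it helps, a hedged sketch of rendering such an occupancy map as a heatmap (the score map itself would come from the detector's forward pass; here a random array stands in):

import numpy as np
import matplotlib.pyplot as plt

def show_occupancy(score_map, path='occupancy_map.png'):
    # Render a 2D ground-plane occupancy score map as a heatmap.
    plt.imshow(score_map, cmap='hot')
    plt.colorbar(label='occupancy score')
    plt.savefig(path, bbox_inches='tight')

show_occupancy(np.random.rand(120, 360))  # dummy stand-in for a real map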

Error when loading checkpoint: MultiviewDetector.pth

When I run the code:

resume_fname = resume_dir + '/MultiviewDetector.pth'
model.load_state_dict(torch.load(resume_fname, map_location='cuda:0'))

I come across an error:
RuntimeError: Error(s) in loading state_dict for PerspTransDetector: Missing key(s) in state_dict: "map_classifier.0.weight", "map_classifier.0.bias", "map_classifier.2.weight", "map_classifier.2.bias", "map_classifier.4.weight". Unexpected key(s) in state_dict: "base_pt1.0.weight", "base_pt1.1.weight", "base_pt1.1.bias", "base_pt1.1.running_mean", "base_pt1.1.running_var", "base_pt1.1.num_batches_tracked", "base_pt1.4.0.conv1.weight", "base_pt1.4.0.bn1.weight", "base_pt1.4.0.bn1.bias", "base_pt1.4.0.bn1.running_mean", "base_pt1.4.0.bn1.running_var", "base_pt1.4.0.bn1.num_batches_tracked", "base_pt1.4.0.conv2.weight", "base_pt1.4.0.bn2.weight", "base_pt1.4.0.bn2.bias", "base_pt1.4.0.bn2.running_mean", "base_pt1.4.0.bn2.running_var", "base_pt1.4.0.bn2.num_batches_tracked", "base_pt1.4.1.conv1.weight", "base_pt1.4.1.bn1.weight", "base_pt1.4.1.bn1.bias", "base_pt1.4.1.bn1.running_mean", "base_pt1.4.1.bn1.running_var", "base_pt1.4.1.bn1.num_batches_tracked", "base_pt1.4.1.conv2.weight", "base_pt1.4.1.bn2.weight", "base_pt1.4.1.bn2.bias", "base_pt1.4.1.bn2.running_mean", "base_pt1.4.1.bn2.running_var", "base_pt1.4.1.bn2.num_batches_tracked", "base_pt1.5.0.conv1.weight", "base_pt1.5.0.bn1.weight", "base_pt1.5.0.bn1.bias", "base_pt1.5.0.bn1.running_mean", "base_pt1.5.0.bn1.running_var", "base_pt1.5.0.bn1.num_batches_tracked", "base_pt1.5.0.conv2.weight", "base_pt1.5.0.bn2.weight", "base_pt1.5.0.bn2.bias", "base_pt1.5.0.bn2.running_mean", "base_pt1.5.0.bn2.running_var", "base_pt1.5.0.bn2.num_batches_tracked", "base_pt1.5.0.downsample.0.weight", "base_pt1.5.0.downsample.1.weight", "base_pt1.5.0.downsample.1.bias", "base_pt1.5.0.downsample.1.running_mean", "base_pt1.5.0.downsample.1.running_var", "base_pt1.5.0.downsample.1.num_batches_tracked", "base_pt1.5.1.conv1.weight", "base_pt1.5.1.bn1.weight", "base_pt1.5.1.bn1.bias", "base_pt1.5.1.bn1.running_mean", "base_pt1.5.1.bn1.running_var", "base_pt1.5.1.bn1.num_batches_tracked", "base_pt1.5.1.conv2.weight", "base_pt1.5.1.bn2.weight", "base_pt1.5.1.bn2.bias", "base_pt1.5.1.bn2.running_mean", "base_pt1.5.1.bn2.running_var", "base_pt1.5.1.bn2.num_batches_tracked", "base_pt1.6.0.conv1.weight", "base_pt1.6.0.bn1.weight", "base_pt1.6.0.bn1.bias", "base_pt1.6.0.bn1.running_mean", "base_pt1.6.0.bn1.running_var", "base_pt1.6.0.bn1.num_batches_tracked", "base_pt1.6.0.conv2.weight", "base_pt1.6.0.bn2.weight", "base_pt1.6.0.bn2.bias", "base_pt1.6.0.bn2.running_mean", "base_pt1.6.0.bn2.running_var", "base_pt1.6.0.bn2.num_batches_tracked", "base_pt1.6.0.downsample.0.weight", "base_pt1.6.0.downsample.1.weight", "base_pt1.6.0.downsample.1.bias", "base_pt1.6.0.downsample.1.running_mean", "base_pt1.6.0.downsample.1.running_var", "base_pt1.6.0.downsample.1.num_batches_tracked", "base_pt1.6.1.conv1.weight", "base_pt1.6.1.bn1.weight", "base_pt1.6.1.bn1.bias", "base_pt1.6.1.bn1.running_mean", "base_pt1.6.1.bn1.running_var", "base_pt1.6.1.bn1.num_batches_tracked", "base_pt1.6.1.conv2.weight", "base_pt1.6.1.bn2.weight", "base_pt1.6.1.bn2.bias", "base_pt1.6.1.bn2.running_mean", "base_pt1.6.1.bn2.running_var", "base_pt1.6.1.bn2.num_batches_tracked", "world_classifier.0.weight", "world_classifier.0.bias", "world_classifier.2.weight", "world_classifier.2.bias", "world_classifier.4.weight".
Please tell me how I can resolve it.

Error when loading checkpoint: Missing key(s) in state_dict & Unexpected key(s) in state_dict

Help: I ran into the error shown below; how can I solve it?

Missing key(s) in state_dict: "map_classifier.0.weight", "map_classifier.0.bias", "map_classifier.2.weight", "map_classifier.2.bias", "map_classifier.4.weight".

Unexpected key(s) in state_dict: "world_classifier.0.weight", "world_classifier.0.bias", "world_classifier.2.weight", "world_classifier.2.bias", "world_classifier.4.weight".
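
For what it's worth, the missing and unexpected keys above differ only in the module prefix (map_classifier vs. world_classifier), which suggests the checkpoint was saved from a different code revision. A hedged workaround sketch, assuming only the module name changed and the layer shapes are identical:

import torch

def rename_classifier_keys(state_dict):
    # Map 'world_classifier.*' keys to 'map_classifier.*' so an older
    # checkpoint matches the renamed module in the current code.
    return {k.replace('world_classifier', 'map_classifier'): v
            for k, v in state_dict.items()}

# usage (model = your constructed detector):
# model.load_state_dict(rename_classifier_keys(
#     torch.load('MultiviewDetector.pth', map_location='cpu')))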

What is the best way to express the performance of each frame?

Hello author, thank you for your article; I have benefited a lot from it. I have a question for you. The final performance indicators for this task are MODA and MODP, but they characterize performance over a series of frames or the entire test set. If I want to express the performance of each individual frame, which indicator do you think is more appropriate? Thank you again for your great contribution to this field, and I look forward to your reply. I wish you good luck in your work!

Ground truth bounding boxes are not perfect.

Hi!
Thank you for sharing your work.

I found that the bounding box annotations in the MultiviewX dataset are not perfect: some people are not covered by a bounding box annotation.

I guess the reason you didn't label them is that those people are outside of the occupancy map?

Model checkpoints

Thank you for your valuable contribution to the field.

When I start training the model, I am not able to see the model checkpoints or weights, and training also takes a long time. I would appreciate any suggestions.

Best wishes,

Problem with black screen during training

I can successfully run MVDet, but during training my computer suddenly goes black and there is no response to any keyboard or mouse input. The monitor shows no HDMI signal, but the graphics card fans keep running at full speed. I can train up to epoch 4 at most, but most of the time I only reach epoch 1. I am running Ubuntu 18.04, and my graphics cards are two NVIDIA GeForce RTX 3080s with 10GB of memory each; I am not sure if that is enough. GPU memory usage does not stay at 100% during execution, and the temperature reaches up to 86°C. I would like to know if the author or others have encountered similar problems, and whether you can suggest possible causes. This is really important to me, thank you.

What is the minimum hardware requirement?

Hi, could you please tell me the GPU memory consumption of training? I only have two RTX 2070 Super cards at home; would it be possible to train the network with only this equipment? Many thanks.

Are the occupancy-map detections after NMS equivalent to anchor-based bounding-box confidences?

Hello author, thank you very much for your work, which has brought me great inspiration. I have a question: are the pixel-wise detection results obtained after NMS on the pedestrian occupancy map equivalent to the confidence of anchor-based bounding boxes in object detection? I want to optimize inference based on this work, and I wonder whether the average of these probability values can be used as a performance indicator for inference results when no ground truth is available. Looking forward to your reply, thanks again!
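
A hedged sketch of the kind of distance-based NMS involved, returning per-detection scores that could then be averaged as a rough, ground-truth-free confidence signal (an assumption about usefulness, not a validated metric):

import numpy as np

def occupancy_nms(score_map, thresh=0.4, min_dist=5, top_k=50):
    # Greedy NMS over a 2D occupancy score map: keep high-scoring cells
    # that are at least min_dist grid cells away from anything kept so far.
    ys, xs = np.where(score_map > thresh)
    scores = score_map[ys, xs]
    keep = []
    for i in np.argsort(-scores):
        if all((ys[i] - ys[j]) ** 2 + (xs[i] - xs[j]) ** 2 >= min_dist ** 2
               for j in keep):
            keep.append(i)
        if len(keep) >= top_k:
            break
    return xs[keep], ys[keep], scores[keep]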

Different grid units between the two datasets

Hi, thanks for sharing such nice work! I am still a little confused about the worldgrid2worldcoord_matrix: why is the grid unit in MultiviewX meters, while in Wildtrack it is centimeters? Does this setting have any special meaning? If I set the Wildtrack one to meters as well, will it have any effect?
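
A hedged illustration of why the unit only matters insofar as it matches the calibration: the world coordinates fed through a camera's extrinsics must share the unit of the extrinsic translation, so switching Wildtrack's grid to meters would also require rescaling the calibration's translation. Rescaling both by the same factor leaves the projected pixels unchanged:

import numpy as np

def project(K, R, t, world_xyz):
    # Pinhole projection; world_xyz and t must share one unit, whatever it is.
    cam = R @ world_xyz + t
    uv = K @ cam
    return uv[:2] / uv[2]

K = np.diag([1000.0, 1000.0, 1.0])
R, t = np.eye(3), np.array([0.0, 0.0, 500.0])  # translation in cm
p = np.array([30.0, 40.0, 0.0])                # point in cm
# cm -> m on both the point and the translation: same pixel coordinates.
assert np.allclose(project(K, R, t, p), project(K, R, t / 100, p / 100))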

Pretrained model for WILDTRACK dataset

Congratulations on your work, I liked it very much! Would you be able to share a pretrained model for the WILDTRACK dataset? I do not have two GPUs available for training right now due to the COVID-19 pandemic. Best regards.

How to show the ground plane?

I'm running the code and don't know how to display the ground plane. Can you show me how? Also, how can I parse the .json files in annotations_positions? Thank you very much.
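
A minimal sketch of reading one of those files, assuming the Wildtrack-style annotation format (field names like personID and positionID should be verified against the actual files):

import json
from pathlib import Path

# Each annotations_positions/*.json file holds one frame's annotations.
frame = json.loads(Path('annotations_positions/00000000.json').read_text())
for person in frame:
    print(person.get('personID'), person.get('positionID'))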

Wrong MultiviewX grid visualization

I used the calibration from MultiviewX to run grid_visualize.py; the code is shown below.

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import cv2
from multiview_detector.utils import projection
from multiview_detector.datasets.MultiviewX import MultiviewX

if __name__ == '__main__':
    img = Image.open('D:/dataset/tracking/CMC/data/images/MultiviewX/Image_subsets/C1/0000.png')
    dataset = MultiviewX('D:/dataset/tracking/CMC/data/images/MultiviewX')
    # NOTE: these ranges correspond to Wildtrack's 480x1440 grid; per the
    # dataset description above, MultiviewX uses a 640x1000 grid, which may
    # be the source of the mismatch shown below.
    xi = np.arange(0, 480, 40)
    yi = np.arange(0, 1440, 40)
    world_grid = np.stack(np.meshgrid(xi, yi, indexing='ij')).reshape([2, -1])
    world_coord = dataset.get_worldcoord_from_worldgrid(world_grid)
    img_coord = projection.get_imagecoord_from_worldcoord(world_coord, dataset.intrinsic_matrices[0],
                                                          dataset.extrinsic_matrices[0])
    # keep only the grid points that project inside the 1920x1080 image
    img_coord = img_coord[:, np.where((img_coord[0] > 0) & (img_coord[1] > 0) &
                                      (img_coord[0] < 1920) & (img_coord[1] < 1080))[0]]
    plt.imshow(img)
    plt.show()
    img_coord = img_coord.astype(int).transpose()
    img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
    for point in img_coord:
        cv2.circle(img, tuple(point), 5, (0, 255, 0), -1)
    img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    img.save('img_grid_visualize_mx.png')
    plt.imshow(img)
    plt.show()

However, the result of the grid visualization does not seem to be correct (screenshot attached). Could you please help me correct it?

For comparison, here is the grid visualization for the Wildtrack dataset (screenshot attached).
