
chang-chia-chi / pedestrian-detection

Pedestrian Detection using Fast-R-CNN with PyTorch.

Jupyter Notebook 99.15% Python 0.85%
python pytorch pedestrian-detection fast-rcnn transfer-learning coco-dataset

Pedestrian-Detection

This is my final project for the digital image processing course offered at NCTU in 2020.

The goal is to detect pedestrians in pictures or video and to count the approximate number of people in the scene. Using a pre-trained Fast-R-CNN model and transfer learning, the performance is pretty good with only 260 training pictures [performance]. Because the problem is small in scale, Google Colaboratory is used for its easy environment setup and free GPU.
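The counting step can be sketched in a few lines. This is only an illustration, assuming the detector returns one confidence score and one label per box; the function name `count_people`, the label value `1` for "person", and the threshold `0.5` are assumptions made for this example and are not taken from the project code.

```python
def count_people(scores, labels, person_label=1, score_threshold=0.5):
    """Count detections labelled as a person whose score reaches the threshold."""
    return sum(
        1
        for score, label in zip(scores, labels)
        if label == person_label and score >= score_threshold
    )


# Hypothetical detector output for a single image.
scores = [0.98, 0.91, 0.42, 0.77]
labels = [1, 1, 1, 2]

print(count_people(scores, labels))  # 2
```

Raising the threshold trades missed pedestrians for fewer false positives, so the count stays approximate either way.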

Dataset

Two datasets are used: one is from the official PyTorch tutorial, and the other is 100 pictures randomly selected from WiderPerson.

| Dataset | Feature |
|---|---|
| PyTorch tutorial | Only two or three people in each picture. |
| WiderPerson | Tens of people in each picture, which greatly improves the model's ability to identify people in a crowd. |
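A random selection like the one used for WiderPerson can be made reproducible by fixing a seed. The sketch below is an assumption about how such a subset could be drawn, not the author's actual script; `pick_subset`, the seed value, and the file names are hypothetical.

```python
import random


def pick_subset(file_names, k=100, seed=0):
    """Return k file names chosen at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on directory listing order.
    return sorted(rng.sample(sorted(file_names), k))


# Hypothetical file names standing in for the WiderPerson images.
all_images = [f"{i:06d}.jpg" for i in range(1000)]
subset = pick_subset(all_images)

print(len(subset))  # 100
```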

See link for the training pictures and the corresponding COCO dataset JSON file.

COCO Dataset JSON Format (reference)

COCO is an abbreviation of Common Objects in COntext. Quoting cocodataset.org:

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.

COCO is used in many projects because it is one of the best image datasets available. The following briefly introduces the basic structure of the COCO dataset format so that you can prepare your own training data. You can find example Python code in link, which illustrates how to convert annotation or mask information into a JSON file.

Structure and section

Structure

{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}
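As a minimal sketch of this top-level structure, the skeleton can be built as a plain Python dictionary and serialized with the standard `json` module. The helper name `empty_coco` and the description string are made up for illustration.

```python
import json


def empty_coco(description):
    """Return a COCO-style skeleton with the five top-level sections."""
    return {
        "info": {"description": description},
        "licenses": [],
        "images": [],
        "annotations": [],
        "categories": [],
    }


dataset = empty_coco("pedestrian subset")
dataset["categories"].append(
    {"id": 1, "name": "person", "supercategory": "human"}
)

# Serialize to a JSON string; json.dump(dataset, f) would write it to a file.
text = json.dumps(dataset, indent=4)
print(text)
```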

Sections

1. Info: contains high level information about the dataset.

"info":{   
    "year": int,      
    "version": str,     
    "description": str,   
    "contributor": str,   
    "url": str,   
    "date_created": datetime,   
}   

2. licenses: contains a list of image licenses that apply to images in the dataset.

"licenses":[
        {
        "id": int,
        "name": str,
        "url": str,
        }
]

3. images: contains the complete list of images in your dataset.

image{    
    "id": int,    
    "width": int,   
    "height": int,    
    "file_name": str,   
    "license": int,   
    "flickr_url": str,    
    "coco_url": str,    
    "date_captured": datetime,    
}

4. annotations: contains a list of every individual object annotation from every image in the dataset.

annotation{   
    "id": int,    
    "image_id": int,
    "category_id": int,
    "segmentation": [], (Fast-R-Cnn does not need this information)
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

This section is the trickiest and the most important to understand, so the table below summarizes the purpose of each variable:

| variable | purpose |
|---|---|
| id | annotation id |
| image_id | id of the corresponding image |
| category_id | category that the object belongs to |
| segmentation | segmentation information for Mask-R-CNN |
| area | area of the marked object in the picture, usually computed as height * width of the box |
| bbox | x, y are the position of the top-left corner of the box; width and height are the box dimensions |
| iscrowd | specifies whether the segmentation is for a single object or for a group/cluster of objects |

P.S. Every marked object has exactly one annotation, so there is a one-to-one relationship between objects and annotation ids.
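To make the `bbox` and `area` fields concrete, here is a small sketch that builds one annotation from a box given by its corners and converts it back. The function names are hypothetical. The conversion is useful because COCO stores `[x, y, width, height]`, while many detection libraries, torchvision's detection models among them, expect `[x1, y1, x2, y2]`.

```python
def make_annotation(ann_id, image_id, category_id, box_xyxy):
    """Build a COCO-style annotation from a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box_xyxy
    width, height = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [],  # left empty: only boxes are used here
        "area": float(width * height),
        "bbox": [x1, y1, width, height],
        "iscrowd": 0,
    }


def bbox_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


ann = make_annotation(ann_id=1, image_id=7, category_id=1,
                      box_xyxy=(10, 20, 50, 100))
print(ann["bbox"], ann["area"])   # [10, 20, 40, 80] 3200.0
print(bbox_to_xyxy(ann["bbox"]))  # [10, 20, 50, 100]
```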

5. categories: contains a list of categories (e.g. dog, person), each of which belongs to a supercategory (e.g. animal, human).

{   
    "id": int,    
    "name": str,    
    "supercategory": str,   
}   
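Once all sections are filled in, the ids tie them together: each annotation points to an image through `image_id` and to a category through `category_id`. The sketch below shows one way to index a loaded dataset by those ids; the helper name `index_annotations` and the tiny in-memory dataset are assumptions made for this example.

```python
from collections import defaultdict


def index_annotations(coco):
    """Map category ids to names and group annotations by image id."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[ann["image_id"]].append(ann)
    return names, per_image


# Hypothetical dataset with one image and two annotated people.
coco = {
    "images": [{"id": 7, "width": 640, "height": 480, "file_name": "a.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 1, "bbox": [10, 20, 40, 80]},
        {"id": 2, "image_id": 7, "category_id": 1, "bbox": [200, 30, 35, 90]},
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "human"}],
}

names, per_image = index_annotations(coco)
print(len(per_image[7]), names[1])  # 2 person
```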
