cerenaut / preprocess-cifar


Preprocess CIFAR dataset, creating a set of images.

License: GNU General Public License v3.0


Preprocess-CIFAR

A tool for converting the CIFAR-10 and CIFAR-100 datasets into PNG images, with additional preprocessing options such as grayscaling.

Introduction

The tools provided are compatible with the CIFAR-10 and CIFAR-100 datasets, which contain 32x32 images and are subsets of the 80 Million Tiny Images dataset.

The CIFAR-10 dataset contains 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. CIFAR-100 is similar to CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class.

Benchmarks for CIFAR-10, CIFAR-100 and other datasets can be found here.

Preprocessing

The training and test datasets are provided in files that contain a 'pickled' object produced with cPickle.
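As a minimal sketch of what reading one of these files can look like, the function below unpickles a batch and splits features from labels. It assumes Python 3 (where cPickle is simply pickle) and the dictionary keys used by the official CIFAR downloads; the name load_batch is hypothetical and is not taken from this repository.

```python
import pickle


def load_batch(path):
    """Load one pickled CIFAR batch file and return (features, labels)."""
    with open(path, "rb") as f:
        # The official batches were pickled under Python 2; encoding="bytes"
        # lets Python 3 read them, at the cost of bytes dictionary keys.
        batch = pickle.load(f, encoding="bytes")
    features = batch[b"data"]  # one row of 3072 pixel values per image
    # CIFAR-10 stores labels under b"labels"; CIFAR-100 uses b"fine_labels".
    labels = batch.get(b"labels", batch.get(b"fine_labels"))
    return features, labels
```

Under Python 2.7 the encoding argument does not exist, so the call would be a plain cPickle.load(f) with str keys instead.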

The script loads the datasets into Numpy arrays and separates the features from the labels. The data is then converted to images and saved in the training and testing directories. The format for the filename is as follows: TYPE_RANDOM_LABEL_LABELCOUNT.png

  • TYPE: Indicates dataset type, could be either train or test
  • RANDOM: Short randomly generated UUID-style characters e.g. 7daa28
  • LABEL: The ground-truth label for the image (0-9 for CIFAR-10, 0-99 for CIFAR-100)
  • LABELCOUNT: A running count of how many times the label has been seen

This format is useful for quickly extracting information about the dataset and target labels from the filename, while ensuring that each image's filename is unique.
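The naming scheme can be sketched roughly as follows. This is an illustration rather than the repository's code: make_filename and parse_filename are hypothetical helpers, the random part is assumed to be the first six hex characters of a UUID, and the count is assumed to start at 1.

```python
import uuid
from collections import Counter

label_counts = Counter()  # running count of how often each label was seen


def make_filename(dataset_type, label):
    """Build a TYPE_RANDOM_LABEL_LABELCOUNT.png filename."""
    label_counts[label] += 1
    random_part = uuid.uuid4().hex[:6]  # six hex characters, like 7daa28
    return "{}_{}_{}_{}.png".format(
        dataset_type, random_part, label, label_counts[label])


def parse_filename(filename):
    """Split a filename back into its four fields."""
    stem = filename.rsplit(".", 1)[0]
    dataset_type, random_part, label, count = stem.split("_")
    return dataset_type, random_part, int(label), int(count)
```

Because the fields are separated by underscores, the label of any image can be recovered from its name alone, for example when building a training list from a directory listing.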

Getting Started

Requirements

  • Python 2.7+

Installation

Install the Python dependencies using pip: pip install -r REQUIREMENTS.txt

Usage

CIFAR-10

The training data in CIFAR-10 comes in 5 'batch' files, while the testing data comes in a single file. Before starting, ensure that you have the data_batch_1, data_batch_2, data_batch_3, data_batch_4, data_batch_5 and test_batch files provided here. The script accepts a folder path as the input directory containing these files. The output directory is not created automatically, so make sure it exists before running the script.
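A quick pre-flight check along these lines can catch a missing batch file or output directory before a run. It is a sketch only: check_paths is a hypothetical helper, not part of src/cifar10.py, and it assumes the batch files keep their original names.

```python
import os

# Batch filenames as they appear in the original CIFAR-10 download.
TRAIN_FILES = ["data_batch_{}".format(i) for i in range(1, 6)]
TEST_FILES = ["test_batch"]


def check_paths(dataset, input_folder, output_path):
    """Return a list of problems; an empty list means it is safe to run."""
    problems = []
    expected = TRAIN_FILES if dataset == "train" else TEST_FILES
    for name in expected:
        if not os.path.isfile(os.path.join(input_folder, name)):
            problems.append("missing input file: " + name)
    if not os.path.isdir(output_path):
        problems.append("output directory does not exist: " + output_path)
    return problems
```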

To preprocess the training set, use the following:

python src/cifar10.py --dataset train --input_folder /path/to/pickled/files --output_path /path/to/output/training

To preprocess the test set, use the following:

python src/cifar10.py --dataset test --input_folder /path/to/pickled/files --output_path /path/to/output/testing

Note: We assume that the filenames are kept intact from the original dataset. If they have been renamed, the filename constants inside src/cifar10.py can be changed to match.

CIFAR-100

Unlike CIFAR-10, the CIFAR-100 dataset comes in a single file for the training set and a single file for the test set. Before starting, ensure that you have the train and test files provided here. The script takes the path to the relevant file as its input. As above, the output directory is not created automatically, so make sure it exists before running the script.

To preprocess the training set, use the following:

python src/cifar100.py --dataset train --input_file /path/to/train --output_path /path/to/output/training

To preprocess the test set, use the following:

python src/cifar100.py --dataset test --input_file /path/to/test --output_path /path/to/output/testing

Grayscale

The original images are coloured. You may optionally pass the --grayscale parameter to convert them to grayscale.
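For readers curious what such a conversion involves, here is a minimal Numpy sketch. It assumes the standard CIFAR row layout (1024 red, 1024 green, then 1024 blue values) and uses the common ITU-R BT.601 luma weights; the script itself may use a different method, and both function names are hypothetical.

```python
import numpy as np


def row_to_image(row):
    """Turn one 3072-value CIFAR row into a 32x32x3 (height, width, RGB) array."""
    # Each row stores 1024 red, then 1024 green, then 1024 blue values,
    # with every channel laid out row by row.
    return np.asarray(row, dtype=np.uint8).reshape(3, 32, 32).transpose(1, 2, 0)


def to_grayscale(image):
    """Convert a 32x32x3 RGB array to a 32x32 grayscale array."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return np.rint(np.dot(image, weights)).astype(np.uint8)
```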

Logging

You may optionally pass the --logging info parameter to display the progress of the script, which looks like this:

...
[utils.py:120 - preprocess() - INFO] Step #6000: saved train_b9c487_42_65.png
[utils.py:120 - preprocess() - INFO] Step #7000: saved train_8961fc_14_71.png
[utils.py:120 - preprocess() - INFO] Step #8000: saved train_a300ef_91_76.png
[utils.py:120 - preprocess() - INFO] Step #9000: saved train_23da3c_42_93.png
[utils.py:120 - preprocess() - INFO] Step #10000: saved train_686758_48_103.png
[utils.py:120 - preprocess() - INFO] Step #11000: saved train_56e075_64_130.png
[utils.py:120 - preprocess() - INFO] Step #12000: saved train_fa3998_53_121.png
[utils.py:120 - preprocess() - INFO] Step #13000: saved train_378111_17_105.png
...
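Output in this shape can be produced with Python's standard logging module. The format string below is inferred from the sample lines above and is an assumption, as is the configure_logging helper; neither is copied from utils.py.

```python
import logging

# Format string inferred from the sample output above.
LOG_FORMAT = "[%(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s] %(message)s"


def configure_logging(level_name):
    """Configure the root logger from a level name such as 'info'."""
    level = getattr(logging, level_name.upper())
    logging.basicConfig(format=LOG_FORMAT, level=level)
```

With this in place, a call such as logging.info("Step #%d: saved %s", step, filename) inside a function named preprocess would print a line in the same layout.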

Reference

Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
