
foveate_blockwise

Real-time image and video foveation transform using PyCUDA

Foveation implementation using adaptive Gaussian blurring optimized for real-time performance. The algorithm exploits the CUDA architecture to generate the foveated image in blocks of varying blurring strength.

See it in action: https://youtu.be/Rr5oaiIsVbA

What is Block-wise Foveation?

[Figure: the block-wise foveation approach]

Our visual field is non-uniform. Acuity is high at the fovea, where cone photoreceptors are closely packed, allowing us to discern fine details. Cone density drops off sharply towards the periphery, giving us lower spatial acuity but greater awareness of our surroundings with a wide field of view.

Block-wise foveation enables real-time experimentation with spatial frequency models of human or animal retinas. It is built around the CUDA architecture, utilizing the parallel processing power of the GPU to perform spatially-variant Gaussian blur on the image frame.
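The idea can be sketched on the CPU with plain NumPy: split the frame into square blocks and blur each block with a Gaussian whose strength grows with the block's distance from the fixation point. This is only an illustrative sketch, not the repository's CUDA kernel; the function names, the linear sigma ramp, and the `max_sigma` parameter are assumptions for demonstration.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D normalized Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur_block(block, sigma):
    """Separable Gaussian blur of one 2-D block ('same'-mode convolution)."""
    if sigma <= 0:
        return block
    k = gaussian_kernel(sigma, radius=int(3 * sigma) + 1)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, block)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def foveate_blockwise(img, fixation, block=32, max_sigma=6.0):
    """CPU sketch of block-wise foveation on a greyscale image.

    Each block gets a blur sigma proportional to the distance between
    its centre and the fixation point (row, col).
    """
    h, w = img.shape
    out = img.astype(float).copy()
    max_dist = np.hypot(h, w)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cy, cx = by + block / 2, bx + block / 2
            dist = np.hypot(cy - fixation[0], cx - fixation[1])
            sigma = max_sigma * dist / max_dist
            out[by:by + block, bx:bx + block] = blur_block(
                out[by:by + block, bx:bx + block], sigma)
    return out
```

The CUDA version gains its speed by assigning each block to a grid of GPU threads so all blocks are blurred in parallel, rather than looping over them as above.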

Eye trackers enable us to discern the fixation point of the user and allocate resources appropriately, be it the rendering workload in virtual reality applications, or compression strength for streaming video. The flexibility and real-time performance of block-wise foveation supports psychophysical experiments to determine the parameters of a foveation pipeline, and serves as a showcase of the suitability of the CUDA architecture for GPU-accelerated foveation.

Getting Started

Blurring strength throughout the image frame can be defined in one of two ways:

  1. A circularly-symmetric function can be used to define the spatial frequency falloff with eccentricity from the fixation point - an implementation is provided based on parameters and psychometric functions sourced from Wilson S. Geisler, Jeffrey S. Perry, "Real-time foveated multiresolution system for low-bandwidth video communication," Proc. SPIE 3299, Human Vision and Electronic Imaging III, (17 July 1998).

  2. A greyscale image can be used as a map of retinal ganglion cell (RGC) density distribution and therefore the blurring strength across the image frame.
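For option 1, the key quantity is the highest spatial frequency still detectable at a given eccentricity, obtained by inverting the Geisler and Perry contrast-threshold model CT(f, e) = CT0 · exp(alpha · f · (e + e2) / e2). The sketch below uses parameter values commonly quoted for that model (alpha ≈ 0.106, e2 ≈ 2.3, CT0 = 1/64); the actual constants and function names in this repository may differ.

```python
import numpy as np

# Illustrative constants often quoted for the Geisler & Perry (1998) model;
# the repository's implementation may use different values.
ALPHA = 0.106    # spatial frequency decay constant
E2 = 2.3         # half-resolution eccentricity (degrees)
CT0 = 1.0 / 64   # minimum contrast threshold

def critical_frequency(ecc_deg):
    """Highest detectable spatial frequency (cycles/degree) at eccentricity
    ecc_deg, from setting CT(f, e) = 1 and solving for f."""
    return E2 * np.log(1.0 / CT0) / (ALPHA * (ecc_deg + E2))
```

The blur sigma for each block can then be chosen so the Gaussian's cutoff matches this critical frequency, which falls off smoothly with eccentricity.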

[Figure: examples of greyscale RGC maps and their foveation transforms]

The fixation point (center of gaze) can be displaced anywhere in the visual field.

We provide three files:

  1. foveate_blockwise.py: Foveates and displays/saves a single image from the /images directory.
  2. foveate_blockwise_track.py: A real-time foveation demo where the fixation point follows the mouse cursor.
  3. foveate_blockwise_draw.py: Similar to the tracking demo, but the user first draws a greyscale RGC mapping before seeing it in action on an image.

Install

This implementation requires the CUDA Toolkit and PyCUDA wrapper.

PyCUDA and other requirements can be installed using pip:

pip install -r requirements.txt

Run

To foveate a single image:

python src/foveate_blockwise.py -v

To foveate a particular image, place it in the /images directory and specify its name with the -i parameter. To save the result, use the -o option and provide the output directory and filename:

python src/foveate_blockwise.py -i my_image.jpg -o output/fov_image.png

To run the tracking demo:

python src/foveate_blockwise_track.py 

To run the drawing demo:

python src/foveate_blockwise_draw.py

Other available options for each file can be found with the -h parameter:

  • -h, --help: Displays help
  • -p, --gazePosition: Gaze position coordinates, vertical (down) then horizontal (right) (e.g. -p 512,512)
  • -f, --fragmentSize: Width and height of the foveation fragments (e.g. -f 16,16); default: 32 x 32
  • -v, --visualize: Show foveated images
  • -i, --inputFile: Input image from "images" folder
  • -o, --outputDir: Output directory and filename

Citation

If you found this code useful, consider citing:

@article{BlockwiseFoveation,
  author = {Malkin, Elian and Deza, Arturo and Poggio, Tomaso},
  title = {{CUDA}-{Optimized} real-time rendering of a {Foveated} {Visual} {System}},
  year = {2020},
  publisher = {OpenReview},
  journal = {Shared Visual Representations in Human \& Machine Intelligence},
  howpublished = {\url{https://openreview.net/forum?id=ZMsqkUadtZ7}}
}

License

MIT License.
