nikolasmarkou / blind_image_denoising

Implementing the ICLR 2020 paper "Robust and Interpretable Blind Image Denoising via Bias-Free Convolutional Neural Networks"

License: MIT License

computer-vision deep-learning deep-neural-networks tensorflow machine-learning keras keras-tensorflow keras-neural-networks python denoise

blind_image_denoising's Introduction

A library for blind image denoising algorithms using bias-free denoising CNNs.


Getting Started • License



Blind Image Denoising

The idea is that denoising is a task orthogonal to most medium/high-level computer vision tasks and should always be performed beforehand by a fast, independently trained, bias-free network. This lets any medium/high-level vision network focus on its main task.

Target

My target is to create a series of:

  • multi scale
  • interpretable
  • high performance
  • low memory footprint

models that perform denoising on an input (grayscale or colored) image.

The bias-free nature of the model allows for easy interpretation and use as a prior for hard inverse problems.

Interpretation

Interpretation comes naturally from implementing the ICLR 2020 paper:

"Robust and Interpretable Blind Image Denoising via Bias-Free Convolutional Neural Networks"

The paper reports excellent results.

The bias-free nature of the model means it is fully interpretable: each output pixel is an exact weighted sum of input pixels, so a weighting mask can be visualized for every output pixel, as shown below.
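Because the network has no additive bias terms, the gradient of an output pixel with respect to the input is exactly that pixel's weighting mask. Below is a minimal sketch of how such a mask could be extracted with TensorFlow; it is illustrative only (not part of the library) and assumes a Keras model that accepts a float32 tensor.

import tensorflow as tf

# Minimal sketch (not part of the library): for a bias-free network the output
# at a pixel is an exact weighted sum of input pixels, so the gradient of that
# output pixel with respect to the input is the weighting mask.
# Assumes `model` is a Keras model taking a float32 tensor of shape [1, H, W, C].
def weighting_mask(model, image, row, col, channel=0):
    image = tf.convert_to_tensor(image, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(image)
        output_pixel = model(image)[0, row, col, channel]
    # Same shape as the input image; visualize it as a per-pixel mask.
    return tape.gradient(output_pixel, image)[0]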

Corruption types

To train such a model, we corrupt the input image with several types of noise and then try to recover the original image:

  • subsampling noise
  • normally distributed additive noise (same across channels / different per channel)
  • normally distributed multiplicative noise (same across channels / different per channel)

In addition, a small 3x3 smoothing kernel is applied probabilistically to blend the noise into the image.
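As an illustration only (this is not the library's actual input pipeline), a corruption step along these lines could look as follows; all parameter values are assumptions.

import tensorflow as tf

# Illustrative corruption sketch, not the library's actual pipeline.
# `image` is a float32 tensor in [0, 255] with shape [H, W, C].
def corrupt(image, additive_std=20.0, multiplicative_std=0.1, blur_prob=0.5):
    shape = tf.shape(image)
    # additive gaussian noise (same sigma for every channel)
    noisy = image + tf.random.normal(shape, stddev=additive_std)
    # multiplicative gaussian noise
    noisy = noisy * (1.0 + tf.random.normal(shape, stddev=multiplicative_std))
    # probabilistically smooth the noise with a small 3x3 kernel
    if tf.random.uniform(()) < blur_prob:
        channels = image.shape[-1]
        kernel = tf.ones([3, 3, channels, 1]) / 9.0
        noisy = tf.nn.depthwise_conv2d(
            noisy[tf.newaxis, ...], kernel,
            strides=[1, 1, 1, 1], padding="SAME")[0]
    return tf.clip_by_value(noisy, 0.0, 255.0)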

Pretrained models

Currently there are 3 pretrained models, all resnet variants with depths 6, 12, and 18. They were trained for 20 epochs on the KITTI, Megadepth, BDD, WIDER, and WFLW datasets.

Image examples

The following samples are 256x256 crops from the KITTI dataset, denoised using the resnet_color_1x18_bn_16x3x3_256x256_l1_relu model.

We add truncated normal noise with different standard deviations and calculate the Mean Absolute Error (MAE) for both the noisy and the denoised images. The pixel range is 0-255.

We can clearly see that the model adapts well to different ranges of noise.

noise (std)    MAE (noisy)    MAE (denoised)
1              0.65           4.33
5              3.50           3.39
10             6.44           5.19
20             13.22          6.60
30             19.84          8.46
40             27.02          12.95
50             30.59          15.06
60             34.34          17.81
70             40.64          22.36
80             45.68          27.99
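A hedged sketch of how numbers like these could be reproduced (the exact evaluation code may differ); it assumes the model accepts and returns uint8 tensors of shape [1, 256, 256, 3] with values in 0-255.

import tensorflow as tf
import bfcnn

# Illustrative evaluation sketch; the exact evaluation code may differ.
model = bfcnn.load_model("resnet_color_1x18_bn_16x3x3_256x256_l1_relu")

def evaluate(clean_uint8, std):
    # clean_uint8: uint8 tensor of shape [1, 256, 256, 3]
    clean = tf.cast(clean_uint8, tf.float32)
    noise = tf.random.truncated_normal(tf.shape(clean), stddev=std)
    noisy = tf.clip_by_value(clean + noise, 0.0, 255.0)
    denoised = tf.cast(model(tf.cast(noisy, tf.uint8)), tf.float32)
    mae_noisy = tf.reduce_mean(tf.abs(noisy - clean))
    mae_denoised = tf.reduce_mean(tf.abs(denoised - clean))
    return float(mae_noisy), float(mae_denoised)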

How to use (from scratch)

  1. prepare training input
  2. prepare training configuration
  3. run training
  4. export to tflite and saved_model format
  5. use models

Train

Prepare a training configuration and train with the following command:

python -m bfcnn.train \
    --model-directory ${TRAINING_DIR} \
    --pipeline-config ${PIPELINE}

Export

Export to frozen graph and/or tflite with the following command:

python -m bfcnn.export \
    --checkpoint-directory ${TRAINING_DIR} \
    --pipeline-config ${PIPELINE} \
    --output-directory ${OUTPUT_DIR} \
    --to-tflite

How to use (pretrained)

Use any of the pretrained models included in the package.

import bfcnn
import tensorflow as tf

# load model
denoiser_model = \
    bfcnn.load_model(
        "resnet_color_1x6_bn_16x3x3_256x256_l1_relu")

# create random tensor
input_tensor = \
    tf.random.uniform(
        shape=[1, 256, 256, 3],
        minval=0,
        maxval=255,
        dtype=tf.int32)
input_tensor = \
    tf.cast(
        input_tensor,
        dtype=tf.uint8)

# run inference
denoised_tensor = denoiser_model(input_tensor)
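A possible follow-up step, assuming the returned tensor has shape [1, 256, 256, 3] with values in the 0-255 range (the output dtype is cast explicitly, since it may not already be uint8):

# encode and save the result (illustrative follow-up, not part of the library example)
denoised_uint8 = tf.cast(tf.squeeze(denoised_tensor, axis=0), tf.uint8)
tf.io.write_file("denoised.png", tf.io.encode_png(denoised_uint8))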

Designing the best possible denoiser

  1. Add a small hinge to the MAE loss; values of 0.5 - 2.0 (out of 255) seem to work very well (see the sketch after this list).
  2. Multiscale models work better; 3-4 scales is ideal. LUnet seems to perform very well.
  3. Soft-Orthogonal regularization provides better generalization, but it is slower to train.
  4. Effective Receptive Field regularization provides better generalization, but it is slower to train.
  5. Squeeze-and-Excite provides a small boost without many additional parameters.
  6. Avoid Batch Normalization at the end.
  7. Residual learning (learning the noise) trains faster and gives better metrics, but may produce artifacts, so it is better avoided.

All these options are supported in the configuration.
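As referenced in point 1, a minimal sketch of an MAE loss with a small hinge (illustrative only, not the library's exact implementation):

import tensorflow as tf

# MAE with a small hinge: errors below `hinge` (in 0-255 pixel units) are ignored,
# so the model is not penalized for imperceptible differences.
def hinge_mae_loss(y_true, y_pred, hinge=1.0):
    error = tf.abs(y_true - y_pred)
    return tf.reduce_mean(tf.nn.relu(error - hinge))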

Model types

We have used traditional (bias-free) architectures:

  • resnet
  • resnet with sparse constraint
  • resnet with on/off per resnet block gates
  • all the above models with multi-scale processing

Multi-Scale

The system is trained at multiple scales by implementing ideas from LapSRN (Laplacian Pyramid Super-Resolution Network) and MS-LapSRN (Multi-Scale Laplacian Pyramid Super-Resolution Network).

Low-Memory footprint

By using a Gaussian pyramid and a bias-free CNN model shared between scales, we keep the model small enough to run on very small devices while ensuring a large enough ERF (effective receptive field) for the task at hand.

Additions

Our addition (not in the paper) is the Laplacian multi-scale pyramid, which expands the effective receptive field without the need to add many more layers (keeping it computationally cheap).

It breaks the original image down into 3 different scales and processes them independently.

We also have the option to add residuals at the end of each processing level, so it works like an iterative process.

Another addition (not in the paper) is the Gaussian multi-scale pyramid, which likewise expands the effective receptive field without the need to add many more layers. A sketch of the pyramid decomposition follows.
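A minimal sketch of a 3-level Laplacian pyramid decomposition (illustrative; the library's own implementation may differ):

import tensorflow as tf

# Illustrative 3-level Laplacian pyramid; the library's implementation may differ.
# `image` is a float32 tensor of shape [B, H, W, C] with H and W divisible by 4.
def gaussian_blur(image):
    kernel_1d = tf.constant([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    kernel_2d = tf.tensordot(kernel_1d, kernel_1d, axes=0)
    channels = image.shape[-1]
    kernel = tf.tile(kernel_2d[:, :, tf.newaxis, tf.newaxis], [1, 1, channels, 1])
    return tf.nn.depthwise_conv2d(image, kernel, strides=[1, 1, 1, 1], padding="SAME")

def laplacian_pyramid(image, levels=3):
    pyramid = []
    current = image
    for _ in range(levels - 1):
        blurred = gaussian_blur(current)
        downsampled = blurred[:, ::2, ::2, :]
        upsampled = tf.image.resize(downsampled, tf.shape(current)[1:3])
        pyramid.append(current - upsampled)  # high-frequency residual at this scale
        current = downsampled
    pyramid.append(current)  # coarsest (Gaussian) level
    return pyramid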

Every resnet block can optionally include a residual squeeze-and-excite element (not in the paper).
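For reference, a minimal sketch of a standard squeeze-and-excite element (the library's residual variant may differ; the names and reduction ratio here are assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# Standard squeeze-and-excite gating on a feature map `x` of shape [B, H, W, C].
def squeeze_excite(x, ratio=4):
    channels = x.shape[-1]
    se = layers.GlobalAveragePooling2D()(x)                  # squeeze: [B, C]
    se = layers.Dense(channels // ratio, activation="relu")(se)
    se = layers.Dense(channels, activation="sigmoid")(se)    # per-channel gates
    se = layers.Reshape((1, 1, channels))(se)
    return x * se                                            # rescale the feature map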

Normalization layer

Our addition (not in the paper) is a non-channel-wise, non-learnable normalization layer (not BatchNorm) after the depthwise operations. This is meant to enforce sparsity together with the differentiable relu below.

Differentiable RELU

Our addition (not in the paper) is a differentiable relu for specific operations.

We added an optional orthogonality regularization constraint, as found in the paper "Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?". This forces a soft orthonormal constraint on the kernels.

Custom regularization that forces a soft orthogonal constraint on the kernels while still allowing the kernels to grow independently or shrink to almost zero.
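A minimal sketch of a plain soft-orthogonality penalty in the spirit of the cited paper (the library's custom variant, which relaxes the diagonal so kernels can grow or shrink independently, differs):

import tensorflow as tf

# Soft orthogonality penalty ||W^T W - I||_F^2 on a conv kernel
# of shape [kh, kw, in_channels, out_channels].
def soft_orthogonal_penalty(kernel, weight=1e-4):
    w = tf.reshape(kernel, [-1, tf.shape(kernel)[-1]])   # [kh*kw*in, out]
    gram = tf.matmul(w, w, transpose_a=True)             # [out, out]
    identity = tf.eye(tf.shape(gram)[0])
    return weight * tf.reduce_sum(tf.square(gram - identity))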

Custom regularization that gives convolutional kernels an incentive to have higher weights away from the center.

References

  1. Robust and interpretable blind image denoising via bias-free convolutional neural networks
  2. Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks
  3. Densely Residual Laplacian Super-Resolution
  4. Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?
  5. Squeeze-and-Excitation Networks

Special Thanks

I would like to thank Pantelis Georgiades and Alexandros Georgiou from the Cyprus Institute for running invaluable hyperparameter searches for me on their supercomputer. Their help accelerated this project enormously.

blind_image_denoising's People

Contributors

nikolasmarkou


Forkers

devhliu

blind_image_denoising's Issues

Human Perceptual Loss

A fantastic repository, thank you.

I'm just getting started with it, but I thought I'd reach out and ask whether you'd accept a pull request that trains for human perceptual quality rather than MAE / PSNR in the future?

I'm thinking a simple way to achieve a partial solution is to retrain on images in a colourspace like OKLAB where perceptual difference is baked in, and the perceived colour difference formula is trivial, instead of monstrous!

I was also thinking 'edge loss' or extra channels for the H&V image gradients during training with L1 loss, discarded after, could be good.

A Rolls-Royce solution might be an adversarial loss, perhaps using a secondary network like Netflix's VMAF or something?

If there are any resources related to perceptual quality rather than PSNR, please do point me in the right direction :-)

Add SSIM in loss function

Use "Image Quality Assessment: From Error Visibility to Structural Similarity, 2004" to enhance the loss function
