iamaaditya / image-compression-cnn

Semantic JPEG image compression using deep convolutional neural network (CNN)

License: MIT License

Topics: cnn, image-compression, jpeg, deep-convolutional-networks

image-compression-cnn's Introduction

Semantic Perceptual Image Compression using Deep Convolution Networks

This code accompanies the paper (arXiv); the abstract of the paper is provided at the bottom of this page. It consists of three parts:

  1. Code to generate the Multi-Structure Region of Interest (MSROI) map (this uses a CNN model; a pretrained model is provided)

  2. Code to use MSROI map to semantically compress image as JPEG

  3. Code to train a CNN model (to be used by 1)

Requirements:

  1. Tensorflow
  2. Numpy
  3. Pandas
  4. Python PIL
  5. Python SKimage

For detailed requirements list please see requirements.txt

Recommended:

  1. Imagemagick (for faster image operations)
  2. VQMT (for obtaining metrics to compare images)


How to use this code?

Generating Map

```
python generate_map.py <image_file>
```

Generates the map and overlay files inside the 'output' directory.

If you get this error:

```
InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: 
Failed to get matching files on models/model-50: Not found: models
```

It means you have not downloaded the model file, or it is not accessible. The code expects the model files inside the 'models' directory. The model has been uploaded to GitHub, but if it does not download due to GitHub's file-size restriction you may download it from https://www.dropbox.com/s/izfas78534qjg08/models.tar.gz?dl=0

Compressing image using the Map

```
python combine_images.py -image <image_file> -map <map_file>
```

The map file is the file generated by the previous step. The default name for the map is output/msroi_map.jpg.

There are several other command-line options; please check the code for more details.

IMPORTANT: The current default setting has a threshold of 20%, i.e. the compressed file size is allowed to be up to 20% larger than the standard JPEG. This is done so that the difference in 'semantic object' compression can be visually examined. For a fair comparison, use '-threshold_pct 1'.
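As a rough sketch of what that threshold means (the function and file names below are illustrative, not the repository's exact code in combine_images.py):

```python
import os

def within_threshold(compressed_path, jpeg_path, threshold_pct=20):
    """Accept the semantically compressed file only if it is at most
    threshold_pct percent larger than the standard JPEG baseline."""
    current_size = os.path.getsize(compressed_path)
    original_size = os.path.getsize(jpeg_path)
    return current_size <= original_size * (1 + threshold_pct / 100.0)
```

With the default threshold of 20, a 110 KB output against a 100 KB JPEG passes; with '-threshold_pct 1' it would not.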

Training your own model

To train your own model, you will need class-labelled training examples, such as CIFAR, Caltech, or ImageNet. There is no need for 'localization' ground truth.

  1. Generate the data pickles:

     python prepare_data.py

     Make sure that self.images points to the directory containing the images.

  2. It is not required to use pretrained VGG weights, but training will be faster if you do. You may download the pretrained weights, referred to in the params file as vgg_weights, from here.

  3. Use train.py to train the model. Models will be saved in the 'models' directory every 10 epochs. All the parameters and hyper-parameters can be adjusted in param.py.

Evaluating metrics

  1. Use the '-print_metrics' flag when calling 'combine_images.py'. This will print the metrics on STDOUT in this format:

     jpeg_psnr,jpeg_ssim,our_ssim,our_q,jpeg_psnrhvs,png_size,model_number,our_size,filename,jpeg_vifp,jpeg_q,jpeg_msssim,our_psnrhvsm,jpeg_psnrhvsm,our_vifp,our_psnr,our_msssim,our_psnrhvs,jpeg_size

  2. Pass a file containing one metrics line per image (as shown above) to 'read_log.py'. This will print various stats and also plot the graphs shown in the paper.
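For convenience, one such line can be parsed into a dictionary keyed by the column names above (a small illustrative helper, not part of the repository):

```python
# Column names, in the order printed by '-print_metrics' (see above)
METRIC_COLUMNS = (
    "jpeg_psnr,jpeg_ssim,our_ssim,our_q,jpeg_psnrhvs,png_size,"
    "model_number,our_size,filename,jpeg_vifp,jpeg_q,jpeg_msssim,"
    "our_psnrhvsm,jpeg_psnrhvsm,our_vifp,our_psnr,our_msssim,"
    "our_psnrhvs,jpeg_size"
).split(",")

def parse_metrics_line(line):
    """Map one comma-separated metrics line to a {column: value} dict."""
    return dict(zip(METRIC_COLUMNS, line.strip().split(",")))
```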

Multi-Structure Region-of-interest

Comparison of MSROI with other techniques. Only our model identifies the face of the boy on the right, as well as the hands of both children at the bottom.

What this is

  • Find all semantic regions in an image in a single pass
  • Train without the localization data
  • Maximize the number of objects detected (maybe all?)
  • Need not be precise
  • It is used for image compression because we need less precision but more generic information about the content of the image

What this is NOT

Design Choices

  • Tensorflow 3D convolutions for class invariant features

  • Multi-label nn.softmax instead of nn.sparse (non-exclusive classes)

  • Argsort and not argmax to obtain top-k class information
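The argsort choice can be illustrated with a minimal sketch (the array below is a hypothetical per-class score vector, not the model's actual output):

```python
import numpy as np

# Hypothetical class scores for one image (not the model's real output)
scores = np.array([0.10, 0.70, 0.05, 0.60])
k = 2

top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest-scoring classes
# np.argmax(scores) would return only class 1; argsort also keeps class 3,
# which matters when an image contains several non-exclusive objects.
```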

FAQ about image compression

  1. Is the final image really a standard JPEG?

    Yes, the final image is a standard JPEG as it is encoded using standard JPEG.

  2. But how can you improve JPEG using JPEG ?

    Standard JPEG uses an image-level quantization scaling Q. However, not all parts of the image need to be compressed at the same level. Our method allows the use of a variable Q.

  3. Don't we have to store the variable Q in the image file?

    No. Because the final image is encoded using a single Q. Please see Section 4 of our paper.
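The idea behind the variable-Q answer can be sketched as a toy version (this is illustrative only, not the repository's combine_images.py): encode the image at two qualities, pick the high-quality pixels inside the salient regions given by the MSROI map, and then the blended result can be re-encoded once with a single Q, so the output is a plain JPEG.

```python
from io import BytesIO

import numpy as np
from PIL import Image

def blend_by_map(image, msroi_map, q_low=30, q_high=85):
    """Toy sketch: combine a low-quality and a high-quality JPEG encoding
    of `image`, weighted per pixel by the saliency map `msroi_map`."""
    def encode_decode(img, q):
        # Round-trip through JPEG at quality q to get its artifacts
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        return np.asarray(Image.open(buf), dtype=np.float32)

    low = encode_decode(image, q_low)
    high = encode_decode(image, q_high)
    # Normalize the map to [0, 1]; 1 = salient, keep high quality there
    mask = np.asarray(msroi_map.convert("L"), dtype=np.float32)[..., None] / 255.0
    blended = mask * high + (1.0 - mask) * low
    return Image.fromarray(blended.astype(np.uint8))
```

Saving the result with a single quality, e.g. out.save('blended.jpg', quality=75), stores it as an ordinary JPEG readable by any decoder.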

Possible Issues

  1. Cannot import Image (in util.py). Resolution: change it to from PIL import Image

  2. ValueError: setting an array element with a sequence. Resolution: the image file you are passing does not exist

  3. UserWarning: Possible precision loss when converting from float32 to uint8. Resolution: this is only a warning from skimage; nothing needs to be done

  4. No message or no output. The default behaviour of the program is to print nothing unless there is an error or verbose is set to True. Check the 'output' directory for the output files. If there is no 'output' directory, create one and run the code again.
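For the missing-directory case above, a couple of standard-library lines run before the scripts avoid the problem (this is plain Python, not repository code):

```python
import os

# Create the 'output' directory the scripts write into, if it is missing
# (works on both Python 2 and Python 3)
if not os.path.isdir("output"):
    os.makedirs("output")
```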

Credits

My sincere thanks to @jazzsaxmafia, @carpedm20 and @metalbubble from whose code I learned and borrowed heavily.

Abstract

========================================================================================================================

It has long been considered a significant problem to improve the visual quality of lossy image and video compression. Recent advances in computing power, together with the availability of large training data sets, have increased interest in the application of deep learning CNNs to address image recognition and image processing tasks. Here, we present a powerful CNN tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. A modest increase in complexity is incorporated to the encoder which allows a standard, off-the-shelf JPEG decoder to be used. While JPEG encoding may be optimized for generic images, the process is ultimately unaware of the specific content of the image to be compressed. Our technique makes JPEG content-aware by designing and training a model to identify multiple semantic regions in a given image. Unlike object detection techniques, our model does not require labeling of object positions and is able to identify objects in a single pass. We present a new CNN architecture directed specifically to image compression: by adding a complete set of features for every class, and then taking a threshold over the sum of all feature activations, we generate a map that highlights semantically-salient regions so that they can be encoded at a better quality compared to background regions. Experiments are presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset, in which our algorithm achieves higher visual quality for the same compressed size.

image-compression-cnn's People

Contributors

dependabot[bot], iamaaditya


image-compression-cnn's Issues

Issue with generate_map.py: "TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32"

python generate_map.py test.jpg
2017-03-12 06:16:15: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.8.0.dylib locally
2017-03-12 06:16:15: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.5.dylib locally
2017-03-12 06:16:15: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.8.0.dylib locally
2017-03-12 06:16:15: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.1.dylib locally
2017-03-12 06:16:15: I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.8.0.dylib locally
Traceback (most recent call last):
  File "generate_map.py", line 34, in <module>
    conv_last, gap, class_prob = cnn.build(images_tf)
  File "/Users/ME/github/neural-compression/image-compression-cnn/model.py", line 102, in build
    image = self.image_conversion_scaling(image)
  File "/Users/ME/github/neural-compression/image-compression-cnn/model.py", line 95, in image_conversion_scaling
    r, g, b = tf.split(3, 3, image)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1222, in split
    split_dim=axis, num_split=num_or_size_splits, value=value, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3426, in _split
    num_split=num_split, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 509, in apply_op
    (prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.

I have downloaded the model files and had to change line 7 of params.py to correctly load the caffe_layers_value.pickle data.

But for some reason I can't generate the ROI map – is my TensorFlow version (1.0.1) too new? You have set the version to 0.11.0 in requirements.txt. Do you think the new TF version breaks the code?
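The traceback above points at the tf.split call, and the argument order of tf.split did change between TensorFlow 0.x and 1.0, which would explain the type error. A hedged sketch of the fix (assuming model.py's variable names), with the split itself illustrated in NumPy:

```python
import numpy as np

# In TF <= 0.12 the signature was tf.split(split_dim, num_split, value);
# in TF >= 1.0 it is tf.split(value, num_or_size_splits, axis). So the
# failing call in model.py, r, g, b = tf.split(3, 3, image), would become:
#     r, g, b = tf.split(image, 3, axis=3)
# The operation itself just splits the channel axis, as NumPy shows:
image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # batch of one RGB image
r, g, b = np.split(image, 3, axis=3)                  # three (1, 224, 224, 1) arrays
```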

ROI range size

Can I change the range of the output heat map by adjusting the parameters? In other words, can I make the ROI area wider?

Some problems in the code

In combine_images.py, line 143: if current_size <= original_size * (1 + args.threshold_pct / 100.0). This means current_size can be larger than the standard JPEG size, because threshold_pct is set to 20 by default. But in the paper you say your algorithm achieves higher visual quality for the same compressed size. How should I understand this?

Python3 support

Nice project! It would be great if you added Python 3 support, though. I just switched down to Python 2.7 to get things running for now, but it would be nice to future-proof the project!

Some questions about training

Hello, could you help me with the questions below?

  1. When trying to train my own model, an error occurred:

     python prepare_data.py
     File "prepare_data.py", line 14
     label_dict = pd.Series(labels, index = label_names) label_dict - = 1
     ^
     SyntaxError: invalid syntax

     Is there an error in the latest code on master?

  2. Before training my own model, is there any data that should be prepared?
     a. You say we need class-labelled training examples, such as CIFAR, Caltech, or ImageNet. Should I install CIFAR, Caltech, or ImageNet on my PC?
     b. What type of images can be used for training? Are there size, content, or format limitations?
     c. Is there anywhere to download those images?

Thanks

Image compression problem?

By how much on average is an input JPG supposed to be compressed? I am not seeing any compression in file size.

combine_images.py very slow?

Hi,

The 2nd step (Compressing image using the Map) is very slow, i.e:

$ python combine_images.py -image test.jpg -map output/msroi_map.jpg

(the image size is 4032x3024)

took ~22 minutes!

Is this normal? Where should I check for this slowness?

Error when training

Hello,

  1. When I train my own model, I get the error below:

     DataLossError (see above for traceback): Unable to open table file models/model_0630.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
     [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

I have modified params.py as follows:

self.num_epochs = 1 ##200
self.batch_size = 64 #32
self.max_iters = 2 ## 200000
self.test_every_iter = 10 ##200

After training finished successfully, model.data-00000-of-00001 was generated in the models folder, but there is an error when generating the map with this model.

  2. Is it possible to train the model in pb format? I changed train.py as follows:

     output_graph_def = convert_variables_to_constants(sess, sess.graph_def, ["output"])
     with tf.gfile.FastGFile("models\model.pb", "wb") as f:
         f.write(output_graph_def.SerializeToString())

But it gives this error:

  File "train_hj.py", line 113, in <module>
    output_graph_def = convert_variables_to_constants(sess, sess.graph_def, ["class"])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py", line 202, in convert_variables_to_constants
    inference_graph = extract_sub_graph(input_graph_def, output_node_names)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py", line 141, in extract_sub_graph
    assert d in name_to_node_map, "%s is not in graph" % d
AssertionError: class is not in graph

Could you point out where the error is?
Thanks.

ValueError: invalid literal for int() with base 10: ''

Traceback (most recent call last):
  File "prepare_data.py", line 24, in <module>
    label_dict = pd.Series(labels, index = label_names) - 1
  File "/home/tq/.local/lib/python3.5/site-packages/pandas/core/series.py", line 239, in __init__
    data = list(data)
  File "prepare_data.py", line 22, in <lambda>
    labels = map(lambda x: int(x), labels)
ValueError: invalid literal for int() with base 10: ''

How to calculate the PSNR of salient regions

Hello, sorry to disturb you!
I downloaded your code, but it cannot calculate the PSNR of the salient regions, which is a very important metric: it demonstrates that this compression method is indeed able to preserve visual quality in the targeted regions. Would you please tell me how to do that? I would be grateful; this troubles me a lot.

Failed when running python generate_map.py image.png

python generate_map.py image.png

/usr/local/lib/python2.7/dist-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
Traceback (most recent call last):
  File "generate_map.py", line 32, in <module>
    cnn.load_vgg_weights()
  File "/home/linux/work/tensorflow/image-compression-cnn/model.py", line 33, in load_vgg_weights
    with open(hyper.vgg_weights) as f:
IOError: [Errno 2] No such file or directory: './data/caffe_layers_value.pickle'

Is Caffe also needed?
My TensorFlow version is tensorflow-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl.

Thanks.


Error with generate map

When I run the command python generate_map.py <image_file>, this error happens. My TensorFlow version is 1.1.0.

/home/shuai/.local/lib/python2.7/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
Traceback (most recent call last):
  File "generate_map.py", line 32, in <module>
    cnn.load_vgg_weights()
  File "/home/shuai/image-compression-cnn/model.py", line 33, in load_vgg_weights
    with open(hyper.vgg_weights) as f:
IOError: [Errno 2] No such file or directory: './data/caffe_layers_value.pickle'

The map of one image differs between runs

@iamaaditya Excuse me, I have met a strange problem: when I use the model you provided (generate_map.py) to generate a map for the same picture multiple times, the ROI on the map is different each time. Have you ever met this problem? Could you tell me why this happens? I think the map of one image should be the same if the parameters in the model are fixed.

Errors in model.py

Hello, thanks for your generosity in providing the source code. I got an error when I ran train.py, which occurs in model.py at conv6_1. The error message is "KeyError". I imagine that VGG has no conv6_1, while the code tries to extract weights and biases for conv6_1. So should I change line 48 to if hyper.fine_tuning and name != 'conv6' and name != 'conv6_1' and name != 'depth': ?
Thank you again.
