MISC

The official repo for MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

Dependencies

GPT-4 Vision

CLIP_Surgery

Stable Diffusion 2.1

DiffBIR

CompressAI

Instructions

Download weights and put them into the weight folder:

DiffBIR (general_full_v1.ckpt): link

Cheng2020-Tuned (cheng_small.pth.tar): link

If you want to use 'mask', download the CLIP_Surgery model and put the 'clip' folder in the same directory as this project.
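Before opening the notebook, it can help to confirm that the files are where the steps above expect them. The following is a minimal sketch, not part of the repo: the helper name `missing_weights` is hypothetical, and it assumes that the weight folder is named `weight` and that the `clip` folder sits in the project root.

```python
from pathlib import Path

# File names come from the instructions above; the folder layout is an assumption.
REQUIRED_WEIGHTS = ("general_full_v1.ckpt", "cheng_small.pth.tar")


def missing_weights(project_root, use_mask=False):
    """Return the expected paths that do not exist yet (hypothetical helper)."""
    root = Path(project_root)
    expected = [root / "weight" / name for name in REQUIRED_WEIGHTS]
    if use_mask:
        # Assumption: the CLIP_Surgery 'clip' folder is placed in the project root.
        expected.append(root / "clip")
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_weights(".", use_mask=True):
        print(f"missing: {path}")
```

Running it from the project root prints one line per missing file or folder and prints nothing when everything is in place.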

Run the ipynb notebook in one of the following modes to decompress the image:

  1. For pixel-instructed decoding, set the mode to 'pixel'. A larger 'block_num_min' keeps more pixels, at the cost of a higher bpp.

  2. For net-instructed decoding, set the mode to 'net' to use our fine-tuned Cheng-2020 net. You can also use your own network weights trained with CompressAI.

  3. To use another model (such as VVC or HiFiC) as the starting point of diffusion, set the mode to 'ref', run your own model, and provide its decompressed image and bpp.
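In 'ref' mode you supply the bitrate of your own model. If you only have the compressed file, the usual definition of bits per pixel (bpp) is the total number of bits divided by the number of pixels. The sketch below illustrates that definition; the function name `bits_per_pixel` is hypothetical and is not taken from the notebook, so check how the notebook expects the value to be passed.

```python
def bits_per_pixel(num_bytes, width, height):
    """Bitrate of a compressed image: total bits divided by pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return num_bytes * 8 / (width * height)


# Example: a 1,536-byte bitstream for a 512x512 image.
print(bits_per_pixel(1536, 512, 512))  # 0.046875
```

For a file on disk, `os.path.getsize(path)` gives the byte count to pass as `num_bytes`.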

Demo

[Feb 29, 2024] A simple Jupyter demo has been uploaded. The encoder and decoder model weights will be uploaded soon.

[Apr 24, 2024] The model weights have been uploaded. Please follow the instructions when using the ipynb file. We are working on a pipeline for encoding and decoding a group of images.

Visualization Result

Citation

If you find our work useful, please cite our paper as:

@misc{li2024misc,
      title={MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model}, 
      author={Chunyi Li and Guo Lu and Donghui Feng and Haoning Wu and Zicheng Zhang and Xiaohong Liu and Guangtao Zhai and Weisi Lin and Wenjun Zhang},
      year={2024},
      eprint={2402.16749},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
