clip-gaze

An art analysis tool powered by CLIP.

Motivation

Diffusion models (such as Stable Diffusion) used OpenAI's CLIP in order to perform textual analysis of their training data. Precisely what these machine learning systems actually learned from their training data is opaque. This tool helps us understand how CLIP, and therefore the models that use CLIP, see images.

What does it do?

Given an image and a series of (text) phrases, it calculates the relative likelihood that each phrase is a good description of the image. Note that this is not the same thing as "given a text phrase, calculate the accuracy of that phrase": the scores only compare the supplied phrases with one another.
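To make "relative" concrete, here is a minimal sketch. It assumes the raw scores are normalised with a softmax, which is the usual way CLIP scores are turned into percentages; this README does not state the exact method. The function name and the scores are made up for illustration and are not part of clip_gaze.

```python
import math


def relative_likelihoods(scores):
    """Softmax: turn raw similarity scores into shares that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    weights = {phrase: math.exp(s - peak) for phrase, s in scores.items()}
    total = sum(weights.values())
    return {phrase: w / total for phrase, w in weights.items()}


# Made-up scores for illustration only; real values would come from CLIP.
scores = {"on canvas": 4.0, "on wood": 1.0}
two_way = relative_likelihoods(scores)

# Adding a third candidate lowers every existing share, even though
# the image and the original scores have not changed.
three_way = relative_likelihoods({**scores, "on paperboard": 3.0})
```

Under this assumption a high percentage means "better than the other phrases offered", not "correct". If none of the phrases fit the image, one of them will still come out on top.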

An example

Let's show it the painting "Brücke über die Marne bei Creteil" by Cézanne. If we download the 2,175 × 1,713 pixel version of the painting and open it as image (e.g. using PIL.Image.open from the pillow package), we can then pass it to the gaze command.

# Assuming you have already saved the image to `image`
import pprint

import clip_gaze

pprint.pprint(
    {
        "artist": clip_gaze.gaze(image, clip_gaze.ARTISTS_BY_TRAINING_PREVALENCE[:200]),
        "surface": clip_gaze.gaze(image, clip_gaze.SURFACES),
        "movement": clip_gaze.gaze(image, clip_gaze.MOVEMENTS)
    }
)

# Returns
{'artist': ['by paul cézanne (82%)',
            'by clyfford still (07%)',
            'by arnold böcklin (04%)',
            'by franz kline (01%)',
            'by giorgio de chirico (01%)'],
 'movement': ['tonalism movement (16%)',
              'impressionism movement (09%)',
              'american scene painting movement (09%)',
              'modern european ink painting movement (09%)',
              'post-impressionism movement (09%)'],
 'surface': ['on canvas (86%)',
             'on paperboard (11%)',
             'on vellum (01%)',
             'on wood (01%)',
             'on card stock (00%)']}

As you can see, CLIP suggests that, of the options provided, the terms "by paul cézanne", "tonalism movement", and "on canvas" are the most likely to describe the input image.

gaze works by having CLIP assess the relative likelihood of the options within each category. Here is a table of lists built into the module.

| Variable | Description | Example |
| --- | --- | --- |
| ARTISTS_BY_NAME | List of 6000 artists in alphabetical order. | Sandra Chevrier |
| ARTISTS_BY_TRAINING_PREVALENCE | List of 900 artists in order of prevalence in the training data (most prevalent first). Sometimes the most famous artists are credited without a first name, so you may find those as separate entries alongside their full name. | Sandra Chevrier |
| MOVEMENTS | Artistic movement | Afrofuturism |
| PAINTING_MATERIALS | Materials for creating paintings | Acrylic Paint |
| PRINTING_TECHNIQUES | Technique for creating an impression | Aquatint |
| QUALITIES | Subjective (even more so than the others) assessment of artwork | Exceptional |
| SCULTPURE_MATERIALS | Material that is sculpted into artwork | Bronze |
| SITES | Art websites, each of which has its own tastes (and phrasing) | Popular on Reddit |
| SURFACES | Material to which the artistic material is applied | Canvas |
| TOOLS | Object that is used to apply the material to the surface | Brush |

For example:

clip_gaze.MOVEMENTS # A list of the prompts describing art history movements

Arguments for gaze

| Argument | Description | Default |
| --- | --- | --- |
| image | The image to inspect | Required |
| prompts | A list of prompts, see the earlier table for examples | Required |
| batch_size | Limits how many prompts are inspected at once. Defaults to None (meaning all inputs are inspected at the same time). If you have insufficient VRAM, consider setting this to 10 to start with. | None |
| only_show_best | Show only this many results in each category; set it to None for no limit | 5 |
| format_output | Turn the output into something easier for people to read (e.g. percentage in brackets) | True |
| device | Defaults to "cuda" (which will run on the GPU) and falls back to "cpu" if CUDA is not available. | "cuda" |
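The batch_size argument can be pictured as splitting the prompt list into smaller chunks so that fewer prompts are held in memory at once. The sketch below shows only that chunking idea; in_batches is a hypothetical helper written for this example and is not part of clip_gaze, whose internals are not described here.

```python
def in_batches(prompts, batch_size=None):
    """Yield consecutive slices of prompts, each at most batch_size long.

    batch_size=None mirrors the documented default: everything in one go.
    """
    if batch_size is None:
        yield list(prompts)
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start:start + batch_size])


# Five prompts in batches of two gives three chunks, the last one short.
batches = list(in_batches(["a", "b", "c", "d", "e"], batch_size=2))
```

A smaller batch size trades speed for a lower peak memory requirement, which is why starting at 10 and increasing from there is a reasonable way to find what your hardware can handle.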

Finer control

The clip_gaze.gaze command wraps multiple calls to clip_gaze.probabilities, selecting the highest-probability options and formatting the text. If you want the raw results, skip gaze and instead use:

clip_gaze.probabilities(image, clip_gaze.ARTISTS_BY_NAME)

# Returns the probability scores for all 6000 or so artists in the list
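If you do work with raw scores, the selection and formatting that gaze performs can be approximated along these lines. This is a sketch under assumptions: it takes one probability per prompt, in the same order as the prompts, and rounds to whole percentages. The actual return type of clip_gaze.probabilities and the rounding used by gaze are not documented here, and top_matches is a hypothetical name.

```python
def top_matches(prompts, probabilities, limit=5):
    """Pair prompts with scores, keep the best few, format as 'text (NN%)'."""
    ranked = sorted(
        zip(prompts, probabilities),
        key=lambda pair: pair[1],
        reverse=True,  # highest probability first
    )
    if limit is not None:
        ranked = ranked[:limit]
    # Two-digit, zero-padded percentages, matching the example output style.
    return [f"{prompt} ({round(p * 100):02d}%)" for prompt, p in ranked]


# Made-up probabilities for illustration only.
labels = top_matches(
    ["on vellum", "on canvas", "on paperboard"],
    [0.01, 0.86, 0.11],
    limit=2,
)
```

Here limit plays the role that only_show_best plays in gaze, with None meaning no limit.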

How does it work?

CLIP is a tool provided by OpenAI that calculates the similarity between an image and some text. This is a machine learning system trained on an enormous amount of data, and that data will contain biases (intentional and unintentional). It is not a source of truth, but a useful tool to give you ideas about where to search next.
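CLIP compares an image and a piece of text by turning each into a vector and measuring how closely the two vectors point in the same direction (cosine similarity). The sketch below uses tiny made-up vectors purely to show that comparison; real CLIP embeddings have hundreds of dimensions and come from the model itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; they are not real CLIP embeddings.
image_vector = [0.9, 0.1, 0.3]
text_vectors = {
    "on canvas": [0.8, 0.2, 0.4],
    "on wood": [-0.1, 0.9, 0.2],
}

similarities = {
    phrase: cosine_similarity(image_vector, vector)
    for phrase, vector in text_vectors.items()
}
best = max(similarities, key=similarities.get)
```

The phrase whose vector lies closest to the image vector scores highest, which is all that "most likely" means in the results.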

This tool works by downloading CLIP onto your computer and running it locally. This is not an easy task for all computers, especially older ones. See the "Arguments for gaze" section above for a way to reduce the memory load.

Biases

This software is built on a machine learning system, and the biases in this tool come in two parts:

  1. CLIP itself comes with its own biases, and we refer the user to OpenAI's own work on explaining and mitigating that bias
  2. The lists of chosen phrases

The lists used in this software are primarily from Wikipedia and from the training data that CLIP used. Neither of these sources is perfect, and care should be taken when using this software to account for these biases where possible. Although the lists are long (e.g. the list of 6000 artists), no claims of completeness or relative importance are made.

clip-gaze's Issues

About CLIP decoding

Hey, I saw your comment on this reddit thread about CLIP text decoding and I'd like to inform you of a repository that does that.

This repository: https://github.com/dhg-wei/DeCap looks like just an image captioning repository, on the surface. But, in the backbone, their method first projects the image embedding to text space and THEN decodes it. Their paper is super clear about that.

They have a pretrained model available, though some light tweaking may be necessary to remove the input image requirement.

Cheers
