
hpsv2's People

Contributors

enderfga, jaeger416, tgxs002, w-zhih


hpsv2's Issues

Access to the new test data in the README

Hello, I would like to use the new test data mentioned in the README as a benchmark for inference. I assumed that the released test dataset was the HPDv2 test dataset.

How can I access the new test dataset?
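
For reference, this is how I am currently fetching the test set (a sketch assuming the ymhao/HPDv2 dataset repository referenced in other issues on this page is the released test data; the "new" test data from the README may live somewhere else entirely):

    # Sketch (my assumption): pull the HPDv2 test annotations and images from the
    # Hugging Face Hub dataset repo; this may not be the "new" test data the README means.
    import huggingface_hub

    local_dir = huggingface_hub.snapshot_download(repo_id="ymhao/HPDv2", repo_type="dataset")
    print(local_dir)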

Unable to reproduce the preference prediction accuracy results

Hey, thanks so much for your great work and for releasing all your resources publicly.

I was interested in reproducing Table 6 (preference prediction accuracy) in your paper:

[screenshot of Table 6 from the paper]

To reproduce the CLIP-ViT-H-14 and PickScore results, I slightly modified your evaluation script as follows:

# adopted from: https://github.com/tgxs002/HPSv2/blob/master/hpsv2/evaluation.py

import os
import json
from argparse import ArgumentParser

import torch
from torch.utils.data import Dataset
from PIL import Image
from tqdm import tqdm

from transformers import AutoModel, AutoProcessor
from hpsv2.utils import root_path

### copied from: https://github.com/tgxs002/HPSv2/blob/866735ecaae999fa714bd9edfa05aa2672669ee3/hpsv2/src/training/train.py#L358
def inversion_score(p1, p2):
    assert len(p1) == len(p2), f'{len(p1)}, {len(p2)}'
    n = len(p1)
    cnt = 0
    for i in range(n-1):
        for j in range(i+1, n):
            if p1[i] > p1[j] and p2[i] < p2[j]:
                cnt += 1
            elif p1[i] < p1[j] and p2[i] > p2[j]:
                cnt += 1
    return 1 - cnt / (n * (n - 1) / 2)

class RankingDataset(Dataset):
    def __init__(self, meta_file, image_folder, transforms, tokenizer):
        self.transforms = transforms
        self.image_folder = image_folder     
        self.open_image = Image.open
        self.tokenizer = tokenizer

        with open(meta_file, 'r') as f:
            self.test_dict = json.load(f)
    
    def __len__(self):
        return len(self.test_dict)

    def __getitem__(self, idx):
        try:
            dict_ = self.test_dict[idx]
            if self.transforms is not None:
                images = [self.transforms(self.open_image(os.path.join(self.image_folder, file_names))) for file_names in dict_['image_path']]
            else:
                images = [self.open_image(os.path.join(self.image_folder, file_names)) for file_names in dict_['image_path']]

            paths = [os.path.join(self.image_folder, file_names) for file_names in dict_['image_path']]
            label = dict_['rank']
            if self.tokenizer is None:
                caption = dict_['prompt']
            else:
                caption = self.tokenizer(dict_['prompt'])
            return images, paths, label, caption
        except Exception as e:
            raise e
            # return self.__getitem__((idx + 1) % len(self))

def evaluate_rank(data_path, image_folder, model, batch_size, preprocess_val, tokenizer, device):
    meta_file = data_path + '/hpdv2_test.json' # this is taken from: https://huggingface.co/datasets/ymhao/HPDv2/tree/main
    dataset = RankingDataset(meta_file, image_folder, preprocess_val, None)

    score = 0
    total = len(dataset)
    all_rankings = []
    with torch.inference_mode(), torch.cuda.amp.autocast():
        for sample in tqdm(dataset, total=len(dataset), ascii=True):
            images, paths, labels, caption = sample

            processed_images = tokenizer(
                images=images,
                padding=True,
                truncation=True,
                max_length=77,
                return_tensors="pt",
            ).to(device)
            image_tensor = model.get_image_features(**processed_images)

            c1 = tokenizer(
                text=caption,
                padding=True,
                truncation=True,
                max_length=77,
                return_tensors="pt",
            ).to(device)
            caption_tensor = model.get_text_features(**c1)

            image_tensor /= image_tensor.norm(dim=-1, keepdim=True)
            caption_tensor /= caption_tensor.norm(dim=-1, keepdim=True)

            num_images = image_tensor.shape[0]

            logits_per_image = model.logit_scale.exp() * image_tensor @ caption_tensor.T
            logits_per_image = logits_per_image.squeeze(-1)

            predicted = list(torch.argsort(-logits_per_image).cpu().numpy())

            score += inversion_score(predicted, labels)

    print('ranking_acc:', score/total)

def initialize_model():

    device = "cuda"
    processor_name_or_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
    model_pretrained_name_or_path = "yuvalkirstain/PickScore_v1" # for pickscore
    # model_pretrained_name_or_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K" # for clip model
    processor = AutoProcessor.from_pretrained(processor_name_or_path)
    model = AutoModel.from_pretrained(model_pretrained_name_or_path).eval().to(device)

    return model, processor

def evaluate(mode: str, root_dir: str, data_path: str = os.path.join(root_path,'datasets/benchmark'), checkpoint_path: str = None, batch_size: int = 20, hps_version: str = "v2.1") -> None:
    
    model, processor = initialize_model()
    
    evaluate_rank(data_path, root_dir, model, batch_size, None, processor, device='cuda')

if __name__ == '__main__':
    # Parse arguments
    parser = ArgumentParser()
    parser.add_argument('--data-type', type=str, required=True, choices=['benchmark', 'benchmark_all', 'test', 'ImageReward', 'drawbench'])
    # this is the path to the folder where the test json is located
    parser.add_argument('--data-path', type=str, required=True, help='path to dataset')
    # this is the path to the folder where the test images are located
    parser.add_argument('--image-path', type=str, required=True, help='path to image files')
    parser.add_argument('--checkpoint', type=str, default=os.path.join(root_path,'HPS_v2_compressed.pt'), help='path to checkpoint')
    parser.add_argument('--batch-size', type=int, default=20)
    args = parser.parse_args()
    
    evaluate(mode=args.data_type, data_path=args.data_path, root_dir=args.image_path, checkpoint_path=args.checkpoint, batch_size=args.batch_size)

When I run this script with both CLIP and PickScore on the HPDv2 test set, I get the following results:

Model            Results from paper    Reproduced results with the above script
CLIP ViT-H-14    65.10                 50.50
PickScore        79.80                 48.84
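
One thing I was unsure about: the training code quoted in a later issue on this page first converts the argsort output into per-image ranking positions before calling inversion_score, whereas my loop passes the argsort output in directly. A sketch of that conversion, using the variable names from my script above (my reading, not a verified fix):

    # Sketch: turn the argsort output into ranking positions, mirroring the
    # hps_ranking construction from train.py quoted elsewhere on this page.
    # predicted[k] = index of the k-th highest-scoring image;
    # ranking[j]   = position that image j received.
    predicted = torch.argsort(-logits_per_image).cpu().tolist()
    ranking = [predicted.index(j) for j in range(len(predicted))]
    score += inversion_score(ranking, labels)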

Could you please check this script and the numbers to see whether there is an issue on my side? I cannot figure out any other reason why I am unable to reproduce the numbers from Table 6. Thanks!

the given training data link cannot be downloaded

Hey there! Thanks for sharing! However, the training set download link you provided is very slow in Chrome and the download never completes. Would it be possible for you to upload it to Hugging Face or Google Drive? Thanks a lot!

About SDXL-Base-v1.0 performance

Hi, I'm really impressed by your work, and it will be very helpful for the T2I research community.

I have two questions.

  1. In your benchmark, what were the test settings for the SDXL-Base-v0.9 results, e.g., inference resolution, cfg_scale, and the number of steps?

  2. Have you experimented with the SDXL-Base-v1.0 model?

When I evaluated SDXL-Base-v1.0 (using the Hugging Face weights), I got the following:

setting:

  • resolution: 512x512
  • #steps: 50
  • cfg_scale: 5.0 (default in huggingface)
Model             paintings    photo     anime    concept-art    average
SDXL-Base v1.0    0.2655       0.2618    0.269    0.264          0.2651

This result is not better than your SDXL-Base-v0.9.
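
For completeness, this is roughly how I generated and scored the images (a sketch of my setup, not the official benchmark script; it assumes the diffusers SDXL base-1.0 pipeline and the hpsv2.score(image_path, prompt) API, and the prompt list is a stand-in for the benchmark prompts of one style):

    # Sketch of my evaluation setup. Assumes diffusers' SDXL base-1.0 pipeline and
    # hpsv2.score; `prompts` is a placeholder for the benchmark prompts of one style.
    import torch
    import hpsv2
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompts = ["A man taking a drink from a water fountain."]  # placeholder list
    scores = []
    for i, prompt in enumerate(prompts):
        image = pipe(
            prompt,
            height=512, width=512,        # resolution listed above
            num_inference_steps=50,       # #steps listed above
            guidance_scale=5.0,           # cfg_scale listed above
        ).images[0]
        path = f"sdxl_base_1.0_{i:05d}.jpg"
        image.save(path)
        result = hpsv2.score(path, prompt)
        scores.append(float(result[0]) if isinstance(result, (list, tuple)) else float(result))
    print(sum(scores) / len(scores))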

It would be helpful if you could comment on this.

Thanks in advance :)

Do you test on the HPSv1 dataset using the HPSv2 checkpoint?

Hi, I used the HPSv2 checkpoint to test on the HPSv1 dataset and got 59.51% accuracy, but with the HPSv1 checkpoint I get 65.44% accuracy. Why is it worse? A domain gap?
By the way, the aesthetic predictor gets 55.57% accuracy on HPSv1. Is that normal?
(num_images is a tensor of 2s, e.g. [2, 2, 2, 2, ...].)
HPSv2 checkpoint on the HPSv1 dataset:

    for batch in bar:
        images, num_images, labels, caption, rank = batch
        images = images.cuda()
        num_images = num_images.cuda()
        # labels = labels.cuda()
        caption = caption.cuda()
        rank = rank.cuda()

        with torch.no_grad():
            image_features = model.encode_image(images)
            text_features = model.encode_text(caption)

            image_features = image_features / image_features.norm(dim=-1, keepdim=True)
            text_features = text_features / text_features.norm(dim=-1, keepdim=True)

            logits_per_image = image_features @ text_features.T
            paired_logits_list = [logit[:, i] for i, logit in enumerate(logits_per_image.split(num_images.tolist()))]
        predicted = [torch.argsort(-k) for k in paired_logits_list]
        hps_ranking = [[predicted[i].tolist().index(j) for j in range(n)] for i, n in enumerate(num_images)]
        rank = [i for i in rank.split(num_images.tolist())]
        score += sum([inversion_score(hps_ranking[i], rank[i]) for i in range(len(hps_ranking))])
    ranking_acc = score / total
    print(ranking_acc)

HPSv1 checkpoint on the HPSv1 dataset:

    for batch in bar:
        images, num_images, labels, caption, rank = batch
        images = images.cuda()
        num_images = num_images.cuda()
        # labels = labels.cuda()
        caption = caption.cuda()
        rank = rank.cuda()

        with torch.no_grad():
            with torch.cuda.amp.autocast():
                outputs = model(images, caption)
                image_features, text_features, logit_scale = outputs["image_features"], outputs["text_features"], outputs[
                    "logit_scale"]
                logits_per_image = logit_scale * image_features @ text_features.T 
                paired_logits_list = [logit[:, i] for i, logit in enumerate(logits_per_image.split(num_images.tolist()))]

        predicted = [torch.argsort(-k) for k in paired_logits_list]
        hps_ranking = [[predicted[i].tolist().index(j) for j in range(n)] for i, n in enumerate(num_images)]
        rank = [i for i in rank.split(num_images.tolist())]
        score += sum([inversion_score(hps_ranking[i], rank[i]) for i in range(len(hps_ranking))])
    ranking_acc = score / total * 100
    print(ranking_acc)

Reproduction of benchmark

Hi, HPSv2 is really nice work. However, when I reproduce the v2.1 benchmark, I cannot get the same results as reported in your README. Could you tell me how to fix this? Here is my Jupyter notebook code:

import torch
from PIL import Image
import hpsv2
from hpsv2.src.open_clip import create_model_and_transforms, get_tokenizer
import warnings
import argparse
import os
import requests
from clint.textui import progress
from typing import Union
import huggingface_hub
from hpsv2.utils import root_path, hps_version_map

#warnings.filterwarnings("ignore", category=UserWarning)

def score(model, img_path, prompt) -> list:    
    if isinstance(img_path, list):
        result = []
        for one_img_path in img_path:
            # Load your image and prompt
            with torch.no_grad():
                # Process the image
                if isinstance(one_img_path, str):
                    image = preprocess_val(Image.open(one_img_path)).unsqueeze(0).to(device=device, non_blocking=True)
                elif isinstance(one_img_path, Image.Image):
                    image = preprocess_val(one_img_path).unsqueeze(0).to(device=device, non_blocking=True)
                else:
                    raise TypeError('The type of parameter img_path is illegal.')
                # Process the prompt
                text = tokenizer([prompt]).to(device=device, non_blocking=True)
                # Calculate the HPS
                with torch.cuda.amp.autocast():
                    outputs = model(image, text)
                    image_features, text_features = outputs["image_features"], outputs["text_features"]
                    logits_per_image = image_features @ text_features.T

                    hps_score = torch.diagonal(logits_per_image).cpu().numpy()
            result.append(hps_score[0])    
        return result
    elif isinstance(img_path, str):
        # Load your image and prompt
        with torch.no_grad():
            # Process the image
            image = preprocess_val(Image.open(img_path)).unsqueeze(0).to(device=device, non_blocking=True)
            # Process the prompt
            text = tokenizer([prompt]).to(device=device, non_blocking=True)
            # Calculate the HPS
            with torch.cuda.amp.autocast():
                outputs = model(image, text)
                image_features, text_features = outputs["image_features"], outputs["text_features"]
                logits_per_image = image_features @ text_features.T

                hps_score = torch.diagonal(logits_per_image).cpu().numpy()
        return [hps_score[0]]
    elif isinstance(img_path, Image.Image):
        # Load your image and prompt
        with torch.no_grad():
            # Process the image
            image = preprocess_val(img_path).unsqueeze(0).to(device=device, non_blocking=True)
            # Process the prompt
            text = tokenizer([prompt]).to(device=device, non_blocking=True)
            # Calculate the HPS
            with torch.cuda.amp.autocast():
                outputs = model(image, text)
                image_features, text_features = outputs["image_features"], outputs["text_features"]
                logits_per_image = image_features @ text_features.T

                hps_score = torch.diagonal(logits_per_image).cpu().numpy()
        return [hps_score[0]]
    else:
        raise TypeError('The type of parameter img_path is illegal.')
        

To run the scoring on one image at a time, I split the original code:


model_dict = {}
device = 'cuda' if torch.cuda.is_available() else 'cpu'

    
model, preprocess_train, preprocess_val = create_model_and_transforms(
    'ViT-H-14',
    'laion2B-s32B-b79K',
    precision='amp',
    device=device,
    jit=False,
    force_quick_gelu=False,
    force_custom_text=False,
    force_patch_dropout=False,
    force_image_size=None,
    pretrained_image=False,
    image_mean=None,
    image_std=None,
    light_augmentation=True,
    aug_cfg={},
    output_dict=True,
    with_score_predictor=False,
    with_region_predictor=False
)
model_dict['model'] = model
model_dict['preprocess_val'] = preprocess_val



checkpoint = os.path.join(root_path,'HPS_v2_compressed.pt')
cp = None
hps_version = "v2.1"

model = model_dict['model']
preprocess_val = model_dict['preprocess_val']

# check if the checkpoint exists
if not os.path.exists(root_path):
    os.makedirs(root_path)
if cp is None:
    cp = huggingface_hub.hf_hub_download("xswu/HPSv2", hps_version_map[hps_version])

checkpoint = torch.load(cp, map_location=device)
model.load_state_dict(checkpoint['state_dict'])
tokenizer = get_tokenizer('ViT-H-14')
model = model.to(device)
model.eval()

Then I downloaded the test data and reproduced the results for each category (for example, photo).

from numpy import mean
prompts = ["A man taking a drink from a water fountain.", ...]
root = '/my_path/HPDv2/SDXL-refiner-0.9/photo'
imgs = os.listdir(root)
imgs.sort()
ret = []
for i,n in enumerate(imgs):
    print(n, prompts[i])
    s = score(model, os.path.join(root, n), prompts[i])
    ret.append(s)
    
print(mean(ret))

I get 31.52 (vs. 33.26 reported) for anime and 26.51 (vs. 28.38 reported) for photo.

please help check the use of the hpsv2.score function

I used import hpsv2; result = hpsv2.score(imgs_path, '') to score the images generated from one prompt, and also ran python img_score.py --image-path assets/demo_image.jpg --prompt 'A cat with two horns on its head' to score a single image from the same set. Why do the two scores differ?
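
A minimal comparison I would expect to match (a sketch assuming hpsv2.score accepts both a list of paths and a single path, as the examples above suggest):

    # Sketch: scoring the same image with the same prompt through both entry points;
    # with different prompts ('' vs. the real caption) I would not expect the scores
    # to match, which may already explain the difference.
    import hpsv2

    prompt = 'A cat with two horns on its head'
    batch_result = hpsv2.score(['assets/demo_image.jpg'], prompt)   # list input
    single_result = hpsv2.score('assets/demo_image.jpg', prompt)    # single path
    print(batch_result, single_result)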

Having issues with using v2.1 inference

TypeError: score() got an unexpected keyword argument 'hps_version'
I got this error while trying to run the following code:
result = hpsv2.score(images[i], captions[i], hps_version="v2.1")
I am trying to run multiple inferences at the same time.

About evaluating generated images of real people

Hello, I would like to ask whether the code you provide can evaluate the quality of generated images of people, including errors in the generated human body and the aesthetics of the human figure. I hope to get your reply, thank you very much.

format of test.json

About test.json:

[
    {
        'prompt': str,
        'image_path': list[str],
        'rank': list[int], # ranking for image at the same index in image_path
    },
    ...
]

I would like to know: if image_path is [better_img, worse_img, same_worse_img], will the rank be [1, 2, 2]?
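
For concreteness, this is how I am reading the format (a sketch under the schema above, assuming a lower rank value means a better image):

    # Sketch: load test.json and pick the top-ranked image per prompt, assuming the
    # schema above and that a lower rank value means better.
    import json

    with open('test.json') as f:
        entries = json.load(f)

    for entry in entries:
        best_idx = min(range(len(entry['rank'])), key=lambda i: entry['rank'][i])
        print(entry['prompt'], '->', entry['image_path'][best_idx])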

How to train with our custom data?

Hello,
Thanks a lot for open-sourcing this awesome work. The paper looks promising, and I would like to train the model on my custom data.
Could you please guide me on the training data format and other details that the training pipeline expects? Is there any doc/tutorial for training?
Thanks in advance.

For the test of custom data

For testing on custom data, should the prompt contain only the content, without any words describing style? I would also like to know what outputs["image_features"], outputs["text_features"], and outputs["logit_scale"] mean in score.py, and finally why hps_score[0] is used as the score.

What is the score range?

Greetings!
Could you tell me the output range of the HPSv2 score?
Since the HPSv2 model is essentially a CLIP model, is the score range [-1, 1]?
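
From the scoring code quoted in other issues on this page, the raw value is a dot product of L2-normalized image and text features, i.e. a cosine similarity; a small sketch of that bound (whether the released API rescales it, e.g. by logit_scale or by 100, is something I have not verified):

    # Sketch: a cosine similarity of unit-norm features lies in [-1, 1]; in practice
    # CLIP-style scores cluster well inside that range. Whether hpsv2.score()
    # rescales the value is an assumption I have not checked.
    import torch

    img = torch.randn(1, 1024)
    txt = torch.randn(1, 1024)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    score = (img @ txt.T).item()
    assert -1.0 <= score <= 1.0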

ModuleNotFoundError: No module named '_curses'

    Traceback (most recent call last):
      File "d:\human-preference-CLIP\main.py", line 2, in <module>
        import hpsv2
      File "D:\human-preference-CLIP\venv\lib\site-packages\hpsv2\__init__.py", line 7, in <module>
        from . import evaluate as eval
      File "D:\human-preference-CLIP\venv\lib\site-packages\hpsv2\evaluate.py", line 16, in <module>
        from hpsv2.src.training.train import calc_ImageReward, inversion_score
      File "D:\human-preference-CLIP\venv\lib\site-packages\hpsv2\src\training\train.py", line 18, in <module>
        from .data import ImageRewardDataset, RankingDataset
      File "D:\human-preference-CLIP\venv\lib\site-packages\hpsv2\src\training\data.py", line 4, in <module>
        from curses import meta
      File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\curses\__init__.py", line 13, in <module>
        from _curses import *
    ModuleNotFoundError: No module named '_curses'
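
In case it helps triage: _curses has no Windows build in the standard library, so the import chain above fails on Windows. Possible workarounds I am considering (my assumptions, not an official fix):

    # Option 1: install the third-party wheel that provides _curses on Windows:
    #   pip install windows-curses
    # Option 2: edit the installed package and remove the import that pulls in curses,
    #   i.e. the line "from curses import meta" in hpsv2/src/training/data.py,
    #   which does not appear to be needed by the data-loading code.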

Question about benchmark DALL-E 2 subset

Hello! Thanks for your excellent work. I'm curious how the DALL-E 2 subset of the benchmark was collected. The images in the dataset are 512x512, but shouldn't Bing's Image Creator output 1024x1024?

huggingface dataset path: HPDv2/benchmark/benchmark_imgs/DALLE.tar.gz

Where can I download the file 'ImageReward_test.json'?

Hi, I have reproduced the results on HPDv2, but when I tried to evaluate HPSv2 on the ImageReward test set, I found that the official JSON file cannot be used directly (i.e., it is not compatible with ImageRewardDataset). Could you share 'ImageReward_test.json' so that I can reproduce the results on ImageReward? Many thanks!

Can I use it to compare different text-image pairs?

Hi, thanks for sharing such great work. I have a question, though.
I only have a set of one-to-one image and text pairs, i.e., [[prompt1, image1], [prompt2, image2], ..., [promptN, imageN]], rather than a one-to-many relationship. In this case, how should I use your model to rank these image-text pairs by aesthetics and human preference?
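
Concretely, this is what I am considering (a sketch assuming the hpsv2.score(image_path, prompt) API shown in other issues; whether scores obtained with different prompts are directly comparable is exactly what I am asking):

    # Sketch: score each (prompt, image) pair independently and sort by score.
    # The pair list below is a placeholder for my own data.
    import hpsv2

    pairs = [('prompt1', 'image1.jpg'), ('prompt2', 'image2.jpg')]  # placeholders
    scored = []
    for prompt, image_path in pairs:
        s = hpsv2.score(image_path, prompt)
        value = float(s[0]) if isinstance(s, (list, tuple)) else float(s)
        scored.append((value, image_path, prompt))

    for value, image_path, prompt in sorted(scored, reverse=True):
        print(f'{value:.4f}  {image_path}  {prompt}')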

Choose CUDA device

It would be helpful if we could pass a torch device to hpsv2.score(..., device=torch.device("cuda")). Currently the model is forced to run on GPU 0.
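
As a stopgap I am pinning the process to the GPU I want before anything touches CUDA (a workaround sketch, not a replacement for a proper device argument; the device id and paths are just examples):

    # Workaround sketch: expose only the desired GPU to the process so that
    # "GPU 0" inside hpsv2 maps to the card I actually want to use.
    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # must be set before torch initializes CUDA

    import hpsv2
    result = hpsv2.score('assets/demo_image.jpg', 'A cat with two horns on its head')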

questions about ImageReward and HPS Comparison

It's exciting to see new results coming out of this field, and I can't wait to explore this project!
I have a couple of open questions I'd like to discuss here. I noticed that in the paper's comparison between HPS v2 and ImageReward, the two models were not trained and tested on the same dataset. Is this unfair?
1. Was ImageReward retrained on your data for the comparison?
2. How much does HPS v2 improve over ImageReward when tested on the ImageReward dataset?

Model outputs don't seem to have enough granularity from low precision

When using the model on fairly large amounts of images, I get a lot of exact matches on score, which I assume is due to a precision issue, especially made apparent by the fp16 outputs -- this is a significant problem for comparing parameters in my use case where small differences matter and I need to be able to decide a winner. I edited the library manually to force it to run in FP32 precision by disabling the hardcoded AMP context managers and by casting the model itself to FP32, and it seems that that resolves the issue, albeit at the cost of it running about 3.5x slower (on Ampere).

Since the performance gap is significant, it would obviously be unreasonable to make this the default or only option, but is it possible that you could look into making an option to either run the model in full precision outright or create an alternative mixed-precision inference regime that allows for higher granularity?
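
For reference, the change I made amounts to roughly the following (a sketch using the open_clip-style model interface shown in other issues on this page; the real edit removes the hardcoded autocast contexts inside the library, so this is illustrative rather than a drop-in patch):

    # Sketch of the full-precision variant I tested: cast the HPSv2 CLIP model to
    # FP32 and skip autocast, trading roughly 3.5x inference speed (on Ampere)
    # for finer score granularity. `model`, `image`, and `text` are assumed to be
    # the loaded model and the already-preprocessed inputs.
    import torch

    model = model.float().eval()

    with torch.inference_mode():
        image_features = model.encode_image(image)   # no autocast wrapper here
        text_features = model.encode_text(text)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        hps_score = (image_features @ text_features.T).diagonal()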
