
frozenburning / text2light


[SIGGRAPH Asia 2022] Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

Home Page: https://frozenburning.github.io/projects/text2light/

License: Other

Languages: Python 4.16%, Jupyter Notebook 95.84%
Topics: hdr, hdr-image, hdri, rendering, siggraph-asia, text2image, inverse-tonemapping, panorama, 3d-generation

text2light's People

Contributors: frozenburning

text2light's Issues

The generated image content and text do not match

Hey @FrozenBurning

Thanks for your awesome contribution! I am very interested in this work and have run some tests, but I am currently seeing some issues.


Using the provided weight files, there is a significant deviation between the image content and the text. For example:

example 1

command: python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler --outdir ./generated_panorama --text "purple petal flower" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4

The images generated by two different inference runs of the above command are both significantly different from the text (purple petal flower):
[two generated images attached]

example 2

command: python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text "Elephant, Watering Hole, Baby Elephant" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4

[generated image attached]


Apart from the above, even with the model in eval mode and inference wrapped in torch.no_grad(), the previous inference still seems to affect the next one: the image generated by the current run tends to resemble the previous text prompt and deviates significantly from the current one.
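
A minimal way to check this on my side would be something like the sketch below (my own test harness, not code from the repo; generate(text) is a hypothetical wrapper around the global and local samplers). If no hidden state is carried over, re-seeding before each call should make the results independent of whatever prompt ran before:

import torch

def run_isolated(generate, text, seed=0):
    # Re-seed and disable gradients before every call so consecutive prompts
    # cannot interact through RNG state or autograd buffers.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    with torch.no_grad():
        return generate(text)

# img_a = run_isolated(generate, "purple petal flower")
# img_b = run_isolated(generate, "Elephant, Watering Hole, Baby Elephant")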

And regardless of which local sampler is used, the generated hrldr often has darker and colder tones, while the hdr is overexposed. For example:

example 1

hrldr: [image attached]

hdr: [image attached]

example 2

hrldr: [image attached]

hdr: [image attached]

Which parameters can be adjusted to improve the quality of the generated hrldr and hdr images?


Also, could you please advise how to handle the problems above?

Thanks!

DataSet

Can you tell me what dataset this algorithm was trained on? Thanks!

Question about LDR->HDR using SRiTMO

I would like to thank you for creating such a great work.

I am currently working on panorama prediction, and my pipeline currently produces LDR 360 panoramas. I would like to feed the output of my pipeline to the SRiTMO module to get HDR panoramas.

However, when I feed the LDR images from the Laval Indoor Dataset (which is what the paper was trained on) to the SRiTMO module, the output values reach into the millions, while the actual HDR images are not that high. Is this normal? Do you expect this to happen?

This behavior causes the RMSE of the inverse tone-mapping between SRiTMO and the ground truth to reach a range of millions, compared to less than one as reported in the paper (Table 3).

Or perhaps I have misunderstood the table. Could you please explain the exact method you used to calculate the RMSE?
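
To make the question concrete, this is the kind of computation I have in mind (a naive per-pixel RMSE on the linear HDR values; only my assumption, and possibly not what Table 3 reports):

import numpy as np

def rmse(pred_hdr, gt_hdr):
    # Naive per-pixel RMSE on linear radiance values; very sensitive to the
    # absolute scale of the predictions.
    return float(np.sqrt(np.mean((pred_hdr - gt_hdr) ** 2)))

def log_rmse(pred_hdr, gt_hdr, eps=1e-6):
    # Alternative I considered: RMSE on log radiance, which is far less
    # sensitive to absolute scale; if the paper evaluates in a log or
    # normalized domain, that could explain the gap I am seeing.
    return float(np.sqrt(np.mean((np.log(pred_hdr + eps) - np.log(gt_hdr + eps)) ** 2)))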

You can find the LDR and HDR images from the Laval Indoor Dataset that I used to test here:

https://vistec-my.sharepoint.com/:f:/g/personal/pakkapon_p_s19_vistec_ac_th/EoiQ7nRJZkFDiLQnKIDLVvYBjRF4xKU2shCnfDI1g14zew?e=N1NQtG

Best regards,
Pakkapon Phongthawee

Retraining

How can I retrain the model with my own dataset? And what configuration does the model need in order to read a dataset that I provide?

Is it possible to do all the training on colab?

I want to train with my own dataset and have finished the preprocessing, but it has been tricky doing it all on Colab because trainingstage1.py expects a different version of PyTorch Lightning. Does it have to be run in a conda environment?

SRiTMO as a standalone LDR to HDR operator

Hey @FrozenBurning

Thanks for your awesome contribution!

I've been particularly interested in SRiTMO for its simplicity and speed in generating HDRs from LDRs. I've been testing it in isolation from the other parts of your algorithm and find that it gives quite good results with regular LDR panos.

One thing I've noticed, which I think is related to #6, is that sometimes the HDRs come out overexposed. To rectify this, I find I need to adjust the balance, luma threshold, and boost values, but it feels like they have to be tuned individually for each LDR and are not very generalizable.

I'm curious whether you think we should be applying a further normalization according to the luminance scale of "in-the-wild" LDRs, something similar to the luminance-invariant scale normalization done on the training set?
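
Concretely, the kind of normalization I have in mind looks roughly like the sketch below (my own guess, not code from your repo), rescaling an arbitrary LDR so its mean luminance matches a fixed target before it goes into SRiTMO:

import numpy as np

def normalize_luminance(ldr, target_mean=0.5):
    # `ldr` is an HxWx3 float array in [0, 1]. Scale it so the mean luminance
    # matches `target_mean`; the target value is my assumption, not a number
    # taken from the paper or the training code.
    luma = 0.2126 * ldr[..., 0] + 0.7152 * ldr[..., 1] + 0.0722 * ldr[..., 2]
    scale = target_mean / max(float(luma.mean()), 1e-6)
    return np.clip(ldr * scale, 0.0, 1.0)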

Would love to know your thoughts here!

GPU memory issue

Whenever I try running a 4K image, I get this error

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 GiB (GPU 0; 24.00 GiB total capacity; 4.58 GiB already allocated; 142.00 MiB free; 22.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have a 3090 with 24 GB of VRAM. Can you please suggest a solution for this?
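
In case it helps, the only PyTorch-side mitigation I know of is the max_split_size_mb hint mentioned in the error message itself. Setting it before the first CUDA allocation (e.g., at the very top of text2light.py) looks like this, though I do not know whether a 4K pass will then fit in 24 GB:

import os

# Must be set before torch allocates anything on the GPU; smaller split sizes
# reduce fragmentation at some cost in allocation speed.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the variable is set so the allocator sees it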

dataset

"Hello, I am a doctoral student at Zhejiang University, and I am very interested in your research. Can you share your training dataset?"

About the contrastive loss

It seems that the contrastive loss provides no gradient to the network. gen_img_emb is produced by the frozen CLIP model, while psed_emb is pre-computed in the dataloader:

with torch.no_grad():
    # decode sampled indices back to an image, then embed it with frozen CLIP
    x_sample_nopix = self.decode_to_img(index_sample, [index_sample.shape[0], 256, 8, 16])  # hack
    preprocess = _transform(224)
    gen_img_emb = self.clip.encode_image(preprocess(x_sample_nopix))
    gen_img_emb /= gen_img_emb.norm(dim=-1, keepdim=True)

    # psed_emb comes pre-computed from the dataloader; the similarity is
    # therefore built entirely inside the no_grad block
    psed_emb = batch['psed_emb']
    sim = torch.cosine_similarity(gen_img_emb.unsqueeze(1), psed_emb.unsqueeze(0), dim=-1)
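
A quick way to confirm the suspicion (my own minimal check, not from the repo): anything computed under torch.no_grad() carries no autograd graph, so a loss built from it cannot backpropagate:

import torch

x = torch.randn(4, 512, requires_grad=True)

with torch.no_grad():
    emb = x / x.norm(dim=-1, keepdim=True)  # mimics gen_img_emb

sim = torch.cosine_similarity(emb.unsqueeze(1), emb.unsqueeze(0), dim=-1)
print(sim.requires_grad)  # False: no graph was recorded, so a loss built on
                          # this similarity sends no gradients back to x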

looking forward to your reply.

There are problems with the generated HDRIs.

Dear authors, thanks for this amazing work!

I tried to generate HDRIs using the pretrained models and then used these HDRIs to render some balls.

However, the rendering results seem to be brighter than yours. One of them is given below.

I am not sure whether I have missed some steps, so I am raising this issue.

[rendered image attached: hdr_ green grass field with trees and mountains in the distance _balls]

Question about Fig. 2. Spherical Positional Encoding

Hello @FrozenBurning, thank you for sharing your impressive work!

I have read your paper and have a question regarding Figure 2. The spherical positional encoding diagram appears similar to the Integrated Positional Encoding (IPE) proposed by Mip-NeRF. However, I am uncertain about the mean and variance related to the sampling process, as they are not explained in the paper.

Would it be possible for you to kindly provide some explanation regarding the figure, specifically regarding the mean and variance of the sampling process? This would greatly enhance my understanding of your work.
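
For reference, my current reading of the figure is a plain frequency encoding applied to the spherical coordinates, roughly as in the sketch below (purely my own assumption; the mean and variance terms in the figure are exactly the part I cannot map onto this):

import torch

def spherical_pe(theta, phi, num_freqs=6):
    # Naive (non-integrated) positional encoding of spherical coordinates:
    # sin/cos of each angle at exponentially growing frequencies. This has no
    # notion of mean or variance, which is what my question is about.
    coords = torch.stack([theta, phi], dim=-1)                   # (..., 2)
    freqs = 2.0 ** torch.arange(num_freqs, dtype=coords.dtype)   # (F,)
    angles = coords[..., None] * freqs                           # (..., 2, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)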

Thank you very much!

[figure attached]

lighting estimation

Hi, thanks for sharing your great work!

I am curious whether it is possible to do lighting estimation using GAN inversion, as in StyleLight?

Error with the environment

Thanks for releasing this fantastic code. When I run it, I get the following error:
[screenshot of error attached]

Do you know how to solve this problem? Thanks!

Trying to generate 8K hdr images

Dear author,

Thanks for releasing this fantastic code. I tried to generate an 8K HDR image with your repo but ran into an issue.
I changed the input resolution from h=512, w=1024 to h=1024, w=2048 in this file: https://github.com/FrozenBurning/Text2Light/blob/master/text2light.py#L74

The generated holistic image looks good, since these parameters do not affect that stage.
[image attached: holistic_ Sunset California beach]
However, the 2048x1024 LDR output has some problems (I resized it manually for uploading):
[image attached: ldr_ Sunset California beach]

It seems that the SPE coordinates should change with the input width and height, but I'm not sure exactly what should be adjusted. Could you please look into it?
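
For what it's worth, what I expected is that the per-pixel spherical coordinates are rebuilt from the requested resolution, roughly like the sketch below (my guess only, not code from the repo); if the SPE grid is fixed at 512x1024 somewhere, that would explain the artifacts at 1024x2048:

import numpy as np

def sphere_coords(h, w):
    # Per-pixel spherical coordinates for an equirectangular panorama of
    # size (h, w), sampled at pixel centers.
    theta = (np.arange(h) + 0.5) / h * np.pi        # polar angle in [0, pi]
    phi = (np.arange(w) + 0.5) / w * 2.0 * np.pi    # azimuth in [0, 2*pi)
    return np.meshgrid(phi, theta)                  # two (h, w) arrays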

The command line I used is
python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text "Beijing afternoon 4 pm" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4

Thank you!

Best wishes

No HDR export

Unless I'm missing something, it doesn't seem to output HDR images, only PNGs. Am I doing something wrong?
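
In case it is useful, this is how I would normally dump a linear float panorama to a Radiance .hdr file with OpenCV, assuming the script exposes the HDR array somewhere (I may simply be missing the flag that enables the HDR stage):

import cv2
import numpy as np

def save_hdr(path, rgb):
    # `rgb` is an HxWx3 float32 array of linear radiance values.
    # OpenCV writes Radiance .hdr files but expects BGR channel order.
    bgr = cv2.cvtColor(rgb.astype(np.float32), cv2.COLOR_RGB2BGR)
    cv2.imwrite(path, bgr)

# save_hdr("panorama.hdr", my_hdr_array)  # my_hdr_array is hypothetical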

Bug in Google Colab version.

Hello, while trying to execute the last block of code from the Google Colab notebook, I get this error:

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

[<ipython-input-21-027de206138d>](https://localhost:8080/#) in <module>
     10 if opt.resume_global:
     11     if not os.path.exists(opt.resume_global):
---> 12         raise ValueError("Cannot find {}".format(opt.resume_global))
     13     print("Resuming from global sampler ckpt...")
     14     assert os.path.isdir(opt.resume_global), opt.resume_global

ValueError: Cannot find ./logs/text2light_released_model/global_sampler_clip/

All other code blocks seemed to have executed correctly, so I'm not entirely sure what the problem is.

Edit
Actually, it seems that it didn't download all the files it needed from Google Drive. Switching over to the OneDrive files seems to have fixed it.
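
For anyone else hitting this, a quick sanity check before running the inference cell would have caught it (the path is taken from the error above; the local sampler directory can be checked the same way):

import os

ckpt_dir = "./logs/text2light_released_model/global_sampler_clip/"
print(ckpt_dir, "exists" if os.path.isdir(ckpt_dir) else "is MISSING")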

How to generate richer images

This is really great work. I have been very interested in generating panoramic images recently and have also tried your project. But why are the images generated from the same text always identical? Is there something I need to set somewhere?
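
I am not sure what the script does internally, but in case a random seed is fixed somewhere, a generic way to get different samples from the same prompt is to re-seed before each generation, e.g.:

import random
import torch

def reseed(seed):
    # Hypothetical helper: call with a different value before each run so
    # that the stochastic sampling produces a different panorama.
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)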

Files under taming/data do not exist

Dear sir,

Hi, I am working on your wonderful work, Text2Light.
I've found that the taming/data directory and the files under it are not available in your repo.

I've also tried to take the same files from https://github.com/CompVis/taming-transformers, but some classes there do not match your setup (e.g., taming/data/custom.py does not contain CustomTrainHolistic or CustomTestHolistic).
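
In case it clarifies what I am missing, the closest I could get is a stand-in modeled on the upstream CustomTrain (just my guess at the interface; it presumably does not do the holistic preprocessing your dataloader needs):

from torch.utils.data import Dataset
from taming.data.base import ImagePaths  # present in the upstream taming-transformers repo

class CustomBase(Dataset):
    def __init__(self):
        super().__init__()
        self.data = None

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

class CustomTrainHolistic(CustomBase):
    # Hypothetical stand-in: reads a list of panorama paths from a text file,
    # exactly as the upstream CustomTrain does.
    def __init__(self, size, training_images_list_file):
        super().__init__()
        with open(training_images_list_file, "r") as f:
            paths = f.read().splitlines()
        self.data = ImagePaths(paths=paths, size=size, random_crop=False)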

Could you upload the files?
Thanks a lot!

Text descriptions

Hi,

I was wondering whether you could share the text descriptions used during the inference stage, so that we can evaluate our method against yours on an apples-to-apples basis.

DataSet

Hello, thanks for your amazing work! Regarding the dataset, I'd like to ask about the resolution of its images: 1K, 2K, 4K, or something else? For example, are they from Poly Haven?

Newly built env: invalid version error

Thank you for this very interesting project. As a visual effects artist, the future of AI visual effects tools excites me.

I am trying to run your project but am hitting a few walls.

$ python text2light.py -rg ./checkpoints/global_sampler_clip -rl ./checkpoints/local_sampler_outdoor --outdir ./generated_panorama --text "a landscape of mountains with clouds on a sunny day" --clip ./clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
Resuming from global sampler ckpt...
logdir:./checkpoints/global_sampler_clip
./checkpoints/global_sampler_clip/checkpoints/last.ckpt
Deleting the first-stage restore-ckpt path from the config...
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Traceback (most recent call last):
  File "text2light.py", line 296, in <module>
    global_sampler = load_model(config, ckpt, gpu, eval_mode)
  File "text2light.py", line 259, in load_model
    model = load_model_from_config(config.model, state_dict, gpu=gpu, eval_mode=eval_mode)["model"]
  File "text2light.py", line 242, in load_model_from_config
    model = instantiate_from_config(config)
  File "/media/po/FasterTheFaster/apps/Text2Light/taming/util.py", line 28, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/media/po/FasterTheFaster/apps/Text2Light/taming/models/global_sampler.py", line 135, in __init__
    super().__init__(transformer_config, first_stage_config, cond_stage_config, permuter_config, ckpt_path, ignore_keys, first_stage_key, cond_stage_key, downsample_cond_size, pkeep, sos_token, unconditional)
  File "/media/po/FasterTheFaster/apps/Text2Light/taming/models/base_sampler.py", line 39, in __init__
    self.transformer = instantiate_from_config(config=transformer_config)
  File "/media/po/FasterTheFaster/apps/Text2Light/taming/util.py", line 28, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/media/po/FasterTheFaster/apps/Text2Light/taming/util.py", line 23, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/media/po/FasterTheFaster/apps/Text2Light/taming/modules/transformer/mingpt.py", line 17, in <module>
    from transformers import top_k_top_p_filtering
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/__init__.py", line 43, in <module>
    from . import dependency_versions_check
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 41, in <module>
    require_version_core(deps[pkg])
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/utils/versions.py", line 94, in require_version_core
    return require_version(requirement, hint)
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/utils/versions.py", line 85, in require_version
    if want_ver is not None and not ops[op](version.parse(got_ver), version.parse(want_ver)):
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/packaging/version.py", line 52, in parse
    return Version(version)
  File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/packaging/version.py", line 197, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '0.10.1,<0.11'

Not being a master of Python, it appears to me that one of the Anaconda-installed packages is out of date?
