frozenburning / text2light
[SIGGRAPH Asia 2022] Text2Light: Zero-Shot Text-Driven HDR Panorama Generation
Home Page: https://frozenburning.github.io/projects/text2light/
License: Other
Hey @FrozenBurning
Thanks for your awesome contribution! I am very interested in this work, have run some tests, and currently have a few issues.
Using the provided weight files, there is a significant deviation between the image content and the text. For example:
Command: python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler --outdir ./generated_panorama --text "purple petal flower" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
The images generated by two different inference runs of the above command differ significantly from the text ("purple petal flower").
Command: python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text "Elephant, Watering Hole, Baby Elephant" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
Beyond that, even with the model in eval mode and regardless of the type of local sampler used, the generated HR LDR often has darker, colder tones, while the HDR is overexposed.
Which parameters can be adjusted to improve the generation quality of the HR LDR and HDR images?
Please also tell me how to handle the above problems.
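As a stopgap, I have been considering simple post-processing on the LDR output, along these lines (my own hack with a hypothetical file path and hand-tuned gain, not a repo parameter):

import cv2
import numpy as np

# Stopgap sketch: exposure gain plus gray-world white balance on the LDR
# output; the path and gain below are hypothetical and tuned per image.
ldr = cv2.imread("generated_panorama/ldr_example.png").astype(np.float32) / 255.0
gain = 1.5  # exposure multiplier to counter the dark tones
wb = ldr.mean() / (ldr.mean(axis=(0, 1)) + 1e-8)  # per-channel gray-world factors
out = np.clip(ldr * gain * wb, 0.0, 1.0)
cv2.imwrite("ldr_adjusted.png", (out * 255).astype(np.uint8))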
Thanks!
Can you tell me what dataset this algorithm was trained on?
Thanks!
I would like to thank you for creating such a great work.
I am currently working on panorama prediction, and my pipeline produces LDR 360° panoramas. I would like to feed its output to the SRiTMO module to get HDR panoramas.
However, when I feed the LDR images from the Laval Indoor Dataset (which is what the paper was trained on) to the SRiTMO module, the output values reach into the millions, while the actual HDR images are not that high. Is this normal? Do you expect this to happen?
This behavior causes the RMSE of the inverse tone mapping between SRiTMO and the ground truth to reach the millions, compared to less than one as reported in the paper (Table 3).
Or perhaps I have misunderstood the table. Could you please explain the exact method you used to calculate the RMSE?
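For reference, this is essentially how I compute it, a minimal sketch assuming plain RMSE over linear HDR intensities (the paper may instead use a normalized or log domain, which is exactly what I would like to confirm):

import numpy as np

def rmse(pred_hdr: np.ndarray, gt_hdr: np.ndarray) -> float:
    # Plain RMSE over linear HDR intensities; predictions in the millions
    # dominate this metric, which matches what I observe above.
    return float(np.sqrt(np.mean((pred_hdr - gt_hdr) ** 2)))

def log_rmse(pred_hdr: np.ndarray, gt_hdr: np.ndarray, eps: float = 1e-6) -> float:
    # Alternative guess: RMSE in the log domain, common for inverse
    # tone-mapping evaluation and far less sensitive to absolute scale.
    return float(np.sqrt(np.mean((np.log(pred_hdr + eps) - np.log(gt_hdr + eps)) ** 2)))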
You can find the LDR and HDR images from the Laval Indoor Dataset that I used to test here:
Best regards,
Pakkapon Phongthawee
How can I retrain the model on my own dataset? And what configuration do I need so the model can read a dataset that I provide?
I want to train on my own dataset and have finished all the preprocessing, but it has been tricky doing it on Colab because trainingstage1.py uses a different version of PyTorch Lightning. Does it have to run in a conda environment?
Hey @FrozenBurning
Thanks for your awesome contribution!
I've been particularly interested in SRiTMO for its simplicity and speed in generating HDRs from LDRs. I've been testing it in isolation from the other parts of your algorithm and finding it gives quite good results on regular LDR panoramas.
One thing I've noticed, which I think is related to #6, is that the HDRs sometimes come out overexposed. To rectify this, I need to adjust the balance, luma threshold, and boost values, but these feel like they must be tuned individually for each LDR and are not very generalizable.
I'm curious whether you think we should apply a further normalization according to the luminance scale of "in-the-wild" LDRs, something similar to the luminance-invariant scale normalization done on the training set?
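For concreteness, the kind of operation I have in mind is roughly this (my own guess at a pre-processing step, not code from this repo; the target mean is arbitrary):

import numpy as np

def normalize_luminance(ldr: np.ndarray, target_mean: float = 0.5) -> np.ndarray:
    # ldr: float32 RGB panorama in [0, 1]. Rescale so the Rec. 709 luma
    # has a fixed mean before the image is fed to SRiTMO.
    luma = 0.2126 * ldr[..., 0] + 0.7152 * ldr[..., 1] + 0.0722 * ldr[..., 2]
    scale = target_mean / max(float(luma.mean()), 1e-8)
    return np.clip(ldr * scale, 0.0, 1.0)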
Would love to know your thoughts here!
"ValueError: Cannot find ./logs/text2light_released_model/global_sampler_clip/"
Whenever I try generating a 4K image, I get this error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 GiB (GPU 0; 24.00 GiB total capacity; 4.58 GiB already allocated; 142.00 MiB free; 22.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I have a 3090 with 24 GB of VRAM. Can you please suggest a solution for this?
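So far the only lead I have is the allocator hint from the error message itself; a sketch of applying it before torch is imported (this will not help if the model genuinely needs more than 24 GB at --sr_factor 4):

import os

# Allocator hint suggested by the error message: cap the split block size
# to reduce fragmentation. Must be set before torch is first imported.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402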
"Hello, I am a doctoral student at Zhejiang University, and I am very interested in your research. Can you share your training dataset?"
It seems that the contrastive loss has no gradient with respect to the network. gen_img_emb is generated by the frozen CLIP model, while psed_emb is pre-computed in the dataloader:
with torch.no_grad():
    x_sample_nopix = self.decode_to_img(index_sample, [index_sample.shape[0], 256, 8, 16])  # hack
    preprocess = _transform(224)
    gen_img_emb = self.clip.encode_image(preprocess(x_sample_nopix))
    gen_img_emb /= gen_img_emb.norm(dim=-1, keepdim=True)
psed_emb = batch['psed_emb']
sim = torch.cosine_similarity(gen_img_emb.unsqueeze(1), psed_emb.unsqueeze(0), dim=-1)
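A quick illustrative check of the suspicion, using the variables above:

# Both embeddings are detached from the computation graph: gen_img_emb is
# computed under torch.no_grad() and psed_emb comes from the dataloader,
# so any loss built from sim cannot backpropagate into the sampler.
print(sim.requires_grad)  # expected: False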
Looking forward to your reply.
Dear authors, thanks for this amazing work!
I tried to generate the HDRIs using the pretrained models and render the balls using these HDRIs.
However, the rendering results seem to be brighter than yours. One of them is given below.
I don't know whether I have missed some steps, so I am raising this issue.
Hello @FrozenBurning, thank you for sharing your impressive work!
I have read your paper and have a question regarding Figure 2. The spherical positional encoding diagram appears similar to the Integrated Positional Encoding (IPE) suggested by Mip-NeRF. However, I am uncertain about the mean and variance related to the sampling process, as they are not explained in the paper.
Would it be possible for you to kindly provide some explanation regarding the figure, specifically regarding the mean and variance of the sampling process? This would greatly enhance my understanding of your work.
Thank you very much!
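For other readers with the same question, here is a plain (non-integrated) sinusoidal encoding of spherical directions as a point of comparison; this is a sketch of the standard technique only, not the paper's exact SPE:

import torch

def plain_spherical_encoding(theta: torch.Tensor, phi: torch.Tensor, num_freqs: int = 4) -> torch.Tensor:
    # Encode the unit direction (x, y, z) derived from spherical angles
    # with sin/cos at octave frequencies, as in the original NeRF encoding.
    xyz = torch.stack([
        torch.sin(theta) * torch.cos(phi),
        torch.sin(theta) * torch.sin(phi),
        torch.cos(theta),
    ], dim=-1)
    feats = [xyz]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * xyz))
        feats.append(torch.cos((2.0 ** k) * xyz))
    return torch.cat(feats, dim=-1)

By contrast, IPE in Mip-NeRF replaces the point estimate with the expected encoding of a Gaussian over a conical frustum, which is where a mean and variance would enter; whether the paper's SPE does something analogous is exactly the open question here.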
Can this code run on a 24 GB RTX 3090 GPU?
Thanks!
Hi, thanks for sharing your great work!
I am curious whether it is possible to do lighting estimation using GAN inversion, as in StyleLight?
Dear author,
Thanks for releasing this fantastic code. I tried to generate an 8K HDR image with your repo but ran into some issues.
I changed the input from
h=512 w=1024
to
h=1024 w=2048
in this file: https://github.com/FrozenBurning/Text2Light/blob/master/text2light.py#L74
The generated holistic image is fine, since these parameters do not affect that stage.
However, the [2048x1024] LDR output has some problems (I resized it manually for uploading).
It seems that the SPE coordinates should change with the input width and height, but I'm not sure exactly what should be adjusted. Could you please look into it?
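For reference, this is how I would expect a resolution-dependent spherical grid to be built, a rough sketch of my understanding (hypothetical helper, not the repo's actual code), suggesting the grid may simply need regenerating for the new h and w:

import math
import torch

def make_spherical_grid(h: int, w: int) -> torch.Tensor:
    # Per-pixel (theta, phi) for an equirectangular panorama of size h x w:
    # theta spans [0, pi] top to bottom, phi spans [0, 2*pi) left to right.
    theta = torch.linspace(0.0, math.pi, h)
    phi = torch.linspace(0.0, 2.0 * math.pi, w + 1)[:-1]
    grid_theta, grid_phi = torch.meshgrid(theta, phi, indexing="ij")  # PyTorch >= 1.10
    return torch.stack([grid_theta, grid_phi], dim=-1)  # shape (h, w, 2)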
The command line I used is
python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text "Beijing afternoon 4 pm" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
Thank you!
Best wishes
Unless I'm missing something, it doesn't seem to output HDR images, only PNGs. Am I doing something wrong?
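For context, my understanding is that the .hdr output only appears when the SRiTMO stage runs (i.e., when --sritmo is passed); this is the kind of write call I was expecting to find (a sketch with a hypothetical array, not the repo's code):

import cv2
import numpy as np

# Sketch: write a linear float32 image (values may exceed 1.0) as Radiance
# .hdr. "panorama" stands in for the pipeline's HDR output, in RGB order.
panorama = np.random.rand(512, 1024, 3).astype(np.float32) * 4.0
bgr = np.ascontiguousarray(panorama[..., ::-1])  # OpenCV expects BGR
cv2.imwrite("panorama.hdr", bgr)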
Hello, while trying to execute the last block of code from the Google Colab notebook, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-027de206138d> in <module>
10 if opt.resume_global:
11 if not os.path.exists(opt.resume_global):
---> 12 raise ValueError("Cannot find {}".format(opt.resume_global))
13 print("Resuming from global sampler ckpt...")
14 assert os.path.isdir(opt.resume_global), opt.resume_global
ValueError: Cannot find ./logs/text2light_released_model/global_sampler_clip/
All other code blocks seemed to have executed correctly, so I'm not entirely sure what the problem is.
Edit
Actually, it seems it didn't download all the files it needed from Google Drive. Swapping over to the OneDrive files fixed it.
This is really great work. I have been very interested in generating panoramic images recently and have tried your project. But why are the images generated from the same text always identical? Is there something, such as a random seed, that I need to set somewhere?
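If it is the sampling seed, something like the following standard PyTorch seeding would presumably need to vary between runs (a sketch, not a documented flag of this repo):

import random
import numpy as np
import torch

seed = 1234  # change this between runs to get different panoramas for the same text
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)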
Dear sir,
Hi, I am working on your wonderful work, Text2Light.
I've found that the taming/data directory and the files under it are not available in your repo.
I also tried to get the same files from https://github.com/CompVis/taming-transformers, but some functions do not match your setup (e.g., taming/data/custom.py: CustomTrainHolistic, CustomTestHolistic).
Could you upload the files?
Thanks a lot!
Hi,
I wonder whether you could share the text descriptions used during the inference stage, so that we can evaluate our method against yours on an apples-to-apples basis.
Hello, thanks for your amazing work! Regarding the dataset, I'd like to ask about the resolution of its images: 1K, 2K, 4K, or something else? For example, are they from Poly Haven?
Thank you for this very interesting project. As a visual effects artist, the future of AI visual effects tools excites me.
I am trying to run your project but am hitting a few walls.
$ python text2light.py -rg ./checkpoints/global_sampler_clip -rl ./checkpoints/local_sampler_outdoor --outdir ./generated_panorama --text "a landscape of mountains with clouds on a sunny day" --clip ./clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
Resuming from global sampler ckpt...
logdir:./checkpoints/global_sampler_clip
./checkpoints/global_sampler_clip/checkpoints/last.ckpt
Deleting the first-stage restore-ckpt path from the config...
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Traceback (most recent call last):
File "text2light.py", line 296, in <module>
global_sampler = load_model(config, ckpt, gpu, eval_mode)
File "text2light.py", line 259, in load_model
model = load_model_from_config(config.model, state_dict, gpu=gpu, eval_mode=eval_mode)["model"]
File "text2light.py", line 242, in load_model_from_config
model = instantiate_from_config(config)
File "/media/po/FasterTheFaster/apps/Text2Light/taming/util.py", line 28, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/media/po/FasterTheFaster/apps/Text2Light/taming/models/global_sampler.py", line 135, in __init__
super().__init__(transformer_config, first_stage_config, cond_stage_config, permuter_config, ckpt_path, ignore_keys, first_stage_key, cond_stage_key, downsample_cond_size, pkeep, sos_token, unconditional)
File "/media/po/FasterTheFaster/apps/Text2Light/taming/models/base_sampler.py", line 39, in __init__
self.transformer = instantiate_from_config(config=transformer_config)
File "/media/po/FasterTheFaster/apps/Text2Light/taming/util.py", line 28, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/media/po/FasterTheFaster/apps/Text2Light/taming/util.py", line 23, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
File "/home/po/anaconda3/envs/text2light/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/media/po/FasterTheFaster/apps/Text2Light/taming/modules/transformer/mingpt.py", line 17, in <module>
from transformers import top_k_top_p_filtering
File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/__init__.py", line 43, in <module>
from . import dependency_versions_check
File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 41, in <module>
require_version_core(deps[pkg])
File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/utils/versions.py", line 94, in require_version_core
return require_version(requirement, hint)
File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/transformers/utils/versions.py", line 85, in require_version
if want_ver is not None and not ops[op](version.parse(got_ver), version.parse(want_ver)):
File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/packaging/version.py", line 52, in parse
return Version(version)
File "/home/po/anaconda3/envs/text2light/lib/python3.8/site-packages/packaging/version.py", line 197, in __init__
raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '0.10.1,<0.11'
Not being a master of Python, it appears to me that one of the conda-installed packages is out of date?
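For anyone hitting the same traceback: newer releases of the packaging library (22+) reject the compound version spec '0.10.1,<0.11' that this older transformers build hands it whole, so pinning packaging below 22 is a common workaround (a sketch, not verified against this exact environment; upgrading transformers should also avoid it):

import subprocess
import sys

# Workaround sketch: downgrade packaging so it still tolerates the legacy
# compound version spec passed by this transformers release.
subprocess.check_call([sys.executable, "-m", "pip", "install", "packaging<22"])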