
openlrm's Introduction


3DTopia

A two-stage text-to-3D generation model. The first stage uses a diffusion model to quickly generate candidates. The second stage refines the assets chosen from the first stage.

demo.mp4

News

[2024/03/10] Our captions for Objaverse are released here.

[2024/03/04] Our technical report is released here.

[2024/01/18] We release a text-to-3D model 3DTopia!

Citation

@article{hong20243dtopia,
  title={3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors},
  author={Hong, Fangzhou and Tang, Jiaxiang and Cao, Ziang and Shi, Min and Wu, Tong and Chen, Zhaoxi and Wang, Tengfei and Pan, Liang and Lin, Dahua and Liu, Ziwei},
  journal={arXiv preprint arXiv:2403.02234},
  year={2024}
}

1. Quick Start

1.1 Install Environment for this Repository

We recommend using Anaconda to manage the environment.

conda env create -f environment.yml
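
Once the environment is created, activate it before running any of the commands below. The environment name is defined in environment.yml; "3dtopia" below is an assumption, so substitute the actual name if it differs.

conda activate 3dtopia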

1.2 Install Second Stage Refiner

Please refer to threefiner to install our second-stage mesh refiner. We have tested installing both environments together with PyTorch 1.12.0 and CUDA 11.3.

1.3 Download Checkpoints [Optional]

We have implemented automatic checkpoint download for both gradio_demo.py and sample_stage1.py. If you prefer to download manually, you may download the checkpoint 3dtopia_diffusion_state_dict.ckpt or model.safetensors from Hugging Face.
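
If you prefer a scripted download, here is a minimal sketch using the huggingface_hub Python package. The repository id is a placeholder (use the repository linked above); hf_hub_download returns the local path of the cached file.

from huggingface_hub import hf_hub_download

# "<HF_REPO_ID>" is a placeholder for the 3DTopia checkpoint repository on Hugging Face
ckpt_path = hf_hub_download(repo_id="<HF_REPO_ID>", filename="3dtopia_diffusion_state_dict.ckpt")
print(ckpt_path)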

Q&A

  • If you encounter the error ImportError: /lib64/libc.so.6: version 'GLIBC_2.25' not found in the second stage, try installing a lower version of pymeshlab with pip install pymeshlab==0.2.

2. Inference

2.1 First Stage

Run the following command to sample "a robot" in the first stage. Results will be located under the results folder.

python -u sample_stage1.py --text "a robot" --samples 1 --sampler ddim --steps 200 --cfg_scale 7.5 --seed 0

Arguments:

  • --ckpt specifies the checkpoint file path;
  • --test_folder controls which subfolder all the results are put in;
  • --seed fixes the random seed;
  • --sampler can be set to ddim for DDIM sampling (by default, 1000-step DDPM sampling is used);
  • --steps controls the number of sampling steps (DDIM only);
  • --samples controls the number of samples;
  • --text is the input text;
  • --no_video and --no_mcubes suppress rendering of multi-view videos and marching-cubes extraction, which are enabled by default;
  • --mcubes_res controls the resolution of the 3D volume sampled for marching cubes; lower this resolution to save GPU memory;
  • --render_res controls the resolution of the rendered video.
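
For example, a combined invocation that skips the multi-view video and uses a smaller marching-cubes volume might look like the following (the flag values are illustrative only, based on the options listed above):

python -u sample_stage1.py --text "a chair" --samples 4 --sampler ddim --steps 200 --cfg_scale 7.5 --seed 0 --no_video --mcubes_res 128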

2.2 Second Stage

The second stage consists of two refinement steps. Here is a simple example; please refer to threefiner for more detailed usage.

# step 1
threefiner sd --mesh results/default/stage1/a_robot_0_0.ply --prompt "a robot" --text_dir --front_dir='-y' --outdir results/default/stage2/ --save a_robot_0_0_sd.glb
# step 2
threefiner if2 --mesh results/default/stage2/a_robot_0_0_sd.glb --prompt "a robot" --outdir results/default/stage2/ --save a_robot_0_0_if2.glb

The resulting mesh can be found at results/default/stage2/a_robot_0_0_if2.glb

3. Acknowledgement

We thank the community for building and open-sourcing the foundations of this work. Specifically, we want to thank EG3D and Stable Diffusion for their code. We also want to thank Objaverse for the wonderful dataset.

openlrm's People

Contributors

tengfei-wang, zexinhe


openlrm's Issues

DINO encoder

Dear authors:

Thank you for your great work! I wonder why you used DINOv1 instead of DINOv2, which is more suitable for dense prediction tasks. Thank you!

How to load custom-trained model and inference?

Thank you for releasing the training code. I have trained a model and I am wondering how to load it at the inference stage.
Could you please give me some example scripts or advice?

The checkpoint structure:
exps/..../000100/
custom_checkpoint_0.pkl model.safetensors optimizer.bin random_states_0.pkl

Fixed focal

Thanks for your great work!
I noticed that you used a fixed focal length when rendering Objaverse, which can cause the focal length of the input image to be inconsistent with the camera model. I'm curious why you didn't use a random focal length.

Issue serving a model trained with the provided training code

I'm trying to run inference on a custom model, trained with the provided code, but there seems to be a problem with building the model:

self.model = self._build_model(self.cfg).to(self.device)

def _build_model(self, cfg):
    from openlrm.models import model_dict
    hf_model_cls = wrap_model_hub(model_dict[self.EXP_TYPE])
    model = hf_model_cls.from_pretrained(cfg.model_name)
    return model

(venv) root@bc700a1d6a6c:/workspace/OpenLRM# python -m openlrm.launch infer.lrm --infer=configs/infer-b.yaml model_name=exps/checkpoints/lrm-objaverse/overfitting-test/001000 image_input=test.png export_video=true export_mesh=true
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
config.json not found in /workspace/OpenLRM/exps/checkpoints/lrm-objaverse/overfitting-test/001000
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/OpenLRM/openlrm/launch.py", line 36, in <module>
    main()
  File "/workspace/OpenLRM/openlrm/launch.py", line 31, in main
    with RunnerClass() as runner:
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 121, in __init__
    self.model = self._build_model(self.cfg).to(self.device)
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 126, in _build_model
    model = hf_model_cls.from_pretrained(cfg.model_name)
  File "/workspace/OpenLRM/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/workspace/OpenLRM/venv/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py", line 277, in from_pretrained
    instance = cls._from_pretrained(
  File "/workspace/OpenLRM/venv/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py", line 485, in _from_pretrained
    model = cls(**model_kwargs)
TypeError: wrap_model_hub.<locals>.HfModel.__init__() missing 1 required positional argument: 'config'

and the folder that is passed as model_name argument looks like this:

exps
|-- checkpoints
|   `-- lrm-objaverse
|       `-- overfitting-test
|           `-- 001000
|               |-- custom_checkpoint_0.pkl
|               |-- model.safetensors
|               |-- optimizer.bin
|               `-- random_states_0.pkl

which contains a file named model.safetensors as required by huggingface_hub when initialising from path.

From some tests, it seems that hf_model_cls.from_pretrained needs the "model" section of configs/train-sample.yaml passed as a dictionary:

model:
    camera_embed_dim: 1024
    rendering_samples_per_ray: 96
    transformer_dim: 512
    transformer_layers: 12
    transformer_heads: 8
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 32
    encoder_type: dinov2
    encoder_model_name: dinov2_vits14_reg
    encoder_feat_dim: 384
    encoder_freeze: false

But even so, after passing this as a dictionary, the code breaks a bit further:

(venv) root@bc700a1d6a6c:/workspace/OpenLRM# python -m openlrm.launch infer.lrm --infer=configs/infer-b.yaml model_name=exps/checkpoints/lrm-objaverse/overfitting-test/001000 image_input=test.png export_video=true export_mesh=true
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
config.json not found in /workspace/OpenLRM/exps/checkpoints/lrm-objaverse/overfitting-test/001000
[2024-03-25 15:34:32,383] openlrm.models.modeling_lrm: [INFO] Using DINOv2 as the encoder
/workspace/OpenLRM/openlrm/models/encoders/dinov2/layers/swiglu_ffn.py:43: UserWarning: xFormers is available (SwiGLU)
  warnings.warn("xFormers is available (SwiGLU)")
/workspace/OpenLRM/openlrm/models/encoders/dinov2/layers/attention.py:27: UserWarning: xFormers is available (Attention)
  warnings.warn("xFormers is available (Attention)")
/workspace/OpenLRM/openlrm/models/encoders/dinov2/layers/block.py:39: UserWarning: xFormers is available (Block)
  warnings.warn("xFormers is available (Block)")
Loading weights from local directory
  0%|          | 0/1 [00:00<?, ?it/s]/workspace/OpenLRM/openlrm/datasets/cam_utils.py:153: UserWarning: Using torch.cross without specifying the dim arg is deprecated.
Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at ../aten/src/ATen/native/Cross.cpp:63.)
  x_axis = torch.cross(up_world, z_axis)
  0%|          | 0/1 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/OpenLRM/openlrm/launch.py", line 36, in <module>
    main()
  File "/workspace/OpenLRM/openlrm/launch.py", line 32, in main
    runner.run()
  File "/workspace/OpenLRM/openlrm/runners/infer/base_inferrer.py", line 62, in run
    self.infer()
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 298, in infer
    self.infer_single(
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 258, in infer_single
    mesh = self.infer_mesh(planes, mesh_size=mesh_size, mesh_thres=mesh_thres, dump_mesh_path=dump_mesh_path)
  File "/workspace/OpenLRM/openlrm/runners/infer/lrm.py", line 221, in infer_mesh
    vtx_colors = self.model.synthesizer.forward_points(planes, vtx_tensor)['rgb'].squeeze(0).cpu().numpy()  # (0, 1)
  File "/workspace/OpenLRM/openlrm/models/rendering/synthesizer.py", line 206, in forward_points
    for k in outs[0].keys()
IndexError: list index out of range

Could anyone help here?

MVImgNet code

Could you please release the code for processing the MVImgNet dataset? I notice that you used it in your experiments, but the code for this part is commented out.

About training dataset

Hi, could you provide your rendering code for Objaverse? We want to train OpenLRM ourselves.

Error in exporting mesh and video when applying official examples

Similar to closed issue #1, but I did not find an answer.

When I try the official example for exporting videos, the videos are blank:
python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_video

The videos I get show a white image with a faintly visible object rotating in the center; the object is very unclear against the white background. I have tried this on both a 2080 Ti and a 3090, and also used the openlrm-small-obj-1.0 model and other example images, but the results are similar.

When I try the official example for exporting a mesh, the vertices are empty.

python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_mesh

lrm/models/rendering/synthesizer.py", line 189, in forward_points

for k in outs[0].keys()

IndexError: list index out of range

I looked at the result after executing forward_planes at line 165 of inferrer.py (screenshot attached), and the result after executing model.synthesizer at line 157 of inferrer.py (screenshot attached), in which the images_rgb values for each frame seem to be abnormal.

How can I fix this and get results similar to the official example?

question about inference

Hello! When I run python -m openlrm.launch infer.lrm --infer "./configs/infer-b.yaml" model_name="zxhezexin/openlrm-mix-base-1.1" image_input="./assets/sample_input/owl.png" export_video=true export_mesh=true as the example did, it failed with the following error message:
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/data1/ubu/OpenLRM/openlrm/launch.py", line 36, in
main()
File "/data1/ubu/OpenLRM/openlrm/launch.py", line 31, in main
with RunnerClass() as runner:
^^^^^^^^^^^^^
File "/data1/ubu/OpenLRM/openlrm/runners/infer/lrm.py", line 121, in init
self.model = self._build_model(self.cfg).to(self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data1/ubu/OpenLRM/openlrm/runners/infer/lrm.py", line 126, in _build_model
model = hf_model_cls.from_pretrained(cfg.model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubu/anaconda3/envs/lrm/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/ubu/anaconda3/envs/lrm/lib/python3.11/site-packages/huggingface_hub/hub_mixin.py", line 277, in from_pretrained
instance = cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/home/ubu/anaconda3/envs/lrm/lib/python3.11/site-packages/huggingface_hub/hub_mixin.py", line 485, in _from_pretrained
model = cls(**model_kwargs)
^^^^^^^^^^^^^^^^^^^
TypeError: wrap_model_hub.<locals>.HfModel.__init__() missing 1 required positional argument: 'config'

My environment: python=3.11, torch=2.1.2, cuda=11.8.
Is there anything wrong with my setup?

Computation Cost

Thanks for open-sourcing this wonderful work.

I'm curious about the computation cost for training your LRM on the Objaverse dataset.

For example, it takes 128 A100-40G GPUs for 3 days to complete training on the Objaverse + MVImgNet dataset in the original paper.

Question about default source camera and the render camera setting.

I noticed that only when the source camera follows the setting azimuth=270, elevation=0 does the triplane render the same image as the input under the input pose.
My question is: when I have an image under pose $T_0$ and I want to render an image under pose $T_1$, how should I adjust the render camera?
I've tried computing the ref_pose $T_2 = T_d T_0^{-1}$, where $T_d$ is the default pose, and converting $T_1$ with the ref_pose, but it gives a completely wrong image.
In my setting, the camera distance is 1.5, the input azimuth is 2.3562 rad and elevation is -0.1317 rad; the target azimuth is 0 and elevation is 0.52359878 rad.
(Input and target images attached; both are rendered by Blender.)

How can I solve this issue in the Colab implementation?

Example usage

EXPORT_VIDEO=True
EXPORT_MESH=True
INFER_CONFIG="./configs/infer-b.yaml"
MODEL_NAME="zxhezexin/openlrm-mix-base-1.1"
IMAGE_INPUT="./assets/sample_input/owl.png"

!python -m openlrm.launch infer.lrm --infer $INFER_CONFIG model_name=$MODEL_NAME image_input=$IMAGE_INPUT export_video=$EXPORT_VIDEO export_mesh=$EXPORT_MESH

/usr/bin/python3: No module named openlrm

Export mesh not working

I am trying the official example, but the vertices are empty:

python -m lrm.inferrer --model_name lrm-base-obj-v1 --source_image ./assets/sample_input/owl.png --export_mesh

lrm/models/rendering/synthesizer.py", line 189, in forward_points
for k in outs[0].keys()
IndexError: list index out of range

Fine tuning on a dataset

  1. Do we need to set the 'views' folder as the root path in the training config (the views folder contains rgba and pose folders), or do we need to add the models to that directory as well?
  2. How do we configure the model for fine-tuning?

CUDA OOM in model v1.1; Not able to run v1.0

Hi,

thanks for open-sourcing the code.

After today's push, I hit a CUDA OOM issue with all 6 v1.1 models on the new code base, even on a 4090 with 24 GB.

Meanwhile, I have trouble loading the v1.0 model with the current code base. It shows json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 3 column 1 (char 44).

The v1.0 model works with the previous commit d4caebb.

I guess there is some trouble with the code base. Could you please have a look?

Thanks!

Details about the dataset

Thanks for open-sourcing this amazing work. I am curious about the details of the dataset used for training, e.g. the scale and camera radius used when rendering the Objaverse dataset. Also, what is the orientation of the camera coordinate system: is it x front, y right, and z up?

Possible to release rendered Objaverse images?

Thank you so much for this fantastic work!

Though I noticed the documentation for the training stage is not available yet, may I have the images rendered from the Objaverse dataset so that I can finetune the models myself? Rendering the whole Objaverse dataset would take a long time.

How to inference custom-trained model

Hello OpenLRM Team,

Following the guidance provided in issue #24, I've managed to run python scripts/convert_hf.py --config <YOUR_EXACT_TRAINING_CONFIG> using the files from exps/checkpoints such as custom_checkpoint_0.pkl and model.safetensors. This process successfully generated a config.json file and model.safetensors in the releases folder. However, I'm struggling to use these files for inference. Following the instructions in the readme section on inference, I keep encountering 'key not found' errors. It seems I might be setting the properties incorrectly, but I can't figure out the right approach. Could you provide additional guidance on how to proceed with inference using trained models? Any assistance would be greatly appreciated.

Release the scene level model

Dear Authors,

I greatly appreciate your efforts in open-sourcing this impressive model. I am interested in the upcoming release of the scene-level pretrained model on MVImgNet + Objaverse. Could you kindly share the expected release date for this? Additionally, I would like to inquire whether there are any intentions to train the model on Objaverse-XL in the future. Thank you for your attention to these queries.

The performance of released model

I did not find information about the evaluation of the released models. Are there any evaluations of the released models that can be compared to the original paper under the same settings?

Inference error

$ python -m lrm.inferrer --model_name openlrm-base-obj-1.0 --source_image ./assets/sample_input/owl.png --export_video

======== Loaded model from checkpoint ========
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/inferrer.py", line 265, in
inferrer.infer(
File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/inferrer.py", line 210, in infer
results = self.infer_single(
File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/inferrer.py", line 155, in infer_single
planes = self.model.forward_planes(image, source_camera)
File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/models/generator.py", line 91, in forward_planes
planes = self.transformer(image_feats, camera_embeddings)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/models/transformer.py", line 126, in forward
x = layer(x, image_feats, camera_embeddings)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/bn/dika/mlx/workspace/project/OpenLRM/lrm/models/transformer.py", line 79, in forward
x = x + self.self_attn(before_sa, before_sa, before_sa)[0]
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1160, in forward
return torch._native_multi_head_attention(
TypeError: _native_multi_head_attention(): argument 'qkv_bias' (position 7) must be Tensor, not NoneType

Seeking clarification on a code snippet in synthesizer.py

Hello 3DTopia,

I hope this message finds you well. I am a user of the OpenLRM repository and while exploring the code, I came across a specific line that I'm struggling to comprehend. The code snippet in question can be found at this link: link to the code snippet.

The line I'm referring to is:

rgb = torch.sigmoid(x[..., 1:]) * (1 + 2*0.001) - 0.001  # Utilizes sigmoid clamping from MipNeRF

I would greatly appreciate it if you could provide some clarification on its intended meaning and functionality. Specifically, I would like to understand how the sigmoid clamping from MipNeRF is being utilized to affect the RGB values.

Thank you for taking the time to help me understand this code better. I eagerly await your response.

Objaverse + MVImgNet Models

Awesome work! I am working on a project with OpenLRM and wondering when the Objaverse + MVImgNet weights may be released.

How the training set is rendered for Objaverse

Hi, amazing work, and it's nice to see an open-sourced version of LRM!

I wonder how the camera and object scale are defined when rendering the Objaverse dataset.
Based on your code, the camera is at a distance of 2, but can you confirm that?
How is the scene normalized?

Thanks

HF Demo - 3D model?

The HF demo seems to output only a video. Why not also expose the 3D model for download?
