lukashermann / hulc
Hierarchical Universal Language Conditioned Policies
Home Page: http://hulc.cs.uni-freiburg.de
License: MIT License
Hi, I tried to train the model on the ABC dataset. I downloaded task_ABC_D.zip, but neither unzip nor 7z can extract it. The error says 'start of central directory not found zip file corrupt'. I am sure the zip file was fully downloaded (518 GB).
Thank you for your great work! I wonder how to evaluate the trained model in an RL manner. Can you provide an example? Thanks.
Hello, first of all, thank you for this great work. I have a question about the action decoder. You use a mixture of 10 logistic distributions, and linear layers after the RNN predict the means, logit_scales, and logit_probs of these distributions. I am wondering how exactly the distributions are trained.
I know that the best distribution is the one with the highest weight, but what does that mean for the other distributions in terms of mean, logit_scale, and logit_probs? What is the effect of backpropagation here?
Naively, shouldn't all 10 distributions converge to the optimum?
Thank you for your time and help.
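To make the backpropagation question concrete, here is a minimal sketch of a mixture-of-logistics negative log-likelihood in plain Python. It is a simplified continuous version written for illustration, not the repository's decoder (which may use a discretized variant), and the function names are hypothetical. The per-component "responsibility" it returns is the factor by which each component's gradient is scaled: the gradient of the loss with respect to the parameters of component k equals minus its responsibility times the gradient of that component's log-density, so components far from the target receive almost no pull toward it.

```python
import math


def softplus(v):
    # Numerically stable log(1 + exp(v)).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def logistic_log_pdf(x, mean, log_scale):
    # log f(x) = -z - log(s) - 2 * log(1 + exp(-z)), with z = (x - mean) / s.
    z = (x - mean) / math.exp(log_scale)
    return -z - log_scale - 2.0 * softplus(-z)


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def mixture_nll(x, means, log_scales, logit_probs):
    """Return (negative log-likelihood, responsibilities) for one scalar target."""
    log_w = log_softmax(logit_probs)
    terms = [
        lw + logistic_log_pdf(x, m, s)
        for lw, m, s in zip(log_w, means, log_scales)
    ]
    # log-sum-exp over components gives the mixture log-likelihood.
    mx = max(terms)
    log_lik = mx + math.log(sum(math.exp(t - mx) for t in terms))
    # Posterior probability that each component generated x.
    responsibilities = [math.exp(t - log_lik) for t in terms]
    return -log_lik, responsibilities


# Illustrative values only: three components, target close to the third mean.
nll, resp = mixture_nll(0.9, [-1.0, 0.0, 1.0], [-1.0, -1.0, -1.0], [0.0, 0.0, 0.0])
print(nll, resp)
```

Under this loss nothing forces every component onto the same optimum: a component with low responsibility for a sample is barely updated by it, which is what lets different components specialize on different modes of the action distribution.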
Hi,
I am trying to evaluate the provided pretrained models. I followed the hulc install instructions, but initially ran into this issue:
Traceback (most recent call last):
  File "hulc/evaluation/evaluate_policy.py", line 10, in <module>
    from calvin_agent.evaluation.utils import get_default_model_and_env
ModuleNotFoundError: No module named 'calvin_agent'
To resolve this I installed the calvin repo.
However, I am now getting the following error:
ImportError: Encountered error: "No module named 'hulc.datasets'" when loading module 'hulc.datasets.hulc_data_module.HulcDataModule'
It seems like the hulc/datasets folder no longer exists in this repository?
Any help is appreciated, thanks!
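A quick way to narrow down errors like the two above is to check which of the modules named in the messages can be located in the active environment. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical, and the module names are simply the ones mentioned in the error messages.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the module can be located.

    find_spec does not import the module itself, but it does import
    its parent packages, so a missing parent raises ModuleNotFoundError.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


for name in ("calvin_agent", "hulc", "hulc.datasets"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

If `hulc` is found but `hulc.datasets` is not, the installed package and the configuration stored with the checkpoint probably refer to different versions of the code layout.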
Hi, could I also ask how long one epoch of training your HULC model takes with 8x NVIDIA RTX 2080Ti? Under my setup, training requires a lot of memory (200+ GB).
I am trying to interactively test the learned policy by giving the agent an arbitrary command instead of a predefined task description. I saw the file hulc/evaluation/test_policy_interactive.py, but I found that it initializes the language encoder as SBert("mpnet"). Could I ask which language model you used (ideally with PyTorch code) to get the "lang_paraphrase-MiniLM-L3-v2" embeddings used for training?
Hi,
I am currently facing an error while trying to use the shared memory variant of dataset D. The error occurs in the following line:
https://github.com/lukashermann/hulc/blob/main/hulc/datasets/utils/shared_memory_utils.py#L192
where the start_idx variable does not match the current dataset. I reinstalled the dataset to make sure nothing was missing and tried to fix the error, but neither helped.
Without the shared memory variant, the code runs without any errors. However, I have some general performance issues on a Slurm cluster with 4x3090. Currently, one epoch of training HULC on task D takes approximately 70 hours without shared memory. I have already experimented with the batch size and the number of workers, but so far it did not help. Does not using the shared memory dataset cause such a huge difference in training speed?
Do you have any advice for improving the performance?
Thanks in advance!
Best regards.
if "lang" in self.modality_scope:
    latent_goal = self.language_goal(dataset_batch["lang"])
I found that part of the batch data is "vis" and part of it is "lang". Why is it set up this way?
Where is the data processing code? If I want to use the embeddings of another language encoder, how can I change the code?
Thanks for your answer.
Hi, thank you so much for this great repo. See question above.
Thanks for your excellent work!
When I download the pre-trained model and evaluate it, the following error occurs: "No module named 'hulc.models.decoders.action_decoder_gripper_cam_rnn'" when loading module 'hulc.models.decoders.action_decoder_gripper_cam_rnn.ActionDecoderGripperCamRNN'.
It seems that the current repo does not include action_decoder_gripper_cam_rnn.
Hi,
Thanks for your great work. I am trying to evaluate the pretrained model and I run:
python hulc/evaluation/evaluate_policy.py --dataset_path /home/systemtec/hulc/dataset/task_D_D --train_folder /home/systemtec/hulc/checkpoints/HULC_D_D --checkpoint /home/systemtec/hulc/checkpoints/HULC_D_D/saved_models/HULC_D_D.ckpt --debug
pybullet build time: May 20 2022 19:44:17
Global seed set to 0
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/systemtec/mambaforge/envs/hulc_venv/lib/python3.8/site-packages/hydra/ │
│ _internal/utils.py:570 in _locate │
│ │
│ 567 │ for n in reversed(range(len(parts))): │
│ 568 │ │ try: │
│ 569 │ │ │ mod = ".".join(parts[:n]) │
│ ❱ 570 │ │ │ module = import_module(mod) │
│ 571 │ │ except Exception as e: │
│ 572 │ │ │ if n == 0: │
│ 573 │ │ │ │ raise ImportError(f"Error loading module '{path}'") fr │
│ │
│ ╭───────────────────────────── locals ──────────────────────────────╮ │
│ │ builtins = <module 'builtins' (built-in)> │ │
│ │ import_module = <function import_module at 0x7f2625708040> │ │
│ │ mod = '' │ │
│ │ module = None │ │
│ │ n = 0 │ │
│ │ parts = ['lfp', 'utils', 'transforms', 'NormalizeVector'] │ │
│ │ path = 'lfp.utils.transforms.NormalizeVector' │ │
│ ╰───────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/systemtec/mambaforge/envs/hulc_venv/lib/python3.8/importlib/__init__.p │
│ y:127 in import_module │
│ │
│ 124 │ │ │ if character != '.': │
│ 125 │ │ │ │ break │
│ 126 │ │ │ level += 1 │
│ ❱ 127 │ return _bootstrap._gcd_import(name[level:], package, level) │
│ 128 │
│ 129 │
│ 130 _RELOADING = {} │
│ │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
│ in _gcd_import:1011 │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
│ in _sanity_check:950 │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: Empty module name
ImportError: Error loading module 'lfp.utils.transforms.NormalizeVector'
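The two final messages can be read together: the loop shown in the traceback tries ever shorter prefixes of the dotted path, and when even the top-level package (`lfp` here) cannot be imported, the last attempt is an empty module name. Below is a simplified reconstruction of that loop based only on the lines visible in the traceback. It is not Hydra's full implementation, and the function name `locate_module_prefix` as well as the missing package name in the usage example are hypothetical.

```python
from importlib import import_module


def locate_module_prefix(path):
    """Import the longest importable prefix of a dotted path.

    Returns the imported module and the remaining attribute names.
    """
    parts = path.split(".")
    for n in reversed(range(len(parts))):
        mod = ".".join(parts[:n])
        try:
            return import_module(mod), parts[n:]
        except Exception as e:
            if n == 0:
                # At n == 0 the prefix is "", so import_module raised
                # ValueError("Empty module name"); it becomes the cause here.
                raise ImportError(f"Error loading module '{path}'") from e


# A top-level package that does not exist reproduces both messages.
try:
    locate_module_prefix("no_such_pkg_xyz_123.utils.transforms.NormalizeVector")
except ImportError as err:
    print(repr(err.__cause__))
    print(err)
```

Read this way, "Empty module name" is a side effect: the underlying problem is that no package named `lfp` is importable in the environment, while the configuration being loaded still refers to it.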
If I run the training on calvin_debug_dataset, everything works fine, but if I use the real dataset task_ABCD_D, the training crashes after completing the initial shared memory loading.
This is the stack trace:
Traceback (most recent call last):
File "/home/jovyan/lcrs/hulc/hulc/training.py", line 171, in <module>
train()
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/main.py", line 48, in decorated_main
_run_hydra(
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
raise ex
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
lambda: hydra.run(
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "/home/jovyan/lcrs/hulc/hulc/training.py", line 74, in train
trainer.fit(model, datamodule=datamodule, ckpt_path=chk) # type: ignore
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
call._call_and_handle_interrupt(
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1037, in _run
self._call_setup_hook() # allow user to setup lightning_module in accelerator environment
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1284, in _call_setup_hook
self._call_lightning_datamodule_hook("setup", stage=fn)
File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1361, in _call_lightning_datamodule_hook
return fn(*args, **kwargs)
File "/home/jovyan/lcrs/calvin/calvin_models/calvin_agent/datasets/calvin_data_module.py", line 92, in setup
train_dataset.setup_shm_lookup(train_shm_lookup)
File "/home/jovyan/lcrs/calvin/calvin_models/calvin_agent/datasets/shm_dataset.py", line 41, in setup_shm_lookup
key = list(self.episode_lookup_dict.keys())[0]
IndexError: list index out of range
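The last frame takes the first key of `episode_lookup_dict`, so an empty dictionary produces exactly this IndexError. The sketch below shows the failure mode and a guard that would surface a clearer message. The helper name `first_lookup_key` and the example key are hypothetical and not part of calvin_agent; why the lookup ends up empty for this dataset is not something the trace itself reveals.

```python
def first_lookup_key(episode_lookup_dict):
    """Return the first key of the lookup, failing loudly if it is empty."""
    if not episode_lookup_dict:
        raise ValueError(
            "episode lookup is empty: the shared memory lookup was not "
            "populated for this dataset"
        )
    return next(iter(episode_lookup_dict))


# The pattern from the trace fails on an empty dict.
try:
    key = list({}.keys())[0]
except IndexError as err:
    print("IndexError:", err)

# Hypothetical example key, for illustration only.
print(first_lookup_key({"rgb_static": [0, 1, 2]}))
```

Printing the length of the lookup right after the shared memory loading step would show whether it was filled at all for task_ABCD_D.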
Hi, on line 935 of hulc/models/hulc.py, I don't see clip_inference being used. Should this function be removed?
First of all, thank you for your excellent work. Your provided policy has a very high success rate.
When training with your HULC module, I found that the training loss goes down smoothly. However, the validation loss has not gone down since the first epoch. The following figure shows my action_loss_pp on wandb.
I trained on dataset D's training set and validated on D's validation set.
I checked the validation step and I think it works correctly. I don't know whether this is a problem with D's validation set.