The hulc from lukashermann

task_ABC_D dataset cannot be unzipped

Hi, I tried to use the ABC dataset to train the model. I downloaded task_ABC_D.zip and tried to extract it with both unzip and 7z, but neither succeeded. The error was 'start of central directory not found; zip file corrupt'. I am sure the zip file was fully downloaded (518 GB).
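
One way to narrow this down is to check whether the archive still contains its end-of-central-directory record, which is what the error message reports as missing. The following is a minimal sketch using only the Python standard library. The function name `inspect_zip` is made up for illustration, and the check only detects truncation or corruption; it does not repair anything.

```python
import os
import zipfile


def inspect_zip(path):
    """Return (size in bytes, whether the file has a valid zip structure)."""
    size_bytes = os.path.getsize(path)
    # is_zipfile() searches for the end-of-central-directory record near the
    # end of the file; a truncated download usually fails this check.
    looks_valid = zipfile.is_zipfile(path)
    return size_bytes, looks_valid


# Hypothetical usage (the path is a placeholder):
# size, ok = inspect_zip("task_ABC_D.zip")
# print(f"{size} bytes, valid zip structure: {ok}")
```

If the check fails, comparing the reported byte count against the size advertised by the download server is a reasonable next step before downloading again.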

How to evaluate the model in an RL manner

Thank you for your great work! I wonder how to evaluate the trained model in an RL manner. Could you provide an example? Thanks.

Questions about running the code.

Hi, thanks for sharing your work. I ran into this problem when I followed the instructions to run the training code.

I successfully installed the calvin_env.

I don't know why this happens. Might there be something wrong with calvin_env?
It would be great if you could answer my question. Thanks again.

Question about Mixture Distribution

Hello, first of all, thank you for this great work. I have a question about the action decoder. You are using a mixture of 10 logistic distributions, and linear layers after the RNN predict the means, logit_scales, and logit_probs of the components. I am wondering how exactly these distributions are trained.
I understand that the best-fitting component receives the highest weight, but what does that mean for the mean, logit_scale, and logit_probs of the other components? What effect does backpropagation have on them?
Naively, shouldn't all 10 distributions converge to the same optimum?

Thank you for your time and help.
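
To make the question concrete, here is a minimal sketch of the negative log-likelihood of a mixture of logistics, written in plain Python. It is an illustration under simplifying assumptions, not the repository's implementation: it uses a continuous one-dimensional logistic density (the decoder may use a discretized variant), it takes log-scales as input (the question calls them logit_scales), and all function names are invented here. It also returns each component's responsibility, the posterior probability that the component generated the target.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def logistic_log_pdf(x, mean, log_scale):
    """Log-density of a continuous logistic distribution."""
    z = (x - mean) / math.exp(log_scale)
    # log f(x) = -z - log(s) - 2 * log(1 + exp(-z))
    return -z - log_scale - 2.0 * softplus(-z)


def log_softmax(logits):
    """Turn unnormalized logits into log mixture weights."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def mixture_nll_and_responsibilities(x, means, log_scales, logit_probs):
    """Return (-log p(x), per-component responsibilities)."""
    log_w = log_softmax(logit_probs)
    joint = [
        lw + logistic_log_pdf(x, m, s)
        for lw, m, s in zip(log_w, means, log_scales)
    ]
    mx = max(joint)
    log_px = mx + math.log(sum(math.exp(j - mx) for j in joint))
    resp = [math.exp(j - log_px) for j in joint]
    return -log_px, resp


# Two equally weighted components; the target sits on the first mean.
nll, resp = mixture_nll_and_responsibilities(
    0.0, means=[0.0, 5.0], log_scales=[0.0, 0.0], logit_probs=[0.0, 0.0]
)
print(nll, resp)
```

In this simplified view, the gradient of the loss with respect to a component's mean and scale is multiplied by that component's responsibility, so a component far from the target receives almost no update for that sample. The gradient with respect to each mixture logit is the mixture weight minus the responsibility, which pushes the weights toward the responsibilities. Nothing in the loss forces all components onto one optimum; they can settle on different modes of the action distribution.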

Evaluate Pretrained Models - "No module named 'hulc.datasets'"

Hi,

I am trying to evaluate the provided pretrained models. I followed the hulc install instructions, but initially ran into this issue:

Traceback (most recent call last):
  File "hulc/evaluation/evaluate_policy.py", line 10, in <module>
    from calvin_agent.evaluation.utils import get_default_model_and_env
ModuleNotFoundError: No module named 'calvin_agent'

To resolve this I installed the calvin repo.

However, I am now getting the following error:
ImportError: Encountered error: "No module named 'hulc.datasets'" when loading module 'hulc.datasets.hulc_data_module.HulcDataModule'

It seems like the hulc/datasets folder no longer exists in this repository?

Any help is appreciated, thanks!
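
A quick way to see which of the modules named in these errors can be resolved in the active environment is to query the import system directly. This is a minimal sketch using the standard library; the helper name `is_importable` is invented here, and the module names are simply the ones that appear in the error messages above. Note that checking a dotted name imports its parent package as a side effect.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError (an ImportError) when a parent
        # package is missing, and ValueError for some malformed names.
        return False


for name in ("calvin_agent", "hulc", "hulc.datasets"):
    print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```

If `hulc` is found but `hulc.datasets` is not, the installed package layout no longer matches the module path stored in the checkpoint's configuration.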

Training Time

Hi, could I also ask how long each epoch takes when training your HULC model on 8x NVIDIA RTX 2080Ti? Under my setup, training consumes a lot of memory (200+ GB).

Which language model did you use to get the "lang_paraphrase-MiniLM-L3-v2" embedding?

I am trying to test the learned policy interactively by giving the agent an arbitrary command rather than a predefined task description. I saw the file hulc/evaluation/test_policy_interactive.py, but it initializes the language encoder as SBert("mpnet"). Could I ask which language model (ideally with PyTorch code) you used to compute the "lang_paraphrase-MiniLM-L3-v2" embedding used for training?
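
For reference, the embedding name contains "paraphrase-MiniLM-L3-v2", which is also the name of a pretrained model published for the sentence-transformers library. The sketch below assumes, without confirmation from the repository, that the embedding name is that model name with a "lang_" prefix and that the embeddings were produced by the library's standard `encode` call. Both helper functions are hypothetical.

```python
def model_name_from_embedding_folder(folder_name, prefix="lang_"):
    """Strip the assumed 'lang_' prefix to get a candidate model name."""
    if folder_name.startswith(prefix):
        return folder_name[len(prefix):]
    return folder_name


def embed_instructions(instructions, model_name):
    """Encode free-form instructions with a sentence-transformers model."""
    # Imported lazily so the helper above works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() returns one embedding vector per input string.
    return model.encode(list(instructions))


# Hypothetical usage (downloads the model on first call):
# name = model_name_from_embedding_folder("lang_paraphrase-MiniLM-L3-v2")
# vectors = embed_instructions(["open the drawer"], name)
```

Whether the resulting vectors match the stored training embeddings would need to be verified against the dataset, for example by encoding one of the annotated instructions and comparing.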

Error while loading data into shared memory

Hi,

I am currently facing an error while trying to use the shared memory variant of dataset D. The error occurs in the following line:
https://github.com/lukashermann/hulc/blob/main/hulc/datasets/utils/shared_memory_utils.py#L192
where the start_idx variable does not match the current dataset. I re-installed the dataset to make sure nothing was missing and tried to fix the problem myself, but neither helped.

Without the shared memory variant, the code runs without any errors. However, I have some general performance issues on a Slurm cluster with 4x3090. Currently, one epoch of training HULC on task D takes approximately 70 hours without shared memory. I have already experimented with the batch size and the number of workers, but so far that has not helped. Does not using the shared memory dataset really cause such a huge difference in training speed?
Do you have any advice for improving the performance?

Thanks in advance!
Best regards.

ALSA lib error

While running the training code,
I got this ALSA error:

ALSA lib conf.c:5180:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5703:(snd_config_expand) Evaluate error: No such file or directory

Why does the code use ALSA, and how can I fix this?

data process

if "lang" in self.modality_scope:
    latent_goal = self.language_goal(dataset_batch["lang"])

I found that part of the batch data is "vim" and part of it is "lang". Why is it set up this way?
Where is the data processing code? If I want to use embeddings from a different language encoder, how should I change the code?
Thanks for your answer.

Error while running training.py

Hi everyone,

I am trying to run training.py on the debug dataset without using the pretrained model. I am now stuck at this TypeError:

Since I am not sure what 'calvin_env.__file__' refers to here, I am confused.

Thanks in advance for any tips.

Where to find MCIL code?

Hi, thank you so much for this great repo. See question above.

Errors when evaluating the pre-trained model

Thanks for your excellent work!

When I download the pre-trained model and evaluate it, the following error occurs: "No module named 'hulc.models.decoders.action_decoder_gripper_cam_rnn'" when loading module 'hulc.models.decoders.action_decoder_gripper_cam_rnn.ActionDecoderGripperCamRNN'.

It seems that the current repo does not include action_decoder_gripper_cam_rnn.

ValueError: Empty module name

Hi,

Thanks for your great work. I am trying to evaluate the pretrained model by running:

python hulc/evaluation/evaluate_policy.py --dataset_path /home/systemtec/hulc/dataset/task_D_D --train_folder /home/systemtec/hulc/checkpoints/HULC_D_D --checkpoint /home/systemtec/hulc/checkpoints/HULC_D_D/saved_models/HULC_D_D.ckpt --debug

pybullet build time: May 20 2022 19:44:17
Global seed set to 0
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/systemtec/mambaforge/envs/hulc_venv/lib/python3.8/site-packages/hydra/ │
│ _internal/utils.py:570 in _locate │
│ │
│ 567 │ for n in reversed(range(len(parts))): │
│ 568 │ │ try: │
│ 569 │ │ │ mod = ".".join(parts[:n]) │
│ ❱ 570 │ │ │ module = import_module(mod) │
│ 571 │ │ except Exception as e: │
│ 572 │ │ │ if n == 0: │
│ 573 │ │ │ │ raise ImportError(f"Error loading module '{path}'") fr │
│ │
│ ╭───────────────────────────── locals ──────────────────────────────╮ │
│ │ builtins = <module 'builtins' (built-in)> │ │
│ │ import_module = <function import_module at 0x7f2625708040> │ │
│ │ mod = '' │ │
│ │ module = None │ │
│ │ n = 0 │ │
│ │ parts = ['lfp', 'utils', 'transforms', 'NormalizeVector'] │ │
│ │ path = 'lfp.utils.transforms.NormalizeVector' │ │
│ ╰───────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/systemtec/mambaforge/envs/hulc_venv/lib/python3.8/importlib/__init__.p │
│ y:127 in import_module │
│ │
│ 124 │ │ │ if character != '.': │
│ 125 │ │ │ │ break │
│ 126 │ │ │ level += 1 │
│ ❱ 127 │ return _bootstrap._gcd_import(name[level:], package, level) │
│ 128 │
│ 129 │
│ 130 _RELOADING = {} │
│ │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
│ in _gcd_import:1011 │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
│ in _sanity_check:950 │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: Empty module name
ImportError: Error loading module 'lfp.utils.transforms.NormalizeVector'
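
The traceback shows Hydra trying to import 'lfp.utils.transforms.NormalizeVector', so some configuration loaded during evaluation presumably still contains a `_target_` entry with the 'lfp' prefix. The sketch below lists every `_target_` string in a nested configuration so that such entries can be located. It works on plain dicts and lists (an OmegaConf object can be converted first with `OmegaConf.to_container`). The function name and the example config are made up for illustration.

```python
def collect_targets(cfg, path=""):
    """Yield (location, target) pairs for every `_target_` entry in a nested config."""
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            here = f"{path}.{key}" if path else str(key)
            if key == "_target_" and isinstance(value, str):
                yield here, value
            else:
                yield from collect_targets(value, here)
    elif isinstance(cfg, (list, tuple)):
        for index, item in enumerate(cfg):
            yield from collect_targets(item, f"{path}[{index}]")


# Hypothetical config fragment, only to show the shape of the output.
example = {
    "transforms": {
        "train": [{"_target_": "lfp.utils.transforms.NormalizeVector"}]
    }
}
stale = [(loc, t) for loc, t in collect_targets(example) if t.startswith("lfp.")]
print(stale)
```

Which module path should replace a stale entry depends on the installed package versions, so that part is left open here.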

Training crashes with task_ABCD_D dataset

If I run the training on calvin_debug_dataset, everything works fine, but if I use the real dataset task_ABCD_D, the training crashes after completing the initial shared memory loading.
This is the stack trace:

Traceback (most recent call last):
  File "/home/jovyan/lcrs/hulc/hulc/training.py", line 171, in <module>
    train()
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/main.py", line 48, in decorated_main
    _run_hydra(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
    run_and_report(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
    raise ex
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 111, in run
    _ = ret.return_value
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/jovyan/lcrs/hulc/hulc/training.py", line 74, in train
    trainer.fit(model, datamodule=datamodule, ckpt_path=chk)  # type: ignore
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1037, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1284, in _call_setup_hook
    self._call_lightning_datamodule_hook("setup", stage=fn)
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1361, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/jovyan/lcrs/calvin/calvin_models/calvin_agent/datasets/calvin_data_module.py", line 92, in setup
    train_dataset.setup_shm_lookup(train_shm_lookup)
  File "/home/jovyan/lcrs/calvin/calvin_models/calvin_agent/datasets/shm_dataset.py", line 41, in setup_shm_lookup
    key = list(self.episode_lookup_dict.keys())[0]
IndexError: list index out of range

Unused function

Hi, on line 935 of hulc/models/hulc.py, I don't see clip_inference being used. Should this function be removed?

The validation action loss does not go down

First of all, thank you for your excellent work. Your provided policy has a very high success rate.
When training with your HULC module, I found that the training loss went down smoothly. However, the validation loss has not gone down since the first epoch. The following figure shows my action_loss_pp on wandb.

I trained on dataset D's training set and validated on D's validation set.
I checked the validation step and it looks correct to me. I don't know whether the problem lies in D's validation set.
