lukashermann / hulc

Hierarchical Universal Language Conditioned Policies

Home Page: http://hulc.cs.uni-freiburg.de

License: MIT License

Languages: Python 97.90%, Shell 2.10%
Topics: computer-vision, deep-learning, grounding, manipulation, natural-language-processing, pytorch, robotics, vision, vision-and-language, vision-language

hulc's People

Contributors

lukashermann, mees

hulc's Issues

task_ABC_D dataset cannot be unzipped

Hi, I tried to use the ABC dataset to train the model. I downloaded task_ABC_D.zip and tried to extract it with both unzip and 7z, but neither succeeded. The error said 'start of central directory not found zip file corrupt'. I am sure the zip file was fully downloaded (518 GB).
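
A quick way to narrow this down is to test the archive before extracting it. The following is a minimal sketch using Python's standard zipfile module; the function name check_zip is hypothetical. The "start of central directory not found" message generally means the directory record at the end of the file is missing, which is consistent with a truncated download, so comparing the file size or a checksum against the published values (if any are provided) is also worthwhile.

```python
import zipfile


def check_zip(path):
    """Return (ok, message) describing whether `path` looks like an intact zip."""
    # is_zipfile searches for the end-of-central-directory record near the end of the file.
    if not zipfile.is_zipfile(path):
        return False, "no central directory found (file may be truncated)"
    with zipfile.ZipFile(path) as archive:
        # testzip reads every member and returns the first one with a bad CRC, or None.
        bad_member = archive.testzip()
    if bad_member is not None:
        return False, f"corrupt member: {bad_member}"
    return True, "archive looks intact"
```

Note that testzip reads the whole archive, so on a file of several hundred gigabytes the second step can take a long time; the is_zipfile check alone is fast.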

Questions about running the code.

Hi, thanks for sharing your work. I ran into this problem when I followed the instructions to run the training code.

[image]
I successfully installed calvin_env.
[image]

I don't know why this happens. Might there be something wrong with calvin_env?
It would be great if you could answer my question. Thanks again.

Question about Mixture Distribution

Hello, first of all, thank you for this great work. I have a question about the action decoder. Here you use a mixture of 10 logistic distributions. Linear layers after the RNN predict the means, logit_scales, and logit_probs of the distributions. I am wondering how exactly you train these distributions.
I know that the best distribution is the one with the highest weight, but what exactly does that mean for the other distributions in terms of mean, logit_scale, and logit_probs? What is the effect of backpropagation here?
Naively, shouldn't all 10 distributions converge to the optimum?

Thank you for your time and help.
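
For intuition, here is a minimal sketch of the generic maximum-likelihood objective for a mixture of logistics, written with the standard library only. It is not taken from the HULC code: it uses the continuous logistic density (the repository may use a discretized variant), and the names below, including log_scales standing in for what the question calls logit_scales, are assumptions for illustration.

```python
import math


def softplus(v):
    # Numerically stable log(1 + exp(v)).
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def logistic_log_pdf(x, mean, log_scale):
    """Log density of a logistic distribution with the given mean and log scale."""
    z = (x - mean) / math.exp(log_scale)
    return -z - log_scale - 2.0 * softplus(-z)


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_nll_and_responsibilities(x, means, log_scales, logit_probs):
    """Negative log-likelihood of x and each component's posterior weight."""
    log_norm = log_sum_exp(logit_probs)  # softmax normalizer for the mixing weights
    joint = [
        lp - log_norm + logistic_log_pdf(x, m, ls)
        for lp, m, ls in zip(logit_probs, means, log_scales)
    ]
    log_likelihood = log_sum_exp(joint)
    # r_k = pi_k * p_k(x) / sum_j pi_j * p_j(x)
    responsibilities = [math.exp(j - log_likelihood) for j in joint]
    return -log_likelihood, responsibilities


nll, resp = mixture_nll_and_responsibilities(
    x=0.5, means=[0.4, -2.0, 3.0], log_scales=[-1.0, -1.0, -1.0], logit_probs=[0.0, 0.0, 0.0]
)
print(nll, resp)
```

Under this objective, the gradient of the loss with respect to component k's mean and scale is that component's own log-density gradient multiplied by its responsibility r_k, and the gradient with respect to its mixing logit is pi_k - r_k. A component that explains a target poorly therefore receives a small update to its mean and scale and has its weight pushed down for that input, so nothing in the loss forces all components onto the same optimum.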

Evaluate Pretrained Models - "No module named 'hulc.datasets'"

Hi,

I am trying to evaluate the provided pretrained models. I followed the hulc install instructions, but initially ran into this issue:

Traceback (most recent call last):
  File "hulc/evaluation/evaluate_policy.py", line 10, in <module>
    from calvin_agent.evaluation.utils import get_default_model_and_env
ModuleNotFoundError: No module named 'calvin_agent'

To resolve this I installed the calvin repo.

However, I am now getting the following error:
ImportError: Encountered error: "No module named 'hulc.datasets'" when loading module 'hulc.datasets.hulc_data_module.HulcDataModule'

It seems like the hulc/datasets folder no longer exists in this repository?

Any help is appreciated, thanks!
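
Before digging into the checkpoint config, it can help to confirm which of the referenced modules the active environment can actually locate. This is a small sketch using the standard importlib.util.find_spec; the helper name missing_modules is hypothetical, and the module names in the usage line are simply the ones quoted in the errors above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of dotted module names that cannot be located."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec imports parent packages first; a missing parent raises ModuleNotFoundError.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


print(missing_modules(["calvin_agent", "hulc", "hulc.datasets"]))
```

If hulc itself is found but hulc.datasets is not, the installed package and the module paths stored in the checkpoint's config are out of sync, which would match the observation that the folder is absent from the repository.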

Training Time

Hi, could I also ask how long one epoch of training your HULC model takes with 8x NVIDIA RTX 2080 Ti? Training consumes a lot of memory (200+ GB) under my setup.

Which language model did you use to get the "lang_paraphrase-MiniLM-L3-v2" embedding?

I am trying to interactively test the learned policy by giving the agent an arbitrary command rather than a predefined task description. I saw the file hulc/evaluation/test_policy_interactive.py, but I found that it initializes the language encoder as SBert("mpnet"). Could I ask which language model you used (ideally with PyTorch code) to get the "lang_paraphrase-MiniLM-L3-v2" embedding used for training?
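
The embedding key resembles the name of a publicly available sentence-transformers model, paraphrase-MiniLM-L3-v2. Assuming that is indeed the encoder (this is inferred from the key name only, not confirmed by the repository), a minimal sketch for embedding free-form instructions could look like the following; embed_instructions is a hypothetical helper, and it requires the sentence-transformers package to be installed.

```python
MODEL_NAME = "paraphrase-MiniLM-L3-v2"  # assumption: inferred from the dataset key name


def embed_instructions(sentences, model_name=MODEL_NAME):
    """Encode instructions with a sentence-transformers model, one vector per sentence."""
    # Imported lazily so this module loads even without the optional dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(list(sentences))
```

Whether the resulting vectors match the stored training embeddings in shape and preprocessing would still need to be checked against the dataset before feeding them to the policy.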

Error while loading data into shared memory

Hi,

I am currently facing an error while trying to use the shared memory variant of dataset D. The error occurs in the following line:
https://github.com/lukashermann/hulc/blob/main/hulc/datasets/utils/shared_memory_utils.py#L192
where the start_idx variable does not match the current dataset. I tried to reinstall the dataset to make sure I had everything, and I tried to fix the problem myself, but neither helped.

[image]

Without the shared memory variant, the code runs without any errors. However, I have some general performance issues on a Slurm cluster with 4x 3090 GPUs. Currently, one epoch of training HULC on task D takes approximately 70 hours without shared memory. I have already experimented with the batch size and the number of workers, but so far it has not helped. Does not using the shared memory dataset cause such a huge difference in training speed?
Do you have any advice for improving performance?

Thanks in advance!
Best regards.

ALSA lib error

While running the training code, I got the following ALSA errors:
[image]

ALSA lib conf.c:5180:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5703:(snd_config_expand) Evaluate error: No such file or directory

Why does this use ALSA, and how can I fix it?

Data processing

if "lang" in self.modality_scope:
    latent_goal = self.language_goal(dataset_batch["lang"])

I found that part of the batch data is "vim" and part of it is "lang". Why is it set up this way?
Where is the data processing code? If I want to use a language embedding from a different language encoder, how can I change the code?
Thanks for your answer.

Error while running training.py

Hi everyone,

I am trying to run training.py on the debug dataset without using the pretrained model. I am now stuck at this TypeError:

'calvin_env file' is 'None'

Since I am not sure what 'calvin_env.file' refers to here, I am confused.

Thanks in advance for any tips.
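
One possible reading, offered as an assumption, is that the message refers to the module attribute calvin_env.__file__ and that the underscores were lost in formatting. In Python, a package imported from a directory that has no __init__.py is a namespace package, and its __file__ is None. The sketch below reproduces that situation with a throwaway directory; package_origin and hypothetical_ns_pkg_demo are illustrative names only.

```python
import importlib
import os
import sys
import tempfile


def package_origin(name):
    """Return the module's __file__, or None when Python has no file for it."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


# Build an empty directory (no __init__.py) and make it importable.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "hypothetical_ns_pkg_demo"))
sys.path.insert(0, root)
importlib.invalidate_caches()

print(package_origin("hypothetical_ns_pkg_demo"))  # namespace package, so no file
print(package_origin("json"))  # regular package, so a path to its __init__ file
```

Running package_origin("calvin_env") in the training environment would show whether this is the case; if it returns None, the calvin_env directory on the import path is probably empty or incomplete rather than a properly installed package.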

Errors when evaluating the pre-trained model

Thanks for your excellent work!

When I download the pre-trained model and evaluate it, the following error occurs: "No module named 'hulc.models.decoders.action_decoder_gripper_cam_rnn'" when loading module 'hulc.models.decoders.action_decoder_gripper_cam_rnn.ActionDecoderGripperCamRNN'.

It seems that the current repo does not include action_decoder_gripper_cam_rnn.

ValueError: Empty module name

Hi,

Thanks for your great work. I am trying to evaluate the pretrained model, and I run:

python hulc/evaluation/evaluate_policy.py --dataset_path /home/systemtec/hulc/dataset/task_D_D --train_folder /home/systemtec/hulc/checkpoints/HULC_D_D --checkpoint /home/systemtec/hulc/checkpoints/HULC_D_D/saved_models/HULC_D_D.ckpt --debug

pybullet build time: May 20 2022 19:44:17
Global seed set to 0
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/systemtec/mambaforge/envs/hulc_venv/lib/python3.8/site-packages/hydra/ │
│ _internal/utils.py:570 in _locate │
│ │
│ 567 │ for n in reversed(range(len(parts))): │
│ 568 │ │ try: │
│ 569 │ │ │ mod = ".".join(parts[:n]) │
│ ❱ 570 │ │ │ module = import_module(mod) │
│ 571 │ │ except Exception as e: │
│ 572 │ │ │ if n == 0: │
│ 573 │ │ │ │ raise ImportError(f"Error loading module '{path}'") fr │
│ │
│ ╭───────────────────────────── locals ──────────────────────────────╮ │
│ │ builtins = <module 'builtins' (built-in)> │ │
│ │ import_module = <function import_module at 0x7f2625708040> │ │
│ │ mod = '' │ │
│ │ module = None │ │
│ │ n = 0 │ │
│ │ parts = ['lfp', 'utils', 'transforms', 'NormalizeVector'] │ │
│ │ path = 'lfp.utils.transforms.NormalizeVector' │ │
│ ╰───────────────────────────────────────────────────────────────────╯ │
│ │
│ /home/systemtec/mambaforge/envs/hulc_venv/lib/python3.8/importlib/__init__.py:127 in import_module │
│ │
│ 124 │ │ │ if character != '.': │
│ 125 │ │ │ │ break │
│ 126 │ │ │ level += 1 │
│ ❱ 127 │ return _bootstrap._gcd_import(name[level:], package, level) │
│ 128 │
│ 129 │
│ 130 _RELOADING = {} │
│ │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
│ in _gcd_import:1011 │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
│ in _sanity_check:950 │
│ ╭──── locals ────╮ │
│ │ level = 0 │ │
│ │ name = '' │ │
│ │ package = None │ │
│ ╰────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: Empty module name
ImportError: Error loading module 'lfp.utils.transforms.NormalizeVector'
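
The locals in the traceback show parts = ['lfp', 'utils', 'transforms', 'NormalizeVector'] with mod = '' at n = 0. In other words, the lookup loop tried every prefix of the dotted path, none could be imported, and it finally called import_module(''), which is what raises "Empty module name". The sketch below mirrors that search with the standard library to show which prefix, if any, is importable; longest_importable_prefix is a hypothetical helper, not part of Hydra.

```python
import importlib


def longest_importable_prefix(dotted_path):
    """Return the longest importable module prefix of a dotted path, or None."""
    parts = dotted_path.split(".")
    # Try "a.b.c", then "a.b", then "a"; the last element is assumed to be the attribute.
    for n in reversed(range(1, len(parts))):
        candidate = ".".join(parts[:n])
        try:
            importlib.import_module(candidate)
        except ImportError:
            continue
        return candidate
    return None


print(longest_importable_prefix("lfp.utils.transforms.NormalizeVector"))
```

A result of None here would indicate that no package named lfp is importable in the environment, suggesting that the config being loaded still refers to a module path that the installed code does not provide.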

Training crashes with task_ABCD_D dataset

If I run the training on calvin_debug_dataset, everything works fine, but if I use the real dataset task_ABCD_D, the training crashes after completing the initial shared memory loading.
This is the stack trace:

Traceback (most recent call last):
  File "/home/jovyan/lcrs/hulc/hulc/training.py", line 171, in <module>
    train()
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/main.py", line 48, in decorated_main
    _run_hydra(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
    run_and_report(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
    raise ex
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 111, in run
    _ = ret.return_value
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/jovyan/lcrs/hulc/hulc/training.py", line 74, in train
    trainer.fit(model, datamodule=datamodule, ckpt_path=chk)  # type: ignore
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1037, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1284, in _call_setup_hook
    self._call_lightning_datamodule_hook("setup", stage=fn)
  File "/opt/conda/envs/lcrs_venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1361, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/home/jovyan/lcrs/calvin/calvin_models/calvin_agent/datasets/calvin_data_module.py", line 92, in setup
    train_dataset.setup_shm_lookup(train_shm_lookup)
  File "/home/jovyan/lcrs/calvin/calvin_models/calvin_agent/datasets/shm_dataset.py", line 41, in setup_shm_lookup
    key = list(self.episode_lookup_dict.keys())[0]
IndexError: list index out of range
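
The last frame takes the first key of episode_lookup_dict via list(...)[0], which raises IndexError exactly when that dictionary is empty. The traceback alone does not say why it is empty. As a debugging aid, a guard like the sketch below would turn the crash into a more descriptive message; the function name and the dataset_dir parameter are hypothetical and not part of calvin_agent.

```python
def first_lookup_key(episode_lookup_dict, dataset_dir):
    """Return the first key of the lookup, failing loudly if the lookup is empty."""
    if not episode_lookup_dict:
        raise ValueError(
            f"episode lookup is empty; check that {dataset_dir!r} contains the expected episodes"
        )
    # next(iter(...)) fetches the first key without building a full list.
    return next(iter(episode_lookup_dict))
```

Printing the length of the lookup and the resolved dataset path just before this call is a cheap way to see whether the dataset was found at all.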

Unused function

Hi, on line 935 of hulc/models/hulc.py, I don't see clip_inference being used. Should this function be removed?

The validation action loss does not go down

First of all, thank you for your excellent work. The policy you provide has a very high success rate.
When training with your hulc module, I found that the training loss went down smoothly. However, the validation loss has not gone down since the first epoch. The following figure shows my action_loss_pp on wandb.
[image: Screenshot 2022-11-05 at 22 20 42]
I trained on dataset D's training set and validated on D's validation set.
I checked the validation step and I think it works correctly. I don't know if the problem lies in D's validation set.
