Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

Website

This repo contains code and data for running HELPER.

Contents

Installation

Environment

(1) Start by cloning the repository:

git clone https://github.com/Gabesarch/HELPER.git

(1a) (optional) If you are using conda, create an environment:

conda create -n helper python=3.8
conda activate helper

(2) Install PyTorch, matching the CUDA version on your machine. For example, for CUDA 11.1:

pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
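
A quick sanity check that PyTorch sees the expected CUDA build (standard PyTorch attributes):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"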

(3) Install additional requirements:

pip install -r requirements.txt

(4) Install Detectron2 (needed for the SOLQ detector) with the matching PyTorch and CUDA versions, e.g., for PyTorch 1.10 and CUDA 11.1:

python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
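
You can verify the install by printing the Detectron2 version (standard attribute):

python -c "import detectron2; print(detectron2.__version__)"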

(5) Install teach:

pip install -e teach

(6) Build SOLQ deformable attention:

cd ./SOLQ/models/ops && sh make.sh && cd ../../..
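
If the build succeeded, the compiled op should import cleanly. SOLQ's deformable attention follows Deformable DETR, whose build script produces a module named MultiScaleDeformableAttention (an assumption here; check SOLQ/models/ops/setup.py if the import fails):

python -c "import MultiScaleDeformableAttention"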

(7) Clone the ZoeDepth repo and check out the tested commit:

git clone https://github.com/isl-org/ZoeDepth.git
cd ZoeDepth
git checkout edb6daf45458569e24f50250ef1ed08c015f17a7
cd ..

TEACh Dataset

  1. Download the TEACh dataset following the instructions in the TEACh repo:

teach_download
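
The TEACh CLI also lets you choose the download directory (assumed -d flag; confirm with teach_download -h). Whatever path the data ends up in is what you later pass as --teach_data_dir:

teach_download -d ./data/teach-dataset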

Model Checkpoints and GPT Embeddings

To run our model on the TEACh dataset, you'll first need the GPT embeddings for example retrieval:

  1. Download the GPT embeddings for example retrieval: here. Unzip the archive so that the gpt_embeddings folder sits in ./data (or in any folder you like, setting the --gpt_embedding_dir argument accordingly). Alternatively, you can download the file with gdown (pip install gdown):
cd data
gdown 1kqZZXdglNICjDlDKygd19JyyBzkkk-UL
unzip gpt_embeddings.zip
rm gpt_embeddings.zip

To run our model with estimated depth and segmentation, download the SOLQ and ZoeDepth checkpoints:

  1. Download the SOLQ checkpoint: here. Place it in the ./checkpoints folder (or anywhere you like, specifying the path with --solq_checkpoint). Alternatively, you can download the file with gdown:
cd checkpoints
gdown 1hTCtTuygPCJnhAkGeVPzWGHiY3PHNE2j
  2. Download the ZoeDepth checkpoint: here. Place it in the ./checkpoints folder (or anywhere you like, specifying the path with --zoedepth_checkpoint). Make sure you have also cloned the ZoeDepth repo (installation step 7). Alternatively, you can download the file with gdown:
cd checkpoints
gdown 1gMe8_5PzaNKWLT5OP-9KKEYhbNxRjk9F

Running the TEACh benchmark

Running the TfD evaluation

  1. (If required) Start an X server if one is not already running on your machine. Open a screen session on the desired node, then run the following to start an X server on that node:
python startx.py 0

Specify the server port number with the argument --server_port (default 0).
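
For example, to run an X server on port 1 in a detached screen session (one possible workflow; screen is only needed on shared or headless nodes):

screen -S xserver
python startx.py 1
# detach with Ctrl-a d, then pass --server_port 1 to main.py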

  2. Set your OpenAI keys. If using Azure, set:
export AZURE_OPENAI_KEY={KEY}
export AZURE_OPENAI_ENDPOINT={ENDPOINT}

Important: if you are using the OpenAI API instead of Azure, append --use_openai to the arguments, then set your OpenAI key:

export OPENAI_API_KEY={KEY}
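
Before launching a long run, you can confirm a key is visible to the process (plain environment check, no API call):

python -c "import os; assert os.getenv('AZURE_OPENAI_KEY') or os.getenv('OPENAI_API_KEY'), 'no OpenAI key set'"
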
  3. Run the agent. To run the agent with all modules and estimated perception on TfD validation unseen, run the following:
python main.py \
 --mode teach_eval_tfd \
 --split valid_unseen \
 --gpt_embedding_dir ./data/gpt_embeddings \
 --teach_data_dir PATH_TO_TEACH_DATASET \
 --server_port X_SERVER_PORT_HERE \
 --episode_in_try_except \
 --use_llm_search \
 --use_constraint_check \
 --run_error_correction_llm \
 --zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
 --solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
 --set_name HELPER_teach_tfd_validunseen

Change the split to --split valid_seen to evaluate on the validation seen set.

Metrics

All metrics will be saved to ./output/metrics/{set_name}. Metrics and videos will also automatically be logged to wandb.
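
For a quick look at the saved metrics from the command line (this sketch assumes the metric files are JSON; adjust if the actual format differs):

python -c "import json, glob; [print(f, json.load(open(f))) for f in glob.glob('./output/metrics/HELPER_teach_tfd_validunseen/*')]"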

Movie generation

To create movies of the agent, append --create_movie to the arguments. By default this renders a movie for every episode to ./output/movies. To change the logging frequency, alter --log_every (e.g., --log_every 10 to render a video every 10 episodes). To remove the map visualization, append --remove_map_vis; rendering the map visual slows down episodes, so removing it can speed them up.
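
For example, to render a movie every 10 episodes without the map overlay, append the following to the command above:

--create_movie --log_every 10 --remove_map_vis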

Ablations

The following argument changes run the ablations (an example command follows this list):

  1. Remove memory-augmented prompting: add --ablate_example_retrieval.
  2. Remove LLM search (locator; random search only): remove --use_llm_search.
  3. Remove constraint check (inspector): remove --use_constraint_check.
  4. Remove error correction (rectifier): remove --run_error_correction_llm.
  5. Change the OpenAI model type: change the --openai_model argument (e.g., --openai_model gpt-3.5-turbo).
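
For example, to ablate memory-augmented prompting and LLM search in a single run, start from the full TfD command above, drop --use_llm_search, and add the ablation flag (all other arguments unchanged):

python main.py \
 --mode teach_eval_tfd \
 --split valid_unseen \
 --gpt_embedding_dir ./data/gpt_embeddings \
 --teach_data_dir PATH_TO_TEACH_DATASET \
 --server_port X_SERVER_PORT_HERE \
 --episode_in_try_except \
 --ablate_example_retrieval \
 --use_constraint_check \
 --run_error_correction_llm \
 --zoedepth_checkpoint ./checkpoints/ZOEDEPTH-model-00015000.pth \
 --solq_checkpoint ./checkpoints/SOLQ-model-00023000.pth \
 --set_name HELPER_tfd_ablate_retrieval_llmsearch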

Ground truth

The following arguments can be added to run with ground truth (see the example after this list):

  1. GT depth: --use_gt_depth. Recommended to also add --increased_explore with estimated segmentation for best performance.
  2. GT segmentation: --use_gt_seg.
  3. GT action success: --use_gt_success_checker.
  4. GT error feedback: --use_GT_error_feedback.
  5. GT constraint check using controller metadata: --use_GT_constraint_checks.
  6. Increase max API fails: --max_api_fails {MAX_FAILS}.
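
For example, to evaluate with ground-truth depth while keeping estimated segmentation, append the following to the TfD command above:

--use_gt_depth --increased_explore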

User Feedback

To run with user feedback, add --use_progress_check. Two additional metric files (for feedback query 1 & 2) will be saved to ./output/metrics/{set_name}.

Running the EDH evaluation

See the teach_edh branch for how to run the TEACh EDH evaluation.

Citation

If you find this work useful, please cite us:

@inproceedings{sarch2023helper,
    title = "Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models",
    author = "Sarch, Gabriel  and
      Wu, Yue  and
      Tarr, Michael  and
      Fragkiadaki, Katerina",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    publisher = "Association for Computational Linguistics",
}
