
Paper List in the survey paper

The papers that we survey are listed in this file, grouped into the following categories:

  • Foundation models used in Robotics. For these papers, the authors apply existing vision and language foundation models, such as LLMs, VLMs, vision foundation models, and text-conditioned image generation models, to robotics modules such as perception, decision making and planning, and action.

  • Robotic Foundation Models. For these papers, the authors propose new foundation models for a specific robotic application, such as control via imitation learning or reinforcement learning. We also include general-purpose foundation models, such as GATO and PaLM-E, in this category.

The taxonomy is shown in the figure below.

We list all the papers surveyed in our paper. The dates are based on the first release date on arXiv. This list will be updated continually.

NOTE: We only include papers with experiments on real physical robots, in high-fidelity robotic simulation environments, or on real-world robotics datasets.

Foundation models used in Robotics

Perception

  • CLIPORT CLIPORT: What and Where Pathways for Robotic Manipulation, 24 Sep 2021, paper link
  • LM-Nav LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, 10 Jul 2022, paper link
  • NLMap Open-vocabulary Queryable Scene Representations for Real World Planning, 20 Sep 2022, paper link
  • CLIP-Fields CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory, 11 Oct 2022, paper link
  • VLMap Visual Language Maps for Robot Navigation, 11 Oct 2022, paper link
  • ConceptFusion ConceptFusion: Open-set Multimodal 3D Mapping, 14 Feb 2023, paper link
  • WVN Fast Traversability Estimation for Wild Visual Navigation, 15 May 2023, paper link
  • HomeRobot HomeRobot: Open-Vocabulary Mobile Manipulation, 20 Jun 2023, paper link
  • Act3D Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation, 30 Jun 2023, paper link
  • F3RM Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, 27 Jul 2023, paper link
  • AnyLoc AnyLoc: Towards Universal Visual Place Recognition, 1 Aug 2023, paper link
  • GNFactor GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields, 31 Aug 2023, paper link
  • MOSAIC MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Perception, 15 Sep 2023, paper link

Task Planning

  • Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers, 25 Mar 2022, paper link
  • Socratic Models Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 1 Apr 2022, paper link
  • SayCan Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 4 Apr 2022, paper link
  • Correcting Robot Plans with Natural Language Feedback, 11 Apr 2022, paper link
  • Housekeep Housekeep: Tidying Virtual Households using Commonsense Reasoning, 22 May 2022, paper link
  • Inner Monologue Inner Monologue: Embodied Reasoning through Planning with Language Models, 12 Jul 2022, paper link
  • Code as Policies Code as Policies: Language Model Programs for Embodied Control, 16 Sep 2022, paper link
  • ProgPrompt ProgPrompt: Generating Situated Robot Task Plans using Large Language Models, 22 Sep 2022, paper link
  • VIMA VIMA: General Robot Manipulation with Multimodal Prompts, 6 Oct 2022, paper link
  • LILAC “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy, 6 Jan 2023, paper link
  • SceneDiffuser Diffusion-based Generation, Optimization, and Planning in 3D Scenes, 15 Jan 2023, paper link
  • ChatGPT for Robotics ChatGPT for Robotics: Design Principles and Model Abilities, 20 Feb 2023, paper link
  • Grounded Decoding Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control, 1 Mar 2023, paper link
  • TidyBot TidyBot: Personalized Robot Assistance with Large Language Models, 9 May 2023, paper link
  • Instruct2Act Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, 18 May 2023, paper link
  • KNOWNO Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners, 4 Jul 2023, paper link
  • RoCo RoCo: Dialectic Multi-Robot Collaboration with Large Language Models, 10 Jul 2023, paper link
  • SayPlan SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning, 12 Jul 2023, paper link
  • VLP: Video Language Planning, 16 Oct 2023, paper link
  • SuSIE SuSIE: Subgoal Synthesis via Image Editing, 2023, paper link
  • RoboTool RoboTool: Creative Robot Tool Use with Large Language Models, 23 Oct 2023, project link
  • AutoRT AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents, 5 Jan 2024, paper link

Action Generation

  • SayTap SayTap: Language to Quadrupedal Locomotion, 13 Jun 2023, paper link
  • L2R Language to Rewards for Robotic Skill Synthesis, 14 Jun 2023, paper link
  • VoxPoser VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, 12 Jul 2023, paper link
  • ReasonedExplorer Reasoning about the Unseen for Efficient Outdoor Object Navigation, 18 Sep 2023, paper link
  • Eureka Eureka: Human-Level Reward Design via Coding Large Language Models, 19 Oct 2023, paper link

Data Generation

  • CACTI CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning, 12 Dec 2022, paper link
  • ROSIE Scaling Robot Learning with Semantically Imagined Experience, 22 Feb 2023, paper link
  • GenSim GenSim: Generating Robotic Simulation Tasks via Large Language Models, 2 Oct 2023, paper link
  • RoboGen RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, 2 Nov 2023, paper link
  • RT-Trajectory RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches, 3 Nov 2023, paper link

Robotic Foundation Models

Single-Purpose

Action Generation

  • ZeST Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?, 23 Apr 2022, paper link
  • Behavior Transformers Behavior Transformers: Cloning k modes with one stone, 22 Jun 2022, paper link
  • ATLA Leveraging Language for Accelerated Learning of Tool Manipulation, 27 Jun 2022, paper link
  • LATTE LATTE: LAnguage Trajectory TransformEr, 4 Aug 2022, paper link
  • Perceiver-Actor Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, 12 Sep 2022, paper link
  • MVP Real-World Robot Learning with Masked Visual Pre-training, 6 Oct 2022, paper link
  • GNM GNM: A General Navigation Model to Drive Any Robot, 7 Oct 2022, paper link
  • Interactive Language Interactive Language: Talking to Robots in Real Time, 12 Oct 2022, paper link
  • Conditional Behavior Transformers (C-BeT) From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data, 18 Oct 2022, paper link
  • STAP STAP: Sequencing Task-Agnostic Policies, 21 Oct 2022, paper link
  • LILA LILA: Language-Informed Latent Actions, 31 Oct 2022, paper link
  • DIAL Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models, 21 Nov 2022, paper link
  • RT-1 RT-1: Robotics Transformer for Real-World Control at Scale, Dec 2022, paper link
  • MOO Open-World Object Manipulation using Pre-Trained Vision-Language Models, 2 Mar 2023, paper link
  • VC-1 Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?, 31 Mar 2023, paper link
  • CoTPC Chain-of-Thought Predictive Control, 3 Apr 2023, paper link
  • ARNOLD ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes, 9 Apr 2023, paper link
  • Optimus Imitating Task and Motion Planning with Visuomotor Transformers, 25 May 2023, paper link
  • RoboCat RoboCat: A self-improving robotic agent, 20 Jun 2023, paper link
  • ViNT ViNT: A Foundation Model for Visual Navigation, 26 Jun 2023, paper link
  • Scaling Up and Distilling Down Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition, 26 Jul 2023, paper link
  • RT-2 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, 28 Jul 2023, paper link
  • RoboAgent RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking, 5 Sep 2023, paper link
  • Q-Transformer Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, 18 Sep 2023, paper link
  • RT-X Open X-Embodiment: Robotic Learning Datasets and RT-X Models, 13 Oct 2023, paper link
  • On Bringing Robots Home, 27 Nov 2023, paper link
  • Octo Octo: An Open-Source Generalist Robot Policy, 14 Dec 2023, paper link

General-Purpose

  • GATO A Generalist Agent, 12 May 2022, paper link
  • PACT PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training, 22 Sep 2022, paper link
  • PALM-E PaLM-E: An Embodied Multimodal Language Model, 6 Mar 2023, paper link
  • LEO An Embodied Generalist Agent in 3D World, 18 Nov 2023, paper link

Related Surveys and Repositories

Robotics surveys

  • Reinforcement Learning in Robotics: A Survey, 2013, paper link
  • A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, 2021, paper link
  • How to train your robot with deep reinforcement learning: lessons we have learned, 2021, paper link

Foundation model surveys

  • On the Opportunities and Risks of Foundation Models, 2021, paper link
  • Foundation Models for Decision Making: Problems, Methods, and Opportunities, 2023, paper link
  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond, 2023, paper link
  • Challenges and Applications of Large Language Models, 2023, paper link
  • A Survey on Large Language Model based Autonomous Agents, 2023, paper link

Foundation models and robotics

  • Awesome-LLM-Robotics repo link
  • Foundation Models in Robotics: Applications, Challenges, and the Future, 2023, paper link

BibTex

If you find our survey paper helpful, please consider citing us:

@misc{hu2023robofm,
      title={Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis}, 
      author={Yafei Hu and Quanting Xie and Vidhi Jain and Jonathan Francis and Jay Patrikar and Nikhil Keetha and Seungchan Kim and Yaqi Xie and Tianyi Zhang and Shibo Zhao and Yu-Quan Chong and Chen Wang and Katia Sycara and Matthew Johnson-Roberson and Dhruv Batra and Xiaolong Wang and Sebastian Scherer and Zsolt Kira and Fei Xia and Yonatan Bisk},
      year={2023},
      eprint={2312.08782},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Contributors

jeffreyyh, fxia22, tyz1030
