
Paper List in the survey paper

The papers that we survey are listed in this file, grouped into the following categories:

  • Foundation models used in Robotics. For these papers, the authors apply existing vision and language foundation models, such as LLMs, VLMs, vision foundation models, and text-conditioned image generation models, to robotics modules such as perception, decision making and planning, and action.

  • Robotic Foundation Models. For these papers, the authors propose new foundation models for a specific robotic application, such as control via imitation learning or reinforcement learning. We also include general-purpose foundation models, such as GATO and PaLM-E, in this category.

The taxonomy is shown in the figure below.

We list all the papers surveyed in our paper. The dates are based on the first release date on arXiv. This list will be updated continually.

NOTE: We only include papers with experiments on real physical robots, in high-fidelity robotic simulation environments, or on real-world robotics datasets.

Foundation models used in Robotics

Perception

  • CLIPORT CLIPORT: What and Where Pathways for Robotic Manipulation, 24 Sep 2021, paper link
  • LM-Nav LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, 10 Jul 2022, paper link
  • NLMap Open-vocabulary Queryable Scene Representations for Real World Planning, 20 Sep 2022, paper link
  • CLIP-Fields CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory, 11 Oct 2022, paper link
  • VLMap Visual Language Maps for Robot Navigation, 11 Oct 2022, paper link
  • ConceptFusion ConceptFusion: Open-set Multimodal 3D Mapping, 14 Feb 2023, paper link
  • WVN Fast Traversability Estimation for Wild Visual Navigation, 15 May 2023, paper link
  • HomeRobot HomeRobot: Open-Vocabulary Mobile Manipulation, 20 Jun 2023, paper link
  • Act3D Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation, 30 Jun 2023, paper link
  • F3RM Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, 27 Jul 2023, paper link
  • AnyLoc AnyLoc: Towards Universal Visual Place Recognition, 1 Aug 2023, paper link
  • GNFactor GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields, 31 Aug 2023, paper link
  • MOSAIC MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Perception, 15 Sep 2023, paper link

Task Planning

  • Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers, 25 Mar 2022, paper link
  • Socratic Models Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, 1 Apr 2022, paper link
  • SayCan Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, 4 Apr 2022, paper link
  • Correcting Robot Plans with Natural Language Feedback, 11 Apr 2022, paper link
  • Housekeep Housekeep: Tidying Virtual Households using Commonsense Reasoning, 22 May 2022, paper link
  • Inner Monologue Inner Monologue: Embodied Reasoning through Planning with Language Models, 12 Jul 2022, paper link
  • Code as Policies Code as Policies: Language Model Programs for Embodied Control, 16 Sep 2022, paper link
  • ProgPrompt ProgPrompt: Generating Situated Robot Task Plans using Large Language Models, 22 Sep 2022, paper link
  • VIMA VIMA: General Robot Manipulation with Multimodal Prompts, 6 Oct 2022, paper link
  • LILAC “No, to the Right” – Online Language Corrections for Robotic Manipulation via Shared Autonomy, 6 Jan 2023, paper link
  • SceneDiffuser Diffusion-based Generation, Optimization, and Planning in 3D Scenes, 15 Jan 2023, paper link
  • ChatGPT for Robotics ChatGPT for Robotics: Design Principles and Model Abilities, 20 Feb 2023, paper link
  • Grounded Decoding Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control, 1 Mar 2023, paper link
  • TidyBot TidyBot: Personalized Robot Assistance with Large Language Models, 9 May 2023, paper link
  • Instruct2Act Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, 18 May 2023, paper link
  • KNOWNO Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners, 4 Jul 2023, paper link
  • RoCo RoCo: Dialectic Multi-Robot Collaboration with Large Language Models, 10 Jul 2023, paper link
  • SayPlan SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning, 12 Jul 2023, paper link
  • VLP: Video Language Planning, 16 Oct 2023, paper link
  • SuSIE SuSIE: Subgoal Synthesis via Image Editing, 2023, paper link
  • RoboTool RoboTool: Creative Robot Tool Use with Large Language Models, 23 Oct 2023, project link
  • AutoRT AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents, 5 Jan 2024, paper link

Action Generation

  • SayTap SayTap: Language to Quadrupedal Locomotion, 13 Jun 2023, paper link
  • L2R Language to Rewards for Robotic Skill Synthesis, 14 Jun 2023, paper link
  • VoxPoser VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, 12 Jul 2023, paper link
  • ReasonedExplorer Reasoning about the Unseen for Efficient Outdoor Object Navigation, 18 Sep 2023, paper link
  • Eureka Eureka: Human-Level Reward Design via Coding Large Language Models, 19 Oct 2023, paper link

Data Generation

  • CACTI CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning, 12 Dec 2022, paper link
  • ROSIE Scaling Robot Learning with Semantically Imagined Experience, 22 Feb 2023, paper link
  • GenSim GenSim: Generating Robotic Simulation Tasks via Large Language Models, 2 Oct 2023, paper link
  • RoboGen RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, 2 Nov 2023, paper link
  • RT-Trajectory RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches, 3 Nov 2023, paper link

Robotic Foundation Models

Single-Purpose

Action Generation

  • ZeST Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?, 23 Apr 2022, paper link
  • Behavior Transformers Behavior Transformers: Cloning k modes with one stone, 22 Jun 2022, paper link
  • ATLA Leveraging Language for Accelerated Learning of Tool Manipulation, 27 Jun 2022, paper link
  • LATTE LATTE: LAnguage Trajectory TransformEr, 4 Aug 2022, paper link
  • Perceiver-Actor Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, 12 Sep 2022, paper link
  • MVP Real-World Robot Learning with Masked Visual Pre-training, 6 Oct 2022, paper link
  • GNM GNM: A General Navigation Model to Drive Any Robot, 7 Oct 2022, paper link
  • Interactive Language Interactive Language: Talking to Robots in Real Time, 12 Oct 2022, paper link
  • Conditional Behavior Transformers (C-BeT) From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data, 18 Oct 2022, paper link
  • STAP STAP: Sequencing Task-Agnostic Policies, 21 Oct 2022, paper link
  • LILA LILA: Language-Informed Latent Actions, 31 Oct 2022, paper link
  • DIAL Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models, 21 Nov 2022, paper link
  • RT-1 RT-1: Robotics Transformer for Real-World Control at Scale, Dec 2022, paper link
  • MOO Open-World Object Manipulation using Pre-Trained Vision-Language Models, 2 Mar 2023, paper link
  • VC-1 Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?, 31 Mar 2023, paper link
  • CoTPC Chain-of-Thought Predictive Control, 3 Apr 2023, paper link
  • ARNOLD ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes, 9 Apr 2023, paper link
  • Optimus Imitating Task and Motion Planning with Visuomotor Transformers, 25 May 2023, paper link
  • RoboCat RoboCat: A self-improving robotic agent, 20 Jun 2023, paper link
  • ViNT ViNT: A Foundation Model for Visual Navigation, 26 Jun 2023, paper link
  • Scaling Up and Distilling Down Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition, 26 Jul 2023, paper link
  • RT-2 RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, 28 Jul 2023, paper link
  • RoboAgent RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking, 5 Sep 2023, paper link
  • Q-Transformer Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, 18 Sep 2023, paper link
  • RT-X Open X-Embodiment: Robotic Learning Datasets and RT-X Models, 13 Oct 2023, paper link
  • On Bringing Robots Home, 27 Nov 2023, paper link
  • Octo Octo: An Open-Source Generalist Robot Policy, 14 Dec 2023, paper link

General-Purpose

  • GATO A Generalist Agent, 12 May 2022, paper link
  • PACT PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training, 22 Sep 2022, paper link
  • PALM-E PaLM-E: An Embodied Multimodal Language Model, 6 Mar 2023, paper link
  • LEO An Embodied Generalist Agent in 3D World, 18 Nov 2023, paper link

Related Surveys and Repositories

Robotics surveys

  • Reinforcement Learning in Robotics: A Survey, 2013, paper link
  • A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, 2021, paper link
  • How to train your robot with deep reinforcement learning: lessons we have learned, 2021, paper link

Foundation model surveys

  • On the Opportunities and Risks of Foundation Models, 2021, paper link
  • Foundation Models for Decision Making: Problems, Methods, and Opportunities, 2023, paper link
  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond, 2023, paper link
  • Challenges and Applications of Large Language Models, 2023, paper link
  • A Survey on Large Language Model based Autonomous Agents, 2023, paper link

Foundation models and robotics

  • Awesome-LLM-Robotics repo link
  • Foundation Models in Robotics: Applications, Challenges, and the Future, 2023, paper link

BibTex

If you find our survey paper helpful, please consider citing us:

@misc{hu2023robofm,
      title={Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis}, 
      author={Yafei Hu and Quanting Xie and Vidhi Jain and Jonathan Francis and Jay Patrikar and Nikhil Keetha and Seungchan Kim and Yaqi Xie and Tianyi Zhang and Shibo Zhao and Yu-Quan Chong and Chen Wang and Katia Sycara and Matthew Johnson-Roberson and Dhruv Batra and Xiaolong Wang and Sebastian Scherer and Zsolt Kira and Fei Xia and Yonatan Bisk},
      year={2023},
      eprint={2312.08782},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Contributors

jeffreyyh, fxia22, tyz1030
