vlm_arm's Introduction

Robotic arm + large language model + multimodality = an embodied agent for human-robot collaboration

Author: 同济子豪兄

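Related videos

Connecting a robotic arm to the GPT4o large model turns it into a multimodal AI Jarvis in seconds: https://www.bilibili.com/video/BV18w4m1U7Fi

How to build a robotic arm that understands speech, understands images, and goes exactly where you point: https://www.bilibili.com/video/BV1Cn4y1R7V2

同济子豪兄's talk at the Amazon ** Summit: N ways to use multimodal generative AI: https://www.bilibili.com/video/BV1Pi421U7D6

First hands-on test! Baidu ERNIE 4.0 Turbo (文心大模型4.0 Turbo) connected to the robotic arm agent: https://www.bilibili.com/video/BV16M4m1m7Z1

[Tongyi Lingma (通义灵码)] What is it like to have AI help me dig through legacy code?: https://www.bilibili.com/video/BV1Qz421i7Nd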

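How it works

[Figure: schematic diagram 1 (compressed)]

Goal: understand spoken instructions, understand images, locate coordinates, sequence actions, and output a fixed format.

Agent large language models: Yi-Large, Claude 3 Opus, Baidu ERNIE 4.0 Turbo (文心大模型4.0 Turbo)

Multimodal visual understanding models: GPT4v, GPT4o, Yi-Vision, Claude 3 Opus, Zhipu CogVLM2-Grounding, Tongyi Qianwen Qwen-VL-Max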
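The "sequence actions, output a fixed format" step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the agent model is prompted to reply with JSON of the form `{"actions": [{"name": ..., "args": [...]}]}`. The reply schema, the `parse_plan` helper, and the action names in `ALLOWED_ACTIONS` are all hypothetical and are not taken from this project.

```python
import json

# Hypothetical whitelist of action names; the project's real function names may differ.
ALLOWED_ACTIONS = {"move_to", "pump_on", "pump_off", "back_zero"}


def parse_plan(reply: str) -> list:
    """Parse and validate an agent reply in the assumed JSON format.

    Returns the list of action steps, or raises ValueError if a step
    names an unknown action or has malformed arguments.
    """
    plan = json.loads(reply)
    actions = plan["actions"]
    for step in actions:
        if step["name"] not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {step['name']}")
        if not isinstance(step.get("args", []), list):
            raise ValueError("args must be a list")
    return actions


if __name__ == "__main__":
    reply = '{"actions": [{"name": "move_to", "args": [150, -120]}, {"name": "pump_on", "args": []}]}'
    for step in parse_plan(reply):
        print(step["name"], step["args"])
```

Validating the reply against a whitelist before anything is sent to the arm is one way to keep a malformed or unexpected model output from turning into a physical motion.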

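Robotic arm and accessories

Robotic arm: Elephant Robotics (大象机器人) Mycobot 280 Pi

Development board: Raspberry Pi 4B, Ubuntu 20.04

Accessories: camera flange, suction pump

Buy the same kit: search for 大象机器人 (Elephant Robotics) on Taobao and mention that you are a fan of 子豪兄 to get 5% off.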

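Notes

Reproduction tutorial: https://njapov1vnz.feishu.cn/docx/Qosedmc5NoYK7IxVoMBcD47jn9b?from=from_copylink

Startup tutorial: https://njapov1vnz.feishu.cn/docx/SJQXdIWfUo85HjxXyEBc0Wpfnqc?from=from_copylink

  • Install and configure a Python 3.12 environment and the required packages.
  • Replace the KEY values in API_KEY.py with your own keys.
  • Confirm the microphone ID and the speaker device.
  • Confirm that the camera and voice input/output work properly.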

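Special thanks

01.AI (零一万物): 马诺

Baidu PaddlePaddle (百度飞桨): 刘聪琳

Elephant Robotics (大象机器人) after-sales technical support team

恒之未来: 宋佩恒

Institute of Computing Power and Computing Services, Energy and Computing Power Integration (Hami) Research Institute (能源算力融合(哈密)研究院算力与计算服务研究所): 杨耀东

Shanghai AI Laboratory (上海人工智能实验室): 李佳伦

TAI team, School of Cyber Science and Engineering, Huazhong University of Science and Technology (华中科技大学网络空间安全学院TAI团队): 章航韬

Weinan Normal University (渭南师范学院): 田文博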

vlm_arm's Issues

What does the FACTOR in post_processing_viz() do and how well does it work?

Great work!

In the examples given in SYSTEM_PROMPT, the coordinates of the bounding box corners are out of range, since the input image is 640*480. I guess the function post_processing_viz() uses a FACTOR equal to 999 to normalize the output coordinates from the LLM.

However, since the range of the coordinates is not specified in SYSTEM_PROMPT, what happens if the output coordinates are larger than 999? Did this ever happen in your experiments?
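To make the question concrete, here is a minimal sketch of how I assume the scaling works. The `to_pixels` helper and the clamping step are my own illustration, not code from the repository; only the value 999 and the 640*480 image size come from the discussion above.

```python
FACTOR = 999  # assumed upper bound of the model's coordinate range


def to_pixels(box, img_w=640, img_h=480, factor=FACTOR):
    """Map (x1, y1, x2, y2) from an assumed [0, factor] range to pixels.

    Values outside the assumed range are clamped to the image bounds,
    which is one possible way to handle outputs larger than `factor`.
    """
    def scale(value, size):
        px = int(value * size / factor)
        return max(0, min(size - 1, px))

    x1, y1, x2, y2 = box
    return (scale(x1, img_w), scale(y1, img_h),
            scale(x2, img_w), scale(y2, img_h))


if __name__ == "__main__":
    print(to_pixels((100, 200, 500, 800)))   # in-range box
    print(to_pixels((500, 500, 1200, 250)))  # x2 exceeds FACTOR and is clamped
```

With clamping, an out-of-range coordinate silently sticks to the image edge, so the box would still be drawn but could be badly wrong. That is why I am asking whether such outputs were observed.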

And how well does this FACTOR work? The bounding box of Li Yunlong shown in vl_now_viz.jpg is actually not accurate.

Also, why is the FACTOR set to 999? Is this an empirical choice, or does it have something to do with the LLM used?

Thank you for your excellent work again and looking forward to your reply!
