Date: 2024-04-13
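The enigma of multimodal large models
Even today's most advanced models, such as Gemini 1.5 Pro, gpt-4-turbo, and claude-3-opus-20240229, cannot yet solve hCaptcha's multimodal challenges with a single-step prompt.
For experimental purposes, we built a simple directed acyclic state machine with LangGraph. It relies on a small labeled dataset and uses a question-and-answer format to help recognize targets and organize the output.
Introducing a human-in-the-loop step is like handing the model a hint to the answer. For example, we translate "the odd one out" directly into "wolf", then mark every target with a bounding box and a serial number. The goal is to help the model understand and handle the task.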
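The pipeline described above can be sketched roughly as follows. This is a minimal plain-Python stand-in for a directed acyclic state machine, not LangGraph's actual API and not the project's real code. The node names, the `hints` table, and the state keys are all hypothetical, and the model call is replaced by a placeholder that only builds the question.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict

State = Dict[str, Any]


def human_hint(state: State) -> State:
    # Human-in-the-loop step: swap a vague prompt for a concrete label.
    hints = {"the odd one out": "wolf"}  # hypothetical hint table
    prompt = state["challenge_prompt"]
    return {**state, "target_label": hints.get(prompt, prompt)}


def number_boxes(state: State) -> State:
    # Give every bounding box (x1, y1, x2, y2) a serial number starting at 1.
    numbered = {i: box for i, box in enumerate(state["boxes"], start=1)}
    return {**state, "numbered_boxes": numbered}


def ask_model(state: State) -> State:
    # Placeholder for the multimodal model call: it only builds the question.
    options = sorted(state["numbered_boxes"])
    question = f"Which numbered boxes contain: {state['target_label']}? Options: {options}"
    return {**state, "question": question}


NODES: Dict[str, Callable[[State], State]] = {
    "human_hint": human_hint,
    "number_boxes": number_boxes,
    "ask_model": ask_model,
}

# Each node maps to the set of nodes that must run before it.
EDGES = {
    "human_hint": set(),
    "number_boxes": set(),
    "ask_model": {"human_hint", "number_boxes"},
}


def run_graph(state: State) -> State:
    # Run the nodes in topological order; a cycle raises graphlib.CycleError.
    for name in TopologicalSorter(EDGES).static_order():
        state = NODES[name](state)
    return state
```

Calling `run_graph({"challenge_prompt": "the odd one out", "boxes": [(0, 0, 10, 10)]})` returns a state whose `target_label` is the human-supplied hint and whose `question` lists the numbered boxes.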
Rough logs and interim conclusions
This gave rise to a concise prompt template: <[challenge-prompt] | [bounding-box description] | [output-parser]>
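One way such a three-part template might be assembled is sketched below. The field formats are assumptions: the source only names the three slots, so the box description layout, the JSON answer shape, and the helpers `describe_boxes`, `build_prompt`, and `parse_answer` are all hypothetical.

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, an assumed convention

# Assumed output-parser instruction: ask the model for machine-readable JSON.
OUTPUT_PARSER = 'Answer with JSON only, e.g. {"answer": [1, 3]}, listing the matching box numbers.'


def describe_boxes(boxes: Dict[int, Box]) -> str:
    # One line per numbered box, in ascending serial-number order.
    return "\n".join(
        f"#{idx}: (x1={b[0]}, y1={b[1]}, x2={b[2]}, y2={b[3]})"
        for idx, b in sorted(boxes.items())
    )


def build_prompt(challenge_prompt: str, boxes: Dict[int, Box]) -> str:
    # Join the three slots in the order given by the template.
    return " | ".join([challenge_prompt, describe_boxes(boxes), OUTPUT_PARSER])


def parse_answer(raw: str) -> List[int]:
    # Parse the model's JSON reply into a list of box numbers.
    data = json.loads(raw)
    return [int(i) for i in data["answer"]]
```

Asking for JSON keeps the parsing step trivial, but a real model reply may wrap the JSON in extra text, so a production parser would need to be more forgiving than this one.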
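Yet even with guidance this direct, LVMs (large vision models) still perform poorly on stylized tasks. Or rather, they can handle them, but expecting the highly customized, lightweight, easy-to-deploy results of a traditional supervised model while keeping high accuracy is still unrealistic.
The available dataset is still relatively small and I have not yet run a more rigorous benchmark, but intuitively, prompt-guided LVMs do not seem to reach even a baseline level, at least compared with current state-of-the-art image classification and object detection models.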
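Foresight and the light of dawn
Recently, for work, I came across Devika and Web_Voyager, two impressive example projects.
They point me toward a possible future (one I had already raised in April of last year): before long, systems driven by large language models will be able to learn or imitate human behavior, rendering every interaction-based CAPTCHA ineffective. Moreover, the cost of breaking such verification systems with large models will be very low, with no need for the complex prompt-chain workflows built today.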
from hcaptcha-challenger.