om-ai-lab / omdet

Real-time and accurate open-vocabulary end-to-end object detection

License: Apache License 2.0

Python 100.00%

object-detection open-vocabulary vision-and-language zero-shot-object-detection

omdet's Issues

Has this work been accepted in any peer-reviewed conference?

A quick question: has this work been accepted at any peer-reviewed conference, such as ECCV 2024?

Thank you so much.

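About prediction and training

Hello, I am very interested in your research, but I have run into a few questions:
1. When running prediction, is it possible to skip setting a vocabulary and predict directly on the image, as yolo-world does? Alternatively, how can I check whether CLIP's vocabulary contains the targets I need to detect?
2. Can I use my own pretrained CLIP weights and train the OmDet model on my own dataset?
Thank you very much for your help!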

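No base64 mode?

I set up OmDet following the OmAgent instructions, but during testing I found that the current version of OmDet does not seem to support transferring data in base64 mode?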
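
As an illustration of what a base64 transfer mode generally involves, here is a generic sketch. It is not OmDet's or OmAgent's actual interface: the `image_b64` field name and both helper functions are hypothetical. A client base64-encodes the image bytes into a JSON body, and the server decodes them back to raw bytes before inference:

```python
import base64
import json


def encode_image_payload(image_bytes: bytes) -> str:
    """Client side: wrap raw image bytes in a JSON body as base64 text."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_b64": b64})  # hypothetical field name


def decode_image_payload(body: str) -> bytes:
    """Server side: recover the raw image bytes from the JSON body."""
    data = json.loads(body)
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(data["image_b64"], validate=True)


# Stand-in bytes for the demo; a real client would read an image file instead.
raw = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
body = encode_image_payload(raw)
restored = decode_image_payload(body)
print(restored == raw)  # the round trip preserves the bytes
```

The decoded bytes could then be handed to whatever image loader the detection service already uses.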

Running this model on CPU?

Would it be crazy to try to run the turbo model on a CPU? Or is there a tiny version that could be run on CPU?

onnx convert error

onnx convert error

About OmDet-Turbo-Base

I am glad to see your work on OmDet-Turbo, having used GroundingDINO for a long time. On some datasets, OmDet-Turbo-Base appears to be competitive. However, it seems that only the Swin-Tiny-based version has been released so far. What about OmDet-Turbo-Base?

Model Conversion

Hi, I am trying to convert the .py model to Core ML format but have not succeeded. I then converted the model to ONNX successfully, but I still cannot convert it to Core ML for use in an iOS application. Could you kindly point me in the right direction?

Thanks and Regards

Is there a TensorRT version of GroundingDino that you could share?

I happened to see your post about exporting GroundingDino to TensorRT.
Would you be able to share an FP32 TensorRT version of GroundingDino?

Question about "Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head"

Hello, your work is excellent. I have a question about the TensorRT GroundingDINO results in your paper "Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head". I can only convert GroundingDINO to TensorRT FP32 with trtexec, which runs at around 90 ms on an A100. Could you share your method for converting GroundingDINO to TensorRT? Thank you.

OinW35@mAP=30.1

I have a question about OinW35@mAP=30.1:
1. In OmDetv2 (the 2024 IET CV article), ConvNext-B scores 20.9, whereas this paper reports 30.1 (a smaller model, yet SOTA). The gap is large; could this be a typo?

A huge question about the zero shot results in the paper

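Your paper states that the pretraining data is o365, goldG, hake, hoi A, and PhraseCut. Are you sure these datasets can produce zero-shot COCO results of 57.1 and 53.4? In my training experience, zero-shot scores are about 8 points lower than when COCO training data is added to the training set, so if you added some COCO training data you would take first place on COCO in no time.

In addition, your Table 2 shows zero-shot COCO results significantly below Grounding DINO. I do not believe that swapping Swin-T for ConvNext-B, or adding the three extra datasets hake, hoi A, and PhraseCut, can bring an improvement of more than 10 points. I therefore suspect that the pretraining data you actually used is not consistent with what the paper describes, and that COCO data may have leaked during pretraining.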

About the requirement in install.md

Hi,

I am trying to run the inference demo by following install.md. The 'requirements.txt' file seems to be missing. Could you please upload it? Thank you so much.

Is there a way to export onnx similar to yolo-world

Nice work! Is there a way to export both the text backbone and OmDet to ONNX, similar to yolo-world?

Integrating OmDet Turbo in Transformers 🤗

Hi Om people!

I am an MLE at Hugging Face, and given the popularity and performance of your model, we wanted to see if you would be interested in working with us to integrate OmDet Turbo into the Transformers 🤗 library. Looking forward to hearing back from you!

Best,
Yoni

Pretrain Consumption (Pretrain Cost)

How long does pretraining take? I see you used 16 A100 GPUs, and I would like to know the approximate training time. Thanks.
Furthermore, I can't find the code for training from scratch.

Batched MultiHeadAttention

Hi! Yoni from Hugging Face again.
I'm opening a separate issue because there seems to be a potentially important problem in the model's encoder.

OmDet/omdet/omdet_v2_turbo/ela_encoder.py

Line 27 in 542ce97

self.self_attn = torch.nn.MultiheadAttention(d_model, nhead, attn_dropout)

Shouldn't this MultiHeadAttention be initialized with batch_first=True, as the inputs of the self_attn layer are of the shape (batch_size, ...)? This causes inconsistencies when using the model for batch inference.

Thanks for your consideration!
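
To illustrate the concern, here is a toy, dependency-free sketch of single-head self-attention. It is not PyTorch's implementation and not OmDet code: `self_attention` and `attend` are hypothetical helpers with no learned projections. It shows that when batch-first data is read under the sequence-first convention, attention runs across images instead of across tokens, so one image's output depends on the other images in the batch:

```python
import math


def self_attention(seq):
    """Toy single-head self-attention over one sequence of token vectors.

    Queries, keys and values are the tokens themselves (no projections),
    which is enough to show which tokens get mixed together.
    """
    scale = math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(len(q))])
    return out


def attend(x, batch_first):
    """x is nested lists indexed [dim0][dim1][embed]; output keeps that layout.

    batch_first=True:  dim0 is the batch, dim1 is the sequence.
    batch_first=False: dim0 is the sequence, dim1 is the batch.
    """
    if batch_first:
        return [self_attention(sample) for sample in x]
    n0, n1 = len(x), len(x[0])
    # Sequence-first: each index along dim1 is one "sample" running over dim0.
    cols = [self_attention([x[i][j] for i in range(n0)]) for j in range(n1)]
    return [[cols[j][i] for j in range(n1)] for i in range(n0)]


# Two images (batch of 2), three tokens each, embedding size 2, stored batch-first.
batch = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
]
# Same first image, different second image.
changed = [batch[0], [[9.0, 9.0], [9.0, 9.0], [9.0, 9.0]]]

same_bf = attend(batch, True)[0] == attend(changed, True)[0]
same_sf = attend(batch, False)[0] == attend(changed, False)[0]
print(same_bf, same_sf)  # prints: True False
```

With the batch-first reading, the first image's output is unaffected by the second image; with the sequence-first reading it changes. In PyTorch itself, `torch.nn.MultiheadAttention` defaults to `batch_first=False`, which expects inputs of shape (seq_len, batch, embed_dim); passing `batch_first=True` switches it to (batch, seq_len, embed_dim).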

How to convert to onnx

Awesome work!
Can you provide sample code to convert the PyTorch model to ONNX?

train code

Hi, this is great work.
Do you plan to release the training code?
