om-ai-lab / omdet

Real-time and accurate open-vocabulary end-to-end object detection

License: Apache License 2.0

Python 100.00%
object-detection open-vocabulary vision-and-language zero-shot-object-detection

omdet's People

Contributors

eltociear, hx621, nielsrogge, nxf1111, p3ngliu, snakeztc, xeonhis

omdet's Issues

About prediction and training

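Hello, I am very interested in your research, but I have run into a few questions and would appreciate your answers:
1. When running prediction, can I skip specifying a vocabulary and predict directly on images, the way YOLO-World does? Alternatively, how can I check whether the objects I need to detect are in CLIP's vocabulary?
2. Can I use my own pretrained CLIP weights and train the OmDet model on my own dataset?
Thank you very much!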

No base64 mode?

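I set up OmDet following OmAgent, but while testing I found that the current version of OmDet does not seem to support base64-encoded data transfer. Is that correct?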
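For context on what base64 support usually involves, here is a minimal sketch assuming a JSON-over-HTTP service. The helper functions and the field names `image_base64` and `labels` are hypothetical, not OmDet's or OmAgent's actual API. The client encodes raw image bytes as base64 text, and the server decodes them back to bytes (tolerating an optional `data:` URL prefix) before inference.

```python
import base64
import json


def encode_image_payload(image_bytes, labels):
    """Client side: wrap raw image bytes (e.g. a JPEG file's contents) as base64 text in JSON."""
    return json.dumps({
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),  # hypothetical field name
        "labels": labels,  # hypothetical field name for the detection vocabulary
    })


def decode_image_payload(body):
    """Server side: recover the original image bytes and the label list from the JSON body."""
    data = json.loads(body)
    b64 = data["image_base64"]
    # Some clients send a data URL such as "data:image/jpeg;base64,...."; keep only the payload.
    if b64.startswith("data:"):
        b64 = b64.split(",", 1)[1]
    # validate=True rejects characters outside the base64 alphabet instead of silently skipping them.
    image_bytes = base64.b64decode(b64, validate=True)
    return image_bytes, data["labels"]
```

On the server, the recovered bytes could then be opened with an image library, for example `PIL.Image.open(io.BytesIO(image_bytes))` if Pillow is installed, before being handed to the detector.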

Running this model on CPU?

Would it be crazy to try to run the turbo model on a CPU? Or is there a tiny version that could be run on CPU?

About OmDet-Turbo-Base

I'm glad to see your work on OmDet-Turbo; I have used GroundingDINO for a long time. On some datasets OmDet-Turbo-Base seems competitive. However, it appears that only the Swin-Tiny-based version has been released so far. What about OmDet-Turbo-Base?

Model Conversion

Hi, I am trying to convert the PyTorch (.py) model to Core ML, but have not succeeded. After successfully converting the model to ONNX, I still can't convert it to Core ML for use in an iOS application. Could you point me in the right direction?

Thanks and regards

ODinW35@mAP=30.1

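Some questions about ODinW35 mAP = 30.1:
1. In the OmDet V2 paper (IET Computer Vision, 2024), ConvNeXt-B scores 20.9, whereas this paper reports 30.1 (with a smaller model, yet SOTA). That is a large gap. Is it a typo?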

A major question about the zero-shot results in the paper

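Your paper states that the pretraining data are O365, GoldG, HAKE, HOI-A, and PhraseCut. Are you sure this data can produce COCO zero-shot scores of 57.1 and 53.4? In my training experience, zero-shot metrics are about 8 points lower than when COCO is included in the training data; if you added a bit of COCO training, you would top the COCO leaderboard in no time.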
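Moreover, your Table 2 shows COCO zero-shot performance clearly below Grounding DINO. I don't believe that replacing Swin-T with ConvNeXt-B, or adding three more datasets (HAKE, HOI-A, PhraseCut), can bring an improvement of more than 10 points. I therefore suspect that the pretraining data you actually used differ from what the paper describes, and that COCO data may have leaked into pretraining.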

About the requirements file in install.md

Hi,

I am trying to run the inference demo following install.md. It seems the 'requirements.txt' file is missing. Could you please upload 'requirements.txt'? Thank you so much.

Integrating OmDet Turbo in Transformers 🤗

Hi Om people!

I am an MLE at Hugging Face, and given the popularity and performance of your model, we wanted to see if you would be interested in working with us to integrate OmDet Turbo into the Transformers 🤗 library. Looking forward to hearing back from you!

Best,
Yoni

Pretraining Cost

How long does pretraining take? I see you used 16 A100s, and I'd like to know the approximate training time. Thanks.
Furthermore, I can't find the code for training from scratch.

Batched MultiHeadAttention

Hi! Yoni from Hugging Face again.
I'm opening a separate issue because there seems to be a potentially important problem in the model's encoder.

self.self_attn = torch.nn.MultiheadAttention(d_model, nhead, attn_dropout)

Shouldn't this MultiheadAttention be initialized with batch_first=True, since the inputs to the self_attn layer have shape (batch_size, ...)? As written, it causes inconsistencies when the model is used for batched inference.

Thanks for your consideration!
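The layout problem described in this issue can be illustrated without PyTorch. The sketch below is a simplified, dependency-free stand-in, not OmDet's encoder: `attend` and `batched_attend` are hypothetical helpers implementing single-head self-attention with no learned Q/K/V projections and no dropout. It shows that when a (batch, seq, dim) input is read with the (seq, batch, dim) layout, which is PyTorch's default for `nn.MultiheadAttention`, attention runs across samples, so one image's output depends on what else is in the batch.

```python
import math


def attend(seq):
    """Single-head self-attention with identity Q/K/V projections (no learned weights).

    seq has shape (seq_len, dim); each output row is a softmax-weighted average of all rows.
    """
    dim = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) / total for j in range(dim)])
    return out


def batched_attend(x, batch_first):
    """Apply attend() to a 3-D nested list.

    batch_first=True:  x is read as (batch, seq, dim); attention stays within each sample.
    batch_first=False: x is read as (seq, batch, dim), like PyTorch's default layout,
                       so attention mixes whatever sits along the first axis.
    """
    if batch_first:
        return [attend(sample) for sample in x]
    seq_len, batch = len(x), len(x[0])
    out = [[None] * batch for _ in range(seq_len)]
    for b in range(batch):
        # Gather one "sequence" per batch index, attend over it, then scatter back.
        column = attend([x[s][b] for s in range(seq_len)])
        for s in range(seq_len):
            out[s][b] = column[s]
    return out


s0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # the sample we care about
s1 = [[2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]]      # another sample in the batch
s1_alt = [[-3.0, 2.0], [1.0, -2.0], [0.0, 0.0]]  # a different batch-mate

# Matching layout: sample 0's output does not depend on its batch-mates.
print(batched_attend([s0, s1], True)[0] == batched_attend([s0, s1_alt], True)[0])    # True
# Mismatched layout (batch-first data read as seq-first): sample 0's output changes.
print(batched_attend([s0, s1], False)[0] == batched_attend([s0, s1_alt], False)[0])  # False
```

In PyTorch itself, the corresponding remedies would be constructing the layer with `batch_first=True` (supported since PyTorch 1.9) or transposing inputs to (seq, batch, dim) before the call; which one matches the intended behavior of OmDet's encoder is for the maintainers to confirm.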

Training code

Hi, this is great work.
Do you plan to release the training code?
