Topic: vqa Goto Github
Something interesting about vqa (Visual Question Answering)
vqa,Multimodal Question Answering in the Medical Domain: A Summary of Existing Datasets and Systems
User: abachaa
vqa,Visual Question Answering in the Medical Domain: VQA-Med 2019
User: abachaa
vqa,OmniFusion: a multimodal model that communicates using both text and images
Organization: airi-institute
vqa,[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
User: antoyang
Home Page: https://arxiv.org/abs/2206.08155
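The core idea of FrozenBiLM is to cast zero-shot video QA as mask filling with a frozen bidirectional language model. A minimal, text-only sketch of that scoring step is below; the prompt format, the single-token restriction, and the use of bert-base-uncased are illustrative assumptions (the paper uses a larger model with lightweight adapters and injected video features).

```python
# Sketch: rank candidate answers by a frozen masked LM's logits at [MASK].
# Text-only toy version; the real system also feeds video features.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def score_answers(question: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank single-token candidate answers for a question via mask filling."""
    prompt = f"Question: {question} Answer: {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    scores = []
    for answer in candidates:
        ids = tokenizer(answer, add_special_tokens=False).input_ids
        if len(ids) == 1:  # keep the sketch to single-token answers
            scores.append((answer, logits[ids[0]].item()))
    return sorted(scores, key=lambda x: -x[1])

print(score_answers("What color is the sky?", ["blue", "red", "green"]))
```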
vqa,[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
User: antoyang
Home Page: https://arxiv.org/abs/2012.00451
vqa,PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"
User: ap229997
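The mechanism in this paper is Conditional Batch Normalization: an MLP predicts per-channel deltas to the batch-norm scale and shift from a language embedding, so the question modulates early visual features. A minimal sketch follows; module names and dimensions are illustrative, not taken from the repo.

```python
# Sketch: Conditional Batch Normalization (CBN), where the question embedding
# predicts (delta_gamma, delta_beta) added to the BN affine parameters.
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_channels: int, lang_dim: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)  # plain normalization
        self.gamma = nn.Parameter(torch.ones(num_channels))   # base scale
        self.beta = nn.Parameter(torch.zeros(num_channels))   # base shift
        # MLP predicting per-channel deltas from the language embedding
        self.mlp = nn.Sequential(
            nn.Linear(lang_dim, lang_dim), nn.ReLU(),
            nn.Linear(lang_dim, 2 * num_channels),
        )

    def forward(self, x: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
        d_gamma, d_beta = self.mlp(lang).chunk(2, dim=1)  # (B, C) each
        gamma = (self.gamma + d_gamma)[:, :, None, None]  # broadcast over H, W
        beta = (self.beta + d_beta)[:, :, None, None]
        return gamma * self.bn(x) + beta

cbn = ConditionalBatchNorm2d(num_channels=64, lang_dim=128)
out = cbn(torch.randn(8, 64, 14, 14), torch.randn(8, 128))
print(out.shape)  # torch.Size([8, 64, 14, 14])
```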
vqa,This project is out of date; I don't remember the details inside...
User: asdf0982
vqa,The Natural Language Processing research team at the Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, summarizes its research and applications in intelligent question answering, covering knowledge-base question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC), with surveys of both academic and industrial work for each task.
Organization: bdbc-kg-nlp
vqa,Visual Question Answering in PyTorch
User: cadene
vqa,NeurIPS 2019 Paper: RUBi: Reducing Unimodal Biases for Visual Question Answering
User: cdancette
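RUBi's training rule is compact enough to sketch: a question-only branch predicts answers from the question embedding alone, and its sigmoid output masks the main model's logits so that examples answerable from language priors alone contribute smaller gradients. The sketch below is a simplification under assumed shapes; at inference only the unmasked main logits are used.

```python
# Sketch: RUBi loss, where a question-only head's sigmoid output masks the
# main model's logits during training. Shapes and module names illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_answers, q_dim = 3000, 512
question_only_head = nn.Linear(q_dim, num_answers)  # bias-capturing branch

def rubi_loss(main_logits, q_embedding, targets):
    q_logits = question_only_head(q_embedding)
    masked_logits = main_logits * torch.sigmoid(q_logits)  # RUBi modulation
    loss_main = F.cross_entropy(masked_logits, targets)
    loss_question_only = F.cross_entropy(q_logits, targets)  # trains the branch
    return loss_main + loss_question_only

loss = rubi_loss(
    torch.randn(4, num_answers, requires_grad=True),
    torch.randn(4, q_dim),
    torch.randint(0, num_answers, (4,)),
)
print(loss.item())
```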
vqa,[ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
Organization: china-uk-zsl
Home Page: https://arxiv.org/abs/2107.05348
vqa,TensorFlow implementation of "Deeper LSTM + Normalized CNN for Visual Question Answering"
User: chingyaoc
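This is the classic joint-embedding VQA baseline: an L2-normalized CNN image feature and a deeper-LSTM question encoding are projected to a common space, fused by element-wise multiplication, and classified over a fixed answer vocabulary. A minimal PyTorch sketch of that architecture (the repo itself is TensorFlow) is below; all dimensions are illustrative.

```python
# Sketch: "deeper LSTM + normalized CNN" joint-embedding VQA baseline.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VqaBaseline(nn.Module):
    def __init__(self, vocab=10000, emb=300, hidden=512, img_dim=4096, answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, num_layers=2, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(hidden, hidden)
        self.classifier = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, answers))

    def forward(self, img_feat, question_tokens):
        img = F.normalize(img_feat, dim=1)            # L2-normalized CNN feature
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                                     # top layer's final state
        fused = torch.tanh(self.img_proj(img)) * torch.tanh(self.q_proj(q))
        return self.classifier(fused)

model = VqaBaseline()
logits = model(torch.randn(2, 4096), torch.randint(0, 10000, (2, 14)))
print(logits.shape)  # torch.Size([2, 1000])
```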
vqa,CloudCV Visual Question Answering Demo
Organization: cloud-cv
Home Page: https://vqa.cloudcv.org/
vqa,Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Organization: cvlab-tohoku
vqa,Strong baseline for visual question answering
User: cyanogenoid
vqa,PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
User: davidmascharka
Home Page: https://arxiv.org/abs/1803.05268
vqa,PyTorch VQA implementation that achieved top performance in the VizWiz Grand Challenge (ECCV 2018): Answering Visual Questions from Blind People
User: denisdsh
vqa,A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Organization: facebookresearch
Home Page: https://mmf.sh/
vqa,[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
User: fuxiaoliu
Home Page: https://fuxiaoliu.github.io/LRV/
vqa,An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
User: hengyuan-hu
vqa,[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
User: hila-chefer
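The self-attention part of this method reduces to a simple relevancy-propagation rule: relevancy starts as the identity, and each layer adds the product of the gradient-weighted, positive-clamped, head-averaged attention map with the running relevancy. The sketch below covers only that self-attention update with random stand-in tensors; the paper's bi-modal and encoder-decoder rules are more involved.

```python
# Sketch: relevancy update R <- R + E_h[(grad * A)^+] @ R from "Generic
# Attention-model Explainability", self-attention case only.
import torch

def update_relevancy(R, attn, attn_grad):
    """R: (T, T) relevancy; attn, attn_grad: (heads, T, T) from one layer."""
    A_bar = (attn_grad * attn).clamp(min=0).mean(dim=0)  # E_h[(grad ⊙ A)^+]
    return R + A_bar @ R

tokens, heads = 10, 8
R = torch.eye(tokens)  # identity initialization: each token explains itself
for _ in range(4):     # one update per self-attention layer
    attn = torch.softmax(torch.randn(heads, tokens, tokens), dim=-1)
    attn_grad = torch.randn(heads, tokens, tokens)  # d(target score)/d(attn)
    R = update_relevancy(R, attn, attn_grad)
print(R[0])  # relevancy of every token to token 0 (e.g., [CLS])
```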
vqa,[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
User: jayleicn
Home Page: https://arxiv.org/abs/2102.06183
vqa,Hadamard Product for Low-rank Bilinear Pooling
User: jnhwkim
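The technique named here replaces a full bilinear map between question and image features with two low-rank projections whose outputs are combined by a Hadamard (element-wise) product. A minimal sketch with illustrative dimensions:

```python
# Sketch: low-rank bilinear pooling via the Hadamard product. Instead of a
# full (q_dim x v_dim) bilinear form per output unit, both inputs are
# projected to a shared low-rank space, multiplied element-wise, projected out.
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    def __init__(self, q_dim=1024, v_dim=2048, rank=512, out_dim=1000):
        super().__init__()
        self.U = nn.Linear(q_dim, rank)  # question projection
        self.V = nn.Linear(v_dim, rank)  # image projection
        self.P = nn.Linear(rank, out_dim)

    def forward(self, q, v):
        # Hadamard product in the low-rank space approximates bilinear pooling
        return self.P(torch.tanh(self.U(q)) * torch.tanh(self.V(v)))

pool = LowRankBilinearPooling()
print(pool(torch.randn(4, 1024), torch.randn(4, 2048)).shape)  # (4, 1000)
```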
vqa,A curated list of Visual Question Answering (VQA) (image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
User: jokieleung
vqa,Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
User: kdexd
Home Page: https://kdexd.github.io/probnmn-clevr
vqa,Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
User: linjieli222
Home Page: https://arxiv.org/abs/1903.12314
vqa,Oscar (Object-Semantics Aligned Pre-training) and VinVL (improved visual representations)
Organization: microsoft
vqa,A lightweight, scalable, and general framework for visual question answering research
Organization: milvlg
vqa,ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Organization: milvlg
vqa,The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Organization: nvlabs
Home Page: https://shikun.io/projects/prismer
vqa,Open-source evaluation toolkit of large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HuggingFace models, and 15+ benchmarks
Organization: open-compass
Home Page: https://rank.opencompass.org.cn/leaderboard-multimodal
vqa,InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Organization: opengvlab
Home Page: https://igpt.opengvlab.com
vqa,Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Organization: opengvlab
vqa,Code release for the ICLR 2023 paper SlotFormer, an object-centric dynamics model
Organization: pairlab
Home Page: https://slotformer.github.io/
vqa,Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
User: peteanderson80
Home Page: http://panderson.me/up-down-attention/
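The top-down half of this model is easy to sketch: a question embedding scores each of the K bottom-up Faster R-CNN region features, the softmax-weighted sum forms the attended image vector, and the two modalities are fused by element-wise product. Shapes below are illustrative (K=36 regions, as in the common pre-extracted features).

```python
# Sketch: top-down attention over bottom-up region features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    def __init__(self, v_dim=2048, q_dim=512, hidden=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(v_dim + q_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.v_proj = nn.Linear(v_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)

    def forward(self, v, q):
        # v: (B, K, v_dim) region features; q: (B, q_dim) question embedding
        q_tiled = q.unsqueeze(1).expand(-1, v.size(1), -1)
        alpha = F.softmax(self.score(torch.cat([v, q_tiled], dim=2)), dim=1)
        v_att = (alpha * v).sum(dim=1)              # attended image feature
        return self.v_proj(v_att) * self.q_proj(q)  # joint representation

att = TopDownAttention()
joint = att(torch.randn(2, 36, 2048), torch.randn(2, 512))
print(joint.shape)  # torch.Size([2, 512])
```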
vqa,Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Organization: stanfordnlp
vqa,Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
User: thaolmk54
vqa,Uses pretrained encoders and language models to generate captions from multimedia inputs.
User: theocoombes
vqa,PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
User: vacancy
Home Page: http://nscl.csail.mit.edu
vqa,Bottom-up feature extractor implemented in PyTorch.
User: violetteshev
vqa,[IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
User: vztu
Home Page: https://ieeexplore.ieee.org/document/9405420
vqa,Notes on some CV papers I have read, covering image captioning, weakly supervised segmentation, etc.
User: wangleihitcs
vqa,Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
User: wuziyi616
Home Page: https://slotdiffusion.github.io/
vqa,mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022)
Organization: x-plug
Home Page: https://arxiv.org/abs/2205.12005
vqa,mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Organization: x-plug
vqa,[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
User: yuleiniu
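The cause-effect view in CF-VQA leads to a counterfactual inference step: train a fused model plus a question-only branch, then at test time subtract the prediction of a counterfactual pass in which the vision-language branch is replaced by a learned constant, removing the natural direct effect of the language prior. The sketch below is a heavily simplified, hedged version using one SUM-style fusion; the paper studies several fusion strategies and also models a vision-only branch.

```python
# Sketch: CF-VQA-style debiased inference, total indirect effect (TIE) =
# factual prediction minus counterfactual prediction with vision blocked.
import torch
import torch.nn.functional as F

def cf_vqa_inference(z_vq, z_q, c):
    """z_vq: fused logits; z_q: question-only logits; c: learned constant."""
    factual = F.logsigmoid(z_vq + z_q)       # Z(v, q): both branches active
    counterfactual = F.logsigmoid(c + z_q)   # Z(v*, q): vision branch blocked
    return factual - counterfactual          # debiased score (TIE)

num_answers = 3000
debiased = cf_vqa_inference(
    torch.randn(4, num_answers), torch.randn(4, num_answers),
    torch.zeros(num_answers))
print(debiased.argmax(dim=1))  # debiased answer predictions
```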