Topic: vqa Goto Github
Something interesting about vqa (Visual Question Answering)
vqa,Multimodal Question Answering in the Medical Domain: A Summary of Existing Datasets and Systems
User: abachaa
vqa,Visual Question Answering in the Medical Domain: VQA-Med 2019
User: abachaa
vqa,OmniFusion: a multimodal model that communicates using both text and images
Organization: airi-institute
vqa,[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
User: antoyang
Home Page: https://arxiv.org/abs/2206.08155
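The core idea of FrozenBiLM is to cast zero-shot video QA as mask filling with a frozen bidirectional language model. A minimal, text-only sketch of that scoring step is below; the prompt format, the single-token restriction, and the use of bert-base-uncased are illustrative assumptions (the paper uses a larger model with lightweight adapters and injected video features).

```python
# Sketch: rank candidate answers by a frozen masked LM's logits at [MASK].
# Text-only toy version; the real system also feeds video features.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def score_answers(question: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Rank single-token candidate answers for a question via mask filling."""
    prompt = f"Question: {question} Answer: {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    scores = []
    for answer in candidates:
        ids = tokenizer(answer, add_special_tokens=False).input_ids
        if len(ids) == 1:  # keep the sketch to single-token answers
            scores.append((answer, logits[ids[0]].item()))
    return sorted(scores, key=lambda x: -x[1])

print(score_answers("What color is the sky?", ["blue", "red", "green"]))
```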
vqa,[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
User: antoyang
Home Page: https://arxiv.org/abs/2012.00451
vqa,PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"
User: ap229997
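The mechanism in this paper is Conditional Batch Normalization: an MLP predicts per-channel deltas to the batch-norm scale and shift from a language embedding, so the question modulates early visual features. A minimal sketch follows; module names and dimensions are illustrative, not taken from the repo.

```python
# Sketch: Conditional Batch Normalization (CBN), where the question embedding
# predicts (delta_gamma, delta_beta) added to the BN affine parameters.
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_channels: int, lang_dim: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)  # plain normalization
        self.gamma = nn.Parameter(torch.ones(num_channels))   # base scale
        self.beta = nn.Parameter(torch.zeros(num_channels))   # base shift
        # MLP predicting per-channel deltas from the language embedding
        self.mlp = nn.Sequential(
            nn.Linear(lang_dim, lang_dim), nn.ReLU(),
            nn.Linear(lang_dim, 2 * num_channels),
        )

    def forward(self, x: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
        d_gamma, d_beta = self.mlp(lang).chunk(2, dim=1)  # (B, C) each
        gamma = (self.gamma + d_gamma)[:, :, None, None]  # broadcast over H, W
        beta = (self.beta + d_beta)[:, :, None, None]
        return gamma * self.bn(x) + beta

cbn = ConditionalBatchNorm2d(num_channels=64, lang_dim=128)
out = cbn(torch.randn(8, 64, 14, 14), torch.randn(8, 128))
print(out.shape)  # torch.Size([8, 64, 14, 14])
```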
vqa,This project is out of date; I don't remember the details inside...
User: asdf0982
vqa,The Natural Language Processing research team at the Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, summarizes its research and applications in intelligent question answering, covering knowledge-base question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC), with surveys of both academic and industrial work for each task.
Organization: bdbc-kg-nlp
vqa,Visual Question Answering in PyTorch
User: cadene
vqa,NeurIPS 2019 Paper: RUBi: Reducing Unimodal Biases for Visual Question Answering
User: cdancette
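RUBi's training rule is compact enough to sketch: a question-only branch predicts answers from the question embedding alone, and its sigmoid output masks the main model's logits so that examples answerable from language priors alone contribute smaller gradients. The sketch below is a simplification under assumed shapes; at inference only the unmasked main logits are used.

```python
# Sketch: RUBi loss, where a question-only head's sigmoid output masks the
# main model's logits during training. Shapes and module names illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_answers, q_dim = 3000, 512
question_only_head = nn.Linear(q_dim, num_answers)  # bias-capturing branch

def rubi_loss(main_logits, q_embedding, targets):
    q_logits = question_only_head(q_embedding)
    masked_logits = main_logits * torch.sigmoid(q_logits)  # RUBi modulation
    loss_main = F.cross_entropy(masked_logits, targets)
    loss_question_only = F.cross_entropy(q_logits, targets)  # trains the branch
    return loss_main + loss_question_only

loss = rubi_loss(
    torch.randn(4, num_answers, requires_grad=True),
    torch.randn(4, q_dim),
    torch.randint(0, num_answers, (4,)),
)
print(loss.item())
```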
vqa,[ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
Organization: china-uk-zsl
Home Page: https://arxiv.org/abs/2107.05348
vqa,TensorFlow implementation of "Deeper LSTM + Normalized CNN for Visual Question Answering"
User: chingyaoc
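This is the classic joint-embedding VQA baseline: an L2-normalized CNN image feature and a deeper-LSTM question encoding are projected to a common space, fused by element-wise multiplication, and classified over a fixed answer vocabulary. A minimal PyTorch sketch of that architecture (the repo itself is TensorFlow) is below; all dimensions are illustrative.

```python
# Sketch: "deeper LSTM + normalized CNN" joint-embedding VQA baseline.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VqaBaseline(nn.Module):
    def __init__(self, vocab=10000, emb=300, hidden=512, img_dim=4096, answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, num_layers=2, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(hidden, hidden)
        self.classifier = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, answers))

    def forward(self, img_feat, question_tokens):
        img = F.normalize(img_feat, dim=1)            # L2-normalized CNN feature
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                                     # top layer's final state
        fused = torch.tanh(self.img_proj(img)) * torch.tanh(self.q_proj(q))
        return self.classifier(fused)

model = VqaBaseline()
logits = model(torch.randn(2, 4096), torch.randint(0, 10000, (2, 14)))
print(logits.shape)  # torch.Size([2, 1000])
```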
vqa,CloudCV Visual Question Answering Demo
Organization: cloud-cv
Home Page: https://vqa.cloudcv.org/
vqa,Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Organization: cvlab-tohoku
vqa,Strong baseline for visual question answering
User: cyanogenoid
vqa,PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
User: davidmascharka
Home Page: https://arxiv.org/abs/1803.05268
vqa,PyTorch VQA implementation that achieved top performance in the VizWiz Grand Challenge (ECCV 2018): Answering Visual Questions from Blind People
User: denisdsh
vqa,A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Organization: facebookresearch
Home Page: https://mmf.sh/
vqa,[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
User: fuxiaoliu
Home Page: https://fuxiaoliu.github.io/LRV/
vqa,An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
User: hengyuan-hu
vqa,[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
User: hila-chefer
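The self-attention part of this method reduces to a simple relevancy-propagation rule: relevancy starts as the identity, and each layer adds the product of the gradient-weighted, positive-clamped, head-averaged attention map with the running relevancy. The sketch below covers only that self-attention update with random stand-in tensors; the paper's bi-modal and encoder-decoder rules are more involved.

```python
# Sketch: relevancy update R <- R + E_h[(grad * A)^+] @ R from "Generic
# Attention-model Explainability", self-attention case only.
import torch

def update_relevancy(R, attn, attn_grad):
    """R: (T, T) relevancy; attn, attn_grad: (heads, T, T) from one layer."""
    A_bar = (attn_grad * attn).clamp(min=0).mean(dim=0)  # E_h[(grad ⊙ A)^+]
    return R + A_bar @ R

tokens, heads = 10, 8
R = torch.eye(tokens)  # identity initialization: each token explains itself
for _ in range(4):     # one update per self-attention layer
    attn = torch.softmax(torch.randn(heads, tokens, tokens), dim=-1)
    attn_grad = torch.randn(heads, tokens, tokens)  # d(target score)/d(attn)
    R = update_relevancy(R, attn, attn_grad)
print(R[0])  # relevancy of every token to token 0 (e.g., [CLS])
```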
vqa,[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
User: jayleicn
Home Page: https://arxiv.org/abs/2102.06183
vqa,Hadamard Product for Low-rank Bilinear Pooling
User: jnhwkim
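The technique named here replaces a full bilinear map between question and image features with two low-rank projections whose outputs are combined by a Hadamard (element-wise) product. A minimal sketch with illustrative dimensions:

```python
# Sketch: low-rank bilinear pooling via the Hadamard product. Instead of a
# full (q_dim x v_dim) bilinear form per output unit, both inputs are
# projected to a shared low-rank space, multiplied element-wise, projected out.
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    def __init__(self, q_dim=1024, v_dim=2048, rank=512, out_dim=1000):
        super().__init__()
        self.U = nn.Linear(q_dim, rank)  # question projection
        self.V = nn.Linear(v_dim, rank)  # image projection
        self.P = nn.Linear(rank, out_dim)

    def forward(self, q, v):
        # Hadamard product in the low-rank space approximates bilinear pooling
        return self.P(torch.tanh(self.U(q)) * torch.tanh(self.V(v)))

pool = LowRankBilinearPooling()
print(pool(torch.randn(4, 1024), torch.randn(4, 2048)).shape)  # (4, 1000)
```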
vqa,A curated list of Visual Question Answering (VQA) (image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
User: jokieleung
vqa,Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
User: kdexd
Home Page: https://kdexd.github.io/probnmn-clevr
vqa,Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
User: linjieli222
Home Page: https://arxiv.org/abs/1903.12314
vqa,Oscar (Object-Semantics Aligned Pre-training) and VinVL (improved visual representations)
Organization: microsoft
vqa,A lightweight, scalable, and general framework for visual question answering research
Organization: milvlg
vqa,ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Organization: milvlg
vqa,The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Organization: nvlabs
Home Page: https://shikun.io/projects/prismer
vqa,Open-source evaluation toolkit of large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 30+ HuggingFace models, and 15+ benchmarks
Organization: open-compass
Home Page: https://rank.opencompass.org.cn/leaderboard-multimodal
vqa,InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Organization: opengvlab
Home Page: https://igpt.opengvlab.com
vqa,Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
Organization: opengvlab
vqa,Code release for the ICLR 2023 paper SlotFormer, an object-centric dynamics model
Organization: pairlab
Home Page: https://slotformer.github.io/
vqa,Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
User: peteanderson80
Home Page: http://panderson.me/up-down-attention/
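The top-down half of this model is easy to sketch: a question embedding scores each of the K bottom-up Faster R-CNN region features, the softmax-weighted sum forms the attended image vector, and the two modalities are fused by element-wise product. Shapes below are illustrative (K=36 regions, as in the common pre-extracted features).

```python
# Sketch: top-down attention over bottom-up region features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    def __init__(self, v_dim=2048, q_dim=512, hidden=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(v_dim + q_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.v_proj = nn.Linear(v_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)

    def forward(self, v, q):
        # v: (B, K, v_dim) region features; q: (B, q_dim) question embedding
        q_tiled = q.unsqueeze(1).expand(-1, v.size(1), -1)
        alpha = F.softmax(self.score(torch.cat([v, q_tiled], dim=2)), dim=1)
        v_att = (alpha * v).sum(dim=1)              # attended image feature
        return self.v_proj(v_att) * self.q_proj(q)  # joint representation

att = TopDownAttention()
joint = att(torch.randn(2, 36, 2048), torch.randn(2, 512))
print(joint.shape)  # torch.Size([2, 512])
```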
vqa,Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Organization: stanfordnlp
vqa,Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
User: thaolmk54
vqa,Uses pretrained encoders and language models to generate captions from multimedia inputs.
User: theocoombes
vqa,PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
User: vacancy
Home Page: http://nscl.csail.mit.edu
vqa,Bottom-up feature extractor implemented in PyTorch.
User: violetteshev
vqa,[IEEE TIP'2021] "UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content", Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik
User: vztu
Home Page: https://ieeexplore.ieee.org/document/9405420
vqa,Notes on some CV papers I have read, covering image captioning, weakly supervised segmentation, etc.
User: wangleihitcs
vqa,Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
User: wuziyi616
Home Page: https://slotdiffusion.github.io/
vqa,mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022)
Organization: x-plug
Home Page: https://arxiv.org/abs/2205.12005
vqa,mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Organization: x-plug
vqa,[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
User: yuleiniu
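The cause-effect view in CF-VQA leads to a counterfactual inference step: train a fused model plus a question-only branch, then at test time subtract the prediction of a counterfactual pass in which the vision-language branch is replaced by a learned constant, removing the natural direct effect of the language prior. The sketch below is a heavily simplified, hedged version using one SUM-style fusion; the paper studies several fusion strategies and also models a vision-only branch.

```python
# Sketch: CF-VQA-style debiased inference, total indirect effect (TIE) =
# factual prediction minus counterfactual prediction with vision blocked.
import torch
import torch.nn.functional as F

def cf_vqa_inference(z_vq, z_q, c):
    """z_vq: fused logits; z_q: question-only logits; c: learned constant."""
    factual = F.logsigmoid(z_vq + z_q)       # Z(v, q): both branches active
    counterfactual = F.logsigmoid(c + z_q)   # Z(v*, q): vision branch blocked
    return factual - counterfactual          # debiased score (TIE)

num_answers = 3000
debiased = cf_vqa_inference(
    torch.randn(4, num_answers), torch.randn(4, num_answers),
    torch.zeros(num_answers))
print(debiased.argmax(dim=1))  # debiased answer predictions
```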