
yzc526's Projects

aoanet

Code for paper "Attention on Attention for Image Captioning". ICCV 2019

bottom-up-attention-vqa

An updated PyTorch implementation of hengyuan-hu's version for 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering'

butd_model

A PyTorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.

cv-backbones

CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.

m3ae

[MICCAI-2022] This is the official implementation of Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training.

m3ae_public

Multimodal Masked Autoencoders (M3AE): A JAX/Flax Implementation

mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

mae-pytorch

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

models

Models and examples built with TensorFlow

pytorch-image-models

PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more

r2

[ACL-2021] The official implementation of Cross-modal Memory Networks for Radiology Report Generation.

self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

swin-transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

videocaption

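Text summarization (captioning) of video: given an input video, a deep learning network and AI program identify the main meaning the video expresses (input a video, output a text describing the video).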
