This project forked from olegkhomenko/attngan-zsl

Combining "A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts" and "AttnGAN"

ZSL-AttnGAN

Using GAZSL from "A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts" [1] in the AttnGAN [2] model.

Problem statement

The goal is to generate bird images with AttnGAN from noisy text descriptions. This raises the following challenges:

  • sentences are of arbitrary length
  • not all sentences are relevant

The original model may be improved with a GAN for zero-shot learning [1] to generate better (in any sense) text and image embeddings.

Possible solutions

In the original AttnGAN model, there are two types of text-modality embeddings: word-level embeddings and sentence-level embeddings (an average over all words). Hence, there are several ways to use ZSL within the model:

  1. Use the ZSL encoder (text/image) to summarize the text and the image into a matrix of embeddings (pseudo-words) or into word-level embeddings, fully replacing the original encoder. This matches the distribution of words against the matrix of image features.
  2. Use the ZSL encoder (text/image) on top of the encoded words and the matrix of image features. ZSL_ENC: W -> emb
  3. Use the ZSL encoder (text/image) on top of the sentence embedding and the sentence-level image features. ZSL_ENC: W.mean() -> emb
  4. Incorporate the ZSL approach in the training phase rather than the pretraining phase.
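The difference between options 2 and 3 can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: `toy_enc` is a hypothetical stand-in for a learned ZSL encoder, and plain Python lists stand in for the tensors the real model would use.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def mean_pool(word_embs: List[Vector]) -> Vector:
    """Average the word vectors of one sentence (any length >= 1)."""
    if not word_embs:
        raise ValueError("sentence has no words")
    dim = len(word_embs[0])
    return [sum(w[i] for w in word_embs) / len(word_embs) for i in range(dim)]


def zsl_on_words(word_embs: List[Vector], zsl_enc: Encoder) -> List[Vector]:
    """Option 2: apply the encoder to every word vector (W -> emb)."""
    return [zsl_enc(w) for w in word_embs]


def zsl_on_sentence(word_embs: List[Vector], zsl_enc: Encoder) -> Vector:
    """Option 3: apply the encoder to the averaged vector (W.mean() -> emb)."""
    return zsl_enc(mean_pool(word_embs))


def toy_enc(v: Vector) -> Vector:
    """Hypothetical encoder used only for this sketch: doubles each coordinate."""
    return [2.0 * x for x in v]


# Three "words", each a 2-dimensional embedding (made-up numbers).
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
per_word = zsl_on_words(W, toy_enc)         # one output vector per word
per_sentence = zsl_on_sentence(W, toy_enc)  # a single output vector
```

Option 3 yields a fixed-size vector regardless of sentence length, which is one reason it is the simpler of the two to wire in; option 2 keeps one vector per word.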

Implemented solution

To keep things simple, I built a ZSL Generator and a ZSL Discriminator on top of the average embeddings obtained from the original encoder architecture (ZSL_ENC: W.mean() -> emb). The overall task is to introduce adversarial and classification losses, via the discriminative model, into the optimization objective.
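As a rough illustration of what "adversarial and classification loss via the discriminative model" can look like, here is a sketch for a single sample. It assumes a discriminator with two outputs, a real/fake logit and a vector of class logits; the function names and the unweighted sum are assumptions for this sketch, not taken from the repository code.

```python
import math
from typing import List


def bce_with_logit(logit: float, target: float) -> float:
    """Binary cross-entropy on a raw logit, in a numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy for one sample (log-sum-exp with max subtraction)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def discriminator_loss(real_logit: float, fake_logit: float,
                       real_cls_logits: List[float], label: int) -> float:
    """Vanilla adversarial term (real -> 1, fake -> 0) plus classification on real data."""
    adv = bce_with_logit(real_logit, 1.0) + bce_with_logit(fake_logit, 0.0)
    return adv + cross_entropy(real_cls_logits, label)


def generator_loss(fake_logit: float, fake_cls_logits: List[float], label: int) -> float:
    """Generator wants fakes judged real and assigned to the correct class."""
    adv = bce_with_logit(fake_logit, 1.0)
    return adv + cross_entropy(fake_cls_logits, label)


# An undecided discriminator (all logits zero) over two classes.
g = generator_loss(0.0, [0.0, 0.0], label=0)
d = discriminator_loss(0.0, 0.0, [0.0, 0.0], label=0)
```

With all logits at zero, each term equals log 2, so `g` is 2·log 2 and `d` is 3·log 2; this is a convenient sanity check when wiring the losses together.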

  • The original ZSL paper uses WGAN + GP, for which it is usually recommended to update the Discriminator more frequently than the Generator (5:1 in the original ZSL code). For the sake of simplicity, I ignore that and use the vanilla adversarial loss function.
  • Visual pivot regularization from the original paper is dropped.
  • The KL loss can be omitted because the Sentence Embedder now has a stochastic part (the ZSL part uses z ~ N).
  • There may be some bugs; further checks are needed.
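The difference between the 5:1 update ratio and the simplified 1:1 scheme can be sketched as a schedule of update steps. The helper `update_schedule` and its `n_critic` parameter are hypothetical names chosen for this illustration; actual training code would run an optimizer step where this sketch yields a label.

```python
from typing import Iterator


def update_schedule(num_generator_steps: int, n_critic: int = 1) -> Iterator[str]:
    """Yield "D" n_critic times before every "G" update."""
    if n_critic < 1:
        raise ValueError("n_critic must be at least 1")
    for _ in range(num_generator_steps):
        for _ in range(n_critic):
            yield "D"  # discriminator (critic) update
        yield "G"      # generator update


# 5:1 ratio, as described for the original ZSL code.
wgan_gp_style = list(update_schedule(2, n_critic=5))

# 1:1 ratio, matching the simplification described above.
simplified = list(update_schedule(2, n_critic=1))
```

Moving between the two schemes then amounts to changing a single parameter, which may be handy if a WGAN-style loss is tried later.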

TODO:

  • Choose a better adversarial loss function and tricks from other papers
  • Try to incorporate ZSL for word-level embeddings (ZSL_ENC: W -> emb) and for training step (not pretraining one)
  • Test eval
  • Tune hyperparams
  • TODOs in the code

Results:

  • The DAMSM model was trained for 200 epochs. However, this was a single run, so there may still be bugs.
  • The main model trains; however, the lambdas and smoothing factors still need tuning to balance the components of the objective function (see image below)
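One way to reason about balancing the objective is to look at each component's weighted contribution to the total. The sketch below assumes the objective is a weighted sum of named loss terms; the component names and all numbers are made-up placeholders, not measured values from this project.

```python
from typing import Dict, Tuple


def weighted_objective(losses: Dict[str, float],
                       lambdas: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the weighted total and the per-component contributions."""
    missing = set(losses) - set(lambdas)
    if missing:
        raise KeyError(f"no lambda given for: {sorted(missing)}")
    contributions = {name: lambdas[name] * value for name, value in losses.items()}
    return sum(contributions.values()), contributions


# Placeholder loss values and weights, for illustration only.
losses = {"damsm": 4.0, "adv": 1.0, "cls": 2.0}
lambdas = {"damsm": 1.0, "adv": 2.0, "cls": 0.5}

total, contributions = weighted_objective(losses, lambdas)
largest = max(contributions, key=contributions.get)
```

Logging `contributions` alongside the total during training makes it easier to see when one term dominates and its lambda should be reduced.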

Weights

Yandex.Disk