cs224u_xmtc's Introduction

Extreme Multi-label Text Classification

Overview

Extreme multi-label text classification (XMTC) is the problem of tagging each text document with its most relevant subset of labels. In XMTC, the number of labels can be extremely large, i.e., thousands or millions. A label space of this size introduces two difficulties: data sparsity and computational cost. Many approaches have been introduced to address these difficulties, and they fall into three categories: embedding-based, tree-based, and deep learning methods.

Embedding-based method

The main idea of this type of approach is to project the label vectors into a lower-dimensional space. Research focuses on different methods for the compression process (mapping the original labels to a lower-dimensional space) and the decompression process (mapping back to the original high-dimensional space). Approaches include compressed sensing, Bloom filters, SVD (singular value decomposition), SLEEC (Sparse Local Embeddings for Extreme Multi-label Classification), etc.
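
As a rough illustration of the compression and decompression steps, the sketch below encodes a label set with a Bloom filter, one of the approaches listed above. It is not code from this project: the function names, the label names, and the sizes (64 bits, 3 hash functions) are assumptions chosen for the example. In an actual Bloom-filter method a classifier would predict the bits from the document; here the bits are computed directly from the true labels to keep the focus on the mapping itself.

```python
import hashlib


def bit_positions(label, num_bits, num_hashes):
    """Map a label name to `num_hashes` bit positions in [0, num_bits)."""
    positions = []
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{label}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def compress(labels, num_bits, num_hashes):
    """Compression: encode a set of labels as a short binary vector."""
    bits = [0] * num_bits
    for label in labels:
        for pos in bit_positions(label, num_bits, num_hashes):
            bits[pos] = 1
    return bits


def decompress(bits, vocabulary, num_hashes):
    """Decompression: keep every label whose bit positions are all set."""
    return {
        label
        for label in vocabulary
        if all(bits[pos] for pos in bit_positions(label, len(bits), num_hashes))
    }


# Hypothetical label space of 1000 labels and one document's label set.
vocabulary = [f"label_{i}" for i in range(1000)]
true_labels = {"label_3", "label_42", "label_777"}

code = compress(true_labels, num_bits=64, num_hashes=3)
recovered = decompress(code, vocabulary, num_hashes=3)

print(len(vocabulary), "labels ->", len(code), "bits")
print("all true labels recovered:", true_labels <= recovered)
print("false positives:", len(recovered - true_labels))
```

Decoding never drops a true label, but it may return a few extra ones. That is the price of the compression, and it grows as the bit vector gets shorter.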

Tree-based method

Similar to a decision tree, this type of method partitions the instance space recursively at each non-leaf node. However, instead of splitting on a single feature, it partitions the instance space with a hyperplane. FastXML is a representative method in this category.
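
The recursive partitioning can be sketched as follows. This is a minimal illustration, not FastXML itself: `mean_split_hyperplane` is a hypothetical stand-in for the learned hyperplane (FastXML learns its hyperplanes by optimizing a ranking loss), and the tree layout and toy points are assumptions made for the example.

```python
def hyperplane_split(points, w, b):
    """Route each point by the sign of w . x + b, which uses all features at once."""
    left, right = [], []
    for i, x in enumerate(points):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        (right if score > 0 else left).append(i)
    return left, right


def mean_split_hyperplane(points):
    """Hypothetical hyperplane: passes through the mean, normal points to the farthest point."""
    dim = len(points[0])
    mean = [sum(x[j] for x in points) / len(points) for j in range(dim)]
    far = max(points, key=lambda x: sum((x[j] - mean[j]) ** 2 for j in range(dim)))
    w = [far[j] - mean[j] for j in range(dim)]
    b = -sum(w[j] * mean[j] for j in range(dim))
    return w, b


def build_tree(points, indices, choose_hyperplane, max_leaf_size):
    """Recursively partition the instances until a node is small enough."""
    if len(indices) <= max_leaf_size:
        return {"leaf": indices}
    subset = [points[i] for i in indices]
    w, b = choose_hyperplane(subset)
    left, right = hyperplane_split(subset, w, b)
    if not left or not right:  # degenerate split, stop here
        return {"leaf": indices}
    return {
        "w": w,
        "b": b,
        "left": build_tree(points, [indices[i] for i in left], choose_hyperplane, max_leaf_size),
        "right": build_tree(points, [indices[i] for i in right], choose_hyperplane, max_leaf_size),
    }


# Two well-separated toy clusters in a 2-D instance space.
points = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
tree = build_tree(points, list(range(len(points))), mean_split_hyperplane, max_leaf_size=3)
print(tree["left"]["leaf"], tree["right"]["leaf"])
```

At prediction time a new instance would follow the same sign test from the root down to a leaf, and the labels of the training instances stored in that leaf would be used to rank candidate labels.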

Deep learning method

This type of method has the advantage of taking context information into account (while the other methods use bag-of-words features). CNN-Kim applies a CNN to the concatenated word embeddings of a document, which form a matrix analogous to an image. XML-CNN adds dynamic max pooling and changes the objective function so that the model is better suited to the XMTC problem.
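
To see why the objective function matters, the sketch below contrasts a softmax output with per-label sigmoid outputs and a binary cross entropy loss, the multi-label setup listed for the final FC layer in the Implementation section. The logits and targets are made-up values and the helper functions are written for this example only; they are not taken from the project code.

```python
import math


def softmax(logits):
    """Multi-class output: probabilities compete and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Multi-label output: each label gets an independent probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets):
    """Mean over labels of -[y log p + (1 - y) log(1 - p)] with p = sigmoid(logit)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


# A document with two relevant labels out of four (hypothetical values).
logits = [4.0, 3.5, -2.0, -3.0]
targets = [1, 1, 0, 0]

print("softmax:", [round(p, 3) for p in softmax(logits)])
print("sigmoid:", [round(sigmoid(z), 3) for z in logits])
print("binary cross entropy:", round(binary_cross_entropy(logits, targets), 4))
```

With softmax the two relevant labels have to share the probability mass, so they cannot both score close to 1. With sigmoid outputs each label is scored on its own, which fits documents that carry several labels at once.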

Implementation

This project is a Python implementation of SLEEC and XML-CNN.

SLEEC

  • Compression process
    • Key idea: SLEEC preserves pairwise distances only between nearest neighbors of label vectors.
    • Label embedding learning: SVP (Singular Value Projection), non-convex optimization
  • Regression
    • Regressor learning: ADMM (Alternating Direction Method of Multipliers)
  • Decompression process
    • kNN search: find the k nearest neighbors of the predicted label embedding among the training embeddings, then add up the original label vectors of those neighbors
  • The example notebook illustrates the entire SLEEC workflow with explanations; see also the SLEEC code.
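
The decompression step can be sketched as below. This is a simplified illustration under stated assumptions, not the project's SLEEC code: the training embeddings and the query embedding are made-up 2-D values (in SLEEC they would come from the learned embedding and regressor), and each sparse label vector is stored as a set of label names, so summing label vectors amounts to counting labels.

```python
import heapq


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_decompress(z_query, train_embeddings, train_label_sets, k, top=3):
    """Sum the label vectors of the k nearest training embeddings and rank the labels."""
    nearest = heapq.nsmallest(
        k,
        range(len(train_embeddings)),
        key=lambda i: squared_distance(z_query, train_embeddings[i]),
    )
    scores = {}
    for i in nearest:
        for label in train_label_sets[i]:
            scores[label] = scores.get(label, 0) + 1
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical low-dimensional embeddings of five training documents.
train_embeddings = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [3.0, 3.1], [3.2, 2.9]]
train_label_sets = [{"cat", "pet"}, {"cat"}, {"pet", "dog"}, {"car"}, {"car", "road"}]

# Hypothetical embedding predicted by the regressor for a test document.
z_query = [0.1, 0.1]
print(knn_decompress(z_query, train_embeddings, train_label_sets, k=3))
```

Only the labels of nearby training documents receive a score, so the output stays sparse even when the full label space is very large.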

XML-CNN

  • Network architecture
    • Word embedding inputs
    • Conv layers with different kernel sizes
      • Intuition: kernel_size=2 looks at 2 consecutive words
      • Different kernel sizes generate multiple feature maps (analogous to color blobs, edges, etc. in images)
    • Dynamic max pooling
      • Key idea: since document lengths vary, the output volume of the preceding conv layer has a variable size and cannot be connected directly to a dense layer. Dynamic pooling guarantees a consistent output shape and selects the most important information.
      • k-max pooling & k-chunk max pooling
    • Bottleneck FC layer
      • Size reduction and dealing with overfitting
    • FC layer
      • Multi-class: softmax, categorical cross entropy
      • Multi-label: sigmoid, binary cross entropy
  • See XML-CNN code.
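
The two pooling variants named above can be sketched on a single 1-D feature map as follows. This is a plain-Python illustration with made-up activations, not the layer used in the project's XML-CNN code; the function names and the way chunk boundaries are computed are assumptions.

```python
def k_max_pool(feature_map, k):
    """Keep the k largest activations, in their original order."""
    top = sorted(range(len(feature_map)), key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]


def chunk_max_pool(feature_map, num_chunks):
    """Split the feature map into `num_chunks` spans and keep the max of each span."""
    n = len(feature_map)
    if n < num_chunks:
        raise ValueError("feature map is shorter than the number of chunks")
    pooled = []
    for c in range(num_chunks):
        start = c * n // num_chunks
        end = (c + 1) * n // num_chunks
        pooled.append(max(feature_map[start:end]))
    return pooled


# Feature maps from documents of different lengths (hypothetical activations).
short_doc = [0.9, 0.8, 0.1, 0.2, 0.3, 0.1]
long_doc = [0.2, 0.1, 0.8, 0.3, 0.5, 0.7, 0.1, 0.6, 0.4]

for feature_map in (short_doc, long_doc):
    print(len(feature_map), k_max_pool(feature_map, 2), chunk_max_pool(feature_map, 2))
```

Both variants return a fixed number of values regardless of document length, which is what lets the result feed a dense layer. Chunk pooling additionally keeps one value per region of the document, so some coarse position information survives.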
