sagar23sj / deep_transfer_learning_nlp_dhs2019

This project forked from dipanjans/deep_transfer_learning_nlp_dhs2019

Contains the code and deck for the presentation on Applying Deep Transfer Learning for NLP in Analytics Vidhya's DataHack Summit 2019

License: GNU General Public License v3.0

Jupyter Notebook 100.00%

Applying Deep Transfer Learning for Natural Language Processing (NLP)

Tackling tough real-world problems in Natural Language Processing (NLP) often means dealing with class imbalance and a lack of enough labeled data for training. Thanks to recent advances in deep transfer learning for NLP, we have been able to make rapid strides not only in tackling these problems but also in leveraging these models for diverse downstream NLP tasks.

The intent of this hack session is two-fold: we will first look at various state-of-the-art (SOTA) models in deep transfer learning for NLP with hands-on examples, and then talk about how these models were used in a real-world industry use case around the proactive detection of security vulnerabilities.

Part 1 - Deep Transfer Learning Techniques for NLP

In the first part of this hands-on hack session, we will take a tour of various advances in deep transfer learning for NLP, including the following:

  • Pre-trained word embeddings for deep learning models (FastText with CNNs/Bi-directional LSTMs + Attention)
  • Universal Embeddings (Sentence Encoders, NNLMs)
  • Transformers (BERT, DistilBERT)
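The "+ Attention" in the first bullet typically refers to collapsing the per-token outputs of a BiLSTM into a single document vector before the classification layer. A minimal, framework-free sketch of that pooling step follows, assuming simple dot-product scoring against a context vector (the notebooks may use a different scoring function, e.g. additive attention); the names attention_pool, hidden_states and context are hypothetical and chosen for illustration only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], context: Vector) -> Vector:
    """Pool per-token vectors into one document vector (illustrative sketch).

    Assumption: each token vector is scored by its dot product with a
    context vector, the scores are softmax-normalised into attention
    weights, and the result is the attention-weighted sum of the tokens.
    """
    # One relevance score per token.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum over tokens, computed dimension by dimension.
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
```

In a trained model, the token vectors would come from the BiLSTM run over FastText embeddings and the context vector would be a learned parameter; this sketch only shows the weighting arithmetic. With an all-zero context every token gets the same weight and the result is the plain average.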

We will take a benchmark classification dataset, train these models on it and compare their performance. All examples will be showcased in Python, using TensorFlow 2.0.
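Because the session stresses class imbalance, plain accuracy alone can be a misleading basis for comparing models. A minimal, framework-free sketch of the comparison step is shown here, assuming integer class labels and predictions already computed for each model; the function names are hypothetical and not taken from the notebooks.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def compare_models(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> Dict[str, Dict[str, float]]:
    """Report accuracy and macro-F1 for each named model's predictions."""
    return {
        name: {"accuracy": accuracy(y_true, preds), "macro_f1": macro_f1(y_true, preds)}
        for name, preds in predictions.items()
    }
```

On a skewed label set, a model that always predicts the majority class can score high accuracy while its macro-F1 stays low, which is why a per-class metric is a useful companion when comparing models.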

Part 2 - Industry Case Study: Proactive Identification of Software Dependency Vulnerabilities

The second part of this hack session will briefly cover a real-world industry use case around the proactive detection of security vulnerabilities in software. Open-source and third-party libraries (dependencies) can cost an enterprise dearly, since teams are often unaware of the potential vulnerabilities present in them. Can we leverage deep learning to proactively find and flag dependencies that show signs of a potential vulnerability before it becomes a serious issue? (Example: the Python requests library was one of the most vulnerable dependencies in the recent past, and many developers were not even aware of it!)

This solution uses state-of-the-art deep learning models for NLP, such as BERT, to go through public data, including GitHub events data, Bugzilla and mailing list conversations, to predict probable security vulnerabilities. This should give the audience an idea of how we leveraged deep transfer learning for NLP in a unique domain and also tackled problems like extreme class imbalance.
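The README does not say how the extreme class imbalance was handled. One common option is to weight the loss by inverse class frequency, so that rare positive examples (here, items signalling a probable vulnerability) are not swamped by the majority class. A minimal sketch of the "balanced" weighting formula n_samples / (n_classes * count_c), the same formula scikit-learn uses for class_weight="balanced", is shown here; the function name balanced_class_weights is hypothetical, and this is an assumed technique rather than the one confirmed for this project.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency class weights: n_samples / (n_classes * count_c).

    With these weights, count_c * weight_c equals n_samples / n_classes for
    every class, so each class contributes equally in aggregate to a
    weighted loss, however rare it is.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}
```

The resulting dictionary maps class index to weight, which is the shape tf.keras expects for the class_weight argument of Model.fit.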

Key Takeaways from this Hack Session

  • Learn to train and fine-tune pre-trained SOTA models including BERT and DistilBERT for downstream NLP tasks like classification
  • Examples showcased using the latest and best in TensorFlow 2.0, TF-Hub and the excellent Transformers framework
  • Learn about a real-world industry use case on predicting software dependency vulnerabilities using these techniques

Hack Session Examples Powered By

Contributors: dipanjans
