hiredscorelabs / tamnun-ml

An easy-to-use open-source library for advanced Deep Learning and Natural Language Processing

Tamnun ML

tamnun is a Python framework for machine learning and deep learning algorithms and methods, especially in the fields of Natural Language Processing and Transfer Learning. The aim of tamnun is to provide easy-to-use interfaces for building powerful models based on the most recent state-of-the-art (SOTA) methods.

For more about tamnun, feel free to read the introduction to TamnunML on Medium.

Getting Started

tamnun depends on several other machine learning and deep learning frameworks, such as PyTorch and Keras. To install tamnun and all its dependencies, run:

$ git clone https://github.com/hiredscorelabs/tamnun-ml
$ cd tamnun-ml
$ python setup.py install

Or using PyPI:

$ pip install tamnun

Jump in and try out an example:

$ cd examples
$ python finetune_bert.py

Or take a look at the Jupyter notebooks here.

BERT

BERT (Bidirectional Encoder Representations from Transformers) is a language model trained by Google and introduced in their paper. tamnun wraps the excellent PyTorch-Pretrained-BERT library in a scikit-learn interface to make BERT fine-tuning easy. At the moment, the tamnun BERT classifier supports binary and multi-class classification. To fine-tune BERT on a specific task:

from tamnun.bert import BertClassifier, BertVectorizer
from sklearn.pipeline import make_pipeline

clf = make_pipeline(BertVectorizer(), BertClassifier(num_of_classes=2)).fit(train_X, train_y)
predicted = clf.predict(test_X)

Please see this notebook for a full code example.

Fitting (almost) any PyTorch Module using just one line

You can use the TorchEstimator object to fit any PyTorch module with just one line:

from torch import nn
from tamnun.core import TorchEstimator

module = nn.Linear(128, 2)
clf = TorchEstimator(module, task_type='classification').fit(train_X, train_y)

See this file for a full example of fitting an nn.Linear module on the MNIST dataset (classification of handwritten digits).
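
The one-liner above relies on the scikit-learn convention that fit returns the estimator itself, so construction and training can be chained. The sketch below is not TorchEstimator's implementation. It is a hypothetical, pure-Python stand-in (the TinyEstimator class and its lr and epochs parameters are invented for illustration) that trains a small logistic regression by gradient descent to show the fit/predict contract:

```python
import math


class TinyEstimator:
    """Hypothetical scikit-learn-style estimator (illustration only, not tamnun code)."""

    def __init__(self, n_features, lr=0.5, epochs=200):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr
        self.epochs = epochs

    def _score(self, x):
        # Linear score; positive means class 1.
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def fit(self, X, y):
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                prob = 1.0 / (1.0 + math.exp(-self._score(x)))
                error = prob - target
                self.weights = [w - self.lr * error * xi
                                for w, xi in zip(self.weights, x)]
                self.bias -= self.lr * error
        # Returning self is what makes `Estimator(...).fit(X, y)` usable in one line.
        return self

    def predict(self, X):
        return [1 if self._score(x) > 0 else 0 for x in X]


train_X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
train_y = [0, 0, 1, 1]

clf = TinyEstimator(n_features=2).fit(train_X, train_y)
predicted = clf.predict([[0.1, 0.1], [0.95, 0.9]])
```

Because fit returns the fitted object, the same estimator can also be placed inside a scikit-learn pipeline, which is how the BERT example chains a vectorizer and a classifier.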

Distiller Transfer Learning

This module distills a very large model (such as BERT) into a much smaller one. Inspired by this paper.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

from tamnun.bert import BertClassifier, BertVectorizer
from tamnun.transfer import Distiller

# Teacher: the large BERT pipeline
bert_clf = make_pipeline(BertVectorizer(do_truncate=True, max_len=3), BertClassifier(num_of_classes=2))
# Student: a much smaller n-gram linear model
distilled_clf = make_pipeline(CountVectorizer(ngram_range=(1,3)), LinearRegression())

distiller = Distiller(teacher_model=bert_clf, teacher_predict_func=bert_clf.decision_function, student_model=distilled_clf).fit(train_texts, train_y, unlabeled_X=unlabeled_texts)

predicted_logits = distiller.transform(test_texts)

For a full BERT distillation example, see this notebook.
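
As far as the snippet above shows, the student is a regressor trained to reproduce the teacher's logits, and unlabeled texts can take part because the teacher supplies their targets. The pure-Python sketch below illustrates that idea only. It is not the Distiller implementation: teacher_logit is a hypothetical stand-in for bert_clf.decision_function, and the student is a one-feature least-squares line instead of a CountVectorizer and LinearRegression pipeline.

```python
def teacher_logit(text):
    """Hypothetical teacher: stands in for an expensive model's decision function."""
    words = text.lower().split()
    return 2.0 * words.count("good") - 2.0 * words.count("bad") + 0.5


def feature(text):
    """Single cheap feature used by the student: 'good' count minus 'bad' count."""
    words = text.lower().split()
    return words.count("good") - words.count("bad")


def fit_student(texts, targets):
    """Closed-form simple linear regression of the targets on the feature."""
    xs = [feature(t) for t in texts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


train_texts = ["good good film", "bad plot"]
unlabeled_texts = ["good acting", "bad bad bad ending", "plain story"]

# The teacher produces soft targets for labeled and unlabeled texts alike,
# so the student never needs the original labels.
distill_texts = train_texts + unlabeled_texts
soft_targets = [teacher_logit(t) for t in distill_texts]

slope, intercept = fit_student(distill_texts, soft_targets)


def student_logit(text):
    """Student prediction: a cheap approximation of the teacher's logit."""
    return slope * feature(text) + intercept
```

In this toy setting the teacher is itself linear in the student's feature, so the student recovers it almost exactly. With a real teacher such as BERT, the student would only approximate the teacher's outputs.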

Support

Getting Help

You can ask questions and join the development discussion on GitHub Issues.

License

Apache License 2.0 (same as TensorFlow)

tamnun-ml's People

Contributors

jondot, shudima

tamnun-ml's Issues

Tamnun-ml for multilabel classifier

Hi,
I looked over the notebook for the single-label classifier. Can you explain how to use this method for a multi-label classifier, or how to save the probabilities for each label?
Thank you!

training on gpu, predicting on cpu

Hi,
I tried predicting with the trained (pretrained + fine-tuned) model on a server in a Docker container, but ran into a problem.
It looks like the model can only predict in a GPU environment (and indeed I managed to bring up a local server on my machine).
I tried both loading the model with pickle and loading it using torch.load with map_device = 'cpu', but I still got an error whenever I tried to predict.

Support for other pretrained BERT models

Thanks for the great module, it seems to work right out of the box!

Is there a way to specify a pretrained model other than those listed here?

It would be great to use bioBERT or sciBERT, the latter of which has a better vocabulary. A quick look suggests that you're using pytorch_transformers.BertTokenizer, which seems to have methods for custom calls.
