Logistic Regression Classification Baseline

This repository provides a simple logistic regression classification baseline for NLP research in text classification.

Through simple commands, one can:

  • Run random search trials over a variety of LR hyperparameters, including those involving the input representation.
  • Run cross-validation/jackknifing if a dev set is not available.
  • Run experiments with (possibly stratified) subsamples of the training data.
  • Parallelize experiments using GNU Parallel.
  • Visualize the effect of individual hyperparameters on classification performance.

The repository expects only a train.jsonl file in JSON Lines format, where each line is an object of the form {"text":..., "label":...}. You can also supply a dev.jsonl file. If you don't, the training data is jackknifed and performance metrics are reported over all splits.
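
As an illustration of the input format, the following sketch writes and reads a JSON Lines file with one {"text":..., "label":...} object per line. The example texts, the label values, and the helper names write_jsonl and read_jsonl are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path

# Hypothetical toy examples; real data would come from your own corpus.
examples = [
    {"text": "the movie was wonderful", "label": "positive"},
    {"text": "a dull, lifeless plot", "label": "negative"},
]


def write_jsonl(path, records):
    """Write one JSON object per line (JSON Lines)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling write_jsonl("data/train.jsonl", examples) would produce a file in the layout the commands below take as --train_file.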

Run a single experiment

python -m lr.train --train_file data/train.jsonl --dev_file data/dev.jsonl --search_trials 5 --serialization_dir model_logs/lr -o
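
For readers unfamiliar with random search, the idea behind running several search trials can be sketched as follows. This is a minimal illustration, not the repository's implementation: the search space (names, ranges, and values) and the evaluate callback are assumptions made for the example.

```python
import math
import random


def sample_hyperparameters(rng):
    """Draw one random configuration.

    The names and ranges here are illustrative assumptions, not the
    search space actually used by lr.train.
    """
    return {
        "C": 10 ** rng.uniform(-3, 3),  # log-uniform regularization strength
        "penalty": rng.choice(["l1", "l2"]),
        "ngram_range": rng.choice([(1, 1), (1, 2), (1, 3)]),
        "weight": rng.choice(["binary", "tf-idf"]),
    }


def random_search(evaluate, num_trials, seed=0):
    """Evaluate num_trials random configurations and return the best one.

    evaluate is a caller-supplied function mapping a configuration to a
    score where higher is better (for example, dev F1).
    """
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(num_trials):
        params = sample_hyperparameters(rng)
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Each trial is independent of the others, which is why trials can also be spread across separate processes.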

Run a single experiment on (stratified) sampled data

python -m lr.train --train_file data/train.jsonl --dev_file data/dev.jsonl --search_trials 10 --serialization_dir model_logs/sampled_lr --train_subsample 1000 --stratified -o
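
Stratified subsampling keeps the label proportions of the sample close to those of the full training set. The sketch below shows one common way to do this, assuming records shaped like the train.jsonl lines; the function name stratified_subsample is hypothetical, and the repository's own sampling may differ in detail.

```python
import random
from collections import defaultdict


def stratified_subsample(records, sample_size, seed=0):
    """Sample roughly sample_size records, preserving label proportions.

    Because each label's share is rounded independently, the result can
    differ from sample_size by a few records.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    total = len(records)
    sample = []
    for group in by_label.values():
        # Allocate this label its proportional share, at least one record.
        k = max(1, round(sample_size * len(group) / total))
        sample.extend(rng.sample(group, min(k, len(group))))
    rng.shuffle(sample)
    return sample
```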

Run a single jackknifing experiment

python -m lr.train --train_file data/train.jsonl --search_trials 10 --jackknife_partitions 3 --save_jackknife_partitions --serialization_dir model_logs/jackknife_lr --stratified --train_subsample 1000 -o
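
The partitioning idea can be sketched as follows, under the assumption that each partition is held out once as a dev split while the remaining partitions form the training split. The function name jackknife_splits is hypothetical, and the way lr.train builds its partitions may differ.

```python
import random


def jackknife_splits(records, num_partitions, seed=0):
    """Yield (train, dev) pairs, holding out each partition once."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    # Deal the shuffled records round-robin into num_partitions groups.
    partitions = [shuffled[i::num_partitions] for i in range(num_partitions)]

    for held_out in range(num_partitions):
        dev = partitions[held_out]
        train = [
            record
            for i, part in enumerate(partitions)
            if i != held_out
            for record in part
        ]
        yield train, dev
```

Metrics computed on each dev split can then be averaged or reported individually across all splits.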

Evaluate on test data

parallel --ungroup python -m lr.train --train_file data/train.jsonl --dev_file data/dev.jsonl --test_file data/test.jsonl --search_trials 1 --serialization_dir model_logs/ag_lr/exp_{#} --evaluate_on_test -o ::: {1..6}

Run many experiments in parallel

parallel --ungroup python -m lr.train --train_file data/train.jsonl --dev_file data/dev.jsonl --search_trials 1 --serialization_dir model_logs/parallel_lr/exp_{#} -o ::: {1..6}

Merge multiple experiment results

python -m lr.merge --experiments model_logs/parallel_lr/* --output-file model_logs/parallel_lr/master_results.jsonl
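
Conceptually, merging collects the per-experiment results into a single JSON Lines file. The sketch below illustrates that idea only: the per-experiment file name results.jsonl, the added "experiment" field, and the function merge_results are assumptions for the example and do not describe the actual output layout of lr.train or lr.merge.

```python
import json
from pathlib import Path


def merge_results(experiment_dirs, output_file, results_name="results.jsonl"):
    """Concatenate per-experiment JSON Lines results into one file.

    results_name is a hypothetical file name; paths that do not contain
    such a file (including plain files matched by a glob) are skipped.
    """
    merged = []
    for directory in experiment_dirs:
        path = Path(directory) / results_name
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    # Remember which experiment each row came from.
                    record["experiment"] = Path(directory).name
                    merged.append(record)

    output_path = Path(output_file)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as out:
        for record in merged:
            out.write(json.dumps(record) + "\n")
    return merged
```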

Visualize scatterplot of hyperparameter vs performance

python -m lr.plot --hyperparameter C --results_file model_logs/parallel_lr/master_results.jsonl -p dev_f1

Visualize boxplot of hyperparameter vs performance

python -m lr.plot --hyperparameter weight --boxplot --results_file model_logs/parallel_lr/master_results.jsonl -p dev_f1
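
A boxplot of this kind groups the performance metric by each value of a categorical hyperparameter. The sketch below computes the per-group summary that such a plot conveys, assuming each merged record is a JSON object containing both the hyperparameter and the metric as keys; that record layout and the function name summarize_by_hyperparameter are assumptions, not the behaviour of lr.plot.

```python
import statistics
from collections import defaultdict


def summarize_by_hyperparameter(records, hyperparameter, metric):
    """Group metric values by hyperparameter value and summarize each group."""
    groups = defaultdict(list)
    for record in records:
        # Skip records that lack either field.
        if hyperparameter in record and metric in record:
            groups[str(record[hyperparameter])].append(record[metric])

    return {
        value: {
            "n": len(scores),
            "min": min(scores),
            "median": statistics.median(scores),
            "max": max(scores),
        }
        for value, scores in groups.items()
    }
```

For example, summarize_by_hyperparameter(records, "weight", "dev_f1") returns one summary per distinct value of weight.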

Contributors

kernelmachine
