uclnlp / stat-nlp-book

Interactive Lecture Notes, Slides and Exercises for Statistical NLP

Home Page: http://uclmr.github.io/stat-nlp-book

stat-nlp-book's Introduction

The Stat-NLP-Book Project

Render Book Statically

The easiest option for reading the book is via the static nbviewer. While this does not allow you to change and execute code, it also doesn't require you to install software locally and only needs a browser.

Docker installation

We assume you have a command line interface (CLI) in your OS (bash, zsh, cygwin, git-bash, PowerShell etc.). We assume this CLI expands $(pwd) to the current directory. If it doesn't, replace all mentions of $(pwd) with the path of the directory you are in.

When using Windows PowerShell, all instances of "$(pwd)" should be replaced with ${PWD}.
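
If your shell expands neither form, one option is to fill in the absolute path yourself. The following is a minimal Python sketch, assuming the image name and container path used throughout this README; `docker_run_command` is a hypothetical helper and not part of the repository.

```python
import os
import shlex


def docker_run_command(workdir=None):
    """Build the notebook `docker run` command with an explicit host path."""
    # Hypothetical helper: the absolute directory stands in for "$(pwd)".
    host_dir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run", "-it", "--rm",
        "-p", "8888:8888",
        "-v", f"{host_dir}:/home/jovyan/work",
        "riedelcastro/stat-nlp-book",
    ]


if __name__ == "__main__":
    # Print a command line quoted for POSIX shells.
    print(shlex.join(docker_run_command()))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell expansion altogether.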

Install Docker

Go to the Docker webpage and follow the instructions for your platform.

Download Stat-NLP-Book Image

Next you can download the stat-nlp-book docker image like so:

docker pull riedelcastro/stat-nlp-book

Get Stat-NLP-Book Repository

You can use the git installation in the docker container to get the repository:

docker run -v "$(pwd)":/home/jovyan/work riedelcastro/stat-nlp-book git clone https://github.com/uclmr/stat-nlp-book.git  

Note: this will create a new stat-nlp-book directory in your current directory.

Change into Stat-NLP-Book directory

We assume from here on that you are in the top level stat-nlp-book directory:

cd stat-nlp-book

Note: you need to be in the stat-nlp-book directory every time you want to run/update the book.

Run Notebook

docker run -it --rm -p 8888:8888 -v "$(pwd)":/home/jovyan/work riedelcastro/stat-nlp-book 

You are now ready to visit the overview page of the installed book.

Usage

Once installed you can always run your notebook server by first changing into your local stat-nlp-book directory, and then executing:

docker run -it --rm -p 8888:8888 -v "$(pwd)":/home/jovyan/work riedelcastro/stat-nlp-book 

This assumes that your docker daemon is running and that you are in the stat-nlp-book directory. How to run the docker daemon depends on your system.

Update the notebook

We frequently make changes to the book. To get these changes, first discard your local changes to avoid merge conflicts. You may have changed files that we also changed, either by editing the code or simply by running it, and in that case git will complain when you update. To avoid this, you can undo all your changes by executing:

docker run -v "$(pwd)":/home/jovyan/work riedelcastro/stat-nlp-book git checkout -- .

If you want to keep your changes, create copies of the changed files first. Jupyter has a "Make a copy" option in the "File" menu for this. You can also create a clone of this repository to keep your own changes and merge our changes in a more controlled manner.

To get the actual updates, then run:

docker run -v "$(pwd)":/home/jovyan/work riedelcastro/stat-nlp-book git pull

Access Content

The repository contains a lot of material, some of which may not be ready for consumption yet. This is why you should always access content through the top-level overview page (local-link).

virtualenv installation [BETA]

Install virtualenv

Follow the instructions here. In short:

pip3 install virtualenv

git clone the stat-nlp-book repository

git clone https://github.com/uclmr/stat-nlp-book.git

Create virtual environment

Enter the cloned stat-nlp-book directory:

cd stat-nlp-book

and create the virtual environment:

virtualenv -p python3 venv

Enter the virtual environment

source venv/bin/activate

Install dependencies

pip3 install --upgrade pip
pip3 install -r requirements.txt
pip3 install git+https://github.com/robjstan/tikzmagic.git
jupyter-nbextension install rise --py --sys-prefix
jupyter-nbextension enable rise --py --sys-prefix    

Run the notebook

jupyter notebook

Installation on the UCL CS cluster

Install virtualenv

When installing virtualenv (full instructions here) on the CS cluster, you will likely have to install it with the --user flag. In short:

pip3 install --user virtualenv

At this point the shell may not yet find virtualenv. You can solve this by finding its location via

pip3 show virtualenv

then appending the LOCATION shown (a directory name) to your $PATH variable using

export PATH=$PATH:LOCATION

and giving permission to execute via

chmod u=rwx LOCATION/virtualenv.py

You should then be able to run virtualenv.py. You can check this by running

virtualenv.py --version

git clone the stat-nlp-book repository

Now we're ready to clone the notebook:

git clone https://github.com/uclmr/stat-nlp-book.git

Create virtual environment

Enter the cloned stat-nlp-book directory via

cd stat-nlp-book

and create the virtual environment:

virtualenv.py -p python3 venv

Enter the virtual environment

source venv/bin/activate

Install dependencies

pip3 install --upgrade pip
pip3 install -r requirements.txt
pip3 install git+https://github.com/robjstan/tikzmagic.git
jupyter-nbextension install rise --py --sys-prefix
jupyter-nbextension enable rise --py --sys-prefix

Run the notebook

jupyter notebook

Access in local browser

With the notebook running on the UCL CS cluster, you can also access it locally by first setting up an SSH tunnel:

# run this on your local machine
ssh -N -f -L localhost:8157:localhost:8888 username@cs_cluster

and accessing it through your local browser by entering

localhost:8157

into the browser address bar.
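
If the page does not load, it can help to check whether anything is listening on the local end of the tunnel. The following is a minimal Python sketch, assuming the local port 8157 from the command above; `port_is_open` is a hypothetical helper. It only tests the local port and does not verify that the notebook server is running on the cluster.

```python
import socket


def port_is_open(host="localhost", port=8157, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("something is listening on localhost:8157")
    else:
        print("nothing is listening on localhost:8157")
```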

stat-nlp-book's People

Contributors

ahoho, andreasvlachos, dhammo2, geospith, isabelleaugenstein, johannesmaxwel, mbosnjak, narad, riedelcastro, rockt

stat-nlp-book's Issues

Unable to sample from interpolated N-gram models.

When trying to sample from an interpolated N-gram model, there is an error saying that the probabilities do not sum to one. This is despite the fact that the normalisation tests sum to 1 and the model gives a valid perplexity. Here's a simple model that demonstrates this issue.

image
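
The issue does not say which sampler raised the error. One possible cause, offered here only as an assumption, is floating-point rounding: strict samplers such as `numpy.random.choice` raise "probabilities do not sum to 1" when the sum drifts from 1 beyond a small tolerance, even if the model is normalised up to rounding. The following is a minimal sketch of a workaround that renormalises explicitly before sampling; `sample_word` and the toy distributions are hypothetical.

```python
import random


def sample_word(probs, rng=random):
    """Sample a key from a dict of word -> probability, renormalising first."""
    words = list(probs)
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    # Divide by the actual sum so rounding error cannot break the sampler.
    weights = [probs[w] / total for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Toy interpolated distribution 0.7 * p1 + 0.3 * p2 (made-up numbers).
p1 = {"a": 0.5, "b": 0.5, "c": 0.0}
p2 = {"a": 0.1, "b": 0.2, "c": 0.7}
interpolated = {w: 0.7 * p1[w] + 0.3 * p2[w] for w in p1}
print(sample_word(interpolated, random.Random(0)))
```

`random.choices` accepts unnormalised weights anyway; the explicit division is shown because it is the step a stricter sampler would need.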

Student Exercise Environment

When students develop their own code (say, for an exercise or the assignment), they will have several options:

  • clone the repo, set up the virtual env, develop in an IDE
  • load a docker container with the notebook, and then edit everything within the notebook (and hence within the container)
  • load docker, edit files within the docker container (using command line editors)
  • load the docker container, mount a local directory with their code (and/or our code), and edit locally but run in the docker container
  • etc.

We should decide on a preferred mode, and then only support that mode.

Problem of update.

Dear developer,

I have run into a problem when updating. Could you help me figure it out? I am currently a student on the MSc Web Science and Big Data. I have attached the error message.
Thank you so much.

yaowangyideMacBook-Pro:stat-nlp-book yaowangyi$ docker run -v $PWD:/home/jovyan/work riedelcastro/stat-nlp-book git pull
error: The following untracked working tree files would be overwritten by merge:
data/ohhla/dev/www.ohhla.com/YFA_common.html
data/ohhla/dev/www.ohhla.com/anonymous/b_rhymes/backonmy/decision.brm.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/beatnuts/massacre/slam_pit.btn.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/big_sean/detroit/common.bsn.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/big_sean/hallfame/switchup.bsn.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/blackstr/blackstr/respire.blk.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/brnubian/found/maybeone.brn.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/be.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/chi_city.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/corner.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/faithful.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/foodlive.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/go.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/its_your.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/love_is.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/r_people.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/testify.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/be/they_say.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/aquarius.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/between.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/close2me.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/electric.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/gotright.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/heaven.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/hustle.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/iammusic.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/new_wave.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/sl_power.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/circus/star_69.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/a_penny.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/blows_to.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/breaker.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/by_pound.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/charms.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/heidihoe.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/justnick.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/pitchin.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/puppy.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/takeitez.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/tricks.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dollar/twoscoop.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/believer.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/blue_sky.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/celebr8.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/cloth.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/dreamer.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/g_dreams.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/gd_remix.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/gold.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/lovinlst.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/pops_be.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/raw_howu.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/sweet.cms.txt.html
data/ohhla/dev/www.ohhla.com/anonymous/common/dreamer/thebermx.cms.txt.html
data/
Aborting
Updating 8993d56..c52edc8
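
Git prints this error when the incoming commits add files that already exist locally as untracked files; the pull can proceed once those files are moved or deleted. The following is a minimal Python sketch, assuming you want to keep the local copies: `move_aside` is a hypothetical helper that moves the listed paths into a backup directory while preserving their relative layout.

```python
import shutil
from pathlib import Path


def move_aside(repo_root, relative_paths, backup_dir):
    """Move untracked files out of the working tree, keeping relative paths."""
    repo_root, backup_dir = Path(repo_root), Path(backup_dir)
    moved = []
    for rel in relative_paths:
        src = repo_root / rel
        if not src.is_file():
            continue  # skip entries that are not files, e.g. truncated lines
        dst = backup_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(rel)
    return moved
```

The paths can be pasted from the error message; the function returns the entries it actually moved, after which `git pull` can be retried.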

LaplaceLM counts correction

The correct version is:

def counts(self, word_and_history):
    word = word_and_history[0]
    # Out-of-vocabulary words get a zero count; all others get the base count plus alpha.
    return 0.0 if word not in self.vocab else (
        self.base_lm.counts(word_and_history) + self.alpha)
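
To see the effect of the vocabulary check, here is a minimal self-contained sketch. The `CountLM` class, the constructor signatures, and the toy data are assumptions made for illustration and are not the repository's actual classes; only the `counts` logic follows the correction above.

```python
from collections import Counter


class CountLM:
    """Hypothetical stand-in for the base model: raw unigram counts."""

    def __init__(self, tokens):
        self.vocab = set(tokens)
        self._counts = Counter((w,) for w in tokens)

    def counts(self, word_and_history):
        return float(self._counts[word_and_history])


class LaplaceLM:
    """Hypothetical wrapper that adds alpha to in-vocabulary counts."""

    def __init__(self, base_lm, alpha):
        self.base_lm = base_lm
        self.vocab = base_lm.vocab
        self.alpha = alpha

    def counts(self, word_and_history):
        word = word_and_history[0]
        if word not in self.vocab:
            return 0.0  # no pseudo-count for out-of-vocabulary words
        return self.base_lm.counts(word_and_history) + self.alpha


lm = LaplaceLM(CountLM(["a", "b", "a"]), alpha=0.5)
print(lm.counts(("a",)), lm.counts(("b",)), lm.counts(("z",)))
```

The in-vocabulary words "a" and "b" receive their raw count plus alpha, while the unseen word "z" stays at zero.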

Docker task

I just saw that you needed some help with your Docker configuration. I'm willing to work on that.
