
cs230-code-examples's People

Contributors

dependabot[bot], guillaumegenthial, josephch405, kiank, omoindrot, suragnair


cs230-code-examples's Issues

The NLP NER example only predicts correct Os

Hi, the NER example has an accuracy of 0.82 or 0.78 after running for 10 epochs on the toy dataset of 10 examples. I was using this code to start up an NER task I have to do, and it is just predicting 'O's. I checked one set of predictions at the end of the epochs, and the I- and B- tags it predicted were all wrong.

Use a score like F1 on the B/I tags to get a better idea.

Note -
I am not a student in the course, and I know it is supposed to be starter code, but the way it is presented, one expects that it'll work for the toy example.
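For reference, here is a minimal sketch of a token-level F1 restricted to the non-'O' tags (the tag lists below are hypothetical; an entity-level scorer such as seqeval would be stricter):

def f1_non_o(gold_tags, pred_tags):
    # token-level F1 computed only over non-'O' positions
    tp = sum(1 for g, p in zip(gold_tags, pred_tags) if g != 'O' and g == p)
    fp = sum(1 for g, p in zip(gold_tags, pred_tags) if p != 'O' and g != p)
    fn = sum(1 for g, p in zip(gold_tags, pred_tags) if g != 'O' and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = ['O', 'B-PER', 'I-PER', 'O']
pred = ['O', 'B-PER', 'O', 'O']
print(f1_non_o(gold, pred))  # 0.67: high token accuracy from 'O's alone won't inflate this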

Using saved models for prediction of a single image

Hi,

I am looking for an example of how to use a saved checkpoint of your example for simple inference on a given image, i.e. to perform classification on a single image. Unfortunately, searching the internet has not been successful so far.

What I have been doing so far is the following:

with tf.Session() as sess:
    # restore the graph structure and the trained weights
    saver = tf.train.import_meta_graph(metafile)
    saver.restore(sess, path_to_ckpt)
    graph = tf.get_default_graph()

    # fetch the prediction op by name and run it on a single image
    output = graph.get_tensor_by_name('model/pred:0')
    pred = sess.run([output], feed_dict={x: image})

Unfortunately, I am not sure what x is supposed to be. Could you please provide an example of a simple prediction on a single image? In particular, I would need to know which layers to give which names in the model_fn so that I can reference them in the code above.

Best regards.
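For context, one common TF1 pattern (a sketch with illustrative names, not necessarily how this repo's model_fn is structured) is to give the input and prediction tensors explicit names at graph-construction time, so they can be fetched by name after restoring:

import tensorflow as tf

# hypothetical model_fn excerpt: name the tensors needed at inference time
x = tf.placeholder(tf.float32, [None, 64, 64, 3], name='input_image')
logits = tf.layers.dense(tf.layers.flatten(x), 6)
pred = tf.argmax(logits, axis=1, name='pred')

# later, after restoring the checkpoint as in the snippet above:
# x = graph.get_tensor_by_name('input_image:0')
# output = graph.get_tensor_by_name('pred:0')
# prediction = sess.run(output, feed_dict={x: image[None]})  # add a batch dimension

If the graph is built inside a variable_scope('model'), the tensor names pick up a 'model/' prefix, which would match the 'model/pred:0' in the snippet above.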

Problem in Build the dataset of size 64x64

Hello,

After I do the first step (Build the dataset of size 64x64), a 64x64_SIGNS folder is created, but there are no images in train_signs. How can I solve this?

Tutorials

Content ideas for the tutorials explaining the posts


Code in https://github.com/cs230-stanford/cs230-stanford.github.io

  • structure of the project (files' roles, experiment pipeline)

  • how to run the toy examples

  • explain how to use the logger

  • explain where to define the model or change it

  • explain how to change hyperparameters

  • how to feed data...

  • use GitHub releases to have multiple versions of the code?

  • Explain the general idea of training multiple models, trying different structures...

    • make sure that experiments are reproducible
      • for instance, if model.py has incompatible changes (e.g. adds batch norm), previous params.json files cannot be run again
      • have to update old params.json files to match the new change (e.g. add a params.use_bn argument to all old params.json files); see the sketch after this list
    • give good names to the dirs in experiments
    • visualize on tensorboard
    • don't spend too much time watching training progress: launch the hyperparameter search, let it run, and come back later (make sure there is no bug first)
  • explain how to properly split train / dev / test

    • hardcode the split in three folders
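A minimal sketch of the backward-compatibility point above (the Params helper mirrors the one in this repo's utils.py; the use_bn default is illustrative):

import json

class Params:
    # load hyperparameters from a params.json file as attributes
    def __init__(self, json_path):
        with open(json_path) as f:
            self.__dict__.update(json.load(f))

params = Params('experiments/base_model/params.json')
# experiments created before batch norm was added get a safe default
use_bn = getattr(params, 'use_bn', False)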

Organization ideas

  • add a number to each post? ex: "3. Creating input pipelines..."
    • would be easier to understand the structure
    • at the beginning of each post, put the full list

Performance issue in tensorflow/vision/model/input_fn.py (by P3)

Hello! I've found a performance issue in input_fn.py: batch() should be called before map(), which could make your program more efficient. Here is the TensorFlow documentation that supports this.

Detailed description is listed below:

  • tensorflow/vision/model/input_fn.py: .batch(params.batch_size)(here) should be called before .map(parse_fn, num_parallel_calls=params.num_parallel_calls)(here).
  • tensorflow/vision/model/input_fn.py: .batch(params.batch_size)(here) should be called before .map(parse_fn)(here).

Besides, you need to check whether the function called in map() (e.g., parse_fn in .map(parse_fn)) is affected by the change, to make the modified code work properly. For example, if parse_fn expects data of shape (x, y, z) before the fix, it will receive data of shape (batch_size, x, y, z) after it.
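A minimal sketch of the suggested reordering (the toy parse_fn here is already vectorized, which is exactly the property the changed code relies on):

import tensorflow as tf

def parse_fn(x):
    # toy element-wise transform; vectorized ops work on whole batches too
    return tf.cast(x, tf.float32) / 255.0

ds = tf.data.Dataset.range(1000)

# before: one parse_fn call per element, then batch
slow = ds.map(parse_fn, num_parallel_calls=4).batch(32)

# suggested: batch first, then one parse_fn call per batch
fast = ds.batch(32).map(parse_fn, num_parallel_calls=4)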

Looking forward to your reply. By the way, I would be very glad to create a PR to fix it if you are too busy.

TODOs

TODO

  • change the name eval_metrics to metrics

    • the current name introduces confusion: there should only be one set of metrics, understood as an average over the dataset
    • what we log with tqdm is not metrics but "training monitoring"
  • enable hyperparameter search (see the launcher sketch after this list)

    • hyperparam_search.py calling train.py over multiple params.json files in experiments
    • run("train.py --model_dir experiments/exp024")
  • check that there is no problem with the virtual env and that we don't need to add a preamble

  • saving

    • if we train the model again, training should start where it stopped
    • add an optional argument for this?
  • split the graph into train and eval

    • clean-up
    • reuse the weights with 2 graphs and 2 inputs
    • (or use a placeholder??)
  • split hyperparams_search.py into two files?

    • one for hyperparam search
    • one for "syntethizing results"
  • for synthesize results, get all possible subdirectories in it

  • explore file explore.py ?

    • would run the model on some examples (dataset)
    • give access to some errors
    • interaction? (IPython-like testing?)
  • explicitly have a train / dev / test split

    • for the SIGNS dataset, there is only train / test --> do the split in train.py
    • some images are duplicated??? Ignore or clean up the dataset?
  • add tf.summary.image for training images?

  • make sure there are only images in SIGNS (no .DS_Store)

  • add script download_data.sh

  • add descriptions to parser.add_argument(help="...")

  • perform metrics synthesis after the hyperparameter search?

  • Use mode (mode={'train', 'eval', 'predict'}) instead of is_training

    • allow use of predict mode for functions
    • pros: allows prediction (without labels)
    • cons: adds more code
  • use restore_from (file or dir)

    • when restoring weights, keep track of the number of epochs already run?
  • when restoring, the numbering becomes incorrect as it starts from 0 again

  • make the first argument of input_fn be 'train' or 'eval' instead of True/False

  • check the date for each blog post (make everything dated as 02/01?)
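A minimal sketch of the hyperparameter-search launcher mentioned in the list above (the paths and the learning-rate grid are illustrative):

import os
import sys
from subprocess import check_call

PYTHON = sys.executable

for lr in [1e-4, 1e-3, 1e-2]:
    # one experiment directory per hyperparameter setting
    model_dir = os.path.join('experiments', 'learning_rate', 'lr_{}'.format(lr))
    os.makedirs(model_dir, exist_ok=True)
    # a params.json with this learning rate would be written into model_dir here
    check_call('{} train.py --model_dir {}'.format(PYTHON, model_dir), shell=True)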

Error when run build_dataset.py on windows

In Windows, folder names in a path are joined with a backslash [ \ ] instead of a forward slash [ / ], like this:

C:\Program Files\NVIDIA GPU Computing Toolkit

so build_dataset.py throws an error, because it can't split the filename from the directory.

I solved it by splitting the filename on the backslash instead:

image.save(os.path.join(output_dir, filename.split('\\')[-1]))
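A more portable alternative (an assumption on my part, not the repo's current code) is os.path.basename, which uses the platform's own separator, so the same line works on both Windows and Unix:

import os

# image, filename and output_dir as in build_dataset.py's save loop
image.save(os.path.join(output_dir, os.path.basename(filename)))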

Thanks.

Performance issues in the program

Hello, I found a performance issue in the definition of input_fn in
cs230-stanford/cs230-code-examples/blob/master/tensorflow/vision/model/input_fn.py:
tf.data.Dataset.map is called without num_parallel_calls.
I think it would increase the efficiency of your program if you added this.

Here is the TensorFlow documentation that supports this.
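A minimal sketch of the suggested change (the value 4 is illustrative; newer TF versions also accept tf.data.experimental.AUTOTUNE):

import tensorflow as tf

def parse_fn(x):
    return tf.cast(x, tf.float32)

dataset = tf.data.Dataset.range(1000)

# before: elements are parsed sequentially
# dataset = dataset.map(parse_fn)

# suggested: parse several elements in parallel
dataset = dataset.map(parse_fn, num_parallel_calls=4)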

Looking forward to your reply. By the way, I would be very glad to create a PR to fix it if you are too busy.

Error running data_loader.py

Since utils.py is not present in the same directory as data_loader.py, this will throw an error.

I did a workaround like this:

import sys
sys.path.append('.')  # if running from the root folder, else append '..'
import utils

Feedback from TAs

  • It's not good to call it "starter code", since it makes it seem like we are hand-holding the students too much

  • we should rename the whole project to something like "cs230-code-examples"

    • they can use it as examples, and copy some of the code
  • put a license on the code

  • when we refer to certain files in the code, should we put a link to them?

    • ex: train.py
    • choose it from tensorflow / pytorch / vision / nlp?

hyperparameter search issue

When I run this script using the provided sample dataset:
python search_hyperparams.py --data_dir data/small --parent_dir experiments/learning_rate

It threw me an error:
[error screenshot attached in the original issue]

The Variable API has been deprecated in Pytorch 1.10

The Variable API has been deprecated in Pytorch 1.10.

From Pytorch 1.10 documentation:
"Autograd automatically supports Tensors with requires_grad set to True
Variables are no longer necessary to use autograd with tensors."

The code can be further simplified if the lines with Variable calls are removed. For example, lines 56-57 of the train.py file:

# convert to torch Variables
train_batch, labels_batch = Variable(train_batch), Variable(labels_batch)
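With the wrappers removed, the batch tensors can be used directly (a sketch; model and loss_fn stand in for the objects in the repo's training loop):

# since PyTorch 0.4, tensors support autograd directly, so the
# Variable(...) wrapping above can simply be deleted
output_batch = model(train_batch)
loss = loss_fn(output_batch, labels_batch)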

The calculated metrics are not precise.

The code in pytorch/vision/train.py and pytorch/vision/evaluate.py shows how to calculate metrics on batches of data.

In train.py, since the metrics are only calculated once in a while, they do not represent the metrics of the whole dataset.
In evaluate.py, since the size of the dataset may not be divisible by the batch size, the calculated metrics are not precise either. A better way is to compute a weighted sum of the per-batch means, weighted by the number of examples in each batch, and then divide by the size of the whole dataset.
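A minimal sketch of the proposed weighting (dataloader, model and metric_fn are stand-ins for the repo's evaluation loop):

# exact dataset-level mean from per-batch means, correct even when the
# last batch is smaller than batch_size
total, count = 0.0, 0
for inputs, labels in dataloader:
    batch_mean = metric_fn(model(inputs), labels)  # mean over this batch
    total += batch_mean * inputs.size(0)           # weight by batch size
    count += inputs.size(0)
dataset_metric = total / count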

Organization of the blog posts

General (common between TensorFlow and PyTorch)

  1. Introduction to project starter code
  2. Logging + hyperparams
  3. AWS setup
  4. Train/Dev/Test set

TensorFlow

  1. Getting started
  2. Dataset pipeline: tf.data
  3. Creating the model (tf.layers) + training + evaluation
     • model
     • training ops
     • input_fn and model_fn
     • evaluation and tf.metrics
     • initialization
     • saving
     • tensorboard
     • global_step

OutOfRangeError if test set larger than dev set

There is a small issue in cs230-code-examples/tensorflow/nlp/evaluate.py: you load the dev set for evaluation:

path_eval_sentences = os.path.join(args.data_dir, 'dev/sentences.txt')
path_eval_labels = os.path.join(args.data_dir, 'dev/labels.txt')

But later you iterate over the size of the test set:

params.eval_size = params.test_size

If the test set is larger than the dev set, this leads to an OutOfRangeError. If the test set is smaller than the dev set, the iteration stops too early.
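A one-line fix, assuming the params object carries a dev_size field alongside test_size (as the repo's dataset params suggest):

# match the eval size to the dev files loaded above
params.eval_size = params.dev_size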

Thanks for sharing the code!
