
gotorch's Introduction

GoTorch


GoTorch reimplements PyTorch high-level APIs, including modules and functionals, in idiomatic Go. This enables deep learning programming in Go and Go+. This project is in its very early stage.

Easy Switch

Writing deep learning systems in Go is as efficient as in Python. The DCGAN training programs in GoTorch and PyTorch call similar APIs, have similar program structures, and have a similar number of lines. Go+ has a syntax similar to Python's. The Go+ compiler translates Go+ programs into Go source programs. It is a joy to write Go+ programs that call Go packages like GoTorch.

We plan to build a translator that migrates existing PyTorch models written in Python into GoTorch.

Benefits

  1. Higher runtime efficiency. Go programs run as efficiently as C++.

  2. Training and prediction in the same language. No longer training in Python and online prediction in C++. All in Go/Go+. No TensorFlow graphs or PyTorch tracing.

  3. Same data processing code for training and prediction. No need to wrap OpenCV functions into TensorFlow operators in C++ for prediction and in Python for training.

  4. Supports many machine learning paradigms, including adversarial, reinforcement, and imitation learning -- those we cannot split into training and prediction.

  5. Same program for edge and cloud. GoTorch programs compile and run on phones and self-driving cars as they do on servers and desktops.

The Tech Stack

GoTorch works with the following open-source communities to form Go+Torch.

  • the Go+ community,
  • the PyTorch community, and
  • the TensorFlow XLA ecosystem.

The following figure shows the technology stack.

Go+ applications   # users write DL applications in Go+,
     │             # whose syntax is as concise as Python
 [Go+ compiler]
     ↓
Go source code -→ GoTorch -→ libtorch -→ pytorch/xla -→ XLA ops
     │
 [Go compiler]
     ↓
executable binary  # x86_64, ARM, CUDA, TPU
                   # Linux, macOS, Android, iOS

Documentation

gotorch's People

Contributors

hexvalid, lhw362950217, ljk53, qijune, qiukun, shendiaomo, sneaxiy, typhoonzero, wangkuiyi, yancey1989, zhiqwang


gotorch's Issues

go test -v fails with Raspbian 10 (buster) on RPi 4

pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch/cgotorch $ make -f Makefile.rpi
rm -f libtorch
ln -s rpi/libtorch libtorch
clang++ -std=c++14 \
-I .. \
-I libtorch/include \
-I libtorch/include/torch/csrc/api/include \
-L libtorch/lib \
-fPIC \
-shared \
cgotorch.cc \
-o libcgotorch.so -install_name @rpath/libcgotorch.so \
-Wl,-rpath,libtorch/lib \
-Wl,-force_load libtorch/lib/libc10.so \
-lc10 -ltorch -ltorch_cpu \
-D_GLIBCXX_USE_CXX11_ABI=1
clang: warning: argument unused during compilation: '-install_name @rpath/libcgotorch.so' [-Wunused-command-line-argument]
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch/cgotorch $ cd ..
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch $ go test -v
=== RUN   TestPanicMNIST
--- PASS: TestPanicMNIST (0.00s)
=== RUN   TestLogSoftmax
--- PASS: TestLogSoftmax (0.00s)
=== RUN   ExampleBackward
--- FAIL: ExampleBackward (0.00s)
panic: size mismatch, m1: [17179869187 x 0], m2: [4294967300 x 0] at /home/pi/src/pytorch/aten/src/TH/generic/THTensorMath.cpp:4 [recovered]
	panic: size mismatch, m1: [17179869187 x 0], m2: [4294967300 x 0] at /home/pi/src/pytorch/aten/src/TH/generic/THTensorMath.cpp:4

goroutine 1 [running]:
testing.(*InternalExample).processRunResult(0x253de90, 0x0, 0x0, 0x15c6f4, 0x0, 0x28c140, 0x2410848, 0x1)
	/home/pi/usr/go/src/testing/example.go:89 +0x488
testing.runExample.func2(0x78b7046f, 0xbfc4a256, 0x7982f7, 0x0, 0x5020e0, 0x2410818, 0x24100d8, 0x241c400, 0x253de90, 0x253dea8)
	/home/pi/usr/go/src/testing/run_example.go:58 +0xd4
panic(0x28c140, 0x2410848)
	/home/pi/usr/go/src/runtime/panic.go:969 +0x118
github.com/wangkuiyi/gotorch.MustNil(0x2153628)
	/home/pi/go/src/github.com/wangkuiyi/gotorch/tensor.go:59 +0x70
github.com/wangkuiyi/gotorch.MM(0x2410838, 0x2410820, 0x2)
	/home/pi/go/src/github.com/wangkuiyi/gotorch/tensor.go:171 +0x48
github.com/wangkuiyi/gotorch_test.ExampleBackward()
	/home/pi/go/src/github.com/wangkuiyi/gotorch/backward_test.go:13 +0x104
testing.runExample(0x2d0b6c, 0xf, 0x2e64a8, 0x0, 0x0, 0x0, 0x0)
	/home/pi/usr/go/src/testing/run_example.go:62 +0x184
testing.runExamples(0x253df70, 0x4ca840, 0x7, 0x7, 0x101)
	/home/pi/usr/go/src/testing/example.go:44 +0x104
testing.(*M).Run(0x24512c0, 0x0)
	/home/pi/usr/go/src/testing/testing.go:1250 +0x1f8
main.main()
	_testmain.go:62 +0x120
exit status 2
FAIL	github.com/wangkuiyi/gotorch	0.325s
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch $

go test fails with -race

=== RUN   ExampleBackward
fatal error: checkptr: unsafe pointer arithmetic

goroutine 1 [running]:
runtime.throw(0x4504143, 0x23)
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc0001499b0 sp=0xc000149980 pc=0x4036842
runtime.checkptrArithmetic(0xc0000100a0, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/checkptr.go:43 +0xb5 fp=0xc0001499e0 sp=0xc0001499b0 pc=0x4008c15
github.com/wangkuiyi/gotorch.Optimizer.AddParameters.func1(0xc000010098, 0xc0000100a0, 0xc000149a98)
	/Users/yi/go/src/github.com/wangkuiyi/gotorch/optim.go:45 +0x70 fp=0xc000149a28 sp=0xc0001499e0 pc=0x41c6150
github.com/wangkuiyi/gotorch.Optimizer.AddParameters(0xc000010098, 0xc000149af0, 0x1, 0x1)
	/Users/yi/go/src/github.com/wangkuiyi/gotorch/optim.go:45 +0x1d4 fp=0xc000149ac0 sp=0xc000149a28 pc=0x41c4c94
github.com/wangkuiyi/gotorch_test.ExampleBackward()
	/Users/yi/go/src/github.com/wangkuiyi/gotorch/backward_test.go:10 +0x110 fp=0xc000149b28 sp=0xc000149ac0 pc=0x44128a0
testing.runExample(0x44fc6be, 0xf, 0x4511ed0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:62 +0x275 fp=0xc000149c68 sp=0xc000149b28 pc=0x4149eb5
testing.runExamples(0xc000149ed8, 0x47e60e0, 0x7, 0x7, 0x101)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/example.go:44 +0x212 fp=0xc000149d68 sp=0xc000149c68 pc=0x4147c52
testing.(*M).Run(0xc000140080, 0x0)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/testing.go:1250 +0x4f4 fp=0xc000149f00 sp=0xc000149d68 pc=0x414ed84
main.main()
	_testmain.go:62 +0x224 fp=0xc000149f88 sp=0xc000149f00 pc=0x4414f54
runtime.main()
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/proc.go:203 +0x1fa fp=0xc000149fe0 sp=0xc000149f88 pc=0x4038eaa
runtime.goexit()
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000149fe8 sp=0xc000149fe0 pc=0x406a431

goroutine 9 [runnable]:
testing.runExample.func1(0xc000010080, 0xc00005a540)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:35
created by testing.runExample
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:35 +0x1c7
FAIL	github.com/wangkuiyi/gotorch	0.391s

Should we merge aten and torch into gotorch?

An alternative to having gotorch/aten and gotorch/torch is to have only gotorch/, which includes files like tensor.go, optim/adam.go, and optim/sgd.go.

I am afraid that, in the future, we are going to use the XLA backend of PyTorch, https://github.com/pytorch/xla, and then there might be a vague boundary between aten and torch.

Complete MNIST example

Following the discussion on the torch.Module API in #28, I will complete the MNIST end-to-end example. Some PyTorch modules have already been ported into GoTorch, but we still need a few others to complete the MNIST example:

  • nn.NLLLoss
  • nn.LogSoftmax

The MNIST example would look like this:

package main

import (
	"fmt"

	torch "github.com/wangkuiyi/gotorch"
)

type Net struct {
  fc1, fc2, fc3  torch.Linear
}

func NewNet() torch.Module{
  return &Net{
    fc1 : torch.Linear(28 * 28, 512, false),
    fc2 : torch.Linear(512, 512, false),
    fc3 : torch.Linear(512, 10, false),
  }
}

func (n Net) Forward(x torch.Tensor) torch.Tensor {
  x = torch.View(x, []int64{-1, 28 * 28})
  x = n.fc1(x)
  x = torch.Relu(x)
  x = n.fc2(x)
  x = torch.Relu(x)
  return n.fc3(x)
}

func main() {
  dataset := torch.NewMNIST(dataDir())
  dataset.AddTransforms([]torch.Transform{
		torch.NewNormalize(0.1307, 0.3081),
		torch.NewStack(),
  })
  trainLoader := torch.NewDataLoader(dataset, 8)
  net := NewNet()
  criterion := torch.CrossEntropyLoss()
  opt := torch.SGD(0.1, 0, 0, 0, false)
  opt.AddParameters(torch.GetParameters(net))

  batchIdx := 0
  for trainLoader.Scan() {
    batch := trainLoader.Batch()
    pre := net.Forward(batch.Data)
    loss := criterion(pre,  batch.Target)
    fmt.Println("BatchIdx: [%d],  Loss: [%f]", batchIdx, loss.item())
    opt.ZeroGrad()
    loss.Backward()
    opt.Step()
    batchIdx++
  }
  torch.FinishGC()
  opt.Close()
  torch.CloseModule(net)
}

Support GPU training in GoTorch

With the PyTorch API, users can specify the runtime device easily, as in the following code:

device = torch.device("cuda")       # create a CUDA device instance
x = torch.randn((2, 3)).to(device)  # assign Tensor memory to the CUDA device
net = MNISTNet()
net.to(device)                      # assign module parameters to the CUDA device

In GoTorch, we would like to provide the same APIs, Tensor.To and Module.To, so that users can train and run prediction on various devices.

device := torch.NewDevice("cuda")                 // create a CUDA device instance
x := torch.RandN([]int64{2, 3}, false).To(device) // assign Tensor x memory to the CUDA device
net := NewMNISTNet()
net.To(device)                                    // assign parameter memory to the CUDA device

For the Tensor.To API, we just port the device type and the Tensor::to function to Go.
For the Module.To API, we should assign parameters to the device recursively, c.f. https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/module.h#L676
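Below is a minimal sketch of that recursive traversal. It assumes a hypothetical Module base struct whose params and children fields record its parameters and sub-modules; the names are illustrative, not the final design.

package nn

import torch "github.com/wangkuiyi/gotorch"

// Module is a hypothetical base struct; params and children are assumed fields.
type Module struct {
	params   []torch.Tensor
	children []*Module
}

// To moves all parameters to device and recurses into sub-modules, mirroring
// torch::nn::Module::to in libtorch.
func (m *Module) To(device torch.Device) {
	for _, p := range m.params {
		// Updating the data in place keeps p a leaf tensor (see the
		// "Can't optimize a non-leaf Tensor" issue below).
		p.SetData(p.To(device, p.Dtype()))
	}
	for _, child := range m.children {
		child.To(device)
	}
}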

TODO list:

  • Port device to Go to implement the Tensor.To API.
  • Add Dockerfile.gpu and Makefile.gpu to build GoTorch with CUDA support.
  • Implement the Module.To API.
  • Complete the MNIST example running on a CUDA device.

Build for NVIDIA Drive PX2

The arch is aarch64.

yi@nvidia:~/go/src/github.com/wangkuiyi/gotorch$ uname -a
Linux nvidia 4.4.38-rt49-tegra #1 SMP PREEMPT RT Tue Jul 25 09:26:02 PDT 2017 aarch64 aarch64 aarch64 GNU/Linux

I downloaded the pre-built libtorch for ARM from https://github.com/ljk53/pytorch-rpi.

yi@nvidia:~/go/src/github.com/wangkuiyi/gotorch$ cgotorch/build.sh
~/go/src/github.com/wangkuiyi/gotorch/cgotorch ~/go/src/github.com/wangkuiyi/gotorch
Building for Raspbian ...
rm -f libtorch
ln -s rpi/libtorch libtorch
g++ -std=c++14 \
-I .. \
-I libtorch/include \
-I libtorch/include/torch/csrc/api/include \
-L libtorch/lib \
-fPIC \
-shared \
optim.cc device.cc mnist_dataset.cc torch.cc pickle.cc functional.cc tensor.cc init.cc \
-O -o libcgotorch.so  \
-Wl,-rpath,libtorch/lib \
-Wl,-force_load libtorch/lib/libc10.so \
-lc10 -ltorch -ltorch_cpu \
-D_GLIBCXX_USE_CXX11_ABI=1
libtorch/lib/libc10.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
Makefile:7: recipe for target 'libcgotorch.so' failed
make: *** [libcgotorch.so] Error 1
~/go/src/github.com/wangkuiyi/gotorch

Add Chinese doc

We should add a suite of Chinese documentation corresponding to the English version.

torch.nn.Module in Go

The PyTorch API has a key concept -- torch.nn.Module. Many built-in and user-defined models are classes derived from torch.nn.Module. The only method to override is forward(x).

Usually, a torch.nn.Module-derived class has data members representing the model parameters. For example, nn.Linear, the PyTorch implementation of the fully-connected layer, has W and B -- the weights and the bias, respectively.

In Go/Go+, the concept that corresponds to a Python base class is an interface. So, we provide type Module interface to mimic torch.nn.Module.
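A minimal sketch of the interface idea, using illustrative names rather than the final GoTorch API:

package model

import torch "github.com/wangkuiyi/gotorch"

// Module mimics torch.nn.Module: the only method a model must implement is Forward.
type Module interface {
	Forward(x torch.Tensor) torch.Tensor
}

// Linear is a user-defined fully-connected layer; W and B are its parameters.
type Linear struct {
	W, B torch.Tensor
}

func (l *Linear) Forward(x torch.Tensor) torch.Tensor {
	return torch.MM(x, l.W) // a complete implementation would also add the bias B
}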

Then, we need a solution to free up tensors when a model's life is over.

Which levels of abstractions in C++ should be exposed to Go

There are three levels of abstractions:

  • Level 1: native function is a low-level API. There are many basic mathematical operations in it.

  • Level 2: nn.functional is a middle-level API. It's closer to deep learning. It uses basic mathematical operations to compose complex neural network operations.

  • Level 3: nn.module is a high-level API. A module contains many states, such as parameters and buffers. It's a C++ class.

Let's take the padding operator as an example:

|                 | expose to Go       | API                                           | contain state               |
|-----------------|--------------------|-----------------------------------------------|-----------------------------|
| native function | C++ function, easy | low-level API, flexible, few users may use it | No                          |
| nn.functional   | C++ function, easy | middle-level API, most users use it           | No                          |
| nn.module       | C++ class, hard    | high-level API, most users use it             | Yes, parameters and buffers |

Another interesting thing: nn.functional tries to fuse some basic native functions. Here is an example from nn.functional.linear.

I am wondering which levels of abstraction in C++ I am supposed to expose to Go.

Should we unify the build environment?

Recently, two of my PRs worked well in my local macOS development environment but failed in CI.

In #60, I had already used clang-format to format the C++ code locally, but it failed the pre-commit check in CI.

$ pre-commit run -a
go fmt...................................................................Passed
go lint..................................................................Passed
validate toml........................................(no files to check)Skipped
Check files aren't using go's testing package........(no files to check)Skipped
cpplint..................................................................Passed
cppcheck.................................................................Passed
clang-format.............................................................Failed
- hook id: clang-format
- files were modified by this hook

In #64, I found that macOS and Linux treat int64_t differently. On macOS, int64_t is long long, while on Linux it is long.

So, I have to write (*C.longlong)(unsafe.Pointer(&stride[0])) on macOS but (*C.long)(unsafe.Pointer(&stride[0])) on Linux. This introduces conditional compilation into the Go code.
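A minimal sketch of one way to isolate the difference behind build tags; the file names and the helper function below are assumptions for illustration, not existing GoTorch code.

// cint64_darwin.go

// +build darwin

package gotorch

// #include <stdint.h>
import "C"
import "unsafe"

// cInt64Ptr returns the address of s[0] typed so that it can be passed where the
// C API expects an int64_t*.
func cInt64Ptr(s []int64) unsafe.Pointer {
	return unsafe.Pointer((*C.longlong)(unsafe.Pointer(&s[0])))
}

// cint64_linux.go would be identical except for the build tag (// +build linux)
// and the cast, which becomes (*C.long)(unsafe.Pointer(&s[0])).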

Can't optimize a non-leaf Tensor

When I run the DCGAN example on a GPU, I get the following error message:

terminate called after throwing an instance of 'c10::Error'
  what():  can't optimize a non-leaf Tensor
Exception raised from add_param_group at ../torch/csrc/api/src/optim/optimizer.cpp:80 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f52653a7eb9 in /go/src/github.com/wangkuiyi/gotorch/cgotorch/libtorch/lib/libc10.so)

I find the same issue in PyTorch community: https://discuss.pytorch.org/t/tensor-to-device-changes-is-leaf-causing-cant-optimize-a-non-leaf-tensor/37659

test = torch.zeros((10,10)).requires_grad_(True)
print(test.is_leaf) # True
test = test.to(data.device)
print(test.is_leaf) # False

The To operation returns a new tensor, so test becomes a non-leaf tensor.

We should keep the original reference. Instead of calling v.Set(reflect.ValueOf(t.To(device, t.Dtype()))), we should call t.SetData(t.To(device, t.Dtype())) in the Go code.

Cannot find symbols including the MNIST dataset on Linux

The command go test -v in the container complains that it cannot find symbols, including those of the MNIST dataset. It is weird that go test -v works on macOS.

root@a483ce0b3e5d:/go/src/github.com/wangkuiyi/gotorch# go test -v
# github.com/wangkuiyi/gotorch
./cgotorch/libcgotorch.so: undefined reference to `torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode)'
./cgotorch/libcgotorch.so: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
./cgotorch/libcgotorch.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
FAIL	github.com/wangkuiyi/gotorch [build failed]

Anyway, we can merge this PR and fix the problem in future PRs.

Originally posted by @wangkuiyi in #58 (comment)

Verify the loss value of ResNet training between Go and Python version

To get a baseline loss value for ResNet50 training, I ran the resnet.py example and got the following logs:

batch: 10, loss: 10.711030, acc1: 0.000000, acc5: 0.000000
batch: 20, loss: 7.499339, acc1: 0.000000, acc5: 0.000000
batch: 30, loss: 7.281894, acc1: 0.000000, acc5: 0.000000
batch: 40, loss: 7.059255, acc1: 0.000000, acc5: 0.000000
batch: 50, loss: 7.000484, acc1: 0.000000, acc5: 0.000000
batch: 60, loss: 6.871602, acc1: 0.000000, acc5: 3.125000
batch: 70, loss: 6.962079, acc1: 0.000000, acc5: 0.000000
batch: 80, loss: 6.872428, acc1: 0.000000, acc5: 0.000000
batch: 90, loss: 6.922100, acc1: 0.000000, acc5: 0.000000
batch: 100, loss: 6.918412, acc1: 0.000000, acc5: 0.000000
batch: 110, loss: 6.880023, acc1: 0.000000, acc5: 0.000000
batch: 120, loss: 6.936709, acc1: 0.000000, acc5: 3.125000
batch: 130, loss: 6.936309, acc1: 0.000000, acc5: 0.000000
batch: 140, loss: 6.923660, acc1: 0.000000, acc5: 0.000000
batch: 150, loss: 6.924109, acc1: 0.000000, acc5: 0.000000
batch: 160, loss: 6.923644, acc1: 0.000000, acc5: 3.125000
...

TestTensorString failed on macOS

I noticed the failure because TravisCI failed on #191; TravisCI is configured to run on a macOS VM.

=== RUN   TestTensorString
    TestTensorString: tensor_test.go:84: 
        	Error Trace:	tensor_test.go:84
        	Error:      	Not equal: 
        	            	expected: " 1.0000  1.1000  1.2000\n 2.0000  3.0000  4.0000\n[ CPUFloatType{2,3} ]"
        	            	actual  : "   0.0141    2.0000  512.0001\n   0.0000    0.0000    0.0000\n[ CPUDoubleType{2,3} ]"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,3 +1,3 @@
        	            	- 1.0000  1.1000  1.2000
        	            	- 2.0000  3.0000  4.0000
        	            	-[ CPUFloatType{2,3} ]
        	            	+   0.0141    2.0000  512.0001
        	            	+   0.0000    0.0000    0.0000
        	            	+[ CPUDoubleType{2,3} ]
        	Test:       	TestTensorString

Is MustNil safe?

The definition is here

gotorch/tensor.go

Lines 21 to 27 in 66df6a4

func MustNil(err unsafe.Pointer) {
	if err != nil {
		msg := C.GoString((*C.char)(err))
		C.FreeString((*C.char)(err))
		panic(msg)
	}
}

After converting the C pointer err into a Go string msg, we free err. Is msg still a valid Go object with its underlying C pointer freed?

MNIST training examples in GoTorch and LibTorch cannot converge to the same accuracy

The LibTorch MNIST example reached a loss of 0.0269 after 5 epochs:

Epoch 0, Loss: 0.1280
Epoch 1, Loss: 0.0659
Epoch 2, Loss: 0.0396
Epoch 3, Loss: 0.0304
Epoch 4, Loss: 0.0269

The GoTorch MNIST example only reached a loss of 1.4148 after 5 epochs:

2020/08/12 22:37:41 Epoch: 0, Loss: 4.8264
2020/08/12 22:37:46 Epoch: 1, Loss: 5.9624
2020/08/12 22:37:52 Epoch: 2, Loss: 2.4493
2020/08/12 22:37:58 Epoch: 3, Loss: 0.9619
2020/08/12 22:38:04 Epoch: 4, Loss: 1.4148

Why The Monad Pattern Looks Promising in Go+Torch Design

A monad is a programming pattern that records the output of each function call in a data structure so that we can free the outputs all at once afterward. It applies to many programming languages. Let us see why it is important to Go+Torch.

Go uses the pattern extensively; see https://www.innoq.com/en/blog/golang-errors-monads/ for an example.

Case Study 1: Free Tensors

We now allocate Tensor objects using new to keep the reference count in the shared_ptr field of the C++ Tensor class:

return new at::Tensor(std::move(c));

The Tensor objects newed would cause memory leak if we don't recycle them.

Assume that Go has a frontend API similar to C++'s; then, following the C++ MNIST example, let's think about the following problems.

Problem 1: Destruct Tensors Created In The Train Loop To Avoid Memory Leak

  1. Tensors Created In the C++ train loop:
    The train loop in mnist.cpp is like:

    for (auto& batch : data_loader) {
        auto data = batch.data.to(device), targets = batch.target.to(device);  // `data` and `targets` are `Tensor`s
        optimizer.zero_grad();
        auto output = model.forward(data);  // `output` is a `Tensor`
        auto loss = torch::nll_loss(output, targets);  // `loss` is a `Tensor`
        AT_ASSERT(!std::isnan(loss.template item<float>()));
        loss.backward();
        optimizer.step();
        //...
    }

    We can see that these Tensors have to be created:

    1. data and targets as the features and labels of the dataset
    2. output as the predictions of the data
    3. loss

    We can use defer to destruct Tensors in the train loop

    Because data, targets, output, and loss are all stack variables, they are created and destroyed in each iteration of the C++ train loop. This implies that the libtorch framework takes ownership of the Tensors if necessary. As a result, a naive GoTorch API can use defer to recycle the reference-counted Tensors. That is, the following imaginary code would work okay.

    // We need this nested function to make `defer` works as expected.
    func step(batch *Batch) {
        // `data`, `targets`, `output`, `loss` are `Tensor`s.
        data := batch.Data.To(device)
        defer data.Close()
        target := batch.Target.To(device)  
        defer target.Close()
        optimizer.zero_grad()
        output := model.Forward(data)
        defer output.Close()
        loss := torch.NllLoss(output, target)
        defer loss.Close()
        loss.Backward()
        optimizer.Step()
        // ...
    }
    for batch := range data_loader {
        step(batch)
    }

    The defers are a bit tedious; maybe we can improve the syntax of Go+ to save typing.

  2. Tensors Created In the C++ forward method
    The forward method is called by the train loop above; in the C++ MNIST example, it looks like:

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(torch::max_pool2d(conv1->forward(x), 2));
        x = torch::relu(
            torch::max_pool2d(conv2_drop->forward(conv2->forward(x)), 2));
        x = x.view({-1, 320});
        x = torch::relu(fc1->forward(x));
        x = torch::dropout(x, /*p=*/0.5, /*training=*/is_training());
        x = fc2->forward(x);
        return torch::log_softmax(x, /*dim=*/1);
      }

    We can use defer to destruct Tensors in the Forward function (in a tricky way)

    Similar to the train loop above, x is a Tensor on the stack and is destroyed at the end of the function scope. The difference is that x is reassigned multiple times, so we cannot simply use defer x.Close() here. A workaround is to require users to use a different idiom; for a naive example:

    func (net *Net) Forward(x torch.Tensor) torch.Tensor {  // The argument x is recycled in the train loop
        var tensors []torch.Tensor
        defer func () {
            for _, t := range tensors {
                t.Close()
            }
        }()
        x = torch.Relu(torch.MaxPool2d(net.conv1.Forward(x), 2))
        tensors = append(tensors, x)
        x = torch.Relu(
            torch.MaxPool2d(net.conv2_drop.Forward(net.conv2.Forward(x)), 2))
        tensors = append(tensors, x)
        x = x.View([]int{-1, 320})
        tensors = append(tensors, x)
        x = torch.Relu(net.fc1.Forward(x))
        tensors = append(tensors, x)
        x = torch.Dropout(x, /*p=*/0.5, /*training=*/net.IsTraining())
        tensors = append(tensors, x)
        x = net.fc2.Forward(x)
        tensors = append(tensors, x)
        return torch.LogSoftmax(x, /*dim=*/1)  // The return value is recycled in the train loop
      }

    Obviously, this is not very elegant.

    Should we bookkeep the Tensors in C++?

    A better way might be to keep the tensors array in C++ rather than in Go. For example, we can use a std::vector to record each C++ Tensor created by the Go API and provide a torch.CleanTensors for users to call at the end of the train loop. However, this solution is harder to design properly; for example, we have to take goroutines into consideration to avoid corrupting the std::vector.

Case Study 2: Record Errors

Few functions in libtorch have the noexcept tag. This implies that most of the functions in C++ may throw an exception. We have to expose an error return type for these functions' wrappers in Go. Recall the step function above:

func step(batch *Batch) {
    // `data`, `targets`, `output`, `loss` are `Tensor`s.
    data := batch.Data.To(device)
    defer data.Close()
    // ...
}

It may become the following in production code:

func step(batch *Batch) error {
    // `data`, `targets`, `output`, `loss` are `Tensor`s.
    data, err := batch.Data.To(device)
    if err != nil {
        return ...
    }
    defer data.Close()
    // ...
}

That is, the user should check whether there's an error on each line. This may be tedious too. Go+ has a neat syntax to unwrap errors, but I cannot think of an elegant way to solve the problem for the time being. See also the previous discussions: goplus/gop#307 (comment), goplus/gop#307 (comment)

Comparison of Implementations of MobileNetV2 in Go and Python

MobileNetV2 (Inverted Residuals and Linear Bottlenecks) is a vision model for mobile devices.
PyTorch has official implementations in both Python (mobilenet.py) and C++ (mobilenet.h, mobilenet.cpp), with about the same amount of code (162 lines vs. 185 lines).

The following listing compares the Go version and the Python version; as expected, they have about the same amount of code, too:

The Go version (174 lines):
package vision

import (
	"math"
	"torch"
	"torch/nn"
	"torch/nn/init"
)

func max(x, y int64) int64 { // math.max only works on floats
	if x > y {
		return x
	}
	return y
}

func makeDivisible(value float64, divisor int64, minValue *int64) int64 {
	if minValue == nil {
		minValue = &divisor
	}
	newValue := max(*minValue, (int64(value+float64(divisor)/2)/divisor)*divisor)
	if float64(newValue) < 0.9*value {
		newValue += divisor
	}
	return newValue
}

type ConvBNReLU struct {
	nn.Sequential
}

func NewConvBNReLU(in_planes, out_planes, kernel_size, stride, groups int64) ConvBNReLU {
	ret := ConvBNReLU{nn.NewSequential()}
	padding := (kernel_size - 1) / 2
	options := nn.Conv2dOptions{in_planes, out_planes, kernel_size}
	ret.PushBack(nn.NewConv2d(
		options.stride(stride).padding(padding).groups(groups).bias(false)))
	ret.PushBack(nn.BatchNorm2d{out_planes})
	ret.PushBack(nn.Functional{nn.ReLU})
	return ret
}

func (net *ConvBNReLU) Forward(x torch.Tensor) torch.Tensor {
	return net.Sequential.Forward(x)
}

type MobileNetInvertedResidual struct {
	nn.Module
	stride        int64
	useResConnect bool
	conv          nn.Sequential
}

func NewMobileNetInvertedResidual(
	input, output, stride int64, expandRatio float64) MobileNetInvertedResidual {
	net := MobileNetInvertedResidual{
		Module:        nn.NewModule(),
		stride:        stride,
		useResConnect: stride == 1 && input == output,
		conv:          nn.NewSequential()}

	doubleCompare := func(a, b float64) bool {
		return math.Abs(a-b) < 1e-20
	}

	torch.CHECK(stride == 1 || stride == 2)
	hiddenDim := int64(math.Round(float64(input) * expandRatio))

	if !doubleCompare(expandRatio, 1) {
		net.conv.PushBack(NewConvBNReLU(input, hiddenDim, 1, 1, 1))
	}
	net.conv.PushBack(NewConvBNReLU(hiddenDim, hiddenDim, 3, stride, hiddenDim))
	options := nn.Conv2dOptions{hiddenDim, output, 1}
	net.conv.PushBack(nn.NewConv2d(options.stride(1).padding(0).bias(false)))

	net.RegisterModule("conv", net.conv)
	return net
}

func (net *MobileNetInvertedResidual) Forward(x torch.Tensor) torch.Tensor {
	if net.useResConnect {
		return net.Add(x, net.conv.Forward(x))
	}
	return net.conv.Forward(x)
}

type MobileNetV2 struct {
	nn.Module            // nn.Module is a monadic type
	lastChannel          int64
	features, classifier nn.Sequential
}

func NewMobileNetV2(
	numClasses int64,
	widthMult float64,
	invertedResidualSettings [][]int64,
	roundNearest int64) MobileNetV2 {
	net := MobileNetV2{
		Module:    nn.NewModule(),
		features:  nn.NewSequential(),
		classifier: nn.NewSequential()}
	var inputChannel int64 = 32
	var lastChannel int64 = 1280

	if invertedResidualSettings == nil || len(invertedResidualSettings) == 0 {
		invertedResidualSettings = [][]int64{
			// t, c, n, s
			{1, 16, 1, 1},
			{6, 24, 2, 2},
			{6, 32, 3, 2},
			{6, 64, 4, 2},
			{6, 96, 3, 1},
			{6, 160, 3, 2},
			{6, 320, 1, 1},
		}
	}

	torch.CHECK(
		len(invertedResidualSettings[0]) == 4,
		"inverted_residual_settings should contain 4-element vectors")

	inputChannel = makeDivisible(float64(inputChannel)*widthMult, roundNearest, nil)
	net.lastChannel =
		makeDivisible(float64(lastChannel)*math.Max(1.0, widthMult), roundNearest, nil)
	net.features.PushBack(NewConvBNReLU(3, inputChannel, 3, 2, 1))

	for _, setting := range invertedResidualSettings {
		outputChannel := makeDivisible(float64(setting[1])*widthMult, roundNearest, nil)

		for i := int64(0); i < setting[2]; i++ {
			stride := int64(1)
			if i == 0 {
				stride = setting[3]
			}
			net.features.PushBack(
				NewMobileNetInvertedResidual(
					inputChannel, outputChannel, stride, float64(setting[0])))
			inputChannel = outputChannel
		}
	}
	net.features.PushBack(NewConvBNReLU(inputChannel, net.lastChannel, 1, 1, 1))

	net.classifier.PushBack(nn.Dropout(0.2))
	net.classifier.PushBack(nn.Linear(net.lastChannel, numClasses))

	net.RegisterModule("features", net.features)
	net.RegisterModule("classifier", net.classifier)

	for _, module := range net.Modules(false) {
		switch M := module.(type) {
		case nn.Conv2d:
			init.KaimingNormal(M.Weight, 0, torch.kFanOut)
			if M.options.Bias {
				init.Zeros(M.Bias)
			}
		case nn.BatchNorm2d:
			init.Ones(M.Weight)
			init.Zeros(M.Bias)
		case nn.Linear:
			init.Normal(M.Weight, 0, 0.01)
			init.Zeros(M.Bias)
		}
	}
	return net
}

func (net *MobileNetV2) Forward(x torch.Tensor) torch.Tensor {
	x = net.features.Forward(x)
	x = net.Mean(x, []int{2, 3})
	x = net.classifier.Forward(x)
	return x
}
The Python version (162 lines):

from torch import nn


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_planes),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            norm_layer(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self,
                 num_classes=1000,
                 width_mult=1.0,
                 inverted_residual_setting=None,
                 round_nearest=8,
                 block=None,
                 norm_layer=None):
        """
        MobileNet V2 main class

        Args:
            num_classes (int): Number of classes
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            inverted_residual_setting: Network structure
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
            Set to 1 to turn off rounding
            block: Module specifying inverted residual building block for mobilenet
            norm_layer: Module specifying the normalization layer to use

        """
        super(MobileNetV2, self).__init__()

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        # only check the first element, assuming user knows t,c,n,s are required
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1, norm_layer=norm_layer))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)

        # building classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x):
        # This exists since TorchScript doesn't support inheritance, so the superclass method
        # (this one) needs to have a name other than `forward` that can be accessed in a subclass
        x = self.features(x)
        # Cannot use "squeeze" as batch-size can be 1 => must use reshape with x.shape[0]
        x = nn.functional.adaptive_avg_pool2d(x, 1).reshape(x.shape[0], -1)
        x = self.classifier(x)
        return x

    def forward(self, x):
        return self._forward_impl(x)

The C++ version has 185 lines because it strictly follows the 80-character line length limit. The amounts of C++, Go, and Python code are comparable.

Compare different frontend language training on MNIST dataset

Just as writing a program that prints "Hello World" is our first exercise in coding, training a model for handwriting recognition on the MNIST database is usually the first exercise in deep learning.

This issue compares how various front-end languages, C++, Go, Python, and Go+Torch, train such a model.

The C++ version:
#include <torch/torch.h>

#include <cstddef>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

struct Net: torch::nn::Module {
  Net()
      : conv1(torch::nn::Conv2dOptions(1, 10, /*kernel_size=*/5)),
        conv2(torch::nn::Conv2dOptions(10, 20, /*kernel_size=*/5)),
        dropout1(0.25),
        dropout2(0.5),
        fc1(320, 50),
        fc2(50, 10) {
    register_module("conv1", conv1);
    register_module("conv2", conv2);
    register_module("dropout1", dropout1);
    register_module("dropout2", dropout2);
    register_module("fc1", fc1);
    register_module("fc2", fc2);
  }

  torch::Tensor forward(torch::Tensor x) {
    x = conv1->forward(x);
    x = torch::relu(x);
    x = conv2->forward(x);
    x = torch::relu(x);
    x = torch::max_pool2d(x, 2);
    x = dropout1(x);
    x = torch::flatten(x, 1);
    x = fc1(x);
    x = torch::relu(x);
    x = dropout2(x);
    x = fc2(x);
    return torch::log_softmax(x, 1);
  }

  torch::nn::Conv2d conv1;
  torch::nn::Conv2d conv2;
  torch::nn::Dropout dropout1;
  torch::nn::Dropout dropout2;
  torch::nn::Linear fc1;
  torch::nn::Linear fc2;
};

auto main() -> int {
  Net model;
  model.train();
  auto sgd = torch::optim::SGD(
      model.parameters(), torch::optim::SGDOptions(0.01).momentum(0.5));
  sgd.zero_grad();
  auto data = torch::rand({2, 3, 224, 224});
  auto target = torch::randint(1, 10, {2, });
  auto output = model.forward(data);
  auto loss = torch::nll_loss(output, target);
  loss.backward();
  sgd.step();
  std::printf("Loss: %.6f", loss.template item<float>());
}
The Go version:

package main

import (
	"fmt"

	torch "github.com/wangkuiyi/gotorch"
)

type Net struct {
	torch.Module
	conv1 torch.Conv2d
	conv2 torch.Conv2d
	dropout1 torch.Dropout1
	dropout2 torch.Dropout2
	fc1 torch.Linear
	fc2 torch.Linear
}

func NewNet() *Net {
	n := &Net{
		Module:   torch.Module{},
		conv1:    torch.Conv2d(1, 10, 5),
		conv2:    torch.Conv2d(10, 20, 5),
		dropout1: torch.Dropout1(0.25),
		dropout2: torch.Dropout2(0.5),
		fc1:      torch.Linear(9216, 128),
		fc2:      torch.Linear(128, 10),
	}
	n.registerModule()
	return n
}

func (n Net) registerModule() {
	n.RegisterModule("conv1", n.conv1)
	n.RegisterModule("conv2", n.conv2)
	n.RegisterModule("dropout1", n.dropout1)
	n.RegisterModule("dropout2", n.dropout2)
	n.RegisterModule("fc1", n.fc1)
	n.RegisterModule("fc2", n.fc2)
}

func (n Net) Forward(x torch.Tensor) torch.Tensor {
	x = n.conv1.Forward(x)
	x = torch.Relu(x)
	x = n.conv2.Forward(x)
	x = torch.Relu(x)
	x = torch.MaxPool2d(x, 2)
	x = n.dropout1(x)
	x = torch.Flatten(x, 1)
	x = n.fc1(x)
	x = torch.Relu(x)
	x = n.dropout2(x)
	x = n.fc2(x)
	output := torch.LogSoftMax(x, 1)
	return output 
}

func main() {
	model := NewNet()
	model.Train()
	sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
	sgd.ZeroGrad()
	data := torch.Rand([]int64{2, 1, 28, 28})
	target := torch.RandInt(1, 10, []int64{2})
	output := model.Forward(data)
	loss := torch.NllLoss(output, target)
	loss.Backward()
	sgd.Step()
	fmt.Printf("Loss: %.6f\n", loss.Item())
}
The Python version:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


model = Net()
model.train()
optimizer = optim.Adadelta(model.parameters(), lr=0.1)
data = torch.rand((2, 1, 28, 28))
target = torch.randint(1, 10, (2,))
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
print("Loss: {:.6f}".format(loss.item()))
The Go+Torch version:

package main
import (
	torch "github.com/wangkuiyi/gotorch"
)

type Net struct {
	torch.Module
	conv1 torch.Conv2d
	conv2 torch.Conv2d
	dropout1 torch.Dropout1
	dropout2 torch.Dropout2
	fc1 torch.Linear
	fc2 torch.Linear
}

func NewNet() *Net {
	n := &Net{
		Module:   torch.Module{},
		conv1:    torch.Conv2d(1, 10, 5),
		conv2:    torch.Conv2d(10, 20, 5),
		dropout1: torch.Dropout1(0.25),
		dropout2: torch.Dropout2(0.5),
		fc1:      torch.Linear(9216, 128),
		fc2:      torch.Linear(128, 10),
	}
	return n
}

func (n Net) Forward(x torch.Tensor) torch.Tensor {
	x = n.conv1.Forward(x)
	x = torch.Relu(x)
	x = n.conv2.Forward(x)
	x = torch.Relu(x)
	x = torch.MaxPool2d(x, 2)
	x = n.dropout1(x)
	x = torch.Flatten(x, 1)
	x = n.fc1(x)
	x = torch.Relu(x)
	x = n.dropout2(x)
	x = n.fc2(x)
	output := torch.LogSoftMax(x, 1)
	return output 
}

model := NewNet()
model.Train()
sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
sgd.ZeroGrad()
data := torch.Rand({2, 1, 28, 28})
target := torch.RandInt(1, 10, {2})
output := model.Forward(data)
loss := torch.NllLoss(output, target)
loss.Backward()
sgd.Step()
println("Loss:", loss.Item())

Simplify MNIST example codes with Go+

  • Case 1: type inference

Go code:

x = torch.View(x, []int64{-1, 28 * 28})

Go+ code:

x = torch.View(x, {-1, 28 * 28})
  • Case 2: default parameter value

Go code:

loss := F.NllLoss(pred, target, torch.Tensor{}, -100, "mean")

Go+ code:

loss := F.NllLoss(pred, target)

  • Case 3: range loop

Go code:

for epoch := 0; epoch < epochs; epoch++ {
}

Go+ code:

for epoch <- range(epochs) {
}

MNIST example throws exceptions due to no dataset

=== RUN   ExampleMNIST
libc++abi.dylib: terminating with uncaught exception of type c10::Error: Error opening images file at ./data/train-images-idx3-ubyte (read_images at ../torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x477ff47 in libc10.dylib)
frame #1: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 3018 (0xb2d046a in libtorch_cpu.dylib)
frame #2: MNIST + 70 (0x465c1e6 in libcgotorch.so)
frame #3: _cgo_2879cf5c9dd9_Cfunc_MNIST + 29 (0x42a0a2d in gotorch.test)
frame #4: runtime.asmcgocall + 112 (0x4067760 in gotorch.test)

It seems that we need to download the dataset and unpack it to ./data.

`NewFunctional` only accepts functions with `func(torch.Tensor) torch.Tensor` type

NewFunctional only accepts functions with func(torch.Tensor) torch.Tensor type.

We can write the following code:

nn.NewFunctional(torch.Tanh)

However, LeakyRelu takes two input parameters.

func LeakyRelu(t Tensor, negativeSlope float64) Tensor {
	return t.LeakyRelu(negativeSlope)
}

We cannot write the following code directly:

nn.NewFunctional(torch.LeakyRelu(0.2))

Maybe we should borrow more features from functional programming languages, like currying in Haskell.

With currying, torch.LeakyRelu(0.2) would return a function of type func(torch.Tensor) torch.Tensor. Then, it would work well with NewFunctional; in the meantime, a closure-based workaround is sketched below.

There is also a project maxsz/curry which provides a way to support currying in Go.
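A minimal sketch of that closure workaround, assuming nn.NewFunctional accepts a func(torch.Tensor) torch.Tensor; LeakyReluWith is a hypothetical helper name, not an existing API.

package main

import (
	torch "github.com/wangkuiyi/gotorch"
	nn "github.com/wangkuiyi/gotorch/nn"
)

// LeakyReluWith binds negativeSlope and returns a unary tensor function.
func LeakyReluWith(negativeSlope float64) func(torch.Tensor) torch.Tensor {
	return func(t torch.Tensor) torch.Tensor {
		return torch.LeakyRelu(t, negativeSlope)
	}
}

func main() {
	_ = nn.NewFunctional(LeakyReluWith(0.2)) // usable wherever torch.Tanh is
}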

Do we really need Module.Init?

From the definition below, I understand that the only purpose of having user-defined modules call Module.Init in their constructors is to let each sub-module know about its parent.

gotorch/nn/module.go

Lines 46 to 64 in 047d424

func (m *Module) Init(outer IModule) {
	if m.outer != nil {
		return
	}
	moduleType := reflect.TypeOf(m).Elem()
	fv := reflect.ValueOf(outer).Elem()
	for i := 0; i < fv.NumField(); i++ {
		v := fv.Field(i)
		f := fv.Type().Field(i)
		if f.Type == moduleType && f.Name == moduleType.Name() {
			if v.Addr() == reflect.ValueOf(m) {
				// Calling Init in a valid Module: struct{*Module} or struct{Module}
				m.outer = outer
				m.isTraining = true
			}
		}
	}
	torchCheck(m.outer != nil, "GoTorch requires defining modules via embedding a `Module` struct by value")
}

Is this purpose due to the requirement that when the user calls a module's To or ZeroGrad method, we can trace up to the top ancestor of the sub-module hierarchy and make sure that all modules in the hierarchy move to the specified device or have all parameter gradients cleared?

If this is the reasoning behind Module.Init, I am afraid that the implementations of To and ZeroGrad do not trace up to the root; instead, I see them simply use m.outer.

gotorch/nn/module.go

Lines 97 to 101 in 047d424

func (m *Module) To(device torch.Device) {
	// TODO(shendiaomo): to be implemented after the `To` method of `Tensors` is ready
	moduleType := reflect.TypeOf((*IModule)(nil)).Elem()
	tensorType := reflect.TypeOf((*torch.Tensor)(nil)).Elem()
	sv := reflect.ValueOf(m.outer).Elem() // Elem gets what the pointer points to.

Port more PyTorch modules

By grepping the official DCGAN example program, we see that the following modules need to be ported before we can run DCGAN with GoTorch (a sketch of the generator built from them follows the grep output below).

  • nn.BCELoss
  • nn.BatchNorm2d
  • nn.Conv2d
  • nn.ConvTranspose2d
  • nn.LeakyReLU
  • nn.ReLU
  • nn.Sequential
  • nn.Sigmoid
  • nn.Tanh
$ curl -Ls https://raw.githubusercontent.com/pytorch/examples/master/dcgan/main.py | grep 'nn\.'
import torch.nn.parallel
cudnn.benchmark = True
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)
class Generator(nn.Module):
        self.main = nn.Sequential(
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(    ngf,      nc, 4, 2, 1, bias=False),
            nn.Tanh()
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
class Discriminator(nn.Module):
        self.main = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
criterion = nn.BCELoss()
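For reference, here is a hedged sketch of what the generator above might look like once these modules are ported; every constructor name, signature, and the return type below are assumptions mirroring the Python code, not an existing GoTorch API.

package dcgan

import nn "github.com/wangkuiyi/gotorch/nn"

// NewGenerator assembles the DCGAN generator; nz, ngf, and nc follow the Python example.
func NewGenerator(nz, ngf, nc int64) *nn.SequentialModule {
	g := nn.NewSequential()
	g.PushBack(nn.ConvTranspose2d(nz, ngf*8, 4, 1, 0, false))
	g.PushBack(nn.BatchNorm2d(ngf * 8))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf*8, ngf*4, 4, 2, 1, false))
	g.PushBack(nn.BatchNorm2d(ngf * 4))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf*4, ngf*2, 4, 2, 1, false))
	g.PushBack(nn.BatchNorm2d(ngf * 2))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf*2, ngf, 4, 2, 1, false))
	g.PushBack(nn.BatchNorm2d(ngf))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf, nc, 4, 2, 1, false))
	g.PushBack(nn.Tanh())
	return g
}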

Tune function rangeI

The current definition is here

func rangeI(n int64) []int64 {
	res := []int64{}
	if n <= 0 {
		return res
	}
	for i := int64(0); i < n; i++ {
		res = append(res, i)
	}
	return res
}

I noticed some issues with this function.

  1. The frequent calls to append might be expensive. Each call might reallocate and copy the existing slice data. Instead, we can make([]int64, n) and then write each generated element into the right place, as in the sketch after this list.

  2. This function would crash the program if the parameter n is too large. It would be safer to add a check to panic if n is larger than a threshold.
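A minimal sketch of both suggestions, preallocating with make and guarding against a huge n; the threshold constant is a hypothetical value for illustration.

func rangeI(n int64) []int64 {
	const maxN = int64(1) << 32 // hypothetical safety threshold
	if n <= 0 {
		return []int64{}
	}
	if n > maxN {
		panic("rangeI: n is too large")
	}
	res := make([]int64, n)
	for i := range res {
		res[i] = int64(i) // write each element into its final place; no reallocation
	}
	return res
}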

Crash on an invalid MNIST dataset root folder

As described in the title, the crash logs are as follows:

=== RUN   TestExampleMNIST
libc++abi.dylib: terminating with uncaught exception of type c10::Error: Error opening images file at ./unsdsdf/train-images-idx3-ubyte (read_images at ../torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x4759f47 in libc10.dylib)
frame #1: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 3018 (0xc8ab46a in libtorch_cpu.dylib)
frame #2: MNIST + 73 (0x4630859 in libcgotorch.so)
frame #3: _cgo_e3c33f78a9c2_Cfunc_MNIST + 29 (0x42920dd in gotorch.test)
frame #4: runtime.asmcgocall + 112 (0x405efa0 in gotorch.test)

SIGABRT: abort
PC=0x7fff6a37533a m=0 sigcode=0

goroutine 0 [idle]:
runtime: unknown pc 0x7fff6a37533a
stack: frame={sp:0x7ffeefbfec88, fp:0x0} stack=[0x7ffeefb80718,0x7ffeefbff780)
00007ffeefbfeb88:  00007ffeefbff1a0  00007ffeefbfebb0
00007ffeefbfeb98:  0000000009900acb  0000000000000000
00007ffeefbfeba8:  0000000000000041  00007ffeefbff130
00007ffeefbfebb8:  00007ffeefbfebf0  000000000000037f
00007ffeefbfebc8:  0000000000000000  0000000032aaaba2
00007ffeefbfebd8:  0000000000000000  0000000000000000
00007ffeefbfebe8:  00007ffeefbfed20  0000000000000000
....

GPU memory profiling

We are trying to compare the GPU memory consumption of GoTorch and PyTorch with the ResNet50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet.

The GPU card is a P100 with 16 GB of memory.

Experiment 1:

The following result is measured with the nvidia-smi command.

|         | Only Forward | Forward and Backward |
|---------|--------------|----------------------|
| PyTorch | 3719 MiB     | 2545 MiB             |
| GoTorch | 2447 MiB     | 2767 MiB             |

We removed three lines of code for the Only Forward scenario:

# optimizer.zero_grad()
# loss.backward()
# optimizer.step()

Experiment 2:

GPU memory with different batch size:

| Batch Size | 16       | 128       | 160       |
|------------|----------|-----------|-----------|
| PyTorch    | 2545 MiB | 13161 MiB | 15295 MiB |
| GoTorch    | 2767 MiB | 14755 MiB | OOM       |
