
gotorch's Introduction

GoTorch


GoTorch reimplements PyTorch high-level APIs, including modules and functionals, in idiomatic Go. This enables deep learning programming in Go and Go+. This project is in its very early stage.

Easy Switch

Writing deep learning systems in Go is as efficient as in Python. The DCGAN training programs in GoTorch and PyTorch call similar APIs, have similar program structures, and have a similar number of lines. Go+ has a syntax similar to Python's. The Go+ compiler translates Go+ programs into Go source programs. It is a joy to write Go+ programs that call Go packages like GoTorch.

We plan to build a translator that migrates existing PyTorch models written in Python into GoTorch.

Benefits

  1. Higher runtime efficiency. Go programs run as efficiently as C++.

  2. Training and prediction in the same language. No longer training in Python and online prediction in C++. All in Go/Go+. No TensorFlow graphs or PyTorch tracing.

  3. Same data processing code for training and prediction. No need to wrap OpenCV functions into TensorFlow operators in C++ for prediction and in Python for training.

  4. Supports many machine learning paradigms, including adversarial, reinforcement, and imitation learning -- those we cannot split into training and prediction.

  5. Same program for edge and cloud. GoTorch programs compile and run on phones and self-driving cars as they do on servers and desktops.

The Tech Stack

GoTorch works with the following open-source communities to form Go+Torch.

  • the Go+ community,
  • the PyTorch community, and
  • the TensorFlow XLA ecosystem.

The following figure shows the technology stack.

Go+ applications   # users write DL applications in Go+,
     │             # whose syntax is as concise as Python
 [Go+ compiler]
     ↓
Go source code -→ GoTorch -→ libtorch -→ pytorch/xla -→ XLA ops
     │
 [Go compiler]
     ↓
executable binary  # x86_64, ARM, CUDA, TPU
                   # Linux, macOS, Android, iOS

Documentation

gotorch's People

Contributors

hexvalid, lhw362950217, ljk53, qijune, qiukun, shendiaomo, sneaxiy, typhoonzero, wangkuiyi, yancey1989, zhiqwang


gotorch's Issues

go test -v fails with Raspbian 10 (buster) on RPi 4

pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch/cgotorch $ make -f Makefile.rpi
rm -f libtorch
ln -s rpi/libtorch libtorch
clang++ -std=c++14 \
-I .. \
-I libtorch/include \
-I libtorch/include/torch/csrc/api/include \
-L libtorch/lib \
-fPIC \
-shared \
cgotorch.cc \
-o libcgotorch.so -install_name @rpath/libcgotorch.so \
-Wl,-rpath,libtorch/lib \
-Wl,-force_load libtorch/lib/libc10.so \
-lc10 -ltorch -ltorch_cpu \
-D_GLIBCXX_USE_CXX11_ABI=1
clang: warning: argument unused during compilation: '-install_name @rpath/libcgotorch.so' [-Wunused-command-line-argument]
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch/cgotorch $ cd ..
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch $ go test -v
=== RUN   TestPanicMNIST
--- PASS: TestPanicMNIST (0.00s)
=== RUN   TestLogSoftmax
--- PASS: TestLogSoftmax (0.00s)
=== RUN   ExampleBackward
--- FAIL: ExampleBackward (0.00s)
panic: size mismatch, m1: [17179869187 x 0], m2: [4294967300 x 0] at /home/pi/src/pytorch/aten/src/TH/generic/THTensorMath.cpp:4 [recovered]
	panic: size mismatch, m1: [17179869187 x 0], m2: [4294967300 x 0] at /home/pi/src/pytorch/aten/src/TH/generic/THTensorMath.cpp:4

goroutine 1 [running]:
testing.(*InternalExample).processRunResult(0x253de90, 0x0, 0x0, 0x15c6f4, 0x0, 0x28c140, 0x2410848, 0x1)
	/home/pi/usr/go/src/testing/example.go:89 +0x488
testing.runExample.func2(0x78b7046f, 0xbfc4a256, 0x7982f7, 0x0, 0x5020e0, 0x2410818, 0x24100d8, 0x241c400, 0x253de90, 0x253dea8)
	/home/pi/usr/go/src/testing/run_example.go:58 +0xd4
panic(0x28c140, 0x2410848)
	/home/pi/usr/go/src/runtime/panic.go:969 +0x118
github.com/wangkuiyi/gotorch.MustNil(0x2153628)
	/home/pi/go/src/github.com/wangkuiyi/gotorch/tensor.go:59 +0x70
github.com/wangkuiyi/gotorch.MM(0x2410838, 0x2410820, 0x2)
	/home/pi/go/src/github.com/wangkuiyi/gotorch/tensor.go:171 +0x48
github.com/wangkuiyi/gotorch_test.ExampleBackward()
	/home/pi/go/src/github.com/wangkuiyi/gotorch/backward_test.go:13 +0x104
testing.runExample(0x2d0b6c, 0xf, 0x2e64a8, 0x0, 0x0, 0x0, 0x0)
	/home/pi/usr/go/src/testing/run_example.go:62 +0x184
testing.runExamples(0x253df70, 0x4ca840, 0x7, 0x7, 0x101)
	/home/pi/usr/go/src/testing/example.go:44 +0x104
testing.(*M).Run(0x24512c0, 0x0)
	/home/pi/usr/go/src/testing/testing.go:1250 +0x1f8
main.main()
	_testmain.go:62 +0x120
exit status 2
FAIL	github.com/wangkuiyi/gotorch	0.325s
pi@raspberrypi:~/go/src/github.com/wangkuiyi/gotorch $

go test fails with -race

=== RUN   ExampleBackward
fatal error: checkptr: unsafe pointer arithmetic

goroutine 1 [running]:
runtime.throw(0x4504143, 0x23)
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/panic.go:1116 +0x72 fp=0xc0001499b0 sp=0xc000149980 pc=0x4036842
runtime.checkptrArithmetic(0xc0000100a0, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/checkptr.go:43 +0xb5 fp=0xc0001499e0 sp=0xc0001499b0 pc=0x4008c15
github.com/wangkuiyi/gotorch.Optimizer.AddParameters.func1(0xc000010098, 0xc0000100a0, 0xc000149a98)
	/Users/yi/go/src/github.com/wangkuiyi/gotorch/optim.go:45 +0x70 fp=0xc000149a28 sp=0xc0001499e0 pc=0x41c6150
github.com/wangkuiyi/gotorch.Optimizer.AddParameters(0xc000010098, 0xc000149af0, 0x1, 0x1)
	/Users/yi/go/src/github.com/wangkuiyi/gotorch/optim.go:45 +0x1d4 fp=0xc000149ac0 sp=0xc000149a28 pc=0x41c4c94
github.com/wangkuiyi/gotorch_test.ExampleBackward()
	/Users/yi/go/src/github.com/wangkuiyi/gotorch/backward_test.go:10 +0x110 fp=0xc000149b28 sp=0xc000149ac0 pc=0x44128a0
testing.runExample(0x44fc6be, 0xf, 0x4511ed0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:62 +0x275 fp=0xc000149c68 sp=0xc000149b28 pc=0x4149eb5
testing.runExamples(0xc000149ed8, 0x47e60e0, 0x7, 0x7, 0x101)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/example.go:44 +0x212 fp=0xc000149d68 sp=0xc000149c68 pc=0x4147c52
testing.(*M).Run(0xc000140080, 0x0)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/testing.go:1250 +0x4f4 fp=0xc000149f00 sp=0xc000149d68 pc=0x414ed84
main.main()
	_testmain.go:62 +0x224 fp=0xc000149f88 sp=0xc000149f00 pc=0x4414f54
runtime.main()
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/proc.go:203 +0x1fa fp=0xc000149fe0 sp=0xc000149f88 pc=0x4038eaa
runtime.goexit()
	/usr/local/Cellar/go/1.14.6/libexec/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000149fe8 sp=0xc000149fe0 pc=0x406a431

goroutine 9 [runnable]:
testing.runExample.func1(0xc000010080, 0xc00005a540)
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:35
created by testing.runExample
	/usr/local/Cellar/go/1.14.6/libexec/src/testing/run_example.go:35 +0x1c7
FAIL	github.com/wangkuiyi/gotorch	0.391s

Should we merge aten and torch into gotorch?

An alternative to having gotorch/aten and gotorch/torch is to have only gotorch/, which includes files like tensor.go, optim/adam.go, and optim/sgd.go.

I am afraid that, in the future, we are going to use the XLA backend of PyTorch, https://github.com/pytorch/xla, and then there might be a vague boundary between aten and torch.

Complete MNIST example

Following the discussion on the torch.Module API in #28, I will complete the MNIST end-to-end example. Some PyTorch modules have already been ported into GoTorch, but we still need a few others to complete the MNIST example:

  • nn.NLLLoss
  • nn.LogSoftmax

The MNIST example would look like this:

package main

import (
	"fmt"

	torch "github.com/wangkuiyi/gotorch"
)

type Net struct {
  fc1, fc2, fc3  torch.Linear
}

func NewNet() torch.Module{
  return &Net{
    fc1 : torch.Linear(28 * 28, 512, false),
    fc2 : torch.Linear(512, 512, false),
    fc3 : torch.Linear(512, 10, false),
  }
}

func (n Net) Forward(x torch.Tensor) torch.Tensor {
  x = torch.View(x, []int64{-1, 28 * 28})
  x = n.fc1(x)
  x = torch.Relu(x)
  x = n.fc2(x)
  x = torch.Relu(x)
  return n.fc3(x)
}

func main() {
  dataset := torch.NewMNIST(dataDir())
  dataset.AddTransforms([]torch.Transform{
		torch.NewNormalize(0.1307, 0.3081),
		torch.NewStack(),
  })
  trainLoader := torch.NewDataLoader(dataset, 8)
  net := NewNet()
  criterion := torch.CrossEntropyLoss()
  opt := torch.SGD(0.1, 0, 0, 0, false)
  opt.AddParameters(torch.GetParameters(net))

  batchIdx := 0
  for trainLoader.Scan() {
    batch := trainLoader.Batch()
    pre := net.Forward(batch.Data)
    loss := criterion(pre,  batch.Target)
    fmt.Println("BatchIdx: [%d],  Loss: [%f]", batchIdx, loss.item())
    opt.ZeroGrad()
    loss.Backward()
    opt.Step()
    batchIdx++
  }
  torch.FinishGC()
  opt.Close()
  torch.CloseModule(net)
}

Support GPU training in GoTorch

With the PyTorch API, users can specify the runtime device easily, as in the following code:

device = torch.device("cuda")       # create a CUDA device instance
x = torch.randn((2, 3)).to(device)  # assign Tensor memory to the CUDA device
net = MNISTNet()
net.to(device)                      # assign module parameters to the CUDA device

In GoTorch, we would like to provide the same APIs, Tensor.To and Module.To, so that users can train and run prediction on various devices.

device := torch.NewDevice("cuda")                 // create a CUDA device instance
x := torch.RandN([]int64{2, 3}, false).To(device) // assign Tensor x memory to the CUDA device
net := NewMNISTNet()
net.To(device)                                    // assign parameter memory to the CUDA device

For the Tensor.To API, we just port the device type and the Tensor::to function to Go.
For the Module.To API, we should assign parameters to the device recursively, c.f. https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/module.h#L676
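Below is a minimal sketch of that recursive traversal. It assumes a hypothetical Module base struct whose params and children fields record its parameters and sub-modules; the names are illustrative, not the final design.

package nn

import torch "github.com/wangkuiyi/gotorch"

// Module is a hypothetical base struct; params and children are assumed fields.
type Module struct {
	params   []torch.Tensor
	children []*Module
}

// To moves all parameters to device and recurses into sub-modules, mirroring
// torch::nn::Module::to in libtorch.
func (m *Module) To(device torch.Device) {
	for _, p := range m.params {
		// Updating the data in place keeps p a leaf tensor (see the
		// "Can't optimize a non-leaf Tensor" issue below).
		p.SetData(p.To(device, p.Dtype()))
	}
	for _, child := range m.children {
		child.To(device)
	}
}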

TODO list:

  • Port device to Go to implement the Tensor.To API.
  • Add Dockerfile.gpu and Makefile.gpu to build GoTorch with CUDA support.
  • Implement the Module.To API.
  • Complete the MNIST example running on a CUDA device.

Build for NVIDIA Drive PX2

The arch is aarch64.

yi@nvidia:~/go/src/github.com/wangkuiyi/gotorch$ uname -a
Linux nvidia 4.4.38-rt49-tegra #1 SMP PREEMPT RT Tue Jul 25 09:26:02 PDT 2017 aarch64 aarch64 aarch64 GNU/Linux

I downloaded the pre-built libtorch for ARM from https://github.com/ljk53/pytorch-rpi.

yi@nvidia:~/go/src/github.com/wangkuiyi/gotorch$ cgotorch/build.sh
~/go/src/github.com/wangkuiyi/gotorch/cgotorch ~/go/src/github.com/wangkuiyi/gotorch
Building for Raspbian ...
rm -f libtorch
ln -s rpi/libtorch libtorch
g++ -std=c++14 \
-I .. \
-I libtorch/include \
-I libtorch/include/torch/csrc/api/include \
-L libtorch/lib \
-fPIC \
-shared \
optim.cc device.cc mnist_dataset.cc torch.cc pickle.cc functional.cc tensor.cc init.cc \
-O -o libcgotorch.so  \
-Wl,-rpath,libtorch/lib \
-Wl,-force_load libtorch/lib/libc10.so \
-lc10 -ltorch -ltorch_cpu \
-D_GLIBCXX_USE_CXX11_ABI=1
libtorch/lib/libc10.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
Makefile:7: recipe for target 'libcgotorch.so' failed
make: *** [libcgotorch.so] Error 1
~/go/src/github.com/wangkuiyi/gotorch

Add Chinese doc

We should add a suite of Chinese documentation corresponding to the English version.

torch.nn.Module in Go

The PyTorch API has a key concept -- torch.nn.Module. Many built-in and user-defined models are classes derived from torch.nn.Module. The only method to override is forward(x).

Usually, a torch.nn.Module-derived class has data members representing the model parameters. For example, nn.Linear, the PyTorch implementation of the fully-connected layer, has W and B -- the weights and the bias, respectively.

In Go/Go+, the concept that corresponds to a Python base class is an interface. So, we provide type Module interface to mimic torch.nn.Module.
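A minimal sketch of the interface idea, using illustrative names rather than the final GoTorch API:

package model

import torch "github.com/wangkuiyi/gotorch"

// Module mimics torch.nn.Module: the only method a model must implement is Forward.
type Module interface {
	Forward(x torch.Tensor) torch.Tensor
}

// Linear is a user-defined fully-connected layer; W and B are its parameters.
type Linear struct {
	W, B torch.Tensor
}

func (l *Linear) Forward(x torch.Tensor) torch.Tensor {
	return torch.MM(x, l.W) // a complete implementation would also add the bias B
}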

Then, we need a solution to free up tensors when a model's life is over.

Which levels of abstractions in C++ should be exposed to Go

There are three levels of abstractions:

  • Level 1: native function is a low-level API. There are many basic mathematical operations in it.

  • Level 2: nn.functional is a middle-level API. It's closer to deep learning. It uses basic mathematical operations to compose complex neural network operations.

  • Level 3: nn.module is a high-level API. A module contains many states, such as parameters and buffers. It's a C++ class.

Let's take the padding operator as an example:

|                 | expose to Go       | API                                           | contain state               |
|-----------------|--------------------|-----------------------------------------------|-----------------------------|
| native function | C++ function, easy | low-level API, flexible, few users may use it | No                          |
| nn.functional   | C++ function, easy | middle-level API, most users use it           | No                          |
| nn.module       | C++ class, hard    | high-level API, most users use it             | Yes, parameters and buffers |

Another interesting thing: nn.functional tries to fuse some basic native functions. Here is an example from nn.functional.linear.

I am wondering which levels of abstraction in C++ I am supposed to expose to Go.

Should we unify the build environment?

Recently, two of my PRs worked well in my local macOS development environment but failed in CI.

In #60, I had already used clang-format to format the C++ code locally, but it failed the pre-commit check in CI.

$ pre-commit run -a
go fmt...................................................................Passed
go lint..................................................................Passed
validate toml........................................(no files to check)Skipped
Check files aren't using go's testing package........(no files to check)Skipped
cpplint..................................................................Passed
cppcheck.................................................................Passed
clang-format.............................................................Failed
- hook id: clang-format
- files were modified by this hook

In #64, I found that macOS and Linux treat int64_t differently. On macOS, int64_t is long long, while on Linux it is long.

So, I have to write (*C.longlong)(unsafe.Pointer(&stride[0])) on macOS but (*C.long)(unsafe.Pointer(&stride[0])) on Linux. This introduces conditional compilation into the Go code.
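A minimal sketch of one way to isolate the difference behind build tags; the file names and the helper function below are assumptions for illustration, not existing GoTorch code.

// cint64_darwin.go

// +build darwin

package gotorch

// #include <stdint.h>
import "C"
import "unsafe"

// cInt64Ptr returns the address of s[0] typed so that it can be passed where the
// C API expects an int64_t*.
func cInt64Ptr(s []int64) unsafe.Pointer {
	return unsafe.Pointer((*C.longlong)(unsafe.Pointer(&s[0])))
}

// cint64_linux.go would be identical except for the build tag (// +build linux)
// and the cast, which becomes (*C.long)(unsafe.Pointer(&s[0])).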

Can't optimize a non-leaf Tensor

When I run the DCGAN example on a GPU, I get the following error message:

terminate called after throwing an instance of 'c10::Error'
  what():  can't optimize a non-leaf Tensor
Exception raised from add_param_group at ../torch/csrc/api/src/optim/optimizer.cpp:80 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f52653a7eb9 in /go/src/github.com/wangkuiyi/gotorch/cgotorch/libtorch/lib/libc10.so)

I find the same issue in PyTorch community: https://discuss.pytorch.org/t/tensor-to-device-changes-is-leaf-causing-cant-optimize-a-non-leaf-tensor/37659

test = torch.zeros((10,10)).requires_grad_(True)
print(test.is_leaf) # True
test = test.to(data.device)
print(test.is_leaf) # False

The To operation returns a new tensor, so test becomes a non-leaf tensor.

We should keep the original reference. Instead of calling v.Set(reflect.ValueOf(t.To(device, t.Dtype()))), we should call t.SetData(t.To(device, t.Dtype())) in the Go code.

Cannot find symbols including the MNIST dataset on Linux

The command go test -v in the container complains that it cannot find symbols, including those of the MNIST dataset. It is weird that go test -v works on macOS.

root@a483ce0b3e5d:/go/src/github.com/wangkuiyi/gotorch# go test -v
# github.com/wangkuiyi/gotorch
./cgotorch/libcgotorch.so: undefined reference to `torch::data::datasets::MNIST::MNIST(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::data::datasets::MNIST::Mode)'
./cgotorch/libcgotorch.so: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
./cgotorch/libcgotorch.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
FAIL	github.com/wangkuiyi/gotorch [build failed]

Anyway, we can merge this PR and fix the problem in future PRs.

Originally posted by @wangkuiyi in #58 (comment)

Verify the loss value of ResNet training between Go and Python version

To get a baseline loss value for ResNet50 training, I ran the resnet.py example and got the following logs:

batch: 10, loss: 10.711030, acc1: 0.000000, acc5: 0.000000
batch: 20, loss: 7.499339, acc1: 0.000000, acc5: 0.000000
batch: 30, loss: 7.281894, acc1: 0.000000, acc5: 0.000000
batch: 40, loss: 7.059255, acc1: 0.000000, acc5: 0.000000
batch: 50, loss: 7.000484, acc1: 0.000000, acc5: 0.000000
batch: 60, loss: 6.871602, acc1: 0.000000, acc5: 3.125000
batch: 70, loss: 6.962079, acc1: 0.000000, acc5: 0.000000
batch: 80, loss: 6.872428, acc1: 0.000000, acc5: 0.000000
batch: 90, loss: 6.922100, acc1: 0.000000, acc5: 0.000000
batch: 100, loss: 6.918412, acc1: 0.000000, acc5: 0.000000
batch: 110, loss: 6.880023, acc1: 0.000000, acc5: 0.000000
batch: 120, loss: 6.936709, acc1: 0.000000, acc5: 3.125000
batch: 130, loss: 6.936309, acc1: 0.000000, acc5: 0.000000
batch: 140, loss: 6.923660, acc1: 0.000000, acc5: 0.000000
batch: 150, loss: 6.924109, acc1: 0.000000, acc5: 0.000000
batch: 160, loss: 6.923644, acc1: 0.000000, acc5: 3.125000
...

TestTensorString failed on macOS

I noticed the failure because TravisCI failed on #191; TravisCI is configured to run on a macOS VM.

=== RUN   TestTensorString
    TestTensorString: tensor_test.go:84: 
        	Error Trace:	tensor_test.go:84
        	Error:      	Not equal: 
        	            	expected: " 1.0000  1.1000  1.2000\n 2.0000  3.0000  4.0000\n[ CPUFloatType{2,3} ]"
        	            	actual  : "   0.0141    2.0000  512.0001\n   0.0000    0.0000    0.0000\n[ CPUDoubleType{2,3} ]"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,3 +1,3 @@
        	            	- 1.0000  1.1000  1.2000
        	            	- 2.0000  3.0000  4.0000
        	            	-[ CPUFloatType{2,3} ]
        	            	+   0.0141    2.0000  512.0001
        	            	+   0.0000    0.0000    0.0000
        	            	+[ CPUDoubleType{2,3} ]
        	Test:       	TestTensorString

Is MustNil safe?

The definition is here

gotorch/tensor.go

Lines 21 to 27 in 66df6a4

func MustNil(err unsafe.Pointer) {
	if err != nil {
		msg := C.GoString((*C.char)(err))
		C.FreeString((*C.char)(err))
		panic(msg)
	}
}

After converting the C pointer err into a Go string msg, we free err. Is msg still a valid Go object with its underlying C pointer freed?

MNIST training examples in GoTorch and LibTorch cannot converge to the same accuracy

The LibTorch MNIST example reached a loss of 0.0269 after 5 epochs:

Epoch 0, Loss: 0.1280
Epoch 1, Loss: 0.0659
Epoch 2, Loss: 0.0396
Epoch 3, Loss: 0.0304
Epoch 4, Loss: 0.0269

The GoTorch MNIST example only reached a loss of 1.4148 after 5 epochs:

2020/08/12 22:37:41 Epoch: 0, Loss: 4.8264
2020/08/12 22:37:46 Epoch: 1, Loss: 5.9624
2020/08/12 22:37:52 Epoch: 2, Loss: 2.4493
2020/08/12 22:37:58 Epoch: 3, Loss: 0.9619
2020/08/12 22:38:04 Epoch: 4, Loss: 1.4148

Why The Monad Pattern Looks Promising in Go+Torch Design

A monad is a programming pattern that records the output of each function call in a data structure so that we can free the outputs all at once afterward. It applies to many programming languages. Let us see why it is important to Go+Torch.

Go uses the pattern extensively; see https://www.innoq.com/en/blog/golang-errors-monads/ for an example.

Case Study 1: Free Tensors

We now allocate Tensor objects using new to keep the reference count in the shared_ptr field of the C++ Tensor class:

return new at::Tensor(std::move(c));

The Tensor objects newed would cause memory leak if we don't recycle them.

Assume that Go has a frontend API similar to C++'s; then, following the C++ MNIST example, let's think about the following problems.

Problem 1: Destruct Tensors Created In The Train Loop To Avoid Memory Leak

  1. Tensors Created In the C++ train loop:
    The train loop in mnist.cpp is like:

    for (auto& batch : data_loader) {
        auto data = batch.data.to(device), targets = batch.target.to(device);  // `data` and `targets` are `Tensor`s
        optimizer.zero_grad();
        auto output = model.forward(data);  // `output` is a `Tensor`
        auto loss = torch::nll_loss(output, targets);  // `loss` is a `Tensor`
        AT_ASSERT(!std::isnan(loss.template item<float>()));
        loss.backward();
        optimizer.step();
        //...
    }

    We can see that these Tensors have to be created:

    1. data and targets as the features and labels of the dataset
    2. output as the predictions of the data
    3. loss

    We can use defer to destruct Tensors in the train loop

    Because data, targets, output, and loss are all stack variables, they are created and destroyed in each iteration of the C++ train loop. This implies that the libtorch framework takes ownership of the Tensors if necessary. As a result, a naive GoTorch API can use defer to recycle the reference-counted Tensors. That is, the following imaginary code would work okay.

    // We need this nested function to make `defer` works as expected.
    func step(batch *Batch) {
        // `data`, `targets`, `output`, `loss` are `Tensor`s.
        data := batch.Data.To(device)
        defer data.Close()
        target := batch.Target.To(device)  
        defer target.Close()
        optimizer.zero_grad()
        output := model.Forward(data)
        defer output.Close()
        loss := torch.NllLoss(output, target)
        defer loss.Close()
        loss.Backward()
        optimizer.Step()
        // ...
    }
    for batch := range data_loader {
        step(batch)
    }

    The defers are a bit tedious; maybe we can improve the syntax of Go+ to save typing.

  2. Tensors Created In the C++ forward method
    The forward method is called by the train loop above; in the C++ MNIST example, it looks like:

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(torch::max_pool2d(conv1->forward(x), 2));
        x = torch::relu(
            torch::max_pool2d(conv2_drop->forward(conv2->forward(x)), 2));
        x = x.view({-1, 320});
        x = torch::relu(fc1->forward(x));
        x = torch::dropout(x, /*p=*/0.5, /*training=*/is_training());
        x = fc2->forward(x);
        return torch::log_softmax(x, /*dim=*/1);
      }

    We can use defer to destruct Tensors in the Forward function (in a tricky way)

    Similar to the train loop above, x is a Tensor on the stack and is destroyed at the end of the function scope. The difference is that x is reassigned multiple times, so we cannot simply use defer x.Close() here. A workaround is to require users to use a different idiom; for a naive example:

    func (net *Net) Forward(x torch.Tensor) torch.Tensor {  // The argument x is recycled in the train loop
        var tensors []torch.Tensor
        defer func () {
            for _, t := range tensors {
                t.Close()
            }
        }()
        x = torch.Relu(torch.MaxPool2d(net.conv1.Forward(x), 2))
        tensors = append(tensors, x)
        x = torch.Relu(
            torch.MaxPool2d(net.conv2_drop.Forward(net.conv2.Forward(x)), 2))
        tensors = append(tensors, x)
        x = x.View([]int{-1, 320})
        tensors = append(tensors, x)
        x = torch.Relu(net.fc1.Forward(x))
        tensors = append(tensors, x)
        x = torch.Dropout(x, /*p=*/0.5, /*training=*/net.IsTraining())
        tensors = append(tensors, x)
        x = net.fc2.Forward(x)
        tensors = append(tensors, x)
        return torch.LogSoftmax(x, /*dim=*/1)  // The return value is recycled in the train loop
      }

    Obviously, this is not very elegant.

    Should we bookkeep the Tensors in C++?

    A better way might be to keep the tensors array in C++ rather than in Go. For example, we can use a std::vector to record each C++ Tensor created by the Go API and provide a torch.CleanTensors for users to call at the end of the train loop. However, this solution is harder to design properly; for example, we have to take goroutines into consideration to avoid corrupting the std::vector.

Case Study 2: Record Errors

Few functions in libtorch have the noexcept tag. This implies that most of the functions in C++ may throw an exception. We have to expose an error return type for these functions' wrappers in Go. Recall the step function above:

func step(batch *Batch) {
    // `data`, `targets`, `output`, `loss` are `Tensor`s.
    data := batch.Data.To(device)
    defer data.Close()
    // ...
}

It may become the following in production code:

func step(batch *Batch) error {
    // `data`, `targets`, `output`, `loss` are `Tensor`s.
    data, err := batch.Data.To(device)
    if err != nil {
        return ...
    }
    defer data.Close()
    // ...
}

That is, the user should check whether there's an error on each line. This may be tedious too. Go+ has a neat syntax to unwrap errors, but I cannot think of an elegant way to solve the problem for the time being. See also the previous discussions: goplus/gop#307 (comment), goplus/gop#307 (comment)

Comparison of Implementations of MobileNetV2 in Go and Python

MobileNetV2 (Inverted Residuals and Linear Bottlenecks) is a vision model for mobile devices.
PyTorch has official implementations in both Python (mobilenet.py) and C++ (mobilenet.h, mobilenet.cpp), with about the same amount of code (162 lines vs. 185 lines).

The following listing compares the Go version and the Python version; as expected, they have about the same amount of code, too:

The Go version (174 lines):
package vision

import (
	"math"
	"torch"
	"torch/nn"
	"torch/nn/init"
)

func max(x, y int64) int64 { // math.max only works on floats
	if x > y {
		return x
	}
	return y
}

func makeDivisible(value float64, divisor int64, minValue *int64) int64 {
	if minValue == nil {
		minValue = &divisor
	}
	newValue := max(*minValue, (int64(value+float64(divisor)/2)/divisor)*divisor)
	if float64(newValue) < 0.9*value {
		newValue += divisor
	}
	return newValue
}

type ConvBNReLU struct {
	nn.Sequential
}

func NewConvBNReLU(in_planes, out_planes, kernel_size, stride, groups int64) ConvBNReLU {
	ret := ConvBNReLU{nn.NewSequential()}
	padding := (kernel_size - 1) / 2
	options := nn.Conv2dOptions{in_planes, out_planes, kernel_size}
	ret.PushBack(nn.NewConv2d(
		options.stride(stride).padding(padding).groups(groups).bias(false)))
	ret.PushBack(nn.BatchNorm2d{out_planes})
	ret.PushBack(nn.Functional{nn.ReLU})
	return ret
}

func (net *ConvBNReLU) Forward(x torch.Tensor) torch.Tensor {
	return net.Sequential.Forward(x)
}

type MobileNetInvertedResidual struct {
	nn.Module
	stride        int64
	useResConnect bool
	conv          nn.Sequential
}

func NewMobileNetInvertedResidual(
	input, output, stride int64, expandRatio float64) MobileNetInvertedResidual {
	net := MobileNetInvertedResidual{
		Module:        nn.NewModule(),
		stride:        stride,
		useResConnect: stride == 1 && input == output,
		conv:          nn.NewSequential()}

	doubleCompare := func(a, b float64) bool {
		return math.Abs(a-b) < 1e-20
	}

	torch.CHECK(stride == 1 || stride == 2)
	hiddenDim := int64(math.Round(float64(input) * expandRatio))

	if !doubleCompare(expandRatio, 1) {
		net.conv.PushBack(NewConvBNReLU(input, hiddenDim, 1, 1, 1))
	}
	net.conv.PushBack(NewConvBNReLU(hiddenDim, hiddenDim, 3, stride, hiddenDim))
	options := nn.Conv2dOptions{hiddenDim, output, 1}
	net.conv.PushBack(nn.NewConv2d(options.stride(1).padding(0).bias(false)))

	net.RegisterModule("conv", net.conv)
	return net
}

func (net *MobileNetInvertedResidual) Forward(x torch.Tensor) torch.Tensor {
	if net.useResConnect {
		return net.Add(x, net.conv.Forward(x))
	}
	return net.conv.Forward(x)
}

type MobileNetV2 struct {
	nn.Module            // nn.Module is a monadic type
	lastChannel          int64
	features, classifier nn.Sequential
}

func NewMobileNetV2(
	numClasses int64,
	widthMult float64,
	invertedResidualSettings [][]int64,
	roundNearest int64) MobileNetV2 {
	net := MobileNetV2{
		Module:    nn.NewModule(),
		features:  nn.NewSequential(),
		classifier: nn.NewSequential()}
	var inputChannel int64 = 32
	var lastChannel int64 = 1280

	if invertedResidualSettings == nil || len(invertedResidualSettings) == 0 {
		invertedResidualSettings = [][]int64{
			// t, c, n, s
			{1, 16, 1, 1},
			{6, 24, 2, 2},
			{6, 32, 3, 2},
			{6, 64, 4, 2},
			{6, 96, 3, 1},
			{6, 160, 3, 2},
			{6, 320, 1, 1},
		}
	}

	torch.CHECK(
		len(invertedResidualSettings[0]) == 4,
		"inverted_residual_settings should contain 4-element vectors")

	inputChannel = makeDivisible(float64(inputChannel)*widthMult, roundNearest, nil)
	net.lastChannel =
		makeDivisible(float64(lastChannel)*math.Max(1.0, widthMult), roundNearest, nil)
	net.features.PushBack(NewConvBNReLU(3, inputChannel, 3, 2, 1))

	for _, setting := range invertedResidualSettings {
		outputChannel := makeDivisible(float64(setting[1])*widthMult, roundNearest, nil)

		for i := int64(0); i < setting[2]; i++ {
			stride := int64(1)
			if i == 0 {
				stride = setting[3]
			}
			net.features.PushBack(
				NewMobileNetInvertedResidual(
					inputChannel, outputChannel, stride, float64(setting[0])))
			inputChannel = outputChannel
		}
	}
	net.features.PushBack(NewConvBNReLU(inputChannel, net.lastChannel, 1, 1, 1))

	net.classifier.PushBack(nn.Dropout(0.2))
	net.classifier.PushBack(nn.Linear(net.lastChannel, numClasses))

	net.RegisterModule("features", net.features)
	net.RegisterModule("classifier", net.classifier)

	for _, module := range net.Modules(false) {
		switch M := module.(type) {
		case nn.Conv2d:
			init.KaimingNormal(M.Weight, 0, torch.kFanOut)
			if M.options.Bias {
				init.Zeros(M.Bias)
			}
		case nn.BatchNorm2d:
			init.Ones(M.Weight)
			init.Zeros(M.Bias)
		case nn.Linear:
			init.Normal(M.Weight, 0, 0.01)
			init.Zeros(M.Bias)
		}
	}
	return net
}

func (net *MobileNetV2) Forward(x torch.Tensor) torch.Tensor {
	x = net.features.Forward(x)
	x = net.Mean(x, []int{2, 3})
	x = net.classifier.Forward(x)
	return x
}
The Python version (162 lines):

from torch import nn


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_planes),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            norm_layer(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self,
                 num_classes=1000,
                 width_mult=1.0,
                 inverted_residual_setting=None,
                 round_nearest=8,
                 block=None,
                 norm_layer=None):
        """
        MobileNet V2 main class

        Args:
            num_classes (int): Number of classes
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            inverted_residual_setting: Network structure
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
            Set to 1 to turn off rounding
            block: Module specifying inverted residual building block for mobilenet
            norm_layer: Module specifying the normalization layer to use

        """
        super(MobileNetV2, self).__init__()

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        # only check the first element, assuming user knows t,c,n,s are required
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1, norm_layer=norm_layer))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)

        # building classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x):
        # This exists since TorchScript doesn't support inheritance, so the superclass method
        # (this one) needs to have a name other than `forward` that can be accessed in a subclass
        x = self.features(x)
        # Cannot use "squeeze" as batch-size can be 1 => must use reshape with x.shape[0]
        x = nn.functional.adaptive_avg_pool2d(x, 1).reshape(x.shape[0], -1)
        x = self.classifier(x)
        return x

    def forward(self, x):
        return self._forward_impl(x)

The C++ version has 185 lines because it strictly follows the 80-character line length limit. The amounts of C++, Go, and Python code are comparable.

Compare different frontend language training on MNIST dataset

Just as writing a program that prints "Hello World" is our first exercise in coding, training a model for handwriting recognition on the MNIST database is usually the first exercise in deep learning.

This issue compares how various front-end languages, C++, Go, Python, and Go+Torch, train such a model.

The C++ version:
#include <torch/torch.h>

#include <cstddef>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

struct Net: torch::nn::Module {
  Net()
      : conv1(torch::nn::Conv2dOptions(1, 10, /*kernel_size=*/5)),
        conv2(torch::nn::Conv2dOptions(10, 20, /*kernel_size=*/5)),
        dropout1(0.25),
        dropout2(0.5),
        fc1(320, 50),
        fc2(50, 10) {
    register_module("conv1", conv1);
    register_module("conv2", conv2);
    register_module("dropout1", dropout1);
    register_module("dropout2", dropout2);
    register_module("fc1", fc1);
    register_module("fc2", fc2);
  }

  torch::Tensor forward(torch::Tensor x) {
    x = conv1->forward(x);
    x = torch::relu(x);
    x = conv2->forward(x);
    x = torch::relu(x);
    x = torch::max_pool2d(x, 2);
    x = dropout1(x);
    x = torch::flatten(x, 1);
    x = fc1(x);
    x = torch::relu(x);
    x = dropout2(x);
    x = fc2(x);
    return torch::log_softmax(x, 1);
  }

  torch::nn::Conv2d conv1;
  torch::nn::Conv2d conv2;
  torch::nn::Dropout dropout1;
  torch::nn::Dropout dropout2;
  torch::nn::Linear fc1;
  torch::nn::Linear fc2;
};

auto main() -> int {
  Net model;
  model.train();
  auto sgd = torch::optim::SGD(
      model.parameters(), torch::optim::SGDOptions(0.01).momentum(0.5));
  sgd.zero_grad();
  auto data = torch::rand({2, 3, 224, 224});
  auto target = torch::randint(1, 10, {2, });
  auto output = model.forward(data);
  auto loss = torch::nll_loss(output, target);
  loss.backward();
  sgd.step();
  std::printf("Loss: %.6f", loss.template item<float>());
}
The Go version:

package main

import (
	"fmt"

	torch "github.com/wangkuiyi/gotorch"
)

type Net struct {
	torch.Module
	conv1 torch.Conv2d
	conv2 torch.Conv2d
	dropout1 torch.Dropout1
	dropout2 torch.Dropout2
	fc1 torch.Linear
	fc2 torch.Linear
}

func NewNet() *Net {
	n := &Net{
		Module:   torch.Module{},
		conv1:    torch.Conv2d(1, 10, 5),
		conv2:    torch.Conv2d(10, 20, 5),
		dropout1: torch.Dropout1(0.25),
		dropout2: torch.Dropout2(0.5),
		fc1:      torch.Linear(9216, 128),
		fc2:      torch.Linear(128, 10),
	}
	n.registerModule()
	return n
}

func (n Net) registerModule() {
	n.RegisterModule("conv1", n.conv1)
	n.RegisterModule("conv2", n.conv2)
	n.RegisterModule("dropout1", n.dropout1)
	n.RegisterModule("dropout2", n.dropout2)
	n.RegisterModule("fc1", n.fc1)
	n.RegisterModule("fc2", n.fc2)
}

func (n Net) Forward(x torch.Tensor) torch.Tensor {
	x = n.conv1.Forward(x)
	x = torch.Relu(x)
	x = n.conv2.Forward(x)
	x = torch.Relu(x)
	x = torch.MaxPool2d(x, 2)
	x = n.dropout1(x)
	x = torch.Flatten(x, 1)
	x = n.fc1(x)
	x = torch.Relu(x)
	x = n.dropout2(x)
	x = n.fc2(x)
	output := torch.LogSoftMax(x, 1)
	return output 
}

func main() {
	model := NewNet()
	model.Train()
	sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
	sgd.ZeroGrad()
	data := torch.Rand([]int64{2, 1, 28, 28})
	target := torch.RandInt(1, 10, []int64{2})
	output := model.Forward(data)
	loss := torch.NllLoss(output, target)
	loss.Backward()
	sgd.Step()
	fmt.Printf("Loss: %.6f\n", loss.Item())
}
The Python version:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output


model = Net()
model.train()
optimizer = optim.Adadelta(model.parameters(), lr=0.1)
data = torch.rand((2, 1, 28, 28))
target = torch.randint(1, 10, (2,))
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
print("Loss: {:.6f}".format(loss.item()))
The Go+Torch version:

package main
import (
	torch "github.com/wangkuiyi/gotorch"
)

type Net struct {
	torch.Module
	conv1 torch.Conv2d
	conv2 torch.Conv2d
	dropout1 torch.Dropout1
	dropout2 torch.Dropout2
	fc1 torch.Linear
	fc2 torch.Linear
}

func NewNet() *Net {
	n := &Net{
		Module:   torch.Module{},
		conv1:    torch.Conv2d(1, 10, 5),
		conv2:    torch.Conv2d(10, 20, 5),
		dropout1: torch.Dropout1(0.25),
		dropout2: torch.Dropout2(0.5),
		fc1:      torch.Linear(9216, 128),
		fc2:      torch.Linear(128, 10),
	}
	return n
}

func (n Net) Forward(x torch.Tensor) torch.Tensor {
	x = n.conv1.Forward(x)
	x = torch.Relu(x)
	x = n.conv2.Forward(x)
	x = torch.Relu(x)
	x = torch.MaxPool2d(x, 2)
	x = n.dropout1(x)
	x = torch.Flatten(x, 1)
	x = n.fc1(x)
	x = torch.Relu(x)
	x = n.dropout2(x)
	x = n.fc2(x)
	output := torch.LogSoftMax(x, 1)
	return output 
}

model := NewNet()
model.Train()
sgd := torch.NewSGD(model.Parameters(), 0.01, 0.5)
sgd.ZeroGrad()
data := torch.Rand({2, 1, 28, 28})
target := torch.RandInt(1, 10, {2})
output := model.Forward(data)
loss := torch.NllLoss(output, target)
loss.Backward()
sgd.Step()
println("Loss:", loss.Item())

Simplify MNIST example codes with Go+

  • Case 1: type inference

Go code:

x = torch.View(x, []int64{-1, 28 * 28})

Go+ code:

x = torch.View(x, {-1, 28 * 28})
  • Case 2: default parameter value

Go code:

loss := F.NllLoss(pred, target, torch.Tensor{}, -100, "mean")

Go+ code:

loss := F.NllLoss(pred, target)

  • Case 3: range loop

Go code:

for epoch := 0; epoch < epochs; epoch++ {
}

Go+ code:

for epoch <- range(epochs) {
}

MNIST example throws exceptions due to no dataset

=== RUN   ExampleMNIST
libc++abi.dylib: terminating with uncaught exception of type c10::Error: Error opening images file at ./data/train-images-idx3-ubyte (read_images at ../torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x477ff47 in libc10.dylib)
frame #1: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 3018 (0xb2d046a in libtorch_cpu.dylib)
frame #2: MNIST + 70 (0x465c1e6 in libcgotorch.so)
frame #3: _cgo_2879cf5c9dd9_Cfunc_MNIST + 29 (0x42a0a2d in gotorch.test)
frame #4: runtime.asmcgocall + 112 (0x4067760 in gotorch.test)

It seems that we need to download the dataset and unpack it to ./data.

`NewFunctional` only accepts functions with `func(torch.Tensor) torch.Tensor` type

NewFunctional only accepts functions with func(torch.Tensor) torch.Tensor type.

We can write the following code:

nn.NewFunctional(torch.Tanh)

However, LeakyRelu takes two input parameters.

func LeakyRelu(t Tensor, negativeSlope float64) Tensor {
	return t.LeakyRelu(negativeSlope)
}

We cannot write the following code directly:

nn.NewFunctional(torch.LeakyRelu(0.2))

Maybe we should borrow more features from functional programming languages, like currying in Haskell.

With currying, torch.LeakyRelu(0.2) would return a function of type func(torch.Tensor) torch.Tensor. Then, it would work well with NewFunctional; in the meantime, a closure-based workaround is sketched below.

There is also a project maxsz/curry which provides a way to support currying in Go.
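A minimal sketch of that closure workaround, assuming nn.NewFunctional accepts a func(torch.Tensor) torch.Tensor; LeakyReluWith is a hypothetical helper name, not an existing API.

package main

import (
	torch "github.com/wangkuiyi/gotorch"
	nn "github.com/wangkuiyi/gotorch/nn"
)

// LeakyReluWith binds negativeSlope and returns a unary tensor function.
func LeakyReluWith(negativeSlope float64) func(torch.Tensor) torch.Tensor {
	return func(t torch.Tensor) torch.Tensor {
		return torch.LeakyRelu(t, negativeSlope)
	}
}

func main() {
	_ = nn.NewFunctional(LeakyReluWith(0.2)) // usable wherever torch.Tanh is
}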

Do we really need Module.Init?

From the definition below, I understand that the only purpose of having user-defined modules call Module.Init in their constructors is to let each sub-module know about its parent.

gotorch/nn/module.go

Lines 46 to 64 in 047d424

func (m *Module) Init(outer IModule) {
	if m.outer != nil {
		return
	}
	moduleType := reflect.TypeOf(m).Elem()
	fv := reflect.ValueOf(outer).Elem()
	for i := 0; i < fv.NumField(); i++ {
		v := fv.Field(i)
		f := fv.Type().Field(i)
		if f.Type == moduleType && f.Name == moduleType.Name() {
			if v.Addr() == reflect.ValueOf(m) {
				// Calling Init in a valid Module: struct{*Module} or struct{Module}
				m.outer = outer
				m.isTraining = true
			}
		}
	}
	torchCheck(m.outer != nil, "GoTorch requires defining modules via embedding a `Module` struct by value")
}

Is this purpose due to the requirement that when the user calls a module's To or ZeroGrad method, we can trace up to the top ancestor of the sub-module hierarchy and make sure that all modules in the hierarchy move to the specified device or have all parameter gradients cleared?

If this is the reasoning behind Module.Init, I am afraid that the implementations of To and ZeroGrad do not trace up to the root; instead, I see them simply use m.outer.

gotorch/nn/module.go

Lines 97 to 101 in 047d424

func (m *Module) To(device torch.Device) {
	// TODO(shendiaomo): to be implemented after the `To` method of `Tensors` is ready
	moduleType := reflect.TypeOf((*IModule)(nil)).Elem()
	tensorType := reflect.TypeOf((*torch.Tensor)(nil)).Elem()
	sv := reflect.ValueOf(m.outer).Elem() // Elem gets what the pointer points to.

Port more PyTorch modules

By grepping the official DCGAN example program, we see that the following modules need to be ported before we can run DCGAN with GoTorch (a sketch of the generator built from them follows the grep output below).

  • nn.BCELoss
  • nn.BatchNorm2d
  • nn.Conv2d
  • nn.ConvTranspose2d
  • nn.LeakyReLU
  • nn.ReLU
  • nn.Sequential
  • nn.Sigmoid
  • nn.Tanh
$ curl -Ls https://raw.githubusercontent.com/pytorch/examples/master/dcgan/main.py | grep 'nn\.'
import torch.nn.parallel
cudnn.benchmark = True
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)
class Generator(nn.Module):
        self.main = nn.Sequential(
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(    ngf,      nc, 4, 2, 1, bias=False),
            nn.Tanh()
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
class Discriminator(nn.Module):
        self.main = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
criterion = nn.BCELoss()
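For reference, here is a hedged sketch of what the generator above might look like once these modules are ported; every constructor name, signature, and the return type below are assumptions mirroring the Python code, not an existing GoTorch API.

package dcgan

import nn "github.com/wangkuiyi/gotorch/nn"

// NewGenerator assembles the DCGAN generator; nz, ngf, and nc follow the Python example.
func NewGenerator(nz, ngf, nc int64) *nn.SequentialModule {
	g := nn.NewSequential()
	g.PushBack(nn.ConvTranspose2d(nz, ngf*8, 4, 1, 0, false))
	g.PushBack(nn.BatchNorm2d(ngf * 8))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf*8, ngf*4, 4, 2, 1, false))
	g.PushBack(nn.BatchNorm2d(ngf * 4))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf*4, ngf*2, 4, 2, 1, false))
	g.PushBack(nn.BatchNorm2d(ngf * 2))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf*2, ngf, 4, 2, 1, false))
	g.PushBack(nn.BatchNorm2d(ngf))
	g.PushBack(nn.ReLU(true))
	g.PushBack(nn.ConvTranspose2d(ngf, nc, 4, 2, 1, false))
	g.PushBack(nn.Tanh())
	return g
}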

Tune function rangeI

The current definition is here

func rangeI(n int64) []int64 {
	res := []int64{}
	if n <= 0 {
		return res
	}
	for i := int64(0); i < n; i++ {
		res = append(res, i)
	}
	return res
}

I noticed some issues with this function.

  1. The frequent calls to append might be expensive. Each call might reallocate and copy the existing slice data. Instead, we can make([]int64, n) and then write each generated element into the right place, as in the sketch after this list.

  2. This function would crash the program if the parameter n is too large. It would be safer to add a check to panic if n is larger than a threshold.
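A minimal sketch of both suggestions, preallocating with make and guarding against a huge n; the threshold constant is a hypothetical value for illustration.

func rangeI(n int64) []int64 {
	const maxN = int64(1) << 32 // hypothetical safety threshold
	if n <= 0 {
		return []int64{}
	}
	if n > maxN {
		panic("rangeI: n is too large")
	}
	res := make([]int64, n)
	for i := range res {
		res[i] = int64(i) // write each element into its final place; no reallocation
	}
	return res
}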

Crash on an invalid MNIST dataset root folder

As described in the title, the crash logs are as follows:

=== RUN   TestExampleMNIST
libc++abi.dylib: terminating with uncaught exception of type c10::Error: Error opening images file at ./unsdsdf/train-images-idx3-ubyte (read_images at ../torch/csrc/api/src/data/datasets/mnist.cpp:66)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x4759f47 in libc10.dylib)
frame #1: torch::data::datasets::MNIST::MNIST(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, torch::data::datasets::MNIST::Mode) + 3018 (0xc8ab46a in libtorch_cpu.dylib)
frame #2: MNIST + 73 (0x4630859 in libcgotorch.so)
frame #3: _cgo_e3c33f78a9c2_Cfunc_MNIST + 29 (0x42920dd in gotorch.test)
frame #4: runtime.asmcgocall + 112 (0x405efa0 in gotorch.test)

SIGABRT: abort
PC=0x7fff6a37533a m=0 sigcode=0

goroutine 0 [idle]:
runtime: unknown pc 0x7fff6a37533a
stack: frame={sp:0x7ffeefbfec88, fp:0x0} stack=[0x7ffeefb80718,0x7ffeefbff780)
00007ffeefbfeb88:  00007ffeefbff1a0  00007ffeefbfebb0
00007ffeefbfeb98:  0000000009900acb  0000000000000000
00007ffeefbfeba8:  0000000000000041  00007ffeefbff130
00007ffeefbfebb8:  00007ffeefbfebf0  000000000000037f
00007ffeefbfebc8:  0000000000000000  0000000032aaaba2
00007ffeefbfebd8:  0000000000000000  0000000000000000
00007ffeefbfebe8:  00007ffeefbfed20  0000000000000000
....

GPU memory profiling

We are trying to compare the GPU memory consumption of GoTorch and PyTorch with the ResNet50 model. The scripts are located at https://github.com/wangkuiyi/gotorch/tree/develop/example/resnet.

The GPU card is a P100 with 16 GB of memory.

Experiment 1:

The following result is measured with the nvidia-smi command.

|         | Only Forward | Forward and Backward |
|---------|--------------|----------------------|
| PyTorch | 3719 MiB     | 2545 MiB             |
| GoTorch | 2447 MiB     | 2767 MiB             |

We removed three lines of code for the Only Forward scenario:

# optimizer.zero_grad()
# loss.backward()
# optimizer.step()

Experiment 2:

GPU memory with different batch size:

| Batch Size | 16       | 128       | 160       |
|------------|----------|-----------|-----------|
| PyTorch    | 2545 MiB | 13161 MiB | 15295 MiB |
| GoTorch    | 2767 MiB | 14755 MiB | OOM       |
