
llvm / torch-mlir


The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.

License: Other

Languages: CMake 0.76%, C++ 49.37%, C 0.38%, Shell 0.65%, Python 23.09%, MLIR 16.32%, Starlark 0.47%, Dockerfile 0.04%, PowerShell 0.02%, Jupyter Notebook 8.90%
Topics: pytorch, compiler, mlir

torch-mlir's People

Contributors

aartbik, antoniojkim, asaadaldien, ashay, cathyzhyi, dan-garvey, dellis23, gpetters94, henrytwo, makslevental, mgehre-amd, newling, penguin-wwy, powderluv, qedawkins, qingyunqu, ramiro050, renxida, rsuderman, shukla-gaurav, silvasean, sjain-stanford, sjarus, stellaraccident, stephenneuendorffer, vivekkhandelwal1, vremold, xinyu302, zhekunz2, zjgarvey


torch-mlir's Issues

Cannot export model with Adadelta

Here is a simple Python script that reproduces the issue.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

torch_mlir.debug_trace_to_stderr()

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x = self.conv1(x)
      output = F.log_softmax(x, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
criterion = torch.nn.NLLLoss()
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, Cout)
optimizer = torch.optim.Adadelta(model.parameters(), lr=1e-3)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("adadelta_test", [inputs, target]) as f:
  optimizer.zero_grad()
  loss = criterion(model(inputs), target)
  loss.backward()
  optimizer.step()
  f.returns([loss])
mb.module.operation.print(large_elements_limit=2)

When I run this, I get the following output.

TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_log_softmax(Tensor self, int dim, bool half_to_float) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::nll_loss2d_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, Tensor self) -> (Tensor)
TORCH_MLIR TRACE: mkldnn_convolution_backward dispatch: aten::mkldnn_convolution_backward(Tensor self, Tensor grad_output, Tensor weight, int[] padding, int[] stride, int[] dilation, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
TORCH_MLIR TRACE: copy_ dispatch: aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
TORCH_MLIR TRACE: copy_ dispatch: aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::zero_(Tensor(a!) self) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::zero_(Tensor(a!) self) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::addcmul.out(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1, Tensor(a!) out) -> (Tensor(a!))
Traceback (most recent call last):
  File "models/conv2d.py", line 34, in <module>
    optimizer.step()
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/optim/adadelta.py", line 74, in step
    square_avg.mul_(rho).addcmul_(grad, grad, value=1 - rho)
RuntimeError: isTensor() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/core/ivalue_inl.h":130, please report a bug to PyTorch. Expected Tensor but got Double

Any ideas on what could be causing this?

Generate better docs for ATenOps.td

In general, the docs for these ops are a bit terse right now; can you revisit them? Linking to an official documentation page might be fine as well.

Originally posted by @joker-eph in #16

MLIR generated using saved/cached loss tensors instead of loss operand

Aim: trace the computation graph of one training iteration from PyTorch to MLIR using ModuleBuilder.

Models tried: a simple conv MNIST model, DLRM, and a very simple 2-layer fully connected network.

Current status: the forward pass works great, but the backward pass uses the loss tensors generated during the trace instead of the loss operand.

How to spot it:
The operand assigned to the computed loss is used only at its assignment and at the function return.

Sample example:
The Python script and the generated MLIR file are attached:
simple_example.zip
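
Since the script is only in the attached zip, here is a minimal hedged sketch of the training-step capture being described (the model, names, and shapes are illustrative, not from the zip; the ModuleBuilder usage mirrors the other issues in this tracker):

import torch
import torch_mlir

# Hypothetical 2-layer fully connected network, per the "models tried" list above.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4))
inputs = torch.ones(3, 8)
target = torch.empty(3, dtype=torch.long).random_(0, 4)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("train_step", [inputs, target]) as f:
    optimizer.zero_grad()
    loss = criterion(model(inputs), target)
    loss.backward()   # per the issue, the backward pass reuses the traced loss tensor
    optimizer.step()
    f.returns([loss])
mb.module.operation.print(large_elements_limit=2)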

Thanks, y'all :)

Torch Lowering Path Question

Hello guys,

I have been reading the code for the graph lowering path; basically, the path is as follows:
TorchScript -> translate into the Torch MLIR dialect -> lower into Linalg (computation ops) / Std (basic ops) / SCF (loops/control flow) -> call the IREE backend

Could someone give me the basic rationale for this path? Why lower directly to Linalg instead of to HLO, given that HLO has reusable optimization passes such as operation fusion?

Thanks in advance,
Yang

Embedding Bag Tracing bug

easy_emb.zip
Hey everybody,

I am trying to trace a network with an embedding bag in it, but I found a bug during the backward pass (aside from the caching of tensors). When computing the gradient, it does index_add -> cumsum_ -> resize and then index_select, but I think it is missing a step that reduces the value after cumsum_ by 1: when it does the index_select, it goes one past the size of the vector it is accessing. I have a unit test attached.

To generate the MLIR:

cd easy_emb
python emb.py
vim (or your favourite editor) embedding.mlir

Or just open the pre-generated file inside the zip and look especially at lines 59-66.
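
The attached emb.py is not inlined here; a minimal hedged sketch of this kind of repro (shapes and names are hypothetical, using the capture API from the other issues) could look like:

import torch
import torch_mlir

emb = torch.nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="sum")
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("emb_test", [indices, offsets]) as f:
    loss = emb(indices, offsets).sum()
    loss.backward()  # the index_add -> cumsum_ -> index_select path runs here
    f.returns([loss])
mb.module.operation.print(large_elements_limit=2)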

CMake error after recent update

https://buildkite.com/iree/mlir-npcomp-standalone/builds/27#6fba062f-0357-4195-9d16-4daa9f651d75

CMake Error at /work/install/llvm-project/mlir-generic-rtti/lib/cmake/mlir/AddMLIR.cmake:187 (get_target_property):
  INTERFACE_LIBRARY targets may only have whitelisted properties.  The
  property "LINK_LIBRARIES" is not allowed.
Call Stack (most recent call first):
  /work/install/llvm-project/mlir-generic-rtti/lib/cmake/mlir/AddMLIR.cmake:213 (mlir_check_link_libraries)
  lib/Python/CMakeLists.txt:54 (mlir_check_all_link_libraries)

Remove use of language features > 3.6

Some 3.7+ language features snuck in and should be removed.

Traceback (most recent call last):
  File "/work/.mmrepo/universe/github.com/llvm/mlir-npcomp.git/test/Python/Compiler/comparisons.py", line 3, in <module>
    from npcomp.compiler import test_config
  File "/work/build/npcomp_default/python/npcomp/__init__.py", line 5, in <module>
    from . import tracing
  File "/work/build/npcomp_default/python/npcomp/tracing/__init__.py", line 3, in <module>
    from .mlir_trace import *
  File "/work/build/npcomp_default/python/npcomp/tracing/mlir_trace.py", line 15, in <module>
    from npcomp.tracing.emitters import *
  File "/work/build/npcomp_default/python/npcomp/tracing/emitters.py", line 22, in <module>
    defaults=(TraceValueType.NDARRAY,))):
TypeError: namedtuple() got an unexpected keyword argument 'defaults'

The CI runs with python 3.6.
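
For context, the defaults keyword of namedtuple() was added in Python 3.7. A 3.6-compatible sketch of the same thing (the type name and fields below are illustrative, not the actual emitters.py code):

from collections import namedtuple

# Python 3.7+ only:
#   TraceValue = namedtuple("TraceValue", ["value", "type"], defaults=("NDARRAY",))
# 3.6-compatible equivalent: assign defaults to __new__ after creation.
TraceValue = namedtuple("TraceValue", ["value", "type"])
TraceValue.__new__.__defaults__ = ("NDARRAY",)  # defaults bind to trailing fields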

Specific test for bool const tensors

I suspect that the AcapDispatch code for materializing a const bool tensor may have some issues, but we lack the facilities to exercise it properly. Add a test specifically for this at the appropriate point.
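
A hedged sketch of what such a test might look like (assuming the ModuleBuilder capture API used elsewhere in this tracker; the op choice is illustrative):

import torch
import torch_mlir

mb = torch_mlir.ModuleBuilder()
lhs = torch.tensor([True, False, True])
with mb.capture_function("bool_const", [lhs]) as f:
    # The inline bool tensor should be materialized as a constant by the
    # capture machinery, exercising the const-bool-tensor path.
    f.returns([lhs & torch.tensor([True, True, False])])
print(mb.module)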

Convert ATen.td to "let results =" style

We generally prefer the "let results =" style over Results inheritance, especially since you already use the let form for arguments.

Since the form you have is consistent throughout the file, let's not change it now. We can clean it up to the let form in a follow-up if desired.

Originally posted by @stellaraccident in #16

From _torch_mlir import _get_mlir ERROR

I have successfully built LLVM, MLIR, and the PyTorch frontend, generating the _mlir and _torch_mlir *.so files, and I have added them to PYTHONPATH. But when I test, I get the following error:

Import Error

 import npcomp.frontends.pytorch as torch_mlir
  File "mlir-npcomp/build/python/npcomp/frontends/pytorch/__init__.py", line 8, in <module>
    from _torch_mlir import _get_mlir
ImportError: cannot import name '_get_mlir' from '_torch_mlir' (mlir-npcomp/build/python/_torch_mlir.cpython-37m-x86_64-linux-gnu.so)

The _torch_mlir module itself imports successfully in Python; I have tried help(_torch_mlir) and type(_torch_mlir), but the printed results do not show a _get_mlir function.

Code

import npcomp.frontends.pytorch as torch_mlir

dev = torch_mlir.mlir_device()
t0 = torch.randn((4,4), device=dev)
t1 = torch.randn((4,4)).to(dev)
t2 = t0 + t1
t2_mlir = torch_mlir.get_mlir( t2 )
t2_cpu = t2.to('cpu')

No module named 'torch_mlir'

Hello!

I am just trying to go through the build instructions in the README. I built the PyTorch frontend in a docker container following the instructions and then installed IREE via pip3 install. But when I try to run the e2e test targeting the IREE backend, I get the following error:

root@0e8aafd709c6:/src/mlir-npcomp# python frontends/pytorch/e2e_testing/torchscript/main.py --config=iree
Traceback (most recent call last):
  File "frontends/pytorch/e2e_testing/torchscript/main.py", line 9, in <module>
    from torch_mlir.torchscript.e2e_test.framework import run_tests
ModuleNotFoundError: No module named 'torch_mlir'

Could anyone help me? I think I didn't miss any instructions in the README...
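
In other reports here, the fix was putting the build's Python output directories on PYTHONPATH (e.g. PYTHONPATH=build/python:build/frontends/pytorch/csrc, as in the _get_mlir issue below). A quick hedged check from inside Python (the path is hypothetical; adjust to your build layout):

import sys
# Hypothetical build output location; point this at wherever your build
# placed the Python packages.
sys.path.append("/src/mlir-npcomp/build/python")
import torch_mlir  # should now resolve if the frontend was built there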

Problems with Torch graph bindings

Running into an issue

silvasean@silvasean0:~/pg/mlir-npcomp/mlir-npcomp$ source .env; python frontends/pytorch/test/graph_export/test_script_add3.py
Traceback (most recent call last):
  File "frontends/pytorch/test/graph_export/test_script_add3.py", line 21, in <module>
    def add3(t0, t1, t2):
TypeError: import_function(): incompatible function arguments. The following argument types are supported:
    1. (self: _torch_mlir.ModuleBuilder, arg0: torch::jit::StrongFunctionPtr) -> torch::jit::StrongFunctionPtr

Invoked with: <_torch_mlir.ModuleBuilder object at 0x7f81c3e8d0b0>, <torch.jit.ScriptFunction object at 0x7f81a8cb36d0>
  1. For some reason, on my system pybind11.h is in two places, and despite our efforts in the CMakeLists.txt file, the non-torch one seems to get used for the torch bindings:

     /usr/local/google/home/silvasean/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h
     /usr/local/google/home/silvasean/.local/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h

  2. Even after I fix that by manually commenting out the necessary CMakeLists.txt lines, I get the same issue. Unclear why.

Cannot export convolution with loss function

I created a simple example as a stepping stone towards the backward pass. It builds on the conv2d forward pass and just adds the negative log likelihood loss.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x = self.conv1(x)
      output = F.log_softmax(x, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
loss = torch.nn.NLLLoss()
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, Cout)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("resa", [inputs]) as f:
  #f.returns([model(inputs)])                  # This works
  f.returns([loss(model(inputs), target)])      # This does not work
mb.module.operation.print(large_elements_limit=2)

When I try to run this on 30adf9e, I get the following error.

Traceback (most recent call last):
  File "models/conv2d.py", line 29, in <module>
    f.returns([loss(model(inputs), target)])      # This does not work
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 213, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/functional.py", line 2237, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: unsupported PyTorch scalar type: UNKNOWN_SCALAR

Enable CI for PyTorch frontend

The instructions in the README and docker image are now up to date. It would be nice to get the CI going for it. I'm not entirely certain how to adapt the LLVM install caching to building within a container.

We might also want to wait until we get closer to PyTorch head: I suspect we'll then be successful just installing an appropriate version and building against it (and can forgo the container in the CI).

Make enabling asan easier

Dumping here the repro steps that got ASan working on an Ubuntu 20.04 system. The default way LLVM handles this does not seem to play well with shared libraries.

# Clear any inherited compiler selection so the default toolchain is used.
unset CC
unset CXX
export CC
export CXX

# Configure and build MLIR and npcomp with the address sanitizer enabled.
./build_tools/install_mlir.sh -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DLLVM_USE_SANITIZER=Address
./build_tools/cmake_configure.sh -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DLLVM_USE_SANITIZER=Address
cd build
ninja

# Disable leak detection (too noisy here) and preload the ASan runtime so the
# unsanitized python interpreter can load the sanitized shared libraries.
export LSAN_OPTIONS=detect_leaks=0
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libasan.so.5
ninja check
ninja check-frontends-pytorch

The issue with naively attempting it was as noted here: google/sanitizers#796 (comment)

I suspect that when building with clang, there should be a -shared-libsan on appropriate link lines.

Boxed kernel assertions

RuntimeError: false INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/core/boxing/impl/boxing.h":48, please report a bug to PyTorch. Tried to call KernelFunction::call() for a kernel that only has a boxed kernel and doesn't support calling from an unboxed API yet.

When hooking the dispatcher, not all kernels are supported via registration of a boxed fallback kernel; we need to special-case or skip many of these. Moving the offending op out of the capture closure usually gets things moving for now.
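
A hedged sketch of that workaround (the op, names, and shapes here are hypothetical, not from the original report): precompute anything that trips a boxed-only kernel before entering the capture context, so only capture-friendly ops run under the dispatcher hook.

import torch
import torch_mlir

inputs = torch.ones(3, 4)
# Hypothetical: if this op only had a boxed kernel, computing it eagerly
# *outside* capture_function avoids the unboxed-call assertion during capture.
scale = torch.full((3, 4), 0.5)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("scaled", [inputs]) as f:
    f.returns([inputs * scale])
mb.module.operation.print(large_elements_limit=2)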

Use `BUILD_SHARED_LIBS` instead of `LLVM_BUILD_LLVM_DYLIB`

On llvm/circt#767, they were experiencing some of the same issues we have intermittently observed: TypeID multiple-definition problems that make equality of types depend on which shared library performs the check.

It is my belief that the way libLLVM.so/libMLIR.so/libNPCOMP.so are "over-linked" (and order-inverted on the link command line) creates the conditions for this kind of issue to surface (although I have never managed to nail it down to a specific smoking gun; it is more of a "that is clearly not the right way to do it and would easily cause this kind of issue" judgment).

I suggested that @mikeurbach try the BUILD_SHARED_LIBS mode because it gets the shared-library layering correct, and this resolved the mismatch for them. I suggest we switch npcomp to the same regime and remove support for linking against libMLIR.so. I am very slowly trying to complete https://reviews.llvm.org/D94387, which should fix the situation for the aggregate dylib linking modes and would be nice for an eventual production release. In that new world, the dylib building mode is a specialization of BUILD_SHARED_LIBS, so we would need to switch locally regardless.

Legitimate failures when lowering pytorch to std (9 test failures)

Failed Tests (9):
  FRONTENDS_PYTORCH :: test_export_ResA.py
  FRONTENDS_PYTORCH :: test_export_add3.py
  FRONTENDS_PYTORCH :: test_export_batchnorm.py
  FRONTENDS_PYTORCH :: test_export_conv2d_back.py
  FRONTENDS_PYTORCH :: test_export_multi_out.py
  FRONTENDS_PYTORCH :: test_export_resnet18.py
  FRONTENDS_PYTORCH :: test_export_vgg11.py
  FRONTENDS_PYTORCH :: test_op_report_conv2d.py
  FRONTENDS_PYTORCH :: test_op_report_vgg_style_lenet.py

Sample of errors:

  • error: 'aten.relu_' op operand #0 must be tensor of any type values, but got 'memref<32x64x32x32xf32>'
  • aten to loops conversion failed error: 'std.call' op 'native_batch_norm_4F32_1F32_1F32_4F32_1F32_1F32_1F32_1F32_out' does not reference a valid function
  • error: unsupported or non-LLVM operation: aten.constant
  • JIT session error: Symbols not found: [ _mlir_ciface_as_strided_1F32_4F32_out ]
  • error: 'aten.div' op operand #0 must be tensor of any type values, but got 'f32'

Error when running example scripts

Hi,

I am trying to run example scripts under frontends/pytorch/examples but I got two kinds of errors:

  1. When I run scripts using capture_function, I get an error (screenshot omitted). The same error occurs when I run cos_e2e.py, div_inplace_e2e.py, mm_e2e.py, mul_maximum_e2e.py, and tanh_out_e2e.py. Could someone tell me how to generate these op definitions and where to put them?

  2. Running the torchscript_**_e2e.py scripts gives me another error (screenshot omitted).

I am still learning the code, so this might be a trivial question, or maybe I just missed some steps?
I went through all the build steps in the README and successfully ran tools/torchscript_e2e_test.sh. I installed the IREE backend using pip.

Rename 'master' branch to 'main'

Following the work done in the rest of the org. This project is small enough that I don't think it requires special coordination. I'll make the change sometime in the next couple of days.

Error when building npcomp in docker

There are two problems I ran into today after fetching the latest code:

  1. When I run cmake --build /build/npcomp --target check-npcomp check-frontends-pytorch in the docker environment, following the README.md in the top directory, I get an error (screenshot omitted). This used to work, and there used to be directories under /build such as llvm-install, llvm-build, and npcomp. Now there is only /build/npcomp, and the directory is almost empty except for a .env file and a few newly created directories.

I noticed some changes in the build process in #251 and #258. Could someone update this README file to point to the correct build output directory?

  2. The second problem is probably related to the first: the command tools/torchscript_e2e_test.sh --config=iree shows that it cannot find the corresponding test package (screenshot omitted).

Thanks!

Cannot intercept aten::conv2d when dispatching through backend keys

When using a backend dispatch key (i.e. PrivateUse3), aten::conv2d calls are never recorded; however, when using AutogradPrivateUse3, they are (but this has other problems). conv is special in a number of ways, and we need to check with the PT devs on how to resolve this.

ModuleBuilder represents torch.cos with a constant

The MLIR generated for torch.cos via ModuleBuilder ignores %arg0. Instead, it returns a constant with the result of torch.kernel_call "aten::mm" %arg0 for the %arg0 used during capture_function.

This can be reproduced with the code below or via python3 frontends/pytorch/examples/cos_e2e.py after #134 is submitted.

import torch
import torch_mlir

torch.manual_seed(0)
input = torch.rand(2, 3)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("cos", [input]) as f:
  result = torch.cos(input)
  f.returns([result])

print(mb.module)

This prints:

module  {
  func @cos(%arg0: !numpy.ndarray<[2,3]:f32>) -> !numpy.ndarray<[2,3]:f32> {
    %cst = constant dense<[[0.879371106, 0.719147384, 0.996088445], [0.991296648, 0.953116595, 0.805617868]]> : tensor<2x3xf32>
    %0 = numpy.create_array_from_tensor %cst : (tensor<2x3xf32>) -> !numpy.ndarray<[2,3]:f32>
    return %0 : !numpy.ndarray<[2,3]:f32>
  }
}

Resnet 18 iree path

Hello,
I substituted the IREE backend for the refjit backend in torchscript_resnet18_e2e.py:

backend = iree.IreeNpcompBackend()
#backend = refjit.RefjitNpcompBackend()

I know this probably won't run successfully, but it was worth a try; I got an error (screenshot omitted).

I also ran the iree-translate command alone (I copied the invocation from the last line of the error output; I also dumped the input file using mb.module.operation.get_asm and f.write, then passed that file as iree-translate's parameter), which produced a second error (screenshot omitted).

My questions are:

  1. Are these two errors caused by the same underlying problem, just with different printouts, or am I invoking the iree-translate tool the wrong way?
  2. Since I am not very familiar with IREE (or with MLIR, which I am still studying), how do I debug this kind of problem in general?
  3. Is there any plan to pass the resnet18 test with the IREE backend?

Thanks in advance to anyone who helps me with these questions!

Participate in NPBench

First of all, awesome project!

It would be interesting to see the results of the NPComp infrastructure vs. other python compilers, such as Numba, on scientific python apps. NPBench has a wide variety of HPC and computational science apps written in numpy. It'd be great if you had an implementation/results there!

Building in the docker container: ninja: error: loading 'build.ninja': No such file or directory

I have followed the instructions:

  1. Set up the docker container. Done.
  2. In the docker container, do the command prep. Done, without a problem.
  3. Try either the vanilla compile or the PyTorch frontend compile. Running ./build_tools/cmake_configure.sh is fine, but then both give me:

ninja: error: loading 'build.ninja': No such file or directory

Do you have a recommendation for how this ninja error can be fixed?

Incorrect IR when using the cat operator

Here is a simple example showing the problem. The python code is shown below.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

torch_mlir.debug_trace_to_stderr()

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x0 = self.conv1(x)
      x1 = self.conv1(x)
      z = torch.cat([x0, x1])
      output = F.log_softmax(z, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
weight = torch.randn(Cout)
loss = torch.nn.NLLLoss()
target = torch.empty(2*N, 8, 8, dtype=torch.long).random_(0, Cout)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("cat_test", [inputs, target]) as f:
  result = loss(model(inputs), target)
  f.returns([result])
mb.module.operation.print(large_elements_limit=2)

This results in the following output.

TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_cat(Tensor[] tensors, int dim=0) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_log_softmax(Tensor self, int dim, bool half_to_float) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight)
module {
  func @resa(%arg0: !numpy.ndarray<[3,16,10,10]:f32>, %arg1: !numpy.ndarray<[6,8,8]:i64>) -> !numpy.ndarray<[]:f32> {
    %cst = constant opaque<"", "0xDEADBEEF"> : tensor<4x16x3x3xf32>
    %cst_0 = constant opaque<"", "0xDEADBEEF"> : tensor<4xf32>
    %c1_i64 = constant 1 : i64
    %c1_i64_1 = constant 1 : i64
    %0 = basicpy.build_list %c1_i64, %c1_i64_1 : (i64, i64) -> !basicpy.ListType
    %c0_i64 = constant 0 : i64
    %c0_i64_2 = constant 0 : i64
    %1 = basicpy.build_list %c0_i64, %c0_i64_2 : (i64, i64) -> !basicpy.ListType
    %c1_i64_3 = constant 1 : i64
    %c1_i64_4 = constant 1 : i64
    %2 = basicpy.build_list %c1_i64_3, %c1_i64_4 : (i64, i64) -> !basicpy.ListType
    %false = constant false
    %c0_i64_5 = constant 0 : i64
    %c0_i64_6 = constant 0 : i64
    %3 = basicpy.build_list %c0_i64_5, %c0_i64_6 : (i64, i64) -> !basicpy.ListType
    %c1_i64_7 = constant 1 : i64
    %c1_i64_8 = constant 1 : i64
    %c1_i64_9 = constant 1 : i64
    %4 = basicpy.build_list %c1_i64_8, %c1_i64_9 : (i64, i64) -> !basicpy.ListType
    %c0_i64_10 = constant 0 : i64
    %c0_i64_11 = constant 0 : i64
    %5 = basicpy.build_list %c0_i64_10, %c0_i64_11 : (i64, i64) -> !basicpy.ListType
    %c1_i64_12 = constant 1 : i64
    %c1_i64_13 = constant 1 : i64
    %6 = basicpy.build_list %c1_i64_12, %c1_i64_13 : (i64, i64) -> !basicpy.ListType
    %false_14 = constant false
    %c0_i64_15 = constant 0 : i64
    %c0_i64_16 = constant 0 : i64
    %7 = basicpy.build_list %c0_i64_15, %c0_i64_16 : (i64, i64) -> !basicpy.ListType
    %c1_i64_17 = constant 1 : i64
    %8 = basicpy.build_list %12, %13 : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
    %c0_i64_18 = constant 0 : i64
    %c1_i64_19 = constant 1 : i64
    %false_20 = constant false
    %9 = basicpy.singleton : !basicpy.NoneType
    %c1_i64_21 = constant 1 : i64
    %c-100_i64 = constant -100 : i64
    %10 = numpy.create_array_from_tensor %cst : (tensor<4x16x3x3xf32>) -> !numpy.ndarray<[4,16,3,3]:f32>
    %11 = numpy.create_array_from_tensor %cst_0 : (tensor<4xf32>) -> !numpy.ndarray<[4]:f32>
    %12 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %0, %1, %2, %false, %3, %c1_i64_7 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %13 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %4, %5, %6, %false_14, %7, %c1_i64_17 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %14 = torch.kernel_call "aten::_cat" %8, %c0_i64_18 : (!basicpy.ListType, i64) -> !numpy.ndarray<[6,4,8,8]:f32> {sigArgTypes = ["Tensor[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %15 = torch.kernel_call "aten::_log_softmax" %14, %c1_i64_19, %false_20 : (!numpy.ndarray<[6,4,8,8]:f32>, i64, i1) -> !numpy.ndarray<[6,4,8,8]:f32> {sigArgTypes = ["Tensor", "int", "bool"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %16:2 = torch.kernel_call "aten::nll_loss2d_forward" %15, %arg1, %9, %c1_i64_21, %c-100_i64 : (!numpy.ndarray<[6,4,8,8]:f32>, !numpy.ndarray<[6,8,8]:i64>, !basicpy.NoneType, i64, i64) -> (!numpy.ndarray<[]:f32>, !numpy.ndarray<[]:f32>) {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor", "Tensor"]}
    return %16#0 : !numpy.ndarray<[]:f32>
  }

The problem with this output, however, is here:

%8 = basicpy.build_list %12, %13 : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
...
%12 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %0, %1, %2, %false, %3, %c1_i64_7 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
%13 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %4, %5, %6, %false_14, %7, %c1_i64_17 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}

because the build_list op is referencing %12 and %13 when they don't exist yet. When I feed this to npcomp-opt, I get

conv2d.mlir:33:10: error: operand #0 does not dominate this use
    %8 = basicpy.build_list %12, %13 : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
         ^
conv2d.mlir:33:10: note: see current operation: %8 = "basicpy.build_list"(%12, %13) : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
conv2d.mlir:42:11: note: operand defined here
    %12 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %0, %1, %2, %false, %3, %c1_i64_7 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}

Manually moving the definition of %8 after the definition of %12 and %13 fixes the problem. This seems to suggest that when the build_list operator is being constructed it is not being inserted in the right location.

Error while building project - Variable not defined: 'Shape_ExtentTensorType'

I am trying to build mlir-npcomp and it fails with the following error:

[27/126] Building TCPOps.h.inc...
FAILED: include/npcomp/Dialect/TCP/IR/TCPOps.h.inc
.../mlir-npcomp/include/npcomp/Dialect/TCP/IR/TCPOps.td:42:50: error: Variable not defined: 'Shape_ExtentTensorType'
  let arguments = (ins AnyRankedTensor:$operand, Shape_ExtentTensorType:$shape);
                                                 ^
[28/126] Building TCPOps.cpp.inc...
FAILED: include/npcomp/Dialect/TCP/IR/TCPOps.cpp.inc
.../mlir-npcomp/include/npcomp/Dialect/TCP/IR/TCPOps.td:42:50: error: Variable not defined: 'Shape_ExtentTensorType'
  let arguments = (ins AnyRankedTensor:$operand, Shape_ExtentTensorType:$shape);
                                                 ^
[32/126] Building TCPOpsDialect.h.inc...

And further down it fails in aten_ops too; I am not sure if the above error is related to this one.

../frontends/pytorch/lib/aten_ops.cpp:107:19: error: no member named 'addmm' in namespace 'at::native'; did you mean 'addmv'?
  at::native::addmm(torch_a, torch_b, torch_c, alpha, beta).clone();
  ~~~~~~~~~~~~^~~~~
  addmv

Tracing for "add_" not working

Hey guys,

I tried updating to this patch, but when I trace with acap_dispatch, it still seems to use "add" instead of "add_". I have attached a small unit test and the output it generates: add_test.zip. Any help/comments would be appreciated! :)

Cannot export model with torch.arange

Below is a simple Python script that reproduces the issue.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

torch_mlir.debug_trace_to_stderr()

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x = self.conv1(x)
      indices = torch.arange(N)
      x = x[indices, :, :, :]
      output = F.log_softmax(x, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
loss = torch.nn.NLLLoss()
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, Cout)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("arange_test", [inputs, target]) as f:
  result = loss(model(inputs), target)
  result.backward()
  f.returns([result] + [p.grad for p in model.parameters()])
mb.module.operation.print(large_elements_limit=2)

When I try to run this, I get the following output.

TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::arange.start_out(Scalar start, Scalar end, Scalar step=1, *, Tensor(a!) out) -> (Tensor(a!))
Traceback (most recent call last):
  File "models/conv2d.py", line 32, in <module>
    result = loss(model(inputs), target)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "models/conv2d.py", line 20, in forward
    indices = torch.arange(N)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::arange.start_out.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, PrivateUse2, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2154 [kernel]
PrivateUse2: registered at /mlir-npcomp/frontends/pytorch/csrc/c10_dispatch/acap_dispatch.cpp:645 [backend fallback]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_1.cpp:10219 [kernel]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:527 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

The error states that there is no fallback kernel defined for this op (arange.start_out). But I noticed that type_dispatch/aten_mlir_type_default.cpp has a function RegisterAtenTypeFunctions that registers arange.start_out:

          .op(torch::RegisterOperators::options()
                  .schema("aten::arange.start_out(Scalar start, Scalar end, "
                          "Scalar step=1, *, Tensor(a!) out) -> Tensor(a!)")
                  .impl_unboxedOnlyKernel<at::Tensor &(at::Tensor &, at::Scalar,
                                                       at::Scalar, at::Scalar),
                                          &ATenMLIRTypeDefault::arange_out>(
                      at::TensorTypeId::XLATensorId)
                  .aliasAnalysis(c10::AliasAnalysisKind::FROM_SCHEMA))

But the error also states that I passed in an empty list of Tensors. Does that mean we need another definition of this op, or a change to how it handles optional arguments, given that I have only passed the end argument in the call above? Or do you think the root cause is something completely different?

Missing `_get_mlir` Python method binding in PyTorch Frontend

Importing the PyTorch frontend complains of a missing _get_mlir method in the _torch_mlir.so binary. I am currently on master (81dd571).

# Build script.
LLVM_VERSION=10
export CC=clang-$LLVM_VERSION
export CXX=clang++
export LDFLAGS=-fuse-ld=$(which ld.lld)

sh ./build_tools/install_mlir.sh
sh ./build_tools/cmake_configure.sh

# Build and run tests
cd build
ninja
ninja check-npcomp
# Produce import error.
PYTHONPATH=build/python:build/frontends/pytorch/csrc python
Python 3.8.5 (default, Jul 27 2020, 10:09:03)
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import npcomp.frontends.pytorch
2020-09-09 13:13:57.642115: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/python/npcomp/frontends/pytorch/__init__.py", line 8, in <module>
    from _torch_mlir import _get_mlir
ImportError: cannot import name '_get_mlir' from '_torch_mlir' (build/frontends/pytorch/csrc/_torch_mlir.so)

Bugs of tcf::ConvNCHWOp

Right now I'm learning the source code, and I saw the following code at TCFToLinalg.cpp:67:

auto heightPlusTwicePadding = builder.create<SubIOp>(op->getLoc(), height, twicePaddingHeight);

According to the PyTorch conv2d documentation, I think it should be:

auto heightPlusTwicePadding = builder.create<AddIOp>(op->getLoc(), height, twicePaddingHeight);

I've just started learning PyTorch and MLIR, so I'm not sure if I'm correct.
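
For reference, the output-height formula from the PyTorch Conv2d documentation adds the padding term, which is consistent with using AddIOp here (and with the variable name heightPlusTwicePadding):

Hout = floor((Hin + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0] + 1)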

Besides, I have another question: does the conversion (tcf::ConvNCHWOp -> linalg::ConvNCHWOp) support the padding, dilation, and stride parameters?

ninja check-npcomp failed

When I run ninja check-npcomp, there are some errors:

Testing Time: 2.29s
  Unsupported: 1
  Passed: 2
  Failed: 74
FAILED: test/CMakeFiles/check-npcomp-lit

Why?

RuntimeError: Unsupported capture value returned from kernel (Bool): True

In test_export_conv2d_back.py:

Traceback (most recent call last):
  File "/src/mlir-npcomp/frontends/pytorch/test/acap_export/test_export_conv2d_back.py", line 30, in <module>
    result = model(tensor)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 415, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Unsupported capture value returned from kernel (Bool): True

StatisticsOpInterface is ripe for refactoring

This looks like quite a heavy type to return here.
It isn't clear to me what this is doing right now, but taking a StringMap (or better, if possible, a DenseMap<StringRef, uint64_t>) as an output operand would avoid constant reallocation by reusing the same map over and over from the call site.

Originally posted by @joker-eph in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMjkyOTIwMTA4OnYy/pull_request_review_threads/discussion

Is it possible to build npcomp with a prebuilt llvm-project ?

Hi,

I'm thinking of contributing a build feature to the npcomp CMake build script and want some advice before doing it.

Here is our problem.

We developed a project, Foo, which uses MLIR's Python bindings from a modified llvm-project; this llvm-project is kept up to date with upstream and shipped to us as a prebuilt library.
Now we also want to use npcomp in our project, but the current build process needs to build LLVM from source and creates a bundled mlir package, so this just doesn't work for us.

I think npcomp should be able to work in a more decoupled way. What I want to do is:

  1. Build npcomp against a prebuilt llvm-project, with something like cmake -DLLVM_INSTALL_ROOT
  2. Re-arrange the code/files to remove "python_packages/npcomp_core/mlir", so that import mlir imports mlir from the prebuilt package and import npcomp imports just the things npcomp needs

Is it fine to do so?

Sincerely

Make required PyTorch version clear

With PyTorch nightly (1.9.0.dev20210216) installed via conda, I get the following error:

static_assert(c10::guts::false_t<Func>(), ".impl_UNBOXED(...) was removed. Please use .impl(...) instead.");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~
../frontends/pytorch/csrc/builder/acap_dispatch.cpp:596:5: note: in instantiation of function template specialization 'torch::Library::impl_UNBOXED<const char *, at::Tensor (const at::Tensor &, const at::Tensor &, const c10::optional<at::Tensor> &, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long)>' requested here
  m.impl_UNBOXED("convolution", &AcapController::convolutionKernel);

I could do a bisection but it would be better if you could point me to a version that is known to work. Thank you!
