llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

License: Other

CMake 0.76% C++ 49.37% C 0.38% Shell 0.65% Python 23.09% MLIR 16.32% Starlark 0.47% Dockerfile 0.04% PowerShell 0.02% Jupyter Notebook 8.90%

pytorch compiler mlir

torch-mlir's Issues

Cannot export model with Adadelta

Here is a simple Python script that reproduces the issue.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

torch_mlir.debug_trace_to_stderr()

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x = self.conv1(x)
      output = F.log_softmax(x, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
criterion = torch.nn.NLLLoss()
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, Cout)
optimizer = torch.optim.Adadelta(model.parameters(), lr=1e-3)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("adadelta_test", [inputs, target]) as f:
  optimizer.zero_grad()
  loss = criterion(model(inputs), target)
  loss.backward()
  optimizer.step()
  f.returns([loss])
mb.module.operation.print(large_elements_limit=2)

When I run this, I get the following output.

TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_log_softmax(Tensor self, int dim, bool half_to_float) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::nll_loss2d_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_log_softmax_backward_data(Tensor grad_output, Tensor output, int dim, Tensor self) -> (Tensor)
TORCH_MLIR TRACE: mkldnn_convolution_backward dispatch: aten::mkldnn_convolution_backward(Tensor self, Tensor grad_output, Tensor weight, int[] padding, int[] stride, int[] dilation, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
TORCH_MLIR TRACE: copy_ dispatch: aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
TORCH_MLIR TRACE: copy_ dispatch: aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::zero_(Tensor(a!) self) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::zero_(Tensor(a!) self) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::addcmul.out(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1, Tensor(a!) out) -> (Tensor(a!))
Traceback (most recent call last):
  File "models/conv2d.py", line 34, in <module>
    optimizer.step()
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/optim/adadelta.py", line 74, in step
    square_avg.mul_(rho).addcmul_(grad, grad, value=1 - rho)
RuntimeError: isTensor() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/core/ivalue_inl.h":130, please report a bug to PyTorch. Expected Tensor but got Double

Any ideas on what could be causing this?
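For orientation, the failing line in adadelta.py is the first step of the Adadelta update, where the Python floats rho and 1 - rho are passed to in-place tensor ops. The trace above shows aten::mul_.Tensor and aten::addcmul.out right before the assert, which is consistent with (but does not prove) a problem in how those scalar arguments are handled during capture. Below is a minimal pure-Python sketch of the update on a single scalar parameter. The function name adadelta_step is hypothetical and the code is an illustration of the math, not PyTorch's implementation.

```python
import math

def adadelta_step(param, grad, square_avg, acc_delta, lr=1.0, rho=0.9, eps=1e-6):
    """One Adadelta update on plain floats (illustrative only)."""
    # Same arithmetic as `square_avg.mul_(rho).addcmul_(grad, grad, value=1 - rho)`.
    square_avg = square_avg * rho + (1 - rho) * grad * grad
    std = math.sqrt(square_avg + eps)
    delta = math.sqrt(acc_delta + eps) / std * grad
    param = param - lr * delta
    acc_delta = acc_delta * rho + (1 - rho) * delta * delta
    return param, square_avg, acc_delta

# Both rho and (1 - rho) are plain Python scalars here, just as in the optimizer.
param, square_avg, acc_delta = adadelta_step(1.0, 0.5, 0.0, 0.0)
print(param, square_avg, acc_delta)
```

Every operand in the first line of the update is either a tensor or a Python float, so a "Expected Tensor but got Double" failure there points at the scalar operands rather than at the gradients.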

Generate better docs for ATenOps.td

In general the doc for these ops is a bit terse right now, can you revisit these? Possibly linking to an official documentation page would be fine as well I guess.

Originally posted by @joker-eph in #16

MLIR generated using saved/cached loss tensors instead of loss operand

Aim: Trace the computation graph of one training iteration from PyTorch to MLIR using ModuleBuilder.

Models tried: a simple conv MNIST model, DLRM, and a very simple 2-layer fully connected network.

Current status: The forward pass works great, but the backward pass uses the loss tensors saved during the trace instead of the loss operand.

Way to spot:
In the generated MLIR, the value assigned to the computed loss is used only in its own assignment and in the function's return.
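One way to automate that check is to scan the printed module for values whose only use is the return. The sketch below is plain Python over the textual IR. The helper name values_only_returned, the regular expression, and the sample op names are assumptions made for illustration, and the parsing only handles simple one-op-per-line IR, not the full MLIR syntax.

```python
import re
from collections import Counter

SSA_NAME = re.compile(r"%[A-Za-z0-9_.$-]+")

def values_only_returned(mlir_text):
    """Return SSA values whose single use is in a `return` line."""
    uses = Counter()
    returned = set()
    for line in mlir_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("func"):
            continue  # block arguments are definitions, not uses
        lhs, sep, rhs = stripped.partition("=")
        # Skip the result names on the left of `%x = ...`.
        body = rhs if sep and lhs.startswith("%") else stripped
        names = SSA_NAME.findall(body)
        uses.update(names)
        if stripped.startswith("return"):
            returned.update(names)
    return sorted(name for name in returned if uses[name] == 1)

example = """
func @train(%arg0: tensor<2xf32>) -> tensor<f32> {
  %cst = constant dense<1.0> : tensor<f32>
  %0 = "demo.loss"(%arg0) : (tensor<2xf32>) -> tensor<f32>
  %1 = "demo.loss_backward"(%cst) : (tensor<f32>) -> tensor<2xf32>
  return %0 : tensor<f32>
}
"""
print(values_only_returned(example))
```

In the made-up example, the backward op consumes a constant instead of %0, so %0 is reported.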

Sample example:
Python script + mlir file generated is stored:
simple_example.zip

Thanks yall :)

Torch Lowering Path Question

Hello guys,

I have been reading the code for the graph lowering path. Basically, the path is as follows:
TorchScript -> translate into the Torch MLIR dialect -> lower into Linalg (computation ops) / Std (basic ops) / SCF (loops/control flow) -> call the IREE backend

Could someone explain the basic rationale for this path? Why lower directly to Linalg instead of to HLO, given that HLO has optimization passes that could be reused, such as operation fusion?

Thanks in advance,
Yang

ATenPasses should use Tablegen'd pass registration

ATen Passes are built using the old explicit registration patterns. They should get updated to be consistent with other NPCOMP passes.

Originally posted by @stellaraccident in #16

Embedding Bag Tracing bug

easy_emb.zip
Hey everybody,

I am trying to trace a network that contains an embedding bag, but I found a bug in the backward pass (aside from the caching of tensors). When computing the gradient, it does index_add -> cumsum_ -> resize and then index_select, but I think it is missing a step that reduces the values after cumsum_ by 1: when it reaches index_select, the index goes past the end of the vector it is accessing by 1. I have attached a unit test.
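If the sequence described here is the usual way of turning bag offsets into a per-index bag id, the effect of the missing decrement is easy to reproduce outside the tracer. The sketch below uses plain Python lists in place of tensors. The function name offsets_to_bag_ids and the sample offsets are hypothetical, and this is only a guess at what the traced ops compute, not a copy of the PyTorch kernel.

```python
from itertools import accumulate

def offsets_to_bag_ids(offsets, num_indices, decrement_first=True):
    """Map each index position to the bag it belongs to."""
    marks = [0] * (num_indices + 1)
    for start in offsets:              # index_add: mark where each bag starts
        marks[start] += 1
    if decrement_first:
        marks[0] -= 1                  # shift ids so the first bag is 0
    ids = list(accumulate(marks))      # cumsum
    return ids[:num_indices]           # resize to the number of indices

offsets = [0, 2, 4]                    # three bags over five indices
good = offsets_to_bag_ids(offsets, 5)
bad = offsets_to_bag_ids(offsets, 5, decrement_first=False)
print(good, max(good) < len(offsets))
print(bad, max(bad) < len(offsets))
```

Decrementing the first element before the cumulative sum is equivalent to subtracting 1 from every element afterwards. Without it, the largest id equals the number of bags, which is one past the last valid position for an index_select over per-bag data.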

To generate MLIR:

cd easy_emb
python emb.py
vim/(your favourite editor) embedding.mlir

Alternatively, open the pre-generated file inside the zip and look at lines 59 to 66 in particular.

CMake error after recent update

https://buildkite.com/iree/mlir-npcomp-standalone/builds/27#6fba062f-0357-4195-9d16-4daa9f651d75

CMake Error at /work/install/llvm-project/mlir-generic-rtti/lib/cmake/mlir/AddMLIR.cmake:187 (get_target_property):

| INTERFACE_LIBRARY targets may only have whitelisted properties. The
| property "LINK_LIBRARIES" is not allowed.
| Call Stack (most recent call first):
| /work/install/llvm-project/mlir-generic-rtti/lib/cmake/mlir/AddMLIR.cmake:213 (mlir_check_link_libraries)
| lib/Python/CMakeLists.txt:54 (mlir_check_all_link_libraries)

Remove use of language features > 3.6

Some 3.7+ language features snuck in and should be removed.

Traceback (most recent call last):

| File "/work/.mmrepo/universe/github.com/llvm/mlir-npcomp.git/test/Python/Compiler/comparisons.py", line 3, in
| from npcomp.compiler import test_config
| File "/work/build/npcomp_default/python/npcomp/init.py", line 5, in
| from . import tracing
| File "/work/build/npcomp_default/python/npcomp/tracing/init.py", line 3, in
| from .mlir_trace import *
| File "/work/build/npcomp_default/python/npcomp/tracing/mlir_trace.py", line 15, in
| from npcomp.tracing.emitters import *
| File "/work/build/npcomp_default/python/npcomp/tracing/emitters.py", line 22, in
| defaults=(TraceValueType.NDARRAY,))):
| TypeError: namedtuple() got an unexpected keyword argument 'defaults'

The CI runs with python 3.6.
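The defaults keyword of namedtuple was added in Python 3.7. A common way to get the same behavior on 3.6 is to set __defaults__ on the generated __new__. The sketch below shows that idiom; the tuple name, its fields, and the string default are hypothetical stand-ins for whatever emitters.py actually defines.

```python
from collections import namedtuple

# Hypothetical stand-in for the tuple defined in emitters.py.
TraceValue = namedtuple("TraceValue", ["value", "type"])

# Defaults apply to the rightmost fields, like `defaults=` does in 3.7+.
TraceValue.__new__.__defaults__ = ("NDARRAY",)

tv = TraceValue(value=42)
print(tv)
```

This form also runs unchanged on newer Python versions, so it does not need a version check.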

ReturnEliminationPass needs unit tests

Do we have tests for this pass? I didn't see return-elimination in any tests.

Originally posted by @silvasean in #16

Specific test for bool const tensors

I suspect that the AcapDispatch code for materializing a const bool tensor may have some issues, but we lack the facilities to exercise it properly. Add a test specifically for this at the appropriate point.

Convert ATen.td to "let results =" style

We generally prefer the "let results =" style vs Results inheritance, especially since you use the let-form for arguments.

Since the form you have it in seems consistent through the file, let's not change now. We can do a cleanup to the let form in a followup if desired.

Originally posted by @stellaraccident in #16

From _torch_mlir import _get_mlir ERROR

I have successfully built LLVM, MLIR, and the PyTorch frontend, and generated the _mlir and _torch_mlir *.so files. I have also added them to PYTHONPATH. But when I test, I get an error:

Import Error

 import npcomp.frontends.pytorch as torch_mlir
  File "mlir-npcomp/build/python/npcomp/frontends/pytorch/__init__.py", line 8, in <module>
    from _torch_mlir import _get_mlir
ImportError: cannot import name '_get_mlir' from '_torch_mlir' (mlir-npcomp/build/python/_torch_mlir.cpython-37m-x86_64-linux-gnu.so)

The _torch_mlir module itself can be imported successfully in Python. I tested help(_torch_mlir) and type(_torch_mlir), but the printed result does not include a _get_mlir function.
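When an extension module imports but lacks an expected name, it can help to print which file was actually loaded and what it exports. The following is a small stdlib-only sketch; describe_module is a hypothetical helper, and json is used only as a stand-in so the snippet runs anywhere.

```python
import importlib

def describe_module(name):
    """Return (file path, attribute names) for an importable module."""
    module = importlib.import_module(name)
    location = getattr(module, "__file__", None)
    # Keep single-underscore names such as `_get_mlir`; drop only dunders.
    names = sorted(
        attr for attr in dir(module)
        if not (attr.startswith("__") and attr.endswith("__"))
    )
    return location, names

location, names = describe_module("json")   # stand-in module for the demo
print(location)
print("dumps" in names)
```

Calling describe_module("_torch_mlir") in the failing environment would show whether the loaded .so is the freshly built one and whether _get_mlir is among its attributes. A stale build earlier on the Python path is one possible explanation, though the report does not confirm it.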

code

import torch
import npcomp.frontends.pytorch as torch_mlir

dev = torch_mlir.mlir_device()
t0 = torch.randn((4,4), device=dev)
t1 = torch.randn((4,4)).to(dev)
t2 = t0 + t1
t2_mlir = torch_mlir.get_mlir( t2 )
t2_cpu = t2.to('cpu')

No module named 'torch_mlir'

Hello!

I am just trying to go through the build instructions in the README. I built the PyTorch frontend in a Docker container following the instructions and then installed IREE via pip3 install. But when I try to run the e2e test targeting the IREE backend, I get the following error:

root@0e8aafd709c6:/src/mlir-npcomp# python frontends/pytorch/e2e_testing/torchscript/main.py --config=iree
Traceback (most recent call last):
  File "frontends/pytorch/e2e_testing/torchscript/main.py", line 9, in <module>
    from torch_mlir.torchscript.e2e_test.framework import run_tests
ModuleNotFoundError: No module named 'torch_mlir'

Could anyone help me? I think I didn't miss any instructions in the README...
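A quick way to narrow down a ModuleNotFoundError like this is to ask the import system where it would look. The sketch below uses only the standard library; locate is a hypothetical helper name, and json is included just to show what a successful lookup looks like.

```python
import importlib.util
import sys

def locate(name):
    """Return where a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

for candidate in ("torch_mlir", "json"):
    print(candidate, "->", locate(candidate))

# The directories searched, in order.
print("\n".join(sys.path))
```

If torch_mlir resolves to None, the build output directory is not on sys.path for that shell. Other reports on this page source a .env file before running tests, which may be what sets the path, but that is an inference rather than something stated in this issue.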

ATen passes expect a toplevel operation named 'graph'

This should be generalized.

Problems with Torch graph bindings

Running into an issue

silvasean@silvasean0:~/pg/mlir-npcomp/mlir-npcomp$ source .env; python frontends/pytorch/test/graph_export/test_script_add3.py
Traceback (most recent call last):
  File "frontends/pytorch/test/graph_export/test_script_add3.py", line 21, in <module>
    def add3(t0, t1, t2):
TypeError: import_function(): incompatible function arguments. The following argument types are supported:
    1. (self: _torch_mlir.ModuleBuilder, arg0: torch::jit::StrongFunctionPtr) -> torch::jit::StrongFunctionPtr

Invoked with: <_torch_mlir.ModuleBuilder object at 0x7f81c3e8d0b0>, <torch.jit.ScriptFunction object at 0x7f81a8cb36d0>

For some reason, on my system pybind11.h is in two places, and despite our efforts in the CMakeLists.txt file the non-torch one seems to get used for the torch bindings

/usr/local/google/home/silvasean/.local/lib/python3.8/site-packages/pybind11/include/pybind11/pybind11.h
/usr/local/google/home/silvasean/.local/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h

Even after I fix that by manually commenting out the necessary CMakeLists.txt lines, I get the same issue. Unclear why.

Cannot export convolution with loss function

I created a simple example as a stepping stone towards the backward pass. It builds on the conv2d forward pass and just adds the negative log likelihood loss.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x = self.conv1(x)
      output = F.log_softmax(x, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
loss = torch.nn.NLLLoss()
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, Cout)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("resa", [inputs]) as f:
  #f.returns([model(inputs)])                  # This works
  f.returns([loss(model(inputs), target)])      # This does not work
mb.module.operation.print(large_elements_limit=2)

When I try to run this on 30adf9e, I get the following error.

Traceback (most recent call last):
  File "models/conv2d.py", line 29, in <module>
    f.returns([loss(model(inputs), target)])      # This does not work
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 213, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/functional.py", line 2237, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: unsupported PyTorch scalar type: UNKNOWN_SCALAR

Enable CI for PyTorch frontend

The instructions in the README and docker image are now up to date. It would be nice to get the CI going for it. I'm not entirely certain how to adapt the LLVM install caching to building within a container.

We might want to wait too until we get closer to PyTorch head: I suspect we'll be successful then at just installing an appropriate version and building against it (and can forgo the container in the CI).

ATenLoweringPass only supports certain types in mangling.

TODOs and cleaned up comments with explanation of what isn't supported yet?
Probably we want a Type Interface that supports mangling.

Originally posted by @stellaraccident in https://github.com/_render_node/MDE3OlB1bGxSZXF1ZXN0UmV2aWV3NDY1MzAzMTgx/pull_request_reviews/more_threads

Make enabling asan easier

Dumping here the repro steps that got ASan working on an Ubuntu 20.04 system. The default way that LLVM handles this does not seem to play well with shared libraries.

unset CC
unset CXX
export CC
export CXX

./build_tools/install_mlir.sh -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DLLVM_USE_SANITIZER=Address
./build_tools/cmake_configure.sh -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DLLVM_USE_SANITIZER=Address
cd build
ninja
export LSAN_OPTIONS=detect_leaks=0
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libasan.so.5
ninja check
ninja check-frontends-pytorch

The issue with naively attempting it was as noted here: google/sanitizers#796 (comment)

I suspect that when building with clang, there should be a -shared-libsan on appropriate link lines.

Boxed kernel assertions

RuntimeError: false INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/core/boxing/impl/boxing.h":48, please report a bug to PyTorch. Tried to call KernelFunction::call() for a kernel that only has a boxed kernel and doesn't support calling from an unboxed API yet.

When hooking the dispatcher, not all kernels are supported with registration via a boxed fallback kernel. Need to special case/skip many of these. Moving out of the capture closure usually gets things moving for now.

Convert ATenLoweringPass (and others) to use OpConversionPattern

Prefer to use an OperandAdaptor

Originally posted by @stellaraccident in https://github.com/_render_node/MDE3OlB1bGxSZXF1ZXN0UmV2aWV3NDY1MzAzMTgx/pull_request_reviews/more_threads

Perform cleanup of Type and Attribute Kinds for all dialects

We have an in-npcomp type range that we should branch off of, but since these are being refactored away upstream, let's not bother updating this.

Originally posted by @stellaraccident in #16

Use `BUILD_SHARED_LIBS` instead of `LLVM_BUILD_LLVM_DYLIB`

On llvm/circt#767, they were experiencing some of the same issues we have intermittently observed with respect to effects of TypeID multiple definition issues resulting in equality of types being dependent on which shared library does the check.

It is my belief that the way that libLLVM.so/libMLIR.so/libNPCOMP.so are "over-linked" (and order inverted on the link command line) creates the conditions for this kind of issue to surface (although, I have not actually managed to ever nail it down to a specific smoking gun -- more of a "that is clearly not the right way to do it and would result in this kind of issue easily" kind of judgments).

I suggested that @mikeurbach try to use BUILD_SHARED_LIBS mode because it gets the shared-library layering correctly, and this resolved the mismatch for them. I suggest we switch npcomp to the same regime and remove support for linking against libMLIR.so. I am very slowly trying to complete https://reviews.llvm.org/D94387, which should fix the situation for the aggregate dylib linking modes, which would be nice for an eventual production release. In that new world, the dylib building mode is a specialization of BUILD_SHARED_LIBS, so we would need to switch locally regardless.

Legitimate failures when lowering pytorch to std (9 test failures)

Failed Tests (9):
  FRONTENDS_PYTORCH :: test_export_ResA.py
  FRONTENDS_PYTORCH :: test_export_add3.py
  FRONTENDS_PYTORCH :: test_export_batchnorm.py
  FRONTENDS_PYTORCH :: test_export_conv2d_back.py
  FRONTENDS_PYTORCH :: test_export_multi_out.py
  FRONTENDS_PYTORCH :: test_export_resnet18.py
  FRONTENDS_PYTORCH :: test_export_vgg11.py
  FRONTENDS_PYTORCH :: test_op_report_conv2d.py
  FRONTENDS_PYTORCH :: test_op_report_vgg_style_lenet.py

Sample of errors:

error: 'aten.relu_' op operand #0 must be tensor of any type values, but got 'memref<32x64x32x32xf32>'
aten to loops conversion failed error: 'std.call' op 'native_batch_norm_4F32_1F32_1F32_4F32_1F32_1F32_1F32_1F32_out' does not reference a valid function
error: unsupported or non-LLVM operation: aten.constant
JIT session error: Symbols not found: [ _mlir_ciface_as_strided_1F32_4F32_out ]
error: 'aten.div' op operand #0 must be tensor of any type values, but got 'f32'

Error when running example scripts

Hi,

I am trying to run example scripts under frontends/pytorch/examples but I got two kinds of errors:

When I am running scripts using capture_function, I got:

The same error occurs when I run cos_e2e.py, div_inplace_e2e.py, mm_e2e.py, mul_maximum_e2e.py, and tanh_out_e2e.py. Could someone tell me how to generate these op definitions and where to put them?
Running the torchscript_**_e2e.py scripts gives me:

I am still learning the code, so this might be a trivial question, or maybe I just missed some steps.
I went through all the build steps in the README and successfully ran tools/torchscript_e2e_test.sh. I installed the IREE backend using pip.

Rename 'master' branch to 'main'

Following the work done in the rest of the org. This project is small enough that I don't think it requires special coordination. I'll make the change sometime in the next couple of days.

Error when building npcomp in docker

There are two problems I ran into today after fetching the latest code:

When I run cmake --build /build/npcomp --target check-npcomp check-frontends-pytorch in the docker environment, following README.md in the top directory, I get:

This used to work, and there used to be directories under /build such as llvm-install, llvm-build, and npcomp. Now there is only /build/npcomp, and the directory is almost empty except for a .env file and a few newly created directories.

I noticed some changes in the build process in #251 and #258. Could someone update the README file to point to the correct build output directory?

The second problem is probably related to the first: running tools/torchscript_e2e_test.sh --config=iree reports that it cannot find the corresponding test package:

Thanks!

Full-size printouts of dense constants for real models is killing lit

Unfortunately, we don't have the knobs exposed yet through the Python API to do abbreviated printouts.

Cannot intercept aten::conv2d when dispatching through backend keys

When using a backend dispatch key (i.e. PrivateUse3), aten::conv2d calls are never recorded; however, when using AutogradPrivateUse3, they are (but this has other problems). conv is special in a number of ways, and we need to check with the PyTorch devs on how to resolve this.

ModuleBuilder represents torch.cos with a constant

The MLIR generated for torch.cos via ModuleBuilder ignores %arg0. Instead, it returns a constant with the result of torch.kernel_call "aten::mm" %arg0 for the %arg0 used during capture_function.

This can be reproduced with the code below or via python3 frontends/pytorch/examples/cos_e2e.py after #134 is submitted.

import torch
import torch_mlir

torch.manual_seed(0)
input = torch.rand(2, 3)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("cos", [input]) as f:
  result = torch.cos(input)
  f.returns([result])

print(mb.module)

module  {
  func @cos(%arg0: !numpy.ndarray<[2,3]:f32>) -> !numpy.ndarray<[2,3]:f32> {
    %cst = constant dense<[[0.879371106, 0.719147384, 0.996088445], [0.991296648, 0.953116595, 0.805617868]]> : tensor<2x3xf32>
    %0 = numpy.create_array_from_tensor %cst : (tensor<2x3xf32>) -> !numpy.ndarray<[2,3]:f32>
    return %0 : !numpy.ndarray<[2,3]:f32>
  }
}
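A quick sanity check that the constant is the captured result rather than unrelated data: if it holds cos(input), then taking the arccosine of each element should give values in [0, 1), the range torch.rand draws from. The snippet below is a plain-Python sketch that uses only the numbers printed in the IR above.

```python
import math

# Values copied from the `constant dense<...>` in the generated module.
dense = [
    [0.879371106, 0.719147384, 0.996088445],
    [0.991296648, 0.953116595, 0.805617868],
]

# If the constant is cos(input), acos recovers the input used during capture.
recovered = [[math.acos(v) for v in row] for row in dense]
for row in recovered:
    print(["%.4f" % v for v in row])
```

All recovered values falling inside [0, 1) is what one would expect if the tracer evaluated torch.cos on the concrete example input and baked the result in, instead of emitting an op that reads %arg0.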

Resnet 18 iree path

Hello,
I substituted refjit backend with iree backend in torchscript_resnet18_e2e.py:

backend = iree.IreeNpcompBackend()
#backend = refjit.RefjitNpcompBackend()

I know this probably won't run successfully ... But worth a try, I got this error:

I also ran the iree-translate command on its own. I copied the command from the last line of the screenshot above, dumped the input file using mb.module.operation.get_asm and f.write, and passed that file to iree-translate:

My questions are:

Are these two errors caused by the same problem and just printed differently, or am I invoking the iree-translate tool the wrong way?
Since I am not very familiar with IREE (or with MLIR, which I am still studying :)), how should I debug this kind of problem in general?
Is there any plan to pass the resnet18 test with the IREE backend?

Thanks in advance to anyone who helps me with these questions!

Participate in NPBench

First of all, awesome project!

It would be interesting to see the results of the NPComp infrastructure vs. other python compilers, such as Numba, on scientific python apps. NPBench has a wide variety of HPC and computational science apps written in numpy. It'd be great if you had an implementation/results there!

Building in the docker container: ninja: error: loading 'build.ninja': No such file or directory

I have followed the instructions:

Setup docker container 1. Done
In docker container, do Command prep 2. Done and without a problem.
Try either vanilla compile or PyTorch Frontend compile, running ./build_tools/cmake_configure.sh is fine. But then both give me:

ninja: error: loading 'build.ninja': No such file or directory

Do you have a recommendation for how this ninja error can be fixed?

Incorrect IR when using the cat operator

Here is a simple example showing the problem. The python code is shown below.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

torch_mlir.debug_trace_to_stderr()

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x0 = self.conv1(x)
      x1 = self.conv1(x)
      z = torch.cat([x0, x1])
      output = F.log_softmax(z, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
weight = torch.randn(Cout)
loss = torch.nn.NLLLoss()
target = torch.empty(2*N, 8, 8, dtype=torch.long).random_(0, Cout)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("cat_test", [inputs, target]) as f:
  result = loss(model(inputs), target)
  f.returns([result])
mb.module.operation.print(large_elements_limit=2)

This results in the following output.

TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_cat(Tensor[] tensors, int dim=0) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::_log_softmax(Tensor self, int dim, bool half_to_float) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight)
module {
  func @resa(%arg0: !numpy.ndarray<[3,16,10,10]:f32>, %arg1: !numpy.ndarray<[6,8,8]:i64>) -> !numpy.ndarray<[]:f32> {
    %cst = constant opaque<"", "0xDEADBEEF"> : tensor<4x16x3x3xf32>
    %cst_0 = constant opaque<"", "0xDEADBEEF"> : tensor<4xf32>
    %c1_i64 = constant 1 : i64
    %c1_i64_1 = constant 1 : i64
    %0 = basicpy.build_list %c1_i64, %c1_i64_1 : (i64, i64) -> !basicpy.ListType
    %c0_i64 = constant 0 : i64
    %c0_i64_2 = constant 0 : i64
    %1 = basicpy.build_list %c0_i64, %c0_i64_2 : (i64, i64) -> !basicpy.ListType
    %c1_i64_3 = constant 1 : i64
    %c1_i64_4 = constant 1 : i64
    %2 = basicpy.build_list %c1_i64_3, %c1_i64_4 : (i64, i64) -> !basicpy.ListType
    %false = constant false
    %c0_i64_5 = constant 0 : i64
    %c0_i64_6 = constant 0 : i64
    %3 = basicpy.build_list %c0_i64_5, %c0_i64_6 : (i64, i64) -> !basicpy.ListType
    %c1_i64_7 = constant 1 : i64
    %c1_i64_8 = constant 1 : i64
    %c1_i64_9 = constant 1 : i64
    %4 = basicpy.build_list %c1_i64_8, %c1_i64_9 : (i64, i64) -> !basicpy.ListType
    %c0_i64_10 = constant 0 : i64
    %c0_i64_11 = constant 0 : i64
    %5 = basicpy.build_list %c0_i64_10, %c0_i64_11 : (i64, i64) -> !basicpy.ListType
    %c1_i64_12 = constant 1 : i64
    %c1_i64_13 = constant 1 : i64
    %6 = basicpy.build_list %c1_i64_12, %c1_i64_13 : (i64, i64) -> !basicpy.ListType
    %false_14 = constant false
    %c0_i64_15 = constant 0 : i64
    %c0_i64_16 = constant 0 : i64
    %7 = basicpy.build_list %c0_i64_15, %c0_i64_16 : (i64, i64) -> !basicpy.ListType
    %c1_i64_17 = constant 1 : i64
    %8 = basicpy.build_list %12, %13 : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
    %c0_i64_18 = constant 0 : i64
    %c1_i64_19 = constant 1 : i64
    %false_20 = constant false
    %9 = basicpy.singleton : !basicpy.NoneType
    %c1_i64_21 = constant 1 : i64
    %c-100_i64 = constant -100 : i64
    %10 = numpy.create_array_from_tensor %cst : (tensor<4x16x3x3xf32>) -> !numpy.ndarray<[4,16,3,3]:f32>
    %11 = numpy.create_array_from_tensor %cst_0 : (tensor<4xf32>) -> !numpy.ndarray<[4]:f32>
    %12 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %0, %1, %2, %false, %3, %c1_i64_7 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %13 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %4, %5, %6, %false_14, %7, %c1_i64_17 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %14 = torch.kernel_call "aten::_cat" %8, %c0_i64_18 : (!basicpy.ListType, i64) -> !numpy.ndarray<[6,4,8,8]:f32> {sigArgTypes = ["Tensor[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %15 = torch.kernel_call "aten::_log_softmax" %14, %c1_i64_19, %false_20 : (!numpy.ndarray<[6,4,8,8]:f32>, i64, i1) -> !numpy.ndarray<[6,4,8,8]:f32> {sigArgTypes = ["Tensor", "int", "bool"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
    %16:2 = torch.kernel_call "aten::nll_loss2d_forward" %15, %arg1, %9, %c1_i64_21, %c-100_i64 : (!numpy.ndarray<[6,4,8,8]:f32>, !numpy.ndarray<[6,8,8]:i64>, !basicpy.NoneType, i64, i64) -> (!numpy.ndarray<[]:f32>, !numpy.ndarray<[]:f32>) {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor", "Tensor"]}
    return %16#0 : !numpy.ndarray<[]:f32>
  }

The problem with this output however is here

%8 = basicpy.build_list %12, %13 : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
...
%12 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %0, %1, %2, %false, %3, %c1_i64_7 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}
%13 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %4, %5, %6, %false_14, %7, %c1_i64_17 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}

because the build_list op references %12 and %13 before they are defined. When I feed this to npcomp-opt, I get

conv2d.mlir:33:10: error: operand #0 does not dominate this use
    %8 = basicpy.build_list %12, %13 : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
         ^
conv2d.mlir:33:10: note: see current operation: %8 = "basicpy.build_list"(%12, %13) : (!numpy.ndarray<[3,4,8,8]:f32>, !numpy.ndarray<[3,4,8,8]:f32>) -> !basicpy.ListType
conv2d.mlir:42:11: note: operand defined here
    %12 = torch.kernel_call "aten::convolution" %arg0, %10, %11, %0, %1, %2, %false, %3, %c1_i64_7 : (!numpy.ndarray<[3,16,10,10]:f32>, !numpy.ndarray<[4,16,3,3]:f32>, !numpy.ndarray<[4]:f32>, !basicpy.ListType, !basicpy.ListType, !basicpy.ListType, i1, !basicpy.ListType, i64) -> !numpy.ndarray<[3,4,8,8]:f32> {sigArgTypes = ["Tensor", "Tensor", "Tensor?", "int[]", "int[]", "int[]", "bool", "int[]", "int"], sigIsMutable = false, sigIsVararg = false, sigIsVarret = false, sigRetTypes = ["Tensor"]}

Manually moving the definition of %8 after the definition of %12 and %13 fixes the problem. This seems to suggest that when the build_list operator is being constructed it is not being inserted in the right location.
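The "does not dominate this use" error can be illustrated with a toy check. This is only a sketch under simplifying assumptions: one op per line, a single block, and no attributes containing "=", so "dominates" reduces to "is defined on an earlier line". The function name find_use_before_def and the shortened op lines are hypothetical and are not part of npcomp or MLIR.

```python
import re


def find_use_before_def(lines, block_args=()):
    """Return (line_no, value) pairs where an SSA value is used before its definition.

    Toy model only: single block, one op per line, 'results = op operands'.
    """
    defined = set(block_args)
    errors = []
    for line_no, line in enumerate(lines, start=1):
        lhs, sep, rhs = line.partition("=")
        if not sep:
            # Op without results: the whole line holds operands.
            lhs, rhs = "", line
        for name in re.findall(r"%\w+", rhs):
            if name not in defined:
                errors.append((line_no, name))
        defined.update(re.findall(r"%\w+", lhs))
    return errors


# Order as emitted: build_list appears before the values it consumes.
emitted = [
    "%8 = basicpy.build_list %12, %13",
    "%12 = torch.kernel_call %arg0",
    "%13 = torch.kernel_call %arg0",
]
# Order after manually moving %8 below %12 and %13.
moved = emitted[1:] + emitted[:1]

print(find_use_before_def(emitted, block_args=["%arg0"]))
print(find_use_before_def(moved, block_args=["%arg0"]))
```

The first call reports both operands of build_list as used before definition; the second reports nothing, which matches the manual fix described above.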

[TorchToLinalg] Test/implement more cases of torch.nn.functional.linear for

We added support in #209 for some limited cases:

bias is present
2d input
same dtype for all tensors

Adding test cases / implementing support / verifying that they work for these other cases would be welcome.

Test cases likely would involve calling torch.nn.functional.linear directly (instead of using torch.nn.Linear):
https://pytorch.org/docs/master/generated/torch.nn.functional.linear.html#torch.nn.functional.linear
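When writing such tests, a small reference for the expected values may be handy. The sketch below is a pure-Python model of the documented semantics of torch.nn.functional.linear (y = x W^T + b, with bias optional and any number of leading input dimensions). The name linear_ref is hypothetical, the helper is not part of torch-mlir, and it does not model dtype behavior.

```python
def linear_ref(x, weight, bias=None):
    """Pure-Python reference for y = x @ weight.T + bias.

    x: nested lists whose innermost dimension has in_features entries
       (any number of leading dimensions, including none).
    weight: out_features rows, each with in_features entries.
    bias: out_features entries, or None.
    """
    if x and isinstance(x[0], list):
        # Recurse over leading (batch) dimensions.
        return [linear_ref(row, weight, bias) for row in x]
    out = [sum(xi * wi for xi, wi in zip(x, w_row)) for w_row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


weight = [[1, 0], [0, 1], [1, 1]]        # out_features=3, in_features=2
no_bias_1d = linear_ref([1, 2], weight)  # 1-D input, no bias
with_bias_3d = linear_ref([[[1, 2]], [[3, 4]]], weight, bias=[10, 20, 30])  # 3-D input
```

The two calls cover cases outside the list above: a missing bias and a non-2d input.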

Error while building project - Variable not defined: 'Shape_ExtentTensorType'

I am trying to build mlir-npcomp and it fails with the following error:

[27/126] Building TCPOps.h.inc...
FAILED: include/npcomp/Dialect/TCP/IR/TCPOps.h.inc
cd .../mlir-npcomp/include/npcomp/Dialect/TCP/IR/TCPOps.td:42:50: error: Variable not defined: 'Shape_ExtentTensorType'
  let arguments = (ins AnyRankedTensor:$operand, Shape_ExtentTensorType:$shape);
                                                 ^
[28/126] Building TCPOps.cpp.inc...
FAILED: include/npcomp/Dialect/TCP/IR/TCPOps.cpp.inc
.../mlir-npcomp/include/npcomp/Dialect/TCP/IR/TCPOps.td:42:50: error: Variable not defined: 'Shape_ExtentTensorType'
  let arguments = (ins AnyRankedTensor:$operand, Shape_ExtentTensorType:$shape);
                                                 ^
[32/126] Building TCPOpsDialect.h.inc...

Further down, the build also fails in aten_ops. I am not sure whether the error above is related to this one.
../frontends/pytorch/lib/aten_ops.cpp:107:19: error: no member named 'addmm' in namespace 'at::native'; did you mean 'addmv'?
      at::native::addmm(torch_a, torch_b, torch_c, alpha, beta).clone();
      ~~~~~~~~~~~~^~~~~
                  addmv

Trailing for "add_" not working

Hey guys,

I tried updating to this patch, but when I tried tracing with acap_dispatch, it still seems to be using "add" instead of "add_". I have attached a small unit test and the generated output (add_test.zip). Any help/comments would be appreciated! :)

Cannot export model with torch.arange

Below is a simple python script that reproduces the issue.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_mlir

torch_mlir.debug_trace_to_stderr()

N = 3
Cin = 16
Cout = 4
w = 10
h = 10

class Net(nn.Module):
    def __init__(self, Cin, Cout):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(Cin, Cout, (3,3))
    def forward(self, x):
      x = self.conv1(x)
      indices = torch.arange(N)
      x = x[indices, :, :, :]
      output = F.log_softmax(x, dim=1)
      return output

model = Net(Cin, Cout)
inputs = torch.ones((N,Cin,h,w))
loss = torch.nn.NLLLoss()
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, Cout)

mb = torch_mlir.ModuleBuilder()
with mb.capture_function("arange_test", [inputs, target]) as f:
  result = loss(model(inputs), target)
  result.backward()
  f.returns([result] + [p.grad for p in model.parameters()])
mb.module.operation.print(large_elements_limit=2)

When I try to run this, I get the following output.

TORCH_MLIR TRACE: Convolution (unboxed) dispatch: aten::convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> (Tensor)
TORCH_MLIR TRACE: Fallback (boxed) dispatch: aten::arange.start_out(Scalar start, Scalar end, Scalar step=1, *, Tensor(a!) out) -> (Tensor(a!))
Traceback (most recent call last):
  File "models/conv2d.py", line 32, in <module>
    result = loss(model(inputs), target)
  File "/pytorch_nightly/lib/python3.8/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
    result = self.forward(*input, **kwargs)
  File "models/conv2d.py", line 20, in forward
    indices = torch.arange(N)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::arange.start_out.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, PrivateUse2, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2154 [kernel]
PrivateUse2: registered at /mlir-npcomp/frontends/pytorch/csrc/c10_dispatch/acap_dispatch.cpp:645 [backend fallback]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_1.cpp:8628 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_1.cpp:10219 [kernel]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:527 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

The error states that there is no fallback kernel defined for this op (arange.start_out). But I noticed that in type_dispatch/aten_mlir_type_default.cpp there is a function RegisterAtenTypeFunctions that registers arange.start_out

          .op(torch::RegisterOperators::options()
                  .schema("aten::arange.start_out(Scalar start, Scalar end, "
                          "Scalar step=1, *, Tensor(a!) out) -> Tensor(a!)")
                  .impl_unboxedOnlyKernel<at::Tensor &(at::Tensor &, at::Scalar,
                                                       at::Scalar, at::Scalar),
                                          &ATenMLIRTypeDefault::arange_out>(
                      at::TensorTypeId::XLATensorId)
                  .aliasAnalysis(c10::AliasAnalysisKind::FROM_SCHEMA))

But the error also says that I passed in an empty list of Tensors. Does that mean we need another definition of this op, or is this about how optional arguments are handled, since I only passed the end argument to torch.arange above? Or do you think the root cause is something completely different?

Use Symbolic names for any operations that can't be automatically generated in ATen.td

Any chance we could get actual symbolic argument names? Esp for BatchNorm which has a lot of parameters.

Originally posted by @stellaraccident in #16

Missing `_get_mlir` Python method binding in PyTorch Frontend

Importing the PyTorch frontend complains of a missing _get_mlir method in the _torch_mlir.so binary. I am currently on master (81dd571).

# Build script.
LLVM_VERSION=10
export CC=clang-$LLVM_VERSION
export CXX=clang++
export LDFLAGS=-fuse-ld=$(which ld.lld)

sh ./build_tools/install_mlir.sh
sh ./build_tools/cmake_configure.sh

# Build and run tests
cd build
ninja
ninja check-npcomp

# Produce import error.
PYTHONPATH=build/python:build/frontends/pytorch/csrc python
Python 3.8.5 (default, Jul 27 2020, 10:09:03)
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import npcomp.frontends.pytorch
2020-09-09 13:13:57.642115: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/python/npcomp/frontends/pytorch/__init__.py", line 8, in <module>
    from _torch_mlir import _get_mlir
ImportError: cannot import name '_get_mlir' from '_torch_mlir' (build/frontends/pytorch/csrc/_torch_mlir.so)

Bugs of tcf::ConvNCHWOp

I'm currently reading through the source code, and I saw the following code at TCFToLinalg.cpp:67:

auto heightPlusTwicePadding = builder.create<SubIOp>(op->getLoc(), height, twicePaddingHeight);

According to the PyTorch conv2d documentation, I think it should be:

auto heightPlusTwicePadding = builder.create<AddIOp>(op->getLoc(), height, twicePaddingHeight);

I've just started learning PyTorch and MLIR, so I'm not sure whether I'm correct.
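For reference, the output-size formula given in the PyTorch Conv2d documentation can be written as a small function. This is only a sketch of that documented formula, with a hypothetical helper name (conv2d_output_size); it says nothing about what TCFToLinalg.cpp actually computes. Note that twice the padding is added to the input size, not subtracted.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output height (or width) of a 2-D convolution, per the PyTorch Conv2d docs.

    out = floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    """
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# 10x10 input, 3x3 kernel, no padding: an 8x8 output.
unpadded = conv2d_output_size(10, 3)
# The same convolution with padding=1 keeps the spatial size at 10.
padded = conv2d_output_size(10, 3, padding=1)
```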

Besides, I have another question: does it support conversion (tcf::ConvNCHWOp -> linalg::ConvNCHWOp) with padding, dilation, and stride parameters?

ninja check-npcomp failed

When I run ninja check-npcomp, there are some errors:

Testing Time: 2.29s
Unsupported: 1
Passed : 2
Failed : 74
FAILED: test/CMakeFiles/check-npcomp-lit

why?

Need to import unrecognized tensors as constants

RuntimeError: TODO: implement tensor import for non-arg tensors

Will require building out DenseElementsAttr in the C/Python API.

Automatically generated ATen ops need verifiers.

There doesn't seem to be anything in the verifier for the convolution ops that guarantees a particular rank here, so this could easily segfault/abort.

Originally posted by @silvasean in #16

Audit ATenLoweringPass for completeness of transformation

Would you mind adding a comment describing the invariants and the conversions that are supported (and what is currently not supported)?

Originally posted by @stellaraccident in #16

RuntimeError: Unsupported capture value returned from kernel (Bool): True

In test_export_conv2d_back.py:

Traceback (most recent call last):
  File "/src/mlir-npcomp/frontends/pytorch/test/acap_export/test_export_conv2d_back.py", line 30, in <module>
    result = model(tensor)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 415, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Unsupported capture value returned from kernel (Bool): True

Crashes and issues on MacOS

Now that #154 is resolved, we get further on MacOS, exposing a couple of other issues. Tracking them here.

StatisticsOpInterface is ripe for refactoring

This looks like a quite heavy type to return here.
It isn't clear to me what this is doing right now, but taking a StringMap (or, better if possible, a DenseMap<StringRef, uint64_t>) as an output operand would avoid constant reallocation, since the call site could reuse the same map over and over.

Originally posted by @joker-eph in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMjkyOTIwMTA4OnYy/pull_request_review_threads/discussion

Is it possible to build npcomp with a prebuilt llvm-project ?

Hi,

I'm thinking of contributing a build feature to the npcomp CMake build scripts, and would like some advice before doing it.

Here is our problem.

We developed a project, Foo, which uses MLIR's Python bindings from a modified llvm-project. This llvm-project is kept up to date with upstream and shipped to us as a prebuilt library.
Now we also want to use npcomp in our project, but the current build process needs to build LLVM from source and creates a bundled mlir package, so it doesn't work for us.

I think npcomp should be able to work in a more decoupled way. What I want to do is:

Build npcomp with a prebuilt llvm-project, with something like cmake -DLLVM_INSTALL_ROOT
Rearrange code/files to remove "python_packages/npcomp_core/mlir", so that import mlir imports mlir from the prebuilt MLIR, and import npcomp imports only what npcomp needs

Is it fine to do so?

Sincerely

Make required PyTorch version clear

With PyTorch nightly (1.9.0.dev20210216) installed via conda, I get the following error

static_assert(c10::guts::false_t<Func>(), ".impl_UNBOXED(...) was removed. Please use .impl(...) instead.");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~
../frontends/pytorch/csrc/builder/acap_dispatch.cpp:596:5: note: in instantiation of function template specialization 'torch::Library::impl_UNBOXED<const char *, at::Tensor (const at::Tensor &, const at::Tensor &, const c10::optional<at::Tensor> &, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long)>' requested here
  m.impl_UNBOXED("convolution", &AcapController::convolutionKernel);

I could do a bisection but it would be better if you could point me to a version that is known to work. Thank you!

CMake Error at /work/install/llvm-project/mlir-generic-rtti/lib/cmake/mlir/AddMLIR.cmake:187 (get_target_property):

Traceback (most recent call last):
