Comments (6)
This is related to support for explicit loops in the frontend. Supporting variable sizes (i.e. symbolic shapes in TVM) in general is probably a substantial change, but allowing explicit loops, declared in the frontend code, over a single variable-sized dimension is very similar to the existing support for variable batch sizes but would cover many NLP use cases (like RNNs/QRNNs).
from tensorcomprehensions.
Technically, this is simpler than it looks. Most of the compilation flow should support this transparently (Halide and polyhedral passes). For example, polyhedral scheduling is meant to operate on symbolic parameters, and we have an option to substitute them with inferred numerical values before or after the scheduling itself. We even have tests that emit parametric code.
However, this will degrade performance. Simply put, the more information we have about an operation, the more deeply we can analyze it and the better it can be optimized. So I'd argue for generating code that is as specialized as possible.
The main problem with RNNs now would be their outer sequentiality. But this is mostly orthogonal to variable sizes.
An RNN kernel would look something like this:
def elman_rnn(float(T,B,Ci) input, float(B,Co) h0, float(Ci,Co) i2h, float(Co,Co) h2h) -> (hidden) {
    for t in T {
        if t == 0 { hidden(t,b,co) +=! h2h(ci,co) * h0(b,ci) }
        else { hidden(t,b,co) +=! h2h(ci,co) * hidden(t-1,b,ci) }
        hidden(t,b,co) += i2h(ci,co) * input(t,b,ci)
    }
}
which does indeed seem pretty annoying to support, far beyond the variable T. I was also wrong and TC doesn't appear to currently support optimizing for a variable batch size.
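For reference, the recurrence the TC kernel above describes can be sketched as a plain NumPy loop (ignoring nonlinearities and biases; the sizes here are made up for illustration):

```python
import numpy as np

# Hypothetical sizes: T = sequence length, B = batch, Ci/Co = input/output features.
T, B, Ci, Co = 5, 2, 3, 4
inp = np.random.randn(T, B, Ci)
h0 = np.random.randn(B, Co)
i2h = np.random.randn(Ci, Co)
h2h = np.random.randn(Co, Co)

# Outer-sequential loop over t: each step reads hidden[t-1], so the steps
# cannot be run in parallel across t -- the "outer sequentiality" noted above.
hidden = np.empty((T, B, Co))
for t in range(T):
    prev = h0 if t == 0 else hidden[t - 1]
    hidden[t] = prev @ h2h + inp[t] @ i2h
```

The matrix products here correspond to the `+=!`/`+=` reductions over `ci` in the TC; only the dependence of step `t` on step `t-1` forces sequential execution.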
A QRNN kernel would also have these issues, just without the reduction inside the loop.
Indeed, the "imperative syntax" proposed earlier in TC context is not yet implemented in the language. It is annoying to support efficient code generation. The "imperative loop" is outer sequential and the current compilation pass would just map the computation to a single block given that syntax is supported by the frontend. We would need to emit global synchronizations even for a naïve mapping to more than one block, and we currently cannot. But this seems orthogonal to variable batch sizes.
Turning on parametric batch sizes is a small change. In general, TC looks at the actual sizes of the supplied tensors and infers numerical values of all symbolic parameters. These values are substituted in some optimization passes. Disabling this substitution looks trivial. However, disabling it for a specific parameter requires the user to somehow tell us for which one.
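As a toy illustration of that inference step (a sketch, not TC's internals), matching symbolic dimension names against the shapes of the supplied tensors might look like this:

```python
# Toy sketch (not TC internals): infer numeric values for symbolic size
# parameters from the concrete shapes of the tensors actually supplied.
def infer_params(signature, shapes):
    """signature: per-tensor tuples of symbolic dim names, e.g. ("T", "B", "Ci").
    shapes: the concrete shapes of the supplied tensors."""
    inferred = {}
    for dims, shape in zip(signature, shapes):
        for name, size in zip(dims, shape):
            if name in inferred and inferred[name] != size:
                raise ValueError(f"conflicting sizes for {name}")
            inferred[name] = size
    return inferred

# input(T,B,Ci) and h0(B,Co) with concrete shapes (50, 32, 256) and (32, 512):
params = infer_params([("T", "B", "Ci"), ("B", "Co")], [(50, 32, 256), (32, 512)])

# Keeping a parameter symbolic then just means skipping its substitution,
# for a user-selected name (here the hypothetical choice is "T"):
keep_symbolic = {"T"}
substituted = {k: v for k, v in params.items() if k not in keep_symbolic}
```

This makes the last point concrete: disabling substitution wholesale is one flag, but disabling it per parameter needs the user to name which parameter stays symbolic.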
Let me first share some context about parametric sizes and then a short-term "solution" that should work in practice.
Solving the general problem is complicated: it requires inferring proper parameter regions and devising a proper strategy for each region. Making this push-button and "just work" is a longer-term goal. Additionally, emitting symbolic code involves inefficiencies (control flow that would simply disappear with JIT'ing, or missing information about parameter ranges that confuses internal heuristics).
The current approach works because it is pragmatic and goes for the lowest-hanging fruit; however, it suffers from needing to autotune for various sizes if one isn't careful. Compilation with fixed options, on the other hand, is not a big deal. It already happens under the hood all the time when SASS is emitted from PTX (the first time you run a kernel, after which it gets cached to disk).
One simple way to circumvent the autotuning pain on the user side is to reuse the options found by an autotuning run when the sizes change. This gives much the same behavior as parametric codegen would:
a. with parametric codegen, the options don't change and the code is compiled, cached and autotuned only once;
b. with options reuse, the options don't change and the code is autotuned only once, but it is compiled and cached for each new size; this should still significantly improve the user experience.
So for a short-term improvement in the workflow I would recommend:
1. take a TC + sizes and run the autotuner on it with large enough generations/candidates; save the best options.
2. take the same TC, change the sizes, reuse the options from 1 and just call compile.
   2b. alternatively, use the options from 1 as the starting point for a small autotune run with few generations/candidates, to do some quick minimal exploration.
3. when the sizes change too much and performance drifts too far from some notion of good, repeat step 1.
4. as always, save your cached results in a protobuf file that you can reuse across runs, so you only need to autotune/compile the least amount possible; note however that we are still very early in the life of the project and the proto itself is subject to change, so steps 1-3 will probably need to be redone a few times until we iterate to a stable enough system.
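The difference between (a) and (b) above can be sketched as a compile cache keyed on sizes: the options are tuned once, while compilation repeats per size (all names here are hypothetical stand-ins, not the real tensor_comprehensions API):

```python
# Hypothetical sketch of options reuse; not the real tensor_comprehensions API.
compile_calls = 0
autotune_calls = 0

def autotune(tc, sizes):
    # Expensive search; stand-in for a full autotuning run (step 1).
    global autotune_calls
    autotune_calls += 1
    return {"tile": (32, 32), "unroll": 4}  # stand-in for tuned mapping options

def compile_tc(tc, sizes, options):
    # Cheap relative to autotuning; happens once per (TC, sizes) pair.
    global compile_calls
    compile_calls += 1
    return ("kernel", tc, sizes, tuple(sorted(options.items())))

cache = {}
def run(tc, sizes, options):
    # Options are reused verbatim; only compilation is per-size, and cached.
    key = (tc, sizes)
    if key not in cache:
        cache[key] = compile_tc(tc, sizes, options)
    return cache[key]

# Step 1: tune once on representative sizes and save the best options.
opts = autotune("matmul_tc", (128, 128, 128))
# Step 2: reuse those options for each new size; compile (and cache) per size.
for sizes in [(128, 128, 128), (100, 128, 64), (100, 128, 64)]:
    run("matmul_tc", sizes, opts)
```

Running this, autotuning happens once while compilation happens once per distinct size, which is exactly the trade-off described in (b).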
Hiding the above from the user amounts to solving the general problem, but the faster workflow outlined in 1-4 is very easy to set up and should improve things significantly. @jekbradbury 's example about RNNs is different; we have similar things in the works, but only at a conceptual stage at the moment.
@roadwalker does the above sound like a reasonable solution?
@jekbradbury @seongwook-ham see if #225 starts addressing your needs. The code is still JIT'ed and will take a few seconds for various parameter sizes but autotuning results can be easily reused.
If you have real-world parametric needs where you see many different values for one parameter (i.e. > 100), then we can probably support 1-3 parameters in the relatively short term, but it would be great to work from a concrete example.
RNN loops are still outside of the scope at this point though.