| Documentation | Chat | taichi-nightly | taichi-nightly-cuda-10-0 | taichi-nightly-cuda-10-1 |
|:--|:--|:--|:--|:--|
| ![Documentation Status](https://camo.githubusercontent.com/949267f26fb25718611586a4b2131895ad28e6a6cbb6000c7a299c31cfc27f0e/68747470733a2f2f72656164746865646f63732e6f72672f70726f6a656374732f7461696368692f62616467652f3f76657273696f6e3d6c6174657374) | ![Join the chat at https://gitter.im/taichi-dev/Lobby](https://camo.githubusercontent.com/b87f23f4df754efd9baf8812597eb4a1799e1308117b6634cd634bb50c48ef9b/68747470733a2f2f6261646765732e6769747465722e696d2f7461696368692d6465762f4c6f6262792e737667) | ![Downloads](https://camo.githubusercontent.com/af0192da6b861332ba1ac95269ee3a5a3346d6f3d1780799bb56a596198d4a48/68747470733a2f2f706570792e746563682f62616467652f7461696368692d6e696768746c79) | ![Downloads](https://camo.githubusercontent.com/3bd311a715100d4b57b0d90bad49d45f333d92520cc08fc4a7a7790e9b8b0653/68747470733a2f2f706570792e746563682f62616467652f7461696368692d6e696768746c792d637564612d31302d30) | ![Downloads](https://camo.githubusercontent.com/ec114e7887bd03fee1c0486b1050b928dbbc2539b3ae202041494c102456905f/68747470733a2f2f706570792e746563682f62616467652f7461696368692d6e696768746c792d637564612d31302d31) |
# Python 3.6/3.7 needed for all platforms. Python 3.8 supported only on OS X and Windows
# CPU only. No GPU/CUDA needed. (Linux, OS X and Windows)
python3 -m pip install taichi-nightly
# With GPU (CUDA 10.0) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-0
# With GPU (CUDA 10.1) support (Linux only)
python3 -m pip install taichi-nightly-cuda-10-1
# Build from source if you work in other environments
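
The platform rules in the install comments above can be restated as a small lookup. This is only an illustrative sketch: `pick_package` and `CUDA_PACKAGES` are hypothetical names that are not part of Taichi, and the rules encoded are just the ones stated in those comments. Any combination the comments do not cover returns `None`, meaning "build from source".

```python
import platform
import sys

# Hypothetical mapping (not part of Taichi); it restates the install comments.
CUDA_PACKAGES = {
    "10.0": "taichi-nightly-cuda-10-0",
    "10.1": "taichi-nightly-cuda-10-1",
}


def pick_package(system, python_version, cuda_version=None):
    """Return the pip package name for a setup, or None to build from source.

    system         -- value of platform.system(): "Linux", "Darwin" or "Windows"
    python_version -- (major, minor, ...) tuple such as sys.version_info
    cuda_version   -- "10.0", "10.1", or None for the CPU-only package
    """
    major_minor = tuple(python_version[:2])
    if system not in ("Linux", "Darwin", "Windows"):
        return None
    # Python 3.6/3.7 on all platforms; Python 3.8 only on OS X and Windows.
    if major_minor == (3, 8):
        if system == "Linux":
            return None
    elif major_minor not in ((3, 6), (3, 7)):
        return None
    if cuda_version is None:
        return "taichi-nightly"
    # The CUDA packages are listed as Linux only in the install comments.
    if system == "Linux":
        return CUDA_PACKAGES.get(cuda_version)
    return None


if __name__ == "__main__":
    name = pick_package(platform.system(), sys.version_info)
    if name is None:
        print("No prebuilt package listed for this setup; build from source.")
    else:
        print("python3 -m pip install " + name)
```

Running the script prints the matching `pip` command for the current interpreter, or the build-from-source hint when no listed package applies.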
| | Linux (CUDA) | OS X (10.14+) | Windows |
|:--|:--|:--|:--|
| Build | ![Build Status](https://camo.githubusercontent.com/19ca149c30461f1251a58a9069b4f6b4f8eb6572b0dd27d1e532d79c2e50075f/687474703a2f2f6631312e637361696c2e6d69742e6564753a383038302f6a6f622f7461696368692f62616467652f69636f6e) | ![Build Status](https://camo.githubusercontent.com/a3f402f6263fcd02fcd618c83c7872e5824c962bb7e0c06adacb669e1a0a54c1/68747470733a2f2f7472617669732d63692e636f6d2f7461696368692d6465762f7461696368692e7376673f6272616e63683d6d6173746572) | ![Build status](https://camo.githubusercontent.com/38e16a286d9c750c92cc2307d10bf4e0086a92812b81c6b926c16a83a8dddfda/68747470733a2f2f63692e6170707665796f722e636f6d2f6170692f70726f6a656374732f7374617475732f79786d30756e69696e38787479346a372f6272616e63682f6d61737465723f7376673d74727565) |
| PyPI | ![Build Status](https://camo.githubusercontent.com/e0c1ec815b5c27e2ddc39cb0fc58e998e4747084193b64adb0cec0c8602599d0/68747470733a2f2f7472617669732d63692e636f6d2f7975616e6d696e672d68752f7461696368692d776865656c732d746573742e7376673f6272616e63683d6d6173746572) | ![Build Status](https://camo.githubusercontent.com/e0c1ec815b5c27e2ddc39cb0fc58e998e4747084193b64adb0cec0c8602599d0/68747470733a2f2f7472617669732d63692e636f6d2f7975616e6d696e672d68752f7461696368692d776865656c732d746573742e7376673f6272616e63683d6d6173746572) | ![Build status](https://camo.githubusercontent.com/b07b62defe745022ad000be430bd0949915b2383f16fd800b0d1a2da0bfe9828/68747470733a2f2f63692e6170707665796f722e636f6d2f6170692f70726f6a656374732f7374617475732f3339617239776138796434396a65376f3f7376673d74727565) |
- (Mar 3, 2020) v0.5.6 released:
  - Fixed runtime LLVM bitcode loading failure on Linux
  - Fixed a GUI bug in `ti.GUI.line` (by Mingkuan Xu [xumingkuan])
  - Fixed frontend syntax error false positive (static range-fors) (by Mingkuan Xu [xumingkuan])
  - `arch=ti.arm64` is now supported. (Please build from source)
  - CUDA supported on NVIDIA Jetson. (Please build from source)
- (Mar 2, 2020) v0.5.5 released: Experimental CUDA 10.0/10.1 support on Windows. Feedback is welcome!
- (Mar 1, 2020) v0.5.4 released
  - Metal backend now supports < 32bit args (#530) (by Ye Kuang [k-ye])
  - Added `ti.imread/imwrite/imshow` for convenient image IO (by Yubin Peng [archibate])
  - `ti.GUI.set_image` now takes all numpy unsigned integer types (by Yubin Peng [archibate])
  - Bug fix: Make sure the size of KernelTemplateMapper's extractors is the same as the number of args (by Ye Kuang [k-ye])
  - Avoid duplicate evaluations in chained comparisons (such as `1 < ti.append(...) < 3 < 4`) (by Mingkuan Xu [xumingkuan])
  - Frontend kernel/function structure checking (#544) (by Mingkuan Xu [xumingkuan])
  - Throw an exception instead of SIGABRT to obtain a RuntimeError in Python-scope (by Yubin Peng [archibate])
  - Mark sync bit only after running a kernel on GPU (by Ye Kuang [k-ye])
  - `@ti.classkernel` is deprecated. Always use `ti.kernel`, whether or not you are decorating a class member function (by Ye Kuang [k-ye])
  - Fix ti.func AST transform (due to locals() not saving compile result) #538, #539 (by Yubin Peng [archibate])
  - Add a KernelSimplicityASTChecker to ensure grad kernel is compliant (#553) (by Ye Kuang [k-ye])
  - Fixed MSVC C++ mangling which leads to unsupported characters in LLVM NVPTX ASM printer
  - CUDA unified memory dependency is now removed. Set `TI_USE_UNIFIED_MEMORY=0` to disable unified memory usage
  - Improved `ti.GUI.line` performance
  - (For developers) compiler significantly refactored and folder structure reorganized
- (Feb 25, 2020) v0.5.3 released
  - Better error message when trying to declare tensors after kernel invocation (by Yubin Peng [archibate])
  - Logging: `ti.warning` renamed to `ti.warn`
  - Arch: `ti.x86_64` renamed to `ti.x64`. `ti.x86_64` is deprecated and will be removed in a future release
  - (For developers) Improved runtime bitcode compilation thread safety (by Yubin Peng [archibate])
  - Improved OS X GUI performance (by Ye Kuang [k-ye])
  - Experimental support for new integer types `u8, i8, u16, i16, u32` (by Yubin Peng [archibate])
  - Update doc (by Ye Kuang [k-ye])
- (Feb 20, 2020) v0.5.2 released
  - Gradients for `ti.pow` now supported (by Yubin Peng [archibate])
  - Multi-threaded unit testing (by Yubin Peng [archibate])
  - Fixed Taichi crashing when starting multiple instances simultaneously (by Yubin Peng [archibate])
  - Metal backend now supports `ti.pow` (by Ye Kuang [k-ye])
  - Better algebraic simplification (by Mingkuan Xu [xumingkuan])
  - `ti.normalized` now optionally takes an argument `eps` to prevent division by zero in differentiable programming
  - Improved random number generation by decorrelating PRNG streams on CUDA
  - Set environment variable `TI_LOG_LEVEL` to `trace`, `debug`, `info`, `warn`, or `error` to filter out/increase verbosity. Default: `info`
  - [bug fix] fixed a loud failure on differentiable programming code generation due to a new optimization pass
  - Added `ti.GUI.triangle` example
  - Doc update: added `ti.cross` for 3D cross products
  - Use environment variable `TI_TEST_THREADS` to override testing threads
  - [For Taichi developers, bug fix] `ti.init(print_processed=True)` renamed to `ti.init(print_preprocessed=True)`
  - Various development infrastructure improvements by Yubin Peng [archibate]
  - Official Python3.6 - Python3.8 packages on OS X (by wYw [Detavern])
- (Feb 16, 2020) v0.5.1 released
  - Keyboard and mouse events supported in the GUI system. Check out mpm128.py for an interactive demo! (by Yubin Peng [archibate] and Ye Kuang [k-ye])
  - Basic algebraic simplification passes (by Mingkuan Xu [xumingkuan])
  - (For developers) `ti` (`ti.exe`) command supported on Windows after setting `%PATH%` correctly (by Mingkuan Xu [xumingkuan])
  - General power operator `x ** y` now supported in Taichi kernels (by Yubin Peng [archibate])
  - `.dense(...).pointer()` now abbreviated as `.pointer(...)`. `pointer` now stands for a dense pointer array. This leads to cleaner code and better performance. (by Kenneth Lozes [KLozes])
  - (Advanced struct-fors only) `for i in X` now iterates all child instances of `X` instead of `X` itself. Skip this if you only use `X=leaf node` such as `ti.f32/i32/Vector/Matrix`.
  - Fixed CUDA random number generator race conditions
- (Feb 14, 2020) v0.5.0 released with a new Apple Metal GPU backend for Mac OS X users! (by Ye Kuang [k-ye])
  - Just initialize your program with `ti.init(..., arch=ti.metal)` and run Taichi on your Mac GPUs!
  - A few takeaways if you do want to use the Metal backend:
    - For now, the Metal backend only supports `dense` SNodes and 32-bit data types. It doesn't support `ti.random()` or `print()`.
    - Pre-2015 models may encounter undefined behavior under certain conditions (e.g. read-after-write). According to our tests, the memory order on a single GPU thread can become inconsistent on these models.
    - The `[]` operator in Python is slow in the current implementation. If you need to do a large number of reads, consider dumping all the data to a `numpy` array via `to_numpy()` as a workaround. For writes, consider first generating the data into a `numpy` array, then copying that to the Taichi variables as a whole.
    - Do NOT expect a performance boost yet; we are still profiling and tuning the new backend. (So far we have only seen a big performance improvement on a 2015 MBP 13-inch model.)
- Full changelog
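
Several of the release notes above mention environment variables: `TI_LOG_LEVEL` and `TI_TEST_THREADS` (v0.5.2) and `TI_USE_UNIFIED_MEMORY` (v0.5.4). The sketch below shows one way to collect them in an environment mapping from Python. The helper `taichi_env` is a hypothetical name, not a Taichi API, and the sketch assumes that the variables have to be in place before Taichi starts, which the notes do not state explicitly.

```python
import os

# Log levels listed in the v0.5.2 notes; "info" is the documented default.
VALID_LOG_LEVELS = ("trace", "debug", "info", "warn", "error")


def taichi_env(log_level="info", test_threads=None, unified_memory=True, base=None):
    """Return a copy of an environment mapping with Taichi variables set.

    Hypothetical helper (not part of Taichi). `base` defaults to os.environ.
    """
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError("log_level must be one of " + ", ".join(VALID_LOG_LEVELS))
    env = dict(os.environ if base is None else base)
    env["TI_LOG_LEVEL"] = log_level
    if test_threads is not None:
        # Overrides the number of threads used for testing.
        env["TI_TEST_THREADS"] = str(int(test_threads))
    if not unified_memory:
        # The notes say that 0 disables CUDA unified memory usage.
        env["TI_USE_UNIFIED_MEMORY"] = "0"
    return env


if __name__ == "__main__":
    env = taichi_env("debug", test_threads=4, unified_memory=False, base={})
    for key in sorted(env):
        print(key + "=" + env[key])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching a script that uses Taichi, which keeps the settings out of the parent process.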
- (Done) Fully implement the LLVM backend to replace the legacy source-to-source C++/CUDA backends (By Dec 2019)
  - The only missing features compared to the old source-to-source backends:
    - Vectorization on CPUs. Given that most users who want performance are using GPUs (CUDA), this is given low priority.
    - Automatic shared memory utilization. Postponed until Feb/March 2020.
- (Done) Redesign & reimplement (GPU) memory allocator (by the end of Jan 2020)
- (WIP) Tune the performance of the LLVM backend to match that of the legacy source-to-source backends (Hopefully by Feb, 2020. Current progress: setting up/tuning for final benchmarks)
- (SIGGRAPH Asia 2019) High-Performance Computation on Sparse Data Structures [Video] [BibTex]
  - by Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand
- (ICLR 2020) Differentiable Programming for Physical Simulation [Video] [BibTex] [Code]
  - by Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand