mitsuba-renderer / enoki
Enoki: structured vectorization and differentiation on modern processor architectures
License: Other
For example, suppose I have an int* that points to an int array.
How can I efficiently convert that int array into a FloatC/FloatD, which is a CUDA array in Enoki? I don't think it is a good idea to scatter_add each element of the int array into an Enoki CUDA array.
Thanks
Supporting other backends: DirectX HLSL / DirectCompute, OpenCL, OpenGL compute, ...
Could you explain what is needed to implement another GPUArray (CLArray, DxArray, ...), or a GPUArray that can target CUDA, OpenCL, ... at compile time?
Hello,
All tests pass, but I can't compile the following code using Visual Studio 2017 / AVX2:
template <typename Value>
Value srgb_gamma(Value x) {
return enoki::select(
x <= 0.0031308f,
x * 12.92f,
enoki::pow(x * 1.055f, 1.f / 2.4f) - 0.055f
);
}
using ColorP = enoki::Array<float, 16>;
ColorP input = /* ... */;
ColorP output = srgb_gamma(input);
I get this error:
1> c:\code\vsprojects\enoki\include\enoki\array_math.h(962): error C2672: 'enoki::low': no matching overloaded function found
1> c:\code\vsprojects\enoki_test\enoki_test\main.cpp(94): note: see reference to function template instantiation 'auto enoki::pow<false,Derived_,float>(const T1 &,const T2 &)' being compiled
1> with
1> [
1> Derived_=enoki::Array<float,16,true,enoki::RoundingMode::Default>,
1> T1=enoki::Array<float,16,true,enoki::RoundingMode::Default>,
1> T2=float
1> ]
1> c:\code\vsprojects\enoki_test\enoki_test\main.cpp(165): note: see reference to function template instantiation 'Value srgb_gamma<T>(Value)' being compiled
1> with
1> [
1> Value=ColorP,
1> T=ColorP
1> ]
1> c:\code\vsprojects\enoki\include\enoki\array_traits.h(151): error C2783: 'auto enoki::low(const Array &)': could not deduce template argument for '__formal'
...
...
Here, if enoki::Array<float, 16> is replaced by enoki::Array<float, 8>, there is no error.
On the other hand, the below version of code is successfully compiled (explicitly creating Value(1.f / 2.4f)):
template <typename Value>
Value srgb_gamma(Value x) {
return enoki::select(
x <= 0.0031308f,
x * 12.92f,
enoki::pow(x * 1.055f, Value(1.f / 2.4f)) - 0.055f
);
}
using ColorP = enoki::Array<float, 16>;
ColorP input = /* ... */;
ColorP output = srgb_gamma(input);
This may be because the MSVC compiler can't resolve this kind of function overload in the current code.
Hi,
I have a 100GB file of floats that I have memory mapped, what would be the recommended pattern for doing things like finding the min and max value or computing a histogram? It seems like DynamicArray is the thing to use but it assumes ownership of the array. I could loop over fixed size chunks and load<>() them into an Array but then I need to deal with the boundary condition at the end if the dataset isn't a multiple of the Array size. What would be your suggestions for this scenario?
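One common way to deal with the boundary condition, sketched here in plain Python rather than with Enoki itself: run the wide operation over all full chunks, then finish the leftover elements with a scalar loop. The function name chunked_min_max and the chunk width of 8 are assumptions made for illustration; each slice stands in for what would be one packet load in C++.

```python
def chunked_min_max(values, chunk=8):
    """Return (min, max) of a non-empty sequence, scanning it in fixed-size chunks."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    full = len(values) - len(values) % chunk  # length covered by whole chunks
    lo, hi = float("inf"), float("-inf")
    # Main loop: each slice stands in for one packet-wide load.
    for start in range(0, full, chunk):
        block = values[start:start + chunk]
        lo = min(lo, min(block))
        hi = max(hi, max(block))
    # Tail loop: the leftover elements are handled one at a time.
    for x in values[full:]:
        lo = min(lo, x)
        hi = max(hi, x)
    return lo, hi
```

The same split (full chunks first, scalar tail last) carries over to a histogram: only the per-element update changes.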
Hi, I'm trying to use enoki, but it cannot compile the following example with cmake.
hello_enoki.cpp:
#include <enoki/array.h>
#include <string>
#include <iostream>
using namespace enoki;
using StrArray = Array<std::string, 2>;
int main(int argc, char **argv) {
StrArray x("Hello ", "How are "), y("world!", "you?");
std::cout << x + y << std::endl;
return 0;
}
CMakeLists.txt:
cmake_minimum_required(VERSION 2.8.12)
project(mytest)
# C++17
include(CheckCXXCompilerFlag)
if (CMAKE_CXX_COMPILER_ID MATCHES "^(GNU|Clang|Emscripten|Intel)$")
CHECK_CXX_COMPILER_FLAG("-std=c++17" HAS_CPP17_FLAG)
if (HAS_CPP17_FLAG)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17")
else()
CHECK_CXX_COMPILER_FLAG("-std=c++1z" HAS_CPP1Z_FLAG)
if (HAS_CPP1Z_FLAG)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++1z")
else()
message(FATAL_ERROR "Unsupported compiler -- nanogui requires C++17 support!")
endif()
endif()
elseif(MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /std:c++17")
endif()
# Enoki
add_subdirectory(enoki)
enoki_set_compile_flags()
enoki_set_native_flags()
include_directories(enoki/include)
add_executable(mytest hello_enoki.cpp)
Any advice? thanks!
After following the Enoki documentation on GPU Arrays
(https://enoki.readthedocs.io/en/master/gpu.html#)
cd <path-to-enoki>
mkdir build
cmake -DENOKI_CUDA=ON -DENOKI_AUTODIFF=ON -DENOKI_PYTHON=ON ..
make
In python:
>>> from enoki import *
I find that only a few names are imported; FloatC and cuda_set_log_level are missing.
['CPUBuffer',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'__version__',
'allclose',
'arange',
'core',
'e',
'empty',
'full',
'inf',
'linspace',
'nan',
'pi',
'zero']
Did I miss something? Or is the code only for demonstration?
Thanks!
This library seems to use the following conventions:
(*) Ensures fast matrix-column-vector multiplications.
Any possibility of providing support for the exact opposites of these as well:
(**) Ensures fast row-vector-matrix multiplications.
Consider the following code snippet. The program prints [10, 20, 30, 40, 5, 6, 7, 8] four times if it is compiled "out of the box". However, as soon as I specify -mavx2, -march=native, or the like, the output is [1, 2, 3, 4, 5, 6, 7, 8] for the first three prints. The fourth one works as expected, though.
#include <iostream>
#include <enoki/array.h>
using namespace enoki;
int main() {
auto print = [](auto x) { std::cout << x << '\n'; };
using Arr = Array<int, 8>;
using M = mask_t<Arr>;
M m{1,1,1,1,0,0,0,0};
{
Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
masked(a, m) *= 10;
std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
}
{
Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
a = enoki::select(m, a * 10, a);
std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
}
{
Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
a[m] *= 10;
std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
}
{
Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
a[m > 0] *= 10;
std::cout << a << std::endl; // <- OK: prints [10, 20, 30, 40, 5, 6, 7, 8]
}
return 0;
}
I've tested gcc-7.4, clang-7 and clang-9 on ubuntu 18.04.
Here's the CMakeLists.txt I'm using:
cmake_minimum_required(VERSION 3.15)
project(enoki_test)
set(CMAKE_CXX_STANDARD 17)
add_executable(enoki_test main.cpp)
target_include_directories(enoki_test PRIVATE ../enoki/include)
set(CMAKE_CXX_FLAGS "-mavx2")
Any idea how to fix this?
In Python, ek.select(True, 0.1, 0.2) outputs [1] of type scalar.Vector1m.
Adding the following binding code in src/python/scalar.cpp solves the bug.
m.def("select", [](bool a, Float b, Float c) {
return enoki::select(a, b, c);
});
Please verify whether this is the correct way to add a scalar binding for the select function.
I clone the repository recursively, as suggested by the documentation:
$ git clone --recursive https://github.com/mitsuba-renderer/enoki
Cloning into 'enoki'...
...
Cloning into '/home/bram/src/enoki/ext/cub'...
...
Cloning into '/home/bram/src/enoki/ext/pybind11'...
...
Cloning into '/home/bram/src/enoki/ext/pybind11/tools/clang'...
...
I then call cmake
$ cd enoki
$ mkdir build
$ cd build
$ CXX=clang++-8 CC=clang-8 cmake ../
-- The CXX compiler identification is Clang 8.0.0
-- Check for working CXX compiler: /usr/bin/clang++-8
-- Check for working CXX compiler: /usr/bin/clang++-8 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Enoki: using libc++.
-- Found Sphinx: /usr/bin/sphinx-build
-- Configuring done
-- Generating done
-- Build files have been written to: /home/bram/src/enoki/build
When I then try to make, nothing gets built.
$ make
$
I expect at least the tests to get built with that.
This is on Ubuntu 18.04.4 LTS
if desired, it can compile the same source code to multiple different implementations
The above line from README suggests this but I couldn't find examples/tutorials/other references regarding the process to do so. Would it be possible to add more information regarding this?
I would also like to generate multiple versions (scalar, avx512, avx2, sse, cuda) to benchmark them (since sometimes avx drops the clock speed and can actually hurt performance in multi-threaded applications). Would it be possible to do this in the same binary?
In c++
FloatC data= (enoki::arange(8)) * 0.f;
FloatC data2 = (enoki::arange(8 * 2)) * 1.f;
How can I do something like:
data = data + data2;
or
data = data + data2[0:8] + data2[8:16];
Thanks!!
clang++-8 -I ../../src/ThreadTracer -I ../../src/enoki/include -mavx2 -O2 -g -std=c++17 try.cpp ../../src/ThreadTracer/threadtracer.o -o try
In file included from try.cpp:2:
In file included from ../../src/enoki/include/enoki/array.h:47:
../../src/enoki/include/enoki/array_avx.h:384:35: error: always_inline function '_mm256_fnmadd_ps' requires target
feature 'fma', but would be inlined into function 'rsqrt_' that is compiled without support for 'fma'
r = _mm256_mul_ps(_mm256_fnmadd_ps(t1, r, c1), t0);
$ clang++-8 --version
clang version 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
NOTE: error goes away by:
Hi,
May I ask how to define an empty mask or array?
for example:
MaskC c{true, true, true, true, true};
Can I write it as something like:
MaskC c = MaskC(value = true, size = 5);
I am trying to auto-diff using DiffArray<FloatX> defined below, instead of the usual DiffArray<CudaArray<float>>:
#include <enoki/dynamic.h>
#include <enoki/autodiff.h>
using namespace enoki;
using Float = float;
// not working:
using FloatP = Packet<Float>;
using FloatX = DynamicArray<FloatP>;
using FloatD = DiffArray<FloatX>;
// working:
// using FloatD = DiffArray<Float>;
int main()
{
FloatD x = 1.f;
set_requires_gradient(x);
FloatD y = 10.f * x;
backward(y);
std::cout << y << std::endl;
std::cout << gradient(x) << std::endl;
}
Compiled with Clang++-9 on Ubuntu 18.04, linked with libenoki-autodiff.so, libenoki-cuda.so and cuda.so (the last two may not be needed; just adding them FYI). This gives:
/tests/enoki/CMakeFiles/test_examples.dir/examples.cpp.o: In function `main':
examples.cpp:(.text+0x30): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::get()'
examples.cpp:(.text+0x3f): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::append_leaf(unsigned long)'
examples.cpp:(.text+0x7f): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::mul_(enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > > const&) const'
examples.cpp:(.text+0x89): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x9a): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::backward(unsigned int, bool)'
examples.cpp:(.text+0x104): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::gradient(unsigned int)'
examples.cpp:(.text+0x166): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x16e): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x1cd): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x1d5): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
With using FloatD = DiffArray<Float>; instead, it works. How can I fix this link error for the DiffArray<FloatX> defined above?
Hello,
I have a use case in which I iterate over a const range, apply some transformation to it, and store the result in another range. The original range is const float.
I modelled this as:
using FloatArray = enoki::Array<float, 8, true, enoki::RoundingMode::Default>;
enoki::DynamicArray<const FloatArray> input;
enoki::DynamicArray<FloatArray> destination;
and I do DynamicArray::map over my float ranges like this:
void foo(const float* input, size_t s) {
input = enoki::DynamicArray<const FloatArray>(input, s);
}
However, the signature of map is:
static Derived map(void *ptr, size_t size) {
and I get a compiler error that I am casting away my constness.
If I change this to:
template <typename T>
static Derived map(T *ptr, size_t size) {
everything works out since now template type T has constness in it.
Would you be open to accept this change as a PR?
I wrote a simple test to try Enoki. However, I am unable to perform simple comparison operations due to type differences. The documentation states that the return type of operator< and neq is mask_t<Array>. However, the types of the result1 and result2 variables in the following code are different.
import enoki as ek
def myfunc(arr1, arr2):
result1 = ek.dot(arr1, arr1) < 0
result2 = ek.neq(arr2, 0)
print(type(result1), type(result2))
return result1, result2
def test_scalar():
from enoki import scalar
arr1 = scalar.Vector1f([1])
arr2 = scalar.Vector1f([2])
res = myfunc(arr1, arr2)
def test_cuda():
from enoki import cuda
arr1 = cuda.Vector1f([1])
arr2 = cuda.Vector1f([2])
res = myfunc(arr1, arr2)
if __name__ == '__main__':
test_scalar()
test_cuda()
The output in scalar mode is shown below. As far as I understand, this is because the output of the dot operation is converted to py::float. Is there a way to perform the comparison without explicitly casting to bool in this case?
<class 'bool'> <class 'enoki.scalar.Vector1m'>
The output in CUDA mode is shown below. The difference between these types is unclear to me. Could you kindly give more details?
<class 'enoki.cuda.Mask'> <class 'enoki.cuda.Vector1m'>
Hey all!
First of all thank you very much for publishing/releasing mitsuba2!
I wanted to start experimenting with inverse rendering and tried multiple platforms (Google Colab and my own hardware), but I keep facing the exact same issue everywhere:
import mitsuba
mitsuba.set_variant('gpu_autodiff_rgb')
# The C++ type associated with 'Float' is enoki::DiffArray<enoki::CUDAArray<float>>
from mitsuba.core import Float
import enoki as ek
# Initialize a dynamic CUDA floating point array with some values
x = Float([1, 2, 3])
# Tell Enoki that we'll later be interested in gradients of
# an as-of-yet unspecified objective function with respect to 'x'
ek.set_requires_gradient(x)
# Example objective function: sum of squares
y = ek.hsum(x * x)
PTX linker error:
ptxas fatal : SM version specified by .target is higher than default SM version assumed
cuda_check(): driver API error = 0400 "CUDA_ERROR_INVALID_HANDLE" in ../ext/enoki/src/cuda/jit.cu:253.
I've tried different GPUs and the results are:
| GPU | Driver version | CUDA version | Result | Compute capability |
|---|---|---|---|---|
| GeForce 940M | 440.64 | 10.0.130 | Fails | 5.0 |
| K80 | 418.67 | 10.0.130 | Fails | 3.7 |
| Tesla P4 | 418.67 | 10.0.130 | WORKS | 6.1 |
| P100 | 418.67 | 10.0.130 | Fails | 6.0 |
-> The weird thing is that the issue does not occur on a Tesla P4 but it does on all the others
Does anyone have an idea what can cause this and how I can fix it?
Thanks a lot! Pieterjan
Slicing a dynamically-sized mask (mask_t<FloatX>) returns a float & (see Example 1). This makes sense given that masks are stored using their underlying type's registers and slice needs to return a reference.
But then, in the slicing operator defined by ENOKI_STRUCT_DYNAMIC,
template <typename T> \
static ENOKI_INLINE auto slice(T &&value, size_t index) { \
constexpr static bool co_ = std::is_const< \
std::remove_reference_t<T>>::value; \
using Value = Struct<decltype(enoki::slice(std::declval< \
std::conditional_t<co_, const Args &, Args &>>(), index))...>; \
return Value{ ENOKI_MAP_EXPR_F2(enoki::slice, value, index, \
__VA_ARGS__) }; \
}
the following becomes problematic:
MyStruct(
slice(value.arg1, index), ...,
// Trying to initialize a `mask_t<Float &>` with a `Float &`
slice(value.some_mask, index)
)
Would there be a way to initialize the mask with a reference to the underlying storage directly?
This problem occurs in Mitsuba 2, see Example 2.
#include <iostream>
#include <vector>
#include <enoki/array.h>
using namespace enoki;
namespace {
constexpr size_t PacketSize = enoki::max_packet_size / sizeof(float);
using Float = float;
using FloatP = Packet<Float, 4>;
using FloatX = DynamicArray<FloatP>;
template <typename T>
ENOKI_NOINLINE void print(const T &val) {
std::cout << val << std::endl;
}
} // namespace
int main() {
mask_t<FloatX> masks;
set_slices(masks, 4);
masks = false; masks[1] = true;
auto mask = slice(masks, 1);
print(mask);
print(typeid(mask).name());
print(masks.coeff(0));
print(masks.coeff(1));
print(typeid(masks.coeff(1)).name());
return 0;
}
Result:
1
f
0
1
f
Usage in Mitsuba 2 that triggers this issue:
// records.h
ENOKI_STRUCT_DYNAMIC(mitsuba::PositionSample, ...)
// Example usage that would trigger compilation error
Position3fX pos;
auto p = slice(pos, 1);
// Actual usage: python/records.cpp
bind_slicing_operators<PositionSample<Point3fX>>();
I really like Enoki's template design, which supports multiple platforms and autodiff. I want to use it as the base of my fluid simulation code, so I ran a small efficiency test:
std::array<float, 3> srgb_gamma(std::array<float, 3> x) {
std::array<float, 3> result;
for (int i = 0; i < 3; i++) {
if (x[i] <= 0.0031308f)
result[i] = x[i] * 12.92f;
else
result[i] = std::pow(x[i] * 1.055f, 1.f / 2.4f) - 0.055f;
}
return result;
}
I wrote this function by hand and compared it with the code given in the tutorial. Looping 10000 times, my version is 100x faster than Enoki without -msse4 and 20x faster with -msse4. I can't figure out why. Am I missing a compile flag?
e.g.
FloatD cuda_diff = ...;
FloatC cuda = cuda_diff.val();
or
Vector3fD vec_diff = ...;
Vector3fC vec = vec_diff.val();
I am rendering a gradient image using Enoki CUDA arrays. Is there any suggestion on how to pass the C++ CUDA arrays FloatC and FloatD (or vectors of them) to Python so that I can call backward in Python for optimization? I didn't see a binding for that in enoki/python.h.
For example:
FloatC arr = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f};
MaskC msk = {0, 0, 1, 1, 0};
FloatC filtered_arr = do_something(arr, msk);
filtered_arr is {0.0f, 0.0f, 2.0f, 3.0f, 0.0f};
or
FloatC filtered_arr = do_something2(5.0f, arr, msk);
filtered_arr is {5.0f, 5.0f, 2.0f, 3.0f, 5.0f};
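The behavior asked for here matches the element-wise select(mask, value_if_true, value_if_false) pattern that other code on this page uses via enoki::select. The following is a plain-Python sketch of those semantics only; the helper below is a hypothetical stand-in, and whether enoki::select accepts FloatC/MaskC in exactly this form is not confirmed here.

```python
def select(mask, if_true, if_false):
    """Element-wise choice: if_true[i] where mask[i] is set, else if_false[i].

    Scalars (int/float) are broadcast to the length of the mask.
    """
    def broadcast(v):
        return [v] * len(mask) if isinstance(v, (int, float)) else list(v)

    t, f = broadcast(if_true), broadcast(if_false)
    return [a if m else b for m, a, b in zip(mask, t, f)]


arr = [0.0, 1.0, 2.0, 3.0, 4.0]
msk = [0, 0, 1, 1, 0]
filtered_zero = select(msk, arr, 0.0)  # [0.0, 0.0, 2.0, 3.0, 0.0]
filtered_five = select(msk, arr, 5.0)  # [5.0, 5.0, 2.0, 3.0, 5.0]
```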
Hello,
As the documentation says, 3D arrays are treated as 4D arrays to make better use of intrinsics, but this raises an interesting problem.
Consider the following code:
using Array = enoki::Array<float, 3>;
Array numerator{2, 4, 8};
Array denominator{1, 1, 1};
auto result = numerator / denominator;
if (std::fetestexcept(FE_INVALID)) {
throw std::runtime_error("domain error");
}
This throws the exception because the unused last entry of the register is initialized to 0, and this leads to a division by zero. Note that this is not limited to division: any operation on the last entry (things like min, max) also triggers that exception.
We do a bunch of floating point computation and like to keep the floating point exception check to verify we didn't mess up.
I was wondering what is your suggestion to handle cases like this?
Hi,
I would like to dereference a packet of pointers, something like this:
using PtrP = enoki::Packet<uintptr_t, 8>;
using Uint16P = enoki::Packet<uint16_t, 8>;
PtrP p = fun();
Uint16P i = *p;
But this code does not compile.
Next, I thought that enoki::gather<>() would fit this situation:
Uint16P i = enoki::gather<Uint16P>(p,0);
But it doesn't return the desired result.
Maybe gather<> assumes that the pointer is not a packet.
Any solution?
Thanks.
Could Enoki be used to to apply a function to a grid when the function requires access to data points and their neighbors (typically when computing a numerical scheme) ? And what would be the preferred method ?
Should I extract the neighbors manually with a loop (in order to build one array by neighbor position) followed with the function application ?
Or is there a better way ? Maybe using a precomputed array of neighbours index ?
An example would be great as, if it is efficiently doable, that would be a great use case for Enoki.
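The "precomputed array of neighbor indices" idea can be sketched as follows, in plain Python and independent of Enoki: build one index array per neighbor position once, gather the neighbor values through them, and then apply the scheme element-wise. The function names, the clamped border handling, and the choice of a 1D second difference are assumptions made for illustration.

```python
def neighbor_indices(n):
    """Index arrays of the left and right neighbors on a 1D grid, clamped at the borders."""
    left = [max(i - 1, 0) for i in range(n)]
    right = [min(i + 1, n - 1) for i in range(n)]
    return left, right


def laplacian_1d(u):
    """Second difference u[i-1] - 2*u[i] + u[i+1], built from gathered neighbor arrays."""
    left, right = neighbor_indices(len(u))
    u_left = [u[i] for i in left]    # gather through the precomputed indices
    u_right = [u[i] for i in right]
    return [l - 2.0 * c + r for l, c, r in zip(u_left, u, u_right)]
```

Because the index arrays depend only on the grid shape, they can be computed once and reused for every time step.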
Is it possible to seamlessly use a normal CPU threading library together with this library?
Or should the array object basically be treated as sequential only, thread-private object?
Hello,
First, thanks for making this public.
While reading the documentation, I noticed this:
Performing an aligned load from an unaligned memory address will cause a general protection fault that immediately terminates the application.
and indeed, performing an AVX-512 load from an unaligned address causes the application to segfault. Would it be possible to add assert(ptr % n == 0) whenever you use ENOKI_ASSUME_ALIGNED?
e.g.
static ENOKI_INLINE Derived load_(const void *ptr) {
return _mm512_load_ps((const Value *) ENOKI_ASSUME_ALIGNED(ptr, 64));
}
to
static ENOKI_INLINE Derived load_(const void *ptr) {
assert((uintptr_t) ptr % 64 == 0);
return _mm512_load_ps((const Value *) ENOKI_ASSUME_ALIGNED(ptr, 64));
}
This catches the problem in debug builds.
Is there a good method to copy a FloatC into a cpu array?
I am currently using PyTorch's .cpu(), but it seems it can be very slow if the graph is too complex.
Is there any better way either in c++ or python?
I tried
float arr[4] = { 10.f, 20.f, 30.f, 40.f };
float* ptr;
ptr = arr;
FloatC arr_enoki = load<float>(ptr);
std::cout << "see if this is working" << std::endl;
std::cout << arr_enoki << std::endl; // [10]
I am importing Enoki in Python and using dynamic arrays. I find that it only uses one CPU core.
Should I add multithreading myself, or does Enoki support multi-core parallelism?
I have multiple GPUs. Can I specify which GPU Enoki uses?
Thanks for your help!
for example
A = [1, 2, 3, 4, 5]
I = [0, 1, 1, 2]
then A[I] = [A[0], A[1], A[1], A[2]] = [1, 2, 2, 3]
or
FloatD some = {2.3f, 3.4f, 4.5f, 5.6f, 6.7f, 7.8f};
IntC index = {1,1,3,3,5};
FloatD check = some[index];
value of check should be: [3.4f, 3.4f, 5.6f, 5.6f, 7.8f]
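The indexing described above is a gather: the output has one entry per index, and indices may repeat. Here is a plain-Python sketch of those semantics; the helper is a hypothetical stand-in, not the Enoki function. enoki::gather<>() is mentioned in another question on this page, but whether it applies to FloatD/IntC in this form, and how gradients propagate through it, is not confirmed here.

```python
def gather(source, index):
    """Return [source[i] for each i in index]; indices may repeat."""
    return [source[i] for i in index]


a = [1, 2, 3, 4, 5]
picked = gather(a, [0, 1, 1, 2])       # [1, 2, 2, 3]

some = [2.3, 3.4, 4.5, 5.6, 6.7, 7.8]
check = gather(some, [1, 1, 3, 3, 5])  # [3.4, 3.4, 5.6, 5.6, 7.8]
```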
Trying to do this:
FloatD arr = {1.f,3.f,2.f,5.f,6.f,2.f,6.f};
FloatD t = 0.0f;
set_requires_gradient(t);
arr[2] = arr[2] + t
(error: lvalue required as left operand of assignment)
Hello,
I have started using enoki in my project. Right now, I have basically cloned the entire repo into a subfolder and included enoki in my include paths. (using old cmake way).
It would be really nice to make it cmake installable. (Internally we use conan as package manager, and making a conan recipe of a project which is cmake installable is straightforward).
Essentially it would be nice if we can do this:
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/my/path/ -G Ninja ..
ninja
ninja install
I don't know enough cmake to do this myself though :/
PS: Some info: http://mariobadr.com/creating-a-header-only-library-with-cmake.html
I really like your design! Do you have any benchmarks available for example problems like the graphs in Figure 8 of the Stan math paper?
The question might be a little confusing, but it is related to rendering an image on the GPU. Is there any suggestion on how to do that?
For example, if I want to render a 600*800 image with 32 spp, is it wise to create a CUDA array of size 600*800*32 (assuming the GPU can handle that) and then average it down into an array of size 600*800? Is there any function to do that?
Also, the CUDA array has a gradient.
Thanks
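The averaging step itself is a segmented reduction. The sketch below shows it in plain Python under one assumption about memory layout: all spp samples of a pixel are stored contiguously, so pixel p owns samples[p*spp : (p+1)*spp]. The function name average_samples is hypothetical, and this says nothing about how gradients would be tracked in Enoki.

```python
def average_samples(samples, spp):
    """Average each run of `spp` consecutive samples into one pixel value."""
    if spp <= 0 or len(samples) % spp != 0:
        raise ValueError("sample count must be a positive multiple of spp")
    return [sum(samples[i:i + spp]) / spp for i in range(0, len(samples), spp)]
```

For the case in the question, the input would hold 600*800*32 values and the result 600*800 values.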
I can't use Enoki (release 0.1) with VC2019.
Here is the smallest reproduction code:
enoki::Packet<float, 8> x(1.0f);
enoki::pow(x, x);
Thanks.
I came across a potential Mask and/or select bug in the Mitsuba 2 codebase. Here is an MVE:
#include <iostream>
#include <enoki/array.h>
using namespace enoki;
namespace {
using Float = float;
constexpr size_t PacketSize = enoki::max_packet_size / sizeof(Float);
using Point4f = Packet<Float, 4>;
using MyMask = mask_t<Point4f>;
template <typename T>
void print(const T &val) {
std::cout << val << std::endl;
}
} // namespace
int main() {
MyMask m1(true);
MyMask m2(true | true);
print("--- These two masks look identical:");
print(m1);
print(m2);
print("--- But maybe they aren't?");
print(m1 & m2);
print(all(eq(m1, m2)));
print("--- Using scalars (for reference):");
print(select(true, 1.0f, 0.0f));
print(select(true | true, 1.0f, 0.0f));
print("--- Now, using packets:");
print(select(m1, Point4f(1.0f), Point4f(0.0f)));
print(select(m2, Point4f(1.0f), Point4f(0.0f))); // Unexpected result here.
}
Running it outputs:
--- These two masks look identical:
[1, 1, 1, 1]
[1, 1, 1, 1]
--- But maybe they aren't?
[1, 1, 1, 1]
0
--- Using scalars (for reference):
1
1
--- Now, using packets:
[1, 1, 1, 1]
[0, 0, 0, 0]
Note how the last line's result is unexpected.
Inspecting the masks in LLDB, they indeed look different:
(lldb) p m1
((anonymous namespace)::MyMask) $0 = {
enoki::StaticMaskImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default>, void> = {
enoki::StaticArrayImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default> > = {
m = (NaN, NaN, NaN, NaN)
}
}
}
(lldb) p m2
((anonymous namespace)::MyMask) $1 = {
enoki::StaticMaskImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default>, void> = {
enoki::StaticArrayImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default> > = {
m = (0.00000000000000000000000000000000000000000000140129846, 0.00000000000000000000000000000000000000000000140129846, 0.00000000000000000000000000000000000000000000140129846, 0.00000000000000000000000000000000000000000000140129846)
}
}
}
I think that LLDB's printers don't print the mask entries correctly anyway, but at least we can confirm that they are different.
It would be great to have some instructions (using cmake) to build and run a trivial c++ example to get started, is there any?
For example a simple scalar CPU example of the Automatic differentiation as described in https://enoki.readthedocs.io/en/master/demo.html
(without requiring a complicated test framework or other dependencies)
I tried running cmake, only got a 'mkdoc' project (aside from ALL_BUILD, ZERO_CHECK) that fails
(1>ImportError: No module named 'guzzle_sphinx_theme'), and I'm likely not interested in creating docs.
Enabling 'ENOKI_TEST' didn't create a new build target.
Thanks!
After reading through Enoki's documentation, I am still somewhat confused how one would implement a prefix sum or create a summed area table. What is the most idiomatic way to perform a prefix sum?
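For reference, these are the semantics being asked about, written in plain Python with the standard library. This is only a sequential definition of the expected results, not an idiomatic Enoki implementation; the function names are chosen for illustration. A summed-area table is a prefix sum along rows followed by a prefix sum along columns.

```python
from itertools import accumulate


def prefix_sum(values):
    """Inclusive prefix sum: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))


def summed_area_table(image):
    """Summed-area table of a 2D list: prefix sums along rows, then down columns."""
    rows = [prefix_sum(row) for row in image]
    table = []
    running = [0] * (len(rows[0]) if rows else 0)
    for row in rows:
        # Add the current row of row-wise sums onto the column totals so far.
        running = [a + b for a, b in zip(running, row)]
        table.append(running)
    return table
```

A data-parallel version would replace the sequential accumulation with a parallel scan, which changes the implementation but not these results.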
What's the correct way to work with dynamic arrays of complex numbers?
AFAICS there are two possible ways: Complex<DynamicArray<FloatP>> and DynamicArray<Complex<FloatP>>. The first seems to work somehow, but unfortunately it is not possible to use the map function with it. The second version, on the other hand, allows me to map existing memory but fails with most other functions.
Thanks in advance!
Melissa O'Neill's PCG family of pseudo-random number generators comes with some dubious claims about speed and quality. There are multiple reviews that call them into question (for example, here). It appears that xorshift-derived generators (like xoshiro) are better in every regard. Perhaps it would be worth adding more choice to random.h, or even removing PCG from it.
I asked a question before and learned that IntC::copy can copy an int* into an IntC on the GPU.
Is there any method to copy an IntC back into an int*?
The documentation references these functions https://enoki.readthedocs.io/en/master/reference.html#memory-allocation but I can't find them in the code.
Could you help me to clarify this behavior:
using Mat22f = enoki::Matrix< f32, 2 >;
using Mat22fBuffer = enoki::DynamicArray< enoki::Packet< Mat22f, 2 > >; // With 2
Mat22fBuffer x, y;
y = x;
// Compile time error:
array_generic.h(464,59): error C2440: '<function-style-cast>': cannot convert from 'enoki::Array<eMV::f32,2>' to 'enoki::Matrix<eMV::f32,2>'
array_generic.h(414,1): message : No constructor could take the source type, or constructor overload resolution was ambiguous
array_generic.h(344): message : see reference to function template instantiation 'void enoki::StaticArrayImpl<Value_,2,false,enoki::Packet<Value_,2>,int>::assign_<eMV::Mat22f&,0,1>(T,std::integer_sequence<size_t,0,1>)' being compiled
1> with
1> [
1> Value_=eMV::Mat22f,
1> T=eMV::Mat22f &
1> ]
With:
using Mat22f = enoki::Matrix< f32, 2 >;
using Mat22fBuffer = enoki::DynamicArray< enoki::Packet< Mat22f, 1 > >; // With 1
Mat22fBuffer x, y;
y = x;
This compiles without problems.
README.md:
$ git clone --rescursive https://github.com/mitsuba-renderer/enoki
rescursive -> recursive
I get an UndefinedBehaviourSanitizer hit from Google's sanitizer (https://github.com/google/sanitizers) when initializing a dynamic array of a struct containing bool values to a number of slices not a multiple of the packet size.
Taking the example from here https://enoki.readthedocs.io/en/master/dynamic.html?highlight=bool_array_t#custom-dynamic-data-structures, if you do something like
using FloatP = Packet<float, 4>;
using FloatX = DynamicArray<FloatP>;
using GPSCoord2fX = GPSCoord2<FloatX>;
GPSCoord2fX coord;
set_slices(coord, 1001);
UBSAN will fire saying:
enoki/array_fallbacks.h:495:16: runtime error: load of value 190, which is not a valid value for type 'const bool'
I dug a little into this and traced it down to the clean_trailing_() function in dynamic.h, specifically this line:
store(addr, load<Packet>(addr) & mask);
Something weird is happening with the types here that it doesn't like. I think load<Packet>(addr) causes a read of uninitialized bool values, which are then put into the & expression at array_fallbacks.h:495. A workaround is changing the Bool type in the struct to:
using Bool = enoki::replace_scalar_t<Value, uint8_t>;
This avoids the UB and functions as you'd expect.
Consider this simple code:
void bar(const char* src, int src_size, char* dst, int dst_size) {
assert(src_size == dst_size);
for (int i = 0; i < src_size; ++i) {
*dst++ = *src++;
}
}
this code generates the following assembly (only the loop part is shown here):
40d7c8: c5 fe 6f 04 07 vmovdqu ymm0,YMMWORD PTR [rdi+rax*1]
40d7cd: c5 fe 7f 04 02 vmovdqu YMMWORD PTR [rdx+rax*1],ymm0
40d7d2: 48 83 c0 20 add rax,0x20
40d7d6: 48 39 c8 cmp rax,rcx
40d7d9: 75 ed jne 40d7c8 <bar(char const*, int, char*, int)+0x28>
gcc is smart enough to vectorize this loop and copy chunks of 32 bytes.
Now consider this code written with enoki:
void foo(const char* src, int src_size, char* dst, int dst_size) {
using Array = enoki::Array<int, 8>;
auto es = enoki::DynamicArray<Array>::map(src, src_size);
auto ed = enoki::DynamicArray<Array>::map(dst, dst_size);
for (int i = 0; i < (int)es.packets(); ++i) {
const auto& pkt = es.packet(i);
auto& dst_pkt = ed.packet(i);
dst_pkt = pkt;
}
}
This code generated this assembly:
40d850: c5 fd 6f 04 07 vmovdqa ymm0,YMMWORD PTR [rdi+rax*1]
40d855: c5 fd 7f 04 02 vmovdqa YMMWORD PTR [rdx+rax*1],ymm0
40d85a: 48 83 c0 20 add rax,0x20
40d85e: 48 39 c8 cmp rax,rcx
40d861: 75 ed jne 40d850 <foo(char const*, int, char*, int)+0x20>
So almost the same code (except for aligned read).
Now I wanted to change the code to use two ymm registers to unroll this loop further, so I changed the Array in the above code to
using Array = enoki::Array<int, 16>;
The assembly generated with the 16-element array is this:
40d850: c5 f9 6f 04 07 vmovdqa xmm0,XMMWORD PTR [rdi+rax*1]
40d855: c5 f8 29 04 02 vmovaps XMMWORD PTR [rdx+rax*1],xmm0
40d85a: c5 f9 6f 4c 07 10 vmovdqa xmm1,XMMWORD PTR [rdi+rax*1+0x10]
40d860: c5 f8 29 4c 02 10 vmovaps XMMWORD PTR [rdx+rax*1+0x10],xmm1
40d866: c5 f9 6f 54 07 20 vmovdqa xmm2,XMMWORD PTR [rdi+rax*1+0x20]
40d86c: c5 f8 29 54 02 20 vmovaps XMMWORD PTR [rdx+rax*1+0x20],xmm2
40d872: c5 f9 6f 5c 07 30 vmovdqa xmm3,XMMWORD PTR [rdi+rax*1+0x30]
40d878: c5 f8 29 5c 02 30 vmovaps XMMWORD PTR [rdx+rax*1+0x30],xmm3
40d87e: 48 83 c0 40 add rax,0x40
40d882: 48 39 c8 cmp rax,rcx
40d885: 75 c9 jne 40d850 <foo(char const*, int, char*, int)+0x20>
So instead of using two ymm registers, it uses four xmm registers. I find this quite odd. Do you have any idea why Enoki does that?
Could you please briefly explain how enoki compares with Halide?