getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows

Home Page: https://www.kernel-operations.io

License: MIT License

C++ 3.33% Shell 0.65% Python 64.11% R 31.47% TeX 0.10% Dockerfile 0.34%

keops's Issues

Impossible to install RKeops

When I try
install.packages("rkeops")
I get

Warning in install.packages :
  package ‘rkeops’ is not available (for R version 3.6.2) 

When I try
devtools::install_git("https://github.com/getkeops/keops", subdir = "rkeops", args="--recurse-submodules='keops/lib/sequences'")
I get

Error: Failed to install 'unknown package' from Git:
  Error in 'git2r_remote_ls': there is no TLS stream available

I'm running R 3.6.2 on Ubuntu 18.04.4:

platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          6.2                         
year           2019                        
month          12                          
day            12                          
svn rev        77560                       
language       R                           
version.string R version 3.6.2 (2019-12-12)
nickname       Dark and Stormy Night 

Any idea what is causing the problem?

pytorch_scatter improvement

Hello there,

I would like to ask some questions about keops.

I have been using PyTorch Geometric a lot for my work on graphs.
At its core it uses pytorch_scatter: https://github.com/rusty1s/pytorch_scatter
and the MessagePassing class, https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/nn/conv/message_passing.py, which uses torch.index_select to build the messages.

I also found this paper implementing a smarter hierarchical scatter method

I was wondering if keops could be used to implement a symbolic message function and maybe also the HAG aggregation within a new pytorch_scatter.

This would not only drastically reduce memory usage, but could also speed up training and inference.

What are your thoughts on that?

Best,
Thomas Chaton.
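
To make the memory argument in this request concrete, here is a minimal pure-Python sketch (not PyTorch Geometric or KeOps code) contrasting a gather-then-scatter aggregation, which stores one message per edge, with a fused loop that accumulates each message as soon as it is computed. The function names, the toy graph and the assumption that a message has the same length as a node feature are all illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Edge = Tuple[int, int]  # (source node, target node)
Message = Callable[[Vector, Vector], Vector]


def aggregate_materialized(x: Sequence[Vector], edges: Sequence[Edge],
                           message: Message) -> List[Vector]:
    """Gather-then-scatter: every per-edge message is stored before summing."""
    # One stored message per edge: extra memory grows with the number of edges.
    messages = [message(x[src], x[dst]) for src, dst in edges]
    # Assumption: a message has the same length as a node feature vector.
    out = [[0.0] * len(x[0]) for _ in x]
    for (_, dst), msg in zip(edges, messages):
        for k, value in enumerate(msg):
            out[dst][k] += value
    return out


def aggregate_fused(x: Sequence[Vector], edges: Sequence[Edge],
                    message: Message) -> List[Vector]:
    """Fused variant: each message is computed, accumulated, then discarded."""
    out = [[0.0] * len(x[0]) for _ in x]
    for src, dst in edges:
        msg = message(x[src], x[dst])  # only one message alive at a time
        for k, value in enumerate(msg):
            out[dst][k] += value
    return out


# Toy graph: three nodes with 1-D features; the message copies the source feature.
features = [[1.0], [2.0], [4.0]]
edge_list = [(0, 1), (2, 1), (1, 0)]
copy_source = lambda x_src, x_dst: list(x_src)

print(aggregate_materialized(features, edge_list, copy_source))
print(aggregate_fused(features, edge_list, copy_source))
```

Both functions return the same aggregated features; only the peak memory differs, which is the property a symbolic message function would aim for.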

PyTorch 1.3. Deprecation Warnings

Hi,

with the new release of PyTorch 1.3, the usage of data<...>() is now deprecated in favor of data_ptr<...>(). This results in a bunch of warnings when compiling KeOps kernels. Other libraries solve this problem via PyTorch version checking, e.g., see here.
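
One way the version check mentioned above could be driven from the Python side is sketched below: parse the installed PyTorch version and turn it into a compile definition that the C++ code can test. This is only an illustration; the macro name KEOPS_USE_DATA_PTR is hypothetical and is not an existing KeOps or PyTorch flag.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.3.0' or '1.4.0+cu100'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def accessor_define(torch_version: str) -> str:
    """Return a compile definition selecting the tensor accessor.

    KEOPS_USE_DATA_PTR is a hypothetical macro name, used for illustration only.
    """
    use_data_ptr = parse_version(torch_version) >= (1, 3)
    return f"-DKEOPS_USE_DATA_PTR={int(use_data_ptr)}"


print(accessor_define("1.2.0"))
print(accessor_define("1.4.0+cu100"))
```

In a real build script the argument would typically be torch.__version__.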

This KeOps shared object has been compiled without cuda - Failed to build bindings

Hello friends,
Thanks a lot for keops, amazing library, and great examples:)
Unfortunately, I could build neither this example nor pykeops.test_torch_bindings().

My machine specs and the error itself follow.
nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

gcc

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 

g++

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

clang

clang version 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

cmake

cmake version 3.10.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

pytorch 1.4

(testenv2) name@station:~/repos/docBert$ python
Python 3.7.7 (default, May  6 2020, 10:21:04) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pykeops
>>> 
>>> pykeops.verbose = True
>>> pykeops.clean_pykeops()  
/home/name/.cache/pykeops-1.4-cpython-37/libKeOpstorchc33cb27a33.so has been removed.
/home/name/.cache/pykeops-1.4-cpython-37/libKeOpstorchc33cb27a33.cpython-37m-x86_64-linux-gnu.so has been removed.
>>> pykeops.test_torch_bindings() 
Compiling libKeOpstorch11f5758313 in /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float32
... -- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using shared_obj_name: libKeOpstorch11f5758313
-- First i variables detected is 0
-- First j variables detected is 1
-- Compiled formula is Sum_Reduction(SqNorm2(x - y),1); auto x = Vi(0,3); auto y = Vj(1,3); where the number of args is 2.
-- Found PythonInterp: /home/name/anaconda3/envs/testenv2/bin/python3.7 (found suitable version "3.7.7", minimum required is "3.7") 
-- Found PythonLibs: /home/name/anaconda3/envs/testenv2/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313

/usr/bin/cmake -H/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch11f5758313
make[1]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles 4
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch11f5758313.dir/all
make[2]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch11f5758313.dir/build.make CMakeFiles/keopslibKeOpstorch11f5758313.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
cd /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/DependInfo.cmake --color=
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/DependInfo.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/depend.internal".
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/keopslibKeOpstorch11f5758313.dir/depend.internal".
Scanning dependencies of target keopslibKeOpstorch11f5758313
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch11f5758313.dir/build.make CMakeFiles/keopslibKeOpstorch11f5758313.dir/build
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[ 25%] Building CXX object CMakeFiles/keopslibKeOpstorch11f5758313.dir/keops/core/link_autodiff.cpp.o
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++  -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch11f5758313 -DSUM_SCHEME=1 -DUSE_CUDA=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D__TYPEACC__=float -D__TYPE__=float -DkeopslibKeOpstorch11f5758313_EXPORTS -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include  -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -fPIC   -include libKeOpstorch11f5758313.h -std=gnu++14 -o CMakeFiles/keopslibKeOpstorch11f5758313.dir/keops/core/link_autodiff.cpp.o -c /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cpp
[ 50%] Linking CXX shared library libKeOpstorch11f5758313.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/keopslibKeOpstorch11f5758313.dir/link.txt --verbose=1
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++ -fPIC -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/name/anaconda3/envs/testenv/lib -Wl,-rpath-link,/home/name/anaconda3/envs/testenv/lib -L/home/name/anaconda3/envs/testenv/lib -shared -Wl,-soname,libKeOpstorch11f5758313.so -o libKeOpstorch11f5758313.so CMakeFiles/keopslibKeOpstorch11f5758313.dir/keops/core/link_autodiff.cpp.o 
/usr/bin/cmake -E copy /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/libKeOpstorch11f5758313.so /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/../
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[ 50%] Built target keopslibKeOpstorch11f5758313
/usr/bin/make -f CMakeFiles/libKeOpstorch11f5758313.dir/build.make CMakeFiles/libKeOpstorch11f5758313.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
cd /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/DependInfo.cmake --color=
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/DependInfo.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/depend.internal".
Dependee "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles/libKeOpstorch11f5758313.dir/depend.internal".
Scanning dependencies of target libKeOpstorch11f5758313
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/make -f CMakeFiles/libKeOpstorch11f5758313.dir/build.make CMakeFiles/libKeOpstorch11f5758313.dir/build
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[ 75%] Building CXX object CMakeFiles/libKeOpstorch11f5758313.dir/torch/generic/generic_red.cpp.o
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++  -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch11f5758313 -DSUM_SCHEME=1 -DUSE_CUDA=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -D_GLIBCXX_USE_CXX11_ABI=0 -D__TYPEACC__=float -D__TYPE__=float -DlibKeOpstorch11f5758313_EXPORTS -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/pybind11/include -I/home/name/anaconda3/envs/testenv2/include/python3.7m  -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -fPIC -fvisibility=hidden   -flto -fno-fat-lto-objects -include torch_headers.h -std=gnu++14 -o CMakeFiles/libKeOpstorch11f5758313.dir/torch/generic/generic_red.cpp.o -c /home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.cpp
[100%] Linking CXX shared module libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/libKeOpstorch11f5758313.dir/link.txt --verbose=1
/home/name/anaconda3/envs/testenv2/bin/x86_64-conda_cos6-linux-gnu-c++ -fPIC -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/name/anaconda3/envs/testenv/include -DUSE_OPENMP -fopenmp -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -Wl,-rpath,$ORIGIN -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/name/anaconda3/envs/testenv/lib -Wl,-rpath-link,/home/name/anaconda3/envs/testenv/lib -L/home/name/anaconda3/envs/testenv/lib -shared  -o libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so CMakeFiles/libKeOpstorch11f5758313.dir/torch/generic/generic_red.cpp.o -Wl,-rpath,/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313 -flto libKeOpstorch11f5758313.so 
/usr/bin/strip /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E copy /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/libKeOpstorch11f5758313.cpython-37m-x86_64-linux-gnu.so /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/../
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
[100%] Built target libKeOpstorch11f5758313
make[2]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313/CMakeFiles 0
make[1]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch11f5758313'

Done.
Traceback (most recent call last):
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/test/install.py", line 55, in test_torch_bindings
    if torch.allclose(my_conv(x, y).view(-1), torch.tensor(expected_res).type(torch.float32)):
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 44, in forward
    result = myconv.genred_pytorch(tagCPUGPU, tag1D2D, tagHostDevice, device_id, ranges, *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: 
 1) to perform computations on CPU, simply set tagHostDevice to 0
 2) to perform computations on GPU, please recompile the formula with a working version of cuda.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/test/install.py", line 66, in test_torch_bindings
    print(my_conv(x, y))
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv2/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 44, in forward
    result = myconv.genred_pytorch(tagCPUGPU, tag1D2D, tagHostDevice, device_id, ranges, *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: 
 1) to perform computations on CPU, simply set tagHostDevice to 0
 2) to perform computations on GPU, please recompile the formula with a working version of cuda.
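
The compile commands in the log above contain -DUSE_CUDA=0 and only a CXX compiler is identified, which suggests (without proving it) that the build was configured as CPU-only. A small diagnostic sketch along those lines, using only the standard library, is shown here; the helper names are illustrative and not part of pykeops.

```python
import re
import shutil
from typing import Optional


def cuda_flag_in_log(build_log: str) -> Optional[int]:
    """Return the value of the first -DUSE_CUDA=<n> definition in a build log."""
    match = re.search(r"-DUSE_CUDA=(\d+)", build_log)
    return int(match.group(1)) if match else None


def nvcc_on_path() -> Optional[str]:
    """Return the nvcc executable the current process would resolve, if any."""
    return shutil.which("nvcc")


# Fragment of a compile command similar to the one in the log above.
sample = "c++ -DSUM_SCHEME=1 -DUSE_CUDA=0 -DUSE_DOUBLE=0 -o link_autodiff.cpp.o"
print("USE_CUDA flag:", cuda_flag_in_log(sample))
print("nvcc found at:", nvcc_on_path())
```

Running nvcc_on_path() from the same environment that launches Python (here a conda environment) shows whether nvcc is visible to that process at all, which may differ from what an interactive shell sees.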

minres solver

Hi,

Thank you for working on this amazing project! I've had a lot of luck using it for large-scale GP regression. I was wondering if there are any plans for implementing a minres solver? That would be very useful for kernels that are not necessarily positive definite. As an example, I'm interested in using KeOps for radial basis function interpolation with the conditionally positive definite cubic and thin-plate spline kernels. In this setting, you need to solve linear systems with a symmetric, but not positive definite, kernel matrix.

Best,
David
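
As a small illustration of why solvers that assume positive definiteness (such as conjugate gradient) do not fit this setting, the sketch below builds the cubic kernel matrix K[i][j] = |p_i - p_j|^3 for two 1-D points and evaluates the quadratic form v^T K v in two directions. The helper names are illustrative; the polynomial terms usually added to RBF interpolation systems are left out.

```python
from typing import List, Sequence


def cubic_kernel_matrix(points: Sequence[float]) -> List[List[float]]:
    """Dense kernel matrix K[i][j] = |p_i - p_j|**3 for 1-D points."""
    return [[abs(p - q) ** 3 for q in points] for p in points]


def quadratic_form(matrix: Sequence[Sequence[float]], v: Sequence[float]) -> float:
    """Compute v^T K v."""
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))


K = cubic_kernel_matrix([0.0, 1.0])     # [[0, 1], [1, 0]]: symmetric, zero diagonal
print(quadratic_form(K, [1.0, 1.0]))    # positive in this direction
print(quadratic_form(K, [1.0, -1.0]))   # negative in this direction
```

Because the quadratic form takes both signs, the matrix is symmetric but indefinite, which is the case MINRES is designed for.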

Feature Request: Support FP16

Hi!

Thank you for this library, I greatly enjoy using it!

This is more of a request for KeOps to support FP16 in pytorch, so we could combine KeOps with apex for even faster GPU computation.

Thank you for your consideration!

Best regards,
Robert

Prebuilt binaries do not work with pytorch v1.5.0

Torch 1.5.0 was released four days ago and breaks the installation of pykeops using pip.

The following commands install pykeops using pip

pip3.6 install pykeops
pip3.7 install pykeops
pip3.8 install pykeops

but yield the same error on module import for every python version I tried.

This seems to be linked to a change of interface in the torch library:
pykeops-1.4-cpython-38/libKeOpstorch4770b04be2.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZTIN3c1021AutogradMetaInterfaceE

A simple temporary fix is to downgrade the version of torch manually using pip3 install --upgrade pykeops torch==1.4.0
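
To see which interface the undefined symbol refers to, it can be demangled. The sketch below is a deliberately tiny demangler that handles only symbols of the shape _ZTIN<len><name>...E (typeinfo for a nested name in the Itanium C++ ABI mangling scheme); for anything else a tool such as c++filt is the right choice.

```python
def demangle_typeinfo(symbol: str) -> str:
    """Demangle '_ZTIN<len><name>...E' symbols (typeinfo for a nested name).

    Only this one simple shape is supported; other symbols raise ValueError.
    """
    prefix = "_ZTIN"
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        raise ValueError(f"unsupported symbol: {symbol!r}")
    body = symbol[len(prefix):-1]
    parts = []
    pos = 0
    while pos < len(body):
        # Each component is a decimal length followed by that many characters.
        end = pos
        while end < len(body) and body[end].isdigit():
            end += 1
        if end == pos:
            raise ValueError(f"expected a length at offset {pos} in {body!r}")
        length = int(body[pos:end])
        name = body[end:end + length]
        if len(name) != length:
            raise ValueError(f"truncated component in {body!r}")
        parts.append(name)
        pos = end + length
    return "typeinfo for " + "::".join(parts)


print(demangle_typeinfo("_ZTIN3c1021AutogradMetaInterfaceE"))
```

The c10 namespace belongs to the PyTorch C++ core, which is consistent with the report that the breakage comes from an interface change in the torch library.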

ImportError with PyInit_libKeOpsnumpy73a835aa5f module

Hello, great work !

I couldn't run the sample code below with CUDA 10 and cmake 3.14.4.

(base) [hicham@gpuserver ~]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I made sure I installed pykeops with all dependencies pykeops[full].

import numpy as np
import pykeops
pykeops.verbose = True
from pykeops.numpy import Genred

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9 ).reshape(-1, 3).astype('float32')

my_conv = Genred('-SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

Here is the output:

(base) [hicham@gpuserver ~]$ python keopstest.py 
Compiling libKeOpsnumpy73a835aa5f in /home/hicham/.cache/pykeops-1.0.2/:
       formula: Sum_Reduction(-SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... -- Compute properties automatically set to: -DMAXIDGPU=12;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152;-DMAXTHREADSPERBLOCK7=1024;-DSHAREDMEMPERBLOCK7=49152;-DMAXTHREADSPERBLOCK8=1024;-DSHAREDMEMPERBLOCK8=49152;-DMAXTHREADSPERBLOCK9=1024;-DSHAREDMEMPERBLOCK9=49152;-DMAXTHREADSPERBLOCK10=1024;-DSHAREDMEMPERBLOCK10=49152;-DMAXTHREADSPERBLOCK11=1024;-DSHAREDMEMPERBLOCK11=49152;-DMAXTHREADSPERBLOCK12=1024;-DSHAREDMEMPERBLOCK12=49152
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s):  6.0 6.0 6.0 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7
-- Using shared_obj_name: libKeOpsnumpy73a835aa5f
-- pybind11 v2.2.4
-- Configuring done
-- Generating done
-- Build files have been written to: /home/hicham/.cache/pykeops-1.0.2

In file included from /home/hicham/miniconda3/lib/python3.7/site-packages/pykeops/numpy/generic/generic_red.cpp:2:0:
/home/hicham/miniconda3/lib/python3.7/site-packages/torch/include/pybind11/numpy.h:288:5: error: 'is_trivially_copyable' is not a member of 'std'
     std::is_trivially_copyable<T>,
     ^
/home/hicham/miniconda3/lib/python3.7/site-packages/torch/include/pybind11/numpy.h:288:5: error: 'is_trivially_copyable' is not a member of 'std'
compilation terminated due to -fmax-errors=2.
gmake[3]: *** [CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/libKeOpsnumpy73a835aa5f.dir/all] Error 2
gmake[1]: *** [CMakeFiles/libKeOpsnumpy73a835aa5f.dir/rule] Error 2
gmake: *** [libKeOpsnumpy73a835aa5f] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpsnumpy73a835aa5f']' returned non-zero exit status 2.
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/keops/core/keopslibKeOpsnumpy73a835aa5f_generated_link_autodiff.cu.o
[ 40%] Linking CUDA device code CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/cmake_device_link.o
[ 60%] Linking CXX shared library libKeOpsnumpy73a835aa5f.so
[ 60%] Built target keopslibKeOpsnumpy73a835aa5f
Scanning dependencies of target libKeOpsnumpy73a835aa5f
[ 80%] Building CXX object CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o

--------------------- ----------- -----------------
Done. 
Traceback (most recent call last):
  File "/home/hicham/miniconda3/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 45, in load_keops
    return importlib.import_module(dll_name)
  File "/home/hicham/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 670, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 583, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1043, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy73a835aa5f)
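
For reference, the quantity that the sample script in this issue asks for can be written out in plain Python, which gives something to compare against once the compiled module loads. The sketch computes the sum of -|x_i - y_j|^2 both over i and over j; which of the two Genred returns depends on its reduction axis, and no claim is made here about the default.

```python
from typing import List, Sequence

Point = Sequence[float]


def neg_sq_dist(a: Point, b: Point) -> float:
    """-|a - b|^2, i.e. the formula '-SqNorm2(x-y)' for one pair of points."""
    return -sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sum_over_i(x: Sequence[Point], y: Sequence[Point]) -> List[float]:
    """One value per y_j: the formula summed over all x_i."""
    return [sum(neg_sq_dist(xi, yj) for xi in x) for yj in y]


def sum_over_j(x: Sequence[Point], y: Sequence[Point]) -> List[float]:
    """One value per x_i: the formula summed over all y_j."""
    return [sum(neg_sq_dist(xi, yj) for yj in y) for xi in x]


# Same values as np.arange(1, 10).reshape(-1, 3) and np.arange(3, 9).reshape(-1, 3).
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]

print(sum_over_i(x, y))  # [-63.0, -90.0]
print(sum_over_j(x, y))  # [-87.0, -15.0, -51.0]
```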

Sample Test and GeomLoss Sample Error

Hello,

Thank you very much for your amazing work!

I am running Ubuntu 18.04.3. I installed Python 3.7 using Anaconda, then CUDA 10.1 and PyTorch, and then KeOps. I did not receive any error when installing pykeops, but when I tested the installation, none of the test scripts passed.

The error message from the PyTorch script was:

/usr/include/crt/host_config.h:121:2: error: #error -- unsupported GNU version! gcc versions later than 6 are not supported!
#error -- unsupported GNU version! gcc versions later than 6 are not supported!
^~~~~
CMake Error at keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o.Release.cmake:219 (message):
  Error generating
  /home/velysianp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch91c92bd508/CMakeFiles/keopslibKeOpstorch91c92bd508.dir/keops/core/./keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o

make[3]: *** [CMakeFiles/keopslibKeOpstorch91c92bd508.dir/keops/core/keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch91c92bd508.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch91c92bd508.dir/rule] Error 2
make: *** [libKeOpstorch91c92bd508] Error 2

I checked my system with gcc --version, and it returned gcc-7.4.0, which seems to be the default on Ubuntu 18.04.3 x86_64, so I am not sure what to do about this error.

I also received an error when I tried to run the sample code from GeomLoss:

Traceback (most recent call last):
  File "plot_optimal_transport_2D.py", line 149, in <module>
    gradient_descent( SamplesLoss("sinkhorn", p=2, blur=.1) )
  File "plot_optimal_transport_2D.py", line 107, in gradient_descent
    L_αβ = loss(x_i, y_j)
  File "/home/me/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/samples_loss.py", line 237, in forward
    verbose = self.verbose )
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/sinkhorn_samples.py", line 102, in sinkhorn_online
    C_xx, C_yy, C_xy, C_yx, ε_s, ρ, debias=debias )
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/sinkhorn_divergence.py", line 151, in sinkhorn_loop
    a_x = λ * softmin(ε, C_xx, α_log ) # OT(α,α)
  File "/home/me/anaconda3/lib/python3.7/site-packages/geomloss/sinkhorn_samples.py", line 69, in softmin_online
    return - ε * log_conv( x, y, f_y.view(-1,1), torch.Tensor([1/ε]).type_as(x) ).view(-1)
  File "/home/me/anaconda3/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/me/anaconda3/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 43, in forward
    *args)
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support: try to set tagHostDevice to 0 or recompile the formula with a working version of cuda.

Also, I checked my CUDA and PyTorch installations, and they both work fine. So what should I do about these errors?

Thank you very much!
Elyson

Compilation error on test script

I am facing a rather weird issue in my local test with CUDA 10.2, gcc 5.4, and cmake 3.10 or cmake 3.12 (it fails with both):

>>> import numpy as np
>>> import pykeops.numpy as pknp
>>> x = np.arange(1, 10).reshape(-1, 3).astype('float32')
>>> y = np.arange(3, 9).reshape(-1, 3).astype('float32')
>>> my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
Compiling libKeOpsnumpy5ac3d464a2 in /home/cg260486/.cache/pykeops-1.2-cpython-35//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... make: Warning: File 'Makefile' has modification time 54 s in the future
make[1]: Warning: File 'CMakeFiles/Makefile2' has modification time 54 s in the future
make[2]: Warning: File 'CMakeFiles/Makefile2' has modification time 54 s in the future
make[3]: Warning: File 'CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/flags.make' has modification time 54 s in the future
In file included from /usr/include/c++/5/type_traits:35:0,
                 from /neurospin/optimed/Chaithya/Environments/CSMRI_sparkling/venv/lib/python3.5/site-packages/pykeops/keops/lib/sequences/include/tao/seq/concatenate.hpp:7,
                 from /neurospin/optimed/Chaithya/Environments/CSMRI_sparkling/venv/lib/python3.5/site-packages/pykeops/keops/core/formulas/maths/TensorDot.h:8,
                 from /neurospin/optimed/Chaithya/Environments/CSMRI_sparkling/venv/lib/python3.5/site-packages/pykeops/keops/keops_includes.h:33,
                 from /home/cg260486/.cache/pykeops-1.2-cpython-35/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.h:13,
                 from <command-line>:0:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
CMake Error at keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake:219 (message):
  Error generating
  /home/cg260486/.cache/pykeops-1.2-cpython-35/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/rule] Error 2
make: *** [libKeOpsnumpy5ac3d464a2] Error 2

I faced a similar issue while trying to build pykeops locally, and found that adding the -std=c++11 argument for nvcc helped. I am not sure whether this is an issue in keops or in cmake. Note that the same script works fine on Google Colab, so I think this is more likely about my environment, but I could not find any fix online other than -std=c++11.

Matrix multiplication of two LazyTensors

Hello, thanks for the nice package!

I am running into a problem when trying to decompose a matrix as the product of two LazyTensors. The matrix I want to represent has shape [N, N] and can be expressed as the product of an [M, N] matrix and its transpose. The snippet below shows what I'd like to do in more detail:

import torch
from pykeops.torch import LazyTensor

# Set up inputs
M, N, d = 10, 5, 2
x, y = torch.rand([M, d]), torch.rand([N, d])

# Construct kernel matrix
x_i = LazyTensor(x[:, None, :])  # (M, 1, 2)
y_j = LazyTensor(y[None, :, :])  # (1, N, 2)
D_ij = ((x_i - y_j) ** 2).sum(-1)   # (M, N): squared distances
sqrt_K = (-D_ij).exp()

K = sqrt_K.t() @ sqrt_K  # does not work
# next step: run K.solve(...)

The last line does not work since __matmul__ calls view() on its argument, which is not supported by LazyTensor. I also don't see how to construct a reduction formula for the matrix K since it seems like these can only involve two indices, while three are needed here. Is there some other way I can construct the matrix K?
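One way to look at the question, sketched here in plain Python rather than with KeOps itself: if only products K v are needed, K never has to be formed, because K v = sqrt_K^T (sqrt_K v) and each factor is an ordinary two-index reduction. The function names below are hypothetical. With pykeops the two steps would presumably correspond to `sqrt_K @ v` followed by `sqrt_K.t() @ w` on dense vectors; whether that composes with K.solve(...) is not something this sketch establishes.

```python
import math

def gauss(xi, yj):
    """Gaussian kernel value exp(-|xi - yj|^2) for two points given as lists."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, yj)))

def apply_sqrt_K(x, y, v):
    """(M, N) matrix times (N,) vector: reduce over j for every i."""
    return [sum(gauss(xi, yj) * vj for yj, vj in zip(y, v)) for xi in x]

def apply_sqrt_K_T(x, y, w):
    """(N, M) transpose times (M,) vector: reduce over i for every j."""
    return [sum(gauss(xi, yj) * wi for xi, wi in zip(x, w)) for yj in y]

def apply_K(x, y, v):
    """K v with K = sqrt_K^T sqrt_K, computed as two successive reductions."""
    return apply_sqrt_K_T(x, y, apply_sqrt_K(x, y, v))

x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # M = 3 points in 2D
y = [[0.5, 0.5], [1.0, 1.0]]              # N = 2 points in 2D
v = [1.0, -2.0]                           # vector of length N
print(apply_K(x, y, v))
```

Each step only ever involves two indices at a time, so the third index in K is handled by chaining reductions instead of by a single formula.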

Eigen Values and eigen vectors with keops

I would like to implement a formula that involves the eigenvalues and eigenvectors of the Gram matrix. Do you think it is possible to compute them with KeOps (e.g. using KernelSolve)?
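As a rough illustration of why this may be feasible, the dominant eigenpair of a symmetric matrix can be estimated from matrix-vector products alone, which is the kind of operation a kernel reduction provides. The sketch below is plain Python and uses a small dense matrix as a stand-in for the Gram matrix; `power_iteration` and `matvec` are hypothetical names, not part of KeOps. Iterative solvers such as `scipy.sparse.linalg.eigsh` rely on the same matvec-only access through a `LinearOperator`.

```python
import math

def power_iteration(matvec, n, iters=200):
    """Estimate the dominant eigenvalue and eigenvector using only matvec calls."""
    v = [1.0 / math.sqrt(n)] * n  # start vector; must not be orthogonal to the target
    for _ in range(iters):
        w = matvec(v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient of the normalized iterate
    lam = sum(a * b for a, b in zip(v, matvec(v)))
    return lam, v

# Stand-in for a Gram matrix: small, symmetric, positive definite.
G = [[2.0, 1.0], [1.0, 3.0]]

def matvec(v):
    return [sum(g * c for g, c in zip(row, v)) for row in G]

lam, vec = power_iteration(matvec, n=2)
print(lam, vec)
```

This only recovers the largest eigenvalue; several eigenpairs would need a Lanczos-type method.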

Compilation hangs for a specific example of tensordot

Hi. I think the following example makes the compilation hang:

from pykeops.torch import LazyTensor
import torch as T

a = LazyTensor(T.rand(1, 1, 27 * 3 * 64).cuda())
b = LazyTensor(T.rand(2, 1, 20 * 8 * 27).cuda())
c = b.keops_tensordot(a, (20, 8, 27), (27, 3, 64), (2,), (0,))
output = c.sum(1)

The stdout looks like this:

Compiling libKeOpstorch5b4bdc3ab7 in /home/adnguyen/.cache/pykeops-1.4-cpython-37/build-libKeOpstorch5b4bdc3ab7:
       formula: Sum_Reduction(TensorDot(Var(0,4320,0), Var(1,5184,2), Ind(20,8,27), Ind(27,3,64), Ind(2), Ind(0)),0)
       aliases: Var(0,4320,0); Var(1,5184,2); 
       dtype  : float32
... 

I am able to run other tensordot examples, as well as unrelated ones, so I do not think the problem is my environment. Please have a look. Thanks!
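For reference, the sizes involved can be checked with a few lines of plain Python. The variable names are hypothetical and follow the call above; the flattened sizes match the `Var(0,4320,0)` and `Var(1,5184,2)` entries in the printed formula. Whether the size of the result is what makes the compilation so slow is only a guess.

```python
from math import prod

dims_b = (20, 8, 27)   # shape of b, stored flattened
dims_a = (27, 3, 64)   # shape of a, stored flattened
contract_b, contract_a = (2,), (0,)  # axes contracted against each other

# Contracted axes must have matching sizes.
assert all(dims_b[p] == dims_a[q] for p, q in zip(contract_b, contract_a))

# The result keeps every non-contracted axis of b, then of a.
keep_b = [d for k, d in enumerate(dims_b) if k not in contract_b]
keep_a = [d for k, d in enumerate(dims_a) if k not in contract_a]
out_dims = keep_b + keep_a

print(prod(dims_b))              # 4320
print(prod(dims_a))              # 5184
print(out_dims, prod(out_dims))  # [20, 8, 3, 64] 30720
```

So every (i, j) pair produces a vector of 30720 entries, each a sum over 27 products, which is far larger than the handful of dimensions in a typical formula.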

Error on GPU when trying to replicate the behaviour of torch.bmm().

Hello,
For the sake of it, I am trying to replicate the behavior of torch.bmm().
Here is a minimal example (with device and keops_backend appropriately set):

import torch
from pykeops.torch import Genred

N = 10  # batch size; not given in the original report, any small value should do

formula = "TensorDot(a, b, Ind(2,2), Ind(2,2), Ind(1), Ind(0))"
alias = ["a=Vi(4)", "b=Vi(4)"]
keops_bmm = Genred(formula, alias, reduction_op='Sum', axis=1, dtype='float32')

A = torch.rand(N, 2, 2, device=device)
B = torch.rand(N, 2, 2, device=device)

print(torch.allclose(torch.bmm(A, B), keops_bmm(A.view(-1, 4), B.view(-1, 4), backend=keops_backend).view(-1, 2, 2)))

On CPU, the code works correctly. However, with either GPU backend, the program stops and outputs "Instruction non permise (core dumped)", i.e. "Illegal instruction (core dumped)".

Let me know if you need additional information.

Best regards,
Lex
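To make explicit what the formula above is expected to compute for one batch element, here is a plain Python sketch of the contraction on flattened 2x2 matrices. It assumes row-major flattening (what `.view(-1, 4)` gives on a contiguous tensor) and that `Ind(1), Ind(0)` means "contract axis 1 of a with axis 0 of b", as in `numpy.tensordot`; `tensordot_2x2` is a hypothetical helper, not a KeOps function.

```python
def tensordot_2x2(a, b):
    """Contract axis 1 of a with axis 0 of b.

    a and b are 2x2 matrices flattened row-major into lists of length 4,
    so entry (r, c) lives at index 2 * r + c.
    """
    out = [0.0] * 4
    for r in range(2):
        for c in range(2):
            out[2 * r + c] = sum(a[2 * r + k] * b[2 * k + c] for k in range(2))
    return out

a = [1.0, 2.0, 3.0, 4.0]  # [[1, 2], [3, 4]]
b = [5.0, 6.0, 7.0, 8.0]  # [[5, 6], [7, 8]]
print(tensordot_2x2(a, b))  # [19.0, 22.0, 43.0, 50.0]
```

Under these assumptions the result is the ordinary matrix product, flattened the same way, which is what the comparison against torch.bmm() relies on.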

ModuleNotFoundError: No module named 'libKeOpstorch99c715f463' while test bindings work

Hello.
In my previous issue I was not able to install the KeOps Python bindings. The problem was that the CUDA toolkit installed by conda was not sufficient; after installing nvcc using the NVIDIA drivers, the following script finishes with no errors:

import pykeops
pykeops.verbose = True
pykeops.clean_pykeops()  
pykeops.test_torch_bindings() 

Unfortunately, when I try to run the following example, I receive the following error:

Compiling libKeOpstorch99c715f463 in /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463:
       formula: Max_SumShiftExp_Reduction(((-(WeightedSqDist(G_0,X_0,Y_0))) + B_0),0)
       aliases: G_0 = Vj(0,4); X_0 = Vi(1,100); Y_0 = Vj(2,2); B_0 = Vj(3,1); 
       dtype  : float32
... /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(35): error: static assertion failed with "Dimensions must be the same for Subtract"
          detected during:
            instantiation of class "keops::Subtract_Impl<FA, FB> [with FA=keops::_X<1, 100>, FB=keops::_Y<2, 2>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/norms/WeightedSqNorm.h(26): here
            instantiation of type "keops::WeightedSqNorm<keops::_Y<0, 4>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/norms/WeightedSqDist.h(14): here
            instantiation of type "keops::WeightedSqDist<keops::_Y<0, 4>, keops::_X<1, 100>, keops::_Y<2, 2>>" 
/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/libKeOpstorch99c715f463.h(27): here

/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Mult.h(30): error: static assertion failed with "Dimensions of FA and FB must be the same for Mult"
          detected during:
            instantiation of class "keops::Mult_Impl<FA, FB> [with FA=keops::_Y<0, 4>, FB=keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/autodiff/UnaryOp.h(45): here
            instantiation of class "keops::UnaryOp_base<OP, F, NS...> [with OP=keops::Sum, F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>, NS=<>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/autodiff/UnaryOp.h(61): here
            instantiation of class "keops::UnaryOp<OP, F, NS...> [with OP=keops::Sum, F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>, NS=<>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Sum.h(20): here
            instantiation of class "keops::Sum<F> [with F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/pre_headers.h(43): here
            instantiation of class "keops::KeopsNS<F> [with F=keops::WeightedSqDist<keops::_Y<0, 4>, keops::_X<1, 100>, keops::_Y<2, 2>>]" 
/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/libKeOpstorch99c715f463.h(27): here

2 errors detected in the compilation of "/tmp/tmpxft_00000684_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch99c715f463.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch99c715f463.dir/rule] Error 2
make: *** [libKeOpstorch99c715f463] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch99c715f463', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch99c715f463
make[1]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch99c715f463.dir/all
make[2]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch99c715f463.dir/build.make CMakeFiles/keopslibKeOpstorch99c715f463.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
cd /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core && /usr/bin/cmake -E make_directory /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/.
cd /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.cubin.txt -P /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
-- Generating dependency file: /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorch99c715f463_EXPORTS -DMAXIDGPU=3 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorch99c715f463 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_70,code=sm_70 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch99c715f463.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp to /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp and /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorch99c715f463_EXPORTS -DMAXIDGPU=3 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorch99c715f463 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_70,code=sm_70 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch99c715f463.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorch99c715f463.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/keopslibKeOpstorch99c715f463.dir/all' failed
make[2]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpstorch99c715f463.dir/rule' failed
make[1]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
Makefile:118: recipe for target 'libKeOpstorch99c715f463' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 210, in <module>
    cost = model.neglog_likelihood(word_features_reduced)  # Cost to minimize.
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 156, in neglog_likelihood
    ll = self.log_likelihoods(sample)
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 152, in log_likelihoods
    return kernel_product(self.params, sample, self.mu, self.weights_log(), mode='lse')
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/kernel_product/kernels.py", line 412, in kernel_product
    return FeaturesKP(kernel, gamma, x, y, bs, mode=mode, backend=backend, dtype=dtype)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/kernel_product/features_kernels.py", line 165, in FeaturesKP
    return genconv(*full_args, backend=backend)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch99c715f463'

Ideas?
Thanks!
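Reading the log above, the two static assertions report that `Subtract` received operands of dimension 100 and 2 (`_X<1, 100>` and `_Y<2, 2>`), so the compilation fails and the final ModuleNotFoundError is presumably just a consequence of the missing library. A small check of the alias string before calling KeOps can surface this earlier. The sketch below is plain Python; `parse_aliases` and `check_weighted_sqdist` are hypothetical helpers, and the accepted weight sizes (1, D or D*D) are an assumption about WeightedSqDist.

```python
import re

def parse_aliases(aliases):
    """Return {name: (category, position, dim)} from a KeOps alias string."""
    pattern = r"(\w+)\s*=\s*(Vi|Vj|Pm)\((\d+),\s*(\d+)\)"
    return {name: (cat, int(pos), int(dim))
            for name, cat, pos, dim in re.findall(pattern, aliases)}

def check_weighted_sqdist(aliases, g, x, y):
    """List dimension problems for WeightedSqDist(g, x, y)."""
    variables = parse_aliases(aliases)
    dg, dx, dy = variables[g][2], variables[x][2], variables[y][2]
    problems = []
    if dx != dy:
        problems.append(f"{x} has dim {dx} but {y} has dim {dy}")
    # Assumption: the weight is a scalar, a diagonal, or a full matrix.
    if dg not in (1, dx, dx * dx):
        problems.append(f"{g} has dim {dg}, expected 1, {dx} or {dx * dx}")
    return problems

aliases = "G_0 = Vj(0,4); X_0 = Vi(1,100); Y_0 = Vj(2,2); B_0 = Vj(3,1);"
for problem in check_weighted_sqdist(aliases, "G_0", "X_0", "Y_0"):
    print(problem)
```

On the aliases from the log this flags that the samples have 100 features while the means are 2-dimensional, and that a weight of size 4 fits D = 2 but not D = 100.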

python ~/repos/docBert/models/gmm/torch_kops_gmm.py 
Compiling libKeOpstorch99c715f463 in /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463:
       formula: Max_SumShiftExp_Reduction(((-(WeightedSqDist(G_0,X_0,Y_0))) + B_0),0)
       aliases: G_0 = Vj(0,4); X_0 = Vi(1,100); Y_0 = Vj(2,2); B_0 = Vj(3,1); 
       dtype  : float32
... /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(35): error: static assertion failed with "Dimensions must be the same for Subtract"
          detected during:
            instantiation of class "keops::Subtract_Impl<FA, FB> [with FA=keops::_X<1, 100>, FB=keops::_Y<2, 2>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/norms/WeightedSqNorm.h(26): here
            instantiation of type "keops::WeightedSqNorm<keops::_Y<0, 4>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/norms/WeightedSqDist.h(14): here
            instantiation of type "keops::WeightedSqDist<keops::_Y<0, 4>, keops::_X<1, 100>, keops::_Y<2, 2>>" 
/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/libKeOpstorch99c715f463.h(27): here

/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Mult.h(30): error: static assertion failed with "Dimensions of FA and FB must be the same for Mult"
          detected during:
            instantiation of class "keops::Mult_Impl<FA, FB> [with FA=keops::_Y<0, 4>, FB=keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/autodiff/UnaryOp.h(45): here
            instantiation of class "keops::UnaryOp_base<OP, F, NS...> [with OP=keops::Sum, F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>, NS=<>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/autodiff/UnaryOp.h(61): here
            instantiation of class "keops::UnaryOp<OP, F, NS...> [with OP=keops::Sum, F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>, NS=<>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/formulas/maths/Sum.h(20): here
            instantiation of class "keops::Sum<F> [with F=keops::Mult<keops::_Y<0, 4>, keops::TensorProd<keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>, keops::Subtract<keops::_X<1, 100>, keops::_Y<2, 2>>>>]" 
/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/pre_headers.h(43): here
            instantiation of class "keops::KeopsNS<F> [with F=keops::WeightedSqDist<keops::_Y<0, 4>, keops::_X<1, 100>, keops::_Y<2, 2>>]" 
/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/libKeOpstorch99c715f463.h(27): here

2 errors detected in the compilation of "/tmp/tmpxft_00000684_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch99c715f463.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch99c715f463.dir/rule] Error 2
make: *** [libKeOpstorch99c715f463] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch99c715f463', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch99c715f463
make[1]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
/usr/bin/cmake -H/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -B/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch99c715f463.dir/all
make[2]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch99c715f463.dir/build.make CMakeFiles/keopslibKeOpstorch99c715f463.dir/depend
make[3]: Entering directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
cd /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core && /usr/bin/cmake -E make_directory /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/.
cd /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.cubin.txt -P /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
-- Generating dependency file: /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorch99c715f463_EXPORTS -DMAXIDGPU=3 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorch99c715f463 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_70,code=sm_70 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch99c715f463.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp to /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp and /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.depend.tmp /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorch99c715f463_EXPORTS -DMAXIDGPU=3 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorch99c715f463 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_70,code=sm_70 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch99c715f463.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/keops -I/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463 -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include -I/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463/CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/./keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorch99c715f463.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorch99c715f463.dir/keops/core/keopslibKeOpstorch99c715f463_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/keopslibKeOpstorch99c715f463.dir/all' failed
make[2]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpstorch99c715f463.dir/rule' failed
make[1]: Leaving directory '/home/name/.cache/pykeops-1.4-cpython-36/build-libKeOpstorch99c715f463'
Makefile:118: recipe for target 'libKeOpstorch99c715f463' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 210, in <module>
    cost = model.neglog_likelihood(word_features_reduced)  # Cost to minimize.
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 156, in neglog_likelihood
    ll = self.log_likelihoods(sample)
  File "/home/name/repos/docBert/models/gmm/torch_kops_gmm.py", line 152, in log_likelihoods
    return kernel_product(self.params, sample, self.mu, self.weights_log(), mode='lse')
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/kernel_product/kernels.py", line 412, in kernel_product
    return FeaturesKP(kernel, gamma, x, y, bs, mode=mode, backend=backend, dtype=dtype)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/kernel_product/features_kernels.py", line 165, in FeaturesKP
    return genconv(*full_args, backend=backend)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/name/anaconda3/envs/testenv4/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch99c715f463'

Compilation error in backward pass of sumsoftmaxweight

Hi,

thanks for the great library; I can still hardly believe the amazing performance.

When executing this script, I get a compilation error in the backward pass through the sumsoftmaxweight reduction.

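import torch
import pykeops
pykeops.verbose = True  # print compilation details
from pykeops.torch import LazyTensor

N, D = 1000, 10
# Note: .cuda() returns a non-leaf copy, so gradients are not stored in v.grad.
v = torch.randn((1, N, D), dtype=torch.float32, requires_grad=True).cuda()

v_i = LazyTensor(v[:, :, None])  # shape (1, N, 1, D)
v_j = LazyTensor(v[:, None, :])  # shape (1, 1, N, D)
D_ij = v_i - v_j  # symbolic (1, N, N, D) tensor of pairwise differences

# Softmax-weighted sum: scores are D_ij.sum(-1), weights are D_ij.
result = LazyTensor.sumsoftmaxweight(D_ij.sum(-1), D_ij, axis=1)

loss = (1. * result).sum()
print(f'loss: {loss}')  # forward is successful
loss.backward()  # compilation of the backward formula fails here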
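
For anyone reproducing this, a dense PyTorch version of the same reduction can serve as a reference for the forward value and for the gradient that the KeOps backward pass would be expected to produce. This is only a sketch under stated assumptions: `dense_sumsoftmaxweight` is a hypothetical helper, not part of pykeops, and it assumes that `axis=1` refers to dimension 1 of the full `(1, N, N, D)` tensor, batch dimension included. If KeOps interprets the axis differently, `dim` would need to change accordingly. It runs on CPU and materialises the full tensor, so it is only suitable for small N.

```python
import torch


def dense_sumsoftmaxweight(scores, weights, dim):
    """Dense reference: sum over `dim` of softmax(scores, dim) * weights.

    scores has shape (..., N, N) and weights has shape (..., N, N, D).
    Hypothetical helper for checking results; not a pykeops function.
    """
    probs = torch.softmax(scores, dim=dim)            # normalise along `dim`
    return (probs.unsqueeze(-1) * weights).sum(dim=dim)


torch.manual_seed(0)
N, D = 50, 10  # small N: the dense tensor needs O(N^2 * D) memory

# Leaf tensor on CPU, so that v.grad is populated by backward().
v = torch.randn(1, N, D, requires_grad=True)

# Same pairwise differences as in the script, but fully materialised.
D_ij = v[:, :, None, :] - v[:, None, :, :]            # shape (1, N, N, D)

# Assumption: axis=1 in the script corresponds to dim=1 of the dense tensor.
result = dense_sumsoftmaxweight(D_ij.sum(-1), D_ij, dim=1)

loss = result.sum()
loss.backward()  # dense autograd handles this formula without compilation

print(result.shape, v.grad.shape)
```

Comparing `result` and `v.grad` from this sketch with the KeOps output on the same small input is one way to check a fix once the backward formula compiles.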

Running the script produces this output:

loss: 6528.2978515625
Compiling libKeOpstorch509fe71999 in /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37//build-libKeOpstorch509fe71999:
       formula: Grad_WithSavedForward(Max_SumShiftExpWeight_Reduction(Sum((Var(0,10,0) - Var(1,10,1))),1,Concat(IntCst(1),(Var(0,10,0) - Var(1,10,1)))), Var(0,10,0), Var(2,12,1), Var(3,12,1))
       aliases: Var(0,10,0); Var(1,10,1); Var(2,12,1); Var(3,12,1); 
       dtype  : float32
... /home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Extract.h(23): error: static assertion failed with "Index out of bound in Extract"
          detected during:
            instantiation of class "keops::Extract<F, START, DIM_> [with F=keops::Extract<keops::Var<2, 12, 1>, 1, 11>, START=1, DIM_=11]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(134): here
            instantiation of class "keops::Subtract_Alias<FA, keops::Zero<DIM>> [with FA=keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, DIM=10]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(23): here
            instantiation of type "keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(48): here
            instantiation of type "keops::Subtract_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with FA=keops::Var<0, 10, 0>, FB=keops::Var<1, 10, 1>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(37): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffTG<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(40): here
            [ 2 instantiation contexts not shown ]
            instantiation of type "keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(72): here
            instantiation of type "keops::Sum_Reduction_Impl<F, tagI>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>, void> [with F=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, tagI=1]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Sum_Reduction<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Max_SumShiftExp_Reduction.h(114): here
            instantiation of type "keops::Max_SumShiftExp_Reduction<F, tagI, G_>::DiffT<keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>> [with F=keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, tagI=1, G_=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(20): here
            instantiation of type "keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Subtract.h(134): error: static assertion failed with "Dimensions must be the same for Subtract"
          detected during:
            instantiation of class "keops::Subtract_Alias<FA, keops::Zero<DIM>> [with FA=keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, DIM=10]" 
(23): here
            instantiation of type "keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>" 
(48): here
            instantiation of type "keops::Subtract_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with FA=keops::Var<0, 10, 0>, FB=keops::Var<1, 10, 1>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(37): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffTG<keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(40): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Scal.h(59): here
            instantiation of type "keops::Scal_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with FA=keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, FB=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(72): here
            instantiation of type "keops::Sum_Reduction_Impl<F, tagI>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>, void> [with F=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, tagI=1]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Sum_Reduction<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Max_SumShiftExp_Reduction.h(114): here
            instantiation of type "keops::Max_SumShiftExp_Reduction<F, tagI, G_>::DiffT<keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>> [with F=keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, tagI=1, G_=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(20): here
            instantiation of type "keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Add.h(152): error: static assertion failed with "Dimensions must be the same for Add"
          detected during:
            instantiation of class "keops::Add_Alias<keops::Zero<DIM>, FB> [with FB=keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>, DIM=10]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/norms/Scalprod.h(33): here
            instantiation of type "keops::Add<keops::Zero<10>, keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Concat.h(40): here
            instantiation of type "keops::Concat_Impl<F, G>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with F=keops::IntConstant<1>, G=keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Scal.h(59): here
            instantiation of type "keops::Scal_Impl<FA, FB>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>> [with FA=keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, FB=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(72): here
            instantiation of type "keops::Sum_Reduction_Impl<F, tagI>::DiffT<keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>, void> [with F=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, tagI=1]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(16): here
            instantiation of type "keops::Grad<keops::Sum_Reduction<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Max_SumShiftExp_Reduction.h(114): here
            instantiation of type "keops::Max_SumShiftExp_Reduction<F, tagI, G_>::DiffT<keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>> [with F=keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, tagI=1, G_=keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/autodiff/Grad.h(20): here
            instantiation of type "keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/formulas/maths/Add.h(41): error: static assertion failed with "Dimensions must be the same for Add"
          detected during:
            instantiation of class "keops::Add_Impl<FA, FB> [with FA=keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::SumT<keops::Mult<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Scalprod<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>>, 10>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::SumT<keops::Mult<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Scalprod<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>>, 10>>>, FB=keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Add<keops::Zero<10>, keops::Subtract<keops::IdOrZero<keops::Var<0, 10, 0>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>, keops::IdOrZero<keops::Var<1, 10, 1>, keops::Var<0, 10, 0>, keops::Extract<keops::Extract<keops::Var<2, 12, 1>, 1, 11>, 1, 11>>>>>]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Reduction.h(26): here
            instantiation of class "keops::Reduction<F_, tagI_> [with F_=keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>, tagI_=0]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/reductions/Sum_Reduction.h(22): here
            instantiation of class "keops::Sum_Reduction_Impl<F, tagI> [with F=keops::Grad<keops::Scal<keops::Exp<keops::Subtract<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, keops::Extract<keops::Var<3, 12, 1>, 0, 1>>>, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Extract<keops::Var<2, 12, 1>, 1, 11>>, tagI=0]" 
/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/pre_headers.h(40): here
            instantiation of class "keops::KeopsNS<F> [with F=keops::Grad_WithSavedForward<keops::Max_SumShiftExp_Reduction<keops::Sum<keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>, 1, keops::Concat<keops::IntConstant<1>, keops::Subtract_Impl<keops::Var<0, 10, 0>, keops::Var<1, 10, 1>>>>, keops::Var<0, 10, 0>, keops::Var<2, 12, 1>, keops::Var<3, 12, 1>>]" 
/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/libKeOpstorch509fe71999.h(21): here

4 errors detected in the compilation of "/tmp/tmpxft_00003e29_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorch509fe71999.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorch509fe71999.dir/rule] Error 2
make: *** [libKeOpstorch509fe71999] Error 2
-- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=6;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 6.1 6.1 6.1 6.1 6.1 6.1 
-- Using shared_obj_name: libKeOpstorch509fe71999
-- Found PythonInterp: /home_sdc/rremme_tmp/anaconda3/envs/main/bin/python3.7 (found version "3.7.4") 
-- Found PythonLibs: /home_sdc/rremme_tmp/anaconda3/envs/main/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999


--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch509fe71999', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -B/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorch509fe71999
make[1]: Entering directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
/usr/bin/cmake -H/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -B/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorch509fe71999.dir/all
make[2]: Entering directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorch509fe71999.dir/build.make CMakeFiles/keopslibKeOpstorch509fe71999.dir/depend
make[3]: Entering directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
cd /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core && /usr/bin/cmake -E make_directory /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/.
cd /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.cubin.txt -P /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
-- Generating dependency file: /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.0/bin/nvcc -M -D__CUDACC__ /home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -o /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorch509fe71999_EXPORTS -DMAXIDGPU=6 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch509fe71999 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch509fe71999.h -DNVCC -I/usr/local/cuda-10.0/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops -I/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp to /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend
-- Removing /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp and /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.depend.tmp /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
/usr/local/cuda-10.0/bin/nvcc /home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorch509fe71999_EXPORTS -DMAXIDGPU=6 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpstorch509fe71999 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpstorch509fe71999.h -DNVCC -I/usr/local/cuda-10.0/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/keops -I/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999 -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include -I/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/include/torch/csrc/api/include
-- Removing /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999/CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/./keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorch509fe71999.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorch509fe71999.dir/keops/core/keopslibKeOpstorch509fe71999_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/keopslibKeOpstorch509fe71999.dir/all' failed
make[2]: Leaving directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpstorch509fe71999.dir/rule' failed
make[1]: Leaving directory '/home_sdc/rremme_tmp/.cache/pykeops-1.2-cpython-37/build-libKeOpstorch509fe71999'
Makefile:118: recipe for target 'libKeOpstorch509fe71999' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "sumsoftmaxweight_bug.py", line 17, in <module>
    loss.backward()
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 123, in backward
    grad = genconv(formula_g, aliases_g, backend, dtype, device_id, ranges, *args_g)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home_sdc/rremme_tmp/anaconda3/envs/main/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch509fe71999'

I was able to run all the pytorch examples without any problems.

Support for multiprocessing within PyTorch data loaders

Hello. Can pykeops be used in subprocesses, e.g. in PyTorch DataLoader workers?

import torch
import pykeops.torch

def nearest(a,b):
    print(a.shape,b.shape)
    a=a[:,None,:]
    b=b[None,:,:]
    a= pykeops.torch.LazyTensor(a)
    b= pykeops.torch.LazyTensor(b)
    return ((a-b)**2).sum(2).argmin(1).flatten()

class dataset(torch.utils.data.Dataset):
    def __len__(self):
        return 10
    def __getitem__(self,k):
        return nearest(torch.randn(10,3),torch.randn(10,3))

for x in torch.utils.data.DataLoader(dataset(), batch_size=None, num_workers=0):
    print(x)
#This works with num_workers==0
    
for x in torch.utils.data.DataLoader(dataset(), batch_size=None, num_workers=1):
    print(x)
#This fails for num_workers==1
# RuntimeError: DataLoader worker (pid(s) 6743) exited unexpectedly

Why does PyKeOps require GCC >= 7 ?

Hi,

The installation instructions for PyKeOps list the following requirements:

A C++ compiler compatible with std=c++14: g++ version >=7 or clang++ version >=8.

But according to the GCC website

https://gcc.gnu.org/projects/cxx-status.html

GCC versions 5.x and 6.x should already fully implement C++14. Should it therefore be possible to install PyKeOps with GCC 6.3.0? Or is there another reason for requiring GCC >= 7? That version would already fully implement C++17.

Best regards

Sam
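
To see which side of the documented threshold a given machine falls on, the compiler's major version can be read from its `--version` output. The following is a minimal sketch, not part of PyKeOps: the helper names `parse_major_version` and `installed_gxx_major` are hypothetical, and it only reports the version without saying whether KeOps will actually build with it.

```python
import re
import shutil
import subprocess


def parse_major_version(version_output: str) -> int:
    """Extract the major version from the first line of `g++ --version` output."""
    first_line = version_output.splitlines()[0]
    # Take the last dotted number on the line, e.g. "7.4.0" in
    # "g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0".
    matches = re.findall(r"\d+(?:\.\d+)+", first_line)
    if not matches:
        raise ValueError(f"no version number found in: {first_line!r}")
    return int(matches[-1].split(".")[0])


def installed_gxx_major(compiler: str = "g++"):
    """Return the major version of `compiler`, or None if it is not on the PATH."""
    if shutil.which(compiler) is None:
        return None
    out = subprocess.run(
        [compiler, "--version"], capture_output=True, text=True, check=True
    )
    return parse_major_version(out.stdout)


if __name__ == "__main__":
    major = installed_gxx_major()
    print("g++ not found" if major is None else f"g++ major version: {major}")
```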

pybind11 does not find python interpreter if not present in system path

This issue is similar to issue #49, but this time the shipped CMakeLists.txt does not find the Python interpreter even if the version is supplied. At some point pybind11 invokes FindPythonInterp.cmake, which fails because PYTHON_EXECUTABLE has not been defined and no matching Python executable can be found in the system path.

As the path to the running Python interpreter can be obtained with sys.executable, a simple fix consists in inserting '-DPYTHON_EXECUTABLE=' + sys.executable, at pykeops/common/compile_routines.py:54.

--- compile_routines.py	2020-05-05 13:47:43.688013050 +0000
+++ compile_routines.py	2020-05-05 13:48:17.340202073 +0000
@@ -51,6 +51,7 @@
                      '-Dshared_obj_name=' + dllname,
                      '-D__TYPE__=' + c_type[dtype],
                      '-DPYTHON_LANG=' + lang,
+                     '-DPYTHON_EXECUTABLE=' + sys.executable,
                      '-DPYBIND11_PYTHON_VERSION=' + str(sys.version_info.major) + '.' +str(sys.version_info.minor),
                      '-DC_CONTIGUOUS=1',
                     ] + optional_flags
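
The idea behind the patch above can be sketched in isolation: build the CMake defines from the interpreter that is currently running, so that the build does not depend on what is on the system path. The function name `cmake_python_flags` is hypothetical and only illustrates the two relevant flags; it is not the actual code of compile_routines.py.

```python
import sys


def cmake_python_flags(python_executable: str = sys.executable) -> list:
    """Build the CMake defines that tell pybind11 which interpreter to use."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return [
        # Absolute path of the interpreter, so FindPythonInterp need not search the PATH.
        "-DPYTHON_EXECUTABLE=" + python_executable,
        # Same version string as in the existing flag list.
        "-DPYBIND11_PYTHON_VERSION=" + version,
    ]


if __name__ == "__main__":
    print(cmake_python_flags())
```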

Why compute gradient cost in this example?

Hi,
I am referring to this specific example, in which the goal is to fit a GMM with a flexible number of components to some 2D points denoted x.

I do not understand the point of x.requires_grad = True here. In other examples (e.g. shape matching), you do update the data values (x) so that the original sample matches some other target sample.
However, here x is (if I'm correct) the target sample.

With a fixed seed, removing this line yields the same results.

Pytorch `.contiguous()` not enough to make tensors contiguous for keops?

Hi,

Suppose I have a PyTorch Tensor X. I then take a slice of this dataset and make it contiguous by X_ = X[:10].contiguous(). I then use Keops to do some computation on X_. According to the PyTorch docs, this should be enough to make the data contiguous since it basically clones the original Tensor.

Keops works fine if I compute using X. However, it won't work on X_ despite the contiguous call. Somehow Keops still considers one of its arguments non-contiguous and fails with the following message:

RuntimeError: [KeOps] Arg number 3 : is not contiguous. Please provide 'contiguous' data array, as KeOps does not support strides. If you're getting this error in the 'backward' pass of a code using torch.sum() on the output of a KeOps routine, you should consider replacing 'a.sum()' with '(1. * a).sum()' or 'torch.dot(a.view(-1), torch.ones_like(a).view(-1))'.

Is this expected? And is there another way of making my Tensor contiguous enough for Keops?
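
Since the message refers to "Arg number 3" and to the backward pass of torch.sum(), the offending tensor may not be X_ itself. A minimal sketch of what "contiguous" means in terms of shape and strides can help check each argument. The helper `is_c_contiguous` is hypothetical and is not the check that KeOps performs; it expects strides counted in elements, as returned by `tensor.stride()` in PyTorch.

```python
def is_c_contiguous(shape, strides):
    """Return True if `strides` (in elements) describe a dense row-major layout."""
    expected = 1
    # Walk from the last axis to the first: each stride must equal the
    # number of elements spanned by all the axes after it.
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue  # the stride of a length-1 axis does not matter
        if stride != expected:
            return False
        expected *= size
    return True


if __name__ == "__main__":
    print(is_c_contiguous((10, 3), (3, 1)))  # dense block of rows
    print(is_c_contiguous((10, 2), (3, 1)))  # column slice of a 3-column array
    print(is_c_contiguous((5, 3), (0, 0)))   # value expanded with zero strides
```

The last case is the layout of a scalar that has been expanded to a larger shape without copying. The error text suggests that this kind of tensor shows up as the gradient passed back through a.sum().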

Is it possible to parallelize computations across GPUs?

Hi,

I'm trying to parallelize computations across multiple GPUs with Keops, but it seems like the computation happens sequentially across the GPUs. What I'm doing is:

import torch
from gpytorch.kernels.keops import RBFKernel

# Instantiate a Module on every GPU
rbfs = [RBFKernel().to(d) for d in range(2)]

# Instantiate the tensors on every GPU
xs = [torch.randn(5000, 1).to(d) for d in range(2)]

# Create a wrapper around a keops.torch.LazyTensor on each device that carries out the kernel matrix multiplication 
lztsrs = [rbf.forward(x, x) for rbf, x in zip(rbfs, xs)]

# Get the actual Pytorch kernel tensors by multiplying by the identity matrix
res = [t.evaluate() for t in lztsrs]

However, according to the GPU usage in nvidia-smi, the matrix multiplications are happening sequentially since only one GPU has 100% utilization at a time.

On the other hand, in pytorch for example, the following will dispatch the computations in parallel and all GPUs will simultaneously have high usage:

import torch

xs = [torch.randn(30000, 30000, device=f"cuda:{i}") for i in range(2)]
res = [x @ x for x in xs]

Is there any way to do keops computations on each GPU in parallel in the same way?
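
One pattern that may be worth trying is to issue each device's work from its own thread, in case the evaluation call blocks the calling thread until the GPU has finished. The sketch below shows only the dispatch pattern, with a placeholder instead of real GPU work. `run_on_devices` and `fake_kernel_product` are hypothetical names, and whether this overlaps the KeOps computations depends on whether the underlying call releases the Python GIL, which is an assumption not confirmed here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_devices(fn, device_ids):
    """Call fn(device_id) for every device from its own thread; results keep device order."""
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        return list(pool.map(fn, device_ids))


def fake_kernel_product(device_id):
    # Placeholder for the per-device work, e.g. building the kernel on
    # device `device_id` and evaluating it there.
    return device_id * 2


if __name__ == "__main__":
    print(run_on_devices(fake_kernel_product, [0, 1]))
```

In the snippet from the question, the placeholder would be replaced by a function that builds and evaluates the kernel for one device.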

Feature suggestion: SIMD vectorization on CPU

Hi, thanks for the presentation you gave at Inria Parietal today ;)

I just wanted to give a heads up on https://github.com/QuantStack/xsimd, which might be a useful tool for making kernel computations more efficient on modern CPUs. This could help people who don't have an NVIDIA GPU at hand.

You might also be interested in xtensor, by the same developers, which provides a lazy C++ API for n-dimensional array manipulation:

https://github.com/QuantStack/xtensor

And also xeus / cling for interactive C++ development in Jupyter notebooks:

https://github.com/QuantStack/xeus-cling (interactive demo)

compilation error with test script

Dear keops team,
thank you for providing such an amazing package!
When trying to set up pykeops on one of my machines, I got a compilation error on the test scripts, which I do not understand.

Specifications:
Cuda 10.1
GCC 7.4.0
Python 3.7.4
Pykeops 1.2

Here is the script:

import numpy as np
import pykeops
pykeops.verbose = True

import pykeops.numpy as pknp

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9).reshape(-1, 3).astype('float32')

my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

Here is the output:

(keops_roman) sdamrich@sirherny:~/mirrored_code/mod_shift/keops$ python keops_numpy_test.py 
Compiling libKeOpsnumpy5ac3d464a2 in /home/sdamrich/.cache/pykeops-1.2-cpython-37//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... -- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 10.1.105
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 
-- Using shared_obj_name: libKeOpsnumpy5ac3d464a2
-- Found PythonInterp: /home/sdamrich/anaconda3/envs/keops_roman/bin/python3.7 (found version "3.7.4") 
-- Found PythonLibs: /home/sdamrich/anaconda3/envs/keops_roman/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2

/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6688:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
       __p->_M_set_sharable();
       ~~~~~~~~~^~
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
CMake Error at keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake:279 (message):
  Error generating file
  /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/rule] Error 2
make: *** [libKeOpsnumpy5ac3d464a2] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpsnumpy5ac3d464a2', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/bin/cmake -H/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpsnumpy5ac3d464a2
make[1]: Entering directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/cmake -H/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/all
make[2]: Entering directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend
make[3]: Entering directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
cd /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -E make_directory /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/.
cd /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.cubin.txt -P /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
-- Generating dependency file: /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
-- Generating temporary cmake readable file: /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp to /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
-- Removing /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp and /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
-- Removing /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/all' failed
make[2]: Leaving directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
CMakeFiles/Makefile2:79: recipe for target 'CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/rule' failed
make[1]: Leaving directory '/home/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
Makefile:118: recipe for target 'libKeOpsnumpy5ac3d464a2' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "keops_numpy_test.py", line 12, in <module>
    my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
  File "/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/numpy/generic/generic_red.py", line 114, in __init__
    self.myconv = LoadKEops(self.formula, self.aliases, self.dtype, 'numpy').import_module()
  File "/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpsnumpy5ac3d464a2'

When running the same script in the same conda environment on a different machine with older CUDA (9.2), everything works like a charm:

(keops_roman) sdamrich@sfb1129gpu02:~/keops$ python keops_numpy_test.py 
Compiling libKeOpsnumpy5ac3d464a2 in /export/home/sdamrich/.cache/pykeops-1.2-cpython-37//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... -- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=7;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152;-DMAXTHREADSPERBLOCK7=1024;-DSHAREDMEMPERBLOCK7=49152
-- The CUDA compiler identification is NVIDIA 9.2.148
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 6.1 6.1 6.1 6.1 6.1 6.1 6.1 
-- Using shared_obj_name: libKeOpsnumpy5ac3d464a2
-- Found PythonInterp: /export/home/sdamrich/anaconda3/envs/keops_roman/bin/python3.7 (found version "3.7.4") 
-- Found PythonLibs: /export/home/sdamrich/anaconda3/envs/keops_roman/lib/libpython3.7m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2

Generated /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o successfully.
/usr/bin/cmake -H/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpsnumpy5ac3d464a2
make[1]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/cmake -H/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -B/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/all
make[2]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -E make_directory /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/.
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core && /usr/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.cubin.txt -P /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.Release.cmake
-- Removing /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/bin/cmake -E remove /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
-- Generating dependency file: /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -o /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=7 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -DMAXTHREADSPERBLOCK7=1024 -DSHAREDMEMPERBLOCK7=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
-- Generating temporary cmake readable file: /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp
/usr/bin/cmake -D input_file:FILEPATH=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/share/cmake-3.10/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp to /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
/usr/bin/cmake -E copy_if_different /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend
-- Removing /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp and /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
/usr/bin/cmake -E remove /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.depend.tmp /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o
/usr/local/cuda/bin/nvcc /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/./keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpsnumpy5ac3d464a2_EXPORTS -DMAXIDGPU=7 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -DMAXTHREADSPERBLOCK2=1024 -DSHAREDMEMPERBLOCK2=49152 -DMAXTHREADSPERBLOCK3=1024 -DSHAREDMEMPERBLOCK3=49152 -DMAXTHREADSPERBLOCK4=1024 -DSHAREDMEMPERBLOCK4=49152 -DMAXTHREADSPERBLOCK5=1024 -DSHAREDMEMPERBLOCK5=49152 -DMAXTHREADSPERBLOCK6=1024 -DSHAREDMEMPERBLOCK6=49152 -DMAXTHREADSPERBLOCK7=1024 -DSHAREDMEMPERBLOCK7=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=double -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -Xcompiler ,\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_61,code=sm_61 --use_fast_math --compiler-options=-fPIC -ccbin /usr/bin/c++ --pre-include=libKeOpsnumpy5ac3d464a2.h -DNVCC -I/usr/local/cuda/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake --color=
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend.internal".
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/depend.internal".
Scanning dependencies of target keopslibKeOpsnumpy5ac3d464a2
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/build
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 40%] Linking CUDA device code CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/cmake_device_link.o
/usr/bin/cmake -E cmake_link_script CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/dlink.txt --verbose=1
/usr/local/cuda/bin/nvcc   -O3 -DNDEBUG -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o -o CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/cmake_device_link.o  -L/usr/local/cuda/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda/targets/x86_64-linux/lib 
[ 60%] Linking CXX shared library libKeOpsnumpy5ac3d464a2.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC  -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3  -shared -Wl,-soname,libKeOpsnumpy5ac3d464a2.so -o libKeOpsnumpy5ac3d464a2.so CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/keops/core/keopslibKeOpsnumpy5ac3d464a2_generated_link_autodiff.cu.o CMakeFiles/keopslibKeOpsnumpy5ac3d464a2.dir/cmake_device_link.o  -L/usr/local/cuda/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda/targets/x86_64-linux/lib /usr/local/cuda/lib64/libcudart_static.a -lpthread -ldl /usr/lib/x86_64-linux-gnu/librt.so -lcudadevrt -lcudart_static -lrt -lpthread -ldl 
/usr/bin/cmake -E copy /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.so /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/../
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 60%] Built target keopslibKeOpsnumpy5ac3d464a2
/usr/bin/make -f CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/depend
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
cd /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake --color=
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/DependInfo.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/depend.internal".
Dependee "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/depend.internal".
Scanning dependencies of target libKeOpsnumpy5ac3d464a2
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/make -f CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/build.make CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/build
make[3]: Entering directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[ 80%] Building CXX object CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/numpy/generic/generic_red.cpp.o
/usr/bin/c++  -DCUDA_BLOCK_SIZE=192 -DC_CONTIGUOUS=1 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMAXIDGPU=7 -DMAXTHREADSPERBLOCK0=1024 -DMAXTHREADSPERBLOCK1=1024 -DMAXTHREADSPERBLOCK2=1024 -DMAXTHREADSPERBLOCK3=1024 -DMAXTHREADSPERBLOCK4=1024 -DMAXTHREADSPERBLOCK5=1024 -DMAXTHREADSPERBLOCK6=1024 -DMAXTHREADSPERBLOCK7=1024 -DMODULE_NAME=libKeOpsnumpy5ac3d464a2 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_double -DSHAREDMEMPERBLOCK0=49152 -DSHAREDMEMPERBLOCK1=49152 -DSHAREDMEMPERBLOCK2=49152 -DSHAREDMEMPERBLOCK3=49152 -DSHAREDMEMPERBLOCK4=49152 -DSHAREDMEMPERBLOCK5=49152 -DSHAREDMEMPERBLOCK6=49152 -DSHAREDMEMPERBLOCK7=49152 -DUSE_CUDA=1 -DUSE_DOUBLE=1 -D_FORCE_INLINES -D_GLIBCXX_USE_CXX11_ABI=0 -D__TYPE__=double -DlibKeOpsnumpy5ac3d464a2_EXPORTS -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/keops -I/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 -I/usr/local/cuda/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/pybind11/include -I/export/home/sdamrich/anaconda3/envs/keops_roman/include/python3.7m  -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -fPIC -fvisibility=hidden   -flto -fno-fat-lto-objects -include libKeOpsnumpy5ac3d464a2.h -std=gnu++14 -o CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/numpy/generic/generic_red.cpp.o -c /export/home/sdamrich/anaconda3/envs/keops_roman/lib/python3.7/site-packages/pykeops/numpy/generic/generic_red.cpp
[100%] Linking CXX shared module libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/link.txt --verbose=1
/usr/bin/c++ -fPIC  -Wall -Wno-unknown-pragmas -fmax-errors=2 -O3 -DNDEBUG -O3 -Wl,-rpath,$ORIGIN -shared  -o libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so CMakeFiles/libKeOpsnumpy5ac3d464a2.dir/numpy/generic/generic_red.cpp.o -Wl,-rpath,/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2 -flto libKeOpsnumpy5ac3d464a2.so /usr/local/cuda/lib64/libcudart_static.a -lpthread -ldl -lrt 
/usr/bin/strip /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so
/usr/bin/cmake -E copy /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/libKeOpsnumpy5ac3d464a2.cpython-37m-x86_64-linux-gnu.so /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/../
make[3]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
[100%] Built target libKeOpsnumpy5ac3d464a2
make[2]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'
/usr/bin/cmake -E cmake_progress_start /net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2/CMakeFiles 0
make[1]: Leaving directory '/net/hcihome/storage/sdamrich/.cache/pykeops-1.2-cpython-37/build-libKeOpsnumpy5ac3d464a2'

Done.
[[63.]
 [90.]]

Do you have any idea how I could get KeOps to run on the first machine?

ImportError: dynamic module does not define module export function

Hi all,

I'm facing the same problem. I installed gcc 7.4 and nvcc 10.0 and am still getting the same error. Any ideas?

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.

`
Compiling libKeOpsnumpy73a835aa5f in /home/hassanhaija/.cache/pykeops-1.0.2/:
formula: Sum_Reduction(-SqNorm2(x-y),1)
aliases: x = Vi(0,3); y = Vj(1,3);
dtype : float64
... -- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=1;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 6.1 6.1
-- Using shared_obj_name: libKeOpsnumpy73a835aa5f
-- Found PythonInterp: /home/hassanhaija/anaconda3/bin/python3.7 (found version "3.7.1")
-- Found PythonLibs: /home/hassanhaija/anaconda3/lib/libpython3.7m.so
-- Performing Test HAS_CPP14_FLAG
-- Performing Test HAS_CPP14_FLAG - Success
-- pybind11 v2.2.4
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/hassanhaija/.cache/pykeops-1.0.2

[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/keops/core/keopslibKeOpsnumpy73a835aa5f_generated_link_autodiff.cu.o
Scanning dependencies of target keopslibKeOpsnumpy73a835aa5f
[ 40%] Linking CUDA device code CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/cmake_device_link.o
[ 60%] Linking CXX shared library libKeOpsnumpy73a835aa5f.so
[ 60%] Built target keopslibKeOpsnumpy73a835aa5f
Scanning dependencies of target libKeOpsnumpy73a835aa5f
[ 80%] Building CXX object CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o
[100%] Linking CXX shared module libKeOpsnumpy73a835aa5f.cpython-37m-x86_64-linux-gnu.so
[100%] Built target libKeOpsnumpy73a835aa5f

Done.
Traceback (most recent call last):
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 45, in load_keops
    return importlib.import_module(dll_name)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpsnumpy73a835aa5f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 29, in _safe_compile_and_load
    return importlib.import_module(dll_name)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpsnumpy73a835aa5f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "keopstest.py", line 9, in <module>
    my_conv = Genred('-SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/numpy/generic/generic_red.py", line 114, in __init__
    self.myconv = load_keops(self.formula, self.aliases, self.dtype, 'numpy')
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 48, in load_keops
    return _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/utils.py", line 70, in wrapper_filelock
    return func(*args, **kwargs)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 34, in _safe_compile_and_load
    return importlib.import_module(dll_name)
  File "/home/hassanhaija/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy73a835aa5f)
`

Originally posted by @hassanhaija in #2 (comment)

Double broadcasting strange behaviour

Hello. I am trying to do "double broadcasting", and I am getting some strange behavior.
Consider 4 point clouds of 64 points each in 3D space.

import torch 
from pykeops.torch import LazyTensor
l = torch.randn(4,64,3) #4 point-clouds

#Get the sum of distances-squared between each point cloud: 
l0= l[:,None,:,None,:]
l1= l[None,:,None,:,:]
print(((l0-l1)**2).sum(4).sum(3).sum(2))

#Now try with pykeops
l0=LazyTensor(l0)
l1=LazyTensor(l1)
print(((l0-l1)**2).sum(4).shape) #The reported size is wrong (!), but ...
print(((l0-l1)**2).sum(4).sum(3).sum(2)[:,:,0]) #... output is right, but with an extra singleton dimension.

Output

tensor([[21983.7969, 23900.2793, 23794.3164, 22217.5625],
        [23900.2773, 24554.8984, 24448.7891, 23295.2148],
        [23794.3164, 24448.7871, 23484.6074, 22879.8828],
        [22217.5625, 23295.2129, 22879.8828, 21782.1055]])
(4, 1, 64, 64)
tensor([[21983.7949, 23900.2793, 23794.3164, 22217.5625],
        [23900.2773, 24554.8984, 24448.7891, 23295.2129],
        [23794.3164, 24448.7871, 23484.6055, 22879.8828],
        [22217.5625, 23295.2129, 22879.8848, 21782.1055]])
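
As a point of comparison, the shape that NumPy/PyTorch-style broadcasting would give for `((l0-l1)**2).sum(4)` can be worked out without any tensor library. The sketch below is plain Python; `broadcast_shape` and `reduce_axis` are hypothetical helper names, not part of pykeops or torch, and it says nothing about how LazyTensor computes its own `.shape`.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes of equal length, NumPy/PyTorch style."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles shapes of equal length")
    out = []
    for m, n in zip(a, b):
        if m == n or n == 1:
            out.append(m)
        elif m == 1:
            out.append(n)
        else:
            raise ValueError("incompatible sizes {} and {}".format(m, n))
    return tuple(out)


def reduce_axis(shape, axis):
    """Shape left after summing over one axis (no keepdim)."""
    return shape[:axis] + shape[axis + 1:]


l0_shape = (4, 1, 64, 1, 3)   # l[:, None, :, None, :]
l1_shape = (1, 4, 1, 64, 3)   # l[None, :, None, :, :]

full = broadcast_shape(l0_shape, l1_shape)
after_sum4 = reduce_axis(full, 4)
print(full, after_sum4)
```

Under these rules the shape after `.sum(4)` would be (4, 4, 64, 64), whereas the LazyTensor above reports (4, 1, 64, 64).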

Example for 3D convolutions

Thanks a lot for this useful code, the benchmarks are impressive.

I wanted to try it out for convolutions (anisotropic kernels) on gridded data (3D images). Could you point me to some example code for this use case? I couldn't find any.

That'd be great, thanks a lot in advance!

ImportError due to pybind11 autodetecting python version in CMakeLists.txt

Steps to reproduce:

  1. Have two or more different versions of python 3 installed (python3.5 and python3.6 for example):
sudo apt-get install python3.5-dev python3.6-dev
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3.5 get-pip.py
python3.6 get-pip.py
  2. Install pykeops using pip (or from sources):
pip3.5 install numpy && pip3.5 install pykeops
pip3.6 install numpy && pip3.6 install pykeops

This will install two different keops modules:
<module 'pykeops' from '/usr/local/lib/python3.5/dist-packages/pykeops/__init__.py'>
<module 'pykeops' from '/usr/local/lib/python3.6/dist-packages/pykeops/__init__.py'>

  3. Create python file test.py containing:
import pykeops
pykeops.clean_pykeops()          # just in case old build files are still present
pykeops.test_numpy_bindings()    # perform the compilation
  4. The test script will fail for at least one python version:
  • python3.5 test.py fails with ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpyb10acd1892)
  • python3.6 test.py succeeds with pyKeOps with numpy bindings is working!

What really happens:

This happens because pykeops/common/compile_routines.py calls cmake on pykeops/CMakeLists.txt, which contains add_subdirectory(pybind11). pybind11 then detects python3.6 by default (in this specific case), so the build produces a shared library <CACHE_DIR>/pykeops-1.3-cpython-35/libKeOpsnumpyb10acd1892.so that targets python 3.6 instead of 3.5, and importlib fails to load it.
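
One way to spot this kind of mismatch is to compare the file name of the compiled module with the suffixes the running interpreter accepts. The sketch below uses only the standard library (`importlib.machinery.EXTENSION_SUFFIXES`); the helper name `is_importable_filename` is hypothetical, and the example suffix list is an assumption modelled on a Linux CPython 3.5 build, not something taken from the logs.

```python
import importlib.machinery
import os


def is_importable_filename(path, module_name, suffixes=None):
    """Return True if `path` is a file name the interpreter would try for `module_name`."""
    if suffixes is None:
        # Suffixes accepted by the interpreter running this script.
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    candidates = {module_name + suffix for suffix in suffixes}
    return os.path.basename(path) in candidates


# Assumed suffix list for a Linux CPython 3.5 interpreter (illustrative only).
py35_suffixes = [".cpython-35m-x86_64-linux-gnu.so", ".abi3.so", ".so"]
name = "libKeOpsnumpyb10acd1892"

built_for_36 = is_importable_filename(
    name + ".cpython-36m-x86_64-linux-gnu.so", name, py35_suffixes)
built_for_35 = is_importable_filename(
    name + ".cpython-35m-x86_64-linux-gnu.so", name, py35_suffixes)
print(built_for_36, built_for_35)
```

Calling the helper without `suffixes` checks against the interpreter that runs the script, which is probably the more useful form when inspecting a real cache directory.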

How to fix:

The simplest solution I found is to enforce the python version directly in the build script by using the PYBIND11_PYTHON_VERSION variable. Adding set(PYBIND11_PYTHON_VERSION 3.5) at the beginning of /usr/local/lib/python3.5/dist-packages/pykeops/CMakeLists.txt fixes the problem.

I imagine this could be done automatically by detecting the python version at build time or before installation. This fix could solve many issues, such as #2, #8, #28, #37 and others.
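
A minimal sketch of that automatic detection, assuming the version is read from the interpreter that launches the build and passed to cmake as a `-D` definition. `pybind11_version_flag` is a hypothetical helper, not an existing pykeops function, and `path/to/pykeops` is a placeholder; only the `PYBIND11_PYTHON_VERSION` variable comes from the fix described above.

```python
import sys


def pybind11_version_flag(version_info=None):
    """Build a cmake -D flag pinning pybind11 to a major.minor Python version."""
    if version_info is None:
        # Default to the interpreter that is running the build.
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return "-DPYBIND11_PYTHON_VERSION={}.{}".format(major, minor)


# Hypothetical usage: append the flag to the cmake command line used for the build.
cmake_args = ["cmake", "path/to/pykeops", pybind11_version_flag((3, 5))]
print(cmake_args[-1])
```

Passing the flag on the command line would avoid editing the installed CMakeLists.txt by hand, though whether it fits into the existing build script is untested here.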

Changes between 1.1.2 and 1.1.1?

After the update from 1.1.1 to 1.1.2, I'm having issues with using keops on two different computers when running the basic installation code:

import torch
import pykeops.torch as pktorch

x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)

my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

On one device, I'm getting

>>> import torch
>>> import pykeops.torch as pktorch
>>> 
>>> x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
>>> y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
>>> 
>>> my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
>>> print(my_conv(x, y))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/alex/miniconda3/envs/gpytorch/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed

On another device, I'm getting

>>> import torch
>>> import pykeops.torch as pktorch
7.4.0
>>> 
>>> x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
>>> y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
>>> 
>>> my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
>>> print(my_conv(x, y))
Compiling libKeOpstorch91c92bd508 in /home/alex_w/.cache/pykeops-1.1.2-cpython-37//build-libKeOpstorch91c92bd508:
       formula: Sum_Reduction(SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float32
... CMake Error at pybind11/tools/FindPythonLibsNew.cmake:127 (message):
  Python config failure: Python is 0-bit, chosen compiler is 64-bit
Call Stack (most recent call first):
  pybind11/tools/pybind11Tools.cmake:16 (find_package)
  pybind11/CMakeLists.txt:33 (include)



--------------------- CMAKE DEBUG -----------------
Command '['cmake', '/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops', '-DCMAKE_BUILD_TYPE=Release', '-DFORMULA_OBJ=Sum_Reduction(SqNorm2(x-y),1)', '-DVAR_ALIASES=auto x = Vi(0,3); auto y = Vj(1,3); ', '-Dshared_obj_name=libKeOpstorch91c92bd508', '-D__TYPE__=float', '-DPYTHON_LANG=torch', '-DC_CONTIGUOUS=1', '-DPYTORCH_INCLUDE_DIR=/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include;/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include/torch/csrc/api/include', '-DcommandLine=cmake /home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops -DCMAKE_BUILD_TYPE=Release -DFORMULA_OBJ=Sum_Reduction(SqNorm2(x-y),1) -DVAR_ALIASES=auto x = Vi(0,3); auto y = Vj(1,3);  -Dshared_obj_name=libKeOpstorch91c92bd508 -D__TYPE__=float -DPYTHON_LANG=torch -DC_CONTIGUOUS=1 -DPYTORCH_INCLUDE_DIR=/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include;/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/torch/include/torch/csrc/api/include']' returned non-zero exit status 1.
-- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Compute properties automatically set to: -DMAXIDGPU=7;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152;-DMAXTHREADSPERBLOCK1=1024;-DSHAREDMEMPERBLOCK1=49152;-DMAXTHREADSPERBLOCK2=1024;-DSHAREDMEMPERBLOCK2=49152;-DMAXTHREADSPERBLOCK3=1024;-DSHAREDMEMPERBLOCK3=49152;-DMAXTHREADSPERBLOCK4=1024;-DSHAREDMEMPERBLOCK4=49152;-DMAXTHREADSPERBLOCK5=1024;-DSHAREDMEMPERBLOCK5=49152;-DMAXTHREADSPERBLOCK6=1024;-DSHAREDMEMPERBLOCK6=49152;-DMAXTHREADSPERBLOCK7=1024;-DSHAREDMEMPERBLOCK7=49152
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 
-- Using shared_obj_name: libKeOpstorch91c92bd508
-- Found PythonInterp: /home/alex_w/miniconda3/envs/rl/bin/python3.7 (found version "3.7.4") 
-- Configuring incomplete, errors occurred!
See also "/home/alex_w/.cache/pykeops-1.1.2-cpython-37/build-libKeOpstorch91c92bd508/CMakeFiles/CMakeOutput.log".
See also "/home/alex_w/.cache/pykeops-1.1.2-cpython-37/build-libKeOpstorch91c92bd508/CMakeFiles/CMakeError.log".

--------------------- ----------- -----------------
make: *** No rule to make target 'libKeOpstorch91c92bd508'.  Stop.

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch91c92bd508', '--', 'VERBOSE=1']' returned non-zero exit status 2.

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/alex_w/miniconda3/envs/rl/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

I'm wondering whether anything about the builds changed between 1.1.1 and 1.1.2. Also, how can I go about debugging these issues? I tried PYKEOPS_VERBOSE=1 and clearing the cache, but neither gave me any additional debug information.
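
The CMake error on the second device reports "Python is 0-bit", which suggests that the build could not read the pointer size from the interpreter it found. The following is a minimal sketch for checking what the interpreter itself reports. It assumes that the CMake check queries the same interpreter that runs this script; `interpreter_report` is a hypothetical helper, not part of pykeops.

```python
import platform
import struct
import sys
import sysconfig


def interpreter_report():
    """Collect interpreter facts that a CMake Python check typically relies on."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        # Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
        "pointer_bits": struct.calcsize("P") * 8,
        "include_dir": sysconfig.get_paths()["include"],
        # May be None on some platforms (for example on Windows).
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print("{:>12}: {}".format(key, value))
```

Running this inside the same conda environment as the failing script shows whether the interpreter reports a sensible pointer size and library directory.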

Shared object compiled without CUDA support

Hi, team! I ran into an issue when running the test sample:

issue

My build configuration is:

g++ Version: 7.3.0
gcc Version: 7.3.0
cmake Version: 3.14.0
nvcc Version: 10.0.130
Nvidia Driver Version: 440.64
CUDA Version: 10.1
Pytorch Version: 1.3.1

Many thanks!

PyTorch test script fails

Hello.

The NumPy test script passes, but the PyTorch one does not. I get the following output:

(venv) jad@jad-Aspire-A717-71G ~/venv $ python3
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import pykeops.torch as pktorch
>>> 
>>> x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
>>> y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
>>> 
>>> my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
>>> print(my_conv(x, y))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jad/venv/lib/python3.5/site-packages/pykeops/torch/generic/generic_red.py", line 351, in __call__
    out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
  File "/home/jad/venv/lib/python3.5/site-packages/pykeops/torch/generic/generic_red.py", line 21, in forward
    ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
  File "/home/jad/venv/lib/python3.5/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/jad/venv/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 577, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 906, in create_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
ImportError: /home/jad/.cache/pykeops-1.2-cpython-35/libKeOpstorch91c92bd508.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1011CPUTensorIdEv
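
The undefined symbol in the ImportError above is an Itanium-mangled C++ name. The following is a minimal sketch for decoding it to see which library it belongs to. It assumes only the simple `_ZN<length><identifier>...E` nested-name form; `decode_nested_name` is a hypothetical helper, not a pykeops or PyTorch function, and a tool such as `c++filt` handles the general case.

```python
import re


def decode_nested_name(symbol):
    """Decode the `_ZN<len><id>...E` prefix of an Itanium-mangled symbol.

    Only plain nested names are supported (no templates, no substitutions);
    the parameter encoding after the closing `E` is ignored.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name: %r" % symbol)
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(match.group())
        start = pos + len(match.group())
        parts.append(symbol[start:start + length])
        pos = start + length
    if pos >= len(symbol):
        raise ValueError("missing closing 'E' in %r" % symbol)
    return "::".join(parts)


if __name__ == "__main__":
    print(decode_nested_name("_ZN3c1011CPUTensorIdEv"))
```

For this symbol the sketch yields c10::CPUTensorId, and c10 is the namespace of PyTorch's core library. One possible reading, which is an assumption rather than a confirmed diagnosis, is that the cached module was compiled against a different PyTorch build than the one being imported.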

Compilation Failure under JupyterLab

System:

  • Ubuntu 16.04 docker
  • Python 3.6.7
  • PyTorch 1.1
  • NVCC 10.0
  • g++ 5.4.0-6
  • cmake 3.14.4
  • GNU make 4.1
  • JupyterLab 0.35.4

When running one of the example scripts from the command line (either line by line in a python3 shell, or as 'python3 test.py') everything works fine. When running from what I believe to be a properly configured JupyterLab, I get cmake and make errors.

The chosen test script:

import torch
import pykeops.torch as pktorch

x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)

my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

The first part of the errors, showing make and cmake errors while python is still running:

Compiling libKeOpstorch91c92bd508 in /root/.cache/pykeops-1.0.2/:
       formula: Sum_Reduction(SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float32
... 
--------------------- CMAKE DEBUG -----------------
Command '['cmake', '/opt/conda/lib/python3.6/site-packages/pykeops', '-DCMAKE_BUILD_TYPE=Release', '-DFORMULA_OBJ=Sum_Reduction(SqNorm2(x-y),1)', '-DVAR_ALIASES=auto x = Vi(0,3); auto y = Vj(1,3); ', '-Dshared_obj_name=libKeOpstorch91c92bd508', '-D__TYPE__=float', '-DPYTHON_LANG=torch', '-DPYTORCH_INCLUDE_DIR=/opt/conda/lib/python3.6/site-packages/torch/include;/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include']' returned non-zero exit status 1.
-- Configuring incomplete, errors occurred!

--------------------- ----------- -----------------

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorch91c92bd508']' returned non-zero exit status 1.

--------------------- ----------- -----------------
Done. 

Followed immediately by the python errors themselves:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in load_keops(formula, aliases, dtype, lang, optional_flags)
     44         # high frequency path
---> 45         return importlib.import_module(dll_name)
     46     except ImportError:

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)
     28             # already compiled, just load
---> 29             return importlib.import_module(dll_name)
     30         except ImportError:

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-387377d109a7> in <module>
      7 
      8 my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
----> 9 print(my_conv(x, y))

/opt/conda/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in __call__(self, backend, device_id, ranges, *args)
    311 
    312         """
--> 313         out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
    314         nx, ny = get_sizes(self.aliases, *args)
    315         nout = nx if self.axis==1 else ny

/opt/conda/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in forward(ctx, formula, aliases, backend, dtype, device_id, ranges, *args)
     17     def forward(ctx, formula, aliases, backend, dtype, device_id, ranges, *args):
     18 
---> 19         myconv = load_keops(formula, aliases, dtype, 'torch', ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)])
     20 
     21         # Context variables: save everything to compute the gradient:

/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in load_keops(formula, aliases, dtype, lang, optional_flags)
     46     except ImportError:
     47         # could not import (ie not compiled), safely compile/import
---> 48         return _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)

/opt/conda/lib/python3.6/site-packages/pykeops/common/utils.py in wrapper_filelock(*args, **kwargs)
     68             with open(build_folder + '/' + lock_file_name, 'w') as f:
     69                 with FileLock(f):
---> 70                     return func(*args, **kwargs)
     71 
     72         return wrapper_filelock

/opt/conda/lib/python3.6/site-packages/pykeops/common/keops_io.py in _safe_compile_and_load(formula, aliases, dll_name, dtype, lang, optional_flags)
     32             # print(dll_name + " not found")
     33             compile_generic_routine(formula, aliases, dll_name, dtype, lang, optional_flags)
---> 34             return importlib.import_module(dll_name)
     35 
     36     # create the name from formula, aliases and dtype.

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/opt/conda/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorch91c92bd508'

I will note that after running various test scripts successfully from the command line, the following contents are in /root/.cache/pykeops-1.0.2, which is in the sys.path as printed from the jupyter script.

14 Jun 30 00:32 .
5 Jun 29 20:15 ..
CMakeCache.txt
CMakeFiles
Makefile
cmake_install.cmake
detect_cuda_compute_capabilities.cu
detect_cuda_props.cu
libKeOpstorch91c92bd508.cpython-36m-x86_64-linux-gnu.so
libKeOpstorch91c92bd508.h
libKeOpstorch91c92bd508.so
pybind11
pykeops_build.lock
torch_headers.h
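
Since the same script works from the shell but not from JupyterLab, it can help to compare what each process actually sees. The following is a minimal sketch, meant to be run once from the shell and once from a notebook cell. The helper names are hypothetical, and the cache directory and module name are simply the ones printed in the log above.

```python
import glob
import os
import sys


def find_compiled_modules(cache_dir, dll_name):
    """List shared objects for `dll_name` anywhere under `cache_dir`."""
    pattern = os.path.join(cache_dir, "**", dll_name + "*.so")
    return sorted(glob.glob(pattern, recursive=True))


def environment_report(cache_dir, dll_name):
    """Summarize the interpreter, PATH and import path seen by this process."""
    cache_abs = os.path.abspath(cache_dir)
    return {
        "executable": sys.executable,
        "path": os.environ.get("PATH", "").split(os.pathsep),
        "cache_on_sys_path": cache_abs in [os.path.abspath(p) for p in sys.path if p],
        "modules": find_compiled_modules(cache_dir, dll_name),
    }


if __name__ == "__main__":
    report = environment_report(
        os.path.expanduser("~/.cache/pykeops-1.0.2"), "libKeOpstorch91c92bd508"
    )
    for key, value in report.items():
        print(key, "=", value)
```

A difference in `executable` or `path` between the two runs would point to JupyterLab using another interpreter or not finding cmake, which is only one possible explanation for the failure.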

example .pt file, and tools to convert binary image to .pt file?

Hi, I'm interested in testing KeOps and am trying to go through the "Surface registration" tutorial here:
http://kernel-operations.io/keops/_auto_tutorials/surface_registration/plot_LDDMM_Surface.html#sphx-glr-auto-tutorials-surface-registration-plot-lddmm-surface-py

The tutorial requires ".pt" files as input data:
"hippos.pt": original data (6611 vertices), etc.

I assume this is a point cloud data format. Is there an example ".pt" file that we can use for testing? Also, are there tools to convert a binary mask image to a ".pt" file?

Thank you!

Pykeops cannot find cuda

Hello, great work !

With the sample code below, pykeops cannot see the GPUs (CUDA 9.1, cmake 3.12.1). Is that normal?

(base) hjanati@drago3:~/code/nips19$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I made sure to install pykeops with all its dependencies, i.e. pykeops[full].

import numpy as np
import pykeops
pykeops.verbose = True
from pykeops.numpy import Genred

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9 ).reshape(-1, 3).astype('float32')

my_conv = Genred('-SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))

Here is the output:

Compiling libKeOpsnumpy73a835aa5f in /home/parietal/hjanati/.cache/pykeops-1.0.2/:
       formula: Sum_Reduction(-SqNorm2(x-y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... CMake Warning at /home/parietal/hjanati/miniconda3/share/cmake-3.14/Modules/FindCUDA.cmake:893 (message):
  Expecting to find librt for libcudart_static, but didn't find it.
Call Stack (most recent call first):
  keops/cuda.cmake:8 (find_package)
  CMakeLists.txt:11 (include)


-- No GPU detected. USE_CUDA set to FALSE.
-- Using shared_obj_name: libKeOpsnumpy73a835aa5f
-- pybind11 v2.2.4
-- Configuring done
-- Generating done
-- Build files have been written to: /home/parietal/hjanati/.cache/pykeops-1.0.2

Scanning dependencies of target keopslibKeOpsnumpy73a835aa5f
[ 25%] Building CXX object CMakeFiles/keopslibKeOpsnumpy73a835aa5f.dir/keops/core/link_autodiff.cpp.o
[ 50%] Linking CXX shared library libKeOpsnumpy73a835aa5f.so
[ 50%] Built target keopslibKeOpsnumpy73a835aa5f
Scanning dependencies of target libKeOpsnumpy73a835aa5f
[ 75%] Building CXX object CMakeFiles/libKeOpsnumpy73a835aa5f.dir/numpy/generic/generic_red.cpp.o
[100%] Linking CXX shared module libKeOpsnumpy73a835aa5f.cpython-36m-x86_64-linux-gnu.so
[100%] Built target libKeOpsnumpy73a835aa5f

Done. 
Traceback (most recent call last):
  File "keopstest.py", line 10, in <module>
    print(my_conv(x, y))
  File "/home/parietal/hjanati/miniconda3/lib/python3.6/site-packages/pykeops/numpy/generic/generic_red.py", line 224, in __call__
    out = self.myconv.genred_numpy(nx, ny, tagCpuGpu, tag1D2D, 0, device_id, ranges, *args)
RuntimeError: [KeOps]\xa0This KeOps shared object has been compiled without cuda support: 
 1) to perform computations on CPU, simply set tagHostDevice to 0
 2) to perform computations on GPU, please recompile the formula with a working version of cuda.
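
The CMake log above prints "No GPU detected. USE_CUDA set to FALSE.", so the module was built for CPU only. The following is a minimal sketch for checking which build tools the Python process can actually reach on its PATH. The tool list is illustrative and `tool_version` is a hypothetical helper, not a pykeops function.

```python
import shutil
import subprocess


def tool_version(name, flag="--version"):
    """Return the first line of `<name> --version`, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


def toolchain_report(tools=("cmake", "c++", "nvcc")):
    """Map each tool name to its version banner (None when not on PATH)."""
    return {name: tool_version(name) for name in tools}


if __name__ == "__main__":
    for name, version in toolchain_report().items():
        print("{:>6}: {}".format(name, version))
```

This only shows what is visible on PATH; it does not tell whether the CUDA driver and the GPUs themselves are usable from that process.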

Kernel with norm matrix indexed by i and j

Hi! I am trying to figure out how to construct a kernel where the matrix defining the norm has i and j indices. More specifically, I want the kernel evaluated at the points x_i and y_j to be

k(x_i, y_j) = exp(-1/2 d_ij @ inverse(S_i + S_j) @ d_ij) / sqrt(det(S_i + S_j)),
d_ij = x_i - y_j.

For my specific problem, x_i and y_j are vectors of length d=2. I want to use k with KernelSolve, and I need to use the pytorch backend and to be able to take gradients.

I was wondering how to best do this in keops. Since d is 2, I can write out explicit expressions for everything in terms of the components of the vectors and elements of the matrices S_i and S_j. However, this could be optimized substantially by only computing det(S_i + S_j) once for each (i, j). Is there any way I can write formulas like this, or would it require something lower-level in keops? Thanks!

For reference, here's how I'm constructing the kernel and solver now:

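from pykeops.torch import Genred, KernelSolve

# Components of d_ij = x_i - y_j and of the symmetric 2x2 matrix S_i + S_j.
# s_i and s_j hold each matrix flattened to 4 entries: the diagonal terms sit
# at indices 0 and 3, the off-diagonal term at index 1.
d_0 = "(Elem(x_i, 0) - Elem(y_j, 0))"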
d_1 = "(Elem(x_i, 1) - Elem(y_j, 1))"
s_00 = "(Elem(s_i, 0) + Elem(s_j, 0))"
s_01 = "(Elem(s_i, 1) + Elem(s_j, 1))"
s_11 = "(Elem(s_i, 3) + Elem(s_j, 3))"
det = f"({s_00} * {s_11} - Square({s_01}))"
s_inv_00 = f"({s_11} / {det})"
s_inv_01 = f"(-{s_01} / {det})"
s_inv_11 = f"({s_00} / {det})"
formula = (
    "Exp(-("
    f"Square({d_0}) * {s_inv_00} + "
    f"IntCst(2) * {d_0} * {d_1} * {s_inv_01} +"
    f"Square({d_1}) * {s_inv_11}"
    ") * IntInv(2))"
    f" * alpha2 * Rsqrt({det}) * theta_j"
)

aliases = [
    "x_i = Vi(2)",
    "y_j = Vj(2)",
    "s_i = Vi(4)",
    "s_j = Vj(4)",
    "alpha2 = Pm(1)",
    "theta_j = Vj(1)"
]

K = Genred(formula, aliases, axis=1)
K_inv = KernelSolve(formula, aliases, "theta_j", axis=1)
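
To check a formula string like the one above on a few points, a plain reference value of k(x_i, y_j) for d=2 can be useful. The following is a minimal sketch in pure Python. It assumes the same flattened layout for the matrices as the aliases above, leaves out the alpha2 and theta_j factors, and `reference_kernel` is a hypothetical helper, not part of KeOps.

```python
import math


def reference_kernel(x, y, s_x, s_y):
    """Evaluate exp(-1/2 d^T (S_x + S_y)^{-1} d) / sqrt(det(S_x + S_y)) for d = 2.

    x, y: sequences of length 2.
    s_x, s_y: symmetric 2x2 matrices flattened to length 4
              (diagonal at indices 0 and 3, off-diagonal at index 1).
    """
    d0, d1 = x[0] - y[0], x[1] - y[1]
    s00 = s_x[0] + s_y[0]
    s01 = s_x[1] + s_y[1]
    s11 = s_x[3] + s_y[3]
    det = s00 * s11 - s01 ** 2
    # Quadratic form d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    quad = (d0 * d0 * s11 - 2.0 * d0 * d1 * s01 + d1 * d1 * s00) / det
    return math.exp(-0.5 * quad) / math.sqrt(det)


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 1.0)
    print(reference_kernel((1.0, 0.0), (0.0, 0.0), identity, identity))
```

With both matrices equal to the identity, S_i + S_j is 2I, so the determinant is 4 and the kernel at coincident points is 1/2, which gives a quick sanity check.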

Compilation failures on 1.1.2

After updating to v1.1.2, I am getting compilation failures for operations that previously compiled fine. To reproduce, all I have to do is enter keops/pykeops/test and run python unit_tests_numpy.py; I get the error shown in the log below on the first (and every) test.

Using git bisect, I was able to determine that 466eaf2 is the first commit on which this error occurs (i.e., things work on the commit before it, and on this commit I receive the error below). A lot happened in this commit, however, so I wasn't able to determine an easy fix.

Error log

Compiling libKeOpsnumpy60ff1f2397 in /home/jake.gardner/git/keops/pykeops/common/../build//build-libKeOpsnumpy60ff1f2397:
       formula: Sum_Reduction(Inv(Exp((IntCst(1) + Sum((Square((Var(0,1,0) + (Var(1,3,0) * Var(2,3,1)))) + Var(3,1,2)))))),0)
       aliases: Var(0,1,0); Var(1,3,0); Var(2,3,1); Var(3,1,2); 
       dtype  : float32
... In file included from /usr/include/c++/5/type_traits:35:0,
                 from /home/jake.gardner/git/keops/pykeops/../keops/lib/sequences/include/tao/seq/is_all.hpp:13,
                 from /home/jake.gardner/git/keops/pykeops/../keops/lib/sequences/include/tao/seq/is_any.hpp:10,
                 from /home/jake.gardner/git/keops/pykeops/../keops/lib/sequences/include/tao/seq/contains.hpp:8,
                 from /home/jake.gardner/git/keops/pykeops/../keops/core/formulas/tensordot.h:4,
                 from /home/jake.gardner/git/keops/pykeops/../keops/core/formulas/maths.h:11,
                 from /home/jake.gardner/git/keops/pykeops/../keops/core/formulas/newsyntax.h:10,
                 from /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/libKeOpsnumpy60ff1f2397.h:16,
                 from <command-line>:0:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
CMake Error at keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.Release.cmake:219 (message):
  Error generating
  /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpsnumpy60ff1f2397.dir/rule] Error 2
make: *** [libKeOpsnumpy60ff1f2397] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpsnumpy60ff1f2397', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -S/home/jake.gardner/git/keops/pykeops -B/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpsnumpy60ff1f2397
make[1]: Entering directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -S/home/jake.gardner/git/keops/pykeops -B/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397 --check-build-system CMakeFiles/Makefile.cmake 0
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -E cmake_progress_start /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpsnumpy60ff1f2397.dir/all
make[2]: Entering directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
/usr/bin/make -f CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/build.make CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/depend
make[3]: Entering directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o
cd /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core && /home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -E make_directory /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/.
cd /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core && /home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Release -D generated_file:STRING=/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.cubin.txt -P /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.Release.cmake
-- Removing /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o
/home/jake.gardner/anaconda3/lib/python3.7/site-packages/cmake/data/bin/cmake -E remove /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/./keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o
-- Generating dependency file: /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ /home/jake.gardner/git/keops/pykeops/../keops/core/link_autodiff.cu -o /home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397/CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpsnumpy60ff1f2397_EXPORTS -DMAXIDGPU=1 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -DMAXTHREADSPERBLOCK1=1024 -DSHAREDMEMPERBLOCK1=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -DMODULE_NAME=libKeOpsnumpy60ff1f2397 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-Wall\",\"-fmax-errors=2\",\"-fPIC\",\"-O3\",\"-DNDEBUG\",\"-O3\" -gencode arch=compute_75,code=sm_75 --use_fast_math --compiler-options=-fPIC --expt-relaxed-constexpr --pre-include=libKeOpsnumpy60ff1f2397.h -DNVCC -I/usr/local/cuda/include -I/home/jake.gardner/git/keops/pykeops -I/home/jake.gardner/git/keops/pykeops/../keops -I/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397
CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/__/keops/core/keopslibKeOpsnumpy60ff1f2397_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
CMakeFiles/Makefile2:331: recipe for target 'CMakeFiles/keopslibKeOpsnumpy60ff1f2397.dir/all' failed
make[2]: Leaving directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
CMakeFiles/Makefile2:269: recipe for target 'CMakeFiles/libKeOpsnumpy60ff1f2397.dir/rule' failed
make[1]: Leaving directory '/home/jake.gardner/git/keops/pykeops/build/build-libKeOpsnumpy60ff1f2397'
Makefile:183: recipe for target 'libKeOpsnumpy60ff1f2397' failed

--------------------- ----------- -----------------
Done.

Compile Issue - No module named 'libKeOpstorchd5f55273e3'

I'm trying to get KeOps working and I'm having compiler issues that appear to be related to compiling Pytorch bindings.

I read through similar issues (https://github.com/getkeops/keops/issues/28, https://github.com/getkeops/keops/issues/49, https://github.com/getkeops/keops/issues/8), but these mention fixes in the v1.4 release, and I'm still running into the following issue with the v1.4 pip install.

Test script:

import pykeops
pykeops.verbose = True
pykeops.build_type = 'Debug'
pykeops.clean_pykeops()
pykeops.test_torch_bindings()

Terminal output:

Compiling libKeOpstorchd5f55273e3 in /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3);
       dtype  : float32
... -- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Autodetected CUDA architecture(s):  3.7
-- Using shared_obj_name: libKeOpstorchd5f55273e3
-- First i variables detected is 0
-- First j variables detected is 1
-- Compiled formula is Sum_Reduction(SqNorm2(x - y),1); auto x = Vi(0,3); auto y = Vj(1,3); where the number of args is 2.
-- Found PythonInterp: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/python3.6 (found suitable version "3.6.5", minimum required is "3.6")
-- Found PythonLibs: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/libpython3.6m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(79): error: inline specifier allowed on function declarations only

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: argument list for class template "std::pair" is missing

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ")"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: template parameter "_T1" may not be redeclared in this scope

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ";"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/utility(366): error: inline specifier allowed on function declarations only

6 errors detected in the compilation of "/tmp/tmpxft_00002eb0_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake:279 (message):
  Error generating file
  /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorchd5f55273e3.dir/rule] Error 2
make: *** [libKeOpstorchd5f55273e3] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorchd5f55273e3', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorchd5f55273e3
make[1]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E cmake_progress_start /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorchd5f55273e3.dir/all
make[2]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/depend
make[3]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E make_directory /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/.
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Debug -D generated_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.cubin.txt -P /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
-- Generating dependency file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D input_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/local/lib/python3.5/dist-packages/cmake/data/share/cmake-3.13/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp to /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E copy_if_different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp and /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:331: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all' failed
make[2]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:306: recipe for target 'CMakeFiles/libKeOpstorchd5f55273e3.dir/rule' failed
make[1]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
Makefile:196: recipe for target 'libKeOpstorchd5f55273e3' failed

--------------------- ----------- -----------------
Done.
Compiling libKeOpstorchd5f55273e3 in /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3);
       dtype  : float32
... -- The CXX compiler identification is GNU 7.3.0
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.1/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++
-- Autodetected CUDA architecture(s):  3.7
-- Using shared_obj_name: libKeOpstorchd5f55273e3
-- First i variables detected is 0
-- First j variables detected is 1
-- Compiled formula is Sum_Reduction(SqNorm2(x - y),1); auto x = Vi(0,3); auto y = Vj(1,3); where the number of args is 2.
-- Found PythonInterp: /home/ubuntu/anaconda3/envs/pytorch_p36/bin/python3.6 (found suitable version "3.6.5", minimum required is "3.6")
-- Found PythonLibs: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/libpython3.6m.so
-- pybind11 v2.3.dev1
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(79): error: inline specifier allowed on function declarations only

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: argument list for class template "std::pair" is missing

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ")"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: template parameter "_T1" may not be redeclared in this scope

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/stl_pair.h(437): error: expected a ";"

/home/ubuntu/anaconda3/envs/pytorch_p36/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/utility(366): error: inline specifier allowed on function declarations only

6 errors detected in the compilation of "/tmp/tmpxft_0000308a_00000000-6_link_autodiff.cpp1.ii".
CMake Error at keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake:279 (message):
  Error generating file
  /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o


make[3]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o] Error 1
make[2]: *** [CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all] Error 2
make[1]: *** [CMakeFiles/libKeOpstorchd5f55273e3.dir/rule] Error 2
make: *** [libKeOpstorchd5f55273e3] Error 2

--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorchd5f55273e3', '--', 'VERBOSE=1']' returned non-zero exit status 2.
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/make -f CMakeFiles/Makefile2 libKeOpstorchd5f55273e3
make[1]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -S/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -B/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E cmake_progress_start /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles 5
/usr/bin/make -f CMakeFiles/Makefile2 CMakeFiles/libKeOpstorchd5f55273e3.dir/all
make[2]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
/usr/bin/make -f CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/depend
make[3]: Entering directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
[ 20%] Building NVCC (Device) object CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E make_directory /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/.
cd /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core && /usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=Debug -D generated_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -D generated_cubin_file:STRING=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.cubin.txt -P /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.Debug.cmake
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
-- Generating dependency file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/cuda-10.1/bin/nvcc -M -D__CUDACC__ /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Generating temporary cmake readable file: /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -D input_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend -D output_file:FILEPATH=/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp -D verbose=1 -P /usr/local/lib/python3.5/dist-packages/cmake/data/share/cmake-3.13/Modules/FindCUDA/make2cmake.cmake
-- Copy if different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp to /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E copy_if_different /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp and /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.depend.tmp /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o.NVCC-depend
-- Generating /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/cuda-10.1/bin/nvcc /home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops/core/link_autodiff.cu -c -o /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o -m64 -DkeopslibKeOpstorchd5f55273e3_EXPORTS -DMAXIDGPU=0 -DMAXTHREADSPERBLOCK0=1024 -DSHAREDMEMPERBLOCK0=49152 -D_FORCE_INLINES -DCUDA_BLOCK_SIZE=192 -DUSE_CUDA=1 -D__TYPE__=float -DC_CONTIGUOUS=1 -D__TYPEACC__=float -DSUM_SCHEME=1 -DMODULE_NAME=libKeOpstorchd5f55273e3 -D_GLIBCXX_USE_CXX11_ABI=0 -DUSE_DOUBLE=0 -DUSE_HALF=0 -DKERNEL_GEOM_TYPE=0 -DKERNEL_SIG_TYPE=0 -DKERNEL_SPHERE_TYPE=0 -DMODULE_NAME_FSHAPE_SCP=fshape_scp_gaussiangaussiangaussian_unoriented_float -Xcompiler ,\"-fvisibility-inlines-hidden\",\"-std=c++17\",\"-fmessage-length=0\",\"-march=nocona\",\"-mtune=haswell\",\"-ftree-vectorize\",\"-fPIC\",\"-fstack-protector-strong\",\"-fno-plt\",\"-O2\",\"-ffunction-sections\",\"-pipe\",\"-isystem\",\"/home/ubuntu/anaconda3/envs/pytorch_p36/include\",\"-DUSE_OPENMP\",\"-fopenmp\",\"-Wall\",\"-Wno-unknown-pragmas\",\"-fmax-errors=2\",\"-fPIC\",\"-g\",\"-O0\",\"-g\" -gencode arch=compute_37,code=sm_37 --use_fast_math --compiler-options=-fPIC -ccbin /home/ubuntu/anaconda3/envs/pytorch_p36/bin/x86_64-conda_cos6-linux-gnu-c++ --pre-include=libKeOpstorchd5f55273e3.h -DNVCC -I/usr/local/cuda-10.1/include -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops -I/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/keops -I/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3 -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include -I/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include
-- Removing /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
/usr/local/lib/python3.5/dist-packages/cmake/data/bin/cmake -E remove /home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3/CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/./keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o
CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/build.make:63: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/keops/core/keopslibKeOpstorchd5f55273e3_generated_link_autodiff.cu.o' failed
make[3]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:331: recipe for target 'CMakeFiles/keopslibKeOpstorchd5f55273e3.dir/all' failed
make[2]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
CMakeFiles/Makefile2:306: recipe for target 'CMakeFiles/libKeOpstorchd5f55273e3.dir/rule' failed
make[1]: Leaving directory '/home/ubuntu/.cache/pykeops-1.4-cpython-36/build-libKeOpstorchd5f55273e3'
Makefile:196: recipe for target 'libKeOpstorchd5f55273e3' failed

--------------------- ----------- -----------------
Done.
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/test/install.py", line 55, in test_torch_bindings
    if torch.allclose(my_conv(x, y).view(-1), torch.tensor(expected_res).type(torch.float32)):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorchd5f55273e3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Optimal Transport/compile_test.py", line 5, in <module>
    pykeops.test_torch_bindings()    # perform the compilation
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/test/install.py", line 66, in test_torch_bindings
    print(my_conv(x, y))
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 396, in __call__
    device_id, ranges, self.accuracy_flags, *args)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py", line 22, in forward
    myconv = LoadKeOps(formula, aliases, dtype, 'torch', optional_flags).import_module()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'libKeOpstorchd5f55273e3'

Compiler settings:

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
g++ -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)

Compilers are installed locally using Anaconda.

cmake version 3.13.3

nvcc Install

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Any assistance with this would be helpful.

ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy5ac3d464a2)

Hello!
Something goes wrong when I run the example code below.
I saw a similar bug in #17, but I don't think it is the same: in #17 the first run succeeded, whereas on my machine it fails on the very first run.

code

import numpy as np
import pykeops.numpy as pknp

x = np.arange(1, 10).reshape(-1, 3).astype('float32')
y = np.arange(3, 9).reshape(-1, 3).astype('float32')
my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
res = my_conv(x, y, backend='CPU')
assert res.shape == (2, 1)
print("okay")

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

cmake --version

cmake version 3.15.3

which python

/home/lowen/anaconda3/envs/lowenEnv/bin/python

python --version

Python 3.6.8 :: Anaconda, Inc.

gcc --version

gcc (GCC) 7.4.0
Copyright © 2017 Free Software Foundation, Inc.


output

Compiling libKeOpsnumpy5ac3d464a2 in /home/lowen/.cache/pykeops-1.1.2-cpython-36//build-libKeOpsnumpy5ac3d464a2:
       formula: Sum_Reduction(SqNorm2(x - y),1)
       aliases: x = Vi(0,3); y = Vj(1,3); 
       dtype  : float64
... Done.
Traceback (most recent call last):
  File "xjbx.py", line 972, in <module>
    test_geomloss()
  File "xjbx.py", line 936, in test_geomloss
    my_conv = pknp.Genred('SqNorm2(x - y)', ['x = Vi(3)', 'y = Vj(3)'])
  File "/home/lowen/anaconda3/envs/lowenEnv/lib/python3.6/site-packages/pykeops/numpy/generic/generic_red.py", line 114, in __init__
    self.myconv = LoadKEops(self.formula, self.aliases, self.dtype, 'numpy').import_module()
  File "/home/lowen/anaconda3/envs/lowenEnv/lib/python3.6/site-packages/pykeops/common/keops_io.py", line 52, in import_module
    return importlib.import_module(self.dll_name)
  File "/home/lowen/anaconda3/envs/lowenEnv/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dynamic module does not define module export function (PyInit_libKeOpsnumpy5ac3d464a2)

Efficiency of covariance calculation with high dimensionality

Hi,
First of all thank you for this amazing library.

I encountered a behaviour which was unexpected on my part and was wondering if there is anything I can do to fix it.
When calculating a kernel between N x D matrices with large D, KeOps seems to slow down a lot.

Simple code sample to reproduce this is:

import torch
from pykeops.torch import Genred
import timeit

a = torch.randn(10000, 700, requires_grad=False, dtype=torch.float64)
c = torch.randn(10000, 700, requires_grad=False, dtype=torch.float64)
v = torch.randn(10000, 2, requires_grad=False, dtype=torch.float64)

formula = '(X|Y) * v'
aliases = [
    'X = Vi(%d)' % (a.shape[1]),
    'Y = Vj(%d)' % (c.shape[1]),
    'v = Vi(%d)' % (v.shape[1]),
]
mmv = Genred(formula, aliases, reduction_op='Sum', axis=1, dtype='float64')
mmv(a, c, v)

timeit.repeat("mmv(a, c, v, backend='GPU_1D'); torch.cuda.synchronize()", globals=globals(), number=1, repeat=5)
timeit.repeat('(a @ c.T) @ v', globals=globals(), number=1, repeat=5)

The KeOps function takes ~6 seconds to run (on the GPU), while the naive PyTorch version takes ~0.4 s (on a 24-core CPU). I find this interesting since, if we reduce D to e.g. 7, KeOps is massively faster!

I'm sure there is something simple that I am clearly missing. Please let me know if this is the case.
Thanks,
Giacomo

Extracting a band diagonal with KeOps?

I want to multiply two 2D tensors but I only care about a few diagonals of the resulting matrix, and I want to run this on the GPU with PyTorch. Is this something that can be done with KeOps?

For illustration, here's the NumPy code, but what I really need is PyTorch/GPU:

import numpy as np
M = 16000  # huge, can't do the O(M^2) operations
N = 64
c = 100  # half-width of the band (2 * c diagonals), a lot smaller than M
t1 = np.random.rand(M, N)
t2 = np.random.rand(M, N)
r = np.zeros((M, 2 * c))  # one column per diagonal offset j in [-c, c)

for i in range(M):
    for j in range(-c, c):
        k = min(max(i + j, 0), M - 1)  # clamp the row index to handle the boundary condition
        r[i, j + c] = np.dot(t1[i], t2[k])  # offset j is stored in column j + c
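The double loop above can be vectorized over the rows while still looping over the (few) diagonal offsets. The following is only a NumPy sketch, not a KeOps solution; the function name `band_products`, the clamping of out-of-range rows and the layout "offset j goes to column j + c" are assumptions made for illustration. The same indexing pattern should carry over to PyTorch tensors on the GPU (with `torch.clamp` and `torch.einsum`).

```python
import numpy as np


def band_products(t1, t2, c):
    """Return r with r[i, j + c] = dot(t1[i], t2[clamp(i + j)]) for j in [-c, c)."""
    M = t1.shape[0]
    rows = np.arange(M)
    r = np.empty((M, 2 * c), dtype=t1.dtype)
    for j in range(-c, c):
        # Clamp neighbour indices so that rows near the boundary stay in range.
        k = np.clip(rows + j, 0, M - 1)
        # Row-wise dot products between t1[i] and t2[k[i]], without an M x M matrix.
        r[:, j + c] = np.einsum("ij,ij->i", t1, t2[k])
    return r
```

Each offset costs O(M * N) time and memory, so the full band costs O(M * N * c) instead of the O(M^2 * N) of a dense product.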

PS: I am running into compilation errors with the tutorial examples, but will ask about that later.

Windows support

(Py37A) C:\Users\Francois>pip install pykeops
Collecting pykeops
  Downloading https://files.pythonhosted.org/packages/48/72/d1576e0841b1fa6dd65de4ef203362e5eb7748215005ace2975e12ac2679/pykeops-1.3.tar.gz (301kB)
     |████████████████████████████████| 307kB 731kB/s
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\francois\venvs\py37a\scripts\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Francois\\AppData\\Local\\Temp\\pip-install-hm6v38ko\\pykeops\\setup.py'"'"'; __file__='"'"'C:\\Users\\Francois\\AppData\\Local\\Temp\\pip-install-hm6v38ko\\pykeops\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\pip-egg-info'
         cwd: C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\
    Complete output (9 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\setup.py", line 11, in <module>
        from pykeops import __version__ as current_version
      File "C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\pykeops\__init__.py", line 34, in <module>
        from .common.utils import clean_pykeops
      File "C:\Users\Francois\AppData\Local\Temp\pip-install-hm6v38ko\pykeops\pykeops\common\utils.py", line 1, in <module>
        import fcntl
    ModuleNotFoundError: No module named 'fcntl'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

(Py37A) C:\Users\Francois>pip install fcntl
ERROR: Could not find a version that satisfies the requirement fcntl (from versions: none)
ERROR: No matching distribution found for fcntl

A little more searching shows that fcntl is only supported on Mac/Linux. It seems to be used in only one place, to lock a file, which could be done in a more portable way with portalocker?
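For reference, here is a minimal standard-library sketch of a portable exclusive file lock, falling back to `msvcrt` on Windows where `fcntl` is missing. It is not taken from the pykeops code base, and the helper name `locked_file` is hypothetical; a package such as portalocker wraps the same idea with more care.

```python
import os
from contextlib import contextmanager

if os.name == "nt":
    import msvcrt

    def _lock(f):
        # Lock the first byte of the file (msvcrt locks byte ranges, not whole files).
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(f):
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(f):
        # Block until an exclusive advisory lock is obtained.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)

    def _unlock(f):
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)


@contextmanager
def locked_file(path):
    """Hypothetical helper: hold an exclusive lock on `path` inside the with-block."""
    with open(path, "a+") as f:
        _lock(f)
        try:
            yield f
        finally:
            _unlock(f)
```

Usage would look like `with locked_file(lock_path): ...` around the section that must not run concurrently.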

unary('Max')

I was using ((LazyTensor(XX[:,None,:])-LazyTensor(XX[None,:,:]))**2).sum(-1) for the squared L^2 norm and naively tried (LazyTensor(XX[:,None,:])-LazyTensor(XX[None,:,:])).abs().max(-1) for the sup norm, but it doesn't seem to be implemented. Adding a struct Max (similar to Sum) that derives from UnaryOp lets me call unary('Max',dimres=1) and seems to work. The only tricky parts are avoiding std::max (or other host functions) and getting the initial value right. I kept 0, which was fine for a sup norm, but in general I guess it should be -infinity or lowest() from numeric_limits, depending on the type. Alternatively, we could take advantage of F::DIM>0, initialize with outF[0], and start the iteration from k=1.
Is max (and min) missing because of a lack of time and demand, or is there some reason why it would be a bad idea?
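To make the request concrete, here is a dense NumPy reference for the pairwise sup norm, together with a running max that is initialized with the first coordinate as suggested above. This is only an illustrative sketch for checking results on small inputs; the function names are hypothetical and nothing here uses the KeOps API.

```python
import numpy as np


def pairwise_sup_norm(X):
    """Dense reference: out[i, j] = max_k |X[i, k] - X[j, k]|."""
    diff = X[:, None, :] - X[None, :, :]
    return np.abs(diff).max(axis=-1)


def max_from_first(values):
    """Running max initialized with values[0], iterating from k = 1."""
    out = values[0]
    for k in range(1, len(values)):
        if values[k] > out:
            out = values[k]
    return out
```

Initializing with the first coordinate gives the right answer even when every entry is negative, where an initial value of 0 would not.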

Library missing: libKeOpstorchf44721a1c0

I have tried to run: https://www.kernel-operations.io/keops/_auto_benchmarks/plot_benchmark_convolutions.html#sphx-glr-auto-benchmarks-plot-benchmark-convolutions-py

Timings for 10000x10000 convolutions:
kernel: gaussian
Compiling libKeOpstorchf44721a1c0 in /home/thomas/.cache/pykeops-1.2-cpython-36//build-libKeOpstorchf44721a1c0:
formula: Sum_Reduction((Exp( -(WeightedSqDist(G_0,X_0,Y_0))) * B_0),0)
aliases: G_0 = Pm(0,1); X_0 = Vi(1,3); Y_0 = Vj(2,3); B_0 = Vj(3,3);
dtype : float32
...
--------------------- CMAKE DEBUG -----------------
Command '['cmake', '/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops', '-DCMAKE_BUILD_TYPE=Release', '-DFORMULA_OBJ=Sum_Reduction((Exp( -(WeightedSqDist(G_0,X_0,Y_0))) * B_0),0)', '-DVAR_ALIASES=auto G_0 = Pm(0,1); auto X_0 = Vi(1,3); auto Y_0 = Vj(2,3); auto B_0 = Vj(3,3); ', '-Dshared_obj_name=libKeOpstorchf44721a1c0', '-D__TYPE__=float', '-DPYTHON_LANG=torch', '-DC_CONTIGUOUS=1', '-DPYTORCH_INCLUDE_DIR=/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include;/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include/torch/csrc/api/include', '-DcommandLine=cmake /home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops -DCMAKE_BUILD_TYPE=Release -DFORMULA_OBJ=Sum_Reduction((Exp( -(WeightedSqDist(G_0,X_0,Y_0))) * B_0),0) -DVAR_ALIASES=auto G_0 = Pm(0,1); auto X_0 = Vi(1,3); auto Y_0 = Vj(2,3); auto B_0 = Vj(3,3); -Dshared_obj_name=libKeOpstorchf44721a1c0 -D__TYPE__=float -DPYTHON_LANG=torch -DC_CONTIGUOUS=1 -DPYTORCH_INCLUDE_DIR=/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include;/home/thomas/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/torch/include/torch/csrc/api/include']' returned non-zero exit status 1.
-- The CXX compiler identification is GNU 7.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Compute properties automatically set to: -DMAXIDGPU=0;-DMAXTHREADSPERBLOCK0=1024;-DSHAREDMEMPERBLOCK0=49152
-- The CUDA compiler identification is NVIDIA 9.1.85
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- The CUDA Host CXX Compiler: /usr/bin/c++
-- Autodetected CUDA architecture(s): 7.5
-- Using shared_obj_name: libKeOpstorchf44721a1c0
-- Found PythonInterp: /home/thomas/.pyenv/shims/python3.7 (found version "1.4")
-- Configuring incomplete, errors occurred!
See also "/home/thomas/.cache/pykeops-1.2-cpython-36/build-libKeOpstorchf44721a1c0/CMakeFiles/CMakeOutput.log".
See also "/home/thomas/.cache/pykeops-1.2-cpython-36/build-libKeOpstorchf44721a1c0/CMakeFiles/CMakeError.log".


--------------------- MAKE DEBUG -----------------
Command '['cmake', '--build', '.', '--target', 'libKeOpstorchf44721a1c0', '--', 'VERBOSE=1']' returned non-zero exit status 2.


Done.

ModuleNotFoundError Traceback (most recent call last)
in
16 }
17
---> 18 g_keops = kernel_product(params, xc, yc, bc, mode='sum').cpu()
19 torch.cuda.synchronize()
20 speed_pykeops[k] = np.array(timeit.repeat(

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/kernel_product/kernels.py in kernel_product(params, x, y, mode, backend, dtype, cuda_type, *bs)
410 if not y.__class__ in [tuple, list]: y = (y,)
411
--> 412 return FeaturesKP(kernel, gamma, x, y, bs, mode=mode, backend=backend, dtype=dtype)

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/kernel_product/features_kernels.py in FeaturesKP(kernel, gs, xs, ys, bs, mode, backend, dtype)
163 genconv = Genred(formula, aliases, reduction_op=red, axis=axis, dtype=dtype)
164
--> 165 return genconv(*full_args, backend=backend)

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in __call__(self, backend, device_id, ranges, *args)
349
350 """
--> 351 out = GenredAutograd.apply(self.formula, self.aliases, backend, self.dtype, device_id, ranges, *args)
352 nx, ny = get_sizes(self.aliases, *args)
353 nout = nx if self.axis==1 else ny

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/torch/generic/generic_red.py in forward(ctx, formula, aliases, backend, dtype, device_id, ranges, *args)
19
20 myconv = LoadKEops(formula, aliases, dtype, 'torch',
---> 21 ['-DPYTORCH_INCLUDE_DIR=' + ';'.join(include_dirs)]).import_module()
22
23 # Context variables: save everything to compute the gradient:

~/.cache/pypoetry/virtualenvs/superpoint-graph-job-py3.6/lib/python3.6/site-packages/pykeops/common/keops_io.py in import_module(self)
50
51 def import_module(self):
---> 52 return importlib.import_module(self.dll_name)

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/__init__.py in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
127
128

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/.pyenv/versions/3.6.8/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'libKeOpstorchf44721a1c0'

Issue when discrepancy between available CUDA device at build time / runtime

Hey, first off, thanks for the library!

I have had some weird issues today when trying to use a kernel on 'cuda:1' when the kernel was built on a machine with only 2 gpus. I ran into this because I use a shared home filesystem (and hence a shared .cache folder) on a cluster where I have access to machines with various numbers of GPUs.

Here is how to reproduce, on a machine with 2 GPUs:

test.py :

import torch
from pykeops.torch import LazyTensor

def test(data):
	neigh_state = LazyTensor(data[None, :, :])
	state = LazyTensor(data[:, None, :])
	all_distances = ((neigh_state - state) ** 2).sum(dim=2)
	return (- all_distances).logsumexp(dim=1)

tensor = torch.randn(10,128).to('cuda:0')
print(torch.cuda.device_count())
test(tensor)

1. Run CUDA_VISIBLE_DEVICES=0 python test.py. This should build a kernel.
2. Change 'cuda:0' to 'cuda:1' in test.py.
3. Run python test.py.

This fails with the error:
invalid Gpu device number. If the number of available Gpus is > 12, add required lines at the end of function SetGpuProps and recompile.

Recompiling is not a great option for me, as I might run different experiments using the same kernel on machines with different numbers of available GPUs.
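One possible workaround on a shared home filesystem would be to keep one build folder per GPU count, so that a kernel compiled with one visible GPU is never reused on a machine that exposes more. The sketch below only computes such a folder name. The helper `cache_dir_for_gpu_count` and the folder layout are hypothetical, and how to point PyKeOps at a custom build folder depends on the installed version, so that step is deliberately left out.

```python
import os


def cache_dir_for_gpu_count(base_dir, n_gpus):
    """Hypothetical helper: one cache sub-folder per number of visible GPUs."""
    return os.path.join(base_dir, "keops-{}gpu".format(n_gpus))


# In practice n_gpus would come from torch.cuda.device_count(), as in test.py.
single = cache_dir_for_gpu_count(os.path.expanduser("~/.cache"), 1)
dual = cache_dir_for_gpu_count(os.path.expanduser("~/.cache"), 2)
```

This trades some extra compilation (once per GPU count) for never hitting a stale kernel.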

[Bug] Error when back-propagating through matmul with large dimension

When back-propagating through operations that involve matmul with large matrix dimension, I run into the following error:

RuntimeError: [KeOps] Arg number 6 : is not contiguous. Please provide 'contiguous' data array, as KeOps does not support strides. If you're getting this error in the 'backward' pass of a code using torch.sum() on the output of a KeOps routine, you should consider replacing 'a.sum()' with '(1. * a).sum()' or 'torch.dot(a.view(-1), torch.ones_like(a).view(-1))'. 
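The workaround suggested in the error message can be illustrated with a small NumPy analogue. The assumption here is that the gradient flowing back from a plain sum is an expanded (stride-0) view of a scalar, which is not contiguous; multiplying by 1.0 materializes it into a fresh contiguous buffer. This is a sketch of the stride behaviour only, not of the KeOps internals.

```python
import numpy as np

# A scalar "gradient" broadcast to the shape of the summed array: a stride-0 view.
grad = np.broadcast_to(np.float64(1.0), (4, 3))
is_contiguous_before = grad.flags["C_CONTIGUOUS"]

# Multiplying by 1.0 allocates a new array with regular strides.
fixed = 1.0 * grad
is_contiguous_after = fixed.flags["C_CONTIGUOUS"]

# Explicit alternative: request a C-contiguous copy directly.
copied = np.ascontiguousarray(grad)
```

In PyTorch the analogous explicit call is `.contiguous()`, which is what the special-cased block quoted below applies after each permute.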

This happens at exactly dim 80, so I was able to trace this back to the special casing here:

if pykeops.gpu_available and v_.shape[-1] > 80:
    # custom method when last dim of v is large
    # we have :
    # K._shape = (batchdimsK,M,N,1)
    # v_.shape = (batchdimsv,1,N,Nv)
    # we expand v_ to get same shape as K :
    v_ = self.tools.view(v_,[1]*(len(self._shape)-len(v_.shape))+list(v_.shape)) # (1,..,1,batchdimsv,1,N,Nv)
    # (NB if K has less batch dims than v it does nothing)
    # now we shift the Nv dim from last to first position
    v_ = self.tools.permute(v_,[len(v_.shape)-1]+list(range(0,len(v_.shape)-1))) # (Nv,1,..,1,batchdimsv,1,N)
    v_ = self.tools.contiguous(v_)
    # we add a dummy dimension at the end (maybe not necessary ?)
    v_ = self.tools.view(v_,list(v_.shape)+[1]) # (Nv,1,..,1,batchdimsv,1,N,1)
    v_ = LazyTensor(v_)
    Kv = (self*v_).sum(dim=len(v_._shape)-2) # (Nv,outbatchdims,M,1)
    Kv = self.tools.permute(Kv,list(range(1,len(Kv.shape)))+[0]) # (outbatchdims,M,1,Nv)
    Kv = self.tools.contiguous(Kv)
    Kv = self.tools.view(Kv,list(Kv.shape[:-2])+[Kv.shape[-1]]) # (outbatchdims,M,Nv)

If I comment out that block and use the else for everything I don't observe this issue, so there must be something problematic going on there.

I don't have a stripped-down repro, but the following, using gpytorch, is pretty concise:
keops_backward_issue.ipynb.txt

cc @jacobrgardner, @gpleiss

ABI incompatibility: -D_GLIBCXX_USE_CXX11_ABI=0

I was getting a weird error message when I tried to use pykeops.
Basically this one:

undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

After some sleuthing, I figured out that my distribution (NixOS) compiles PyTorch with GCC 7 and thus uses the newer libstdc++ ABI (introduced with GCC 5.1, I believe). I then fixed the issue locally by deleting the line in CMakeLists.txt that sets -D_GLIBCXX_USE_CXX11_ABI=0.

Then I dug into the PyTorch documentation, but I couldn't find any reference to setting -D_GLIBCXX_USE_CXX11_ABI=0 while compiling, and I also couldn't figure out which version of gcc is recommended for PyTorch. I am currently trying to work out who is responsible for the ABI incompatibility (nix, the PyTorch documentation, or pykeops) so that I can open an issue in the right place. Are people still compiling PyTorch with gcc4.8?
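Rather than hard-coding the flag, an extension could in principle derive it from the installed PyTorch build. The sketch below assumes that `torch.compiled_with_cxx11_abi()` is available in the installed PyTorch version; the helper `abi_flag` is hypothetical and is not how pykeops currently builds its modules.

```python
def abi_flag(uses_cxx11_abi):
    """Hypothetical helper: compiler define matching the ABI of the host library."""
    return "-D_GLIBCXX_USE_CXX11_ABI={}".format(int(bool(uses_cxx11_abi)))


try:
    import torch

    # Assumption: this query exists in the installed PyTorch version.
    flag_for_this_torch = abi_flag(torch.compiled_with_cxx11_abi())
except (ImportError, AttributeError):
    flag_for_this_torch = None  # PyTorch missing or too old to report its ABI
```

The resulting string could then be passed to the compiler in place of a fixed `-D_GLIBCXX_USE_CXX11_ABI=0`.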
