lurk-lab / neptune

Rust Poseidon implementation.

License: Other

Rust 97.63% C 1.76% Cool 0.17% Makefile 0.22% Shell 0.23%

neptune's Issues

column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR

Wasn't sure how to title this bug report, so I hope it's good enough.

My configuration:
Manjaro Linux
Mesa Drivers with AMDGPU-PRO OpenCL library (this configuration works in other OpenCL applications/benchmarks such as Luxmark)
CPU: Ryzen 5950X
GPU: Radeon RX 6800 XT

Ran into this error when running benchy from rust-fil-proofs. @vmx advised me to file a bug report against neptune.

     Finished test [unoptimized + debuginfo] target(s) in 20.54s
     Running target/debug/deps/neptune-b1005e4d0551b26d

running 33 tests
test circuit::tests::test_poseidon_hash ... ok
test circuit::tests::test_scalar_product ... ok
test circuit::tests::test_scalar_product_with_add ... ok
test circuit::tests::test_square_sum ... ok
test column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR: Cannot select: 0x55bca5ac5090: i32 = GlobalAddress<[4 x i64] addrspace(5)* @constinit.10> 0
In function: hash_2_standard
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/chuck/git/neptune/target/debug/deps/neptune-b1005e4d0551b26d --test-threads=1` (signal: 6, SIGABRT: process abort signal)

Interestingly, when I installed ROCm from the AUR, I got a different but probably related error (technically, ROCm isn't supported on RDNA cards, but I tried it after seeing a Phoronix article):

$ cargo test --features opencl,arity2,arity4,arity8,arity11,arity16,arity24,arity36 -- --test-threads=1
    Finished test [unoptimized + debuginfo] target(s) in 0.16s
     Running target/debug/deps/neptune-b1005e4d0551b26d

running 33 tests
test circuit::tests::test_poseidon_hash ... ok
test circuit::tests::test_scalar_product ... ok
test circuit::tests::test_scalar_product_with_add ... ok
test circuit::tests::test_square_sum ... ok
test column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR: Cannot select: 0x5650f3109988: i32 = GlobalAddress<[4 x i64] addrspace(5)* @constinit.10> 0
In function: apply_round_matrix_2_standard
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/chuck/git/neptune/target/debug/deps/neptune-b1005e4d0551b26d --test-threads=1` (signal: 6, SIGABRT: process abort signal)

dev needs to be rebased on main

dev at c58fb8b does not include main at 5285991

Re-establish a GPU-based CI

We introduced basic CI as part of #177 to compensate for the change in CI infrastructure. We should introduce a Docker runner able to test Neptune CI with OpenCL and CUDA capabilities, and then run it in the form of self-hosted runners.

To achieve this, we need to integrate the following resources:

Existing CI: We currently have a CI pipeline set up with GitHub Actions.
Self-Hosted Runner Documentation: The official GitHub documentation for self-hosted runners can be found here: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners and https://docs.github.com/en/actions/hosting-your-own-runners/using-self-hosted-runners-in-a-workflow
NVIDIA Container Toolkit: In order to use the GPU within our Docker container, we can integrate the NVIDIA Container Toolkit (https://github.com/NVIDIA/nvidia-docker) into our setup.
Dockerfile Example: A sample Dockerfile that demonstrates basic Rust installation: https://gist.github.com/huitseeker/0c58ee69f63c5e81d6ea64f0dc5153f7. This example can serve as a starting point for our own Dockerfile configuration.

dev needs to be rebased on main

dev at f65b3e9 does not include main at cbcd0a4

Potential implementation of Filecoin Poseidon in Golang?

In the new CID/CAR format, multihash is supported, and Poseidon has officially become a hash option.

However, many multihash implementations have not implemented it yet. For example, neither go-multihash nor rust-multihash supports Poseidon:
https://github.com/multiformats/go-multihash/tree/master/register
https://github.com/multiformats/rust-multihash#supported-hash-types

Is neptune finalized? Is there a plan to implement Poseidon in multiple programming languages (FFI does not seem to be a good idea)?

Error with AMD 570 GPU

cd gbench
cargo run

Compiling gbench v0.5.4 (/home/peware/neptune/gbench)
Finished dev [unoptimized + debuginfo] target(s) in 2m 43s
Running target/debug/gbench
[2020-07-13T00:50:57Z INFO gbench] KiB: 4194304
[2020-07-13T00:50:57Z INFO gbench] leaves: 134217728
[2020-07-13T00:50:57Z INFO gbench] max column batch size: 400000
[2020-07-13T00:50:57Z INFO gbench] max tree batch size: 700000
--> Run 0
[2020-07-13T00:50:57Z INFO gbench] Creating ColumnTreeBuilder
(some Futhark code): Could not find acceptable OpenCL device.

sudo lshw -C display
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:05:00.0
version: ef
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:61 memory:d0000000-dfffffff memory:cfe00000-cfffffff ioport:5000(size=256) memory:fdec0000-fdefffff memory:fde00000-fde1ffff
*-display UNCLAIMED
description: VGA compatible controller
product: ES1000
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 3
bus info: pci@0000:01:03.0
version: 02
width: 32 bits
clock: 33MHz
capabilities: pm vga_controller bus_master cap_list
configuration: latency=64 mingnt=8
resources: memory:c0000000-c7ffffff ioport:2000(size=256) memory:ed9f0000-ed9fffff memory:c0000-dffff

dev needs to be rebased on main

dev at 89b24d5 does not include main at 3559d02

Neptune uses an outdated reference script

Hi, fn generate_constants() https://github.com/filecoin-project/neptune/blob/2b11f0ce69f52aa9594f250baa658bfe2d349ac3/src/round_constants.rs#L26
references https://extgit.iaik.tugraz.at/krypto/hadeshash/blob/master/code/scripts/create_rcs_grain.sage
That file does not exist. The repo contains an updated script, along with a notice that some bugs were fixed.

Are there no security implications in not following the updated reference impl?

I was trying to reproduce the Poseidon constants which circomlib uses (they use the more recent script generate_parameters_grain.sage) and was unable to.

BusIdNotAvailable in gbench

cd neptune-master/gbench
export NEPTUNE_GBENCH_GPUS=99
RUST_LOG=info cargo run -- --max-tree-batch-size 700000 --max-column-batch-size 400000 
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
     Running `/public/home/cf/neptune-master/target/debug/gbench --max-tree-batch-size 700000 --max-column-batch-size 400000`
[2021-03-10T09:41:39Z INFO  gbench] KiB: 4194304
[2021-03-10T09:41:39Z INFO  gbench] leaves: 134217728
[2021-03-10T09:41:39Z INFO  gbench] max column batch size: 400000
[2021-03-10T09:41:39Z INFO  gbench] max tree batch size: 700000
[2021-03-10T09:41:39Z INFO  gbench] GPU[Selector: BatcherType::CustomGPU(BusId(99))] --> Run 0
[2021-03-10T09:41:39Z INFO  gbench] GPU[Selector: BatcherType::CustomGPU(BusId(99))]: Creating ColumnTreeBuilder
[2021-03-10T09:41:39Z INFO  neptune::triton::cl] getting context for ~BusId(99)
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ClError(BusIdNotAvailable)', gbench/src/main.rs:31:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', gbench/src/main.rs:163:23

BUSID info

$ rocm-smi --showhw
========================ROCm System Management Interface========================

GPU  DID   GFX RAS   SDMA RAS  UMC RAS  VBIOS             BUS           
1    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:04:00.0  
2    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:26:00.0  
3    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:43:00.0  
4    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:63:00.0  
==============================End of ROCm SMI Log ==============================

clinfo

[cf@c07r1n01 gbench]$ clinfo 
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (2982.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               4
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#4, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#38, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#67, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#99, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
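When comparing the two listings above, note that rocm-smi prints the PCI bus in hexadecimal (for example 0000:63:00.0), while clinfo's Device Topology prints it in decimal (B#99). The two agree, since 0x63 = 99. The following is a minimal sketch of that conversion; the helper name pci_bus_decimal is hypothetical and is not part of neptune or gbench.

```rust
/// Hypothetical helper (not part of neptune): extract the bus number from a
/// PCI address of the form `domain:bus:device.function`, where each field is
/// hexadecimal, and return it as a decimal integer.
fn pci_bus_decimal(addr: &str) -> Option<u32> {
    let mut parts = addr.split(':');
    let _domain = parts.next()?; // e.g. "0000"
    let bus_hex = parts.next()?; // e.g. "63"
    u32::from_str_radix(bus_hex, 16).ok()
}

fn main() {
    // Bus addresses copied from the rocm-smi output above.
    let addrs = ["0000:04:00.0", "0000:26:00.0", "0000:43:00.0", "0000:63:00.0"];
    for addr in addrs.iter() {
        match pci_bus_decimal(addr) {
            Some(bus) => println!("{} -> bus {}", addr, bus),
            None => println!("{} -> could not parse", addr),
        }
    }
}
```

The four addresses map to buses 4, 38, 67 and 99, which are the B# values clinfo reports for the same devices.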

neptune for Pasta curves

The repo currently says: "Neptune is specialized to the BLS12-381 curve. Although the API allows for type specialization to other fields, the round numbers, constants, and s-box selection may not be correct. Do not do this."

Can we verify if the same set of constants would work for both curves in the Pasta cycle?

create_futhark_context() get 3090 GPU Error

I recently started using a 3090 to run my P2 tasks, but got this error:

Graphics SM warp Exception on GPc 1, TPc 0,
Graphics Exception Out of Range Addr

After that, some of my P2 tasks fail with an invalid vanilla proof or a CID mismatch.
The tree_c and tree_r_last are the same!

I use Ubuntu 18.04 with GPU driver 460.92.

Making pasta_curves work

This is a tracking issue in regards to making Neptune work with pasta_curves.

Upgrading ff and group crates

This issue is meant as a reminder and not as an immediate action item. There are new releases of ff and group. Upgrading to those is a breaking change as they contain traits and you cannot really have the traits of two different versions in your dependency tree.

I propose postponing the upgrade until a new breaking release is needed for other reasons. This upgrade could then be combined with such a release.

The upgrade is not straightforward, as all other dependencies using those traits, e.g. bellperson, would also need to be updated.

The upgrade would enable an upgrade of pasta_curves as well. The most recent release v0.5.0 should contain everything our current fork contains. This means once upgraded, we won't need to rely on a fork anymore.

dev needs to be rebased on main

dev at eccf7cb does not include main at 6cae521

dev needs to be rebased on main

dev at 576049a does not include main at d463888

Reporting rebasing need only once

There's a GitHub action that checks if a rebase is needed. It posts a comment every hour.

I'm currently watching this repo and it would be great if there weren't so many events when nothing really changes. I propose using something like https://github.com/peter-evans/find-comment to check whether the comment already exists and, if it does, skip posting another one.

(Potential) Mismatch with Poseidon Hash paper

Thanks for the wonderful work!

It seems there are a few (potential) mismatches between round_numbers.rs and the Poseidon Hash paper. Is there any reason for this mismatch? More specifically,

In Line 82, we are using let rf_interp = 0.43 * m + t.log2() - rp;. In Poseidon Hash paper, Eq (3) requires

For BLS12-381 with , M=128, and , we should add something like log_5(t) instead of t.log2().

Lines 82-83 also differ from Eq. (5) in the Poseidon Hash paper.
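To make the difference concrete, here is a minimal sketch that evaluates the expression quoted from Line 82 next to the variant proposed above. The function names are hypothetical and are not taken from round_numbers.rs; only the expression 0.43 * m + t.log2() - rp comes from the issue. The coefficient 0.43 is close to log_5(2), which is presumably why a base-5 logarithm of t is expected.

```rust
/// Expression as quoted in the issue: uses log2 of the width t.
/// (Hypothetical helper name, for illustration only.)
fn rf_interp_log2(m: f64, t: f64, rp: f64) -> f64 {
    0.43 * m + t.log2() - rp
}

/// Variant suggested in the issue: uses log base 5 of the width t.
/// (Hypothetical helper name, for illustration only.)
fn rf_interp_log5(m: f64, t: f64, rp: f64) -> f64 {
    0.43 * m + t.log(5.0) - rp
}

/// How much larger the log2 term is than the log5 term for a given width.
fn interp_gap(t: f64) -> f64 {
    t.log2() - t.log(5.0)
}
```

For any width t greater than 1, the log2 variant yields the larger value, so the two expressions only agree at t = 1.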

dev needs to be rebased on main

dev at b3dc59e does not include main at a3e9a96

dev needs to be rebased on main

dev at ac41cb7 does not include main at 308e62c

Error in the specification of the sparse_factorize function.

The documentation says that w is the first column of m_hat or second column of m without the first row.

Actually it is the first column of m without the first row.

The implementation is ok: https://github.com/filecoin-project/neptune/blob/bafd77a5014e3b6a6b40359097835c3eb1dd533f/src/mds.rs#L197
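As an illustration of the corrected description, the following sketch extracts the first column of a matrix without its first row. It uses plain u64 entries and a hypothetical helper name instead of the field elements and matrix type used in mds.rs, so it only shows the indexing, not the actual implementation.

```rust
/// Returns the first column of `m` with the first row dropped.
/// Assumes every row of `m` has at least one entry.
/// (Hypothetical helper; the real code operates on field elements.)
fn first_column_without_first_row(m: &[Vec<u64>]) -> Vec<u64> {
    m.iter()
        .skip(1) // drop the first row
        .map(|row| row[0]) // take the first entry of each remaining row
        .collect()
}
```

For a 3x3 matrix with rows [1, 2, 3], [4, 5, 6], [7, 8, 9], this yields [4, 7].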

dev needs to be rebased on main

dev at 9e67e35 does not include main at cfeaa9a

dev needs to be rebased on main

dev at 85f1d09 does not include main at dd09734

wasm target build error

Hi, could you please suggest the correct way to build this lib for wasm? I tried
cargo build --target wasm32-unknown-unknown
and also with
cargo build --target wasm32-unknown-unknown --features "wasm"
but was getting compile errors:
MmapInner::map(self.get_len(file)?, file, self.offset).map(|inner| Mmap { inner: inner })
| ^^^^^^^^^ use of undeclared type MmapInner

Thanks.

dev needs to be rebased on main

dev at d6ef165 does not include main at ef14a61

Serialize/Deserialize Poseidon Constants

Hi - Nova's public parameters reference Neptune's Poseidon Hash Constants. We would like to serialize/deserialize Nova's public parameters using serde. What do you all think about adding serde derive as an optional feature so that we can serialize/deserialize? It gets a bit tricky when it comes to serializing/deserializing UInt from typenum, so I was just wondering if anyone had any thoughts on that. Thanks.

Add a CITATION.cff to the repo

This allows Neptune to be cited properly in publications, see:
https://citation-file-format.github.io/

chore: rust toolchain needs an upgrade

The rust version specified in rust-toolchain.toml (1.75.0) is out of date with the latest stable (1.76.0).

Check the rust version check workflow for details.

This issue was raised by the workflow at https://github.com/lurk-lab/neptune/actions/runs/7879730700/workflow.

help..

Sorry.
I want to know how to use it.
I have 2 GPUs.
I don't understand Rust and cargo, so I would like to know the command to run it.

dev needs to be rebased on main

dev at 5e85c74 does not include main at 70665ce

chore: some installed deps are not needed

Some dependencies specified in Cargo.toml are not needed.

Check the unused dependencies sanity check workflow for details.

This issue was raised by the workflow at https://github.com/lurk-lab/ci-workflows/tree/main/.github/workflows/unused-deps.yml.

Note
If this is a false positive, please refer to the cargo-udeps docs on how to ignore the dependencies.

dev needs to be rebased on main

dev at 3248ce7 does not include main at 35f6fe3

Fix description of Custom domain tag algorithm in comment.

00a87a7#r734019298

Feature Request: Implementation of Poseidon2 Hash Function

I would like to propose the implementation of the Poseidon2 hash function in the Neptune repository. This recent advancement enhances the efficiency of the Poseidon hash function, specifically tailored for zero-knowledge applications.

Referencing the research paper and the explanatory note provided by the authors, Poseidon2 enhances performance by focusing on its linear layers and round constant addition. This new design requires only a short chain of additions for computation, significantly reducing the number of multiplications and reductions.

Specifically,

Poseidon2 employs a fixed matrix for the external linear layers and another matrix for the internal linear layers, differing from the original Poseidon, which uses the same expensive MDS matrix in each linear layer.
Poseidon2 directly modifies the round constant addition, eliminating the need for the efficient representation required in the original Poseidon.

Given these improvements, Poseidon2 can offer a performance boost of up to a factor of 4 compared to the original Poseidon, without any increase in the number of rounds or other disadvantages. The reference implementation provided by HorizenLabs may be useful for this implementation.

Considering the focus of the Neptune repository on the Poseidon hash function, I believe that including Poseidon2 would greatly enhance its performance and efficiency.

Proposal: removing Futhark implementation

Neptune currently has an implementation called Triton which is implemented in Futhark. There is now an OpenCL/CUDA implementation called Proteus, with better performance. I propose removing Triton to lower the maintenance cost of this library.

dev needs to be rebased on main

dev at b692e19 does not include main at 45510e4

chore: rust toolchain needs an upgrade

The rust version specified in rust-toolchain.toml (1.76.0) is out of date with the latest stable (1.81.0).

Check the rust version check workflow for details.

This issue was raised by the workflow at https://github.com/argumentcomputer/neptune/actions/runs/10746951111/workflow.

How to use multiple GPUs? For example, my machine has two NVIDIA GPUs

Add CUDA support

Neptune should also support running on CUDA.

Neptune does not currently build due to dependencies

There are broken dependencies in neptune that are causing build issues in https://github.com/filecoin-project/rust-filecoin-proofs-api

From neptune:

$ cargo update
    Updating crates.io index
error: failed to select a version for the requirement `rustc_version = "^0.1"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.3.1, ...
location searched: crates.io index
required by package `fil-ocl-core v0.11.3`
    ... which is depended on by `fil-ocl v0.19.4`
    ... which is depended on by `rust-gpu-tools v0.3.0`
    ... which is depended on by `gbench v0.5.4 (...../neptune/gbench)`

From rust-filecoin-proofs-api:

$ cargo update
    Updating crates.io index
error: failed to select a version for the requirement `rustc_version = "^0.1"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.3.1, ...
location searched: crates.io index
required by package `fil-ocl-core v0.11.3`
    ... which is depended on by `fil-ocl v0.19.4`
    ... which is depended on by `rust-gpu-tools v0.2.0`
    ... which is depended on by `bellperson v0.12.3`
    ... which is depended on by `filecoin-proofs-api v6.0.0 (...../rust-filecoin-proofs-api)`

CI: push tests vs merge groups

Our current CI triggers on push to dev as well as merge groups:
https://github.com/lurk-lab/neptune/blob/9a6c931d158ebbfeb2a301f45d637642d65f0779/.github/workflows/check-downstream-compiles.yml
https://github.com/lurk-lab/neptune/blob/d8b4eeadd8acc9d9e8d9d510605c954f1410aa60/.github/workflows/rust.yml#L4-L9

the push tests are for the most part redundant,
the check-downstream-compiles job is meant only as a warning, and so is useless on merge_group or push.

Allow odd full rounds

Currently, neptune expects that the number of full rounds R_F is an even number, as evidenced by the number of full rounds in the first and second halves being the same R_f = floor(R_F / 2).

All three Poseidon implementations (static, correct, and dynamic) use R_f as the number of first and second half full rounds, which is correct only when R_F is even (currently R_F = 8 for all Filecoin applications). However, when R_F is odd, the number of second half full rounds should be R_F - R_f (so you don't lose the last full round of the second half).

Required Changes

In all three Poseidon implementations, change the number of second half full rounds to self.constants.full_rounds - self.constants.half_full_rounds.
Remove this unimplemented! panic.
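The proposed split can be sketched as follows. The function name is hypothetical; the variable names mirror full_rounds and half_full_rounds from the text above, but this is a standalone illustration rather than the library's code.

```rust
/// Splits the total number of full rounds R_F into
/// (first-half rounds, second-half rounds).
/// (Hypothetical helper, for illustration only.)
fn split_full_rounds(full_rounds: usize) -> (usize, usize) {
    // R_f = floor(R_F / 2): integer division rounds down.
    let half_full_rounds = full_rounds / 2;
    // The second half takes the remainder, so an odd R_F loses no round.
    (half_full_rounds, full_rounds - half_full_rounds)
}
```

With R_F = 8 this gives (4, 4), matching the current behaviour; with R_F = 7 it gives (3, 4), whereas using R_f for both halves would run only 6 full rounds.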

The formula is a bit wrong I think,

The formula is a bit wrong I think,
it should be identifier * 2^40 + strength * 2^32.

Originally posted by @Kubuxu in #116 (comment)
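A minimal sketch of the formula as stated in this comment, using integer shifts. The function name, parameter types and return type are assumptions made for illustration; the library computes its domain tags over field elements, and the surrounding tag-encoding details are not shown here.

```rust
/// Computes identifier * 2^40 + strength * 2^32.
/// (Hypothetical helper; u128 is used so the shifts cannot overflow.)
fn domain_tag(identifier: u64, strength: u64) -> u128 {
    ((identifier as u128) << 40) + ((strength as u128) << 32)
}
```

Shifting left by 40 and 32 bits is the same as multiplying by 2^40 and 2^32 respectively.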

Benchmark tracking

Issue for tracking benchmarks over time.

Purpose of (Column)TreeBuilderTrait?

We currently have two tree builders, TreeBuilder and ColumnTreeBuilder, both with their own traits. AFAICT, there are only those implementations of those traits. Hence I wonder what the purpose of the ColumnTreeBuilderTrait and TreeBuilderTrait traits is?

I propose removing those traits and moving the implementation directly into the structs. Benefits:

Users of this library don't need to import those traits: neither tree builder is of much use without its trait in scope, so users usually have to import the structs as well as their traits.
No search/guessing where other implementations might be.
Easier to understand code, due to fewer abstractions.

Downsides:

Breaking change, users of the library would need to update their code.

This issue was raised by the workflow at https://github.com/argumentcomputer/ci-workflows/tree/main/.github/workflows/unused-deps.yml.

Note
If this is a false positive, please refer to the cargo-udeps docs on how to ignore the dependencies.
