neptune's Issues

column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR

Wasn't sure how to title this bug report, so I hope it's good enough.

My configuration:
Manjaro Linux
Mesa Drivers with AMDGPU-PRO OpenCL library (this configuration works in other OpenCL applications/benchmarks such as Luxmark)
CPU: Ryzen 5950X
GPU: Radeon RX 6800 XT

Ran into this error when running benchy from rust-fil-proofs. @vmx advised me to file a bug report against neptune.

     Finished test [unoptimized + debuginfo] target(s) in 20.54s
     Running target/debug/deps/neptune-b1005e4d0551b26d

running 33 tests
test circuit::tests::test_poseidon_hash ... ok
test circuit::tests::test_scalar_product ... ok
test circuit::tests::test_scalar_product_with_add ... ok
test circuit::tests::test_square_sum ... ok
test column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR: Cannot select: 0x55bca5ac5090: i32 = GlobalAddress<[4 x i64] addrspace(5)* @constinit.10> 0
In function: hash_2_standard
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/chuck/git/neptune/target/debug/deps/neptune-b1005e4d0551b26d --test-threads=1` (signal: 6, SIGABRT: process abort signal)

Interestingly, after installing ROCm from the AUR, I get a different but probably related error (technically, ROCm isn't supported on RDNA cards, but I tried it after seeing a Phoronix article):

$ cargo test --features opencl,arity2,arity4,arity8,arity11,arity16,arity24,arity36 -- --test-threads=1
    Finished test [unoptimized + debuginfo] target(s) in 0.16s
     Running target/debug/deps/neptune-b1005e4d0551b26d

running 33 tests
test circuit::tests::test_poseidon_hash ... ok
test circuit::tests::test_scalar_product ... ok
test circuit::tests::test_scalar_product_with_add ... ok
test circuit::tests::test_square_sum ... ok
test column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR: Cannot select: 0x5650f3109988: i32 = GlobalAddress<[4 x i64] addrspace(5)* @constinit.10> 0
In function: apply_round_matrix_2_standard
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/chuck/git/neptune/target/debug/deps/neptune-b1005e4d0551b26d --test-threads=1` (signal: 6, SIGABRT: process abort signal)

Re-establish a GPU-based CI

We introduced basic CI as part of #177 to compensate for the change in CI infrastructure. We should introduce a Docker runner able to test Neptune CI with OpenCL and CUDA capabilities, and then run it in the form of self-hosted runners.

To achieve this, we need to integrate the following resources:

Potential implementation of Filecoin Poseidon in Golang?

In the new CID/CAR format, multihash is supported, and Poseidon has officially become a hash option.

poseidon-bls12_381-a2-fc1 | multihash | 0xb401 | permanent | Poseidon using BLS12-381 and arity of 2 with Filecoin parameters |

However, few multihash implementations support it yet. Neither go-multihash nor rust-multihash supports Poseidon:
https://github.com/multiformats/go-multihash/tree/master/register
https://github.com/multiformats/rust-multihash#supported-hash-types

Is neptune finalized? Is there a plan to implement Poseidon in multiple programming languages (FFI does not seem to be a good idea)?
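
For anyone wiring the 0xb401 code into a multihash implementation by hand, here is a minimal sketch of how the table entry above would be framed on the wire. It assumes the standard multihash layout (unsigned-varint code, unsigned-varint digest length, then the digest) and a 32-byte digest. The function names and the all-zero placeholder digest are made up for illustration, and no Poseidon hashing is performed here.

```rust
/// Encode `value` as an unsigned varint: 7 bits per byte, least significant
/// group first, high bit set on every byte except the last.
fn encode_unsigned_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Frame a digest as `<varint code><varint digest length><digest>`.
/// Hypothetical helper, not part of any multihash library.
fn wrap_multihash(code: u64, digest: &[u8]) -> Vec<u8> {
    let mut out = encode_unsigned_varint(code);
    out.extend(encode_unsigned_varint(digest.len() as u64));
    out.extend_from_slice(digest);
    out
}

fn main() {
    // Code taken from the table row quoted above.
    const POSEIDON_BLS12_381_A2_FC1: u64 = 0xb401;
    // Placeholder only: a real digest would come from a Poseidon implementation.
    let placeholder_digest = [0u8; 32];
    let framed = wrap_multihash(POSEIDON_BLS12_381_A2_FC1, &placeholder_digest);
    println!("prefix bytes: {:02x?}", &framed[..4]);
    println!("total length: {}", framed.len());
}
```

The varint form of 0xb401 is the three bytes 0x81 0xe8 0x02, so the framed value is 36 bytes long under the 32-byte digest assumption.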

Error with AMD 570 GPU

cd gbench
cargo run

Compiling gbench v0.5.4 (/home/peware/neptune/gbench)
Finished dev [unoptimized + debuginfo] target(s) in 2m 43s
Running target/debug/gbench
[2020-07-13T00:50:57Z INFO gbench] KiB: 4194304
[2020-07-13T00:50:57Z INFO gbench] leaves: 134217728
[2020-07-13T00:50:57Z INFO gbench] max column batch size: 400000
[2020-07-13T00:50:57Z INFO gbench] max tree batch size: 700000
--> Run 0
[2020-07-13T00:50:57Z INFO gbench] Creating ColumnTreeBuilder
(some Futhark code): Could not find acceptable OpenCL device.

sudo lshw -C display
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:05:00.0
version: ef
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:61 memory:d0000000-dfffffff memory:cfe00000-cfffffff ioport:5000(size=256) memory:fdec0000-fdefffff memory:fde00000-fde1ffff
*-display UNCLAIMED
description: VGA compatible controller
product: ES1000
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 3
bus info: pci@0000:01:03.0
version: 02
width: 32 bits
clock: 33MHz
capabilities: pm vga_controller bus_master cap_list
configuration: latency=64 mingnt=8
resources: memory:c0000000-c7ffffff ioport:2000(size=256) memory:ed9f0000-ed9fffff memory:c0000-dffff

Neptune uses an outdated reference script

Hi, fn generate_constants() https://github.com/filecoin-project/neptune/blob/2b11f0ce69f52aa9594f250baa658bfe2d349ac3/src/round_constants.rs#L26
references https://extgit.iaik.tugraz.at/krypto/hadeshash/blob/master/code/scripts/create_rcs_grain.sage
That file does not exist. An updated script exists in that repo with a notice of some fixed bugs.

Are there no security implications in not following the updated reference impl?

I was trying to reproduce the Poseidon constants which circomlib uses (they use the more recent script generate_parameters_grain.sage) and was unable to.

BusIdNotAvailable in gbench

cd neptune-master/gbench
export NEPTUNE_GBENCH_GPUS=99
RUST_LOG=info cargo run -- --max-tree-batch-size 700000 --max-column-batch-size 400000 
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
     Running `/public/home/cf/neptune-master/target/debug/gbench --max-tree-batch-size 700000 --max-column-batch-size 400000
[2021-03-10T09:41:39Z INFO  gbench] KiB: 4194304
[2021-03-10T09:41:39Z INFO  gbench] leaves: 134217728
[2021-03-10T09:41:39Z INFO  gbench] max column batch size: 400000
[2021-03-10T09:41:39Z INFO  gbench] max tree batch size: 700000
[2021-03-10T09:41:39Z INFO  gbench] GPU[Selector: BatcherType::CustomGPU(BusId(99))] --> Run 0
[2021-03-10T09:41:39Z INFO  gbench] GPU[Selector: BatcherType::CustomGPU(BusId(99))]: Creating ColumnTreeBuilder
[2021-03-10T09:41:39Z INFO  neptune::triton::cl] getting context for ~BusId(99)
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ClError(BusIdNotAvailable)', gbench/src/main.rs:31:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', gbench/src/main.rs:163:23

BUSID info

$ rocm-smi --showhw
========================ROCm System Management Interface========================

GPU  DID   GFX RAS   SDMA RAS  UMC RAS  VBIOS             BUS           
1    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:04:00.0  
2    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:26:00.0  
3    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:43:00.0  
4    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:63:00.0  
==============================End of ROCm SMI Log ==============================

clinfo

[cf@c07r1n01 gbench]$ clinfo 
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (2982.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               4
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#4, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#38, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#67, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#99, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 
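
One thing that may help when comparing the two listings above: rocm-smi prints the PCI bus in hexadecimal (the middle field of 0000:63:00.0), while clinfo's Device Topology line prints it in decimal (B#99). The sketch below only converts between the two notations. The helper name is made up, and it does not say anything about which form neptune expects in NEPTUNE_GBENCH_GPUS.

```rust
/// Extract the bus field from a PCI address of the form
/// "domain:bus:device.function" (all hexadecimal) and return it in decimal.
fn pci_bus_as_decimal(address: &str) -> Option<u32> {
    // The bus is the second colon-separated field.
    let bus_hex = address.trim().split(':').nth(1)?;
    u32::from_str_radix(bus_hex, 16).ok()
}

fn main() {
    // BUS values copied from the rocm-smi output above.
    let addresses = ["0000:04:00.0", "0000:26:00.0", "0000:43:00.0", "0000:63:00.0"];
    for address in addresses.iter() {
        match pci_bus_as_decimal(address) {
            Some(bus) => println!("{} -> bus {}", address, bus),
            None => println!("{} -> could not parse", address),
        }
    }
}
```

The four addresses map to buses 4, 38, 67 and 99, which matches the B# values clinfo reports, so bus 99 does correspond to a listed device.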

neptune for Pasta curves

The repo currently says: "Neptune is specialized to the BLS12-381 curve. Although the API allows for type specialization to other fields, the round numbers, constants, and s-box selection may not be correct. Do not do this."

Can we verify if the same set of constants would work for both curves in the Pasta cycle?

create_futhark_context() get 3090 GPU Error

I recently started using a 3090 for P2, but I get this error:

Graphics SM warp Exception on GPc 1, TPc 0,
Graphics Exception Out of Range Addr

After that, some of my P2 tasks fail with an invalid vanilla proof or a CID mismatch.
The tree_c and tree_r_last are the same!

I use Ubuntu 18.04 with GPU driver 460.92.

Upgrading ff and group crates

This issue is meant as a reminder and not as an immediate action item. There are new releases of ff and group. Upgrading to those is a breaking change as they contain traits and you cannot really have the traits of two different versions in your dependency tree.

I propose postponing the upgrade until a new breaking release is needed for other reasons. This upgrade could then be combined with such a release.

The upgrade is not straightforward, as all other dependencies using those traits, e.g. bellperson, would also need to be updated.

The upgrade would enable an upgrade of pasta_curves as well. The most recent release v0.5.0 should contain everything our current fork contains. This means once upgraded, we won't need to rely on a fork anymore.

Reporting rebasing need only once

There's a GitHub action that checks if a rebase is needed. It posts a comment every hour.

I'm currently watching this repo, and it would be great if it generated fewer events when nothing has really changed. I propose using something like https://github.com/peter-evans/find-comment to check whether the comment already exists and, if it does, to skip posting another one.

(Potential) Mismatch with Poseidon Hash paper

Thanks for the wonderful work!

It seems there are a few (potential) mismatches between round_numbers.rs and the Poseidon Hash paper. Is there any reason for them? More specifically:

  1. In line 82, the code uses let rf_interp = 0.43 * m + t.log2() - rp;. In the Poseidon Hash paper, Eq. (3) requires
    [image]

     For BLS12-381 with [image], M=128, and [image], we should add something like log_5(t) instead of t.log2().

  2. Lines 82-83 also differ from Eq. (5) in the Poseidon Hash paper.
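
To make the first point concrete: the coefficient 0.43 is approximately log_5(2), which suggests the expression is meant to be in base 5 throughout, so a base-2 width term would be larger than a base-5 one. The sketch below only compares the two width terms. It does not reproduce the paper's inequalities, and the helper names are invented for illustration.

```rust
/// Logarithm of `x` to base `alpha`.
fn log_alpha(x: f64, alpha: f64) -> f64 {
    x.ln() / alpha.ln()
}

/// Width-dependent term as written in the code quoted above: log2(t).
fn width_term_log2(t: f64) -> f64 {
    t.log2()
}

/// Width-dependent term proposed in this issue: log_alpha(t), with alpha = 5 here.
fn width_term_log_alpha(t: f64, alpha: f64) -> f64 {
    log_alpha(t, alpha)
}

/// Difference between the two terms for a given width; positive whenever t > 1.
fn width_term_gap(t: f64, alpha: f64) -> f64 {
    width_term_log2(t) - width_term_log_alpha(t, alpha)
}
```

For example, log_alpha(2.0, 5.0) is about 0.4307, and for any t > 1 width_term_gap(t, 5.0) is positive, so the base-2 version yields a larger value for the same width.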

wasm target build error

Hi, could you please suggest the correct way to build this library for wasm? I tried
cargo build --target wasm32-unknown-unknown
and also
cargo build --target wasm32-unknown-unknown --features "wasm"
but got compile errors:
MmapInner::map(self.get_len(file)?, file, self.offset).map(|inner| Mmap { inner: inner })
| ^^^^^^^^^ use of undeclared type MmapInner

Thanks.

Serialize/Deserialize Poseidon Constants

Hi, Nova's public parameters reference Neptune's Poseidon hash constants. We would like to serialize/deserialize Nova's public parameters using serde. What do you all think about adding a serde derive as an optional feature so that we can serialize/deserialize them? It gets a bit tricky when it comes to serializing/deserializing UInt from typenum, so I was wondering if anyone had any thoughts on that. Thanks.

help..

Sorry.
I want to know how to use it.
I have 2 GPUs.
I don't understand Rust and cargo, so I would like to know the command to run.

Feature Request: Implementation of Poseidon2 Hash Function

I would like to propose the implementation of the Poseidon2 hash function in the Neptune repository. This recent advancement enhances the efficiency of the Poseidon hash function, specifically tailored for zero-knowledge applications.

Referencing the research paper and the explanatory note provided by the authors, Poseidon2 enhances performance by focusing on its linear layers and round constant addition. This new design requires only a short chain of additions for computation, significantly reducing the number of multiplications and reductions.

Specifically,

  • Poseidon2 employs a fixed matrix for the external linear layers and another matrix for the internal linear layers, differing from the original Poseidon, which uses the same expensive MDS matrix in each linear layer.
  • Poseidon2 directly modifies the round constant addition, eliminating the need for the efficient representation required in the original Poseidon.

Given these improvements, Poseidon2 can offer a performance boost of up to a factor of 4 compared to the original Poseidon, without any increase in the number of rounds or other disadvantages. The reference implementation provided by HorizenLabs may be useful for this implementation.

Considering the focus of the Neptune repository on the Poseidon hash function, I believe that including Poseidon2 would greatly enhance its performance and efficiency.
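
To illustrate why the internal linear layer is cheap, here is a rough sketch assuming the internal matrix has ones off the diagonal and per-row values on the diagonal, which is how I read the Poseidon2 design. The modulus is a toy prime and the diagonal values used in any call are arbitrary, and none of the names come from neptune or the reference implementation.

```rust
/// Toy modulus (2^31 - 1) so that products fit in u64; not a real Poseidon field.
const P: u64 = 2_147_483_647;

fn add(a: u64, b: u64) -> u64 {
    (a + b) % P
}

fn mul(a: u64, b: u64) -> u64 {
    (a * b) % P
}

/// Dense matrix-vector product, as with a full MDS matrix: t * t multiplications.
fn dense_layer(m: &[Vec<u64>], x: &[u64]) -> Vec<u64> {
    m.iter()
        .map(|row| row.iter().zip(x).fold(0, |acc, (&a, &b)| add(acc, mul(a, b))))
        .collect()
}

/// Internal-layer shape: y_i = sum(x) + (diag[i] - 1) * x_i, only t multiplications.
/// All inputs are assumed to be already reduced modulo P.
fn internal_layer(diag: &[u64], x: &[u64]) -> Vec<u64> {
    let sum = x.iter().fold(0, |acc, &v| add(acc, v));
    diag.iter()
        .zip(x)
        .map(|(&d, &v)| add(sum, mul(add(d, P - 1), v)))
        .collect()
}

/// Expand the same matrix densely, only to cross-check the shortcut above.
fn internal_as_dense(diag: &[u64]) -> Vec<Vec<u64>> {
    let t = diag.len();
    (0..t)
        .map(|i| (0..t).map(|j| if i == j { diag[i] } else { 1 }).collect())
        .collect()
}
```

Calling internal_layer(&diag, &x) gives the same result as dense_layer(&internal_as_dense(&diag), &x), while using one multiplication per state element instead of one per matrix entry.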

Proposal: removing Futhark implementation

Neptune currently has an implementation called Triton which is implemented in Futhark. There is now an OpenCL/CUDA implementation called Proteus, with better performance. I propose removing Triton to lower the maintenance cost of this library.

Neptune does not currently build due to dependencies

There are broken dependencies in neptune that are causing build issues in https://github.com/filecoin-project/rust-filecoin-proofs-api

From neptune:

$ cargo update
    Updating crates.io index
error: failed to select a version for the requirement `rustc_version = "^0.1"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.3.1, ...
location searched: crates.io index
required by package `fil-ocl-core v0.11.3`
    ... which is depended on by `fil-ocl v0.19.4`
    ... which is depended on by `rust-gpu-tools v0.3.0`
    ... which is depended on by `gbench v0.5.4 (...../neptune/gbench)`

From rust-filecoin-proofs-api:

$ cargo update
    Updating crates.io index
error: failed to select a version for the requirement `rustc_version = "^0.1"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.3.1, ...
location searched: crates.io index
required by package `fil-ocl-core v0.11.3`
    ... which is depended on by `fil-ocl v0.19.4`
    ... which is depended on by `rust-gpu-tools v0.2.0`
    ... which is depended on by `bellperson v0.12.3`
    ... which is depended on by `filecoin-proofs-api v6.0.0 (...../rust-filecoin-proofs-api)`

Allow odd full rounds

Currently, neptune expects that the number of full rounds R_F is an even number, as evidenced by the number of full rounds in the first and second halves being the same R_f = floor(R_F / 2).

All three Poseidon implementations (static, correct, and dynamic) use R_f as the number of first and second half full rounds, which is correct only when R_F is even (currently R_F = 8 for all Filecoin applications). However, when R_F is odd, the number of second half full rounds should be R_F - R_f (so you don't lose the last full round of the second half).

Required Changes

  1. In all three Poseidon implementations, change the number of second half full rounds to self.constants.full_rounds - self.constants.half_full_rounds.
  2. Remove this unimplemented! panic.
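
A minimal sketch of the proposed split follows. The function name is hypothetical, and the variable names only mirror the full_rounds / half_full_rounds fields mentioned above.

```rust
/// Split R_F full rounds into (first-half rounds, second-half rounds).
/// The first half gets R_f = floor(R_F / 2); the second half gets the rest,
/// so an odd R_F does not lose its last full round.
fn split_full_rounds(full_rounds: usize) -> (usize, usize) {
    let half_full_rounds = full_rounds / 2; // R_f
    (half_full_rounds, full_rounds - half_full_rounds)
}
```

For R_F = 8 this gives (4, 4), matching the current behaviour, and for R_F = 7 it gives (3, 4).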

Purpose of (Column)TreeBuilderTrait?

We currently have two tree builders, TreeBuilder and ColumnTreeBuilder, both with their own traits. AFAICT, these two structs are the only implementations of those traits. Hence I wonder what the purpose of the ColumnTreeBuilderTrait and TreeBuilderTrait traits is.

I propose removing those traits and moving the implementation directly into the structs. Benefits:

  • Users of this library don't need to import those traits. Neither tree builder is of much use without its trait, so users usually have to import the structs as well as their traits.
  • No searching or guessing where other implementations might be.
  • Easier to understand code, due to fewer abstractions.

Downsides:

  • Breaking change, users of the library would need to update their code.
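
The import difference can be sketched as follows. This is a simplified stand-in, not neptune's actual API: the add_leaf and leaf_count methods and the u64 leaf type are invented for the example.

```rust
mod with_trait {
    pub trait TreeBuilderTrait {
        fn add_leaf(&mut self, leaf: u64);
        fn leaf_count(&self) -> usize;
    }

    #[derive(Default)]
    pub struct TreeBuilder {
        leaves: Vec<u64>,
    }

    impl TreeBuilderTrait for TreeBuilder {
        fn add_leaf(&mut self, leaf: u64) {
            self.leaves.push(leaf);
        }
        fn leaf_count(&self) -> usize {
            self.leaves.len()
        }
    }
}

mod inherent {
    #[derive(Default)]
    pub struct TreeBuilder {
        leaves: Vec<u64>,
    }

    impl TreeBuilder {
        pub fn add_leaf(&mut self, leaf: u64) {
            self.leaves.push(leaf);
        }
        pub fn leaf_count(&self) -> usize {
            self.leaves.len()
        }
    }
}

fn use_with_trait() -> usize {
    // The trait must be in scope, otherwise the method calls do not resolve.
    use with_trait::{TreeBuilder, TreeBuilderTrait};
    let mut builder = TreeBuilder::default();
    builder.add_leaf(1);
    builder.leaf_count()
}

fn use_inherent() -> usize {
    // Only the struct needs to be imported.
    use inherent::TreeBuilder;
    let mut builder = TreeBuilder::default();
    builder.add_leaf(1);
    builder.leaf_count()
}
```

Both functions behave the same. The only difference for callers is the extra trait import in the first variant.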

MDS matrix security

The recent update of the Poseidon article adds requirements on MDS matrix security, see p. 7. Any idea whether a randomly sampled Cauchy matrix over a large field is still safe?
