neptune's Introduction

Neptune

About

Neptune is a Rust implementation of the Poseidon hash function tuned for Filecoin.

Neptune has been audited by ADBK Consulting and deemed fully compliant with the paper (Starkad and Poseidon: New Hash Functions for Zero Knowledge Proof Systems).

Neptune was initially specialized to the BLS12-381 curve. Although the API allows for type specialization to other fields, the round numbers, constants, and s-box selection may not be correct. As long as the alternate field is a prime field of ~256 bits, the 128-bit security Neptune targets will apply. A run-time assertion fails if constants are generated for a field whose elements do not have a representation of exactly 32 bytes. The Pasta curves meet these criteria and are explicitly supported by Neptune.
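As a rough illustration of how the "~256 bits" and "exactly 32 bytes" conditions line up, the sketch below computes the byte length needed for a given modulus size. The helper names are hypothetical and not part of Neptune's API; 255 bits is the modulus size of the BLS12-381, Pallas, and Vesta scalar fields.

```rust
/// Minimum number of bytes needed to hold a value of `modulus_bits` bits.
/// Hypothetical helper for illustration; not part of Neptune's API.
fn repr_len_bytes(modulus_bits: u32) -> u32 {
    (modulus_bits + 7) / 8
}

/// Mirrors the kind of check described above: only fields whose elements
/// need exactly 32 bytes are accepted.
fn has_32_byte_repr(modulus_bits: u32) -> bool {
    repr_len_bytes(modulus_bits) == 32
}
```

A 255-bit field passes this check, while a 381-bit field (48 bytes) or a 64-bit field (8 bytes) does not.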

At the time of the 1.0.0 release, Neptune on RTX 2080Ti GPU can build 8-ary Merkle trees for 4GiB of input in 16 seconds.
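To give a sense of the workload behind that figure, the sketch below counts leaves and hash invocations for an 8-ary tree. It assumes each leaf is one 32-byte field element and that the leaf count is a power of the arity; the function names are illustrative only.

```rust
/// Number of leaves when `input_bytes` is split into 32-byte elements.
/// Assumption: one leaf per 32-byte field element.
fn leaf_count(input_bytes: u64) -> u64 {
    input_bytes / 32
}

/// Number of hash invocations needed to reduce `leaves` to a single root
/// in a tree of the given arity. Assumes `leaves` is a power of `arity`.
fn hash_count(mut leaves: u64, arity: u64) -> u64 {
    let mut hashes = 0;
    while leaves > 1 {
        // Each level hashes groups of `arity` nodes into one parent.
        leaves /= arity;
        hashes += leaves;
    }
    hashes
}
```

Under these assumptions, 4 GiB of input yields 134,217,728 leaves (8^9) and 19,173,961 hash invocations.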

Implementation Specification

Filecoin's Poseidon specification is published in the Filecoin specification document here. Additionally, Markdown and PDF versions are mirrored in this repo in the spec directory.

Contributing to the Spec

PDF Rendering Instructions

The spec's PDF is rendered using Typora. Download the spec's Markdown file here, open the file in Typora, make and save your changes, then export the file as a PDF.

Ensuring Spec Documents Stay in Sync

When making changes to the spec documents in neptune, make sure that the spec's PDF file poseidon_spec.pdf is the PDF rendering of the Markdown spec poseidon_spec.md.

If you make changes to the spec in neptune, you must make those same changes to the Filecoin spec here, thus ensuring all three documents (one Markdown+LaTeX and one PDF in neptune, and one Markdown+MathJax in filecoin-project/specs) stay in sync.

Environment variables

  • EC_GPU_FRAMEWORK=<cuda | opencl> selects whether the CUDA or OpenCL implementation is used. If not set, cuda is used if available.

  • EC_GPU_CUDA_NVCC_ARGS

By default the CUDA kernel is compiled for several architectures, which may take a long time. EC_GPU_CUDA_NVCC_ARGS can be used to override those arguments. The input and output file will still be automatically set.

// Example for compiling the kernel for only the Turing architecture
EC_GPU_CUDA_NVCC_ARGS="--fatbin --gpu-architecture=sm_75 --generate-code=arch=compute_75,code=sm_75"
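The selection rule for EC_GPU_FRAMEWORK can be sketched as a small pure function. This is not Neptune's actual code: the `Framework` enum, the function names, the fallback to OpenCL when CUDA is unavailable, and the handling of unknown values are all assumptions made for illustration.

```rust
use std::env;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Framework {
    Cuda,
    OpenCl,
}

/// Pure selection logic. `requested` is the value of EC_GPU_FRAMEWORK, if set.
/// Returns None for an unrecognized value (an assumption, not documented).
fn select_framework(requested: Option<&str>, cuda_available: bool) -> Option<Framework> {
    match requested {
        Some("cuda") => Some(Framework::Cuda),
        Some("opencl") => Some(Framework::OpenCl),
        Some(_) => None,
        // Not set: CUDA is used if available.
        None if cuda_available => Some(Framework::Cuda),
        // Assumption: fall back to OpenCL when CUDA is not available.
        None => Some(Framework::OpenCl),
    }
}

/// Reads the environment variable and applies the selection logic.
fn framework_from_env(cuda_available: bool) -> Option<Framework> {
    let value = env::var("EC_GPU_FRAMEWORK").ok();
    select_framework(value.as_deref(), cuda_available)
}
```

Keeping the decision in a pure function makes it testable without touching the process environment.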

Rust feature flags

Neptune also supports batch hashing and tree building, which can be performed on a GPU. GPU batch hashing is implemented in pure CUDA/OpenCL and is provided by the internal proteus module. To use proteus, compile neptune with the opencl and/or cuda feature.

The cuda and opencl features can be used independently or together. If both are enabled, you can also select which implementation to use via the NEPTUNE_GPU_FRAMEWORK environment variable.

Arities

The CUDA/OpenCL kernel (enabled with the cuda/opencl feature) is generated with specific arities. Those arities need to be specified at compile time via Rust feature flags. Available features are arity2, arity4, arity8, arity11, arity16, arity24, arity36. When the strengthened feature is enabled, an additional strengthened version is available for each arity.

When using the cuda feature, the kernel is generated at compile time. The more arities are enabled, the longer the compile time. Hence, no arities are enabled by default. You need to enable at least one yourself.

Fields

The CUDA/OpenCL kernel (enabled with the cuda/opencl feature) is generated for specific fields. Those fields need to be specified at compile-time via Rust feature flags. Available features are bls for BLS12-381 and pasta for the Pallas and Vesta curves' scalar fields.

Running the tests

As the compile time of the kernel depends on how many arities are used, no arities are enabled by default. To run the tests, all arities need to be enabled explicitly. To run all tests on, e.g., the CUDA implementation, run:

cargo test --no-default-features --features cuda,bls,pasta,arity2,arity4,arity8,arity11,arity16,arity24,arity36
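If you script test runs, the feature list above can be assembled from its three parts (framework, fields, arities). The helper below is a hypothetical convenience, not something shipped with Neptune.

```rust
/// Builds the comma-separated value for cargo's `--features` flag from a
/// framework name, field names, and arities. Hypothetical helper.
fn feature_list(framework: &str, fields: &[&str], arities: &[u32]) -> String {
    let mut features: Vec<String> = vec![framework.to_string()];
    features.extend(fields.iter().map(|f| f.to_string()));
    // Arity features follow the `arity<N>` naming used above.
    features.extend(arities.iter().map(|a| format!("arity{}", a)));
    features.join(",")
}
```

Calling it with "cuda", the fields bls and pasta, and all seven arities reproduces the feature string in the command above.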

Benchmarking Poseidon by Field and Preimage Length

Benchmark Poseidon over the BLS12-381, Pallas, and Vesta scalar fields for preimages of length 2, 4, 8, or 11 using:

cargo bench arity-<preimage len>

Benchmark Poseidon over a specific field (bls, pallas, or vesta) and preimage length using:

cargo bench arity-<preimage len>/<field name>

Sponge API

Neptune implements the Secure Sponge API for Field Elements and serves as its reference implementation. The SpongeAPI trait defines the relevant API methods. See tests in source for simple examples of API usage with circuits and without circuits.

History

Neptune was originally bootstrapped from Dusk's reference implementation.

Changes

CHANGELOG

License

MIT or Apache 2.0

neptune's People

Contributors

anderssorby, cryptonemo, daviddias, dependabot[bot], dignifiedquire, drpetervannostrand, emmorais, github-actions[bot], huitseeker, jonas-lj, keyvank, leonardoalt, luozijun, mengsuenyan, porcuquine, qy3u, samuelburnham, storojs72, swasilyev, themighty1, vmx, vuittont60, winston-h-zhang

neptune's Issues

Potential implementation of Filecoin Poseidon in Golang?

In the new CID/CAR format, multihash is supported, and Poseidon has officially become a hash option.

poseidon-bls12_381-a2-fc1 | multihash | 0xb401 | permanent | Poseidon using BLS12-381 and arity of 2 with Filecoin parameters |
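For context, a multihash is the hash function code and the digest length, each encoded as an unsigned varint, followed by the digest bytes. The sketch below shows how the code 0xb401 from the table row would be encoded; the function names are illustrative, and the 32-byte digest length is an assumption based on the field element size.

```rust
/// Encodes `value` as an unsigned varint: 7 bits per byte, least significant
/// group first, with the high bit set on every byte except the last.
fn encode_uvarint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Multihash layout: varint(code) || varint(digest length) || digest.
fn multihash_bytes(code: u64, digest: &[u8]) -> Vec<u8> {
    let mut out = encode_uvarint(code);
    out.extend(encode_uvarint(digest.len() as u64));
    out.extend_from_slice(digest);
    out
}
```

With this encoding, 0xb401 becomes the three bytes 0x81 0xe8 0x02, so a multihash carrying a 32-byte digest is 36 bytes long.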

However, few multihash implementations support it yet. Neither go-multihash nor rust-multihash supports Poseidon:
https://github.com/multiformats/go-multihash/tree/master/register
https://github.com/multiformats/rust-multihash#supported-hash-types

Is neptune finalized? Is there a plan to implement Poseidon in multiple programming languages (FFI does not seem to be a good idea)?

MDS matrix security

The recent update of the Poseidon paper introduces additional requirements on the security of the MDS matrix (see p. 7). Any idea if a randomly sampled Cauchy matrix over a large field is still safe?
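For readers unfamiliar with the construction, here is a minimal sketch of a Cauchy matrix with entries 1/(x_i + y_j) over a toy prime field. Everything in it is an assumption for illustration: the modulus 101, the deterministic choice x_i = i and y_j = t + j (rather than random sampling), and the function names. It only checks 2x2 minors and says nothing about the additional requirements raised in the question.

```rust
const P: u64 = 101; // toy prime modulus, for illustration only

/// Computes base^exp mod P by square-and-multiply.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Modular inverse via Fermat's little theorem (P is prime).
fn inv_mod(a: u64) -> u64 {
    assert!(a % P != 0, "zero has no inverse");
    pow_mod(a, P - 2)
}

/// Builds the t x t Cauchy matrix with entries 1 / (x_i + y_j),
/// using the assumed choice x_i = i and y_j = t + j.
fn cauchy_matrix(t: u64) -> Vec<Vec<u64>> {
    // Keeps every x_i + y_j in the range 1..P, so it is never zero mod P.
    assert!(t >= 1 && 3 * t - 2 < P);
    (0..t)
        .map(|i| (0..t).map(|j| inv_mod(i + j + t)).collect())
        .collect()
}

/// Checks that every 2x2 minor is nonzero mod P (a necessary condition for MDS).
fn all_2x2_minors_nonzero(m: &[Vec<u64>]) -> bool {
    let t = m.len();
    for r1 in 0..t {
        for r2 in (r1 + 1)..t {
            for c1 in 0..t {
                for c2 in (c1 + 1)..t {
                    // Adding P * P keeps the subtraction non-negative.
                    let det = (m[r1][c1] * m[r2][c2] + P * P - m[r1][c2] * m[r2][c1]) % P;
                    if det == 0 {
                        return false;
                    }
                }
            }
        }
    }
    true
}
```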

Error with AMD 570 GPU

cd gbench
cargo run

Compiling gbench v0.5.4 (/home/peware/neptune/gbench)
Finished dev [unoptimized + debuginfo] target(s) in 2m 43s
Running target/debug/gbench
[2020-07-13T00:50:57Z INFO gbench] KiB: 4194304
[2020-07-13T00:50:57Z INFO gbench] leaves: 134217728
[2020-07-13T00:50:57Z INFO gbench] max column batch size: 400000
[2020-07-13T00:50:57Z INFO gbench] max tree batch size: 700000
--> Run 0
[2020-07-13T00:50:57Z INFO gbench] Creating ColumnTreeBuilder
(some Futhark code): Could not find acceptable OpenCL device.

sudo lshw -C display
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:05:00.0
version: ef
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:61 memory:d0000000-dfffffff memory:cfe00000-cfffffff ioport:5000(size=256) memory:fdec0000-fdefffff memory:fde00000-fde1ffff
*-display UNCLAIMED
description: VGA compatible controller
product: ES1000
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 3
bus info: pci@0000:01:03.0
version: 02
width: 32 bits
clock: 33MHz
capabilities: pm vga_controller bus_master cap_list
configuration: latency=64 mingnt=8
resources: memory:c0000000-c7ffffff ioport:2000(size=256) memory:ed9f0000-ed9fffff memory:c0000-dffff

Proposal: removing Futhark implementation

Neptune currently has an implementation called Triton which is implemented in Futhark. There is now an OpenCL/CUDA implementation called Proteus, with better performance. I propose removing Triton to lower the maintenance cost of this library.

Feature Request: Implementation of Poseidon2 Hash Function

I would like to propose the implementation of the Poseidon2 hash function in the Neptune repository. This recent advancement enhances the efficiency of the Poseidon hash function, specifically tailored for zero-knowledge applications.

Referencing the research paper and the explanatory note provided by the authors, Poseidon2 enhances performance by focusing on its linear layers and round constant addition. This new design requires only a short chain of additions for computation, significantly reducing the number of multiplications and reductions.

Specifically,

  • Poseidon2 employs a fixed matrix for the external linear layers and another matrix for the internal linear layers, differing from the original Poseidon, which uses the same expensive MDS matrix in each linear layer.
  • Poseidon2 directly modifies the round constant addition, eliminating the need for the efficient representation required in the original Poseidon.

Given these improvements, Poseidon2 can offer a performance boost of up to a factor of 4 compared to the original Poseidon, without any increase in the number of rounds or other disadvantages. The reference implementation provided by HorizenLabs may be useful for this implementation.
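To illustrate why a structured internal matrix can be applied with a short chain of additions, the sketch below assumes the matrix has the form "all-ones plus a diagonal". That shape is an assumption here, since the issue does not spell out the matrix, and the toy modulus 101 and the function names are hypothetical. The fast version uses one running sum and one multiplication per element instead of a full matrix-vector product.

```rust
const P: u64 = 101; // toy prime modulus, for illustration only

/// Computes y = (J + diag(d)) * x over F_P, where J is the all-ones matrix,
/// using one shared sum: y_i = sum(x) + d_i * x_i.
fn ones_plus_diag_fast(x: &[u64], d: &[u64]) -> Vec<u64> {
    let sum = x.iter().fold(0u64, |acc, v| (acc + v) % P);
    x.iter().zip(d).map(|(xi, di)| (sum + di * xi) % P).collect()
}

/// Same product computed entry by entry, as a reference for comparison.
fn ones_plus_diag_naive(x: &[u64], d: &[u64]) -> Vec<u64> {
    let t = x.len();
    (0..t)
        .map(|i| {
            (0..t).fold(0u64, |acc, j| {
                // Diagonal entries are 1 + d_i; all other entries are 1.
                let m_ij = if i == j { (1 + d[i]) % P } else { 1 };
                (acc + m_ij * x[j]) % P
            })
        })
        .collect()
}
```

The naive version costs t^2 multiplications, while the fast version costs t, which is the kind of saving the linear-layer change is aiming for.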

Considering the focus of the Neptune repository on the Poseidon hash function, I believe that including Poseidon2 would greatly enhance its performance and efficiency.

wasm target build error

Hi, could you pls suggest the correct way to build this lib for wasm. I tried
cargo build --target wasm32-unknown-unknown
and also with
cargo build --target wasm32-unknown-unknown --features "wasm"
but was getting compile errors:
MmapInner::map(self.get_len(file)?, file, self.offset).map(|inner| Mmap { inner: inner })
| ^^^^^^^^^ use of undeclared type MmapInner

Thanks.

Neptune uses an outdated reference script

Hi, fn generate_constants() https://github.com/filecoin-project/neptune/blob/2b11f0ce69f52aa9594f250baa658bfe2d349ac3/src/round_constants.rs#L26
references https://extgit.iaik.tugraz.at/krypto/hadeshash/blob/master/code/scripts/create_rcs_grain.sage
That file does not exist. An updated script exists in that repo with a notice of some fixed bugs.

Are there no security implications in not following the updated reference impl?

I was trying to reproduce the Poseidon constants which circomlib uses (they use the more recent script generate_parameters_grain.sage) and was unable to.

Upgrading ff and group crates

This issue is meant as a reminder and not as an immediate action item. There are new releases of ff and group. Upgrading to those is a breaking change, as they contain traits and you cannot really have the traits of two different versions in your dependency tree.

I propose postponing the upgrade until a new breaking release is needed for other reasons. This upgrade could then be combined with such a release.

The upgrade is not straightforward, as all other dependencies using those traits, e.g. bellperson, would also need to be updated.

The upgrade would enable an upgrade of pasta_curves as well. The most recent release v0.5.0 should contain everything our current fork contains. This means once upgraded, we won't need to rely on a fork anymore.

help..

Sorry.
I want to know how to use it.
I have 2 GPUs.
I don't understand RUST and cargo, so I want to know the execution command.

Neptune does not currently build due to dependencies

There are broken dependencies in neptune that are causing build issues in https://github.com/filecoin-project/rust-filecoin-proofs-api

From neptune:

$ cargo update
    Updating crates.io index
error: failed to select a version for the requirement `rustc_version = "^0.1"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.3.1, ...
location searched: crates.io index
required by package `fil-ocl-core v0.11.3`
    ... which is depended on by `fil-ocl v0.19.4`
    ... which is depended on by `rust-gpu-tools v0.3.0`
    ... which is depended on by `gbench v0.5.4 (...../neptune/gbench)`

From rust-filecoin-proofs-api:

$ cargo update
    Updating crates.io index
error: failed to select a version for the requirement `rustc_version = "^0.1"`
candidate versions found which didn't match: 0.3.3, 0.3.2, 0.3.1, ...
location searched: crates.io index
required by package `fil-ocl-core v0.11.3`
    ... which is depended on by `fil-ocl v0.19.4`
    ... which is depended on by `rust-gpu-tools v0.2.0`
    ... which is depended on by `bellperson v0.12.3`
    ... which is depended on by `filecoin-proofs-api v6.0.0 (...../rust-filecoin-proofs-api)`
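Cargo treats a requirement such as `^0.1` as `>=0.1.0, <0.2.0`, which is why none of the listed 0.3.x candidates qualify. The sketch below models that rule for `major.minor` requirements; `caret_matches` is an illustrative helper, not Cargo's implementation.

```rust
/// Returns true if `version` (major, minor, patch) satisfies the caret
/// requirement `^req_major.req_minor`. Illustrative model only.
fn caret_matches(req: (u64, u64), version: (u64, u64, u64)) -> bool {
    let (req_major, req_minor) = req;
    let (major, minor, _patch) = version;
    if req_major > 0 {
        // ^1.2 means >=1.2.0, <2.0.0
        major == req_major && minor >= req_minor
    } else {
        // ^0.1 means >=0.1.0, <0.2.0: for 0.x, the minor version must match.
        major == 0 && minor == req_minor
    }
}
```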

Reporting rebasing need only once

There's a GitHub action that checks if a rebase is needed. It posts a comment every hour.

I'm currently watching this repo, and it would be great to have fewer events when nothing has really changed. I propose using something like https://github.com/peter-evans/find-comment to check whether the comment already exists and, if it does, skip posting another one.

column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR

Wasn't sure how to title this bug report, so I hope it's good enough.

My configuration:
Manjaro Linux
Mesa Drivers with AMDGPU-PRO OpenCL library (this configuration works in other OpenCL applications/benchmarks such as Luxmark)
CPU: Ryzen 5950X
GPU: Radeon RX 6800 XT

Ran into this error when running benchy from rust-fil-proofs. @vmx advised me to file a bug report against neptune.

     Finished test [unoptimized + debuginfo] target(s) in 20.54s
     Running target/debug/deps/neptune-b1005e4d0551b26d

running 33 tests
test circuit::tests::test_poseidon_hash ... ok
test circuit::tests::test_scalar_product ... ok
test circuit::tests::test_scalar_product_with_add ... ok
test circuit::tests::test_square_sum ... ok
test column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR: Cannot select: 0x55bca5ac5090: i32 = GlobalAddress<[4 x i64] addrspace(5)* @constinit.10> 0
In function: hash_2_standard
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/chuck/git/neptune/target/debug/deps/neptune-b1005e4d0551b26d --test-threads=1` (signal: 6, SIGABRT: process abort signal)

Interestingly, when I installed ROCm from AUR, I got a different, but probably related, error (technically, ROCm isn't supported on RDNA cards, but I tried it after seeing a Phoronix article):

$ cargo test --features opencl,arity2,arity4,arity8,arity11,arity16,arity24,arity36 -- --test-threads=1
    Finished test [unoptimized + debuginfo] target(s) in 0.16s
     Running target/debug/deps/neptune-b1005e4d0551b26d

running 33 tests
test circuit::tests::test_poseidon_hash ... ok
test circuit::tests::test_scalar_product ... ok
test circuit::tests::test_scalar_product_with_add ... ok
test circuit::tests::test_square_sum ... ok
test column_tree_builder::tests::test_column_tree_builder ... LLVM ERROR: Cannot select: 0x5650f3109988: i32 = GlobalAddress<[4 x i64] addrspace(5)* @constinit.10> 0
In function: apply_round_matrix_2_standard
error: test failed, to rerun pass '--lib'

Caused by:
  process didn't exit successfully: `/home/chuck/git/neptune/target/debug/deps/neptune-b1005e4d0551b26d --test-threads=1` (signal: 6, SIGABRT: process abort signal)

Purpose of (Column)TreeBuilderTrait?

We currently have two tree builders, TreeBuilder and ColumnTreeBuilder, each with its own trait. AFAICT, these structs are the only implementations of those traits. Hence I wonder what the purpose of the ColumnTreeBuilderTrait and TreeBuilderTrait traits is.

I propose removing those traits and moving the implementation directly into the structs. Benefits:

  • Users of this library don't need to import those traits. Neither tree builder is of much use without its trait, so users usually have to import both the structs and their traits.
  • No searching or guessing where other implementations might be.
  • Easier-to-understand code, due to fewer abstractions.

Downsides:

  • Breaking change, users of the library would need to update their code.
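The difference for callers can be sketched as follows. The struct and trait names come from the proposal, but the methods `add_leaf` and `leaf_count` are hypothetical stand-ins and do not reflect the real signatures.

```rust
mod with_trait {
    pub trait TreeBuilderTrait {
        fn add_leaf(&mut self, leaf: u64);
        fn leaf_count(&self) -> usize;
    }

    #[derive(Default)]
    pub struct TreeBuilder {
        leaves: Vec<u64>,
    }

    impl TreeBuilderTrait for TreeBuilder {
        fn add_leaf(&mut self, leaf: u64) {
            self.leaves.push(leaf);
        }
        fn leaf_count(&self) -> usize {
            self.leaves.len()
        }
    }
}

mod inherent {
    #[derive(Default)]
    pub struct TreeBuilder {
        leaves: Vec<u64>,
    }

    impl TreeBuilder {
        pub fn add_leaf(&mut self, leaf: u64) {
            self.leaves.push(leaf);
        }
        pub fn leaf_count(&self) -> usize {
            self.leaves.len()
        }
    }
}

/// Current style: the trait must be imported for method calls to resolve.
fn count_with_trait() -> usize {
    use with_trait::{TreeBuilder, TreeBuilderTrait};
    let mut builder = TreeBuilder::default();
    builder.add_leaf(1);
    builder.add_leaf(2);
    builder.leaf_count()
}

/// Proposed style: importing the struct alone is enough.
fn count_inherent() -> usize {
    use inherent::TreeBuilder;
    let mut builder = TreeBuilder::default();
    builder.add_leaf(1);
    builder.add_leaf(2);
    builder.leaf_count()
}
```

Both functions behave identically; the only difference is the extra trait import in the first one.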

Serialize/Deserialize Poseidon Constants

Hi - Nova's public parameters reference Neptune's Poseidon hash constants. We would like to serialize/deserialize Nova's public parameters using serde. What do you all think about adding a serde derive as an optional feature? It gets a bit tricky when it comes to serializing/deserializing UInt from typenum, so I was wondering if anyone had thoughts on that. Thanks.

neptune for Pasta curves

The repo currently says: "Neptune is specialized to the BLS12-381 curve. Although the API allows for type specialization to other fields, the round numbers, constants, and s-box selection may not be correct. Do not do this."

Can we verify if the same set of constants would work for both curves in the Pasta cycle?

(Potential) Mismatch with Poseidon Hash paper

Thanks for the wonderful work!

It seems there are a few (potential) mismatches between round_numbers.rs and the Poseidon Hash paper. Is there any reason for this mismatch? More specifically,

  1. In line 82, we use let rf_interp = 0.43 * m + t.log2() - rp;. In the Poseidon Hash paper, Eq. (3) requires
    [equation image not preserved]

For BLS12-381 with [image not preserved], M=128, and [image not preserved], we should add something like log_5(t) instead of t.log2().

  2. Lines 82-83 also differ from Eq. (5) in the Poseidon Hash paper.
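The size of the gap between the two logarithms can be checked numerically. The sketch below only compares t.log2() with log_5(t); it does not reproduce the paper's equations, and the helper names are illustrative. As a side observation, log_5(2) is about 0.4307, which may be where the 0.43 coefficient comes from (an assumption, not confirmed in the issue).

```rust
/// Base-5 logarithm.
fn log5(x: f64) -> f64 {
    x.log(5.0)
}

/// Converts a base-2 logarithm into the corresponding base-5 logarithm
/// using the change-of-base rule log_5(x) = log_2(x) / log_2(5).
fn log2_to_log5(log2_value: f64) -> f64 {
    log2_value / 5f64.log2()
}
```

For any t greater than 1, t.log2() is larger than log_5(t) by a constant factor of log_2(5), roughly 2.32.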

Allow odd full rounds

Currently, neptune expects that the number of full rounds R_F is an even number, as evidenced by the number of full rounds in the first and second halves being the same R_f = floor(R_F / 2).

All three Poseidon implementations (static, correct, and dynamic) use R_f as the number of first and second half full rounds, which is correct only when R_F is even (currently R_F = 8 for all Filecoin applications). However, when R_F is odd, the number of second half full rounds should be R_F - R_f (so you don't lose the last full round of the second half).

Required Changes

  1. In all three Poseidon implementations, change the number of second half full rounds to self.constants.full_rounds - self.constants.half_full_rounds.
  2. Remove this unimplemented! panic.
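The proposed split can be sketched as a small function. The name `split_full_rounds` is hypothetical; the arithmetic follows the description above, with R_f = floor(R_F / 2) for the first half and R_F - R_f for the second.

```rust
/// Splits the total number of full rounds (R_F) into the counts for the
/// first and second halves. For odd R_F, the second half gets the extra round.
fn split_full_rounds(full_rounds: usize) -> (usize, usize) {
    let half_full_rounds = full_rounds / 2; // R_f = floor(R_F / 2)
    (half_full_rounds, full_rounds - half_full_rounds)
}
```

With R_F = 8 both halves get 4 rounds, as today; with R_F = 9 the halves get 4 and 5 rounds, so no round is lost.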

create_futhark_context() get 3090 GPU Error

I recently started using a 3090 for P2, but I get this error:

Graphics SM warp Exception on GPc 1, TPc 0,
Graphics Exception Out of Range Addr

After that, some of my P2 tasks fail with an invalid vanilla proof or a CID mismatch.
tree_c and tree_r_last are the same!

I use Ubuntu 18.04 with GPU driver 460.92.

Re-establish a GPU-based CI

We introduced basic CI as part of #177 to compensate for the change in CI infrastructure. We should introduce a Docker runner able to test Neptune CI with OpenCL and CUDA capabilities, and then run it in the form of self-hosted runners.

To achieve this, we need to integrate the following resources:

BusIdNotAvailable in gbench

cd neptune-master/gbench
export NEPTUNE_GBENCH_GPUS=99
RUST_LOG=info cargo run -- --max-tree-batch-size 700000 --max-column-batch-size 400000 
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
     Running `/public/home/cf/neptune-master/target/debug/gbench --max-tree-batch-size 700000 --max-column-batch-size 400000
[2021-03-10T09:41:39Z INFO  gbench] KiB: 4194304
[2021-03-10T09:41:39Z INFO  gbench] leaves: 134217728
[2021-03-10T09:41:39Z INFO  gbench] max column batch size: 400000
[2021-03-10T09:41:39Z INFO  gbench] max tree batch size: 700000
[2021-03-10T09:41:39Z INFO  gbench] GPU[Selector: BatcherType::CustomGPU(BusId(99))] --> Run 0
[2021-03-10T09:41:39Z INFO  gbench] GPU[Selector: BatcherType::CustomGPU(BusId(99))]: Creating ColumnTreeBuilder
[2021-03-10T09:41:39Z INFO  neptune::triton::cl] getting context for ~BusId(99)
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ClError(BusIdNotAvailable)', gbench/src/main.rs:31:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any', gbench/src/main.rs:163:23

BUSID info

$ rocm-smi --showhw
========================ROCm System Management Interface========================

GPU  DID   GFX RAS   SDMA RAS  UMC RAS  VBIOS             BUS           
1    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:04:00.0  
2    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:26:00.0  
3    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:43:00.0  
4    66a1  DISABLED  ENABLED   ENABLED  113-D1631900-064  0000:63:00.0  
==============================End of ROCm SMI Log ==============================
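The BUS column above is written in hexadecimal, while the value passed in NEPTUNE_GBENCH_GPUS is decimal. The sketch below converts a PCI address to its decimal bus number; it assumes the usual `domain:bus:device.function` layout, and the function name is illustrative. Whether gbench expects exactly this number is not confirmed by the logs.

```rust
/// Extracts the bus number from a PCI address of the form
/// `domain:bus:device.function`, where each part is hexadecimal.
fn pci_bus_number(address: &str) -> Option<u32> {
    let mut parts = address.split(':');
    let _domain = parts.next()?;
    let bus = parts.next()?;
    u32::from_str_radix(bus, 16).ok()
}
```

Applied to the four addresses above, this gives 4, 38, 67, and 99. The first three match the B#4, B#38, and B#67 entries in the clinfo output.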

clinfo

[cf@c07r1n01 gbench]$ clinfo 
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (2982.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               4
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#4, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#38, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#67, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 


  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Device 66a1
  Device Topology:                               PCI[ B#99, D#0, F#0 ]
  Max compute units:                             60
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1600Mhz
  Address bits:                                  64
  Max memory allocation:                         14588628172
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    26273
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            17163091968
  Constant buffer size:                          14588628172
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             65536
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          1703726284
  Max global variable size:                      14588628172
  Max global variable preferred total size:      17163091968
  Max read/write image args:                     64
  Max on device events:                          1024
  Queue on device max size:                      8388608
  Max on device queues:                          1
  Queue on device preferred size:                262144
  SVM capabilities:                              
    Coarse grain buffer:                         Yes
    Fine grain buffer:                           Yes
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:                              
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:                            
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x2b835a7f4d30
  Name:                                          gfx906+sram-ecc
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 2.0 
  Driver version:                                2982.0 (HSA1.1,LC)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 2.0 
  Extensions:                                    cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 
