Giter Site home page Giter Site logo

tromp / cuckoo Goto Github PK

View Code? Open in Web Editor NEW
818.0 84.0 173.0 13.19 MB

a memory-bound graph-theoretic proof-of-work system

License: Other

Java 1.27% Makefile 2.79% C++ 51.67% C 5.14% Cuda 38.72% Perl 0.30% PostScript 0.04% Raku 0.07%
cuckoo-cycle bounties proof-of-work graph solver

cuckoo's People

Contributors

agusdallalba avatar casey avatar dincho avatar domfarolino avatar hyc avatar kallewoof avatar lvella avatar minux avatar npinto avatar opticflowx avatar otoburb avatar peterreid avatar smaugid avatar tolbrino avatar tromp avatar wilsonk avatar yeastplume avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cuckoo's Issues

asm.js

Have you already thought about whether it could be compiled to run in a browser? Would a hand-optimised js implementation be a terrible idea?

This is very interesting, thanks for sharing it!

Feature request: Add structured output

I'm working on a tool that runs a bunch of cuckoo cycle solvers and saves output (including time,
algorithm use, and graph size) to a file. The hope is that people in the community will use this tool to benchmark and report their GPU/CPU performance, which I can then use to produce a public website that shows the performance of the different solvers and algorithms across different GPUs and CPUs.

You can see the current work in progress code here.

Parsing data from the solver output was pretty painful. Would you consider adding some kind of structured output, for example JSON, to make such tools easier to write? Ideally it would log all parameters that the solver was launched with, information on the CPU or GPU which the solver was running on, and information about solution times for the graphs searched.

Any more information about `src/cuckoo/mean.hpp`?

It turns out that the edge-trimming process in mean.cpp is much more difficult for me to understand, as compared to the simple and lean version... It's even hard for me to make a list of questions, since I am a sort of totally lost... I watched at least 3 youtube videos about cuckoo cycle, but not in-depth discussion about mean algorithm so far...so I am wondering if there are any more information on the web discussing details about mean.hpp?

For example, I am confused about what's the data structure is used to store "edges", and how (where) the "edges" are trimmed with regard to the corresponding data structure...also it seems a zbucket elment is of BIGSIZE (5) bytes, which is 40 bits, and an an edge's two node (UYZ, VYZ) is 22*2=44(bit). How 44-bit info fits into 40-bit space...

Many thanks!

suggesting for algorithm improvement

Hey, your cuckoo cycle paper inspired me to choose proof of work functions as a topic for a paper I had to write this semester. In the paper linked below, I purposed a small change to the algorithm which keeps it memory hard/easy to verify, but removes possibility of speeding up the computation through pre-calculating siphash values - link:
http://erbbysam.com/papers/seeking_better_proof_of_work.pdf (in particular, pages 6-8)

Not really an issue, just something for you to consider.
Thanks,
Sam

GPUassert: out of memory mean.cu 377

./cuda29 -h dd7703e424a7f841131d8c580bf79c09907a463323b2d4531b3907d3823c40092f28377b48abfc46
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:03:00.0  On |                  N/A |
| 60%   31C    P8    14W / 120W |     86MiB /  6078MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:04:00.0 Off |                  N/A |
| 60%   33C    P8    10W / 120W |      2MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 106...  Off  | 00000000:09:00.0 Off |                  N/A |
| 60%   35C    P8     9W / 120W |      2MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1153      G   /usr/lib/xorg/Xorg                            33MiB |
|    0      1192      G   /usr/bin/gnome-shell                          50MiB |
+-----------------------------------------------------------------------------+
GeForce GTX 1060 3GB with 6078MB @ 192 bits x 4004MHz
Looking for 42-cycle on cuckoo30("AX
                                        zF3#ำ‚<",0) with 50% edges, 64*64 buckets, 176 trims, and 64 thread blocks.
GPUassert: out of memory mean.cu 377

Hi tromp,

When I run cuda29, I got GPUassert: out of memory mean.cu error, Is there any problem with my program?

where is indexesE[2] used?

Round<1, EDGES_A, EDGES_B/NB><<<tp.trim.blocks/NB, tp.trim.tpb>>>(0, (uint2*)(bufferA+i*qA), (uint2*)(bufferB+i*qB), indexesE[0]+i*qE, indexesE[1+i]); // to .632

hi, tromp, I can not understand this code: the indexesE[2] never used after round1, how did you know the bucket size after bufferB + sizeB/NB?

Support for N > 32 (EDGEBITS > 32)

@tromp,

Thanks for creating the Cuckoo Cycle and providing this excellent implementation!

This library has some defines to handle EDGEBITS > 32 but, for example, neither the lean nor the mean implementation compile if you set it to 33. Here's some output from attempting this with the mean implementation:

g++ -march=native -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I.  -pthread -o mean34x1 -DNSIPHASH=1 -DEDGEBITS=33 mean_miner.cpp -L. -lblake2b
mean_miner.cpp: In function โ€˜int main(int, char**)โ€™:
mean_miner.cpp:94:54: error: cannot convert โ€˜u32* {aka unsigned int*}โ€™ to โ€˜edge_t* {aka long unsigned int*}โ€™ for argument โ€˜1โ€™ to โ€˜int verify(edge_t*, siphash_keys*)โ€™
       int pow_rc = verify(prf, &ctx.trimmer->sip_keys);

Hacking the appropriate line to use a 64-bit type doesn't solve the problem, so before going further down the rabbit hole I figured I'd ask if EDGEBITS > 32 is intended to work without significant changes.

For context: I'm interested in very long running (runtime of hours) memory-hardened PoWs. This is not for a consensus/mining mechanism, but as an authorization mechanism for writing data to a blockchain. It seems like Cuckoo Cycle could be a good fit. An option being considered is chaining solutions for multiple proofs together to enable a much longer running PoW without prohibitively high memory requirements. This causes a larger aggregate solution size -- something that I'd also like to minimize by finding the right balance. Hence, I'd like to know if this implementation supports larger graphs. I suppose requiring a longer cycle is also something that could be played with.

Fix Makefile line 192 tab

Need to run:

perl -pi -e 's/^ */\t/' Makefile

Otherwise cannot compile:

Makefile:192: *** missing separator. Stop.

Ubuntu 16.04

core dump on linux when nthreads more than 1

grin-miner/cuckoo-miner/src/cuckoo/sys/plugins/cuckoo/src/cuckoo/mean.hpp:689: void
edgetrimmer::trimrename1(u32, u32) [with bool TRIMONV = true; u32 = unsigned int]: Assertion 
'renames <buckets[NX][NY].renameu1' failed

grin-miner/cuckoo-miner/src/cuckoo/sys/plugins/cuckoo/src/cuckoo/mean.hpp:689: void 
edgetrimmer::trimrename1(u32, u32) [with bool TRIMONV = true; u32 = unsigned int]: Assertion 
'renames <buckets[NX][NY].renameu1' failed

Aborted (core dumped)

grin-miner version v1.0.0 on Linux using default CPU solver

I get this when nthreads > 1 but works fine otherwise
Anyone seen this error?

original issue at grin-miner repo

Cuckoo cuda29 solver compiler error

I believe the last diff broke the cuda solver:

Building cuckoo29 cuda solver...
nvcc -std=c++11  -o cuda29 -DEDGEBITS=29 -arch sm_35 mean.cu ../crypto/blake2b-ref.c
mean.cu(782): error: class "SolverParams" has no member "cpuload"

mean.cu(810): error: class "SolverParams" has no member "cpuload"

mean.cu(834): error: class "SolverParams" has no member "cpuload"

not work on NVIDIA GeForce GTX 750 Ti

-d 0 -E 0 -h "" -m 176 -n 0 -r 1 -U 4096 -u 256 -v 128 -w 512 -y 1024 -Z
1024 -z 1024

The thread 0x1c84 has exited with code 0 (0x0).
The thread 0x1160 has exited with code 0 (0x0).
The thread 0x2a30 has exited with code 0 (0x0).
First-chance exception at 0x01049349 in vs2013.exe: 0xC0000005: Access violation reading location 0x08812770.
Unhandled exception at 0x01049349 in vs2013.exe: 0xC0000005: Access violation reading location 0x08812770.

class cuckoo_hash {
public:
u64 *cuckoo;

cuckoo_hash() {
cuckoo = new u64[CUCKOO_SIZE];
}
~cuckoo_hash() {
delete[] cuckoo;
}
void set(node_t u, node_t v) {
u64 niew = (u64)u << NODEBITS | v;
for (node_t ui = u >> IDXSHIFT; ; ui = (ui+1) & CUCKOO_MASK) {
u64 old = cuckoo[ui]; //error at
if (old == 0 || (old >> NODEBITS) == (u & KEYMASK)) {
cuckoo[ui] = niew;
return;
}
}
}
node_t operator[](node_t u) const {
for (node_t ui = u >> IDXSHIFT; ; ui = (ui+1) & CUCKOO_MASK) {
u64 cu = cuckoo[ui]; //error at
if (!cu)
return 0;
if ((cu >> NODEBITS) == (u & KEYMASK)) {
assert(((ui - (u >> IDXSHIFT)) & CUCKOO_MASK) < MAXDRIFT);
return (node_t)(cu & NODEMASK);
}
}
}
};

Suggestion: Refactor solvers to share main function

I would suggest refactoring the solvers to share a single main function. Benefits would be:

  • I found a bug in #72 which had been copied into four different files, since they were part of the main function which is duplicated across multiple files.

  • If there was a single main function shared by all the solvers, features added to that main function wouldn't need to be copied to all the main function.

Also, it would be an opportunity to rewrite the main function in rust, because why not :)

Additionally, if the main function was separated from the solvers, individual solvers could be statically linked into other programs where the flexibility of dynamic linking isn't required.

Prerequisites - time and the LD_LIBRARY_PATH

I am working with an Arch Linux system, and I encountered two issues when building this code.

First was setting the LD_LIBRARY_PATH, for this, the command, which I found in another issue:

export LD_LIBRARY_PATH=$PWD

The second prerequisite was the 'time' command. On Arch Linux the function exists, I'm not 100% certain, but I think, within bash itself, and so the makefile was unable to run the timing tests until I installed the package with pacman -S time.

There is no build instructions in the README.md and I think these two things should be added in order to complete the documentation.

I found there is a note about in the Makefile...

[Question] Fair Mining License

@tromp , thanks for such a straightforward, well-documented repo!

Where can I find more info about the fair mining license and how it works? The description is license.txt is a little brief.

If I'm interested in implementing a miner (or sponsoring an implementation)...

  • How do I 'offer' half the revenue fee with developers?
  • Who should I reach out to, and depending on the project, how will they accept coins?
  • What happens if developers' addresses change, or get compromised?

Googled around a bit, wasn't able to find much on it. This thread on removing miner fees comes to mind. Happy to link to this issue/PR a documentation improvement if it is desired. Thank you.

GPUassert: out of memory mean.cu 377

./cuda29 -h dd7703e424a7f841131d8c580bf79c09907a463323b2d4531b3907d3823c
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:03:00.0  On |                  N/A |
| 60%   31C    P8    14W / 120W |     86MiB /  6078MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:04:00.0 Off |                  N/A |
| 60%   33C    P8    10W / 120W |      2MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 106...  Off  | 00000000:09:00.0 Off |                  N/A |
| 60%   35C    P8     9W / 120W |      2MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1153      G   /usr/lib/xorg/Xorg                            33MiB |
|    0      1192      G   /usr/bin/gnome-shell                          50MiB |
+-----------------------------------------------------------------------------+
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Linux miner 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Hi tromp:

When I run cuda29, I got GPUassert: out of memory mean.cu 377 error, Is there any problem with my program? Tell me if the information above isn't enough.

What is the baseline for the bounty??

I'm looking into this, but there is no clear baseline for the bounty. On the one hand, your Makefile defines bounty targets that use the regular miner (cuckoo_miner), but on the other hand, you have a SIMD implementation in bounty_miner (which I assume is much faster) that is not being used in the Makefile.

So what exactly are we supposed to beat here??

cuda19

cuda19 gives this runtime exception:

cuda19: mean.cu:851: SolverCtx* create_solver_ctx(SolverParams*): Assertion `tp.genA.blocks * tp.genA.tpb <= NEDGES' failed.

I didn't try the other variants but assume they're the same.

mean30 terminates with illegal instruction (signal 4 - error 132) on virtualized Linux environment

As per 6c78cec CI, that runs on virtualized Linux environment operated by Travis, fails:

$ if test demo30 = "${JOB:?}"; then ( cd src && env ${LIBV:?}="${LIBP:?}" make demo30; ); fi
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I.  -pthread -o mean30s -mavx2 -DSAVEEDGES -DNSIPHASH=8 -DEDGEBITS=29 mean_miner.cpp -L. -lblake2b
time ./mean30s -n 25
Looking for 42-cycle on cuckoo30("",25) with 50% edges
Using 3168MB bucket memory at 2ad301872010,
1x21MB thread memory at 2ad3c78b3010,
8-way siphash, and 128 buckets.
nonce 25 k0 k1 21db62abf2643353 f6603de05b29eab
Command terminated by signal 4
0.00user 0.68system 0:00.79elapsed 86%CPU (0avgtext+0avgdata 3268196maxresident)k
0inputs+0outputs (0major+3049minor)pagefaults 0swaps
make: *** [demo30] Error 132

This may be symptom of mean30 relying on undefined C/C++ behaviour, or similar other real issue.

Additional CPU threads occupy significant memory space

When running grin-miner with the c31 plugin, 1 thread will use ~11 GB of memory as expected. However, running 64 threads consumes 15.5 GB, and much beyond that will cause a memory allocation crash.

This system is a Xeon Phi 7210, which has 256 logical cores and 16 GB of high-bandwidth memory. Is a ~70 MB allocation per thread actually necessary? Are there any mitigating actions that could be taken here?

Compilation error

Hi John,

First of all, "Merry Xmas"!

I've a little problem compiling your code on my laptop:

$ export LD_LIBRARY_PATH=$PWD
$ make
gcc -m64 -std=gnu11 -O -fPIC -shared -o libblake2b.so blake2b-ref.c
echo "add build dir to LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on Mac OSX) if library not found"
add build dir to LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on Mac OSX) if library not found
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean20 -DATOMIC -DEDGEBITS=19 lean_miner.cpp -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify20 -DEDGEBITS=19 cuckoo.c -L. -lblake2b
./lean20 -n 38 | grep ^Sol | ./verify20 -n 38
Verifying size 42 proof for cuckoo20("",38)
Verified with cyclehash 2f0e43de86b8ca70e73170d54750d0dc86dcdbf5563265911d981801b929dd63
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean4 -DSHOW -DIDXSHIFT=0 -DPROOFSIZE=6 -DEDGEBITS=3 simple_miner.cpp -L. -lblake2b
./lean4 -h 211
Looking for 6-cycle on cuckoo4("211") with 50% edges
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 0 (6,15)
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:06 2 (2,15)
1: 2:15 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:06 3 (6,5)
1: 2:15 3: 4: 5:06 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:06 4 (12,11)
1: 2:15 3: 4: 5:06 6: 7: 8: 9: 10: 11:12 12: 13: 14: 15:06 5 (2,11)
1: 2:15 3: 4: 5:06 6: 7: 8: 9: 10: 11:02 12:11 13: 14: 15:06 6 (4,11)
1: 2:15 3: 4:11 5:06 6: 7: 8: 9: 10: 11:02 12:11 13: 14: 15:06 7 (4,5)
6-cycle found at 87%
Solution 0 2 3 5 6 7
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean28stx8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=27 lean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean28x8 -mavx2 -DXBITS=6 -DCOMPRESSROUND=10 -DNSIPHASH=8 -DEDGEBITS=27 mean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean30stx8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=29 lean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean30x8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=29 mean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean32stx8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=31 lean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean32x8 -mavx2 -DEXPANDROUND=8 -DCOMPRESSROUND=22 -DXBITS=8 -DNSIPHASH=8 -DEDGEBITS=31 mean_miner.cpp -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify28 -DEDGEBITS=27 cuckoo.c -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify30 -DEDGEBITS=29 cuckoo.c -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify32 -DEDGEBITS=31 cuckoo.c -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean30sx8 -mavx2 -DSAVEEDGES -DNSIPHASH=8 -DEDGEBITS=29 mean_miner.cpp -L. -lblake2b
time ./mean30sx8 -n 63
Looking for 42-cycle on cuckoo30("",63) with 50% edges
Using 3168MB bucket memory at 7fc335202010,
1x21MB thread memory at 7fc333d11010,
8-way siphash, and 128 buckets.
nonce 63
0.12user 2.36system 0:23.55elapsed 10%CPU (0avgtext+0avgdata 3268776maxresident)k
16inputs+0outputs (0major+816560minor)pagefaults 0swaps
make: *** [Makefile:105: demo30x8] Error 4

Could you help me, please?
Thanks,
Marco

No license included in source

Can you include an open source license with the source distribution? There's a copyright notice in some of the files, but no license.

Speed challenge

Performance summary: 3x speed at the cost of 20x memory usage. The speed advantage can be 4x or higher, depending on your computer and multithreading mode.

This work does not actually follow the bounty requirements.

  1. Only cuckoo28 and cuckoo30 are prepared. cuckoo32 is not available mainly because it is estimated to require 16GB memory and my computer does not have that.
  2. There is no SHA-256 hashing to convert 80-byte header to (k0,k1). Instead, you supply (k0,k1) directly from the command line. The reason is because I want to reduce clutter and focus on the Cuckoo Cycle.
  3. No mining mode because there is no SHA-256 hashing. The program only does one-shot cycle finding for a (k0,k1) input.
  4. You should trust the timing printed by my program. It excludes memory allocation time, and it makes sense to do so to gauge mining performance. You can still use Linux time command to see that the difference is reasonable.

Despite the above, this submission is interesting for the following reasons.

  1. I think the DRAM latency problem is bypassed and the performance is somewhat limited by memory bandwidth instead, but I could be wrong.
  2. Your reference implementation now becomes the TMTO.
  3. It sounds cool to perform Cuckoo Cycle PoW without Cuckoo hashing. The algorithm used is extreme form of edge trimming followed by naive traversing of graph to find cycle. The main ingredient for the edge trimming is bucket sorting to detect collision.
  4. This program contains multiple techniques put together to make it fast.

A description of the algorithm is still in progress. It may take some time, as I am busy with full-time job. The program currently outputs a lot of progress and debug information. Expect it to be buggy.

https://github.com/xenoncat/cuckoo_pow

Not solvable at 50% edge easiness?

I have been playing around with the java version and noticed that it never really finds a solution on pretty much most things at 50% edge easiness. I usually have to bump it up to about 56% to find a solution. Which, seems odd, considering that it should be able to find a solution at 50%. I only ever see the cycle print out when it reaches about 75% or more the way through the nonces.

example of one at 56%

ith 56% edges and 1 threads
 14-cycle found at 0:85%
 18-cycle found at 0:87%
 26-cycle found at 0:88%
 20-cycle found at 0:89%
 72-cycle found at 0:89%
 194-cycle found at 0:91%
 168-cycle found at 0:91%
 174-cycle found at 0:91%
 14-cycle found at 0:92%
 24-cycle found at 0:92%
 380-cycle found at 0:92%
 180-cycle found at 0:92%
 104-cycle found at 0:92%
 356-cycle found at 0:92%
 310-cycle found at 0:92%
 212-cycle found at 0:92%
 28-cycle found at 0:92%
 254-cycle found at 0:92%
 300-cycle found at 0:92%
 306-cycle found at 0:92%
 414-cycle found at 0:92%
 86-cycle found at 0:92%
 386-cycle found at 0:92%
 202-cycle found at 0:92%
 302-cycle found at 0:92%
 376-cycle found at 0:92%
 200-cycle found at 0:92%
 144-cycle found at 0:92%
 334-cycle found at 0:92%
 366-cycle found at 0:93%
 114-cycle found at 0:93%
 362-cycle found at 0:93%
 264-cycle found at 0:93%
 394-cycle found at 0:93%
 136-cycle found at 0:93%
 130-cycle found at 0:93%
 118-cycle found at 0:93%
 318-cycle found at 0:93%
 118-cycle found at 0:93%
 202-cycle found at 0:93%
 182-cycle found at 0:93%
 98-cycle found at 0:93%
 30-cycle found at 0:93%
 150-cycle found at 0:93%
 152-cycle found at 0:93%
 352-cycle found at 0:93%
 208-cycle found at 0:93%
 188-cycle found at 0:93%
 340-cycle found at 0:93%
 366-cycle found at 0:93%
 128-cycle found at 0:93%
 128-cycle found at 0:93%
 218-cycle found at 0:93%
 260-cycle found at 0:93%
 446-cycle found at 0:93%
 306-cycle found at 0:93%
 238-cycle found at 0:93%
 368-cycle found at 0:93%
 408-cycle found at 0:93%
 186-cycle found at 0:93%
 276-cycle found at 0:93%
 260-cycle found at 0:93%
 342-cycle found at 0:93%
 42-cycle found at 0:93%
 326-cycle found at 0:93%
Solution 7d25 7eb5 961a fb19 103d2 13606 16320 204ab 20b7a 242e0 24429 25398 29025 310fc 31472 33113 37910 3a6cf 3de2b 3fb9b 41bc1 459e5 4e794 4fe98 54153 57949 57b3a 57ed3 58cc6 5e676 60627 68077 7374a 73b9f 747b5 7507c 7694a 7a0e8 7ba83 83e0e 840e6 8690f

vs what I get at 50%:

with 50% edges and 1 threads
 14-cycle found at 0:95%
 18-cycle found at 0:97%
 26-cycle found at 0:99%

with no solution found at all.

Am I missing something here on how it is suppose to work?

Reduce cudaMemcpy size of hostA to sizeof(u32)

cuckoo/mean.cu
cuckaroo/mean.cu
cuckatoo/mean.cu

cudaMemcpy(hostA, indexesE, NX * NY * sizeof(u32), cudaMemcpyDeviceToHost);

should be change to

cudaMemcpy(hostA, indexesE, sizeof(u32), cudaMemcpyDeviceToHost);

to reduce io between device and host a little?

make cuda28 fails

The latest commit I could get working was af7c892. The error message I got was:

$ make cuda28
nvcc -o cuda28 -DEDGEBITS=27 -arch sm_35 cuda_miner.cu -lssl -lcrypto
cuda_miner.cu(136): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(162): error: identifier "SIZE" is undefined
cuda_miner.cu(169): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(170): error: identifier "SIZE" is undefined
cuda_miner.cu(187): error: identifier "SIZE" is undefined
cuda_miner.cu(195): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(215): error: argument of type "siphash_keys *" is incompatible with parameter of type "const char *"
cuda_miner.cu(215): error: argument of type "char *" is incompatible with parameter of type "u32"
cuda_miner.cu(215): error: too few arguments in function call
cuda_miner.cu(224): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(242): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(260): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(309): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(321): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(365): error: identifier "SIZE" is undefined
15 errors detected in the compilation of "/tmp/tmpxft_00001830_00000000-9_cuda_miner.cpp1.ii".
Makefile:132: recipe for target 'cuda28' failed
make: *** [cuda28] Error 2

mean_miner.h & mean_miner.hpp

Well that's confusing. It looks like the .hpp file is there to replace the .h file, but unless you're going right through the code to change all references of the .h file to the .hpp file, that's going to be an issue, yes?

Modify value of M/N?

Modifying the value of NEDGES causes most of the miners to crash -- is there an easy way to set the value of M/N (M edges and N vertices) to an arbitrary value (like 0.7)?

Libblake2b.so not found

Hello John, just a quick note that the tests in the Makefile can't find libblake2b.so (on the linux I am using, which is a Ubuntu clone), because it is built in the local directory and that usually isn't in ones library path.

I set LD_LIBRARY_PATH to the build dir and things work fine (or one could add the build dir to ld.so.conf and re-run ldconfig with sudo privs, I suppose). Perhaps there should be a note in the README or an addition to the Makefile to set LD_LIBRARY_PATH (with a warning I would think :) )?

Anyways, I just thought I would mention it.

On a separate note: can you still build/run xenoncats code? I tried last night and kept getting seg faults (I opened a new ticket about it) for all the versions except cuckoo20? I severely doubt this is a problem specific to my machine, but I suppose it could be. Just wanted to check before I waste 12 hours on memtest86 to no avail ;)

Thanks,
Kelly

Apparent inconsistency between results, using Cuckoo12

As discussed in the grin channel, I've come across something that looks like an inconsistency
between implementations that I haven't quite yet figured out. This may not be an issue, but I'd really appreciate if you could look at it.

I've put an example branch here, containing a few changes that I've made in my own branch for grin. The main differences are that it uses its own SHA256 implementation and doesn't append a nonce to the hash (because this is done on the grin side,) but otherwise should work identically:

https://github.com/mimblewimble/cuckoo/tree/cuckoo12-bug

I have 2 targets in the makefile, cuckoo12 and simple12. In both cases, I've just hardcoded a sample header generated by grin, which is valid and has a solution in grin's implementation and in the simple_miner.cpp implementation. If you make and run simple12, you should see the solution... this is the same as appears in grin, and indeed simple miner works when I hook it into grin through my modded plugin implementation.

cuckoo12 is compiled with the same parameters, EDGEBITS=11, and given the same header as simple12.. but it comes up with no solution set.

However, if, in cuckoo.h, I transform EDGEBITS into a global variable instead of a #define, and set it to 12, (i.e. it's 12 at initialisation time,) then set it back to 11 in the main function (set GRIN_MOD to 1) in my modded implementation to do this), then the result set that comes back is identical to the simple12 example.

Again, haven't quite been able to figure out why, but this indicates there's some slight inconsistency somewhere in the cuckoo_miner.cpp implementation, as the hash keys are identical in both examples.

Security Concern of Possible Collisions

According to the code, Siphash is only taking the first 16 bytes total for k0 an k1 from the sha256 hash. Which means, it has a probability of collision as that of a MD5 hash.

I am referring to this:

public Cuckoo(byte[] header) {
    byte[] hdrkey;
    try {
      hdrkey = MessageDigest.getInstance("SHA-256").digest(header);
      long k0 = u8to64(hdrkey, 0);
      long k1 = u8to64(hdrkey, 8);
      v[0] = k0 ^ 0x736f6d6570736575L;
      v[1] = k1 ^ 0x646f72616e646f6dL;
      v[2] = k0 ^ 0x6c7967656e657261L;
      v[3] = k1 ^ 0x7465646279746573L;
    } catch(NoSuchAlgorithmException e) {
      System.out.println(e);
      System.exit(1);
    }
  }

Recommendation:
create k0,k1,k2,k3 each with 64bit integers from throughout the sha256 32 bytes.

would be better if it was:

public Cuckoo(byte[] header) {
    byte[] hdrkey;
    try {
      hdrkey = MessageDigest.getInstance("SHA-256").digest(header);
      long k0 = u8to64(hdrkey, 0);
      long k1 = u8to64(hdrkey, 8);
      long k2 = u8to64(hdrkey, 16);
      long k3 = u8to64(hdrkey, 24);

      v[0] = k0 ^ 0x736f6d6570736575L;
      v[1] = k1 ^ 0x646f72616e646f6dL;
      v[2] = k2 ^ 0x6c7967656e657261L;
      v[3] = k3 ^ 0x7465646279746573L;
    } catch(NoSuchAlgorithmException e) {
      System.out.println(e);
      System.exit(1);
    }
  }

This ensures it uses the full 32 bytes of the sha256 hash.

Compilation issues on latest macOS

Seems there were some changes to the pthread libs in the latest version of macOS, with an appropriate issue opened in grin. Suggested fixes that need to be folded into the miners below.

https://github.com/ignopeverell/grin/issues/152

#ifndef _MACH_PORT_T
#define _MACH_PORT_T
#include <sys/_types.h> /* __darwin_mach_port_t */
typedef __darwin_mach_port_t mach_port_t;
#include <pthread.h>
mach_port_t pthread_mach_thread_np(pthread_t);
#endif /* _MACH_PORT_T */

Is the solver context thread safe?

Hi John:

I build mean.cu as a shared library, and use cgo to call it. Create one goroutine for each graphic card, and binding one solver context for each goroutine.

One solver context works as expected, but multiple contexts not work well, after running a while, those
solver contexts stopping working, and the log shows:

findcycles edges 46 time 4 ms total 4 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 4 ms total 4 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 16 ms total 16 ms
findcycles edges 46 time 4 ms total 4 ms

Seems that trimming phase is not work. Any ideas? Thanks!

mean30sx8 error

When running the build instructions on arch linux I get the following error:

./mean30sx8 -n 63
Looking for 42-cycle on cuckoo30("",63) with 50% edges
Using 3168MB bucket memory at 7f6bc0763010,
1x21MB thread memory at 7f6bbf272010,
8-way siphash, and 128 buckets.
nonce 63
genUnodes round  0 size 536870912 rdtsc: 5862158714
mean30sx8: mean_miner.hpp:198: u32 zbucket<BUCKETSIZE>::setsize(const u8*) [with unsigned int BUCKETSIZE = 202752; u32 = unsigned int; u8 = unsigned char]: Assertion `size <= BUCKETSIZE' failed.

The system is a i7 8700k with 32GB of DDR4
Linux philipp-pc 4.14.57-1-MANJARO #1 SMP PREEMPT Sun Jul 22 19:25:51 UTC 2018 x86_64 GNU/Linux

avx and avx2 are supported as stated in the flags from /proc/cpuinfo

is there any more debugging information I can provide?

Cuckaroo cuda build fails: undefined avx builtin intrinsics

I was unable to build Cuckaroo on ubuntu 16.04 (x86_64) with a GTX 1070 and CUDA 10.0 installed.
I am attempting to build by executing:

make cuda29

The output that I am getting is the following:

/usr/local/cuda-10.0/bin/nvcc -o cuda29 -DEDGEBITS=29 -arch sm_35 mean.cu ../crypto/blake2b-ref.c
/usr/lib/gcc/x86_64-linux-gnu/6/include/avx512fintrin.h(12943): error: identifier "__builtin_ia32_kmov16" is undefined

1 error detected in the compilation of "/tmp/tmpxft_00004028_00000000-8_mean.cpp1.ii".
Makefile:53: recipe for target 'cuda29' failed

I am working on commit b39eb42054c15eb9af41e17286ebc21d2875deb7

This might help: https://devtalk.nvidia.com/default/topic/780902/nvcc-with-avx-support-cannot-find-gcc-builtin-intrinsics/

Inconsistency between C++ and Java miners.

I'm seeing an inconsistency between the Java and C++ miners:
In C++ the nodes are shifted by 1 bit and or'ed bit (cuckoo.h):

node_t sipnode(siphash_keys *keys, edge_t nonce, u32 uorv) {
 return _sipnode(keys, nonce, uorv) << 1 | uorv;
}

In Java they aren't (Cuckoo.java):

  // generate edge in cuckoo graph
  public int sipnode(int nonce, int uorv) {
    return (int)siphash24(2*nonce + uorv) & EDGEMASK;
  }
  // generate edge in cuckoo graph
  public Edge sipedge(int nonce) {
    return new Edge(sipnode(nonce,0), sipnode(nonce,1));
  }

Is it a trivial choice, is it by design? Am I overlooking something? The Java cycle verification is more accepting than the C++ one, though all miners produce equal results.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.