tromp / cuckoo Goto Github PK
View Code? Open in Web Editor NEWa memory-bound graph-theoretic proof-of-work system
License: Other
a memory-bound graph-theoretic proof-of-work system
License: Other
Have you already thought about whether it could be compiled to run in a browser? Would a hand-optimised js implementation be a terrible idea?
This is very interesting, thanks for sharing it!
I'm working on a tool that runs a bunch of cuckoo cycle solvers and saves output (including time,
algorithm use, and graph size) to a file. The hope is that people in the community will use this tool to benchmark and report their GPU/CPU performance, which I can then use to produce a public website that shows the performance of the different solvers and algorithms across different GPUs and CPUs.
You can see the current work in progress code here.
Parsing data from the solver output was pretty painful. Would you consider adding some kind of structured output, for example JSON, to make such tools easier to write? Ideally it would log all parameters that the solver was launched with, information on the CPU or GPU which the solver was running on, and information about solution times for the graphs searched.
It turns out that the edge-trimming process in mean.cpp
is much more difficult for me to understand, as compared to the simple
and lean
version... It's even hard for me to make a list of questions, since I am a sort of totally lost... I watched at least 3 youtube videos about cuckoo cycle, but not in-depth discussion about mean
algorithm so far...so I am wondering if there are any more information on the web discussing details about mean.hpp
?
For example, I am confused about what's the data structure is used to store "edges", and how (where) the "edges" are trimmed with regard to the corresponding data structure...also it seems a zbucket elment is of BIGSIZE
(5) bytes, which is 40 bits, and an an edge's two node (UYZ, VYZ) is 22*2=44(bit). How 44-bit info fits into 40-bit space...
Many thanks!
Hey, your cuckoo cycle paper inspired me to choose proof of work functions as a topic for a paper I had to write this semester. In the paper linked below, I purposed a small change to the algorithm which keeps it memory hard/easy to verify, but removes possibility of speeding up the computation through pre-calculating siphash values - link:
http://erbbysam.com/papers/seeking_better_proof_of_work.pdf (in particular, pages 6-8)
Not really an issue, just something for you to consider.
Thanks,
Sam
I did not figure out the meaning and usage of it yet...
Many thanks!
./cuda29 -h dd7703e424a7f841131d8c580bf79c09907a463323b2d4531b3907d3823c40092f28377b48abfc46
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54 Driver Version: 396.54 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:03:00.0 On | N/A |
| 60% 31C P8 14W / 120W | 86MiB / 6078MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 106... Off | 00000000:04:00.0 Off | N/A |
| 60% 33C P8 10W / 120W | 2MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 106... Off | 00000000:09:00.0 Off | N/A |
| 60% 35C P8 9W / 120W | 2MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1153 G /usr/lib/xorg/Xorg 33MiB |
| 0 1192 G /usr/bin/gnome-shell 50MiB |
+-----------------------------------------------------------------------------+
GeForce GTX 1060 3GB with 6078MB @ 192 bits x 4004MHz
Looking for 42-cycle on cuckoo30("AX
zF3#ำ<",0) with 50% edges, 64*64 buckets, 176 trims, and 64 thread blocks.
GPUassert: out of memory mean.cu 377
Hi tromp,
When I run cuda29, I got GPUassert: out of memory mean.cu
error, Is there any problem with my program?
Line 482 in 7d69e76
hi, tromp, I can not understand this code: the indexesE[2] never used after round1, how did you know the bucket size after bufferB + sizeB/NB?
Thanks for creating the Cuckoo Cycle and providing this excellent implementation!
This library has some defines to handle EDGEBITS > 32 but, for example, neither the lean nor the mean implementation compile if you set it to 33. Here's some output from attempting this with the mean implementation:
g++ -march=native -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean34x1 -DNSIPHASH=1 -DEDGEBITS=33 mean_miner.cpp -L. -lblake2b
mean_miner.cpp: In function โint main(int, char**)โ:
mean_miner.cpp:94:54: error: cannot convert โu32* {aka unsigned int*}โ to โedge_t* {aka long unsigned int*}โ for argument โ1โ to โint verify(edge_t*, siphash_keys*)โ
int pow_rc = verify(prf, &ctx.trimmer->sip_keys);
Hacking the appropriate line to use a 64-bit type doesn't solve the problem, so before going further down the rabbit hole I figured I'd ask if EDGEBITS > 32 is intended to work without significant changes.
For context: I'm interested in very long running (runtime of hours) memory-hardened PoWs. This is not for a consensus/mining mechanism, but as an authorization mechanism for writing data to a blockchain. It seems like Cuckoo Cycle could be a good fit. An option being considered is chaining solutions for multiple proofs together to enable a much longer running PoW without prohibitively high memory requirements. This causes a larger aggregate solution size -- something that I'd also like to minimize by finding the right balance. Hence, I'd like to know if this implementation supports larger graphs. I suppose requiring a longer cycle is also something that could be played with.
Need to run:
perl -pi -e 's/^ */\t/' Makefile
Otherwise cannot compile:
Makefile:192: *** missing separator. Stop.
Ubuntu 16.04
According to Section 2.2 of SipHash specification, last byte of of input string should be prefixed with message length.
But according to following line in siphash.h
, it seems that message length is not appended.
Line 48 in c79d660
Is there any reason for this difference?
grin-miner/cuckoo-miner/src/cuckoo/sys/plugins/cuckoo/src/cuckoo/mean.hpp:689: void
edgetrimmer::trimrename1(u32, u32) [with bool TRIMONV = true; u32 = unsigned int]: Assertion
'renames <buckets[NX][NY].renameu1' failed
grin-miner/cuckoo-miner/src/cuckoo/sys/plugins/cuckoo/src/cuckoo/mean.hpp:689: void
edgetrimmer::trimrename1(u32, u32) [with bool TRIMONV = true; u32 = unsigned int]: Assertion
'renames <buckets[NX][NY].renameu1' failed
Aborted (core dumped)
grin-miner version v1.0.0 on Linux using default CPU solver
I get this when nthreads
> 1 but works fine otherwise
Anyone seen this error?
I believe the last diff broke the cuda solver:
Building cuckoo29 cuda solver...
nvcc -std=c++11 -o cuda29 -DEDGEBITS=29 -arch sm_35 mean.cu ../crypto/blake2b-ref.c
mean.cu(782): error: class "SolverParams" has no member "cpuload"
mean.cu(810): error: class "SolverParams" has no member "cpuload"
mean.cu(834): error: class "SolverParams" has no member "cpuload"
-d 0 -E 0 -h "" -m 176 -n 0 -r 1 -U 4096 -u 256 -v 128 -w 512 -y 1024 -Z
1024 -z 1024
The thread 0x1c84 has exited with code 0 (0x0).
The thread 0x1160 has exited with code 0 (0x0).
The thread 0x2a30 has exited with code 0 (0x0).
First-chance exception at 0x01049349 in vs2013.exe: 0xC0000005: Access violation reading location 0x08812770.
Unhandled exception at 0x01049349 in vs2013.exe: 0xC0000005: Access violation reading location 0x08812770.
class cuckoo_hash {
public:
u64 *cuckoo;
cuckoo_hash() {
cuckoo = new u64[CUCKOO_SIZE];
}
~cuckoo_hash() {
delete[] cuckoo;
}
void set(node_t u, node_t v) {
u64 niew = (u64)u << NODEBITS | v;
for (node_t ui = u >> IDXSHIFT; ; ui = (ui+1) & CUCKOO_MASK) {
u64 old = cuckoo[ui]; //error at
if (old == 0 || (old >> NODEBITS) == (u & KEYMASK)) {
cuckoo[ui] = niew;
return;
}
}
}
node_t operator[](node_t u) const {
for (node_t ui = u >> IDXSHIFT; ; ui = (ui+1) & CUCKOO_MASK) {
u64 cu = cuckoo[ui]; //error at
if (!cu)
return 0;
if ((cu >> NODEBITS) == (u & KEYMASK)) {
assert(((ui - (u >> IDXSHIFT)) & CUCKOO_MASK) < MAXDRIFT);
return (node_t)(cu & NODEMASK);
}
}
}
};
I would suggest refactoring the solvers to share a single main function. Benefits would be:
I found a bug in #72 which had been copied into four different files, since they were part of the main function which is duplicated across multiple files.
If there was a single main function shared by all the solvers, features added to that main function wouldn't need to be copied to all the main function.
Also, it would be an opportunity to rewrite the main function in rust, because why not :)
Additionally, if the main function was separated from the solvers, individual solvers could be statically linked into other programs where the flexibility of dynamic linking isn't required.
I am working with an Arch Linux system, and I encountered two issues when building this code.
First was setting the LD_LIBRARY_PATH, for this, the command, which I found in another issue:
export LD_LIBRARY_PATH=$PWD
The second prerequisite was the 'time' command. On Arch Linux the function exists, I'm not 100% certain, but I think, within bash itself, and so the makefile was unable to run the timing tests until I installed the package with pacman -S time
.
There is no build instructions in the README.md and I think these two things should be added in order to complete the documentation.
I found there is a note about in the Makefile...
@tromp , thanks for such a straightforward, well-documented repo!
Where can I find more info about the fair mining license and how it works? The description is license.txt is a little brief.
If I'm interested in implementing a miner (or sponsoring an implementation)...
Googled around a bit, wasn't able to find much on it. This thread on removing miner fees comes to mind. Happy to link to this issue/PR a documentation improvement if it is desired. Thank you.
./cuda29 -h dd7703e424a7f841131d8c580bf79c09907a463323b2d4531b3907d3823c
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54 Driver Version: 396.54 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:03:00.0 On | N/A |
| 60% 31C P8 14W / 120W | 86MiB / 6078MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 106... Off | 00000000:04:00.0 Off | N/A |
| 60% 33C P8 10W / 120W | 2MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 106... Off | 00000000:09:00.0 Off | N/A |
| 60% 35C P8 9W / 120W | 2MiB / 6078MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1153 G /usr/lib/xorg/Xorg 33MiB |
| 0 1192 G /usr/bin/gnome-shell 50MiB |
+-----------------------------------------------------------------------------+
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Linux miner 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Hi tromp:
When I run cuda29, I got GPUassert: out of memory mean.cu 377
error, Is there any problem with my program? Tell me if the information above isn't enough.
I think that to augment diffusion and interest in cuckoo cycle non optimized C89 version of the code could be useful (no threads, no atomic, etc...).
I'm looking into this, but there is no clear baseline for the bounty. On the one hand, your Makefile defines bounty targets that use the regular miner (cuckoo_miner), but on the other hand, you have a SIMD implementation in bounty_miner (which I assume is much faster) that is not being used in the Makefile.
So what exactly are we supposed to beat here??
cuda19
gives this runtime exception:
cuda19: mean.cu:851: SolverCtx* create_solver_ctx(SolverParams*): Assertion `tp.genA.blocks * tp.genA.tpb <= NEDGES' failed.
I didn't try the other variants but assume they're the same.
As per 6c78cec CI, that runs on virtualized Linux environment operated by Travis, fails:
$ if test demo30 = "${JOB:?}"; then ( cd src && env ${LIBV:?}="${LIBP:?}" make demo30; ); fi
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean30s -mavx2 -DSAVEEDGES -DNSIPHASH=8 -DEDGEBITS=29 mean_miner.cpp -L. -lblake2b
time ./mean30s -n 25
Looking for 42-cycle on cuckoo30("",25) with 50% edges
Using 3168MB bucket memory at 2ad301872010,
1x21MB thread memory at 2ad3c78b3010,
8-way siphash, and 128 buckets.
nonce 25 k0 k1 21db62abf2643353 f6603de05b29eab
Command terminated by signal 4
0.00user 0.68system 0:00.79elapsed 86%CPU (0avgtext+0avgdata 3268196maxresident)k
0inputs+0outputs (0major+3049minor)pagefaults 0swaps
make: *** [demo30] Error 132
This may be symptom of mean30 relying on undefined C/C++ behaviour, or similar other real issue.
When running grin-miner with the c31 plugin, 1 thread will use ~11 GB of memory as expected. However, running 64 threads consumes 15.5 GB, and much beyond that will cause a memory allocation crash.
This system is a Xeon Phi 7210, which has 256 logical cores and 16 GB of high-bandwidth memory. Is a ~70 MB allocation per thread actually necessary? Are there any mitigating actions that could be taken here?
Hi John,
First of all, "Merry Xmas"!
I've a little problem compiling your code on my laptop:
$ export LD_LIBRARY_PATH=$PWD
$ make
gcc -m64 -std=gnu11 -O -fPIC -shared -o libblake2b.so blake2b-ref.c
echo "add build dir to LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on Mac OSX) if library not found"
add build dir to LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on Mac OSX) if library not found
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean20 -DATOMIC -DEDGEBITS=19 lean_miner.cpp -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify20 -DEDGEBITS=19 cuckoo.c -L. -lblake2b
./lean20 -n 38 | grep ^Sol | ./verify20 -n 38
Verifying size 42 proof for cuckoo20("",38)
Verified with cyclehash 2f0e43de86b8ca70e73170d54750d0dc86dcdbf5563265911d981801b929dd63
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean4 -DSHOW -DIDXSHIFT=0 -DPROOFSIZE=6 -DEDGEBITS=3 simple_miner.cpp -L. -lblake2b
./lean4 -h 211
Looking for 6-cycle on cuckoo4("211") with 50% edges
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 0 (6,15)
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:06 2 (2,15)
1: 2:15 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:06 3 (6,5)
1: 2:15 3: 4: 5:06 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:06 4 (12,11)
1: 2:15 3: 4: 5:06 6: 7: 8: 9: 10: 11:12 12: 13: 14: 15:06 5 (2,11)
1: 2:15 3: 4: 5:06 6: 7: 8: 9: 10: 11:02 12:11 13: 14: 15:06 6 (4,11)
1: 2:15 3: 4:11 5:06 6: 7: 8: 9: 10: 11:02 12:11 13: 14: 15:06 7 (4,5)
6-cycle found at 87%
Solution 0 2 3 5 6 7
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean28stx8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=27 lean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean28x8 -mavx2 -DXBITS=6 -DCOMPRESSROUND=10 -DNSIPHASH=8 -DEDGEBITS=27 mean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean30stx8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=29 lean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean30x8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=29 mean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o lean32stx8 -mavx2 -DNSIPHASH=8 -DEDGEBITS=31 lean_miner.cpp -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean32x8 -mavx2 -DEXPANDROUND=8 -DCOMPRESSROUND=22 -DXBITS=8 -DNSIPHASH=8 -DEDGEBITS=31 mean_miner.cpp -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify28 -DEDGEBITS=27 cuckoo.c -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify30 -DEDGEBITS=29 cuckoo.c -L. -lblake2b
gcc -m64 -std=gnu11 -O -o verify32 -DEDGEBITS=31 cuckoo.c -L. -lblake2b
g++ -march=native -m64 -std=c++11 -Wall -Wno-format -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -DPREFETCH -I. -pthread -o mean30sx8 -mavx2 -DSAVEEDGES -DNSIPHASH=8 -DEDGEBITS=29 mean_miner.cpp -L. -lblake2b
time ./mean30sx8 -n 63
Looking for 42-cycle on cuckoo30("",63) with 50% edges
Using 3168MB bucket memory at 7fc335202010,
1x21MB thread memory at 7fc333d11010,
8-way siphash, and 128 buckets.
nonce 63
0.12user 2.36system 0:23.55elapsed 10%CPU (0avgtext+0avgdata 3268776maxresident)k
16inputs+0outputs (0major+816560minor)pagefaults 0swaps
make: *** [Makefile:105: demo30x8] Error 4
Could you help me, please?
Thanks,
Marco
Can you include an open source license with the source distribution? There's a copyright notice in some of the files, but no license.
Performance summary: 3x speed at the cost of 20x memory usage. The speed advantage can be 4x or higher, depending on your computer and multithreading mode.
This work does not actually follow the bounty requirements.
Despite the above, this submission is interesting for the following reasons.
A description of the algorithm is still in progress. It may take some time, as I am busy with full-time job. The program currently outputs a lot of progress and debug information. Expect it to be buggy.
Using DFS, cucktoo needs to store a lot of information.
Thanks
Hi John:
https://github.com/tromp/cuckoo/blob/master/src/cuckoo/mean.cu#L409
cudaEventCreate
without cudaEventDestroy
will lead memory leak.
I have been playing around with the java version and noticed that it never really finds a solution on pretty much most things at 50% edge easiness. I usually have to bump it up to about 56% to find a solution. Which, seems odd, considering that it should be able to find a solution at 50%. I only ever see the cycle print out when it reaches about 75% or more the way through the nonces.
example of one at 56%
ith 56% edges and 1 threads
14-cycle found at 0:85%
18-cycle found at 0:87%
26-cycle found at 0:88%
20-cycle found at 0:89%
72-cycle found at 0:89%
194-cycle found at 0:91%
168-cycle found at 0:91%
174-cycle found at 0:91%
14-cycle found at 0:92%
24-cycle found at 0:92%
380-cycle found at 0:92%
180-cycle found at 0:92%
104-cycle found at 0:92%
356-cycle found at 0:92%
310-cycle found at 0:92%
212-cycle found at 0:92%
28-cycle found at 0:92%
254-cycle found at 0:92%
300-cycle found at 0:92%
306-cycle found at 0:92%
414-cycle found at 0:92%
86-cycle found at 0:92%
386-cycle found at 0:92%
202-cycle found at 0:92%
302-cycle found at 0:92%
376-cycle found at 0:92%
200-cycle found at 0:92%
144-cycle found at 0:92%
334-cycle found at 0:92%
366-cycle found at 0:93%
114-cycle found at 0:93%
362-cycle found at 0:93%
264-cycle found at 0:93%
394-cycle found at 0:93%
136-cycle found at 0:93%
130-cycle found at 0:93%
118-cycle found at 0:93%
318-cycle found at 0:93%
118-cycle found at 0:93%
202-cycle found at 0:93%
182-cycle found at 0:93%
98-cycle found at 0:93%
30-cycle found at 0:93%
150-cycle found at 0:93%
152-cycle found at 0:93%
352-cycle found at 0:93%
208-cycle found at 0:93%
188-cycle found at 0:93%
340-cycle found at 0:93%
366-cycle found at 0:93%
128-cycle found at 0:93%
128-cycle found at 0:93%
218-cycle found at 0:93%
260-cycle found at 0:93%
446-cycle found at 0:93%
306-cycle found at 0:93%
238-cycle found at 0:93%
368-cycle found at 0:93%
408-cycle found at 0:93%
186-cycle found at 0:93%
276-cycle found at 0:93%
260-cycle found at 0:93%
342-cycle found at 0:93%
42-cycle found at 0:93%
326-cycle found at 0:93%
Solution 7d25 7eb5 961a fb19 103d2 13606 16320 204ab 20b7a 242e0 24429 25398 29025 310fc 31472 33113 37910 3a6cf 3de2b 3fb9b 41bc1 459e5 4e794 4fe98 54153 57949 57b3a 57ed3 58cc6 5e676 60627 68077 7374a 73b9f 747b5 7507c 7694a 7a0e8 7ba83 83e0e 840e6 8690f
vs what I get at 50%:
with 50% edges and 1 threads
14-cycle found at 0:95%
18-cycle found at 0:97%
26-cycle found at 0:99%
with no solution found at all.
Am I missing something here on how it is suppose to work?
cuckoo/mean.cu
cuckaroo/mean.cu
cuckatoo/mean.cu
cudaMemcpy(hostA, indexesE, NX * NY * sizeof(u32), cudaMemcpyDeviceToHost);
should be change to
cudaMemcpy(hostA, indexesE, sizeof(u32), cudaMemcpyDeviceToHost);
to reduce io between device and host a little?
The latest commit I could get working was af7c892
. The error message I got was:
$ make cuda28
nvcc -o cuda28 -DEDGEBITS=27 -arch sm_35 cuda_miner.cu -lssl -lcrypto
cuda_miner.cu(136): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(162): error: identifier "SIZE" is undefined
cuda_miner.cu(169): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(170): error: identifier "SIZE" is undefined
cuda_miner.cu(187): error: identifier "SIZE" is undefined
cuda_miner.cu(195): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(215): error: argument of type "siphash_keys *" is incompatible with parameter of type "const char *"
cuda_miner.cu(215): error: argument of type "char *" is incompatible with parameter of type "u32"
cuda_miner.cu(215): error: too few arguments in function call
cuda_miner.cu(224): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(242): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(260): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(309): error: identifier "SIZESHIFT" is undefined
cuda_miner.cu(321): error: identifier "HALFSIZE" is undefined
cuda_miner.cu(365): error: identifier "SIZE" is undefined
15 errors detected in the compilation of "/tmp/tmpxft_00001830_00000000-9_cuda_miner.cpp1.ii".
Makefile:132: recipe for target 'cuda28' failed
make: *** [cuda28] Error 2
line 31:
// bit/10 2222222 2111111 111110000000000
should be
// bit/10 2222222 2211111 111110000000000?
we use gpu miner, don't use cpu to do mining. in order to thanks for your great works for zcash community. i send 0.5btc: https://blockchain.info/tx/f4b30f9975d42e5d30c7f6868ae120dfdf2aba67b55e60126d3410458a5321e0
ZCASH forum forbiddens me to send pm.
so, i have to send it here.
if possible, give me your email.
Well that's confusing. It looks like the .hpp file is there to replace the .h file, but unless you're going right through the code to change all references of the .h file to the .hpp file, that's going to be an issue, yes?
Modifying the value of NEDGES
causes most of the miners to crash -- is there an easy way to set the value of M/N
(M
edges and N
vertices) to an arbitrary value (like 0.7)?
Hello John, just a quick note that the tests in the Makefile can't find libblake2b.so (on the linux I am using, which is a Ubuntu clone), because it is built in the local directory and that usually isn't in ones library path.
I set LD_LIBRARY_PATH to the build dir and things work fine (or one could add the build dir to ld.so.conf and re-run ldconfig with sudo privs, I suppose). Perhaps there should be a note in the README or an addition to the Makefile to set LD_LIBRARY_PATH (with a warning I would think :) )?
Anyways, I just thought I would mention it.
On a separate note: can you still build/run xenoncats code? I tried last night and kept getting seg faults (I opened a new ticket about it) for all the versions except cuckoo20? I severely doubt this is a problem specific to my machine, but I suppose it could be. Just wanted to check before I waste 12 hours on memtest86 to no avail ;)
Thanks,
Kelly
in the comments at line 69:
CUCKOO_SIZE * sizeof(u64) == 2 * ONCE_BITS / 32
32
seems a typo of 8
.
I am not sure this is the proper place to report it, but whitepaper has a missing reference ([?]) at the second paragraph in page 2.
Here are the logs we get when running the cuda31 utility program with -E 2 -r 100
parameters:
Tesla P100: https://pastebin.com/M3Gnpi7t
Tesla V100: https://pastebin.com/HLUngKbQ
As discussed in the grin channel, I've come across something that looks like an inconsistency
between implementations that I haven't quite yet figured out. This may not be an issue, but I'd really appreciate if you could look at it.
I've put an example branch here, containing a few changes that I've made in my own branch for grin. The main differences are that it uses its own SHA256 implementation and doesn't append a nonce to the hash (because this is done on the grin side,) but otherwise should work identically:
https://github.com/mimblewimble/cuckoo/tree/cuckoo12-bug
I have 2 targets in the makefile, cuckoo12 and simple12. In both cases, I've just hardcoded a sample header generated by grin, which is valid and has a solution in grin's implementation and in the simple_miner.cpp implementation. If you make and run simple12, you should see the solution... this is the same as appears in grin, and indeed simple miner works when I hook it into grin through my modded plugin implementation.
cuckoo12 is compiled with the same parameters, EDGEBITS=11, and given the same header as simple12.. but it comes up with no solution set.
However, if, in cuckoo.h, I transform EDGEBITS into a global variable instead of a #define, and set it to 12, (i.e. it's 12 at initialisation time,) then set it back to 11 in the main function (set GRIN_MOD to 1) in my modded implementation to do this), then the result set that comes back is identical to the simple12 example.
Again, haven't quite been able to figure out why, but this indicates there's some slight inconsistency somewhere in the cuckoo_miner.cpp implementation, as the hash keys are identical in both examples.
Hi John!
I have the error in the subject. Could you check and fix the Makefile, please?
Thanks,
Marco
Here are the logs we get when running the cuda29 utility program with -r 100
parameter:
According to the code, Siphash is only taking the first 16 bytes total for k0 an k1 from the sha256 hash. Which means, it has a probability of collision as that of a MD5 hash.
I am referring to this:
public Cuckoo(byte[] header) {
byte[] hdrkey;
try {
hdrkey = MessageDigest.getInstance("SHA-256").digest(header);
long k0 = u8to64(hdrkey, 0);
long k1 = u8to64(hdrkey, 8);
v[0] = k0 ^ 0x736f6d6570736575L;
v[1] = k1 ^ 0x646f72616e646f6dL;
v[2] = k0 ^ 0x6c7967656e657261L;
v[3] = k1 ^ 0x7465646279746573L;
} catch(NoSuchAlgorithmException e) {
System.out.println(e);
System.exit(1);
}
}
Recommendation:
create k0,k1,k2,k3 each with 64bit integers from throughout the sha256 32 bytes.
would be better if it was:
public Cuckoo(byte[] header) {
byte[] hdrkey;
try {
hdrkey = MessageDigest.getInstance("SHA-256").digest(header);
long k0 = u8to64(hdrkey, 0);
long k1 = u8to64(hdrkey, 8);
long k2 = u8to64(hdrkey, 16);
long k3 = u8to64(hdrkey, 24);
v[0] = k0 ^ 0x736f6d6570736575L;
v[1] = k1 ^ 0x646f72616e646f6dL;
v[2] = k2 ^ 0x6c7967656e657261L;
v[3] = k3 ^ 0x7465646279746573L;
} catch(NoSuchAlgorithmException e) {
System.out.println(e);
System.exit(1);
}
}
This ensures it uses the full 32 bytes of the sha256 hash.
Seems there were some changes to the pthread libs in the latest version of macOS, with an appropriate issue opened in grin. Suggested fixes that need to be folded into the miners below.
https://github.com/ignopeverell/grin/issues/152
#ifndef _MACH_PORT_T
#define _MACH_PORT_T
#include <sys/_types.h> /* __darwin_mach_port_t */
typedef __darwin_mach_port_t mach_port_t;
#include <pthread.h>
mach_port_t pthread_mach_thread_np(pthread_t);
#endif /* _MACH_PORT_T */
Hi John:
I build mean.cu
as a shared library, and use cgo to call it. Create one goroutine for each graphic card, and binding one solver context for each goroutine.
One solver context works as expected, but multiple contexts not work well, after running a while, those
solver contexts stopping working, and the log shows:
findcycles edges 46 time 4 ms total 4 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 4 ms total 4 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 3 ms total 3 ms
findcycles edges 46 time 16 ms total 16 ms
findcycles edges 46 time 4 ms total 4 ms
Seems that trimming phase is not work. Any ideas? Thanks!
When running the build instructions on arch linux I get the following error:
./mean30sx8 -n 63
Looking for 42-cycle on cuckoo30("",63) with 50% edges
Using 3168MB bucket memory at 7f6bc0763010,
1x21MB thread memory at 7f6bbf272010,
8-way siphash, and 128 buckets.
nonce 63
genUnodes round 0 size 536870912 rdtsc: 5862158714
mean30sx8: mean_miner.hpp:198: u32 zbucket<BUCKETSIZE>::setsize(const u8*) [with unsigned int BUCKETSIZE = 202752; u32 = unsigned int; u8 = unsigned char]: Assertion `size <= BUCKETSIZE' failed.
The system is a i7 8700k with 32GB of DDR4
Linux philipp-pc 4.14.57-1-MANJARO #1 SMP PREEMPT Sun Jul 22 19:25:51 UTC 2018 x86_64 GNU/Linux
avx and avx2 are supported as stated in the flags from /proc/cpuinfo
is there any more debugging information I can provide?
I was unable to build Cuckaroo on ubuntu 16.04 (x86_64) with a GTX 1070 and CUDA 10.0 installed.
I am attempting to build by executing:
make cuda29
The output that I am getting is the following:
/usr/local/cuda-10.0/bin/nvcc -o cuda29 -DEDGEBITS=29 -arch sm_35 mean.cu ../crypto/blake2b-ref.c
/usr/lib/gcc/x86_64-linux-gnu/6/include/avx512fintrin.h(12943): error: identifier "__builtin_ia32_kmov16" is undefined
1 error detected in the compilation of "/tmp/tmpxft_00004028_00000000-8_mean.cpp1.ii".
Makefile:53: recipe for target 'cuda29' failed
I am working on commit b39eb42054c15eb9af41e17286ebc21d2875deb7
This might help: https://devtalk.nvidia.com/default/topic/780902/nvcc-with-avx-support-cannot-find-gcc-builtin-intrinsics/
Zwilla/BitCaine5_aeternity_miner#5
Hi @tromp, can you take a look on this?
Line 750 in 1c84aae
with or without it have the same effect!?
I'm seeing an inconsistency between the Java and C++ miners:
In C++ the nodes are shifted by 1 bit and or'ed bit (cuckoo.h):
node_t sipnode(siphash_keys *keys, edge_t nonce, u32 uorv) {
return _sipnode(keys, nonce, uorv) << 1 | uorv;
}
In Java they aren't (Cuckoo.java):
// generate edge in cuckoo graph
public int sipnode(int nonce, int uorv) {
return (int)siphash24(2*nonce + uorv) & EDGEMASK;
}
// generate edge in cuckoo graph
public Edge sipedge(int nonce) {
return new Edge(sipnode(nonce,0), sipnode(nonce,1));
}
Is it a trivial choice, is it by design? Am I overlooking something? The Java cycle verification is more accepting than the C++ one, though all miners produce equal results.
It's not 2015 any more.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.