ihaque / memtestg80
CUDA-based memory tester for NVIDIA GPUs
License: Other
I have just tested two new Quadro P6000 cards. Both report the same errors when testing more than 16400 MiB of memory.
Below are the results for 16400 MiB (passing), 16401 MiB (failing; the tool rounds this up to 16402) and 20000 MiB (failing):
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------
Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
Running 1 iterations of tests over 16400 MB of GPU memory on card 0: Quadro P6000
Running memory bandwidth test over 20 iterations of 8200 MB transfers...
Estimated bandwidth 328000000.00 MB/s
Test iteration 1 (GPU 0, 16400 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (256 ms)
Memtest86 Walking 8-bit: 0 errors (2049 ms)
True Walking zeros (8-bit): 0 errors (1011 ms)
True Walking ones (8-bit): 0 errors (1012 ms)
Moving Inversions (random): 0 errors (258 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4050 ms)
Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 0 errors (456 ms)
Memtest86 Modulo-20: 0 errors (23933 ms)
Logic (one iteration): 0 errors (129 ms)
Logic (4 iterations): 0 errors (130 ms)
Logic (shared memory, one iteration): 0 errors (129 ms)
Logic (shared-memory, 4 iterations): 0 errors (130 ms)
Final error count after 1 iterations over 16400 MiB of GPU memory: 0 errors
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------
Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
Running 1 iterations of tests over 16402 MB of GPU memory on card 0: Quadro P6000
Running memory bandwidth test over 20 iterations of 8201 MB transfers...
Estimated bandwidth 328040000.00 MB/s
Test iteration 1 (GPU 0, 16402 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (257 ms)
Memtest86 Walking 8-bit: 0 errors (2052 ms)
True Walking zeros (8-bit): 0 errors (1010 ms)
True Walking ones (8-bit): 0 errors (1014 ms)
Moving Inversions (random): 0 errors (257 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4050 ms)
Memtest86 Walking ones (32-bit): 0 errors (4051 ms)
Random blocks: 67198032 errors (457 ms)
Memtest86 Modulo-20: 0 errors (23952 ms)
Logic (one iteration): 0 errors (128 ms)
Logic (4 iterations): 0 errors (130 ms)
Logic (shared memory, one iteration): 0 errors (129 ms)
Logic (shared-memory, 4 iterations): 0 errors (130 ms)
Final error count after 1 iterations over 16402 MiB of GPU memory: 67198032 errors
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------
Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
Running 1 iterations of tests over 20000 MB of GPU memory on card 0: Quadro P6000
Running memory bandwidth test over 20 iterations of 10000 MB transfers...
Estimated bandwidth 2030456.85 MB/s
Test iteration 1 (GPU 0, 20000 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (313 ms)
Memtest86 Walking 8-bit: 0 errors (2499 ms)
True Walking zeros (8-bit): 0 errors (1232 ms)
True Walking ones (8-bit): 0 errors (1234 ms)
Moving Inversions (random): 0 errors (314 ms)
Memtest86 Walking zeros (32-bit): 0 errors (4932 ms)
Memtest86 Walking ones (32-bit): 0 errors (4933 ms)
Random blocks: 2270811672 errors (557 ms)
Memtest86 Modulo-20: 0 errors (29190 ms)
Logic (one iteration): 0 errors (157 ms)
Logic (4 iterations): 0 errors (158 ms)
Logic (shared memory, one iteration): 0 errors (157 ms)
Logic (shared-memory, 4 iterations): 0 errors (157 ms)
Final error count after 1 iterations over 20000 MiB of GPU memory: 2270811672 errors
The number of errors is the same on each card. All other tests pass, which makes me think this is a bug and not a failure of the cards.
This is a great tool and has helped me find GPUs with problems.
Thank you
I am running an RTX 3090 with 24 GB of VRAM.
Trying to allocate 4095 MB gives the following error:
memtestG80.exe 4095 1
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------
Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
Running 1 iterations of tests over 4096 MB of GPU memory on card 0: NVIDIA GeForce RTX 3090
Running memory bandwidth test over 20 iterations of 2048 MB transfers...
Test failed!
Test iteration 1 (GPU 0, 4096 MiB): 0 errors so far
Moving Inversions (ones and zeros): 4294967295 errors (125 ms)
Memtest86 Walking 8-bit: 4294967288 errors (0 ms)
True Walking zeros (8-bit): 4294967288 errors (0 ms)
True Walking ones (8-bit): 4294967288 errors (0 ms)
Moving Inversions (random): 4294967295 errors (0 ms)
Memtest86 Walking zeros (32-bit): 4294967264 errors (0 ms)
Memtest86 Walking ones (32-bit): 4294967264 errors (0 ms)
Random blocks: 4294967295 errors (0 ms)
Memtest86 Modulo-20: 4294967276 errors (0 ms)
Logic (one iteration): 4294967295 errors (0 ms)
Logic (4 iterations): 4294967295 errors (0 ms)
Logic (shared memory, one iteration): 4294967295 errors (0 ms)
Logic (shared-memory, 4 iterations): 4294967295 errors (0 ms)
Final error count after 1 iterations over 4096 MiB of GPU memory: 4294967181 errors
Allocating one MB less (4094 MB) works just fine:
./memtestG80.exe 4094 1
-------------------------------------------------------------
| MemtestG80 v1.00 |
| |
| Usage: memtestG80 [flags] [MB GPU RAM to test] [# iters] |
| |
| Defaults: GPU 0, 128MB RAM, 50 test iterations |
| Amount of tested RAM will be rounded up to nearest 2MB |
-------------------------------------------------------------
Available flags:
--gpu N ,-g N : run test on the Nth (from 0) CUDA GPU
--license ,-l : show license terms for this build
Running 1 iterations of tests over 4094 MB of GPU memory on card 0: NVIDIA GeForce RTX 3090
Running memory bandwidth test over 20 iterations of 2047 MB transfers...
Estimated bandwidth 401372.55 MB/s
Test iteration 1 (GPU 0, 4094 MiB): 0 errors so far
Moving Inversions (ones and zeros): 0 errors (78 ms)
Memtest86 Walking 8-bit: 0 errors (484 ms)
True Walking zeros (8-bit): 0 errors (250 ms)
True Walking ones (8-bit): 0 errors (250 ms)
Moving Inversions (random): 0 errors (63 ms)
Memtest86 Walking zeros (32-bit): 0 errors (1000 ms)
Memtest86 Walking ones (32-bit): 0 errors (1031 ms)
Random blocks: 0 errors (62 ms)
Memtest86 Modulo-20: 0 errors (1578 ms)
Logic (one iteration): 0 errors (32 ms)
Logic (4 iterations): 0 errors (31 ms)
Logic (shared memory, one iteration): 0 errors (31 ms)
Logic (shared-memory, 4 iterations): 0 errors (31 ms)
Final error count after 1 iterations over 4094 MiB of GPU memory: 0 errors
Initially I thought something was wrong with my GPU, so I wrote a small Python snippet that allocates VRAM gradually with PyOpenGL, and it succeeds in allocating the full 24 GB I have on the card.
Any idea what's going on here?
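One possible reading of the numbers above, offered as an assumption rather than a confirmed diagnosis: 4096 MiB is exactly 2^32 bytes, which wraps to zero in a 32-bit unsigned integer, and every count in the failing run equals a small negative number stored as an unsigned 32-bit value. The sketch below only reproduces that arithmetic in Python, emulating 32-bit behaviour with a mask. The helper names are hypothetical and nothing here is taken from the memtestG80 source.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def mib_to_bytes_u32(mib):
    """Byte count of `mib` MiB as a 32-bit unsigned integer would hold it."""
    return (mib * 1024 * 1024) & MASK32


def as_signed32(count):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return count - (1 << 32) if count & 0x80000000 else count


# Per-test error counts copied from the failing 4096 MiB run above.
reported = [
    4294967295, 4294967288, 4294967288, 4294967288, 4294967295,
    4294967264, 4294967264, 4294967295, 4294967276,
    4294967295, 4294967295, 4294967295, 4294967295,
]
reported_total = 4294967181  # "Final error count" from the same run

signed = [as_signed32(c) for c in reported]
wrapped_total = sum(reported) & MASK32  # sum as a 32-bit counter would hold it

print(mib_to_bytes_u32(4094))           # 4292870144, still fits in 32 bits
print(mib_to_bytes_u32(4096))           # 0, the size has wrapped around
print(signed)                           # [-1, -8, -8, -8, -1, -32, -32, -1, -20, -1, -1, -1, -1]
print(wrapped_total == reported_total)  # True
```

Read this way, the 8-bit walking tests report -8, the 32-bit walking tests -32 and Modulo-20 reports -20, which would be consistent with a failure code of -1 being accumulated once per pass. That interpretation is a guess based on the log alone.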
I have an NVIDIA GeForce GTX 1060 with 6 GB of RAM. I'm running Linux (64-bit, Debian stable) with the latest proprietary drivers, which support CUDA. I downloaded the memtestG80 binary (from SimTK) and ran it, and at first I was just getting segmentation faults. I never figured out what was causing that, but after a restart I am now able to get the binary to run. However, even with default parameters (128 MB, 50 iterations) it reports thousands of errors almost immediately every time.
I also downloaded the code from this repo and compiled it, and have been running it for 100 iterations on almost all the GPU RAM and have not seen any errors. Does this make sense? I can literally run the binary from SimTK, get ~6000 errors reported with the defaults, and then go run the version I compiled (also with the default params) and get 0 errors reported every time.
Should the binary from SimTK work correctly, and thus should I assume the card is bad and that it is just a fluke that the compiled version isn't also showing the errors? Or should I trust the result with the compiled version and ignore the errors reported from the binary from SimTK? (If so, should a note to this effect be placed on the SimTK download area?)
Other things I have done:
For some reason, on my GTX 980 the memory down-clocks to 3005 MHz (6010 MHz effective) while memtestG80 is running. At other times it runs at the normal speed (3505 MHz, 7010 MHz effective).
I'm not familiar with nvcc, but I assume that at some point -O defaulted to some level of optimization and now a level has to be specified explicitly?
E:\Software\src\memtestG80>make -f Makefiles\Makefile.windows
nvcc -c -DWINDOWS -DCURL_STATICLIB -O -Xptxas -v -o memtestG80_core.obj memtestG80_core.cu
nvcc fatal : '-Xptxas': expected a number
Makefiles\Makefile.windows:13: recipe for target 'memtestG80_core.obj' failed
make: *** [memtestG80_core.obj] Error 1
Changing CFLAGS in Makefile.windows to
CFLAGS=-DWINDOWS -DCURL_STATICLIB -O2
resolved the issue.
I get this when running the make command:
nvcc -c -DWINDOWS -DCURL_STATICLIB -O2 -Xptxas -v -o memtestG80_core.obj memtestG80_core.cu
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
make: *** [memtestG80_core.obj] Error 1
I was able to run the CUDA example that comes with Visual Studio 2022 on Windows 10.
Imran,
Thank you for putting memtestG80 in GitHub.
It has been about 3 1/2 years since the last update, and multi-GPU systems are becoming more common.
Have you thought about adding multi-GPU support to memtestG80 and running the memtest core in parallel on all the GPUs with OpenMP?
Or having memtestG80 automatically determine the maximum amount of available GPU memory to test?
BTW, what is Stanford now using for XStream?
Later,
David Carver
Breaks above 2^32 bytes of memory (4000MB works, 4100MB breaks) - 32bit issues?
If so, allocate multiple chunks of memory?
Output from 'memtestG80 4100' (on 8GB card)
Running 2 iterations of tests over 4100 MB of GPU memory on card 0: GeForce GTX 1080
Test iteration 1 (GPU 0, 4100 MiB): 0 errors so far
Moving Inversions (ones and zeros): 4294967295 errors (16 ms)
Memtest86 Walking 8-bit: 4294967288 errors (0 ms)
True Walking zeros (8-bit): 4294967288 errors (0 ms)
True Walking ones (8-bit): 4294967288 errors (0 ms)
Moving Inversions (random): 4294967295 errors (0 ms)
Memtest86 Walking zeros (32-bit): 4294967264 errors (0 ms)
Memtest86 Walking ones (32-bit): 4294967264 errors (0 ms)
Random blocks: 4294967295 errors (0 ms)
Memtest86 Modulo-20: 4294967276 errors (0 ms)
Logic (one iteration): 4294967295 errors (0 ms)
Logic (4 iterations): 4294967295 errors (0 ms)
Logic (shared memory, one iteration): 4294967295 errors (0 ms)
Logic (shared-memory, 4 iterations): 4294967295 errors (0 ms)
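The "multiple chunks" idea could be planned along the lines of the sketch below. It only works out chunk sizes in MB so that each chunk stays below 2^32 bytes; it does not allocate anything. `plan_chunks` and the 4094 MB per-chunk limit are assumptions made for illustration, and the 2 MB rounding mirrors the tool's banner rather than its source code.

```python
def plan_chunks(total_mb, max_chunk_mb=4094, granularity_mb=2):
    """Split total_mb into chunk sizes no larger than max_chunk_mb.

    The request is first rounded up to a multiple of granularity_mb,
    as the banner says the tool does ("rounded up to nearest 2MB").
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    if max_chunk_mb <= 0 or max_chunk_mb % granularity_mb != 0:
        raise ValueError("max_chunk_mb must be a positive multiple of granularity_mb")
    rounded = -(-total_mb // granularity_mb) * granularity_mb  # ceiling division
    chunks = []
    remaining = rounded
    while remaining > 0:
        size = min(remaining, max_chunk_mb)
        chunks.append(size)
        remaining -= size
    return chunks


print(plan_chunks(4100))  # [4094, 6]
print(plan_chunks(8192))  # [4094, 4094, 4]
```

With the default limit every chunk is at most 4094 * 2^20 bytes, so a byte count or byte offset within a chunk still fits in a 32-bit unsigned integer.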