Comments (6)
I'm confused too. 3 possibilities you can check for:
- do you have the full GPU allocated for the job, or is it split in 2 instances and you only use 1?
Yes, I have the full GPU allocated for this job, and it is not split into 2 instances, as the screenshot I posted shows.
- hardware issue like insufficient cooling (maybe) or bad power delivery (unlikely)
I don't think cooling is the problem. I have tested this A100 with the HPL and HPCG benchmarks, and the results are consistent with public results.
- check with older 470 driver, maybe the compiler on 510 was changed and gets confused (unlikely)
I haven't tested this.
from fluidx3d.
Strange... I just re-tested on our A100 40GB PCIe, and I get consistent results.
Our system uses Nvidia driver 470.103.01, ECC is enabled on the A100. I also checked on 2 other systems with A100 40GB SXM4 and results are the same.
FP16C is very heavy on compute, so you might get thermal throttling. Did you check for sufficient cooling?
from fluidx3d.
I'm confused too. 3 possibilities you can check for:
- do you have the full GPU allocated for the job, or is it split in 2 instances and you only use 1?
- hardware issue like insufficient cooling (maybe) or bad power delivery (unlikely)
- check with older 470 driver, maybe the compiler on 510 was changed and gets confused (unlikely)
from fluidx3d.
I have another question: even in your results list, the MLUPS with FP16C is lower than with FP16S. So in which cases should we choose FP16C instead of FP16S?
from fluidx3d.
FP16S is memory compression to the hardware-supported IEEE-754 FP16 format with 1 bit for sign, 5 bits for exponent and 10 bits for mantissa. The conversion is done in hardware, so it only doubles FLOPs/Byte compared to FP32: it halves the transferred Bytes but does not need significantly more FLOPs for the conversion.
FP16C is a custom floating-point format with 1 bit for sign, 4 bits for exponent and 11 bits for mantissa. This halves the truncation error compared to FP16S, so it's more accurate, though the difference is only visible in edge-case scenarios. But the conversion is not supported in hardware and has to be emulated in software, increasing FLOPs/Byte by a factor of ~8 compared to FP32. Hardware with very fast memory and at the same time low compute power struggles with that.
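To illustrate why one extra mantissa bit halves the truncation error, here is a minimal Python sketch. It is not the FluidX3D conversion code: the helper names `round_to_mantissa` and `max_relative_error` are made up for this example, and the exponent range, bias and denormals of both formats are ignored on purpose, so only the mantissa width is compared.

```python
import math

def round_to_mantissa(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with `mantissa_bits` explicit mantissa bits.

    Simplification (assumption): exponent range and denormals are ignored,
    so this models only the mantissa width, not a complete 16-bit format.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)  # implicit leading bit + explicit bits
    return math.ldexp(round(m * scale) / scale, e)

def max_relative_error(mantissa_bits: int, samples: int = 10000) -> float:
    """Largest relative rounding error seen over evenly spaced samples in [1, 2)."""
    worst = 0.0
    for i in range(samples):
        x = 1.0 + i / samples
        err = abs(round_to_mantissa(x, mantissa_bits) - x) / x
        worst = max(worst, err)
    return worst

err_s = max_relative_error(10)  # 10 mantissa bits, as in FP16S
err_c = max_relative_error(11)  # 11 mantissa bits, as in FP16C

print(f"10-bit mantissa: max relative error ~ {err_s:.3e}")
print(f"11-bit mantissa: max relative error ~ {err_c:.3e}")
print(f"ratio ~ {err_s / err_c:.2f}")
```

The ratio comes out close to 2 because the spacing between representable values in [1, 2) is 2^-10 with 10 mantissa bits and 2^-11 with 11 bits.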
Bottom line, use
- FP32 when accuracy is the main constraint
- FP16C when both memory and accuracy are the main constraints
- FP16S when both memory and compute time are the main constraints
For more details, see this paper.
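The trade-off above can be sketched with a simple roofline-style estimate. Everything here is in normalized units and purely illustrative: the function names and the two "machine balance" values are hypothetical, not measurements of any real GPU, and the FP16C cost of 4x the FLOPs is only what follows from "~8x FLOPs/Byte" if the transferred Bytes are exactly halved.

```python
def time_per_cell(bytes_per_cell: float, flops_per_cell: float,
                  bandwidth: float, peak_flops: float) -> float:
    """Roofline estimate: a kernel is limited by memory or by compute, whichever is slower."""
    return max(bytes_per_cell / bandwidth, flops_per_cell / peak_flops)

# Normalized cost per lattice cell relative to FP32 (assumptions from the text):
#   FP16S: half the Bytes, same FLOPs   -> 2x FLOPs/Byte
#   FP16C: half the Bytes, ~4x the FLOPs -> ~8x FLOPs/Byte
MODES = {
    "FP32":  (1.0, 1.0),
    "FP16S": (0.5, 1.0),
    "FP16C": (0.5, 4.0),
}

def speedup_over_fp32(mode: str, balance: float) -> float:
    """Speedup of `mode` over FP32 on a device with the given machine balance.

    `balance` is peak FLOPs divided by bandwidth, expressed in units of the
    FP32 kernel's FLOPs/Byte (a hypothetical, normalized quantity).
    """
    t_fp32 = time_per_cell(*MODES["FP32"], bandwidth=1.0, peak_flops=balance)
    t_mode = time_per_cell(*MODES[mode], bandwidth=1.0, peak_flops=balance)
    return t_fp32 / t_mode

for balance in (10.0, 3.0):  # hypothetical compute-rich and compute-poor devices
    for mode in MODES:
        print(f"balance={balance:4.1f}  {mode:5s}  speedup={speedup_over_fp32(mode, balance):.2f}")
```

In this toy model, FP16S and FP16C both reach 2x on the compute-rich device, while on the compute-poor device FP16C becomes compute-bound and drops below FP32 speed. That is the qualitative reason FP16C can show lower MLUPS than FP16S.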
from fluidx3d.