Comments (6)
Interesting. Good info. Thank you :) Yes, effectively, my current ISA is aligned with the Zfinx extension, as you say.
That said, I intend to move towards using BF16 at some point, because it provides the same dynamic range as single precision, but with half the number of bits. BF16 is becoming increasingly popular for machine learning. In ML, the quantization of having only a 7-bit mantissa manifests itself as noise, and ML training loves noise.
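To make the "same dynamic range, half the bits" point concrete, here is a minimal sketch, assuming BF16 is taken as simply the upper 16 bits of an IEEE-754 single-precision value (1 sign bit, 8 exponent bits, 7 mantissa bits). The helper names `float_to_bf16_truncate` and `bf16_to_float` are hypothetical, not anything from VeriGPU, and truncation is used here instead of round-to-nearest purely to keep the example short.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, for illustration only (not VeriGPU code).
// BF16 is modeled as the top 16 bits of a binary32 float:
// 1 sign bit, 8 exponent bits, 7 mantissa bits.

// Convert by truncation: drop the low 16 mantissa bits.
// (Real hardware or libraries may round to nearest instead.)
static uint16_t float_to_bf16_truncate(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bit pattern
    return static_cast<uint16_t>(bits >> 16);
}

// Widen back to binary32 by zero-filling the dropped mantissa bits.
static float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the 8-bit exponent is untouched, a value such as `1.0e30f` survives the round trip with a relative error below 2^-7, whereas it would overflow in FP16.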
When I move towards BF16, I haven't quite decided how I intend to do that. There are two options I see:
- pack two FP16 numbers into a single 32-bit register
- use 16-bit registers
I'm fairly tempted by the second option, which would be a deviation from Zhinx, if I understand correctly? Actually, both options would be, since if I understand correctly, Zhinx would use a full 32-bit register to store each 16-bit half float?
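For the first option, a minimal sketch of what packing could look like at the bit level. The lane layout (lane 0 in bits 15:0, lane 1 in bits 31:16) and the function names are assumptions made for illustration; nothing here is decided in VeriGPU.

```cpp
#include <cstdint>

// Hypothetical layout, for illustration only:
// lane 0 lives in bits [15:0], lane 1 in bits [31:16] of a 32-bit register.

// Pack two raw 16-bit float bit patterns into one 32-bit register value.
static uint32_t pack_two_16(uint16_t lane0, uint16_t lane1) {
    return static_cast<uint32_t>(lane0) | (static_cast<uint32_t>(lane1) << 16);
}

// Extract lane 0 (low half) from a packed register value.
static uint16_t unpack_lane0(uint32_t reg) {
    return static_cast<uint16_t>(reg & 0xFFFFu);
}

// Extract lane 1 (high half) from a packed register value.
static uint16_t unpack_lane1(uint32_t reg) {
    return static_cast<uint16_t>(reg >> 16);
}
```

With the second option (16-bit registers) none of this shuffling is needed, at the cost of a register file that no longer matches the 32-bit integer registers.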
Big picture, I intend to only support BF16 floats. No 32-bit, no 64-bit, no FP16. This will keep the cores small, lightweight, and then we can either pack in a lot of cores into the same size die; or shrink the die, keeping tape-out costs lower.
from verigpu.
What else?
As far as what else...
- there will be other deviations for the ISA used by each core
- I'm starting to ponder instructions for the GPU controller I'm starting to draft: https://github.com/hughperkins/VeriGPU/blob/b3a9ff188e42edb814e1d42b17ac0a850014f68b/prot/verilator/prot_unified_source/controller.sv . Initially this will just use whatever instructions I feel like. E.g. for the unified C++ source, I'm not even attempting to use OpenCL instructions or similar currently (see VeriGPU/prot/verilator/prot_unified_source/verilator_driver.cpp, lines 107 to 110 in b3a9ff1),
- however, sooner or later I might want to find a standard ISA for this (but since there's only a handful of instructions (memory alloc/free, copy memory in each direction, launch kernel), and it's pretty niche, it might not be necessary?)
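To show how small that controller instruction set is, here is a sketch of the handful of operations listed above as a C++ enum. Every name, the numeric encoding, and the `ControllerCmd` fields are hypothetical; this is not the encoding used in controller.sv or verilator_driver.cpp.

```cpp
#include <cstdint>

// Hypothetical controller operations, mirroring the list above.
// The numeric values are illustrative, not VeriGPU's actual encoding.
enum class ControllerOp : uint8_t {
    Alloc = 0,           // allocate GPU memory
    Free = 1,            // free GPU memory
    CopyToDevice = 2,    // copy memory host -> GPU
    CopyFromDevice = 3,  // copy memory GPU -> host
    LaunchKernel = 4     // start a kernel
};

// Hypothetical command record: one operation plus two generic operands.
struct ControllerCmd {
    ControllerOp op;
    uint32_t addr;  // device address (or kernel entry point for LaunchKernel)
    uint32_t arg;   // size in bytes, or a pointer to kernel arguments
};

// Human-readable name for logging or debugging.
static const char* op_name(ControllerOp op) {
    switch (op) {
        case ControllerOp::Alloc:          return "alloc";
        case ControllerOp::Free:           return "free";
        case ControllerOp::CopyToDevice:   return "copy_to_device";
        case ControllerOp::CopyFromDevice: return "copy_from_device";
        case ControllerOp::LaunchKernel:   return "launch_kernel";
    }
    return "unknown";
}
```

With only five operations, an ad hoc encoding like this is easy to keep in sync by hand, which is part of why a standard ISA might not be necessary.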
(Update: the controller is now capable of allocating GPU memory, and passing data back and forth to the GPU :) https://github.com/hughperkins/VeriGPU/blob/8fcaf074e50d798e6b14930027c0ad862f206dd4/prot/verilator/prot_unified_source/verilator_driver.cpp ) (Edit: I could do with a PCIe4 interface; opportunity for someone to add one whilst I'm working on the C++ kernel compilation/launch bits).
Question: are you aware of any way of persuading clang/llvm to generate Zfinx-compatible assembly? I just now realized that if I:
- use clang to separate out kernels, in a single-source scenario, into LLVM IR files,
- and then use LLVM's llc to convert these LLVM IR files into riscv32 assembly files
... then the assembly files will plausibly use more total int + float registers than I will have room for.
Might be in llvm-14 :)
/usr/local/opt/llvm-14.0.0/bin/llc --march riscv32 -mattr=help 2>&1 | grep zfinx
zfinx - 'Zfinx' (Float in Integer).
Ok, so:
- the bad news is that zfinx isn't in llvm14. It's not even in main
- the good news is that https://github.com/sunshaoce has fixed up +zfinx in https://reviews.llvm.org/D122918 , so that at least loads, stores, additions and multiplications are working now :) (I believe that a lot more than these operations are working, but these at least cover my simple float kernels at VeriGPU/examples/cpp_single_source/sum_floats/sum_floats.cpp, lines 15 to 23 in 68b6c22, and VeriGPU/examples/cpp_single_source/mul_floats/mul_floats.cpp, lines 15 to 23 in 68b6c22)