deviations from RISC-V about verigpu

hughperkins commented on July 19, 2024
deviations from RISC-V

from verigpu.

Comments (6)

hughperkins commented on July 19, 2024

Interesting. Good info. Thank you :) Yes, effectively, my current ISA is aligned with the Zfinx extension, as you say.

That said, I intend to move towards using BF16 at some point, because it has the same dynamic range as single precision, but with half the number of bits. BF16 is becoming increasingly popular for machine learning. In ML, the quantization error from having only a 7-bit mantissa shows up as noise, and ML training loves noise.
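
A minimal C++ sketch of what "same dynamic range, half the bits" means in practice: BF16 is the top 16 bits of an FP32 encoding (sign, 8-bit exponent, 7-bit mantissa). The helper names float_to_bf16, bf16_to_float and round_to_bf16 are hypothetical, and round-to-nearest-even is one common conversion choice (plain truncation is another), not necessarily what VeriGPU will use.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not VeriGPU code): FP32 -> BF16, keeping the top 16 bits
// of the FP32 encoding and rounding to nearest, ties to even.
uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if (std::isnan(f)) {
        // Force the quiet bit so dropping the low bits cannot turn a NaN into infinity.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    // Adding 0x7FFF plus the lowest kept bit rounds to nearest, ties to even.
    uint32_t lowest_kept = (bits >> 16) & 1u;
    bits += 0x7FFFu + lowest_kept;
    return static_cast<uint16_t>(bits >> 16);
}

// BF16 -> FP32 is exact: put the 16 bits back in the upper half.
float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Round-trip helper: what an FP32 value looks like after being stored as BF16.
float round_to_bf16(float f) { return bf16_to_float(float_to_bf16(f)); }
```

For example, round_to_bf16(1e38f) stays finite (FP16's largest finite value is 65504), while round_to_bf16(1.0f + 1.0f / 512.0f) comes back as exactly 1.0f, because 2^-9 is below half a BF16 ulp at 1.0. That discarded low-order information is the quantization noise described above.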

As for how I'll move towards BF16, I haven't quite decided yet. I see two options:

    1. pack two BF16 numbers into a single 32-bit register
    2. use 16-bit registers

I'm fairly tempted by the second option, which would be a deviation from Zhinx, if I understand correctly? Actually, both options would be, since, as I understand it, Zhinx would use a full 32-bit register to store each 16-bit half float?
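
A sketch of the two register layouts being weighed, assuming element 0 goes in the low half for the packed option (the layout and all helper names are hypothetical). For comparison, the last helper follows one reading of the Zfinx chapter of the RISC-V spec (worth double-checking) for values narrower than the register: the value occupies the low bits, and results fill the upper bits with copies of the sign bit. Zhinx itself is defined for IEEE binary16 (FP16), so BF16 in x registers would be a deviation either way.

```cpp
#include <cstdint>

// Option 1 (hypothetical layout): two BF16 values packed into one 32-bit
// register, element 0 in bits 15:0 and element 1 in bits 31:16.
uint32_t pack_bf16x2(uint16_t lo, uint16_t hi) {
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

uint16_t unpack_bf16_lo(uint32_t reg) { return static_cast<uint16_t>(reg & 0xFFFFu); }
uint16_t unpack_bf16_hi(uint32_t reg) { return static_cast<uint16_t>(reg >> 16); }

// Zfinx-style narrow value in a 32-bit x register: the 16-bit value sits in
// bits 15:0 and bits 31:16 hold copies of its sign bit (bit 15).
uint32_t narrow_in_x_register(uint16_t value) {
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(value)));
}
```

The packed layout halves register usage for 16-bit data but needs the ALU to handle two lanes per register; the one-value-per-register layout wastes the upper half of each 32-bit register, which is what 16-bit registers (option 2) would avoid.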

Big picture, I intend to support only BF16 floats: no 32-bit, no 64-bit, no FP16. This keeps the cores small and lightweight, so we can either pack a lot of cores into the same die size, or shrink the die and keep tape-out costs lower.
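
As a rough back-of-the-envelope for the "small cores" point, assuming a simple array multiplier whose partial-product count grows with the square of the significand width (a rule of thumb, not a VeriGPU measurement): FP32 has a 24-bit significand including the implicit leading bit, while BF16 has 8, so

$$\frac{24^2}{8^2} = \frac{576}{64} = 9$$

i.e. roughly 9x fewer partial-product bits in the significand multiplier, before accounting for exponent logic, pipelining, or the register file.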

hughperkins commented on July 19, 2024

What else?

As far as what else...

hughperkins commented on July 19, 2024

(Update: the controller can now allocate GPU memory and pass data back and forth to the GPU :) https://github.com/hughperkins/VeriGPU/blob/8fcaf074e50d798e6b14930027c0ad862f206dd4/prot/verilator/prot_unified_source/verilator_driver.cpp ) (Edit: I could do with a PCIe4 interface; an opportunity for someone to add one whilst I'm working on the C++ kernel compilation/launch bits.)
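
One way to picture what "allocating GPU memory and passing data back and forth" involves on the host side: a minimal sketch of a flat device memory with a bump allocator and copy helpers. The class DeviceMemory and its methods are hypothetical names for illustration only; the real interface is in the linked verilator_driver.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of GPU memory (not the VeriGPU driver API):
// a flat byte array with a bump allocator plus copy-to/copy-from helpers.
class DeviceMemory {
public:
    explicit DeviceMemory(std::size_t size_bytes) : mem_(size_bytes, 0) {}

    // Reserve `bytes` of device memory at a 4-byte-aligned address.
    uint32_t alloc(std::size_t bytes) {
        std::size_t addr = (next_ + 3) & ~static_cast<std::size_t>(3);
        if (addr + bytes > mem_.size()) throw std::runtime_error("out of device memory");
        next_ = addr + bytes;
        return static_cast<uint32_t>(addr);
    }

    // Host -> device copy.
    void copy_to_device(uint32_t dst, const void *src, std::size_t bytes) {
        check(dst, bytes);
        std::memcpy(mem_.data() + dst, src, bytes);
    }

    // Device -> host copy.
    void copy_from_device(void *dst, uint32_t src, std::size_t bytes) const {
        check(src, bytes);
        std::memcpy(dst, mem_.data() + src, bytes);
    }

private:
    void check(uint32_t addr, std::size_t bytes) const {
        if (static_cast<std::size_t>(addr) + bytes > mem_.size())
            throw std::out_of_range("device access out of bounds");
    }

    std::vector<uint8_t> mem_;
    std::size_t next_ = 0;  // next free byte for the bump allocator
};
```

A bump allocator never frees individual blocks, which keeps the bookkeeping trivial; a real driver would likely need a free list or per-launch reset once kernels allocate and release buffers repeatedly.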

hughperkins commented on July 19, 2024

Question: are you aware of any way of persuading clang/llvm to generate Zfinx-compatible assembly? I just now realized that if I:

  • use clang to separate out kernels, in a single-source scenario, into LLVM IR files,
  • and then use LLVM's llc to convert these LLVM IR files into riscv32 assembly files,

... then the assembly files will plausibly use more int + float registers in total than I will have room for, since without Zfinx, floats get their own separate register file.
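
A rough sketch of that register-pressure concern, counting architectural registers only and ignoring ABI-reserved registers such as sp and ra. With the F extension, float values get their own 32-entry f register file; under Zfinx, ints and floats share the 31 writable x registers (x0 is hardwired to zero). The struct and function names are hypothetical.

```cpp
// Values that are live at the same time in some piece of generated code.
struct LiveValues {
    int ints;    // integer values live simultaneously
    int floats;  // float values live simultaneously
};

constexpr int kWritableXRegs = 31;  // x1..x31 (x0 always reads as zero)
constexpr int kFRegs = 32;          // f0..f31, only present with the F extension

// F extension: separate register files, so each kind has its own budget.
bool fits_with_f_extension(LiveValues v) {
    return v.ints <= kWritableXRegs && v.floats <= kFRegs;
}

// Zfinx: floats live in the x registers too, so the budget is shared.
bool fits_with_zfinx(LiveValues v) {
    return v.ints + v.floats <= kWritableXRegs;
}
```

For example, 20 live ints plus 20 live floats fit with the F extension but not under Zfinx.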

hughperkins commented on July 19, 2024

Might be in llvm-14 :)

/usr/local/opt/llvm-14.0.0/bin/llc --march riscv32 -mattr=help 2>&1 | grep zfinx
  zfinx            - 'Zfinx' (Float in Integer).

hughperkins commented on July 19, 2024

Ok, so:

  • the bad news is that zfinx isn't usable in llvm14, even though llc lists it. It's not even usable in main
  • the good news is that https://github.com/sunshaoce has fixed up +zfinx in https://reviews.llvm.org/D122918 , so that at least loads, stores, additions and multiplications are working now :) (I believe that a lot more than these operations are working, but at least my simple float kernels sum_floats and mul_floats compile and run now :))))

    __global__ void sum_floats(float *in, unsigned int numValues, float *p_out) {
        // sum the floats in `in`, and write the result to *p_out
        // we assume just a single thread/core for now
        float out = 0.0f;
        for (unsigned int i = 0; i < numValues; i++) {
            out += in[i];
        }
        *p_out = out;
    }

    and

    __global__ void mul_floats(float *in, unsigned int numValues, float *p_out) {
        // multiply the floats in `in` together, and write the result to *p_out
        // we assume just a single thread/core for now
        float out = 1.0f;
        for (unsigned int i = 0; i < numValues; i++) {
            out *= in[i];
        }
        *p_out = out;
    }
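
To check what these kernels write to *p_out, a host-side reference is handy. A minimal sketch with hypothetical helper names (sum_floats_ref, mul_floats_ref, not part of VeriGPU): it applies the same FP32 operations in the same order as the single-threaded kernels, so results should match bit-for-bit if the GPU's float add/multiply round like IEEE FP32 on the host; if they don't, a tolerance-based comparison is the safer choice.

```cpp
#include <vector>

// Host-side reference for sum_floats: same left-to-right accumulation order.
float sum_floats_ref(const std::vector<float> &in) {
    float out = 0.0f;
    for (float v : in) out += v;
    return out;
}

// Host-side reference for mul_floats: same left-to-right product order.
float mul_floats_ref(const std::vector<float> &in) {
    float out = 1.0f;
    for (float v : in) out *= v;
    return out;
}
```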
