Comments (4)

alazzaro commented on August 20, 2024

Are you using the CUBLAS flag during compilation? Such a big kernel should not be there unless we apply densification, and in that case we should not transpose...

from dbcsr.

dev-zero commented on August 20, 2024

No, I built with cmake -DUSE_MPI=OFF -DUSE_CUDA=ON -DWITH_GPU=K20X, which leaves CUBLAS turned off by default.

alazzaro commented on August 20, 2024

Ah OK, I see the problem now. The file

https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/libcusmm/parameters_K20X.json

lists "huge" kernels (I think for testing purposes):

{"m": 81, "n": 9, "k": 9, "tile_m": 3, "tile_n": 3, "w": 4, "v": 6, "threads": 128, "grouping": 16, "minblocks": 8, "algorithm": "largeDB1", "perf": 189.642},
{"m": 96, "n": 96, "k": 96, "tile_m": 6, "tile_n": 3, "w": 14, "v": 48, "threads": 512, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 614.588},
{"m": 100, "n": 10, "k": 10, "tile_m": 2, "tile_n": 2, "threads": 256, "grouping": 16, "minblocks": 4, "algorithm": "medium", "perf": 226.917},
{"m": 121, "n": 11, "k": 11, "tile_m": 5, "tile_n": 3, "threads": 128, "grouping": 16, "minblocks": 1, "algorithm": "medium", "perf": 233.211},
{"m": 144, "n": 12, "k": 12, "tile_m": 2, "tile_n": 4, "w": 6, "v": 8, "threads": 288, "grouping": 16, "minblocks": 4, "algorithm": "largeDB1", "perf": 268.209},
{"m": 169, "n": 13, "k": 13, "tile_m": 3, "tile_n": 4, "w": 6, "v": 10, "threads": 256, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 221.427},
{"m": 196, "n": 14, "k": 14, "tile_m": 6, "tile_n": 2, "threads": 256, "grouping": 16, "minblocks": 1, "algorithm": "medium", "perf": 243.838},
{"m": 225, "n": 15, "k": 15, "tile_m": 3, "tile_n": 3, "w": 4, "v": 12, "threads": 384, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 248.307},
{"m": 256, "n": 16, "k": 16, "tile_m": 2, "tile_n": 6, "w": 6, "v": 10, "threads": 384, "grouping": 16, "minblocks": 1, "algorithm": "largeDB1", "perf": 309.19}

@shoshijak
Those kernels are too big to run on the GPU; I wonder how they made it into the file...
I see two solutions:

  1. remove those kernels from the file
  2. replace the assert with a check on the product of the dimensions (80*80*80 as the maximum).

Personally, I would go for the first solution...
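
To make the difference between the two options concrete, here is a minimal Python sketch (not libcusmm code). It assumes the current assert rejects any kernel with a dimension above 80, and it reads option 2 as bounding the product m*n*k by 80*80*80; the function names and the exact form of both checks are hypothetical. The sizes are the nine (m, n, k) triples from the excerpt above.

```python
# Kernel sizes (m, n, k) copied from the parameters_K20X.json excerpt above.
KERNELS = [
    (81, 9, 9), (96, 96, 96), (100, 10, 10), (121, 11, 11), (144, 12, 12),
    (169, 13, 13), (196, 14, 14), (225, 15, 15), (256, 16, 16),
]

MAX_DIM = 80  # limit named in the thread; the real assert's form is an assumption


def passes_dimension_check(m, n, k, limit=MAX_DIM):
    """Assumed current rule: every dimension must be <= limit."""
    return max(m, n, k) <= limit


def passes_product_check(m, n, k, limit=MAX_DIM):
    """Option 2, hypothetical form: bound the product m*n*k by limit**3."""
    return m * n * k <= limit ** 3


rejected_by_dimension = [s for s in KERNELS if not passes_dimension_check(*s)]
rejected_by_product = [s for s in KERNELS if not passes_product_check(*s)]

print(len(rejected_by_dimension), "kernels fail the per-dimension check")
print(rejected_by_product, "fail the product check")
```

Under these assumptions, all nine listed kernels fail the per-dimension rule, while a product rule would reject only the 96x96x96 kernel and let the long, thin ones (such as 256x16x16) through.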

shoshijak commented on August 20, 2024

The libcusmm_unittest_transpose test simply exercises every transposition that could arise from the kernels defined in parameters_GPU.json. I see that parameters_K20X.json (https://github.com/cp2k/dbcsr/blob/develop/src/acc/libsmm_acc/libcusmm/parameters_K20X.json) contains a number of kernels with m, n, or k > 80. I'm guessing they were added to the parameter file before the "Cannot transpose matrices with dimensions above 80" limitation was introduced.

If libcusmm really should not be transposing kernels with a dimension above 80, then these kernels should be removed from parameters_K20X.json.
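
Removing the kernels could be scripted rather than done by hand. The sketch below is only an illustration: it assumes the parameter file is a JSON array of objects with "m", "n" and "k" keys, as in the excerpt quoted in this thread, and it works on a small inline sample instead of the real file. The helper name drop_oversized is made up, and the 23x23x23 entry is an invented small kernel added for contrast.

```python
import json

# Entries trimmed to a few fields; the 81x9x9 and 96x96x96 sizes come from
# the excerpt in this thread, the 23x23x23 entry is invented for contrast.
SAMPLE = """
[
  {"m": 23, "n": 23, "k": 23, "algorithm": "medium"},
  {"m": 81, "n": 9, "k": 9, "algorithm": "largeDB1"},
  {"m": 96, "n": 96, "k": 96, "algorithm": "largeDB1"}
]
"""


def drop_oversized(kernels, limit=80):
    """Keep only kernels whose m, n and k are all <= limit."""
    return [kern for kern in kernels
            if max(kern["m"], kern["n"], kern["k"]) <= limit]


kernels = json.loads(SAMPLE)
kept = drop_oversized(kernels)
removed = [kern for kern in kernels if kern not in kept]

print(json.dumps(kept))
print("removed:", [(r["m"], r["n"], r["k"]) for r in removed])
```

Applied to the real file, the same filter would be wrapped in a json.load / json.dump pair; whether the project's tooling expects a particular key order or one-entry-per-line layout is not covered here.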
