This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] end-to-end test for 2:4 sparsity on NVidia GPUs
ClosedPublic

Authored by aartbik on Mar 6 2023, 3:55 PM.

Diff Detail

Event Timeline

aartbik created this revision.Mar 6 2023, 3:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2023, 3:55 PM
aartbik requested review of this revision.Mar 6 2023, 3:55 PM
aartbik updated this revision to Diff 505681.Mar 15 2023, 6:38 PM

rebased with main, updated RUN command to get into actual running state

aartbik updated this revision to Diff 505967.Mar 16 2023, 6:55 PM

made code actually running, still needs a few i's dotted....

aartbik updated this revision to Diff 506243.Mar 17 2023, 5:19 PM

dotted all the i's (thanks to Chris); ready for review!

aartbik edited the summary of this revision. (Show Details)Mar 17 2023, 5:20 PM
aartbik added reviewers: ThomasRaoux, guraypp.
wrengr added inline comments.Mar 17 2023, 5:48 PM
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir
2

I'm not sure exactly how lit parses these lines, so I'm sure it'd mess up the double-quotes, but... It'd be a lot easier to read this if it were broken up across a few lines, like how lines 3~6 are

59

Is there any particular reason for using affine.apply instead of arith.floordiv? (ditto for everywhere below)

aartbik retitled this revision from [WIP] [mlir][sparse][gpu] end-to-end test for 2:4 sparsity on NVidia GPUs to [mlir][sparse][gpu] end-to-end test for 2:4 sparsity on NVidia GPUs.Mar 17 2023, 5:52 PM
aartbik marked 2 inline comments as done.Mar 17 2023, 6:25 PM
aartbik added inline comments.
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir
2

Yeah, you hit the nail on the head that I was very afraid breaking up the string value.
I have cleaned it up a bit so that only the string breaks for now...

59

My understanding is that both yield the same code eventually, and that affine.apply's are more idiomatic for the GPU thread expressions (see e.g. affine.delinearize_index below). So following one style seems the preferred way, but I am easily convinced otherwise ;-)

aartbik updated this revision to Diff 506252.Mar 17 2023, 6:26 PM
aartbik marked 2 inline comments as done.

bit of code cleanup, also used smaller values to avoid fp16 issues

ThomasRaoux accepted this revision.Mar 21 2023, 11:01 AM

Looks great!

This revision is now accepted and ready to land.Mar 21 2023, 11:01 AM
aartbik updated this revision to Diff 507076.Mar 21 2023, 11:58 AM

rebased with main

This revision was landed with ongoing or failed builds.Mar 21 2023, 1:32 PM
This revision was automatically updated to reflect the committed changes.

The bot is broken (and has been for >2 weeks now!): https://lab.llvm.org/buildbot/#/builders/61/builds/42062 ; please make sure you don't filter out the notifications.

Note: this seems to have broken the build bots https://lab.llvm.org/buildbot/#/builders/61/builds/41349 and follow ups added more sparse failures https://lab.llvm.org/buildbot/#/builders/61/builds/42062

I don't think I got any bot notification about this?

This test requires A100 GPU's to be present.
Is there any way to extend

if not config.enable_cuda_runner:
  config.unsupported = True

to have that restriction? In bazel, this would be

"requires-gpu-sm80",