This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ROCDL] Adds the ROCDL target attribute.
ClosedPublic

Authored by fmorac on Jun 29 2023, 11:29 AM.

Details

Summary

For an explanation of these patches see D154153.

Commit message:
This patch adds the ROCDL target attribute for serializing GPU modules into
strings containing HSAco.

Depends on D154117

Diff Detail

Event Timeline

fmorac created this revision.Jun 29 2023, 11:29 AM
fmorac updated this revision to Diff 536068.Jun 29 2023, 5:36 PM

Rebasing.

fmorac edited the summary of this revision. (Show Details)Jun 29 2023, 6:18 PM
fmorac added a reviewer: mehdi_amini.
fmorac published this revision for review.Jun 30 2023, 6:29 AM

Overall seems reasonable, just have a few questions

mlir/lib/Dialect/GPU/Targets/AMDGPUTarget.cpp
40 ↗(On Diff #536068)

Are these meant to be called before main() like this?

189 ↗(On Diff #536068)

These split-up blocks of #includes are vaguely bugging me but that's a nit

fmorac updated this revision to Diff 543181.Jul 22 2023, 6:58 AM

Changing the location of the includes & updating location of ModuleToObject.h.

fmorac planned changes to this revision.Jul 31 2023, 5:38 PM
fmorac retitled this revision from [mlir][gpu] Adds the AMDGPU target attribute. to [mlir][ROCDL] Adds the ROCDL target attribute..
fmorac edited the summary of this revision. (Show Details)
fmorac updated this revision to Diff 545884.Jul 31 2023, 5:53 PM

Switched the AMDGPU target to ROCDL.

Adding some comgr folks in the interests of making sure this revision isn't subtly wrong

Adding some comgr folks in the interests of making sure this revision isn't subtly wrong

I'm potentially adding an extra field in gpu::TargetOptions for passing extra cmd arguments to tools, for example passing --max-reg-count to ptxas on NVIDIA. The question here is, what would be the AMD tool where I should pass those extra flags here?

arsenm added inline comments.Aug 1 2023, 11:00 AM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

I'm in the process of deleting the control libraries. Everything except wave64 currently has a clear path to deletion

fmorac added inline comments.Aug 1 2023, 11:07 AM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

What would be the new mechanism? Also, any ideas on how to address compatibility with current ROCm installs? Should we ship this now and then update?

arsenm added inline comments.Aug 1 2023, 11:09 AM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

For the most part 'work correctly' as if it were a normal library. I would just not bother adding configuration points for the optional ones and just use the conservative default setting. Something will likely still be required for specific subtarget handling (and wave64 vs. wave32 may end up as complete separate builds)

fmorac added inline comments.Aug 1 2023, 11:17 AM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

Ok, but in the new mechanism without amdgcn/bitcode control libraries, is there an option for requesting fast math and all of this other options?

I mean this attribute only specifies that compilation should use fast math, for example:

rocdl.target<flags = {fast, wave64}>

Only says the compilation mechanism should use fast math. So we could leave those here?

The actual implementation and using those control libraries is done line 205 in mlir/lib/Target/LLVM/ROCDL/Target.cpp

arsenm added inline comments.Aug 1 2023, 12:17 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

Ideally the optimization of the libraries happens based on the fast math flags and attributes of the call sites. For example, the callers should mark their outgoing arguments with nofpclass(nan inf) for finite only. Attributor will then be able to propagate that into the library functions assuming all callers also use fast math

For sqrt, the default will be correct. the backend will look for llvm.sqrt calls with appropriate !fpmath metadata for the not correctly rounded case.

For general unsafe/approximate functions, it will be driven off some combination of afn or contract flags on the call.

DAZ checks can also be folded out based on the known denormal mode by attributor

fmorac added inline comments.Aug 1 2023, 12:25 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

Ok I see, what about existing ROCm installations, again should we ship this as it is, and change it in the future? I mean, I can just remove those control libs .

Also, how about we keep these in the target attribute? as they could be used by a pass aware of the target that swaps non swap to fast math?

arsenm added inline comments.Aug 1 2023, 12:29 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

I don't understand what the "target" attribute means here.

The "libraries" are kind of fake, D130096 switches to just injecting the global constants directly (not sure this is the solution for the target-cpu/isa version)

mehdi_amini added inline comments.Aug 1 2023, 12:39 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

Here the "target" attribute is basically the "driver" of the GPU kernel compiler. It defines "how to invoke LLVM and the assembler to get a binary back"

fmorac added inline comments.Aug 1 2023, 12:42 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

Ok, just to be clear, you are removing the libraries but keeping the control variables? Like you plan to remove oclc_daz_opt.bc but keeping the respective control variable?

Which libraries are going to be removed? Are oclc_isa_version_, oclc_abi_version_, oclc_wavefrontsize64 & ockl also going to be removed?

Target attributes are the new mechanism in MLIR for handling compilation of GPU modules, however they also possess information on the target for example:

#rocdl.target<chip="gfx90a", flags = {fast, noWave64, unsafe_math}>

Says that the GPU module should be compiled for chip gfx90a, using fast, unsafe_math and using wave32 flags.

arsenm added inline comments.Aug 1 2023, 2:20 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

There's a few moving parts here. The first set of optional variables (unsafe math, DAZ, sqrt) should be completely gone in short order. I have a few more pieces to implement but I think are fully deletable within 1-2 months. These conceptually do not belong to the target, so I don't think you want to handle them here.

The control library for the ABI version will likely be gone soon, and replaced with clang directly emitting the global definition.

The target (isa version and wave32/wave64) case is trickier. Long term I have my eye on deleting them, but it's not close enough to have a concrete idea of exactly what it looks like.

fmorac added inline comments.Aug 1 2023, 2:40 PM
mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
454–460

Ok, so a target in here doesn't map 1-1 to an LLVM target, these options inside a target are closer to cmd line options that would be passed to clang.

Ok, I'll change these to emit global definitions and leave linking to wave64.

fmorac updated this revision to Diff 546538.Aug 2 2023, 10:30 AM

Switched to control variables & added more unit tests.

fmorac updated this revision to Diff 547734.Aug 7 2023, 5:19 AM
fmorac marked 2 inline comments as done.

Removed unnecessary fwd declarations.

fmorac updated this revision to Diff 548031.Aug 7 2023, 7:25 PM

Adds a couple of syntax tests & fixes a bug not displaying the link option in the attribute.

fmorac updated this revision to Diff 548613.Aug 9 2023, 7:23 AM

Ping for review. Solved merge conflicts.

So, this code looks reasonable to me (and, as a bonus, I can tell how to invoke this when I'm generating binaries as a library), but I don't want to hand out a green check without @arsenm or someone else who's more involved in the intricacies of how LLVM gets called for GPU compilations signing off, unless we can't get a hold of anyone.

fmorac updated this revision to Diff 549326.Aug 11 2023, 3:59 AM

Rebasing.

mehdi_amini accepted this revision.Aug 11 2023, 10:50 AM

We're not committing to a stable API here, it's a first version that we need to plumb together and learn from.
If we want to be conservative we can just remove every flag and add them later when @arsenm can provide more guidance, otherwise that seems fine to me to land and iterate in tree.

This revision is now accepted and ready to land.Aug 11 2023, 10:50 AM
krzysz00 accepted this revision.Aug 11 2023, 10:56 AM

Works for me

arsenm added inline comments.Aug 11 2023, 10:58 AM
mlir/lib/Target/LLVM/ROCDL/Target.cpp
435

don't need the flushes and can use one dbgs()?

mlir/unittests/Target/LLVM/SerializeROCDLTarget.cpp
157

ASSERT_FALSE(object->empty())?

fmorac updated this revision to Diff 549492.Aug 11 2023, 12:34 PM
fmorac marked 2 inline comments as done.

Addressed the comments.

This revision was landed with ongoing or failed builds.Aug 11 2023, 12:44 PM
This revision was automatically updated to reflect the committed changes.