This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Add the Select Object compilation attribute.
ClosedPublic

Authored by fmorac on Jun 29 2023, 1:37 PM.

Details

Summary

For an explanation of these patches see D154153.

Commit message:
This patch adds the default offloading handler for GPU binary ops, #gpu.select_object.
It selects the object to embed based on an index or a target attribute, embeds
that object as a global string, and launches the kernel using the same scheme as the
GPU to LLVM pass.
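
As a rough illustration of "select by index or by target", the sketch below picks one entry from a list of objects. It is not the code from this patch: GpuObject, Selector, selectObject, and the target strings are hypothetical names invented for the example, and the real attribute operates on MLIR attributes rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for one object entry: a target description plus the
// binary blob that would be embedded.
struct GpuObject {
  std::string target; // illustrative spelling, e.g. "nvptx:sm_80"
  std::string blob;
};

// The selector is either a position in the object list or a target to match.
using Selector = std::variant<std::size_t, std::string>;

// Returns the index of the chosen object, or std::nullopt when the index is
// out of range or no object matches the requested target.
std::optional<std::size_t> selectObject(const std::vector<GpuObject> &objects,
                                        const Selector &selector) {
  if (const auto *index = std::get_if<std::size_t>(&selector)) {
    if (*index < objects.size())
      return *index;
    return std::nullopt;
  }
  const auto &target = std::get<std::string>(selector);
  for (std::size_t i = 0; i < objects.size(); ++i)
    if (objects[i].target == target)
      return i;
  return std::nullopt;
}
```

Returning an optional rather than silently falling back to the first object keeps a bad index or an unknown target visible to the caller; whether the actual implementation reports this as an error is not stated in the summary.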

Depends on D154137

Event Timeline

fmorac created this revision.Jun 29 2023, 1:37 PM
fmorac updated this revision to Diff 536076.Jun 29 2023, 5:48 PM

Rebasing.

fmorac edited the summary of this revision.Jun 29 2023, 6:19 PM
fmorac added a reviewer: mehdi_amini.
fmorac published this revision for review.Jun 30 2023, 6:30 AM

Could this hypothetically be used to do runtime selection - that is, if I'll have a gfx90a binary and a gfx940 binary, could I write an object selector that'll query which GPU I have at execution time and select the relevant binary? Or is that out of scope here?

With this particular attribute, no.

However, if you add the relevant runtime functions and create your own ObjectManager attribute (see D154108 for the interface), you could definitely do it. It would look something like:

gpu.binary @myobject <#mydialect.runtime_select_object> [object0, object1]

I even think it would be possible to have AMD and NVIDIA targets all packed in a single IR, and perform dispatching based on the GPU at execution time. The ObjectManager attribute interface leaves room for all of this.

That would make the compiler all but stateless, which is something I would be strongly against.

I think we were talking about something along the lines of fat binaries, for example implementing a fatbin Object Manager attribute, something like:

gpu.binary @name <#gpu.fatbin> [#gpu.object<#gpu.amdgpu<chip = "gfx90a">, "BLOB">, #gpu.object<#gpu.amdgpu<chip = "gfx940">, "BLOB">, #gpu.object<#gpu.nvptx<chip = "sm_80">, "BLOB">]

When translated down to LLVM IR, this embeds all the objects together in a single image. Instead of calling mgpuModuleLoad, we would have something like mgpuFatbinLoad and mgpuFatbinGetFunction, which load the correct function at execution time depending on the detected GPU.

The above scheme is possible with Object Manager attributes.
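
To make the fat-binary idea a bit more concrete, here is a minimal sketch of the runtime side: several blobs are kept in one image and the one matching the detected chip is looked up at execution time. Fatbin, FatbinEntry, and find are hypothetical names for this example only; no such runtime functions exist in this patch, and how the chip is detected is left out entirely.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical entry of a fat binary: the chip it was compiled for and its blob.
struct FatbinEntry {
  std::string chip; // e.g. "gfx90a", "gfx940", "sm_80"
  std::string blob;
};

// Hypothetical container holding every embedded object in a single image.
class Fatbin {
public:
  explicit Fatbin(std::vector<FatbinEntry> entries)
      : entries_(std::move(entries)) {}

  // Returns a view of the blob compiled for `detectedChip`, or std::nullopt
  // if the image contains nothing for that chip. The view stays valid only
  // while this Fatbin is alive.
  std::optional<std::string_view> find(std::string_view detectedChip) const {
    for (const FatbinEntry &entry : entries_)
      if (entry.chip == detectedChip)
        return std::string_view(entry.blob);
    return std::nullopt;
  }

private:
  std::vector<FatbinEntry> entries_;
};
```

The compiler output stays the same regardless of the machine it runs on; only the lookup depends on the device, which is the distinction drawn above between fat binaries and a compiler that queries the GPU.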

Yeah, I was thinking something more like fat binaries here.

Looks reasonable to me

fmorac updated this revision to Diff 543193.Jul 22 2023, 8:06 AM

Rebasing.

mehdi_amini added inline comments.Jul 24 2023, 12:27 AM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122

It's not clear to me when it makes sense to have multiple gpu.object when there is a select?

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122

The SelectObjectManager is just the default manager, but you could have a FatBinManager, for which it makes sense to have multiple objects.

select_object is the way it is in order to provide a method similar to gpu-binary-annotation in the existing gpu-to-llvm implementation, where you can choose the binary:

gpu.module @kernel_module attributes {
  nvvm.cubin = "CUBIN", rocdl.hsaco = "HSACO"
}
mehdi_amini added inline comments.Jul 24 2023, 11:05 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122

How would the select be used? The gpu-binary-annotation is an input to a transformation and not an IR construct.

fmorac added inline comments.Jul 25 2023, 5:15 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1122

I thought of two use cases:

  1. having a pass that lets you specify the target and changes the selected object accordingly.
  2. the option of manually editing the selected object without having to do heavy edits to a file (like removing multiple objects).

More nitpicking than actual issues - this translation seems reasonable

mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp
103

It feels weird to be declaring things into the llvm namespace from mlir/. Maybe this'd make sense in mlir::LLVM with a using?

I might also be off about how this translation code works

fmorac added inline comments.Aug 2 2023, 10:32 AM
mlir/lib/Dialect/GPU/Targets/ObjectHandler.cpp
103

The issue is conflicting names across namespaces, so I cannot have both using mlir and using llvm. Inside translation it made more sense to use llvm to fix this issue, since the code is mostly LLVM API.
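
The name clash can be reproduced in a few lines of standard C++. The sketch below uses two stand-in namespaces, front and back, that both declare a Type; they are hypothetical and only play the roles of mlir and llvm here. Bringing both in with using-directives would make the unqualified name ambiguous, so one namespace is opened and the other is reached through an alias.

```cpp
#include <cassert>
#include <string>

// Two stand-in namespaces that both declare a type with the same name.
namespace front {
struct Type {
  std::string name() const { return "front::Type"; }
};
} // namespace front

namespace back {
struct Type {
  std::string name() const { return "back::Type"; }
};
} // namespace back

namespace translation {
// Only one using-directive: adding `using namespace front;` as well would make
// the unqualified `Type` below ambiguous and the code would not compile.
using namespace back;
// The other namespace stays reachable through a short alias.
namespace fe = front;

std::string describe() {
  Type lowered;    // resolves to back::Type
  fe::Type source; // explicitly front::Type
  return source.name() + " -> " + lowered.name();
}
} // namespace translation
```

Which namespace gets the using-directive is a matter of which names dominate in the file; the comment above picks llvm because the translation code mostly calls LLVM API.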

fmorac updated this revision to Diff 547744.Aug 7 2023, 5:42 AM
fmorac edited the summary of this revision.

Moved the SelectObject implementation to the GPU translation library & added registration calls.

TODO:
The registration mechanism needs to be changed in the future, as the IR only verifies successfully if the registration call happens before verifying the IR. This issue also occurs with the NVVM & ROCDL registration mechanisms.
One solution is adding a promised flag in Tablegen that generates a separate trait without the interface methods and checks that a promise was made. Also, the promised interface mechanism needs finer granularity, so that promises can be made for specific attributes.
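
The ordering problem and the proposed relaxation can be modelled with a toy registry. This is a sketch under stated assumptions and does not mirror MLIR's actual interface or registration API: InterfaceRegistry, promise, attach, verifyStrict, and verifyWithPromise are all hypothetical names, and attributes are identified by plain strings.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy registry; every name here is hypothetical and only models the idea.
class InterfaceRegistry {
public:
  // Record that an implementation for `attr` will be attached later.
  void promise(const std::string &attr) { promised_.insert(attr); }

  // Record that the implementation for `attr` is now available.
  void attach(const std::string &attr) { attached_.insert(attr); }

  // Behaviour described in the TODO: verification succeeds only if the
  // implementation was attached before verification runs.
  bool verifyStrict(const std::string &attr) const {
    return attached_.count(attr) != 0;
  }

  // Relaxation assumed in this sketch: a promise made for this specific
  // attribute is enough for verification to succeed.
  bool verifyWithPromise(const std::string &attr) const {
    return attached_.count(attr) != 0 || promised_.count(attr) != 0;
  }

private:
  std::set<std::string> promised_;
  std::set<std::string> attached_;
};
```

Keying promises by attribute name is what gives the finer granularity asked for above: a promise for one attribute says nothing about any other attribute.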

mehdi_amini added inline comments.Aug 8 2023, 10:17 PM
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
160

Isn't it an Attribute Interface? I'm confused what it means on the dialect here? Does it provide some fallback or something?

1609

cast?

fmorac added inline comments.Aug 9 2023, 7:46 AM
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
160

Yes, it's an attribute interface. However, to avoid bundling LLVM libs into the GPUDialect lib and to be able to put the implementation of the translation in the GPU Translation To LLVM lib, I have to register the interface as a promised interface.

As far as I could tell, the promised interface mechanism doesn't provide the granularity to say that something is a promised interface for a specific attribute; it just promises an interface. That's why I listed it in the TODO in my previous comment:

The registration mechanism needs to be changed in the future, as the IR only verifies successfully if the registration call happens before verifying the IR. This issue also occurs with the NVVM & ROCDL registration mechanisms.
One solution is adding a promised flag in Tablegen that generates a separate trait without the interface methods and checks that a promise was made. Also, the promised interface mechanism needs finer granularity, so that promises can be made for specific attributes.

mehdi_amini added inline comments.Aug 9 2023, 9:52 PM
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
160

I don't understand what it means to promise this interface here actually?
What happens if we just remove this line without changing anything else?

fmorac updated this revision to Diff 549332.Aug 11 2023, 4:08 AM

Removed the promised interface; instead, use OffloadingTranslationAttrTrait in SelectObjectAttr
and register the interface automatically with the GPUTranslationInterface.

mehdi_amini accepted this revision.Aug 11 2023, 11:09 AM
This revision is now accepted and ready to land.Aug 11 2023, 11:09 AM
This revision was automatically updated to reflect the committed changes.