This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Update GPU translation to accept binaries.
ClosedPublic

Authored by fmorac on Jun 29 2023, 2:12 PM.

Details

Summary

Commit message

Modifies the GPU translation to accept GPU binaries, embedding them using the
object manager interface method embedBinary, and to accept kernel launch
operations, translating them using the interface method launchKernel.

Depends on D154152

Explanation

Summary:
These patches aim to replace the current GPU compilation infrastructure, with extensibility and minimal future disruption as the primary goals.
The biggest updates performed by these patches are:

  • The introduction of target attributes. These attributes handle the compilation of GPU modules into binary strings and can be implemented by any dialect, leaving downstream users the option of implementing their own serializations.
  • The introduction of the GPU binary operation. This operation stores GPU objects for different targets and can be invoked by gpu.launch_func.
  • Making gpu.binary & gpu.launch_func translatable to LLVM IR, with the translation being controlled by Object Manager attributes.
  • The introduction of the gpu-module-to-binary pass. This pass serializes GPU modules into GPU binaries, using the GPU targets available in the module.
  • The introduction of the #gpu.select_object attribute as the default object manager. It selects a single object for embedding in the IR; by default it selects the first object.

These patches leave the current infrastructure in place, allowing for a migration period for downstream users.

Examples:

  • GPU modules using target attributes:
gpu.module @my_module [#gpu.nvptx<chip = "sm_90">, #gpu.amdgpu, #gpu.amdgpu<chip = "gfx90a">] {
...
}
  • Applying the gpu-module-to-binary pass:
gpu.module @my_module [#gpu.nvptx<chip = "sm_90">, #gpu.amdgpu] {
...
}
// mlir-opt --gpu-module-to-binary
gpu.binary @my_module [#gpu.object<#gpu.nvptx<chip = "sm_90">, "BINARY DATA">, #gpu.object<#gpu.amdgpu, "BINARY DATA">]
  • Choosing the #gpu.amdgpu object for embedding:
gpu.binary @my_module <#gpu.select_object<#gpu.amdgpu>> [#gpu.object<#gpu.nvptx<chip = "sm_90">, "BINARY DATA">, #gpu.object<#gpu.amdgpu, "BINARY DATA">]
// It's also valid to pass the index of the object.
gpu.binary @my_module <#gpu.select_object<1>> [#gpu.object<#gpu.nvptx<chip = "sm_90">, "BINARY DATA">, #gpu.object<#gpu.amdgpu, "BINARY DATA">]
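  • Launching a kernel from the embedded binary with gpu.launch_func (a hedged sketch: the kernel name @my_kernel, the %c1 launch dimensions, and the f32 argument %arg are made up for illustration and are assumed to be defined earlier in the function):
// %c1 and %arg are assumed to be previously defined SSA values.
gpu.launch_func @my_module::@my_kernel
    blocks in (%c1, %c1, %c1)
    threads in (%c1, %c1, %c1)
    args(%arg : f32)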

Testing:
This infrastructure was tested on two systems, one with an NVIDIA V100 and the other with an AMD MI250X; in both cases the tests completed successfully.

Input files:

  1. Steps for assembling the test for the NVIDIA system:
mlir-opt --gpu-to-llvm --gpu-module-to-binary test_nvvm.mlir | mlir-translate --mlir-to-llvmir -o test_nvptx.ll
clang++ test_nvptx.ll test.cpp -l

Output file: test_nvptx.ll

  2. Steps for assembling the test for the AMD system:
mlir-opt --gpu-to-llvm --gpu-module-to-binary test_rocdl.mlir | mlir-translate --mlir-to-llvmir -o test_amdgpu.ll
clang++ test_amdgpu.ll test.cpp -l

Output file: test_amdgpu.ll
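
For reference, the sketch below shows a stripped-down shape of what an input like test_nvvm.mlir could contain; the chip, module name, and empty kernel are illustrative only and are not the actual input file used for these tests:

// Illustrative only: a GPU module with an NVPTX target attribute and an empty kernel.
gpu.module @kernels [#gpu.nvptx<chip = "sm_70">] {
  gpu.func @kernel() kernel {
    gpu.return
  }
}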

Diff list

The following patches implement the proposal described in https://discourse.llvm.org/t/rfc-extending-mlir-gpu-device-codegen-pipeline/70199/54:

  • D154098: Add a GlobalSymbol trait.
  • D154097: Add a parameter for passing default values to StringRefParameter
  • D154100: Adds an utility class for serializing operations to binary strings.
  • D154104: Add GPU target attribute interface.
  • D154113: Add target attribute to GPU modules.
  • D154117: Adds the NVPTX target attribute.
  • D154129: Adds the AMDGPU target attribute.
  • D154108: Add the GPU object manager attribute interface.
  • D154132: Add gpu.binary op and #gpu.object attribute.
  • D154137: Modifies gpu.launch_func to allow lowering it after gpu-to-llvm.
  • D154147: Add the Select Object compilation attribute.
  • D154149: Add the gpu-module-to-binary pass.
  • D154152: Add GPU target support to gpu-to-llvm.

Diff Detail

Event Timeline

fmorac created this revision. Jun 29 2023, 2:12 PM
Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.
fmorac updated this revision to Diff 536082. Jun 29 2023, 6:14 PM

Rebasing.

fmorac edited the summary of this revision. Jun 29 2023, 7:03 PM
fmorac edited the summary of this revision. Jun 30 2023, 6:25 AM
fmorac published this revision for review. Jun 30 2023, 6:30 AM
Matt added a subscriber: Matt. Jun 30 2023, 4:36 PM
fmorac updated this revision to Diff 543218. Jul 22 2023, 12:12 PM
fmorac edited the summary of this revision.

Rebasing.

Something that would be welcome here is to create a new entry in docs/ to explain the GPU lowering/translation/codegen/embedding flow, can you add that?

mlir/test/Target/LLVMIR/gpu.mlir
4

Is this attribute necessary here?

10

STURCT -> STRUCT?

10

Also can you add a line before the block of CHECK saying what is checked here?

Something that would be welcome here is to create a new entry in docs/ to explain the GPU lowering/translation/codegen/embedding flow, can you add that?

Yes, I'll create a new patch just for docs. There are actually a couple more patches not in this series; the idea behind this series was to agree on the base concept & implementation. But I'll add the docs and then modify them if necessary.

mlir/test/Target/LLVMIR/gpu.mlir
10

I'll add it.

10

I don't understand, what do you mean?

mehdi_amini added inline comments. Jul 24 2023, 11:07 PM
mlir/test/Target/LLVMIR/gpu.mlir
10

Maybe STURCT is intentional (I thought it was a typo), but I don't know what it means?

fmorac added inline comments. Jul 25 2023, 5:09 PM
mlir/test/Target/LLVMIR/gpu.mlir
10

Oh, yes, struct is the variable holding the struct with the args. I'll change the name & document it.

fmorac updated this revision to Diff 547752. Aug 7 2023, 5:52 AM
fmorac marked 4 inline comments as done.

Changed the name of the arguments & added comments indicating the purpose of the tests.

mehdi_amini accepted this revision. Aug 8 2023, 10:25 PM
mehdi_amini added inline comments.
mlir/lib/Target/LLVMIR/Dialect/GPU/CMakeLists.txt
15

Why private here?

mlir/lib/Target/LLVMIR/Dialect/GPU/GPUToLLVMIRTranslation.cpp
47

Why is gpu.module a no-op success?

This revision is now accepted and ready to land. Aug 8 2023, 10:25 PM
fmorac added inline comments. Aug 9 2023, 6:02 PM
mlir/lib/Target/LLVMIR/Dialect/GPU/CMakeLists.txt
15

I'll change it; it's a leftover from a previous version.

mlir/lib/Target/LLVMIR/Dialect/GPU/GPUToLLVMIRTranslation.cpp
47

In trunk it's also a no-op success; I presume the reason is that there's nothing to translate.

mehdi_amini added inline comments. Aug 9 2023, 10:17 PM
mlir/lib/Target/LLVMIR/Dialect/GPU/GPUToLLVMIRTranslation.cpp
47

Not clear to me why:

module {
  gpu.module @foo {
  }
}

shouldn't return a failure instead?

fmorac updated this revision to Diff 549338 (edited). Aug 11 2023, 4:20 AM
fmorac marked an inline comment as done.

Removed private lib.

gpu.module is a no-op success because the translation mechanism is not aware of its context.

For example, when converting gpu.module it's not possible to distinguish between translating:

module {
  gpu.module @foo {
  }
}

and translating:

gpu.module @foo {
}

Both will invoke the same function with the same information.
It should be possible to distinguish the context by checking whether the ops inside the module
have been converted or not; however, that's an expensive check.