This is an archive of the discontinued LLVM Phabricator instance.

[mlir][spirv] Use spv.entry_point_abi in GPU to SPIR-V conversions
ClosedPublic

Authored by antiagainst on Feb 4 2020, 6:02 PM.

Details

Summary

We have spv.entry_point_abi for specifying the local workgroup size.
It should be decorated onto input gpu.func ops to drive the SPIR-V
CodeGen to generate the proper SPIR-V module execution mode. Compared
to using command-line options for specifying the configuration, using
attributes also has the benefits that 1) we are now able to use
different local workgroup sizes for different entry points and 2) the
tests contain the configuration directly.
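
For illustration only, a minimal sketch of what such a decorated kernel could look like (the module and function names, argument, and the [32, 4, 1] size are made up; the attribute follows the spv.entry_point_abi = {local_size = ...} form this patch consumes):

  // A GPU kernel annotated with the SPIR-V entry point ABI attribute.
  // The GPU-to-SPIR-V conversion reads local_size from it and emits the
  // corresponding LocalSize execution mode for the generated entry point.
  gpu.module @kernels {
    gpu.func @example_kernel(%arg0: memref<12xf32>) kernel
        attributes {spv.entry_point_abi = {local_size = dense<[32, 4, 1]> : vector<3xi32>}} {
      gpu.return
    }
  }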

Diff Detail

Event Timeline

antiagainst created this revision.Feb 4 2020, 6:02 PM

Unit tests: unknown.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

I understand what the intent is here, but the input already has an attribute that belongs to the SPIR-V dialect before lowering. That makes things a bit non-composable. In cases where someone lowers to the GPU dialect and then conditionally decides to lower to the SPIR-V dialect or the NVVM dialect, with this change on the SPIR-V side, a separate pass will be needed to add this attribute. Ideally the input should be only in the GPU dialect, whereas here it isn't.
Is it possible instead to add an attribute to the GPU dialect itself which contains information about the workgroup size? Then while lowering we can convert one attribute to another.
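
To make the suggestion concrete, a hypothetical sketch (gpu.workgroup_size below is not an existing attribute; it is a placeholder name for the proposed GPU-level attribute, and the kernel is a made-up example that would sit inside a gpu.module):

  // Hypothetical input: the workgroup size expressed with a GPU-dialect attribute.
  gpu.module @kernels {
    gpu.func @example_kernel() kernel
        attributes {gpu.workgroup_size = dense<[32, 4, 1]> : vector<3xi32>} {
      gpu.return
    }
  }
  // The GPU-to-SPIR-V lowering would then rewrite this attribute into
  // spv.entry_point_abi = {local_size = dense<[32, 4, 1]> : vector<3xi32>}.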

mravishankar requested changes to this revision.Feb 4 2020, 10:55 PM
This revision now requires changes to proceed.Feb 4 2020, 10:55 PM

I understand what the intent is here, but the input already has an attribute that belongs to the SPIR-V dialect before lowering.

Generally I think it is inevitable that we'll have attributes belonging to lower layers attached to the source input at a higher layer; for example, we will have spv.target_env and spv.interface_var_abi attributes attached to dispatch regions isolated by IREE at the HLO level. The root problem is that we cannot infer everything a lower dialect needs purely from higher dialects, and it does not make sense to create duplicates all the way up through every layer. But with that said,

That makes things a bit non-composable. In cases where someone lowers to the GPU dialect and then conditionally decides to lower to the SPIR-V dialect or the NVVM dialect, with this change on the SPIR-V side, a separate pass will be needed to add this attribute. Ideally the input should be only in the GPU dialect, whereas here it isn't.

Here we are at the boundary between the GPU dialect and the SPIR-V dialect, so it should be fine to have SPIR-V-specific information attached to the input to drive conversions towards SPIR-V. But I get your point here regarding non-composability. If the input is, say, loops, it's better to have a proper GPU dialect attribute instead of spv.entry_point_abi attached to loops for driving further conversion.

Is it possible instead to add an attribute to the GPU dialect itself which contains information about the workgroup size? Then while lowering we can convert one attribute to another.

Yeah, that makes sense to me. At the GPU level we also have such concepts, so we can have similar attributes; they are just contracts for different layers. Right now we are using a bunch of command-line options for that job; I'd love to see us switch to using attributes there too. I've created https://llvm.discourse.group/t/using-attributes-to-specify-workgroup-configuration-when-lowering-to-gpu/496 as an RFC. I view that as a layer above SPIR-V, so it's a bit separate from the changes here IMHO.

Lets go with this for now. We can clean this up when the attribute story gets fixed up.

mlir/include/mlir/Conversion/GPUToSPIRV/ConvertGPUToSPIRV.h
26

Can we leave the workgroup size as optional? If provided, it will be used to override the default of reading it from the attribute.

mlir/include/mlir/Conversion/GPUToSPIRV/ConvertGPUToSPIRVPass.h
27

Same here, make the workgroup size optional.

mlir/lib/Dialect/SPIRV/TargetAndABI.cpp
162

Can't you just do op->getParentOfType<FuncOp>() / op->getParentOfType<gpu::FuncOp>()?

herhut added a comment.Feb 6 2020, 2:41 AM

Maybe I am missing something here, but from the GPU dialect, the sizes are passed to the gpu.launch, so you can take them from there. If you want to specialize a kernel for specific sizes, you need to ensure compatible call sites, like in other function specialization. Is this more about driving upper layers of code generation so that you end up with a gpu.launch that has sizes you want? Or do you want to make gpu.func usable independent of the gpu.launch?

I think you hit upon the core issue here. We would like gpu.func to be usable independent of gpu.launch. The way I see it, gpu.func allows an arbitrary workgroup size, but for cases where the workgroup size is fixed, it is specified as an attribute on the function. For GPU to SPIR-V it is a prerequisite that the workgroup size be a constant, so the change here makes it a requirement to have the spv.entry_point_abi attribute on the gpu.func for the conversion to succeed (earlier this was implemented by passing the workgroup size as an argument at pattern construction time).
Coming to the gpu.launch issue, I think the gpu.launch semantics should require the workgroup size to be constant values if the gpu.func used has the attribute set. It would actually be better to go a bit further and make the workgroup size arguments optional: if unspecified, the function uses its constant workgroup size, and it would be illegal to specify them if the target function has a constant workgroup size. This would be fairly easy to enforce in the verification.
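
A rough sketch of the launch-side constraint being discussed (the host function, constants, and [32, 4, 1] size are made up; the point is that the thread dimensions here are compile-time constants that would have to match a callee with a fixed workgroup size):

  func @host() {
    %c1  = constant 1 : index
    %c4  = constant 4 : index
    %c32 = constant 32 : index
    // Block (workgroup) dimensions are compile-time constants here.
    gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
               threads(%tx, %ty, %tz) in (%sx = %c32, %sy = %c4, %sz = %c1) {
      // kernel body
      gpu.terminator
    }
    return
  }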

mravishankar accepted this revision.Feb 10 2020, 10:13 AM

Thanks Lei! The more I think about it, the better this approach looks.

This revision is now accepted and ready to land.Feb 10 2020, 10:13 AM

This is really helpful for mlir-vulkan-runner as well; it would be nice to rebase on it once this patch is committed.

antiagainst marked 5 inline comments as done.Feb 10 2020, 1:01 PM

@herhut: +1 to what Mahesh said. Additionally, I'd like to tighten the SPIR-V side to use attributes in general for passing in pattern configurations.

There is a mismatch between the GPU dialect and the SPIR-V side. For this one (spv.entry_point_abi = {local_size = ...}) we can probably push one layer upwards and propose a gpu.workgroup_size at the GPU dialect level; then the SPIR-V lowering can convert gpu.workgroup_size into spv.entry_point_abi = {local_size = ...}. But as explained in https://llvm.discourse.group/t/using-attributes-to-specify-workgroup-configuration-when-lowering-to-gpu/496/20, for gpu.launch with non-constant workgroup sizes we need to specify the SpecIds for them, which does not really make sense at the GPU dialect level; so likely we need the SPIR-V lowering path user (say, IREE) to attach something like spv.entry_point_abi = {local_size_spec_id = ...} before going to the GPU dialect level and pass it all the way down the stack. (You can think of this as part of the SPIR-V target in lieu of a proper target mechanism for SPIR-V. We have a SPIR-V conversion target on the SPIR-V side, but that covers different things than the ABI here. Complexities. ;-P) Again, we need to attach a SPIR-V-specific attribute to the gpu.func eventually. I'd like to have consistency between the normal constant case and the spec constant case.
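
For concreteness, the two flavors being contrasted might look as follows (the first form is what this patch consumes; the second, spec-id-based form is only the proposal sketched above, its local_size_spec_id field name is hypothetical, and the kernel names and sizes are made up):

  gpu.module @kernels {
    // Constant workgroup size: the lowering can emit the LocalSize execution mode directly.
    gpu.func @fixed_size_kernel() kernel
        attributes {spv.entry_point_abi = {local_size = dense<[32, 4, 1]> : vector<3xi32>}} {
      gpu.return
    }

    // Hypothetical spec-constant flavor: the workgroup size would be supplied
    // later via specialization constants decorated with these SpecIds.
    gpu.func @dynamic_size_kernel() kernel
        attributes {spv.entry_point_abi = {local_size_spec_id = dense<[0, 1, 2]> : vector<3xi32>}} {
      gpu.return
    }
  }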

mlir/include/mlir/Conversion/GPUToSPIRV/ConvertGPUToSPIRV.h
26

We probably don't want to do that. One of the purposes is to tighten the contract on SPIR-V lowerings and make them consistent. Having two ways would cause more confusion IMO.

mlir/lib/Dialect/SPIRV/TargetAndABI.cpp
162

I think this is simpler and more composable given this is a utility for writing SPIR-V lowerings. There may be other func ops that one would like to lower towards SPIR-V, so I think we can be a bit flexible here.

This revision was automatically updated to reflect the committed changes.
antiagainst marked 2 inline comments as done.