This is an archive of the discontinued LLVM Phabricator instance.

[mlir][GPU] Expose the functionality to create a gpu.GPUFuncOp from a gpu.GPULaunchOp
ClosedPublic

Authored by mravishankar on Feb 27 2020, 11:56 AM.

Details

Summary

The current setup of the GPU dialect models both the host- and
device-side codegen. For some clients (like IREE) the host-side
modeling might not directly fit their use case, but the device-side
codegen is still valuable. A first step in accessing just the
device-side functionality of the GPU dialect is to allow creating a
gpu.func operation from a gpu.launch operation. In addition, this
change "inlines" operations into the gpu.func op at creation time
instead of leaving that as a later step.
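
For illustration, here is a minimal sketch of how a client might drive just the device-side functionality with the two functions this change exposes. The helper name outlineForDevice is hypothetical, and the header paths and signatures are assumptions inferred from the review discussion, not copied from the diff:

#include "mlir/Dialect/GPU/GPUDialect.h"
#include "mlir/Dialect/GPU/Passes.h"

using namespace mlir;

// Hypothetical driver: outline a single gpu.launch into a gpu.func for
// device-side codegen only.
static gpu::GPUFuncOp outlineForDevice(gpu::LaunchOp launchOp,
                                       StringRef kernelFnName,
                                       SmallVectorImpl<Value> &operands) {
  // First pull cheap defining ops (constants, dim) into the launch
  // region so they do not become extra kernel arguments.
  if (failed(sinkOperationsIntoLaunchOp(launchOp)))
    return nullptr;
  // Then outline the launch region into a gpu.func; `operands` is
  // filled with the values that must be passed when launching the
  // kernel.
  return outlineKernelFunc(launchOp, kernelFnName, operands);
}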

Diff Detail

Event Timeline

mravishankar created this revision. Feb 27 2020, 11:56 AM
Herald added a project: Restricted Project.
mravishankar added a subscriber: hanchung.
mehdi_amini added inline comments. Feb 27 2020, 10:26 PM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
136

Please don't use "inline" for anything other than the inliner.

What about "sink"?

139

Nit: don't use auto when it does not improve readability (line 93 below is explicit, for instance).

141

This whole sinking transformation does not seem safe in general: this should check *legality* rather than "benefit".
Also it isn't clear to me why this is done during the outlining and not as a pre-pass. The launch operation with the region abstraction seems perfectly suited to model this. I'd rather have this exposed in a separate API / as a separate step.

ftynse requested changes to this revision. Feb 28 2020, 1:45 AM

Please check the comments around the code you are modifying/moving; some of them no longer describe what the code does after your changes.

mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
34–35

Please update the comment to describe the new API

141

This whole sinking transformation does not seem safe in general: this should check *legality* rather than "benefit".

The function just seems misnamed; it should be something like shouldSink, because it mixes validity and benefit. In practice, it only returns true for constant and dim operations that don't have side effects.
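
A minimal sketch of such a combined predicate, assuming the shouldSink name suggested here; the body is illustrative, using 2020-era standard-dialect op names:

// Illustrative only: combines legality (no side effects, so the op may
// safely be duplicated into the launch region) with benefit (the op is
// cheap to rematerialize). Only constants and dims qualify.
static bool shouldSink(Operation *op) {
  return isa<ConstantOp>(op) || isa<DimOp>(op);
}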

180

Will this work for blocks whose dominance relation is the inverse of their textual order? E.g.

^entry:
  br ^bb2
^bb1:
  "use"(%0) : (index) -> ()
  return
^bb2:
  %0 = "def"() : () -> (index)
  br ^bb1

181

This no longer removes the arguments, but rather updates the map.

193

Will this update the users of the inlinedOps? I don't see the map updated anywhere.

This revision now requires changes to proceed. Feb 28 2020, 1:45 AM
herhut added inline comments. Feb 28 2020, 7:41 AM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

This whole sinking transformation does not seem safe in general: this should check *legality* rather than "benefit".

Well, it should check both. You do not want to move all legal operations either :)

Also it isn't clear to me why this is done during the outlining and not as a pre-pass. The launch operation with the region abstraction seems perfectly suited to model this. I'd rather have this exposed in a separate API / as a separate step.

The reasons are purely historical. Not long ago, gpu.launch was closed from above, so this transformation was done when moving to function form. I have a separate pass for this in a local client, which I can send out next week; it just needs tests.

It was implemented as a "post transformation" to the outlining and I would prefer if we do not mix it into the outlining transformation itself. When written separately, the transformations are trivial.

mehdi_amini added inline comments. Feb 28 2020, 9:21 AM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

It was implemented as a "post transformation" to the outlining and

Pre-outlining seems easier to manage, because it stays at the region level rather than becoming inter-procedural (and it can also be kept a function pass).

I would prefer if we do not mix it into the outlining transformation itself. When written separately, the transformations are trivial.

Seems like we're in agreement :)

mravishankar marked 6 inline comments as done.

Addressing comments and changing the way the cloning of the region of
the gpu.launch operation is done.

mravishankar marked 4 inline comments as done. Feb 28 2020, 3:29 PM
mravishankar added inline comments.
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
136

Changed to "sink" and updated all variables names.

141

A pre-pass is fine, but I think it would be better to leave it here. Eventually, it would be good if all transformations could be expressed as a pattern match and rewrite. This "outlining" is essentially converting a gpu.launchOp to a gpu.launchFuncOp. If you need to have a separate pass to sink the instructions, then it breaks the ability of going from loops -> GPU -> NVVM/SPIR-V. I am not saying anybody does this today (we are not doing this in IREE), but in general it seems like it would be beneficial to have transformations as patterns, and passes as just a light-weight wrapper around patterns.

Re: being able to keep it as a function pass: that is related to where the gpu.module is created. As set up right now, it is put outside of the function that the gpu.launch operation lives in. That's a very specific choice, and it would be very useful to allow "clients" of the outlining to decide where to put the gpu.module.

180

Thanks for pointing this out. I updated the method to use cloneInto, which handles this case (see the comment on the changes to cloneInto; I can make the change here if you think that is reasonable).

193

It is done within the clone(map) call. The results of the cloned operation are added to the map as well.
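
For reference, a sketch of the mechanism being described, assuming MLIR's 2020-era Operation::clone(BlockAndValueMapping &) API; opsToSink and builder are illustrative names:

// The map starts out with any remappings already established (e.g.
// launch operands to kernel arguments).
BlockAndValueMapping map;
// Visit ops so that defs precede uses (not necessarily textual order).
for (Operation *op : opsToSink) {
  // clone(map) substitutes operands of `op` that are present in `map`
  // and records the clone's results in `map`, so users cloned later
  // automatically refer to the cloned values.
  builder.insert(op->clone(map));
}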

mravishankar marked 3 inline comments as done.

Fixes minor typos

antiagainst added inline comments. Feb 28 2020, 3:53 PM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
185

We are only using the "set" part here, right? Just use a set data type?

Fix failing test

mehdi_amini requested changes to this revision. Feb 28 2020, 10:21 PM
mehdi_amini added inline comments.
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

If you need to have a separate pass to sink the instructions, then it breaks the ability of going from loops -> GPU -> NVVM/SPIR-V.

I don't understand what you mean; can you elaborate?

but in general it seems like it would be beneficial to have transformations as patterns, and passes as just a light-weight wrapper around patterns.

This is mixing an optimization into an unrelated transformation: it just does not belong here IMO.

Re: being able to keep it as a function pass: that is related to where the gpu.module is created.

I don't know what you mean, or how it answers the point about the function pass right now.

This revision now requires changes to proceed. Feb 28 2020, 10:21 PM
mravishankar marked an inline comment as done. Mar 2 2020, 10:45 AM
mravishankar added inline comments.
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

Re: separate pass to sink instructions.

The Dialect Conversion framework is designed to go from A -> B -> C. If I want to target SPIR-V/NVVM from the Linalg dialect via the Loop and GPU dialects (i.e. Linalg -> Loops -> GPU -> SPIR-V/NVVM), I can add all the patterns for the conversion into the dialect conversion framework. Currently Loops to GPU dialect is not exposed as a conversion pattern; GPU to SPIR-V is. Adding extra steps as a "pre-condition" will limit the ability to do the entire conversion using the dialect conversion framework (which is what it is built for). You could add the "sinking" as a canonicalization pattern, but it seems to me this sinking is useful only when the gpu.launch region is outlined to create a gpu.func operation. So doing the sinking during the conversion makes sense to me.
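
A rough sketch of the single-conversion setup being argued for, using 2020-era conversion APIs; populateLoopsToGPUPatterns is hypothetical (that conversion was not exposed as patterns at the time) and the GPU-to-SPIR-V populate call is approximate:

OwningRewritePatternList patterns;
populateLoopsToGPUPatterns(patterns, context);                // hypothetical
populateGPUToSPIRVPatterns(context, typeConverter, patterns); // approximate

ConversionTarget target(*context);
target.addLegalDialect<spirv::SPIRVDialect>();
if (failed(applyPartialConversion(module, target, patterns)))
  signalPassFailure();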

Re: function pass vs module pass

The current setup of the gpu.launch to gpu.launch_func conversion creates a gpu.module that is inserted just after the function the gpu.launch is in. This makes it a module pass, and this behavior is only relevant for the CUDA/NVVM side of things. For IREE, we are only interested in the device side for now, so we could make this a function pass if we could control where the gpu.module is inserted. See this dependent PR in IREE that uses this change and makes the conversion of gpu.launch to gpu.func a function pass.

mravishankar marked an inline comment as done. Mar 2 2020, 1:35 PM
mravishankar added inline comments.
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

@mehdi_amini: Update on the function pass vs module pass question. You were right that the outlining can only be a module pass, since the gpu.module also has a symbol and so needs to be added to a module (or an op with a symbol table). So I was wrong about that. I updated the PR shown above to be a module pass as well, but FYI there was no assert when I did it as a function pass.

Just filling in some details from an offline discussion: it is true that the sinking could be done as a pre-pass; if so, then it is a separate "pre-processing" pass. It is unclear if the sinking can be expressed as a pattern.

mehdi_amini added inline comments. Mar 2 2020, 8:13 PM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

(typed the following this morning before your last comment, but didn't click send)

The Dialect conversion framework is designed to go from A -> B -> C.

Yes, but if you just say this, you can shoehorn anything in there: you could use this mental model to go from Swift SIL to x86 assembly in a single "legalize()" call, and I don't think that is a good use of the framework.
We should use the lowering framework where it makes sense and where it is *the* way to solve a problem. If your problem fits into the pass pipeline, then why not start there? That is the most natural way of thinking about a chain of transformations.

the ability to do the entire conversion using the dialect conversion framework (which is what it is built for).

I disagree that this is what it is built for. I think this is a misconception of what the framework is intended to solve.
If you can express a pass pipeline where you want to do A->C as a logical sequence of A->B and then B->C, where B is an "interesting level of abstraction", I believe these should be separate passes.

If we take, for example, the HLO -> SPIR-V pipeline, we can likely identify logical stages like HLO -> Loops, Loops -> GPU Kernel, and GPU Kernel -> SPIR-V. These stages are fully disjoint as far as I can tell, and there is no immediate benefit to combining them into a single lowering. On the contrary: if these stages are well separated, that provides the opportunity for passes to run on each intermediate level of abstraction (including some generic things like canonicalization or CSE), and it also allows more reusable building blocks (Loops -> GPU Kernel can be reused even when you don't come from HLO). It also forces testing at every level and helps compiler engineers keep a mental model in which we can reason about these stages and how they compose independently.
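
A sketch of such a staged pipeline with MLIR's PassManager; the stage pass-creation names are hypothetical placeholders, while createCanonicalizerPass and createCSEPass are real passes:

PassManager pm(&context);
pm.addPass(createLowerHLOToLoopsPass());    // hypothetical
pm.addPass(createCanonicalizerPass());      // generic cleanup per stage
pm.addPass(createCSEPass());
pm.addPass(createLoopsToGPUKernelPass());   // hypothetical
pm.addPass(createConvertGPUToSPIRVPass());  // approximate name
LogicalResult result = pm.run(module);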

Updating the patch to expose the sinking of instructions into the
launch op as a separate function.

mravishankar marked an inline comment as done. Mar 3 2020, 2:05 AM
mravishankar added inline comments.
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
141

Thanks @mehdi_amini for that overview. I think what you say makes sense and is a good rule of thumb (probably good to add it somewhere in the rationale docs).

Going back to the change at hand: I modified the patch to expose the "sinking" transformation as a separate utility function in the GPU dialect. PTAL; to me this seems more complex, but if this is along the lines of what is being recommended here, I can work with it for my use case.

mehdi_amini added inline comments. Mar 3 2020, 9:39 PM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
236

Note: this is still not decoupled from this pass right now (i.e. not tested in isolation, etc.): we still have "outlining" and "sinking" as parts of the same pass. Can't they be separated?

ftynse accepted this revision. Mar 4 2020, 4:33 AM

I don't have further objections other than a bunch of nits. This patch intends to expose _functions_ and can land as is. Refactoring those functions into separate (test) passes is okay for a follow-up IMO.

mlir/include/mlir/Dialect/GPU/Passes.h
21

struct, otherwise you'll break Windows builds

53

Bikeshed nit: sinkOperationsIntoLaunchOp. "Instructions" aren't a thing in MLIR.

mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
64

Bikeshed nit: operands confused me into thinking it referred to the _existing_ launchOp operands, not "values that might become operands if sinking is beneficial".

105

How about iterating over uses rather than users?

for (OpOperand &use : result.value().getUses()) {
  if (use.getOwner()->getParentOfType<gpu::LaunchOp>() == launchOp)
    use.set(replacement);
}

112

Can this happen in valid IR? If not, I would rather assert. Otherwise, please drop the trivial braces.

117

This comment looks outdated

234

Sinking already reports an error, no need to add another one IMO.

251

Nit: Please drop trivial braces

mravishankar marked 9 inline comments as done.

Addressing comments

mravishankar added inline comments. Mar 4 2020, 2:01 PM
mlir/include/mlir/Dialect/GPU/Passes.h
21

Thanks! Interesting that the buildbot passed.

53

Good point. I had conflated the two mentally because of that.

mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
112

I think so, but I am not sure. Will leave it as an error and remove the braces.

236

They are separate functions. I have no visibility into the clients of the pass, so if any user of the pass is relying on the sinking happening, then removing it would "potentially" break them. One could argue that this would be incorrect usage, since the gpu.launch_func op gets updated accordingly, but at this point I would rather keep this change NFC.

mehdi_amini added inline comments. Mar 4 2020, 5:19 PM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
236

Keeping it NFC is a very good point!

(What would break here is an optimization and not a correctness issue, right? So in principle we could still do it?)

mehdi_amini accepted this revision. Mar 4 2020, 5:21 PM
mehdi_amini added inline comments.
mlir/include/mlir/Dialect/GPU/Passes.h
33

Nit: this is the only *pass* exposed in a header called Passes.h.
Can you split this header into a Utils.h?

This revision is now accepted and ready to land. Mar 4 2020, 5:21 PM
mravishankar marked an inline comment as done.

Moving GPU utility functions into Utils.h

mravishankar added inline comments. Mar 4 2020, 10:35 PM
mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp
236

Yes, it would indeed be an optimization thing and not a correctness thing.

This revision was automatically updated to reflect the committed changes.