This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Introduce device mapper attribute for `thread_dim_map` and `mapped to dims`
ClosedPublic

Authored by guraypp on Nov 4 2022, 5:25 AM.

Details

Summary

scf.foreach_thread defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.

scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}

It first introduces a DeviceMappingInterface which is an attribute interface. scf.foreach_thread defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.

The change also introduces two new attributes (#gpu.thread<x/y/z> and #gpu.block<x,y,z> ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.

scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}

Diff Detail

Event Timeline

guraypp created this revision.Nov 4 2022, 5:25 AM
guraypp requested review of this revision.Nov 4 2022, 5:25 AM
Herald added a project: Restricted Project. · View Herald Transcript
This comment was removed by guraypp.
guraypp edited the summary of this revision. (Show Details)Nov 4 2022, 5:32 AM

Please move DeviceMappingInterface to SCF.

ftynse added a comment.Nov 4 2022, 1:55 PM

This goes into the right direction IMO. I don't have a strong preference on where the interface should live. Putting it in SCF makes it a bit more "private", not sure we have layering issues with the GPU dialect depending on SCF for these interface purposes.

mlir/include/mlir/Dialect/GPU/TransformOps/GPUDeviceMapper.td
1 ↗(On Diff #473199)

The naming is weird. "Mapper" implies there is a process of mapping. I'd rather call this GPUDeviceMappingAttr.td.

49–60 ↗(On Diff #473199)

This is unused so let's not commit it.

mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
485

Can't we keep the default value here?

485

Also, we can use TypedArrayAttrBase here to specify that only DeviceMappingAttr are accepted.

mlir/include/mlir/Interfaces/DeviceMappingInterface.h
1 ↗(On Diff #473199)

Nit: 80 cols.

mlir/include/mlir/Interfaces/DeviceMappingInterface.td
1 ↗(On Diff #473199)

Nit: 80 cols

26 ↗(On Diff #473199)

???

mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
172

Nit: expand auto unless the type is obvious from line context.

177

Nit: error messages must start with a small letter, unlike comments. Same above.

357

Nit: prefer C++-style casts.

mlir/lib/Dialect/SCF/IR/SCF.cpp
1118

Debug leftover?

mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
1148–1156

Can't we just use a foreachThreadOp.getMapping().value_or({}) for the last argument instead? If not, wrap multi-line bodies into braces.

mlir/lib/Dialect/Utils/StaticValueUtils.cpp
70–74 ↗(On Diff #473199)

llvm::to_vector(arrayAttr.getAsRange<Attribute>()) is shorter and likely even removes the need for this function entirely. Just use that at the call sites.

mlir/lib/Interfaces/DeviceMappingInterface.cpp
1–2 ↗(On Diff #473199)

This should have been one line.

mlir/test/Dialect/GPU/transform-gpu-failing.mlir
276

Please add the newline.

herhut added a comment.Nov 7 2022, 5:58 AM

I agree that this goes in the right direction, however, I am unsure what this direction is? Is there a writeup of where this should be going? That would make reviewing these changes easier.

Also, it would be nice to unify this with the mapping attributes on the scf.parallel operation, so that we have one way of doing this. Having a different encoding was fine for the initial experimentation but now that this turns into an interface, we should ensure it works with other scf operations, as well.

guraypp marked 14 inline comments as done.Nov 7 2022, 8:40 AM
guraypp added inline comments.
mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
485

I removed this because we don't have a meaningful default value.
I can introduce a new DeviceMappingAttr like #scf.not_mapped and make this one default value of the $mapping as a next work.

guraypp edited the summary of this revision. (Show Details)Nov 7 2022, 8:41 AM
guraypp marked an inline comment as done.Nov 7 2022, 8:47 AM

I agree that this goes in the right direction, however, I am unsure what this direction is? Is there a writeup of where this should be going? That would make reviewing these changes easier.

Also, it would be nice to unify this with the mapping attributes on the scf.parallel operation, so that we have one way of doing this. Having a different encoding was fine for the initial experimentation but now that this turns into an interface, we should ensure it works with other scf operations, as well.

When I originally implemented D136851, which just changed mappings from integer to string, I thought it is NFC. so we don't have a write-up. The idea for attribute interface originated in D136851 so I created this revision. I added more explanation to make things clear.

I think it is a good idea to make the mapping attribute of scf.parallel to implement DeviceMappingAttrInterface I can work on that as my next task.

guraypp updated this revision to Diff 473700.Nov 7 2022, 8:52 AM

address ftynse comments

rriddle requested changes to this revision.Nov 7 2022, 3:42 PM

Please move DeviceMappingInterface to SCF.

+1 here. This looks very SCF specific at this point.

I'm also curious on what the direction is.

mlir/lib/IR/Builders.cpp
12 ↗(On Diff #473700)

This looks unnecessary.

This revision now requires changes to proceed.Nov 7 2022, 3:42 PM

I agree that this goes in the right direction, however, I am unsure what this direction is? Is there a writeup of where this should be going? That would make reviewing these changes easier.

This a refactoring related to https://discourse.llvm.org/t/rfc-parallel-abstraction-for-tensors-and-buffers/62607 and the followup thread_dim_mapping attribute that was added as we implemented the transforms for e2e execution.
We have now reached the limits of the array of integers encoding, which this PR addresses.

Also, it would be nice to unify this with the mapping attributes on the scf.parallel operation, so that we have one way of doing this. Having a different encoding was fine for the initial experimentation but now that this turns into an interface, we should ensure it works with other scf operations, as well.

+1 this would be a nice followup.

Thanks for addressing @ftynse 's comments.
Please move the interface to SCF and then we can iterate.

guraypp updated this revision to Diff 474047.Nov 8 2022, 10:48 AM

Moved DeviceMappingInterface to SCF.

guraypp edited the summary of this revision. (Show Details)Nov 8 2022, 10:49 AM
nicolasvasilache accepted this revision.Nov 8 2022, 1:46 PM

Thanks @guraypp, Our existing transformations should not be much more intuitive to use.!

(I'll defer LGTM to others now)

mlir/include/mlir/Dialect/SCF/IR/DeviceMappingInterface.h
2

Cast Interfaces for MLIR?

mlir/include/mlir/Dialect/SCF/IR/DeviceMappingInterface.td
2

Data layout interfaces?

guraypp updated this revision to Diff 474199.Nov 9 2022, 2:07 AM

Address @rriddle comments, rebase, add description to gpu device mapping attributes

guraypp updated this revision to Diff 474213.Nov 9 2022, 3:16 AM

bazel fix

guraypp updated this revision to Diff 474226.Nov 9 2022, 4:52 AM

minor: fix comments

ftynse accepted this revision.Nov 10 2022, 8:02 AM
ftynse added inline comments.
mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
362

Nit: please expand auto systematically unless the type is obvious from statement context (the RHS is a constructor or a cast) or obnoxious/impossible to spell (lambdas, some iterators).

395–396

Nit: this rewrapping looks spurious.

guraypp updated this revision to Diff 474560.Nov 10 2022, 8:51 AM

address @ftynse comments

guraypp edited the summary of this revision. (Show Details)Nov 10 2022, 8:52 AM
guraypp edited the summary of this revision. (Show Details)Nov 10 2022, 8:55 AM
This revision was not accepted when it landed; it landed in state Needs Review.Nov 10 2022, 11:45 PM
This revision was automatically updated to reflect the committed changes.