This is an archive of the discontinued LLVM Phabricator instance.

[mlir][GPU] Use StructAttr to drive lowering from loop.parallel to gpu.launch
ClosedPublic

Authored by mravishankar on Mar 13 2020, 2:44 PM.

Details

Summary

Current implementation of lowering from loop.parallel to gpu.launch
uses a DictionaryAttr to specify the mapping. Moving this attribute to
be auto-generated from specification as a StructAttr. This simplifies
a lot the logic of looking up and creating this attribute.

Diff Detail

Event Timeline

mravishankar created this revision.Mar 13 2020, 2:44 PM
Herald added a project: Restricted Project. · View Herald Transcript
mravishankar retitled this revision from [mlir][GPU] Use StructAttr to driver lowering from loop.parallel to gpu.launch to [mlir][GPU] Use StructAttr to drive lowering from loop.parallel to gpu.launch.Mar 13 2020, 2:46 PM
mravishankar added reviewers: ftynse, antiagainst.
mravishankar added a subscriber: hanchung.

Fixing typo in CMakeLists

Fixing test failures

herhut requested changes to this revision.Mar 16 2020, 4:04 AM

Thanks for cleaning this up. With comments addresses, I'd be happy to see this land.

mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h
56

This is not illegal.

mlir/include/mlir/Dialect/GPU/ParallelLoopMapperAttr.td
41

ParallelLoopDimMappingAttr as it describes the mapping and not the mapper.

mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp
51

Why is this invalid? You could map to the same processor but use ceilDiv and modulo in the map attribute to decompose the bound again.

83–88

I do not understand this comment.

mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir
312

Now the error message no longer corresponds to what is encoded in the attribute. If I search for BLOCKY in the input I would not find it.

This revision now requires changes to proceed.Mar 16 2020, 4:04 AM
mravishankar marked 4 inline comments as done.

Addressing comments

mravishankar marked 4 inline comments as done.

Adding missing newline.

mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h
56
mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp
51
83–88

Maybe this comment is misplaced (I should move it below). At line 136 below, the 0th-induction variable is mapped to processor x, and 1th-induction variable to processor y, etc. Typically the 0th induction variable is the "outer" parallel loop. The 1th induction variable is the next inner, etc. There is no strong reason for it, but typically the inner-most parallel loop is also used to access the data in stride 1 (in elementwise operations for example). So a default mapping can just try to handle this common case. There are cases where this doesnt work obviously, and maybe a more general mechanism to control which dimensions maps to which processor dimension is useful. Is that what the mapping and bound are expected to do?

mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir
312

I think this is an issue with the parsing of custom StructAttr while roundtripping. Its implemented as an IntegerAttr. So while printing and parsing it will use the enum value. If you adapt the loop.parallel parser/printer to custom parse the mapping attribute you can make it accept keywords like "BLOCKX", "BLOCKY", etc. and print it out that way as well. Then the error message here would match up well.
For now I reverted the error message to print it as it was before. I can adapt the parser/printer to rountrip the processor value as something more meaningful. (I can add that to this change itself, but it would be a breaking change and not an NFC). Your call.

antiagainst added inline comments.Mar 16 2020, 6:15 PM
mlir/include/mlir/Dialect/GPU/ParallelLoopMapperAttr.td
21

I'm wondering the convention here. Why all cap letters? BlockX, ThreadY/Sequential/etc. seems better to me.

mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp
526

Isn't this just static_cast<unsigned>(processor) excluding gpu::Processor::SEQUENTIAL? Do we need this explicit switch here?

ftynse accepted this revision.Mar 23 2020, 12:43 PM
ftynse added inline comments.
mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h
55

Nit: ploopOp does not refer to anything in the code

56

Add a TODO for Stephan to support this case in the pipeline and update the check? :)

mlir/include/mlir/Dialect/GPU/ParallelLoopMapperAttr.td
21

+1 for CamelCase. I used the same in the LLVM dialect for enumerants.

mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp
49–50

You have getProcessor for this

mravishankar marked 7 inline comments as done.

Addressing comments.

mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp
526

I am very uncomfortable relying on the enum value implicitly (especially mapping between two separate "enums" based on enum value). If one of the enums change (due to adding new enums), then it can lead to weird errors that can be hard to track down. Plus this is more readable IMO. Worth the cost of a switch-case?

mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp
49–50

Oh yes! Thanks!

Fixing build changes

Rebase on top-of-tree

This revision was not accepted when it landed; it landed in state Needs Review.Mar 24 2020, 4:38 PM
This revision was automatically updated to reflect the committed changes.