Current implementation of lowering from loop.parallel to gpu.launch
uses a DictionaryAttr to specify the mapping. Moving this attribute to
be auto-generated from specification as a StructAttr. This simplifies
a lot the logic of looking up and creating this attribute.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for cleaning this up. With comments addresses, I'd be happy to see this land.
mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h | ||
---|---|---|
56 | This is not illegal. | |
mlir/include/mlir/Dialect/GPU/ParallelLoopMapperAttr.td | ||
41 | ParallelLoopDimMappingAttr as it describes the mapping and not the mapper. | |
mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp | ||
51 | Why is this invalid? You could map to the same processor but use ceilDiv and modulo in the map attribute to decompose the bound again. | |
83–88 | I do not understand this comment. | |
mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir | ||
312 | Now the error message no longer corresponds to what is encoded in the attribute. If I search for BLOCKY in the input I would not find it. |
Adding missing newline.
mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h | ||
---|---|---|
56 | I can remove that check, but I was going by what is here : https://github.com/llvm/llvm-project/blob/5c261c9c452959985de19540c168b224af24e2d3/mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp#L673 | |
mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp | ||
51 | I was going by what is here : https://github.com/llvm/llvm-project/blob/5c261c9c452959985de19540c168b224af24e2d3/mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp#L673. Sorry if I misread it | |
83–88 | Maybe this comment is misplaced (I should move it below). At line 136 below, the 0th-induction variable is mapped to processor x, and 1th-induction variable to processor y, etc. Typically the 0th induction variable is the "outer" parallel loop. The 1th induction variable is the next inner, etc. There is no strong reason for it, but typically the inner-most parallel loop is also used to access the data in stride 1 (in elementwise operations for example). So a default mapping can just try to handle this common case. There are cases where this doesnt work obviously, and maybe a more general mechanism to control which dimensions maps to which processor dimension is useful. Is that what the mapping and bound are expected to do? | |
mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir | ||
312 | I think this is an issue with the parsing of custom StructAttr while roundtripping. Its implemented as an IntegerAttr. So while printing and parsing it will use the enum value. If you adapt the loop.parallel parser/printer to custom parse the mapping attribute you can make it accept keywords like "BLOCKX", "BLOCKY", etc. and print it out that way as well. Then the error message here would match up well. |
mlir/include/mlir/Dialect/GPU/ParallelLoopMapperAttr.td | ||
---|---|---|
21 | I'm wondering the convention here. Why all cap letters? BlockX, ThreadY/Sequential/etc. seems better to me. | |
mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp | ||
526 | Isn't this just static_cast<unsigned>(processor) excluding gpu::Processor::SEQUENTIAL? Do we need this explicit switch here? |
mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h | ||
---|---|---|
55 | Nit: ploopOp does not refer to anything in the code | |
56 | Add a TODO for Stephan to support this case in the pipeline and update the check? :) | |
mlir/include/mlir/Dialect/GPU/ParallelLoopMapperAttr.td | ||
21 | +1 for CamelCase. I used the same in the LLVM dialect for enumerants. | |
mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp | ||
49–50 | You have getProcessor for this |
Addressing comments.
mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp | ||
---|---|---|
526 | I am very uncomfortable relying on the enum value implicitly (especially mapping between two separate "enums" based on enum value). If one of the enums change (due to adding new enums), then it can lead to weird errors that can be hard to track down. Plus this is more readable IMO. Worth the cost of a switch-case? | |
mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp | ||
49–50 | Oh yes! Thanks! |
Nit: ploopOp does not refer to anything in the code