The mapper assigns annotations to loop.parallel operations that
are compatible with the loop-to-GPU mapping pass. The outermost
parallel loop is mapped to grid dimensions, the next nested parallel
loop to block dimensions, and all remaining loops are mapped to
sequential loops.
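For illustration, a minimal sketch of the kind of nest the mapper targets. The function name, bounds, and body are made up for this sketch, and the annotation attributes the mapper attaches are omitted; see the mapping.mlir test in this patch for the actual output.

```mlir
// Hypothetical two-level loop.parallel nest. The mapper is expected to
// annotate the outer loop with grid dimensions, the nested loop with
// block dimensions, and anything nested deeper as sequential.
func @map_example(%lb : index, %ub : index) {
  %c1 = constant 1 : index
  // Outer nest level: grid (block-id) dimensions.
  loop.parallel (%i, %j) = (%lb, %lb) to (%ub, %ub) step (%c1, %c1) {
    // First nested level: block (thread-id) dimensions.
    loop.parallel (%k) = (%lb) to (%ub) step (%c1) {
      // Any loop nested below this point: sequential.
    }
  }
  return
}
```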
Diff Detail
Repository: rG LLVM Github Monorepo
Event Timeline
Mostly nits from my side. Feel free to address and land.
mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h

| Line | Comment |
|---|---|
| 2 | Copy-pasta in the header. |
| 27 | Do you care about the op being a FuncOp? I'd just go with a body region here. |
| 51 | Nit: missing newline at the end of the file. |

mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp

| Line | Comment |
|---|---|
| 2 | Also copy-pasta. |
| 35 | Nit: the convention is to declare file-local types in an anonymous namespace, but file-local functions as `static` outside of such namespaces. |
| 45 | This looks like it would be better placed next to the code that consumes hardware ids, that is, the ploop->gpu.launch transformation. |
| 67–68 | `b.getNamedAttr("processor", b.getI64IntegerAttr(...))` |
| 75 | Nit: let's factor out reused strings into named constants. |

mlir/test/Dialect/GPU/mapping.mlir

| Line | Comment |
|---|---|
| 57 | This test made me wonder about the semantic choice of mapping an inner loop in a loop nest to "sequential" execution as opposed to "don't care about order". Saying it is sequential implies the first iteration must complete before the second starts, and so on, which is neither the semantics of the original loop nest nor that of the generated code, which does not include a synchronization after the nested parallel loop. |

mlir/test/lib/Transforms/TestGpuParallelLoopMapping.cpp

| Line | Comment |
|---|---|
| 2 | Copy-pasta. |
Review comments.
mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp

| Line | Comment |
|---|---|
| 45 | This is just an elaborate way of counting. The consumer does not use these levels. |

mlir/test/Dialect/GPU/mapping.mlir

| Line | Comment |
|---|---|
| 57 | Read this more as "it is mapped to a sequential loop": the attribute describes the generated code rather than a specific execution order. The overall execution order is a result of the overall mapping and constrains the original parallel execution semantics (which assumed no particular order) to one specific order, which is always valid. See the sketch below. |
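To make the point concrete, a minimal before/after fragment. It assumes that a sequential-mapped loop.parallel ends up as an ordinary loop.for during the later ploop->gpu.launch lowering; this is an assumption for illustration, not the verbatim output of that pass.

```mlir
// Before mapping: loop.parallel iterations may execute in any order.
func @before(%lb : index, %ub : index, %step : index) {
  loop.parallel (%k) = (%lb) to (%ub) step (%step) {
    // body
  }
  return
}

// After a "sequential" mapping is applied, the loop is emitted as an
// ordinary loop. Running the iterations in order is one valid schedule
// of the original unordered semantics, so the mapping only constrains,
// never violates, the parallel loop's meaning.
func @after(%lb : index, %ub : index, %step : index) {
  loop.for %k = %lb to %ub step %step {
    // body
  }
  return
}
```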