This is an archive of the discontinued LLVM Phabricator instance.

[mlir][transform] Introduce `gpu.map_foreach`
Needs Review · Public

Authored by guraypp on Nov 4 2022, 7:27 AM.

Details

Summary

GPU Transform dialect provides the following ops for mapping:

	gpu.map_nested_foreach_to_threads	--> gpu threads
	gpu.map_foreach_to_blocks			--> gpu blocks

D137413 introduces clear loop mapping attributes, so there is no need to have two different ops for mapping.

In addition, parallelizing the entire kernel with a single op comes with advantages. First, the compiler can calculate gridDim once blockDim is known. This is necessary for the global mapping (blockIdx.x * blockDim.x + threadIdx.x).
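To make the gridDim/blockDim relationship concrete, here is a minimal host-side C++ sketch (not part of this patch; the problem size, block size, and names are made up) that derives gridDim from a fixed blockDim and checks that the global mapping blockIdx.x * blockDim.x + threadIdx.x covers every element once a guard handles the partial last block:

```cpp
#include <cstdio>
#include <vector>

int main() {
  const int n = 1000;        // hypothetical 1-D problem size
  const int blockDim = 256;  // chosen thread-block size

  // Once blockDim is known, gridDim follows: enough blocks to cover n.
  const int gridDim = (n + blockDim - 1) / blockDim;

  // Host-side simulation of the global mapping used on the GPU: every
  // (blockIdx, threadIdx) pair yields a unique global index.
  std::vector<bool> covered(n, false);
  for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx)
    for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
      int globalIdx = blockIdx * blockDim + threadIdx;
      if (globalIdx < n) // guard for the partial last block
        covered[globalIdx] = true;
    }

  bool allCovered = true;
  for (int i = 0; i < n; ++i)
    allCovered = allCovered && covered[i];
  printf("gridDim = %d, all %d elements covered: %s\n", gridDim, n,
         allCovered ? "yes" : "no");
  return 0;
}
```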

Diff Detail

Event Timeline

guraypp created this revision. Nov 4 2022, 7:27 AM
guraypp requested review of this revision. Nov 4 2022, 7:27 AM
bondhugula added inline comments. Nov 5 2022, 9:12 PM
mlir/include/mlir/Dialect/GPU/TransformOps/GPUTransformOps.h
49–50

Reference to mapping missing here.

51

Typo here. dim_apping

mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
158

This is an input - should have been an ArrayRef. You have an unnecessary copy.
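For reference, a minimal sketch of the difference being pointed out (the `sumByValue`/`sumByRef` helpers are hypothetical, not the patch code): a parameter declared as `llvm::ArrayRef` is a read-only view over the caller's storage, whereas taking a `SmallVector` by value copies it on every call.

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>

// Copies the whole vector on every call.
static int64_t sumByValue(llvm::SmallVector<int64_t, 4> sizes) {
  int64_t total = 0;
  for (int64_t s : sizes)
    total += s;
  return total;
}

// Read-only view over the caller's storage; no copy is made.
static int64_t sumByRef(llvm::ArrayRef<int64_t> sizes) {
  int64_t total = 0;
  for (int64_t s : sizes)
    total += s;
  return total;
}

int main() {
  llvm::SmallVector<int64_t, 4> blockDims = {32, 4, 1};
  (void)sumByValue(blockDims); // materializes a temporary copy
  (void)sumByRef(blockDims);   // implicit conversion to ArrayRef, no copy
  return 0;
}
```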

csigg added inline comments. Nov 7 2022, 1:52 AM
mlir/include/mlir/Dialect/GPU/TransformOps/GPUTransformOps.td
189

I expected the block size of the gpu launch to be 1, because there is no mention of mapping to threads in this description. But the test seems to suggest that it is mapped to multiple threads. Maybe this is just me (I'm not familiar with this code), or maybe the description is misleading.

mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
273

Please change to camelCase