This revision adds a GPU transform dialect. It also introduces a prefix, "transform.gpu", for all ops related to this dialect.
MLIR already had two GPU transform ops in linalg. This revision moves these ops into GPUTransformOps. The ops are as follows:
transform.structured.map_nested_foreach_thread_to_gpu_blocks -> transform.gpu.map_foreach_to_blocks
This op selects the outermost (top-level) foreach_thread and parallelizes it across GPU blocks. It can also generate gpu_launch.
transform.structured.map_nested_foreach_thread_to_gpu_threads -> transform.gpu.map_nested_foreach_to_threads
This op parallelizes nested foreach_thread ops that are inside gpu_launch across GPU threads.
This revision doesn't add new functionality, but it includes some minor refactoring of the code.
Nit: use /// here too; // is for comments inside functions only.