This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][GPU] Implement a simple greedy loop mapper.
ClosedPublic

Authored by herhut on Feb 21 2020, 7:18 AM.

Details

Summary

The mapper assigns annotations to loop.parallel operations that
are compatible with the loop to gpu mapping pass. The outermost
loop uses the grid dimensions, followed by block dimensions. All
remaining loops are mapped to sequential loops.

Diff Detail

Event Timeline

herhut created this revision.Feb 21 2020, 7:18 AM
ftynse accepted this revision.Feb 24 2020, 6:18 AM

Mostly nits from my side. Feel free to address and land.

mlir/include/mlir/Dialect/GPU/ParallelLoopMapper.h
2

Copy-pasta in the header

27

Do you about the op being a FuncOp? I'd just go with a body region here.

51

Nit: missing newline at the end of file

mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp
2

Also copy-pasta.

35

Nit: the convention is to have file-local types declared in an anonymous namespaces, but file-local functions as "static" outside of such namespaces

45

This looks like it would be better placed next to the code that consumes hardware ids, that is the ploop->gpu.launch transformation.

67–68

b.getNamedAttr("processor", b.getI64IntegerAttr(...))

75

Nit: let's factor out reused strings into named constants

mlir/test/Dialect/GPU/mapping.mlir
57

This test made me wonder about the semantic choice of saying an inner loop in a loop nest being mapped to "sequential" execution as opposed to "don't care about order" . Saying it's sequential implies the first iteration must be completed before starting the second and so on, which isn't the semantics of the original loop nest and isn't that of the generated code that does not include a synchronization after the nested parallel loop.

mlir/test/lib/Transforms/TestGpuParallelLoopMapping.cpp
2

Copy-pasta

This revision is now accepted and ready to land.Feb 24 2020, 6:18 AM
herhut updated this revision to Diff 246395.Feb 25 2020, 2:36 AM
herhut marked 12 inline comments as done.

Review comments.

mlir/lib/Dialect/GPU/Transforms/ParallelLoopMapper.cpp
45

This is just an elaborate way of counting. The consumer does not use these levels.

mlir/test/Dialect/GPU/mapping.mlir
57

You should see this more as "it is mapped to a sequential loop" and describes the generated code rather than specific execution order. The overall execution order is a result of the overall mapping and constrains the original parallel execution semantics (which did not assume any order) into a specific order that assumes some order. This should always be valid.

herhut updated this revision to Diff 246396.Feb 25 2020, 2:39 AM

Add some newlines to end of files.

herhut updated this revision to Diff 246399.Feb 25 2020, 2:43 AM

Rebase and some whitespace cleanup.

This revision was automatically updated to reflect the committed changes.