This is an archive of the discontinued LLVM Phabricator instance.

[mlir] add a simple gpu barrier elimination mechanism
ClosedPublic

Authored by ftynse on Jul 7 2023, 7:36 AM.

Details

Summary

GPU code generation, and specifically shared memory copy insertion,
may introduce spurious barriers that guard read-after-read dependencies or
read-after-write dependencies on non-aliasing data. Such barriers degrade
performance through unnecessary synchronization. Add a pattern and a
transform op that remove these barriers by analyzing the memory effects
that each barrier actually guards and that are not already guarded by
other barriers. The code is adapted from the Polygeist incubator project.

Co-authored-by: William Moses <gh@wsmoses.com>
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
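
To illustrate the idea in the summary, here is a minimal, self-contained sketch of the redundancy check. It is not the code from this revision and does not use MLIR APIs. The names Effect, conflicts, and barrierIsRedundant are hypothetical, and aliasing is approximated by comparing buffer names, which is an assumption made only for this example. "before" stands for the effects between the previous barrier and the one under test, and "after" for the effects between that barrier and the next one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a memory effect: what is done, and to which buffer.
enum class EffectKind { Read, Write };

struct Effect {
  EffectKind kind;
  std::string buffer; // Hypothetical: equal names are assumed to alias.
};

// Two effects conflict when they may touch the same memory and at least
// one of them is a write (read-after-write, write-after-read, write-after-write).
bool conflicts(const Effect &a, const Effect &b) {
  if (a.buffer != b.buffer)
    return false; // Non-aliasing data never needs synchronization.
  return a.kind == EffectKind::Write || b.kind == EffectKind::Write;
}

// A barrier is treated as redundant when no effect before it conflicts
// with any effect after it.
bool barrierIsRedundant(const std::vector<Effect> &before,
                        const std::vector<Effect> &after) {
  for (const Effect &b : before)
    for (const Effect &a : after)
      if (conflicts(b, a))
        return false;
  return true;
}

// Usage example covering the two cases named in the summary, plus one
// case where the barrier has to stay.
void exampleUsage() {
  std::vector<Effect> readsA = {{EffectKind::Read, "A"}};
  std::vector<Effect> writesA = {{EffectKind::Write, "A"}};
  std::vector<Effect> readsB = {{EffectKind::Read, "B"}};

  assert(barrierIsRedundant(readsA, readsA));   // read-after-read
  assert(barrierIsRedundant(writesA, readsB));  // read-after-write, non-aliasing
  assert(!barrierIsRedundant(writesA, readsA)); // true read-after-write
}
```

A real implementation would have to answer the aliasing question conservatively and collect effects across control flow, which this sketch deliberately leaves out.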

Event Timeline

ftynse created this revision. Jul 7 2023, 7:36 AM
Herald added a project: Restricted Project.
ftynse requested review of this revision. Jul 7 2023, 7:36 AM
This revision is now accepted and ready to land. Jul 7 2023, 7:39 AM
wsmoses accepted this revision. Jul 7 2023, 7:45 AM
ftynse updated this revision to Diff 538202. Jul 7 2023, 10:44 AM

some more bazel

This revision was landed with ongoing or failed builds. Jul 7 2023, 11:51 AM
This revision was automatically updated to reflect the committed changes.
Hardcode84 added inline comments.
mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
416

Can we use the existing MLIR alias analysis utilities (LocalAliasAnalysis specifically)?

nicolasvasilache added inline comments.
mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
416

Great, I had not followed the state of that work too closely but thanks for surfacing!

We are reaching the point where we also seriously need alias analysis in memory-based transforms (cc @dcaballe, with whom we touched on related topics yesterday).

ftynse added inline comments. Jul 12 2023, 7:22 AM
mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
416

The existing utility is missing a lot of things (capturing, globals, func arguments), and adding them was causing breakages elsewhere. We need a proper dataflow analysis to replace both.