GPU code generation, and specifically the shared memory copy insertion
may introduce spurious barriers guarding read-after-read dependencies or
read-after-write on non-aliasing data, which degrades performance due to
unnecessary synchronization. Add a pattern and transform op that removes
such barriers by analyzing memory effects that the barrier actually
guards that are not also guarded by other barriers. The code is adapted
from the Polygeist incubator project.
Co-authored-by: William Moses <gh@wsmoses.com>
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
Can we use existing MLIR alias analysis utilities (LocalAliasAnalysis specifically)?.