Targets like AMDGPU can have performance benefit when replacing an argument passed by
reference with an argument passed by value on input and/or return value on output.
This isn't only simplifies the function body but also allows SROA for allocas for such
pointer arguments and helps analysis passes which quit on escaping pointers.
Although submitted very late this can be viewed as an early-preview. This definitely
requires more testing but I would like to know if I'm missing something fundamental.
Despite that this code has already been tested on some applications and seem working
to some extent.
I decided not to modify existing ArgumentPromotion pass because the change is substantial
and the new pass would allow gradual involvement for the concerned targets. Biggest change
is the use of MSSA for clobber testing and support for input/output arguments. So far GEPs
aren't supported but supposed to.
ArgumentPromotion code has been used as the source of inspiration and fragments of code. Another
used source is 'promoteLoopAccessesToScalars' function from LICM which does very similar thing
for loops. The way how a pointer can be treated as a safe to promote for an output argument
has been borrowed from there.
The original ArgumentPromotion has been submitted by Chris Lattner in 2004 and has another 78
contributors since then so I'm not really sure who to add as a reviewer here, please advise or add.
Clobber testing using MSSA is the most critical part here not only for performance reasons but
also for correctness - I would like it to be thoroughly reviewed.
clang-format: please reformat the code
40 diff lines are omitted. See full path.