This does pretty much the same thing as D12269, but does it in SROA so that we don't generate horribly inefficient code only to optimize it right afterwards.
As I mentioned on the original thread, SROA already has logic to remove FCA loads and stores. We shouldn't *just* add more code to SROA, we should re-use (and possibly extend) the existing code.
Does this make sense?
I'm still not sure that we should *only* do this inside of SROA, but maybe that's enough...
I think doing this as part of SROA makes the most sense. Doing the transformation without SROA makes things worse in many cases; on the other hand, running it as part of SROA gives nice results even when no other pass runs. This can clearly be seen as preparing the ground so that SROA can do more.
As for the memcpy temporary, it is necessary in order not to lose the information about the whole aggregate when there is padding. If you lose this information at any step, subsequent passes suddenly work hard to keep the padding intact, which gives suboptimal results, so simply "exploding" the memory access the way SROA does is not enough.
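To make the padding point concrete, here is a rough source-level analogy in C++. It is only a sketch and not the code from this patch: `Padded`, `kFieldBytes`, `paddingBytes`, `copyWhole` and `copyFields` are hypothetical names invented for illustration, and the exact amount of padding depends on the target ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical aggregate. On common ABIs the compiler inserts padding
// bytes after `tag` so that `value` is 4-byte aligned.
struct Padded {
  std::uint8_t tag;
  std::uint32_t value;
};

// Bytes occupied by the fields themselves, excluding any padding.
constexpr std::size_t kFieldBytes =
    sizeof(std::uint8_t) + sizeof(std::uint32_t);

// Number of padding bytes in Padded on the current target (may be zero).
constexpr std::size_t paddingBytes() { return sizeof(Padded) - kFieldBytes; }

// Whole-aggregate copy: a single memcpy over sizeof(Padded) bytes, so the
// copy is still visibly "one access to the whole object", padding included.
inline void copyWhole(Padded &dst, const Padded &src) {
  std::memcpy(&dst, &src, sizeof(Padded));
}

// "Exploded" copy: one access per field. The field values are transferred,
// but the fact that the whole object was being copied is no longer explicit.
inline void copyFields(Padded &dst, const Padded &src) {
  dst.tag = src.tag;
  dst.value = src.value;
}
```

Both functions transfer the same field values; the difference is only in whether the copy still covers the aggregate as a single unit, which is the information the memcpy temporary is meant to preserve.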
Moving on with D14483 (http://reviews.llvm.org/D14483).
It doesn't handle cases where there is padding, for which the right way forward is unclear, but it at least gets things moving when there is no padding, which is already something.