The resulting code is shorter and simpler to optimize. The existing code was more general, and still serves as the fallback case when the incoming vector type and the outgoing scalar are incompatible. This code should trigger more often than through type punning, but that's the user code I saw it trigger on.
On the mailing list @chandlerc said:
Not sure this is the right approach. It is a lot of complexity, and we still have the fallback.
Have you looked at teaching instcombine to transform the code produced by SROA today into the element extract? That would seem a better layering at the least, although I'm still on the fence about whether we want to in general perform this operation.
This seems acceptable to me, I just want to make sure I record this on Phabricator. Here are a few thoughts:
- instcombine will "clean up" what SROA currently does by pattern matching what SROA currently produced.
- instcombine may also end up cleaning other code that looks the same (good!).
- The code in instcombine may diverge from SROA, so I'll write a test that makes sure SROA followed by instcombine is "clean".
- I could instead keep the code in SROA and remove the fallback and (from my beautiful ASCII art example) return the i40 directly instead of having the fallback. I'm just not sure this code gets hit much, so I figured addressing the case I saw first made sense, but can address both if you think it worthwhile.
I'll get started on instcombine and send a separate patch.