SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.
Diff Detail
Event Timeline
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
1096 | Note that the code I'm adding must precede this code, which causes the visitor to bail early. |
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
994 | Is this different from !BC->use_empty() ? |
- s/BC->hasNUsesOrMore(1)/!BC->use_empty()/
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
994 | Done. |
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
863 | should be 'isShuffle...' | |
964–966 | The first sentence here doesn't read cleanly. It's important to emphasize that we only do this when the extracted sub-vector is bitcast to an non-vector type that we could extract. | |
974–981 | Fabulous ascii art! Thanks! | |
1010–1028 | Please use the InstCombine IR builder throughout. That will ensure these new instructions are visited. Also, please create a single extract element instruction and replace all the bitcasts with it. |
- s/ShuffleIsExtractingFromLHS/isShuffleExtractingFromLHS/
- Clarify documentation.
- Add tests for bitcasts to types of the same size, and control flow.
- Cache previously generated bitcasts so they only get generated once, and generate them where the old shuffle used to be to ensure they dominate all potential uses.
- Use InstCombine's IRBuilder.
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
1010–1028 | I'm not sure I did exactly what you wanted: we have to handle the case where multiple bitcasts to different types of the same size are handled. For now I did this by creating one bitcast per Type, but I could do it with a single bitcast+extract followed by a bitcast of the scalar. The later seems suboptimal, and the example is kind of silly anyways so as long as it's correct I guess "meh". |
Mostly nits, but looks okay to me.
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
955 | "We can instead bitcast of the original vector followed by an extract..." sounds a bit weird to me -- missing a few words? | |
965 | "If the shuffle is extracting contiguous range of values from..." -> | |
1016 | Would it be more straightforward to use "Undef = UndefValue::get(Int32Ty)"? | |
1026 | Is there anything that actually guarantees TgtElemBitWidth > SrcElemBitWidth? Could be safer to just check this earlier and bail? N/M (this should be guaranteed by the shuffled vector have >= 1 elems), but phabricator won't let me delete this comment =( | |
1037 | "isn't being replace:" |
- Address jvoung's comments.
lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
955 | Done. | |
965 | Done. | |
1016 | Good point, done. | |
1026 | Yes, it's a sanity check I added in case the code changes and this guarantee is broken. The divide by zero would be caught on x86 anyways, but not on ARM... I'm being overly paranoid on this one :) | |
1037 | Done. |
should be 'isShuffle...'