This is an archive of the discontinued LLVM Phabricator instance.

InstCombine: extract instead of shuffle when performing vector/array type punning
ClosedPublic

Authored by jfb on Feb 18 2015, 12:55 PM.

Details

Reviewers
chandlerc
jvoung
Summary

SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.

Diff Detail

Event Timeline

jfb updated this revision to Diff 20213.Feb 18 2015, 12:55 PM
jfb retitled this revision from to InstCombine: extract instead of shuffle when performing vector/array type punning.
jfb updated this object.
jfb edited the test plan for this revision. (Show Details)
jfb added reviewers: jvoung, chandlerc.
jfb added a subscriber: Unknown Object (MLST).
jfb added inline comments.Feb 18 2015, 12:58 PM
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1096

Note that the code I'm adding must precede this code, which causes the visitor to bail early.

majnemer added inline comments.
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
994

Is this different from !BC->use_empty() ?

jfb updated this revision to Diff 20220.Feb 18 2015, 1:58 PM
  • s/BC->hasNUsesOrMore(1)/!BC->use_empty()/
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
994

Done.

chandlerc added inline comments.Feb 18 2015, 2:52 PM
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
863

should be 'isShuffle...'

964–966

The first sentence here doesn't read cleanly. It's important to emphasize that we only do this when the extracted sub-vector is bitcast to an non-vector type that we could extract.

974–981

Fabulous ascii art! Thanks!

1010–1028

Please use the InstCombine IR builder throughout. That will ensure these new instructions are visited.

Also, please create a single extract element instruction and replace all the bitcasts with it.

jfb updated this revision to Diff 20235.Feb 18 2015, 4:38 PM
  • s/ShuffleIsExtractingFromLHS/isShuffleExtractingFromLHS/
  • Clarify documentation.
  • Add tests for bitcasts to types of the same size, and control flow.
  • Cache previously generated bitcasts so they only get generated once, and generate them where the old shuffle used to be to ensure they dominate all potential uses.
  • Use InstCombine's IRBuilder.
jfb added inline comments.Feb 18 2015, 4:41 PM
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1010–1028

I'm not sure I did exactly what you wanted: we have to handle the case where multiple bitcasts to different types of the same size are handled. For now I did this by creating one bitcast per Type, but I could do it with a single bitcast+extract followed by a bitcast of the scalar. The later seems suboptimal, and the example is kind of silly anyways so as long as it's correct I guess "meh".

jvoung edited edge metadata.Feb 23 2015, 11:53 AM

Mostly nits, but looks okay to me.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
955

"We can instead bitcast of the original vector followed by an extract..." sounds a bit weird to me -- missing a few words?

965

"If the shuffle is extracting contiguous range of values from..." ->
"If the shuffle is extracting a contiguous range of values from..." ?

1016

Would it be more straightforward to use "Undef = UndefValue::get(Int32Ty)"?

1026

Is there anything that actually guarantees TgtElemBitWidth > SrcElemBitWidth? Could be safer to just check this earlier and bail?

N/M (this should be guaranteed by the shuffled vector have >= 1 elems), but phabricator won't let me delete this comment =(

1037

"isn't being replace:"
->
"isn't being replaced:"

jfb updated this revision to Diff 20552.Feb 23 2015, 3:31 PM
jfb edited edge metadata.
  • Address jvoung's comments.
lib/Transforms/InstCombine/InstCombineVectorOps.cpp
955

Done.

965

Done.

1016

Good point, done.

1026

Yes, it's a sanity check I added in case the code changes and this guarantee is broken. The divide by zero would be caught on x86 anyways, but not on ARM... I'm being overly paranoid on this one :)

1037

Done.

jvoung accepted this revision.Feb 24 2015, 1:25 PM
jvoung edited edge metadata.

lgtm

This revision is now accepted and ready to land.Feb 24 2015, 1:25 PM