This is an archive of the discontinued LLVM Phabricator instance.

SROA: extract instead of shuffle when performing vector/array type punning
AbandonedPublic

Authored by jfb on Feb 7 2015, 5:16 PM.

Details

Reviewers
chandlerc
jvoung
Summary

The resulting code is shorter and simpler to optimize. The existing code was more general, and still serves as the fallback case when the incoming vector type and the outgoing scalar are incompatible. This code should trigger more often than through type punning, but that's the user code I saw it trigger on.

Diff Detail

Event Timeline

jfb updated this revision to Diff 19540.Feb 7 2015, 5:16 PM
jfb retitled this revision from to SROA: extract instead of shuffle when performing vector/array type punning.
jfb updated this object.
jfb edited the test plan for this revision. (Show Details)
jfb added reviewers: chandlerc, jvoung.
jfb added a subscriber: Unknown Object (MLST).
majnemer added inline comments.
lib/Transforms/Scalar/SROA.cpp
3075–3078

TODO(name) is not typical LLVM style.

jfb updated this revision to Diff 19543.Feb 7 2015, 7:03 PM
  • TODO -> FIXME
jfb added inline comments.Feb 7 2015, 7:05 PM
lib/Transforms/Scalar/SROA.cpp
3075–3078

Done.

One small inline comment.

lib/Transforms/Scalar/SROA.cpp
2616–2621

Block comment here.

jfb updated this revision to Diff 19544.Feb 7 2015, 9:30 PM
  • Add block comments to functions extractVector and rewriteVectorizedLoadInst.
jfb added a comment.Feb 9 2015, 9:53 AM

On the mailing list @chandlerc said:

Not sure this is the right approach. It is a lot of complexity, and we still have the fallback.

Have you looked at teaching instcombine to transform the code produced by SROA today into the element extract? That would seem a better layering at the least, although I'm still on the fence about whether we want to in general perform this operation.

This seems acceptable to me, I just want to make sure I record this on Phabricator. Here are a few thoughts:

  • instcombine will "clean up" what SROA currently does by pattern matching what SROA currently produced.
  • instcombine may also end up cleaning other code that looks the same (good!).
  • The code in instcombine may diverge from SROA, so I'll write a test that makes sure SROA followed by instcombine is "clean".
  • I could instead keep the code in SROA and remove the fallback and (from my beautiful ASCII art example) return the i40 directly instead of having the fallback. I'm just not sure this code gets hit much, so I figured addressing the case I saw first made sense, but can address both if you think it worthwhile.

I'll get started on instcombine and send a separate patch.

jfb abandoned this revision.Feb 18 2015, 12:56 PM

I finally got back to this and implemented the solution we discussed in InstCombine: D7734. Abandoning this revision in favor of the new one.