Page MenuHomePhabricator

Optimize store of "bitcast" from vector to aggregate.
ClosedPublic

Authored by ArchDRobison on Nov 2 2015, 2:29 PM.

Details

Summary

This patch is what was the "instcombine" portion of D14185, with an additional test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll). The patch causes instcombine to replace sequences of extractelement-insertvalue-store that act essentially like a bitcast followed by a store.

Diff Detail

Event Timeline

ArchDRobison retitled this revision from to Optimize store of "bitcast" from vector to aggregate..
ArchDRobison updated this object.
ArchDRobison added subscribers: loladiro, llvm-commits.
hfinkel added inline comments.
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
922

Please use %U<n> and %W<n> in this comment because you use the variables named U and W in the code (and that should make things clearer without necessitating longer variable names).

942

Do we need to check here that IV has only one index?

964

Do the types here need to agree, or just the type sizes?

ArchDRobison added inline comments.Jan 27 2016, 12:49 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
922

[I'm now back from sabbatical.] Nice idea. U and W actually refer to the same instruction. I'll try using %U, %V, and %E since those correspond to U, V, and EI in the code. I'll rename EI to E to make the correspondence cleaner.

942

The check is necessary, since if IV has more than one index the code would have to do much trickier checking for equivalence to a vector. Checking that the aggregate element type is a scalar type would have the same effect, though checking the number of indices seems like a more direct approach.

964

At a minimum, sizes and alignments have to agree (otherwise inter-element padding might not match up exactly). But I doubt the generalization is worth the extra complexity unless there are clear use cases that would benefit from it.

Update changes code comment per Hal's request.

ArchDRobison marked 2 inline comments as done.

Two changes inspired by mzolokhin's comments for D14185:

  • CHECK-LABEL added to tests.
  • getTypeStoreSizeInBits used for size comparisons.

The choice between using getTypeSizeInBits or getTypeStoreSizeInBits is not obvious to me, but the latter seems more appropriate because the issue is where type-punning a memory location is allowed, not whether address arithmetic is isomorphic. For example, consider a hypothetical architecture requires 32-bit alignment for everything, and has a i24 type and a packed <3 x i24> type. In this scenario, a struct {i24,i24,i24} will have 3 pad bytes interspersed, whereas a <3 x i24> will have the pad bytes at the end. If someone has a more realistic example, I'd like to be enlightened.

mzolotukhin edited edge metadata.Jan 29 2016, 1:07 PM

Hi Arch,

You might find a related example in the test from r248943 (D13277).

Michael

Anyone want to give this a final okay? It needs to be checked in before D14185(which is also waiting for a final okay).

majnemer edited edge metadata.Apr 5 2016, 10:49 AM

It'd be nice to have tests that show that this does the right thing if we insertelement on the same index twice to make sure we get the correct element.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
930

Pointers should lean right: Value *V.

949

I'd use auto *UT here and do something similar bellow.

ArchDRobison edited edge metadata.

Addressed issues raised by David Majnemer, including test for "same index twice".

ArchDRobison marked 2 inline comments as done.Apr 6 2016, 8:57 AM

No changes other than update of context. It would be good if someone could give this a final sign-off.

majnemer accepted this revision.Apr 25 2016, 1:20 PM
majnemer edited edge metadata.

LGTM

This revision is now accepted and ready to land.Apr 25 2016, 1:20 PM
loladiro closed this revision.Jul 14 2016, 3:37 PM

This was applied some time back as rL267482.