This patch is what was the "instcombine" portion of D14185, with an additional test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll). The patch causes instcombine to replace sequences of extractelement-insertvalue-store that act essentially like a bitcast followed by a store.
Details
Diff Detail
Event Timeline
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
922 | Please use %U<n> and %W<n> in this comment because you use the variables named U and W in the code (and that should make things clearer without necessitating longer variable names). | |
942 | Do we need to check here that IV has only one index? | |
964 | Do the types here need to agree, or just the type sizes? |
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
922 | [I'm now back from sabbatical.] Nice idea. U and W actually refer to the same instruction. I'll try using %U, %V, and %E since those correspond to U, V, and EI in the code. I'll rename EI to E to make the correspondence cleaner. | |
942 | The check is necessary, since if IV has more than one index the code would have to do much trickier checking for equivalence to a vector. Checking that the aggregate element type is a scalar type would have the same effect, though checking the number of indices seems like a more direct approach. | |
964 | At a minimum, sizes and alignments have to agree (otherwise inter-element padding might not match up exactly). But I doubt the generalization is worth the extra complexity unless there are clear use cases that would benefit from it. |
Two changes inspired by mzolokhin's comments for D14185:
- CHECK-LABEL added to tests.
- getTypeStoreSizeInBits used for size comparisons.
The choice between using getTypeSizeInBits or getTypeStoreSizeInBits is not obvious to me, but the latter seems more appropriate because the issue is where type-punning a memory location is allowed, not whether address arithmetic is isomorphic. For example, consider a hypothetical architecture requires 32-bit alignment for everything, and has a i24 type and a packed <3 x i24> type. In this scenario, a struct {i24,i24,i24} will have 3 pad bytes interspersed, whereas a <3 x i24> will have the pad bytes at the end. If someone has a more realistic example, I'd like to be enlightened.
Anyone want to give this a final okay? It needs to be checked in before D14185(which is also waiting for a final okay).
It'd be nice to have tests that show that this does the right thing if we insertelement on the same index twice to make sure we get the correct element.
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp | ||
---|---|---|
930 | Pointers should lean right: Value *V. | |
949 | I'd use auto *UT here and do something similar bellow. |
No changes other than update of context. It would be good if someone could give this a final sign-off.
Please use %U<n> and %W<n> in this comment because you use the variables named U and W in the code (and that should make things clearer without necessitating longer variable names).