This is an archive of the discontinued LLVM Phabricator instance.

[Loads] Forward constant vector store to load of first element
ClosedPublic

Authored by nikic on Mar 6 2021, 8:20 AM.

Details

Summary

InstCombine performs simple forwarding from stores to loads, but currently only handles the case where the load and store have the same size. This extends it to also handle a store of a constant with a larger size followed by a load with a smaller size.

This is implemented through ConstantFoldLoadThroughBitcast() which is fairly primitive (e.g. does not allow storing a large integer and then loading a small one), but at least can forward the first element of a vector store. Unfortunately it seems that we currently don't have a generic helper for "read a constant value as a different type", it's all tangled up with other logic in either ConstantFolding or VNCoercion.

Of course, GVN is the pass that implements this in full generality, but store to load forwarding in InstCombine is fairly important to sidestep phase ordering issues.

Diff Detail

Event Timeline

nikic created this revision.Mar 6 2021, 8:20 AM
nikic requested review of this revision.Mar 6 2021, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2021, 8:20 AM
nikic added a comment.Mar 7 2021, 4:20 AM

Addressed my original motivating case in https://github.com/llvm/llvm-project/commit/3fedaf2a522b5ca8db209a3864bf55978c407b29, but this still seems like a sensible thing to do.

Looks sensible, not really familiar with this code :(

spatel accepted this revision.Mar 15 2021, 11:57 AM

LGTM - might want to add a test with constant expression, so we know we're not going to die on something like:

define i32 @vec_store_load_first(i32* %p) {
  %p2 = bitcast i32* %p to <2 x i32>*
  store <2 x i32> bitcast (i64 ptrtoint (i32 (i32*)* @vec_store_load_first to i64) to <2 x i32>), <2 x i32>* %p2, align 8
  %load = load i32, i32* %p, align 4
  ret i32 %load
}
This revision is now accepted and ready to land.Mar 15 2021, 11:57 AM
bjope added a subscriber: bjope.Apr 3 2021, 4:42 AM
bjope added inline comments.
llvm/test/Transforms/InstCombine/load-store-forward.ll
33

Maybe we should have some test cases with types when DataLayout::typeSizeEqualsStoreSize returns false. I actually think the result would be wrong here if for example using i4 (or i13) instead of i32, because we don't really know which bits that are loaded.

So there might be a bug in ConstantFoldLoadThroughBitcast, that it does not bail out if typeSizeEqualsStoreSize is false for the DestTy, that would be exposed by adding such test cases here.

(Maybe even typeAllocSize is of importance and not only typeStoreSize, or can we ignore alignment for the scalar data type?)

nikic added inline comments.Apr 3 2021, 6:43 AM
llvm/test/Transforms/InstCombine/load-store-forward.ll
33

I've added a test in https://github.com/llvm/llvm-project/commit/1470f94d71c544327f76b85c55cb6f7cb43a6cbb, does that match what you had in mind?

After reading through https://llvm.org/docs/LangRef.html#vector-type I think the result is fine, but the description there is certainly confusing. For little-endian it should certainly be fine, and for big endian the result is inverted twice (once per the described element order and then again through endianness), so should also come out to the same result.

bjope added inline comments.Apr 3 2021, 7:59 AM
llvm/test/Transforms/InstCombine/load-store-forward.ll
33

As someone has decided that the vectors are bitpacked, there is no individual alignment/padding for the elements in the datalayout, then I figure <2 x i17> must have the same alignment/padding as an i34. This because we allow bitcasting between such types, and bitcast is defined as storing using one type followed by a load with the other type..

However, it is not well-defined exactly where the padding goes when storing an type such as i34. Looking at https://llvm.org/docs/LangRef.html#load-instruction it says "When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type. ". I assume that rule applies to storing a vector and loading a difference sized and non byte-sized scalar as well.

I don't think the optimization should trigger for the test case added in https://github.com/llvm/llvm-project/commit/1470f94d71c544327f76b85c55cb6f7cb43a6cbb

nikic added inline comments.Apr 3 2021, 9:23 AM
llvm/test/Transforms/InstCombine/load-store-forward.ll
33

Well, if you want to interpret that load wording strictly, the load i17 would simply be UB in this case, because the value was not stored with store i17, and it's okay to optimize it arbitrarily. The load spec also renders doing perfectly sensible things like a load after a memset UB. I'm pretty sure that was not the intention. Probably this should be changed to specify the actual behavior of whatever SDAG legalization does.

I don't plan to pursue this issue myself though. If there is an issue here (which I'm not convinced of), then it is a pre-existing one, as I'm just reusing a generic helper.

bjope added inline comments.Apr 6 2021, 12:41 AM
llvm/test/Transforms/InstCombine/load-store-forward.ll
33

Ok, right. If the result is undefined it doesn't make sense to have such IR in the first place so we can probably get away with anything.