This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Support load-store forwarding with scalable vector types
Changes PlannedPublic

Authored by david-arm on Feb 8 2022, 5:49 AM.

Details

Summary

This patch fixes an invalid TypeSize->uint64_t implicit conversion in
FoldReinterpretLoadFromConst. We now check to see if the offset is
greater than or equal to the known minimum size of the constant. In
addition, I have added support for scalable vector splats in
ReadDataFromGlobal, since we can support load-store forwarding
when we know the offset is safe.

Tests added here:

Transforms/InstCombine/load-store-forward.ll

Diff Detail

Event Timeline

david-arm created this revision.Feb 8 2022, 5:49 AM
david-arm requested review of this revision.Feb 8 2022, 5:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2022, 5:49 AM
david-arm edited the summary of this revision. (Show Details)Feb 8 2022, 5:49 AM
nikic added a reviewer: nikic.Feb 8 2022, 5:57 AM
nikic added a subscriber: nikic.
nikic added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
601–602

This is saying that accessing a scalable vector past its *minimum* size is undefined behavior. This doesn't look right to me. You probably want to return nullptr in that case, not UndefValue.

david-arm updated this revision to Diff 406839.Feb 8 2022, 8:21 AM
  • Return nullptr if the offset exceeds the known minimum value for a scalable vector.
david-arm marked an inline comment as done.Feb 8 2022, 8:22 AM
david-arm added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
601–602

Well spotted, thanks @nikic !

nikic added inline comments.Feb 9 2022, 1:04 AM
llvm/lib/Analysis/ConstantFolding.cpp
601–602

I think the code in ReadDataFromGlobal suffers from a similar issue, where a read that starts before the min size but crossed over it will leave the remaining bytes as zero.

david-arm marked an inline comment as done.Feb 9 2022, 1:08 AM
david-arm added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
601–602

I couldn't seem to create a test for that because whenever the loaded type is larger than the known minimum size of the constant type we don't perform the optimisation at all. Also, I wasn't able to construct a test case where the start offset is non-zero because then we don't even get into this function (I tried using getelementptr to load from an offset of the store ptr). If you have any suggestions on how to this cross-over case I'm happy to add them? Alternatively, I could just add an assert for that case?

nikic added inline comments.Feb 9 2022, 1:17 AM
llvm/lib/Analysis/ConstantFolding.cpp
601–602

Are scalable vectors supported as globals? If so, the more straightforward way to test this code is to perform a load from a scalable vector constant, in which case you can also test GEP with offset. The InstCombine code you're targeting currently only allows zero offsets.

sdesmalen added inline comments.Feb 16 2022, 7:17 AM
llvm/lib/Analysis/ConstantFolding.cpp
502

Can you split out all changes to make ReadDataFromGlobal work on scalable splats into a separate patch? I think the changes aren't trivial, and we want a fix for the assertion-failures in LLVM 14. The improvements to make this optimisation work for scalable vectors is less urgent.

512–513

I'd really like to avoid things like this, where a variable is named NumElts and we use the "Minimum known number of elements" from a scalable vector.

To avoid relying on the VectorType's ElementCount entirely, you can define the number of elements based on BytesLeft, i.e.:

Type *EltTy;
if (auto *AT = dyn_cast<ArrayType>(C->getType()))
  EltTy = AT->getElementType();
else
  EltTy = cast<VectorType>(C->getType())->getElementType();

uint64_t NumElts =
    std::max<uint64_t>(1, (8 * BytesLeft) / EltTy->getScalarSizeInBits());
521

Instead of checking for SplatVal and the special case for scalable vectors, I'd rather you just add this support to Constant::getAggregateElement directly.

david-arm planned changes to this revision.Feb 21 2022, 2:58 AM
david-arm added inline comments.Mar 7 2022, 5:42 AM
llvm/lib/Analysis/ConstantFolding.cpp
521

Hi @sdesmalen, I'm not sure this is right to be honest. It seems like we already have separate Constant::getAggregateElement and Constant::getSplatValue functions for a reason. If I understand you correctly, what you're asking for is basically something that does:

Constant *Constant::getAggregateElement(unsigned Elt) {
  if (Constant *SplatVal = getSplatValue(false))
    return SplatVal;
  .. existing code ..
}

If we do this then doesn't it also make sense to simply get rid of getSplatValue and replace all uses with getAggregateElement, rather than have two differently named interfaces that do the same thing?

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 5:42 AM
david-arm added inline comments.Mar 9 2022, 3:08 AM
llvm/lib/Analysis/ConstantFolding.cpp
512–513

So the problem with this approach is that currently BytesLeft can be larger than the type you're reading from. For example, in test/Transforms/InstCombine/strncpy-3.ll we try to read 4 bytes from an array of 3 bytes. In your example code above that means NumElts would be 4 and we end up crashing. That means we still have to use the number of elements in the type to limit the reads/writes to only those that are valid. Perhaps in the first instance where we call ReadDataFromGlobal I can clamp the value of BytesLeft to the known minimum size of the type?