This is an archive of the discontinued LLVM Phabricator instance.

[LoopCacheAnalysis] Fix a type mismatch bug in cost calculation
ClosedPublic

Authored by congzhe on Jun 29 2022, 10:27 PM.

Details

Summary

As reported in https://reviews.llvm.org/rGb941857b40edd7f3f3a9ec2ec85a26db24739774#1100674, there is a loop cache analysis bug exposed by the recent loop interchange new cost model patch. In isConsecutive() from LoopCacheAnalysis.cpp, sometimes the SCEV variables Coeff and ElemSize may not match, e.g., when there is no target datalayout provided in an IR. The mismatch would cause SCEV failures when multiplying Coeff with ElemSize.

As discussed in the loopopt meeting, the fix in this patch is to extend the type of both Coeff and ElemSize to whichever is wider in those two variables. The IR reported in https://reviews.llvm.org/rGb941857b40edd7f3f3a9ec2ec85a26db24739774#1100674 is added in loop cache analysis tests.

As a clean-up, the Stride variable in computeRefCost() has been already computed in isConsecutive(), so I remove the duplicate calculations of Stride which also helps this patch to be more compact.

Diff Detail

Event Timeline

congzhe created this revision.Jun 29 2022, 10:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 10:27 PM
congzhe requested review of this revision.Jun 29 2022, 10:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 10:27 PM
congzhe edited the summary of this revision. (Show Details)Jun 29 2022, 10:30 PM
congzhe added reviewers: bmahjour, Meinersbur, Restricted Project.
congzhe added a project: Restricted Project.

Regarding Michael's question that whether SE.getNoopOrAnyExtend() is signed extension or unsigned extension: it is actually well handled in ScalarEvolution::getAnyExtendExpr() where it could do either signed or unsigned extension depending on the actual SCEV type of the value we want to extend. I'm wondering if it answers your question? @Meinersbur

congzhe updated this revision to Diff 441268.Jun 29 2022, 10:49 PM
Meinersbur added a comment.EditedJun 30 2022, 10:32 AM

Regarding Michael's question that whether SE.getNoopOrAnyExtend() is signed extension or unsigned extension: it is actually well handled in ScalarEvolution::getAnyExtendExpr() where it could do either signed or unsigned extension depending on the actual SCEV type of the value we want to extend. I'm wondering if it answers your question? @Meinersbur

Whether to use sext or zext can make a semantic difference. Say we have an element size of i16 4 and a stride of -1 (going backwards) in 8-bit precision. That represented as i8 255. zero-extension of that is i16 255, and multiplying both gives i16 1020 which should be -4 (i16 0xFFFC).

getNoopOrAnyExtend() seems to prefer zero-extension over sign-extension.

Regarding Michael's question that whether SE.getNoopOrAnyExtend() is signed extension or unsigned extension: it is actually well handled in ScalarEvolution::getAnyExtendExpr() where it could do either signed or unsigned extension depending on the actual SCEV type of the value we want to extend. I'm wondering if it answers your question? @Meinersbur

Whether to use sext or zext can make a semantic difference. Say we have an element size of i16 4 and a stride of -1 (going backwards) in 8-bit precision. That represented as i8 255. zero-extension of that is i16 255, and multiplying both gives i16 1020 which should be -4 (i16 0xFFFC).

getNoopOrAnyExtend() seems to prefer zero-extension over sign-extension.

It looks like the case you mentioned has been handled correctly in ScalarEvolution::getAnyExtendExpr() (which is called from SE.getNoopOrAnyExtend()) by the following piece of code?

// Sign-extend negative constants.
if (const SCEVConstant *SC = dyn_cast<SCEVConstant>(Op))
  if (SC->getAPInt().isNegative())
    return getSignExtendExpr(Op, Ty);
bjope added a subscriber: bjope.Jul 1 2022, 3:09 AM

Afaik getNoopOrAnyExtend() is defined as "If the type must be extended, it is extended with unspecified bits.". So you can't rely on it doing zero extend or sign extend. The extended bits are undefined.

That is probably OK in use-cases such as how getNoopOrAnyExtend is used in ScalarEvolutionExpander, when it is truncating the result from the add/mul expr that is using the extended size (i.e. the upper bits are of no interest anyway).
But I do not really see how the usage of getNoopOrAnyExtend() in LoopCacheAnalysis is safe in a similar manner. Those "undefined" bits seem to be significant both in the new Stride calculation in this patch, as well as in the already present code that for example is doing

Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
RefCost = SE.getUDivExpr(Numerator, CacheLineSize);

Or am I missing something?

@bjope I think you are correct. This is what the source code documentation says:

/// Return a SCEV corresponding to a conversion of the input value to the
/// specified type. If the type must be extended, it is extended with
/// unspecified bits. The conversion must not be narrowing.
const SCEV *getNoopOrAnyExtend(const SCEV *V, Type *Ty);

@congzhe getAnyExtendExpr has some heuristics for which kind of extension is better for some definition of "better", but that doesn't mean it will find the correct one every time. The test case you added contains zext. Shouldn't that somehow indicate that zext is to be used in this case?

Hi folks, my apologies for the delay, and thanks for the input. I do agree with you that we need to treat this scev expansion more carefully, although for some places where getNoopOrAnyExtend() is called (like the piece of code that Bjorn posted above), both Stride and TripCount are positive so it might be straightforward to use SE.getNoopOrZeroExtend(), or maybe just SE.getNoopOrAnyExtend() as how it is used now.

Motivated by @Meinersbur Michael's comment that we might just do zext for the test case in this patch, looking into the function isConsecutive(), I'm thinking we can change this patch to the following code in order to handle the expansions more carefully. What do you think?

const SCEV *Coeff = getLastCoefficient();
const SCEV *ElemSize = Sizes.back();
Type *WiderType = SE.getWiderType(Coeff->getType(), ElemSize->getType());
Stride = SE.getMulExpr(SE.isKnownNegative(Coeff)
                           ? SE.getNoopOrSignExtend(Coeff, WiderType)
                           : SE.getNoopOrZeroExtend(Coeff, WiderType),
                       SE.isKnownNegative(ElemSize)
                           ? SE.getNoopOrSignExtend(ElemSize, WiderType)
                           : SE.getNoopOrZeroExtend(ElemSize, WiderType));

As a side note, in order not to block @bjope Bjorn's work, I'd like to mention that if we add a target datalayout line to the IR that crashes (https://reviews.llvm.org/rGb941857b40edd7f3f3a9ec2ec85a26db24739774#1100674), it would work fine. Something like target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64" is sufficient. I mentioned it in the summary of this patch but just in case if you missed it. Hope this could provide a workaround for now.

I don't think we can rely on isKnownNegative

  1. The SCEV might be sometimes negative, sometime positive, depending on some runtime value (e.g. stride is multiplied by a function argument). This is, if isKnownPositive and isKnownNegative both return false, we'd need to bail out.
  2. isKnownNegative already assumes that the integer is signed. With that assumption, sext would always be correct (which is equivalent to zext for positive integers)
congzhe updated this revision to Diff 444376.EditedJul 13 2022, 12:13 PM

Updated the patch according to the discussion in the loopopt meeting. @Meinersbur I would appreciate it if you could take a second look, thanks a lot!

Meinersbur added inline comments.Jul 18 2022, 12:47 PM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
477–478

Could you make it make more explicit that this is known-incorrect for some cases?
Suggestion:

// FIXME: This assumes that all values are signed integers which may be incorrect in unusual codes and incorrectly use sext instead of zext.
// for (uint32_t i = 256; i < 512; ++i) {
//   uint8_t trunc = i;
//   A[i] = 42;
// }
// This consecutively iterates twice over A. If `trunc` is sign-extended, we would conclude that this may iterate backwards over the array.
// However, LoopCacheAnalysis is heuristic anyway and transformations must not result in wrong optimizations if the heuristic was incorrect.
congzhe updated this revision to Diff 445613.Jul 18 2022, 1:43 PM

Thanks Michael for your suggestion, I've updated the patch accordingly.

I'm just wondering if you meant A[trunc]=42 in your example? Also I'm a bit confused if the loop actually iterates over A twice? It looks to me the variable i loops from 0x0100 to 0x01FF, hence trunc loops from A[0x00] to A[0xFF] so seems like it just iterates once?

Meinersbur accepted this revision.Jul 19 2022, 12:59 PM

Thanks Michael for your suggestion, I've updated the patch accordingly.

I'm just wondering if you meant A[trunc]=42 in your example? Also I'm a bit confused if the loop actually iterates over A twice? It looks to me the variable i loops from 0x0100 to 0x01FF, hence trunc loops from A[0x00] to A[0xFF] so seems like it just iterates once?

Both you remarks are correct. You already fixed the first. I started with a loop for (uint32_t i = 0; i < 512; ++i) which would iterate twice over the array (once forward, once backward) but then found it overcomplicates things but forgot to change it everywhere. Could you fix it either way? After that, LGTM.

This revision is now accepted and ready to land.Jul 19 2022, 12:59 PM
This revision was landed with ongoing or failed builds.Jul 20 2022, 10:58 PM
This revision was automatically updated to reflect the committed changes.