This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Simplify scalar cost calculation in getInstructionCost
ClosedPublic

Authored by david-arm on Apr 1 2021, 4:15 AM.

Details

Summary

This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

  1. VF is a scalar, in which case N = 1.
  2. VF is a vector. We can only get here if: a) the instruction is a

GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Diff Detail

Event Timeline

david-arm created this revision.Apr 1 2021, 4:15 AM
david-arm requested review of this revision.Apr 1 2021, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2021, 4:15 AM

A version of this patch was previously merged (D98512) then reverted due to a failure with the X86 sanitiser build that exposed some missing tests from our LLVM test suite regarding pointer PHIs feeding directly into stores. I've attempted another fix here without the previous assert because the logic seems far too complicated for an assert.

fhahn added a comment.Apr 19 2021, 4:12 AM

A version of this patch was previously merged (D98512) then reverted due to a failure with the X86 sanitiser build that exposed some missing tests from our LLVM test suite regarding pointer PHIs feeding directly into stores. I've attempted another fix here without the previous assert because the logic seems far too complicated for an assert.

I'm not sure I follow why the logic would be far too complicated for an assert? There are asserts that verify the whole dominator tree or the whole function, which seems much more complicated :) Also, IIUC the assert caught a case that was missing from the comment/explanation in the initial patch, so it seems like it was doing what it was supposed to be?

ctetreau added inline comments.Apr 19 2021, 10:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7433

I'd feel better if there was an assert here that verifies that the conditions you state in the commit message are true. Same with all the other places N is eliminated.

At the very least, a comment that explains inline that, even though conceptually there should be an N, because of [reasons], we don't need it.

I'm not sure I follow why the logic would be far too complicated for an assert? There are asserts that verify the whole dominator tree or the whole function, which seems much more complicated :) Also, IIUC the assert caught a case that was missing from the comment/explanation in the initial patch, so it seems like it was doing what it was supposed to be?

I agree. If you don't want a huge complicated expression inside of an assert, you could write a function that does the asserting, and guard it with NDEBUG so it is only included if asserts are enabled.

Hi @ctetreau and @fhahn, my concern is that the decisions for whether or not to multiply by VF is quite complicated and is lacking a well-documented description for the expected behaviour. I'm cautious that the updated assert may still not cover everything correctly and is therefore fragile.

On the other hand, if the condition is incorrect but not asserted, then in the worst case the cost would be different and the vectorizer might pick a different VF. If we add an assertion and the condition is not correct, the compiler would crash.

So it depends on how fragile the condition would be for handling new corner-cases, and if it's even worth the effort given that the impact of making wrong assumptions by this code is very low.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7433

Hi @ctetreau, in the first iteration of this patch I did assert something like this:

auto hasSingleCopyAfterVectorization = [this](Instruction *I,
                                              ElementCount VF) -> bool {
  if (VF.isScalar())
    return true;

  auto Scalarized = InstsToScalarize.find(VF);
  assert(Scalarized != InstsToScalarize.end() &&
         "VF not yet analyzed for scalarization profitability");
  return !Scalarized->second.count(I) &&
         llvm::all_of(I->users(), [&](User *U) {
           auto *UI = cast<Instruction>(U);
           return !Scalarized->second.count(UI);
         });
};

if (isScalarAfterVectorization(I, VF)) {
  VectorTy = RetTy;
  // With the exception of GEPs, after scalarization there should only be one
  // copy of the instruction generated in the loop. This is because the VF is
  // either 1, or any instructions that need scalarizing have already been
  // dealt with by the the time we get here. As a result, it means we don't
  // have to multiply the instruction cost by VF.
  assert(I->getOpcode() == Instruction::GetElementPtr ||
         hasSingleCopyAfterVectorization(I, VF));

and although it passed the LLVM tests, it failed after merging due to another missing case - that of Instruction::Phi.

I prefer an assert, but so long as the compiler produces correct results in the assertion failure case, I guess it's fine for now. I'll not block your patch over it.

david-arm updated this revision to Diff 340472.Apr 26 2021, 3:35 AM
  • Re-added the assert from the first version of the patch (D98512) with an extra check for PHI instructions.
david-arm edited the summary of this revision. (Show Details)Apr 26 2021, 3:36 AM
david-arm marked an inline comment as done.Apr 26 2021, 3:39 AM

Hi @ctetreau and @fhahn, I've re-added the assert to the patch with an extra check for PHIs. It has passed the LLVM/clang tests.

sdesmalen accepted this revision.Apr 26 2021, 9:04 AM

Expanding the assert with the extra condition seems the right approach, as it hopefully covers all cases now. If it doesn't, we can always reconsider the approach.
LGTM.

This revision is now accepted and ready to land.Apr 26 2021, 9:04 AM
fhahn accepted this revision.Apr 27 2021, 1:29 AM

LGTM, thanks. The assert should ensure we catch any other cases that may got missed by the original reasoning behind the change.

This revision was landed with ongoing or failed builds.Apr 27 2021, 7:26 AM
This revision was automatically updated to reflect the committed changes.

@fhahn and @sdesmalen well that didn't last long, as expected. :) Looks like this assert will need yet another condition. I'm looking into the build failure.

fhahn added a comment.May 13 2022, 3:42 AM

Unfortunately another case surfaced where we a scalarized instruction requiring multiple copies reaches getInstructionCost and triggers the assert: https://github.com/llvm/llvm-project/issues/55096. I don't think this can be handled easily, so I put up a patch to undo the changes: D125533. Not sure if there are any better alternatives.

Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2022, 3:42 AM