This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Improve PHI-breaking heuristics in CGP
ClosedPublic

Authored by Pierre-vh on May 10 2023, 6:01 AM.

Details

Summary

D147786 made the transform more conservative by adding heuristics,
which was a good idea. However, the transform got a bit
too conservative at times.

This caused a surprise in some rocRAND benchmarks because D143731 greatly helped a few of them.
For instance, a few xorwow-uniform tests saw a +30% boost in performance after that pass, which was lost when D147786 landed.

This patch is an attempt at reaching a middleground that makes
the pass a bit more permissive. It continues in the same spirit as
D147786 but does the following changes:

  • PHI users of a PHI node are now recursively checked. When loops are encountered, we consider the PHIs non-breakable. (Considering them breakable had very negative effect in one app I tested)
  • shufflevector is now considered interesting, given that it satisfies a few trivial checks.

Diff Detail

Event Timeline

Pierre-vh created this revision.May 10 2023, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 6:01 AM
Pierre-vh requested review of this revision.May 10 2023, 6:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 6:01 AM
Pierre-vh added a reviewer: Restricted Project.May 10 2023, 6:01 AM
jmmartinez added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1405–1409

Nit (very optional): I think that if we made areInSameBB into isOperandInSameBB (see below) the code becomes slightly simpler

static bool isOperandInSameBB(const Instruction& I, unsigned Op) {
  if(Instruction *OpI = dyn_cast<Instruction>(I->getOperand(Op))) {
    return Op->getParent() == I.getParent();
  }
  return true;
}

if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
  return false;

becomes

if (!areInSameBB(IE, 0))
  return false;

and

return isa<Constant>(SV->getOperand(1)) ||
       areInSameBB(SV, SV->getOperand(0)) ||
       areInSameBB(SV, SV->getOperand(1));

becomes

return areInSameBB(SV, 0) || areInSameBB(SV, 1);
1469–1475

You can use the result of DenseMap::insert to do a single lookup instead of two (one for checking if the element is in the map and another to insert the false value).

1474

Are references to DenseMap elements guaranteed to be valid after insertion?

If not, the recursive call to canBreakPhiNode may insert new values into the cache and invalidate the references held by the parent calls.

jmmartinez added inline comments.May 10 2023, 7:01 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1472–1473

Have you considered using an enum with three states (enum CanBreakPHI { OnHold, CanBreak, Unbreakable }) instead of a boolean? In that case, we would be able to break phi-nodes that currently we can't.

I have no idea if that would be interesting though.

Pierre-vh updated this revision to Diff 520987.May 10 2023, 7:19 AM
Pierre-vh marked 4 inline comments as done.

Address comments

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1405–1409

I think it's a bit more readable if I keep the current style, in the loop I still need to extract VecSrc anyway so IMO it's clearer to do:

const auto *VecSrc = IE->getOperand(0);
if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
  return false;
const auto *VecSrc = IE->getOperand(0);
if(isOperandInSameBB(IE, 0))` // it's identical to VecSrc but looks different for no particular reason
  return false;
1469–1475

Nice catch, though with iterator invalidation I need to do another lookup anyway before returning to be safe.
Though DenseMap is very fast, especially here because it's unlikely to be large, so I think it's fine.

1472–1473

I tried a similar thing at first: considering all PHIs breakable until they were proven not to be, but I found that it's not beneficial. In one particularly pathological case it even increased the total number of instructions in the output by 50% (from 16000 to 23000). The current logic seems to handle all cases I checked against (rocRAND benchmarks mostly which are super sensitive to this transform)

I don't think a tri-state enum would bring anything here other than added complexity so I opted to just use a boolean, but I'm open to revisiting it.

1474

Ah, I always forget DenseMap can invalidate on insert
Fixed

jmmartinez accepted this revision.May 11 2023, 12:54 AM

Just a few minor suggestions. Otherwise it looks good to me.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1405–1409

You're right, it looks weird.

What about using an const Use & then ? Use& converts implicitely to Value*, but it also keeps track of the User instruction.

const auto *VecSrc = IE->getOperand(0);
if (isa<Instruction>(VecSrc) && !areInSameBB(VecSrc, IE))
  return false;

would become

const Use& VecSrc = IE->getOperandUse(0);
if(isOperandInSameBBAsUser(VecSrc))
  return false;

With

static bool isOperandInSameBBAsUser(const Use& Op) {
  Instruction *OpI = dyn_cast<Instruction>(&Op);
  retrun !OpI || OpI->getParent() == cast<Instruction>(Op.getUser())->getParent();
}

I was also wondering. What if the operand is an Argument? (I guess there are no relevant cases to really care about it)

1528

I'd use a dyn_cast instead of the isa + cast. Even if it makes the loop a bit bigger.

1529

We can remove the assignment for the false case, since it's already set to false, right?

This revision is now accepted and ready to land.May 11 2023, 12:54 AM
Pierre-vh updated this revision to Diff 521226.May 11 2023, 1:42 AM
Pierre-vh marked 3 inline comments as done.

Comments

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1405–1409

I'm still not convinced it needs changing :)
I think the current version is easy enough to understand - it's a simple function with a simple name, and I think there's value in that

As for Arguments, they're intentionally not supported because they'll be lowered to a CopyFromReg anyway (I think), so we have no opportunity to simplify the ExtractElement instructions we'd insert.

I also think vector args are somewhat uncommon, I rarely encounter vector values in parameters, often people just pass an array of data coming from the host or something like that and you get load <N x T> %arg instead, so they're not worth considering unless a good counter-example is found, IMO.

jmmartinez accepted this revision.May 11 2023, 3:05 AM

@arsenm would you like to take a look before this lands?

arsenm accepted this revision.May 12 2023, 7:00 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1433

Capitalize

1442–1443

Capitalize

This revision was landed with ongoing or failed builds.May 15 2023, 12:16 AM
This revision was automatically updated to reflect the committed changes.
Pierre-vh marked 2 inline comments as done.