This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND
ClosedPublic

Authored by aqjune on Jul 2 2021, 5:11 AM.

Details

Summary

This patch adds a peephole optimization SETCC(FREEZE(x),const) => FREEZE(SETCC(x,const))
if the SETCC is only used by BRCOND.

Combined with BRCOND(FREEZE(X)) => BRCOND(X), this leads to a nice improvement in the generated assembly when x is a masked loaded value.

Diff Detail

Event Timeline

aqjune created this revision.Jul 2 2021, 5:11 AM
aqjune requested review of this revision.Jul 2 2021, 5:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2021, 5:11 AM
aqjune added a comment.Jul 2 2021, 5:13 AM

Found while investigating the impact of D104569.

aqjune updated this revision to Diff 356154.Jul 2 2021, 5:14 AM

clang-format

nikic added a comment.EditedJul 2 2021, 12:10 PM

Might it make sense to consider this as a fold on freeze, which walks a chain of one-use instructions down to a brcond? That would make it easier to generalize for looking through other instructions.

aqjune added a comment.Jul 3 2021, 8:54 AM

Might it make sense to consider this as a fold on freeze, which walks a chain of one-use instructions down to a brcond? That would make it easier to generalize for looking through other instructions.

Yep, it makes sense; will do that.

aqjune updated this revision to Diff 356347.Jul 3 2021, 10:18 AM

Update a comment

Might it make sense to consider this as a fold on freeze, which walks a chain of one-use instructions down to a brcond? That would make it easier to generalize for looking through other instructions.

Yep, it makes sense; will do that.

Well, I wanted to write a generalized worklist algorithm that traverses one-use operands from BRCOND and folds freeze if possible, but it seems SDNode is an immutable data structure?
I couldn't find setOperand()-like function from SDNode.

This optimization seems still helpful after D105392 is landed.
Should we generalize this transform and optimize away all freezes used by BRCOND? This will need an introduction of a new helper function that is analogous to canCreateUndefOrPoison.

nikic added a comment.Jul 12 2021, 1:17 PM

This optimization seems still helpful after D105392 is landed.
Should we generalize this transform and optimize away all freezes used by BRCOND? This will need an introduction of a new helper function that is analogous to canCreateUndefOrPoison.

Hm, why do we need canCreateUndefOrPoison()? TBH I'm not quite sure when exactly doing this is valid. Let me think out load. Using a freeze instruction can only be necessary if either

  • It is used multiple times (or a recursive user is used multiple times). In this case freeze makes sure that the same value is seen by all users.
  • It is used in an instruction where undef/poison are immediate UB. In this case freeze launders UB.

On the IR level this covers pretty much all uses, because we either have additional uses we don't see (function return/arg, store value) or the operand is UB on undef (branch, pointer operand). On the DAG level the issue with additional uses remains (store value, CopyToReg) but the UB on undef problem mostly goes away. At least we define BRCOND on undef as not UB -- I don't know about load/store addresses.

So basically it's okay to drop freeze if we can walk a one-use chain down to a "sink" instruction that both doesn't have implicit uses and does not have UB on undef. One such sink is BRCOND. Whether instructions along the way can produce poison shouldn't matter.

All that said, handling this fully generically is probably not simple, but writing it in a way that allows adding ISD opcodes to a whitelist as needed would probably still be nice.

This optimization seems still helpful after D105392 is landed.
Should we generalize this transform and optimize away all freezes used by BRCOND? This will need an introduction of a new helper function that is analogous to canCreateUndefOrPoison.

Hm, why do we need canCreateUndefOrPoison()?

It's because pushing freeze forward doesn't hold for certain poison-generating operations like mul nsw:
https://alive2.llvm.org/ce/z/boohtf
Actually, it depends on operations: add nsw is fine because add nsw x, 0 is simply equivalent to x and add nsw x, y for any non-zero y there exists x that makes the result overflow.
Didn't think enough in which case the transformation is correct in general, yet.

TBH I'm not quite sure when exactly doing this is valid. Let me think out load. Using a freeze instruction can only be necessary if either

  • It is used multiple times (or a recursive user is used multiple times). In this case freeze makes sure that the same value is seen by all users.
  • It is used in an instruction where undef/poison are immediate UB. In this case freeze launders UB.

On the IR level this covers pretty much all uses, because we either have additional uses we don't see (function return/arg, store value) or the operand is UB on undef (branch, pointer operand). On the DAG level the issue with additional uses remains (store value, CopyToReg) but the UB on undef problem mostly goes away. At least we define BRCOND on undef as not UB -- I don't know about load/store addresses.

So basically it's okay to drop freeze if we can walk a one-use chain down to a "sink" instruction that both doesn't have implicit uses and does not have UB on undef. One such sink is BRCOND. Whether instructions along the way can produce poison shouldn't matter.

All that said, handling this fully generically is probably not simple, but writing it in a way that allows adding ISD opcodes to a whitelist as needed would probably still be nice.

I think splitting the transformation into smaller steps would be easier.
For some op1 and op2, folding op2(op1(FREEZE(x),y)) => op2(op1(x,y)) can be broken down into three steps:

  %z = op1(FREEZE(x), y)
  op2(%z)
=> // step 1
  %z = op1(FREEZE(x), FREEZE(y))
  op2(%z)
=> // step 2
  %z = op1(x, y)
  %z.fr = FREEZE(%z)
  op2(%z.fr)
=> // step 3
  %z = op1(x, y)
  op2(%z)

Step 1 is trivially correct.

Step 2: Relying on canCreateUndefOrPoison makes checking this step simpler, maybe?

Step 3: Optimizing freeze away is correct if the result of op2 on undef/poison is equivalent to op2 on some random number. In this case, op2 is BRCOND, and BRCOND(undef) is equivalent to BRCOND(0) or BRCOND(1), so this holds.
This property doesn't hold for e.g. mul; mul undef, 2 returns a set of even numbers, whereas mul n, 2 for any n is a single even number.

That being said, I'm fine with the current patch - it still allows optimizing the setcc-freeze.ll example.

  %z = op1(FREEZE(x), y)
  op2(%z)
=> // step 1
  %z = op1(FREEZE(x), FREEZE(y))
  op2(%z)
=> // step 2
  %z = op1(x, y)
  %z.fr = FREEZE(%z)
  op2(%z.fr)
=> // step 3
  %z = op1(x, y)
  op2(%z)

Step 2: Relying on canCreateUndefOrPoison makes checking this step simpler, maybe?

I found that canCreateUndefOrPoison isn't enough. A non-poison-generating operation can be problematic as well.
Look at this example: https://alive2.llvm.org/ce/z/xGDGHF

    %n.fr = freeze i8 %n           // Assume that %n = poison
    %m.fr = freeze i8 %m           // Assume that %m = -128
    %v = icmp slt i8 %n.fr, %m.fr  // This is always false, but...
=>
    %v = icmp slt i8 %n, %m        // This is poison, so
    %v.fr = freeze i1 %v           // This (freeze poison) can be true.
    ret i1 %v.fr

So, even in SETCC (which is analogous to icmp) pushing freeze should have been carefully done.

Fortunately, we only need to deal with one operand being constant, so this can be checked in advance.

nikic added a comment.Jul 17 2021, 3:41 AM

@aqjune That's a great point. So we need to limit the transform to having a constant operand and showing that the constant operand does not imply always-true/always-false setcc?

Yes, I think so. Will update this patch tmr.

aqjune updated this revision to Diff 359596.Jul 17 2021, 11:12 PM

Enable SETCC folding only when it holds the discussed constraint

aqjune retitled this revision from [DAGCombiner] Fold SETCC(FREEZE(x),y) to SETCC(x,y) if used by BRCOND to [DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND.Jul 17 2021, 11:12 PM
aqjune edited the summary of this revision. (Show Details)

This patch is fixed to deal with pulling out FREEZE from SETCC only. The removal of FREEZE is delegated to visitBRCOND's simplification.

llvm/test/CodeGen/X86/setcc-freeze.ll
24

These suboptimal codes were being generated even if %cond was not frozen as well.

efriedma added inline comments.Jul 26 2021, 1:41 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10200

Maybe worth calling out the one-use constraint in this comment?

10210

I don't think the IsIntComparison check does anything. The cod codes are shared between int and fp ops. The way to check if you have an integer comparison is just to check the types of the operands... which you're implicitly doing by casting to ConstantSDNode.

aqjune updated this revision to Diff 361929.Jul 27 2021, 12:38 AM
aqjune edited the summary of this revision. (Show Details)

Remove int predicate check, update a comment

aqjune marked 2 inline comments as done.Jul 27 2021, 12:38 AM
aqjune added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10210

Didn't realize that ConstantSDNode only has an integer, thanks.

This revision is now accepted and ready to land.Jul 27 2021, 1:40 PM
This revision was landed with ongoing or failed builds.Jul 27 2021, 5:22 PM
This revision was automatically updated to reflect the committed changes.
aqjune marked an inline comment as done.