Page MenuHomePhabricator

Add a machine function pass to convert binop(phi(constants), v) to phi(binop)
Needs ReviewPublic

Authored by Carrot on Feb 15 2022, 10:55 PM.

Details

Summary

This is a Machine Function version of D116058, also a conservative version of D37467 without critical edge splitting.

This pass does the transformation

  bb.0:
    ...
    %3:gr32 = MOV32ri 7
    JCC_1 %bb.2, 5, implicit $eflags

  bb.1:
    %5:gr32 = MOV32ri 11

  bb.2:
    %0:gr32 = PHI %3:gr32, %bb.0, %5:gr32, %bb.1                         // At most one non-const operand
    %6:gr32 = ADD32rr %2:gr32(tied-def 0), %0:gr32, implicit-def dead $eflags
    ...

=>

  bb.0:
    ...
    %7:gr32 = ADD32ri8 %2:gr32(tied-def 0), 7, implicit-def dead $eflags
    JCC_1 %bb.2, 5, implicit $eflags

  bb.1:
    %8:gr32 = ADD32ri8 %2:gr32(tied-def 0), 11, implicit-def dead $eflags

  bb.2:
    %6:gr32 = PHI %7:gr32, %bb.0, %8:gr32, %bb.1
    ...

So we can reduce both static and dynamic instructions. If there is one non-const operand, we can still do the transformation without code size improvement, but still have performance improvement.

Test case dup-phi-users1.ll is from D116058 and dup-phi-users2.ll is from D37467.

Diff Detail

Event Timeline

Carrot created this revision.Feb 15 2022, 10:55 PM
Carrot requested review of this revision.Feb 15 2022, 10:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2022, 10:55 PM

I suppose this is the right place for this transformation, rather than in middle-end.
Is this the same transform as D117110? Some enhancements are missing there, but do we need both?

My first question here would be whether it is preferable to do this as a machine-pass, or as a backend IR pass (which runs post-LSR at least, possibly part of CGP). Do you have any thoughts about what the trade-offs between those two options would be?

I suppose this is the right place for this transformation, rather than in middle-end.
Is this the same transform as D117110? Some enhancements are missing there, but do we need both?

D117110 requires two phis, it works on code binop(phi1, phi2).
This patch covers more cases, it works on instructions binop(phi, v).

I think D117110 is still required for reasons:

  • Earlier transformation can also benefit other optimizations. For example in test case fold-phi-of-binary-op.ll of D115914, if the or(phi, phi) can be optimized in the mid end, tailcall can be generated by SelectionDAG. It we do the transformation in back end, we will miss the tailcall chance.
  • D117110 works for all targets. This patch needs a target implementation of canFoldImmediate.

My first question here would be whether it is preferable to do this as a machine-pass, or as a backend IR pass (which runs post-LSR at least, possibly part of CGP). Do you have any thoughts about what the trade-offs between those two options would be?

@lebedev.ri mentioned there is no immediate materialization instructions at IR. These instructions are explicitly generated in Machine IR. For CGP there is comment "It should eventually be removed." in the file, so my understanding is we should not add too many things in it if possible. I didn't thought too much about other places in backend LLVM IR because I thought most of them are lowering and expanding work that should be done before SelectionDAG.

Carrot updated this revision to Diff 411289.Feb 24 2022, 7:09 PM

Reformat.

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 8:32 PM

Sorry, i don't have useful feedback here.

Any comments from other reviewers?

Adding more potential machine pass reviewers.
Was the question from @nikic (about making this an IR pass) answered?

Adding more potential machine pass reviewers.
Was the question from @nikic (about making this an IR pass) answered?

It was answered in https://reviews.llvm.org/D119916#3327325.