This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Enhance machine PHIs optimization
ClosedPublic

Authored by anton-afanasyev on Nov 22 2018, 2:13 PM.

Details

Summary

Make machine PHIs optimization to work for single value register taken from
several different copies. This is the first step to fix PR38917. This change
allows to get rid of redundant PHIs (see opt_phis2.mir test) to make
the subsequent optimizations (like CSE) possible and simpler.

For instance, before this patch the code like this:

%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b

could be optimized to:

%a = %b

but the code like this:

%c = COPY %z
...
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b; %bb3, %c

would remain unchanged.
With this patch the latter case will be optimized:

%a = %z```.

Diff Detail

Repository
rL LLVM

Event Timeline

anton-afanasyev edited the summary of this revision. (Show Details)Nov 28 2018, 8:13 AM
anton-afanasyev edited the summary of this revision. (Show Details)
anton-afanasyev edited the summary of this revision. (Show Details)

I've updated summary to be more clear about the change. How can I else get the review of this differential?

RKSimon added a subscriber: MatzeB.

This looks ok, but I know very little about this pass - @MatzeB @craig.topper any comments?

(Adding more potential reviewers for MI changes)

MatzeB added a comment.Dec 6 2018, 9:07 AM

I'm not too familiar with the history of OptimizePHIs but some things feel odd to me here:

  • The comments in the pass make it seem like it is designed to catch dataflow around loops, yet the motivation for your changes appears to be simpler phis without loops involved.
  • I keep wondering why we bother with COPYs here at all. If the COPYs are really trivial COPYs between the same register classes then we really should just remove them when we are still in MachineSSA. Do you know how they are created? Maybe you can simply stop their creation instead?
  • Skipping COPYs like this is also a bit inconsequential and will miss several cases: You could have more than 1 COPY in a row or switch between register classes. The PeepholeOptimizer pass already handles these more complex cases, so maybe we rather should look for running OptimizePHIs after the Peephole Optimizer so those COPYs are optimized? (Not sure though if there are other negative effects of running OptimizePHIs later or PeepholeOpt earlier).

With all that said, I don't expect the patch to break anything, so no principal objections to committing it...

lib/CodeGen/OptimizePHIs.cpp
123–125 ↗(On Diff #175072)

I don't undestand why COPYs would be more likely in the predecessor blocks of the PHIs (this is before PHI lowering after all). And it also doesn't make any sense to me to have different behavior depending on where the COPY is, can't we just always look through the COPY and change SrcReg?

140 ↗(On Diff #175072)

This is a nice change!

test/CodeGen/X86/opt_phis2.mir
15–35 ↗(On Diff #175072)

Things should also work without this block, as you already define all the register classes on the MIR operands themselfes.

36–39 ↗(On Diff #175072)

I suspect you can simply drop this block as well for this test

I'm not too familiar with the history of OptimizePHIs but some things feel odd to me here:

  • The comments in the pass make it seem like it is designed to catch dataflow around loops, yet the motivation for your changes appears to be simpler phis without loops involved.

This pass is more about PHIs cycles rather than about dataflow loops (though they are related). I've just expanded this pass to work with more complex PHIs configuration. I wanted firstly to rename function IsSingleValuePHICycle() to something like IsSingleValuePHICycleOrChain() but then realized that before it was not only about cycle as well (PHIs configuration could contain several cycles). Actual name of function (before or after my change) should be IsSingleValuePHIGraph() or IsSingleValuePHIConfiguration(). The common sense remained the same -- we have several (possibly single) PHIs connected by cycle/cycles/chain with only single external register. You are right in the sense that with patch this pass can eliminate single PHI node without cycle to other PHI nodes, but one can still regard it as "PHI loop" to itself through COPYied register. And you are right that it was my motivation initially (for further MachineCSE/GVN simplification), but this case was expanded to more common fortunately. Also it triggered several enhancements in regression tests having phi cycles.
What I really forgot is to add some comments to this function description to make it more clear. Will it be enough?

  • I keep wondering why we bother with COPYs here at all. If the COPYs are really trivial COPYs between the same register classes then we really should just remove them when we are still in MachineSSA. Do you know how they are created? Maybe you can simply stop their creation instead?
  • Skipping COPYs like this is also a bit inconsequential and will miss several cases: You could have more than 1 COPY in a row or switch between register classes. The PeepholeOptimizer pass already handles these more complex cases, so maybe we rather should look for running OptimizePHIs after the Peephole Optimizer so those COPYs are optimized? (Not sure though if there are other negative effects of running OptimizePHIs later or PeepholeOpt earlier).

These COPYs are created while MIR instruction selection, for instance, llvm instr %2 = bitcast <4 x i64> %1 to <8 x i32> translates to %2 = COPY %1. Not sure this COPYs could be eliminated during X86 DAG->DAG itself.
I'm not familiar with PeepholeOptimizer, are you sure it is aimed to handle such COPYs? I've tried llc -run-pass=peephole-opt opt_phis2.mir, haven't seen any COPYs propagation on my test sample opt_phis2.mir. Nevertheless, eliminating COPYs is not enough to optimize phis at the samples like this one. Without patch this pass even cannot optimize %a = PHI %b, %bb1, %b, %bb2 to %a = COPY %b. Actually COPYs are auxilarly thing in this pass. I've explored MachineCSE pass before this patch coding -- it uses auxiliary copy propagation as well to help main optimization (see PerformTrivialCopyPropagation()). Actually OptimizePHI pass itself (without my patch) already uses copy propagation -- but only to get next possible PHI node, not to get possible same register. One can do copy propagation (in MachineSSA) in any preceding pass (which one?), or just add several lines in passes where it is needed (MachineCSE, OptimizePHIs, ...). After breaking SSA form such redundant COPIes will be eliminated anyway.
Yes, you're right, this pass checks only one COPY in a row but that is true before my patch as well, I've just added the case when several COPYs leads to the same register. Though I believe more than one COPY in a row is rare case.

With all that said, I don't expect the patch to break anything, so no principal objections to committing it...

Thank you for deep review! I'm to remove redundant if-condition of COPY locations which you have noticed.

anton-afanasyev marked 7 inline comments as done.Dec 9 2018, 11:38 PM
anton-afanasyev added inline comments.
lib/CodeGen/OptimizePHIs.cpp
123–125 ↗(On Diff #175072)

Yes, you are right, this check is redundant! I'm to remove it.

test/CodeGen/X86/opt_phis2.mir
15–35 ↗(On Diff #175072)

Yes, thank you!

36–39 ↗(On Diff #175072)

Ok.

anton-afanasyev edited the summary of this revision. (Show Details)
anton-afanasyev marked 3 inline comments as done.

Update according to @MatzeB remarks.

Hi @MatzeB, as for special pass for COPYs propagation, I haven't found natural place for it after MIR ISel and before OptimizationPHIs among existing passes:

X86 DAG->DAG Instruction Selection
MachineDominator Tree Construction
Local Dynamic TLS Access Clean-up
X86 PIC Global Base Reg Initialization
Expand ISel Pseudo-instructions
X86 Domain Reassignment Pass
Early Tail Duplication
Optimize machine instruction PHIs

I think such pass could be useful right before OptimizePHIs, possibly making special passes in OptimizePHIs and MachineCSE needless.
But it is not related straightforwardly to this patch since COPYs check has been already in place before patch.

Hi Matthias, is it ok for LGTM now?

MatzeB accepted this revision.Dec 14 2018, 3:32 PM

LGTM

This revision is now accepted and ready to land.Dec 14 2018, 3:32 PM
RKSimon accepted this revision.Dec 15 2018, 5:01 AM

LGTM cheers

This revision was automatically updated to reflect the committed changes.