This is an archive of the discontinued LLVM Phabricator instance.

[PHIElimination] Compile time optimization for huge functions.
ClosedPublic

Authored by jonpa on Jan 21 2020, 5:35 PM.

Details

Summary

This is a compile-time optimization for PHIElimination (splitting of critical edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As discussed there, the way to remedy the slowdowns with huge functions seems to be to pre-compute the live-in registers for each MBB in an efficient way in PHIElimination.cpp and then pass that information along to LiveVariabless::addNewBlock().

The reported test case there achieves a significant improved compile time with this patch:

TRUNK:

time clang -O3 -march=z10 crash0.i -c -ftime-report |& grep "Eliminate PHI" 
36.5362 ( 46.1%)   0.0137 (  3.2%)  36.5499 ( 45.8%)  36.5584 ( 45.8%)  Eliminate PHI nodes for register allocation 
real    1m20.967s
user    1m19.779s
sys     0m0.623s 

time clang -O3 -target x86_64 crash0.i -c -ftime-report |& grep "Eliminate PHI"
0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0028 (  0.1%)  Eliminate PHI nodes for register allocation
real    0m3.851s
user    0m3.604s
sys     0m0.101s 
 
PATCHED: 
 
time clang -O3 -march=z10 crash0.i -c -ftime-report |& grep "Eliminate PHI"
0.0399 (  0.1%)   0.0001 (  0.0%)   0.0400 (  0.1%)   0.0391 (  0.1%)  Eliminate PHI nodes for register allocation
real    0m43.473s
user    0m42.967s
sys     0m0.497s 
  
time clang -O3 -target x86_64 crash0.i -c -ftime-report |& grep "Eliminate PHI"
0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)  Eliminate PHI nodes for register allocation
real    0m3.645s  
user    0m3.559s 
sys     0m0.090s

There is still a big slowdown for SystemZ, but the PHIElimination slowdown is gone. I would guess that this is an improvement on any given target where the number of virtual registers and basic blocks are great enough, but I have not tried this.

  • Is it OK to trust that MBB numbers don't change, or would it be best to use a map from MachineBasicBlock* to its SparseBitVector?
  • I choose to store the index of the virtual register instead of the register number, since that intuitively seems wiser (starting at 0), but curious if that's an improvement (but it's just one extra line of code...)
  • SparseBitVector was proven faster than BitVector...

Diff Detail

Event Timeline

jonpa created this revision.Jan 21 2020, 5:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 21 2020, 5:35 PM
bjope added inline comments.Jan 22 2020, 10:59 AM
llvm/lib/CodeGen/PHIElimination.cpp
166

I think it is a little bit ugly to do this after setting leaveSSA above.
My suggestion is to postpone the leaveSSA toggling until after populating the LiveInSets (considering that this code relies on still having ssa form).

177

This might needs an explanatory code comment.

If there is more than one kill, then it isn't killed in DefMBB => add to liveinset. (Ok!)

Else, if there is one kill and it isn't DefMBB => add to liveinset. (Ok!)

What I think is missing here is a guard to avoid the Kills.front() deref when there are no kills (may happen when a phi is the last use).

jonpa updated this revision to Diff 239679.Jan 22 2020, 1:01 PM
jonpa marked 4 inline comments as done.

Thanks for review!

Patch updated - see inline comments.

llvm/lib/CodeGen/PHIElimination.cpp
166

OK - moved it down.

177

Added comment.

What I think is missing here is a guard to avoid the Kills.front() deref when there are no kills (may happen when a phi is the last use).

Ah, right!

jonpa added a comment.Feb 4 2020, 5:51 AM

Ping!

This patch is very convincing to me since I saw just last week dozens and dozens of huge csmith programs suffering from significant slow-downs where PHI elimination was a big part of it. PHI Elim can take more than a minute without this patch, while with this patch all of those test cases do not spend much time in PHIElim anymore.

I have only tried this on SystemZ, but it really should be target independent. Who could try this on some other architecture as well (I could provide huge csmith programs if needed)?

bjope accepted this revision.Feb 5 2020, 1:02 PM

LGTM

This revision is now accepted and ready to land.Feb 5 2020, 1:02 PM
qcolombet accepted this revision.Feb 5 2020, 1:40 PM

LGTM with a nit. Use a reference instead of a pointer (or check the pointer and fallback to the old addNewBlock when the pointer is nullptr)

llvm/lib/CodeGen/LiveVariables.cpp
820

Shouldn't you check that LiveInSets is not nullptr?
Otherwise, use a reference in the API.

This revision was automatically updated to reflect the committed changes.
jonpa marked 2 inline comments as done.
jonpa added inline comments.Feb 5 2020, 3:14 PM
llvm/lib/CodeGen/LiveVariables.cpp
820

Changed the LiveInSets argument to LiveVariables::addNewBlock() to be a reference instead.