This is an archive of the discontinued LLVM Phabricator instance.

[LCSSA] Handle case with single new PHI faster.
ClosedPublic

Authored by fhahn on Jan 21 2019, 12:44 PM.

Details

Summary

If there is only a single available value, all uses must be dominated by
the single value and there is no need to search for a reaching
definition.

This drastically speeds up LCSSA in some cases. For the test case
from PR37202, it speeds up LCSSA construction by 4 times.

Time-passes without this patch for test case from PR37202:

Total Execution Time: 29.9285 seconds (29.9276 wall clock)

---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
5.2786 ( 17.7%)   0.0021 (  1.2%)   5.2806 ( 17.6%)   5.2808 ( 17.6%)  Unswitch loops
4.3739 ( 14.7%)   0.0303 ( 18.1%)   4.4042 ( 14.7%)   4.4042 ( 14.7%)  Loop-Closed SSA Form Pass
4.2658 ( 14.3%)   0.0192 ( 11.5%)   4.2850 ( 14.3%)   4.2851 ( 14.3%)  Loop-Closed SSA Form Pass #2
2.2307 (  7.5%)   0.0013 (  0.8%)   2.2320 (  7.5%)   2.2318 (  7.5%)  Loop Invariant Code Motion
2.0888 (  7.0%)   0.0012 (  0.7%)   2.0900 (  7.0%)   2.0897 (  7.0%)  Unroll loops
1.6761 (  5.6%)   0.0013 (  0.8%)   1.6774 (  5.6%)   1.6774 (  5.6%)  Value Propagation
1.3686 (  4.6%)   0.0029 (  1.8%)   1.3716 (  4.6%)   1.3714 (  4.6%)  Induction Variable Simplification
1.1457 (  3.8%)   0.0010 (  0.6%)   1.1468 (  3.8%)   1.1468 (  3.8%)  Loop-Closed SSA Form Pass #4
1.1384 (  3.8%)   0.0005 (  0.3%)   1.1389 (  3.8%)   1.1389 (  3.8%)  Loop-Closed SSA Form Pass #6
1.1360 (  3.8%)   0.0027 (  1.6%)   1.1387 (  3.8%)   1.1387 (  3.8%)  Loop-Closed SSA Form Pass #5
1.1331 (  3.8%)   0.0010 (  0.6%)   1.1341 (  3.8%)   1.1340 (  3.8%)  Loop-Closed SSA Form Pass #3

Time passes with this patch

Total Execution Time: 19.2802 seconds (19.2813 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 4.4234 ( 23.2%)   0.0038 (  2.0%)   4.4272 ( 23.0%)   4.4273 ( 23.0%)  Unswitch loops
 2.3828 ( 12.5%)   0.0020 (  1.1%)   2.3848 ( 12.4%)   2.3847 ( 12.4%)  Unroll loops
 1.8714 (  9.8%)   0.0020 (  1.1%)   1.8734 (  9.7%)   1.8735 (  9.7%)  Loop Invariant Code Motion
 1.7973 (  9.4%)   0.0022 (  1.2%)   1.7995 (  9.3%)   1.8003 (  9.3%)  Value Propagation
 1.4010 (  7.3%)   0.0033 (  1.8%)   1.4043 (  7.3%)   1.4044 (  7.3%)  Induction Variable Simplification
 0.9978 (  5.2%)   0.0244 ( 13.1%)   1.0222 (  5.3%)   1.0224 (  5.3%)  Loop-Closed SSA Form Pass #2
 0.9611 (  5.0%)   0.0257 ( 13.8%)   0.9868 (  5.1%)   0.9868 (  5.1%)  Loop-Closed SSA Form Pass
 0.5856 (  3.1%)   0.0015 (  0.8%)   0.5871 (  3.0%)   0.5869 (  3.0%)  Unroll loops #2
 0.4132 (  2.2%)   0.0012 (  0.7%)   0.4145 (  2.1%)   0.4143 (  2.1%)  Loop Invariant Code Motion #3

Diff Detail

Event Timeline

fhahn created this revision.Jan 21 2019, 12:44 PM

SSAUpdater has code to insert undef in certain cases. Can you prove those cases don't apply to all callers of GetValueAtEndOfBlock?

SSAUpdater has code to insert undef in certain cases. Can you prove those cases don't apply to all callers of GetValueAtEndOfBlock?

SSAUpdater has code to insert undef in certain cases. Can you prove those cases don't apply to all callers of GetValueAtEndOfBlock?

IIUC, SSAUpdater insert undefs for uses in unreachable blocks. As it is, this patch would rename uses in unreachable block to use the lcssa PHI node instead of undef. It should be straight forward to do such checks in LCSSA (DT is available), so I think it is easier to move the code to LCSSA. What do you think?

If LCSSA just doesn't care that it's creating dead references to variables instead of undefs, it would be faster to completely skip the check. Otherwise, sure, that makes sense.

fhahn added a comment.Jan 22 2019, 1:12 PM

If LCSSA just doesn't care that it's creating dead references to variables instead of undefs, it would be faster to completely skip the check. Otherwise, sure, that makes sense.

LCSSA doesn't care about it, but I thought it might be better to prune the use list?

That's sort of out of scope for LCSSA, I think. I mean, yes, the unreachable check is cheap, but if it comes up in practice we should just fix the pass pipeline to prune dead code more aggressively.

fhahn added a comment.Jan 22 2019, 1:19 PM

Sounds good to me, I'll move the code to LCSSA without DT checks. Thanks!

fhahn updated this revision to Diff 183377.Jan 24 2019, 1:09 PM
fhahn retitled this revision from [SSAUpdater] Handle case with single available value faster. to [LCSSA] Handle case with single new PHI faster..
fhahn edited the summary of this revision. (Show Details)

Moved change to LCSSA. Also shortens the runtime a tiny bit more.

efriedma added inline comments.Jan 24 2019, 1:50 PM
llvm/lib/Transforms/Utils/LCSSA.cpp
205 ↗(On Diff #183377)

Maybe worth mentioning that this can call ValueIsRAUWd multiple times for the same value. (I don't think it's a problem, but took me a minute to understand what this is doing.)

226 ↗(On Diff #183377)

I'm not sure I understand why you're changing the logic here for the AddedPHIs.size() != 1 case.

fhahn marked 2 inline comments as done.Jan 24 2019, 1:54 PM
fhahn added inline comments.
llvm/lib/Transforms/Utils/LCSSA.cpp
205 ↗(On Diff #183377)

Yep, I can also move it to avoid calling it multiple times.

226 ↗(On Diff #183377)

In the AddedPHIs.size() == 1 case, SSAUpdate won't be used to rewrite the uses, so we can just use the only inserted PHI.

efriedma added inline comments.Jan 24 2019, 1:58 PM
llvm/lib/Transforms/Utils/LCSSA.cpp
226 ↗(On Diff #183377)

Let me rephrase: the setOperand() call used to be guarded by an if statement. Why are you making the call unconditional?

fhahn updated this revision to Diff 183408.Jan 24 2019, 2:53 PM
fhahn marked 2 inline comments as done.

Updated comment, added back conditional.

llvm/lib/Transforms/Utils/LCSSA.cpp
226 ↗(On Diff #183377)

Ah sorry, I missed your point!

I was just going by the comment, which I think is not completely true. We currently handle debug values residing in blocks that are part of the subset traversed from renamed uses to defs (those traversed by SSAUpdater while rewriting). But debug values could also be used outside this subset, so we need the conditional. I've updated the comment and added it back.

fhahn marked an inline comment as done.Jan 24 2019, 3:03 PM
fhahn added inline comments.
llvm/lib/Transforms/Utils/LCSSA.cpp
226 ↗(On Diff #183377)

Oh and I'll add a test case illustrating that (seems to be missing ATM tomorrow ); it's too late on my side of the planet :)

fhahn updated this revision to Diff 183503.Jan 25 2019, 2:47 AM

Update LCSSA llvm.dbg.value test case with 2 additional interesting cases: a llvm.dbg.value call in a block without a placed PHI, but a use re-written by SSAUpdater and a llvm.dbg.value call we cannot rewrite, because it was not visited by SSUpdater (because it is 'after' all rewritten uses).

fhahn added a comment.Jan 25 2019, 2:48 AM

(I also restructured the test case to make it a bit easier to read, I could move that to a separate commit, if required)

fhahn added a comment.Feb 1 2019, 6:13 AM

ping. Eli, does the latest update address your comments?

This revision is now accepted and ready to land.Feb 1 2019, 12:04 PM
fhahn added a comment.Feb 1 2019, 3:18 PM

Thank you very much for the reviews :)

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2019, 7:26 AM