This is an archive of the discontinued LLVM Phabricator instance.

[ConstraintElim] Enable pass by default.
ClosedPublic

Authored by fhahn on Oct 13 2022, 1:53 PM.

Details

Summary

The pass should help to close a functional gap when it comes to
reasoning about related conditions in a relatively general way.

It addresses multiple existing issues (linked below) and the need for a
more powerful reasoning system was also discussed recently in
https://discourse.llvm.org/t/rfc-alternative-approach-of-dealing-with-implications-from-comparisons-through-pos-analysis/65601/7

On AArch64, the new pass performs ~2000 simplifications on
MultiSource,SPEC2006,SPEC2017 with -O3.

Compile-time impact:

NewPM-O3: +0.20%
NewPM-ReleaseThinLTO: +0.32%
NewPM-ReleaseLTO-g: +0.28%

https://llvm-compile-time-tracker.com/compare.php?from=f01a3a893c147c1594b9a3fbd817456b209dabbf&to=577688758ef64fb044215ec3e497ea901bb2db28&stat=instructions:u

Fixes #49344.
Fixes #47888.
Fixes #48253.
Fixes #49229.
Fixes #58074.

Diff Detail

Unit TestsFailed

Event Timeline

fhahn created this revision.Oct 13 2022, 1:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 1:53 PM
fhahn requested review of this revision.Oct 13 2022, 1:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 1:53 PM
asbirlea accepted this revision.Oct 14 2022, 11:52 AM

I cannot see any major benefits or major regressions in the limited testing I did so far. I have a concern on it pushing workloads with already high compile time over our limit.
Considering the new optimizations performed, I consider it a net improvement to the compiler, so I'm going to say let's move forward with this and see how it's received.

This revision is now accepted and ready to land.Oct 14 2022, 11:52 AM

As mentioned previously, I'm generally on board with this. I'll try to review the implementation sometime soon.

llvm/lib/Passes/PassBuilderPipelines.cpp
1647

Land this change separately?

Also, looking at the position of ConstraintElimination in the normal pipeline, did you experiment with different pipeline positions? The current scheduling looks too early to me, it runs directly after SROA/EarlyCSE at the start of the function simplification pipeline. I would expect that running ConstraintElimination at least after the first InstCombine pass would be beneficial, to ensure that IR is in canonical form.

Could you also mention this new otimization in release news?

xbolva00 added inline comments.Oct 16 2022, 3:20 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
1647

+1, I think that this position is not ideal, nikic’s suggestion sounds good

Thanks for taking a look! I'll probably be out of office for some or most of the next few weeks, so I'll definitely hold off with landing this until then. I am also keeping an eye out for more motivating uses cases that are not handled at the moment.

llvm/lib/Passes/PassBuilderPipelines.cpp
1647

The main reason for the current placement is to run before JumpThreading/SimplifyCFG, which I think removed some opportunities due to branch folding in earlier versions.

Moving after instcombine results in some small improvements and regressions in the end-to-end tests I have. I'll take another look.

khchen added a subscriber: khchen.Dec 12 2022, 8:25 PM
fhahn updated this revision to Diff 486011.Jan 3 2023, 9:13 AM

Rebase after splitting off pass position changes, update tests.

fhahn edited the summary of this revision. (Show Details)Jan 3 2023, 9:29 AM
fhahn added a comment.Jan 3 2023, 9:37 AM

I think all comments have been addressed and this is ready to land.

nikic added a comment.Jan 3 2023, 11:39 AM

One remaining question I have is whether this pass has any pathological cases we may need to limit. Say, if we have a generated function that contains a few thousand icmps, is this going to be handled gracefully (i.e., sub-quadratically)?

fhahn added a comment.Jan 3 2023, 2:32 PM

One remaining question I have is whether this pass has any pathological cases we may need to limit. Say, if we have a generated function that contains a few thousand icmps, is this going to be handled gracefully (i.e., sub-quadratically)?

I checked and I didn't add this limit there yet. Put up D140926.

This revision was landed with ongoing or failed builds.Jan 4 2023, 10:01 AM
This revision was automatically updated to reflect the committed changes.

This breaks the sanitizer-x86_64-linux bot: https://lab.llvm.org/buildbot/#/builders/37/builds/19310. Could you please either fix or revert?

Thanks!

fhahn added a comment.Jan 5 2023, 10:25 AM

This breaks the sanitizer-x86_64-linux bot: https://lab.llvm.org/buildbot/#/builders/37/builds/19310. Could you please either fix or revert?

Thanks!

Looks like the test contains a redundant condition which is now removed. Will update the test soon.

@fhahn We have cpp file where compilation with -fsanitize=undefined,address which regress ~1000% (or hangs, out build system kills after that). I will try to make reproducer, but letting you know if you have ideas.

fhahn added a comment.Jan 16 2023, 6:06 AM

@fhahn We have cpp file where compilation with -fsanitize=undefined,address which regress ~1000% (or hangs, out build system kills after that). I will try to make reproducer, but letting you know if you have ideas.

Interesting! Looking forward to a reproducer. Without one it is difficult to speculate what the issue could be.

vitalybuka added a comment.EditedJan 16 2023, 11:51 AM

Different ubsan checks have different effect on the pass, but looks like -fsanitize=pointer-overflow is most significant

With just only this check we see timing like this.

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 195.4380 seconds (198.3854 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  93.1270 ( 48.2%)   0.7577 ( 32.9%)  93.8847 ( 48.0%)  95.2275 ( 48.0%)  ConstraintEliminationPass
  41.0259 ( 21.2%)   0.6802 ( 29.5%)  41.7061 ( 21.3%)  42.4373 ( 21.4%)  InstCombinePass
  13.8371 (  7.2%)   0.1043 (  4.5%)  13.9414 (  7.1%)  14.0983 (  7.1%)  IndVarSimplifyPass
   8.7696 (  4.5%)   0.1775 (  7.7%)   8.9471 (  4.6%)   9.0888 (  4.6%)  InlinerPass
   5.8936 (  3.1%)   0.0790 (  3.4%)   5.9726 (  3.1%)   6.0713 (  3.1%)  GVNPass

I extracted one of affected functions and running llvm-reduce on IR, I'll continue to reduce, but here intermediate result:

https://reviews.llvm.org/P8300
Run it:
opt -O2 -time-passes report.ll -o /dev/null

I see

===-------------------------------------------------------------------------===
                          Pass execution timing report
===-------------------------------------------------------------------------===
  Total Execution Time: 45.3980 seconds (46.6084 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  40.9242 ( 90.3%)   0.0355 ( 40.1%)  40.9597 ( 90.2%)  41.8877 ( 89.9%)  ConstraintEliminationPass
   2.6194 (  5.8%)   0.0080 (  9.0%)   2.6274 (  5.8%)   2.7320 (  5.9%)  InstCombinePass

From profile ConstraintSystem::eliminateUsingFM is top offender. NR.push_back(N); is executed very frequently, e.g. on the function from reproducer with ~70K IR instructions that push back called about 120M times.
Probably something is O(N^2), where N is number of blocks?

Does reverting makes sense before we have a fix?

Also regardless of the fix, should this be -O3 only?

I see no reason why this should be -O3 only. It is quite expected that some issues pop up but they all can be fixed.

Does reverting makes sense before we have a fix?

You can use the flag to disable this pass for now.

Reverting should be acceptable here.
AFAIU, the norm for work with high impact (adding a pass to the pipeline is one) is to do the work itself under a flag so the work itself does not get reverted if there are issues, and for the flag flip patch (such as this one) to be reverted if issues arise.
When the issue is resolved, the flag flip patch should be re-landed.
Regressing a test by ~1000%, seems like a solid reason to revert if a solution is not immediately available, especially with having a reproducer available that confirms the issue.

fhahn added a comment.Jan 17 2023, 3:42 PM

Reverting should be acceptable here.
AFAIU, the norm for work with high impact (adding a pass to the pipeline is one) is to do the work itself under a flag so the work itself does not get reverted if there are issues, and for the flag flip patch (such as this one) to be reverted if issues arise.
When the issue is resolved, the flag flip patch should be re-landed.
Regressing a test by ~1000%, seems like a solid reason to revert if a solution is not immediately available, especially with having a reproducer available that confirms the issue.

Agreed, I had an initial look and I don't think there will be an immediate fix. I'll revert the commit early tomorrow morning.

fhahn added a comment.Feb 2 2023, 9:00 AM

@vitalybuka I pushed a commit to improve performance for the test case you shared. Time spent in ConstraintElimination goes from 43s to ~3s with time spent in InstCombine being about 1.3s. Could you check your full reproduce if the big increase is fixed now with ConstraintElimination enabled?

fhahn added a comment.Feb 6 2023, 8:49 AM

I pushed another optimization that brings time spent in CE down to ~0.7s (vs 1.2s in InstCombine). I plan to re-land this soon.

SGTM, thank you Florian!

LGTM

-enable-constraint-elimination=0 (on HEAD)

Total Execution Time: 148.1427 seconds (153.6694 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  60.7388 ( 42.2%)   1.9181 ( 45.1%)  62.6569 ( 42.3%)  65.2129 ( 42.4%)  InstCombinePass
  15.4982 ( 10.8%)   0.3211 (  7.5%)  15.8193 ( 10.7%)  16.2964 ( 10.6%)  IndVarSimplifyPass
  14.5336 ( 10.1%)   0.4696 ( 11.0%)  15.0032 ( 10.1%)  15.5363 ( 10.1%)  InlinerPass
   8.4646 (  5.9%)   0.2521 (  5.9%)   8.7167 (  5.9%)   9.0442 (  5.9%)  GVNPass

-enable-constraint-elimination=1 (on HEAD with 1373f203cea8)

Total Execution Time: 178.4404 seconds (184.3066 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
57.9196 ( 33.3%)   1.5966 ( 37.0%)  59.5161 ( 33.4%)  61.6656 ( 33.5%)  InstCombinePass
35.2887 ( 20.3%)   0.6996 ( 16.2%)  35.9883 ( 20.2%)  37.0314 ( 20.1%)  ConstraintEliminationPass
14.5506 (  8.4%)   0.2951 (  6.8%)  14.8457 (  8.3%)  15.3015 (  8.3%)  IndVarSimplifyPass
14.3476 (  8.2%)   0.4198 (  9.7%)  14.7674 (  8.3%)  15.2595 (  8.3%)  InlinerPass
 8.2571 (  4.7%)   0.2126 (  4.9%)   8.4697 (  4.7%)   8.7601 (  4.8%)  GVNPass

-enable-constraint-elimination=1 (on 7d3a181c8c with NO 1373f203cea8)

Total Execution Time: 919.3991 seconds (919.3875 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  806.2299 ( 87.8%)   0.3594 ( 22.6%)  806.5893 ( 87.7%)  806.6486 ( 87.7%)  ConstraintEliminationPass
  49.9876 (  5.4%)   0.5464 ( 34.3%)  50.5341 (  5.5%)  50.5068 (  5.5%)  InstCombinePass
  11.1020 (  1.2%)   0.1577 (  9.9%)  11.2596 (  1.2%)  11.2566 (  1.2%)  InlinerPass
  10.7445 (  1.2%)   0.0358 (  2.2%)  10.7803 (  1.2%)  10.7772 (  1.2%)  IndVarSimplifyPass
   6.2082 (  0.7%)   0.0497 (  3.1%)   6.2579 (  0.7%)   6.2542 (  0.7%)  GVNPass