This is an archive of the discontinued LLVM Phabricator instance.

Require Dominator Tree For SROA, improve compile-time
ClosedPublic

Authored by mehdi_amini on Aug 22 2015, 10:26 PM.

Details

Summary

TL-DR: SROA is followed by EarlyCSE which requires the DominatorTree.
There is no reason not to require it up-front for SROA.

Some history is necessary to understand why we ended-up here.

r123437 switched the second (Legacy)SROA in the optimizer pipeline to
use SSAUpdater in order to avoid recomputing the costly
DominanceFrontier. The purpose was to speed-up the compile-time.

Later r123609 removed the need for the DominanceFrontier in
(Legacy)SROA.

Right after, some cleanup was made in r123724 to remove any reference
to the DominanceFrontier. SROA existed in two flavors: SROA_SSAUp and
SROA_DT (the latter replacing SROA_DF).
The second argument of createScalarReplAggregatesPass was renamed
from UseDomFrontier to UseDomTree.
I believe this is were a mistake was made. The pipeline was not
updated and the call site was still:

PM->add(createScalarReplAggregatesPass(-1, false));

At that time, SROA was immediately followed in the pipeline by
EarlyCSE which required alread the DominatorTree. Not requiring
the DominatorTree in SROA didn't save anything, but unfortunately
it was lost at this point.

When the new SROA Pass was introduced in r163965, I believe the goal
was to have an exact replacement of the existing SROA, this bug
slipped through.

You can see currently:

$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...

FunctionPass Manager
  SROA
  Dominator Tree Construction
  Early CSE

After this patch:

$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...

FunctionPass Manager
  Dominator Tree Construction
  SROA
  Early CSE

This improves the compile time from 88s to 23s for PR17855.

https://llvm.org/bugs/show_bug.cgi?id=17855

Diff Detail

Event Timeline

mehdi_amini retitled this revision from to Require Dominator Tree For SROA, improve compile-time.
mehdi_amini updated this object.
mehdi_amini added a reviewer: chandlerc.
mehdi_amini added a subscriber: llvm-commits.

Hi Mehdi,

Could you please explain why does it save compile time in the end? Why does SROA work slower when DT isn’t available? While your logic to construct DT before SROA seems legit, could that be that there is another bug in SROA itself (and we’re just papering it over now)?

Thanks,
Michael

As discussed on IRC: apparently it is a known limitation of SSAUpdater. It tries to compute dominance on the fly and does not cope well with large CFGs.
That said I'm looking independently if something trivial can be done there.

After this patch SROA will never be used with SSAUpdater in the current pipeline, but it is used by other places so we might encounter the same kind of issue with it in the future.

Chandler might comment here if it makes sense to keep the option to run SROA without the DominatorTree?

Ssaupdater is not a linear time algorithm, there one using domtree is

chandlerc accepted this revision.Aug 23 2015, 3:09 PM
chandlerc edited edge metadata.

I've been wanting to kill the SSAUpdater path in SROA since i started rewriting it. I could swear that when I was rewriting it, adding it here forced a second run of DomTree.... But your analysis is clearly correct, and we no longer need this. I'll proceed with ripping out all the complexity SSAUpdater imposes later. There are still more compile time wins when we *know* that we have domtree here.

This revision is now accepted and ready to land.Aug 23 2015, 3:09 PM