This is an archive of the discontinued LLVM Phabricator instance.

Optimization bisect support in X86-specific passes
ClosedPublic

Authored by andrew.w.kaylor on Apr 22 2016, 3:06 PM.

Details

Summary

This patch adds opt-in calls to the x86-specific passes which can be skipped while still producing correct code.

The following passes are not opting in to bisection because they cannot be skipped:

FPS ("X86 FP Stackifier")
X86ExpandPseudo
WinEHStatePass
CGBR ("X86 PIC Global Base Reg Initialization")

Diff Detail

Repository
rL LLVM

Event Timeline

andrew.w.kaylor retitled this revision from to Optimization bisect support in X86-specific passes.
andrew.w.kaylor updated this object.
andrew.w.kaylor added a reviewer: DavidKreitzer.
andrew.w.kaylor set the repository for this revision to rL LLVM.
andrew.w.kaylor added a subscriber: llvm-commits.
DavidKreitzer edited edge metadata.Apr 26 2016, 9:44 AM

My comments about vzeroupper make me wonder whether we want skipFunction to be able to make the distinction between skipping a pass for OptBisect and skipping a pass for -O0. I can certainly imagine wanting to run some functionally optional "optimization" passes at -O0, e.g. a pass that can improve the size/performance of the generated code w/o negatively affecting debuggability.

lib/Target/X86/X86FixupBWInsts.cpp
138 ↗(On Diff #54735)

I see that you chose to put the skipFunction calls first before other pass-skipping checks. Did you give any thought to that placement and/or do this intentionally? The effect of putting the call earlier is that the bisection counts will be higher.

For these skipFunction calls, it probably doesn't matter too much, but when we get around to adding shouldRunCase calls, we tend to want to delay the calls as long as possible to avoid wasting bisect numbers on optimization cases that get filtered out by subsequent safety/performance checks.

Along those lines, you might want to make LastBisectNum in OptBisect.h a 64-bit integer rather than a 32-bit integer. It would not surprise me if a 32-bit counter overflows for a large LTO compilation, especially if we are not careful with our calls to shouldRunCase.

lib/Target/X86/X86VZeroUpper.cpp
258 ↗(On Diff #54735)

skipFunction will return true at -O0, right? That is not the behavior we want for the vzeroupper insertion pass. Leaving the machine in a VEX-256 "dirty" state at function call/return boundaries is a very bad idea, because it will result in AVX<=>SSE transition penalties for any subsequent transitions until the next vzeroupper.

The vzeroupper strategy that we defined at the initial implementation of AVX was to assume and preserve a zeroupper state at function entry/call/return boundaries so that AVX128<=>SSE transitions would incur no penalties. Think of it as a kind of "performance ABI". Having just one call to one routine that violates these rules can crater performance for the lifetime of a program.

So perhaps the best approach is to just make VZeroUpperInserter a required pass and document the rationale. I would have no objection to hooking this pass up to the optimization bisector for debugging purposes since the pass is optional from a functional perspective. But that might cause more confusion than it's worth.

My comments about vzeroupper make me wonder whether we want skipFunction to be able to make the distinction between skipping a pass for OptBisect and skipping a pass for -O0. I can certainly imagine wanting to run some functionally optional "optimization" passes at -O0, e.g. a pass that can improve the size/performance of the generated code w/o negatively affecting debuggability.

I think you're right about having "optimization" passes that we want to run at -O0. I would expect that to be uncommon enough and limited enough in scope that it's probably OK not to involve those passes in the bisect, the benefit being that -O0, "optnone" and the bisect all follow the same rules. We could revisit that later if it turns out to limit the usefulness of the bisect.

lib/Target/X86/X86FixupBWInsts.cpp
138 ↗(On Diff #54735)

I started out always making the skip check be the first thing that happens so that it doesn't creep in to the logic of the run functions, but as I've been making more of these changes I've started putting it behind more trivial checks like this to trim the extra call in cases where bisection isn't being done. Given the logarithmic order of the bisection I wouldn't think it will make much difference at this level, but I suppose once we get into having all the cases instrumented those will add up quickly.

It's definitely best to be consistent and I don't have any reason not to put the skip check behind trivial checks like this, so I'll standardize on that.

lib/Target/X86/X86VZeroUpper.cpp
258 ↗(On Diff #54735)

It turns out that there is a lit test which verifies that nothing is skipped by the "optnone" attribute check that would otherwise be run at -O0 and I think it makes sense for opt bisect to follow the same rule. So given that we think VZeroUpper should be run at O0 I'll remove this check.

andrew.w.kaylor edited edge metadata.

Removed skip calls from passes that are run at -O0
Moved skip checks behind other single variable conditions

DavidKreitzer accepted this revision.Apr 26 2016, 1:55 PM
DavidKreitzer edited edge metadata.

Thanks for the fixes, Andy.

Hmmm, it was probably unintentional for X86CallFrameOptimization to be run at -O0. (It was a consequence of this change: http://reviews.llvm.org/D18573.) I doubt we want to do that, because it potentially has an adverse affect on debugging. Based on a quick experiment, it looks like the optimization fails to detect the pattern in the -O0 machine IR anyway, but that is probably also coincidental. At any rate, don't worry about it for this patch. I'm doing some work in X86CallFrameOptimization and can add the code to make sure it gets disabled at -O0.

So this patch LGTM.

This revision is now accepted and ready to land.Apr 26 2016, 1:55 PM
This revision was automatically updated to reflect the committed changes.