This is an archive of the discontinued LLVM Phabricator instance.

[BreakFalseDeps] ignore function with minsize attribute
ClosedPublic

Authored by spatel on Sep 9 2019, 12:02 PM.

Details

Summary

This came up in the x86-specific:
https://bugs.llvm.org/show_bug.cgi?id=43239
...but it seems like a general problem for the BreakFalseDeps pass. Dependencies are always broken by adding some other instruction, so that should be avoided if the overall goal is to minimize size.

Diff Detail

Event Timeline

spatel created this revision.Sep 9 2019, 12:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2019, 12:02 PM

This would make it policy for -Oz builds to not bother to break dependencies but -Os/-O0+ builds would still do.

Does anything else use BreakFalseDeps?

llvm/lib/CodeGen/BreakFalseDeps.cpp
15–16

NFC

This would make it policy for -Oz builds to not bother to break dependencies but -Os/-O0+ builds would still do.

Correct - that matches my interpretation of the vague clang docs:

-Os Like -O2 with extra optimizations to reduce code size.
-Oz Like -Os (and thus -O2), but reduces code size further.

In my experience, -Oz ("minsize") is used to ensure that code fits in some constrained space, so any extras that can be eliminated should be eliminated. Perf is a secondary concern. This is different than -Os ("optsize") where we are trying to balance speed and size.

Does anything else use BreakFalseDeps?

x86 and ARM are the only targets in trunk that use it from what I can see. Both use it the same way: insert dummy (not required for correctness) instructions to avoid known uarch hazards.

spatel updated this revision to Diff 219531.Sep 10 2019, 6:03 AM

Patch updated:
Pre-committed NFC code comment diffs in rL371516.

spatel marked an inline comment as done.Sep 10 2019, 6:03 AM

Adding some ARM folks since ARM is (the only) other in-tree target that would be affected by this change.
It's not strictly stated that dependency-breaking always means adding an instruction, but that’s the practical definition based on x86 and ARM's implementation of this pass's hooks.

For VEX instructions don’t we just use the other input register to break the dependency without adding an instruction?

For VEX instructions don’t we just use the other input register to break the dependency without adding an instruction?

That sounds right, but it's not controlled by this pass. That's part of memory op folding?
If I'm seeing it correctly, that's already more aggressively optimizing for size than what we're doing here:

// Avoid partial and undef register update stalls unless optimizing for size.
if (!MF.getFunction().hasOptSize() &&
    (hasPartialRegUpdate(MI.getOpcode(), Subtarget, /*ForLoadFold*/true) ||
     shouldPreventUndefRegUpdateMemFold(MF, MI)))
  return nullptr;
spatel updated this revision to Diff 219546.Sep 10 2019, 8:07 AM

Patch updated:
Rebased with improved test coverage for ARM and x86:
rL371526
rL371525

spatel marked an inline comment as done.Sep 10 2019, 8:12 AM
spatel added inline comments.
llvm/test/CodeGen/X86/sqrt-partial.ll
78

The AVX form of sqrtsd doesn't have a partial reg update, so no difference here.

For VEX instructions don’t we just use the other input register to break the dependency without adding an instruction?

That sounds right, but it's not controlled by this pass. That's part of memory op folding?
If I'm seeing it correctly, that's already more aggressively optimizing for size than what we're doing here:

// Avoid partial and undef register update stalls unless optimizing for size.
if (!MF.getFunction().hasOptSize() &&
    (hasPartialRegUpdate(MI.getOpcode(), Subtarget, /*ForLoadFold*/true) ||
     shouldPreventUndefRegUpdateMemFold(MF, MI)))
  return nullptr;

We disable folding if it takes away our opportunity to use the other input register. But something still needs to pick a register for the implicit_def operand. I though that was this pass? I think in absense of this pass it may just be set to the xmm0 by the register allocator. Do we have tests where xmm0 isn't the right choice?

I believe this test case compiled with avx needs this pass.

define double @minsize(double %x, double %y) minsize {

%t6 = tail call fast double @llvm.sqrt.f64(double %y)
%t= fadd fast double %t6, %y
ret double %t6

}
declare double @llvm.sqrt.f64(double)

I believe this test case compiled with avx needs this pass.

define double @minsize(double %x, double %y) minsize {

%t6 = tail call fast double @llvm.sqrt.f64(double %y)
%t= fadd fast double %t6, %y
ret double %t6

}
declare double @llvm.sqrt.f64(double)

Thanks! I didn't understand where that happened - it's in here (and no existing regression tests appear to capture that).
So yes, to limit this pass, we need to use a slightly finer check to distinguish between transforms that are only changing registers vs. adding instructions.
Added more tests with rL371528 and rL371551.

spatel updated this revision to Diff 219595.Sep 10 2019, 1:29 PM

Patch updated:
Refine the places where minsize comes into play (where the target is called to add an instruction via TII->breakPartialRegDependency()).

spatel marked an inline comment as done.Sep 10 2019, 1:31 PM
spatel added inline comments.
llvm/test/CodeGen/X86/sqrt-partial.ll
99

This is the test showing that the pass is still capable of changing a reg.

X86 looks good to me.

Ping - ARM ok?

efriedma added inline comments.Sep 19 2019, 4:15 PM
llvm/test/CodeGen/ARM/a15-partial-update.ll
59

There's a missed optimization here: we could avoid both the false dependence and the extra instruction by modifying the vld1 to a splat load. But generally looks fine.

spatel updated this revision to Diff 221094.Sep 20 2019, 12:35 PM

Patch updated:
Added a TODO comment about using a splat load on the ARM tests.

spatel accepted this revision.Sep 22 2019, 8:43 AM

Marking patch as accepted based on "looks good" and "looks fine". I'll hold off on commit for a ~day in case there are any more comments.

This revision is now accepted and ready to land.Sep 22 2019, 8:43 AM
RKSimon accepted this revision.Sep 22 2019, 9:46 AM

LGTM - maybe raise a bug about the missing splat loads in a15-partial-update.ll ?

LGTM - maybe raise a bug about the missing splat loads in a15-partial-update.ll ?

Sure - https://bugs.llvm.org/show_bug.cgi?id=43410

This revision was automatically updated to reflect the committed changes.