This is an archive of the discontinued LLVM Phabricator instance.

[IPRA][ARM] Spill extra registers at -Oz
ClosedPublic

Authored by ostannard on Nov 7 2019, 2:59 AM.

Details

Summary

When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.

We already have an optimisation which tries fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.

Diff Detail

Event Timeline

ostannard created this revision.Nov 7 2019, 2:59 AM

Looks reasonable (haven't looked at the ARM part.)
The change in shrink wrapping doesn't look correct to me though. Is that particular change require?

llvm/lib/CodeGen/MachineRegisterInfo.cpp
629–634

Could you change unsigned to Register here and in enableCalleeSavedRegister while you are at it?
Fine as a follow-up commit or a preparatory commit :).

llvm/lib/CodeGen/ShrinkWrap.cpp
170 ↗(On Diff #228199)

That change doesn't look right.

llvm/test/CodeGen/ARM/ipra-extra-spills-shrink-wrapping.mir
36 ↗(On Diff #228199)

You should be able to remove the section with the LLVM IR.
Just make sure to remove the LLVM IR basic block names in the names of the MachineBasicBlock.
E.g., bb.0.entry: => bb.0:

ostannard updated this revision to Diff 249633.Mar 11 2020, 8:27 AM
ostannard marked an inline comment as done.
  • Rebase
  • Use Register type instead of unsigned
  • Remove change in shrink wrapping. This was an attempt to avoid an issue where shrink wrapping causes determineCalleeSaves to get called twice, but since shrink wrapping is bad for code size on ARM, it's simpler to disable it at -Oz.
efriedma added inline comments.Mar 11 2020, 5:49 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
1638

I'm not sure why you think funclets are relevant... we don't actually support generating exception-handling code on Windows for 32-bit ARM. Even if they were relevant, I'm not sure why they would need special treatment here.

ostannard marked an inline comment as done.Mar 12 2020, 2:41 AM
ostannard added inline comments.
llvm/lib/Target/ARM/ARMFrameLowering.cpp
1638

I think I must have cargo-culted this from somewhere else, I'll remove it.

ostannard updated this revision to Diff 249868.Mar 12 2020, 2:42 AM

How does this interact with a function using "invoke" to catch an exception? Do we refuse to do IPRA for invokes? Or do we do the right thing, somehow? Either way, I'd like to see a testcase.

Currently, IPRA does trigger on invokes, but only marks registers as preserved on the non-exceptional path. The live-ins of the landing pad block aren't changed, so the compiler won't try to rely on the values of r0-r3 there. I'll add some tests for this. This patch emits correct unwind info (already tested), so unwinding through these functions shouldn't cause any problems.

ostannard updated this revision to Diff 250176.Mar 13 2020, 4:23 AM
  • Add a test for how this interacts with C++ exception handling.
efriedma accepted this revision.Mar 13 2020, 3:04 PM

LGTM after you address the whitespace issues from the lint check

This revision is now accepted and ready to land.Mar 13 2020, 3:04 PM
This revision was automatically updated to reflect the committed changes.

Hi @ostannard ,

Unfortunately this change causes the following failure for a clang that's built with LLVM_EXPENSIVE_CHECKS=On when building compiler-rt for armv7k:

*** Bad machine code: Using an undefined physical register ***
- function:    _ZN11__sanitizer20InternalReallocArrayEPvmmPNS_30SizeClassAllocator32LocalCacheINS_20SizeClassAllocator32INS_4AP32EEEEE
- basic block: %bb.0 entry (0x7f8d6e025e68)
- instruction: $r3 = tMOVr killed $r12, 14, $noreg, debug-location !232; ../compiler-rt/lib/sanitizer_common/sanitizer_allocator.cpp:179:5
- operand 1:   killed $r12
fatal error: error in backend: Found 1 machine code errors.

I've attached an llvm-ir file

that reproduces the problem. You can invoke it like this (llc needs to be built with '-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON'):

llc -mtriple thumbv7k-apple-ios9.0.0 < sanitizer_allocator-810340.ll

Could you please take a look to see what's gone wrong there?

Thanks for the reproducer, I'll look into this.

Hi @ostannard, can you temporarily revert the patch if it's going to take longer to fix the bug?

I think the failure was a bug in the machine verifier: D77783.