This is an archive of the discontinued LLVM Phabricator instance.

[arm] Fix Unnecessary reloads from GOT.
ClosedPublic

Authored by eugenis on Nov 9 2017, 2:23 PM.

Details

Summary

This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.

Diff Detail

Repository
rL LLVM

Event Timeline

eugenis created this revision.Nov 9 2017, 2:23 PM
eugenis set the repository for this revision to rL LLVM.
eugenis added inline comments.Nov 9 2017, 2:32 PM
llvm/test/CodeGen/Thumb2/v8_IT_3.ll
62 ↗(On Diff #122333)

This new code is the load from %bb moved to the end of %entry under condition of %tmp1 == 1. This is made possible by this change, because %entry has address of G@GOT precomputed and the second load is just one instruction.

efriedma added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
3168 ↗(On Diff #122333)

If I'm following correctly, whether "G" points to the global itself or the GOT entry is implicitly controlled by ARMSubtarget::getCPModifier? That's awfully confusing... can we fix it to use target flags passed to getTargetGlobalAddress() to control it instead?

eugenis added inline comments.Nov 9 2017, 3:26 PM
llvm/lib/Target/ARM/ARMISelLowering.cpp
3168 ↗(On Diff #122333)

Yes... That's not ideal.

To make sure we are on the same page: I could add something like ARMII::MO_GOT to TargetGlobalAddress flags, and use to differentiate between LDRLIT_ga_pcrel and MOV_ga_pcrel. This would let us use movt/movw for dso-local globals, yay!

I'm not sure how to access the flag in ARMExpandPseudoInsts, though. Is it copied to the operand's TargetFlags? I'll check.

efriedma added inline comments.Nov 9 2017, 4:12 PM
llvm/lib/Target/ARM/ARMISelLowering.cpp
3168 ↗(On Diff #122333)

Yes, that's what I was thinking.

I think the flag gets copied into TargetFlags. See AArch64II::MO_GOT for inspiration.

eugenis updated this revision to Diff 122373.Nov 9 2017, 4:52 PM

Add ARMII::MO_GOT.

I've switched to MO_GOT, and killed getCPModifier().

I did not figure out how to select between LDRLIT_ga_pcrel and MOV_ga_pcrel based on TargetFlags. Apparently, I need a ComplexPattern. Anyway, I think that it should be done in a separate patch.

This makes sense.

llvm/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h
233 ↗(On Diff #122373)

Missing doc comment.

eugenis updated this revision to Diff 122540.Nov 10 2017, 3:18 PM

Added a comment for MO_GOT.

eugenis marked an inline comment as done.Nov 10 2017, 3:18 PM
efriedma accepted this revision.Nov 10 2017, 3:27 PM

LGTM, assuming you've done appropriate testing.

llvm/test/CodeGen/ARM/GlobalISel/arm-select-globals-pic.mir
59 ↗(On Diff #122540)

It would be nice to implement getSerializableDirectMachineOperandTargetFlags() to make this clearer.

This revision is now accepted and ready to land.Nov 10 2017, 3:27 PM

I've done some testing. All sanitizer tests pass on arm/android, both with and without -asan-with-ifunc (which is what I'm doing this for).

Code size of libclang_rt.asan-arm-android.so is down by 3% (!). It is just a regular library (i.e. no asan instrumentation in it), so I assume that all other code is similarly improved.

Unfortunately, even with this change MachineCSE is not good enough to enable -asan-with-ifunc (it brings it from 15% code bloat to "only" 5%). For example, the following code generates a GOT load in each bb:

extern char x;
void bar();
void use(char *p);
void foo(int n) {
  if (n > 10) {
    use(&x);
  } else {
    bar();
    use(&x);
  }
}

Anyway, this change is a clear improvement.

Btw, none of the targets that I've looked at hoists the GOT load in the above example. It's just that on ARM it is particularly expensive.

Shared libraries in ContentShell.apk (chromium) are down by 0.8% in size (and the entire package is down by 0.45%) with this change.

Landing.

This revision was automatically updated to reflect the committed changes.