This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
llvm/test/CodeGen/Thumb2/v8_IT_3.ll | ||
---|---|---|
62 ↗ | (On Diff #122333) | This new code is the load from %bb moved to the end of %entry under condition of %tmp1 == 1. This is made possible by this change, because %entry has address of G@GOT precomputed and the second load is just one instruction. |
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3168 ↗ | (On Diff #122333) | If I'm following correctly, whether "G" points to the global itself or the GOT entry is implicitly controlled by ARMSubtarget::getCPModifier? That's awfully confusing... can we fix it to use target flags passed to getTargetGlobalAddress() to control it instead? |
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3168 ↗ | (On Diff #122333) | Yes... That's not ideal. To make sure we are on the same page: I could add something like ARMII::MO_GOT to TargetGlobalAddress flags, and use to differentiate between LDRLIT_ga_pcrel and MOV_ga_pcrel. This would let us use movt/movw for dso-local globals, yay! I'm not sure how to access the flag in ARMExpandPseudoInsts, though. Is it copied to the operand's TargetFlags? I'll check. |
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3168 ↗ | (On Diff #122333) | Yes, that's what I was thinking. I think the flag gets copied into TargetFlags. See AArch64II::MO_GOT for inspiration. |
I've switched to MO_GOT, and killed getCPModifier().
I did not figure out how to select between LDRLIT_ga_pcrel and MOV_ga_pcrel based on TargetFlags. Apparently, I need a ComplexPattern. Anyway, I think that it should be done in a separate patch.
This makes sense.
llvm/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h | ||
---|---|---|
233 ↗ | (On Diff #122373) | Missing doc comment. |
LGTM, assuming you've done appropriate testing.
llvm/test/CodeGen/ARM/GlobalISel/arm-select-globals-pic.mir | ||
---|---|---|
59 ↗ | (On Diff #122540) | It would be nice to implement getSerializableDirectMachineOperandTargetFlags() to make this clearer. |
I've done some testing. All sanitizer tests pass on arm/android, both with and without -asan-with-ifunc (which is what I'm doing this for).
Code size of libclang_rt.asan-arm-android.so is down by 3% (!). It is just a regular library (i.e. no asan instrumentation in it), so I assume that all other code is similarly improved.
Unfortunately, even with this change MachineCSE is not good enough to enable -asan-with-ifunc (it brings it from 15% code bloat to "only" 5%). For example, the following code generates a GOT load in each bb:
extern char x; void bar(); void use(char *p); void foo(int n) { if (n > 10) { use(&x); } else { bar(); use(&x); } }
Anyway, this change is a clear improvement.
Btw, none of the targets that I've looked at hoists the GOT load in the above example. It's just that on ARM it is particularly expensive.
Shared libraries in ContentShell.apk (chromium) are down by 0.8% in size (and the entire package is down by 0.45%) with this change.
Landing.