This is an archive of the discontinued LLVM Phabricator instance.

[ARM] implement LOAD_STACK_GUARD for remaining targets
ClosedPublic

Authored by ardb on Oct 29 2021, 5:34 AM.

Details

Summary

Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and
other targets rely on the generic support which may result in spilling of the
stack canary value or address, or may cause it to be kept in a callee save
register across function calls, which means they essentially get spilled as
well, only by the callee when it wants to free up this register.

So let's implement LOAD_STACK GUARD for other targets as well. This ensures
that the load of the stack canary is rematerialized fully in the epilogue.

This code was split off from

D112768: [ARM] implement support for TLS register based stack protector

for which it is a prerequisite.

Diff Detail

Event Timeline

ardb created this revision.Oct 29 2021, 5:34 AM
ardb requested review of this revision.Oct 29 2021, 5:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2021, 5:34 AM
nickdesaulniers accepted this revision.Oct 29 2021, 2:31 PM

Changes LGTM; let's wait for @mstorsjo to triple check the COFF related changes, since I'm almost completely unaware of the details of that object file format. Thanks for the patch (and splitting it up, making it easier to review)!

llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
4904–4905

Other uses of ARMII::MO_COFFSTUB in llvm/lib/Target/ARM/ARMISelLowering.cpp seem to check TargetMachine::shouldAssumeDSOLocal returns false (and GlobalValue::hasDLLImportStorageClass() returns false, too) before setting this flag. Should we be doing so here as well?

To get a reference to the TargetMachine instance, I think we can do:

MachineFunction *MF = MBB.getParent();
ARMSubtarget &Subtarget = MF->getSubtarget<ARMSubtarget>();
TargetMachine *TM = SubTarget.getTargetLowering()->getTargetMachine();

@peter.smith anyone with more windows on ARM experience that might be able to help triple check this? Probably @mstorsjo ?

This revision is now accepted and ready to land.Oct 29 2021, 2:31 PM
llvm/lib/Target/ARM/Thumb2InstrInfo.cpp
250–259

Does Thumb1InstrInfo::expandLoadStackGuard need a similar change?

If so, then all the callers of ARMBaseInstrInfo::expandLoadStackGuardBase seem to also compute the same value of GV that is rematerialized in ARMBaseInstrInfo::expandLoadStackGuardBase. At that point, I'd say let's change the function signature of ARMBaseInstrInfo::expandLoadStackGuardBase to accept the GlobalValue* we calculated in the caller. Basically, let's DRY up const GlobalValue *GV = cast<GlobalValue>((*MI->memoperands_begin())->getValue()); which seems repeated in callers and callee.

Though if we don't need this change for thumb[1] then perhaps we don't need to do that (or could simply pass nullptr for that interface change.

ardb added inline comments.Oct 29 2021, 3:22 PM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
4904–4905

Other uses of ARMII::MO_COFFSTUB in llvm/lib/Target/ARM/ARMISelLowering.cpp seem to check TargetMachine::shouldAssumeDSOLocal returns false (and GlobalValue::hasDLLImportStorageClass() returns false, too) before setting this flag. Should we be doing so here as well?

!TargetMachine::shouldAssumeDSOLocal() is implied by isGVIndirectSymbol() so we already check this.

To get a reference to the TargetMachine instance, I think we can do:

MachineFunction *MF = MBB.getParent();
ARMSubtarget &Subtarget = MF->getSubtarget<ARMSubtarget>();
TargetMachine *TM = SubTarget.getTargetLowering()->getTargetMachine();

@peter.smith anyone with more windows on ARM experience that might be able to help triple check this? Probably @mstorsjo ?

llvm/lib/Target/ARM/Thumb2InstrInfo.cpp
250–259

No it does not.

Adding ostannard as a reviewer, I'm on vacation this week and Oliver knows this area better than I do.

Changes LGTM; let's wait for @mstorsjo to triple check the COFF related changes, since I'm almost completely unaware of the details of that object file format. Thanks for the patch (and splitting it up, making it easier to review)!

The change to the testcase looks correct given the description of the commit, and the reasoning around isGVIndirectSymbol seems correct too. Thanks!

This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Nov 20 2021, 12:47 AM

We bisected some Chromium test failures on Android, which points to this change (https://crbug.com/1270361). We'll investigate further, just wanted to give a heads up.

hans added a comment.Nov 29 2021, 11:29 AM

We bisected some Chromium test failures on Android, which points to this change (https://crbug.com/1270361). We'll investigate further, just wanted to give a heads up.

What we're hitting seems to be that the LOAD_STACK_GUARD pseudo-instruction can be register allocated to one of the high Thumb registers (in our case r12), but gets expanded using the tLDRLIT_ga_pcrel pseudo, which cannot take a high Thumb register as output.

It tries anyway, and generates the following textual asm:

ldr     r12, .LCPI0_0

but what it actually encodes is this:

4: 11 4c         ldr     r4, [pc, #68]

(Note the different register.)

I've uploaded a reproducer here: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361#c30

I'm not sure what the best fix is. Maybe the register class for LOAD_STACK_GUARD could be restricted, or maybe tLDRLIT_ga_pcrel could be made to work with the high registers.

In any case, unless you can think of an immediate fix, I'd like to revert this in the meantime.

We bisected some Chromium test failures on Android, which points to this change (https://crbug.com/1270361). We'll investigate further, just wanted to give a heads up.

What we're hitting seems to be that the LOAD_STACK_GUARD pseudo-instruction can be register allocated to one of the high Thumb registers (in our case r12), but gets expanded using the tLDRLIT_ga_pcrel pseudo, which cannot take a high Thumb register as output.

It tries anyway, and generates the following textual asm:

ldr     r12, .LCPI0_0

but what it actually encodes is this:

4: 11 4c         ldr     r4, [pc, #68]

(Note the different register.)

I've uploaded a reproducer here: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361#c30

Seems like LLVM is not choosing the wide encoding, whereas GAS is:

$ arm-linux-gnueabi-as foo.s
$ arm-linux-gnueabi-objdump -dr a.out
...
   4:   f8df c038       ldr.w   ip, [pc, #56]   ; 40 <_ZN2v88internal4wasm16LiftoffAssembler13emit_i32_addiENS0_8RegisterES3_i+0x40>
...

$ llc -filetype=obj -relocation-model=pic -o a.out a.ll
$ arm-linux-gnueabi-objdump -dr a.out
...
   4:   4c0d            ldr     r4, [pc, #52]   ; (3c <_ZN2v88internal4wasm16LiftoffAssembler13emit_i32_addiENS0_8RegisterES3_i+0x3c>)
...

I don't see anything in F5.1.73 LDR (literal) of the ARM ARM regarding the use of of ip/r12. GAS even errors if I try to force the narrow encoding with a .n suffix:

foo.s:33: Error: cannot honor width suffix -- `ldr.n r12,.LCPI0_0'

It looks like GAS doesn't support a dest/Rt greater than r7 for the narrow encoding? Do we not have a way to emit the wide encoding?

In any case, unless you can think of an immediate fix, I'd like to revert this in the meantime.

This patch has a child patch as well; D112768.

Follow up: https://reviews.llvm.org/D107872.