Page MenuHomePhabricator

[SLH] AArch64: correctly pick temporary register to mask SP
ClosedPublic

Authored by kristof.beyls on Jan 15 2019, 6:06 AM.

Details

Summary

As part of speculation hardening, the stack pointer gets masked with the
taint register (X16) before a function call or before a function return.
Since there are no instructions that can directly mask writing to the
stack pointer, the stack pointer must first be transferred to another
register, where it can be masked, before that value is transferred back
to the stack pointer.
Before, that temporary register was always picked to be x17, since the
ABI allows clobbering x17 on any function call, resulting in the
following instruction pattern being inserted before function calls and
returns/tail calls:

mov x17, sp
and x17, x17, x16
mov sp, x17

However, x17 can be live in those locations, for example when the call
is an indirect call, using x17 as the target address (blr x17).

To fix this, this patch looks for an available register just before the
call or terminator instruction and uses that.

In the rare case when no register turns out to be available (this
situation is only encountered twice across the whole test-suite), just
insert a full speculation barrier at the start of the basic block where
this occurs.

Diff Detail

Repository
rL LLVM

Event Timeline

kristof.beyls created this revision.Jan 15 2019, 6:06 AM
olista01 requested changes to this revision.Jan 15 2019, 7:27 AM
olista01 added inline comments.
llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
232 ↗(On Diff #181773)

I think this requires a return instruction, not just any terminator, because terminators could have other values live across them. I'm also not sure that it's correct for calls, which could have registers live across them, without using them directly.

Maybe we could use the RegScavenger here, which looks like it can also handle some more difficult cases by spilling a register to the stack?

252 ↗(On Diff #181773)

This can be hit by this code:

typedef int (*fptr_t)(void);
int foo(fptr_t f) {
  register fptr_t f2 asm("x30") = f;
  asm("" : "+r"(f2));
  return f2() + 1;
}

This uses inline assembly to force the function pointer into x30, but it is an allocatable register so I think it is possible for this to happen "naturally".

This revision now requires changes to proceed.Jan 15 2019, 7:27 AM

Adding Zola as she's been looking at ARM's SLH as well...

kristof.beyls marked 2 inline comments as done.Jan 16 2019, 5:01 AM
kristof.beyls added inline comments.
llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
232 ↗(On Diff #181773)

Yeah, you're right probably better to reuse RegScavenger than trying to reinvent the wheel here.
I am not fully sure if introducing new loads and stores in this pass which aims to protect loads and stores is a good idea, will try to think about that.
Even if we can't find a free register without introducing spill/fills, in that case, we could still use the "emergency" ISB/DSB barrier to block speculation entirely in the affected basic block and not need to mask values in that basic block.
I'll experiment with such an approach.

252 ↗(On Diff #181773)

Thanks for that example. I didn't manage to come up with an example myself.
My understanding is that the llvm_unreachable is reached because the ABI rules on the indirect call are not modelled very accurately.
Some of the registers will be clobbered by the call, and could be used as temporary register, but it seems this isn't modelled on the BLR mir instruction.
Anyway, as I said above - probably I'll need to implement generation of the ISB/DSB full speculation barrier in such a rare case.

zbrid added inline comments.Jan 16 2019, 10:05 AM
llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
253 ↗(On Diff #181773)

Since I don't know much about the ABI or the assumptions around calls, I had some quick questions about this for clarification. Can you be sure that this is unreachable because the way the calling convention is specified means there will always be at least one dead register before a call? For example, since there are several call clobbered registers specified in the calling convention even if x17 is used as an address for an indirect branch instruction (and is therefore live across the call), there's no way for all of the call clobbered registers to be live since there are no call instructions that could force all of them to be live?

kristof.beyls marked an inline comment as done.Jan 16 2019, 10:50 AM
kristof.beyls added inline comments.
llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
253 ↗(On Diff #181773)

Yes, indeed, the assumption is that the calling convention will result in at least one register being "killed" during the call. Or at least not being guaranteed to retain the same value.
The ABI actually specifies that x17 can never be live across a call, which is why this was picked initially. However, as can be seen in the new test cases in this patch, x17 could still be legally be used as containing the address an indirect call has to branch to - making the register still live just before the call.

kristof.beyls edited the summary of this revision. (Show Details)

Updated approach to use the register scavenger to find an available temporary register. In the rare case when no temporary register is available, fall back to inserting a full speculation barrier.

kristof.beyls marked 5 inline comments as done.Jan 18 2019, 4:50 AM
kristof.beyls added inline comments.
llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
252 ↗(On Diff #181773)

The example you provided is now (further simplified and as MIR) in one of the regression tests.

253 ↗(On Diff #181773)

The new version of the patch handles no temporary register being available by inserting a full speculation barrier.
Across all of the test suite, this case was encountered just twice.

olista01 accepted this revision.Jan 22 2019, 6:24 AM

LGTM with one nit.

llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
307 ↗(On Diff #182498)

GPR64allRegClass includes SP and XZR, I don't think it actually matters here because neither of them are allocatable, but it would be clearer to use GPR64common.

This revision is now accepted and ready to land.Jan 22 2019, 6:24 AM
This revision was automatically updated to reflect the committed changes.
kristof.beyls marked 2 inline comments as done.Jan 23 2019, 12:36 AM
kristof.beyls added inline comments.
llvm/lib/Target/AArch64/AArch64SpeculationHardening.cpp
307 ↗(On Diff #182498)

Nice catch - thanks for noticing.
I've followed your suggestion in the committed code.