Index: cfe/trunk/docs/ShadowCallStack.rst =================================================================== --- cfe/trunk/docs/ShadowCallStack.rst +++ cfe/trunk/docs/ShadowCallStack.rst @@ -9,11 +9,11 @@ ============ ShadowCallStack is an **experimental** instrumentation pass, currently only -implemented for x86_64, that protects programs against return address -overwrites (e.g. stack buffer overflows.) It works by saving a function's return -address to a separately allocated 'shadow call stack' in the function prolog and -checking the return address on the stack against the shadow call stack in the -function epilog. +implemented for x86_64 and aarch64, that protects programs against return +address overwrites (e.g. stack buffer overflows.) It works by saving a +function's return address to a separately allocated 'shadow call stack' +in the function prolog and checking the return address on the stack against +the shadow call stack in the function epilog. Comparison ---------- @@ -37,8 +37,16 @@ Compatibility ------------- -ShadowCallStack currently only supports x86_64. A runtime is not currently -provided in compiler-rt so one must be provided by the compiled application. +ShadowCallStack currently only supports x86_64 and aarch64. A runtime is not +currently provided in compiler-rt so one must be provided by the compiled +application. + +On aarch64, the instrumentation makes use of the platform register ``x18``. +On some platforms, ``x18`` is reserved, and on others, it is designated as +a scratch register. This generally means that any code that may run on the +same thread as code compiled with ShadowCallStack must either target one +of the platforms whose ABI reserves ``x18`` (currently Darwin, Fuchsia and +Windows) or be compiled with the flag ``-ffixed-x18``. Security ======== @@ -56,28 +64,37 @@ semantics to fix this on x86_64 would incur an unacceptable performance overhead due to return branch prediction. -The instrumentation makes use of the ``gs`` segment register to reference the -shadow call stack meaning that references to the shadow call stack do not have -to be stored in memory. This makes it possible to implement a runtime that -avoids exposing the address of the shadow call stack to attackers that can read -arbitrary memory. However, attackers could still try to exploit side channels -exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover -the address of the shadow call stack. +The instrumentation makes use of the ``gs`` segment register on x86_64, +or the ``x18`` register on aarch64, to reference the shadow call stack +meaning that references to the shadow call stack do not have to be stored in +memory. This makes it possible to implement a runtime that avoids exposing +the address of the shadow call stack to attackers that can read arbitrary +memory. However, attackers could still try to exploit side channels exposed +by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover the +address of the shadow call stack. .. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/ .. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf .. _`[3]`: https://www.vusec.net/projects/anc/ -Leaf functions are optimized to store the return address in a free register -and avoid writing to the shadow call stack if a register is available. Very -short leaf functions are uninstrumented if their execution is judged to be -shorter than the race condition window intrinsic to the instrumentation. +On x86_64, leaf functions are optimized to store the return address in a +free register and avoid writing to the shadow call stack if a register is +available. Very short leaf functions are uninstrumented if their execution +is judged to be shorter than the race condition window intrinsic to the +instrumentation. + +On aarch64, the architecture's call and return instructions (``bl`` and +``ret``) operate on a register rather than the stack, which means that +leaf functions are generally protected from return address overwrites even +without ShadowCallStack. It also means that ShadowCallStack on aarch64 is not +vulnerable to the same types of time-of-check-to-time-of-use races as x86_64. Usage ===== -To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack`` flag -to both compile and link command lines. +To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack`` +flag to both compile and link command lines. On aarch64, you also need to pass +``-ffixed-x18`` unless your target already reserves ``x18``. Low-level API ------------- @@ -125,7 +142,20 @@ pop %rcx retq -Adding ``-fsanitize=shadow-call-stack`` would output the following: +or the following aarch64 assembly: + +.. code-block:: none + + stp x29, x30, [sp, #-16]! + mov x29, sp + bl bar + add w0, w0, #1 + ldp x29, x30, [sp], #16 + ret + + +Adding ``-fsanitize=shadow-call-stack`` would output the following x86_64 +assembly: .. code-block:: gas @@ -148,3 +178,16 @@ trap: ud2 + +or the following aarch64 assembly: + +.. code-block:: none + + str x30, [x18], #8 + stp x29, x30, [sp, #-16]! + mov x29, sp + bl bar + add w0, w0, #1 + ldp x29, x30, [sp], #16 + ldr x30, [x18, #-8]! + ret Index: cfe/trunk/lib/Driver/SanitizerArgs.cpp =================================================================== --- cfe/trunk/lib/Driver/SanitizerArgs.cpp +++ cfe/trunk/lib/Driver/SanitizerArgs.cpp @@ -18,6 +18,7 @@ #include "llvm/Support/FileSystem.h" #include "llvm/Support/Path.h" #include "llvm/Support/SpecialCaseList.h" +#include "llvm/Support/TargetParser.h" #include using namespace clang; @@ -375,6 +376,15 @@ << lastArgumentForMask(D, Args, Kinds & NeedsLTO) << "-flto"; } + if ((Kinds & ShadowCallStack) && + TC.getTriple().getArch() == llvm::Triple::aarch64 && + !llvm::AArch64::isX18ReservedByDefault(TC.getTriple()) && + !Args.hasArg(options::OPT_ffixed_x18)) { + D.Diag(diag::err_drv_argument_only_allowed_with) + << lastArgumentForMask(D, Args, Kinds & ShadowCallStack) + << "-ffixed-x18"; + } + // Report error if there are non-trapping sanitizers that require // c++abi-specific parts of UBSan runtime, and they are not provided by the // toolchain. We don't have a good way to check the latter, so we just Index: cfe/trunk/lib/Driver/ToolChain.cpp =================================================================== --- cfe/trunk/lib/Driver/ToolChain.cpp +++ cfe/trunk/lib/Driver/ToolChain.cpp @@ -814,7 +814,8 @@ getTriple().getArch() == llvm::Triple::wasm32 || getTriple().getArch() == llvm::Triple::wasm64) Res |= CFIICall; - if (getTriple().getArch() == llvm::Triple::x86_64) + if (getTriple().getArch() == llvm::Triple::x86_64 || + getTriple().getArch() == llvm::Triple::aarch64) Res |= ShadowCallStack; return Res; } Index: cfe/trunk/test/Driver/sanitizer-ld.c =================================================================== --- cfe/trunk/test/Driver/sanitizer-ld.c +++ cfe/trunk/test/Driver/sanitizer-ld.c @@ -563,6 +563,19 @@ // CHECK-SHADOWCALLSTACK-LINUX-X86-64-NOT: error: // RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \ +// RUN: -target aarch64-unknown-linux -fuse-ld=ld \ +// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-AARCH64 %s +// CHECK-SHADOWCALLSTACK-LINUX-AARCH64: '-fsanitize=shadow-call-stack' only allowed with '-ffixed-x18' + +// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \ +// RUN: -target aarch64-unknown-linux -fuse-ld=ld -ffixed-x18 \ +// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-AARCH64-X18 %s +// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \ +// RUN: -target arm64-unknown-ios -fuse-ld=ld \ +// RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-AARCH64-X18 %s +// CHECK-SHADOWCALLSTACK-LINUX-AARCH64-X18-NOT: error: + +// RUN: %clang -fsanitize=shadow-call-stack %s -### -o %t.o 2>&1 \ // RUN: -target x86-unknown-linux -fuse-ld=ld \ // RUN: | FileCheck --check-prefix=CHECK-SHADOWCALLSTACK-LINUX-X86 %s // CHECK-SHADOWCALLSTACK-LINUX-X86: error: unsupported option '-fsanitize=shadow-call-stack' for target 'x86-unknown-linux' Index: llvm/trunk/include/llvm/Support/TargetParser.h =================================================================== --- llvm/trunk/include/llvm/Support/TargetParser.h +++ llvm/trunk/include/llvm/Support/TargetParser.h @@ -212,6 +212,8 @@ ARM::ProfileKind parseArchProfile(StringRef Arch); unsigned parseArchVersion(StringRef Arch); +bool isX18ReservedByDefault(const Triple &TT); + } // namespace AArch64 namespace X86 { Index: llvm/trunk/lib/Support/TargetParser.cpp =================================================================== --- llvm/trunk/lib/Support/TargetParser.cpp +++ llvm/trunk/lib/Support/TargetParser.cpp @@ -917,3 +917,7 @@ unsigned llvm::AArch64::parseArchVersion(StringRef Arch) { return ARM::parseArchVersion(Arch); } + +bool llvm::AArch64::isX18ReservedByDefault(const Triple &TT) { + return TT.isOSDarwin() || TT.isOSFuchsia() || TT.isOSWindows(); +} Index: llvm/trunk/lib/Target/AArch64/AArch64CallingConvention.td =================================================================== --- llvm/trunk/lib/Target/AArch64/AArch64CallingConvention.td +++ llvm/trunk/lib/Target/AArch64/AArch64CallingConvention.td @@ -349,3 +349,18 @@ : CalleeSavedRegs<(add (sequence "X%u", 0, 15), (sequence "X%u", 18, 28), FP, SP, (sequence "Q%u", 0, 31))>; + +// Variants of the standard calling conventions for shadow call stack. +// These all preserve x18 in addition to any other registers. +def CSR_AArch64_NoRegs_SCS + : CalleeSavedRegs<(add CSR_AArch64_NoRegs, X18)>; +def CSR_AArch64_AllRegs_SCS + : CalleeSavedRegs<(add CSR_AArch64_AllRegs, X18)>; +def CSR_AArch64_CXX_TLS_Darwin_SCS + : CalleeSavedRegs<(add CSR_AArch64_CXX_TLS_Darwin, X18)>; +def CSR_AArch64_AAPCS_SwiftError_SCS + : CalleeSavedRegs<(add CSR_AArch64_AAPCS_SwiftError, X18)>; +def CSR_AArch64_RT_MostRegs_SCS + : CalleeSavedRegs<(add CSR_AArch64_RT_MostRegs, X18)>; +def CSR_AArch64_AAPCS_SCS + : CalleeSavedRegs<(add CSR_AArch64_AAPCS, X18)>; Index: llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.cpp =================================================================== --- llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.cpp +++ llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -414,6 +414,14 @@ static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec( MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc) { + // Ignore instructions that do not operate on SP, i.e. shadow call stack + // instructions. + while (MBBI->getOpcode() == AArch64::STRXpost || + MBBI->getOpcode() == AArch64::LDRXpre) { + assert(MBBI->getOperand(0).getReg() != AArch64::SP); + ++MBBI; + } + unsigned NewOpc; bool NewIsUnscaled = false; switch (MBBI->getOpcode()) { @@ -481,6 +489,14 @@ static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI, unsigned LocalStackSize) { unsigned Opc = MI.getOpcode(); + + // Ignore instructions that do not operate on SP, i.e. shadow call stack + // instructions. + if (Opc == AArch64::STRXpost || Opc == AArch64::LDRXpre) { + assert(MI.getOperand(0).getReg() != AArch64::SP); + return; + } + (void)Opc; assert((Opc == AArch64::STPXi || Opc == AArch64::STPDi || Opc == AArch64::STRXui || Opc == AArch64::STRDui || @@ -935,6 +951,18 @@ // assumes the SP is at the same location as it was after the callee-save save // code in the prologue. if (AfterCSRPopSize) { + // Find an insertion point for the first ldp so that it goes before the + // shadow call stack epilog instruction. This ensures that the restore of + // lr from x18 is placed after the restore from sp. + auto FirstSPPopI = MBB.getFirstTerminator(); + while (FirstSPPopI != Begin) { + auto Prev = std::prev(FirstSPPopI); + if (Prev->getOpcode() != AArch64::LDRXpre || + Prev->getOperand(0).getReg() == AArch64::SP) + break; + FirstSPPopI = Prev; + } + // Sometimes (when we restore in the same order as we save), we can end up // with code like this: // @@ -949,7 +977,7 @@ // a post-index ldp. // If we managed to grab the first pop instruction, move it to the end. if (LastPopI != Begin) - MBB.splice(MBB.getFirstTerminator(), &MBB, LastPopI); + MBB.splice(FirstSPPopI, &MBB, LastPopI); // We should end up with something like this now: // // ldp x24, x23, [sp, #16] @@ -962,7 +990,7 @@ // // ldp x26, x25, [sp], #64 // - emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP, + emitFrameOffset(MBB, FirstSPPopI, DL, AArch64::SP, AArch64::SP, AfterCSRPopSize, TII, MachineInstr::FrameDestroy); } } @@ -1081,7 +1109,8 @@ static void computeCalleeSaveRegisterPairs( MachineFunction &MF, const std::vector &CSI, - const TargetRegisterInfo *TRI, SmallVectorImpl &RegPairs) { + const TargetRegisterInfo *TRI, SmallVectorImpl &RegPairs, + bool &NeedShadowCallStackProlog) { if (CSI.empty()) return; @@ -1115,6 +1144,15 @@ RPI.Reg2 = NextReg; } + // If either of the registers to be saved is the lr register, it means that + // we also need to save lr in the shadow call stack. + if ((RPI.Reg1 == AArch64::LR || RPI.Reg2 == AArch64::LR) && + MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)) { + if (!MF.getSubtarget().isX18Reserved()) + report_fatal_error("Must reserve x18 to use shadow call stack"); + NeedShadowCallStackProlog = true; + } + // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI // list to come in sorted by frame index so that we can issue the store // pair instructions directly. Assert if we see anything otherwise. @@ -1165,9 +1203,24 @@ DebugLoc DL; SmallVector RegPairs; - computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs); + bool NeedShadowCallStackProlog = false; + computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, + NeedShadowCallStackProlog); const MachineRegisterInfo &MRI = MF.getRegInfo(); + if (NeedShadowCallStackProlog) { + // Shadow call stack prolog: str x30, [x18], #8 + BuildMI(MBB, MI, DL, TII.get(AArch64::STRXpost)) + .addReg(AArch64::X18, RegState::Define) + .addReg(AArch64::LR) + .addReg(AArch64::X18) + .addImm(8) + .setMIFlag(MachineInstr::FrameSetup); + + // This instruction also makes x18 live-in to the entry block. + MBB.addLiveIn(AArch64::X18); + } + for (auto RPII = RegPairs.rbegin(), RPIE = RegPairs.rend(); RPII != RPIE; ++RPII) { RegPairInfo RPI = *RPII; @@ -1231,7 +1284,9 @@ if (MI != MBB.end()) DL = MI->getDebugLoc(); - computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs); + bool NeedShadowCallStackProlog = false; + computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, + NeedShadowCallStackProlog); auto EmitMI = [&](const RegPairInfo &RPI) { unsigned Reg1 = RPI.Reg1; @@ -1280,6 +1335,17 @@ else for (const RegPairInfo &RPI : RegPairs) EmitMI(RPI); + + if (NeedShadowCallStackProlog) { + // Shadow call stack epilog: ldr x30, [x18, #-8]! + BuildMI(MBB, MI, DL, TII.get(AArch64::LDRXpre)) + .addReg(AArch64::X18, RegState::Define) + .addReg(AArch64::LR, RegState::Define) + .addReg(AArch64::X18) + .addImm(-8) + .setMIFlag(MachineInstr::FrameDestroy); + } + return true; } Index: llvm/trunk/lib/Target/AArch64/AArch64RegisterInfo.cpp =================================================================== --- llvm/trunk/lib/Target/AArch64/AArch64RegisterInfo.cpp +++ llvm/trunk/lib/Target/AArch64/AArch64RegisterInfo.cpp @@ -75,21 +75,25 @@ const uint32_t * AArch64RegisterInfo::getCallPreservedMask(const MachineFunction &MF, CallingConv::ID CC) const { + bool SCS = MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack); if (CC == CallingConv::GHC) // This is academic because all GHC calls are (supposed to be) tail calls - return CSR_AArch64_NoRegs_RegMask; + return SCS ? CSR_AArch64_NoRegs_SCS_RegMask : CSR_AArch64_NoRegs_RegMask; if (CC == CallingConv::AnyReg) - return CSR_AArch64_AllRegs_RegMask; + return SCS ? CSR_AArch64_AllRegs_SCS_RegMask : CSR_AArch64_AllRegs_RegMask; if (CC == CallingConv::CXX_FAST_TLS) - return CSR_AArch64_CXX_TLS_Darwin_RegMask; + return SCS ? CSR_AArch64_CXX_TLS_Darwin_SCS_RegMask + : CSR_AArch64_CXX_TLS_Darwin_RegMask; if (MF.getSubtarget().getTargetLowering() ->supportSwiftError() && MF.getFunction().getAttributes().hasAttrSomewhere(Attribute::SwiftError)) - return CSR_AArch64_AAPCS_SwiftError_RegMask; + return SCS ? CSR_AArch64_AAPCS_SwiftError_SCS_RegMask + : CSR_AArch64_AAPCS_SwiftError_RegMask; if (CC == CallingConv::PreserveMost) - return CSR_AArch64_RT_MostRegs_RegMask; + return SCS ? CSR_AArch64_RT_MostRegs_SCS_RegMask + : CSR_AArch64_RT_MostRegs_RegMask; else - return CSR_AArch64_AAPCS_RegMask; + return SCS ? CSR_AArch64_AAPCS_SCS_RegMask : CSR_AArch64_AAPCS_RegMask; } const uint32_t *AArch64RegisterInfo::getTLSCallPreservedMask() const { Index: llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp =================================================================== --- llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp +++ llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp @@ -24,6 +24,7 @@ #include "llvm/CodeGen/GlobalISel/InstructionSelect.h" #include "llvm/CodeGen/MachineScheduler.h" #include "llvm/IR/GlobalValue.h" +#include "llvm/Support/TargetParser.h" using namespace llvm; @@ -151,8 +152,8 @@ const std::string &FS, const TargetMachine &TM, bool LittleEndian) : AArch64GenSubtargetInfo(TT, CPU, FS), - ReserveX18(TT.isOSDarwin() || TT.isOSFuchsia() || TT.isOSWindows()), - IsLittle(LittleEndian), TargetTriple(TT), FrameLowering(), + ReserveX18(AArch64::isX18ReservedByDefault(TT)), IsLittle(LittleEndian), + TargetTriple(TT), FrameLowering(), InstrInfo(initializeSubtargetDependencies(FS, CPU)), TSInfo(), TLInfo(TM, *this) { CallLoweringInfo.reset(new AArch64CallLowering(*getTargetLowering())); Index: llvm/trunk/test/CodeGen/AArch64/shadow-call-stack.ll =================================================================== --- llvm/trunk/test/CodeGen/AArch64/shadow-call-stack.ll +++ llvm/trunk/test/CodeGen/AArch64/shadow-call-stack.ll @@ -0,0 +1,47 @@ +; RUN: llc -verify-machineinstrs -o - %s -mtriple=aarch64-linux-gnu -mattr=+reserve-x18 | FileCheck %s + +define void @f1() shadowcallstack { + ; CHECK: f1: + ; CHECK-NOT: x18 + ; CHECK: ret + ret void +} + +declare void @foo() + +define void @f2() shadowcallstack { + ; CHECK: f2: + ; CHECK-NOT: x18 + ; CHECK: b foo + tail call void @foo() + ret void +} + +declare i32 @bar() + +define i32 @f3() shadowcallstack { + ; CHECK: f3: + ; CHECK: str x30, [x18], #8 + ; CHECK: str x30, [sp, #-16]! + %res = call i32 @bar() + %res1 = add i32 %res, 1 + ; CHECK: ldr x30, [sp], #16 + ; CHECK: ldr x30, [x18, #-8]! + ; CHECK: ret + ret i32 %res +} + +define i32 @f4() shadowcallstack { + ; CHECK: f4: + %res1 = call i32 @bar() + %res2 = call i32 @bar() + %res3 = call i32 @bar() + %res4 = call i32 @bar() + %res12 = add i32 %res1, %res2 + %res34 = add i32 %res3, %res4 + %res1234 = add i32 %res12, %res34 + ; CHECK: ldp {{.*}}x30, [sp + ; CHECK: ldr x30, [x18, #-8]! + ; CHECK: ret + ret i32 %res1234 +}