[RISCV] Support Shadow Call Stack Currenlty assume x18 is used as pointer to shadow call stack. User shall pass flags: "-fsanitize=shadow-call-stack -ffixed-x18" Runtime supported is needed to setup x18. If SCS is desired, all parts of the program should be built with -ffixed-x18 to maintain inter-operatability. There's no particuluar reason that we must use x18 as SCS pointer. Any register may be used, as long as it does not have designated purpose already, like RA or passing call arguments.
Please style these in the same way as the other RISC-V CodeGen tests, in terms of argument order, redirecting stdin rather than using -o - %s, and using riscv64 rather than riscv64-linux-gnu (unless needed?). Also use update_llc_test_checks.py rather than hand-writing this. And we generally use RV32I and RV64I (or other appropriate arch strings) instead of RISCV32 and RISCV64 prefixes.
Is there a reason for choosing X18? On AArch64 that's either a temporary or saved register depending on ABI, but determined to be the "platform register". Picking X18 on RISC-V because that's the same index as AArch64 seems a little arbitrary, but maybe it happens to make sense.
There are thee things to observe here and other reviewers might have some additional comments:
- Please use local variables with meaningful names for RISCV::Xn rather than inlining them everywhere and making it harder at a glance to work out what's going on without knowing what the ABI register names are.
- Please make a RISCVABI::getSCSPReg or similar function to avoid hard-coding X18 in a bunch of places.
- I'm not convinced a callee-saved register makes any more sense than anything else. Whatever you pick, compatibility only works one way round. If foo uses an SCS and calls bar that doesn't, then being callee saved helps you, but if bar calls foo then (a) foo will try to dereference SCSP (to store then later load), likely faulting, perhaps instead "just" clobbering arbitrary memory (b) if foo manages to return back to bar without faulting then bar would expect the register to have been saved, but it hasn't, an ABI violation. If you use a caller-saved register instead then bar can call foo just fine, but foo can't call bar as it'll clobber its SCSP. Reserving any register like this is a fundamental ABI incompatibility, so picking x18 because it's callee-saved is meaningless, and picking it to avoid duplicating 6 lines of code (or fewer, if the existing 6 lines are refactored) isn't a good reason either. It's ultimately arbitrary, but it should be at least be based on some kind of valid reasoning if such reasoning exists. X18 may or may not be a right answer.
(One might initially think that this incompatibility is fine if you're building an executable with SCSP that calls other libraries without SCSP, but that breaks down as soon as callbacks exist, as well as more niche things like symbol preemption.)
if (find(CSI, RISCV::X1) == CSI.end()) return;
(using llvm::find as a convenient wrapper around std::find, ie shorthand for std::find(CSI.begin(), CSI.end(), RISCV::X1)). Though personally I'd prefer to see X1 come from RI.getRARegister() rather than be hard-coded; other functions in this file already hard-code it, but in our CHERI fork we need to replace RISCV::X1 with RISCV::C1 everywhere so have changed those. Having said that, CHERI renders a shadow call stack unnecessary, so I don't particularly care if it's broken there, personally. But I still think it's nicer code.
This is wrong for RV64.
Also wrong for RV64.
As I said before, please just use -mtriple=riscv32. The -unknown-elf is implied, irrelevant and wastes space, so all the OS-independent CodeGen tests just specify the CPU.
Two extra spaces to indent the | is the predominant style.
Delete this blank line.
The -NOTs shouldn't even exist, this isn't how you use update_llc_test_checks.py. But yes, by default that's how it'll be printed unless you disable printing aliases.
Any callee-saved register can be used.
In fact, any register may be used, as long as it cannot be used to pass arguments or return values. It may be better to select a temporary register, as x18 is on aarch64 when not being used as a platform register, so that problems are noticed more easily (i.e. more likely to be clobbered by incompatible code).
Addressed styling & code clarity issues.
Fixed wrong opcode for RV64.
Unlike x18 on AArch64, there's no register that should normally be reserved/not-used on RISCV. So using any eligible register would break ABI compatibility. x18 is neither a better nor a worse choice than other registers. Non-SCS code should be built with -ffixed-x18 to be compatible with SCS-enabled code.
Not sure how to make llvm::find or llvm::is_contained work in this scenario.
CSI is a std::vector<llvm::CalleeSavedInfo>. We need a getReg() for each element in it before comparing to a 'Register'
We discussed this on the RISC-V LLVM Sync-up (both this time and on 6 August 2020 IIRC). The consensus view is: if you're on an rv32e implementation, you're potentially too constrained to use a shadow call stack anyway. Even then, LLVM doesn't implement the rest of rv32e yet (though there are plans to do so at some point, which means we can revisit this change).
Our feeling is that if users have a distinct need to use a different register at the moment, they can use a downstream change to LLVM. The fact that we use RISCVABI::getSCSPReg() should make this fairly easy.
If the register choice is the only concern about this work, then I think we can probably land it as-is and fixup the register choice if we see major drawbacks later. Yes, it's an ABI issue, but on the other hand the shadow call stack is not a standard ABI anyway.
Rebased & ping...
IMHO, the patch is in good shape. As we discussed in the bi-weekly meetings, RV32E only has 16 registers. Systems based on RV32E may have limited memory as well. Besides, LLVM does not have full support for RV32E yet. We can commit this patch as-is and change it later if RV32E needs SCS.
When SCS is on, x18 must be preserved across calls. Given it's a callee-saved, value in x18 is preserved by functions that do not have SCS attribute.
However, saving x18 on stack risks leaking SCS's location in memory, making the defense useless.
Ok, so any compilation units without -fsanitize=shadow-call-stack should be compiled with -ffixed-x18 if you want to link those together. That is reasonable. My question was whether we can ensure that -fsanitize=shadow-call-stack can imply -ffixed-x18 rather than having to pass both options.
It is my understanding that all functions in a CU with -fsanitize=shadow-call-stack will get the SCS function attribute, so why can't we use that attribute to work out whether x18 should be reserved, rather than using -ffixed-x18? You'll see RISCVRegisterInfo::getReservedRegs reserves fp and bp only if they're needed by the function (which can be based on attributes or other info), rather than using isRegisterReservedByUser - which seems cleaner to me.
FWIW, on aarch64 we decided to make -fsanitize=shadow-call-stack require the x18 reservation (instead of implying it) to try to avoid ABI mismatch problems. That is, it should be safe to mix and match code compiled with and without -fsanitize=shadow-call-stack. If we make -fsanitize=shadow-call-stack imply the x18 reservation, it makes it more likely that someone will accidentally build and link in incompatible code that does not reserve x18.
Yes, binding -fsanitize=shadow-call-stack with -ffixed-x18 is to ensure user don't accidentally link SCS-enabled code with non-SCS code that uses/overwrites x18. We can always infer that x18 is reserved when SCS in on.
There's no need to check RISCVRegisterInfo::getReservedRegs for x18 by looking into function attr. It forms a circular chain: is x18 reserved for SCS? <--> SCS is enabled, so x18 should be reserved.
This is currently incompatible with the save/restore libcalls, and I don't think there's any way to avoid that (the restore libcall both loads ra and jumps to it). We should ensure combining them gives an error.
Pointless comment; remove
Pointless comment; remove
This should be passed in as an argument IMO (same for the epilogue) given the standard prologue/epilogue code already has a DebugLoc lying around.
Is it intended that the shadow call stack grows *up* unlike the normal stack?
No need to repeat ourselves.
Although in fact you have both a bug and a minor performance issue with this, and it should be:
// l[w|d] ra, [-4|-8](s2) // addi s2, s2, -[4|8]
Then there's no read-after-write dependency chain, which is better for out-of-order cores.
The bug is that, currently, you risk a signal handler clobbering your SCS slot in between the two instructions, since you deallocate the frame before you read from it. Will be rare in practice, but a possibility.
No. Which direction the SCS grows on is trivial.
The memory area hosting SCS is independent of the regular stack; and it's provided by the runtime.
Ok. Wasn't saying there was anything wrong with it, was just something that jumped out at me. Having it grow up makes more sense (the only real advantage to the normal stack growing down these days is that doing aligned allocations is slightly cheaper).
This still holds.