While probing stack, the stack register is moved without
dwarf information, which could cause panic if unwind the
backtrace at that point. This commit add dwarf information
for these operation, and use r11 (instead of rsp) to iterate
over pages to probe.
Details
- Reviewers
serge-sans-paille nagisa efriedma lkail
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There has been a bug report for this on bugzilla. A more "downstream" context for this feature is discussed in rust#83139.
llvm/test/CodeGen/X86/stack-clash-unknown-call.ll | ||
---|---|---|
18 | The def_cfa_offset is superfluous here now. The last cfi directive should be either the adjust or def. |
Generally, unlike the formats used on Windows etc., DWARF unwind isn't accurate in the prologue. I mean, you could add separate unwind info for each relevant instruction, but that would be a lot of data, and we currently don't have any option to do that.
Given that, I'm not sure what you're trying to accomplish here. I can't see how adding more .cfi_adjust_cfa_offset directives does anything useful.
To clarify, I've done some more reading now, and figured out where I went wrong. For a long time, LLVM did not emit accurate unwind info to describe the prologue/epilogue (and still doesn't on some targets), so I was under the impression it wasn't possible. Clearly, it is, and it's implemented on x86.
The change to use r11 isn't implemented correctly: we can't adjust the stack pointer until *after* we've probed the relevant pages. It'll appear to work, but it won't actually provide complete protection if a signal handler triggers at the wrong time.
Make sense. I tried to use r11 + offset to represent CFA temporarily. However, r11d cannot be used as a dwarf register on x86_32. Can I use another register (like di) here?
Make sense. I tried to use r11 + offset to represent CFA temporarily. However, r11d cannot be used as a dwarf register on x86_32.
What registers can be used? I did a quick search and couldn't find anything.
Can I use another register (like di) here?
Any register that isn't callee-preserved could be used (or any register in general if its spilled first), assuming it isn't used for something else already. I did a quick search in an effort to figure out which registers are callee-saved on x86, but couldn't find anything definitive :(
This makes me wonder, though: why not leave selection of the register to use here to regalloc?
llvm/test/CodeGen/X86/stack-clash-large.ll | ||
---|---|---|
22 | I… think this wants to be a def_cfa_offset? def_cfa_register does not reset the offset so its not at all obvious what this is offsetting from. Alternatively there's a form that combines both setting the new register and the offset into a single directive: .cfi_def_cfa %rdi, 69632 |
What registers can be used? I did a quick search and couldn't find anything.
I found the available dwarf registers under the x86 register table in LLVM (X86RegisterInfo.td).
Any register that isn't callee-preserved could be used (or any register in general if its spilled first), assuming it isn't used for something else already. I did a quick search in an effort to figure out which registers are callee-saved on x86, but couldn't find anything definitive :(
It seems that different platform will have different set of callee-preserved register. I also found them in the list in LLVM (which is refered in the function getCalleeSavedRegs of llvm/lib/Target/X86/X86RegisterInfo.cpp). Sadly, it seems like RDI is more likely to be a callee-saved register.
Only considering the SYSV and Windows, all callee-saved registers are:
X86::RBX, X86::R12, X86::R13, X86::R14, X86::R15, X86::RBP X86::RBX, X86::RBP, X86::RDI, X86::RSI, X86::R12, X86::R13, X86::R14, X86::R15, X86::XMM6, X86::XMM7, X86::XMM8, X86::XMM9, X86::XMM10, X86::XMM11, X86::XMM12, X86::XMM13, X86::XMM14, X86::XMM15
And all available dwarf registers are:
32bit: EAX, EDX, ECX, EBX, ESI, EDI, EBP, ESP, EIP 64bit, X86-64: RAX, RDX, RCX, RBX, RSI, RDI, RBP, RSP, R8-R15, RIP
For 64bit, RAX, RCX, RDX, R8-R11 could be a good choice, and for 32 bit, it could only be choosen from EAX, ECX, EDX
This makes me wonder, though: why not leave selection of the register to use here to regalloc?
Sounds like a good idea. But I'm not familiar with LLVM codes. Is there any example on how to use the regalloc?
(I don't know whether it will make this patch too complicate. Allocating stack probe register with regalloc and fixing the dwarf sound like two things and I'd like to do it in a separated patch.)
llvm/test/CodeGen/X86/stack-clash-large.ll | ||
---|---|---|
22 | The problem is that I don't know the accurate offset. When the callee-saved registers are pushed to the stack, there will be an offset before probing the stack (if I understand the prolog part correctly, please tell me if I'm wrong). |
Use RDX/EDX instead of RDI/EDI, as RDI/EDI is callee saved on Win64 calling convension
A care must be taken to not overwrite the arguments as well. For instance on SysV x86_64 ABI rdi, rsi, rdx, rcx, r8, r9 are used to pass in integer arguments. For functions with a small number of arguments one of these could be reused, but if a function happens to use all of them, unconditional use of rdx would clobber the argument.
You would probably have to split the patches up into distinct parts, yes (first one adjusting the backend to allocate virtual registers and the next being this one).
I'll get back to you with advice on how to specify a virtual register a little bit later –
A care must be taken to not overwrite the arguments as well. For instance on SysV x86_64 ABI rdi, rsi, rdx, rcx, r8, r9 are used to pass in integer arguments. For functions with a small number of arguments one of these could be reused, but if a function happens to use all of them, unconditional use of rdx would clobber the argument.
Nice catch! After eliminating RCX, RDX from the selections, it seems that the only possible "always correct" choice is RAX (being used as the return value doesn't bother).
See createVirtualRegister.
Thanks. I have seen some usage of this function. It seems that there isn't any suitable RegClass for this situation ("dwarf suitable registers class", is there an equivalent one?). Should I create one by myself?
I investigated this a little bit, and it seems like using createVirtualRegister won't work here, after all. It seems that this code does in fact run after regalloc happens. I'm out of ideas, but I also am not that familiar with this area to give any useful advice or guidance here.
Normally, I'd expect some register is naturally free in the prologue, but you could get into weird situations. On 32-bit specifically, consider compiling with -mregparm=3; I think there are no registers which are unconditionally safe in that case. One possibility is to always use EAX, and just save/restore it if necessary. See isEAXAlive in X86FrameLowering::emitPrologue.
Alternatively, you could ensure that some callee-save GPR is spilled, and explicitly use that register. This is taking advantage of the fact this is part of the prologue: there can't be any other uses of callee-save registers at that point. (In theory, it might be possibly for an exotic calling convention to have no callee-save registers, but I don't think there are any in practice.)
Outside the prologue, the allocation should be represented by some instruction; that instruction should clobber some register, and regalloc will ensure that register is free.
llvm/test/CodeGen/X86/stack-clash-large.ll | ||
---|---|---|
38 | BTW, this is completely broken; r11d doesn't exist on 32-bit x86. |
llvm/test/CodeGen/X86/stack-clash-large.ll | ||
---|---|---|
38 | Wow, surprising discovery. (I think) a "bad register name" should be given when compiling this codes? Is there any pass which will omit this problem? Run clang -fstack-clash-protection -m32 -fomit-frame-pointer -S will generate codes containing r11d, which is bad. However, clang -fstack-clash-protection -m32 -fomit-frame-pointer -c and disassemble the output, the register used here will become ebx. |
Normally, I'd expect some register is naturally free in the prologue, but you could get into weird situations. On 32-bit specifically, consider compiling with -mregparm=3; I think there are no registers which are unconditionally safe in that case. One possibility is to always use EAX, and just save/restore it if necessary. See isEAXAlive in X86FrameLowering::emitPrologue.
Alternatively, you could ensure that some callee-save GPR is spilled, and explicitly use that register. This is taking advantage of the fact this is part of the prologue: there can't be any other uses of callee-save registers at that point. (In theory, it might be possibly for an exotic calling convention to have no callee-save registers, but I don't think there are any in practice.)
Both solutions seems to be either complicate or with extra cost. And surprisingly found that the original implementation of stack probe is wrong on 32bit. Given that, I preferred to only provide DWARF information in 64bit situation (and left a comment), so that r11 can be used and solve this problem easily.
btw I prototyped a D98906: [X86] Improve lowering of the unrolled inline-asm probing yesterday as an alternative approach towards improving the unrolled case.
Looks like my alternative has caveats of its own. Can you please split your change into two parts? One that affects only the unrolled case (which I believe should be good to land) and the other which affects the loop case (potentially subject to further discussion).
Oops. Sorry, I missed the former comment "Looks like my alternative has caveats of its own.". I will split this patch into two parts right now. Thanks.
clang-format: please reformat the code