Fix regression https://github.com/llvm/llvm-project/issues/56952 for Clang CodeGen on Windows. In the Windows ABI the instruction sequence that is expanded from CALL_RVMARKER should use RCX as target register and not RDI.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I assume is that this can only be a temporary fix as it probably has an impact on performance of compiled output. I am eager to fix this properly. The FIXME comment says that we need specific support in the target backend. Any ideas what the x86_64 backend for Windows might be missing to handle the operand bundle correctly? Thanks!
Bisectioning showed that the regression started with this patch: https://reviews.llvm.org/D111331
The symptom is that Clang emits movq %rax, %rdi instead of movq %rax, %rcx while objc_retainAutoreleasedReturnValue() still expects the value in %rcx.
It appears related to D94597. Is this a calling convention issue?
Oh right, so a different marker is used on Windows? The code that adds the marker is here: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ExpandPseudo.cpp#L224
It would probably be better to just add Windows support there. Also, this should have a test case.
Having some sort of reliable marker here emitted after the call is currently a correctness condition for using the ObjC ARC autorelease optimization. Apple's x86_64 ObjC runtime is constrained by ABI limitations, so we sniff for a particular, fairly reasonable code pattern that looks like moving the return value to the first argument register and then making a specific call; on other platforms, we look for a specific non-standard NOP. Non-Apple runtimes that don't have the same ABI concerns probably ought to just use a NOP marker, but if that's not an option, then yeah, I guess you just need the compiler to emit the right register-register move, and then make sure the runtime is sniffing for that.
It's possible the GNU runtime generally uses a different approach and doesn't need these markers at all; David Chisnall might know.
Thanks for your feedback! For the moment, I continued the discussion in the GitHub ticket. My gut feeling is that I should fix the X86ExpandPseudo code to emit the right register-register move instead of preventing the use of the operand bundle here in CodeGen, because the inliner may generate it as well. I tested it like this and it works as expected:
https://github.com/weliveindetail/llvm-project/commit/c36eeefb3917c66ad75c7dd1dbcda29645d05aed
I'd like to wait for David's response on whether or not GNUstep is checking for optimized callers (or wants to do it in the future) before deciding for one way or the other.
Follow Reid's advice from https://github.com/llvm/llvm-project/issues/56952#issuecomment-1255552565
and fix the instruction sequence expanded from CALL_RVMARKER on Windows (use RCX as target
register and not RDI). This way we stay as close as possible to Apple's ABI and GNUstep has the
option to implement detection of optimized callers for autorelease optimizations in the future.
LGTM, thanks! You might want to extend the windows check lines to the other tests as well.
Thanks. Yes thought about that, but not all of them match the typical IR output on Windows (gxx_personality, cfi, etc.). The test case that I extended is exactly the scenario we were running into and for the moment it seems good enough to make sure we don't regress again. I will keep the review open for a day or two before landing this -- in case there is more feedback.