Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
How to work around this?? The intended instruction is sub sp, sp, x15, lsl #4
I think you're using the wrong opcode? Should be SUBXrs.
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
597 ↗ | (On Diff #126615) | Is this something that Visual Studio supports? Or the code model an extension like the ARMv7 case? |
1224–1226 ↗ | (On Diff #126615) | Id just use a ternary: unsigned BP = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : AArch64::NoRegister; |
1229 ↗ | (On Diff #126615) | Why not use a range based loop? for (const auto &CSR : RegInfo->getCalleeSavedRegs(&MF)) if (CSR == BasePointerReg) ++SpillEstimate; |
No, that ends up doing the wrong thing.
SP and XZR are different names for R31, and the interpretation depend on the kind of the instruction. It refers to SP in add/sub with immediates or extended registers, but not with shifted registers. So therefore I need to use SUBXrx, otherwise this ends up interpreted as sub xzr, xzr, x15, lsl #4, which gets simplified into neg xzr, x15, lsl #4 or so.
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
597 ↗ | (On Diff #126615) | I modeled this based on what MSVC produces. It's pretty similar to the ARMv7 case, with one difference: On ARM, on return from __chkstk, the register has been rescaled into bytes, so you do sub sp, sp, r4, while here it's kept in the same unit as on input to __chkstk, and thus need to do sub sp, sp, x15, lsl #4. |
1224–1226 ↗ | (On Diff #126615) | Sure, I can do that. |
1229 ↗ | (On Diff #126615) | That doesn't work, since getCalleeSavedRegs returns a plain pointer with null termination, the length isn't known at compile time and it doesn't have begin()/end() functions. |
No, that ends up doing the wrong thing.
Oh, oops, you're right. The actual correct opcode is SUBXrx64.
They're pretty similar, although I only implemented the case for static stack allocations so far, leaving dynamic allocations for later. The basis for what they do is almost identical (with the differences in the function name and calling convention), so I'm sure it should be easy to adapt one to match the pattern laid out by the other one when one of them gets committed first.
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
597 ↗ | (On Diff #126615) | Oh, I assumed as much. However, the behavior of the large code model is something which is a conforming extension. The generated code is slightly different to allow the values to extend past what the pattern MSVC emits would allow. I am asking if they did this differently in the ARM64 backend or if that is an extension (and thus should be documented). |
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
597 ↗ | (On Diff #126615) | I didn't check that case with msvc, I'll have a look. |
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
597 ↗ | (On Diff #126615) | So I presume that MSVC didn't have any command line flag to enable a large code model from before/on other archs? I don't seem to find one in MSVC for ARM64 at least. (I.e., for the armv7 case, was this a case of MSVC just not supporting such a code model, or that it does but didn't emit code like this?) In any case, I'll update this patch with similar documentation as for arm, about the fact that this is an extension. |
Added documentation about the fact that the large code model operation is an extension.
Fix the estimate of number of registers to spill, adjust it to make sure it won't underestimate the amount of stack used.
I did add a section to docs/Extensions.rst about this
lib/Target/AArch64/AArch64FrameLowering.cpp | ||
---|---|---|
367 ↗ | (On Diff #127256) | I think I'd rather keep them separate like this, since they're there for quite different reasons - the 512 byte limit doesn't have anything to do with stack probing. |
498 ↗ | (On Diff #127256) | Not quite sure how you mean I should try to merge these |