This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Implement stack probing for windows
ClosedPublic

Authored by mstorsjo on Dec 12 2017, 1:22 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Dec 12 2017, 1:22 PM

How to work around this?? The intended instruction is sub sp, sp, x15, lsl #4

I think you're using the wrong opcode? Should be SUBXrs.

compnerd added inline comments.Dec 12 2017, 1:57 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗(On Diff #126615)

Is this something that Visual Studio supports? Or the code model an extension like the ARMv7 case?

1224–1226 ↗(On Diff #126615)

Id just use a ternary:

unsigned BP = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : AArch64::NoRegister;
1229 ↗(On Diff #126615)

Why not use a range based loop?

for (const auto &CSR : RegInfo->getCalleeSavedRegs(&MF))
  if (CSR == BasePointerReg)
    ++SpillEstimate;

How to work around this?? The intended instruction is sub sp, sp, x15, lsl #4

I think you're using the wrong opcode? Should be SUBXrs.

No, that ends up doing the wrong thing.

SP and XZR are different names for R31, and the interpretation depend on the kind of the instruction. It refers to SP in add/sub with immediates or extended registers, but not with shifted registers. So therefore I need to use SUBXrx, otherwise this ends up interpreted as sub xzr, xzr, x15, lsl #4, which gets simplified into neg xzr, x15, lsl #4 or so.

mstorsjo added inline comments.Dec 12 2017, 2:19 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗(On Diff #126615)

I modeled this based on what MSVC produces. It's pretty similar to the ARMv7 case, with one difference: On ARM, on return from __chkstk, the register has been rescaled into bytes, so you do sub sp, sp, r4, while here it's kept in the same unit as on input to __chkstk, and thus need to do sub sp, sp, x15, lsl #4.

1224–1226 ↗(On Diff #126615)

Sure, I can do that.

1229 ↗(On Diff #126615)

That doesn't work, since getCalleeSavedRegs returns a plain pointer with null termination, the length isn't known at compile time and it doesn't have begin()/end() functions.

mstorsjo updated this revision to Diff 126624.Dec 12 2017, 2:20 PM

Used a ternary operator as suggested.

No, that ends up doing the wrong thing.

Oh, oops, you're right. The actual correct opcode is SUBXrx64.

No, that ends up doing the wrong thing.

Oh, oops, you're right. The actual correct opcode is SUBXrx64.

Thanks, that does fix it! Those opcodes aren't really easily discoverable...

mstorsjo updated this revision to Diff 126626.Dec 12 2017, 2:24 PM
mstorsjo edited the summary of this revision. (Show Details)

Using SUBXrx64, added -verify-machineinstrs to the test.

How does this interact with https://reviews.llvm.org/D40863 ?

They're pretty similar, although I only implemented the case for static stack allocations so far, leaving dynamic allocations for later. The basis for what they do is almost identical (with the differences in the function name and calling convention), so I'm sure it should be easy to adapt one to match the pattern laid out by the other one when one of them gets committed first.

compnerd added inline comments.Dec 14 2017, 5:50 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗(On Diff #126615)

Oh, I assumed as much. However, the behavior of the large code model is something which is a conforming extension. The generated code is slightly different to allow the values to extend past what the pattern MSVC emits would allow. I am asking if they did this differently in the ARM64 backend or if that is an extension (and thus should be documented).

mstorsjo added inline comments.Dec 14 2017, 10:16 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗(On Diff #126615)

I didn't check that case with msvc, I'll have a look.

mstorsjo added inline comments.Dec 14 2017, 11:14 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
597 ↗(On Diff #126615)

So I presume that MSVC didn't have any command line flag to enable a large code model from before/on other archs? I don't seem to find one in MSVC for ARM64 at least. (I.e., for the armv7 case, was this a case of MSVC just not supporting such a code model, or that it does but didn't emit code like this?)

In any case, I'll update this patch with similar documentation as for arm, about the fact that this is an extension.

mstorsjo updated this revision to Diff 127068.Dec 14 2017, 11:30 PM

Added documentation about the fact that the large code model operation is an extension.

mstorsjo updated this revision to Diff 127256.Dec 16 2017, 12:53 PM

Fix the estimate of number of registers to spill, adjust it to make sure it won't underestimate the amount of stack used.

compnerd accepted this revision.Dec 19 2017, 2:14 PM

Please do add the docs that the large code model behavior is an extension.

lib/Target/AArch64/AArch64FrameLowering.cpp
367 ↗(On Diff #127256)

Why not just combine these into a single requiresStackProbe?

498 ↗(On Diff #127256)

Same.

This revision is now accepted and ready to land.Dec 19 2017, 2:14 PM

Please do add the docs that the large code model behavior is an extension.

I did add a section to docs/Extensions.rst about this

lib/Target/AArch64/AArch64FrameLowering.cpp
367 ↗(On Diff #127256)

I think I'd rather keep them separate like this, since they're there for quite different reasons - the 512 byte limit doesn't have anything to do with stack probing.

498 ↗(On Diff #127256)

Not quite sure how you mean I should try to merge these

This revision was automatically updated to reflect the committed changes.