Page MenuHomePhabricator

[X86ISelLowering] don't emit frame pointers for eflags intrinsics.
Needs ReviewPublic

Authored by nickdesaulniers on Dec 4 2020, 2:37 PM.

Details

Summary

We're looking to use builtin_ia32_readeflags_u64() and
builtin_ia32_writeeflags_u64() in the Linux kernel in place of
handwritten inline assembly. We would like to minimize the overhead of
accessing the eflags, though that is already relatively slow.

It seems that the use of these compiler builtins was always injecting
frame pointers. The Linux kernel explicitly avoids frame pointers for
x86_64 and relies on custom binary post processing for handling
unwinding (so, also not DWARF .eh_frame or .debug_frame), and makes
Clang's codegen for these builtins less competitive with GCC.

Remove this logic. Users of these compiler builtins may want to specify
-fno-omit-frame-pointer or /Oy- to keep the previous behavior.

Fixes pr/47531.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Diff Detail

Event Timeline

nickdesaulniers created this revision.Dec 4 2020, 2:37 PM
nickdesaulniers requested review of this revision.Dec 4 2020, 2:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2020, 2:37 PM

What prevents overwriting something in the red zone?

What prevents overwriting something in the red zone?

I don't follow. How does the kernel prevent overwriting something in the red zone? I don't know, https://lore.kernel.org/lkml/CALCETrXUaeNUbkQSeMPpPKWDBCEpqX1gLgkv2G9zLeeYMjK8VQ@mail.gmail.com/ was the brief conversation we had on LKML, but I'd have to talk with the x86 maintainers who would know more. Andy Lutomirski seems to imply the kernel may be broken in that regard.

But I also think that is orthogonal to these builtins and frame pointers, or at least I fail to see the relation.

What prevents overwriting something in the red zone?

I don't follow. How does the kernel prevent overwriting something in the red zone? I don't know, https://lore.kernel.org/lkml/CALCETrXUaeNUbkQSeMPpPKWDBCEpqX1gLgkv2G9zLeeYMjK8VQ@mail.gmail.com/ was the brief conversation we had on LKML, but I'd have to talk with the x86 maintainers who would know more. Andy Lutomirski seems to imply the kernel may be broken in that regard.

But I also think that is orthogonal to these builtins and frame pointers, or at least I fail to see the relation.

When the red zone is in use, the stack pointer isn't updated to include it. So when the pushfq happens it will push a value into an address that is part of the red zone potentially clobbering something. The setHasCopyImplyingStackAdjust disables the use of the red zone for the function which avoids this issue.

Maybe the linux kernel disables the red zone and this isn't a problem. But I worry about others using these builtins without disabling the red zone.

Maybe the linux kernel disables the red zone and this isn't a problem.

The x86_64 (64b) Linux kernel is built with -mno-red-zone; I'm not sure if that's precisely the same as "disables the red zone."

But I worry about others using these builtins without disabling the red zone.

Wouldn't -fno-omit-frame-pointer or /Oy- as described in this commit message prevent that? This patch is mostly about matching GCCs behavior for these builtins, so "others using these builtins without disabling the red zone" would already be broken in such a case when built with GCC.

Maybe the linux kernel disables the red zone and this isn't a problem.

The x86_64 (64b) Linux kernel is built with -mno-red-zone; I'm not sure if that's precisely the same as "disables the red zone."

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ba0efa46047936afa81460489cfd24bc95dd863 has a little more context about the kernel's constraints. https://stackoverflow.com/questions/25787408/why-cant-kernel-code-use-a-red-zone has more interesting discussion as well.

Maybe the linux kernel disables the red zone and this isn't a problem.

The x86_64 (64b) Linux kernel is built with -mno-red-zone; I'm not sure if that's precisely the same as "disables the red zone."

Yes that's what I meant.

But I worry about others using these builtins without disabling the red zone.

Wouldn't -fno-omit-frame-pointer or /Oy- as described in this commit message prevent that? This patch is mostly about matching GCCs behavior for these builtins, so "others using these builtins without disabling the red zone" would already be broken in such a case when built with GCC.

Yes -fno-omit-frame-pointer would fix it. gcc seems to handle the redzone correctly when readeflags is present. Compare the two functions here, https://godbolt.org/z/j7hdYY. With the readeflags builtin, there is a stack pointer adjustment near the beginning. In the second function there is no stack pointer adjustment. Adding -mno-red-zone will put a stack adjustment in the second function.

So I think this patch breaks a case that works with clang and gcc today. I don't think requiring -fno-omit-frame-pointer to use the builtins with clang is a good solution.

It is difficult remembering exactly what my thought process was way back when but the redzone calculation seems very likely...

Lets see...

011980cd50d5ddc5112c8440ffe9161de60b40ae was where I first implemented these intrinsics.

Looking at the Win64 ABI, there is this bit:

A leaf function is one that does not require a function table entry. It can't make changes to any nonvolatile registers, including RSP, which means that it can't call any functions or allocate stack space.

Using these intrinsics must not violate this rule of the ABI so I forced frame pointers across all platforms.

The Linux ABI for x86-64 says that there are 128-bytes reserved for leaf functions. IIRC, forcing a frame pointer here also handled some complicated accounting issue of when or how much redzone we had to play with because this intrinsic allocates some stack...

+1 to @craig.topper's comments.

rnk added inline comments.Dec 7 2020, 1:48 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25976

This is the only writer of HasCopyImplyingStackAdjustment, so if we did go ahead with this, I'd ask you to clean up the readers.

llvm/test/CodeGen/X86/win64_frame.ll
194

From a Windows PoV, this test case needs to remain unchanged, otherwise we break the contract that stack unwinding works at every PC.

pengfei added inline comments.Oct 26 2021, 1:46 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

How about address Craig and Reid's concerns by adding the check conditions like above?

craig.topper added inline comments.Oct 26 2021, 12:27 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

emitPrologue calls setUsesRedZone. That would occur after this code runs. This is too early to call getUsesRedZone.

pengfei added inline comments.Nov 15 2021, 5:13 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

Maybe we can check for -mno-red-zone only given Linux kernel is built with it:

MachineFunction &MF = DAG.getMachineFunction();
if (!MF.getFunction().hasFnAttribute(Attribute::NoRedZone))
  MF.getFrameInfo().setHasCopyImplyingStackAdjustment(true);
rnk added inline comments.Nov 15 2021, 10:07 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

I would try to tackle this the other way around: usage of readflags/writeflags should prevent PEI from enabling the redzone optimization.

I think what we really want here is to tweak X86FrameLowering::hasFP.
If WinCFI is being used, where we have per-instruction accurate CFI, we need to force the use of a frame pointer.
For Linux, this flag should prevent the use of a red zone later in X86FrameLowering::emitProlog. We don't need to force FP usage if WinCFI is not in use.

pengfei added inline comments.Nov 18 2021, 6:32 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

We can't check hasWinCFI here due to the same reason as getUsesRedZone, I think we can check isOSWindows instead.
Regarding red zone, I think it is a great idea. So the code can be

--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -27148,8 +27148,11 @@ static SDValue LowerINTRINSIC_W_CHAIN(SDValue Op, const X86Subtarget &Subtarget,
     case llvm::Intrinsic::x86_flags_write_u64: {
       // We need a frame pointer because this will get lowered to a PUSH/POP
       // sequence.
-      MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
-      MFI.setHasCopyImplyingStackAdjustment(true);
+      MachineFunction &MF = DAG.getMachineFunction();
+      if (Subtarget.getTargetTriple().isOSWindows())
+        MF.getFrameInfo().setHasCopyImplyingStackAdjustment(true);
+      else
+        MF.getFunction().addFnAttr(Attribute::NoRedZone);
       // Don't do anything here, we will expand these intrinsics out later
       // during FinalizeISel in EmitInstrWithCustomInserter.
       return Op;

I checked it with latest xmain. No test will be affected.