This is an archive of the discontinued LLVM Phabricator instance.

[X86ISelLowering] don't emit frame pointers for eflags intrinsics.
AbandonedPublic

Authored by nickdesaulniers on Dec 4 2020, 2:37 PM.

Details

Summary

We're looking to use builtin_ia32_readeflags_u64() and
builtin_ia32_writeeflags_u64() in the Linux kernel in place of
handwritten inline assembly. We would like to minimize the overhead of
accessing the eflags, though that is already relatively slow.

It seems that the use of these compiler builtins was always injecting
frame pointers. The Linux kernel explicitly avoids frame pointers for
x86_64 and relies on custom binary post processing for handling
unwinding (so, also not DWARF .eh_frame or .debug_frame), and makes
Clang's codegen for these builtins less competitive with GCC.

Fixes pr/47531.

Diff Detail

Event Timeline

nickdesaulniers created this revision.Dec 4 2020, 2:37 PM
nickdesaulniers requested review of this revision.Dec 4 2020, 2:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2020, 2:37 PM

What prevents overwriting something in the red zone?

What prevents overwriting something in the red zone?

I don't follow. How does the kernel prevent overwriting something in the red zone? I don't know, https://lore.kernel.org/lkml/CALCETrXUaeNUbkQSeMPpPKWDBCEpqX1gLgkv2G9zLeeYMjK8VQ@mail.gmail.com/ was the brief conversation we had on LKML, but I'd have to talk with the x86 maintainers who would know more. Andy Lutomirski seems to imply the kernel may be broken in that regard.

But I also think that is orthogonal to these builtins and frame pointers, or at least I fail to see the relation.

What prevents overwriting something in the red zone?

I don't follow. How does the kernel prevent overwriting something in the red zone? I don't know, https://lore.kernel.org/lkml/CALCETrXUaeNUbkQSeMPpPKWDBCEpqX1gLgkv2G9zLeeYMjK8VQ@mail.gmail.com/ was the brief conversation we had on LKML, but I'd have to talk with the x86 maintainers who would know more. Andy Lutomirski seems to imply the kernel may be broken in that regard.

But I also think that is orthogonal to these builtins and frame pointers, or at least I fail to see the relation.

When the red zone is in use, the stack pointer isn't updated to include it. So when the pushfq happens it will push a value into an address that is part of the red zone potentially clobbering something. The setHasCopyImplyingStackAdjust disables the use of the red zone for the function which avoids this issue.

Maybe the linux kernel disables the red zone and this isn't a problem. But I worry about others using these builtins without disabling the red zone.

Maybe the linux kernel disables the red zone and this isn't a problem.

The x86_64 (64b) Linux kernel is built with -mno-red-zone; I'm not sure if that's precisely the same as "disables the red zone."

But I worry about others using these builtins without disabling the red zone.

Wouldn't -fno-omit-frame-pointer or /Oy- as described in this commit message prevent that? This patch is mostly about matching GCCs behavior for these builtins, so "others using these builtins without disabling the red zone" would already be broken in such a case when built with GCC.

Maybe the linux kernel disables the red zone and this isn't a problem.

The x86_64 (64b) Linux kernel is built with -mno-red-zone; I'm not sure if that's precisely the same as "disables the red zone."

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ba0efa46047936afa81460489cfd24bc95dd863 has a little more context about the kernel's constraints. https://stackoverflow.com/questions/25787408/why-cant-kernel-code-use-a-red-zone has more interesting discussion as well.

Maybe the linux kernel disables the red zone and this isn't a problem.

The x86_64 (64b) Linux kernel is built with -mno-red-zone; I'm not sure if that's precisely the same as "disables the red zone."

Yes that's what I meant.

But I worry about others using these builtins without disabling the red zone.

Wouldn't -fno-omit-frame-pointer or /Oy- as described in this commit message prevent that? This patch is mostly about matching GCCs behavior for these builtins, so "others using these builtins without disabling the red zone" would already be broken in such a case when built with GCC.

Yes -fno-omit-frame-pointer would fix it. gcc seems to handle the redzone correctly when readeflags is present. Compare the two functions here, https://godbolt.org/z/j7hdYY. With the readeflags builtin, there is a stack pointer adjustment near the beginning. In the second function there is no stack pointer adjustment. Adding -mno-red-zone will put a stack adjustment in the second function.

So I think this patch breaks a case that works with clang and gcc today. I don't think requiring -fno-omit-frame-pointer to use the builtins with clang is a good solution.

It is difficult remembering exactly what my thought process was way back when but the redzone calculation seems very likely...

Lets see...

011980cd50d5ddc5112c8440ffe9161de60b40ae was where I first implemented these intrinsics.

Looking at the Win64 ABI, there is this bit:

A leaf function is one that does not require a function table entry. It can't make changes to any nonvolatile registers, including RSP, which means that it can't call any functions or allocate stack space.

Using these intrinsics must not violate this rule of the ABI so I forced frame pointers across all platforms.

The Linux ABI for x86-64 says that there are 128-bytes reserved for leaf functions. IIRC, forcing a frame pointer here also handled some complicated accounting issue of when or how much redzone we had to play with because this intrinsic allocates some stack...

+1 to @craig.topper's comments.

rnk added inline comments.Dec 7 2020, 1:48 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25976

This is the only writer of HasCopyImplyingStackAdjustment, so if we did go ahead with this, I'd ask you to clean up the readers.

llvm/test/CodeGen/X86/win64_frame.ll
194

From a Windows PoV, this test case needs to remain unchanged, otherwise we break the contract that stack unwinding works at every PC.

pengfei added inline comments.Oct 26 2021, 1:46 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

How about address Craig and Reid's concerns by adding the check conditions like above?

craig.topper added inline comments.Oct 26 2021, 12:27 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

emitPrologue calls setUsesRedZone. That would occur after this code runs. This is too early to call getUsesRedZone.

pengfei added inline comments.Nov 15 2021, 5:13 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

Maybe we can check for -mno-red-zone only given Linux kernel is built with it:

MachineFunction &MF = DAG.getMachineFunction();
if (!MF.getFunction().hasFnAttribute(Attribute::NoRedZone))
  MF.getFrameInfo().setHasCopyImplyingStackAdjustment(true);
rnk added inline comments.Nov 15 2021, 10:07 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

I would try to tackle this the other way around: usage of readflags/writeflags should prevent PEI from enabling the redzone optimization.

I think what we really want here is to tweak X86FrameLowering::hasFP.
If WinCFI is being used, where we have per-instruction accurate CFI, we need to force the use of a frame pointer.
For Linux, this flag should prevent the use of a red zone later in X86FrameLowering::emitProlog. We don't need to force FP usage if WinCFI is not in use.

pengfei added inline comments.Nov 18 2021, 6:32 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

We can't check hasWinCFI here due to the same reason as getUsesRedZone, I think we can check isOSWindows instead.
Regarding red zone, I think it is a great idea. So the code can be

--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -27148,8 +27148,11 @@ static SDValue LowerINTRINSIC_W_CHAIN(SDValue Op, const X86Subtarget &Subtarget,
     case llvm::Intrinsic::x86_flags_write_u64: {
       // We need a frame pointer because this will get lowered to a PUSH/POP
       // sequence.
-      MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
-      MFI.setHasCopyImplyingStackAdjustment(true);
+      MachineFunction &MF = DAG.getMachineFunction();
+      if (Subtarget.getTargetTriple().isOSWindows())
+        MF.getFrameInfo().setHasCopyImplyingStackAdjustment(true);
+      else
+        MF.getFunction().addFnAttr(Attribute::NoRedZone);
       // Don't do anything here, we will expand these intrinsics out later
       // during FinalizeISel in EmitInstrWithCustomInserter.
       return Op;

I checked it with latest xmain. No test will be affected.

void added a comment.Feb 8 2022, 3:55 PM
This comment was removed by void.
rnk added inline comments.Feb 8 2022, 4:07 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

Right, I didn't mean that this code should check if CFI has been emitted, but whether WinCFI should be used. I believe this information is carried on MCAsmInfo. The proposed triple check is incorrect, I don't think we need to force FP usage on 32-bit Windows, it's really specific to Win64. Please use the wincfi check, it is more self-documenting.

  • rebase, check for wincfi
nickdesaulniers marked an inline comment as done.Feb 9 2022, 11:19 AM
nickdesaulniers added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

I think the red zone concerns are handled correctly already...

The only caller of X86MachineFunctionInfo::setUsesRedZone is guarded by a call to X86FrameLowering::has128ByteRedZone, which checks for the noredzone fn attr:

return Is64Bit && !IsWin64CC && !Fn.hasFnAttribute(Attribute::NoRedZone);
nickdesaulniers marked an inline comment as done.Feb 9 2022, 11:19 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

@rnk did you still want me to look into modifying X86FrameLowering::hasFP, or is that comment stale and this approach ok?

@craig.topper are there some redzone tests in tree that we should perhaps add calls to @llvm.x86.flags.{read|write}.u64?

rnk added inline comments.Feb 9 2022, 12:08 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975–25976

The only caller of X86MachineFunctionInfo::setUsesRedZone is guarded by a call to X86FrameLowering::has128ByteRedZone, which checks for the noredzone fn attr:

Sure, but I don't think anything sets that attr except -mno-redzone. We still need to disable the use of the redzone in the presence of pushf/popf instructions. At this point, HasCopyImplyingStackAdjustment means exactly that.

@rnk did you still want me to look into modifying X86FrameLowering::hasFP, or is that comment stale and this approach ok?

Yes, I think it would be cleaner to have this code, which sets the flag, to be target-independent (it just marks functions with pushfq/popfq). hasFP should return true for wincfi targets in the presence of these instructions. The emitProlog code which decides to use a redzone should also check this flag if it doesn't already.

nickdesaulniers edited the summary of this revision. (Show Details)
  • rebase on test from @topperc, base fix on Phoebe's suggestion
craig.topper added inline comments.Feb 9 2022, 3:37 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975

Do we have precedent for modifying an IR function from inside of a MachineFunctionPass.

void added inline comments.Feb 9 2022, 3:52 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975

I believe it's generally frowned upon.

  • rerun update_llc_test_checks.py w/ --no_x86_scrub_sp
nickdesaulniers added inline comments.Feb 9 2022, 3:54 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
25975

If the tests look good and everyone's happy with the tests, then I can find another way to plumb this.

  • check for isWin64Prologue when checking hasCopyImplyingStackAdjustment instead of modifying IR from machine pass

Looks like @rnk is much faster than me: https://reviews.llvm.org/D119391.

llvm/lib/Target/X86/X86Subtarget.h
640 ↗(On Diff #407350)

can be dropped now

  • drop change to X86Subtarget
nickdesaulniers marked 9 inline comments as done.Feb 9 2022, 4:59 PM