This is an archive of the discontinued LLVM Phabricator instance.

[X86] Create extra prolog/epilog for stack realignment
ClosedPublic

Authored by LuoYuanke on Mar 8 2023, 9:53 PM.

Details

Summary

The base pointer register is reserved by compiler when there is
dynamic size alloca and stack realign in a function. However the
base pointer register is not defined in X86 ABI, so user can use
this register in inline assembly. The inline assembly would
clobber base pointer register without being awared by user. This
patch is to create extra prolog to save the stack pointer to a
scratch register and use this register to reference argument from
stack. For some calling convention (e.g. regcall), there may be
few scratch register.
Below is the example code for such case.

extern int bar(void *p);
long long foo(size_t size, char c, int id) {
  __attribute__((__aligned__(64))) int a;
  char *p = (char *)alloca(size);
  asm volatile ("nop"::"S"(405):);
  asm volatile ("movl %0, %1"::"r"(id), "m"(a):);
  p[2] = 8;
  memset(p, c, size);
  return bar(p);
}

And below prolog/epilog will be emit for this case.

leal    4(%esp), %ebx
.cfi_def_cfa %ebx, 0
andl    $-128, %esp
pushl   -4(%ebx)
...
leal    4(%ebx), %esp
.cfi_def_cfa %esp, 4

Diff Detail

Event Timeline

LuoYuanke created this revision.Mar 8 2023, 9:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 9:53 PM
LuoYuanke requested review of this revision.Mar 8 2023, 9:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2023, 9:53 PM
LuoYuanke updated this revision to Diff 503625.Mar 8 2023, 10:12 PM

Add unwind attibute to test cfi instruction emit.

LuoYuanke added inline comments.Mar 8 2023, 10:14 PM
llvm/lib/Target/X86/X86ArgumentStackSlotRebase.cpp
80

About 60 lit test fails and need update if we create a new register class for it.

pengfei added inline comments.Mar 9 2023, 6:34 PM
llvm/lib/Target/X86/X86ArgumentStackSlotRebase.cpp
2

Update name

77

Are we replacing RBX with RBX on 64-bits?

80

Why R10? Do you want R11, according to ABI

%r10 temporary register, used for passing a function’s static chain pointer No
%r11 temporary register                                                     No
llvm/lib/Target/X86/X86FrameLowering.cpp
566–567

Remove

llvm/lib/Target/X86/X86RegisterInfo.td
436–439

Should remove it?

llvm/test/CodeGen/X86/swifttail-realign.ll
9

No sure if it's correct. I assume the old code used rsp to access %ptr but it is intentionally omitted.
It must be wrong if the new code still use rsp because it is not aligned to 32 now.

LuoYuanke added inline comments.Mar 10 2023, 5:35 AM
llvm/lib/Target/X86/X86ArgumentStackSlotRebase.cpp
77

Good catch, we may have to need virtual register since inline assembly may clobber any physical register.

80

Both R10 and R11 should be OK, but we should use virtual register.

llvm/test/CodeGen/X86/swifttail-realign.ll
9

There is alignment instruction andq $-32, %rax in both the old code and new code.

movq    %rsp, %rax
addq    $15, %rdi
andq    $-16, %rdi
subq    %rdi, %rax
andq    $-32, %rax
movq    %rax, %rsp
movq    var@GOTPCREL(%rip), %rcx
movq    %rax, (%rcx)
LuoYuanke updated this revision to Diff 504111.Mar 10 2023, 5:40 AM

Address some of Phoebe's comments. Need another patch
to create new register class.

LuoYuanke updated this revision to Diff 504303.Mar 10 2023, 4:38 PM

Address Phoebe's comments.
Use virtual register for argument stack slot rebase.

LuoYuanke marked 2 inline comments as done.Mar 10 2023, 4:40 PM
LuoYuanke updated this revision to Diff 504325.Mar 10 2023, 7:18 PM

Update test cases.

LuoYuanke updated this revision to Diff 504329.Mar 10 2023, 8:03 PM

Update test cases.

LuoYuanke retitled this revision from [X86] Create extra prolog/epilog for stack realignment to [X86][WIP] Create extra prolog/epilog for stack realignment.Mar 11 2023, 5:41 PM
LuoYuanke updated this revision to Diff 504599.Mar 13 2023, 5:01 AM
  1. Use scratch register for argument base pointer.
  2. Fix CFI instruction
  3. Enhance replacement of frame pointer with argument base pointer.
  4. c++ test case (https://reviews.llvm.org/D145926) can pass with -g.
LuoYuanke added inline comments.Mar 13 2023, 5:07 AM
llvm/test/CodeGen/MIR/X86/inline-asm-registers.mir
31 ↗(On Diff #504599)

GR64 register class ID is changed from 67 to 68. Since register class is encoded in the flag, this test case need to be updated.

LuoYuanke retitled this revision from [X86][WIP] Create extra prolog/epilog for stack realignment to [X86] Create extra prolog/epilog for stack realignment.Mar 13 2023, 5:08 AM

The patch is ready for review.

LuoYuanke added inline comments.Mar 13 2023, 5:11 AM
llvm/test/CodeGen/X86/i386-baseptr.ll
52

The argument base register is saved to a normal stack slot.

71

The argument base register is restored from a normal stack

LuoYuanke added inline comments.Mar 13 2023, 5:14 AM
llvm/test/CodeGen/X86/i386-baseptr.ll
51–54

Stack is aligned with 128 byte. (244 + 4 (ret addr) + 4 (ebp) +4 (esi) = 256)

RKSimon added inline comments.Mar 13 2023, 2:35 PM
llvm/lib/Target/X86/CMakeLists.txt
92

(style) keep this sorted if you can

LuoYuanke updated this revision to Diff 504905.Mar 13 2023, 6:08 PM

Address Simon's comments.

pengfei accepted this revision.Mar 20 2023, 8:44 AM

We had an offline discussion on this. I generally got the idea of the design. Let's land it first to see if there're unconsidered cases.

llvm/lib/Target/X86/X86ArgumentStackSlotRebase.cpp
98

Should assert rather than break?

146

Should assert rather than return?

llvm/lib/Target/X86/X86FrameLowering.cpp
4014

This can be auto.

4018

Use explicit type.

llvm/lib/Target/X86/X86RegisterInfo.cpp
818–819

The format looks odd, maybe clang-format?

llvm/lib/Target/X86/X86RegisterInfo.td
437

last .

443

ditto.

llvm/test/CodeGen/X86/i386-baseptr.ll
51–54

Nit, it seems a bit waste in space.

llvm/test/CodeGen/X86/swifttail-realign.ll
9

That looks good, thanks!

This revision is now accepted and ready to land.Mar 20 2023, 8:44 AM
LuoYuanke updated this revision to Diff 506780.Mar 20 2023, 4:55 PM
LuoYuanke marked 5 inline comments as done.

Address Phoebe's comments.

llvm/lib/Target/X86/X86ArgumentStackSlotRebase.cpp
98

If we use break it can return NoReg, so that nothing would be done in this pass and the extra prolog/epilog won't be inserted.

146

My original idea is to leave it as it is if we can't handle it.

llvm/test/CodeGen/X86/i386-baseptr.ll
51–54

Yes, but it is original code logic that is to align the stack address to 128.

This revision was landed with ongoing or failed builds.Mar 20 2023, 5:10 PM
This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Mar 21 2023, 3:34 AM

It looks like this broke the expensive checks build (x86-64-baseptr.ll and i386-baseptr.ll fail).

It looks like this broke the expensive checks build (x86-64-baseptr.ll and i386-baseptr.ll fail).

I'll check it with "-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON". Thanks for notifying.

There are live range issue for the physical register. Let me revert the patch first.

This relanded in e4ceb5a7bb9b8f6d730530345649286370dd3ff8, and we're seeing runtime errors in Halide when building with asan.

I don't have a standalone reproducer I can share yet, but the issue we see is when attempting to return from this method: https://github.com/halide/Halide/blob/main/src/runtime/x86_cpu_features.cpp. If I had to guess, it's a bad interaction with the cpuid implementation here: https://github.com/halide/Halide/blob/2a51f71a0add536ab986c675d1239db860dbc655/src/runtime/x86.ll#L145

The problem is $rsp is not pointing to the right place, and so ret jumps to a bogus location, so the next instruction faults:

... break on start address of x86_cpu_features ...
(lldb) x/2 $rsp
0x7fffffffb818: 0x5806dc35 0x00005555
(lldb) c
...
(lldb) di -l

repro_test`Halide::Runtime::Internal::halide_get_cpu_features:
    0x55555806e23e <+1246>: movq   -0x48(%rbp), %r10
    0x55555806e242 <+1250>: leaq   -0x28(%rbp), %rsp
    0x55555806e246 <+1254>: popq   %rbx
    0x55555806e247 <+1255>: popq   %r12
    0x55555806e249 <+1257>: popq   %r13
    0x55555806e24b <+1259>: popq   %r14
    0x55555806e24d <+1261>: popq   %r15
    0x55555806e24f <+1263>: popq   %rbp
    0x55555806e250 <+1264>: leaq   -0x10(%r10), %rsp
->  0x55555806e254 <+1268>: retq
(lldb) x/2 $rsp
0x7fffffffb810: 0xffffb890 0x00007fff
(lldb) x/2 0x7fffffffb818
0x7fffffffb818: 0x5806dc35 0x00005555

repro_test`Halide::Runtime::Internal::halide_get_cpu_features:

0x55555806e23e <+1246>: movq   -0x48(%rbp), %r10
0x55555806e242 <+1250>: leaq   -0x28(%rbp), %rsp
0x55555806e246 <+1254>: popq   %rbx
0x55555806e247 <+1255>: popq   %r12
0x55555806e249 <+1257>: popq   %r13
0x55555806e24b <+1259>: popq   %r14
0x55555806e24d <+1261>: popq   %r15
0x55555806e24f <+1263>: popq   %rbp
0x55555806e250 <+1264>: leaq   -0x10(%r10), %rsp

-> 0x55555806e254 <+1268>: retq
(lldb) x/2 $rsp
0x7fffffffb810: 0xffffb890 0x00007fff
(lldb) x/2 0x7fffffffb818
0x7fffffffb818: 0x5806dc35 0x00005555

Thank you very much for the indicator. It seems the offset is incorrect for leaq -0x10(%r10), %rsp. It should be leaq -0x8(%r10), %rsp. Sorry for inconvenience. I'll fix it as soon as possible and I'll develop more runtime test case for it.

@rupprecht, could you help to check if https://reviews.llvm.org/D146862 can fix your issue?