There are some users want to use __SSC_MARK for special use.
We have a builtin implement way in https://reviews.llvm.org/D108439
After some discussion in that patch, we decide to let user to use it by #include immintrin.h
Details
Diff Detail
Event Timeline
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
23 | Does it also work on windows? |
Is this subject to the same issue that causes cpuid.h to save rbx on x86-64 when it is used as the base pointer?
__SSC_MARK is used to mark code blocks. "[mov ebx, const_int_id] + [emit 0x64 0x67 0x90], " will be recognized by emulators/simulators. it does not have any side effects,
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
23 | I test it in windows machine, it passed. |
The inline asm has a mov to ebx in it and has ebx in the clobber list. This affects register allocation. If ebx is the base pointer does this work correctly?
It has show the ebx will be clobber, so I think the ebx will be auto restored after the inline asm ?
It didn't work for the modification of rbx in cpuid.h. See this comment. Please test it.
/* x86-64 uses %rbx as the base register, so preserve it. */ #define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \ __asm(" xchgq %%rbx,%q1\n" \ " cpuid\n" \ " xchgq %%rbx,%q1" \ : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \ : "0"(__leaf))
I can't find a good reproducer for the cpuid.h issue but maybe this mwaitx bug can help https://bugs.llvm.org/show_bug.cgi?id=43528 maybe you just need to turn on the stack protector to test it?
if we directly use the ebx after SSC_MARK with another inline_asm, it will not restore the ebx. I am not sure this is a problem or not.
Let me try test43528.
3 #include <immintrin.h> 4 5 void ssc_mark(unsigned level, unsigned count) { 6 7 /* x86-64 uses %rbx as the base register, so preserve it. */ 8 #define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \ 9 __asm(" xchgq %%rbx,%q1\n" \ 10 " cpuid\n" \ 11 " xchgq %%rbx,%q1" \ 12 : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \ 13 : "0"(__leaf)) 14 15 unsigned eax1, ebx1, ecx1, edx1; 16 __cpuid(level, eax1, ebx1, ecx1, edx1); 17 18 __SSC_MARK(0x0); 19 20 asm("movl %%ebx, %%eax"::); 21 }
The register allocator can’t read the inline asm text. It can’t see you are reading ebx.
It looks the test https://bugs.llvm.org/show_bug.cgi?id=43528 will only def the ebx
I think we need to let RA know we will use ebx after clobber, so let it restore its value before the clobbered inline_asm
I test this test:
3 #include <immintrin.h> 4 5 6 /* x86-64 uses %rbx as the base register, so preserve it. */ 7 #define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \ 8 __asm(" xchgq %%rbx,%q1\n" \ 9 " cpuid\n" \ 10 " xchgq %%rbx,%q1" \ 11 : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \ 12 : "0"(__leaf)) 13 14 15 16 static __inline__ void __attribute__((__always_inline__)) 17 _mm_monitorx(void const * __p, unsigned __extensions, unsigned __hints) 18 { 19 __builtin_ia32_monitorx((void *)__p, __extensions, __hints); 20 } 21 22 static __inline__ void __attribute__((__always_inline__)) 23 _mm_mwaitx(unsigned __extensions, unsigned __hints, unsigned __clock) 24 { 25 __builtin_ia32_mwaitx(__extensions, __hints, __clock); 26 } 27 28 int main(int argc, char ** argv) { 29 int v; 30 v = 0; 31 32 unsigned eax1, ebx1, ecx1, edx1, level; 33 __cpuid(level, eax1, ebx1, ecx1, edx1); 34 35 __SSC_MARK(0x0); 36 37 _mm_monitorx(&v, 0, 0); 38 _mm_mwaitx(0, 0, 1); 39 }
clang -mmwaitx -S test.c :
6 ssc_mark: # @ssc_mark 7 .cfi_startproc 8 # %bb.0: # %entry 9 pushq %rbp 10 .cfi_def_cfa_offset 16 11 .cfi_offset %rbp, -16 12 movq %rsp, %rbp 13 .cfi_def_cfa_register %rbp 14 pushq %rbx 15 .cfi_offset %rbx, -24 16 movl %edi, -12(%rbp) 17 movl %esi, -16(%rbp) 18 movl -12(%rbp), %eax 19 #APP 20 xchgq %rbx, %rsi 21 cpuid 22 xchgq %rbx, %rsi 23 #NO_APP 24 movl %esi, -36(%rbp) # 4-byte Spill 25 movl %eax, %esi 26 movl -36(%rbp), %eax # 4-byte Reload 27 movl %esi, -20(%rbp) 28 movl %eax, -24(%rbp) 29 movl %ecx, -28(%rbp) 30 movl %edx, -32(%rbp) 31 #APP 32 movl $0, %ebx 33 .byte 100 34 .byte 103 35 .byte 144 36 #NO_APP 37 #APP 38 movl %ebx, %eax 39 #NO_APP 40 popq %rbx 41 popq %rbp 42 .cfi_def_cfa %rsp, 8 43 retq
Sorry I attach the wrong asm, it should be:
clang -mmwaitx -S test.c :
6 main: # @main 7 .cfi_startproc 8 # %bb.0: # %entry 9 pushq %rbp 10 .cfi_def_cfa_offset 16 11 .cfi_offset %rbp, -16 12 movq %rsp, %rbp 13 .cfi_def_cfa_register %rbp 14 pushq %rbx 15 .cfi_offset %rbx, -24 16 movl %edi, -44(%rbp) 17 movq %rsi, -56(%rbp) 18 movl $0, -60(%rbp) 19 movl -80(%rbp), %eax 20 #APP 21 xchgq %rbx, %rsi 22 cpuid 23 xchgq %rbx, %rsi 24 #NO_APP 25 movl %esi, -84(%rbp) # 4-byte Spill 26 movl %eax, %esi 27 movl -84(%rbp), %eax # 4-byte Reload 28 movl %esi, -64(%rbp) 29 movl %eax, -68(%rbp) 30 movl %ecx, -72(%rbp) 31 movl %edx, -76(%rbp) 32 #APP 33 movl $0, %ebx 34 .byte 100 35 .byte 103 36 .byte 144 37 #NO_APP 38 leaq -60(%rbp), %rax 39 movq %rax, -32(%rbp) 40 movl $0, -36(%rbp) 41 movl $0, -40(%rbp) 42 movq -32(%rbp), %rax 43 movl -36(%rbp), %ecx 44 movl -40(%rbp), %edx 45 monitorx 46 movl $0, -12(%rbp) 47 movl $0, -16(%rbp) 48 movl $1, -20(%rbp) 49 movl -12(%rbp), %ecx 50 movl -16(%rbp), %eax 51 movl -20(%rbp), %ebx 52 mwaitx 53 xorl %eax, %eax 54 popq %rbx 55 popq %rbp 56 .cfi_def_cfa %rsp, 8 57 retq
I fixed a similar issue: https://bugs.llvm.org/show_bug.cgi?id=50133, you should find the use of rbx in the reproducer.
This is a long existing problem, see https://bugs.llvm.org/show_bug.cgi?id=16830, and the fix seems not trivial. A workaround should be more easy.
I reproduce this problem by changing pengfei's test (Good reproduce test need build on windows with "clang-cl.exe -mavx2 /EHs /c /Fa t.cpp")
We need to restore the ebx.
test t.cpp:
#include <array> #include <vector> #define __SSC_MARK(tag) \ __asm__ __volatile__("movl %0, %%ebx; .byte 0x64, 0x67, 0x90 " ::"i"(tag) \ : "%ebx") struct A { std::string S; std::vector<std::array<int, 4>> V; A() { std::array<int, 4> B; __SSC_MARK(0); V.push_back(B); V.push_back(B); char C[64]; memset(C, 0, sizeof(C)); S = C; } } T;
problem asm:
mov qword ptr [rbx + 40], rax # 8-byte Spill call "??0?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QEAA@XZ" mov rcx, qword ptr [rbx + 48] # 8-byte Reload add rcx, 32 mov qword ptr [rbx + 56], rcx # 8-byte Spill call "??0?$vector@V?$array@H$03@std@@V?$allocator@V?$array@H$03@std@@@2@@std@@QEAA@XZ" mov rcx, qword ptr [rbx + 56] # 8-byte Reload #APP mov ebx, 0 <----------------- modify ebx by SSC_MARK .byte 100 .byte 103 .byte 144 #NO_APP .Ltmp0: lea rdx, [rbx + 136] <---------------- But not restore the RBX call "?push_back@?$vector@V?$array@H$03@std@@V?$allocator@V?$array@H$03@std@@@2@@std@@QEAAXAEBV?$array@H$03@2@@Z" .Ltmp1: jmp .LBB1_1
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
23 | I don't know the user scenario, but I guess this may be not correct. You are changing the assembly from imm -> ebx to reg -> ebx. |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
23 | The __SSC_MARK(const int Tag) will make sure the Tag's type is const int. So no need to set "i" again. (And that is illegal for const int argument) | |
26 | We just need to make sure the Tag is const int for function __SSC_MARK(const int Tag) | |
27 | RA will not assigned the stack value to ebx. |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
26 | It is not about if Tag is const or not. It is the difference of generated binary, see your comment before:
I guess the emulator recognizes the binary sequence e.g. bbxxxxxxxx646790, where xx is the 32 bits immediate. Changing to the register variant will make emulator fail to recognize it. |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
26 | good catch! I am not much sure the emulators/simulators will check mov instruction or not, but it is a safe way to constrain here to be immediate value. Thanks a lot! |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
25 | I doubt if push/pop is still problematical. E.g. red zone? |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
25 | Another thought is if we can add a new psudo instruction with only immidate input and imply def ebx in .td and expand it when emit asm/bin. I think we may avoid the ebx issue in this way. |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
25 | Make sense, I remember we encountered such red zone case, in the leaf function, rsp may not real point to the real top of stack, push is dangerous. There is 2 ways to fix it: Let me first try to fix the root problem, if it is very hard, I'll take the workaround. |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
25 | There may be a gap in what the register allocator knows here. It may only know the register is reserved. It may not be able to tell the difference between a reserved register that has a special assignment like ESP and EBP and EBX when they are used as frame and base pointers versus R8-R15 being reserved in 32-bit mode. The register allocator may have no idea that EBX is "live". |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
25 | Yes, I also worry about is is a arch design problem for RA, anyway, let me do some research here. And another easy way, may be replace "push ebx" to "mov [rsp + 128], ebx", I remember reg zone just works for leaf functions with stack size <=128bytes. (I need to recheck it.) |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
25 | Aha! Why we focus on clobber ebx ?! For example: 1 #define __SSC_MARK2(Tag) \ 2 __asm__ __volatile__("movl %%ebx, %%eax; movl %0, %%ebx; .byte 0x64, 0x67, 0x90; \ 3 movl %%eax, %%ebx;" :: "i"(Tag) : "%ebx","%eax"); 4 int ssc_mark() { 5 int a=3; 6 __SSC_MARK2(0x0); 7 return a; 8 } key asm: 16 movl $3, -12(%rbp) 17 #APP 18 movl %ebx, %eax 19 movl $0, %ebx 20 .byte 100 21 .byte 103 22 .byte 144 23 movl %eax, %ebx 24 25 #NO_APP 26 movl -12(%rbp), %eax BTW, I did a quick check on RedZone for idea of "mov [rsp + 128], ebx" I memtioned above, this is not ok, because if the leaf function StackSize > 128, it still share the RedZone by let its new StackSize-128. |
LGTM, but please wait 1 or 2 days to see if others have objections.
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
26 | We don't clobber ebx now. |
clang/lib/Headers/x86gprintrin.h | ||
---|---|---|
26 | Right, thanks a lot for all of your careful review! |
Does it also work on windows?