This is an archive of the discontinued LLVM Phabricator instance.

[X86] Support __SSC_MARK(const int id) in x86gprintrin.h
ClosedPublic

Authored by xiangzhangllvm on Aug 24 2021, 7:11 PM.

Details

Summary

There are some users want to use __SSC_MARK for special use.
We have a builtin implement way in https://reviews.llvm.org/D108439
After some discussion in that patch, we decide to let user to use it by #include immintrin.h

Diff Detail

Event Timeline

xiangzhangllvm requested review of this revision.Aug 24 2021, 7:11 PM
xiangzhangllvm created this revision.
LuoYuanke added inline comments.Aug 24 2021, 7:29 PM
clang/lib/Headers/x86gprintrin.h
23

Does it also work on windows?

Is this subject to the same issue that causes cpuid.h to save rbx on x86-64 when it is used as the base pointer?

Is this subject to the same issue that causes cpuid.h to save rbx on x86-64 when it is used as the base pointer?

__SSC_MARK is used to mark code blocks. "[mov ebx, const_int_id] + [emit 0x64 0x67 0x90], " will be recognized by emulators/simulators. it does not have any side effects,

clang/lib/Headers/x86gprintrin.h
23

I test it in windows machine, it passed.

Is this subject to the same issue that causes cpuid.h to save rbx on x86-64 when it is used as the base pointer?

__SSC_MARK is used to mark code blocks. "[mov ebx, const_int_id] + [emit 0x64 0x67 0x90], " will be recognized by emulators/simulators. it does not have any side effects,

The inline asm has a mov to ebx in it and has ebx in the clobber list. This affects register allocation. If ebx is the base pointer does this work correctly?

The inline asm has a mov to ebx in it and has ebx in the clobber list. This affects register allocation. If ebx is the base pointer does this work correctly?

It has show the ebx will be clobber, so I think the ebx will be auto restored after the inline asm ?

The inline asm has a mov to ebx in it and has ebx in the clobber list. This affects register allocation. If ebx is the base pointer does this work correctly?

It has show the ebx will be clobber, so I think the ebx will be auto restored after the inline asm ?

It didn't work for the modification of rbx in cpuid.h. See this comment. Please test it.

/* x86-64 uses %rbx as the base register, so preserve it. */                                                                                                                                                                                   
#define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \                                                                                                                                                                                          
    __asm("  xchgq  %%rbx,%q1\n" \                                                                                                                                                                                                             
          "  cpuid\n" \                                                                                                                                                                                                                        
          "  xchgq  %%rbx,%q1" \                                                                                                                                                                                                               
        : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \                                                                                                                                                                                
        : "0"(__leaf))

I can't find a good reproducer for the cpuid.h issue but maybe this mwaitx bug can help https://bugs.llvm.org/show_bug.cgi?id=43528 maybe you just need to turn on the stack protector to test it?

if we directly use the ebx after SSC_MARK with another inline_asm, it will not restore the ebx. I am not sure this is a problem or not.
Let me try test43528.

 3 #include <immintrin.h>
 4
 5 void ssc_mark(unsigned level, unsigned count) {
 6
 7 /* x86-64 uses %rbx as the base register, so preserve it. */
 8 #define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \
 9     __asm("  xchgq  %%rbx,%q1\n" \
10           "  cpuid\n" \
11           "  xchgq  %%rbx,%q1" \
12         : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \
13         : "0"(__leaf))
14
15 unsigned eax1, ebx1, ecx1, edx1;
16 __cpuid(level, eax1, ebx1, ecx1, edx1);
17
18  __SSC_MARK(0x0);
19
20 asm("movl %%ebx, %%eax"::);
21 }

if we directly use the ebx after SSC_MARK with another inline_asm, it will not restore the ebx. I am not sure this is a problem or not.
Let me try test43528.

 3 #include <immintrin.h>
 4
 5 void ssc_mark(unsigned level, unsigned count) {
 6
 7 /* x86-64 uses %rbx as the base register, so preserve it. */
 8 #define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \
 9     __asm("  xchgq  %%rbx,%q1\n" \
10           "  cpuid\n" \
11           "  xchgq  %%rbx,%q1" \
12         : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \
13         : "0"(__leaf))
14
15 unsigned eax1, ebx1, ecx1, edx1;
16 __cpuid(level, eax1, ebx1, ecx1, edx1);
17
18  __SSC_MARK(0x0);
19
20 asm("movl %%ebx, %%eax"::);
21 }

The register allocator can’t read the inline asm text. It can’t see you are reading ebx.

It looks the test https://bugs.llvm.org/show_bug.cgi?id=43528 will only def the ebx

I think we need to let RA know we will use ebx after clobber, so let it restore its value before the clobbered inline_asm

I test this test:

 3 #include <immintrin.h>
 4
 5
 6 /* x86-64 uses %rbx as the base register, so preserve it. */
 7 #define __cpuid(__leaf, __eax, __ebx, __ecx, __edx) \
 8     __asm("  xchgq  %%rbx,%q1\n" \
 9           "  cpuid\n" \
10           "  xchgq  %%rbx,%q1" \
11         : "=a"(__eax), "=r" (__ebx), "=c"(__ecx), "=d"(__edx) \
12         : "0"(__leaf))
13
14
15
16 static __inline__ void __attribute__((__always_inline__))
17 _mm_monitorx(void const * __p, unsigned __extensions, unsigned __hints)
18 {
19   __builtin_ia32_monitorx((void *)__p, __extensions, __hints);
20 }
21
22 static __inline__ void __attribute__((__always_inline__))
23 _mm_mwaitx(unsigned __extensions, unsigned __hints, unsigned __clock)
24 {
25   __builtin_ia32_mwaitx(__extensions, __hints, __clock);
26 }
27
28 int main(int argc, char ** argv) {
29   int v;
30   v = 0;
31
32   unsigned eax1, ebx1, ecx1, edx1, level;
33   __cpuid(level, eax1, ebx1, ecx1, edx1);
34
35  __SSC_MARK(0x0);
36
37   _mm_monitorx(&v, 0, 0);
38   _mm_mwaitx(0, 0, 1);
39 }

clang -mmwaitx -S test.c :

 6 ssc_mark:                               # @ssc_mark
 7         .cfi_startproc
 8 # %bb.0:                                # %entry
 9         pushq   %rbp
10         .cfi_def_cfa_offset 16
11         .cfi_offset %rbp, -16
12         movq    %rsp, %rbp
13         .cfi_def_cfa_register %rbp
14         pushq   %rbx
15         .cfi_offset %rbx, -24
16         movl    %edi, -12(%rbp)
17         movl    %esi, -16(%rbp)
18         movl    -12(%rbp), %eax
19         #APP
20         xchgq   %rbx, %rsi
21         cpuid
22         xchgq   %rbx, %rsi
23         #NO_APP
24         movl    %esi, -36(%rbp)                 # 4-byte Spill
25         movl    %eax, %esi
26         movl    -36(%rbp), %eax                 # 4-byte Reload
27         movl    %esi, -20(%rbp)
28         movl    %eax, -24(%rbp)
29         movl    %ecx, -28(%rbp)
30         movl    %edx, -32(%rbp)
31         #APP
32         movl    $0, %ebx
33         .byte   100
34         .byte   103
35         .byte   144
36         #NO_APP
37         #APP
38         movl    %ebx, %eax
39         #NO_APP
40         popq    %rbx
41         popq    %rbp
42         .cfi_def_cfa %rsp, 8
43         retq

Sorry I attach the wrong asm, it should be:
clang -mmwaitx -S test.c :

 6 main:                                   # @main
 7         .cfi_startproc
 8 # %bb.0:                                # %entry
 9         pushq   %rbp
10         .cfi_def_cfa_offset 16
11         .cfi_offset %rbp, -16
12         movq    %rsp, %rbp
13         .cfi_def_cfa_register %rbp
14         pushq   %rbx
15         .cfi_offset %rbx, -24
16         movl    %edi, -44(%rbp)
17         movq    %rsi, -56(%rbp)
18         movl    $0, -60(%rbp)
19         movl    -80(%rbp), %eax
20         #APP
21         xchgq   %rbx, %rsi
22         cpuid
23         xchgq   %rbx, %rsi
24         #NO_APP
25         movl    %esi, -84(%rbp)                 # 4-byte Spill
26         movl    %eax, %esi
27         movl    -84(%rbp), %eax                 # 4-byte Reload
28         movl    %esi, -64(%rbp)
29         movl    %eax, -68(%rbp)
30         movl    %ecx, -72(%rbp)
31         movl    %edx, -76(%rbp)
32         #APP
33         movl    $0, %ebx
34         .byte   100
35         .byte   103
36         .byte   144
37         #NO_APP
38         leaq    -60(%rbp), %rax
39         movq    %rax, -32(%rbp)
40         movl    $0, -36(%rbp)
41         movl    $0, -40(%rbp)
42         movq    -32(%rbp), %rax
43         movl    -36(%rbp), %ecx
44         movl    -40(%rbp), %edx
45         monitorx
46         movl    $0, -12(%rbp)
47         movl    $0, -16(%rbp)
48         movl    $1, -20(%rbp)
49         movl    -12(%rbp), %ecx
50         movl    -16(%rbp), %eax
51         movl    -20(%rbp), %ebx
52         mwaitx
53         xorl    %eax, %eax
54         popq    %rbx
55         popq    %rbp
56         .cfi_def_cfa %rsp, 8
57         retq

I fixed a similar issue: https://bugs.llvm.org/show_bug.cgi?id=50133, you should find the use of rbx in the reproducer.
This is a long existing problem, see https://bugs.llvm.org/show_bug.cgi?id=16830, and the fix seems not trivial. A workaround should be more easy.

I fixed a similar issue: https://bugs.llvm.org/show_bug.cgi?id=50133, you should find the use of rbx in the reproducer.
This is a long existing problem, see https://bugs.llvm.org/show_bug.cgi?id=16830, and the fix seems not trivial. A workaround should be more easy.

Very good for me! Let me reproduce it on your test, thanks very much!

I reproduce this problem by changing pengfei's test (Good reproduce test need build on windows with "clang-cl.exe -mavx2 /EHs /c /Fa t.cpp")
We need to restore the ebx.

test t.cpp:

#include <array>
#include <vector>

#define __SSC_MARK(tag)                                                        \
  __asm__ __volatile__("movl %0, %%ebx; .byte 0x64, 0x67, 0x90 " ::"i"(tag)    \
                       : "%ebx")

struct A {
  std::string S;
  std::vector<std::array<int, 4>> V;
  A() {
	std::array<int, 4> B;
	__SSC_MARK(0);
	V.push_back(B);
	V.push_back(B);
	char C[64];
	memset(C, 0, sizeof(C));
	S = C;
  }
} T;

problem asm:

	mov	qword ptr [rbx + 40], rax       # 8-byte Spill
	call	"??0?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QEAA@XZ"
	mov	rcx, qword ptr [rbx + 48]       # 8-byte Reload
	add	rcx, 32
	mov	qword ptr [rbx + 56], rcx       # 8-byte Spill
	call	"??0?$vector@V?$array@H$03@std@@V?$allocator@V?$array@H$03@std@@@2@@std@@QEAA@XZ"
	mov	rcx, qword ptr [rbx + 56]       # 8-byte Reload
	#APP
	mov	ebx, 0                    <-----------------   modify ebx by SSC_MARK
	.byte	100
	.byte	103
	.byte	144
	#NO_APP
.Ltmp0:
	lea	rdx, [rbx + 136]            <---------------- But not restore the RBX
	call	"?push_back@?$vector@V?$array@H$03@std@@V?$allocator@V?$array@H$03@std@@@2@@std@@QEAAXAEBV?$array@H$03@2@@Z"
.Ltmp1:
	jmp	.LBB1_1

Preserve/Restore the ebx register.

LuoYuanke added inline comments.Aug 25 2021, 1:55 AM
clang/lib/Headers/x86gprintrin.h
26

Not sure it is OK to change from "i" to "r".

27

OldEbx may be assigned to ebx?

pengfei added inline comments.Aug 25 2021, 1:55 AM
clang/lib/Headers/x86gprintrin.h
23

I don't know the user scenario, but I guess this may be not correct. You are changing the assembly from imm -> ebx to reg -> ebx.

xiangzhangllvm added inline comments.Aug 25 2021, 2:04 AM
clang/lib/Headers/x86gprintrin.h
23

The __SSC_MARK(const int Tag) will make sure the Tag's type is const int. So no need to set "i" again. (And that is illegal for const int argument)

26

We just need to make sure the Tag is const int for function __SSC_MARK(const int Tag)

27

RA will not assigned the stack value to ebx.
If ebx join the Register Allocation, there will be no problem for "over-write" old ebx vlaue. (because ebx is in clobber list)

pengfei added inline comments.Aug 25 2021, 6:31 AM
clang/lib/Headers/x86gprintrin.h
26

It is not about if Tag is const or not. It is the difference of generated binary, see your comment before:

__SSC_MARK is used to mark code blocks. "[mov ebx, const_int_id] + [emit 0x64 0x67 0x90], " will be recognized by emulators/simulators.

I guess the emulator recognizes the binary sequence e.g. bbxxxxxxxx646790, where xx is the 32 bits immediate. Changing to the register variant will make emulator fail to recognize it.

xiangzhangllvm added inline comments.Aug 25 2021, 5:16 PM
clang/lib/Headers/x86gprintrin.h
26

good catch! I am not much sure the emulators/simulators will check mov instruction or not, but it is a safe way to constrain here to be immediate value. Thanks a lot!

xiangzhangllvm retitled this revision from [X86] Support __SSC_MARK(const int) in x86gprintrin.h to [X86] Support __SSC_MARK(const int id) in x86gprintrin.h.
pengfei added inline comments.Aug 25 2021, 7:13 PM
clang/lib/Headers/x86gprintrin.h
25

I doubt if push/pop is still problematical. E.g. red zone?
But I don't have a good solution. Maybe we can just ignore the use of ebx and leave it as a FIXME.

pengfei added inline comments.Aug 25 2021, 7:20 PM
clang/lib/Headers/x86gprintrin.h
25

Another thought is if we can add a new psudo instruction with only immidate input and imply def ebx in .td and expand it when emit asm/bin. I think we may avoid the ebx issue in this way.

xiangzhangllvm added inline comments.Aug 25 2021, 8:26 PM
clang/lib/Headers/x86gprintrin.h
25

Make sense, I remember we encountered such red zone case, in the leaf function, rsp may not real point to the real top of stack, push is dangerous.

There is 2 ways to fix it:
workaround:
Creating a stack slot saving ebx in IR is ok for it.
fix the root problem:
When we choose rbx for base pointer, RA should restore it after clobber it.

Let me first try to fix the root problem, if it is very hard, I'll take the workaround.

craig.topper added inline comments.Aug 25 2021, 8:43 PM
clang/lib/Headers/x86gprintrin.h
25

There may be a gap in what the register allocator knows here. It may only know the register is reserved. It may not be able to tell the difference between a reserved register that has a special assignment like ESP and EBP and EBX when they are used as frame and base pointers versus R8-R15 being reserved in 32-bit mode. The register allocator may have no idea that EBX is "live".

xiangzhangllvm added inline comments.Aug 25 2021, 8:59 PM
clang/lib/Headers/x86gprintrin.h
25

Yes, I also worry about is is a arch design problem for RA, anyway, let me do some research here.

And another easy way, may be replace "push ebx" to "mov [rsp + 128], ebx", I remember reg zone just works for leaf functions with stack size <=128bytes. (I need to recheck it.)

clang/lib/Headers/x86gprintrin.h
25

Aha! Why we focus on clobber ebx ?!
We just need to clobber other register (which RA can see), then mov the ebx value to it!

For example:

1  #define __SSC_MARK2(Tag)                                                        \
2    __asm__ __volatile__("movl %%ebx, %%eax; movl %0, %%ebx; .byte 0x64, 0x67, 0x90;   \
3                          movl %%eax, %%ebx;" :: "i"(Tag) : "%ebx","%eax");
4 int ssc_mark() {
5   int a=3;
6   __SSC_MARK2(0x0);
7   return a;
8 }

key asm:

16         movl    $3, -12(%rbp)
17         #APP
18         movl    %ebx, %eax
19         movl    $0, %ebx
20         .byte   100
21         .byte   103
22         .byte   144
23         movl    %eax, %ebx
24
25         #NO_APP
26         movl    -12(%rbp), %eax

BTW, I did a quick check on RedZone for idea of "mov [rsp + 128], ebx" I memtioned above, this is not ok, because if the leaf function StackSize > 128, it still share the RedZone by let its new StackSize-128.
In other word: its stack may > 128 and in the meaning time the rsp is not at top of stack too.

pengfei accepted this revision.Aug 26 2021, 1:55 AM

LGTM, but please wait 1 or 2 days to see if others have objections.

clang/lib/Headers/x86gprintrin.h
26

We don't clobber ebx now.

This revision is now accepted and ready to land.Aug 26 2021, 1:55 AM
xiangzhangllvm added inline comments.Aug 26 2021, 5:04 PM
clang/lib/Headers/x86gprintrin.h
26

Right, thanks a lot for all of your careful review!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2021, 5:52 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript