Introduce pconfig and SGX related intrinsics.
Details
Diff Detail
- Repository
- rC Clang
- Build Status
Buildable 15964 Build 15964: arc lint + arc unit
Event Timeline
I see that gcc implements all of this with wrappers around inline assembly. Any reason we couldn't just do that?
lib/Headers/pconfigintrin.h | ||
---|---|---|
32 | This doesn't match the name used by gcc. It also needs to start with underscores since all names without underscores belong to user code. | |
lib/Headers/sgxintrin.h | ||
53 | You need to undef __DEFAULT_FN_ATTRS |
I didn't see an answer to the inline assembly question.
lib/Headers/pconfigintrin.h | ||
---|---|---|
32 | Gcc's constant is called __PCONFIG_KEY_PROGRAM from what I see in their repository |
Here is a variation on this, using inline asm:
pconfig: https://reviews.llvm.org/D46431
I see you've ended up implementing the intrinsics with inline assembly but are there any reasons not to do it with builtins like in this patch? The problem with inline assembly is that for some Apple platforms we require developers to emit bitcode. And inline assembly isn't really compatible with bitcode abstraction. Unfortunately, I have no experience with Intel intrinsics and I need your help in pursuing implementing them with builtins.
The main reason that we used inline assembly is due to the fixed register allocation for these instructions. We would have had to write out the register rules in the backend as a special case as seen in the getInstrWFourImplicitOps in https://reviews.llvm.org/D44386 Since the register allocator had no freedom there was no performance advantage to an intrinsic. So we just did inline assembly to avoid needing to special case these instructions.
Thanks for the insightful explanation, Craig. As far as I understand, implementing intrinsics with builtins is possible but it is more complex and wasn't providing enough value to prefer it over inline assembly. If that is correct, I'd like to revive the abandoned implementation. Does it sound reasonable? And do I need to do something special for testing? Because I don't have access to corresponding hardware and unable to test the intrinsics on the actual hardware.
Your understanding is correct. As far as testing I think the existing testing in the original patches is sufficient.
I'm not sure I understand how a target specific intrinsic that only works on x86 in the bitcode is substantially better than inline assembly. Do you plan to also change the cpuid intrinsics in cpuid.h that are also implemented in inline assembly?
Having in bitcode something like @llvm.x86.encls.64 is better than inline assembly because we understand the meaning of the bitcode while we don't parse assembly and have a very limited understanding of what it is doing.
The use case we need to support is basically
--- a/clang/test/Modules/compiler_builtins_x86.c +++ b/clang/test/Modules/compiler_builtins_x86.c @@ -1,6 +1,7 @@ // RUN: rm -rf %t // RUN: %clang_cc1 -triple i686-unknown-unknown -fsyntax-only -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -verify -ffreestanding // RUN: %clang_cc1 -triple i686-unknown-unknown -fsyntax-only -fmodules -fmodule-map-file=%resource_dir/module.modulemap -fmodules-cache-path=%t %s -verify -ffreestanding +// RUN: %clang_cc1 -triple i686-unknown-unknown -fsyntax-only -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -verify -ffreestanding -fno-gnu-inline-asm // expected-no-diagnostics #include<x86intrin.h>
And this case isn't affected by cpuid, so we don't plan to change that as well as Windows-specific assembly in immintrin.h.
While writing the response, it occurred to me that another solution might be changing -fno-gnu-inline-asm to cover only actually called code. If you don't call _pconfig_u32, it shouldn't matter that in the header it is implemented with inline assembly. But this is an early idea, I need to discuss with the team if it will work and if there are other constraints I'm missing.
Discussed with the team different approaches and suggesting https://reviews.llvm.org/D61621
I think all our other headers use double underscore here.