This is an archive of the discontinued LLVM Phabricator instance.

[x86] Introduce the pconfig/encl[u|s|v] intrinsics
AbandonedPublic

Authored by GBuella on Mar 12 2018, 8:12 AM.

Details

Reviewers
craig.topper
zvi
Summary

Introduce pconfig and SGX related intrinsics.

Diff Detail

Event Timeline

GBuella created this revision.Mar 12 2018, 8:12 AM
GBuella updated this revision to Diff 143639.Apr 23 2018, 2:05 PM

Rebased the patch.
Added pconfig to Icelake Server.

GBuella updated this revision to Diff 143640.Apr 23 2018, 2:09 PM

I see that gcc implements all of this with wrappers around inline assembly. Any reason we couldn't just do that?

lib/Headers/pconfigintrin.h
32

This doesn't match the name used by gcc. It also needs to start with underscores since all names without underscores belong to user code.

lib/Headers/sgxintrin.h
53

You need to undef __DEFAULT_FN_ATTRS

craig.topper added inline comments.Apr 25 2018, 9:44 AM
lib/Headers/pconfigintrin.h
29

I think all our other headers use double underscore here.

lib/Headers/sgxintrin.h
29

double underscore

GBuella updated this revision to Diff 144112.Apr 26 2018, 7:15 AM
GBuella updated this revision to Diff 144113.
GBuella marked 4 inline comments as done.Apr 26 2018, 7:17 AM

I didn't see an answer to the inline assembly question.

lib/Headers/pconfigintrin.h
32

Gcc's constant is called __PCONFIG_KEY_PROGRAM from what I see in their repository

GBuella added a comment.EditedMay 4 2018, 6:49 AM

Here is a variation on this, using inline asm:

pconfig: https://reviews.llvm.org/D46431

SGX: https://reviews.llvm.org/D46435

GBuella abandoned this revision.May 8 2018, 4:03 AM

I see you've ended up implementing the intrinsics with inline assembly but are there any reasons not to do it with builtins like in this patch? The problem with inline assembly is that for some Apple platforms we require developers to emit bitcode. And inline assembly isn't really compatible with bitcode abstraction. Unfortunately, I have no experience with Intel intrinsics and I need your help in pursuing implementing them with builtins.

The main reason that we used inline assembly is due to the fixed register allocation for these instructions. We would have had to write out the register rules in the backend as a special case as seen in the getInstrWFourImplicitOps in https://reviews.llvm.org/D44386 Since the register allocator had no freedom there was no performance advantage to an intrinsic. So we just did inline assembly to avoid needing to special case these instructions.

Thanks for the insightful explanation, Craig. As far as I understand, implementing intrinsics with builtins is possible but it is more complex and wasn't providing enough value to prefer it over inline assembly. If that is correct, I'd like to revive the abandoned implementation. Does it sound reasonable? And do I need to do something special for testing? Because I don't have access to corresponding hardware and unable to test the intrinsics on the actual hardware.

Your understanding is correct. As far as testing I think the existing testing in the original patches is sufficient.

I'm not sure I understand how a target specific intrinsic that only works on x86 in the bitcode is substantially better than inline assembly. Do you plan to also change the cpuid intrinsics in cpuid.h that are also implemented in inline assembly?

Having in bitcode something like @llvm.x86.encls.64 is better than inline assembly because we understand the meaning of the bitcode while we don't parse assembly and have a very limited understanding of what it is doing.

The use case we need to support is basically

--- a/clang/test/Modules/compiler_builtins_x86.c
+++ b/clang/test/Modules/compiler_builtins_x86.c
@@ -1,6 +1,7 @@
 // RUN: rm -rf %t
 // RUN: %clang_cc1 -triple i686-unknown-unknown -fsyntax-only -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -verify -ffreestanding
 // RUN: %clang_cc1 -triple i686-unknown-unknown -fsyntax-only -fmodules -fmodule-map-file=%resource_dir/module.modulemap -fmodules-cache-path=%t %s -verify -ffreestanding
+// RUN: %clang_cc1 -triple i686-unknown-unknown -fsyntax-only -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -verify -ffreestanding -fno-gnu-inline-asm
 // expected-no-diagnostics
 
 #include<x86intrin.h>

And this case isn't affected by cpuid, so we don't plan to change that as well as Windows-specific assembly in immintrin.h.

While writing the response, it occurred to me that another solution might be changing -fno-gnu-inline-asm to cover only actually called code. If you don't call _pconfig_u32, it shouldn't matter that in the header it is implemented with inline assembly. But this is an early idea, I need to discuss with the team if it will work and if there are other constraints I'm missing.

Discussed with the team different approaches and suggesting https://reviews.llvm.org/D61621