https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function, cpuidex) should resolve this.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Why does this need to be a builtin? What's wrong with the current implementation in the header file?
The purpose of this patch is to make #pragma intrinsic work with __cpuid. https://reviews.llvm.org/D98438 implemented support for #pragma intrinsic. This resulted in a bug in Chrome (https://crbug.com/1279344) where #pragma intrinsic(__cpuid) works in MSVC but doesn't work on Clang since Clang implements __cpuid in the header file. Clang's implementation of #pragma intrinsic only recognizes Clang builtins, so this patch should fix that.
As an aside, this patch is a _very_ early WIP that doesn't work and as such is not meant for review.
Implement cpuid and cpuidex as Clang builtins
https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function, cpuidex) should resolve this.
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14917 | nit: perhaps the "Arg" suffix is unnecessary (also for the next variable) | |
14923 | Since Asm isn't going to change here, maybe having a variable isn't worth it; you could just pass "cpuid" directly to InlineAsm::get(). If you want to keep the variable for clarity, I think StringRef would be the better type. SmallString is normally used as a small buffer on the stack when building a string gradually. And since we're not building much, I'd also drop the comment above. | |
14924 | I know this is based on EmitX86BitTestIntrinsic(), but instead of using std::strings here, I think this would be an example of a good use of SmallString to build Constraints. However, I'm not sure we need to add getTarget().getClobbers() here at all. From checking the Intel manual, I don't believe "cpuid" clobbers anything besides the output registers. That's also how the instruction is modeled in llvm: https://github.com/llvm/llvm-project/blob/llvmorg-13.0.1/llvm/lib/Target/X86/X86InstrSystem.td#L452 For EmitX86BitTestIntrinsic(), Craig added the extra clobbers in D88121. Craig, can you comment on why we seem to set these clobbers everywhere? | |
14937 | I don't think hasSideEffects needs to be true for cpuid. | |
14942 | (just using 'int' would be more common llvm style) | |
14949 | (technically we already emitted the intrinsic, the value is really to signal that we successfully did so) | |
clang/lib/Headers/intrin.h | ||
538 | I wonder if we could remove __cpuid_count too. It seems it was added in D101338 just for the __cpuid(ex) implementations. | |
clang/test/CodeGen/ms-intrinsics-cpuid.c | ||
11 ↗ | (On Diff #415626) | It would be good to test that #pragma intrinsic doesn't generate a warning for __cpuid anymore. Just inserting it here and adding -Werror to the run lines would work. |
15 ↗ | (On Diff #415626) | Since the test is very exact, maybe it would make sense to take it all the way by giving symbolic names to the function arguments and check that they get passed correctly to the inline asm, and used to store the results. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14923 | Ended up just passing "cpuid" directly | |
14924 | Changed this to use SmallString. Leaving this comment open until we figure out whether or not we need the clobbers. | |
clang/lib/Headers/intrin.h | ||
538 | I'm always in favor of removing dead code. I do have concerns that because of Hyrum's Law, we'll break users who call __cpuid_count even though it's an undocumented and unsupported API. Is this acceptable? | |
clang/test/CodeGen/ms-intrinsics-cpuid.c | ||
15 ↗ | (On Diff #415626) | This doesn't seem feasible to implement. The reason is that in the unoptimized output, Clang emits a load for the same value of cpu_info 4 times, once for each getelementptr instruction. When it comes time to call asm, Clang can pass the output of any of the 4 load instructions to the asm, and explicitly specifying a single load instruction output as the parameter would make the test flaky. One option would be to [[ https://godbolt.org/z/qY9ja1s11 | pass -O1 to Clang ]], which causes Clang to reuse the input registers and removes unnecessary loads and stores, so we can just use the input registers in the test's expected output. WDYT? |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14924 | I don't know the full history here. My changed in D88121 was to get the list from the getClobbers() instead of hardcoding "~{flags},~{fpsr}" into the string. And to add the "~{dirflag}". At the time I assumed dirflag missing was an oversight. getClobbers is what gets used for all user created inline assembly. I think we don't need the clobbers here. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14924 | Clobbers have been removed. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14931 | I think hasSideEffects should be false. I believe that will allow LLVM to optimize away the inline asm if the results are not used. Craig, does that sound right to you? | |
14938 | Oh, do we get a warning/error if this just uses i directly? In that case maybe using "unsigned" for the loop variable is better. | |
clang/lib/Headers/intrin.h | ||
538 | I think D101338 is recent enough that it will be okay. | |
clang/test/CodeGen/ms-intrinsics-cpuid.c | ||
15 ↗ | (On Diff #415626) | Oh right, yes it would get messy. Let's skip it then. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14931 | I agree. It wasn't a volatile asm in the header file so we shouldn't need side effects. | |
14946 | I'm a little surprised we're able to return a dummy i1 value as the result of a void builtin. Is this done anywhere else? | |
clang/lib/Headers/intrin.h | ||
545 | What are we doing about this xchg that's needed to generate correct code? |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14944 | You should probably just return the last store you created. See BI__builtin_neon_vzipq_v for example. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14927 | you shouldn't need llvm:: here. There's a using llvm:StringRef in effect on this file. You also don't needs static or constexpr. StringRef's don't own anything and are cheap to create. |
Update xchg implementation so that it uses the one it replaced in intrin.h instead of the one from cpuid.h
It's not identical to the one in intrin.h as InlineAsm isn't exactly tolerant of everything that's legal in an asm statement, so I ran the implementation from intrin.h through clang and copy/pasted it from the IR.
clang/lib/Headers/intrin.h | ||
---|---|---|
545 | The xchg instructions are now in the output for x86_64 I originally thought that the =b constraint was sufficient to preserve rbx, so I used the same implementation for both x86 and x86_64. However, I discovered https://bugs.llvm.org/show_bug.cgi?id=20311 where it turns out that in certain cases, =b is not sufficient. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
14933 | The `${...|...}` syntax is used to support multiple inline asm dialects (At&t or Intel), see https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates The syntax was used in intrin.h, because that file could be built using either asm dialect depending on what options the compiler is run with. But now when we're calling InlineAsm::get directly, I think it's not needed and we should just use the At&t style (because InlineAsm::get's asmDialect param defaults to that). | |
14945 | (nit: llvm code typically just writes it 'unsigned' without the 'int' part) |
nit: perhaps the "Arg" suffix is unnecessary (also for the next variable)