Page MenuHomePhabricator

Implement __cpuid and __cpuidex as Clang builtins
ClosedPublic

Authored by ayzhao on Mar 14 2022, 5:22 PM.

Details

Summary

https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function, cpuidex) should resolve this.

[0]: https://crbug.com/1279344

Diff Detail

Event Timeline

ayzhao created this revision.Mar 14 2022, 5:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 5:22 PM
Herald added a subscriber: pengfei. · View Herald Transcript
ayzhao requested review of this revision.Mar 14 2022, 5:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 5:22 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Why does this need to be a builtin? What's wrong with the current implementation in the header file?

The purpose of this patch is to make #pragma intrinsic work with __cpuid. https://reviews.llvm.org/D98438 implemented support for #pragma intrinsic. This resulted in a bug in Chrome (https://crbug.com/1279344) where #pragma intrinsic(__cpuid) works in MSVC but doesn't work on Clang since Clang implements __cpuid in the header file. Clang's implementation of #pragma intrinsic only recognizes Clang builtins, so this patch should fix that.

As an aside, this patch is a _very_ early WIP that doesn't work and as such is not meant for review.

ayzhao updated this revision to Diff 415626.Mar 15 2022, 4:24 PM

Implement cpuid and cpuidex as Clang builtins

https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because
cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function,
cpuidex) should resolve this.

[0]: https://crbug.com/1279344

ayzhao retitled this revision from Implement __cpuid intrinsic WIP to Implement __cpuid and __cpuidex as Clang builtins.Mar 15 2022, 4:31 PM
ayzhao edited the summary of this revision. (Show Details)
ayzhao added a reviewer: hans.
hans added inline comments.Mar 16 2022, 7:53 AM
clang/lib/CodeGen/CGBuiltin.cpp
14927

nit: perhaps the "Arg" suffix is unnecessary (also for the next variable)

14933

Since Asm isn't going to change here, maybe having a variable isn't worth it; you could just pass "cpuid" directly to InlineAsm::get().

If you want to keep the variable for clarity, I think StringRef would be the better type. SmallString is normally used as a small buffer on the stack when building a string gradually. And since we're not building much, I'd also drop the comment above.

14934

I know this is based on EmitX86BitTestIntrinsic(), but instead of using std::strings here, I think this would be an example of a good use of SmallString to build Constraints.

However, I'm not sure we need to add getTarget().getClobbers() here at all. From checking the Intel manual, I don't believe "cpuid" clobbers anything besides the output registers. That's also how the instruction is modeled in llvm: https://github.com/llvm/llvm-project/blob/llvmorg-13.0.1/llvm/lib/Target/X86/X86InstrSystem.td#L452

For EmitX86BitTestIntrinsic(), Craig added the extra clobbers in D88121. Craig, can you comment on why we seem to set these clobbers everywhere?

14947

I don't think hasSideEffects needs to be true for cpuid.
Maybe Craig can confirm?

14952

(just using 'int' would be more common llvm style)

14959

(technically we already emitted the intrinsic, the value is really to signal that we successfully did so)

clang/lib/Headers/intrin.h
538

I wonder if we could remove __cpuid_count too. It seems it was added in D101338 just for the __cpuid(ex) implementations.

clang/test/CodeGen/ms-intrinsics-cpuid.c
11–12

It would be good to test that #pragma intrinsic doesn't generate a warning for __cpuid anymore. Just inserting it here and adding -Werror to the run lines would work.

17–18

Since the test is very exact, maybe it would make sense to take it all the way by giving symbolic names to the function arguments and check that they get passed correctly to the inline asm, and used to store the results.

ayzhao updated this revision to Diff 415905.Mar 16 2022, 10:51 AM
ayzhao marked 5 inline comments as done.

Addressed some code review comments

ayzhao marked an inline comment as not done.Mar 16 2022, 10:53 AM
ayzhao added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
14933

Ended up just passing "cpuid" directly

14934

Changed this to use SmallString.

Leaving this comment open until we figure out whether or not we need the clobbers.

clang/lib/Headers/intrin.h
538

I'm always in favor of removing dead code. I do have concerns that because of Hyrum's Law, we'll break users who call __cpuid_count even though it's an undocumented and unsupported API. Is this acceptable?

clang/test/CodeGen/ms-intrinsics-cpuid.c
17–18

This doesn't seem feasible to implement.

The reason is that in the unoptimized output, Clang emits a load for the same value of cpu_info 4 times, once for each getelementptr instruction. When it comes time to call asm, Clang can pass the output of any of the 4 load instructions to the asm, and explicitly specifying a single load instruction output as the parameter would make the test flaky.

One option would be to [[ https://godbolt.org/z/qY9ja1s11 | pass -O1 to Clang ]], which causes Clang to reuse the input registers and removes unnecessary loads and stores, so we can just use the input registers in the test's expected output. WDYT?

craig.topper added inline comments.Mar 16 2022, 11:17 AM
clang/lib/CodeGen/CGBuiltin.cpp
14934

I don't know the full history here. My changed in D88121 was to get the list from the getClobbers() instead of hardcoding "~{flags},~{fpsr}" into the string. And to add the "~{dirflag}". At the time I assumed dirflag missing was an oversight. getClobbers is what gets used for all user created inline assembly.

I think we don't need the clobbers here.

ayzhao updated this revision to Diff 415914.Mar 16 2022, 11:30 AM

Removed CPU clobbers

ayzhao marked 2 inline comments as done.Mar 16 2022, 11:31 AM
ayzhao added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
14934

Clobbers have been removed.

hans added inline comments.Mar 16 2022, 12:33 PM
clang/lib/CodeGen/CGBuiltin.cpp
14941

I think hasSideEffects should be false. I believe that will allow LLVM to optimize away the inline asm if the results are not used.

Craig, does that sound right to you?

14948

Oh, do we get a warning/error if this just uses i directly? In that case maybe using "unsigned" for the loop variable is better.

clang/lib/Headers/intrin.h
538

I think D101338 is recent enough that it will be okay.

clang/test/CodeGen/ms-intrinsics-cpuid.c
17–18

Oh right, yes it would get messy. Let's skip it then.

ayzhao updated this revision to Diff 415939.Mar 16 2022, 12:41 PM
ayzhao marked an inline comment as done.

Remove __cpuid_count and use unsigned in the for loop variable

ayzhao marked 5 inline comments as done.Mar 16 2022, 12:44 PM
craig.topper added inline comments.Mar 16 2022, 12:45 PM
clang/lib/CodeGen/CGBuiltin.cpp
14941

I agree. It wasn't a volatile asm in the header file so we shouldn't need side effects.

14956

I'm a little surprised we're able to return a dummy i1 value as the result of a void builtin. Is this done anywhere else?

clang/lib/Headers/intrin.h
545

What are we doing about this xchg that's needed to generate correct code?

craig.topper added inline comments.Mar 16 2022, 12:48 PM
clang/lib/CodeGen/CGBuiltin.cpp
14954

You should probably just return the last store you created. See BI__builtin_neon_vzipq_v for example.

craig.topper added inline comments.Mar 16 2022, 12:58 PM
clang/lib/CodeGen/CGBuiltin.cpp
14937

you shouldn't need llvm:: here. There's a using llvm:StringRef in effect on this file.

You also don't needs static or constexpr. StringRef's don't own anything and are cheap to create.

ayzhao updated this revision to Diff 416014.Mar 16 2022, 4:10 PM
ayzhao marked 6 inline comments as done.

Set hasSideEffects=false, other fixes

clang/lib/CodeGen/CGBuiltin.cpp
14941

hasSideEffects is now false

14956

Ended up just returning the last store

ayzhao updated this revision to Diff 416046.Mar 16 2022, 7:07 PM

Implement xchgq workaround for x86_64

ayzhao updated this revision to Diff 416049.Mar 16 2022, 7:33 PM
ayzhao marked an inline comment as done.

Update xchg implementation so that it uses the one it replaced in intrin.h instead of the one from cpuid.h

It's not identical to the one in intrin.h as InlineAsm isn't exactly tolerant of everything that's legal in an asm statement, so I ran the implementation from intrin.h through clang and copy/pasted it from the IR.

ayzhao added inline comments.Mar 16 2022, 7:38 PM
clang/lib/Headers/intrin.h
545

The xchg instructions are now in the output for x86_64

I originally thought that the =b constraint was sufficient to preserve rbx, so I used the same implementation for both x86 and x86_64. However, I discovered https://bugs.llvm.org/show_bug.cgi?id=20311 where it turns out that in certain cases, =b is not sufficient.

hans added inline comments.Mar 17 2022, 5:06 AM
clang/lib/CodeGen/CGBuiltin.cpp
14943

The `${...|...}` syntax is used to support multiple inline asm dialects (At&t or Intel), see https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Multiple-assembler-dialects-in-asm-templates

The syntax was used in intrin.h, because that file could be built using either asm dialect depending on what options the compiler is run with. But now when we're calling InlineAsm::get directly, I think it's not needed and we should just use the At&t style (because InlineAsm::get's asmDialect param defaults to that).

14955

(nit: llvm code typically just writes it 'unsigned' without the 'int' part)

ayzhao updated this revision to Diff 416176.Mar 17 2022, 7:27 AM

Use AT&T syntax for assembly, unsigned int -> unsigned

ayzhao marked 2 inline comments as done.Mar 17 2022, 7:27 AM
hans accepted this revision.Mar 17 2022, 8:51 AM

lgtm

Since Craig reviewed, let's wait for him too before landing.

This revision is now accepted and ready to land.Mar 17 2022, 8:51 AM
This revision now requires review to proceed.Mar 18 2022, 7:32 AM

LGTM to me other than those two minor comments.

clang/lib/CodeGen/CGBuiltin.cpp
14956

Why do we need curly braces around i?

14957

I think you can use Builder.CreateConstInBoundsGEP1_32(Int32Ty, BasePtr, i) to remove the need to call to ConstantInt::get.

ayzhao updated this revision to Diff 416531.Mar 18 2022, 9:46 AM

Remove curly braces and call to ConstantInt::get

ayzhao marked 2 inline comments as done.Mar 18 2022, 9:47 AM
This revision is now accepted and ready to land.Mar 18 2022, 9:55 AM
This revision was landed with ongoing or failed builds.Mar 18 2022, 10:14 AM
This revision was automatically updated to reflect the committed changes.