This is an archive of the discontinued LLVM Phabricator instance.

Differential D46314

[X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection.
Closed, Public

Authored by lebedev.ri on May 1 2018, 7:30 AM.

Details

Reviewers

craig.topper
GBuella
RKSimon
asbirlea
echristo
bkramer
spatel
andreadb
GGanesh

Commits

rGbc1a924138a6: [X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection.
rL331294: [X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection.

Summary

I have discovered an issue by accident.

$ lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          21
Model:               2
Model name:          AMD FX(tm)-8350 Eight-Core Processor
Stepping:            0
CPU MHz:             3584.018
CPU max MHz:         4000.0000
CPU min MHz:         1400.0000
BogoMIPS:            8027.22
Virtualization:      AMD-V
L1d cache:           16K
L1i cache:           64K
L2 cache:            2048K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

So this is a Family 21 (15h), Model 2 AMD CPU from the Bulldozer line.

GCC agrees:

$ echo | gcc -E - -march=native -###
<...>
 /usr/lib/gcc/x86_64-linux-gnu/7/cc1 -E -quiet -imultiarch x86_64-linux-gnu - "-march=bdver2" -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mlwp -mfma -mfma4 -mxop -mbmi -mno-sgx -mno-bmi2 -mtbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mno-fsgsbase -mno-rdseed -mprfchw -mno-adx -mfxsr -mxsave -mno-xsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku -mno-rdpid --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param "l2-cache-size=2048" "-mtune=bdver2"
<...>

But clang does not (look for bdver1):

$ echo | clang -E - -march=native -###
clang version 7.0.0- (trunk)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/local/bin
 "/usr/lib/llvm-7/bin/clang" "-cc1" "-triple" "x86_64-pc-linux-gnu" "-E" "-disable-free" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "-" "-mrelocation-model" "static" "-mthread-model" "posix" "-mdisable-fp-elim" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "bdver1" "-target-feature" "+sse2" "-target-feature" "+cx16" "-target-feature" "+sahf" "-target-feature" "+tbm" "-target-feature" "-avx512ifma" "-target-feature" "-sha" "-target-feature" "-gfni" "-target-feature" "+fma4" "-target-feature" "-vpclmulqdq" "-target-feature" "+prfchw" "-target-feature" "-bmi2" "-target-feature" "-cldemote" "-target-feature" "-fsgsbase" "-target-feature" "-xsavec" "-target-feature" "+popcnt" "-target-feature" "+aes" "-target-feature" "-avx512bitalg" "-target-feature" "-xsaves" "-target-feature" "-avx512er" "-target-feature" "-avx512vnni" "-target-feature" "-avx512vpopcntdq" "-target-feature" "-clwb" "-target-feature" "-avx512f" "-target-feature" "-clzero" "-target-feature" "-pku" "-target-feature" "+mmx" "-target-feature" "+lwp" "-target-feature" "-rdpid" "-target-feature" "+xop" "-target-feature" "-rdseed" "-target-feature" "-waitpkg" "-target-feature" "-ibt" "-target-feature" "+sse4a" "-target-feature" "-avx512bw" "-target-feature" "-clflushopt" "-target-feature" "+xsave" "-target-feature" "-avx512vbmi2" "-target-feature" "-avx512vl" "-target-feature" "-avx512cd" "-target-feature" "+avx" "-target-feature" "-vaes" "-target-feature" "-rtm" "-target-feature" "+fma" "-target-feature" "+bmi" "-target-feature" "-rdrnd" "-target-feature" "-mwaitx" "-target-feature" "+sse4.1" "-target-feature" "+sse4.2" "-target-feature" "-avx2" "-target-feature" "-wbnoinvd" "-target-feature" "+sse" "-target-feature" "+lzcnt" "-target-feature" "+pclmul" "-target-feature" "-prefetchwt1" "-target-feature" "+f16c" "-target-feature" "+ssse3" "-target-feature" "-sgx" "-target-feature" "-shstk" "-target-feature" "+cmov" 
"-target-feature" "-avx512vbmi" "-target-feature" "-movbe" "-target-feature" "-xsaveopt" "-target-feature" "-avx512dq" "-target-feature" "-adx" "-target-feature" "-avx512pf" "-target-feature" "+sse3" "-dwarf-column-info" "-debugger-tuning=gdb" "-resource-dir" "/usr/lib/llvm-7/lib/clang/7.0.0" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/llvm-7/lib/clang/7.0.0/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdebug-compilation-dir" "/build/llvm-build-Clang-release" "-ferror-limit" "19" "-fmessage-length" "271" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-o" "-" "-x" "c" "-"

So clang, unlike gcc, considers this to be bdver1.

After some digging, I came across getAMDProcessorTypeAndSubtype() in Host.cpp.
I have added the following debug printf after the call to that function in sys::getHostCPUName():

errs() << "Family " << Family << " Model " << Model << " Type " << Type << "\n";

Which produced:

Family 21 Model 2 Type 5

Which matches the lscpu output.
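
For context, the Family and Model values printed above are derived from the CPUID leaf 1 signature. The following is a minimal sketch of that decoding, assuming the standard x86 layout of the base and extended family/model fields. FamilyModel and decodeSignature are hypothetical names used for illustration; this is not the Host.cpp code.

```cpp
#include <cstdint>

// Family/model pair as reported by tools such as lscpu.
struct FamilyModel {
  unsigned Family;
  unsigned Model;
};

// Decode the display family and model from the CPUID leaf 1 EAX value.
// Bits 11:8 hold the base family, bits 7:4 the base model,
// bits 27:20 the extended family and bits 19:16 the extended model.
constexpr FamilyModel decodeSignature(std::uint32_t EAX) {
  unsigned Family = (EAX >> 8) & 0xf;
  unsigned Model = (EAX >> 4) & 0xf;
  if (Family == 6 || Family == 0xf) {
    if (Family == 0xf)
      Family += (EAX >> 20) & 0xff;    // add the extended family
    Model += ((EAX >> 16) & 0xf) << 4; // extended model forms the high nibble
  }
  return {Family, Model};
}
```

As an example, a signature of 0x00600F20 (a value constructed here to match the lscpu output, not read from the machine) decodes to Family 21 and Model 2.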

As was pointed out in the review by @craig.topper:

In D46314#1084123, @craig.topper wrote:

I don't think this is right. Here is what I found on Wikipedia. https://en.wikipedia.org/wiki/List_of_AMD_CPU_microarchitectures.

AMD Bulldozer Family 15h - the successor of 10h/K10. Bulldozer is designed for processors in the 10 to 220W category, implementing XOP, FMA4 and CVT16 instruction sets. Orochi was the first design which implemented it. For Bulldozer, CPUID model numbers are 00h and 01h.
AMD Piledriver Family 15h (2nd-gen) - successor to Bulldozer. CPUID model numbers are 02h (earliest "Vishera" Piledrivers) and 10h-1Fh.
AMD Steamroller Family 15h (3rd-gen) - third-generation Bulldozer derived core. CPUID model numbers are 30h-3Fh.
AMD Excavator Family 15h (4th-gen) - fourth-generation Bulldozer derived core. CPUID model numbers are 60h-6Fh, later updated revisions have model numbers 70h-7Fh.

So there's a weird exception where model 2 should go with 0x10-0x1f.
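
The model ranges quoted above, including the Model 2 exception, can be sketched as a standalone function. This is only an illustration under the assumption that the quoted ranges are complete. Fam15hSubtype and classifyFam15h are hypothetical names, not the enums or functions used in Host.cpp.

```cpp
// Hypothetical subtype enum for AMD Family 15h (decimal 21) parts.
enum class Fam15hSubtype { Unknown, BDVER1, BDVER2, BDVER3, BDVER4 };

// Map a Family 15h model number to a subtype using the quoted ranges.
// The Model 2 check must come before the catch-all "Model <= 0x0f" check,
// otherwise Model 2 would be classified as first-generation Bulldozer.
constexpr Fam15hSubtype classifyFam15h(unsigned Model) {
  if (Model >= 0x60 && Model <= 0x7f)
    return Fam15hSubtype::BDVER4; // Excavator
  if (Model >= 0x30 && Model <= 0x3f)
    return Fam15hSubtype::BDVER3; // Steamroller
  if ((Model >= 0x10 && Model <= 0x1f) || Model == 0x02)
    return Fam15hSubtype::BDVER2; // Piledriver, including early "Vishera"
  if (Model <= 0x0f)
    return Fam15hSubtype::BDVER1; // Bulldozer
  return Fam15hSubtype::Unknown;  // ranges not covered by the quoted list
}
```

With this ordering, Model 2 maps to BDVER2, Models 0 and 1 still map to BDVER1, and Model 48 (0x30) maps to BDVER3.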

Though it does not help that the code can't be tested at the moment.
With this change, bdver2 is properly detected.

$ echo | /build/llvm-build-Clang-release/bin/clang -E - -march=native -###
clang version 7.0.0 (trunk 331249) (llvm/trunk 331256)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /build/llvm-build-Clang-release/bin
 "/build/llvm-build-Clang-release/bin/clang-7" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-E" "-disable-free" "-main-file-name" "-" "-mrelocation-model" "static" "-mthread-model" "posix" "-mdisable-fp-elim" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-fuse-init-array" "-target-cpu" "bdver2" "-target-feature" "+sse2" "-target-feature" "+cx16" "-target-feature" "+sahf" "-target-feature" "+tbm" "-target-feature" "-avx512ifma" "-target-feature" "-sha" "-target-feature" "-gfni" "-target-feature" "+fma4" "-target-feature" "-vpclmulqdq" "-target-feature" "+prfchw" "-target-feature" "-bmi2" "-target-feature" "-cldemote" "-target-feature" "-fsgsbase" "-target-feature" "-xsavec" "-target-feature" "+popcnt" "-target-feature" "+aes" "-target-feature" "-avx512bitalg" "-target-feature" "-movdiri" "-target-feature" "-xsaves" "-target-feature" "-avx512er" "-target-feature" "-avx512vnni" "-target-feature" "-avx512vpopcntdq" "-target-feature" "-clwb" "-target-feature" "-avx512f" "-target-feature" "-clzero" "-target-feature" "-pku" "-target-feature" "+mmx" "-target-feature" "+lwp" "-target-feature" "-rdpid" "-target-feature" "+xop" "-target-feature" "-rdseed" "-target-feature" "-waitpkg" "-target-feature" "-movdir64b" "-target-feature" "-ibt" "-target-feature" "+sse4a" "-target-feature" "-avx512bw" "-target-feature" "-clflushopt" "-target-feature" "+xsave" "-target-feature" "-avx512vbmi2" "-target-feature" "-avx512vl" "-target-feature" "-avx512cd" "-target-feature" "+avx" "-target-feature" "-vaes" "-target-feature" "-rtm" "-target-feature" "+fma" "-target-feature" "+bmi" "-target-feature" "-rdrnd" "-target-feature" "-mwaitx" "-target-feature" "+sse4.1" "-target-feature" "+sse4.2" "-target-feature" "-avx2" "-target-feature" "-wbnoinvd" "-target-feature" "+sse" "-target-feature" "+lzcnt" "-target-feature" "+pclmul" "-target-feature" "-prefetchwt1" "-target-feature" "+f16c" "-target-feature" "+ssse3" "-target-feature" "-sgx" "-target-feature" "-shstk" 
"-target-feature" "+cmov" "-target-feature" "-avx512vbmi" "-target-feature" "-movbe" "-target-feature" "-xsaveopt" "-target-feature" "-avx512dq" "-target-feature" "-adx" "-target-feature" "-avx512pf" "-target-feature" "+sse3" "-dwarf-column-info" "-debugger-tuning=gdb" "-resource-dir" "/build/llvm-build-Clang-release/lib/clang/7.0.0" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/build/llvm-build-Clang-release/lib/clang/7.0.0/include" "-internal-externc-isystem" "/usr/include/x86_64-linux-gnu" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdebug-compilation-dir" "/build/llvm-build-Clang-release" "-ferror-limit" "19" "-fmessage-length" "271" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-o" "-" "-x" "c" "-"

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. May 1 2018, 7:30 AM

Herald added subscribers: JDevlieghere, arichardson, aprantl, sdardis. May 1 2018, 7:30 AM

andreadb added a reviewer: andreadb. May 1 2018, 7:32 AM

RKSimon added a reviewer: GGanesh. May 1 2018, 7:43 AM

I don't think this is right. Here is what I found on Wikipedia. https://en.wikipedia.org/wiki/List_of_AMD_CPU_microarchitectures.

AMD Bulldozer Family 15h - the successor of 10h/K10. Bulldozer is designed for processors in the 10 to 220W category, implementing XOP, FMA4 and CVT16 instruction sets. Orochi was the first design which implemented it. For Bulldozer, CPUID model numbers are 00h and 01h.
AMD Piledriver Family 15h (2nd-gen) - successor to Bulldozer. CPUID model numbers are 02h (earliest "Vishera" Piledrivers) and 10h-1Fh.
AMD Steamroller Family 15h (3rd-gen) - third-generation Bulldozer derived core. CPUID model numbers are 30h-3Fh.
AMD Excavator Family 15h (4th-gen) - fourth-generation Bulldozer derived core. CPUID model numbers are 60h-6Fh, later updated revisions have model numbers 70h-7Fh.

So there's a weird exception where model 2 should go with 0x10-0x1f.

In D46314#1084123, @craig.topper wrote:

I don't think this is right. Here is what I found on Wikipedia. https://en.wikipedia.org/wiki/List_of_AMD_CPU_microarchitectures.

AMD Bulldozer Family 15h - the successor of 10h/K10. Bulldozer is designed for processors in the 10 to 220W category, implementing XOP, FMA4 and CVT16 instruction sets. Orochi was the first design which implemented it. For Bulldozer, CPUID model numbers are 00h and 01h.

AMD Piledriver Family 15h (2nd-gen) - successor to Bulldozer. CPUID model numbers are 02h (earliest "Vishera" Piledrivers) and 10h-1Fh.

Oh, that simplifies things :)

AMD Steamroller Family 15h (3rd-gen) - third-generation Bulldozer derived core. CPUID model numbers are 30h-3Fh.
AMD Excavator Family 15h (4th-gen) - fourth-generation Bulldozer derived core. CPUID model numbers are 60h-6Fh, later updated revisions have model numbers 70h-7Fh.

So there's a weird exception where model 2 should go with 0x10-0x1f.

You may want to check what __builtin_cpu_is("btver2") and __builtin_cpu_is("btver1") return when compiled with gcc. They seem to have the same bad check in their library code. The gcc compiler uses feature bits, not Family/Model.

We also would need to fix the code in compiler-rt since it's copied from our Host.cpp.

gcc uses feature bits. That would be a differential ISA with respect to the previous gen. I think the idea is to get the ISA list as close to the underlying arch. When an older version of the compiler gets used which doesn't have the arch enabled, the compiler can fallback to the closest arch which enables the ISA list.
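
As a rough illustration of the feature-bit approach described here, the following sketch picks the newest Bulldozer-line target whose distinguishing feature is present. The choice of features (XOP, BMI, XSAVEOPT, AVX2) is an assumption made for this example and is not copied from gcc. HostFeatures, BdVer and pickByFeatures are hypothetical names.

```cpp
// Hypothetical subset of host feature bits; the field names are illustrative.
struct HostFeatures {
  bool XOP;
  bool BMI;
  bool XSAVEOPT;
  bool AVX2;
};

// Hypothetical result type: which bdverN target to select, if any.
enum class BdVer { None, V1, V2, V3, V4 };

// Check the newest generation first and fall back to older ones, so a host
// is matched to the closest target that enables its ISA list.
constexpr BdVer pickByFeatures(HostFeatures F) {
  if (F.AVX2)
    return BdVer::V4; // assumed marker for bdver4
  if (F.XSAVEOPT)
    return BdVer::V3; // assumed marker for bdver3
  if (F.BMI)
    return BdVer::V2; // assumed marker for bdver2
  if (F.XOP)
    return BdVer::V1; // assumed marker for bdver1
  return BdVer::None;
}
```

With the features from the lscpu and gcc output in the summary (XOP and BMI present, XSAVEOPT and AVX2 absent), this sketch selects V2 regardless of the model number.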

The model numbers for Bulldozer should be as listed on Wikipedia, as quoted by @craig.topper.
For example, I have a bdver3 machine with the lscpu output shown below.
<snip>
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 48
</snip>

So checking for '3' in the model to detect bdver3 will definitely fail on this machine (Model 48 is 0x30).

lebedev.ri updated this revision to Diff 144737. May 1 2018, 9:31 AM

lebedev.ri retitled this revision from [X86][AMD][Bulldozer] Unbreak Bulldozer sub-type detection. to [X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection..

lebedev.ri edited the summary of this revision.

In D46314#1084130, @craig.topper wrote:

You may want to check what __builtin_cpu_is("btver2") and __builtin_cpu_is("btver1") return when compiled with gcc. They seem to have the same bad check in their library code. The gcc compiler uses feature bits not Family/Model.

We also would need to fix the code in compiler-rt since it's copied from our Host.cpp.

bdver1/2, not btver1/2
But yes, that is apparently broken, too:

$ cat /tmp/test.cpp 
#include <iostream>

int main (int argc, char* argv[]) {
  std::cout << __builtin_cpu_is("bdver2") << " " << __builtin_cpu_is("bdver1") << "\n";
  return 0;
}
$ g++ /tmp/test.cpp 
$ ./a.out 
0 1
$ clang++ /tmp/test.cpp 
$ ./a.out 
0 1
$ /build/llvm-build-Clang-release/bin/clang++ /tmp/test.cpp 
$ ./a.out 
0 1

I posted the compiler-rt side in D46323, but it changes nothing for me, I guess because the gcc functionality is being used.

Guess I'm predisposed to seeing Btver2 due to the scheduler model activity lately. Can you file a bug against libgcc?

craig.topper added inline comments. May 1 2018, 10:23 AM

lib/Support/Host.cpp
Line 845 (on Diff #144737): Update the comment.

In D46314#1084260, @craig.topper wrote:

Guess I'm predisposed to seeing Btver2 due to the scheduler model activity lately.

Guessed as much :)

I'm kinda tempted to look into creating a scheduling model for this processor,
since this is what I'm using, but I'm not sure I can produce anything useful.

Can you file a bug against libgcc?

Yes, right after gcc-bugzilla-account-request@gcc.gnu.org 'replies'.

Update comment, too.

lebedev.ri marked an inline comment as done. May 1 2018, 10:55 AM

LGTM

This revision is now accepted and ready to land. May 1 2018, 10:59 AM

Closed by commit rL331294: [X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection. (authored by lebedevri). May 1 2018, 11:43 AM

This revision was automatically updated to reflect the committed changes.

Diffusion mentioned this in rL331294: [X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection..

Diffusion mentioned this in rL331295: [compiler-rt][X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection..

Diffusion mentioned this in rCRT331295: [compiler-rt][X86][AMD][Bulldozer] Fix Bulldozer Model 2 detection..

@craig.topper thank you for the review!

Revision Contents

Path: llvm/trunk/lib/Support/Host.cpp
Size: 4 lines

Diff 144761

llvm/trunk/lib/Support/Host.cpp

   case 21:
     if (Model >= 0x60 && Model <= 0x7f) {
       *Subtype = X86::AMDFAM15H_BDVER4;
       break; // "bdver4"; 60h-7Fh: Excavator
     }
     if (Model >= 0x30 && Model <= 0x3f) {
       *Subtype = X86::AMDFAM15H_BDVER3;
       break; // "bdver3"; 30h-3Fh: Steamroller
     }
-    if (Model >= 0x10 && Model <= 0x1f) {
+    if ((Model >= 0x10 && Model <= 0x1f) || Model == 0x02) {
       *Subtype = X86::AMDFAM15H_BDVER2;
-      break; // "bdver2"; 10h-1Fh: Piledriver
+      break; // "bdver2"; 02h, 10h-1Fh: Piledriver
     }
     if (Model <= 0x0f) {
       *Subtype = X86::AMDFAM15H_BDVER1;
       break; // "bdver1"; 00h-0Fh: Bulldozer
     }
     break;
   case 22:
     *Type = X86::AMD_BTVER2;