This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def
and move those information into ProcInfo of X86TargetParser.cpp. Since these
two files both maintain a table with redundant info such as cpuname and its
features supported. CPU_SPECIFIC* MACROs define some different information. This
patch dealt with them in these ways when moving:
1.mangling
This is now moved to Mangling in ProcInfo and directly initialized at array of
Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as
mangling.
2.CPU alias
The alias cpu will also be initialized in array of Processors, its attributes
will be same as its alias target cpu. Same feature list, same mangling.
3.TUNE_NAME
Before my change, some cpu names support cpu_dispatch/specific are not
supported in X86.td, which means optimizer/backend doesn't recognize them. So
they use a different TUNE_NAME to generate in IR. In this patch, I added these
missing cpu support at X86.td by utilizing existing Features and XXXTunings, so
that each cpu name can directly use its own name as TUNE_NAME to be supported
by optimizer/backend.
4.Feature list
The feature list of one CPU maintained in X86TargetParser.def is not same as
the one in X86TargetParser.cpp. It only maintains part of features of one CPU
(features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains
a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC*
MACROs because assigning a CPU with a complete one doesn't affect the
functionality of cpu_dispatch/specific.
Except these four info, since some of CPUs supported by cpu_dispatch/specific
doesn's support clang options like -march, -mtune before, this patch also kept
this behavior still by adding another member OnlyForCPUDispatchSpecific in
ProcInfo.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def and move that part of functionality into X86TargetParser.cpp.
Since these two files both maintain a table with cpuname, features of this cpu supported, ... This move can reduce the codes.
CPU_SPECIFIC* has some different and specific information maintained. This patch dealt with them in these ways when moving:
- mangling This is now moved to Mangling in ProcInfo and directly initialized at array of Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as mangling. This patch also supports some of new intel cpus for cpu_dispatch/specific feature.
- alias relationship The alias cpu will also be initialized in array of Processors, its attributes will be same as its alias target cpu. Same feature list, same mangling.
- TUNE_NAME Before my change, some cpu names support cpu_dispatch/specific are not supported in X86.td, which means optimizer/backend doesn't recognize them. So they use a different TUNE_NAME to generate in IR. In this patch, I added these missing cpu support at X86.td by utilize existing Features and XXXTunings. So that each cpu name can directly use its own name as TUNE_NAME to be supported by optimizer/backend.
- Feature list The feature list of one CPU maintained in X86TargetParser.def is not same as the one in X86TargetParser.cpp. It only maintains part of features of one CPU(Features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains a complete one. This patch abandons the feature list maintained in X86TargetParser.def because assigning a CPU with a complete feature list in X86TargetParser.cpp doesn't affect the functionality of cpu_dispatch/specific. See the implement of llvm::X86::getCpuSupportsMask, it already masked out the features not defined by X86_FEATURE_COMPAT.
Beyond these four information, since some of CPUs supported by cpu_dispatch/specific doesn's support clang options like -march, -mtune before, this patch also kept this behavior still by adding another member OnlyForCPUSpecificDispath in ProcInfo.
llvm/lib/TargetParser/X86TargetParser.cpp | ||
---|---|---|
15 | (clang-format) - include order | |
313 | Would it be better to move all of this into X86TargetParser.def ? | |
314 | FeaturesPentiumMMX & ~FeaturesPentiumMMX ? | |
llvm/test/CodeGen/X86/cpus-intel.ll | ||
34 | Keeping all the aliased cpus variants RUN together will make maintenance easier (e.g. pentium_mmx) |
- Update generic and pentium Features to align with X86.td.
- Readjust tests in cpus-intel.ll
llvm/test/CodeGen/X86/cpus-intel.ll | ||
---|---|---|
34 | Updated, not sure if I understood your point. pls review again. |
llvm/test/CodeGen/X86/cpus-intel.ll | ||
---|---|---|
8–9 | I meant like this: ; RUN: llc < %s -o /dev/null -mtriple=i686-unknown-unknown -mcpu=pentium-mmx 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty ; RUN: llc < %s -o /dev/null -mtriple=i686-unknown-unknown -mcpu=pentium_mmx 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty The idea is to keep RUN lines that test the equivalent cpus together, so its easier for any future edits to handle them together |
llvm/test/CodeGen/X86/cpus-intel.ll | ||
---|---|---|
8–9 | Woops, I really misunderstood. Updated. |
Thanks for @RKSimon 's review, I'd like also to mention that https://reviews.llvm.org/D152989 is supposed to be the base commit of here.
llvm/lib/TargetParser/X86TargetParser.cpp | ||
---|---|---|
110 | I think not. I copied them from the old MACROs in X86TargetParser.def. I think the values are just used to do some distinguish from different CPUs. And 'a'-'Z' is about to run out of soon here. We probably need to extend the mangling, may be with another letter as prefix. @erichkeane may have a better answer here. | |
llvm/test/CodeGen/X86/cpus-intel.ll | ||
91 | I'll do the adjust. BTW I'd like also to mention below: Here are the rules I used to add features/tuning model in X86.td for missing cpu names in CPU_SPECIFIC Macros: RULE 1: Infer from the name itself. E.g. pentium_4 -> pentium4, pentium_iii -> pentium3 pentium_4_sse3 -> prescott core_2_duo_ssse3 -> core2 core_2_duo_sse4_1 -> penryn atom_sse4_2 -> silvermont core_i7_sse4_2 -> nehalem core_aes_pclmulqdq -> westmere core_5th_gen_avx_tsx -> broadwell There is only one exception: atom_sse4_2_movbe. For now I set it as goldmont features and silvermont tuning model according to its definition: CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt") |
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
354 | Why cmov disappearing? This feature is supported by most recent targets. Same below. |
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
354 | Good catch. Seems like frontend codes missed CMOV from the very beginning, while X86.td doesn't have this issue. Will update. |
Please can you cleanup the summary, as it isn't very easy to understand at the moment. Possibly split into a series of bullet points describing the changes?
I think this is OK, I have a slight fear we're losing a bit of the 'tune' functionality, but it is not impossible that we've never really cared about that. One concern I have is that the list was used for the resolver function, but I don't see any test changes for that? Are we properly filtering out the features list somehow?
clang/lib/CodeGen/CodeGenModule.cpp | ||
---|---|---|
2491 | So my understanding here is that our intent was that the 'tune' cpu and the 'selected' cpu were not necessarily the same (either not the same name, OR not the same CPU!), right? Is that being lost here? |
Yes! Now there are no tests yet testing the resolver function influenced by the feature list. I'll add another pre-commit test to show this.
Added checks in https://reviews.llvm.org/D152989. pls help review. About the "-tune-cpu", I think it's ok to use the same name as the one that users specified in _cpu_specific(), after I supported those missing names in X86.td so that optimizer can now recognize them.
- Fixed copy errors of some CPU's manglings
- Updated resolver function checkers due to more complete feature list changes.
- Added more cpu checks in attr-cpuspecific-cpus.c
Notice that these two changes are both expected since a more complete feature list won't influence the _cpu_dispatch/specific multiversion to not work.
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
47–48 | This value change is because the feature list of ivybridge described in X86TargetParser.def before missed feature "pclmul". | |
56–57 | This value change is because the feature list of knl described in X86TargetParser.def before missed feature "bmi2" and "aes". |
I have some concerns for RULE 3, especially core_aes_pclmulqdq -> westmere and atom_sse4_2_movbe -> silvermont.
Sometimes, we have minor feature differences in the same generation targets. I guess that's why we use arch_feature naming like core_2_duo_ssse3. Merging them into the same generation or the next generation might corrup the intention here. But I'm not expert in CPUDispatch, and I don't see any existing tests for them, so I won't block the patch since it's an improvement in general.
Please wait a few days for other reviewers' opinions.
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
56–57 | The comment is for TwoVersions? |
It looks to me the failed unit tests might be related to this patch, please take a look.
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
56–57 | Yes. Sorry for wrong point. |
You are right. For historical reasons, I can't find which product each cpu name string points to actually. From the old feature list in X86TargetParser.def, these three cpu names even share a same feature list:
atom_sse4_2 core_i7_sse4_2 core_aes_pclmulqdq
"core_5th_gen_avx_tsx" is also same as "broadwell". So I have a new proposal for RULE3, which can be considered to be more conservative:
pentium_4_sse3 -> prescott (FeatureSSE3) First introduce FeatureSSE3 like prescott core_2_duo_ssse3 -> core2 (FeatureSSSE3) First introduce FeatureSSSE3 like core2 core_2_duo_sse4_1 -> penryn (FeatureSSE4_1) First introduce FeatureSSE4_1 like penryn atom_sse4_2 -> nehalem (FeatureSSE4_2) First introduce FeatureSSE4_2 like nehalem core_i7_sse4_2 -> nehalem (FeatureSSE4_2) First introduce FeatureSSE4_2 like nehalem core_aes_pclmulqdq -> nehalem (FeatureSSE4_2) First introduce FeatureSSE4_2 like nehalem core_5th_gen_avx_tsx -> broadwell Same feature list as broadwell
Meanwhile, the fact above won't affect code changes in X86.td. I'll still define these new cpu names with the TUNE_NAME info in original source.
Yes, it looks like it would be best to split off and commit some of the fixes (cmov / the isa changes causing the attr-cpuspecific.c diffs etc.) first, before this refactor patch.
Yes, it looks like it would be best to split off and commit some of the fixes (cmov / the isa changes causing the attr-cpuspecific.c diffs etc.) first, before this refactor patch.
Done in https://reviews.llvm.org/D154181
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
47–48 | Pull out an ivybridge fix into its own patch? |
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
47–48 | Emmm. Seems like a good idea. How about I change these two CPU's old feature list only? Since the successors of them will also have this issue, but no tests influenced. |
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
47–48 | Sure, a single patch with multiple cpus' fixups is fine |
clang/test/CodeGen/attr-cpuspecific.c | ||
---|---|---|
47–48 | Done in https://reviews.llvm.org/D154209 |
llvm/lib/TargetParser/X86TargetParser.cpp | ||
---|---|---|
378 | I'm still not clear on what determines the mangling mode and cpu dispatch flag for cpu targets are supposedly the same? For example, none of these ivybridge equivalent configs have the same values. |
clang/test/CodeGen/attr-cpuspecific-cpus.c | ||
---|---|---|
40 | In this patch, I additionally supported some intel new CPU's _cpu_specific feature by creating a new mangling or copy some old ones (which means aliasing certain cpu). Maybe I should do this in a following patch? | |
llvm/lib/TargetParser/X86TargetParser.cpp | ||
378 | I assign them by following orders:
|
Add comments about Mangling and OnlyForCPUDispatchSpecific,
and remove the supporting more/new CPU names for cpu_specific/dispatch feature.
So my understanding here is that our intent was that the 'tune' cpu and the 'selected' cpu were not necessarily the same (either not the same name, OR not the same CPU!), right? Is that being lost here?