This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling
ClosedPublic

Authored by FreddyYe on May 30 2023, 1:43 AM.

Details

Summary

This refactor patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def
and move those information into ProcInfo of X86TargetParser.cpp. Since these
two files both maintain a table with redundant info such as cpuname and its
features supported. CPU_SPECIFIC* MACROs define some different information. This
patch dealt with them in these ways when moving:
1.mangling
This is now moved to Mangling in ProcInfo and directly initialized at array of
Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as
mangling.
2.CPU alias
The alias cpu will also be initialized in array of Processors, its attributes
will be same as its alias target cpu. Same feature list, same mangling.
3.TUNE_NAME
Before my change, some cpu names support cpu_dispatch/specific are not
supported in X86.td, which means optimizer/backend doesn't recognize them. So
they use a different TUNE_NAME to generate in IR. In this patch, I added these
missing cpu support at X86.td by utilizing existing Features and XXXTunings, so
that each cpu name can directly use its own name as TUNE_NAME to be supported
by optimizer/backend.
4.Feature list
The feature list of one CPU maintained in X86TargetParser.def is not same as
the one in X86TargetParser.cpp. It only maintains part of features of one CPU
(features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains
a complete one. This patch abandons the feature list maintained by CPU_SPECIFIC*
MACROs because assigning a CPU with a complete one doesn't affect the
functionality of cpu_dispatch/specific.
Except these four info, since some of CPUs supported by cpu_dispatch/specific
doesn's support clang options like -march, -mtune before, this patch also kept
this behavior still by adding another member OnlyForCPUDispatchSpecific in
ProcInfo.

Diff Detail

Event Timeline

FreddyYe created this revision.May 30 2023, 1:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2023, 1:43 AM
FreddyYe requested review of this revision.May 30 2023, 1:43 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 30 2023, 1:43 AM
FreddyYe retitled this revision from Remove CPU_SPECIFIC* MACROs and use unified getManglingForCPU to [WIP] Remove CPU_SPECIFIC* MACROs and use unified getManglingForCPU.May 30 2023, 1:43 AM
FreddyYe updated this revision to Diff 531596.Jun 14 2023, 7:49 PM

Update to a more conservative change.

FreddyYe retitled this revision from [WIP] Remove CPU_SPECIFIC* MACROs and use unified getManglingForCPU to [x86] Remove CPU_SPECIFIC* MACROs and add getManglingForCPU.Jun 14 2023, 7:50 PM
FreddyYe updated this revision to Diff 531615.Jun 14 2023, 10:44 PM

misc refine.

This patch means to remove CPU_SPECIFIC* MACROs in X86TargetParser.def and move that part of functionality into X86TargetParser.cpp.
Since these two files both maintain a table with cpuname, features of this cpu supported, ... This move can reduce the codes.

CPU_SPECIFIC* has some different and specific information maintained. This patch dealt with them in these ways when moving:

  1. mangling This is now moved to Mangling in ProcInfo and directly initialized at array of Processors. CPUs don't support cpu_dispatch/specific are assigned '\0' as mangling. This patch also supports some of new intel cpus for cpu_dispatch/specific feature.
  2. alias relationship The alias cpu will also be initialized in array of Processors, its attributes will be same as its alias target cpu. Same feature list, same mangling.
  3. TUNE_NAME Before my change, some cpu names support cpu_dispatch/specific are not supported in X86.td, which means optimizer/backend doesn't recognize them. So they use a different TUNE_NAME to generate in IR. In this patch, I added these missing cpu support at X86.td by utilize existing Features and XXXTunings. So that each cpu name can directly use its own name as TUNE_NAME to be supported by optimizer/backend.
  4. Feature list The feature list of one CPU maintained in X86TargetParser.def is not same as the one in X86TargetParser.cpp. It only maintains part of features of one CPU(Features defined by X86_FEATURE_COMPAT). While X86TargetParser.cpp maintains a complete one. This patch abandons the feature list maintained in X86TargetParser.def because assigning a CPU with a complete feature list in X86TargetParser.cpp doesn't affect the functionality of cpu_dispatch/specific. See the implement of llvm::X86::getCpuSupportsMask, it already masked out the features not defined by X86_FEATURE_COMPAT.

Beyond these four information, since some of CPUs supported by cpu_dispatch/specific doesn's support clang options like -march, -mtune before, this patch also kept this behavior still by adding another member OnlyForCPUSpecificDispath in ProcInfo.

FreddyYe edited the summary of this revision. (Show Details)
RKSimon added inline comments.Jun 15 2023, 6:21 AM
llvm/lib/TargetParser/X86TargetParser.cpp
15

(clang-format) - include order

313

Would it be better to move all of this into X86TargetParser.def ?

314

FeaturesPentiumMMX & ~FeaturesPentiumMMX ?

llvm/test/CodeGen/X86/cpus-intel.ll
34

Keeping all the aliased cpus variants RUN together will make maintenance easier (e.g. pentium_mmx)

FreddyYe added inline comments.Jun 15 2023, 7:11 AM
llvm/lib/TargetParser/X86TargetParser.cpp
313

Feels so. My next step plan it to furtherly combine the feature list table in X86.td together.

314

I think I was to initialize a null feature list but failed... Will refine.

FreddyYe updated this revision to Diff 531993.Jun 15 2023, 11:06 PM
  1. Update generic and pentium Features to align with X86.td.
  2. Readjust tests in cpus-intel.ll
FreddyYe marked 2 inline comments as done.Jun 15 2023, 11:07 PM
FreddyYe added inline comments.
llvm/test/CodeGen/X86/cpus-intel.ll
34

Updated, not sure if I understood your point. pls review again.

RKSimon added inline comments.Jun 16 2023, 2:46 AM
llvm/test/CodeGen/X86/cpus-intel.ll
8–9

I meant like this:

; RUN: llc < %s -o /dev/null -mtriple=i686-unknown-unknown -mcpu=pentium-mmx 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty
; RUN: llc < %s -o /dev/null -mtriple=i686-unknown-unknown -mcpu=pentium_mmx 2>&1 | FileCheck %s --check-prefix=CHECK-NO-ERROR --allow-empty

The idea is to keep RUN lines that test the equivalent cpus together, so its easier for any future edits to handle them together

FreddyYe updated this revision to Diff 532494.Jun 18 2023, 6:10 PM

Update cpus-intel.ll.

FreddyYe marked 2 inline comments as done.Jun 18 2023, 6:28 PM
FreddyYe added inline comments.
llvm/test/CodeGen/X86/cpus-intel.ll
8–9

Woops, I really misunderstood. Updated.

FreddyYe updated this revision to Diff 532517.Jun 18 2023, 8:44 PM
FreddyYe marked an inline comment as done.

Typo refine: OnlyForCPUSpecificDispath -> OnlyForCPUDispatchSpecific

Thanks for @RKSimon 's review, I'd like also to mention that https://reviews.llvm.org/D152989 is supposed to be the base commit of here.

RKSimon added inline comments.Jun 19 2023, 7:58 AM
llvm/lib/TargetParser/X86TargetParser.cpp
110

Do we have a documented list of the mangling values anywhere? The values below look too much like magic numbers tbh.

llvm/test/CodeGen/X86/cpus-intel.ll
90

put this with the goldmonth checks?

FreddyYe added inline comments.Jun 19 2023, 6:03 PM
llvm/lib/TargetParser/X86TargetParser.cpp
110

I think not. I copied them from the old MACROs in X86TargetParser.def. I think the values are just used to do some distinguish from different CPUs. And 'a'-'Z' is about to run out of soon here. We probably need to extend the mangling, may be with another letter as prefix. @erichkeane may have a better answer here.

llvm/test/CodeGen/X86/cpus-intel.ll
90

I'll do the adjust. BTW I'd like also to mention below:

Here are the rules I used to add features/tuning model in X86.td for missing cpu names in CPU_SPECIFIC Macros:

RULE 1: Infer from the name itself. E.g. pentium_4 -> pentium4, pentium_iii -> pentium3
RULE 2: CPU_SPECIFIC_ALIAS can tell me its old name. Using the old name to continue to infer.
RULE 3: Use the feature list in CPU_SPECIFIC to compare and append with existing CPU names:

pentium_4_sse3 -> prescott
core_2_duo_ssse3 -> core2
core_2_duo_sse4_1 -> penryn
atom_sse4_2 -> silvermont
core_i7_sse4_2 -> nehalem
core_aes_pclmulqdq -> westmere
core_5th_gen_avx_tsx -> broadwell

There is only one exception: atom_sse4_2_movbe. For now I set it as goldmont features and silvermont tuning model according to its definition:

CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
FreddyYe updated this revision to Diff 532772.Jun 19 2023, 6:13 PM

Adjust atom_sse4_2_movbe test.

FreddyYe retitled this revision from [x86] Remove CPU_SPECIFIC* MACROs and add getManglingForCPU to [x86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling.Jun 20 2023, 11:20 PM
pengfei added inline comments.Jun 20 2023, 11:49 PM
clang/test/CodeGen/attr-cpuspecific.c
342

Why cmov disappearing? This feature is supported by most recent targets. Same below.

FreddyYe added inline comments.Jun 20 2023, 11:57 PM
clang/test/CodeGen/attr-cpuspecific.c
342

Good catch. Seems like frontend codes missed CMOV from the very beginning, while X86.td doesn't have this issue. Will update.

FreddyYe updated this revision to Diff 533155.Jun 21 2023, 12:11 AM

Add missing FeatureCMOV in X8TargetParser.cpp

FreddyYe updated this revision to Diff 533161.Jun 21 2023, 12:26 AM

debug typo.

Please can you cleanup the summary, as it isn't very easy to understand at the moment. Possibly split into a series of bullet points describing the changes?

FreddyYe edited the summary of this revision. (Show Details)Jun 22 2023, 5:50 AM
FreddyYe edited the summary of this revision. (Show Details)

gentle ping _cpu_dispatch/specific feature owner @erichkeane. Any comments?

I think this is OK, I have a slight fear we're losing a bit of the 'tune' functionality, but it is not impossible that we've never really cared about that. One concern I have is that the list was used for the resolver function, but I don't see any test changes for that? Are we properly filtering out the features list somehow?

clang/lib/CodeGen/CodeGenModule.cpp
2272

So my understanding here is that our intent was that the 'tune' cpu and the 'selected' cpu were not necessarily the same (either not the same name, OR not the same CPU!), right? Is that being lost here?

I think this is OK, I have a slight fear we're losing a bit of the 'tune' functionality, but it is not impossible that we've never really cared about that. One concern I have is that the list was used for the resolver function, but I don't see any test changes for that? Are we properly filtering out the features list somehow?

Yes! Now there are no tests yet testing the resolver function influenced by the feature list. I'll add another pre-commit test to show this.

I think this is OK, I have a slight fear we're losing a bit of the 'tune' functionality, but it is not impossible that we've never really cared about that. One concern I have is that the list was used for the resolver function, but I don't see any test changes for that? Are we properly filtering out the features list somehow?

Yes! Now there are no tests yet testing the resolver function influenced by the feature list. I'll add another pre-commit test to show this.

Added checks in https://reviews.llvm.org/D152989. pls help review. About the "-tune-cpu", I think it's ok to use the same name as the one that users specified in _cpu_specific(), after I supported those missing names in X86.td so that optimizer can now recognize them.

FreddyYe updated this revision to Diff 535277.Jun 28 2023, 1:18 AM
  1. Fixed copy errors of some CPU's manglings
  2. Updated resolver function checkers due to more complete feature list changes.
  3. Added more cpu checks in attr-cpuspecific-cpus.c

Notice that these two changes are both expected since a more complete feature list won't influence the _cpu_dispatch/specific multiversion to not work.

clang/test/CodeGen/attr-cpuspecific.c
45–46

This value change is because the feature list of ivybridge described in X86TargetParser.def before missed feature "pclmul".

51–52

This value change is because the feature list of knl described in X86TargetParser.def before missed feature "bmi2" and "aes".

pengfei accepted this revision.Jun 28 2023, 11:46 PM

I have some concerns for RULE 3, especially core_aes_pclmulqdq -> westmere and atom_sse4_2_movbe -> silvermont.
Sometimes, we have minor feature differences in the same generation targets. I guess that's why we use arch_feature naming like core_2_duo_ssse3. Merging them into the same generation or the next generation might corrup the intention here. But I'm not expert in CPUDispatch, and I don't see any existing tests for them, so I won't block the patch since it's an improvement in general.
Please wait a few days for other reviewers' opinions.

clang/test/CodeGen/attr-cpuspecific.c
51–52

The comment is for TwoVersions?

This revision is now accepted and ready to land.Jun 28 2023, 11:46 PM

It looks to me the failed unit tests might be related to this patch, please take a look.

FreddyYe marked an inline comment as done.Jun 29 2023, 12:48 AM
FreddyYe added inline comments.
clang/test/CodeGen/attr-cpuspecific.c
51–52

Yes. Sorry for wrong point.

FreddyYe marked an inline comment as done.Jun 29 2023, 1:42 AM

It looks to me the failed unit tests might be related to this patch, please take a look.

This is due to FeatureCMOV adding. Should I split into another review?

FreddyYe added a comment.EditedJun 29 2023, 1:56 AM

I have some concerns for RULE 3, especially core_aes_pclmulqdq -> westmere and atom_sse4_2_movbe -> silvermont.
Sometimes, we have minor feature differences in the same generation targets. I guess that's why we use arch_feature naming like core_2_duo_ssse3. Merging them into the same generation or the next generation might corrup the intention here. But I'm not expert in CPUDispatch, and I don't see any existing tests for them, so I won't block the patch since it's an improvement in general.
Please wait a few days for other reviewers' opinions.

You are right. For historical reasons, I can't find which product each cpu name string points to actually. From the old feature list in X86TargetParser.def, these three cpu names even share a same feature list:

atom_sse4_2
core_i7_sse4_2
core_aes_pclmulqdq

"core_5th_gen_avx_tsx" is also same as "broadwell". So I have a new proposal for RULE3, which can be considered to be more conservative:

pentium_4_sse3 -> prescott (FeatureSSE3)
    First introduce FeatureSSE3 like prescott
core_2_duo_ssse3 -> core2 (FeatureSSSE3)
    First introduce FeatureSSSE3 like core2
core_2_duo_sse4_1 -> penryn (FeatureSSE4_1)
    First introduce FeatureSSE4_1 like penryn
atom_sse4_2 -> nehalem (FeatureSSE4_2)
    First introduce FeatureSSE4_2 like nehalem
core_i7_sse4_2 -> nehalem (FeatureSSE4_2)
    First introduce FeatureSSE4_2 like nehalem
core_aes_pclmulqdq -> nehalem (FeatureSSE4_2)
    First introduce FeatureSSE4_2 like nehalem
core_5th_gen_avx_tsx -> broadwell 
    Same feature list as broadwell

Meanwhile, the fact above won't affect code changes in X86.td. I'll still define these new cpu names with the TUNE_NAME info in original source.

It looks to me the failed unit tests might be related to this patch, please take a look.

This is due to FeatureCMOV adding. Should I split into another review?

Yes, it looks like it would be best to split off and commit some of the fixes (cmov / the isa changes causing the attr-cpuspecific.c diffs etc.) first, before this refactor patch.

Yes, it looks like it would be best to split off and commit some of the fixes (cmov / the isa changes causing the attr-cpuspecific.c diffs etc.) first, before this refactor patch.

Done in https://reviews.llvm.org/D154181

FreddyYe updated this revision to Diff 536118.Jun 29 2023, 10:54 PM

changes according to the new RULE3, pls review again.

RKSimon added inline comments.Jun 30 2023, 1:22 AM
clang/test/CodeGen/attr-cpuspecific.c
45–46

Pull out an ivybridge fix into its own patch?

FreddyYe added inline comments.Jun 30 2023, 1:29 AM
clang/test/CodeGen/attr-cpuspecific.c
45–46

Emmm. Seems like a good idea. How about I change these two CPU's old feature list only? Since the successors of them will also have this issue, but no tests influenced.

RKSimon added inline comments.Jun 30 2023, 1:33 AM
clang/test/CodeGen/attr-cpuspecific.c
45–46

Sure, a single patch with multiple cpus' fixups is fine

FreddyYe marked an inline comment as done.Jun 30 2023, 6:00 AM
FreddyYe added inline comments.
clang/test/CodeGen/attr-cpuspecific.c
45–46
FreddyYe updated this revision to Diff 536645.Jul 2 2023, 7:00 PM
FreddyYe marked an inline comment as done.

Rebase, especially for D151696

gentle ping. If no objections, I'll merge this tomorrow.

RKSimon added inline comments.Jul 3 2023, 9:08 AM
llvm/lib/TargetParser/X86TargetParser.cpp
378

I'm still not clear on what determines the mangling mode and cpu dispatch flag for cpu targets are supposedly the same? For example, none of these ivybridge equivalent configs have the same values.

FreddyYe marked an inline comment as done.Jul 3 2023, 7:29 PM
FreddyYe added inline comments.
clang/test/CodeGen/attr-cpuspecific-cpus.c
40 ↗(On Diff #536645)

In this patch, I additionally supported some intel new CPU's _cpu_specific feature by creating a new mangling or copy some old ones (which means aliasing certain cpu). Maybe I should do this in a following patch?

llvm/lib/TargetParser/X86TargetParser.cpp
378

I assign them by following orders:

  1. Copy the mangling from the original CPU_SPEICIFC MACRO.
  2. If there's no way to copy, assign to '\0' by default, which means doesn't support __cpu_specific/dispatch feature.
  3. If cpu name contain ''-', assign the mangling as '\0', too. Because '-' cannot be correctly identified in _cpu_specific/dispatch().
  4. set OnlyForCPUDispatch flag as true if this cpu name was not listed here, which means it doesn't support -march, -mtune and so on. This flag makes this cpu name only support cpu_dispatch/specific feature. E.g. core_3rd_gen_avx, core_4rd_gen_avx., ... And normally, these names are very old. So supporting them with -march=, -mtune= is not easy for now. And notice that new cpu names shouldn't set this flag as true since they should both support -march= and cpu_specific/dispatch feature by default.
RKSimon added inline comments.Jul 4 2023, 2:08 AM
clang/test/CodeGen/attr-cpuspecific-cpus.c
40 ↗(On Diff #536645)

Yes please, that would great.

llvm/lib/TargetParser/X86TargetParser.cpp
378

OK - please can you add that to a comment above line 311 for future reference?

FreddyYe updated this revision to Diff 537065.Jul 4 2023, 5:39 AM
FreddyYe marked 2 inline comments as done.

Add comments about Mangling and OnlyForCPUDispatchSpecific,
and remove the supporting more/new CPU names for cpu_specific/dispatch feature.

FreddyYe retitled this revision from [x86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling to [X86] Remove CPU_SPECIFIC* MACROs and add getCPUDispatchMangling.Jul 4 2023, 5:45 AM
FreddyYe edited the summary of this revision. (Show Details)
RKSimon accepted this revision.Jul 5 2023, 1:43 AM

LGTM - cheers

This revision was landed with ongoing or failed builds.Jul 5 2023, 2:32 AM
This revision was automatically updated to reflect the committed changes.

Thank so much on all your comments/review to make this happen. Cheers~!