This is an archive of the discontinued LLVM Phabricator instance.


Differential D121410

Have cpu-specific variants set 'tune-cpu' as an optimization hint
ClosedPublic

Authored by erichkeane on Mar 10 2022, 1:49 PM.

Details

Reviewers

aaron.ballman
RKSimon
lebedev.ri
arsenm
andrew.w.kaylor
pengfei

Commits

rGdc152659b452: Have cpu-specific variants set 'tune-cpu' as an optimization hint

Summary

Due to various implementation constraints, even though the programmer
names a 'processor', cpu_dispatch/cpu_specific needs to use that
processor's 'feature' list to identify it. As a result, the processor
named in the source code is not propagated to the optimizer, which
therefore cannot tune for it.

This patch uses the CPU as written in the source for 'tune-cpu', so that
opt can make decisions based on the CPU as spelled, which should better
match the behavior the programmer expects.
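To see why the feature list alone loses the processor's identity, note that two rows in the X86TargetParser.def hunk further down, core_i7_sse4_2 and core_aes_pclmulqdq, carry identical feature strings but different tune names (nehalem and westmere). The following is a minimal C++ sketch of that observation, not Clang's code: the struct and function names are hypothetical, and the table copies only those two rows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, trimmed-down mirror of two CPU_SPECIFIC rows from
// X86TargetParser.def: spelled name, tune name, feature list.
struct CpuSpecificEntry {
  std::string Name;
  std::string TuneName;
  std::string Features;
};

const std::vector<CpuSpecificEntry> &cpuSpecificEntries() {
  static const std::vector<CpuSpecificEntry> Table = {
      {"core_i7_sse4_2", "nehalem",
       "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt"},
      {"core_aes_pclmulqdq", "westmere",
       "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt"},
  };
  return Table;
}

// Hypothetical helper: every spelled name whose feature list equals Features.
// More than one result means the features cannot tell which CPU was meant.
std::vector<std::string> spelledNamesForFeatures(const std::string &Features) {
  std::vector<std::string> Out;
  for (const auto &E : cpuSpecificEntries())
    if (E.Features == Features)
      Out.push_back(E.Name);
  return Out;
}

// Hypothetical helper: tune name for the name as spelled; "" if unknown.
std::string tuneNameForSpelledCpu(const std::string &Name) {
  for (const auto &E : cpuSpecificEntries())
    if (E.Name == Name)
      return E.TuneName;
  return "";
}
```

Looking up by the spelled name keeps the distinction that a lookup by features throws away.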

Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
contains only Intel processors, but other vendors may wish to add their
own entries as 'alias'es (or with different feature lists!).

If this is not done, there are two potential performance issues with the
patch, but I believe the improvements in behavior and performance are
worth them.

1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus features),
AND there is an optimization opportunity for "B" that negatively affects
"A", the optimizer will likely take it.

2- In the event that the user spelled VendorI's processor, and the feature
list allows it to run on VendorA's processor with similar features, AND there
is an optimization opportunity for VendorI's processors that negatively
affects VendorA's, the optimizer will likely take it. This can be fixed by
adding an alias to X86TargetParser.def.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

erichkeane created this revision. (Mar 10 2022, 1:49 PM)

Herald added a project: Restricted Project. (Mar 10 2022, 1:49 PM)

Herald added a subscriber: pengfei.

@aaron.ballman : if you can add other reviewers or subscribers (particularly those from "VendorA") it would be greatly appreciated!

Harbormaster completed remote builds in B153640: Diff 414488. (Mar 10 2022, 2:17 PM)

This example illustrates the problem this patch intends to fix: https://godbolt.org/z/j445sxPMc

For Intel microarchitectures before Skylake, the LLVM cost model says that vector fsqrt is slow, so if fast-math is enabled, we'll use an approximation rather than the vsqrtps instruction when vectorizing a call to sqrtf(). If the code is compiled with -march=skylake or -mtune=skylake, we'll choose the vsqrtps instruction, but with any earlier base target, we'll choose the approximation even if there is a cpu_specific(skylake) implementation in the source code.

For example, given

#include <math.h>

float x[8], y[8];

__attribute__((cpu_specific(skylake))) void foo(void) {
  for (int i = 0; i < 8; ++i)
    x[i] = sqrtf(y[i]);
}

compiles to

foo.b:
        vmovaps ymm0, ymmword ptr [rip + y]
        vrsqrtps        ymm1, ymm0
        vmulps  ymm2, ymm0, ymm1
        vbroadcastss    ymm3, dword ptr [rip + .LCPI2_0] # ymm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0]
        vfmadd231ps     ymm3, ymm2, ymm1        # ymm3 = (ymm2 * ymm1) + ymm3
        vbroadcastss    ymm1, dword ptr [rip + .LCPI2_1] # ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
        vmulps  ymm1, ymm2, ymm1
        vmulps  ymm1, ymm1, ymm3
        vbroadcastss    ymm2, dword ptr [rip + .LCPI2_2] # ymm2 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
        vandps  ymm0, ymm0, ymm2
        vbroadcastss    ymm2, dword ptr [rip + .LCPI2_3] # ymm2 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
        vcmpleps        ymm0, ymm2, ymm0
        vandps  ymm0, ymm0, ymm1
        vmovaps ymmword ptr [rip + x], ymm0
        vzeroupper
        ret

but it should compile to

foo.b:
        vsqrtps ymm0, ymmword ptr [rip + y]
        vmovaps ymmword ptr [rip + x], ymm0
        vzeroupper
        ret
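The first listing computes sqrtf(y) from the hardware reciprocal-square-root estimate (vrsqrtps), refined by one Newton-Raphson step, and then forces the result to 0 for inputs whose magnitude is below FLT_MIN. A scalar C++ sketch of that arithmetic follows. The function names are hypothetical, and the hardware estimate is emulated by perturbing 1/sqrt(x) by a relative error of 2^-12, an assumption standing in for vrsqrtps rather than a bit-exact model of it.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical stand-in for the vrsqrtps estimate: exact 1/sqrt(x) with a
// relative error of 2^-12 applied (assumption, not the hardware's results).
float rsqrtEstimate(float X) {
  return (1.0f / std::sqrt(X)) * (1.0f + 1.0f / 4096.0f);
}

// Scalar version of the vectorized sequence in the first listing:
//   r   = rsqrt estimate of x
//   a   = x * r
//   t   = a * r - 3.0        (x*r*r - 3)
//   res = (a * -0.5) * t     (0.5*x*r*(3 - x*r*r): one Newton-Raphson step)
// Inputs with |x| < FLT_MIN (zero, denormals) or NaN yield 0, mirroring the
// vandps/vcmpleps masking in the listing.
float approxSqrt(float X) {
  if (!(FLT_MIN <= std::fabs(X)))
    return 0.0f;
  float R = rsqrtEstimate(X);
  float A = X * R;
  float T = A * R - 3.0f;
  return (A * -0.5f) * T;
}
```

With one refinement step, the relative error falls to roughly the square of the estimate's error, which is why fast-math accepts this sequence when the cost model rates vsqrtps as slow; on Skylake the single vsqrtps in the second listing is preferred.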

andrew.w.kaylor added inline comments. (Mar 10 2022, 3:21 PM)

clang/lib/CodeGen/CodeGenModule.cpp
2067	Unfortunately, I don't think it's this easy. The list of names used for cpu_specific doesn't come from the same place as the list of names used by "tune-cpu". For one thing, the cpu_specific names can't contain the '-' character, so we have names like "skylake_avx512" in cpu_specific that would need to be translated to "skylake-avx512" for "tune-cpu". I believe the list of valid names for "tune-cpu" comes from here: https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294 Also, some of the aliases supported by cpu_specific don't have any corresponding "tune-cpu" name. You happen to have picked one of these for the test. I believe "core_4th_gen_avx" should map to "haswell".
clang/test/CodeGen/attr-cpuspecific-avx-abi.c
28	As noted above, this isn't a valid setting for "tune-cpu". I think it would just be ignored.
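Andy's point can be checked with a quick sketch: a purely mechanical rewrite of `_` to `-` produces the right tune name for skylake_avx512, but not for names such as core_4th_gen_avx or pentium_4, whose tune names in the final X86TargetParser.def (haswell and pentium4) cannot be derived from the spelling. The helper below is hypothetical and exists only to illustrate this.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical naive translation: replace every '_' in a cpu_specific name
// with '-' and hope the result is a valid "tune-cpu" name.
std::string naiveTuneName(std::string Name) {
  std::replace(Name.begin(), Name.end(), '_', '-');
  return Name;
}
```

naiveTuneName("skylake_avx512") gives "skylake-avx512", but core_4th_gen_avx and pentium_4 come out as "core-4th-gen-avx" and "pentium-4", neither of which is a tune name, so an explicit per-entry mapping is needed.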

erichkeane added inline comments. (Mar 10 2022, 5:25 PM)

clang/lib/CodeGen/CodeGenModule.cpp
2067	Hmm... this is unfortunate. I wonder if we add some 'translation' type field to the X86TargetParser.def entries? Any idea who the right one to populate said list would be?
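The 'translation' field proposed here is what the landed patch adds as the TUNE_NAME argument of CPU_SPECIFIC. The .def file uses the include-based form of the X-macro technique; the sketch below uses a list macro instead so it fits in one file, and plain strcmp replaces llvm::StringSwitch so it builds without LLVM. CPU_SPECIFIC_ROWS, RETURN_TUNE_NAME and lookupTuneName are hypothetical names; the three rows are copied from the patched .def.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical single-file stand-in for three rows of X86TargetParser.def.
// Each row has the shape X(NAME, TUNE_NAME, MANGLING, FEATURES), like the
// patched CPU_SPECIFIC macro.
#define CPU_SPECIFIC_ROWS(X)                                                   \
  X("generic", "generic", 'A', "")                                             \
  X("pentium_pro", "pentiumpro", 'C', "+cmov")                                 \
  X("pentium_mmx", "pentium-mmx", 'D', "+mmx")

// Expands every row into a comparison that returns that row's TUNE_NAME.
// Names not in the table yield "" (no tune-cpu).
const char *lookupTuneName(const char *Name) {
#define RETURN_TUNE_NAME(NAME, TUNE_NAME, MANGLING, FEATURES)                  \
  if (std::strcmp(Name, NAME) == 0)                                            \
    return TUNE_NAME;
  CPU_SPECIFIC_ROWS(RETURN_TUNE_NAME)
#undef RETURN_TUNE_NAME
  return "";
}
```

Keeping the tune name in the same row as the mangling and features means one table drives validation, mangling, features and tuning.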

Typos in wiht different feature lists and In the even that.

clang/lib/CodeGen/CodeGenModule.cpp
2067	> I believe the list of valid names for "tune-cpu" comes from ... I think it's here https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Target/X86/X86.td#L1408 So, back to Andy's problem: where did we previously consume the cpu_specific names in the compiler, e.g., when mapping to different targets? Or is that done by external libraries like compiler-rt? I think I have the same requirement of mapping `-` and `_` for "tune-cpu" in https://github.com/llvm/llvm-project/issues/50125, where the preprocessor defines use `_` as well.

pengfei added reviewers: RKSimon, lebedev.ri. (Mar 11 2022, 12:43 AM)

Adding @arsenm because of this bit:

Note that the 'valid' list of processors for x86 is in llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list contains only Intel processors, but other vendors may wish to add their own entries as 'alias'es (or wiht different feature lists!).

Herald added a subscriber: wdng. (Mar 11 2022, 4:19 AM)

Thanks all! I'll do some work on populating a list of 'converted names', but I'll definitely need @pengfei and @andrew.w.kaylor help checking the list/filling in what I miss.

erichkeane edited the summary of this revision. (Mar 11 2022, 6:02 AM)

Add a 'translation' field to the x86 target so that we can get the 'tune cpu' name from the list. Note that there are 9 entries left blank because I was unable to figure out the corresponding name (I have an email out to @andrew.w.kaylor and @pengfei asking what they should be). In the meantime, these will result in NO tune-cpu.
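A minimal sketch of the "blank means no tune-cpu" behavior, assuming that, as with the `if (!TargetCPU.empty())` check visible in the CodeGenModule.cpp hunk, an empty tune name simply skips adding the attribute. The map standing in for the function's attribute list and the addCpuAttrs helper are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using AttrMap = std::map<std::string, std::string>;

// Adds "target-cpu" / "tune-cpu" only when the corresponding name is
// non-empty, so an entry whose tune name was left blank gets no "tune-cpu".
void addCpuAttrs(AttrMap &Attrs, const std::string &TargetCPU,
                 const std::string &TuneCPU) {
  if (!TargetCPU.empty())
    Attrs["target-cpu"] = TargetCPU;
  if (!TuneCPU.empty())
    Attrs["tune-cpu"] = TuneCPU;
}
```

Under this assumption a blank entry degrades to the pre-patch behavior rather than emitting an invalid tune name.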

Also note that I intentionally added this conversion to the 'alias' entries as well. This gives us the power to use an alias to change the 'tune' if we care to. Typically I'd consider this unimportant, but it means that the previously mentioned VendorA (@arsenm) could simply add their processors as aliases and get the tune feature more easily.
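In the X86.cpp hunk, getCPUSpecificTuneName switches on the name as written, while mangling and features first go through CPUSpecificCPUDispatchNameDealias. The C++ sketch below models that split without LLVM. The sandybridge and core_2nd_gen_avx rows come from the patched .def; the alias "vendor_a_cpu" and its tune name "vendor-a-tune" are hypothetical, standing in for the kind of entry another vendor might add.

```cpp
#include <cassert>
#include <string>

struct Alias { const char *NewName, *TuneName, *Name; };
struct Cpu { const char *Name, *TuneName; char Mangling; };

const Cpu Cpus[] = {{"sandybridge", "sandybridge", 'R'}};
const Alias Aliases[] = {
    {"core_2nd_gen_avx", "sandybridge", "sandybridge"},
    // Hypothetical vendor alias: sandybridge's features and mangling,
    // but its own tune name.
    {"vendor_a_cpu", "vendor-a-tune", "sandybridge"},
};

// Models CPUSpecificCPUDispatchNameDealias: alias -> base name.
std::string dealias(const std::string &Name) {
  for (const Alias &A : Aliases)
    if (Name == A.NewName)
      return A.Name;
  return Name;
}

// Mangling (like features) is looked up on the dealiased name...
char manglingFor(const std::string &Name) {
  std::string Base = dealias(Name);
  for (const Cpu &C : Cpus)
    if (Base == C.Name)
      return C.Mangling;
  return 0;
}

// ...but the tune name is looked up on the name as written, so an alias
// can pick its own tune target. Unknown names yield "".
std::string tuneFor(const std::string &Name) {
  for (const Alias &A : Aliases)
    if (Name == A.NewName)
      return A.TuneName;
  for (const Cpu &C : Cpus)
    if (Name == C.Name)
      return C.TuneName;
  return "";
}
```

Both aliases share sandybridge's mangling character, yet each reports its own tune name.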

Herald added a project: Restricted Project. (Mar 11 2022, 6:57 AM)

Herald added a subscriber: llvm-commits.

erichkeane added inline comments. (Mar 11 2022, 6:58 AM)

llvm/include/llvm/Support/X86TargetParser.def
230	Note the blanks on 230-232, 234-237, 245, and 248. Otherwise, a double-check would be really appreciated from everyone familiar with the x86 naming.

Harbormaster completed remote builds in B153765: Diff 414650. (Mar 11 2022, 7:34 AM)

Corrected the last few processor names thanks to @andrew.w.kaylor and @pengfei

Harbormaster completed remote builds in B153799: Diff 414699. (Mar 11 2022, 10:52 AM)

This looks good to me. Thanks for the patch!

This revision is now accepted and ready to land. (Mar 11 2022, 11:02 AM)

craig.topper added a subscriber: craig.topper. (Mar 11 2022, 11:13 AM)

craig.topper added inline comments.

llvm/include/llvm/Support/X86TargetParser.def
236	core_aes_pclmulqdq is westmere

Update the core_aes_pclmulqdq to be westmere

erichkeane marked 4 inline comments as done. (Mar 11 2022, 11:17 AM)

erichkeane added inline comments.

llvm/include/llvm/Support/X86TargetParser.def
236	Thanks!

LGTM, though I'm not qualified to review the CPU specific bits in the .def file.

Harbormaster completed remote builds in B153808: Diff 414709. (Mar 11 2022, 11:25 AM)

LGTM.

clang/lib/Basic/Targets/X86.cpp
1133	clang-format.
llvm/include/llvm/Support/X86TargetParser.def
236	Missed the left `"`?

This revision was landed with ongoing or failed builds. (Mar 14 2022, 6:14 AM)

Closed by commit rGdc152659b452: Have cpu-specific variants set 'tune-cpu' as an optimization hint (authored by erichkeane).

This revision was automatically updated to reflect the committed changes.

erichkeane marked an inline comment as done.

erichkeane added a commit: rGdc152659b452: Have cpu-specific variants set 'tune-cpu' as an optimization hint.

Herald added a project: Restricted Project. (Mar 14 2022, 6:14 AM)

FreddyYe added a subscriber: FreddyYe. (Jun 27 2023, 2:30 AM)

FreddyYe added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
2067	Replying to @andrew.w.kaylor's comment above ("Unfortunately, I don't think it's this easy. ..."): I happened to find this patch. I recently changed this code back to the initial version of this patch in https://reviews.llvm.org/D151696. To resolve the problem @andrew.w.kaylor mentioned here, I added these "unsupported" names to X86.td, as Phoebe mentioned in this thread. If you are interested, feel free to comment there.

Herald added a subscriber: StephenFan. (Jun 27 2023, 2:30 AM)

Revision Contents

Path                                              Size
clang/include/clang/Basic/TargetInfo.h            7 lines
clang/lib/Basic/Targets/X86.h                     2 lines
clang/lib/Basic/Targets/X86.cpp                   18 lines
clang/lib/CodeGen/CodeGenModule.cpp               7 lines
clang/test/CodeGen/attr-cpuspecific-avx-abi.c     2 lines
clang/test/CodeGen/attr-cpuspecific.c             3 lines
llvm/include/llvm/Support/X86TargetParser.def     72 lines

Diff 415077

clang/include/clang/Basic/TargetInfo.h

@@ ... @@
   }
 
   // Get the character to be added for mangling purposes for cpu_specific.
   virtual char CPUSpecificManglingCharacter(StringRef Name) const {
     llvm_unreachable(
         "cpu_specific Multiversioning not implemented on this target");
   }
 
+  // Get the value for the 'tune-cpu' flag for a cpu_specific variant with the
+  // programmer-specified 'Name'.
+  virtual StringRef getCPUSpecificTuneName(StringRef Name) const {
+    llvm_unreachable(
+        "cpu_specific Multiversioning not implemented on this target");
+  }
+
   // Get a list of the features that make up the CPU option for
   // cpu_specific/cpu_dispatch so that it can be passed to llvm as optimization
   // options.
   virtual void getCPUSpecificCPUDispatchFeatures(
       StringRef Name, llvm::SmallVectorImpl<StringRef> &Features) const {
     llvm_unreachable(
         "cpu_specific Multiversioning not implemented on this target");
   }

clang/lib/Basic/Targets/X86.h

@@ ... @@ public:
   bool validateCPUSpecificCPUDispatch(StringRef Name) const override;
 
   char CPUSpecificManglingCharacter(StringRef Name) const override;
 
   void getCPUSpecificCPUDispatchFeatures(
       StringRef Name,
       llvm::SmallVectorImpl<StringRef> &Features) const override;
 
+  StringRef getCPUSpecificTuneName(StringRef Name) const override;
+
   Optional<unsigned> getCPUCacheLineSize() const override;
 
   bool validateAsmConstraint(const char *&Name,
                              TargetInfo::ConstraintInfo &info) const override;
 
   bool validateGlobalRegisterVariable(StringRef RegName, unsigned RegSize,
                                       bool &HasSizeMismatch) const override {
     // esp and ebp are the only 32-bit registers the x86 backend can currently

clang/lib/Basic/Targets/X86.cpp

@@ ... @@ unsigned X86TargetInfo::multiVersionSortPriority(StringRef Name) const {
 
   // Now we know we have a feature, so get its priority and shift it a few so
   // that we have sufficient room for the CPUs (above).
   return getFeaturePriority(getFeature(Name)) << 1;
 }
 
 bool X86TargetInfo::validateCPUSpecificCPUDispatch(StringRef Name) const {
   return llvm::StringSwitch<bool>(Name)
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES) .Case(NAME, true)
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME) .Case(NEW_NAME, true)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES) .Case(NAME, true)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME) .Case(NEW_NAME, true)
 #include "llvm/Support/X86TargetParser.def"
       .Default(false);
 }
 
 static StringRef CPUSpecificCPUDispatchNameDealias(StringRef Name) {
   return llvm::StringSwitch<StringRef>(Name)
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME) .Case(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME) .Case(NEW_NAME, NAME)
 #include "llvm/Support/X86TargetParser.def"
       .Default(Name);
 }
 
 char X86TargetInfo::CPUSpecificManglingCharacter(StringRef Name) const {
   return llvm::StringSwitch<char>(CPUSpecificCPUDispatchNameDealias(Name))
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES) .Case(NAME, MANGLING)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES) .Case(NAME, MANGLING)
 #include "llvm/Support/X86TargetParser.def"
       .Default(0);
 }
 
 void X86TargetInfo::getCPUSpecificCPUDispatchFeatures(
     StringRef Name, llvm::SmallVectorImpl<StringRef> &Features) const {
   StringRef WholeList =
       llvm::StringSwitch<StringRef>(CPUSpecificCPUDispatchNameDealias(Name))
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES) .Case(NAME, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES) .Case(NAME, FEATURES)
 #include "llvm/Support/X86TargetParser.def"
           .Default("");
   WholeList.split(Features, ',', /*MaxSplit=*/-1, /*KeepEmpty=*/false);
 }
 
+StringRef X86TargetInfo::getCPUSpecificTuneName(StringRef Name) const {
+  return llvm::StringSwitch<StringRef>(Name)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES) .Case(NAME, TUNE_NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME) .Case(NEW_NAME, TUNE_NAME)
+#include "llvm/Support/X86TargetParser.def"
+      .Default("");
+}
+
 // We can't use a generic validation scheme for the cpus accepted here
 // versus subtarget cpus accepted in the target attribute because the
 // variables intitialized by the runtime only support the below currently
 // rather than the full range of cpus.
 bool X86TargetInfo::validateCpuIs(StringRef FeatureStr) const {
   return llvm::StringSwitch<bool>(FeatureStr)
 #define X86_VENDOR(ENUM, STRING) .Case(STRING, true)
 #define X86_CPU_TYPE_ALIAS(ENUM, ALIAS) .Case(ALIAS, true)

clang/lib/CodeGen/CodeGenModule.cpp

@@ ... @@ if (TD) {
           getTarget().isValidCPUName(ParsedAttr.Architecture)) {
         TargetCPU = ParsedAttr.Architecture;
         TuneCPU = ""; // Clear the tune CPU.
       }
       if (!ParsedAttr.Tune.empty() &&
           getTarget().isValidCPUName(ParsedAttr.Tune))
         TuneCPU = ParsedAttr.Tune;
     }
+
+    if (SD) {
+      // Apply the given CPU name as the 'tune-cpu' so that the optimizer can
+      // favor this processor.
+      TuneCPU = getTarget().getCPUSpecificTuneName(
+          SD->getCPUName(GD.getMultiVersionIndex())->getName());
+    }
   } else {
     // Otherwise just add the existing target cpu and target features to the
     // function.
     Features = getTarget().getTargetOpts().Features;
   }
 
   if (!TargetCPU.empty()) {
     Attrs.addAttribute("target-cpu", TargetCPU);

clang/test/CodeGen/attr-cpuspecific-avx-abi.c

@@ ... @@
 __m256d foo(void) { return bar_avx1(); }
 // CHECK: define{{.*}} @foo.A() #[[A:[0-9]+]]
 
 __attribute__((cpu_specific(core_4th_gen_avx)))
 __m256d foo(void) { return bar_avx2(); }
 // CHECK: define{{.*}} @foo.V() #[[V:[0-9]+]]
 
 // CHECK: attributes #[[A]] = {{.*}}"target-features"="+avx,+crc32,+cx8,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="generic"
 // CHECK: attributes #[[V]] = {{.*}}"target-features"="+avx,+avx2,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="haswell"

clang/test/CodeGen/attr-cpuspecific.c

@@ ... @@
 // WINDOWS: define dso_local i32 @DispatchFirst.B
 // WINDOWS: ret i32 1
 
 ATTR(cpu_specific(knl))
 void OrderDispatchUsageSpecific(void) {}
 
 // CHECK: attributes #[[S]] = {{.*}}"target-features"="+avx,+cmov,+crc32,+cx8,+f16c,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="ivybridge"
 // CHECK: attributes #[[K]] = {{.*}}"target-features"="+adx,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="knl"
 // CHECK: attributes #[[O]] = {{.*}}"target-features"="+cmov,+cx8,+mmx,+movbe,+sse,+sse2,+sse3,+ssse3,+x87"
+// CHECK-SAME: "tune-cpu"="atom"

llvm/include/llvm/Support/X86TargetParser.def

@@ ... @@
 X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches")
 X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls")
 X86_FEATURE (LVI_CFI, "lvi-cfi")
 X86_FEATURE (LVI_LOAD_HARDENING, "lvi-load-hardening")
 #undef X86_FEATURE_COMPAT
 #undef X86_FEATURE
 
 #ifndef CPU_SPECIFIC
-#define CPU_SPECIFIC(NAME, MANGLING, FEATURES)
+#define CPU_SPECIFIC(NAME, TUNE_NAME, MANGLING, FEATURES)
 #endif
 
 #ifndef CPU_SPECIFIC_ALIAS
-#define CPU_SPECIFIC_ALIAS(NEW_NAME, NAME)
+#define CPU_SPECIFIC_ALIAS(NEW_NAME, TUNE_NAME, NAME)
 #endif
 
-CPU_SPECIFIC("generic", 'A', "")
-CPU_SPECIFIC("pentium", 'B', "")
-CPU_SPECIFIC("pentium_pro", 'C', "+cmov")
-CPU_SPECIFIC("pentium_mmx", 'D', "+mmx")
-CPU_SPECIFIC("pentium_ii", 'E', "+cmov,+mmx")
-CPU_SPECIFIC("pentium_iii", 'H', "+cmov,+mmx,+sse")
-CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium_iii")
-CPU_SPECIFIC("pentium_4", 'J', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_m", 'K', "+cmov,+mmx,+sse,+sse2")
-CPU_SPECIFIC("pentium_4_sse3", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
-CPU_SPECIFIC("core_2_duo_ssse3", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
-CPU_SPECIFIC("core_2_duo_sse4_1", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
-CPU_SPECIFIC("atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
-CPU_SPECIFIC("atom_sse4_2", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_i7_sse4_2", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("core_aes_pclmulqdq", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
-CPU_SPECIFIC("atom_sse4_2_movbe", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
-CPU_SPECIFIC("sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
-CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge")
-CPU_SPECIFIC("ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
-CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge")
-CPU_SPECIFIC("haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell")
-CPU_SPECIFIC("core_4th_gen_avx_tsx", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
-CPU_SPECIFIC("broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell")
-CPU_SPECIFIC("core_5th_gen_avx_tsx", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
-CPU_SPECIFIC("knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
-CPU_SPECIFIC_ALIAS("mic_avx512", "knl")
-CPU_SPECIFIC("skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
-CPU_SPECIFIC( "skylake_avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
-CPU_SPECIFIC("cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
-CPU_SPECIFIC("knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
+CPU_SPECIFIC("generic", "generic", 'A', "")
+CPU_SPECIFIC("pentium", "pentium", 'B', "")
+CPU_SPECIFIC("pentium_pro", "pentiumpro", 'C', "+cmov")
+CPU_SPECIFIC("pentium_mmx", "pentium-mmx", 'D', "+mmx")
+CPU_SPECIFIC("pentium_ii", "pentium2", 'E', "+cmov,+mmx")
+CPU_SPECIFIC("pentium_iii", "pentium3", 'H', "+cmov,+mmx,+sse")
+CPU_SPECIFIC_ALIAS("pentium_iii_no_xmm_regs", "pentium3", "pentium_iii")
+CPU_SPECIFIC("pentium_4", "pentium4", 'J', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_m", "pentium-m", 'K', "+cmov,+mmx,+sse,+sse2")
+CPU_SPECIFIC("pentium_4_sse3", "prescott", 'L', "+cmov,+mmx,+sse,+sse2,+sse3")
+CPU_SPECIFIC("core_2_duo_ssse3", "core2", 'M', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3")
+CPU_SPECIFIC("core_2_duo_sse4_1", "penryn", 'N', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1")
+CPU_SPECIFIC("atom", "atom", 'O', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+movbe")
+CPU_SPECIFIC("atom_sse4_2", "silvermont", 'c', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", "westmere", 'Q', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont", 'd', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
+CPU_SPECIFIC("goldmont", "goldmont", 'i', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")
+CPU_SPECIFIC("sandybridge", "sandybridge", 'R', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+avx")
+CPU_SPECIFIC_ALIAS("core_2nd_gen_avx", "sandybridge", "sandybridge")
+CPU_SPECIFIC("ivybridge", "ivybridge", 'S', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt,+f16c,+avx")
+CPU_SPECIFIC_ALIAS("core_3rd_gen_avx", "ivybridge", "ivybridge")
+CPU_SPECIFIC("haswell", "haswell", 'V', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
+CPU_SPECIFIC_ALIAS("core_4th_gen_avx", "haswell", "haswell")
+CPU_SPECIFIC("core_4th_gen_avx_tsx", "haswell", 'W', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2")
+CPU_SPECIFIC("broadwell", "broadwell", 'X', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
+CPU_SPECIFIC_ALIAS("core_5th_gen_avx", "broadwell", "broadwell")
+CPU_SPECIFIC("core_5th_gen_avx_tsx", "broadwell", 'Y', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx")
+CPU_SPECIFIC("knl", "knl", 'Z', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd")
+CPU_SPECIFIC_ALIAS("mic_avx512", "knl", "knl")
+CPU_SPECIFIC("skylake", "skylake", 'b', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+adx,+mpx")
+CPU_SPECIFIC( "skylake_avx512", "skylake-avx512", 'a', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512cd,+avx512bw,+avx512vl,+clwb")
+CPU_SPECIFIC("cannonlake", "cannonlake", 'e', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512dq,+avx512f,+adx,+avx512ifma,+avx512cd,+avx512bw,+avx512vl,+avx512vbmi")
+CPU_SPECIFIC("knm", "knm", 'j', "+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt,+f16c,+avx,+fma,+bmi,+lzcnt,+avx2,+avx512f,+adx,+avx512er,+avx512pf,+avx512cd,+avx5124fmaps,+avx5124vnniw,+avx512vpopcntdq")
 
 #undef CPU_SPECIFIC_ALIAS
 #undef CPU_SPECIFIC