This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
include/clang/Basic/
-
clang/
-
Basic/
4/4
TargetInfo.h
-
lib/Basic/Targets/
-
Basic/
-
Targets/
-
X86.h
17/25
X86.cpp

Differential D74918

Add method to TargetInfo to get CPU cache line size
ClosedPublic

Authored by zoecarver on Feb 20 2020, 11:23 AM.

Download Raw Diff

Details

Reviewers

jfb
EricWF
efriedma
jyknight
echristo
__simt__

Commits

rGb915aec6b591: Add method to TargetInfo to get CPU cache line size

Summary

This patch adds a virtual method getCPUCacheLineSize() to TargetInfo. Currently, I've only (partially) implemented the method in X86TargetInfo. It's extremely important that each CPU's cache line size correct (e.g., we can't just define it as 64 across the board) so, it has been a little slow getting to this point. There are still quite a few CPUs I haven't been able to find the cache line size of yet; for those, I'm returning zero so that the caller of this method can propagate an error. See the commented table above X86TargetInfo::getCPUCacheLineSize to check my sources for each CPU.

I'll work on the ARM CPUs next, but that will probably come later in a different patch.

Also, I updated the current uses of cache line sizes in the compiler to use this API when possible. The only one (that I could find) that I didn't update is in TargetTransformInfo. Updating that would require a more significant API change, which would be out of scope for this patch. It would be nice if that also used this API (to keep everything in one place), so I'll try to update that too at some point.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zoecarver created this revision.Feb 20 2020, 11:23 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2020, 11:23 AM

Herald added subscribers: cfe-commits, kristof.beyls. · View Herald Transcript

Harbormaster completed remote builds in B46935: Diff 245700.Feb 20 2020, 11:24 AM

zoecarver added reviewers: jfb, EricWF, efriedma, jyknight, echristo.Feb 20 2020, 11:25 AM

Herald added a subscriber: dexonsmith. · View Herald TranscriptFeb 20 2020, 11:25 AM

zoecarver mentioned this in D66822: Hardware cache line size builtins.Feb 20 2020, 11:26 AM

zoecarver added a reviewer: __simt__.

No proofs, but for completeness, some amd numbers

clang/lib/Basic/Targets/X86.cpp
1811–1814	K6 should have 32B cache line size
1815–1832	K7..ZnVer2 should have 64B
1833–1834	32B for this one

@lebedev.ri thanks! I'll add those.

Add AMD cache line sizes based on @lebedev.ri's comments

Harbormaster completed remote builds in B46939: Diff 245708.Feb 20 2020, 12:01 PM

Where are you planning to use this?

clang/lib/Sema/SemaStmt.cpp
2817 ↗	(On Diff #245708)	"64" here is arbitrary; it's not actually supposed to be the cache line size for any particular CPU, just something generally expensive. I don't think this warning should depend on -mcpu.

As I've mentioned before, depending on what this will be used for, "64" is not a _useful_ answer if you want to know how your memory accesses will behave on modern intel x86 CPUs, despite being the "correct" answer for cache-line size. But, modern intel CPUs fetch aligned-pairs of cache-lines together. Therefore, you'll start to see cache interference effects as-if the cache-lines were 128 bytes, rather than 64 bytes. (This may or may not also apply to AMD cpus, I haven't looked into it.)

jfb added inline comments.Feb 20 2020, 1:56 PM

clang/include/clang/Basic/TargetInfo.h
1211	Return an optional instead of using zero to mean "unknown"?
clang/lib/Basic/Targets/X86.cpp
1840	All of the above (except generic) should be 64.

craig.topper added a subscriber: craig.topper.Feb 20 2020, 3:36 PM

craig.topper added inline comments.

clang/lib/Basic/Targets/X86.cpp
1738	I'd be surprised if 386 is larger than 486 or 586.
1739	doubleword is 32-bits on X86. So 4 double words is 16 bytes I think?
1833	Nehalem, coooperlake, cannonlake and tigerlake are all 64.

craig.topper added inline comments.Feb 20 2020, 3:52 PM

clang/lib/Basic/Targets/X86.cpp
1834	I think Lakemont is 16 bytes. Assuming I'm interpretting the CLFLUSH line size from this CPUID dump correctly https://github.com/InstLatx64/InstLatx64/blob/master/GenuineIntel/GenuineIntel0000590_Lakemont_CPUID2.txt

To address the question, what will this be used for: this could be used for several things in the library and compiler, but the primary use will be to create a builtin that can be used to implement P0154.

@jyknight It would probably be best if we could account for CPUs who like to use aligned pairs when getting a cache line. It's probably also important that we don't change the value getCPUCacheLineSize returns, so, if we are going to account for that, we should probably update getCPUCacheLineSize before this patch lands.

clang/include/clang/Basic/TargetInfo.h
1211	Good idea, will do.
clang/lib/Basic/Targets/X86.cpp
1738	Yes, it does seem surprising but this other source corroborates it https://osxbook.com/blog/2009/03/02/retrieving-x86-processor-information/
1739	Yes, you're right! My bad. This PDF (page 29) also agrees with 16 bytes. I'll update the table.
1833	Great, I'll update it. Thanks.
1834	I may very well be interpreting this incorrectly but I think that the cache line sizes are at `eax = 0x80000005` and `eax = 0x80000006`. However, all the registers at those codes are zero. If I instead look at `eax = 0x00000001` (which I think is incorrect), `ecx = 00010200` which would be 128 bytes (maybe this is where the 16 came from, if they were bits instead?). How were you interpreting it?
1840	Good to know, I'll update it :)
clang/lib/Sema/SemaStmt.cpp
2817 ↗	(On Diff #245708)	This will fall back to `64` as a default if the cache line size isn't known. I can revert this change if you want, though. It's not central to the patch.

craig.topper added inline comments.Feb 20 2020, 10:28 PM

clang/lib/Basic/Targets/X86.cpp
1738	-march=i386 there seem just seems to be a compiler switch. The 386 CPU didn't have CPUID, that was added on the 486. The dump there seems to be a dump from Nehalem. They just compiled the code targeting a really old CPU for some reason. Thinking about it more, the 386 is a bit weird because the cache wasn't part of the CPU, it was on the motherboard and not always present. Not sure what that means for cache line size. I have an ancient 386 hardware manual in my cube at work, maybe I can find something in there.
1834	Interpreted bits 15:8 of the ecx = 00010200 to be the clflush information. The spec for cpuid says to multiply by 8 to get cache line size. So 15:8 is 2, multiply by 8 is 16 bytes. I think Lakemont is based on the 486 architecture so that seems possible.

zoecarver marked an inline comment as done.Feb 21 2020, 9:29 AM

zoecarver added inline comments.

clang/lib/Basic/Targets/X86.cpp
1834	I see. That looks correct. I'll update it. Still, I do find it strange that all registers at `eax = 0x80000006` are zeroed out. Using the `cpuid` instruction on my cpu I get the following: CPUID 0: 13 71 110 105 CPUID 1: 97 0 191 255 CPUID 2: 1 255 0 0 CPUID 80000000: 8 0 0 0 CPUID 80000001: 0 0 33 0 CPUID 80000002: 73 108 32 101 CPUID 80000003: 41 45 48 67 CPUID 80000004: 64 50 122 0 CPUID 80000005: 0 0 0 0 CPUID 80000006: 0 0 64 0 CPUID 80000007: 0 0 0 0 CPUID 80000008: 39 0 0 0 I know my L2 cache line size is 64 and the L2 cache line size should be `ecx` at `eax = 0x80000006` which would be `64`. But, when I read the dump you linked to, it doesn't look like there's any data at `0x80000006`. Anyway, as I said above, I'll look past this and update it based on the clflush information. That seems valid.

Update values returned by getCPUCacheLineSize based on review comments

zoecarver marked an inline comment as done.Feb 21 2020, 9:43 AM

Update getCPUCacheLineSize to return optional

Harbormaster completed remote builds in B47017: Diff 245896.Feb 21 2020, 9:46 AM

Harbormaster completed remote builds in B47018: Diff 245897.

craig.topper added inline comments.Feb 21 2020, 10:26 AM

clang/lib/Basic/Targets/X86.cpp
1834	Lakemont is a modernized version of a 486 which probably had none of the 80000000 leafs originally. My speculation is that they really wanted to support leaf 80000008 so they just put zeros in the rest rather than add a lot of microcode.

zoecarver marked an inline comment as done.Feb 21 2020, 10:34 AM

zoecarver added inline comments.

clang/lib/Basic/Targets/X86.cpp
1834	Alrighty, that makes sense. Thanks :)

Friendly ping. Any other comments on this?

jfb accepted this revision.Feb 27 2020, 1:09 PM

This revision is now accepted and ready to land.Feb 27 2020, 1:09 PM

craig.topper added inline comments.Feb 27 2020, 1:39 PM

clang/lib/Basic/Targets/X86.cpp
1786	I found the documentation for the 82385 cache controller chip for the 386. It's a bit weird. The tags for the cache are based on 16 byte lines, but there are valid bits for every 2 bytes within that line. So its possible for only a portion of a line to be valid.
1840	If Yonah and Penryn are 64, then Core2 should be as well. It's the generation between them.

In D74918#1885869, @zoecarver wrote:

@jyknight It would probably be best if we could account for CPUs who like to use aligned pairs when getting a cache line. It's probably also important that we don't change the value getCPUCacheLineSize returns, so, if we are going to account for that, we should probably update getCPUCacheLineSize before this patch lands.

It would be extremely unfortunate to go to all the trouble of attempting to return accurate results from the P0154 interfaces, and then to return an answer insufficient to actually achieve the performance benefit it's intended for, and then be unable to fix it due to ABI concerns. So, yes, I do believe that this issue must be decided.

Additionally, my opinion here has really not changed from a couple of years ago. I continue to believe it was a mistake to standardize these constexpr values, and that absolutely the best course of action would be to continue to decline to implement this part of the standard, forever. (And that GCC should similarly also continue to decline to implement it).

That said, the list of cacheline sizes collected in this review is still useful in any case, since it needs to be copied into LLVM's X86Subtarget to implement the backend getCacheLineSize function.

This revision now requires changes to proceed.Feb 27 2020, 1:59 PM

(I assume I'm not seeing a code review being used to veto a C++ Standard feature, but actually the other points are the reason for the red flag.)

I can see a desire for hyper-precise definitions to achieve the best possible performance, but we really need a healthy does of conservatism here.

The C++ values can't change as a result of selecting between microarchitecture variations that are supposed to link, it takes an ABI break to change these.

We specified two numbers here so we could conservatively set them (e.g. constructive: smallest; destructive: largest) if we want, but then they are fixed.

I think there's just two plausible answers for x86_64:

constructive=64, destructive=64 (the only plausible answer for X86 classic edition)
constructive=64, destructive=128 (reserve some room for line-pair designs)

Recapping: precision is nice but we need to fix this in the ABI so ultimate precision isn't required nor desirable; we should choose one of these pairs (or debate if another pair is viable that I'm not thinking of); then we should ship C++17.

clang/lib/Basic/Targets/X86.cpp
1748	Broadwell.

Here's what I think we should do: continue to have this method just return the cache line size. Then have another method that returns true or false for whether the given architecture supports aligned pairs of cache lines then, users of this (either in clang or libc++) can decide what they want to do.

Edit: I'm not tied to this approach, though. I'm happy to update this patch instead if that's what you (all) think is best.

In D74918#1896930, @__simt__ wrote:

(I assume I'm not seeing a code review being used to veto a C++ Standard feature, but actually the other points are the reason for the red flag.)

I can see a desire for hyper-precise definitions to achieve the best possible performance, but we really need a healthy does of conservatism here.

The C++ values can't change as a result of selecting between microarchitecture variations that are supposed to link, it takes an ABI break to change these.

If these values are part of the C++ target platform ABI, it seems to me the values for std::hardware_{constructive,destructive}_interference_size should be set by whoever has the authority to decide C++ platform ABI for specific platforms.
Assuming my thought in the previous sentence is correct; discussions on which values to chose for std::hardware_{constructive,destructive}_interference_size should happen in whichever forums decide C++ platform ABI for the various platforms? (Maybe for some platforms that forum might be clang-related fora like reviews.llvm.org, but probably not for all platforms).
With my (probably limited) understanding of the requirements, it seems like deriving std::hardware_{constructive,destructive}_interference_size from actual cache line size on a specific micro-architecture doesn't seem to be the right approach?

We specified two numbers here so we could conservatively set them (e.g. constructive: smallest; destructive: largest) if we want, but then they are fixed.

I think there's just two plausible answers for x86_64:

constructive=64, destructive=64 (the only plausible answer for X86 classic edition)

constructive=64, destructive=128 (reserve some room for line-pair designs)

Recapping: precision is nice but we need to fix this in the ABI so ultimate precision isn't required nor desirable; we should choose one of these pairs (or debate if another pair is viable that I'm not thinking of); then we should ship C++17.

In D74918#1897636, @kristof.beyls wrote:

If these values are part of the C++ target platform ABI, it seems to me the values for std::hardware_{constructive,destructive}_interference_size should be set by whoever has the authority to decide C++ platform ABI for specific platforms.
Assuming my thought in the previous sentence is correct; discussions on which values to chose for std::hardware_{constructive,destructive}_interference_size should happen in whichever forums decide C++ platform ABI for the various platforms? (Maybe for some platforms that forum might be clang-related fora like reviews.llvm.org, but probably not for all platforms).
With my (probably limited) understanding of the requirements, it seems like deriving std::hardware_{constructive,destructive}_interference_size from actual cache line size on a specific micro-architecture doesn't seem to be the right approach?

They will be in the library ABI, meaning the libc++ ABI.

It's valid for libstdc++ and libc++ to have different values there. If we wish, we could try for an alignment (no pun intended) on these values, but even then that's just between these two libraries.

Which is good and encouraging, because I don't know what forum we would have to go to.

In D74918#1898191, @__simt__ wrote:

In D74918#1897636, @kristof.beyls wrote:

If these values are part of the C++ target platform ABI, it seems to me the values for std::hardware_{constructive,destructive}_interference_size should be set by whoever has the authority to decide C++ platform ABI for specific platforms.
Assuming my thought in the previous sentence is correct; discussions on which values to chose for std::hardware_{constructive,destructive}_interference_size should happen in whichever forums decide C++ platform ABI for the various platforms? (Maybe for some platforms that forum might be clang-related fora like reviews.llvm.org, but probably not for all platforms).
With my (probably limited) understanding of the requirements, it seems like deriving std::hardware_{constructive,destructive}_interference_size from actual cache line size on a specific micro-architecture doesn't seem to be the right approach?

They will be in the library ABI, meaning the libc++ ABI.

It's valid for libstdc++ and libc++ to have different values there. If we wish, we could try for an alignment (no pun intended) on these values, but even then that's just between these two libraries.

Which is good and encouraging, because I don't know what forum we would have to go to.

I see.
So IIUC, this is library C++ ABI, to be defined by the C++ library. Since no 2 C++ libraries can co-exist in a single application, there is no need for different C++ libraries to agree?

I think there is still an issue then with getting the values of std::hardware_{constructive,destructive}_interference_size in the library implementation derived from compiler builtins.
There are quite a few systems where clang supports targeting a different C++ library than libc++, e.g. libstdc++ on linux or (IIUC) the MSVC C++ library on Windows.
If these C++ libraries implement std::hardware_{constructive,destructive}_interference_size based on a value returned by a compiler builtin, and the different compilers that are used with these libraries return different values for such a builtin, then the library C++ ABI here will be dependent on which compiler used?
Doesn't this indicate that either:

std::hardware_{constructive,destructive}_interference_size should not be implemented using a compiler builtin, or
all compilers must return the same value for the builtin; for all targets they support.

Overall, that makes me doubt that using a compiler builtin to implement std::hardware_{constructive,destructive}_interference_size is the right direction.
If the functionality in this patch does not need to support implementing std::hardware_{constructive,destructive}_interference_size, then the design constraints for this patch change?

There are a lot of different ways we could implement the feature. We may want to only enable it when -march=native, or maybe only in the unstable ABI, and maybe we want to support aligned pairs on some architectures. I think that's an important discussion to have but I'm not sure _this_ patch is the best place to have that discussion.

Even if we don't use this patch in the implementation I think it would still be a good utility to have. Here's what I suggest: I commit this, create another patch to add a builtin that exposes this API, and then open a libc++ patch with a _possible_ implementation. In that patch, we can discuss how we should actually implement the feature and after we have a consensus I can do the work to implement it. Any objections to that plan?

Herald added a subscriber: danielkiss. · View Herald TranscriptMar 14 2020, 2:31 PM

In D74918#1923151, @zoecarver wrote:

There are a lot of different ways we could implement the feature. We may want to only enable it when -march=native, or maybe only in the unstable ABI, and maybe we want to support aligned pairs on some architectures. I think that's an important discussion to have but I'm not sure _this_ patch is the best place to have that discussion.

Even if we don't use this patch in the implementation I think it would still be a good utility to have. Here's what I suggest: I commit this, create another patch to add a builtin that exposes this API, and then open a libc++ patch with a _possible_ implementation. In that patch, we can discuss how we should actually implement the feature and after we have a consensus I can do the work to implement it. Any objections to that plan?

Discussing the implementation strategy for std::hardware_{constructive,destructive}_interference_size on a libc++ thread rather than here makes sense.
I'm afraid I don't have a good view on all the ways the API and associated intrinsic proposed here will or could be used in practice.
My only thought on it is that we cannot guarantee that different versions of LLVM will keep on reporting the same number, even for identical targets.

@kristof.beyls I'll try to bring that up in my libc++ patch, thanks.

I'm planning on committing this today unless anyone still has concerns. @jyknight is this path OK with you?

This seems misplaced - why is this in clang and not LLVM?

clang/include/clang/Basic/TargetInfo.h
1210	Comment is no longer valid - returns `None` instead. Also, might it be worth explicitly calling out that there is zero guarantees of stability of the returned values?

This patch as it stands is harmless, since as it only defines an internal interface, which is unused. So in that sense, it's perfectly fine to commit even with the remaining unresolved questions about the correct values to return. However, unless we're going to actually use it, adding this code to clang is not useful, for the same reason of it only defining an internal interface which is unused.

Ultimately, I don't see a reason to commit this, until/unless we are going to commit code in Clang using it (which I continue to believe we should not do). So I'd say leaving this in a pending state, until a use is going to be committed immediately afterwards, seems best.

@lebedev.ri LLVM may be better, I'm not sure. If you feel strongly I can move it.

@jyknight I'm planning on adding a builtin that uses this method. I can put the patch up for that first if you would like.

clang/include/clang/Basic/TargetInfo.h
1210	Good catch. I think we should try to have as much stability as possible but I can add a note that these values may change.
clang/lib/Basic/Targets/X86.cpp
1786	Interesting. Does this mean that we should be returning a different value?
1840	I'll update it. Thanks.

In D74918#1931488, @zoecarver wrote:

@lebedev.ri LLVM may be better, I'm not sure. If you feel strongly I can move it.

I do think that it makes much more sense somewhere closer to the backend

@jyknight I'm planning on adding a builtin that uses this method. I can put the patch up for that first if you would like.

but that being said, currently nothing needs that and with all those concerns about exposing this info,
i'd hold off moving it until it is settled whether it is needed (and not a dead code) in the first place :/

zoecarver mentioned this in D76525: Expose cache line size in __builtin_get_cpu_cache_line_size.Mar 20 2020, 2:00 PM

In D76525 I've added a builtin that uses this API. We can use that builtin in libc++.

I've made the suggested changes. Is there a consensus that this should be moved to LLVM? I think it's fairly clang-specific (given the use case). We can also always move it to LLVM if the need arises in the future.

Fix based on review:

update comments
return 64 for Core2

Harbormaster completed remote builds in B49957: Diff 251758.Mar 20 2020, 2:41 PM

craig.topper added inline comments.Mar 20 2020, 11:58 PM

clang/lib/Basic/Targets/X86.cpp
1786	I think we should make 386 return 16.

zoecarver marked an inline comment as done.Mar 21 2020, 9:08 AM

zoecarver added inline comments.

clang/lib/Basic/Targets/X86.cpp
1786	Alright. I'll update it.

@lebedev.ri Where should I put this? Could you maybe point me to another LLVM target API that clang uses that I could base this off? Maybe it should go inside the triple or just be a static function?

Rebase off master
For i386, return 16

In D74918#1937237, @zoecarver wrote:

@lebedev.ri Where should I put this? Could you maybe point me to another LLVM target API that clang uses that I could base this off? Maybe it should go inside the triple or just be a static function?

I would have guessed TargetLoweringInfo/TargetTransformInfo, but i'm not sure if we can access it here without violating layering.

Harbormaster failed remote builds in B50133: Diff 252079!Mar 23 2020, 10:22 AM

In D74918#1937291, @lebedev.ri wrote:

In D74918#1937237, @zoecarver wrote:

@lebedev.ri Where should I put this? Could you maybe point me to another LLVM target API that clang uses that I could base this off? Maybe it should go inside the triple or just be a static function?

I would have guessed TargetLoweringInfo/TargetTransformInfo, but i'm not sure if we can access it here without violating layering.

I don't think can be done unless the proposed builtin for this lowered to an llvm intrinsic and was handled completely in llvm CodeGen.

@craig.topper that would mean that it couldn't be constexpr though, right?

I don't know enough about clang and llvm to know what is both possible and preferable. @craig.topper @lebedev.ri what is the best course of action here?

Like i said in previous comments, this can be moved later, that is not a blocker.
Ship it first :)

@lebedev.ri my misunderstanding. I'll commit :)

This revision was not accepted when it landed; it landed in state Needs Review.Mar 25 2020, 10:16 AM

Closed by commit rGb915aec6b591: Add method to TargetInfo to get CPU cache line size (authored by zoecarver). · Explain Why

This revision was automatically updated to reflect the committed changes.

MaskRay mentioned this in rG14b466b940c6: [X86] Fix a typo of Broadwell after D74918. NFC.Jul 23 2023, 3:15 PM

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

TargetInfo.h

4 lines

lib/

Basic/

Targets/

X86.h

2 lines

X86.cpp

113 lines

Diff 252611

clang/include/clang/Basic/TargetInfo.h

Show First 20 Lines • Show All 1,200 Lines • ▼ Show 20 Lines	public:
// cpu_specific/cpu_dispatch so that it can be passed to llvm as optimization		// cpu_specific/cpu_dispatch so that it can be passed to llvm as optimization
// options.		// options.
virtual void getCPUSpecificCPUDispatchFeatures(		virtual void getCPUSpecificCPUDispatchFeatures(
StringRef Name, llvm::SmallVectorImpl<StringRef> &Features) const {		StringRef Name, llvm::SmallVectorImpl<StringRef> &Features) const {
llvm_unreachable(		llvm_unreachable(
"cpu_specific Multiversioning not implemented on this target");		"cpu_specific Multiversioning not implemented on this target");
}		}

		// Get the cache line size of a given cpu. This method switches over
		// the given cpu and returns "None" if the CPU is not found.
		lebedev.riUnsubmitted Done Reply Inline Actions Comment is no longer valid - returns `None` instead. Also, might it be worth explicitly calling out that there is zero guarantees of stability of the returned values? lebedev.ri: Comment is no longer valid - returns `None` instead. Also, might it be worth explicitly calling…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Good catch. I think we should try to have as much stability as possible but I can add a note that these values may change. zoecarver: Good catch. I think we should try to have as much stability as possible but I can add a note…
		virtual Optional<unsigned> getCPUCacheLineSize() const { return None; }
		jfbUnsubmitted Done Reply Inline Actions Return an optional instead of using zero to mean "unknown"? jfb: Return an optional instead of using zero to mean "unknown"?
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Good idea, will do. zoecarver: Good idea, will do.

// Returns maximal number of args passed in registers.		// Returns maximal number of args passed in registers.
unsigned getRegParmMax() const {		unsigned getRegParmMax() const {
assert(RegParmMax < 7 && "RegParmMax value is larger than AST can handle");		assert(RegParmMax < 7 && "RegParmMax value is larger than AST can handle");
return RegParmMax;		return RegParmMax;
}		}

/// Whether the target supports thread-local storage.		/// Whether the target supports thread-local storage.
bool isTLSSupported() const {		bool isTLSSupported() const {
▲ Show 20 Lines • Show All 212 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/X86.h

Show First 20 Lines • Show All 176 Lines • ▼ Show 20 Lines	public:
bool validateCPUSpecificCPUDispatch(StringRef Name) const override;		bool validateCPUSpecificCPUDispatch(StringRef Name) const override;

char CPUSpecificManglingCharacter(StringRef Name) const override;		char CPUSpecificManglingCharacter(StringRef Name) const override;

void getCPUSpecificCPUDispatchFeatures(		void getCPUSpecificCPUDispatchFeatures(
StringRef Name,		StringRef Name,
llvm::SmallVectorImpl<StringRef> &Features) const override;		llvm::SmallVectorImpl<StringRef> &Features) const override;

		Optional<unsigned> getCPUCacheLineSize() const override;

bool validateAsmConstraint(const char *&Name,		bool validateAsmConstraint(const char *&Name,
TargetInfo::ConstraintInfo &info) const override;		TargetInfo::ConstraintInfo &info) const override;

bool validateGlobalRegisterVariable(StringRef RegName, unsigned RegSize,		bool validateGlobalRegisterVariable(StringRef RegName, unsigned RegSize,
bool &HasSizeMismatch) const override {		bool &HasSizeMismatch) const override {
// esp and ebp are the only 32-bit registers the x86 backend can currently		// esp and ebp are the only 32-bit registers the x86 backend can currently
// handle.		// handle.
if (RegName.equals("esp") \|\| RegName.equals("ebp")) {		if (RegName.equals("esp") \|\| RegName.equals("ebp")) {
▲ Show 20 Lines • Show All 706 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/X86.cpp

Show First 20 Lines • Show All 1,725 Lines • ▼ Show 20 Lines	if (auto Len = matchAsmCCConstraint(Name)) {
Name += Len - 1;		Name += Len - 1;
Info.setAllowsRegister();		Info.setAllowsRegister();
return true;		return true;
}		}
return false;		return false;
}		}
}		}

		// Below is based on the following information:
		// +------------------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
		// \| Processor Name \| Cache Line Size (Bytes) \| Source \|
		// +------------------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
		// \| i386 \| 64 \| https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf \|
		craig.topperUnsubmitted Not Done Reply Inline Actions I'd be surprised if 386 is larger than 486 or 586. craig.topper: I'd be surprised if 386 is larger than 486 or 586.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Yes, it does seem surprising but this other source corroborates it https://osxbook.com/blog/2009/03/02/retrieving-x86-processor-information/ zoecarver: Yes, it does seem surprising but this other source corroborates it https://osxbook.
		craig.topperUnsubmitted Not Done Reply Inline Actions -march=i386 there seem just seems to be a compiler switch. The 386 CPU didn't have CPUID, that was added on the 486. The dump there seems to be a dump from Nehalem. They just compiled the code targeting a really old CPU for some reason. Thinking about it more, the 386 is a bit weird because the cache wasn't part of the CPU, it was on the motherboard and not always present. Not sure what that means for cache line size. I have an ancient 386 hardware manual in my cube at work, maybe I can find something in there. craig.topper: -march=i386 there seem just seems to be a compiler switch. The 386 CPU didn't have CPUID, that…
		// \| i486 \| 16 \| "four doublewords" (doubleword = 32 bits, 4 bits * 32 bits = 16 bytes) https://en.wikichip.org/w/images/d/d3/i486_MICROPROCESSOR_HARDWARE_REFERENCE_MANUAL_%281990%29.pdf and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.4216&rep=rep1&type=pdf (page 29) \|
		craig.topperUnsubmitted Done Reply Inline Actions doubleword is 32-bits on X86. So 4 double words is 16 bytes I think? craig.topper: doubleword is 32-bits on X86. So 4 double words is 16 bytes I think?
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Yes, you're right! My bad. This PDF (page 29) also agrees with 16 bytes. I'll update the table. zoecarver: Yes, you're right! My bad. [[ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.
		// \| i586/Pentium MMX \| 32 \| https://www.7-cpu.com/cpu/P-MMX.html \|
		// \| i686/Pentium \| 32 \| https://www.7-cpu.com/cpu/P6.html \|
		// \| Netburst/Pentium4 \| 64 \| https://www.7-cpu.com/cpu/P4-180.html \|
		// \| Atom \| 64 \| https://www.7-cpu.com/cpu/Atom.html \|
		// \| Westmere \| 64 \| https://en.wikichip.org/wiki/intel/microarchitectures/sandy_bridge_(client) "Cache Architecture" \|
		// \| Sandy Bridge \| 64 \| https://en.wikipedia.org/wiki/Sandy_Bridge and https://www.7-cpu.com/cpu/SandyBridge.html \|
		// \| Ivy Bridge \| 64 \| https://blog.stuffedcow.net/2013/01/ivb-cache-replacement/ and https://www.7-cpu.com/cpu/IvyBridge.html \|
		// \| Haswell \| 64 \| https://www.7-cpu.com/cpu/Haswell.html \|
		// \| Boadwell \| 64 \| https://www.7-cpu.com/cpu/Broadwell.html \|
		__simt__Unsubmitted Done Reply Inline Actions Broadwell. __simt__: Broadwell.
		// \| Skylake (including skylake-avx512) \| 64 \| https://www.nas.nasa.gov/hecc/support/kb/skylake-processors_550.html "Cache Hierarchy" \|
		// \| Cascade Lake \| 64 \| https://www.nas.nasa.gov/hecc/support/kb/cascade-lake-processors_579.html "Cache Hierarchy" \|
		// \| Skylake \| 64 \| https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake "Memory Hierarchy" \|
		// \| Ice Lake \| 64 \| https://www.7-cpu.com/cpu/Ice_Lake.html \|
		// \| Knights Landing \| 64 \| https://software.intel.com/en-us/articles/intel-xeon-phi-processor-7200-family-memory-management-optimizations "The Intel® Xeon Phi™ Processor Architecture" \|
		// \| Knights Mill \| 64 \| https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf?countrylabel=Colombia "2.5.5.2 L1 DCache " \|
		// +------------------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
		Optional<unsigned> X86TargetInfo::getCPUCacheLineSize() const {
		switch (CPU) {
		// i386
		case CK_i386:
		// i486
		case CK_i486:
		case CK_WinChipC6:
		case CK_WinChip2:
		case CK_C3:
		// Lakemont
		case CK_Lakemont:
		return 16;

		// i586
		case CK_i586:
		case CK_Pentium:
		case CK_PentiumMMX:
		// i686
		case CK_PentiumPro:
		case CK_i686:
		case CK_Pentium2:
		case CK_Pentium3:
		case CK_PentiumM:
		case CK_C3_2:
		// K6
		case CK_K6:
		case CK_K6_2:
		case CK_K6_3:
		// Geode
		case CK_Geode:
		return 32;
		craig.topperUnsubmitted Not Done Reply Inline Actions I found the documentation for the 82385 cache controller chip for the 386. It's a bit weird. The tags for the cache are based on 16 byte lines, but there are valid bits for every 2 bytes within that line. So its possible for only a portion of a line to be valid. craig.topper: I found the documentation for the 82385 cache controller chip for the 386. It's a bit weird.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Interesting. Does this mean that we should be returning a different value? zoecarver: Interesting. Does this mean that we should be returning a different value?
		craig.topperUnsubmitted Not Done Reply Inline Actions I think we should make 386 return 16. craig.topper: I think we should make 386 return 16.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Alright. I'll update it. zoecarver: Alright. I'll update it.

		// Netburst
		case CK_Pentium4:
		case CK_Prescott:
		case CK_Nocona:
		// Atom
		case CK_Bonnell:
		case CK_Silvermont:
		case CK_Goldmont:
		case CK_GoldmontPlus:
		case CK_Tremont:

		case CK_Westmere:
		case CK_SandyBridge:
		case CK_IvyBridge:
		case CK_Haswell:
		case CK_Broadwell:
		case CK_SkylakeClient:
		case CK_SkylakeServer:
		case CK_Cascadelake:
		case CK_Nehalem:
		case CK_Cooperlake:
		case CK_Cannonlake:
		case CK_Tigerlake:
		case CK_IcelakeClient:
		case CK_IcelakeServer:
		case CK_KNL:
		case CK_KNM:
		lebedev.riUnsubmitted Not Done Reply Inline Actions K6 should have 32B cache line size lebedev.ri: K6 should have 32B cache line size
		// K7
		case CK_Athlon:
		case CK_AthlonXP:
		// K8
		case CK_K8:
		case CK_K8SSE3:
		case CK_AMDFAM10:
		// Bobcat
		case CK_BTVER1:
		case CK_BTVER2:
		// Bulldozer
		case CK_BDVER1:
		case CK_BDVER2:
		case CK_BDVER3:
		case CK_BDVER4:
		// Zen
		case CK_ZNVER1:
		case CK_ZNVER2:
		lebedev.riUnsubmitted Done Reply Inline Actions K7..ZnVer2 should have 64B lebedev.ri: K7..ZnVer2 should have 64B
		// Deprecated
		craig.topperUnsubmitted Done Reply Inline Actions Nehalem, coooperlake, cannonlake and tigerlake are all 64. craig.topper: Nehalem, coooperlake, cannonlake and tigerlake are all 64.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Great, I'll update it. Thanks. zoecarver: Great, I'll update it. Thanks.
		case CK_x86_64:
		lebedev.riUnsubmitted Not Done Reply Inline Actions 32B for this one lebedev.ri: 32B for this one
		craig.topperUnsubmitted Done Reply Inline Actions I think Lakemont is 16 bytes. Assuming I'm interpretting the CLFLUSH line size from this CPUID dump correctly https://github.com/InstLatx64/InstLatx64/blob/master/GenuineIntel/GenuineIntel0000590_Lakemont_CPUID2.txt craig.topper: I think Lakemont is 16 bytes. Assuming I'm interpretting the CLFLUSH line size from this CPUID…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions I may very well be interpreting this incorrectly but I think that the cache line sizes are at `eax = 0x80000005` and `eax = 0x80000006`. However, all the registers at those codes are zero. If I instead look at `eax = 0x00000001` (which I think is incorrect), `ecx = 00010200` which would be 128 bytes (maybe this is where the 16 came from, if they were bits instead?). How were you interpreting it? zoecarver: I may very well be interpreting this incorrectly but I think that the cache line sizes are at…
		craig.topperUnsubmitted Done Reply Inline Actions Interpreted bits 15:8 of the ecx = 00010200 to be the clflush information. The spec for cpuid says to multiply by 8 to get cache line size. So 15:8 is 2, multiply by 8 is 16 bytes. I think Lakemont is based on the 486 architecture so that seems possible. craig.topper: Interpreted bits 15:8 of the ecx = 00010200 to be the clflush information. The spec for cpuid…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions I see. That looks correct. I'll update it. Still, I do find it strange that all registers at `eax = 0x80000006` are zeroed out. Using the `cpuid` instruction on my cpu I get the following: CPUID 0: 13 71 110 105 CPUID 1: 97 0 191 255 CPUID 2: 1 255 0 0 CPUID 80000000: 8 0 0 0 CPUID 80000001: 0 0 33 0 CPUID 80000002: 73 108 32 101 CPUID 80000003: 41 45 48 67 CPUID 80000004: 64 50 122 0 CPUID 80000005: 0 0 0 0 CPUID 80000006: 0 0 64 0 CPUID 80000007: 0 0 0 0 CPUID 80000008: 39 0 0 0 I know my L2 cache line size is 64 and the L2 cache line size should be `ecx` at `eax = 0x80000006` which would be `64`. But, when I read the dump you linked to, it doesn't look like there's any data at `0x80000006`. Anyway, as I said above, I'll look past this and update it based on the clflush information. That seems valid. zoecarver: I see. That looks correct. I'll update it. Still, I do find it strange that all registers at…
		craig.topperUnsubmitted Not Done Reply Inline Actions Lakemont is a modernized version of a 486 which probably had none of the 80000000 leafs originally. My speculation is that they really wanted to support leaf 80000008 so they just put zeros in the rest rather than add a lot of microcode. craig.topper: Lakemont is a modernized version of a 486 which probably had none of the 80000000 leafs…
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Alrighty, that makes sense. Thanks :) zoecarver: Alrighty, that makes sense. Thanks :)
		case CK_Yonah:
		case CK_Penryn:
		case CK_Core2:
		return 64;

		// The following currently have unknown cache line sizes (but they are probably all 64):
		jfbUnsubmitted Done Reply Inline Actions All of the above (except generic) should be 64. jfb: All of the above (except generic) should be 64.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions Good to know, I'll update it :) zoecarver: Good to know, I'll update it :)
		craig.topperUnsubmitted Not Done Reply Inline Actions If Yonah and Penryn are 64, then Core2 should be as well. It's the generation between them. craig.topper: If Yonah and Penryn are 64, then Core2 should be as well. It's the generation between them.
		zoecarverAuthorUnsubmitted Done Reply Inline Actions I'll update it. Thanks. zoecarver: I'll update it. Thanks.
		// Core
		case CK_Generic:
		return None;
		}
		}

bool X86TargetInfo::validateOutputSize(const llvm::StringMap<bool> &FeatureMap,		bool X86TargetInfo::validateOutputSize(const llvm::StringMap<bool> &FeatureMap,
StringRef Constraint,		StringRef Constraint,
unsigned Size) const {		unsigned Size) const {
// Strip off constraint modifiers.		// Strip off constraint modifiers.
while (Constraint[0] == '=' \|\| Constraint[0] == '+' \|\| Constraint[0] == '&')		while (Constraint[0] == '=' \|\| Constraint[0] == '+' \|\| Constraint[0] == '&')
Constraint = Constraint.substr(1);		Constraint = Constraint.substr(1);

return validateOperandSize(FeatureMap, Constraint, Size);		return validateOperandSize(FeatureMap, Constraint, Size);
▲ Show 20 Lines • Show All 164 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

Add method to TargetInfo to get CPU cache line sizeClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 252611

clang/include/clang/Basic/TargetInfo.h

clang/lib/Basic/Targets/X86.h

clang/lib/Basic/Targets/X86.cpp

Add method to TargetInfo to get CPU cache line size
ClosedPublic