This is an archive of the discontinued LLVM Phabricator instance.

Differential D66822

Hardware cache line size builtins
Needs Revision · Public

Authored by zoecarver on Aug 27 2019, 12:10 PM.

Details

Reviewers

jfb
mclow.lists
EricWF
efriedma
jyknight
echristo
__simt__

Summary

Creates the __builtin_hardware_destructive_interference_size and __builtin_hardware_constructive_interference_size builtins proposed by @jfb here. These builtins can be used to implement P0154 in libc++ and other standard libraries. My implementation switches on the target triple to get the max cache line size for that target. I am not sure if this is the best way to implement these builtins, but it will ensure that there is not an ABI break.
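
For context, P0154 exposes these values so that code can lay out data to avoid false sharing. Below is a minimal sketch of how user code might consume the proposed builtin, assuming it is callable with no arguments as in this patch. The names `kDestructiveSize` and `PerThreadCounter`, and the fallback value of 64, are illustrative assumptions and not part of the patch.

```cpp
#include <atomic>
#include <cstddef>

// Compilers that lack __has_builtin entirely take the fallback branch below.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Assumption: the builtin is spelled as in this patch and takes no arguments.
// The fallback of 64 is only a placeholder for compilers without the builtin.
#if __has_builtin(__builtin_hardware_destructive_interference_size)
constexpr std::size_t kDestructiveSize =
    __builtin_hardware_destructive_interference_size();
#else
constexpr std::size_t kDestructiveSize = 64;
#endif

// Each counter is aligned to the assumed line size, so two threads updating
// neighbouring counters should not invalidate each other's cache line.
struct alignas(kDestructiveSize) PerThreadCounter {
  std::atomic<unsigned long> value{0};
};

static_assert(alignof(PerThreadCounter) == kDestructiveSize,
              "counter is aligned to the assumed line size");
static_assert(sizeof(PerThreadCounter) % kDestructiveSize == 0,
              "array elements never share a line");
```

Because the value feeds `alignas`, it becomes part of the type's layout, which is why the discussion below keeps returning to ABI stability.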

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 37373
Build 37372: arc lint + arc unit

Event Timeline

zoecarver created this revision. Aug 27 2019, 12:10 PM

Herald added a reviewer: jfb. Aug 27 2019, 12:10 PM

Herald added a project: Restricted Project.

Herald added subscribers: cfe-commits, kbarton, aheejin and 2 others.

Harbormaster completed remote builds in B37373: Diff 217470. Aug 27 2019, 12:10 PM

Herald added subscribers: wuzish, dexonsmith. Aug 27 2019, 12:10 PM

zoecarver added reviewers: mclow.lists, EricWF, efriedma, jyknight. Aug 27 2019, 12:14 PM

zoecarver edited the summary of this revision. Aug 27 2019, 12:17 PM

My implementation switches on the target triple to get the max cache line size for that target. I am not sure if this is the best way to implement these builtins, but it will ensure that there is not an ABI break.

Passing-by remark: I'm not sure it is possible to guarantee that this will always be correct and that no ABI break will happen.
What if some future x86 processor has a 128-byte-wide cache line?
You nominally can't bump the value because that would be an ABI break,
but the default is no longer conservatively correct there: it is smaller than needed,
which will effectively cripple all usages where this size is used as an alignment to avoid cache line sharing.
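
To make the ABI concern concrete: the value typically ends up in `alignas`, so it is baked into type layout. The sketch below uses a hypothetical `Padded` template, parameterised on the line size, to show that a struct's size and alignment change when the constant changes. The values 64 and 128 are simply the two sizes mentioned in this thread.

```cpp
#include <cstddef>

// Hypothetical type: LineSize stands in for the value the builtin would return.
template <std::size_t LineSize>
struct Padded {
  alignas(LineSize) int head;
  alignas(LineSize) int tail;
};

// Bumping the constant from 64 to 128 changes size, alignment and member
// offsets, so objects compiled against the two values are layout-incompatible.
static_assert(sizeof(Padded<64>) == 128, "two 64-byte-aligned members");
static_assert(sizeof(Padded<128>) == 256, "two 128-byte-aligned members");
static_assert(alignof(Padded<64>) != alignof(Padded<128>),
              "alignment follows the constant");
```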

Passing-by remark: I'm not sure it is possible to guarantee that this will always be correct and that no ABI break will happen.

You're right. I should have said, "least likely to cause an ABI break." And I completely agree that there is no way to guarantee this is correct at compile time. hardware_*_interference_size certainly has the potential to do more harm than good, but I think that is another discussion.

It may be a good idea to enable this only with -march=native. And _maybe_ remove the constexpr requirement (though that would make this feature significantly less useful, it would also make it significantly more accurate).

numbers for cacheline size.

In D66822#1647664, @zoecarver wrote:

Passing-by remark: I'm not sure it is possible to guarantee that this will always be correct and that no ABI break will happen.

You're right. I should have said, "least likely to cause an ABI break." And I completely agree that there is no way to guarantee this is correct at compile time. hardware_*_interference_size certainly has the potential to do more harm than good, but I think that is another discussion.

I don't see why we'd bother to implement this as a builtin if we're going to implement it like this. A much simpler implementation would be to have libc++ return 64 for constructive and 128 for destructive, across the board. That would certainly be ABI stable, and also correct, at the moment, for the architectures people generally care about. (And we should tell people never to use these if they actually care about it.)
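
The library-only alternative described here needs no compiler support at all. Here is a sketch that uses a hypothetical namespace `sketch` rather than libc++'s actual headers, with the values 64 and 128 taken from the comment above:

```cpp
#include <cstddef>

namespace sketch {
// Fixed for every target, as suggested above, so the values can never change
// the layout of code that was already compiled against them.
constexpr std::size_t hardware_constructive_interference_size = 64;
constexpr std::size_t hardware_destructive_interference_size = 128;
}  // namespace sketch

// Both values must be usable as alignments, i.e. powers of two.
static_assert((sketch::hardware_constructive_interference_size &
               (sketch::hardware_constructive_interference_size - 1)) == 0,
              "constructive size is a power of two");
static_assert((sketch::hardware_destructive_interference_size &
               (sketch::hardware_destructive_interference_size - 1)) == 0,
              "destructive size is a power of two");
```

The trade-off is the one raised earlier in the thread: a fixed constant is stable but may be wrong for a particular CPU.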

BTW, I note that Facebook uses 128 bytes for x86, noting in the source:

Microbenchmarks indicate that pairs of cache lines also see destructive
interference under heavy use of atomic operations, as observed for atomic
increment on Sandy Bridge.

We assume a cache line size of 64, so we use a cache line pair size of 128
to avoid destructive interference.

We should probably tell people never to use this, period. That being said, I like your idea of having it be a constant. The only issue would be when, in the next few years, people start shipping CPUs with 256-byte-wide cache lines.

BTW, I note that Facebook uses 128 bytes for x86

They also use 64 for arm which is interesting (the opposite of this patch).

Also, see this comment in the same snippet:

We assume a cache line size of 64, so we use a cache line pair size of 128

Which would indicate that 64 is the correct number of bytes for x86. But maybe we should double that for hardware_destructive_interference_size.

Edit: Thinking about it more. A cache pair would usually be in the L2 cache (I think). This feature is looking at the L1 cache line size, so why did Facebook implement it in that way?

Sorry for the delayed response, I was on vacation. Thanks for tackling it!

I don't think this is the approach I would take. From my dev meeting lightning talk I would instead:

Add to target infrastructure
Overriding in sub-targets using -march or -mcpu
Overriding on the command line
If set in target, expose the builtin
Generic le32 / be32 ARM targets expose constructive / destructive as 64B
Generic le64 / be64 ARM targets expose constructive as 64B and destructive as 128B
Generic x86 expose constructive / destructive as 64B
Honor existing sub-target preferences
Let maintainers of other targets choose appropriate size

I think this needs to be split up into a few patches, at least one per target. libc++ would then only expose the feature if __has_builtin is true. I'm happy to go into more details if what I say above is too vague :)
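
The layering proposed above can be sketched roughly as follows. This is only an illustration: none of these names (`InterferenceSizes`, `Arch`, `genericDefaults`, `resolve`, `kUnset`) exist in Clang, and the precedence order (command line over sub-target over generic default) is an assumption about what "overriding" means in the list. The numeric defaults are the ones given in the list.

```cpp
// Hypothetical model of the proposed target infrastructure.
struct InterferenceSizes {
  bool exposed;           // false: no value set, so no builtin is exposed
  unsigned constructive;  // bytes
  unsigned destructive;   // bytes
};

constexpr InterferenceSizes kUnset{false, 0, 0};

enum class Arch { Arm32, Arm64, X86, Other };

// Generic per-architecture defaults, using the values listed above. Targets
// whose maintainers have not chosen a size stay unset.
inline InterferenceSizes genericDefaults(Arch arch) {
  switch (arch) {
  case Arch::Arm32: return InterferenceSizes{true, 64, 64};
  case Arch::Arm64: return InterferenceSizes{true, 64, 128};
  case Arch::X86:   return InterferenceSizes{true, 64, 64};
  case Arch::Other: return kUnset;
  }
  return kUnset;
}

// Assumed precedence: an explicit command-line value wins, then a sub-target
// (-march / -mcpu) preference, then the generic default for the architecture.
inline InterferenceSizes resolve(Arch arch, InterferenceSizes subTarget,
                                 InterferenceSizes commandLine) {
  if (commandLine.exposed) return commandLine;
  if (subTarget.exposed) return subTarget;
  return genericDefaults(arch);
}
```

Under this model a library would define the P0154 constants only when the resolved value is `exposed`, which corresponds to the `__has_builtin` check mentioned above.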

This revision now requires changes to proceed. Sep 3 2019, 3:14 PM

echristo added a reviewer: echristo. Dec 4 2019, 5:44 PM

jfb added a reviewer: __simt__. Dec 5 2019, 9:25 AM

In D66822#1656476, @jfb wrote:

Add to target infrastructure

Refs D74918.

Revision Contents

Path                                                                          Size
clang/include/clang/Basic/Builtins.def                                        2 lines
clang/lib/AST/ExprConstant.cpp                                                40 lines
clang/test/Sema/builtins.c                                                    16 lines
clang/test/SemaCXX/builtin-hardware-interference-size-aarch.cpp               7 lines
clang/test/SemaCXX/builtin-hardware-interference-size-amd.cpp                 7 lines
clang/test/SemaCXX/builtin-hardware-interference-size-arm.cpp                 9 lines
clang/test/SemaCXX/builtin-hardware-interference-size-nvptx.cpp               7 lines
clang/test/SemaCXX/builtin-hardware-interference-size-ppc.cpp                 8 lines
clang/test/SemaCXX/builtin-hardware-interference-size-unknown.cpp             9 lines
clang/test/SemaCXX/builtin-hardware-interference-size-x86.cpp                 7 lines

Diff 217470

clang/include/clang/Basic/Builtins.def

[...]
 BUILTIN(__builtin_longjmp, "vv**i", "r")
 BUILTIN(__builtin_unwind_init, "v", "")
 BUILTIN(__builtin_eh_return_data_regno, "iIi", "nc")
 BUILTIN(__builtin_snprintf, "ic*zcC*.", "nFp:2:")
 BUILTIN(__builtin_vsprintf, "ic*cC*a", "nFP:1:")
 BUILTIN(__builtin_vsnprintf, "ic*zcC*a", "nFP:2:")
 BUILTIN(__builtin_thread_pointer, "v*", "nc")
 BUILTIN(__builtin_launder, "v*v*", "nt")
+LANGBUILTIN(__builtin_hardware_destructive_interference_size, "i", "n", CXX_LANG)
+LANGBUILTIN(__builtin_hardware_constructive_interference_size, "i", "n", CXX_LANG)
 LANGBUILTIN(__builtin_is_constant_evaluated, "b", "n", CXX_LANG)

 // GCC exception builtins
 BUILTIN(__builtin_eh_return, "vzv*", "r") // FIXME: Takes intptr_t, not size_t!
 BUILTIN(__builtin_frob_return_addr, "v*v*", "n")
 BUILTIN(__builtin_dwarf_cfa, "v*", "n")
 BUILTIN(__builtin_init_dwarf_reg_size_table, "vv*", "n")
 BUILTIN(__builtin_dwarf_sp_column, "Ui", "n")
[...]

clang/lib/AST/ExprConstant.cpp

[...]
 bool IntExprEvaluator::VisitCallExpr(const CallExpr *E) {
   if (unsigned BuiltinOp = E->getBuiltinCallee())
     return VisitBuiltinCallExpr(E, BuiltinOp);

   return ExprEvaluatorBaseTy::VisitCallExpr(E);
 }

+static unsigned getCacheLineSize(llvm::Triple::ArchType Arch) {
+  switch (Arch) {
+  case llvm::Triple::arm:
+  case llvm::Triple::armeb:
+  case llvm::Triple::thumb:
+  case llvm::Triple::thumbeb:
+    // This value is between 8 and 64. We are using the upper bound to be safe.
+    return 64;
+  case llvm::Triple::aarch64:
+  case llvm::Triple::aarch64_be:
+    // Sometimes big.LITTLE will have cores with both 64 and 128 line sizes.
+    return 128;
+  case llvm::Triple::x86:
+  case llvm::Triple::x86_64:
+    return 64;
+  case llvm::Triple::ppc:
+  case llvm::Triple::ppc64:
+  case llvm::Triple::ppc64le:
+    return 128;
+  case llvm::Triple::r600:
+  case llvm::Triple::amdgcn:
+    return 64;
+  case llvm::Triple::nvptx:
+  case llvm::Triple::nvptx64:
+    return 64;
+  case llvm::Triple::systemz:
+  case llvm::Triple::wasm32:
+  case llvm::Triple::wasm64:
+  case llvm::Triple::hexagon:
+  default:
+    return 0;
+  }
+}
+
 bool IntExprEvaluator::VisitBuiltinCallExpr(const CallExpr *E,
                                             unsigned BuiltinOp) {
   switch (unsigned BuiltinOp = E->getBuiltinCallee()) {
   default:
     return ExprEvaluatorBaseTy::VisitCallExpr(E);

   case Builtin::BI__builtin_dynamic_object_size:
   case Builtin::BI__builtin_object_size: {
[...] if (Info.InConstantContext || Arg->HasSideEffects(Info.Ctx)) {
       return Success(false, E);
     }
     Info.FFDiag(E, diag::note_invalid_subexpr_in_const_expr);
     return false;
   }

   case Builtin::BI__builtin_is_constant_evaluated:
     return Success(Info.InConstantContext, E);
+  case Builtin::BI__builtin_hardware_destructive_interference_size:
+  case Builtin::BI__builtin_hardware_constructive_interference_size: {
+    unsigned cacheLineSize
+      = getCacheLineSize(Info.Ctx.getTargetInfo().getTriple().getArch());
+    return Success(cacheLineSize, E);
+  }

   case Builtin::BI__builtin_ctz:
   case Builtin::BI__builtin_ctzl:
   case Builtin::BI__builtin_ctzll:
   case Builtin::BI__builtin_ctzs: {
     APSInt Val;
     if (!EvaluateInteger(E->getArg(0), Val, Info))
       return false;
[...]

clang/test/Sema/builtins.c

[...] void Test19(void)
   strlcpy(buf, b, sizeof(b)); // expected-warning {{size argument in 'strlcpy' call appears to be size of the source; expected the size of the destination}} \
   // expected-note {{change size argument to be the size of the destination}}
   __builtin___strlcpy_chk(buf, b, sizeof(b), __builtin_object_size(buf, 0)); // expected-warning {{size argument in '__builtin___strlcpy_chk' call appears to be size of the source; expected the size of the destination}} \
   // expected-note {{change size argument to be the size of the destination}} \
   // expected-warning {{'strlcpy' will always overflow; destination buffer has size 20, but size argument is 40}}

   strlcat(buf, b, sizeof(b)); // expected-warning {{size argument in 'strlcat' call appears to be size of the source; expected the size of the destination}} \
   // expected-note {{change size argument to be the size of the destination}}

   __builtin___strlcat_chk(buf, b, sizeof(b), __builtin_object_size(buf, 0)); // expected-warning {{size argument in '__builtin___strlcat_chk' call appears to be size of the source; expected the size of the destination}} \
   // expected-note {{change size argument to be the size of the destination}} \
   // expected-warning {{'strlcat' will always overflow; destination buffer has size 20, but size argument is 40}}
 }

 // rdar://11076881
 char * Test20(char *p, const char *in, unsigned n)
 {
[...]

 void test23() {
   char src[1024];
   char buf[10];
   memcpy(buf, src, 11); // expected-warning{{'memcpy' will always overflow; destination buffer has size 10, but size argument is 11}}
   my_memcpy(buf, src, 11); // expected-warning{{'memcpy' will always overflow; destination buffer has size 10, but size argument is 11}}
 }

-// Test that __builtin_is_constant_evaluated() is not allowed in C
-int test_cxx_builtin() {
+// Test that C++ builtins are not allowed in C
+void test_cxx_builtin() {
   // expected-error@+1 {{use of unknown builtin '__builtin_is_constant_evaluated'}}
-  return __builtin_is_constant_evaluated();
+  (void)__builtin_is_constant_evaluated();
+
+  // expected-error@+2 {{use of unknown builtin '__builtin_hardware_destructive_interference_size'}}
+  // expected-note@+1 {{'__builtin_hardware_destructive_interference_size' declared here}}
+  (void)__builtin_hardware_destructive_interference_size();
+
+  // expected-error@+2 {{use of unknown builtin '__builtin_hardware_constructive_interference_size'}}
+  // expected-note@+1 {{did you mean '__builtin_hardware_destructive_interference_size'?}}
+  (void)__builtin_hardware_constructive_interference_size();
 }

clang/test/SemaCXX/builtin-hardware-interference-size-aarch.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=aarch64-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=aarch64_be-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=aarch64-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 128);
				static_assert(__builtin_hardware_destructive_interference_size() == 128);

clang/test/SemaCXX/builtin-hardware-interference-size-amd.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=r600-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=amdgcn-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=amdgcn-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 64);
				static_assert(__builtin_hardware_destructive_interference_size() == 64);

clang/test/SemaCXX/builtin-hardware-interference-size-arm.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=arm-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=armeb-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=thumb-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=thumbeb-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=arm-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 64);
				static_assert(__builtin_hardware_destructive_interference_size() == 64);

clang/test/SemaCXX/builtin-hardware-interference-size-nvptx.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=nvptx-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=nvptx64-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=nvptx-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 64);
				static_assert(__builtin_hardware_destructive_interference_size() == 64);

clang/test/SemaCXX/builtin-hardware-interference-size-ppc.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=ppc-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=ppc64-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=ppc64le-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=ppc-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 128);
				static_assert(__builtin_hardware_destructive_interference_size() == 128);

clang/test/SemaCXX/builtin-hardware-interference-size-unknown.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=systemz-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=wasm32-unknown-unknown
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=wasm64-unknown-unknown
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=hexagon-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=systemz-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 0);
				static_assert(__builtin_hardware_destructive_interference_size() == 0);

clang/test/SemaCXX/builtin-hardware-interference-size-x86.cpp

This file was added.

				// expected-no-diagnostics
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=x86_64-linux-gnu
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=x86_64-unknown-unknown
				// RUN: %clang_cc1 -std=c++2a -verify %s -fcxx-exceptions -triple=x86_64-apple-darwin16.0.0

				static_assert(__builtin_hardware_constructive_interference_size() == 64);
				static_assert(__builtin_hardware_destructive_interference_size() == 64);
