
[Clang] Add integer mul reduction builtin
ClosedPublic

Authored by RKSimon on Jan 20 2022, 12:20 PM.

Details

Summary

Similar to the existing bitwise reduction builtins, this lowers to a llvm.vector.reduce.mul intrinsic call.

For other reductions, we've tried to share builtins for float/integer vectors, but the fmul reduction intrinsic also takes a starting value argument and can perform either an unordered or a serialized reduction, but not the reduction trees specified for the builtins. However we address fmul support, it shouldn't affect the integer case.

Diff Detail

Event Timeline

RKSimon requested review of this revision. · Jan 20 2022, 12:20 PM
RKSimon created this revision.
Herald added a project: Restricted Project. · View Herald Transcript · Jan 20 2022, 12:20 PM

I should mention - according to https://clang.llvm.org/docs/LanguageExtensions.html __builtin_reduce_add() already exists, which I don't think is true.

This comment was removed by craig.topper.
fhahn added a subscriber: junaire. · Jan 21 2022, 2:09 AM

Thanks for the patch!

For other reductions, we've tried to share builtins for float/integer vectors, but the fadd/fmul reduction builtins also take a starting value argument. Technically I could support float by using default values, but we're probably better off with specific fadd/fmul reduction builtins for both arguments.

Just to double check, you mean LLVM intrinsics here, right? As specified, the Clang __builtin_reduce_add should support both integer and floating point reductions. Do you think that's problematic? If so, we should update the spec.

For float reductions, the key question is how to lower them. Unfortunately llvm.vector.reduce.fadd reductions at the moment are either unordered or sequential, but we need a particular order. @junaire has been looking into this and put up D117480 as an option to extend the intrinsic to have a dedicated order argument. That would make it easy for backends like AArch64 to select the right reduction instruction. Alternatively clang could also create the expanded reduction tree to start with. But we really want to lower this to native reduction instructions if we can.

I should mention - according to https://clang.llvm.org/docs/LanguageExtensions.html __builtin_reduce_add() already exists, which I don't think is true.

Do you mean it is listed in the docs but not implemented yet? That's true, the docs specified the whole set of proposed builtins from the start, regardless of implementation status.

FWIW, I think we intentionally did not specify __builtin_reduce_mul to start with, because it is very prone to overflows for integer vectors. Not saying that we cannot add it, but it would at least require updating the extension documentation. @scanon might have additional feedback. In any case, it might be good to only handle the add case in this patch.

clang/lib/Sema/SemaChecking.cpp:2599

_add should also support floats, add a TODO?

I'm happy to continue with this just for integers or wait until we have a plan for floats as well. I guess we need to decide if we want to support the starting value in the fadd/fmul intrinsics from the builtin or not? If we don't then adding float support to the add/mul reduction builtins now (or later on) would just involve using default starting values, if we do then we probably need a separate fadd/fmul builtin.

Or we could add starting values to the add/mul reduction builtins as well and we manually insert a scalar post-reduction add/mul instruction in cgbuiltin?

wrt float orders, currently the avx512f reductions attach fmf attributes when the builtin is translated to the intrinsic: https://github.com/llvm/llvm-project/blob/d2012d965d60c3258b3a69d024491698f8aec386/clang/lib/CodeGen/CGBuiltin.cpp#L14070

We might be able to get away with just expecting users to handle this with pragmas in code? I was planning to see if that would work for the avx512f fmin/fmax reduction intrinsics so I can use __builtin_reduce_min/max.

I'm mainly interested in __builtin_reduce_mul as avx512f requires it - the (few) cases I've seen it used have always involved char/short pixel data, extended to int/long before the mul reduction to address the overflow issues.

fhahn added a comment. · Jan 21 2022, 3:44 AM

I'm happy to continue with this just for integers or wait until we have a plan for floats as well. I guess we need to decide if we want to support the starting value in the fadd/fmul intrinsics from the builtin or not? If we don't then adding float support to the add/mul reduction builtins now (or later on) would just involve using default starting values, if we do then we probably need a separate fadd/fmul builtin.

I'm not sure if there's a major benefit of specifying the start value? Unless there is, I think we should not add a start value argument.

Or we could add starting values to the add/mul reduction builtins as well and we manually insert a scalar post-reduction add/mul instruction in cgbuiltin?

wrt float orders, currently the avx512f reductions attach fmf attributes when the builtin is translated to the intrinsic: https://github.com/llvm/llvm-project/blob/d2012d965d60c3258b3a69d024491698f8aec386/clang/lib/CodeGen/CGBuiltin.cpp#L14070

I am not familiar with how exactly ia32_reduce_fadd_pd512 & co are specified, but it looks like adding reassociate to the intrinsic call there might be technically incorrect, i.e. claiming the order does not matter, while I assume the C builtin guarantees a particular order? It might not result in an actual mis-compile on X86, because the X86 backend kind of guarantees that reductions with reassociate are lowered exactly as the C builtin requires (relying on some kind of secret handshake between frontend & backend).

As discussed in D117480 this seems like a major weakness in how llvm.vector.reduce.fadd is specified. As long as it is only used for target specific builtins, things might work out fine in most cases, but different targets might lower reassociate reductions differently. Also, things might fall apart if the middle end uses the reassociate property during a transform that changes the reduction order.

We might be able to get away with just expecting users to handle this with pragmas in code?

If the user allows reassociation via a pragma/flag, we can use the reduction builtin at the moment without problems. The motivation behind specifying a well-defined order that can be lowered efficiently by targets is to guarantee portable results by default. This is important for use cases where the builtins are used in code that's executed on different platforms.

I was planning to see if that would work for the avx512f fmin/fmax reduction intrinsics so I can use __builtin_reduce_min/max.

There shouldn't be any problem with fmin/fmax, as the result should be independent of the evaluation order IIUC.

I'm mainly interested in __builtin_reduce_mul as avx512f requires it - the (few) cases I've seen it used have always involved char/short pixel data, extended to int/long before the mul reduction to address the overflow issues.

Sounds reasonable! As mentioned earlier, it would be good to do that as separate patch, also updating LanguageExtensions.rst.

@junaire has been looking into this and put up D117480 as an option to extend the intrinsic to have a dedicated order argument.

Hi @fhahn, I wonder if we should continue working on this, as I remember one of the reviewers had doubts about this change.

I agree that the major problem now is how to lower it with a particular order, and maybe we should reach an agreement before we continue working on it.

RKSimon planned changes to this revision. · Jan 23 2022, 11:24 AM
fhahn added a comment. · Mar 16 2022, 9:12 AM

@RKSimon are you planning on pushing this patch through? From my perspective, it looks good with splitting off __builtin_reduce_mul and adding a TODO to also implement it for floating point types.

Herald added a project: Restricted Project. · View Herald Transcript · Mar 16 2022, 9:12 AM

@RKSimon are you planning on pushing this patch through? From my perspective, it looks good with splitting off __builtin_reduce_mul and adding a TODO to also implement it for floating point types.

You mean handle add/mul in separate patches? I'm happy to continue with this, I just didn't want to make it more difficult for us to add fp support in the future - particularly if we do end up needing the scalar initial_value argument

fhahn added a comment. · Mar 21 2022, 8:12 AM

@RKSimon are you planning on pushing this patch through? From my perspective, it looks good with splitting off __builtin_reduce_mul and adding a TODO to also implement it for floating point types.

You mean handle add/mul in separate patches? I'm happy to continue with this, I just didn't want to make it more difficult for us to add fp support in the future - particularly if we do end up needing the scalar initial_value argument

Yeah that's what I meant. I'm not sure about the need for supporting a scalar initial value, but implementing a subset of the specified behavior for now shouldn't cause much trouble further down the line hopefully.

@fhahn Remind me - why did you want me to split these? If we're initially just going for integer support, can't both be done at the same time in this patch?

RKSimon marked an inline comment as done. · Apr 8 2022, 9:39 AM
fhahn added a comment. · Sun, May 1, 12:17 PM

@fhahn Remind me - why did you want me to split these? If we're initially just going for integer support, can't both be done at the same time in this patch?

I think my original thinking was that __builtin_reduce_mul isn't defined at the moment https://clang.llvm.org/docs/LanguageExtensions.html#vector-builtins

So I think it would be good to either update the docs in this patch or, preferably, split off the straightforward __builtin_reduce_add change :)

RKSimon updated this revision to Diff 426366. · Mon, May 2, 3:31 AM
RKSimon retitled this revision from [Clang] Add integer add/mul reduction builtins to [Clang] Add integer mul reduction builtin.
RKSimon edited the summary of this revision. (Show Details)

Just handle the mul case now that D124741 has landed

RKSimon updated this revision to Diff 426367. · Mon, May 2, 3:33 AM

Add "__builtin_reduce_mul" to the docs listing supported reductions

fhahn accepted this revision. · Thu, May 5, 12:58 PM

LGTM, thanks!

IIRC @scanon had some opinions about __builtin_reduce_mul during some earlier discussions. Please wait a few days in case there are additional comments.

This revision is now accepted and ready to land. · Thu, May 5, 12:58 PM
This revision was automatically updated to reflect the committed changes.