This is an archive of the discontinued LLVM Phabricator instance.

Differential D96231

[X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.
ClosedPublic

Authored by pengfei on Feb 7 2021, 6:58 PM.

Details

Reviewers

RKSimon
craig.topper
spatel

Commits

rGdd2460ed5d77: [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.

Summary

The *reduce_add/mul_ps/pd intrinsics assume that the elements of the
vector are reassociable, so we always need to set the reassoc flag
when we call the _mm_reduce_* intrinsics.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
410 ms	x64 debian > libarcher.races::task-dependency.c
370 ms	x64 debian > libarcher.races::task-taskgroup-unrelated.c
270 ms	x64 debian > libarcher.races::task-taskwait-nested.c
510 ms	x64 debian > libarcher.races::task-two.c
300 ms	x64 debian > libarcher.task::task-barrier.c
(13 tests failed in total; the first five are listed.)

Event Timeline

pengfei requested review of this revision. (Feb 7 2021, 6:58 PM)

pengfei created this revision.

Herald added a project: Restricted Project. (Feb 7 2021, 6:58 PM)

Herald added a subscriber: cfe-commits.

pengfei mentioned this in D93179: [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506). (Feb 7 2021, 7:01 PM)

Harbormaster completed remote builds in B88234: Diff 322015. (Feb 7 2021, 7:32 PM)

spatel added inline comments. (Feb 8 2021, 8:28 AM)

clang/lib/CodeGen/CGBuiltin.cpp
13829	I haven't looked at this part of the compiler in a long time, so I was wondering how we handle FMF scope. It looks like there is already an FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here will not affect anything but this CreateCall(). Does that match your understanding? Should we have an extra regression test to make sure that does not change? I am imagining something like:

	    double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
	      double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
	      return S;
	    }

	Then we could confirm that `reassoc` is not applied to the `fadd` that follows the reduction call.
13829	Currently (and we could say that this is an LLVM codegen bug), we will not generate the optimal/expected reduction with `reassoc` alone. I think the x86 reduction definition is implicitly assuming that -0.0 is not meaningful here, so we should add `nsz` too. The backend is expecting an explicit `nsz` on this op. I.e., I see this x86 asm currently with only `reassoc`:

	    vextractf64x4 $1, %zmm0, %ymm1
	    vaddpd %zmm1, %zmm0, %zmm0
	    vextractf128 $1, %ymm0, %xmm1
	    vaddpd %xmm1, %xmm0, %xmm0
	    vpermilpd $1, %xmm0, %xmm1
	    vaddsd %xmm1, %xmm0, %xmm0
	    vxorpd %xmm1, %xmm1, %xmm1    <--- create 0.0
	    vaddsd %xmm1, %xmm0, %xmm0    <--- add it to the reduction result

	Alternatively (and I'm not sure where it is specified), we could replace the default 0.0 argument with -0.0?
clang/lib/Headers/avx512fintrin.h
9300	This is an existing text bug, but if we are changing this text, we might as well fix it in this patch - I'm not sure what "off" refers to here. Should that be "order"?
9303	Typo: "floating-point types"
9304	Also mention that sign of zero is indeterminate. We might use the LangRef text as a model for what to say here: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic

spatel added inline comments. (Feb 8 2021, 8:51 AM)

clang/lib/Headers/avx512fintrin.h
9352	Ah - this is where the +0.0 is specified. This should be -0.0. We could still add 'nsz' flag to be safe.
9362	This also should be changed to -0.0?

Address Sanjay's comments. Thanks for the thorough review!

clang/lib/CodeGen/CGBuiltin.cpp
13829	Confirmed by new tests.
13829	I think there's no such assumption for fadd/fmul instructions. We do have it for fmin/fmax. So I think we don't need to add nsz here.
clang/lib/Headers/avx512fintrin.h
9304	Got it. Thanks!
9352	-0.0 can fix the problem. But we don't need to add 'nsz'. We can add it if we can find a corner case.

LGTM

This revision is now accepted and ready to land. (Feb 9 2021, 4:52 AM)

Harbormaster completed remote builds in B88443: Diff 322344. (Feb 9 2021, 5:12 AM)

This revision was landed with ongoing or failed builds. (Feb 9 2021, 5:14 AM)

Closed by commit rGdd2460ed5d77: [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd. (authored by Wang, Pengfei <pengfei.wang@intel.com>).

This revision was automatically updated to reflect the committed changes.

pengfei added a commit: rGdd2460ed5d77: [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.

qiucf mentioned this in D101209: [PowerPC] Provide fastmath sqrt and div functions in altivec.h. (Apr 25 2021, 9:31 PM)

Revision Contents

Path	Size
clang/lib/CodeGen/CGBuiltin.cpp	2 lines
clang/lib/Headers/avx512fintrin.h	3 lines
clang/test/CodeGen/X86/avx512-reduceIntrin.c	16 lines

Diff 322015

clang/lib/CodeGen/CGBuiltin.cpp

[13,820 lines not shown]
   case X86::BI__builtin_ia32_reduce_and_q512: {
     Function *F =
         CGM.getIntrinsic(Intrinsic::vector_reduce_and, Ops[0]->getType());
     return Builder.CreateCall(F, {Ops[0]});
   }
   case X86::BI__builtin_ia32_reduce_fadd_pd512:
   case X86::BI__builtin_ia32_reduce_fadd_ps512: {
     Function *F =
         CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType());
+    Builder.getFastMathFlags().setAllowReassoc(true);
     return Builder.CreateCall(F, {Ops[0], Ops[1]});
   }
   case X86::BI__builtin_ia32_reduce_fmul_pd512:
   case X86::BI__builtin_ia32_reduce_fmul_ps512: {
     Function *F =
         CGM.getIntrinsic(Intrinsic::vector_reduce_fmul, Ops[1]->getType());
+    Builder.getFastMathFlags().setAllowReassoc(true);
     return Builder.CreateCall(F, {Ops[0], Ops[1]});
   }
   case X86::BI__builtin_ia32_reduce_mul_d512:
   case X86::BI__builtin_ia32_reduce_mul_q512: {
     Function *F =
         CGM.getIntrinsic(Intrinsic::vector_reduce_mul, Ops[0]->getType());
     return Builder.CreateCall(F, {Ops[0]});
   }
[3,802 lines not shown]

clang/lib/Headers/avx512fintrin.h

 /*===---- avx512fintrin.h - AVX512F intrinsics -----------------------------===
  *
  * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  * See https://llvm.org/LICENSE.txt for license information.
  * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  *
  *===-----------------------------------------------------------------------===
  */
 #ifndef __IMMINTRIN_H
 #error "Never use <avx512fintrin.h> directly; include <immintrin.h> instead."
 #endif

 #ifndef __AVX512FINTRIN_H
 #define __AVX512FINTRIN_H

 typedef char __v64qi __attribute__((__vector_size__(64)));
 typedef short __v32hi __attribute__((__vector_size__(64)));
 typedef double __v8df __attribute__((__vector_size__(64)));
[178 lines not shown]

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_undefined_epi32(void)
 {
   return (__m512i)__builtin_ia32_undef512();
 }

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_broadcastd_epi32 (__m128i __A)
 {
   return (__m512i)__builtin_shufflevector((__v4si) __A, (__v4si) __A,
                                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
 }

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_mask_broadcastd_epi32 (__m512i __O, __mmask16 __M, __m128i __A)
 {
   return (__m512i)__builtin_ia32_selectd_512(__M,
                                              (__v16si) _mm512_broadcastd_epi32(__A),
                                              (__v16si) __O);
 }

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_maskz_broadcastd_epi32 (__mmask16 __M, __m128i __A)
 {
   return (__m512i)__builtin_ia32_selectd_512(__M,
                                              (__v16si) _mm512_broadcastd_epi32(__A),
                                              (__v16si) _mm512_setzero_si512());
 }

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_broadcastq_epi64 (__m128i __A)
 {
   return (__m512i)__builtin_shufflevector((__v2di) __A, (__v2di) __A,
                                           0, 0, 0, 0, 0, 0, 0, 0);
 }

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_mask_broadcastq_epi64 (__m512i __O, __mmask8 __M, __m128i __A)
 {
   return (__m512i)__builtin_ia32_selectq_512(__M,
                                              (__v8di) _mm512_broadcastq_epi64(__A),
                                              (__v8di) __O);

 }

 static __inline__ __m512i __DEFAULT_FN_ATTRS512
 _mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A)
 {
   return (__m512i)__builtin_ia32_selectq_512(__M,
                                              (__v8di) _mm512_broadcastq_epi64(__A),
                                              (__v8di) _mm512_setzero_si512());
 }


 static __inline __m512 __DEFAULT_FN_ATTRS512
[74 lines not shown]
 _mm512_maskz_set1_epi64(__mmask8 __M, long long __A)
 {
   return (__m512i)__builtin_ia32_selectq_512(__M,
                                              (__v8di)_mm512_set1_epi64(__A),
                                              (__v8di)_mm512_setzero_si512());
 }

 static __inline__ __m512 __DEFAULT_FN_ATTRS512
 _mm512_broadcastss_ps(__m128 __A)
 {
   return (__m512)__builtin_shufflevector((__v4sf) __A, (__v4sf) __A,
                                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
 }

 static __inline __m512i __DEFAULT_FN_ATTRS512
 _mm512_set4_epi32 (int __A, int __B, int __C, int __D)
 {
   return __extension__ (__m512i)(__v16si)
    { __D, __C, __B, __A, __D, __C, __B, __A,
[31 lines not shown]

 #define _mm512_setr4_pd(e0,e1,e2,e3) \
   _mm512_set4_pd((e3),(e2),(e1),(e0))

 #define _mm512_setr4_ps(e0,e1,e2,e3) \
   _mm512_set4_ps((e3),(e2),(e1),(e0))

 static __inline__ __m512d __DEFAULT_FN_ATTRS512
 _mm512_broadcastsd_pd(__m128d __A)
 {
   return (__m512d)__builtin_shufflevector((__v2df) __A, (__v2df) __A,
                                           0, 0, 0, 0, 0, 0, 0, 0);
 }

 /* Cast between vector types */

 static __inline __m512d __DEFAULT_FN_ATTRS512
 _mm512_castpd256_pd512(__m256d __a)
 {
   return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, -1, -1, -1, -1);
 }

 static __inline __m512 __DEFAULT_FN_ATTRS512
 _mm512_castps256_ps512(__m256 __a)
 {
   return __builtin_shufflevector(__a, __a, 0, 1, 2, 3, 4, 5, 6, 7,
                                  -1, -1, -1, -1, -1, -1, -1, -1);
 }

 static __inline __m128d __DEFAULT_FN_ATTRS512
 _mm512_castpd512_pd128(__m512d __a)
 {
   return __builtin_shufflevector(__a, __a, 0, 1);
 }

 static __inline __m256d __DEFAULT_FN_ATTRS512
 _mm512_castpd512_pd256 (__m512d __A)
 {
   return __builtin_shufflevector(__A, __A, 0, 1, 2, 3);
 }

[8,874 lines not shown]
 static __inline__ __m512d __DEFAULT_FN_ATTRS512
 _mm512_mask_abs_pd(__m512d __W, __mmask8 __K, __m512d __A)
 {
   return (__m512d)_mm512_mask_and_epi64((__v8di)__W, __K, _mm512_set1_epi64(0x7FFFFFFFFFFFFFFF),(__v8di)__A);
 }

 /* Vector-reduction arithmetic accepts vectors as inputs and produces scalars as
  * outputs. This class of vector operation forms the basis of many scientific
  * computations. In vector-reduction arithmetic, the evaluation off is
  * independent of the order of the input elements of V.

+ * For floating points type, we always assume the elements are reassociable even
+ * if -fast-math is off.

  * Used bisection method. At each step, we partition the vector with previous
  * step in half, and the operation is performed on its two halves.
  * This takes log2(n) steps where n is the number of elements in the vector.
  */

 static __inline__ long long __DEFAULT_FN_ATTRS512 _mm512_reduce_add_epi64(__m512i __W) {
   return __builtin_ia32_reduce_add_q512(__W);
 }
[30 lines not shown]

 static __inline__ long long __DEFAULT_FN_ATTRS512
 _mm512_mask_reduce_or_epi64(__mmask8 __M, __m512i __W) {
   __W = _mm512_maskz_mov_epi64(__M, __W);
   return __builtin_ia32_reduce_or_q512(__W);
 }

 static __inline__ double __DEFAULT_FN_ATTRS512 _mm512_reduce_add_pd(__m512d __W) {
   return __builtin_ia32_reduce_fadd_pd512(0.0, __W);
 }

 static __inline__ double __DEFAULT_FN_ATTRS512 _mm512_reduce_mul_pd(__m512d __W) {
   return __builtin_ia32_reduce_fmul_pd512(1.0, __W);
 }

 static __inline__ double __DEFAULT_FN_ATTRS512
 _mm512_mask_reduce_add_pd(__mmask8 __M, __m512d __W) {
   __W = _mm512_maskz_mov_pd(__M, __W);
   return __builtin_ia32_reduce_fadd_pd512(0.0, __W);
 }

 static __inline__ double __DEFAULT_FN_ATTRS512
 _mm512_mask_reduce_mul_pd(__mmask8 __M, __m512d __W) {
   __W = _mm512_mask_mov_pd(_mm512_set1_pd(1.0), __M, __W);
   return __builtin_ia32_reduce_fmul_pd512(1.0, __W);
 }

[245 lines not shown]

clang/test/CodeGen/X86/avx512-reduceIntrin.c

[109 lines not shown]
   // CHECK: bitcast i16 %{{.*}} to <16 x i1>
   // CHECK: select <16 x i1> %{{.*}}, <16 x i32> %{{.*}}, <16 x i32> %{{.*}}
   // CHECK: call i32 @llvm.vector.reduce.or.v16i32(<16 x i32> %{{.*}})
   return _mm512_mask_reduce_or_epi32(__M, __W);
 }

 double test_mm512_reduce_add_pd(__m512d __W){
   // CHECK-LABEL: @test_mm512_reduce_add_pd(
-  // CHECK: call double @llvm.vector.reduce.fadd.v8f64(double 0.000000e+00, <8 x double> %{{.*}})
+  // CHECK: call reassoc double @llvm.vector.reduce.fadd.v8f64(double 0.000000e+00, <8 x double> %{{.*}})
   return _mm512_reduce_add_pd(__W);
 }

 double test_mm512_reduce_mul_pd(__m512d __W){
   // CHECK-LABEL: @test_mm512_reduce_mul_pd(
-  // CHECK: call double @llvm.vector.reduce.fmul.v8f64(double 1.000000e+00, <8 x double> %{{.*}})
+  // CHECK: call reassoc double @llvm.vector.reduce.fmul.v8f64(double 1.000000e+00, <8 x double> %{{.*}})
   return _mm512_reduce_mul_pd(__W);
 }

 float test_mm512_reduce_add_ps(__m512 __W){
   // CHECK-LABEL: @test_mm512_reduce_add_ps(
-  // CHECK: call float @llvm.vector.reduce.fadd.v16f32(float 0.000000e+00, <16 x float> %{{.*}})
+  // CHECK: call reassoc float @llvm.vector.reduce.fadd.v16f32(float 0.000000e+00, <16 x float> %{{.*}})
   return _mm512_reduce_add_ps(__W);
 }

 float test_mm512_reduce_mul_ps(__m512 __W){
   // CHECK-LABEL: @test_mm512_reduce_mul_ps(
-  // CHECK: call float @llvm.vector.reduce.fmul.v16f32(float 1.000000e+00, <16 x float> %{{.*}})
+  // CHECK: call reassoc float @llvm.vector.reduce.fmul.v16f32(float 1.000000e+00, <16 x float> %{{.*}})
   return _mm512_reduce_mul_ps(__W);
 }

 double test_mm512_mask_reduce_add_pd(__mmask8 __M, __m512d __W){
   // CHECK-LABEL: @test_mm512_mask_reduce_add_pd(
   // CHECK: bitcast i8 %{{.*}} to <8 x i1>
   // CHECK: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
-  // CHECK: call double @llvm.vector.reduce.fadd.v8f64(double 0.000000e+00, <8 x double> %{{.*}})
+  // CHECK: call reassoc double @llvm.vector.reduce.fadd.v8f64(double 0.000000e+00, <8 x double> %{{.*}})
   return _mm512_mask_reduce_add_pd(__M, __W);
 }

 double test_mm512_mask_reduce_mul_pd(__mmask8 __M, __m512d __W){
   // CHECK-LABEL: @test_mm512_mask_reduce_mul_pd(
   // CHECK: bitcast i8 %{{.*}} to <8 x i1>
   // CHECK: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
-  // CHECK: call double @llvm.vector.reduce.fmul.v8f64(double 1.000000e+00, <8 x double> %{{.*}})
+  // CHECK: call reassoc double @llvm.vector.reduce.fmul.v8f64(double 1.000000e+00, <8 x double> %{{.*}})
   return _mm512_mask_reduce_mul_pd(__M, __W);
 }

 float test_mm512_mask_reduce_add_ps(__mmask16 __M, __m512 __W){
   // CHECK-LABEL: @test_mm512_mask_reduce_add_ps(
   // CHECK: bitcast i16 %{{.*}} to <16 x i1>
   // CHECK: select <16 x i1> %{{.*}}, <16 x float> {{.*}}, <16 x float> {{.*}}
-  // CHECK: call float @llvm.vector.reduce.fadd.v16f32(float 0.000000e+00, <16 x float> %{{.*}})
+  // CHECK: call reassoc float @llvm.vector.reduce.fadd.v16f32(float 0.000000e+00, <16 x float> %{{.*}})
   return _mm512_mask_reduce_add_ps(__M, __W);
 }

 float test_mm512_mask_reduce_mul_ps(__mmask16 __M, __m512 __W){
   // CHECK-LABEL: @test_mm512_mask_reduce_mul_ps(
   // CHECK: bitcast i16 %{{.*}} to <16 x i1>
   // CHECK: select <16 x i1> %{{.*}}, <16 x float> {{.*}}, <16 x float> %{{.*}}
-  // CHECK: call float @llvm.vector.reduce.fmul.v16f32(float 1.000000e+00, <16 x float> %{{.*}})
+  // CHECK: call reassoc float @llvm.vector.reduce.fmul.v16f32(float 1.000000e+00, <16 x float> %{{.*}})
   return _mm512_mask_reduce_mul_ps(__M, __W);
 }