This is an archive of the discontinued LLVM Phabricator instance.

Differential D119111

[X86] Invert a vector select IR canonicalization with a binop identity constant
Abandoned · Public

Authored by LuoYuanke on Feb 7 2022, 12:33 AM.

Details

Reviewers

spatel
RKSimon
craig.topper
pengfei
xbolva00

Summary

This patch follows https://reviews.llvm.org/D118644 and inverts the vector select
canonicalization for fmul and fdiv in the X86 backend when AVX512 is available.
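
As a rough illustration of what "inverting" means here, the following is a lane-wise model in plain C++, not LLVM code. It assumes the fold has the shape quoted in the diff below (binop N0, (vselect Cond, IDC, FVal) --> vselect Cond, N0, (binop N0, FVal)), written here with the identity constant 1.0 in the false arm as in the fmul_v4f32 test. The function names are hypothetical.

```cpp
#include <array>
#include <cassert> // for the assert-based usage line shown after the block
#include <cstddef>

// Hypothetical lane-wise model; none of these names exist in LLVM.

// Canonical IR form: r = x * select(cond, y, 1.0)
template <std::size_t N>
std::array<float, N> fmulOfSelect(const std::array<bool, N> &cond,
                                  const std::array<float, N> &x,
                                  const std::array<float, N> &y) {
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    const float s = cond[i] ? y[i] : 1.0f; // select with the fmul identity
    r[i] = x[i] * s;
  }
  return r;
}

// Inverted form: r = select(cond, x * y, x)
template <std::size_t N>
std::array<float, N> selectOfFmul(const std::array<bool, N> &cond,
                                  const std::array<float, N> &x,
                                  const std::array<float, N> &y) {
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = cond[i] ? x[i] * y[i] : x[i];
  return r;
}

// Returns true if both forms agree on one small sample input.
bool foldPreservesResult() {
  const std::array<bool, 4> cond{true, false, true, false};
  const std::array<float, 4> x{1.5f, -2.0f, 0.0f, 8.0f};
  const std::array<float, 4> y{4.0f, 3.0f, -7.0f, 0.5f};
  return fmulOfSelect(cond, x, y) == selectOfFmul(cond, x, y);
}
```

A caller could check the model with assert(foldPreservesResult());. The second form is the one that maps onto a single masked multiply on targets with predicated vector instructions.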

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

LuoYuanke created this revision. (Feb 7 2022, 12:33 AM)

Herald added subscribers: pengfei, hiraditya. (Feb 7 2022, 12:33 AM)

LuoYuanke requested review of this revision. (Feb 7 2022, 12:33 AM)

Herald added a project: Restricted Project. (Feb 7 2022, 12:33 AM)

Herald added a subscriber: llvm-commits.

LuoYuanke added reviewers: spatel, RKSimon, craig.topper, pengfei, xbolva00. (Feb 7 2022, 12:34 AM)

Harbormaster completed remote builds in B147889: Diff 406341. (Feb 7 2022, 1:10 AM)

pengfei added inline comments. (Feb 7 2022, 1:10 AM)

llvm/lib/Target/X86/X86ISelLowering.cpp
48950	`/`
49012–49015	Isn't this equivalent to `return combineBinopWithSelect(N, DAG, Subtarget)`?
53883	Or call `combineBinopWithSelect` directly?
llvm/test/CodeGen/X86/vector-bo-select.ll
345–347	Just curious: this should be equivalent to `vmulps %zmm2, %zmm1, %zmm1 {%k1}` followed by `vmovaps %zmm1, %zmm0`. Why do we sometimes get one form and sometimes the other?

LuoYuanke added inline comments. (Feb 7 2022, 1:41 AM)

llvm/lib/Target/X86/X86ISelLowering.cpp
48950	Sorry, I don't understand this comment.
49012–49015	OK, I'll update it.
53883	I prefer to follow the current coding convention, so that when there is more to combine we can extend the code in the sub-function.
llvm/test/CodeGen/X86/vector-bo-select.ll
345–347	I guess it is because sometimes the operands are commuted or swapped, and sometimes they are not.
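
One way to read the two instruction sequences discussed above is through a small model of AVX-512 merge-masking. The sketch below is plain C++ with hypothetical helper names; it models only the lane semantics and is not a claim about how the instruction selector actually decides. Under this model, a single merge-masked multiply is enough when the identity constant sits in the false arm of the select, while the "_swap" tests (identity in the true arm) have the opposite mask polarity and fit an unmasked multiply followed by a masked move.

```cpp
#include <array>
#include <cassert> // for the assert-based usage lines shown after the block
#include <cstddef>

using Vec = std::array<float, 4>;
using Mask = std::array<bool, 4>;

// Models a merge-masked multiply such as "vmulps b, a, dst {%k1}":
// lanes whose mask bit is set receive a*b, the others keep dst's old value.
Vec maskedMul(const Mask &k, const Vec &a, const Vec &b, Vec dst) {
  for (std::size_t i = 0; i < dst.size(); ++i)
    if (k[i])
      dst[i] = a[i] * b[i];
  return dst;
}

// Models a merge-masked move such as "vmovaps src, dst {%k1}".
Vec maskedMove(const Mask &k, const Vec &src, Vec dst) {
  for (std::size_t i = 0; i < dst.size(); ++i)
    if (k[i])
      dst[i] = src[i];
  return dst;
}

// r = x * (k ? y : 1.0) == k ? x*y : x
// One masked multiply that merges into the register already holding x.
Vec identityInFalseArm(const Mask &k, const Vec &x, const Vec &y) {
  return maskedMul(k, x, y, /*dst=*/x);
}

// r = x * (k ? 1.0 : y) == k ? x : x*y   (the "_swap" shape)
// Unmasked multiply, then a masked move of x over the product.
Vec identityInTrueArm(const Mask &k, const Vec &x, const Vec &y) {
  const Mask all{true, true, true, true};
  const Vec prod = maskedMul(all, x, y, Vec{});
  return maskedMove(k, x, prod);
}
```

For example, with k = {1,0,1,0}, x = {1,2,3,4} and y = {10,20,30,40}, identityInFalseArm yields {10,2,90,4} and identityInTrueArm yields {1,40,3,160}. Using the single masked multiply for the swapped shape would need the complemented mask, which is consistent with the guess that the difference comes from commuted or swapped operands.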

pengfei added inline comments. (Feb 7 2022, 1:43 AM)

llvm/lib/Target/X86/X86ISelLowering.cpp
48950	`X / 1.0 --> X`

LuoYuanke added inline comments. (Feb 7 2022, 1:47 AM)

llvm/lib/Target/X86/X86ISelLowering.cpp
48950	Got it. :)
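
The identity constants named in the code comments (X + -0.0, X - 0.0, X * 1.0, X / 1.0) can be checked with a few lines of standalone C++. This is only a sketch under default IEEE-754 round-to-nearest arithmetic (no fast-math); the enum and helper names are hypothetical and are not LLVM APIs. It also shows why the sign of zero matters for FADD and FSUB.

```cpp
#include <cassert> // for the assert-based usage lines shown after the block
#include <cmath>

enum class FPOp { Add, Sub, Mul, Div };

// Equal value and equal sign bit, so +0.0 and -0.0 are told apart.
bool sameValueAndSign(double a, double b) {
  return a == b && std::signbit(a) == std::signbit(b);
}

double apply(FPOp op, double x, double c) {
  switch (op) {
  case FPOp::Add: return x + c;
  case FPOp::Sub: return x - c;
  case FPOp::Mul: return x * c;
  case FPOp::Div: return x / c;
  }
  return x; // unreachable for valid FPOp values
}

// True if "x op c" reproduces x exactly for a few samples, including both
// signed zeros. A sample-based check, not a proof.
bool isIdentityFor(FPOp op, double c) {
  const double samples[] = {0.0, -0.0, 1.5, -3.0};
  for (double x : samples)
    if (!sameValueAndSign(apply(op, x, c), x))
      return false;
  return true;
}
```

With this helper, isIdentityFor(FPOp::Add, -0.0), isIdentityFor(FPOp::Sub, 0.0), isIdentityFor(FPOp::Mul, 1.0) and isIdentityFor(FPOp::Div, 1.0) all hold, while isIdentityFor(FPOp::Add, 0.0) fails because -0.0 + 0.0 is +0.0, and isIdentityFor(FPOp::Sub, -0.0) fails for the same reason.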

Address Phoebe's comments.

Harbormaster completed remote builds in B147898: Diff 406353. (Feb 7 2022, 2:35 AM)

Thanks for working on this.
I have a patch to move the existing code over to DAGCombiner with a TLI hook that is only enabled for x86 currently, so it would be effectively NFC.
That would allow us to avoid adding code for custom combining fmul/fdiv to x86 (just add cases for the new opcodes in DAGCombiner).
I'm not sure if one way is more efficient than the other, but let me post that for review.
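
The design described above (generic combine code guarded by a target hook) can be sketched in standalone C++. Everything in this block is a hypothetical stand-in: the class names, the hook name, and the idea that the target keys its answer on masked-vector support are assumptions made for illustration, not the interface from D119150.

```cpp
#include <cassert> // for the assert-based usage lines shown after the block

enum class Opcode { FADD, FSUB, FMUL, FDIV };

// Hypothetical stand-in for a target-lowering interface.
// The default implementation opts out, so existing targets see no change.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool shouldFoldSelectWithIdentity(Opcode) const { return false; }
};

// Hypothetical target that opts in only when masked vector ops exist.
struct MaskedVectorTarget : TargetHooks {
  bool hasMaskedOps;
  explicit MaskedVectorTarget(bool masked) : hasMaskedOps(masked) {}
  bool shouldFoldSelectWithIdentity(Opcode) const override {
    return hasMaskedOps;
  }
};

// Generic combiner side: the list of supported opcodes lives here once,
// and the target only answers yes or no. Supporting a new opcode means
// adding a case here rather than a new target-specific combine function.
bool genericCombinerWouldFold(const TargetHooks &TLI, Opcode Op) {
  switch (Op) {
  case Opcode::FADD:
  case Opcode::FSUB:
  case Opcode::FMUL:
  case Opcode::FDIV:
    return TLI.shouldFoldSelectWithIdentity(Op);
  }
  return false;
}
```

In this model, genericCombinerWouldFold(TargetHooks{}, Opcode::FMUL) is false, while genericCombinerWouldFold(MaskedVectorTarget{true}, Opcode::FDIV) is true. That mirrors the "effectively NFC" framing: moving the code changes nothing until a target opts in.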

spatel mentioned this in D119150: [SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC. (Feb 7 2022, 9:08 AM)

See D119150.

I think this patch is good either way, but there would be less diffs if we move the existing code over as a first step.

In D119111#3301516, @spatel wrote:

See D119150.

I think this patch is good either way, but there would be less diffs if we move the existing code over as a first step.

OK, we can add support for more operators on top of D119150. I'll abandon this patch.

LuoYuanke abandoned this revision. (Feb 7 2022, 6:37 PM)

spatel mentioned this in D90113: [DAGCombiner] Fold BinOp into Select containing identity constant. (Feb 8 2022, 5:16 AM)

spatel mentioned this in rGa68e09802470: [SDAG] move x86 select-with-identity-constant fold behind a target hook; NFC. (Feb 8 2022, 6:55 AM)

spatel mentioned this in rG905abc5b7db2: [SDAG] enable binop identity constant folds for fmul/fdiv. (Feb 8 2022, 7:52 AM)

Added FMUL/FDIV to the updated code here:
905abc5b7db2 (test diffs are identical)

In D119111#3304769, @spatel wrote:

Added FMUL/FDIV to the updated code here:
905abc5b7db2 (test diffs are identical)

Thanks, Sanjay.

Revision Contents

Path                                         Size
llvm/lib/Target/X86/X86ISelLowering.cpp      17 lines
llvm/test/CodeGen/X86/vector-bo-select.ll    39 lines

Diff 406353

llvm/lib/Target/X86/X86ISelLowering.cpp

[... 2,252 lines not shown ...]
   setTargetDAGCombine(ISD::MUL);
   setTargetDAGCombine(ISD::XOR);
   setTargetDAGCombine(ISD::MSCATTER);
   setTargetDAGCombine(ISD::MGATHER);
   setTargetDAGCombine(ISD::FP16_TO_FP);
   setTargetDAGCombine(ISD::FP_EXTEND);
   setTargetDAGCombine(ISD::STRICT_FP_EXTEND);
   setTargetDAGCombine(ISD::FP_ROUND);
+  setTargetDAGCombine(ISD::FMUL);
+  setTargetDAGCombine(ISD::FDIV);

   computeRegisterProperties(Subtarget.getRegisterInfo());

   MaxStoresPerMemset = 16; // For @llvm.memset -> sequence of stores
   MaxStoresPerMemsetOptSize = 8;
   MaxStoresPerMemcpy = 8; // For @llvm.memcpy -> sequence of stores
   MaxStoresPerMemcpyOptSize = 4;
   MaxStoresPerMemmove = 8; // For @llvm.memmove -> sequence of stores
[... 46,670 lines not shown ...] static SDValue foldSelectWithIdentityConstant(SDNode *N, SelectionDAG &DAG,
   // TODO: With fast-math (NSZ), allow the opposite-sign form of zero?
   auto isIdentityConstantForOpcode = [](unsigned Opcode, SDValue V) {
     if (ConstantFPSDNode *C = isConstOrConstSplatFP(V)) {
       switch (Opcode) {
       case ISD::FADD: // X + -0.0 --> X
         return C->isZero() && C->isNegative();
       case ISD::FSUB: // X - 0.0 --> X
         return C->isZero() && !C->isNegative();
+      case ISD::FMUL: // X * 1.0 --> X
+      case ISD::FDIV: // X / 1.0 --> X
+        return C->isExactlyValue(1.0);
       }
     }
     return false;
   };

   // This transform increases uses of N0, so freeze it to be safe.
   // binop N0, (vselect Cond, IDC, FVal) --> vselect Cond, N0, (binop N0, FVal)
   if (isIdentityConstantForOpcode(Opcode, TVal)) {
[... 42 lines not shown ...] if (SDValue COp = combineFaddCFmul(N, DAG, Subtarget))
     return COp;

   if (SDValue Sel = combineBinopWithSelect(N, DAG, Subtarget))
     return Sel;

   return SDValue();
 }

+static SDValue combineFmul(SDNode *N, SelectionDAG &DAG,
+                           const X86Subtarget &Subtarget) {
+  return combineBinopWithSelect(N, DAG, Subtarget);
+}
+
+static SDValue combineFdiv(SDNode *N, SelectionDAG &DAG,
+                           const X86Subtarget &Subtarget) {
+  return combineBinopWithSelect(N, DAG, Subtarget);
+}
+
 /// Attempt to pre-truncate inputs to arithmetic ops if it will simplify
 /// the codegen.
 /// e.g. TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) )
 /// TODO: This overlaps with the generic combiner's visitTRUNCATE. Remove
 /// anything that is guaranteed to be transformed by DAGCombiner.
 static SDValue combineTruncatedArithmetic(SDNode *N, SelectionDAG &DAG,
                                           const X86Subtarget &Subtarget,
                                           const SDLoc &DL) {
[... 4,847 lines not shown ...] SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
   case ISD::SINT_TO_FP:
   case ISD::STRICT_SINT_TO_FP:
     return combineSIntToFP(N, DAG, DCI, Subtarget);
   case ISD::UINT_TO_FP:
   case ISD::STRICT_UINT_TO_FP:
     return combineUIntToFP(N, DAG, Subtarget);
   case ISD::FADD:
   case ISD::FSUB: return combineFaddFsub(N, DAG, Subtarget);
+  case ISD::FMUL: return combineFmul(N, DAG, Subtarget);
+  case ISD::FDIV: return combineFdiv(N, DAG, Subtarget);
   case X86ISD::VFCMULC:
   case X86ISD::VFMULC: return combineFMulcFCMulc(N, DAG, Subtarget);
   case ISD::FNEG: return combineFneg(N, DAG, DCI, Subtarget);
   case ISD::TRUNCATE: return combineTruncate(N, DAG, Subtarget);
   case X86ISD::VTRUNC: return combineVTRUNC(N, DAG, DCI);
   case X86ISD::ANDNP: return combineAndnp(N, DAG, DCI, Subtarget);
   case X86ISD::FAND: return combineFAnd(N, DAG, Subtarget);
   case X86ISD::FANDN: return combineFAndn(N, DAG, Subtarget);
[... 1,429 lines not shown ...]

Lint: Pre-merge checks: clang-format: please reformat the code (on the added FMUL/FDIV cases)
-  case ISD::FMUL: return combineFmul(N, DAG, Subtarget);
-  case ISD::FDIV: return combineFdiv(N, DAG, Subtarget);
+  case ISD::FMUL:
+    return combineFmul(N, DAG, Subtarget);
+  case ISD::FDIV:
+    return combineFdiv(N, DAG, Subtarget);

llvm/test/CodeGen/X86/vector-bo-select.ll

[... 273 lines not shown ...]
 ; AVX512F-NEXT: vmulps %xmm0, %xmm1, %xmm0
 ; AVX512F-NEXT: vzeroupper
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VL-LABEL: fmul_v4f32:
 ; AVX512VL: # %bb.0:
 ; AVX512VL-NEXT: vpslld $31, %xmm0, %xmm0
 ; AVX512VL-NEXT: vptestmd %xmm0, %xmm0, %k1
-; AVX512VL-NEXT: vbroadcastss {{.*#+}} xmm0 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512VL-NEXT: vmovaps %xmm2, %xmm0 {%k1}
-; AVX512VL-NEXT: vmulps %xmm0, %xmm1, %xmm0
+; AVX512VL-NEXT: vmulps %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512VL-NEXT: vmovaps %xmm1, %xmm0
 ; AVX512VL-NEXT: retq
   %s = select <4 x i1> %b, <4 x float> %y, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
   %r = fmul <4 x float> %x, %s
   ret <4 x float> %r
 }

 define <8 x float> @fmul_v8f32_commute(<8 x i1> %b, <8 x float> noundef %x, <8 x float> noundef %y) {
 ; AVX2-LABEL: fmul_v8f32_commute:
[... 16 lines not shown ...]
 ; AVX512F-NEXT: vmulps %ymm1, %ymm0, %ymm0
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VL-LABEL: fmul_v8f32_commute:
 ; AVX512VL: # %bb.0:
 ; AVX512VL-NEXT: vpmovsxwd %xmm0, %ymm0
 ; AVX512VL-NEXT: vpslld $31, %ymm0, %ymm0
 ; AVX512VL-NEXT: vptestmd %ymm0, %ymm0, %k1
-; AVX512VL-NEXT: vbroadcastss {{.*#+}} ymm0 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512VL-NEXT: vmovaps %ymm2, %ymm0 {%k1}
-; AVX512VL-NEXT: vmulps %ymm1, %ymm0, %ymm0
+; AVX512VL-NEXT: vmulps %ymm2, %ymm1, %ymm1 {%k1}
+; AVX512VL-NEXT: vmovaps %ymm1, %ymm0
 ; AVX512VL-NEXT: retq
   %s = select <8 x i1> %b, <8 x float> %y, <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>
   %r = fmul <8 x float> %s, %x
   ret <8 x float> %r
 }

 define <16 x float> @fmul_v16f32_swap(<16 x i1> %b, <16 x float> noundef %x, <16 x float> noundef %y) {
 ; AVX2-LABEL: fmul_v16f32_swap:
[... 11 lines not shown ...]
 ; AVX2-NEXT: vmulps %ymm4, %ymm2, %ymm1
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: fmul_v16f32_swap:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: vpmovsxbd %xmm0, %zmm0
 ; AVX512-NEXT: vpslld $31, %zmm0, %zmm0
 ; AVX512-NEXT: vptestmd %zmm0, %zmm0, %k1
-; AVX512-NEXT: vbroadcastss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2 {%k1}
 ; AVX512-NEXT: vmulps %zmm2, %zmm1, %zmm0
+; AVX512-NEXT: vmovaps %zmm1, %zmm0 {%k1}
 ; AVX512-NEXT: retq
   %s = select <16 x i1> %b, <16 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, <16 x float> %y
   %r = fmul <16 x float> %x, %s
   ret <16 x float> %r
 }

 define <16 x float> @fmul_v16f32_commute_swap(<16 x i1> %b, <16 x float> noundef %x, <16 x float> noundef %y) {
 ; AVX2-LABEL: fmul_v16f32_commute_swap:
 ; AVX2: # %bb.0:
[... 10 lines not shown ...]
 ; AVX2-NEXT: vmulps %ymm2, %ymm4, %ymm1
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: fmul_v16f32_commute_swap:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: vpmovsxbd %xmm0, %zmm0
 ; AVX512-NEXT: vpslld $31, %zmm0, %zmm0
 ; AVX512-NEXT: vptestmd %zmm0, %zmm0, %k1
-; AVX512-NEXT: vbroadcastss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2 {%k1}
-; AVX512-NEXT: vmulps %zmm1, %zmm2, %zmm0
+; AVX512-NEXT: vmulps %zmm2, %zmm1, %zmm0
+; AVX512-NEXT: vmovaps %zmm1, %zmm0 {%k1}
 ; AVX512-NEXT: retq
   %s = select <16 x i1> %b, <16 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, <16 x float> %y
   %r = fmul <16 x float> %s, %x
   ret <16 x float> %r
 }

 define <4 x float> @fdiv_v4f32(<4 x i1> %b, <4 x float> noundef %x, <4 x float> noundef %y) {
 ; AVX2-LABEL: fdiv_v4f32:
[... 14 lines not shown ...]
 ; AVX512F-NEXT: vdivps %xmm0, %xmm1, %xmm0
 ; AVX512F-NEXT: vzeroupper
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VL-LABEL: fdiv_v4f32:
 ; AVX512VL: # %bb.0:
 ; AVX512VL-NEXT: vpslld $31, %xmm0, %xmm0
 ; AVX512VL-NEXT: vptestmd %xmm0, %xmm0, %k1
-; AVX512VL-NEXT: vbroadcastss {{.*#+}} xmm0 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512VL-NEXT: vmovaps %xmm2, %xmm0 {%k1}
-; AVX512VL-NEXT: vdivps %xmm0, %xmm1, %xmm0
+; AVX512VL-NEXT: vdivps %xmm2, %xmm1, %xmm1 {%k1}
+; AVX512VL-NEXT: vmovaps %xmm1, %xmm0
 ; AVX512VL-NEXT: retq
   %s = select <4 x i1> %b, <4 x float> %y, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
   %r = fdiv <4 x float> %x, %s
   ret <4 x float> %r
 }

 define <8 x float> @fdiv_v8f32_commute(<8 x i1> %b, <8 x float> noundef %x, <8 x float> noundef %y) {
 ; AVX2-LABEL: fdiv_v8f32_commute:
[... 46 lines not shown ...]
 ; AVX2-NEXT: vdivps %ymm4, %ymm2, %ymm1
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: fdiv_v16f32_swap:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: vpmovsxbd %xmm0, %zmm0
 ; AVX512-NEXT: vpslld $31, %zmm0, %zmm0
 ; AVX512-NEXT: vptestmd %zmm0, %zmm0, %k1
-; AVX512-NEXT: vbroadcastss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm2 {%k1}
 ; AVX512-NEXT: vdivps %zmm2, %zmm1, %zmm0
+; AVX512-NEXT: vmovaps %zmm1, %zmm0 {%k1}
 ; AVX512-NEXT: retq
   %s = select <16 x i1> %b, <16 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, <16 x float> %y
   %r = fdiv <16 x float> %x, %s
   ret <16 x float> %r
 }

 define <16 x float> @fdiv_v16f32_commute_swap(<16 x i1> %b, <16 x float> noundef %x, <16 x float> noundef %y) {
 ; AVX2-LABEL: fdiv_v16f32_commute_swap:
[... 357 lines not shown ...]
 ; AVX512F-NEXT: vbroadcastss {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
 ; AVX512F-NEXT: vmovaps %zmm1, %zmm2 {%k1}
 ; AVX512F-NEXT: vmulps %ymm2, %ymm0, %ymm0
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VL-LABEL: fmul_v8f32_cast_cond:
 ; AVX512VL: # %bb.0:
 ; AVX512VL-NEXT: kmovw %edi, %k1
-; AVX512VL-NEXT: vbroadcastss {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512VL-NEXT: vmovaps %ymm1, %ymm2 {%k1}
-; AVX512VL-NEXT: vmulps %ymm2, %ymm0, %ymm0
+; AVX512VL-NEXT: vmulps %ymm1, %ymm0, %ymm0 {%k1}
 ; AVX512VL-NEXT: retq
   %b = bitcast i8 %pb to <8 x i1>
   %s = select <8 x i1> %b, <8 x float> %y, <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>
   %r = fmul <8 x float> %x, %s
   ret <8 x float> %r
 }

 define <8 x double> @fmul_v8f64_cast_cond(i8 noundef zeroext %pb, <8 x double> noundef %x, <8 x double> noundef %y) {
[... 47 lines not shown ...]
 ; AVX2-NEXT: vblendvpd %ymm4, %ymm2, %ymm6, %ymm2
 ; AVX2-NEXT: vmulpd %ymm2, %ymm0, %ymm0
 ; AVX2-NEXT: vmulpd %ymm3, %ymm1, %ymm1
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: fmul_v8f64_cast_cond:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: kmovw %edi, %k1
-; AVX512-NEXT: vbroadcastsd {{.*#+}} zmm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512-NEXT: vmovapd %zmm1, %zmm2 {%k1}
-; AVX512-NEXT: vmulpd %zmm2, %zmm0, %zmm0
+; AVX512-NEXT: vmulpd %zmm1, %zmm0, %zmm0 {%k1}
 ; AVX512-NEXT: retq
   %b = bitcast i8 %pb to <8 x i1>
   %s = select <8 x i1> %b, <8 x double> %y, <8 x double> <double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0>
   %r = fmul <8 x double> %x, %s
   ret <8 x double> %r
 }

 define <8 x float> @fdiv_v8f32_cast_cond(i8 noundef zeroext %pb, <8 x float> noundef %x, <8 x float> noundef %y) {
[... 56 lines not shown ...]
 ; AVX512F-NEXT: vbroadcastss {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
 ; AVX512F-NEXT: vmovaps %zmm1, %zmm2 {%k1}
 ; AVX512F-NEXT: vdivps %ymm2, %ymm0, %ymm0
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VL-LABEL: fdiv_v8f32_cast_cond:
 ; AVX512VL: # %bb.0:
 ; AVX512VL-NEXT: kmovw %edi, %k1
-; AVX512VL-NEXT: vbroadcastss {{.*#+}} ymm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512VL-NEXT: vmovaps %ymm1, %ymm2 {%k1}
-; AVX512VL-NEXT: vdivps %ymm2, %ymm0, %ymm0
+; AVX512VL-NEXT: vdivps %ymm1, %ymm0, %ymm0 {%k1}
 ; AVX512VL-NEXT: retq
   %b = bitcast i8 %pb to <8 x i1>
   %s = select <8 x i1> %b, <8 x float> %y, <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>
   %r = fdiv <8 x float> %x, %s
   ret <8 x float> %r
 }

 define <8 x double> @fdiv_v8f64_cast_cond(i8 noundef zeroext %pb, <8 x double> noundef %x, <8 x double> noundef %y) {
[... 47 lines not shown ...]
 ; AVX2-NEXT: vblendvpd %ymm4, %ymm2, %ymm6, %ymm2
 ; AVX2-NEXT: vdivpd %ymm2, %ymm0, %ymm0
 ; AVX2-NEXT: vdivpd %ymm3, %ymm1, %ymm1
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: fdiv_v8f64_cast_cond:
 ; AVX512: # %bb.0:
 ; AVX512-NEXT: kmovw %edi, %k1
-; AVX512-NEXT: vbroadcastsd {{.*#+}} zmm2 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0,1.0E+0]
-; AVX512-NEXT: vmovapd %zmm1, %zmm2 {%k1}
-; AVX512-NEXT: vdivpd %zmm2, %zmm0, %zmm0
+; AVX512-NEXT: vdivpd %zmm1, %zmm0, %zmm0 {%k1}
 ; AVX512-NEXT: retq
   %b = bitcast i8 %pb to <8 x i1>
   %s = select <8 x i1> %b, <8 x double> %y, <8 x double> <double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0, double 1.0>
   %r = fdiv <8 x double> %x, %s
   ret <8 x double> %r
 }