This is an archive of the discontinued LLVM Phabricator instance.

[X86] Disable commuting for the first source operand of zero masked scalar fma intrinsic instructions.
ClosedPublic

Authored by craig.topper on Mar 3 2020, 7:40 AM.


Details

Reviewers

RKSimon
spatel
LiuChen3

Commits

rG6ca96765c7e6: [X86] Disable commuting for the first source operand of zero masked scalar fma…

Summary

I believe this is the correct fix for D75506 rather than disabling all commuting. We can still commute the remaining two sources.
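
To illustrate why the first source is special, here is a minimal sketch of a zero-masked scalar FMA on a 4-lane vector. It assumes the usual scalar semantics: lane 0 holds the masked FMA result and the upper lanes pass through from the first source. `Vec4` and `maskzScalarFma` are hypothetical names for this illustration and are not part of LLVM.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// Hypothetical model of a zero-masked scalar FMA intrinsic instruction
// (213 form). Lane 0 is mask ? b[0]*a[0]+c[0] : 0. Lanes 1..3 are copied
// from the first source `a`, which is why `a` cannot be swapped freely.
Vec4 maskzScalarFma(const Vec4 &a, bool mask, const Vec4 &b, const Vec4 &c) {
  Vec4 r = a;                                        // upper lanes come from a
  r[0] = mask ? std::fma(b[0], a[0], c[0]) : 0.0f;   // zero masking on lane 0
  return r;
}
```

In this model, swapping `a` and `b` leaves lane 0 unchanged because multiplication commutes, but the upper lanes then come from a different register, so the full vector result differs.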

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Mar 3 2020, 7:40 AM

Herald added a project: Restricted Project. Mar 3 2020, 7:40 AM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B47911: Diff 247902. Mar 3 2020, 8:00 AM

spatel added inline comments. Mar 3 2020, 8:58 AM

llvm/test/CodeGen/X86/avx512-intrinsics.ll

5816

Not too familiar with this code path, but we can shrink this test a bit and still crash:

define <4 x float> @test_int_x86_avx512_maskz_vfmadd_ss_load0(i1 zeroext %t0, <4 x float>* nocapture readonly %t1, float %t2, float %t3) {
  %t5 = load <4 x float>, <4 x float>* %t1, align 16
  %t6 = extractelement <4 x float> %t5, i64 0
  %t9 = tail call float @llvm.fma.f32(float %t6, float %t2, float %t3) #2
  %t12 = select i1 %t0, float %t9, float 0.0
  %t13 = insertelement <4 x float> %t5, float %t12, i64 0
  ret <4 x float> %t13
}

Simplify test

RKSimon added inline comments. Mar 3 2020, 10:03 AM

llvm/test/CodeGen/X86/avx512-intrinsics.ll
5832	(sidenote) losing the mask register from the asm comment is really bad.......

LGTM

This revision is now accepted and ready to land. Mar 3 2020, 11:08 AM

LiuChen3 mentioned this in D75506: [X86] Fix bug: Scalar FMA intrinsics generate wrong result. Mar 4 2020, 5:39 PM

Fixed in 6ca96765c7e6f63b45e6c311918a648ef684ea20 but I mistyped the Differential Revision line

Revision Contents

Path	Size
llvm/lib/Target/X86/X86InstrInfo.cpp	4 lines
llvm/test/CodeGen/X86/avx512-intrinsics.ll	31 lines

Diff 247902

llvm/lib/Target/X86/X86InstrInfo.cpp


X86InstrInfo::findThreeSrcCommutedOpIndices(const MachineInstr &MI,
bool IsIntrinsic) const {		bool IsIntrinsic) const {
uint64_t TSFlags = MI.getDesc().TSFlags;		uint64_t TSFlags = MI.getDesc().TSFlags;

unsigned FirstCommutableVecOp = 1;		unsigned FirstCommutableVecOp = 1;
unsigned LastCommutableVecOp = 3;		unsigned LastCommutableVecOp = 3;
unsigned KMaskOp = -1U;		unsigned KMaskOp = -1U;
if (X86II::isKMasked(TSFlags)) {		if (X86II::isKMasked(TSFlags)) {
// For k-zero-masked operations it is Ok to commute the first vector		// For k-zero-masked operations it is Ok to commute the first vector
// operand.		// operand. Unless this is an intrinsic instruction.
// For regular k-masked operations a conservative choice is done as the		// For regular k-masked operations a conservative choice is done as the
// elements of the first vector operand, for which the corresponding bit		// elements of the first vector operand, for which the corresponding bit
// in the k-mask operand is set to 0, are copied to the result of the		// in the k-mask operand is set to 0, are copied to the result of the
// instruction.		// instruction.
// TODO/FIXME: The commute still may be legal if it is known that the		// TODO/FIXME: The commute still may be legal if it is known that the
// k-mask operand is set to either all ones or all zeroes.		// k-mask operand is set to either all ones or all zeroes.
// It is also Ok to commute the 1st operand if all users of MI use only		// It is also Ok to commute the 1st operand if all users of MI use only
// the elements enabled by the k-mask operand. For example,		// the elements enabled by the k-mask operand. For example,
// v4 = VFMADD213PSZrk v1, k, v2, v3; // v1[i] = k[i] ? v2[i]*v1[i]+v3[i]		// v4 = VFMADD213PSZrk v1, k, v2, v3; // v1[i] = k[i] ? v2[i]*v1[i]+v3[i]
// : v1[i];		// : v1[i];
// VMOVAPSZmrk <mem_addr>, k, v4; // this is the ONLY user of v4 ->		// VMOVAPSZmrk <mem_addr>, k, v4; // this is the ONLY user of v4 ->
// // Ok, to commute v1 in FMADD213PSZrk.		// // Ok, to commute v1 in FMADD213PSZrk.

// The k-mask operand has index = 2 for masked and zero-masked operations.		// The k-mask operand has index = 2 for masked and zero-masked operations.
KMaskOp = 2;		KMaskOp = 2;

// The operand with index = 1 is used as a source for those elements for		// The operand with index = 1 is used as a source for those elements for
// which the corresponding bit in the k-mask is set to 0.		// which the corresponding bit in the k-mask is set to 0.
if (X86II::isKMergeMasked(TSFlags))		if (X86II::isKMergeMasked(TSFlags) || IsIntrinsic)
FirstCommutableVecOp = 3;		FirstCommutableVecOp = 3;

LastCommutableVecOp++;		LastCommutableVecOp++;
} else if (IsIntrinsic) {		} else if (IsIntrinsic) {
// Commuting the first operand of an intrinsic instruction isn't possible		// Commuting the first operand of an intrinsic instruction isn't possible
// unless we can prove that only the lowest element of the result is used.		// unless we can prove that only the lowest element of the result is used.
FirstCommutableVecOp = 2;		FirstCommutableVecOp = 2;
}		}
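
The operand-range selection after this change can be sketched as a standalone function. This is a simplified model and not the LLVM code itself: the three boolean parameters stand in for `X86II::isKMasked(TSFlags)`, `X86II::isKMergeMasked(TSFlags)`, and `IsIntrinsic`, while `CommuteRange` and `commutableVecOps` are hypothetical names introduced here.

```cpp
// Hypothetical result type: inclusive range of commutable vector operand
// indices. For k-masked instructions the k-mask operand sits at index 2
// and is handled separately, as in the diff.
struct CommuteRange {
  unsigned First;
  unsigned Last;
};

CommuteRange commutableVecOps(bool IsKMasked, bool IsKMergeMasked,
                              bool IsIntrinsic) {
  unsigned First = 1;
  unsigned Last = 3;
  if (IsKMasked) {
    // Merge masking reads operand 1 for masked-off elements, and intrinsic
    // (scalar) forms read it for the upper elements, so skip it in both cases.
    if (IsKMergeMasked || IsIntrinsic)
      First = 3;
    ++Last; // the k-mask at index 2 shifts the last vector operand by one
  } else if (IsIntrinsic) {
    First = 2;
  }
  return {First, Last};
}
```

Under this model a zero-masked intrinsic instruction yields the range 3 to 4, so only the two remaining sources can be commuted, whereas a zero-masked non-intrinsic instruction still yields 1 to 4.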

llvm/test/CodeGen/X86/avx512-intrinsics.ll


; X86-NEXT: retl
%13 = bitcast i8 %x3 to <8 x i1>		%13 = bitcast i8 %x3 to <8 x i1>
%14 = extractelement <8 x i1> %13, i64 0		%14 = extractelement <8 x i1> %13, i64 0
%15 = select i1 %14, float %12, float 0.000000e+00		%15 = select i1 %14, float %12, float 0.000000e+00
%16 = insertelement <4 x float> %x0, float %15, i64 0		%16 = insertelement <4 x float> %x0, float %15, i64 0
%res2 = fadd <4 x float> %8, %16		%res2 = fadd <4 x float> %8, %16
ret <4 x float> %8		ret <4 x float> %8
}		}

		; Make sure we don't commute this to fold the load as that source isn't commutable.
		define <4 x float> @test_int_x86_avx512_maskz_vfmadd_ss_load0(i8 zeroext %0, <4 x float>* nocapture readonly %1, <4 x float> %2, <4 x float> %3) {
		; X64-LABEL: test_int_x86_avx512_maskz_vfmadd_ss_load0:
		; X64: # %bb.0:
		; X64-NEXT: vmovaps (%rsi), %xmm2
		; X64-NEXT: kmovw %edi, %k1
		; X64-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm0 * xmm2) + xmm1
		; X64-NEXT: vmovaps %xmm2, %xmm0
		; X64-NEXT: retq
		;
		; X86-LABEL: test_int_x86_avx512_maskz_vfmadd_ss_load0:
		; X86: # %bb.0:
		; X86-NEXT: movb {{[0-9]+}}(%esp), %al
		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
		; X86-NEXT: vmovaps (%ecx), %xmm2
		; X86-NEXT: kmovw %eax, %k1
		; X86-NEXT: vfmadd213ss {{.*#+}} xmm2 = (xmm0 * xmm2) + xmm1
		; X86-NEXT: vmovaps %xmm2, %xmm0
		; X86-NEXT: retl
		%5 = load <4 x float>, <4 x float>* %1, align 16
		%6 = extractelement <4 x float> %5, i64 0
		%7 = extractelement <4 x float> %2, i64 0
		%8 = extractelement <4 x float> %3, i64 0
		%9 = tail call float @llvm.fma.f32(float %6, float %7, float %8) #2
		%10 = bitcast i8 %0 to <8 x i1>
		%11 = extractelement <8 x i1> %10, i64 0
		%12 = select i1 %11, float %9, float 0.000000e+00
		%13 = insertelement <4 x float> %5, float %12, i64 0
		ret <4 x float> %13
		}

define <2 x double>@test_int_x86_avx512_mask3_vfmadd_sd(<2 x double> %x0, <2 x double> %x1, <2 x double> %x2, i8 %x3,i32 %x4 ){		define <2 x double>@test_int_x86_avx512_mask3_vfmadd_sd(<2 x double> %x0, <2 x double> %x1, <2 x double> %x2, i8 %x3,i32 %x4 ){
; X64-LABEL: test_int_x86_avx512_mask3_vfmadd_sd:		; X64-LABEL: test_int_x86_avx512_mask3_vfmadd_sd:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vmovapd %xmm2, %xmm3		; X64-NEXT: vmovapd %xmm2, %xmm3
; X64-NEXT: vfmadd231sd {{.*#+}} xmm3 = (xmm0 * xmm1) + xmm3		; X64-NEXT: vfmadd231sd {{.*#+}} xmm3 = (xmm0 * xmm1) + xmm3
; X64-NEXT: kmovw %edi, %k1		; X64-NEXT: kmovw %edi, %k1
; X64-NEXT: vmovapd %xmm2, %xmm4		; X64-NEXT: vmovapd %xmm2, %xmm4
; X64-NEXT: vfmadd231sd {{.*#+}} xmm4 = (xmm0 * xmm1) + xmm4		; X64-NEXT: vfmadd231sd {{.*#+}} xmm4 = (xmm0 * xmm1) + xmm4
