This is an archive of the discontinued LLVM Phabricator instance.

[AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors.
ClosedPublic

Authored by craig.topper on Aug 21 2017, 5:51 PM.


Details

Reviewers

guyblank
zvi
RKSimon
spatel
delena

Commits

rGd0c62f909fa4: Merging r311572: --------------------------------------------------------------…
rG853a8d9ffcf7: [AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors
rL311593: Merging r311572:
rL311572: [AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors

Summary

There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.

On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.

This fixes PR34139.
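The TESTM side note can be illustrated with a scalar model. This is a minimal sketch, not LLVM code: `testm` and `testmSameOperand` are hypothetical helper names, and the model assumes the usual VPTESTM semantics, where mask bit i is set when the AND of the two source elements is nonzero. With identical operands, `x & x == x`, so the test reduces to a per-element nonzero check, and for an all-ones constant the resulting mask is known to be all ones.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a VPTESTM-style test on N 64-bit elements:
// mask bit i is set when (lhs[i] & rhs[i]) != 0.
template <std::size_t N>
std::uint16_t testm(const std::array<std::uint64_t, N>& lhs,
                    const std::array<std::uint64_t, N>& rhs) {
  static_assert(N <= 16, "mask must fit in 16 bits");
  std::uint16_t mask = 0;
  for (std::size_t i = 0; i < N; ++i)
    if ((lhs[i] & rhs[i]) != 0)
      mask |= static_cast<std::uint16_t>(1u << i);
  return mask;
}

// Hypothetical simplified form for LHS == RHS: since x & x == x,
// the test is just "element is nonzero".
template <std::size_t N>
std::uint16_t testmSameOperand(const std::array<std::uint64_t, N>& x) {
  static_assert(N <= 16, "mask must fit in 16 bits");
  std::uint16_t mask = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (x[i] != 0)
      mask |= static_cast<std::uint16_t>(1u << i);
  return mask;
}
```

In the generated code for the test case, both TESTM operands are the same all-ones register, which is the situation this model covers.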

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision.Aug 21 2017, 5:51 PM

delena added a subscriber: delena.Aug 22 2017, 11:37 PM

delena added inline comments.

test/CodeGen/X86/pr34139.ll
13 ↗	(On Diff #112097)	Could you please explain how a <16 x double> value is stored using one ZMM instruction?

zvi added inline comments.Aug 23 2017, 6:40 AM

lib/Target/X86/X86ISelLowering.cpp
30679 ↗	(On Diff #112097)	Any chance that due to the added bail-out we will be missing out on this combine?

craig.topper added inline comments.Aug 23 2017, 9:11 AM

lib/Target/X86/X86ISelLowering.cpp
30679 ↗	(On Diff #112097)	This combine runs only in the very last DAG combine. The one above runs in the earlier DAG combines. So I don't think there's an issue. If there were, I think the early out on BitWidth==1 above would be much worse.
test/CodeGen/X86/pr34139.ll
13 ↗	(On Diff #112097)	I think it's because the IR is using a store to undef as its address, so I think we sort of merged the stores. If I put in a real address we get two stores. I'll try to unreduce the test case a little.

Use a less reduced test case so that we still get multiple stores

delena accepted this revision.Aug 23 2017, 9:28 AM

This revision is now accepted and ready to land.Aug 23 2017, 9:28 AM

Closed by commit rL311572: [AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors (authored by ctopper).Aug 23 2017, 9:42 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	3 lines
llvm/trunk/test/CodeGen/X86/pr34139.ll	24 lines

Diff 112391

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ if (N->getOpcode() == ISD::VSELECT && DCI.isBeforeLegalizeOps() &&
       if (VT.getVectorElementType() == MVT::i16)
         return SDValue();
       // Dynamic blending was only available from SSE4.1 onward.
       if (VT.is128BitVector() && !Subtarget.hasSSE41())
         return SDValue();
       // Byte blends are only available in AVX2
       if (VT == MVT::v32i8 && !Subtarget.hasAVX2())
         return SDValue();
+      // There are no 512-bit blend instructions that use sign bits.
+      if (VT.is512BitVector())
+        return SDValue();

       assert(BitWidth >= 8 && BitWidth <= 64 && "Invalid mask size");
       APInt DemandedMask(APInt::getSignMask(BitWidth));
       KnownBits Known;
       TargetLowering::TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(),
                                             !DCI.isBeforeLegalizeOps());
       if (TLI.ShrinkDemandedConstant(Cond, DemandedMask, TLO) ||
           TLI.SimplifyDemandedBits(Cond, DemandedMask, Known, TLO)) {
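The early returns in the hunk above can be read as a small predicate over the vector type and the available features. The following is a simplified, self-contained model, not the LLVM implementation: `VecType`, `Features`, `canFormSignBitBlend`, and `signMask` are hypothetical names, and the model covers only the checks visible in this hunk (the surrounding function has further conditions that are not shown here).

```cpp
#include <cstdint>

// Hypothetical stand-in for a vector value type; not LLVM's MVT/EVT.
struct VecType {
  unsigned elemBits;  // bits per element
  unsigned numElems;  // number of elements
  unsigned totalBits() const { return elemBits * numElems; }
};

// Hypothetical stand-in for the subtarget feature queries used in the hunk.
struct Features {
  bool hasSSE41;
  bool hasAVX2;
};

// Mirrors the order of the early returns in the hunk: returns false
// wherever the real code bails out with an empty SDValue.
bool canFormSignBitBlend(VecType vt, Features f) {
  if (vt.elemBits == 16)
    return false;  // i16 elements are rejected
  if (vt.totalBits() == 128 && !f.hasSSE41)
    return false;  // dynamic blending needs SSE4.1
  if (vt.elemBits == 8 && vt.numElems == 32 && !f.hasAVX2)
    return false;  // v32i8 byte blends need AVX2
  if (vt.totalBits() == 512)
    return false;  // check added by this revision
  return true;
}

// Only the sign bit of each condition element is demanded, which is
// the role APInt::getSignMask(BitWidth) plays in the hunk.
std::uint64_t signMask(unsigned bitWidth) {
  return std::uint64_t{1} << (bitWidth - 1);
}
```

Under this model, an 8 x 64-bit (512-bit) type is rejected regardless of features, while a 4 x 64-bit (256-bit) type still passes these checks.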

llvm/trunk/test/CodeGen/X86/pr34139.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=knl | FileCheck %s

define void @f_f(<16 x double>* %ptr) {
; CHECK-LABEL: f_f:
; CHECK: # BB#0:
; CHECK-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
; CHECK-NEXT: vmovdqa %xmm0, (%rax)
; CHECK-NEXT: vpternlogd $255, %zmm0, %zmm0, %zmm0
; CHECK-NEXT: vmovapd (%rdi), %zmm1
; CHECK-NEXT: vmovapd 64(%rdi), %zmm2
; CHECK-NEXT: vptestmq %zmm0, %zmm0, %k1
; CHECK-NEXT: vmovapd %zmm0, %zmm1 {%k1}
; CHECK-NEXT: vmovapd %zmm0, %zmm2 {%k1}
; CHECK-NEXT: vmovapd %zmm2, 64(%rdi)
; CHECK-NEXT: vmovapd %zmm1, (%rdi)
  store <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8>* undef
  %load_mask8.i.i.i = load <16 x i8>, <16 x i8>* undef
  %v.i.i.i.i = load <16 x double>, <16 x double>* %ptr
  %mask_vec_i1.i.i.i51.i.i = icmp ne <16 x i8> %load_mask8.i.i.i, zeroinitializer
  %v1.i.i.i.i = select <16 x i1> %mask_vec_i1.i.i.i51.i.i, <16 x double> undef, <16 x double> %v.i.i.i.i
  store <16 x double> %v1.i.i.i.i, <16 x double>* %ptr
  unreachable
}