Details

Reviewers

lebedev.ri
RKSimon
craig.topper
efriedma

Commits

rG0d2a0b44c812: [VectorCombine] scalarize binop of inserted elements into vector constants

Summary

As with the extractelement patterns that are currently in vector-combine, there are going to be several possible variations on this theme. This should be the clearest, simplest example.

Scalarization is the right direction for target-independent canonicalization, and InstCombine has some of those folds already, but it doesn't do this. I proposed a similar transform in D50992. Here in vector-combine, we can check the cost model to be sure it's profitable, so there should be less risk.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. May 5 2020, 3:58 PM

Herald added a project: Restricted Project. May 5 2020, 3:58 PM

Herald added subscribers: hiraditya, mcrosier.

Seems fairly uncontroversial to me.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
320	Hm, do we have an interface to ask for cost of `InsertElement` with variable insert index?
340–341	`// We want to scalarize unless vector variant actually has lower cost`
357	This should retain the name from `I`.

This revision is now accepted and ready to land. May 6 2020, 12:25 AM

RKSimon added inline comments. May 6 2020, 2:41 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
320	Yes, we set Index to -1 in the getVectorInstrCost call if it's unknown/variable - we don't do much with it in x86 at least, though.

Random question - what about special case undef handling for the upper elements?

opt -instcombine

define i32 @square()  {
    %1 = xor i32 undef, undef
    ret i32 %1
}
->
define i32 @square() {
  ret i32 0
}

In D79452#2022321, @RKSimon wrote:
Random question - what about special case undef handling for the upper elements?

opt -instcombine
define i32 @square()  {
    %1 = xor i32 undef, undef
    ret i32 %1
}
->
define i32 @square() {
  ret i32 0
}

cc'ing @aqjune @nlopes @regehr

undef ^ undef is actually undef:
http://volta.cs.utah.edu:8080/z/GAs5QJ

But I think we choose to fold to "0" on that to avoid breaking too much existing source code and/or expose possible semantic incompatibility with source code?

In the original vector code here, we're not performing the binop on the *same* value because 'undef' is an arbitrary bit pattern in each instruction:
http://volta.cs.utah.edu:8080/z/eqrVkm

I was assuming that we don't have the same concern about vector code as the scalar code here, but we could use ConstantFolding to generate the result for each element of the output vector.

But that will definitely need to be handled when we generalize this to deal with non-undef constants with something like the code from D50992.
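
As an informal illustration of the point above (plain C++, not LLVM code): if each `undef` operand is modeled as a value that may be picked independently at each use, then the xor of two picks can be made to equal any value, whereas the xor of one value with itself is always zero. The function names are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Models 'xor undef, undef' where each operand is an independent pick.
// PickA and PickB are stand-ins for the arbitrary bit patterns.
uint32_t xorOfTwoPicks(uint32_t PickA, uint32_t PickB) { return PickA ^ PickB; }

// For any desired result there is a pair of picks that produces it
// (here: Target and 0), so the result is not constrained to one value.
bool canProduce(uint32_t Target) { return xorOfTwoPicks(Target, 0) == Target; }

// Contrast: xor of the *same* value with itself is always zero.
uint32_t xorOfSameValue(uint32_t X) { return X ^ X; }
```

Under this model, folding to 0 corresponds to choosing equal picks for both operands, which is one allowed outcome among many.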

Patch updated - no logic changes but:

Added code comment to make cost metric explicit - scalarize unless original vector code is cheaper.
Added/retained value names to improve debugging experience (tests regenerated to show diffs).
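
The cost metric described in this update can be sketched as a standalone comparison. This is only an illustration of the rule "scalarize unless the original vector code is cheaper", using the same old/new cost sums as the diff; `Costs` and `shouldScalarize` are hypothetical names, and the integers stand in for whatever the target's cost model would report.

```cpp
#include <cassert>

// Hypothetical stand-in for the values the patch queries from
// TargetTransformInfo; the numbers used with it are made up.
struct Costs {
  int ScalarOp; // cost of the scalar binop
  int VectorOp; // cost of the vector binop
  int Insert;   // cost of one insertelement at the matched index
};

// Old sequence: two inserts feeding a vector binop.
// New sequence: one scalar binop feeding a single insert.
// Scalarize unless the old sequence is strictly cheaper.
bool shouldScalarize(const Costs &C) {
  int OldCost = C.Insert + C.Insert + C.VectorOp;
  int NewCost = C.ScalarOp + C.Insert;
  return !(OldCost < NewCost);
}
```

Note that a tie goes to the scalar form, and that when inserts are free the decision reduces to comparing the scalar op cost against the vector op cost.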

In D79452#2022567, @spatel wrote:

undef ^ undef is actually undef:
http://volta.cs.utah.edu:8080/z/GAs5QJ

But I think we choose to fold to "0" on that to avoid breaking too much existing source code and/or expose possible semantic incompatibility with source code?

https://groups.google.com/forum/#!topic/llvm-dev/C5ydxnn-r0o

In D79452#2022815, @spatel wrote:

https://groups.google.com/forum/#!topic/llvm-dev/C5ydxnn-r0o

This seems interesting.

I modified llvm/lib/IR/ConstantFold.cpp to return undef on xor undef undef, and it did not miscompile testsuite & SPEC on my machine, at least.
I can make a patch for this.

aqjune mentioned this in D79528: [ConstantFold] Optimize xor undef, undef to undef. May 6 2020, 4:59 PM

In D79452#2022567, @spatel wrote:

I was assuming that we don't have the same concern about vector code as the scalar code here, but we could use ConstantFolding to generate the result for each element of the output vector.

Inserting into a binop(undef, undef) constant fold makes sense as a safer option.

In D79452#2024909, @RKSimon wrote:

Inserting into a binop(undef, undef) constant fold makes sense as a safer option.

Ok - we can generalize the base vector constant matching then. I'll add some tests and update the code.

Patch updated:

Loosen pattern matching to allow any vector constant rather than just undef.
Use constant folding to generate the new base vector for insertion.
Add tests/comments.
Add TODO code comments for potential enhancements.
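
A rough model of the updated transform, written as plain C++ rather than LLVM IR: the new base vector is the lane-wise fold of the two constant vectors, and the scalar result is inserted over the matched lane. All names here are hypothetical, vectors are modeled as two-element arrays, and undef lanes are deliberately not modeled, so every constant lane must be a defined value with a nonzero divisor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec2 = std::array<int64_t, 2>;
using ScalarOp = int64_t (*)(int64_t, int64_t);

// Signed division that truncates toward zero, like C++ '/'.
int64_t sdiv(int64_t A, int64_t B) { return A / B; }

// Models insertelement: returns a copy of V with lane Idx replaced by S.
Vec2 insertElement(Vec2 V, int64_t S, std::size_t Idx) {
  V[Idx] = S;
  return V;
}

// Applies Op independently to each lane.
Vec2 laneWise(ScalarOp Op, const Vec2 &A, const Vec2 &B) {
  return {Op(A[0], B[0]), Op(A[1], B[1])};
}

// vec_bo (inselt C0, X, Idx), (inselt C1, Y, Idx)
Vec2 originalForm(ScalarOp Op, const Vec2 &C0, const Vec2 &C1, int64_t X,
                  int64_t Y, std::size_t Idx) {
  return laneWise(Op, insertElement(C0, X, Idx), insertElement(C1, Y, Idx));
}

// inselt (fold C0 op C1), (scalar_bo X, Y), Idx
Vec2 scalarizedForm(ScalarOp Op, const Vec2 &C0, const Vec2 &C1, int64_t X,
                    int64_t Y, std::size_t Idx) {
  Vec2 NewBase = laneWise(Op, C0, C1); // stands in for constant folding
  return insertElement(NewBase, Op(X, Y), Idx);
}
```

With the constants from the sdiv test, `<42, -42>` and `<-7, 128>`, the folded base is `<-6, 0>`, and both forms agree for any defined inputs with a nonzero divisor.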

spatel mentioned this in rG666c61db7962: [VectorCombine] add tests for insert into arbitrary constant; NFC. May 7 2020, 7:54 AM

(requesting re-review because the logic changed slightly)

RKSimon added inline comments. May 7 2020, 8:40 AM

llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102	Can we ever have a base vector with undef elements here? sdiv/udiv won't like that but I'm not sure if we can ever get in that state.
	%i0 = insertelement <2 x i64> <i64 42, i64 undef>, i64 %x, i64 1
	%i1 = insertelement <2 x i64> <i64 -7, i64 undef>, i64 %y, i32 1

spatel marked an inline comment as done. May 7 2020, 10:11 AM

spatel added inline comments.

llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102	Yes, it's ok to have undefs in the lane that we're inserting into, and ConstantFolding will deal with that. Demanded elements will eventually replace the unused constants that we see in this test with undefs. We can't, however, have undefs in lanes we are not inserting into with div/rem - that would be immediate UB, so we would have simplified that before we reach here. But I can add tests with non-canonical IR to make sure there's nothing crazy happening here.

RKSimon added inline comments. May 7 2020, 10:37 AM

llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102	Thanks, I think I was thinking that an undef value had the same effect as divide by zero in sdiv/udiv (causing the entire vector to become undef).

spatel mentioned this in rG5d0f2fdfa52b: [VectorCombine] add tests with undefs; NFC. May 7 2020, 12:29 PM

spatel marked 3 inline comments as done. May 7 2020, 12:53 PM

spatel added inline comments.

llvm/test/Transforms/VectorCombine/X86/insert-binop.ll
102	It does, and I was wrong about this getting folded sooner. I thought that analysis existed, but we don't have an InstSimplify that tries to recursively find undef/zero divisor elements and zap the entire value. We just do a simple check on vector constants. So the transform here could mask that potential simplification (although that seems like a rare possibility). I'll update with some test examples.

Patch updated:
No code changes, but added 2 more tests with undef and udiv/urem to better demonstrate constant folding.

LGTM - cheers!

This revision is now accepted and ready to land. May 7 2020, 2:17 PM

Closed by commit rG0d2a0b44c812: [VectorCombine] scalarize binop of inserted elements into vector constants (authored by spatel). May 8 2020, 1:58 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in rG5f730b645d5a: [VectorCombine] account for extra uses in scalarization cost. May 11 2020, 12:56 PM

Diff 262947

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "vector-combine"		#define DEBUG_TYPE "vector-combine"
STATISTIC(NumVecCmp, "Number of vector compares formed");		STATISTIC(NumVecCmp, "Number of vector compares formed");
STATISTIC(NumVecBO, "Number of vector binops formed");		STATISTIC(NumVecBO, "Number of vector binops formed");
		STATISTIC(NumScalarBO, "Number of scalar binops formed");

static cl::opt<bool> DisableVectorCombine(		static cl::opt<bool> DisableVectorCombine(
"disable-vector-combine", cl::init(false), cl::Hidden,		"disable-vector-combine", cl::init(false), cl::Hidden,
cl::desc("Disable all vector combine transforms"));		cl::desc("Disable all vector combine transforms"));

static cl::opt<bool> DisableBinopExtractShuffle(		static cl::opt<bool> DisableBinopExtractShuffle(
"disable-binop-extract-shuffle", cl::init(false), cl::Hidden,		"disable-binop-extract-shuffle", cl::init(false), cl::Hidden,
cl::desc("Disable binop extract to shuffle transforms"));		cl::desc("Disable binop extract to shuffle transforms"));
▲ Show 20 Lines • Show All 258 Lines • ▼ Show 20 Lines	static bool foldBitcastShuf(Instruction &I, const TargetTransformInfo &TTI) {
IRBuilder<> Builder(&I);		IRBuilder<> Builder(&I);
Value *CastV = Builder.CreateBitCast(V, DestTy);		Value *CastV = Builder.CreateBitCast(V, DestTy);
Value *Shuf = Builder.CreateShuffleVector(CastV, UndefValue::get(DestTy),		Value *Shuf = Builder.CreateShuffleVector(CastV, UndefValue::get(DestTy),
NewMask);		NewMask);
I.replaceAllUsesWith(Shuf);		I.replaceAllUsesWith(Shuf);
return true;		return true;
}		}

		/// Match a vector binop instruction with inserted scalar operands and convert
		/// to scalar binop followed by insertelement.
		static bool scalarizeBinop(Instruction &I, const TargetTransformInfo &TTI) {
		  Instruction *Ins0, *Ins1;
		  if (!match(&I, m_BinOp(m_Instruction(Ins0), m_Instruction(Ins1))))
		    return false;

		  // TODO: Loosen restriction for one-use by adjusting cost equation.
		  // TODO: Deal with mismatched index constants and variable indexes?
		  Constant *VecC0, *VecC1;
		  Value *V0, *V1;
		  uint64_t Index;
		  if (!match(Ins0, m_OneUse(m_InsertElement(m_Constant(VecC0), m_Value(V0),
		                                            m_ConstantInt(Index)))) ||
		      !match(Ins1, m_OneUse(m_InsertElement(m_Constant(VecC1), m_Value(V1),
		                                            m_SpecificInt(Index)))))
		    return false;

		  Type *ScalarTy = V0->getType();
		  Type *VecTy = I.getType();
		  assert(VecTy->isVectorTy() && ScalarTy == V1->getType() &&
		         (ScalarTy->isIntegerTy() || ScalarTy->isFloatingPointTy()) &&
		         "Unexpected types for insert into binop");

		  Instruction::BinaryOps Opcode = cast<BinaryOperator>(&I)->getOpcode();
		  int ScalarOpCost = TTI.getArithmeticInstrCost(Opcode, ScalarTy);
		  int VectorOpCost = TTI.getArithmeticInstrCost(Opcode, VecTy);

		  // Get cost estimate for the insert element. This cost will factor into
		  // both sequences.
		  int InsertCost =
		      TTI.getVectorInstrCost(Instruction::InsertElement, VecTy, Index);
		  int OldCost = InsertCost + InsertCost + VectorOpCost;
		  int NewCost = ScalarOpCost + InsertCost;

		  // We want to scalarize unless the vector variant actually has lower cost.
		  if (OldCost < NewCost)
		    return false;

		  // vec_bo (inselt VecC0, V0, Index), (inselt VecC1, V1, Index) -->
		  // inselt NewVecC, (scalar_bo V0, V1), Index
		  ++NumScalarBO;
		  IRBuilder<> Builder(&I);
		  Value *Scalar = Builder.CreateBinOp(Opcode, V0, V1, I.getName() + ".scalar");

		  // All IR flags are safe to back-propagate. There is no potential for extra
		  // poison to be created by the scalar instruction.
		  if (auto *ScalarInst = dyn_cast<Instruction>(Scalar))
		    ScalarInst->copyIRFlags(&I);

		  // Fold the vector constants in the original vectors into a new base vector.
		  Constant *NewVecC = ConstantExpr::get(Opcode, VecC0, VecC1);
		  Value *Insert = Builder.CreateInsertElement(NewVecC, Scalar, Index);
		  I.replaceAllUsesWith(Insert);
		  Insert->takeName(&I);
		  return true;
		}

/// This is the entry point for all transforms. Pass manager differences are		/// This is the entry point for all transforms. Pass manager differences are
/// handled in the callers of this function.		/// handled in the callers of this function.
static bool runImpl(Function &F, const TargetTransformInfo &TTI,		static bool runImpl(Function &F, const TargetTransformInfo &TTI,
const DominatorTree &DT) {		const DominatorTree &DT) {
if (DisableVectorCombine)		if (DisableVectorCombine)
return false;		return false;

bool MadeChange = false;		bool MadeChange = false;
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
continue;		continue;
// Do not delete instructions under here and invalidate the iterator.		// Do not delete instructions under here and invalidate the iterator.
// Walk the block backwards for efficiency. We're matching a chain of		// Walk the block backwards for efficiency. We're matching a chain of
// use->defs, so we're more likely to succeed by starting from the bottom.		// use->defs, so we're more likely to succeed by starting from the bottom.
// TODO: It could be more efficient to remove dead instructions		// TODO: It could be more efficient to remove dead instructions
// iteratively in this loop rather than waiting until the end.		// iteratively in this loop rather than waiting until the end.
for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {		for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
if (isa<DbgInfoIntrinsic>(I))		if (isa<DbgInfoIntrinsic>(I))
continue;		continue;
MadeChange |= foldExtractExtract(I, TTI);		MadeChange |= foldExtractExtract(I, TTI);
MadeChange |= foldBitcastShuf(I, TTI);		MadeChange |= foldBitcastShuf(I, TTI);
		MadeChange |= scalarizeBinop(I, TTI);
}		}
}		}

// We're done with transforms, so remove dead instructions.		// We're done with transforms, so remove dead instructions.
if (MadeChange)		if (MadeChange)
for (BasicBlock &BB : F)		for (BasicBlock &BB : F)
SimplifyInstructionsInBlock(&BB);		SimplifyInstructionsInBlock(&BB);


llvm/test/Transforms/VectorCombine/X86/insert-binop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=SSE2 | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=SSE2 | FileCheck %s --check-prefixes=CHECK,SSE
	; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=AVX2 | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=AVX2 | FileCheck %s --check-prefixes=CHECK,AVX

	declare void @use(<4 x i32>)			declare void @use(<4 x i32>)

	; Eliminating an insert is profitable.			; Eliminating an insert is profitable.

	define <16 x i8> @ins0_ins0_add(i8 %x, i8 %y) {			define <16 x i8> @ins0_ins0_add(i8 %x, i8 %y) {
	; CHECK-LABEL: @ins0_ins0_add(			; CHECK-LABEL: @ins0_ins0_add(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <16 x i8> undef, i8 [[X:%.*]], i32 0			; CHECK-NEXT: [[R_SCALAR:%.*]] = add i8 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <16 x i8> undef, i8 [[Y:%.*]], i32 0			; CHECK-NEXT: [[R:%.*]] = insertelement <16 x i8> undef, i8 [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[R:%.*]] = add <16 x i8> [[I0]], [[I1]]
	; CHECK-NEXT: ret <16 x i8> [[R]]			; CHECK-NEXT: ret <16 x i8> [[R]]
	;			;
	%i0 = insertelement <16 x i8> undef, i8 %x, i32 0			%i0 = insertelement <16 x i8> undef, i8 %x, i32 0
	%i1 = insertelement <16 x i8> undef, i8 %y, i32 0			%i1 = insertelement <16 x i8> undef, i8 %y, i32 0
	%r = add <16 x i8> %i0, %i1			%r = add <16 x i8> %i0, %i1
	ret <16 x i8> %r			ret <16 x i8> %r
	}			}

	; Eliminating an insert is still profitable. Flags propagate. Mismatch types on index is ok.			; Eliminating an insert is still profitable. Flags propagate. Mismatch types on index is ok.

	define <8 x i16> @ins0_ins0_sub_flags(i16 %x, i16 %y) {			define <8 x i16> @ins0_ins0_sub_flags(i16 %x, i16 %y) {
	; CHECK-LABEL: @ins0_ins0_sub_flags(			; CHECK-LABEL: @ins0_ins0_sub_flags(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <8 x i16> undef, i16 [[X:%.*]], i8 5			; CHECK-NEXT: [[R_SCALAR:%.*]] = sub nuw nsw i16 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <8 x i16> undef, i16 [[Y:%.*]], i32 5			; CHECK-NEXT: [[R:%.*]] = insertelement <8 x i16> undef, i16 [[R_SCALAR]], i64 5
	; CHECK-NEXT: [[R:%.*]] = sub nuw nsw <8 x i16> [[I0]], [[I1]]
	; CHECK-NEXT: ret <8 x i16> [[R]]			; CHECK-NEXT: ret <8 x i16> [[R]]
	;			;
	%i0 = insertelement <8 x i16> undef, i16 %x, i8 5			%i0 = insertelement <8 x i16> undef, i16 %x, i8 5
	%i1 = insertelement <8 x i16> undef, i16 %y, i32 5			%i1 = insertelement <8 x i16> undef, i16 %y, i32 5
	%r = sub nsw nuw <8 x i16> %i0, %i1			%r = sub nsw nuw <8 x i16> %i0, %i1
	ret <8 x i16> %r			ret <8 x i16> %r
	}			}

				; The new vector constant is calculated by constant folding.
				; This is conservatively created as zero rather than undef for 'undef ^ undef'.

	define <2 x i64> @ins1_ins1_xor(i64 %x, i64 %y) {			define <2 x i64> @ins1_ins1_xor(i64 %x, i64 %y) {
	; CHECK-LABEL: @ins1_ins1_xor(			; CHECK-LABEL: @ins1_ins1_xor(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x i64> undef, i64 [[X:%.*]], i64 1			; CHECK-NEXT: [[R_SCALAR:%.*]] = xor i64 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x i64> undef, i64 [[Y:%.*]], i32 1			; CHECK-NEXT: [[R:%.*]] = insertelement <2 x i64> zeroinitializer, i64 [[R_SCALAR]], i64 1
	; CHECK-NEXT: [[R:%.*]] = xor <2 x i64> [[I0]], [[I1]]
	; CHECK-NEXT: ret <2 x i64> [[R]]			; CHECK-NEXT: ret <2 x i64> [[R]]
	;			;
	%i0 = insertelement <2 x i64> undef, i64 %x, i64 1			%i0 = insertelement <2 x i64> undef, i64 %x, i64 1
	%i1 = insertelement <2 x i64> undef, i64 %y, i32 1			%i1 = insertelement <2 x i64> undef, i64 %y, i32 1
	%r = xor <2 x i64> %i0, %i1			%r = xor <2 x i64> %i0, %i1
	ret <2 x i64> %r			ret <2 x i64> %r
	}			}

	; The inserts are free, but it's still better to scalarize.			; The inserts are free, but it's still better to scalarize.

	define <2 x double> @ins0_ins0_fadd(double %x, double %y) {			define <2 x double> @ins0_ins0_fadd(double %x, double %y) {
	; CHECK-LABEL: @ins0_ins0_fadd(			; CHECK-LABEL: @ins0_ins0_fadd(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x double> undef, double [[X:%.*]], i32 0			; CHECK-NEXT: [[R_SCALAR:%.*]] = fadd reassoc nsz double [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x double> undef, double [[Y:%.*]], i32 0			; CHECK-NEXT: [[R:%.*]] = insertelement <2 x double> undef, double [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[R:%.*]] = fadd reassoc nsz <2 x double> [[I0]], [[I1]]
	; CHECK-NEXT: ret <2 x double> [[R]]			; CHECK-NEXT: ret <2 x double> [[R]]
	;			;
	%i0 = insertelement <2 x double> undef, double %x, i32 0			%i0 = insertelement <2 x double> undef, double %x, i32 0
	%i1 = insertelement <2 x double> undef, double %y, i32 0			%i1 = insertelement <2 x double> undef, double %y, i32 0
	%r = fadd reassoc nsz <2 x double> %i0, %i1			%r = fadd reassoc nsz <2 x double> %i0, %i1
	ret <2 x double> %r			ret <2 x double> %r
	}			}

				; Negative test - mismatched indexes (but could fold this).

	define <16 x i8> @ins1_ins0_add(i8 %x, i8 %y) {			define <16 x i8> @ins1_ins0_add(i8 %x, i8 %y) {
	; CHECK-LABEL: @ins1_ins0_add(			; CHECK-LABEL: @ins1_ins0_add(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <16 x i8> undef, i8 [[X:%.*]], i32 1			; CHECK-NEXT: [[I0:%.*]] = insertelement <16 x i8> undef, i8 [[X:%.*]], i32 1
	; CHECK-NEXT: [[I1:%.*]] = insertelement <16 x i8> undef, i8 [[Y:%.*]], i32 0			; CHECK-NEXT: [[I1:%.*]] = insertelement <16 x i8> undef, i8 [[Y:%.*]], i32 0
	; CHECK-NEXT: [[R:%.*]] = add <16 x i8> [[I0]], [[I1]]			; CHECK-NEXT: [[R:%.*]] = add <16 x i8> [[I0]], [[I1]]
	; CHECK-NEXT: ret <16 x i8> [[R]]			; CHECK-NEXT: ret <16 x i8> [[R]]
	;			;
	%i0 = insertelement <16 x i8> undef, i8 %x, i32 1			%i0 = insertelement <16 x i8> undef, i8 %x, i32 1
	%i1 = insertelement <16 x i8> undef, i8 %y, i32 0			%i1 = insertelement <16 x i8> undef, i8 %y, i32 0
	%r = add <16 x i8> %i0, %i1			%r = add <16 x i8> %i0, %i1
	ret <16 x i8> %r			ret <16 x i8> %r
	}			}

				; Base vector does not have to be undef.

	define <4 x i32> @ins0_ins0_mul(i32 %x, i32 %y) {			define <4 x i32> @ins0_ins0_mul(i32 %x, i32 %y) {
	; CHECK-LABEL: @ins0_ins0_mul(			; CHECK-LABEL: @ins0_ins0_mul(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x i32> zeroinitializer, i32 [[X:%.*]], i32 0			; CHECK-NEXT: [[R_SCALAR:%.*]] = mul i32 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0			; CHECK-NEXT: [[R:%.*]] = insertelement <4 x i32> zeroinitializer, i32 [[R_SCALAR]], i64 0
	; CHECK-NEXT: [[R:%.*]] = mul <4 x i32> [[I0]], [[I1]]
	; CHECK-NEXT: ret <4 x i32> [[R]]			; CHECK-NEXT: ret <4 x i32> [[R]]
	;			;
	%i0 = insertelement <4 x i32> zeroinitializer, i32 %x, i32 0			%i0 = insertelement <4 x i32> zeroinitializer, i32 %x, i32 0
	%i1 = insertelement <4 x i32> undef, i32 %y, i32 0			%i1 = insertelement <4 x i32> undef, i32 %y, i32 0
	%r = mul <4 x i32> %i0, %i1			%r = mul <4 x i32> %i0, %i1
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

				; It is safe to scalarize any binop (no extra UB/poison danger).

	define <2 x i64> @ins1_ins1_sdiv(i64 %x, i64 %y) {			define <2 x i64> @ins1_ins1_sdiv(i64 %x, i64 %y) {
	; CHECK-LABEL: @ins1_ins1_sdiv(			; CHECK-LABEL: @ins1_ins1_sdiv(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x i64> <i64 42, i64 -42>, i64 [[X:%.*]], i64 1			; CHECK-NEXT: [[R_SCALAR:%.*]] = sdiv i64 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x i64> <i64 -7, i64 128>, i64 [[Y:%.*]], i32 1			; CHECK-NEXT: [[R:%.*]] = insertelement <2 x i64> <i64 -6, i64 0>, i64 [[R_SCALAR]], i64 1
	; CHECK-NEXT: [[R:%.*]] = sdiv <2 x i64> [[I0]], [[I1]]
	; CHECK-NEXT: ret <2 x i64> [[R]]			; CHECK-NEXT: ret <2 x i64> [[R]]
	;			;
	%i0 = insertelement <2 x i64> <i64 42, i64 -42>, i64 %x, i64 1			%i0 = insertelement <2 x i64> <i64 42, i64 -42>, i64 %x, i64 1
	%i1 = insertelement <2 x i64> <i64 -7, i64 128>, i64 %y, i32 1			%i1 = insertelement <2 x i64> <i64 -7, i64 128>, i64 %y, i32 1
	%r = sdiv <2 x i64> %i0, %i1			%r = sdiv <2 x i64> %i0, %i1
	ret <2 x i64> %r			ret <2 x i64> %r
	}			}

				; Constant folding deals with undef per element - the entire value does not become undef.

	define <2 x i64> @ins1_ins1_udiv(i64 %x, i64 %y) {			define <2 x i64> @ins1_ins1_udiv(i64 %x, i64 %y) {
	; CHECK-LABEL: @ins1_ins1_udiv(			; CHECK-LABEL: @ins1_ins1_udiv(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x i64> <i64 42, i64 undef>, i64 [[X:%.*]], i32 1			; CHECK-NEXT: [[R_SCALAR:%.*]] = udiv i64 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x i64> <i64 7, i64 undef>, i64 [[Y:%.*]], i32 1			; CHECK-NEXT: [[R:%.*]] = insertelement <2 x i64> <i64 6, i64 undef>, i64 [[R_SCALAR]], i64 1
	; CHECK-NEXT: [[R:%.*]] = udiv <2 x i64> [[I0]], [[I1]]
	; CHECK-NEXT: ret <2 x i64> [[R]]			; CHECK-NEXT: ret <2 x i64> [[R]]
	;			;
	%i0 = insertelement <2 x i64> <i64 42, i64 undef>, i64 %x, i32 1			%i0 = insertelement <2 x i64> <i64 42, i64 undef>, i64 %x, i32 1
	%i1 = insertelement <2 x i64> <i64 7, i64 undef>, i64 %y, i32 1			%i1 = insertelement <2 x i64> <i64 7, i64 undef>, i64 %y, i32 1
	%r = udiv <2 x i64> %i0, %i1			%r = udiv <2 x i64> %i0, %i1
	ret <2 x i64> %r			ret <2 x i64> %r
	}			}

				; This could be simplified -- creates immediate UB without the transform because
				; divisor has an undef element -- but that is hidden after the transform.

	define <2 x i64> @ins1_ins1_urem(i64 %x, i64 %y) {			define <2 x i64> @ins1_ins1_urem(i64 %x, i64 %y) {
	; CHECK-LABEL: @ins1_ins1_urem(			; CHECK-LABEL: @ins1_ins1_urem(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <2 x i64> <i64 42, i64 undef>, i64 [[X:%.*]], i64 1			; CHECK-NEXT: [[R_SCALAR:%.*]] = urem i64 [[X:%.*]], [[Y:%.*]]
	; CHECK-NEXT: [[I1:%.*]] = insertelement <2 x i64> <i64 undef, i64 128>, i64 [[Y:%.*]], i32 1			; CHECK-NEXT: [[R:%.*]] = insertelement <2 x i64> <i64 undef, i64 0>, i64 [[R_SCALAR]], i64 1
	; CHECK-NEXT: [[R:%.*]] = urem <2 x i64> [[I0]], [[I1]]
	; CHECK-NEXT: ret <2 x i64> [[R]]			; CHECK-NEXT: ret <2 x i64> [[R]]
	;			;
	%i0 = insertelement <2 x i64> <i64 42, i64 undef>, i64 %x, i64 1			%i0 = insertelement <2 x i64> <i64 42, i64 undef>, i64 %x, i64 1
	%i1 = insertelement <2 x i64> <i64 undef, i64 128>, i64 %y, i32 1			%i1 = insertelement <2 x i64> <i64 undef, i64 128>, i64 %y, i32 1
	%r = urem <2 x i64> %i0, %i1			%r = urem <2 x i64> %i0, %i1
	ret <2 x i64> %r			ret <2 x i64> %r
	}			}

				; Negative test
				; TODO: extra use can be accounted for in cost calculation.

	define <4 x i32> @ins0_ins0_xor(i32 %x, i32 %y) {			define <4 x i32> @ins0_ins0_xor(i32 %x, i32 %y) {
	; CHECK-LABEL: @ins0_ins0_xor(			; CHECK-LABEL: @ins0_ins0_xor(
	; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0			; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
	; CHECK-NEXT: call void @use(<4 x i32> [[I0]])			; CHECK-NEXT: call void @use(<4 x i32> [[I0]])
	; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0			; CHECK-NEXT: [[I1:%.*]] = insertelement <4 x i32> undef, i32 [[Y:%.*]], i32 0
	; CHECK-NEXT: [[R:%.*]] = xor <4 x i32> [[I0]], [[I1]]			; CHECK-NEXT: [[R:%.*]] = xor <4 x i32> [[I0]], [[I1]]
	; CHECK-NEXT: ret <4 x i32> [[R]]			; CHECK-NEXT: ret <4 x i32> [[R]]
	;			;
	%i0 = insertelement <4 x i32> undef, i32 %x, i32 0			%i0 = insertelement <4 x i32> undef, i32 %x, i32 0
	call void @use(<4 x i32> %i0)			call void @use(<4 x i32> %i0)
	%i1 = insertelement <4 x i32> undef, i32 %y, i32 0			%i1 = insertelement <4 x i32> undef, i32 %y, i32 0
	%r = xor <4 x i32> %i0, %i1			%r = xor <4 x i32> %i0, %i1
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] scalarize binop of inserted elements into vector constants
ClosedPublic
