This is an archive of the discontinued LLVM Phabricator instance.

Differential D94069

[NFC][InstructionCost]Migrate VectorCombine.cpp to use InstructionCost
Closed, Public

Authored by CarolineConcatto on Jan 5 2021, 1:09 AM.

Details

Reviewers

david-arm
sdesmalen
ctetreau
spatel

Commits

rG36710c38c1b7: [NFC]Migrate VectorCombine.cpp to use InstructionCost

Summary

This patch changes these functions:
  vectorizeLoadInsert
  isExtractExtractCheap
  foldExtractedCmps
  scalarizeBinopOrCmp
  getShuffleExtract
  foldBitcastShuf
  to use the class InstructionCost when calling TTI.get<something>Cost().

  This patch is part of a series of patches to use InstructionCost instead of
   unsigned/int for the cost model functions.
  See this thread for context:
      http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
  See this patch for the introduction of the type:
      https://reviews.llvm.org/D91174

Observation:
  This patch adds the test || !NewCost.isValid() because we want to
  return false in both of these cases:
   !NewCost.isValid() && !OldCost.isValid() -> the transform is too expensive
  and
   !NewCost.isValid() && OldCost.isValid()
  Therefore, for simplicity, we only add the test for !NewCost.isValid().

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	490 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::strcmp.c

Event Timeline

CarolineConcatto created this revision. Jan 5 2021, 1:09 AM

Herald added a subscriber: hiraditya. Jan 5 2021, 1:09 AM

CarolineConcatto requested review of this revision. Jan 5 2021, 1:09 AM

Herald added a project: Restricted Project. Jan 5 2021, 1:09 AM

Herald added a subscriber: llvm-commits.

CarolineConcatto added reviewers: david-arm, sdesmalen, ctetreau, spatel. Jan 5 2021, 1:10 AM

CarolineConcatto retitled this revision from "[NFC]Migrate VectorCombine.cpp to use InstructionCost" to "[NFC][InstructionCost]Migrate VectorCombine.cpp to use InstructionCost". Jan 5 2021, 1:18 AM

Harbormaster completed remote builds in B83993: Diff 314523. Jan 5 2021, 1:48 AM

ctetreau added inline comments. Jan 5 2021, 9:30 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
201	the or is redundant.
259	Because of this assert, the branches will never be taken. This will result in different behavior in release vs debug mode. Either remove the assert, or remove the two early returns. Regardless, invalid costs are guaranteed to compare higher than valid costs, so the early returns are redundant.
321	can std::min be used here? InstructionCost has overloaded comparison operators and a total ordering. Assuming it can be, we should probably get rid of InstructionCost::min and InstructionCost::max. That can be a different patch.
363–364	this is redundant. If OldCost is valid, and NewCost is invalid, then OldCost < NewCost returns true.
526	the or is redundant
636	the or is redundant
737	the or is redundant
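
The ordering argument in these comments can be illustrated with a small, self-contained sketch. `ToyCost` below is a hypothetical stand-in, not LLVM's `InstructionCost`; it assumes only what the comments state, namely that comparison is a total ordering in which any invalid cost compares higher than any valid one. Under that assumption `std::min` needs no special handling, which is the point of the comment on line 321.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>

// Hypothetical stand-in for a cost type; this is NOT LLVM's InstructionCost.
struct ToyCost {
  bool Invalid; // false sorts before true, so every valid cost < every invalid cost
  int Value;

  static ToyCost valid(int V) { return ToyCost{false, V}; }
  static ToyCost invalid() { return ToyCost{true, 0}; }

  bool isValid() const { return !Invalid; }

  // Lexicographic comparison on (Invalid, Value): state first, then value.
  friend bool operator<(const ToyCost &A, const ToyCost &B) {
    return std::tie(A.Invalid, A.Value) < std::tie(B.Invalid, B.Value);
  }
  friend bool operator>(const ToyCost &A, const ToyCost &B) { return B < A; }
  friend bool operator==(const ToyCost &A, const ToyCost &B) {
    return std::tie(A.Invalid, A.Value) == std::tie(B.Invalid, B.Value);
  }
};

// std::min only requires operator<, so it returns the valid cost whenever
// at least one of the two arguments is valid.
inline ToyCost cheaper(ToyCost A, ToyCost B) { return std::min(A, B); }
```

In this model `cheaper(ToyCost::invalid(), ToyCost::valid(7))` yields the valid cost, and the result is invalid only when both inputs are.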

david-arm added a subscriber: paulwalker-arm. Jan 6 2021, 1:55 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
259	Hi @ctetreau, after discussion during the last SVE sync call @paulwalker-arm thought we shouldn't be relying upon the lexicographical ordering that defines invalid costs to be infinitely expensive. He suggested that doing so is actually a bug in the code. So the route we've been taking so far is to either check for validity explicitly or assert that it's valid. If you think this is the wrong approach here then we can perhaps discuss it and agree on a consistent approach?
363–364	Again, @CarolineConcatto is just adding checks here as per discussion on the last SVE sync call, but we're happy to discuss the correct approach.

I haven't followed the details enough to comment on the changes directly, but thanks for the cleanup! The mismatched signed/unsigned cost model APIs are/were a mess.

ctetreau added inline comments. Jan 6 2021, 9:49 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
259	I'm not sure how I missed that conversation, but as you may recall from the review thread for D91174, I fought hard for the total ordering to be added and documented so that it's guaranteed to be true. This is exactly the sort of case you'd want this ordering for; the validity checks are guaranteed to be redundant with the greater-than checks. Additionally, adding the redundant validity checks is more error prone, because it's more operator-heavy lines of code you can mess up.

paulwalker-arm added inline comments. Jan 7 2021, 1:40 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
259	To be clear I have nothing against relying on the total ordering but I feel if the transformation is expecting instructions to have an actual cost then it should either assert or explicitly check for such. An example of this is LoopVectorize where there has already been extensive validation to ensure a loop is vectorisable and thus not being able to cost the loop is a sure sign there's either a bug in LoopVectorize's isLegal code or the cost functions themselves.

sdesmalen added inline comments. Jan 8 2021, 6:11 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
259	It seems that the algorithm requires at least one of the Costs to be valid (it has to choose either Ext0 or Ext1), so if the assert is changed to: assert((Cost0.isValid() || Cost1.isValid()) && "At least one of Cost0 and Cost1 should be valid"); the existing code below should be sufficient and the two early returns that were added can be removed like @ctetreau suggested.
363–364	Similar to above, change the assert to check `OldCost.isValid() || NewCost.isValid()` and remove the early exit.
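
A rough sketch of the shape suggested in the comment on line 259, using a hypothetical `ToyCost` struct and `Pick` enum in place of the real `InstructionCost` and the `ExtractElementInst *` return value. It assumes that invalid costs compare greater than valid ones, so the assert guards only the case where neither cost is usable and the two comparisons do the rest.

```cpp
#include <cassert>

// Hypothetical stand-in for a cost; NOT LLVM's InstructionCost.
struct ToyCost {
  bool Valid;
  int Value;
};

// Assumed ordering: valid costs order by value, an invalid cost is greater
// than any valid cost, and two invalid costs are not greater than each other.
inline bool costGreater(const ToyCost &A, const ToyCost &B) {
  if (A.Valid != B.Valid)
    return !A.Valid;
  return A.Valid && A.Value > B.Value;
}

// Hypothetical result type standing in for "return Ext0 / return Ext1".
enum class Pick { Ext0, Ext1, Tie };

inline Pick pickExtractToShuffle(const ToyCost &Cost0, const ToyCost &Cost1) {
  assert((Cost0.Valid || Cost1.Valid) &&
         "At least one of Cost0 and Cost1 should be valid");
  // The more expensive extract is the one replaced by a shuffle.
  if (costGreater(Cost0, Cost1))
    return Pick::Ext0;
  if (costGreater(Cost1, Cost0))
    return Pick::Ext1;
  // Equal costs: the real code goes on to consult the preferred extract index.
  return Pick::Tie;
}
```

Note that the diff that eventually landed handles the both-invalid case with an early `return nullptr` rather than an assert.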

david-arm mentioned this in D91957: [Support] Migrate more high level cost functions to using InstructionCost. Jan 8 2021, 8:52 AM

-remove redundant invalid check

Thank you, everyone, for the review.
I have removed the redundant invalid check.
Also, thank you for making clear that invalid, atm, means as well high cost.
I'll have that in mind for the next patches.

Harbormaster completed remote builds in B84643: Diff 315728. Jan 11 2021, 2:02 AM

In D94069#2489637, @CarolineConcatto wrote:

Also, thank you for making clear that invalid, atm, means as well high cost.
I'll have that in mind for the next patches.

I would say "infinitely costly", not "high cost". Somebody may have "a lot" of LLVMBucks, nobody has infinity LLVMBucks.

Sorry to nitpick, but it's important to get these things right initially before everybody sees some flawed example and emulates it. If you need something to have a really high cost, you should just pick some really high valid cost. If you need something to never be within any cost budget, you should use invalid.
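
The distinction drawn here can be sketched with a toy budget check. Everything below is hypothetical (`ToyCost`, `fitsBudget`, and the two helper constructors are illustrative names, not LLVM API); the only assumption is the one stated above: a very high valid cost still fits a large enough budget, while an invalid cost fits none.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for a cost; NOT LLVM's InstructionCost.
struct ToyCost {
  bool Valid;
  int Value;
};

// A cost fits a budget only if it is valid and does not exceed the budget.
inline bool fitsBudget(const ToyCost &C, int Budget) {
  return C.Valid && C.Value <= Budget;
}

// "Really high cost": still a valid number, affordable with enough budget.
inline ToyCost veryExpensive() { return ToyCost{true, 1000000}; }

// "Never within any cost budget": modeled as invalid.
inline ToyCost neverAffordable() { return ToyCost{false, 0}; }
```

With these definitions, `veryExpensive()` is rejected by a budget of 100 but accepted by an unlimited one, whereas `neverAffordable()` is rejected by both.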

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
259	> To be clear I have nothing against relying on the total ordering but I feel if the transformation is expecting instructions to have an actual cost then it should either assert or explicitly check for such.
	If it's an honest-to-gosh bug for some call to return invalid, then this is fine. I feel like this should never happen in any function that returns `InstructionCost` though. This would be akin to swallowing an exception and calling `exit()`.

ctetreau added inline comments. Jan 11 2021, 9:38 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
200	Is this needed? If the old cost is invalid, but the new cost is valid, then do the transform. If neither cost is valid, then `OldCost < NewCost` will cause a return of false.
253	Reasonable to return nullptr? If neither cost is valid, then neither of the inputs should be replaced
359–360	Would it be reasonable to return false here? If all costs involved are invalid, then I would say the transform is not cheap.
525	reasonable to return false here? If neither cost is valid, then do not do the transform
635	reasonable to return false?
736	reasonable to return false?

-remove asserts for OldCost invalid
-add return if both costs are invalid

Hi @ctetreau,
Thank you for the review.
So I removed all the asserts and added an early return if both costs are invalid, because in that case the transformation is not cheap.
If only OldCost is invalid I believe we should do nothing, for the same reason I removed the test for !NewCost.isValid().

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
253	I think it is fine to add the test for the case where both are equal, following the logic of if (Index0 == Index1)
359–360	If I remove the assert and leave the test to do its job the return will be false, because OldCost would be equal to NewCost.
635	If both are invalid, maybe. But if OldCost is invalid I think you mean return true, because OldCost > NewCost. In that last case I don't think we should add the early return, because it changes the code path.
736	If both are invalid, maybe. But if OldCost is invalid I think you mean return true, because OldCost > NewCost. In that last case I don't think we should add the early return, because it changes the code path.

-add missing invalid costs test for scalarizeBinopOrCmp

Harbormaster completed remote builds in B84986: Diff 316344. Jan 13 2021, 2:08 AM

Harbormaster completed remote builds in B84987: Diff 316347. Jan 13 2021, 2:19 AM

sdesmalen added inline comments. Jan 13 2021, 7:32 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
525	Does this assert need to be removed still?

-remove missed assert

CarolineConcatto marked an inline comment as done. Jan 13 2021, 9:28 AM

CarolineConcatto added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
525	Yes, sorry missed that!

Harbormaster completed remote builds in B85035: Diff 316426. Jan 13 2021, 10:04 AM

Added two nits, but LGTM otherwise.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

635

nit: || !NewCost.isValid() should be sufficient (see the other comment).

737

I don't think the or was redundant.

If the new cost is invalid, then it shouldn't do the transform. With only testing OldCost < NewCost, we'd get:

        isValid?
   OldCost  NewCost       (OldCost < NewCost)     result
--------------------------------------------------------
1.    true    true     OldCost.Val < NewCost.Val    ?
2.    true    false          Valid < Invalid       true
3.    false   true         Invalid < Valid        false
4.    false   false        Invalid < Invalid      false

However, 4. should be 'true' in order to return early from the function.

         isValid?
   OldCost  NewCost  (OldCost < NewCost || !NewCost.isValid)    result
-----------------------------------------------------------------------
1.    true    true      OldCost.Val < NewCost.Val || false        ?
2.    true    false           Valid < Invalid || true            true
3.    false   true          Invalid < Valid   || false          false
4.    false   false         Invalid < Invalid || true            true

Gives us the result we want.

nit: based on that I believe !OldCost.isValid() && is now redundant.
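
The two tables can be replayed with a small sketch. `ToyCost` and both predicates are hypothetical names, not the code in VectorCombine.cpp; the comparison is written to match the rows above (Valid < Invalid is true, Invalid < Invalid is false).

```cpp
#include <cassert>

// Hypothetical stand-in for a cost; NOT LLVM's InstructionCost.
struct ToyCost {
  bool Valid;
  int Value;
};

// Assumed ordering, matching the tables: any valid cost is less than an
// invalid one, and an invalid cost is never less than another invalid cost.
inline bool costLess(const ToyCost &A, const ToyCost &B) {
  if (A.Valid != B.Valid)
    return A.Valid;
  return A.Valid && A.Value < B.Value;
}

// First table: bail out of the transform only when OldCost < NewCost.
inline bool bailOrderingOnly(const ToyCost &Old, const ToyCost &New) {
  return costLess(Old, New);
}

// Second table: additionally bail out whenever NewCost is invalid.
inline bool bailWithValidityCheck(const ToyCost &Old, const ToyCost &New) {
  return costLess(Old, New) || !New.Valid;
}
```

The two predicates differ only in row 4 (both costs invalid), where the ordering-only check lets the transform proceed and the version with the validity check does not.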

This revision is now accepted and ready to land. Jan 14 2021, 6:36 AM

-replace (!NewCost.isValid() && !OldCost.isValid()) by !NewCost.isValid()

CarolineConcatto edited the summary of this revision. Jan 17 2021, 11:09 AM

CarolineConcatto marked an inline comment as done.

Harbormaster completed remote builds in B85530: Diff 317234. Jan 17 2021, 12:27 PM

This revision was landed with ongoing or failed builds. Jan 18 2021, 5:37 AM

Closed by commit rG36710c38c1b7: [NFC]Migrate VectorCombine.cpp to use InstructionCost (authored by CarolineConcatto).

This revision was automatically updated to reflect the committed changes.

CarolineConcatto added a commit: rG36710c38c1b7: [NFC]Migrate VectorCombine.cpp to use InstructionCost.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/VectorCombine.cpp	58 lines

Diff 317234

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

[176 lines hidden]	if (!isSafeToLoadUnconditionally(SrcPtr, MinVecTy, Align(1), DL, Load, &DT)) {
// negation does not change the result of the alignment calculation.		// negation does not change the result of the alignment calculation.
Alignment = commonAlignment(Alignment, Offset.getZExtValue());		Alignment = commonAlignment(Alignment, Offset.getZExtValue());
}		}

// Original pattern: insertelt undef, load [free casts of] PtrOp, 0		// Original pattern: insertelt undef, load [free casts of] PtrOp, 0
// Use the greater of the alignment on the load or its source pointer.		// Use the greater of the alignment on the load or its source pointer.
Alignment = std::max(SrcPtr->getPointerAlignment(DL), Alignment);		Alignment = std::max(SrcPtr->getPointerAlignment(DL), Alignment);
Type *LoadTy = Load->getType();		Type *LoadTy = Load->getType();
int OldCost = TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS);		InstructionCost OldCost =
		TTI.getMemoryOpCost(Instruction::Load, LoadTy, Alignment, AS);
APInt DemandedElts = APInt::getOneBitSet(MinVecNumElts, 0);		APInt DemandedElts = APInt::getOneBitSet(MinVecNumElts, 0);
OldCost += TTI.getScalarizationOverhead(MinVecTy, DemandedElts,		OldCost += TTI.getScalarizationOverhead(MinVecTy, DemandedElts,
/* Insert */ true, HasExtract);		/* Insert */ true, HasExtract);

// New pattern: load VecPtr		// New pattern: load VecPtr
int NewCost = TTI.getMemoryOpCost(Instruction::Load, MinVecTy, Alignment, AS);		InstructionCost NewCost =
		TTI.getMemoryOpCost(Instruction::Load, MinVecTy, Alignment, AS);
// Optionally, we are shuffling the loaded vector element(s) into place.		// Optionally, we are shuffling the loaded vector element(s) into place.
if (OffsetEltIndex)		if (OffsetEltIndex)
NewCost += TTI.getShuffleCost(TTI::SK_PermuteSingleSrc, MinVecTy);		NewCost += TTI.getShuffleCost(TTI::SK_PermuteSingleSrc, MinVecTy);

// We can aggressively convert to the vector form because the backend can		// We can aggressively convert to the vector form because the backend can
// invert this transform if it does not result in a performance win.		// invert this transform if it does not result in a performance win.
if (OldCost < NewCost)		if (OldCost < NewCost || !NewCost.isValid())
return false;		return false;

// It is safe and potentially profitable to load a vector directly:		// It is safe and potentially profitable to load a vector directly:
// inselt undef, load Scalar, 0 --> load VecPtr		// inselt undef, load Scalar, 0 --> load VecPtr
IRBuilder<> Builder(Load);		IRBuilder<> Builder(Load);
Value *CastedPtr = Builder.CreateBitCast(SrcPtr, MinVecTy->getPointerTo(AS));		Value *CastedPtr = Builder.CreateBitCast(SrcPtr, MinVecTy->getPointerTo(AS));
Value *VecLd = Builder.CreateAlignedLoad(MinVecTy, CastedPtr, Alignment);		Value *VecLd = Builder.CreateAlignedLoad(MinVecTy, CastedPtr, Alignment);

// Set everything but element 0 to undef to prevent poison from propagating		// Set everything but element 0 to undef to prevent poison from propagating
[26 lines hidden]	ExtractElementInst *VectorCombine::getShuffleExtract(
unsigned Index1 = cast<ConstantInt>(Ext1->getIndexOperand())->getZExtValue();		unsigned Index1 = cast<ConstantInt>(Ext1->getIndexOperand())->getZExtValue();

// If the extract indexes are identical, no shuffle is needed.		// If the extract indexes are identical, no shuffle is needed.
if (Index0 == Index1)		if (Index0 == Index1)
return nullptr;		return nullptr;

Type *VecTy = Ext0->getVectorOperand()->getType();		Type *VecTy = Ext0->getVectorOperand()->getType();
assert(VecTy == Ext1->getVectorOperand()->getType() && "Need matching types");		assert(VecTy == Ext1->getVectorOperand()->getType() && "Need matching types");
int Cost0 = TTI.getVectorInstrCost(Ext0->getOpcode(), VecTy, Index0);		InstructionCost Cost0 =
int Cost1 = TTI.getVectorInstrCost(Ext1->getOpcode(), VecTy, Index1);		TTI.getVectorInstrCost(Ext0->getOpcode(), VecTy, Index0);
		InstructionCost Cost1 =
		TTI.getVectorInstrCost(Ext1->getOpcode(), VecTy, Index1);

		// If both costs are invalid no shuffle is needed
		if (!Cost0.isValid() && !Cost1.isValid())
		return nullptr;

// We are extracting from 2 different indexes, so one operand must be shuffled		// We are extracting from 2 different indexes, so one operand must be shuffled
// before performing a vector operation and/or extract. The more expensive		// before performing a vector operation and/or extract. The more expensive
// extract will be replaced by a shuffle.		// extract will be replaced by a shuffle.
if (Cost0 > Cost1)		if (Cost0 > Cost1)
return Ext0;		return Ext0;
if (Cost1 > Cost0)		if (Cost1 > Cost0)
return Ext1;		return Ext1;

// If the costs are equal and there is a preferred extract index, shuffle the		// If the costs are equal and there is a preferred extract index, shuffle the
// opposite operand.		// opposite operand.
if (PreferredExtractIndex == Index0)		if (PreferredExtractIndex == Index0)
return Ext1;		return Ext1;
if (PreferredExtractIndex == Index1)		if (PreferredExtractIndex == Index1)
return Ext0;		return Ext0;

[11 lines hidden]	bool VectorCombine::isExtractExtractCheap(ExtractElementInst *Ext0,
unsigned Opcode,		unsigned Opcode,
ExtractElementInst *&ConvertToShuffle,		ExtractElementInst *&ConvertToShuffle,
unsigned PreferredExtractIndex) {		unsigned PreferredExtractIndex) {
assert(isa<ConstantInt>(Ext0->getOperand(1)) &&		assert(isa<ConstantInt>(Ext0->getOperand(1)) &&
isa<ConstantInt>(Ext1->getOperand(1)) &&		isa<ConstantInt>(Ext1->getOperand(1)) &&
"Expected constant extract indexes");		"Expected constant extract indexes");
Type *ScalarTy = Ext0->getType();		Type *ScalarTy = Ext0->getType();
auto *VecTy = cast<VectorType>(Ext0->getOperand(0)->getType());		auto *VecTy = cast<VectorType>(Ext0->getOperand(0)->getType());
int ScalarOpCost, VectorOpCost;		InstructionCost ScalarOpCost, VectorOpCost;

// Get cost estimates for scalar and vector versions of the operation.		// Get cost estimates for scalar and vector versions of the operation.
bool IsBinOp = Instruction::isBinaryOp(Opcode);		bool IsBinOp = Instruction::isBinaryOp(Opcode);
if (IsBinOp) {		if (IsBinOp) {
ScalarOpCost = TTI.getArithmeticInstrCost(Opcode, ScalarTy);		ScalarOpCost = TTI.getArithmeticInstrCost(Opcode, ScalarTy);
VectorOpCost = TTI.getArithmeticInstrCost(Opcode, VecTy);		VectorOpCost = TTI.getArithmeticInstrCost(Opcode, VecTy);
} else {		} else {
assert((Opcode == Instruction::ICmp || Opcode == Instruction::FCmp) &&		assert((Opcode == Instruction::ICmp || Opcode == Instruction::FCmp) &&
"Expected a compare");		"Expected a compare");
ScalarOpCost = TTI.getCmpSelInstrCost(Opcode, ScalarTy,		ScalarOpCost = TTI.getCmpSelInstrCost(Opcode, ScalarTy,
CmpInst::makeCmpResultType(ScalarTy));		CmpInst::makeCmpResultType(ScalarTy));
VectorOpCost = TTI.getCmpSelInstrCost(Opcode, VecTy,		VectorOpCost = TTI.getCmpSelInstrCost(Opcode, VecTy,
CmpInst::makeCmpResultType(VecTy));		CmpInst::makeCmpResultType(VecTy));
}		}

// Get cost estimates for the extract elements. These costs will factor into		// Get cost estimates for the extract elements. These costs will factor into
// both sequences.		// both sequences.
unsigned Ext0Index = cast<ConstantInt>(Ext0->getOperand(1))->getZExtValue();		unsigned Ext0Index = cast<ConstantInt>(Ext0->getOperand(1))->getZExtValue();
unsigned Ext1Index = cast<ConstantInt>(Ext1->getOperand(1))->getZExtValue();		unsigned Ext1Index = cast<ConstantInt>(Ext1->getOperand(1))->getZExtValue();

int Extract0Cost =		InstructionCost Extract0Cost =
TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy, Ext0Index);		TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy, Ext0Index);
int Extract1Cost =		InstructionCost Extract1Cost =
TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy, Ext1Index);		TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy, Ext1Index);

// A more expensive extract will always be replaced by a splat shuffle.		// A more expensive extract will always be replaced by a splat shuffle.
// For example, if Ext0 is more expensive:		// For example, if Ext0 is more expensive:
// opcode (extelt V0, Ext0), (ext V1, Ext1) -->		// opcode (extelt V0, Ext0), (ext V1, Ext1) -->
// extelt (opcode (splat V0, Ext0), V1), Ext1		// extelt (opcode (splat V0, Ext0), V1), Ext1
// TODO: Evaluate whether that always results in lowest cost. Alternatively,		// TODO: Evaluate whether that always results in lowest cost. Alternatively,
// check the cost of creating a broadcast shuffle and shuffling both		// check the cost of creating a broadcast shuffle and shuffling both
// operands to element 0.		// operands to element 0.
int CheapExtractCost = std::min(Extract0Cost, Extract1Cost);		InstructionCost CheapExtractCost = std::min(Extract0Cost, Extract1Cost);

// Extra uses of the extracts mean that we include those costs in the		// Extra uses of the extracts mean that we include those costs in the
// vector total because those instructions will not be eliminated.		// vector total because those instructions will not be eliminated.
int OldCost, NewCost;		InstructionCost OldCost, NewCost;
if (Ext0->getOperand(0) == Ext1->getOperand(0) && Ext0Index == Ext1Index) {		if (Ext0->getOperand(0) == Ext1->getOperand(0) && Ext0Index == Ext1Index) {
// Handle a special case. If the 2 extracts are identical, adjust the		// Handle a special case. If the 2 extracts are identical, adjust the
// formulas to account for that. The extra use charge allows for either the		// formulas to account for that. The extra use charge allows for either the
// CSE'd pattern or an unoptimized form with identical values:		// CSE'd pattern or an unoptimized form with identical values:
// opcode (extelt V, C), (extelt V, C) --> extelt (opcode V, V), C		// opcode (extelt V, C), (extelt V, C) --> extelt (opcode V, V), C
bool HasUseTax = Ext0 == Ext1 ? !Ext0->hasNUses(2)		bool HasUseTax = Ext0 == Ext1 ? !Ext0->hasNUses(2)
: !Ext0->hasOneUse() || !Ext1->hasOneUse();		: !Ext0->hasOneUse() || !Ext1->hasOneUse();
OldCost = CheapExtractCost + ScalarOpCost;		OldCost = CheapExtractCost + ScalarOpCost;
[18 lines hidden]	if (ConvertToShuffle) {
// extraction lane. Therefore, it is a splat shuffle. Ex:		// extraction lane. Therefore, it is a splat shuffle. Ex:
// ShufMask = { undef, undef, 0, undef }		// ShufMask = { undef, undef, 0, undef }
// TODO: The cost model has an option for a "broadcast" shuffle		// TODO: The cost model has an option for a "broadcast" shuffle
// (splat-from-element-0), but no option for a more general splat.		// (splat-from-element-0), but no option for a more general splat.
NewCost +=		NewCost +=
TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy);		TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
}		}

// Aggressively form a vector op if the cost is equal because the transform		// Aggressively form a vector op if the cost is equal because the transform
// may enable further optimization.		// may enable further optimization.
// Codegen can reverse this transform (scalarize) if it was not profitable.		// Codegen can reverse this transform (scalarize) if it was not profitable.
return OldCost < NewCost;		return OldCost < NewCost;
}		}

/// Create a shuffle that translates (shifts) 1 element from the input vector		/// Create a shuffle that translates (shifts) 1 element from the input vector
/// to a new element location.		/// to a new element location.
static Value *createShiftShuffle(Value *Vec, unsigned OldIndex,		static Value *createShiftShuffle(Value *Vec, unsigned OldIndex,
unsigned NewIndex, IRBuilder<> &Builder) {		unsigned NewIndex, IRBuilder<> &Builder) {
// The shuffle mask is undefined except for 1 lane that is being translated		// The shuffle mask is undefined except for 1 lane that is being translated
// to the new element index. Example for OldIndex == 2 and NewIndex == 0:		// to the new element index. Example for OldIndex == 2 and NewIndex == 0:
// ShufMask = { 2, undef, undef, undef }		// ShufMask = { 2, undef, undef, undef }
auto *VecTy = cast<FixedVectorType>(Vec->getType());		auto *VecTy = cast<FixedVectorType>(Vec->getType());
[140 lines hidden]	bool VectorCombine::foldBitcastShuf(Instruction &I) {
// TODO: We could allow any shuffle.		// TODO: We could allow any shuffle.
auto *DestTy = dyn_cast<FixedVectorType>(I.getType());		auto *DestTy = dyn_cast<FixedVectorType>(I.getType());
auto *SrcTy = dyn_cast<FixedVectorType>(V->getType());		auto *SrcTy = dyn_cast<FixedVectorType>(V->getType());
if (!SrcTy || !DestTy || I.getOperand(0)->getType() != SrcTy)		if (!SrcTy || !DestTy || I.getOperand(0)->getType() != SrcTy)
return false;		return false;

// The new shuffle must not cost more than the old shuffle. The bitcast is		// The new shuffle must not cost more than the old shuffle. The bitcast is
// moved ahead of the shuffle, so assume that it has the same cost as before.		// moved ahead of the shuffle, so assume that it has the same cost as before.
if (TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, DestTy) >		InstructionCost DestCost =
TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, SrcTy))		TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, DestTy);
		InstructionCost SrcCost =
		TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, SrcTy);
		if (DestCost > SrcCost || !DestCost.isValid())
return false;		return false;
		ctetreau (Done): The `or` is redundant.

unsigned DestNumElts = DestTy->getNumElements();		unsigned DestNumElts = DestTy->getNumElements();
unsigned SrcNumElts = SrcTy->getNumElements();		unsigned SrcNumElts = SrcTy->getNumElements();
SmallVector<int, 16> NewMask;		SmallVector<int, 16> NewMask;
if (SrcNumElts <= DestNumElts) {		if (SrcNumElts <= DestNumElts) {
// The bitcast is from wide to narrow/equal elements. The shuffle mask can		// The bitcast is from wide to narrow/equal elements. The shuffle mask can
// always be expanded to the equivalent form choosing narrower elements.		// always be expanded to the equivalent form choosing narrower elements.
assert(DestNumElts % SrcNumElts == 0 && "Unexpected shuffle mask");		assert(DestNumElts % SrcNumElts == 0 && "Unexpected shuffle mask");
[... 72 lines not shown ...]	bool VectorCombine::scalarizeBinopOrCmp(Instruction &I) {
Type *VecTy = I.getType();		Type *VecTy = I.getType();
assert(VecTy->isVectorTy() &&		assert(VecTy->isVectorTy() &&
(IsConst0 || IsConst1 || V0->getType() == V1->getType()) &&		(IsConst0 || IsConst1 || V0->getType() == V1->getType()) &&
(ScalarTy->isIntegerTy() || ScalarTy->isFloatingPointTy() ||		(ScalarTy->isIntegerTy() || ScalarTy->isFloatingPointTy() ||
ScalarTy->isPointerTy()) &&		ScalarTy->isPointerTy()) &&
"Unexpected types for insert element into binop or cmp");		"Unexpected types for insert element into binop or cmp");

unsigned Opcode = I.getOpcode();		unsigned Opcode = I.getOpcode();
int ScalarOpCost, VectorOpCost;		InstructionCost ScalarOpCost, VectorOpCost;
if (IsCmp) {		if (IsCmp) {
ScalarOpCost = TTI.getCmpSelInstrCost(Opcode, ScalarTy);		ScalarOpCost = TTI.getCmpSelInstrCost(Opcode, ScalarTy);
VectorOpCost = TTI.getCmpSelInstrCost(Opcode, VecTy);		VectorOpCost = TTI.getCmpSelInstrCost(Opcode, VecTy);
} else {		} else {
ScalarOpCost = TTI.getArithmeticInstrCost(Opcode, ScalarTy);		ScalarOpCost = TTI.getArithmeticInstrCost(Opcode, ScalarTy);
VectorOpCost = TTI.getArithmeticInstrCost(Opcode, VecTy);		VectorOpCost = TTI.getArithmeticInstrCost(Opcode, VecTy);
}		}

// Get cost estimate for the insert element. This cost will factor into		// Get cost estimate for the insert element. This cost will factor into
// both sequences.		// both sequences.
int InsertCost =		InstructionCost InsertCost =
TTI.getVectorInstrCost(Instruction::InsertElement, VecTy, Index);		TTI.getVectorInstrCost(Instruction::InsertElement, VecTy, Index);
int OldCost = (IsConst0 ? 0 : InsertCost) + (IsConst1 ? 0 : InsertCost) +		InstructionCost OldCost =
VectorOpCost;		(IsConst0 ? 0 : InsertCost) + (IsConst1 ? 0 : InsertCost) + VectorOpCost;
int NewCost = ScalarOpCost + InsertCost +		InstructionCost NewCost = ScalarOpCost + InsertCost +
(IsConst0 ? 0 : !Ins0->hasOneUse() * InsertCost) +		(IsConst0 ? 0 : !Ins0->hasOneUse() * InsertCost) +
(IsConst1 ? 0 : !Ins1->hasOneUse() * InsertCost);		(IsConst1 ? 0 : !Ins1->hasOneUse() * InsertCost);

// We want to scalarize unless the vector variant actually has lower cost.		// We want to scalarize unless the vector variant actually has lower cost.
if (OldCost < NewCost)		if (OldCost < NewCost || !NewCost.isValid())
		ctetreau (Not Done): Reasonable to return false?
		CarolineConcatto, author (Done): If both are invalid, maybe. But if OldCost is invalid, I think you mean return true, because OldCost > NewCost. In that last case I don't think we should add the early return, because it changes the code path.
		sdesmalen (Not Done): nit: `|| !NewCost.isValid()` should be sufficient (see the other comment).
return false;		return false;
		ctetreau (Done): The `or` is redundant.

// vec_op (inselt VecC0, V0, Index), (inselt VecC1, V1, Index) -->		// vec_op (inselt VecC0, V0, Index), (inselt VecC1, V1, Index) -->
// inselt NewVecC, (scalar_op V0, V1), Index		// inselt NewVecC, (scalar_op V0, V1), Index
if (IsCmp)		if (IsCmp)
++NumScalarCmp;		++NumScalarCmp;
else		else
++NumScalarBO;		++NumScalarBO;

[... 62 lines not shown ...]	bool VectorCombine::foldExtractedCmps(Instruction &I) {
// binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)		// binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
CmpInst::Predicate Pred = P0;		CmpInst::Predicate Pred = P0;
unsigned CmpOpcode = CmpInst::isFPPredicate(Pred) ? Instruction::FCmp		unsigned CmpOpcode = CmpInst::isFPPredicate(Pred) ? Instruction::FCmp
: Instruction::ICmp;		: Instruction::ICmp;
auto *VecTy = dyn_cast<FixedVectorType>(X->getType());		auto *VecTy = dyn_cast<FixedVectorType>(X->getType());
if (!VecTy)		if (!VecTy)
return false;		return false;

int OldCost = TTI.getVectorInstrCost(Ext0->getOpcode(), VecTy, Index0);		InstructionCost OldCost =
		TTI.getVectorInstrCost(Ext0->getOpcode(), VecTy, Index0);
OldCost += TTI.getVectorInstrCost(Ext1->getOpcode(), VecTy, Index1);		OldCost += TTI.getVectorInstrCost(Ext1->getOpcode(), VecTy, Index1);
OldCost += TTI.getCmpSelInstrCost(CmpOpcode, I0->getType()) * 2;		OldCost += TTI.getCmpSelInstrCost(CmpOpcode, I0->getType()) * 2;
OldCost += TTI.getArithmeticInstrCost(I.getOpcode(), I.getType());		OldCost += TTI.getArithmeticInstrCost(I.getOpcode(), I.getType());

// The proposed vector pattern is:		// The proposed vector pattern is:
// vcmp = cmp Pred X, VecC		// vcmp = cmp Pred X, VecC
// ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0		// ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0
int CheapIndex = ConvertToShuf == Ext0 ? Index1 : Index0;		int CheapIndex = ConvertToShuf == Ext0 ? Index1 : Index0;
int ExpensiveIndex = ConvertToShuf == Ext0 ? Index0 : Index1;		int ExpensiveIndex = ConvertToShuf == Ext0 ? Index0 : Index1;
auto *CmpTy = cast<FixedVectorType>(CmpInst::makeCmpResultType(X->getType()));		auto *CmpTy = cast<FixedVectorType>(CmpInst::makeCmpResultType(X->getType()));
int NewCost = TTI.getCmpSelInstrCost(CmpOpcode, X->getType());		InstructionCost NewCost = TTI.getCmpSelInstrCost(CmpOpcode, X->getType());
NewCost +=		NewCost +=
TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, CmpTy);		TTI.getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, CmpTy);
NewCost += TTI.getArithmeticInstrCost(I.getOpcode(), CmpTy);		NewCost += TTI.getArithmeticInstrCost(I.getOpcode(), CmpTy);
NewCost += TTI.getVectorInstrCost(Ext0->getOpcode(), CmpTy, CheapIndex);		NewCost += TTI.getVectorInstrCost(Ext0->getOpcode(), CmpTy, CheapIndex);

// Aggressively form vector ops if the cost is equal because the transform		// Aggressively form vector ops if the cost is equal because the transform
// may enable further optimization.		// may enable further optimization.
// Codegen can reverse this transform (scalarize) if it was not profitable.		// Codegen can reverse this transform (scalarize) if it was not profitable.
if (OldCost < NewCost)		if (OldCost < NewCost || !NewCost.isValid())
		ctetreau (Not Done): Reasonable to return false?
		CarolineConcatto, author (Done): If both are invalid, maybe. But if OldCost is invalid, I think you mean return true, because OldCost > NewCost. In that last case I don't think we should add the early return, because it changes the code path.
return false;		return false;
		ctetreau (Done): The `or` is redundant.
		sdesmalen (Done): I don't think the `or` was redundant. If the new cost is invalid, then it shouldn't do the transform. With only testing `OldCost < NewCost`, we'd get:

		    isValid?  OldCost  NewCost  (OldCost < NewCost)          result
		    --------------------------------------------------------------
		    1.        true     true     OldCost.Val < NewCost.Val    ?
		    2.        true     false    Valid < Invalid              true
		    3.        false    true     Invalid < Valid              false
		    4.        false    false    Invalid < Invalid            false

		However, 4. should be 'true' in order to return early from the function.

		    isValid?  OldCost  NewCost  (OldCost < NewCost || !NewCost.isValid)  result
		    ---------------------------------------------------------------------------
		    1.        true     true     OldCost.Val < NewCost.Val || false       ?
		    2.        true     false    Valid < Invalid || true                  true
		    3.        false    true     Invalid < Valid || false                 false
		    4.        false    false    Invalid < Invalid || true                true

		Gives us the result we want.
		nit: based on that I believe `!OldCost.isValid() &&` is now redundant.

// Create a vector constant from the 2 scalar constants.		// Create a vector constant from the 2 scalar constants.
SmallVector<Constant *, 32> CmpC(VecTy->getNumElements(),		SmallVector<Constant *, 32> CmpC(VecTy->getNumElements(),
UndefValue::get(VecTy->getElementType()));		UndefValue::get(VecTy->getElementType()));
CmpC[Index0] = C0;		CmpC[Index0] = C0;
CmpC[Index1] = C1;		CmpC[Index1] = C1;
Value *VCmp = Builder.CreateCmp(Pred, X, ConstantVector::get(CmpC));		Value *VCmp = Builder.CreateCmp(Pred, X, ConstantVector::get(CmpC));
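
The truth tables in sdesmalen's inline comment on foldExtractedCmps can be checked with a small standalone model. This is only a sketch: `ToyCost`, `bailOutCompareOnly`, and `bailOutWithValidity` are hypothetical names, not LLVM API, and the `operator<` below simply encodes the ordering assumed in that comment (an invalid cost is never less than anything, and a valid cost is less than any invalid one). The real `InstructionCost` comparison may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::InstructionCost: a value plus a validity flag.
struct ToyCost {
  int64_t Value;
  bool Valid;
  bool isValid() const { return Valid; }
};

// Assumed ordering, taken from the tables in the review comment:
// Invalid < anything is false, Valid < Invalid is true,
// and two valid costs compare by value.
inline bool operator<(const ToyCost &L, const ToyCost &R) {
  if (!L.Valid)
    return false;
  if (!R.Valid)
    return true;
  return L.Value < R.Value;
}

// Guard with only the comparison: true means "return false, skip the transform".
inline bool bailOutCompareOnly(const ToyCost &OldCost, const ToyCost &NewCost) {
  return OldCost < NewCost;
}

// Guard as written in the patch: also bail out whenever NewCost is invalid.
inline bool bailOutWithValidity(const ToyCost &OldCost, const ToyCost &NewCost) {
  return OldCost < NewCost || !NewCost.isValid();
}
```

Under these assumptions the two guards differ only in row 4 (both costs invalid): the comparison alone lets the transform proceed, while the added `!NewCost.isValid()` test rejects it. Row 3 (invalid OldCost, valid NewCost) still proceeds in both versions, which matches the second table.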

[... 105 lines not shown ...]

Revision Contents

Diff 317234

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
