This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/AArch64/
-
Target/
-
AArch64/
21/24
AArch64TargetTransformInfo.cpp
-
test/Analysis/CostModel/AArch64/
-
Analysis/
-
CostModel/
-
AArch64/
3/4
div.ll

Differential D135991

[AArch64] Fix cost model for `udiv` instruction when one of the operands is a uniform constant
ClosedPublic

Authored by zjaffal on Oct 14 2022, 2:17 PM.

Download Raw Diff

Details

Reviewers

hassnaa-arm
david-arm
dmgreen
fhahn

Commits

rG6e4cea55f0d1: [AArch64] Fix cost model for `udiv` instruction when one of the operands is a…

Summary

Currently the model over estimates the cost of a udiv instruction with one constant. The correct cost for a udiv instruction is
insert_cost * extract_cost * num_elements

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zjaffal created this revision.Oct 14 2022, 2:17 PM

Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 2:17 PM

Herald added subscribers: hiraditya, kristof.beyls. · View Herald Transcript

zjaffal requested review of this revision.Oct 14 2022, 2:17 PM

Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 2:17 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B192267: Diff 467910.Oct 14 2022, 3:17 PM

dmgreen added inline comments.Oct 17 2022, 12:32 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2188	Using getArithmeticInstrCost here seems very odd. I would expect the base cost to be `NgetVectorInstrCost(Instruction::ExtractElement) + N getVectorInstrCost(Instruction::InsertElement) + N*getArithmeticInstrCost(Div)`. Sometimes we use a cheaper cross-register-bank copy cost than getVectorInstrCost would give. I would expect the getVectorInstrCost's to become removed if one of the operations was uniform or constant. And maybe the Cost += Cost shouldn't be needed if the vector instructions have correct costs. All the code here seems to be trying to give a very high cost for vector divides, as they will just be scalarized.
llvm/test/Analysis/CostModel/AArch64/div.ll
181–183	It seems odd that doing 2 divides, plus some cross-register-bank copies, would be cheaper than doing 1

david-arm added inline comments.Oct 17 2022, 1:05 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2147	I think this is broken because you're assuming that all vectors are of type FixedVectorType. This code will crash if you one of the operands is a constant and we're dealing with scalable vector types. Perhaps it's better to move this to the else case below, i.e. if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) { ... } else { if ((Op1Info.isConstant() && Op1Info.isUniform()) \|\| ... // On AArch64, without SVE, vector divisions are expanded We shouldn't be dealing with scalable vectors at this point so the `cast<FixedVectorType>(Ty)` ought to be fine.
llvm/test/Analysis/CostModel/AArch64/div.ll
181–183	I agree with @dmgreen, this does look odd.

RKSimon added a subscriber: RKSimon.Oct 17 2022, 2:10 AM

RKSimon added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2188	You should be able to use getScalarizationOverhead to handle all of the extract/insert costs for you.

Matt added a subscriber: Matt.Oct 19 2022, 5:19 AM

zjaffal added inline comments.Oct 19 2022, 6:39 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2147	I will move it after that block then
2188	I think this probably needs fixing because the assumed result is over a 100 for the cases that I was dealing with if you look at the test cases down below. I think there is repeated counting because first the base cost is InstructionCost Cost = BaseT::getArithmeticInstrCost( Opcode, Ty, CostKind, Op1Info, Op2Info); And in the BaseT it calculates the extract and insert cost. This is the relevant snippet from `llvm/include/llvm/CodeGen/BasicTTIImpl.h` if (auto VTy = dyn_cast<FixedVectorType>(Ty)) { InstructionCost Cost = thisT()->getArithmeticInstrCost( Opcode, VTy->getScalarType(), CostKind, Opd1Info, Opd2Info, Args, CxtI); // Return the cost of multiple scalar invocation plus the cost of // inserting and extracting the values. SmallVector<Type > Tys(Args.size(), Ty); return getScalarizationOverhead(VTy, Args, Tys) + VTy->getNumElements() * Cost; }
llvm/test/Analysis/CostModel/AArch64/div.ll
181–183	I assumed that the insert and extract cost is 1. This is because `getVectorInstrCost(Instruction::ExtractElement)` and `getVectorInstrCost(Instruction::InsertElement)` is set to 2. I am not sure if I should use those values or keep it as it is.

Move the code block to make sure that vectors are fixed size.
Update the cost model to use insert and extract cost

Harbormaster completed remote builds in B194628: Diff 471125.Oct 27 2022, 6:37 AM

Can you explain more about the motivating case you have? Why should the cost of the vector divide be so low?

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2178	getArithmeticInstrCost should not be used with InsertElement or ExtractElement. It is used for arithmetic instructions like Add and Div. getVectorInstrCost should be used with InsertElement and ExtractElement. For the i64 Mul case below we hard-coded the values 2 instead, due to the more regular nature of the scalarization. (It was originally 1, but we had to increase it as it was not quite high enough in cases).
2183	Remove the debug message in the final version.
2185	This is assuming that the cost of the actual divide is 1? That sounds too low in some cases, compared to scalar. Should it be `(InsertCost + ExtractCost + Cost) * VTy->getNumElements()`?

zjaffal marked 5 inline comments as done.Oct 31 2022, 2:42 AM

zjaffal added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2178	The code before my patch used `getArithmeticInstrCost` should we only use `getVectorInstrCost` then?
2185	The problem is that cost here counts the insert and extract cost so we will end up counting twice. InstructionCost Cost = BaseT::getArithmeticInstrCost( Opcode, Ty, CostKind, Op1Info, Op2Info); Takes into account the scalarization overhead. For the following example define <8 x i32> @foo(<8 x i32> %v) { %res = udiv <8 x i32> %v, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255> ret <8 x i32> %res } The cost was 44. I don't think it is correct

dmgreen added inline comments.Oct 31 2022, 3:59 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2178	I think that using getArithmeticInstrCost in the old code was wrong, yeah. But all the old costmode code is pretty odd, like the way it does `Cost += Cost;`. Using getVectorInstrCost might end up giving too high a cost, if it does I would not be against using a cost of 2, like we do for Mul. I would expect the basic cost of a vector divide that we expand to be: `(getArithmeticInstrCost(Div, ScalarTy) + 2 * getVectorInstrCost(Extract) + getVectorInstrCost(Insert)) * VF`. The `2ExractCost` could be reduced to `1ExtractCost` if the operand is Uniform. I haven't done any checking of that scheme though. That's where benchmarks come in, to make sure the theory matches practice and it doesn't end up making things worse. There may have been a reason why the old code got things "wrong".
2185	I see, because we convert the divide into a series of multiplies and shifts. That would apply to scalar divide too, right? Do we need some code like this? https://github.com/llvm/llvm-project/blob/72e9447e29abf111c742da59afe4152150a2f8e7/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L302 It only handles powers-of-2 though, so might need extending for your usecase. Having it somewhere generic would be good too.

zjaffal added inline comments.Nov 2 2022, 3:28 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

2178

I tried doing the following now for uniform constants

FixedVectorType *VTy = cast<FixedVectorType>(Ty);
InstructionCost InsertExtractCost;
for(unsigned Idx = 0; Idx < VTy->getNumElements() ; Idx ++){
  InsertExtractCost += getVectorInstrCost(Instruction::InsertElement, VTy, Idx);
  InsertExtractCost += getVectorInstrCost(Instruction::ExtractElement, VTy, Idx);
}
InstructionCost DivCost = getArithmeticInstrCost(Opcode, VTy->getScalarType(), CostKind);
return InsertExtractCost + DivCost * VTy->getNumElements();

It overly estimates the cost of div instructions, take the following example:

%V64i8 = udiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>

The original estimate was: 880
My current patch estimate: 80
Using getArithmeticInstrCost: 424

I haven't done any checking of that scheme though. That's where benchmarks come in, to make sure the theory matches practice and it doesn't end up making things worse. There may have been a reason why the old code got things "wrong".

I will put a microbenchmark patch to test the cost result. I think this might give a clear indicator of the cost estimate

dmgreen added inline comments.Nov 2 2022, 8:46 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2178	424 seems like it might it might not be too far off, just counting instructions: https://godbolt.org/z/x8fsYP5Pb. Any 64x vector is going to fairly expensive if we need to scalarize it. If it was me, I might be tempted to just use `(DivCost + 4) * VF`, to match the mul variant below. 4 is 2 for insert, 2 for extract. It would be 6 for an instruction with variable operands. It might work better for regular scalarization of integers. That would rely on getArithmeticInstrCost(Div, ScalarTy) getting the correct cost for non-power2 constants to be precise for the example above, which it might not at the moment.

Use scalar div cost + 4 to calcuate uniform costant division cost

Harbormaster completed remote builds in B196694: Diff 473975.Nov 8 2022, 8:03 AM

Thanks. The existing code is pretty odd, we may want to clean it up at some point. The new code seems better though. If you can clean it up a little as suggested and explain the difference with the scalar costs then this sounds good to me.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
18	Are these changes necessary?
2147	Remove this change now?
2174	if -> If
2182	Should UDiv be Opcode, to handle SDiv a little better?
llvm/test/Analysis/CostModel/AArch64/div.ll
181–183	Can you explain why this isn't (7+4)*2 = 22? Is the scalar cost treated as 1?

zjaffal added inline comments.Nov 9 2022, 6:49 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
18	I don't think so. Automatic imports from vscode
2182	Do you mean using `ISD::UDiv` instead of `Instruction::UDiv`. I tried that but it get the following error if I do that Assertion failed: (ISD && "Invalid opcode"), function getArithmeticInstrCost, file BasicTTIImpl.h, line 834. We can do the following instead int Opcdoe = (ISD == ISD::UDIV) ? Instruction::UDiv : Instruction::SDiv;

dmgreen added inline comments.Nov 9 2022, 7:05 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2182	I just meant the variable/argument `Opcode` - as far as I understand it could be UDiv or SDiv here.

zjaffal marked 2 inline comments as done.Nov 9 2022, 7:07 AM

zjaffal added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2182	We can do the following then int Opcdoe = (ISD == ISD::UDIV) ? Instruction::UDiv : Instruction::SDiv;

Remove unecessary imports, fix comments.
Opcode can be either UDiv or SDiv

Harbormaster completed remote builds in B199408: Diff 477764.Nov 24 2022, 6:55 AM

Other than changing the parameter passed through, this LGTM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2090	This one :)
2182	I think the Opcode parameter will already be the UDiv/SDiv, and can be used directly. Correct me if I'm wrong.

This revision is now accepted and ready to land.Nov 25 2022, 12:30 AM

Use opcode to get div cost

This revision was landed with ongoing or failed builds.Nov 28 2022, 12:42 AM

Closed by commit rG6e4cea55f0d1: [AArch64] Fix cost model for `udiv` instruction when one of the operands is a… (authored by zjaffal). · Explain Why

This revision was automatically updated to reflect the committed changes.

zjaffal added a commit: rG6e4cea55f0d1: [AArch64] Fix cost model for `udiv` instruction when one of the operands is a….

Harbormaster completed remote builds in B199686: Diff 478141.Nov 28 2022, 1:16 AM

Revision Contents

Path

Size

llvm/

lib/

Target/

AArch64/

AArch64TargetTransformInfo.cpp

12 lines

test/

Analysis/

CostModel/

AArch64/

div.ll

90 lines

Diff 478144

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show All 9 Lines
#include "AArch64ExpandImm.h"		#include "AArch64ExpandImm.h"
#include "AArch64PerfectShuffle.h"		#include "AArch64PerfectShuffle.h"
#include "MCTargetDesc/AArch64AddressingModes.h"		#include "MCTargetDesc/AArch64AddressingModes.h"
#include "llvm/Analysis/IVDescriptors.h"		#include "llvm/Analysis/IVDescriptors.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/CodeGen/BasicTTIImpl.h"		#include "llvm/CodeGen/BasicTTIImpl.h"
#include "llvm/CodeGen/CostTable.h"		#include "llvm/CodeGen/CostTable.h"
#include "llvm/CodeGen/TargetLowering.h"		#include "llvm/CodeGen/TargetLowering.h"
		dmgreenUnsubmitted Done Reply Inline Actions Are these changes necessary? dmgreen: Are these changes necessary?
		zjaffalAuthorUnsubmitted Done Reply Inline Actions I don't think so. Automatic imports from vscode zjaffal: I don't think so. Automatic imports from vscode
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAArch64.h"		#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"		#include "llvm/Transforms/InstCombine/InstCombiner.h"
#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h"		#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h"
#include <algorithm>		#include <algorithm>
▲ Show 20 Lines • Show All 2,055 Lines • ▼ Show 20 Lines
}		}

InstructionCost AArch64TTIImpl::getVectorInstrCost(const Instruction &I,		InstructionCost AArch64TTIImpl::getVectorInstrCost(const Instruction &I,
Type *Val, unsigned Index) {		Type *Val, unsigned Index) {
return getVectorInstrCostHelper(Val, Index, true /* HasRealUse */);		return getVectorInstrCostHelper(Val, Index, true /* HasRealUse */);
}		}

InstructionCost AArch64TTIImpl::getArithmeticInstrCost(		InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,		unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
		dmgreenUnsubmitted Not Done Reply Inline Actions This one :) dmgreen: This one :)
TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,		TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
ArrayRef<const Value *> Args,		ArrayRef<const Value *> Args,
const Instruction *CxtI) {		const Instruction *CxtI) {

// TODO: Handle more cost kinds.		// TODO: Handle more cost kinds.
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,		return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
Op2Info, Args, CxtI);		Op2Info, Args, CxtI);
Show All 40 Lines	if (Op2Info.isConstant() && Op2Info.isUniform()) {
Instruction::AShr, Ty, CostKind, Op1Info.getNoProps(), Op2Info.getNoProps());		Instruction::AShr, Ty, CostKind, Op1Info.getNoProps(), Op2Info.getNoProps());
return MulCost * 2 + AddCost * 2 + ShrCost * 2 + 1;		return MulCost * 2 + AddCost * 2 + ShrCost * 2 + 1;
}		}
}		}

InstructionCost Cost = BaseT::getArithmeticInstrCost(		InstructionCost Cost = BaseT::getArithmeticInstrCost(
Opcode, Ty, CostKind, Op1Info, Op2Info);		Opcode, Ty, CostKind, Op1Info, Op2Info);
if (Ty->isVectorTy()) {		if (Ty->isVectorTy()) {
if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {		if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
		david-armUnsubmitted Done Reply Inline Actions I think this is broken because you're assuming that all vectors are of type FixedVectorType. This code will crash if you one of the operands is a constant and we're dealing with scalable vector types. Perhaps it's better to move this to the else case below, i.e. if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) { ... } else { if ((Op1Info.isConstant() && Op1Info.isUniform()) \|\| ... // On AArch64, without SVE, vector divisions are expanded We shouldn't be dealing with scalable vectors at this point so the `cast<FixedVectorType>(Ty)` ought to be fine. david-arm: I think this is broken because you're assuming that all vectors are of type FixedVectorType.
		zjaffalAuthorUnsubmitted Done Reply Inline Actions I will move it after that block then zjaffal: I will move it after that block then
		dmgreenUnsubmitted Done Reply Inline Actions Remove this change now? dmgreen: Remove this change now?
// SDIV/UDIV operations are lowered using SVE, then we can have less		// SDIV/UDIV operations are lowered using SVE, then we can have less
// costs.		// costs.
if (isa<FixedVectorType>(Ty) &&		if (isa<FixedVectorType>(Ty) &&
cast<FixedVectorType>(Ty)->getPrimitiveSizeInBits().getFixedSize() <		cast<FixedVectorType>(Ty)->getPrimitiveSizeInBits().getFixedSize() <
128) {		128) {
EVT VT = TLI->getValueType(DL, Ty);		EVT VT = TLI->getValueType(DL, Ty);
static const CostTblEntry DivTbl[]{		static const CostTblEntry DivTbl[]{
{ISD::SDIV, MVT::v2i8, 5}, {ISD::SDIV, MVT::v4i8, 8},		{ISD::SDIV, MVT::v2i8, 5}, {ISD::SDIV, MVT::v4i8, 8},
Show All 10 Lines	if (Ty->isVectorTy()) {
// For 8/16-bit elements, the cost is higher because the type		// For 8/16-bit elements, the cost is higher because the type
// requires promotion and possibly splitting:		// requires promotion and possibly splitting:
if (LT.second.getScalarType() == MVT::i8)		if (LT.second.getScalarType() == MVT::i8)
Cost *= 8;		Cost *= 8;
else if (LT.second.getScalarType() == MVT::i16)		else if (LT.second.getScalarType() == MVT::i16)
Cost *= 4;		Cost *= 4;
return Cost;		return Cost;
} else {		} else {
		// If one of the operands is a uniform constant then the cost for each
		dmgreenUnsubmitted Done Reply Inline Actions if -> If dmgreen: if -> If
		// element is Cost for insertion, extraction and division.
		// Insertion cost = 2, Extraction Cost = 2, Division = cost for the
		// operation with scalar type
		if ((Op1Info.isConstant() && Op1Info.isUniform()) \|\|
		dmgreenUnsubmitted Done Reply Inline Actions getArithmeticInstrCost should not be used with InsertElement or ExtractElement. It is used for arithmetic instructions like Add and Div. getVectorInstrCost should be used with InsertElement and ExtractElement. For the i64 Mul case below we hard-coded the values 2 instead, due to the more regular nature of the scalarization. (It was originally 1, but we had to increase it as it was not quite high enough in cases). dmgreen: getArithmeticInstrCost should not be used with InsertElement or ExtractElement. It is used for…
		zjaffalAuthorUnsubmitted Done Reply Inline Actions The code before my patch used `getArithmeticInstrCost` should we only use `getVectorInstrCost` then? zjaffal: The code before my patch used `getArithmeticInstrCost` should we only use `getVectorInstrCost`…
		dmgreenUnsubmitted Done Reply Inline Actions I think that using getArithmeticInstrCost in the old code was wrong, yeah. But all the old costmode code is pretty odd, like the way it does `Cost += Cost;`. Using getVectorInstrCost might end up giving too high a cost, if it does I would not be against using a cost of 2, like we do for Mul. I would expect the basic cost of a vector divide that we expand to be: `(getArithmeticInstrCost(Div, ScalarTy) + 2 * getVectorInstrCost(Extract) + getVectorInstrCost(Insert)) * VF`. The `2ExractCost` could be reduced to `1ExtractCost` if the operand is Uniform. I haven't done any checking of that scheme though. That's where benchmarks come in, to make sure the theory matches practice and it doesn't end up making things worse. There may have been a reason why the old code got things "wrong". dmgreen: I think that using getArithmeticInstrCost in the old code was wrong, yeah. But all the old…
		zjaffalAuthorUnsubmitted Done Reply Inline Actions I tried doing the following now for uniform constants FixedVectorType VTy = cast<FixedVectorType>(Ty); InstructionCost InsertExtractCost; for(unsigned Idx = 0; Idx < VTy->getNumElements() ; Idx ++){ InsertExtractCost += getVectorInstrCost(Instruction::InsertElement, VTy, Idx); InsertExtractCost += getVectorInstrCost(Instruction::ExtractElement, VTy, Idx); } InstructionCost DivCost = getArithmeticInstrCost(Opcode, VTy->getScalarType(), CostKind); return InsertExtractCost + DivCost VTy->getNumElements(); It overly estimates the cost of div instructions, take the following example: %V64i8 = udiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7> The original estimate was: 880 My current patch estimate: 80 Using `getArithmeticInstrCost`: 424 I haven't done any checking of that scheme though. That's where benchmarks come in, to make sure the theory matches practice and it doesn't end up making things worse. There may have been a reason why the old code got things "wrong". I will put a microbenchmark patch to test the cost result. I think this might give a clear indicator of the cost estimate zjaffal: I tried doing the following now for uniform constants ``` FixedVectorType *VTy =…
		dmgreenUnsubmitted Not Done Reply Inline Actions 424 seems like it might it might not be too far off, just counting instructions: https://godbolt.org/z/x8fsYP5Pb. Any 64x vector is going to fairly expensive if we need to scalarize it. If it was me, I might be tempted to just use `(DivCost + 4) * VF`, to match the mul variant below. 4 is 2 for insert, 2 for extract. It would be 6 for an instruction with variable operands. It might work better for regular scalarization of integers. That would rely on getArithmeticInstrCost(Div, ScalarTy) getting the correct cost for non-power2 constants to be precise for the example above, which it might not at the moment. dmgreen: 424 seems like it might it might not be too far off, just counting instructions: https…
		(Op2Info.isConstant() && Op2Info.isUniform())) {
		if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
		InstructionCost DivCost = BaseT::getArithmeticInstrCost(
		Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info);
		dmgreenUnsubmitted Done Reply Inline Actions Should UDiv be Opcode, to handle SDiv a little better? dmgreen: Should UDiv be Opcode, to handle SDiv a little better?
		zjaffalAuthorUnsubmitted Done Reply Inline Actions Do you mean using `ISD::UDiv` instead of `Instruction::UDiv`. I tried that but it get the following error if I do that Assertion failed: (ISD && "Invalid opcode"), function getArithmeticInstrCost, file BasicTTIImpl.h, line 834. We can do the following instead int Opcdoe = (ISD == ISD::UDIV) ? Instruction::UDiv : Instruction::SDiv; zjaffal: Do you mean using `ISD::UDiv` instead of `Instruction::UDiv`. I tried that but it get the…
		dmgreenUnsubmitted Done Reply Inline Actions I just meant the variable/argument `Opcode` - as far as I understand it could be UDiv or SDiv here. dmgreen: I just meant the variable/argument `Opcode` - as far as I understand it could be UDiv or SDiv…
		zjaffalAuthorUnsubmitted Done Reply Inline Actions We can do the following then int Opcdoe = (ISD == ISD::UDIV) ? Instruction::UDiv : Instruction::SDiv; zjaffal: We can do the following then ``` int Opcdoe = (ISD == ISD::UDIV) ? Instruction::UDiv…
		dmgreenUnsubmitted Not Done Reply Inline Actions I think the Opcode parameter will already be the UDiv/SDiv, and can be used directly. Correct me if I'm wrong. dmgreen: I think the Opcode parameter will already be the UDiv/SDiv, and can be used directly. Correct…
		return (4 + DivCost) * VTy->getNumElements();
		dmgreenUnsubmitted Done Reply Inline Actions Remove the debug message in the final version. dmgreen: Remove the debug message in the final version.
		}
		}
		dmgreenUnsubmitted Done Reply Inline Actions This is assuming that the cost of the actual divide is 1? That sounds too low in some cases, compared to scalar. Should it be `(InsertCost + ExtractCost + Cost) * VTy->getNumElements()`? dmgreen: This is assuming that the cost of the actual divide is 1? That sounds too low in some cases…
		zjaffalAuthorUnsubmitted Done Reply Inline Actions The problem is that cost here counts the insert and extract cost so we will end up counting twice. InstructionCost Cost = BaseT::getArithmeticInstrCost( Opcode, Ty, CostKind, Op1Info, Op2Info); Takes into account the scalarization overhead. For the following example define <8 x i32> @foo(<8 x i32> %v) { %res = udiv <8 x i32> %v, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255> ret <8 x i32> %res } The cost was 44. I don't think it is correct zjaffal: The problem is that cost here counts the insert and extract cost so we will end up counting…
		dmgreenUnsubmitted Done Reply Inline Actions I see, because we convert the divide into a series of multiplies and shifts. That would apply to scalar divide too, right? Do we need some code like this? https://github.com/llvm/llvm-project/blob/72e9447e29abf111c742da59afe4152150a2f8e7/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L302 It only handles powers-of-2 though, so might need extending for your usecase. Having it somewhere generic would be good too. dmgreen: I see, because we convert the divide into a series of multiplies and shifts. That would apply…
// On AArch64, without SVE, vector divisions are expanded		// On AArch64, without SVE, vector divisions are expanded
// into scalar divisions of each pair of elements.		// into scalar divisions of each pair of elements.
Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,		Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
		dmgreenUnsubmitted Done Reply Inline Actions Using getArithmeticInstrCost here seems very odd. I would expect the base cost to be `NgetVectorInstrCost(Instruction::ExtractElement) + N getVectorInstrCost(Instruction::InsertElement) + NgetArithmeticInstrCost(Div)`. Sometimes we use a cheaper cross-register-bank copy cost than getVectorInstrCost would give. I would expect the getVectorInstrCost's to become removed if one of the operations was uniform or constant. And maybe the Cost += Cost shouldn't be needed if the vector instructions have correct costs. All the code here seems to be trying to give a very high cost for vector divides, as they will just be scalarized. dmgreen:* Using getArithmeticInstrCost here seems very odd. I would expect the base cost to be…
		RKSimonUnsubmitted Done Reply Inline Actions You should be able to use getScalarizationOverhead to handle all of the extract/insert costs for you. RKSimon: You should be able to use getScalarizationOverhead to handle all of the extract/insert costs…
		zjaffalAuthorUnsubmitted Done Reply Inline Actions I think this probably needs fixing because the assumed result is over a 100 for the cases that I was dealing with if you look at the test cases down below. I think there is repeated counting because first the base cost is InstructionCost Cost = BaseT::getArithmeticInstrCost( Opcode, Ty, CostKind, Op1Info, Op2Info); And in the BaseT it calculates the extract and insert cost. This is the relevant snippet from `llvm/include/llvm/CodeGen/BasicTTIImpl.h` if (auto VTy = dyn_cast<FixedVectorType>(Ty)) { InstructionCost Cost = thisT()->getArithmeticInstrCost( Opcode, VTy->getScalarType(), CostKind, Opd1Info, Opd2Info, Args, CxtI); // Return the cost of multiple scalar invocation plus the cost of // inserting and extracting the values. SmallVector<Type > Tys(Args.size(), Ty); return getScalarizationOverhead(VTy, Args, Tys) + VTy->getNumElements() * Cost; } zjaffal: I think this probably needs fixing because the assumed result is over a 100 for the cases that…
CostKind, Op1Info, Op2Info);		CostKind, Op1Info, Op2Info);
Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,		Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
Op1Info, Op2Info);		Op1Info, Op2Info);
}		}

// TODO: if one of the arguments is scalar, then it's not necessary to		// TODO: if one of the arguments is scalar, then it's not necessary to
// double the cost of handling the vector elements.		// double the cost of handling the vector elements.
Cost += Cost;		Cost += Cost;
▲ Show 20 Lines • Show All 1,053 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/AArch64/div.ll

Show First 20 Lines • Show All 172 Lines • ▼ Show 20 Lines	;
%V64i8 = udiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>		%V64i8 = udiv <64 x i8> undef, <i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16, i8 17, i8 18, i8 19>

ret i32 undef		ret i32 undef
}		}

define i32 @sdiv_uniformconst() {		define i32 @sdiv_uniformconst() {
; CHECK-LABEL: 'sdiv_uniformconst'		; CHECK-LABEL: 'sdiv_uniformconst'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>
		dmgreenUnsubmitted Done Reply Inline Actions It seems odd that doing 2 divides, plus some cross-register-bank copies, would be cheaper than doing 1 dmgreen: It seems odd that doing 2 divides, plus some cross-register-bank copies, would be cheaper than…
		david-armUnsubmitted Done Reply Inline Actions I agree with @dmgreen, this does look odd. david-arm: I agree with @dmgreen, this does look odd.
		zjaffalAuthorUnsubmitted Done Reply Inline Actions I assumed that the insert and extract cost is 1. This is because `getVectorInstrCost(Instruction::ExtractElement)` and `getVectorInstrCost(Instruction::InsertElement)` is set to 2. I am not sure if I should use those values or keep it as it is. zjaffal: I assumed that the insert and extract cost is 1. This is because `getVectorInstrCost…
		dmgreenUnsubmitted Not Done Reply Inline Actions Can you explain why this isn't (7+4)2 = 22? Is the scalar cost treated as 1? dmgreen:* Can you explain why this isn't (7+4)*2 = 22? Is the scalar cost treated as 1?
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, 7		%I64 = sdiv i64 undef, 7
%V2i64 = sdiv <2 x i64> undef, <i64 7, i64 7>		%V2i64 = sdiv <2 x i64> undef, <i64 7, i64 7>
%V4i64 = sdiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		%V4i64 = sdiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
%V8i64 = sdiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		%V8i64 = sdiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>

%I32 = sdiv i32 undef, 7		%I32 = sdiv i32 undef, 7
Show All 12 Lines	;
%V64i8 = sdiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		%V64i8 = sdiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>

ret i32 undef		ret i32 undef
}		}

define i32 @udiv_uniformconst() {		define i32 @udiv_uniformconst() {
; CHECK-LABEL: 'udiv_uniformconst'		; CHECK-LABEL: 'udiv_uniformconst'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 7		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 7
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>		; CHECK-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, 7		%I64 = udiv i64 undef, 7
%V2i64 = udiv <2 x i64> undef, <i64 7, i64 7>		%V2i64 = udiv <2 x i64> undef, <i64 7, i64 7>
%V4i64 = udiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>		%V4i64 = udiv <4 x i64> undef, <i64 7, i64 7, i64 7, i64 7>
%V8i64 = udiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>		%V8i64 = udiv <8 x i64> undef, <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>

%I32 = udiv i32 undef, 7		%I32 = udiv i32 undef, 7
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	;
%V64i8 = sdiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		%V64i8 = sdiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>

ret i32 undef		ret i32 undef
}		}

define i32 @udiv_uniformconstpow2() {		define i32 @udiv_uniformconstpow2() {
; CHECK-LABEL: 'udiv_uniformconstpow2'		; CHECK-LABEL: 'udiv_uniformconstpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16, i16 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, 16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16, i8 16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, 16		%I64 = udiv i64 undef, 16
%V2i64 = udiv <2 x i64> undef, <i64 16, i64 16>		%V2i64 = udiv <2 x i64> undef, <i64 16, i64 16>
%V4i64 = udiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>		%V4i64 = udiv <4 x i64> undef, <i64 16, i64 16, i64 16, i64 16>
%V8i64 = udiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		%V8i64 = udiv <8 x i64> undef, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>

%I32 = udiv i32 undef, 16		%I32 = udiv i32 undef, 16
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	;
%V64i8 = udiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>		%V64i8 = udiv <64 x i8> undef, <i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16, i8 -2, i8 -4, i8 -8, i8 -16>

ret i32 undef		ret i32 undef
}		}

define i32 @sdiv_uniformconstnegpow2() {		define i32 @sdiv_uniformconstnegpow2() {
; CHECK-LABEL: 'sdiv_uniformconstnegpow2'		; CHECK-LABEL: 'sdiv_uniformconstnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = sdiv i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2i64 = sdiv <2 x i64> undef, <i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4i64 = sdiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i64 = sdiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = sdiv i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = sdiv <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i32 = sdiv <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i32 = sdiv <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = sdiv i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = sdiv <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i16 = sdiv <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i16 = sdiv <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = sdiv i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = sdiv <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i8 = sdiv <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %V64i8 = sdiv <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = sdiv i64 undef, -16		%I64 = sdiv i64 undef, -16
%V2i64 = sdiv <2 x i64> undef, <i64 -16, i64 -16>		%V2i64 = sdiv <2 x i64> undef, <i64 -16, i64 -16>
%V4i64 = sdiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		%V4i64 = sdiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
%V8i64 = sdiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		%V8i64 = sdiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>

%I32 = sdiv i32 undef, -16		%I32 = sdiv i32 undef, -16
Show All 12 Lines	;
%V64i8 = sdiv <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		%V64i8 = sdiv <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>

ret i32 undef		ret i32 undef
}		}

define i32 @udiv_uniformconstnegpow2() {		define i32 @udiv_uniformconstnegpow2() {
; CHECK-LABEL: 'udiv_uniformconstnegpow2'		; CHECK-LABEL: 'udiv_uniformconstnegpow2'
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %I64 = udiv i64 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2i64 = udiv <2 x i64> undef, <i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4i64 = udiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 192 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i64 = udiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I32 = udiv i32 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V4i32 = udiv <4 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8i32 = udiv <8 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 208 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i32 = udiv <16 x i32> undef, <i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16, i32 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I16 = udiv i16 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V8i16 = udiv <8 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16i16 = udiv <16 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 432 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i16 = udiv <32 x i16> undef, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, -16		; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %I8 = udiv i8 undef, -16
; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %V16i8 = udiv <16 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 440 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 160 for instruction: %V32i8 = udiv <32 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 880 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>		; CHECK-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %V64i8 = udiv <64 x i8> undef, <i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16, i8 -16>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%I64 = udiv i64 undef, -16		%I64 = udiv i64 undef, -16
%V2i64 = udiv <2 x i64> undef, <i64 -16, i64 -16>		%V2i64 = udiv <2 x i64> undef, <i64 -16, i64 -16>
%V4i64 = udiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>		%V4i64 = udiv <4 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16>
%V8i64 = udiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>		%V8i64 = udiv <8 x i64> undef, <i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16, i64 -16>

%I32 = udiv i32 undef, -16		%I32 = udiv i32 undef, -16
Show All 16 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix cost model for `udiv` instruction when one of the operands is a uniform constantClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 478144

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/test/Analysis/CostModel/AArch64/div.ll

[AArch64] Fix cost model for `udiv` instruction when one of the operands is a uniform constant
ClosedPublic