Improve cost model for some 128-bit vector operations that use SVE
Event Timeline
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2087–2120: @sdesmalen
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2087–2120: class ScalableVectorType subclasses from class VectorType, which in turn subclasses from class Type. If Ty is a Type*, you can ask at runtime whether Ty is actually an object of class ScalableVectorType using isa<ScalableVectorType>(Ty). If you ask Ty->isVectorTy() it will return true if it is either a FixedVectorType or a ScalableVectorType, which means that it is synonymous with isa<VectorType>(Ty). So yes, if you want to check whether a Type * is actually a ScalableVectorType*, then using isa seems correct.
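A minimal sketch of the distinction described above (the helper name classifyTy is hypothetical, added only for illustration):

    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/Support/Casting.h"
    using namespace llvm;

    // Hypothetical helper showing the two different questions you can ask
    // about a Type*.
    static void classifyTy(Type *Ty) {
      // True for both FixedVectorType and ScalableVectorType, i.e. the same
      // as isa<VectorType>(Ty).
      bool IsAnyVector = Ty->isVectorTy();

      // True only when Ty is specifically a scalable (SVE-style) vector.
      bool IsScalable = isa<ScalableVectorType>(Ty);

      (void)IsAnyVector;
      (void)IsScalable;
    }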
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2045–2047: This should be checking whether we can use SVE for the specified type (Ty). That means checking that we have SVE, instead of checking whether the target supports scalable vectors. The answer will be the same, but conceptually we are specifically checking whether we can use the SVE div instructions. Note that SVE also supports it for larger vectors (if the SVE min bits allows) and for vectors with FP elements. (A small sketch of this check follows this comment block.)
2092: I wouldn't expect there to be any extract and insert, because the division is legal.
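A minimal sketch of the check described in the 2045–2047 comment, assuming the ST subtarget pointer available in AArch64TTIImpl::getArithmeticInstrCost (illustrative only, not the exact patch):

    // Rather than asking whether the target supports scalable vectors in
    // general, gate explicitly on SVE, since it is the SVE sdiv/udiv
    // instructions that make these divisions cheap for Ty.
    if (ST->hasSVE()) {
      // ... use the SVE-based division cost for Ty ...
    }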
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2045–2047: "Note that SVE also supports it for larger vectors (if the SVE min bits allows) and for vectors with FP elements."
2092: I found that the code flow calculates the cost for both extract and insert; that's why I considered them.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2087–2120: I think you can probably just do something like (see the sketch after this comment block):

    if (TLI->isOperationLegalOrCustom(ISD, VT)) {
      ... possibly increase the cost for 8-bit or 16-bit element vectors ...
      return Cost;
    } else {
      ... existing code ...
    }

This is because the operation will be marked as Expand if we don't have SVE.
2088: The sdiv and udiv operations only operate on integer types, so this isn't needed.
2092: Hi @hassnaa-arm, I think Sander is right. If we're going to use the SVE udiv/sdiv instructions then there won't be any inserts or extracts. These only happen for NEON because we are going to scalarise the instruction, which requires something like:

    Step 1: Extract lane 0 from vectors a and b.
    Step 2: Perform the scalar sdiv/udiv on these values.
    Step 3: Insert the scalar sdiv/udiv result back into a vector.

For SVE we don't need steps 1 and 3 because we perform the sdiv/udiv on the whole vector. If you run the tests you added through llc with -mattr=+sve you will only see a ptrue and a sdiv/udiv instruction, whereas for NEON (-mattr=+neon) you would also see some lane move instructions.
2127–2128: These comments are still valid if we don't have SVE, so perhaps it's best to leave them unchanged. What you could do is change the first line to say:

    // Without SVE we do not have a ...

and then at the end of the comments you could add something like:

    // However, if SVE is available then we can lower the v2i64 operation using the
    // SVE mul instruction, which has a lower cost.
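The sketch referenced in the 2087–2120 comment above, filled in with the kind of element-type multipliers discussed later in this review. The TLI/ISD/LT/ST/Cost names are the locals from the surrounding getArithmeticInstrCost code, and the constants are illustrative rather than the final patch:

    // Sketch: when SVE makes the vector sdiv/udiv legal (or custom), avoid
    // the scalarisation-based costing entirely.
    if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
      // 8/16-bit element vectors are assumed to be promoted to 32-bit
      // elements first, so give them a higher cost.
      if (LT.second.getScalarType() == MVT::i8)
        Cost *= 8;
      else if (LT.second.getScalarType() == MVT::i16)
        Cost *= 4;
      return Cost;
    }
    // Otherwise fall through to the existing extract/insert-based costing.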
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length-div-mul.ll
1 (On Diff #454850): I think it might be worth moving the fixed-width vector tests into the existing file test/Analysis/CostModel/AArch64/sve-fixed-length.ll. Then you could rename this file to just be something like test/Analysis/CostModel/AArch64/sve-arith.ll.
2 (On Diff #454850): I think you can remove the -mcpu=neoverse-v1 argument here because your changes apply for all targets.
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length-div-mul.ll
1 (On Diff #454850): On the main branch, when I run the cost model test for this file:
[AArch64-SVE]: cost model.
move test cases from sve-fixed-length-div-mul.ll file to sve-fixed-length.ll file
rename sve-fixed-length-div-mul.ll file to sve-arith.ll
add scalable vector test cases to sve-arith.ll
This is looking better, thanks @hassnaa-arm! I still have a few more comments about the cost model and the tests, but it's almost there. :)
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2092: Hi @hassnaa-arm, I think we should be checking for the element type here rather than the full VT, because that also covers other types such as MVT::v8i8, etc.
2094: Same thing here - we should be checking the element type only.
2147: It's ok to check the specific MVT::v2i64 here because we don't care about v1i64.
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
2: nit: Could you remove this NOTE please? It's just a bit misleading, that's all, because if you do actually run that script on this file it deletes the CHECK lines for the @add() function.
137: I think this should be mul <8 x i64> undef, undef.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2096: I just realised that we also need to multiply by the legalisation cost here too, i.e. Cost *= LT.first, so that we calculate the costs correctly for types such as <8 x i64>, etc.
[AArch64-SVE]: Improve cost model
- check for element type instead of full VT to include all vec lengths of the type.
- multiply the cost by the legalization cost.
Thanks for adding all the cost calculations for the fixed-length tests @hassnaa-arm - it looks much better now that we're testing all the SVE vector lengths. I just had a few more comments and then I think it's ready to go!
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2097: I am so sorry about this @hassnaa-arm, but I made a mistake previously asking you to multiply by the legalisation cost. It seems that BaseT::getArithmeticInstrCost already takes this into account. Also, I didn't spot this before, but I think we should actually return the cost here, i.e. return Cost, because otherwise we fall through to code below that doubles the cost again, i.e. Cost += Cost.
2121: I think it makes sense to restructure this so that this cost-doubling lives in the else case, i.e. something like:

    if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
      ...
      return Cost;
    } else {
      ...
      // TODO: if one of the arguments is scalar, then it's not necessary to
      // double the cost of handling the vector elements.
      return Cost * 2;
    }
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
61: I'm not sure this comment is quite right, because essentially the base cost will be 2, 8, or 16 depending upon the element size. Then we multiply by the legalisation cost. Perhaps it might be clearer to say something like ; The base cost may vary depending upon the legalised element type, i.e.
69: So for all cases where the vector length <= 128 we actually know the answer, I think, because VBITS is always >= 128. For example, for all values of VBITS we know that (16-1)/VBITS=0. The same goes for the other cases up to and including 128-bit vectors, i.e. (128-1)/VBITS=0. So I think you can simplify the CHECK lines for all vectors <= 128 bits to use a single, static value, i.e.

    Cost Model: Found an estimated cost of 2 for instruction: %sdiv16.i8 = sdiv <2 x i8> undef, undef

105: With my comment about the incorrect legalisation cost above, I think these CHECK lines can be simplified to just a single mul, i.e.

    ; CHECK: cost of #mul((div(512-1, VBITS)+1), 16) for instruction: %sdiv512.i8 = sdiv <64 x i8> undef, undef
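As a worked example of the formula in the comment above (assuming the multiply-by-16 costing for i8 elements discussed later in this review): %sdiv512.i8 = sdiv <64 x i8> is 512 bits wide, so with VBITS=256 the legalisation factor is div(512-1, 256)+1 = 2 and the expected cost is 2*16 = 32, while for VBITS >= 512 the factor is 1 and the cost is 16.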
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2097: So what is the problem with doubling the cost? My understanding is that because there are two operands, and the same work has to be done for both of them while the code calculates the cost for only one of them, we have to double the cost to account for the work needed for both operands.
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
105: About the part about multiplying by 16 and 8 instead of (2 * 8) and (2 * 4):
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2097: So from looking at the comments

    // TODO: if one of the arguments is scalar, then it's not necessary to
    // double the cost of handling the vector elements.

I think the doubling was added specifically for when we are scalarising the operation, i.e. extracting the vector elements of each operand, performing a scalar multiply, then inserting the result back into the final vector. So the doubling here refers to the scalarisation of both operands, not just one. However, when using SVE we are not scalarising each operand, so we don't need to double the cost. To be honest, I don't think the existing code is quite right either, because I think it should really be (BaseCost + (2 * InsertExtractCost)), but I don't think we should change it in this patch.
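To make the arithmetic being debated concrete, here is an illustrative comparison of the two models; BaseCost and InsertExtractCost are hypothetical placeholder values, not identifiers from the patch:

    // Illustrative numbers only.
    constexpr int BaseCost = 1;          // cost of the arithmetic op itself
    constexpr int InsertExtractCost = 2; // cost of the lane moves for one operand

    // Current model: compute the cost for one operand, then double everything
    // (Cost += Cost), which also doubles the cost of the arithmetic op.
    constexpr int CurrentCost = 2 * (BaseCost + InsertExtractCost);

    // Model suggested in the comment above: count the insert/extract work for
    // both operands, but the arithmetic op only once.
    constexpr int SuggestedCost = BaseCost + 2 * InsertExtractCost;

    static_assert(CurrentCost == 6 && SuggestedCost == 5,
                  "the two models give different costs");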
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2097: I understood the comment as follows: if one of the operands is a scalar rather than a vector, then we shouldn't double the cost, because in that case the work of legalisation and promotion/expansion is done for one vector, not two.
return cost model without doubling the cost.
update test file according to the new change.
LGTM with nits addressed! Thanks for making all the changes @hassnaa-arm.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2147: nit: You don't need the extra brackets () around ST->hasSVE().
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
124: nit: I think this should be the same as the sdiv case, i.e. just ; Assuming base_cost = 2
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2086: nit: unrelated whitespace change?
2088–2091: nit: How about:

    // For 8/16-bit element types, the cost is higher because the type requires
    // promotion and possibly splitting.

2099–2100: nit: this comment needs updating now.
2128–2137: nit: Please use proper capitalisation and punctuation in your comments. How about:

    // When SVE is not available there is no MUL.2d instruction, which means
    // a mul <2 x i64> is expensive as elements are extracted from the vectors
    // and the muls are scalarized.

2144–2148: The comment on lines 2085 to 2094 applies to the code below it. Please move the code you've added here so that it sits above that block of comments, i.e. just below case ISD::MUL:
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2085: For the case of using SVE, this probably shouldn't be looking at the original type, but rather the type that is legal for the *operation*. There are two separate things going on here:
The way the code-generator handles this is that for 8 and 16-bit inputs, the *type* legalizer first tries to legalize them to something the code-generator can handle. After that, the code-generator will try to promote it to a type that the *operation* can handle. e.g.
So for i8 and i16 element types, it should call getTypeLegalizationCost() again for the type that we know it will be promoted to. That type is a vector with the same number of lanes, but with i32 elements. (A rough sketch of this follows this comment block.)
2088–2091: Just FYI that this comment seems unaddressed.
2096: nit: redundant comment, please remove.
2099–2100: Just FYI that this comment is unaddressed.
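A rough sketch of the re-query Sander describes, assuming the Ty, TLI and DL values already available in getArithmeticInstrCost; the choice of i32 as the promotion target follows from the comment above, and the sketch is illustrative rather than the final patch:

    // Sketch: for i8/i16 element vectors, query the legalisation cost of the
    // i32-element vector the operation is expected to be promoted to.
    if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
      Type *EltTy = VTy->getElementType();
      if (EltTy->isIntegerTy(8) || EltTy->isIntegerTy(16)) {
        Type *PromotedTy = FixedVectorType::get(
            Type::getInt32Ty(Ty->getContext()), VTy->getNumElements());
        std::pair<InstructionCost, MVT> PromotedLT =
            TLI->getTypeLegalizationCost(DL, PromotedTy);
        // ... base the cost on PromotedLT rather than on LT ...
        (void)PromotedLT;
      }
    }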
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
69–75: In practice the cost of sdiv <4 x i8> is not 4 times as high as sdiv <2 x i8> (see e.g. https://godbolt.org/z/jM3M8rWsf), so I think the costs are still not entirely right. Also see my other comment on the code itself.
70: nit: 16-1 => 15 ?
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
70: Yes,
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2085: I understood the two points about legalisation, and I agree with you. I suggest that, after getting the cost of *type* legalisation,
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2085: So BaseT::getArithmeticInstrCost already takes legalisation cost into account:

    std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
    bool IsFloat = Ty->isFPOrFPVectorTy();
    // Assume that floating point arithmetic operations cost twice as much as
    // integer operations.
    InstructionCost OpCost = (IsFloat ? 2 : 1);
    if (TLI->isOperationLegalOrPromote(ISD, LT.second)) {
      // The operation is legal. Assume it costs 1.
      // TODO: Once we have extract/insert subvector cost we need to use them.
      return LT.first * OpCost;
    }
    if (!TLI->isOperationExpand(ISD, LT.second)) {
      // If the operation is custom lowered, then assume that the code is twice
      // as expensive.
      return LT.first * 2 * OpCost;
    }

Essentially it is mostly already doing what @sdesmalen is asking for, I believe. If the type is <2 x i8> then the code above will get the type legalisation cost of promoting to <2 x i32>, and since the operation is custom lowered we return LT.first * 2 * OpCost. This is behaving exactly as we want - combining the legalisation cost with the operation cost. We know that <2 x i32> divides will be lowered simply to SVE's sdiv/udiv where the predicate is generated using ptrue p0.s, vl2. It's not quite correct for <4 x i16>, where the legalisation cost is too low (1) since it's a legal type, and therefore we return 1 * 2 * OpCost. Essentially, during lowering for i8 and i16 types we do manual legalisation (see LowerDIV), which neither BaseT::getArithmeticInstrCost nor getTypeLegalizationCost knows anything about. However, @hassnaa-arm's code already tries to take this unknown behaviour into account with the multiplier, i.e.

    else if (LT.second.getScalarType() == MVT::i16)
      Cost *= 4;

Essentially I believe that for all cases, except the case where the legalised element type is i8 or i16, BaseT::getArithmeticInstrCost is doing exactly what @sdesmalen is asking for. Consider the different scenarios:
It seems to me that the real problem here is that getTypeLegalizationCost returns the wrong cost for the legalisation of types such as <2 x i8>, i.e. it only returns 1 even though we need a SIGN_EXTEND_INREG operation for promotion of each input operand. If we want more accurate costs for < 128 bits, how about a compromise here? If the type is < 128 bits we can just manually create a table of costs, since the list is quite small, and look up the opcode cost from the table. It feels like it's going to be too difficult trying to model this using generic TTI interfaces. If the vector width is >= 128 bits then @hassnaa-arm can use the existing code with the hard-coded multipliers. You can look for existing calls to CostTableLookup in this file for examples. You'd end up with a table something like this (see also the small lookup sketch after this comment block):

    static const CostTblEntry DivTbl[]{
        {ISD::SDIV, MVT::v2i8, 5},  {ISD::SDIV, MVT::v4i8, 8},
        {ISD::SDIV, MVT::v8i8, 8},  {ISD::SDIV, MVT::v2i16, 5},
        {ISD::SDIV, MVT::v4i16, 5}, {ISD::SDIV, MVT::v2i32, 1},
        ... plus entries for ISD::UDIV ...
    };

    if (... size of Ty is < 128 bits ...) {
      MVT OpTy = TLI->getValueType(DL, Ty);
      const auto *Entry = CostTableLookup(DivTbl, ISD::SDIV, OpTy);
      return Entry->Cost;
    } else if (LT.second.getScalarType() == MVT::i8)
      Cost *= 8;
    else if (LT.second.getScalarType() == MVT::i16)
      Cost *= 4;

where here I have estimated the cost based on the number of instructions generated for these operations. I treated the ptrue instruction as zero cost.
2144: nit: This line looks unnecessary.
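If the table approach is taken, note that CostTableLookup returns a null pointer when there is no matching entry, so the lookup would likely want a guard. A minimal sketch, assuming the DivTbl table proposed above and the usual ISD, Ty, TLI and DL values from getArithmeticInstrCost:

    // Sketch: only use the table when the original type maps to a simple MVT,
    // and fall through to the generic costing when there is no entry.
    EVT VT = TLI->getValueType(DL, Ty);
    if (VT.isSimple())
      if (const auto *Entry = CostTableLookup(DivTbl, ISD, VT.getSimpleVT()))
        return Entry->Cost;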
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
144: I think these examples of "expected costs" need updating.
Hi @hassnaa-arm, thanks for updating the patch with more accurate costs for <128 bit vectors. It's almost there I think - just a few more comments and then I'll be happy!
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2088: nit: Perhaps this is better written as something like:

    // If SDIV/UDIV operations are lowered using SVE sdiv/udiv
    // instructions, then the cost is less because we don't need to
    // scalarize the operation.

2089: nit: I think this comment now needs to move down to where we check the element type.
2091: I think you can make some of the following code a bit simpler if you also check the type size here, i.e.

    if (isa<FixedVectorType>(Ty) &&
        cast<FixedVectorType>(Ty)->getPrimitiveSizeInBits().getFixedSize() < 128) {

This means you can remove the if (VT.getSizeInBits() < 128u) below.
2101: Just FYI, getSizeInBits() returns a TypeSize class (llvm/include/llvm/Support/TypeSize.h) that tries to represent both fixed-width and scalable sizes. The code you've written here works, but it's relying upon a cast from a TypeSize class to an unsigned integer that only works for fixed-width vectors. We'd like to avoid relying upon these casts, and so for things like this we'd prefer to use getFixedSizeInBits() instead, which asserts that the TypeSize is fixed. (A one-line sketch follows this comment block.)
2144: This line still needs removing, I think?
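A minimal sketch of the preference described in the 2101 comment; VT is assumed to be the fixed-width value type being checked, and the snippet is illustrative only:

    // Relies on an implicit TypeSize -> uint64_t conversion, which is only
    // valid for fixed-width vectors:
    //   if (VT.getSizeInBits() < 128u) { ... }

    // Preferred: state explicitly that a fixed size is expected.
    if (VT.getFixedSizeInBits() < 128u) {
      // ...
    }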
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
63: nit: I think it might be clearer to say:

    ; Assuming extra cost multiplier of 8 for i8.
    ; Assuming extra cost multiplier of 4 for i16.

93: Looks like these ; expected cost comments aren't quite right here. For example, for <16 x i8> it looks like the cost should be 2*8=16. I think it's probably worth either:
I'm happy with either!
LGTM with nits addressed. Thanks @hassnaa-arm!
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2088: nit: I think the comment is still a bit vague here, because the operations are always lowered somehow, whether they are marked as legal, custom or expand. I think the key thing is *how* they are lowered, so it's important to mention that they are lowered using SVE. Can you just add something like that before landing the patch?
llvm/test/Analysis/CostModel/AArch64/sve-fixed-length.ll
63: nit: This still says Assuming extra cost of 8, which implies we're adding a cost of 8, not multiplying. Before landing the patch, can you change this to say we're multiplying the cost by 8?
65: nit: I think this comment can be removed now.
115: nit: This comment can also be removed now.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2117: @hassnaa-arm Shouldn't insertelement/extractelement costs be determined with getVectorInstrCost? Just noticed this while investigating follow-ups to D134605.
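For reference, a sketch of how those per-lane costs could be queried through the hook mentioned above rather than being hard-coded; the opcode constants are the standard LLVM ones, the index of 0 is illustrative, and the exact signature follows the then-current TTI interface:

    // Sketch: ask the target for insert/extract lane costs rather than
    // assuming a fixed value.
    InstructionCost ExtractCost =
        getVectorInstrCost(Instruction::ExtractElement, Ty, /*Index=*/0);
    InstructionCost InsertCost =
        getVectorInstrCost(Instruction::InsertElement, Ty, /*Index=*/0);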