This is an archive of the discontinued LLVM Phabricator instance.

Differential D157628

[AArch64][SVE2] Change the cost of extends with S/URHADD to 0
ClosedPublic

Authored by kmclaughlin on Aug 10 2023, 8:32 AM.

Details

Reviewers

david-arm
hassnaa-arm
dtemirbulatov
sdesmalen
efriedma

Commits

rG9a98ab589a4f: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0
rGdda2cd250530: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0

Summary

When SVE2 is enabled, we can combine an add of 1, add & shift right by 1
to a single s/urhadd instruction. If the operands to the adds are extended,
these extends will fold into the s/urhadd and their costs should be 0.
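
As an illustration of why the extends carry no cost, here is a minimal sketch in plain C++ (not LLVM code; the function names are hypothetical). It assumes the rounding-halving-add semantics described above and compares the widened add/add/shift/truncate sequence against an equivalent computation that never leaves 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Widened form: zext both operands to 16 bits, add 1, add, shift right by
// one, then truncate back to 8 bits. This mirrors the IR pattern.
std::uint8_t urhadd_widened(std::uint8_t a, std::uint8_t b) {
  std::uint16_t wa = a; // zext i8 -> i16
  std::uint16_t wb = b; // zext i8 -> i16
  std::uint16_t sum = static_cast<std::uint16_t>(wa + 1 + wb);
  return static_cast<std::uint8_t>(sum >> 1); // lshr by 1, trunc i16 -> i8
}

// Narrow form: the same rounding average without widening, using the
// identity a + b == 2 * (a | b) - (a ^ b).
std::uint8_t urhadd_narrow(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>((a | b) - ((a ^ b) >> 1));
}

// Exhaustively compares both forms over every pair of 8-bit inputs.
bool forms_agree() {
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      if (urhadd_widened(static_cast<std::uint8_t>(a),
                         static_cast<std::uint8_t>(b)) !=
          urhadd_narrow(static_cast<std::uint8_t>(a),
                        static_cast<std::uint8_t>(b)))
        return false;
  return true;
}
```

Because the narrow form gives the same result for every input, a single instruction operating on the unextended elements can stand in for the whole widened sequence.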

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. Aug 10 2023, 8:32 AM

Herald added a reviewer: efriedma. Aug 10 2023, 8:32 AM

Herald added a project: Restricted Project.

Herald added subscribers: ctetreau, hiraditya, kristof.beyls.

kmclaughlin requested review of this revision. Aug 10 2023, 8:32 AM

Herald added a project: Restricted Project. Aug 10 2023, 8:32 AM

Herald added subscribers: llvm-commits, wangpc.

Harbormaster completed remote builds in B251704: Diff 549049. Aug 10 2023, 1:56 PM

dtemirbulatov added inline comments. Aug 11 2023, 5:00 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2105	Check for Trunc->hasOneUser() here as well?

dtemirbulatov added inline comments. Aug 11 2023, 6:35 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2105	Oh, Trunc here is the final instruction in the sequence, so there is no need to check for hasOneUser().

I have now noticed that test/CodeGen/AArch64/sve-hadd.ll tests S/URHADD code generation. LGTM.

This revision is now accepted and ready to land. Aug 11 2023, 6:45 AM

Matt added a subscriber: Matt. Aug 12 2023, 12:05 AM

This revision was landed with ongoing or failed builds. Aug 14 2023, 3:33 AM

Closed by commit rGdda2cd250530: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0 (authored by kmclaughlin).

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rGdda2cd250530: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0.

sdesmalen added inline comments. Aug 14 2023, 3:37 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2088	Is it possible for I to have no users? (If so, should it return?)
llvm/test/Transforms/LoopVectorize/AArch64/sve2-ext-rhadd-costs.ll
44 ↗	(On Diff #549863)	Is there a way to test this without requiring a loop (and without requiring these tests to be in llvm/test/Transforms/LoopVectorize)?

kmclaughlin added a reverting change: rG5d814b384826: Revert "[AArch64][SVE2] Change the cost of extends with S/URHADD to 0". Aug 14 2023, 3:52 AM

kmclaughlin reopened this revision. Aug 14 2023, 6:46 AM

kmclaughlin added inline comments.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2088	If we have reached this point then we can assume there is only one user, as there are checks above which return if `!I->hasOneUser()`.
llvm/test/Transforms/LoopVectorize/AArch64/sve2-ext-rhadd-costs.ll
44 ↗	(On Diff #549863)	I've rewritten these tests so that they don't need a loop and moved them to `Analysis/CostModel/AArch64/sve2-ext-rhadd.ll`.

This revision is now accepted and ready to land. Aug 14 2023, 6:46 AM

Simplified cost model tests and moved them to test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll

Harbormaster completed remote builds in B252335: Diff 549912. Aug 14 2023, 8:37 AM

Removed unnecessary test from test/Transforms/LoopVectorize

Harbormaster completed remote builds in B252640: Diff 550328. Aug 15 2023, 8:40 AM

sdesmalen added inline comments. Aug 15 2023, 8:47 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2047–2056	nit: I find the description a bit confusing, how about:
    s/urhadd instructions implements the following pattern, making the extends free:
      %x = (zext i8 -> i16)
      %y = (zext i8 -> i16) + 1)
      trunc i16 (shl (add %x, %y), 1)) -> i8
  ?
2057	nit: Perhaps better to make this a `CastInst`, because it must be?
2057	Can you give `I` a more descriptive name? Or add some documentation above to explain what the relationship between `I` and `Ext` is?
2069	This code is puzzling me quite a bit. Because of how this function is called (`I` is the only user of `Ext`), the only possible inputs we can have are:
    add ( Ext, ... )
    add ( ..., Ext )
  So if `Op1` is not a constant, then the input must be `add (..., Ext)`. Here you're asking the question if `... == Ext`, which we know is `false`, after which `Op` is assigned `...`. It then changes the meaning of `I`, at which point I'm a bit lost :)
  It's a lot easier to use the patterns from PatternMatch.h for this; that way you can do things like:
    if (match(I, m_c_Add(m_Specific(Ext), m_c_Add(m_ZExt(m_Value(V)), m_SpecificInt(1)))))
  which also handles the commutativity of the add. Note that instead of directly matching for m_ZExt, you could match `m_UnOp` and have another check to see that it's either a `ZExt` (if `Ext` is a `ZExt`) or a `SExt` (if `Ext` is a `SExt`).
2086	Is it worth first checking if this add has a single user which is a `LShr`, which itself has a single user that's a `Trunc`? That way you avoid having to match the whole expression, only to find out that the user of the `add(add(..), Ext)` isn't used by a `LShr`. It probably makes it a lot easier to match the entire pattern once you know you have a `Trunc(LShr(Add(..), ..))` expression on your hands, when you take my suggestion above to use the `m_<patterns>` from PatternMatch.h.
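
To make the commutativity point concrete, here is a rough sketch of the idea on a toy expression tree. It deliberately does not use the LLVM API: `Node`, `Op`, `matchExtPlusOne` and `matchAvgAdd` are hypothetical names invented for this illustration, and the real patch relies on `m_c_Add` from PatternMatch.h instead.

```cpp
#include <cassert>

// Toy expression node for illustration only; NOT an LLVM class.
enum class Op { Leaf, ZExt, SExt, ConstOne, Add };

struct Node {
  Op op;
  const Node *lhs;
  const Node *rhs;
  explicit Node(Op o, const Node *l = nullptr, const Node *r = nullptr)
      : op(o), lhs(l), rhs(r) {}
};

bool isExt(const Node *n) {
  return n && (n->op == Op::ZExt || n->op == Op::SExt);
}

// Matches add(ext, 1) or add(1, ext); returns the extend, or nullptr.
const Node *matchExtPlusOne(const Node *n) {
  if (!n || n->op != Op::Add)
    return nullptr;
  if (isExt(n->lhs) && n->rhs && n->rhs->op == Op::ConstOne)
    return n->lhs;
  if (isExt(n->rhs) && n->lhs && n->lhs->op == Op::ConstOne)
    return n->rhs;
  return nullptr;
}

// Matches add(ext, add(ext, 1)) with the operands of either add in any
// order, and requires both extends to be of the same kind.
bool matchAvgAdd(const Node *n) {
  if (!n || n->op != Op::Add)
    return false;
  const Node *orders[2][2] = {{n->lhs, n->rhs}, {n->rhs, n->lhs}};
  for (auto &o : orders) {
    const Node *inner = matchExtPlusOne(o[1]);
    if (isExt(o[0]) && inner && inner->op == o[0]->op)
      return true;
  }
  return false;
}
```

Trying both operand orders at each level is what a commutative matcher does on the caller's behalf, which is why the hand-written operand juggling in the earlier diff was unnecessary.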

kmclaughlin planned changes to this revision. Aug 15 2023, 8:53 AM

Refactored isExtShiftRightAdd to use patterns from PatternMatch

This revision is now accepted and ready to land. Aug 17 2023, 9:04 AM

Harbormaster completed remote builds in B253241: Diff 551165. Aug 17 2023, 10:41 AM

kmclaughlin requested review of this revision. Aug 21 2023, 1:52 AM

sdesmalen added inline comments. Aug 21 2023, 3:20 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2053	nit: is it worth renaming this to `isExtPartOfAvgExpr`?
2062	`getUniqueUndroppableUser()` can return `nullptr` if there isn't a unique undroppable user, so this should use `dyn_cast_or_null`. I guess this is also missing a negative test case?
2084	Rather than checking for `getUniqueUndroppableUser()` again here, maybe you can generalise the match expression using `m_Instruction(Ext1)` and then add a new match expression to see if `m_ZExtOrSExt(Ext1)` and `Ext1->getOpcode() == Ext2->getOpcode()`.
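
The null-handling concern can be sketched with a toy use-list model. This is an illustration under stated assumptions, not LLVM code: `Inst`, `Kind`, `uniqueUser` and `feedsShrTrunc` are hypothetical stand-ins, with `uniqueUser` playing the role of a lookup that may return null.

```cpp
#include <cassert>
#include <vector>

// Toy instruction with a use-list; NOT an LLVM class.
enum class Kind { Ext, Add, LShr, Trunc, Store };

struct Inst {
  Kind kind;
  std::vector<const Inst *> users;
  explicit Inst(Kind k) : kind(k) {}
};

// Returns the single user, or nullptr when there are zero or several users.
const Inst *uniqueUser(const Inst &i) {
  return i.users.size() == 1 ? i.users.front() : nullptr;
}

// Walks add -> [optional second add] -> lshr -> trunc, null-checking every
// step. The trunc itself may have any number of users.
bool feedsShrTrunc(const Inst &extUser) {
  if (extUser.kind != Kind::Add)
    return false;
  const Inst *add = &extUser;
  const Inst *next = uniqueUser(*add);
  if (next && next->kind == Kind::Add)
    add = next;
  const Inst *shr = uniqueUser(*add);
  if (!shr || shr->kind != Kind::LShr)
    return false;
  const Inst *trunc = uniqueUser(*shr);
  return trunc && trunc->kind == Kind::Trunc;
}
```

Every result of `uniqueUser` is tested before it is dereferenced, which is the property `dyn_cast_or_null` provides in the real code.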

Renamed isExtShiftRightAdd to isExtPartOfAvgExpr
Used dyn_cast_or_null with casts from getUniqueUndroppableUser()
Added a new match expression for m_ZExtOrSExt and a check that the opcodes of the extends match
Added fixed-width & negative tests

Harbormaster completed remote builds in B253875: Diff 552059. Aug 21 2023, 10:54 AM

This looks really nice now @kmclaughlin! I just had a few more comments ...

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2060	Is it worth having a negative test for cases like i16->i64 that we don't support?
2075	Unless I've missed something, this seems to be saying that if the final trunc has more than one user then we don't treat the sext/zext as free, right? That seems a bit restrictive because this is the end of the pattern required to get urhadd/srhadd matched in the backend. In theory, I don't see why you can't have
    %ld1 = load <vscale x 16 x i8>, ptr %gep1
    %ld2 = load <vscale x 16 x i8>, ptr %gep2
    %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
    %ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
    %add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
    %add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
    %shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
    %trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
    store <vscale x 16 x i8> %trunc, ptr %a, align 16
    ret <vscale x 16 x i8> %trunc
  and you should still get
    ld1b ...
    ld1b ...
    srhadd z0, ...
    st1b z0 ...
    ret
llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll
1 ↗	(On Diff #552059)	Given this file now contains fixed-width vectors testing for NEON's urhadd and srhadd, is it worth renaming the file to something like `ext-rhadd.ll`?
15 ↗	(On Diff #552059)	I think you can simplify the tests here and remove the GEPs, then just pass the ptr directly to the load? That way you can also remove the `%n` argument too. For example,
    %ld1 = load <16 x i8>, ptr %a
    %ld2 = load <16 x i8>, ptr %b

Removed check that Dst type is double the Src type and replaced with a check that Src is a legal type
Added tests where the extends are more than doubling the Src type
Renamed test file & removed GEPs from tests

Harbormaster completed remote builds in B254092: Diff 552351. Aug 22 2023, 8:22 AM

kmclaughlin added inline comments. Aug 22 2023, 8:47 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2060	When trying to add tests for this I realised that we will still generate s/urhadd instructions when the extend is not doubling. For example, extending from i16->i64:
    ptrue p0.h
    ld1h { z0.h }, p0/z, [x0]
    ld1h { z1.h }, p0/z, [x1]
    srhadd z0.h, p0/m, z0.h, z1.h
  Instead I've replaced this with a check that the source type is legal and added some tests for cases such as the above.
2075	This is checking that the `LShr` instruction only has one user, then the checks below make sure that this is a truncate. There is no check that the truncate has only one use, so the extends will still be treated as free if the rest of the pattern matches. I've amended the tests to add a store of the final truncate so that this is tested.
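
The observation that a non-doubling extend still folds can be checked numerically. The following is a small sketch in plain C++ rather than LLVM code, with hypothetical names (`srhadd_bits`, `widths_agree`). It assumes the sext/add/add/lshr/trunc sequence discussed above and compares the truncated 16-bit result when the intermediate type is 32 bits against 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Computes the sext/add/add/lshr/trunc sequence for two 16-bit inputs using
// SWide as the extended type, and returns the truncated 16-bit bit pattern.
template <typename SWide>
std::uint16_t srhadd_bits(std::int16_t a, std::int16_t b) {
  using UWide = typename std::make_unsigned<SWide>::type;
  UWide wa = static_cast<UWide>(static_cast<SWide>(a)); // sext i16 -> SWide
  UWide wb = static_cast<UWide>(static_cast<SWide>(b)); // sext i16 -> SWide
  UWide sum = static_cast<UWide>(wa + UWide(1) + wb);   // wraps modulo 2^N
  return static_cast<std::uint16_t>(sum >> 1);          // lshr, then trunc
}

// Samples the 16-bit input space and checks that a 32-bit and a 64-bit
// intermediate type give the same truncated result.
bool widths_agree() {
  for (int a = -32768; a <= 32767; a += 257)
    for (int b = -32768; b <= 32767; b += 251)
      if (srhadd_bits<std::int32_t>(static_cast<std::int16_t>(a),
                                    static_cast<std::int16_t>(b)) !=
          srhadd_bits<std::int64_t>(static_cast<std::int16_t>(a),
                                    static_cast<std::int16_t>(b)))
        return false;
  return true;
}
```

Only bits 1 to 16 of the sum survive the shift and truncate, and those bits are the same for any intermediate width above 16, which is consistent with the decision to check source-type legality instead of requiring a doubling extend.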

LGTM with nit addressed! Grazie @kmclaughlin. :)

llvm/test/Analysis/CostModel/AArch64/ext-rhadd.ll
72	nit: I think the test should be named `@urhadd_i32_zext_i64_fixed` and the same for other tests with the same issue below.

Fixed incorrect names of the sign-extend tests in ext-rhadd.ll

kmclaughlin marked an inline comment as done. Aug 22 2023, 10:26 AM

Harbormaster completed remote builds in B254139: Diff 552422. Aug 22 2023, 11:28 AM

LGTM!

This revision is now accepted and ready to land. Aug 29 2023, 1:10 AM

Closed by commit rG9a98ab589a4f: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0 (authored by kmclaughlin). Aug 29 2023, 5:44 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rG9a98ab589a4f: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0.

kmclaughlin mentioned this in D158988: [LV] Choose the wider VF where they have same cost. Aug 29 2023, 10:27 AM

Allen mentioned this in D159273: [AArch64] Delete an unused parameter for isExtPartOfAvgExpr, NFC. Aug 31 2023, 5:38 AM

Allen mentioned this in rGf41223eecaeb: [AArch64][SVE2] Delete an unused parameter for isExtPartOfAvgExpr, NFC. Sep 1 2023, 8:50 AM

Allen added a child revision: D159273: [AArch64] Delete an unused parameter for isExtPartOfAvgExpr, NFC. Sep 1 2023, 8:52 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	3 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	55 lines
llvm/test/Analysis/CostModel/AArch64/ext-rhadd.ll	201 lines

Diff 554274

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

  InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
                                        TTI::TargetCostKind CostKind);

  InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
                                         const Value *Ptr, bool VariableMask,
                                         Align Alignment,
                                         TTI::TargetCostKind CostKind,
                                         const Instruction *I = nullptr);

+ bool isExtPartOfAvgExpr(const Instruction *ExtUser, const CastInst *Ext,
+                         Type *Dst, Type *Src);
+
  InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                   TTI::CastContextHint CCH,
                                   TTI::TargetCostKind CostKind,
                                   const Instruction *I = nullptr);

  InstructionCost getExtractWithExtendCost(unsigned Opcode, Type *Dst,
                                           VectorType *VecTy, unsigned Index);

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

  bool AArch64TTIImpl::isWideningInstruction(Type *DstTy, unsigned Opcode,
    InstructionCost NumSrcEls =
        SrcTyL.first * SrcTyL.second.getVectorMinNumElements();

    // Return true if the legalized types have the same number of vector elements
    // and the destination element type size is twice that of the source type.
    return NumDstEls == NumSrcEls && 2 * SrcElTySize == DstEltSize;
  }
+ // s/urhadd instructions implement the following pattern, making the
+ // extends free:
+ //   %x = add ((zext i8 -> i16), 1)
+ //   %y = (zext i8 -> i16)
+ //   trunc i16 (lshr (add %x, %y), 1) -> i8
+ //
+ bool AArch64TTIImpl::isExtPartOfAvgExpr(const Instruction *ExtUser,
+                                         const CastInst *Ext, Type *Dst,
+                                         Type *Src) {
+
+   // The source should be a legal vector type.
+   if (!Src->isVectorTy() || !TLI->isTypeLegal(TLI->getValueType(DL, Src)) ||
+       (Src->isScalableTy() && !ST->hasSVE2()))
+     return false;
+
+   if (ExtUser->getOpcode() != Instruction::Add || !ExtUser->hasOneUse())
+     return false;
+
+   // Look for trunc/shl/add before trying to match the pattern.
+   const Instruction *Add = ExtUser;
+   auto *AddUser =
+       dyn_cast_or_null<Instruction>(Add->getUniqueUndroppableUser());
+   if (AddUser && AddUser->getOpcode() == Instruction::Add)
+     Add = AddUser;
+
+   auto *Shr = dyn_cast_or_null<Instruction>(Add->getUniqueUndroppableUser());
+   if (!Shr || Shr->getOpcode() != Instruction::LShr)
+     return false;
+
+   auto *Trunc = dyn_cast_or_null<Instruction>(Shr->getUniqueUndroppableUser());
+   if (!Trunc || Trunc->getOpcode() != Instruction::Trunc ||
+       Src->getScalarSizeInBits() !=
+           cast<CastInst>(Trunc)->getDestTy()->getScalarSizeInBits())
+     return false;
+
+   // Try to match the whole pattern. Ext could be either the first or second
+   // m_ZExtOrSExt matched.
+   Instruction *Ex1, *Ex2;
+   if (!(match(Add, m_c_Add(m_Instruction(Ex1),
+                            m_c_Add(m_Instruction(Ex2), m_SpecificInt(1))))))
+     return false;
+
+   // Ensure both extends are of the same type
+   if (match(Ex1, m_ZExtOrSExt(m_Value())) &&
+       Ex1->getOpcode() == Ex2->getOpcode())
+     return true;
+
+   return false;
+ }

  InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
                                                   Type *Src,
                                                   TTI::CastContextHint CCH,
                                                   TTI::TargetCostKind CostKind,
                                                   const Instruction *I) {
    int ISD = TLI->InstructionOpcodeToISD(Opcode);
    assert(ISD && "Invalid opcode");
    // If the cast is observable, and it is used by a widening instruction (e.g.,
    // uaddl, saddw, etc.), it may be free.
    if (I && I->hasOneUser()) {
      auto *SingleUser = cast<Instruction>(*I->user_begin());
      SmallVector<const Value *, 4> Operands(SingleUser->operand_values());
      if (isWideningInstruction(Dst, SingleUser->getOpcode(), Operands, Src)) {
        // For adds only count the second operand as free if both operands are
        // extends but not the same operation. (i.e both operands are not free in
        // add(sext, zext)).
        if (SingleUser->getOpcode() == Instruction::Add) {
          if (I == SingleUser->getOperand(1) ||
              (isa<CastInst>(SingleUser->getOperand(1)) &&
               cast<CastInst>(SingleUser->getOperand(1))->getOpcode() == Opcode))
            return 0;
        } else // Others are free so long as isWideningInstruction returned true.
          return 0;
      }

+     // The cast will be free for the s/urhadd instructions
+     if ((isa<ZExtInst>(I) || isa<SExtInst>(I)) &&
+         isExtPartOfAvgExpr(SingleUser, cast<CastInst>(I), Dst, Src))
+       return 0;
    }

    // TODO: Allow non-throughput costs that aren't binary.
    auto AdjustCost = [&CostKind](InstructionCost Cost) -> InstructionCost {
      if (CostKind != TTI::TCK_RecipThroughput)
        return Cost == 0 ? 0 : 1;
      return Cost;
    };

llvm/test/Analysis/CostModel/AArch64/ext-rhadd.ll

This file was added.

				; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s -check-prefix=SVE
				; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s --check-prefix=SVE2

				; SRHADD

				define void @srhadd_i8_sext_i16_fixed(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'srhadd_i8_sext_i16_fixed'
				; SVE: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = sext <16 x i8> %ld1 to <16 x i16>
				; SVE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = sext <16 x i8> %ld2 to <16 x i16>
				;
				; SVE2-LABEL: 'srhadd_i8_sext_i16_fixed'
				; SVE2: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = sext <16 x i8> %ld1 to <16 x i16>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = sext <16 x i8> %ld2 to <16 x i16>
				;
				%ld1 = load <16 x i8>, ptr %a
				%ld2 = load <16 x i8>, ptr %b
				%ext1 = sext <16 x i8> %ld1 to <16 x i16>
				%ext2 = sext <16 x i8> %ld2 to <16 x i16>
				%add1 = add nuw nsw <16 x i16> %ext1, shufflevector (<16 x i16> insertelement (<16 x i16> poison, i16 1, i64 0), <16 x i16> poison, <16 x i32> zeroinitializer)
				%add2 = add nuw nsw <16 x i16> %add1, %ext2
				%shr = lshr <16 x i16> %add2, shufflevector (<16 x i16> insertelement (<16 x i16> poison, i16 1, i64 0), <16 x i16> poison, <16 x i32> zeroinitializer)
				%trunc = trunc <16 x i16> %shr to <16 x i8>
				store <16 x i8> %trunc, ptr %a
				ret void
				}

				define void @srhadd_i8_sext_i16_scalable(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'srhadd_i8_sext_i16_scalable'
				; SVE: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				;
				; SVE2-LABEL: 'srhadd_i8_sext_i16_scalable'
				; SVE2: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				;
				%ld1 = load <vscale x 16 x i8>, ptr %a
				%ld2 = load <vscale x 16 x i8>, ptr %b
				%ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				%ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				%add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
				%shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				store <vscale x 16 x i8> %trunc, ptr %a
				ret void
				}

				define void @srhadd_i16_sext_i64_scalable(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'srhadd_i16_sext_i64_scalable'
				; SVE: Cost Model: Found an estimated cost of 6 for instruction: %ext1 = sext <vscale x 8 x i16> %ld1 to <vscale x 8 x i64>
				; SVE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %ext2 = sext <vscale x 8 x i16> %ld2 to <vscale x 8 x i64>
				;
				; SVE2-LABEL: 'srhadd_i16_sext_i64_scalable'
				; SVE2: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = sext <vscale x 8 x i16> %ld1 to <vscale x 8 x i64>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = sext <vscale x 8 x i16> %ld2 to <vscale x 8 x i64>
				;
				%ld1 = load <vscale x 8 x i16>, ptr %a
				%ld2 = load <vscale x 8 x i16>, ptr %b
				%ext1 = sext <vscale x 8 x i16> %ld1 to <vscale x 8 x i64>
				%ext2 = sext <vscale x 8 x i16> %ld2 to <vscale x 8 x i64>
				%add1 = add nuw nsw <vscale x 8 x i64> %ext1, shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i64 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 8 x i64> %add1, %ext2
				%shr = lshr <vscale x 8 x i64> %add2, shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 1, i64 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)
				%trunc = trunc <vscale x 8 x i64> %shr to <vscale x 8 x i16>
				store <vscale x 8 x i16> %trunc, ptr %a
				ret void
				}

				; URHADD

				define void @urhadd_i32_zext_i64_fixed(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'urhadd_i32_zext_i64_fixed'
				; SVE: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = zext <4 x i32> %ld1 to <4 x i64>
				; SVE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = zext <4 x i32> %ld2 to <4 x i64>
				;
				; SVE2-LABEL: 'urhadd_i32_zext_i64_fixed'
				; SVE2: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = zext <4 x i32> %ld1 to <4 x i64>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = zext <4 x i32> %ld2 to <4 x i64>
				;
				%ld1 = load <4 x i32>, ptr %a
				%ld2 = load <4 x i32>, ptr %b
				%ext1 = zext <4 x i32> %ld1 to <4 x i64>
				%ext2 = zext <4 x i32> %ld2 to <4 x i64>
				%add1 = add nuw nsw <4 x i64> %ext1, shufflevector (<4 x i64> insertelement (<4 x i64> poison, i64 1, i64 0), <4 x i64> poison, <4 x i32> zeroinitializer)
				%add2 = add nuw nsw <4 x i64> %add1, %ext2
				%shr = lshr <4 x i64> %add2, shufflevector (<4 x i64> insertelement (<4 x i64> poison, i64 1, i64 0), <4 x i64> poison, <4 x i32> zeroinitializer)
				%trunc = trunc <4 x i64> %shr to <4 x i32>
				store <4 x i32> %trunc, ptr %a
				ret void
				}

				define void @urhadd_i8_zext_i64(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'urhadd_i8_zext_i64'
				; SVE: Cost Model: Found an estimated cost of 14 for instruction: %ext1 = zext <vscale x 16 x i8> %ld1 to <vscale x 16 x i64>
				; SVE-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i64>
				;
				; SVE2-LABEL: 'urhadd_i8_zext_i64'
				; SVE2: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = zext <vscale x 16 x i8> %ld1 to <vscale x 16 x i64>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i64>
				;
				%ld1 = load <vscale x 16 x i8>, ptr %a
				%ld2 = load <vscale x 16 x i8>, ptr %b
				%ext1 = zext <vscale x 16 x i8> %ld1 to <vscale x 16 x i64>
				%ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i64>
				%add1 = add nuw nsw <vscale x 16 x i64> %ext1, shufflevector (<vscale x 16 x i64> insertelement (<vscale x 16 x i64> poison, i64 1, i64 0), <vscale x 16 x i64> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 16 x i64> %add1, %ext2
				%shr = lshr <vscale x 16 x i64> %add2, shufflevector (<vscale x 16 x i64> insertelement (<vscale x 16 x i64> poison, i64 1, i64 0), <vscale x 16 x i64> poison, <vscale x 16 x i32> zeroinitializer)
				%trunc = trunc <vscale x 16 x i64> %shr to <vscale x 16 x i8>
				store <vscale x 16 x i8> %trunc, ptr %a
				ret void
				}

				define void @urhadd_i16_zext_i32(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'urhadd_i16_zext_i32'
				; SVE: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = zext <vscale x 8 x i16> %ld1 to <vscale x 8 x i32>
				; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = zext <vscale x 8 x i16> %ld2 to <vscale x 8 x i32>
				;
				; SVE2-LABEL: 'urhadd_i16_zext_i32'
				; SVE2: Cost Model: Found an estimated cost of 0 for instruction: %ext1 = zext <vscale x 8 x i16> %ld1 to <vscale x 8 x i32>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %ext2 = zext <vscale x 8 x i16> %ld2 to <vscale x 8 x i32>
				;
				%ld1 = load <vscale x 8 x i16>, ptr %a
				%ld2 = load <vscale x 8 x i16>, ptr %b
				%ext1 = zext <vscale x 8 x i16> %ld1 to <vscale x 8 x i32>
				%ext2 = zext <vscale x 8 x i16> %ld2 to <vscale x 8 x i32>
				%add1 = add nuw nsw <vscale x 8 x i32> %ext1, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 8 x i32> %add1, %ext2
				%shr = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%trunc = trunc <vscale x 8 x i32> %shr to <vscale x 8 x i16>
				store <vscale x 8 x i16> %trunc, ptr %a
				ret void
				}

				; NEGATIVE TESTS

				define void @ext_operand_mismatch(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'ext_operand_mismatch'
				; SVE: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				;
				; SVE2-LABEL: 'ext_operand_mismatch'
				; SVE2: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				;
				%ld1 = load <vscale x 16 x i8>, ptr %a
				%ld2 = load <vscale x 16 x i8>, ptr %b
				%ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				%ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				%add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
				%shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				store <vscale x 16 x i8> %trunc, ptr %a
				ret void
				}

				define void @add_multiple_uses(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'add_multiple_uses'
				; SVE: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = sext <vscale x 8 x i16> %ld1 to <vscale x 8 x i32>
				; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = sext <vscale x 8 x i16> %ld2 to <vscale x 8 x i32>
				;
				; SVE2-LABEL: 'add_multiple_uses'
				; SVE2: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = sext <vscale x 8 x i16> %ld1 to <vscale x 8 x i32>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = sext <vscale x 8 x i16> %ld2 to <vscale x 8 x i32>
				;
				%ld1 = load <vscale x 8 x i16>, ptr %a
				%ld2 = load <vscale x 8 x i16>, ptr %b
				%ext1 = sext <vscale x 8 x i16> %ld1 to <vscale x 8 x i32>
				%ext2 = sext <vscale x 8 x i16> %ld2 to <vscale x 8 x i32>
				%add1 = add nuw nsw <vscale x 8 x i32> %ext1, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 8 x i32> %add1, %ext2
				%shr = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
				%trunc = trunc <vscale x 8 x i32> %shr to <vscale x 8 x i16>
				%add.res = add nuw nsw <vscale x 8 x i32> %add1, %add2
				%res = trunc <vscale x 8 x i32> %add.res to <vscale x 8 x i16>
				store <vscale x 8 x i16> %res, ptr %a
				ret void
				}

				define void @shift_multiple_uses(ptr %a, ptr %b, ptr %dst) {
				; SVE-LABEL: 'shift_multiple_uses'
				; SVE: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = zext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				; SVE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				;
				; SVE2-LABEL: 'shift_multiple_uses'
				; SVE2: Cost Model: Found an estimated cost of 2 for instruction: %ext1 = zext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				; SVE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				;
				%ld1 = load <vscale x 16 x i8>, ptr %a
				%ld2 = load <vscale x 16 x i8>, ptr %b
				%ext1 = zext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
				%ext2 = zext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
				%add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
				%shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
				%trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				%add3 = add nuw nsw <vscale x 16 x i16> %shr, %add2
				%res = trunc <vscale x 16 x i16> %add3 to <vscale x 16 x i8>
				store <vscale x 16 x i8> %res, ptr %a
				ret void
				}