This is an archive of the discontinued LLVM Phabricator instance.

Differential D146737

[AMDGPU] Trim zero components from buffer and image stores
Closed, Public

Authored by matejam on Mar 23 2023, 9:55 AM.

Details

Reviewers

foad
mbrkusanin
piotr
rampitec
Joe_Nash
arsenm
nhaehnle
dstuttard

Commits

rG3181a6e3e7da: [AMDGPU] Trim zero components from buffer and image stores

Summary

For image and buffer stores, the default behaviour on GFX11 and
older is to set all unset components to zero. So passing only the
X component is the same as passing X000, and XY is the same as XY00.

This patch simplifies the passed vector of components in InstCombine
by removing zero components from the end.

For image stores it also trims DMask if necessary.
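
The equivalence the summary relies on can be illustrated with a toy model. This is only a sketch: `storedTexel` is a hypothetical helper, not part of the patch, and it assumes components are modelled as raw 32-bit values with at most four per store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a store (hypothetical, not LLVM code): every component that
// is not passed by the caller ends up written as zero.
std::array<std::uint32_t, 4>
storedTexel(const std::vector<std::uint32_t> &Passed) {
  assert(Passed.size() <= 4 && "at most four components per store");
  std::array<std::uint32_t, 4> Texel{}; // value-initialized: all zero
  for (std::size_t I = 0; I < Passed.size(); ++I)
    Texel[I] = Passed[I];
  return Texel;
}
```

Under this model, `storedTexel({7})` and `storedTexel({7, 0, 0, 0})` produce the same result, which is why trailing zero components can be dropped, while a zero in the middle (as in `{7, 0, 9}`) cannot.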

Event Timeline

matejam created this revision.Mar 23 2023, 9:55 AM

Herald added a project: Restricted Project.Mar 23 2023, 9:55 AM

Herald added subscribers: nlopes, kosarev, StephenFan and 8 others.

matejam requested review of this revision.Mar 23 2023, 9:55 AM

Herald added a subscriber: wdng.Mar 23 2023, 9:55 AM

Harbormaster completed remote builds in B221342: Diff 507777.Mar 23 2023, 11:41 AM

matejam added a reviewer: Joe_Nash.Mar 24 2023, 7:37 AM

arsenm added inline comments.Mar 24 2023, 7:56 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
360	Can you use possiblyDemandedEltsInMask or something else from vector utils? This doesn't feel like something you would need to invent yourself
362	This is an unchecked dyn_cast. You already did a dyn_cast at the call site, so just pass in Instruction to begin with?
365	SmallVector
366	Don't use std::map. Also, can this just be a SmallVector in the reverse direction per component?
386	Replace the opcode check with the dyn_cast
401–402	Replace the opcode check with the dyn_cast
1135	Don't bother using APInt here?

Thanks for the review!

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
360	I thought about using that, but this is a bit more than that. It doesn't just find which elements were set, but also takes into consideration the DMask for image intrinsics and also ignores zeros that were inserted at the end of the vector, which is the primary goal of this patch. There is also findDemandedEltsByAllUsers, but that works in a different direction, meaning that it will find the uses of the given Value * and update DemandedElts that way. In my case I look at the definitions (insertelement, shufflevector or ConstantVector) of the given Value * and "recursively" go up and update DemandedElts accordingly.
365	Will be done.
366	I will look into that.
386	Will be done.
401–402	Will be done.
1135	I see I can do the same thing much more simply, e.g. "if (DMaskVal > (1 << VWidth))".
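
The DMask handling mentioned in the reply to line 360 can be sketched in isolation. This is a simplified illustration modelled on the dmask loop in simplifyAMDGCNMemoryIntrinsicDemanded, with plain integers standing in for APInt. The name `trimDMask` is hypothetical. Bit K of `Demanded` refers to the K-th vector element, which corresponds to the K-th enabled channel in the dmask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the dmask update: keep a dmask channel
// only if the vector element that feeds it is still demanded.
unsigned trimDMask(unsigned DMaskVal, std::uint32_t Demanded) {
  assert(DMaskVal <= 0xF && "dmask has four channel bits");
  unsigned NewDMaskVal = 0;
  unsigned ElementIdx = 0; // index into the stored vector
  for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {
    const unsigned Bit = 1u << SrcIdx;
    if (DMaskVal & Bit) {
      if (Demanded & (1u << ElementIdx))
        NewDMaskVal |= Bit;
      ++ElementIdx; // each enabled channel consumes one vector element
    }
  }
  return NewDMaskVal;
}
```

For example, with a dmask of 0xA (channels 1 and 3 enabled) and only element 0 demanded, the sketch returns 0x2: the trailing channel is dropped from the mask along with its zero component.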

matejam added a reviewer: arsenm.Mar 24 2023, 9:13 AM

matejam added inline comments.Mar 24 2023, 9:30 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
366	What about MapVector? I'm not sure how I would use SmallVector to implement a map.

Using SmallVector instead of std::vector, MapVector instead of std::map, eliminated some unnecessary dyn_casts and a few more small changes.

Harbormaster completed remote builds in B221626: Diff 508140.Mar 24 2023, 10:53 AM

arsenm added inline comments.Mar 31 2023, 3:16 PM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
366	I meant it's the value to vector indexes? You could just go the other direction and find the value from the index
387	Unchecked dyn_cast, need a test with a variable vector index

Use of SmallVector instead of MapVector for tracking which components were already added.
Remove some unnecessary dyn_casts.

Harbormaster completed remote builds in B223554: Diff 510767.Apr 4 2023, 6:59 AM

Rebase and minor changes.

Harbormaster completed remote builds in B225012: Diff 512757.Apr 12 2023, 4:59 AM

Rebase.

@foad @arsenm would you please review this?
Thank you.

Harbormaster completed remote builds in B228250: Diff 517104.Apr 26 2023, 3:31 AM

foad added inline comments.Apr 26 2023, 4:06 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
365–368	Maybe call them `ComponentIndices` and `ComponentValues`?
371	I wonder if the whole of this loop could be replaced by calling llvm::computeKnownBits with an appropriate DemandedElts mask to test each element from the last to the first.
1099	Use getArgOperand throughout for consistency?
1102	Why does this need to be an Instruction? Couldn't it be a Constant? E.g. if it was an all zeroes constant, that should be simplified.
1134	Should be `>=`
1270–1271	Keep the braces around this, because it is more than one physical line.

[AMDGPU] Default component broadcast store

There is no "broadcasting" here. Maybe call it something like "trim zero components from buffer and image stores"?

arsenm added inline comments.Apr 26 2023, 7:51 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1096	Don't see why the element type would matter? They're a bit artificial

matejam added inline comments.Apr 27 2023, 6:59 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
371	I looked that up. computeKnownBits works only for integers, pointers and vector of integers.

foad added inline comments.Apr 27 2023, 7:54 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
371	You might be able to use the brand new computeKnownFPClass instead, and check if the result is exactly fcPosZero?

matejam updated this revision to Diff 517563.Apr 27 2023, 8:05 AM

matejam marked 5 inline comments as done.

matejam marked an inline comment as not done.Apr 27 2023, 8:13 AM

matejam added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
371	Thanks.

Harbormaster completed remote builds in B228560: Diff 517563.Apr 27 2023, 9:04 AM

Changes in findDemandedElts, use computeKnownFPClass.

When I suggested using computeKnownFPClass I meant something like this: https://reviews.llvm.org/differential/diff/517952/
Further cleanups are possible.

Thank you @foad.
findDemandedElts with correct usage of computeKnownFPClass.
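
A small standalone illustration of why the check discussed above is for exactly fcPosZero rather than any zero. It assumes IEEE 754 single precision and that the zero fill for unset components is the all-zero bit pattern; the thread does not spell this out, so treat it as an assumption. `floatBits` and `matchesZeroFill` are hypothetical helpers.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw 32-bit pattern.
std::uint32_t floatBits(float F) {
  static_assert(sizeof(std::uint32_t) == sizeof(float),
                "sketch expects a 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// True only for +0.0f: -0.0f compares equal to 0.0f but has the sign bit
// set, so it would not match an all-zero fill.
bool matchesZeroFill(float F) { return floatBits(F) == 0; }
```

Here `matchesZeroFill(0.0f)` holds but `matchesZeroFill(-0.0f)` does not, even though `-0.0f == 0.0f` is true in a floating-point comparison.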

matejam retitled this revision from [AMDGPU] Default component broadcast store to [AMDGPU] Trim zero components from buffer and image stores.Apr 28 2023, 9:53 AM

matejam edited the summary of this revision.

Harbormaster completed remote builds in B228859: Diff 517963.Apr 28 2023, 10:26 AM

Rebase and change in comments.

Harbormaster completed remote builds in B229684: Diff 519066.May 3 2023, 7:52 AM

Please review this. @foad @arsenm

foad added inline comments.May 9 2023, 6:46 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
358	This function needs a better name now to explain what it does, and a comment, and should return APInt instead of taking an APInt& argument.
1110–1113	It would be simpler to use dyn_cast and break if it fails.
1119	`dyn_cast` should be `isa` since you don't use the result. But what is this actually testing for? Would it be better to test hasNamedOperand?

foad added inline comments.May 9 2023, 7:30 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1119	I think these intrinsics all have `immarg` on the dmask argument, so it should always be a constant?

arsenm added inline comments.May 9 2023, 10:46 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1119	Yes, should be able to just cast<ConstantInt>

Thanks for the review.

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
358	Is the name trimTrailingZerosInVector more explanatory?
1119	Both image and buffer intrinsics have the same body in this switch. For buffer instructions the first operand is not a constant, that is why I used dyn_cast<ConstantInt>. I could use isa<ConstantInt>. I'm not sure how I would check for hasNamedOperand, because II is a call instruction.

Change the name from findDemandedElts to trimTrailingZerosInVector.
Remove some unnecessary dyn_casts.
Refactor and rebase.

Harbormaster completed remote builds in B231070: Diff 520957.May 10 2023, 5:40 AM

foad added inline comments.May 10 2023, 6:14 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
358	Sure.
1111	You don't use `IIVTy` for anything else so just test `isa<FixedVectorType>(...)` instead.
1119	You definitely should not rely on the second argument of the buffer intrinsics not being a constant! Instead you should test the opcode.

Use the intrinsic opcode to know if the instruction has a DMask instead of testing if the instruction has a ConstantInt as the second operand.
Add more run-lines to the test.

Harbormaster completed remote builds in B231091: Diff 520986.May 10 2023, 7:53 AM

Code looks good now, thanks.

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
4554	This looks like we have lost an optimization that converts the "mip" form to the non-"mip" form.

Do the optimizations for image instructions that were done prior to this patch.

Refactor.

foad added inline comments.May 15 2023, 7:32 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
1123–1126	Instead of duplicating this code, I think it would be neater to move it after the whole `switch (...) {...}` statement. Then everything should Just Work.

matejam updated this revision to Diff 522186.May 15 2023, 7:38 AM

Move the default case out of the switch.
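
The control-flow change can be sketched outside of LLVM. This is a hypothetical reduction, not the patch itself: `Kind`, `combine` and the returned strings are invented for illustration. The point is that a case which breaks out of the switch still reaches the shared handling that used to live in `default:`.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for intrinsic IDs.
enum class Kind { Rcp, ImageStore, ImageLoad, Other };

std::optional<std::string> combine(Kind K, bool StoreWasTrimmed) {
  switch (K) {
  case Kind::Rcp:
    return "folded rcp";
  case Kind::ImageStore:
    if (StoreWasTrimmed)
      return "trimmed store";
    break; // nothing trimmed: fall out to the shared image handling
  default:
    break;
  }
  // Shared handling that previously sat inside `default:`. Placing it after
  // the switch means cases that `break` are no longer skipped.
  if (K == Kind::ImageStore || K == Kind::ImageLoad)
    return "generic image simplification";
  return std::nullopt;
}
```

With the shared code inside `default:`, an image store that hit its own case and was not trimmed would have returned nothing, which matches the lost "mip" to non-"mip" optimization noted in the test review.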

LGTM, thanks!

This revision is now accepted and ready to land.May 15 2023, 7:52 AM

Harbormaster completed remote builds in B232006: Diff 522187.May 15 2023, 9:19 AM

This revision was landed with ongoing or failed builds.May 15 2023, 9:24 AM

Closed by commit rG3181a6e3e7da: [AMDGPU] Trim zero components from buffer and image stores (authored by matejam).

This revision was automatically updated to reflect the committed changes.

matejam added a commit: rG3181a6e3e7da: [AMDGPU] Trim zero components from buffer and image stores.

matejam added a reverting change: rG9c8c31eea439: Revert "[AMDGPU] Trim zero components from buffer and image stores".May 18 2023, 8:11 AM

Have you got any more details about what was wrong with it?

In D146737#4353046, @foad wrote:

matejam added a reverting change: rG9c8c31eea439: Revert "[AMDGPU] Trim zero components from buffer and image stores".

Have you got any more details about what was wrong with it?

No, we're having issues accessing the AMD network.
I don't know what tests exactly are failing and why.

foad added inline comments.May 24 2023, 12:41 PM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
366	I don't think you want to loop to go down to i = 0. If the 0'th element is zero then you will remove the whole store instruction, which would not be right. Can you add a test for that case please?

Change the condition in the for loop from i >= 0 to i > 0. We don't want to optimize out the 0th element.
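
A minimal sketch of the corrected loop bound, using a plain 32-bit mask in place of APInt and raw integer components in place of IR values. `demandedAfterTrim` is a hypothetical name. The loop mirrors the shape of trimTrailingZerosInVector but stops before index 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit I of the result is set when component I must still be stored.
std::uint32_t demandedAfterTrim(const std::vector<std::uint32_t> &Components) {
  assert(Components.size() <= 32 && "mask is only 32 bits wide");
  const int Width = static_cast<int>(Components.size());
  std::uint32_t Demanded = Width == 32 ? ~0u : (1u << Width) - 1u;
  // Stop at I > 0: component 0 is never cleared, so an all-zero vector
  // still stores one component instead of removing the store entirely.
  for (int I = Width - 1; I > 0; --I) {
    if (Components[I] != 0)
      break; // first non-zero from the end: everything below stays demanded
    Demanded &= ~(1u << I);
  }
  return Demanded;
}
```

For `{5, 0, 0, 0}` the result is 0x1, and for the all-zero vector `{0, 0, 0, 0}` it is also 0x1 rather than 0, which is the case the requested test covers.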

This patch was reverted upstream because of failing CTS tests.

This revision is now accepted and ready to land.May 26 2023, 10:16 AM

Add test case with all zero components.

Harbormaster completed remote builds in B234898: Diff 526110.May 26 2023, 11:09 AM

Remove *_buffer_store instructions from being optimized.

matejam requested review of this revision.Jun 2 2023, 7:14 AM

Harbormaster completed remote builds in B236176: Diff 527849.Jun 2 2023, 8:33 AM

arsenm accepted this revision.Jun 2 2023, 2:25 PM

This revision is now accepted and ready to land.Jun 2 2023, 2:25 PM

foad added inline comments.Jun 5 2023, 12:21 AM

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-simplify-image-buffer-stores.ll
12–16	Nit: could just use zeroinitializer.

matejam mentioned this in rGc91246b71eec: fix failures caused by https://reviews.llvm.org/D146737.Jun 5 2023, 4:13 AM

foad added inline comments.Jun 8 2023, 5:12 AM

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
375	Could also handle undef here and treat it as zero.

matejam closed this revision.Jun 16 2023, 2:01 AM

matejam marked 2 inline comments as done.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp	123 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	44 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-simplify-image-buffer-stores.ll	102 lines

Diff 520957

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

Show First 20 Lines • Show All 349 Lines • ▼ Show 20 Lines	if (isKnownNeverInfOrNaN(Op0, IC.getDataLayout(), TLI, 0,
&IC.getAssumptionCache(), &I, &IC.getDominatorTree(),		&IC.getAssumptionCache(), &I, &IC.getDominatorTree(),
&IC.getOptimizationRemarkEmitter())) {		&IC.getOptimizationRemarkEmitter())) {
// Neither operand is infinity or NaN.		// Neither operand is infinity or NaN.
return true;		return true;
}		}
return false;		return false;
}		}

		// Trim all zero components from the end of the vector \p UseV and return
		// an appropriate bitset with known elements.
		static APInt trimTrailingZerosInVector(InstCombiner &IC, Value *UseV,
		Instruction *I) {
		auto *VTy = cast<FixedVectorType>(UseV->getType());
		unsigned VWidth = VTy->getNumElements();
		APInt DemandedElts = APInt::getAllOnes(VWidth);

		for (int i = VWidth - 1; i >= 0; --i) {
		APInt DemandOneElt = APInt::getOneBitSet(VWidth, i);
		KnownFPClass KnownFPClass =
		computeKnownFPClass(UseV, DemandOneElt, IC.getDataLayout(),
		/*InterestedClasses=*/fcAllFlags,
		/*Depth=*/0, &IC.getTargetLibraryInfo(),
		&IC.getAssumptionCache(), I,
		&IC.getDominatorTree(),
		&IC.getOptimizationRemarkEmitter());
		if (KnownFPClass.KnownFPClasses != fcPosZero)
		break;
		DemandedElts.clearBit(i);
		}
		return DemandedElts;
		}

		static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
		IntrinsicInst &II,
		APInt DemandedElts,
		int DMaskIdx = -1,
		bool IsLoad = true);

std::optional<Instruction *>		std::optional<Instruction *>
GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {		GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
Intrinsic::ID IID = II.getIntrinsicID();		Intrinsic::ID IID = II.getIntrinsicID();
switch (IID) {		switch (IID) {
case Intrinsic::amdgcn_rcp: {		case Intrinsic::amdgcn_rcp: {
Value *Src = II.getArgOperand(0);		Value *Src = II.getArgOperand(0);

// TODO: Move to ConstantFolding/InstSimplify?		// TODO: Move to ConstantFolding/InstSimplify?
if (isa<UndefValue>(Src)) {		if (isa<UndefValue>(Src)) {
Type *Ty = II.getType();		Type *Ty = II.getType();
auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));		auto *QNaN = ConstantFP::get(Ty, APFloat::getQNaN(Ty->getFltSemantics()));
return IC.replaceInstUsesWith(II, QNaN);		return IC.replaceInstUsesWith(II, QNaN);
}		}

if (II.isStrictFP())		if (II.isStrictFP())
break;		break;

if (const ConstantFP *C = dyn_cast<ConstantFP>(Src)) {		if (const ConstantFP *C = dyn_cast<ConstantFP>(Src)) {
const APFloat &ArgVal = C->getValueAPF();		const APFloat &ArgVal = C->getValueAPF();
APFloat Val(ArgVal.getSemantics(), 1);		APFloat Val(ArgVal.getSemantics(), 1);
Val.divide(ArgVal, APFloat::rmNearestTiesToEven);		Val.divide(ArgVal, APFloat::rmNearestTiesToEven);

// This is more precise than the instruction may give.		// This is more precise than the instruction may give.
▲ Show 20 Lines • Show All 668 Lines • ▼ Show 20 Lines	GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
case Intrinsic::amdgcn_is_private: {		case Intrinsic::amdgcn_is_private: {
if (isa<UndefValue>(II.getArgOperand(0)))		if (isa<UndefValue>(II.getArgOperand(0)))
return IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));		return IC.replaceInstUsesWith(II, UndefValue::get(II.getType()));

if (isa<ConstantPointerNull>(II.getArgOperand(0)))		if (isa<ConstantPointerNull>(II.getArgOperand(0)))
return IC.replaceInstUsesWith(II, ConstantInt::getFalse(II.getType()));		return IC.replaceInstUsesWith(II, ConstantInt::getFalse(II.getType()));
break;		break;
}		}
		case Intrinsic::amdgcn_buffer_store:
		case Intrinsic::amdgcn_buffer_store_format:
		case Intrinsic::amdgcn_raw_buffer_store:
		case Intrinsic::amdgcn_raw_buffer_store_format:
		case Intrinsic::amdgcn_raw_tbuffer_store:
		case Intrinsic::amdgcn_struct_buffer_store:
		case Intrinsic::amdgcn_struct_buffer_store_format:
		case Intrinsic::amdgcn_struct_tbuffer_store:
		case Intrinsic::amdgcn_tbuffer_store:
		case Intrinsic::amdgcn_image_store_1d:
		case Intrinsic::amdgcn_image_store_1darray:
		case Intrinsic::amdgcn_image_store_2d:
		case Intrinsic::amdgcn_image_store_2darray:
		case Intrinsic::amdgcn_image_store_2darraymsaa:
		case Intrinsic::amdgcn_image_store_2dmsaa:
		case Intrinsic::amdgcn_image_store_3d:
		case Intrinsic::amdgcn_image_store_cube:
		case Intrinsic::amdgcn_image_store_mip_1d:
		case Intrinsic::amdgcn_image_store_mip_1darray:
		case Intrinsic::amdgcn_image_store_mip_2d:
		case Intrinsic::amdgcn_image_store_mip_2darray:
		case Intrinsic::amdgcn_image_store_mip_3d:
		case Intrinsic::amdgcn_image_store_mip_cube: {
		auto *IIVTy = dyn_cast<FixedVectorType>(II.getArgOperand(0)->getType());
		if (!IIVTy)
		break;

		APInt DemandedElts = trimTrailingZerosInVector(IC, II.getArgOperand(0), &II);

		int DMaskIdx = isa<ConstantInt>(II.getArgOperand(1)) ? 1 : -1;
		if (simplifyAMDGCNMemoryIntrinsicDemanded(IC, II, DemandedElts, DMaskIdx, false))
		return IC.eraseInstFromFunction(II);

		break;
		}
default: {		default: {
if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =		if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
AMDGPU::getImageDimIntrinsicInfo(II.getIntrinsicID())) {		AMDGPU::getImageDimIntrinsicInfo(II.getIntrinsicID())) {
return simplifyAMDGCNImageIntrinsic(ST, ImageDimIntr, II, IC);		return simplifyAMDGCNImageIntrinsic(ST, ImageDimIntr, II, IC);
}		}
}		}
}		}
return std::nullopt;		return std::nullopt;
}		}

/// Implement SimplifyDemandedVectorElts for amdgcn buffer and image intrinsics.		/// Implement SimplifyDemandedVectorElts for amdgcn buffer and image intrinsics.
///		///
		/// The result of simplifying amdgcn image and buffer store intrinsics is updating
		/// definitions of the intrinsics vector argument, not Uses of the result like
		/// image and buffer loads.
/// Note: This only supports non-TFE/LWE image intrinsic calls; those have		/// Note: This only supports non-TFE/LWE image intrinsic calls; those have
/// struct returns.		/// struct returns.
static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,		static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
IntrinsicInst &II,		IntrinsicInst &II,
APInt DemandedElts,		APInt DemandedElts,
int DMaskIdx = -1) {		int DMaskIdx, bool IsLoad) {

auto *IIVTy = cast<FixedVectorType>(II.getType());		auto *IIVTy = cast<FixedVectorType>(IsLoad ? II.getType()
		: II.getOperand(0)->getType());
unsigned VWidth = IIVTy->getNumElements();		unsigned VWidth = IIVTy->getNumElements();
if (VWidth == 1)		if (VWidth == 1)
return nullptr;		return nullptr;
Type *EltTy = IIVTy->getElementType();		Type *EltTy = IIVTy->getElementType();

IRBuilderBase::InsertPointGuard Guard(IC.Builder);		IRBuilderBase::InsertPointGuard Guard(IC.Builder);
IC.Builder.SetInsertPoint(&II);		IC.Builder.SetInsertPoint(&II);

▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	if (DMaskIdx < 0) {

ConstantInt *DMask = cast<ConstantInt>(Args[DMaskIdx]);		ConstantInt *DMask = cast<ConstantInt>(Args[DMaskIdx]);
unsigned DMaskVal = DMask->getZExtValue() & 0xf;		unsigned DMaskVal = DMask->getZExtValue() & 0xf;

// Mask off values that are undefined because the dmask doesn't cover them		// Mask off values that are undefined because the dmask doesn't cover them
DemandedElts &= (1 << llvm::popcount(DMaskVal)) - 1;		DemandedElts &= (1 << llvm::popcount(DMaskVal)) - 1;

unsigned NewDMaskVal = 0;		unsigned NewDMaskVal = 0;
unsigned OrigLoadIdx = 0;		unsigned OrigLdStIdx = 0;
for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {		for (unsigned SrcIdx = 0; SrcIdx < 4; ++SrcIdx) {
const unsigned Bit = 1 << SrcIdx;		const unsigned Bit = 1 << SrcIdx;
if (!!(DMaskVal & Bit)) {		if (!!(DMaskVal & Bit)) {
if (!!DemandedElts[OrigLoadIdx])		if (!!DemandedElts[OrigLdStIdx])
NewDMaskVal |= Bit;		NewDMaskVal |= Bit;
OrigLoadIdx++;		OrigLdStIdx++;
}		}
}		}

if (DMaskVal != NewDMaskVal)		if (DMaskVal != NewDMaskVal)
Args[DMaskIdx] = ConstantInt::get(DMask->getType(), NewDMaskVal);		Args[DMaskIdx] = ConstantInt::get(DMask->getType(), NewDMaskVal);
}		}

unsigned NewNumElts = DemandedElts.popcount();		unsigned NewNumElts = DemandedElts.popcount();
Show All 11 Lines	static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
SmallVector<Type *, 6> OverloadTys;		SmallVector<Type *, 6> OverloadTys;
if (!Intrinsic::getIntrinsicSignature(II.getCalledFunction(), OverloadTys))		if (!Intrinsic::getIntrinsicSignature(II.getCalledFunction(), OverloadTys))
return nullptr;		return nullptr;

Type *NewTy =		Type *NewTy =
(NewNumElts == 1) ? EltTy : FixedVectorType::get(EltTy, NewNumElts);		(NewNumElts == 1) ? EltTy : FixedVectorType::get(EltTy, NewNumElts);
OverloadTys[0] = NewTy;		OverloadTys[0] = NewTy;

		if (!IsLoad) {
		SmallVector<int, 8> EltMask;
		for (unsigned OrigStoreIdx = 0; OrigStoreIdx < VWidth; ++OrigStoreIdx)
		if (DemandedElts[OrigStoreIdx])
		EltMask.push_back(OrigStoreIdx);

		if (NewNumElts == 1)
		Args[0] = IC.Builder.CreateExtractElement(II.getOperand(0), EltMask[0]);
		else
		Args[0] = IC.Builder.CreateShuffleVector(II.getOperand(0), EltMask);
		}

Function *NewIntrin = Intrinsic::getDeclaration(		Function *NewIntrin = Intrinsic::getDeclaration(
II.getModule(), II.getIntrinsicID(), OverloadTys);		II.getModule(), II.getIntrinsicID(), OverloadTys);
CallInst *NewCall = IC.Builder.CreateCall(NewIntrin, Args);		CallInst *NewCall = IC.Builder.CreateCall(NewIntrin, Args);
NewCall->takeName(&II);		NewCall->takeName(&II);
NewCall->copyMetadata(II);		NewCall->copyMetadata(II);

		if (IsLoad) {
if (NewNumElts == 1) {		if (NewNumElts == 1) {
return IC.Builder.CreateInsertElement(UndefValue::get(IIVTy), NewCall,		return IC.Builder.CreateInsertElement(UndefValue::get(IIVTy), NewCall,
DemandedElts.countr_zero());		DemandedElts.countr_zero());
}		}

SmallVector<int, 8> EltMask;		SmallVector<int, 8> EltMask;
unsigned NewLoadIdx = 0;		unsigned NewLoadIdx = 0;
for (unsigned OrigLoadIdx = 0; OrigLoadIdx < VWidth; ++OrigLoadIdx) {		for (unsigned OrigLoadIdx = 0; OrigLoadIdx < VWidth; ++OrigLoadIdx) {
if (!!DemandedElts[OrigLoadIdx])		if (!!DemandedElts[OrigLoadIdx])
EltMask.push_back(NewLoadIdx++);		EltMask.push_back(NewLoadIdx++);
else		else
EltMask.push_back(NewNumElts);		EltMask.push_back(NewNumElts);
}		}

Value *Shuffle = IC.Builder.CreateShuffleVector(NewCall, EltMask);		auto *Shuffle = IC.Builder.CreateShuffleVector(NewCall, EltMask);

return Shuffle;		return Shuffle;
}		}

		return NewCall;
		}

std::optional<Value *> GCNTTIImpl::simplifyDemandedVectorEltsIntrinsic(
InstCombiner &IC, IntrinsicInst &II, APInt DemandedElts, APInt &UndefElts,
APInt &UndefElts2, APInt &UndefElts3,
std::function<void(Instruction *, unsigned, APInt, APInt &)>
SimplifyAndSetOp) const {
switch (II.getIntrinsicID()) {
case Intrinsic::amdgcn_buffer_load:
case Intrinsic::amdgcn_buffer_load_format:
[18 lines not shown]

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll


[60 lines not shown]
; CHECK-NEXT: ret double 0x3F97D05F417D05F4
;
%val = call double @llvm.amdgcn.rcp.f64(double 4.300000e+01) nounwind readnone
ret double %val
}

define float @test_constant_fold_rcp_f32_43_strictfp() nounwind strictfp {
; CHECK-LABEL: @test_constant_fold_rcp_f32_43_strictfp(
- ; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) #[[ATTR14:[0-9]+]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) #[[ATTR12:[0-9]+]]
; CHECK-NEXT: ret float [[VAL]]
;
%val = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) strictfp nounwind readnone
ret float %val
}

; --------------------------------------------------------------------
; llvm.amdgcn.sqrt
[24 lines not shown]
; CHECK-NEXT: ret double 0x7FF8000000000000
;
%val = call double @llvm.amdgcn.sqrt.f64(double undef) nounwind readnone
ret double %val
}

define half @test_constant_fold_sqrt_f16_0() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_f16_0(
- ; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH0000) #[[ATTR15:[0-9]+]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH0000) #[[ATTR13:[0-9]+]]
; CHECK-NEXT: ret half [[VAL]]
;
%val = call half @llvm.amdgcn.sqrt.f16(half 0.0) nounwind readnone
ret half %val
}

define float @test_constant_fold_sqrt_f32_0() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_f32_0(
- ; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float 0.000000e+00) #[[ATTR15]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float 0.000000e+00) #[[ATTR13]]
; CHECK-NEXT: ret float [[VAL]]
;
%val = call float @llvm.amdgcn.sqrt.f32(float 0.0) nounwind readnone
ret float %val
}

define double @test_constant_fold_sqrt_f64_0() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_f64_0(
- ; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0.000000e+00) #[[ATTR15]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0.000000e+00) #[[ATTR13]]
; CHECK-NEXT: ret double [[VAL]]
;
%val = call double @llvm.amdgcn.sqrt.f64(double 0.0) nounwind readnone
ret double %val
}

define half @test_constant_fold_sqrt_f16_neg0() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_f16_neg0(
- ; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH8000) #[[ATTR15]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH8000) #[[ATTR13]]
; CHECK-NEXT: ret half [[VAL]]
;
%val = call half @llvm.amdgcn.sqrt.f16(half -0.0) nounwind readnone
ret half %val
}

define float @test_constant_fold_sqrt_f32_neg0() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_f32_neg0(
- ; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float -0.000000e+00) #[[ATTR15]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float -0.000000e+00) #[[ATTR13]]
; CHECK-NEXT: ret float [[VAL]]
;
%val = call float @llvm.amdgcn.sqrt.f32(float -0.0) nounwind readnone
ret float %val
}

define double @test_constant_fold_sqrt_f64_neg0() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_f64_neg0(
- ; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double -0.000000e+00) #[[ATTR15]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double -0.000000e+00) #[[ATTR13]]
; CHECK-NEXT: ret double [[VAL]]
;
%val = call double @llvm.amdgcn.sqrt.f64(double -0.0) nounwind readnone
ret double %val
}

define double @test_constant_fold_sqrt_snan_f64() nounwind {
; CHECK-LABEL: @test_constant_fold_sqrt_snan_f64(
[475 lines not shown]
; CHECK-NEXT: ret i1 [[VAL]]
;
%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 3)
ret i1 %val
}

define i1 @test_class_isnan_f32_strict(float %x) nounwind {
; CHECK-LABEL: @test_class_isnan_f32_strict(
- ; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 3) #[[ATTR16:[0-9]+]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 3) #[[ATTR14:[0-9]+]]
; CHECK-NEXT: ret i1 [[VAL]]
;
%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 3) strictfp
ret i1 %val
}

define i1 @test_class_is_p0_n0_f32(float %x) nounwind {
; CHECK-LABEL: @test_class_is_p0_n0_f32(
; CHECK-NEXT: [[VAL:%.*]] = fcmp oeq float [[X:%.*]], 0.000000e+00
; CHECK-NEXT: ret i1 [[VAL]]
;
%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 96)
ret i1 %val
}

define i1 @test_class_is_p0_n0_f32_strict(float %x) nounwind {
; CHECK-LABEL: @test_class_is_p0_n0_f32_strict(
- ; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 96) #[[ATTR16]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 96) #[[ATTR14]]
; CHECK-NEXT: ret i1 [[VAL]]
;
%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 96) strictfp
ret i1 %val
}

define i1 @test_constant_class_snan_test_snan_f64() nounwind {
; CHECK-LABEL: @test_constant_class_snan_test_snan_f64(
[596 lines not shown]
; CHECK-NEXT: ret i32 0
;
%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 0)
ret i32 %bfe
}

define i32 @ubfe_offset_0_width_3(i32 %src) {
; CHECK-LABEL: @ubfe_offset_0_width_3(
- ; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[SRC:%.*]], 7
- ; CHECK-NEXT: ret i32 [[TMP1]]
+ ; CHECK-NEXT: [[BFE:%.*]] = and i32 [[SRC:%.*]], 7
+ ; CHECK-NEXT: ret i32 [[BFE]]
;
%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 3)
ret i32 %bfe
}

define i32 @ubfe_offset_3_width_1(i32 %src) {
; CHECK-LABEL: @ubfe_offset_3_width_1(
; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[SRC:%.*]], 3
[500 lines not shown]
; CHECK-NEXT: ret i64 0
;
%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 32)
ret i64 %result
}

define i64 @icmp_constant_inputs_true() {
; CHECK-LABEL: @icmp_constant_inputs_true(
- ; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0:![0-9]+]]) #[[ATTR17:[0-9]+]]
+ ; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0:![0-9]+]]) #[[ATTR15:[0-9]+]]
; CHECK-NEXT: ret i64 [[RESULT]]
;
%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 34)
ret i64 %result
}

define i64 @icmp_constant_to_rhs_slt(i32 %x) {
; CHECK-LABEL: @icmp_constant_to_rhs_slt(
[690 lines not shown]
; CHECK-NEXT: ret i64 0
;
%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 1)
ret i64 %result
}

define i64 @fcmp_constant_inputs_true() {
; CHECK-LABEL: @fcmp_constant_inputs_true(
- ; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR17]]
+ ; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR15]]
; CHECK-NEXT: ret i64 [[RESULT]]
;
%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 4)
ret i64 %result
}

define i64 @fcmp_constant_to_rhs_olt(float %x) {
; CHECK-LABEL: @fcmp_constant_to_rhs_olt(
[25 lines not shown]
; CHECK-NEXT: ret i64 0
;
%b = call i64 @llvm.amdgcn.ballot.i64(i1 0)
ret i64 %b
}

define i64 @ballot_one_64() {
; CHECK-LABEL: @ballot_one_64(
- ; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR17]]
+ ; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR15]]
; CHECK-NEXT: ret i64 [[B]]
;
%b = call i64 @llvm.amdgcn.ballot.i64(i1 1)
ret i64 %b
}

define i32 @ballot_nocombine_32(i1 %i) {
; CHECK-LABEL: @ballot_nocombine_32(
[9 lines not shown]
; CHECK-NEXT: ret i32 0
;
%b = call i32 @llvm.amdgcn.ballot.i32(i1 0)
ret i32 %b
}

define i32 @ballot_one_32() {
; CHECK-LABEL: @ballot_one_32(
- ; CHECK-NEXT: [[B:%.*]] = call i32 @llvm.read_register.i32(metadata [[META1:![0-9]+]]) #[[ATTR17]]
+ ; CHECK-NEXT: [[B:%.*]] = call i32 @llvm.read_register.i32(metadata [[META1:![0-9]+]]) #[[ATTR15]]
; CHECK-NEXT: ret i32 [[B]]
;
%b = call i32 @llvm.amdgcn.ballot.i32(i1 1)
ret i32 %b
}

; --------------------------------------------------------------------
; llvm.amdgcn.wqm.vote
[1,966 lines not shown]
main_body:
store <4 x float> %v, ptr addrspace(1) %out
ret void
}


define amdgpu_kernel void @store_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s) {
; CHECK-LABEL: @store_mip_1d(
; CHECK-NEXT: main_body:
- ; CHECK-NEXT: call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+ ; CHECK-NEXT: call void @llvm.amdgcn.image.store.mip.1d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 0, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)

foad (Not Done): This looks like we have lost an optimization that converts the "mip" form to the non-"mip" form.

; CHECK-NEXT: ret void
;
main_body:
call void @llvm.amdgcn.image.store.mip.1d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

define amdgpu_kernel void @store_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t) {
; CHECK-LABEL: @store_mip_2d(
; CHECK-NEXT: main_body:
- ; CHECK-NEXT: call void @llvm.amdgcn.image.store.2d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+ ; CHECK-NEXT: call void @llvm.amdgcn.image.store.mip.2d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 0, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; CHECK-NEXT: ret void
;
main_body:
call void @llvm.amdgcn.image.store.mip.2d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

define amdgpu_kernel void @store_mip_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {
; CHECK-LABEL: @store_mip_3d(
; CHECK-NEXT: main_body:
- ; CHECK-NEXT: call void @llvm.amdgcn.image.store.3d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+ ; CHECK-NEXT: call void @llvm.amdgcn.image.store.mip.3d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], i32 0, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; CHECK-NEXT: ret void
;
main_body:
call void @llvm.amdgcn.image.store.mip.3d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

define amdgpu_kernel void @store_mip_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t) {
; CHECK-LABEL: @store_mip_1darray(
; CHECK-NEXT: main_body:
- ; CHECK-NEXT: call void @llvm.amdgcn.image.store.1darray.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+ ; CHECK-NEXT: call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 0, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; CHECK-NEXT: ret void
;
main_body:
call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

define amdgpu_kernel void @store_mip_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {
; CHECK-LABEL: @store_mip_2darray(
; CHECK-NEXT: main_body:
- ; CHECK-NEXT: call void @llvm.amdgcn.image.store.2darray.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+ ; CHECK-NEXT: call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], i32 0, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; CHECK-NEXT: ret void
;
main_body:
call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

define amdgpu_kernel void @store_mip_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {
; CHECK-LABEL: @store_mip_cube(
; CHECK-NEXT: main_body:
- ; CHECK-NEXT: call void @llvm.amdgcn.image.store.cube.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
+ ; CHECK-NEXT: call void @llvm.amdgcn.image.store.mip.cube.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], i32 0, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; CHECK-NEXT: ret void
;
main_body:
call void @llvm.amdgcn.image.store.mip.cube.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}
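The rewrite that the old CHECK lines in the store_mip tests expect can be illustrated with a toy model. This is a hedged sketch only: the Call struct and foldZeroMip are hypothetical names, and the real combine works on IntrinsicInst operands rather than strings. It shows the shape of the fold, where a constant-zero mip level is dropped and the intrinsic loses the ".mip" part of its name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical textual model of an intrinsic call (not LLVM's representation).
struct Call {
  std::string name;
  std::vector<std::string> args;
};

// If the argument at mipIdx is the constant "i32 0", drop it and remove the
// ".mip" part of the intrinsic name. Returns true when the call was rewritten.
bool foldZeroMip(Call &call, std::size_t mipIdx) {
  const std::string mipTag = ".mip";
  const std::size_t pos = call.name.find(mipTag);
  if (pos == std::string::npos || mipIdx >= call.args.size())
    return false;
  if (call.args[mipIdx] != "i32 0")
    return false; // Non-constant or non-zero mip level: leave the call alone.
  call.name.erase(pos, mipTag.size());
  call.args.erase(call.args.begin() + static_cast<std::ptrdiff_t>(mipIdx));
  return true;
}
```

Applied to the store_mip_1d input, where the mip level is argument 3, the model yields llvm.amdgcn.image.store.1d.v4f32.i32 with six arguments, which is the form the old CHECK line matches.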

declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
[963 lines not shown]
; CHECK-NEXT: ret double [[VAL]]
;
%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5)
ret double %val
}

define double @trig_preop_constfold_strictfp() {
; CHECK-LABEL: @trig_preop_constfold_strictfp(
- ; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) #[[ATTR16]]
+ ; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) #[[ATTR14]]
; CHECK-NEXT: ret double [[VAL]]
;
%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) strictfp
ret double %val
}

define double @trig_preop_constfold_0.0__0() {
; CHECK-LABEL: @trig_preop_constfold_0.0__0(
[78 lines not shown]

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-simplify-image-buffer-stores.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -mcpu=gfx1100 -S -passes=instcombine -mtriple=amdgcn-amd-amdhsa %s | FileCheck -check-prefixes=GFX11 %s

define amdgpu_ps void @image_store_1d_store_insert_zeros_at_end(<8 x i32> inreg %rsrc, float %vdata1, i32 %s) #0 {
; GFX11-LABEL: @image_store_1d_store_insert_zeros_at_end(
; GFX11-NEXT: call void @llvm.amdgcn.image.store.1d.f32.i32(float [[VDATA1:%.*]], i32 1, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; GFX11-NEXT: ret void
;
%newvdata1 = insertelement <4 x float> undef, float %vdata1, i32 0
%newvdata2 = insertelement <4 x float> %newvdata1, float 0.0, i32 1
%newvdata3 = insertelement <4 x float> %newvdata2, float 0.0, i32 2
%newvdata4 = insertelement <4 x float> %newvdata3, float 0.0, i32 3
call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> %newvdata4, i32 15, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

foad (Done): Nit: could just use zeroinitializer.

; GCN-NEXT: ret void
;
- %newvdata1 = insertelement <4 x float> undef, float 0.0, i32 0
- %newvdata2 = insertelement <4 x float> %newvdata1, float 0.0, i32 1
- %newvdata3 = insertelement <4 x float> %newvdata2, float 0.0, i32 2
- %newvdata4 = insertelement <4 x float> %newvdata3, float 0.0, i32 3
- call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> %newvdata4, i32 15, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)
+ call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> zeroinitializer, i32 15, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)
ret void

define amdgpu_ps void @image_store_mip_1d_store_insert_zeros_at_end(<8 x i32> inreg %rsrc, float %vdata1, float %vdata2, i32 %s, i32 %mip) #0 {
; GFX11-LABEL: @image_store_mip_1d_store_insert_zeros_at_end(
; GFX11-NEXT: [[TMP1:%.*]] = insertelement <3 x float> <float 0.000000e+00, float poison, float poison>, float [[VDATA1:%.*]], i64 1
; GFX11-NEXT: [[TMP2:%.*]] = insertelement <3 x float> [[TMP1]], float [[VDATA2:%.*]], i64 2
; GFX11-NEXT: call void @llvm.amdgcn.image.store.mip.1d.v3f32.i32(<3 x float> [[TMP2]], i32 7, i32 [[S:%.*]], i32 [[MIP:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)
; GFX11-NEXT: ret void
;
%newvdata1 = insertelement <4 x float> undef, float 0.0, i32 0
%newvdata2 = insertelement <4 x float> %newvdata1, float %vdata1, i32 1
%newvdata3 = insertelement <4 x float> %newvdata2, float %vdata2, i32 2
%newvdata4 = insertelement <4 x float> %newvdata3, float 0.0, i32 3
call void @llvm.amdgcn.image.store.mip.1d.v4f32.i32(<4 x float> %newvdata4, i32 7, i32 %s, i32 %mip, <8 x i32> %rsrc, i32 0, i32 0)
ret void
}

define amdgpu_ps void @buffer_store_insert_zeros_at_end(<4 x i32> inreg %a, float %vdata1, i32 %b) {
; GFX11-LABEL: @buffer_store_insert_zeros_at_end(
; GFX11-NEXT: [[TMP1:%.*]] = insertelement <2 x float> undef, float [[VDATA1:%.*]], i64 0
; GFX11-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <2 x i32> zeroinitializer
; GFX11-NEXT: call void @llvm.amdgcn.buffer.store.v2f32(<2 x float> [[TMP2]], <4 x i32> [[A:%.*]], i32 [[B:%.*]], i32 0, i1 false, i1 false)
; GFX11-NEXT: ret void
;
%newvdata1 = insertelement <4 x float> undef, float %vdata1, i32 0
%newvdata2 = insertelement <4 x float> %newvdata1, float %vdata1, i32 1
%newvdata3 = insertelement <4 x float> %newvdata2, float 0.0, i32 2
%newvdata4 = insertelement <4 x float> %newvdata3, float 0.0, i32 3
call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %newvdata4, <4 x i32> %a, i32 %b, i32 0, i1 0, i1 0)
ret void
}

define amdgpu_ps void @struct_buffer_store_insert_zeros(<4 x i32> inreg %a, float %vdata1, i32 %b) {
; GFX11-LABEL: @struct_buffer_store_insert_zeros(
; GFX11-NEXT: [[TMP1:%.*]] = insertelement <3 x float> <float poison, float 0.000000e+00, float poison>, float [[VDATA1:%.*]], i64 0
; GFX11-NEXT: [[TMP2:%.*]] = insertelement <3 x float> [[TMP1]], float [[VDATA1]], i64 2
; GFX11-NEXT: call void @llvm.amdgcn.struct.buffer.store.v3f32(<3 x float> [[TMP2]], <4 x i32> [[A:%.*]], i32 [[B:%.*]], i32 0, i32 42, i32 0)
; GFX11-NEXT: ret void
;
%newvdata1 = insertelement <4 x float> undef, float %vdata1, i32 0
%newvdata2 = insertelement <4 x float> %newvdata1, float 0.0, i32 1
%newvdata3 = insertelement <4 x float> %newvdata2, float %vdata1, i32 2
%newvdata4 = insertelement <4 x float> %newvdata3, float 0.0, i32 3
call void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float> %newvdata4, <4 x i32> %a, i32 %b, i32 0, i32 42, i32 0)
ret void
}

define amdgpu_ps void @struct_tbuffer_store_insert_zeros_at_beginning(<4 x i32> inreg %a, float %vdata1, i32 %b) {
; GFX11-LABEL: @struct_tbuffer_store_insert_zeros_at_beginning(
; GFX11-NEXT: [[NEWVDATA4:%.*]] = insertelement <4 x float> <float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float poison>, float [[VDATA1:%.*]], i64 3
; GFX11-NEXT: call void @llvm.amdgcn.struct.tbuffer.store.v4f32(<4 x float> [[NEWVDATA4]], <4 x i32> [[A:%.*]], i32 [[B:%.*]], i32 0, i32 42, i32 0, i32 15)
; GFX11-NEXT: ret void
;
%newvdata1 = insertelement <4 x float> undef, float 0.0, i32 0
%newvdata2 = insertelement <4 x float> %newvdata1, float 0.0, i32 1
%newvdata3 = insertelement <4 x float> %newvdata2, float 0.0, i32 2
%newvdata4 = insertelement <4 x float> %newvdata3, float %vdata1, i32 3
call void @llvm.amdgcn.struct.tbuffer.store.v4f32(<4 x float> %newvdata4, <4 x i32> %a, i32 %b, i32 0, i32 42, i32 0, i32 15)
ret void
}
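The expected outputs in these tests follow one rule: zero components are dropped only from the end of the data vector, and for image stores the dmask is cut down to match. The sketch below models that rule in standalone C++ so the vector widths in the CHECK lines can be worked out by hand. It is an illustration under assumptions, not the patch's code: Component, keptComponents and trimDMask are hypothetical names, and details such as undef handling are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one vector component: a known constant zero, or
// anything else.
enum class Component { Zero, Other };

// Number of leading components that remain after dropping the run of zero
// components at the end. Component 0 is never dropped, so a non-empty
// vector always keeps at least one data element.
unsigned keptComponents(const std::vector<Component> &comps) {
  unsigned kept = static_cast<unsigned>(comps.size());
  while (kept > 1 && comps[kept - 1] == Component::Zero)
    --kept;
  return kept;
}

// Keeps only the lowest `kept` set bits of an image dmask.
uint32_t trimDMask(uint32_t dmask, unsigned kept) {
  uint32_t result = 0;
  for (unsigned bit = 0; bit < 32 && kept > 0; ++bit) {
    if (dmask & (1u << bit)) {
      result |= 1u << bit;
      --kept;
    }
  }
  return result;
}
```

Under this model, the first image store (X, 0, 0, 0) with dmask 15 keeps one component with dmask 1, the struct buffer store (X, 0, X, 0) keeps three, and the tbuffer store (0, 0, 0, X) keeps all four because its zeros are at the beginning, which agrees with the f32, v3f32 and v4f32 intrinsics in the CHECK lines.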

declare void @llvm.amdgcn.raw.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32) #2
declare void @llvm.amdgcn.raw.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32) #2
declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #0
declare void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #1
declare void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32) #2
declare void @llvm.amdgcn.struct.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32) #2
declare void @llvm.amdgcn.struct.tbuffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32, i32) #0
declare void @llvm.amdgcn.raw.tbuffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32) #0
declare void @llvm.amdgcn.tbuffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i32, i32, i32, i32, i1, i1) #0
declare void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float>, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.2d.v4f32.i32(<4 x float>, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.3d.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.cube.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.1darray.v4f32.i32(<4 x float>, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.2darray.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.2dmsaa.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.2darraymsaa.v4f32.i32(<4 x float>, i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.mip.1d.v4f32.i32(<4 x float>, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.mip.2d.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.mip.3d.v4f32.i32(<4 x float>, i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.mip.cube.v4f32.i32(<4 x float>, i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.mip.1darray.v4f32.i32(<4 x float>, i32, i32, i32, i32, <8 x i32>, i32, i32) #0
declare void @llvm.amdgcn.image.store.mip.2darray.v4f32.i32(<4 x float>, i32, i32, i32, i32, i32, <8 x i32>, i32, i32) #0

attributes #0 = { nounwind }
attributes #1 = { nounwind writeonly }
attributes #2 = { nounwind }

Revision Contents

Diff 520957
