This is an archive of the discontinued LLVM Phabricator instance.

Differential D132700

[DSE] Add value type checks for masked store candidates in Dead Store Elimination
Closed · Public

Authored by mcberg2021 on Aug 25 2022, 12:53 PM.

Details

Reviewers

craig.topper
reames
rui.zhang
fhahn

Commits

rG897a79f97004: [DSE] Add value type info checks for masked store candidates in Dead Store…

Summary

When DSE checks whether one masked store is a valid candidate for elimination by another, the value types of the two stores can differ. This patch adds checks on those value types.

Event Timeline

mcberg2021 created this revision. Aug 25 2022, 12:53 PM

Herald added a project: Restricted Project. Aug 25 2022, 12:53 PM

Herald added subscribers: StephenFan, hiraditya.

mcberg2021 requested review of this revision. Aug 25 2022, 12:53 PM

Herald added a project: Restricted Project. Aug 25 2022, 12:53 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B183459: Diff 455690. Aug 25 2022, 2:08 PM

mcberg2021 retitled this revision from "Add value type checks for masked store candidates in DSE" to "[DSE] Add value type checks for masked store candidates in Dead Store Elimination". Aug 25 2022, 8:20 PM

mcberg2021 added a reviewer: rui.zhang. Aug 26 2022, 2:42 PM

StephenFan added inline comments. Aug 27 2022, 1:38 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	Is it possible to relax the comparison from type to type size?

Your change looks great to me. Thanks for improving this case.

This revision is now accepted and ready to land. Aug 29 2022, 11:33 AM

nikic added a subscriber: nikic. Aug 29 2022, 11:36 AM

nikic added inline comments.

llvm/test/Transforms/DeadStoreElimination/masked-dead-store-no-merge.ll
3 ↗	(On Diff #455690)	`-tbaa` and datalayout should not be necessary here.

fhahn added a subscriber: fhahn. Aug 30 2022, 1:03 PM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
245	I am not sure I understand the meaning of the comment. Could you re-phrase?
250	Yeah it would be good to check the stored sizes instead of requiring the types to match. I think we should remove one store if both masked stores are flipped.
llvm/test/Transforms/DeadStoreElimination/masked-dead-store-no-merge.ll
11 ↗	(On Diff #455690)	Could you instead extend the existing masked store test? `llvm/test/Transforms/DeadStoreElimination/masked-dead-store.ll` Also, it would be good to pre-commit the test separately and only include the changed lines caused by the patch in this diff.

mcberg2021 added inline comments. Aug 30 2022, 2:23 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	Well, we have an ulterior motive: VLA uses VL with type to constrain size.

fhahn added inline comments. Aug 31 2022, 1:08 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	Right, so it sounds like this should also 1) have tests with scalable vectors and 2) different handling for scalable vs non-scalable vectors?

mcberg2021 added inline comments. Aug 31 2022, 3:18 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	We would like to keep this patch to just the fixed non-scalable vectors like in this example. I can add the type size and I think element count would also be useful, as it is within context of this intrinsic.

craig.topper added inline comments. Aug 31 2022, 6:24 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	@fhahn, what does "flipped" mean in your comment "I think we should remove one store if both masked stores are flipped"? I think there was some confusion about what was meant by scalable here. When @mcberg2021 said scalable, that referred to a different intrinsic, vp_store, that we will also need to handle here eventually. When @fhahn said scalable, that referred to this intrinsic with a scalable vector type. My suggestion is to check that `getElementCount` and `getScalarSizeInBits` match for the two types. That will cover both fixed and scalable. Alternatively, since the masks are the same, the element count is guaranteed to be the same, so we could check only that `getPrimitiveSizeInBits()` is the same.
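The two alternatives in this comment can be sketched outside of LLVM. This is a minimal model and not the patch itself: `ToyVectorType`, `sameShape`, and `sameTotalSize` are hypothetical names, and the struct only imitates the lane count, scalability flag, and lane width that the real type queries report.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a vector type. This is NOT llvm::VectorType; the field
// names are assumptions made for illustration.
struct ToyVectorType {
  unsigned MinElements; // known minimum lane count
  bool Scalable;        // true for a <vscale x N x T> style type
  unsigned ScalarBits;  // bit width of one lane

  // Known-minimum total size of the vector in bits.
  std::uint64_t minTotalBits() const {
    return static_cast<std::uint64_t>(MinElements) * ScalarBits;
  }
};

// First alternative: lane count (including scalability) and lane width
// must both match.
bool sameShape(const ToyVectorType &A, const ToyVectorType &B) {
  return A.MinElements == B.MinElements && A.Scalable == B.Scalable &&
         A.ScalarBits == B.ScalarBits;
}

// Second alternative: compare only the total size. This is only meaningful
// when the caller already knows the lane counts agree, for example because
// both stores use the same mask value.
bool sameTotalSize(const ToyVectorType &A, const ToyVectorType &B) {
  return A.Scalable == B.Scalable && A.minTotalBits() == B.minTotalBits();
}
```

In this model a `<4 x i32>` and a `<4 x i8>` value fail both checks. A `<4 x i32>` and an `<8 x i16>` value pass the total-size check alone, which is why that variant leans on the masks being identical.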

mcberg2021 added inline comments. Aug 31 2022, 6:26 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	Query: What does "flipped" mean? Right now I think we cover both fixed and scalable vectors for masked_store: the type size check and the ElementCount check must both agree for the two stores, which precludes mixing fixed and scalable and works for either scenario.

mcberg2021 added inline comments. Sep 1 2022, 11:45 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	I'm going to venture a reasonable guess for flipped: it's the test example turned around from:

tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)

to

tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, ptr %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)

where, if you applied a mask to supply the superset for the second store, it would be: 1111 1111 1111 1111
and the first store would accordingly be a well-formed subset that fits under it: 0001 0001 0001 0001

meaning the store is fully contained in a constant mask known at compile time.

fhahn added inline comments. Sep 2 2022, 1:10 AM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
250	Yes that's exactly what I had in mind!

Updated with pre-commit test and changes we discussed.

Harbormaster completed remote builds in B185663: Diff 458806. Sep 8 2022, 12:25 PM

fhahn added inline comments. Sep 8 2022, 1:20 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
252	This should be fine if `KillingTy->getScalarSizeInBits() >= DeadTy->getScalarSizeInBits()`, right? I might have missed it, but it looks like there's no test case like the one @mcberg2021 mentioned above.

mcberg2021 added inline comments. Sep 8 2022, 1:51 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
252	That too should work. I will update for the subsumption case. However, I would like to leave the mask decomposition support for a later change.

craig.topper added inline comments. Sep 8 2022, 1:57 PM

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
252	If the mask is unknown then the store is not writing contiguous bytes. There can be gaps. So it is not sufficient to check that scalar size is larger.
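The point about gaps can be made concrete with a small byte-footprint model. This is a sketch under a stated assumption, namely that lane `I` of a masked store occupies bytes `[I * LaneBytes, (I + 1) * LaneBytes)` from the shared base pointer. The helper names `bytesWritten` and `killingCoversDead` are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Byte offsets (relative to the base pointer) written by a masked store.
// Assumption: lane I occupies bytes [I * LaneBytes, (I + 1) * LaneBytes).
std::set<std::size_t> bytesWritten(const std::vector<bool> &Mask,
                                   std::size_t LaneBytes) {
  std::set<std::size_t> Bytes;
  for (std::size_t Lane = 0; Lane < Mask.size(); ++Lane) {
    if (!Mask[Lane])
      continue; // inactive lanes write nothing, which leaves gaps
    for (std::size_t B = 0; B < LaneBytes; ++B)
      Bytes.insert(Lane * LaneBytes + B);
  }
  return Bytes;
}

// True if every byte written by the dead store is also written by the
// killing store when both stores use the same mask and base pointer.
bool killingCoversDead(const std::vector<bool> &Mask,
                       std::size_t KillingLaneBytes,
                       std::size_t DeadLaneBytes) {
  const std::set<std::size_t> Killing = bytesWritten(Mask, KillingLaneBytes);
  for (std::size_t Byte : bytesWritten(Mask, DeadLaneBytes))
    if (Killing.count(Byte) == 0)
      return false;
  return true;
}
```

Under this model, with an all-true four-lane mask a killing `<4 x i32>` store covers a dead `<4 x i8>` store. With only lane 1 active, the `i8` store writes byte 1 while the `i32` store writes bytes 4 through 7, so the larger scalar size alone does not make the earlier store dead.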

LGTM, thanks! Please add the extra test case as discussed separately.

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
252	Yeah good point, we will need to combine this with the mask analysis. Still good to add the extra test coverage. The new test seems to have constant masks, it would be good to also have variants where the mask is unknown (e.g. arg to function)
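The mask analysis mentioned here is left as a TODO in the patch, which only accepts identical mask operands. As a rough sketch of what a constant-mask superset test could look like, assuming a hypothetical `ToyMask` type that records whether the lanes are known at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mask model: either a compile-time constant with one flag per
// lane, or unknown (for example, a function argument).
struct ToyMask {
  bool Known;
  std::vector<bool> Lanes;
};

// True only when both masks are known constants of equal length and every
// lane enabled in Dead is also enabled in Killing. Unknown masks are
// answered conservatively with false.
bool isKnownSuperset(const ToyMask &Killing, const ToyMask &Dead) {
  if (!Killing.Known || !Dead.Known)
    return false;
  if (Killing.Lanes.size() != Dead.Lanes.size())
    return false;
  for (std::size_t I = 0; I < Dead.Lanes.size(); ++I)
    if (Dead.Lanes[I] && !Killing.Lanes[I])
      return false;
  return true;
}
```

This only compares lanes. As noted in the thread, it would still need to be combined with a check of the bytes each lane covers when the two value types differ.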

Test added as requested.

Harbormaster completed remote builds in B185962: Diff 459216. Sep 9 2022, 5:43 PM

Closed by commit rG897a79f97004: [DSE] Add value type info checks for masked store candidates in Dead Store… (authored by mcberg2021). Sep 20 2022, 3:54 PM

This revision was automatically updated to reflect the committed changes.

mcberg2021 added a commit: rG897a79f97004: [DSE] Add value type info checks for masked store candidates in Dead Store….

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp	35 lines
llvm/test/Transforms/DeadStoreElimination/masked-dead-store.ll	16 lines

Diff 459216

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp

	/// overwrite \p DeadI.			/// overwrite \p DeadI.
	static OverwriteResult isMaskedStoreOverwrite(const Instruction *KillingI,			static OverwriteResult isMaskedStoreOverwrite(const Instruction *KillingI,
	const Instruction *DeadI,			const Instruction *DeadI,
	BatchAAResults &AA) {			BatchAAResults &AA) {
	const auto *KillingII = dyn_cast<IntrinsicInst>(KillingI);			const auto *KillingII = dyn_cast<IntrinsicInst>(KillingI);
	const auto *DeadII = dyn_cast<IntrinsicInst>(DeadI);			const auto *DeadII = dyn_cast<IntrinsicInst>(DeadI);
	if (KillingII == nullptr || DeadII == nullptr)			if (KillingII == nullptr || DeadII == nullptr)
	return OW_Unknown;			return OW_Unknown;
	if (KillingII->getIntrinsicID() != Intrinsic::masked_store ||			if (KillingII->getIntrinsicID() != DeadII->getIntrinsicID())
	DeadII->getIntrinsicID() != Intrinsic::masked_store)			return OW_Unknown;
				if (KillingII->getIntrinsicID() == Intrinsic::masked_store) {
				// Type size.
				VectorType *KillingTy =
				cast<VectorType>(KillingII->getArgOperand(0)->getType());
				VectorType *DeadTy = cast<VectorType>(DeadII->getArgOperand(0)->getType());
				if (KillingTy->getScalarSizeInBits() != DeadTy->getScalarSizeInBits())
				return OW_Unknown;
				// Element count.
				if (KillingTy->getElementCount() != DeadTy->getElementCount())
	return OW_Unknown;			return OW_Unknown;
	// Pointers.			// Pointers.
	Value *KillingPtr = KillingII->getArgOperand(1)->stripPointerCasts();			Value *KillingPtr = KillingII->getArgOperand(1)->stripPointerCasts();
	Value *DeadPtr = DeadII->getArgOperand(1)->stripPointerCasts();			Value *DeadPtr = DeadII->getArgOperand(1)->stripPointerCasts();
	if (KillingPtr != DeadPtr && !AA.isMustAlias(KillingPtr, DeadPtr))			if (KillingPtr != DeadPtr && !AA.isMustAlias(KillingPtr, DeadPtr))
	return OW_Unknown;			return OW_Unknown;
	// Masks.			// Masks.
	// TODO: check that KillingII's mask is a superset of the DeadII's mask.			// TODO: check that KillingII's mask is a superset of the DeadII's mask.
	if (KillingII->getArgOperand(3) != DeadII->getArgOperand(3))			if (KillingII->getArgOperand(3) != DeadII->getArgOperand(3))
	return OW_Unknown;			return OW_Unknown;
	return OW_Complete;			return OW_Complete;
	}			}
				return OW_Unknown;
				}

	/// Return 'OW_Complete' if a store to the 'KillingLoc' location completely			/// Return 'OW_Complete' if a store to the 'KillingLoc' location completely
	/// overwrites a store to the 'DeadLoc' location, 'OW_End' if the end of the			/// overwrites a store to the 'DeadLoc' location, 'OW_End' if the end of the
	/// 'DeadLoc' location is completely overwritten by 'KillingLoc', 'OW_Begin'			/// 'DeadLoc' location is completely overwritten by 'KillingLoc', 'OW_Begin'
	/// if the beginning of the 'DeadLoc' location is overwritten by 'KillingLoc'.			/// if the beginning of the 'DeadLoc' location is overwritten by 'KillingLoc'.
	/// 'OW_PartialEarlierWithFullLater' means that a dead (big) store was			/// 'OW_PartialEarlierWithFullLater' means that a dead (big) store was
	/// overwritten by a killing (smaller) store which doesn't write outside the big			/// overwritten by a killing (smaller) store which doesn't write outside the big
	/// store's memory locations. Returns 'OW_Unknown' if nothing can be determined.			/// store's memory locations. Returns 'OW_Unknown' if nothing can be determined.
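The order of checks in the right-hand (new) column of `isMaskedStoreOverwrite` can be summarized with a toy model. Everything below is a hypothetical stand-in and not LLVM code: integers replace pointer and mask values, equal integers model "same pointer or must-alias" and "same mask operand", and the initial `dyn_cast` null check is omitted because every `ToyStore` is already an intrinsic call.

```cpp
#include <cassert>

enum class ToyOverwrite { Complete, Unknown };
enum class ToyIntrinsic { MaskedStore, Other };

// Hypothetical summary of the operands the checks look at.
struct ToyStore {
  ToyIntrinsic ID;
  unsigned ScalarBits;  // lane width of the stored value
  unsigned MinElements; // lane count of the stored value
  bool Scalable;        // whether the lane count is scalable
  int Pointer;          // equal ints model identical or must-alias pointers
  int Mask;             // equal ints model the same mask operand
};

ToyOverwrite isMaskedStoreOverwriteToy(const ToyStore &Killing,
                                       const ToyStore &Dead) {
  // Both calls must be the same intrinsic.
  if (Killing.ID != Dead.ID)
    return ToyOverwrite::Unknown;
  if (Killing.ID == ToyIntrinsic::MaskedStore) {
    // Type size.
    if (Killing.ScalarBits != Dead.ScalarBits)
      return ToyOverwrite::Unknown;
    // Element count.
    if (Killing.MinElements != Dead.MinElements ||
        Killing.Scalable != Dead.Scalable)
      return ToyOverwrite::Unknown;
    // Pointers.
    if (Killing.Pointer != Dead.Pointer)
      return ToyOverwrite::Unknown;
    // Masks.
    if (Killing.Mask != Dead.Mask)
      return ToyOverwrite::Unknown;
    return ToyOverwrite::Complete;
  }
  return ToyOverwrite::Unknown;
}
```

In this model, a `<4 x i8>` store following a `<4 x i32>` store to the same pointer with the same mask yields `Unknown`, which matches the updated test expectations where both stores are kept.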

llvm/test/Transforms/DeadStoreElimination/masked-dead-store.ll

%v24 = select <32 x i1> %v23, <32 x i8> %v16, <32 x i8> %v22		%v24 = select <32 x i1> %v23, <32 x i8> %v16, <32 x i8> %v22
%v25 = shufflevector <32 x i8> %v24, <32 x i8> undef, <128 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		%v25 = shufflevector <32 x i8> %v24, <32 x i8> undef, <128 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 
undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
tail call void @llvm.masked.store.v128i8.p0v128i8(<128 x i8> %v25, <128 x i8>* %v3, i32 32, <128 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>), !tbaa !3		tail call void @llvm.masked.store.v128i8.p0v128i8(<128 x i8> %v25, <128 x i8>* %v3, i32 32, <128 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 
false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>), !tbaa !3
ret i32 0		ret i32 0
}		}

define dllexport i32 @f1(<4 x i32>* %a, <4 x i8> %v1, <4 x i32> %v2) {		define dllexport i32 @f1(<4 x i32>* %a, <4 x i8> %v1, <4 x i32> %v2) {
; CHECK-LABEL: @f1(		; CHECK-LABEL: @f1(
; CHECK-NEXT: [[PTR:%.*]] = bitcast <4 x i32>* [[A:%.*]] to <4 x i8>*		; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[V2:%.*]], <4 x i32>* [[A:%.*]], i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
		; CHECK-NEXT: [[PTR:%.*]] = bitcast <4 x i32>* [[A]] to <4 x i8>*
; CHECK-NEXT: call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> [[V1:%.*]], <4 x i8>* [[PTR]], i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		; CHECK-NEXT: call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> [[V1:%.*]], <4 x i8>* [[PTR]], i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
; CHECK-NEXT: ret i32 0		; CHECK-NEXT: ret i32 0
;		;
tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, <4 x i32>* %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, <4 x i32>* %a, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
%ptr = bitcast <4 x i32>* %a to <4 x i8>*		%ptr = bitcast <4 x i32>* %a to <4 x i8>*
tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, <4 x i8>* %ptr, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)		tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, <4 x i8>* %ptr, i32 1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>)
ret i32 0		ret i32 0
}		}

		define dllexport i32 @f2(<4 x i32>* %a, <4 x i8> %v1, <4 x i32> %v2, <4 x i1> %mask) {
		; CHECK-LABEL: @f2(
		; CHECK-NEXT: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> [[V2:%.*]], <4 x i32>* [[A:%.*]], i32 1, <4 x i1> [[MASK:%.*]])
		; CHECK-NEXT: [[PTR:%.*]] = bitcast <4 x i32>* [[A]] to <4 x i8>*
		; CHECK-NEXT: call void @llvm.masked.store.v4i8.p0v4i8(<4 x i8> [[V1:%.*]], <4 x i8>* [[PTR]], i32 1, <4 x i1> [[MASK]])
		; CHECK-NEXT: ret i32 0
		;
		tail call void @llvm.masked.store.v4i32.p0(<4 x i32> %v2, <4 x i32>* %a, i32 1, <4 x i1> %mask)
		%ptr = bitcast <4 x i32>* %a to <4 x i8>*
		tail call void @llvm.masked.store.v4i8.p0(<4 x i8> %v1, <4 x i8>* %ptr, i32 1, <4 x i1> %mask)
		ret i32 0
		}

declare void @llvm.masked.store.v4i8.p0(<4 x i8>, <4 x i8>*, i32, <4 x i1>)		declare void @llvm.masked.store.v4i8.p0(<4 x i8>, <4 x i8>*, i32, <4 x i1>)
declare void @llvm.masked.store.v4i32.p0(<4 x i32>, <4 x i32>*, i32, <4 x i1>)		declare void @llvm.masked.store.v4i32.p0(<4 x i32>, <4 x i32>*, i32, <4 x i1>)

declare void @llvm.masked.store.v128i8.p0v128i8(<128 x i8>, <128 x i8>*, i32 immarg, <128 x i1>) #1		declare void @llvm.masked.store.v128i8.p0v128i8(<128 x i8>, <128 x i8>*, i32 immarg, <128 x i1>) #1
declare <128 x i8> @llvm.masked.load.v128i8.p0v128i8(<128 x i8>*, i32 immarg, <128 x i1>, <128 x i8>) #2		declare <128 x i8> @llvm.masked.load.v128i8.p0v128i8(<128 x i8>*, i32 immarg, <128 x i1>, <128 x i8>) #2

attributes #0 = { nounwind willreturn }		attributes #0 = { nounwind willreturn }
attributes #1 = { argmemonly nounwind willreturn }		attributes #1 = { argmemonly nounwind willreturn }
