Details

Reviewers

sdesmalen
david-arm
peterwaller-arm
MattDevereau
RKSimon

Commits

rGd44b31eca27c: [DAGCombine] Allow DAGCombine to remove dead masked stores.

Summary

Remove a dead masked store if another one has the same base pointer and mask.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dtemirbulatov created this revision. Feb 1 2023, 7:01 AM

Herald added a project: Restricted Project. Feb 1 2023, 7:01 AM

Herald added subscribers: ecnelises, pengfei, hiraditya.

dtemirbulatov requested review of this revision. Feb 1 2023, 7:01 AM

sdesmalen added inline comments. Feb 1 2023, 7:43 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11434	Is this check necessary? I would expect that it doesn't run the DAGCombiner when OptLevel == CodeGenOpt::None. Does it still run the DAGCombiner with -O0?
11435	A store returns a Chain, which always has one use, so this test can be removed.
11436	This seems like an odd case to ever happen, and you did not add a specific test for this case. Why did you add this check?
11437	Is this check necessary? I would expect that if both pointers are equal, they use the same address space?
11439	I think this can be isKnownEQ, because for the predicates to match, the sizes must match too.
llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll
13	Another case that can be handled is where the second store has an 'all true' mask; then it doesn't really matter what the mask of the first store is, e.g.

%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
%alltrue = shufflevector <vscale x 4 x i1> %alltrue.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)
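
The all-true suggestion can be sanity-checked with a small lane-level model. This is only a sketch and not LLVM code: `LaneMask` and `earlierStoreIsCovered` are hypothetical names, the vector is fixed at four lanes, and both stores are assumed to share a base pointer and element type so that lane i of each store maps to the same bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane-level model: one bool per lane of a 4-lane vector.
using LaneMask = std::array<bool, 4>;

// Assuming both stores share a base pointer and an element type, the earlier
// store is dead when every lane it writes is also written by the later store.
bool earlierStoreIsCovered(const LaneMask &earlier, const LaneMask &later) {
  for (std::size_t lane = 0; lane < earlier.size(); ++lane)
    if (earlier[lane] && !later[lane])
      return false; // this lane would keep the earlier store's value
  return true;
}
```

With an all-true later mask the function returns true for any earlier mask, which is the case the comment describes. The subset check is broader than the condition discussed in this review, which only considers identical masks and all-true masks.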

Harbormaster completed remote builds in B211219: Diff 493933. Feb 1 2023, 7:53 AM

Resolved comments.

Harbormaster completed remote builds in B211540: Diff 494363. Feb 2 2023, 11:36 AM

I found an error in my implementation: we cannot remove the store if the chained store has a fixed-size type and the store we are considering removing has a scalable type, because the size of a scalable type is only known at run time. Fixed.

Harbormaster completed remote builds in B211605: Diff 494462. Feb 2 2023, 5:20 PM

sdesmalen added inline comments. Feb 7 2023, 5:12 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11371–11374	Can this be replaced by: ElementCount::isKnownLE(MST1->getMemoryVT().getStoreSize(), MST->getMemoryVT().getStoreSize()) ?

dtemirbulatov added inline comments. Feb 7 2023, 7:24 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

11371–11374

Hmm, I could imagine something like:

%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
%alltrue = shufflevector <vscale x 4 x i1> %alltrue.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val1, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)

that ends up with st1h, so TypeSize::isKnownLE probably suits better here.

Switched to TypeSize::isKnownLE() for the all-true constant mask case. Added a few tests.
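
To illustrate why a "known less-or-equal" comparison is the natural fit for sizes that may be scalable, here is a minimal model. `ModelSize` and this `isKnownLE` are stand-ins written for illustration, not LLVM's `TypeSize` API; the sketch assumes a scalable size means `minBytes * vscale` with `vscale >= 1` unknown at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a fixed-or-scalable size; not LLVM's TypeSize.
// A scalable size means minBytes * vscale, where vscale >= 1 is only known
// at run time.
struct ModelSize {
  std::uint64_t minBytes;
  bool scalable;
};

// True only if lhs <= rhs holds for every possible vscale.
bool isKnownLE(ModelSize lhs, ModelSize rhs) {
  // Scalable vs fixed: lhs grows with vscale while rhs does not, so no
  // compile-time guarantee is possible.
  if (lhs.scalable && !rhs.scalable)
    return false;
  // Otherwise comparing the minimum sizes is enough: fixed/fixed and
  // scalable/scalable scale identically, and fixed/scalable holds because
  // vscale >= 1.
  return lhs.minBytes <= rhs.minBytes;
}
```

In the st1h example above, the earlier nxv4i32 store covers 16 * vscale bytes and the later nxv4i16 store only 8 * vscale, so `isKnownLE(ModelSize{16, true}, ModelSize{8, true})` is false and the earlier store has to stay.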

Harbormaster completed remote builds in B212615: Diff 495850. Feb 8 2023, 8:23 AM

Thanks for adding the tests @dtemirbulatov! I've just got two more nits on the tests, then I'm happy to accept.

llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll
37–38	Can you change this to:

call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)

with the same value for the mask, so that it is clearer that the first store is eliminated? Currently the store of nxv4i64 is code-generated as two stores, because nxv4i64 is not a legal type and needs splitting.
52	For this test, can you use the same %mask value?

Adjusted according to comments; fixed an error with the same mask but different type sizes by forbidding deletion of the store.

dtemirbulatov marked an inline comment as done. Feb 12 2023, 4:48 PM

Harbormaster completed remote builds in B213321: Diff 496815. Feb 12 2023, 6:00 PM

sdesmalen added inline comments. Feb 13 2023, 1:44 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

11369

> fixed an error with the same mask but different type sizes by forbidding deletion of the store

I'm not sure if this is the reason you made this change, but the test that previously removed the redundant store:

define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) {
; CHECK-LABEL: dead_masked_store_same_mask_bigger_type:
; CHECK:       // %bb.0:
; CHECK-NEXT:    st1w { z1.s }, p0, [x0]
; CHECK-NEXT:    ret
  call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  ret void
}

Is now doing this:

define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) {
; CHECK-LABEL: dead_masked_store_same_mask_bigger_type:
; CHECK:       // %bb.0:
; CHECK-NEXT:    st1h { z0.s }, p0, [x0]
; CHECK-NEXT:    st1w { z1.s }, p0, [x0]
; CHECK-NEXT:    ret
  call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)
  ret void
}

The second store is overwriting all the i16 elements that were stored in the first llvm.masked.store, with i32 elements from the second llvm.masked.store, so that means the first store can be removed right?

dtemirbulatov added inline comments. Feb 13 2023, 2:07 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11369	The mask is the same for both stores, but because the types are different, the offsets of the first and last affected elements are different too. I think we have to keep both stores in those cases.

dtemirbulatov marked an inline comment as not done. Feb 13 2023, 2:12 AM

sdesmalen accepted this revision. Feb 13 2023, 3:14 AM

sdesmalen added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

11369

You're right, your current code is correct. The elements wouldn't necessarily line up in memory, e.g.

call void @llvm.masked.store.v4i16(<4 x i16> <a, b, c, d>, ptr %a, i32 4, <4 x i1> <1, 0, 0, 1>)  ; would store `a` and `d` resulting in   { a__d____ }
call void @llvm.masked.store.v4i32(<4 x i32> <e, f, g, h>, ptr %a, i32 4, <4 x i1> <1, 0, 0, 1>)   ; would store `e` and `h` resulting in  { ee_d__hh }

so the first call can indeed not be removed.
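
The layout argument can be reproduced with a small byte-level simulation. This is a sketch under stated assumptions, not LLVM code: `maskedStore` and `runBoth` are hypothetical helpers, the vector has four lanes, memory is a 16-byte buffer, and each store is represented by a tag character ('A' for the earlier store, 'B' for the later one) instead of real values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of a 4-lane masked store into a byte buffer.
// `tag` stands in for the stored value: every byte written by the store is
// set to `tag`. Bytes of inactive lanes keep their previous contents.
void maskedStore(std::string &mem, std::size_t elemBytes,
                 const std::array<bool, 4> &mask, char tag) {
  for (std::size_t lane = 0; lane < mask.size(); ++lane) {
    if (!mask[lane])
      continue;
    for (std::size_t b = 0; b < elemBytes; ++b)
      mem[lane * elemBytes + b] = tag;
  }
}

// Runs an earlier and a later store with the same mask and base pointer and
// returns the resulting memory image ('_' marks untouched bytes).
std::string runBoth(std::size_t earlierBytes, std::size_t laterBytes,
                    const std::array<bool, 4> &mask) {
  std::string mem(16, '_'); // 16 bytes, enough for 4 x i32
  maskedStore(mem, earlierBytes, mask, 'A');
  maskedStore(mem, laterBytes, mask, 'B');
  return mem;
}
```

For the mask <1, 0, 0, 1>, `runBoth(2, 4, mask)` leaves bytes 6 and 7 tagged 'A': they belong to lane 3 of the i16 store, but to lane 1 of the i32 store, which is masked off. With equal element sizes, `runBoth(4, 4, mask)` leaves no 'A' bytes, which is why the same-mask case also requires equal store sizes.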

This revision is now accepted and ready to land. Feb 13 2023, 3:14 AM

This revision was landed with ongoing or failed builds. Feb 13 2023, 8:12 AM

Closed by commit rGd44b31eca27c: [DAGCombine] Allow DAGCombine to remove dead masked stores. (authored by dtemirbulatov).

This revision was automatically updated to reflect the committed changes.

dtemirbulatov added a commit: rGd44b31eca27c: [DAGCombine] Allow DAGCombine to remove dead masked stores.

Diff 496996

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(11,355 lines not shown) SDValue DAGCombiner::visitMSTORE(SDNode *N) {
   SDValue Value = MST->getValue();
   SDValue Ptr = MST->getBasePtr();
   SDLoc DL(N);

   // Zap masked stores with a zero mask.
   if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))
     return Chain;

+  // Remove a masked store if base pointers and masks are equal.
+  if (MaskedStoreSDNode *MST1 = dyn_cast<MaskedStoreSDNode>(Chain)) {
+    if (MST->isUnindexed() && MST->isSimple() && MST1->isUnindexed() &&
+        MST1->isSimple() && MST1->getBasePtr() == Ptr &&
+        !MST->getBasePtr().isUndef() &&
+        ((Mask == MST1->getMask() && MST->getMemoryVT().getStoreSize() ==
+                                         MST1->getMemoryVT().getStoreSize()) ||
+         ISD::isConstantSplatVectorAllOnes(Mask.getNode())) &&
+        TypeSize::isKnownLE(MST1->getMemoryVT().getStoreSize(),
+                            MST->getMemoryVT().getStoreSize())) {
+      CombineTo(MST1, MST1->getChain());
+      if (N->getOpcode() != ISD::DELETED_NODE)
+        AddToWorklist(N);
+      return SDValue(N, 0);
+    }
+  }
+
   // If this is a masked load with an all ones mask, we can use a unmasked load.
   // FIXME: Can we do this for indexed, compressing, or truncating stores?
   if (ISD::isConstantSplatVectorAllOnes(Mask.getNode()) && MST->isUnindexed() &&
       !MST->isCompressingStore() && !MST->isTruncatingStore())
     return DAG.getStore(MST->getChain(), SDLoc(N), MST->getValue(),
                         MST->getBasePtr(), MST->getPointerInfo(),
                         MST->getOriginalAlign(), MachineMemOperand::MOStore,
                         MST->getAAInfo());
(37 lines not shown)
     return DAG.getMaskedStore(Chain, SDLoc(N), Value.getOperand(0), Ptr,
                               MST->getOffset(), Mask, MST->getMemoryVT(),
                               MST->getMemOperand(), MST->getAddressingMode(),
                               /*IsTruncating=*/true);
   }

   return SDValue();
 }

 SDValue DAGCombiner::visitVPGATHER(SDNode *N) {
   VPGatherSDNode *MGT = cast<VPGatherSDNode>(N);
   SDValue Mask = MGT->getMask();
   SDValue Chain = MGT->getChain();
   SDValue Index = MGT->getIndex();
   SDValue Scale = MGT->getScale();
   SDValue BasePtr = MGT->getBasePtr();
   SDValue VL = MGT->getVectorLength();
   ISD::MemIndexType IndexType = MGT->getIndexType();
   SDLoc DL(N);

   if (refineUniformBase(BasePtr, Index, MGT->isIndexScaled(), DAG, DL)) {
     SDValue Ops[] = {Chain, BasePtr, Index, Scale, Mask, VL};
     return DAG.getGatherVP(
(15,394 lines not shown)

llvm/test/CodeGen/AArch64/sve-dead-masked-store.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64--linux-gnu -mattr=+sve < %s | FileCheck %s

				define void @dead_masked_store(<vscale x 4 x i32> %val, ptr %a, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: dead_masked_store:
				; CHECK: // %bb.0:
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @dead_masked_store_alltrue_same(<vscale x 4 x i32> %val, ptr %a, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: dead_masked_store_alltrue_same:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: ret
				%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
				%alltrue = shufflevector <vscale x 4 x i1> %alltrue.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)
				ret void
				}

				define void @dead_masked_store_alltrue_bigger(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: dead_masked_store_alltrue_bigger:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: st1w { z1.s }, p0, [x0]
				; CHECK-NEXT: ret
				%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
				%alltrue = shufflevector <vscale x 4 x i1> %alltrue.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)
				ret void
				}

				define void @dead_masked_store_alltrue_smaller(<vscale x 4 x i32> %val, <vscale x 4 x i16> %val1, ptr %a, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: dead_masked_store_alltrue_smaller:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p1.s
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: st1h { z1.s }, p1, [x0]
				; CHECK-NEXT: ret
				%alltrue.ins = insertelement <vscale x 4 x i1> poison, i1 true, i32 0
				%alltrue = shufflevector <vscale x 4 x i1> %alltrue.ins, <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val1, ptr %a, i32 4, <vscale x 4 x i1> %alltrue)
				ret void
				}

				define void @dead_masked_store_same_mask_smaller_type(<vscale x 4 x i32> %val, <vscale x 4 x i16> %val1, ptr %a, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: dead_masked_store_same_mask_smaller_type:
				; CHECK: // %bb.0:
				; CHECK-NEXT: st1w { z0.s }, p0, [x0]
				; CHECK-NEXT: st1h { z1.s }, p0, [x0]
				; CHECK-NEXT: ret
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				ret void
				}

				define void @dead_masked_store_same_mask_bigger_type(<vscale x 4 x i16> %val, <vscale x 4 x i32> %val1, ptr %a, <vscale x 4 x i1> %mask) {
				; CHECK-LABEL: dead_masked_store_same_mask_bigger_type:
				; CHECK: // %bb.0:
				; CHECK-NEXT: st1h { z0.s }, p0, [x0]
				; CHECK-NEXT: st1w { z1.s }, p0, [x0]
				; CHECK-NEXT: ret
				call void @llvm.masked.store.nxv4i16(<vscale x 4 x i16> %val, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				call void @llvm.masked.store.nxv4i32(<vscale x 4 x i32> %val1, ptr %a, i32 4, <vscale x 4 x i1> %mask)
				ret void
				}

				declare void @llvm.masked.store.nxv4i16(<vscale x 4 x i16>, <vscale x 4 x i16>*, i32, <vscale x 4 x i1>)
				declare void @llvm.masked.store.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>*, i32, <vscale x 4 x i1>)
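
The six test functions above can be summarized by a small decision function. This is a hypothetical restatement of the condition added to visitMSTORE, not the LLVM implementation: `StoreInfo` and `canDropEarlierStore` are made-up names, the isSimple/isUnindexed/undef checks are omitted, and both stores are assumed to be scalable vectors so that comparing known-minimum store sizes is sufficient.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, flattened description of a masked store. The real code
// queries MaskedStoreSDNode instead.
struct StoreInfo {
  int basePtrId;               // equal ids stand for identical base pointers
  int maskId;                  // equal ids stand for identical mask values
  bool maskAllOnes;            // mask is a constant all-true splat
  std::uint64_t minStoreBytes; // known-minimum store size (times vscale)
};

// Returns true if the earlier store can be dropped because the later store
// to the same base pointer is known to overwrite everything it wrote.
bool canDropEarlierStore(const StoreInfo &earlier, const StoreInfo &later) {
  if (earlier.basePtrId != later.basePtrId)
    return false;
  // Same mask only helps when the lanes line up, i.e. equal store sizes.
  bool sameMaskSameSize = earlier.maskId == later.maskId &&
                          earlier.minStoreBytes == later.minStoreBytes;
  // An all-true later mask covers any earlier mask, but only if the later
  // store is at least as large as the earlier one.
  return (sameMaskSameSize || later.maskAllOnes) &&
         earlier.minStoreBytes <= later.minStoreBytes;
}
```

Taking nxv4i32 as 16 bytes and nxv4i16 as 8 bytes (times vscale), the function drops the earlier store for dead_masked_store, dead_masked_store_alltrue_same and dead_masked_store_alltrue_bigger, and keeps it for the remaining three tests, matching the CHECK lines.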

llvm/test/CodeGen/X86/masked_store.ll

(5,558 lines not shown)
 ; SSE4-NEXT: testb $8, %al
 ; SSE4-NEXT: je LBB30_16
 ; SSE4-NEXT: LBB30_15: ## %cond.store14
 ; SSE4-NEXT: extractps $3, %xmm1, 12(%rdi)
 ; SSE4-NEXT: retq
 ;
 ; AVX1OR2-LABEL: PR11210:
 ; AVX1OR2: ## %bb.0:
-; AVX1OR2-NEXT: vmaskmovps %xmm0, %xmm2, (%rdi)
 ; AVX1OR2-NEXT: vmaskmovps %xmm1, %xmm2, (%rdi)
 ; AVX1OR2-NEXT: retq
 ;
 ; AVX512F-LABEL: PR11210:
 ; AVX512F: ## %bb.0:
 ; AVX512F-NEXT: ## kill: def $xmm2 killed $xmm2 def $zmm2
 ; AVX512F-NEXT: ## kill: def $xmm1 killed $xmm1 def $zmm1
-; AVX512F-NEXT: ## kill: def $xmm0 killed $xmm0 def $zmm0
-; AVX512F-NEXT: vpxor %xmm3, %xmm3, %xmm3
-; AVX512F-NEXT: vpcmpgtd %zmm2, %zmm3, %k0
+; AVX512F-NEXT: vpxor %xmm0, %xmm0, %xmm0
+; AVX512F-NEXT: vpcmpgtd %zmm2, %zmm0, %k0
 ; AVX512F-NEXT: kshiftlw $12, %k0, %k0
 ; AVX512F-NEXT: kshiftrw $12, %k0, %k1
-; AVX512F-NEXT: vmovups %zmm0, (%rdi) {%k1}
 ; AVX512F-NEXT: vmovups %zmm1, (%rdi) {%k1}
 ; AVX512F-NEXT: vzeroupper
 ; AVX512F-NEXT: retq
 ;
 ; AVX512VLDQ-LABEL: PR11210:
 ; AVX512VLDQ: ## %bb.0:
 ; AVX512VLDQ-NEXT: vpmovd2m %xmm2, %k1
-; AVX512VLDQ-NEXT: vmovups %xmm0, (%rdi) {%k1}
 ; AVX512VLDQ-NEXT: vmovups %xmm1, (%rdi) {%k1}
 ; AVX512VLDQ-NEXT: retq
 ;
 ; AVX512VLBW-LABEL: PR11210:
 ; AVX512VLBW: ## %bb.0:
-; AVX512VLBW-NEXT: vpxor %xmm3, %xmm3, %xmm3
-; AVX512VLBW-NEXT: vpcmpgtd %xmm2, %xmm3, %k1
+; AVX512VLBW-NEXT: vpxor %xmm0, %xmm0, %xmm0
+; AVX512VLBW-NEXT: vpcmpgtd %xmm2, %xmm0, %k1
-; AVX512VLBW-NEXT: vmovups %xmm0, (%rdi) {%k1}
 ; AVX512VLBW-NEXT: vmovups %xmm1, (%rdi) {%k1}
 ; AVX512VLBW-NEXT: retq
 ;
 ; X86-AVX512-LABEL: PR11210:
 ; X86-AVX512: ## %bb.0:
 ; X86-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-AVX512-NEXT: vpmovd2m %xmm2, %k1
-; X86-AVX512-NEXT: vmovups %xmm0, (%eax) {%k1}
 ; X86-AVX512-NEXT: vmovups %xmm1, (%eax) {%k1}
 ; X86-AVX512-NEXT: retl
   %bc = bitcast <2 x i64> %mask to <4 x i32>
   %trunc = icmp slt <4 x i32> %bc, zeroinitializer
   call void @llvm.masked.store.v4f32.p0(<4 x float> %x, ptr %ptr, i32 1, <4 x i1> %trunc)
   call void @llvm.masked.store.v4f32.p0(<4 x float> %y, ptr %ptr, i32 1, <4 x i1> %trunc)
   ret void
 }
(858 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Allow DAGCombine to remove dead masked stores
ClosedPublic
