This is an archive of the discontinued LLVM Phabricator instance.

Differential D122703

[AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD
Closed · Public

Authored by Allen on Mar 29 2022, 10:09 PM.

Details

Reviewers

paulwalker-arm
david-arm
fhahn
craig.topper

Commits

rG19e523514714: [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD

Summary

Following the discussion in D122281, we are missing an ISD::AND combine for MLOAD
because the existing combine relies on BuildVectorSDNode, which fails for scalable vectors.
This patch is intended to handle that case, so that the MVT::nxv2i32 load-extension entry can be rolled back.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Allen created this revision. Mar 29 2022, 10:09 PM

Herald added a project: Restricted Project. Mar 29 2022, 10:09 PM

Herald added subscribers: StephenFan, hiraditya, kristof.beyls.

Allen requested review of this revision. Mar 29 2022, 10:09 PM

Herald added a subscriber: llvm-commits.

paulwalker-arm added inline comments. Mar 30 2022, 3:03 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14425–14426	The masked load's MemVT is not sufficient, as you must also consider the load's extension type. As it stands, this transform is only safe in the following cases:
(1) for a zero-extending masked load;
(2) for an any-extending masked load, where the extension can be changed to zero-extending;
(3) for a single-use sign-extending masked load, where the extension can be changed to zero-extending.
How far down this path you go depends on the exact use case you need. If it's just (1) then I think this can be a common DAGCombine. For (2) and (3) it'll depend on whether there are sufficient TLI hooks to ensure you don't create a version a target cannot lower. That said, given you require the input to be an extending masked load, perhaps that is not a problem.
One final issue is the handling of the masked load's pass-through value. With the `AND` in place this will be zero-extended, but once the `AND` is removed that stops, so it will need to be considered as well. My guess is that for today's use case you want to restrict the combine to instances where the pass-through value is either undef or zero.
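
To make the extension-type and pass-through concerns above concrete, here is a minimal scalar sketch of a single lane. It assumes an i16 memory element extended into an i64 register element; `ExtKind`, `extendLane`, `loadThenAnd`, `foldedZextLoad` and `checkLaneModel` are hypothetical names used only for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one lane of an extending masked load (i16 -> i64).
enum class ExtKind { Zero, Sign };

// Value of an active lane after the extending load.
inline std::uint64_t extendLane(std::uint16_t Mem, ExtKind Kind) {
  if (Kind == ExtKind::Zero)
    return Mem;
  // Sign-extend by going through the signed 16-bit and 64-bit types.
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int16_t>(Mem)));
}

// One lane of (and (masked_load ...), splat(0xFFFF)).
inline std::uint64_t loadThenAnd(bool Active, std::uint16_t Mem,
                                 std::uint64_t PassThru, ExtKind Kind) {
  std::uint64_t Loaded = Active ? extendLane(Mem, Kind) : PassThru;
  return Loaded & 0xFFFFu;
}

// One lane of the folded zero-extending masked load, with the AND removed.
inline std::uint64_t foldedZextLoad(bool Active, std::uint16_t Mem,
                                    std::uint64_t PassThru) {
  return Active ? extendLane(Mem, ExtKind::Zero) : PassThru;
}

inline void checkLaneModel() {
  // Active lane: masking a sign-extended value yields the zero-extended value.
  assert(loadThenAnd(true, 0x8001, 0, ExtKind::Sign) ==
         foldedZextLoad(true, 0x8001, 0));
  // Inactive lane with a zero pass-through: both forms agree.
  assert(loadThenAnd(false, 0, 0, ExtKind::Zero) ==
         foldedZextLoad(false, 0, 0));
  // Inactive lane with a wide pass-through: the AND truncates it, the folded
  // load does not, so the fold would change the result.
  assert(loadThenAnd(false, 0, 0x12345, ExtKind::Zero) !=
         foldedZextLoad(false, 0, 0x12345));
}
```

Calling `checkLaneModel()` exercises all three situations. The last assertion is the reason for suggesting that the combine be restricted to an undef or zero pass-through value.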

Harbormaster completed remote builds in B156884: Diff 419047. Mar 30 2022, 1:47 PM

Allen added a comment.

Hi @paulwalker-arm,

While debugging DAGCombiner::visitAND, I found another issue that needs checking. It uses TLI.isLoadExtLegal(ISD::ZEXTLOAD, ExtVT, LoadVT) to decide whether the following transform can be done:

fold (and (masked_load) (build_vec (x, ...))) to zext_masked_load

Do we need a similar check in the AArch64 backend? If so, I think we must mark MVT::nxv2i32 as legal.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14425–14426	Thanks @paulwalker-arm for the detailed comment. I have a question: both (2) any-extension and (3) sign-extension can be changed to zero-extending, so why do we need to check the load's extension type?

In D122703#3418244, @Allen wrote:
Hi @paulwalker-arm,
While debugging DAGCombiner::visitAND, I found another issue that needs checking.
It uses TLI.isLoadExtLegal(ISD::ZEXTLOAD, ExtVT, LoadVT) to decide whether the transform can be done. Do we need a similar check in the AArch64 backend? If so, I think we must mark MVT::nxv2i32 as legal.

Not sure I understand what you mean here. ExtVT will be the legal type, most likely MVT::nxv2i64, whereas MVT::nxv2i32 will be the MemVT, which doesn't need to be legal, so I don't see an issue? Looking at DAGCombiner::visitAND, it already does a related transformation, but because it relies on BuildVectorSDNode it fails for scalable vectors. To me it looks like we can make this common code more portable and support all vector types.
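
The distinction between the register type (ExtVT) and the memory type (MemVT) can be sketched with a toy legality table. This is only an illustration under stated assumptions: `LoadExtTable`, `setLegal`, `isLegal` and `makeExampleTable` are hypothetical names, types are modelled as plain strings, and only the zero-extending nxv2i64 entries from this revision's AArch64ISelLowering.cpp hunk are populated. The real hooks are `setLoadExtAction` and `TLI.isLoadExtLegal`.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for a target's load-extension legality table.
class LoadExtTable {
public:
  // Record that loading MemVT from memory and extending to ResultVT is legal.
  void setLegal(const std::string &Ext, const std::string &ResultVT,
                const std::string &MemVT) {
    Legal.insert(std::make_tuple(Ext, ResultVT, MemVT));
  }

  // Query by (extension kind, register type, memory type).
  bool isLegal(const std::string &Ext, const std::string &ResultVT,
               const std::string &MemVT) const {
    return Legal.count(std::make_tuple(Ext, ResultVT, MemVT)) != 0;
  }

private:
  std::set<std::tuple<std::string, std::string, std::string>> Legal;
};

// Populate only the zero-extending entries whose result type is nxv2i64.
inline LoadExtTable makeExampleTable() {
  LoadExtTable T;
  for (const char *MemVT : {"nxv2i8", "nxv2i16", "nxv2i32"})
    T.setLegal("zext", "nxv2i64", MemVT);
  // nxv2i32 appears here only as a memory type, never as a result type.
  assert(T.isLegal("zext", "nxv2i64", "nxv2i32"));
  return T;
}
```

In this model a query such as `isLegal("zext", "nxv2i64", "nxv2i32")` succeeds even though nxv2i32 never appears as a result type, which is the point being made above about MemVT not needing to be a legal type.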

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14425–14426	The need to check the extension type depends on which of the options you go with. If you only implement (1) then you need to check the extension type is explicitly zero-extending. Options (1) & (2) require you to skip sign-extending cases and for (3) you'll still need a use count check for the sign-extending case.

Rewrote the patch to support splat_vec in DAGCombiner::visitAND.

Herald added a subscriber: ecnelises. Apr 1 2022, 2:56 AM

In D122703#3419784, @paulwalker-arm wrote:
In D122703#3418244, @Allen wrote:
Hi @paulwalker-arm,
While debugging DAGCombiner::visitAND, I found another issue that needs checking.
It uses TLI.isLoadExtLegal(ISD::ZEXTLOAD, ExtVT, LoadVT) to decide whether the transform can be done. Do we need a similar check in the AArch64 backend? If so, I think we must mark MVT::nxv2i32 as legal.
Not sure I understand what you mean here. ExtVT will be the legal type, most likely MVT::nxv2i64, whereas MVT::nxv2i32 will be the MemVT, which doesn't need to be legal, so I don't see an issue? Looking at DAGCombiner::visitAND, it already does a related transformation, but because it relies on BuildVectorSDNode it fails for scalable vectors. To me it looks like we can make this common code more portable and support all vector types.

I rewrote the revision following the review above, adding splat_vec support in DAGCombiner::visitAND. Thanks very much, @paulwalker-arm, for the detailed advice.

Harbormaster completed remote builds in B157371: Diff 419693. Apr 1 2022, 3:50 AM

paulwalker-arm added inline comments. Apr 4 2022, 4:54 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6070	If you follow the advice below, I'd also remove the `build_vec` comment, as the `splat_vec` one is meaningful enough in itself.
6072	You could simplify the code by extracting the splat here, i.e. `ConstantSDNode *Splat = isConstOrConstSplat(N1, true, true)`. That would remove the separate buildvector and splatvector checks and also remove a level of indentation in the `TLI.isLoadExtLegal` block.

Updated according to the review.

Herald added a subscriber: alextsao1999. Apr 5 2022, 6:42 PM

Harbormaster completed remote builds in B158103: Diff 420680. Apr 5 2022, 7:28 PM

paulwalker-arm accepted this revision. Apr 6 2022, 5:20 AM

paulwalker-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6070	Sorry I think I wasn't clear enough with my previous comment. I meant that it is not worth having both the original build_vec and your new splat_vec comment and so it would be better to just keep the splat_vec one (i.e. `fold (and (masked_load) (splat_vec (x, ...))) to zext_masked_load`)

This revision is now accepted and ready to land. Apr 6 2022, 5:20 AM

This revision was landed with ongoing or failed builds. Apr 6 2022, 5:54 AM

Closed by commit rG19e523514714: [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD (authored by Allen).

This revision was automatically updated to reflect the committed changes.

Allen marked an inline comment as done.

Allen added a commit: rG19e523514714: [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	26 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	1 line
llvm/test/CodeGen/AArch64/sve-masked-ldst-zext.ll	2 lines

Diff 420815

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

     if (ISD::isConstantSplatVectorAllZeros(N1.getNode()))
       // do not return N1, because undef node may exist in N1
       return DAG.getConstant(APInt::getZero(N1.getScalarValueSizeInBits()),
                              SDLoc(N), N1.getValueType());
 
     // fold (and x, -1) -> x, vector edition
     if (ISD::isConstantSplatVectorAllOnes(N1.getNode()))
       return N0;
 
-    // fold (and (masked_load) (build_vec (x, ...))) to zext_masked_load
+    // fold (and (masked_load) (splat_vec (x, ...))) to zext_masked_load
     auto *MLoad = dyn_cast<MaskedLoadSDNode>(N0);
-    auto *BVec = dyn_cast<BuildVectorSDNode>(N1);
-    if (MLoad && BVec && MLoad->getExtensionType() == ISD::EXTLOAD &&
-        N0.hasOneUse() && N1.hasOneUse()) {
+    ConstantSDNode *Splat = isConstOrConstSplat(N1, true, true);
+    if (MLoad && MLoad->getExtensionType() == ISD::EXTLOAD && N0.hasOneUse() &&
+        Splat && N1.hasOneUse()) {
       EVT LoadVT = MLoad->getMemoryVT();
       EVT ExtVT = VT;
       if (TLI.isLoadExtLegal(ISD::ZEXTLOAD, ExtVT, LoadVT)) {
         // For this AND to be a zero extension of the masked load the elements
         // of the BuildVec must mask the bottom bits of the extended element
         // type
-        if (ConstantSDNode *Splat = BVec->getConstantSplatNode()) {
         uint64_t ElementSize =
             LoadVT.getVectorElementType().getScalarSizeInBits();
         if (Splat->getAPIntValue().isMask(ElementSize)) {
           return DAG.getMaskedLoad(
               ExtVT, SDLoc(N), MLoad->getChain(), MLoad->getBasePtr(),
               MLoad->getOffset(), MLoad->getMask(), MLoad->getPassThru(),
               LoadVT, MLoad->getMemOperand(), MLoad->getAddressingMode(),
               ISD::ZEXTLOAD, MLoad->isExpandingLoad());
         }
-        }
       }
     }
   }
 
   // fold (and x, -1) -> x
   if (isAllOnesConstant(N1))
     return N0;
 
   // if (and x, c) is known to be zero, return 0
   unsigned BitWidth = VT.getScalarSizeInBits();
   ConstantSDNode *N1C = isConstOrConstSplat(N1);
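
The final guard in the hunk above asks whether the splat constant is exactly a mask of the low `ElementSize` bits. The following standalone sketch shows that test on a plain 64-bit integer. `isLowBitMask` is a hypothetical helper written for illustration; the patch itself uses `APInt::isMask`.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when Value has exactly its low NumBits bits set and every
// higher bit clear. Hypothetical helper, restricted to 64-bit values.
inline bool isLowBitMask(std::uint64_t Value, unsigned NumBits) {
  assert(NumBits >= 1 && NumBits <= 64 && "mask width out of range");
  // Shifting a 64-bit value by 64 is undefined, so handle that width apart.
  std::uint64_t Expected = NumBits == 64
                               ? ~std::uint64_t{0}
                               : (std::uint64_t{1} << NumBits) - 1;
  return Value == Expected;
}
```

For an i16 memory element, `isLowBitMask(0xFFFF, 16)` holds, whereas `0xFF` (too narrow) and `0x1FFFF` (too wide) do not, so an AND with either of those constants would not correspond to a plain zero extension of the loaded element.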

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ if (Subtarget->hasSVE()) {
     }
 
     // Then, selectively enable those which we directly support.
     for (auto Op : {ISD::ZEXTLOAD, ISD::SEXTLOAD, ISD::EXTLOAD}) {
       setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i8, Legal);
       setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i16, Legal);
       setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i32, Legal);
       setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i8, Legal);
-      setLoadExtAction(Op, MVT::nxv2i32, MVT::nxv2i16, Legal);
       setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i16, Legal);
       setLoadExtAction(Op, MVT::nxv8i16, MVT::nxv8i8, Legal);
     }
 
     // SVE supports truncating stores of 64 and 128-bit vectors
     setTruncStoreAction(MVT::v2i64, MVT::v2i8, Custom);
     setTruncStoreAction(MVT::v2i64, MVT::v2i16, Custom);
     setTruncStoreAction(MVT::v2i64, MVT::v2i32, Custom);
@@ case AArch64ISD::LDFF1_MERGE_ZERO:
     break;
   case AArch64ISD::GLD1_MERGE_ZERO:
   case AArch64ISD::GLD1_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1_SXTW_MERGE_ZERO:
   case AArch64ISD::GLD1_SXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1_UXTW_MERGE_ZERO:
   case AArch64ISD::GLD1_UXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLD1_IMM_MERGE_ZERO:
   case AArch64ISD::GLDFF1_MERGE_ZERO:
   case AArch64ISD::GLDFF1_SCALED_MERGE_ZERO:
   case AArch64ISD::GLDFF1_SXTW_MERGE_ZERO:
   case AArch64ISD::GLDFF1_SXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLDFF1_UXTW_MERGE_ZERO:
   case AArch64ISD::GLDFF1_UXTW_SCALED_MERGE_ZERO:
   case AArch64ISD::GLDFF1_IMM_MERGE_ZERO:
   case AArch64ISD::GLDNT1_MERGE_ZERO:
     MemVT = cast<VTSDNode>(Src->getOperand(4))->getVT();
     break;

llvm/test/CodeGen/AArch64/sve-masked-ldst-zext.ll

@@ ; CHECK-NEXT: ret
   ret <vscale x 8 x i64> %ext
 }
 
 ; Masked load requires promotion
 define <vscale x 2 x double> @masked_zload_2i16_2f64(<vscale x 2 x i16>* noalias %in, <vscale x 2 x i1> %mask) {
 ; CHECK-LABEL: masked_zload_2i16_2f64:
 ; CHECK: ld1h { z0.d }, p0/z, [x0]
 ; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: ucvtf z0.d, p0/m, z0.s
+; CHECK-NEXT: ucvtf z0.d, p0/m, z0.d
 ; CHECK-NEXT: ret
   %wide.load = call <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>* %in, i32 2, <vscale x 2 x i1> %mask, <vscale x 2 x i16> undef)
   %zext = zext <vscale x 2 x i16> %wide.load to <vscale x 2 x i32>
   %res = uitofp <vscale x 2 x i32> %zext to <vscale x 2 x double>
   ret <vscale x 2 x double> %res
 }
 
 declare <vscale x 2 x i8> @llvm.masked.load.nxv2i8(<vscale x 2 x i8>*, i32, <vscale x 2 x i1>, <vscale x 2 x i8>)
 declare <vscale x 2 x i16> @llvm.masked.load.nxv2i16(<vscale x 2 x i16>*, i32, <vscale x 2 x i1>, <vscale x 2 x i16>)
 declare <vscale x 2 x i32> @llvm.masked.load.nxv2i32(<vscale x 2 x i32>*, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)
 declare <vscale x 4 x i8> @llvm.masked.load.nxv4i8(<vscale x 4 x i8>*, i32, <vscale x 4 x i1>, <vscale x 4 x i8>)
 declare <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>*, i32, <vscale x 4 x i1>, <vscale x 4 x i16>)
 declare <vscale x 8 x i8> @llvm.masked.load.nxv8i8(<vscale x 8 x i8>*, i32, <vscale x 8 x i1>, <vscale x 8 x i8>)
 declare <vscale x 8 x i16> @llvm.masked.load.nxv8i16(<vscale x 8 x i16>*, i32, <vscale x 8 x i1>, <vscale x 8 x i16>)