
Details

Reviewers

craig.topper
mehdi_amini
nicolasvasilache

Commits

rG907871d9ad2d: [llvm] [CodeGen] Fixed vector halving bug for masked load

Summary

Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should eventually
yield VL=8/6. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().
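
The arithmetic in the summary can be sketched in isolation. This is a minimal stand-alone model, not LLVM code: vector types are reduced to plain element counts, and `naiveHalve` and `envelopedSplit` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model: {lo element count, hi element count}.
using Split = std::pair<unsigned, unsigned>;

// Halves the custom length directly. This ignores how the enveloping type
// was split and is only well defined for even lengths.
Split naiveHalve(unsigned VL) {
  assert(VL % 2 == 0 && "odd lengths cannot be halved evenly");
  return {VL / 2, VL / 2};
}

// Splits relative to the enveloping type: the low part takes as many
// elements as one enveloping half can hold, the high part gets the rest.
Split envelopedSplit(unsigned VL, unsigned EnvVL) {
  assert(EnvVL % 2 == 0 && VL <= EnvVL);
  unsigned Half = EnvVL / 2;
  unsigned Lo = VL < Half ? VL : Half;
  return {Lo, VL - Lo};
}
```

Under these assumptions, `naiveHalve(14)` gives 7/7, which does not line up with the 8/8 halves of the enveloping type, while `envelopedSplit(14, 16)` gives 8/6.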

Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.

Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	170 ms	lldb-unit.Host/_/HostTests::Unknown Unit Message ("")

Event Timeline

aartbik created this revision. Apr 21 2020, 8:35 PM

Herald added a project: Restricted Project. Apr 21 2020, 8:35 PM

Herald added subscribers: llvm-commits, dmgreen, hiraditya.

aartbik added reviewers: craig.topper, mehdi_amini, nicolasvasilache. Apr 21 2020, 8:36 PM

Harbormaster failed remote builds in B54188: Diff 259156! Apr 21 2020, 9:37 PM

I'm concerned that there might be a way to get HiMemVT to contain 0 elements. I think v17i32 would trigger it. We'd widen to v32i32, then split to v16i32 which will have a memvt of v1i32 and v16i32. Then we'll split the v16i32 pieces. And now we'll try to split that v1i32 memory vt.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9468	EVTs should be passed by value.

In D78608#1996059, @craig.topper wrote:

I'm concerned that there might be a way to get HiMemVT to contain 0 elements. I think v17i32 would trigger it. We'd widen to v32i32, then split to v16i32 which will have a memvt of v1i32 and v16i32. Then we'll split the v16i32 pieces. And now we'll try to split that v1i32 memory vt.

But do you agree this solution goes in the right direction?
Do you have any suggestions for the v17i32 case?
Happy to test that tomorrow and think about it some more.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9468	I mimicked the prototypes of related methods in this file. Are those wrong too? Happy to change, just checking what the best practices are here.

In D78608#1996085, @aartbik wrote:

In D78608#1996059, @craig.topper wrote:

I'm concerned that there might be a way to get HiMemVT to contain 0 elements. I think v17i32 would trigger it. We'd widen to v32i32, then split to v16i32 which will have a memvt of v1i32 and v16i32. Then we'll split the v16i32 pieces. And now we'll try to split that v1i32 memory vt.

But do you agree this solution goes in the right direction?
Do you have any suggestions for the v17i32 case?
Happy to test that tomorrow and think about it some more.

I wonder if we should fix this in WidenVecRes_MLOAD by doing something closer to WidenVecRes_LOAD. Finding a legal type and slicing up the vector as we widen it.

do not generate zero storage size masked loads

PTAL

You had a very sharp eye that v17f32 would cause a split with an empty hi masked load eventually.
It is somewhat harmless since the load is not used and thus removed from the DAG, but this revision
is even cleaner by not generating the hi masked load at all!

removed left behind debug code

craig.topper added inline comments. Apr 22 2020, 7:39 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9526	Stray 'k' got added to this line.

fixed some stray diffs

aartbik marked an inline comment as done. Apr 22 2020, 8:03 PM

Harbormaster failed remote builds in B54336: Diff 259456! Apr 22 2020, 8:38 PM

Harbormaster failed remote builds in B54340: Diff 259461! Apr 22 2020, 9:10 PM

Harbormaster failed remote builds in B54337: Diff 259457!

dmgreen added inline comments. Apr 22 2020, 10:55 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1596	Could we return PassThruHi directly?
llvm/test/CodeGen/X86/pr45563-2.ll
1	Please use update_llc_test_checks on tests this size

aartbik marked 3 inline comments as done. Apr 22 2020, 11:38 PM

aartbik added inline comments.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1596	I experimented with different values; this seemed the smallest code change, since it folds directly into the DAG.getNode below and is removed right after that. Anything else would require more changes to deal with the 1 vs 2 result values.
llvm/test/CodeGen/X86/pr45563-2.ll
1	Ah, I saw some auto generated messages, but was not aware of this tool. But, forgive my noobness, what will this auto-generate? I am really just counting the number of masked moves that result, I don't want to pin down the full assembly?

craig.topper added inline comments. Apr 23 2020, 12:11 AM

llvm/test/CodeGen/X86/pr45563-2.ll
1	It will generate checks for the full assembly, but the X86 maintainers are used to that. We'll just rerun the script and look at the diff if it fails for some change in the future.

used update_llc_test_checks (cool!)

aartbik added inline comments. Apr 23 2020, 11:49 AM

llvm/test/CodeGen/X86/pr45563-2.ll
1	Nice utility! Having the full assembly makes it a bit sensitive to changes, but I can also see that the X86 maintainers want to be aware of even the slightest of changes and with this utility that becomes a lot simpler. Cool!

Harbormaster failed remote builds in B54448: Diff 259647! Apr 23 2020, 2:08 PM

LGTM

This revision is now accepted and ready to land. Apr 23 2020, 2:21 PM

Closed by commit rG907871d9ad2d: [llvm] [CodeGen] Fixed vector halving bug for masked load (authored by aartbik). Apr 23 2020, 3:15 PM

This revision was automatically updated to reflect the committed changes.

Diff 259461

llvm/include/llvm/CodeGen/SelectionDAG.h

@@ if (auto A = InferPtrAlign(Ptr))
       return A->value();
     return 0;
   }

   /// Compute the VTs needed for the low/hi parts of a type
   /// which is split (or expanded) into two not necessarily identical pieces.
   std::pair<EVT, EVT> GetSplitDestVTs(const EVT &VT) const;

+  /// Compute the VTs needed for the low/hi parts of a type, dependent on an
+  /// enveloping VT that has been split into two identical pieces. Sets the
+  /// HiIsEmpty flag when hi type has zero storage size.
+  std::pair<EVT, EVT> GetDependentSplitDestVTs(const EVT &VT, const EVT &EnvVT,
+                                               bool *HiIsEmpty) const;
+
   /// Split the vector with EXTRACT_SUBVECTOR using the provides
   /// VTs and return the low/high part.
   std::pair<SDValue, SDValue> SplitVector(const SDValue &N, const SDLoc &DL,
                                           const EVT &LoVT, const EVT &HiVT);

   /// Split the vector with EXTRACT_SUBVECTOR and return the low/high part.
   std::pair<SDValue, SDValue> SplitVector(const SDValue &N, const SDLoc &DL) {
     EVT LoVT, HiVT;

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

@@ if (Mask.getOpcode() == ISD::SETCC) {
     if (getTypeAction(Mask.getValueType()) == TargetLowering::TypeSplitVector)
       GetSplitVector(Mask, MaskLo, MaskHi);
     else
       std::tie(MaskLo, MaskHi) = DAG.SplitVector(Mask, dl);
   }

   EVT MemoryVT = MLD->getMemoryVT();
   EVT LoMemVT, HiMemVT;
-  std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MemoryVT);
+  bool HiIsEmpty = false;
+  std::tie(LoMemVT, HiMemVT) =
+      DAG.GetDependentSplitDestVTs(MemoryVT, LoVT, &HiIsEmpty);

   SDValue PassThruLo, PassThruHi;
   if (getTypeAction(PassThru.getValueType()) == TargetLowering::TypeSplitVector)
     GetSplitVector(PassThru, PassThruLo, PassThruHi);
   else
     std::tie(PassThruLo, PassThruHi) = DAG.SplitVector(PassThru, dl);

   MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
       MLD->getPointerInfo(), MachineMemOperand::MOLoad, LoMemVT.getStoreSize(),
       Alignment, MLD->getAAInfo(), MLD->getRanges());

   Lo = DAG.getMaskedLoad(LoVT, dl, Ch, Ptr, Offset, MaskLo, PassThruLo, LoMemVT,
                          MMO, MLD->getAddressingMode(), ExtType,
                          MLD->isExpandingLoad());

-  Ptr = TLI.IncrementMemoryAddress(Ptr, MaskLo, dl, LoMemVT, DAG,
-                                   MLD->isExpandingLoad());
-  unsigned HiOffset = LoMemVT.getStoreSize();
+  if (HiIsEmpty) {
+    // The hi masked load has zero storage size. We therefore simply set
+    // it to the low masked load and rely on subsequent optimization to
+    // remove the unused element in the chain.
+    Hi = Lo;
+  } else {
+    // Generate hi masked load.
+    Ptr = TLI.IncrementMemoryAddress(Ptr, MaskLo, dl, LoMemVT, DAG,
+                                     MLD->isExpandingLoad());
+    unsigned HiOffset = LoMemVT.getStoreSize();

-  MMO = DAG.getMachineFunction().getMachineMemOperand(
-      MLD->getPointerInfo().getWithOffset(HiOffset), MachineMemOperand::MOLoad,
-      HiMemVT.getStoreSize(), Alignment, MLD->getAAInfo(), MLD->getRanges());
+    MMO = DAG.getMachineFunction().getMachineMemOperand(
+        MLD->getPointerInfo().getWithOffset(HiOffset),
+        MachineMemOperand::MOLoad, HiMemVT.getStoreSize(), Alignment,
+        MLD->getAAInfo(), MLD->getRanges());

-  Hi = DAG.getMaskedLoad(HiVT, dl, Ch, Ptr, Offset, MaskHi, PassThruHi, HiMemVT,
-                         MMO, MLD->getAddressingMode(), ExtType,
-                         MLD->isExpandingLoad());
+    Hi = DAG.getMaskedLoad(HiVT, dl, Ch, Ptr, Offset, MaskHi, PassThruHi,
+                           HiMemVT, MMO, MLD->getAddressingMode(), ExtType,
+                           MLD->isExpandingLoad());
+  }

   // Build a factor node to remember that this load is independent of the
   // other one.
   Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
                    Hi.getValue(1));

   // Legalize the chain result - switch anything that used the old chain to
   // use the new one.
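
The control flow added in this hunk can be modeled outside of LLVM. The following is a minimal sketch, assuming a non-expanding load and a fixed element size in bytes; each masked load is reduced to a (byte offset, element count) record. `PieceLoad` and `splitMaskedLoad` are hypothetical names, not SelectionDAG APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one emitted masked load.
struct PieceLoad {
  unsigned ByteOffset; // offset from the original base pointer
  unsigned NumElts;    // number of elements covered by the memory type
};

// Models one split step: MemElts elements in memory, EnvHalfElts elements
// per half of the enveloping register type, EltBytes bytes per element.
std::vector<PieceLoad> splitMaskedLoad(unsigned MemElts, unsigned EnvHalfElts,
                                       unsigned EltBytes) {
  assert(MemElts > 0 && EnvHalfElts > 0 && MemElts <= 2 * EnvHalfElts);
  bool HiIsEmpty = MemElts <= EnvHalfElts;
  unsigned LoElts = HiIsEmpty ? MemElts : EnvHalfElts;

  std::vector<PieceLoad> Loads;
  Loads.push_back({0, LoElts}); // the lo load is always generated
  if (!HiIsEmpty) {
    // The hi load starts right after the bytes covered by the lo memory type.
    unsigned HiOffset = LoElts * EltBytes;
    Loads.push_back({HiOffset, MemElts - LoElts});
  }
  return Loads;
}
```

With 4-byte elements, `splitMaskedLoad(14, 8, 4)` produces a lo piece of 8 elements at offset 0 and a hi piece of 6 elements at offset 32, whereas `splitMaskedLoad(8, 8, 4)` produces only the lo piece. The real code keeps a placeholder `Hi = Lo` in the empty case so that the TokenFactor below still has two operands; this model simply omits the second record.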

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ std::pair<EVT, EVT> SelectionDAG::GetSplitDestVTs(const EVT &VT) const {
   if (!VT.isVector())
     LoVT = HiVT = TLI->getTypeToTransformTo(*getContext(), VT);
   else
     LoVT = HiVT = VT.getHalfNumVectorElementsVT(*getContext());

   return std::make_pair(LoVT, HiVT);
 }

+/// GetDependentSplitDestVTs - Compute the VTs needed for the low/hi parts of a
+/// type, dependent on an enveloping VT that has been split into two identical
+/// pieces. Sets the HiIsEmpty flag when hi type has zero storage size.
+std::pair<EVT, EVT>
+SelectionDAG::GetDependentSplitDestVTs(const EVT &VT, const EVT &EnvVT,
+                                       bool *HiIsEmpty) const {
+  EVT EltTp = VT.getVectorElementType();
+  bool IsScalable = VT.isScalableVector();
+  // Examples:
+  // custom VL=8 with enveloping VL=8/8 yields 8/0 (hi empty)
+  // custom VL=9 with enveloping VL=8/8 yields 8/1
+  // custom VL=10 with enveloping VL=8/8 yields 8/2
+  // etc.
+  unsigned VTNumElts = VT.getVectorNumElements();
+  unsigned EnvNumElts = EnvVT.getVectorNumElements();
+  EVT LoVT, HiVT;
+  if (VTNumElts > EnvNumElts) {
+    LoVT = EnvVT;
+    HiVT = EVT::getVectorVT(*getContext(), EltTp, VTNumElts - EnvNumElts,
+                            IsScalable);
+    *HiIsEmpty = false;
+  } else {
+    // Flag that hi type has zero storage size, but return split envelop type
+    // (this would be easier if vector types with zero elements were allowed).
+    LoVT = EVT::getVectorVT(*getContext(), EltTp, VTNumElts, IsScalable);
+    HiVT = EnvVT;
+    *HiIsEmpty = true;
+  }
+  return std::make_pair(LoVT, HiVT);
+}
+
 /// SplitVector - Split the vector with EXTRACT_SUBVECTOR and return the
 /// low/high part.
 std::pair<SDValue, SDValue>
 SelectionDAG::SplitVector(const SDValue &N, const SDLoc &DL, const EVT &LoVT,
                           const EVT &HiVT) {
   assert(LoVT.getVectorNumElements() + HiVT.getVectorNumElements() <=
              N.getValueType().getVectorNumElements() &&
          "More vector elements requested than available!");
@@ (15 lines collapsed)
 }

 void SelectionDAG::ExtractVectorElements(SDValue Op,
                                          SmallVectorImpl<SDValue> &Args,
                                          unsigned Start, unsigned Count,
                                          EVT EltVT) {
   EVT VT = Op.getValueType();
   if (Count == 0)
     Count = VT.getVectorNumElements();
   if (EltVT == EVT())
     EltVT = VT.getVectorElementType();
   SDLoc SL(Op);
   for (unsigned i = Start, e = Start + Count; i != e; ++i) {
     Args.push_back(getNode(ISD::EXTRACT_VECTOR_ELT, SL, EltVT, Op,
                            getVectorIdxConstant(i, SL)));
   }
 }

llvm/test/CodeGen/X86/pr45563-2.ll

This file was added.

; RUN: llc < %s -O3 -mattr=avx | FileCheck %s

; Bug 45563:
; The SplitVecRes_MLOAD method should split an extended value type
; according to the halving of the enveloping type to avoid all sorts
; of inconsistencies downstream. For example for an extended value type
; with VL=14 and enveloping type VL=16 that is split 8/8, the extended
; type should be split 8/6 and not 7/7. This also accounts for hi masked
; loads that get zero storage size (and are unused).

define <9 x float> @mload_split9(<9 x i1> %mask, <9 x float>* %addr, <9 x float> %dst) {
; CHECK-LABEL: mload_split9:
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK-NOT: vmaskmovps
  %res = call <9 x float> @llvm.masked.load.v9f32.p0v9f32(<9 x float>* %addr, i32 4, <9 x i1>%mask, <9 x float> %dst)
  ret <9 x float> %res
}

define <13 x float> @mload_split13(<13 x i1> %mask, <13 x float>* %addr, <13 x float> %dst) {
; CHECK-LABEL: mload_split13:
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK-NOT: vmaskmovps
  %res = call <13 x float> @llvm.masked.load.v13f32.p0v13f32(<13 x float>* %addr, i32 4, <13 x i1>%mask, <13 x float> %dst)
  ret <13 x float> %res
}

define <14 x float> @mload_split14(<14 x i1> %mask, <14 x float>* %addr, <14 x float> %dst) {
; CHECK-LABEL: mload_split14:
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK-NOT: vmaskmovps
  %res = call <14 x float> @llvm.masked.load.v14f32.p0v14f32(<14 x float>* %addr, i32 4, <14 x i1>%mask, <14 x float> %dst)
  ret <14 x float> %res
}

define <17 x float> @mload_split17(<17 x i1> %mask, <17 x float>* %addr, <17 x float> %dst) {
; CHECK-LABEL: mload_split17:
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK-NOT: vmaskmovps
  %res = call <17 x float> @llvm.masked.load.v17f32.p0v17f32(<17 x float>* %addr, i32 4, <17 x i1>%mask, <17 x float> %dst)
  ret <17 x float> %res
}

define <23 x float> @mload_split23(<23 x i1> %mask, <23 x float>* %addr, <23 x float> %dst) {
; CHECK-LABEL: mload_split23:
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK: vmaskmovps
; CHECK-NOT: vmaskmovps
  %res = call <23 x float> @llvm.masked.load.v23f32.p0v23f32(<23 x float>* %addr, i32 4, <23 x i1>%mask, <23 x float> %dst)
  ret <23 x float> %res
}

declare <9 x float> @llvm.masked.load.v9f32.p0v9f32(<9 x float>* %addr, i32 %align, <9 x i1> %mask, <9 x float> %dst)
declare <13 x float> @llvm.masked.load.v13f32.p0v13f32(<13 x float>* %addr, i32 %align, <13 x i1> %mask, <13 x float> %dst)
declare <14 x float> @llvm.masked.load.v14f32.p0v14f32(<14 x float>* %addr, i32 %align, <14 x i1> %mask, <14 x float> %dst)
declare <17 x float> @llvm.masked.load.v17f32.p0v17f32(<17 x float>* %addr, i32 %align, <17 x i1> %mask, <17 x float> %dst)
declare <23 x float> @llvm.masked.load.v23f32.p0v23f32(<23 x float>* %addr, i32 %align, <23 x i1> %mask, <23 x float> %dst)
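
The number of vmaskmovps checks per function follows from applying the dependent split repeatedly. Below is a rough model, assuming the vector is first widened to the next power of two (as described in the review discussion for v17) and then split until each piece fits an 8-element register (256-bit AVX with 32-bit floats). `nextPowerOf2`, `countPieces` and `countMaskedLoads` are hypothetical helpers, and the real legalizer involves more steps than this.

```cpp
#include <cassert>

// Smallest power of two that is >= N (for N > 0).
unsigned nextPowerOf2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P *= 2;
  return P;
}

// Counts the non-empty pieces when MemElts memory elements live in an
// enveloping type of EnvElts elements and the largest legal type holds
// LegalElts elements.
unsigned countPieces(unsigned MemElts, unsigned EnvElts, unsigned LegalElts) {
  if (MemElts == 0)
    return 0; // empty hi part: no load is generated
  if (EnvElts <= LegalElts)
    return 1; // fits a legal register: one masked load
  unsigned Half = EnvElts / 2;
  unsigned Lo = MemElts < Half ? MemElts : Half;
  return countPieces(Lo, Half, LegalElts) +
         countPieces(MemElts - Lo, Half, LegalElts);
}

// Number of masked loads expected for a custom vector length VL.
unsigned countMaskedLoads(unsigned VL, unsigned LegalElts) {
  return countPieces(VL, nextPowerOf2(VL), LegalElts);
}
```

Under these assumptions, `countMaskedLoads(VL, 8)` gives 2 for VL = 9, 13 and 14, and 3 for VL = 17 and 23, which agrees with the CHECK lines in the test above.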

This is an archive of the discontinued LLVM Phabricator instance.

[llvm] [CodeGen] Fixed vector halving bug for masked load
Closed, Public
