Diff 371070

llvm/include/llvm/CodeGen/GlobalISel/LegalizerInfo.h

[...]
/// True iff the specified type index is a scalar or vector with an element type		/// True iff the specified type index is a scalar or vector with an element type
/// that's narrower than the given size.		/// that's narrower than the given size.
LegalityPredicate scalarOrEltNarrowerThan(unsigned TypeIdx, unsigned Size);		LegalityPredicate scalarOrEltNarrowerThan(unsigned TypeIdx, unsigned Size);

/// True iff the specified type index is a scalar or a vector with an element		/// True iff the specified type index is a scalar or a vector with an element
/// type that's wider than the given size.		/// type that's wider than the given size.
LegalityPredicate scalarOrEltWiderThan(unsigned TypeIdx, unsigned Size);		LegalityPredicate scalarOrEltWiderThan(unsigned TypeIdx, unsigned Size);

		/// True iff the specified type index is a scalar whose size is not a multiple
		/// of Size.
		foad: Update comment.
		LegalityPredicate sizeNotMultipleOf(unsigned TypeIdx, unsigned Size);
		foad: I think the generic predicates should probably be named like `sizeNotMultipleOf`, and take the value 32 as an extra `Size` (or `Unit`?) argument.

/// True iff the specified type index is a scalar whose size is not a power of		/// True iff the specified type index is a scalar whose size is not a power of
/// 2.		/// 2.
LegalityPredicate sizeNotPow2(unsigned TypeIdx);		LegalityPredicate sizeNotPow2(unsigned TypeIdx);

/// True iff the specified type index is a scalar or vector whose element size		/// True iff the specified type index is a scalar or vector whose element size
/// is not a power of 2.		/// is not a power of 2.
LegalityPredicate scalarOrEltSizeNotPow2(unsigned TypeIdx);		LegalityPredicate scalarOrEltSizeNotPow2(unsigned TypeIdx);

[...]
/// index \p FromIndex. Unlike changeElementTo, this discards pointer types and		/// index \p FromIndex. Unlike changeElementTo, this discards pointer types and
/// only changes the size.		/// only changes the size.
LegalizeMutation changeElementSizeTo(unsigned TypeIdx, unsigned FromTypeIdx);		LegalizeMutation changeElementSizeTo(unsigned TypeIdx, unsigned FromTypeIdx);

/// Widen the scalar type or vector element type for the given type index to the		/// Widen the scalar type or vector element type for the given type index to the
/// next power of 2.		/// next power of 2.
LegalizeMutation widenScalarOrEltToNextPow2(unsigned TypeIdx, unsigned Min = 0);		LegalizeMutation widenScalarOrEltToNextPow2(unsigned TypeIdx, unsigned Min = 0);

		/// Widen the scalar type or vector element type for the given type index to
		/// next multiple of \p Size.
		foad: Update comments to remove 32.
		foad: The "depending on which is greater" part doesn't make much sense to me. Either remove Min (here and in widenScalarToNextMultipleOf) or use the same wording as widenScalarToNextMultipleOf: "Widen ... to the next multiple of Size that is at least MinSize."
		LegalizeMutation widenScalarOrEltToNextMultipleOf(unsigned TypeIdx,
		unsigned Size);
		foad: I would suggest to add Size as a parameter here also, instead of hardcoding 32. Also remove Min unless/until there is a need for it?

/// Add more elements to the type for the given type index to the next power of		/// Add more elements to the type for the given type index to the next power of
/// 2.		/// 2.
LegalizeMutation moreElementsToNextPow2(unsigned TypeIdx, unsigned Min = 0);		LegalizeMutation moreElementsToNextPow2(unsigned TypeIdx, unsigned Min = 0);
/// Break up the vector type for the given type index into the element type.		/// Break up the vector type for the given type index into the element type.
LegalizeMutation scalarize(unsigned TypeIdx);		LegalizeMutation scalarize(unsigned TypeIdx);
} // end namespace LegalizeMutations		} // end namespace LegalizeMutations

/// A single rule in a legalizer info ruleset.		/// A single rule in a legalizer info ruleset.
[...]
public:
LegalizeRuleSet &widenScalarToNextPow2(unsigned TypeIdx,		LegalizeRuleSet &widenScalarToNextPow2(unsigned TypeIdx,
unsigned MinSize = 0) {		unsigned MinSize = 0) {
using namespace LegalityPredicates;		using namespace LegalityPredicates;
return actionIf(		return actionIf(
LegalizeAction::WidenScalar, sizeNotPow2(typeIdx(TypeIdx)),		LegalizeAction::WidenScalar, sizeNotPow2(typeIdx(TypeIdx)),
LegalizeMutations::widenScalarOrEltToNextPow2(TypeIdx, MinSize));		LegalizeMutations::widenScalarOrEltToNextPow2(TypeIdx, MinSize));
}		}

		/// Widen the scalar to the next multiple of Size. No effect if the
		/// type is not a scalar or is a multiple of Size.
		foad: Add Size as a parameter and remove MinSize?
		foad: I still find this comment confusing. It says "only if Size is greater than MinSize". Do you mean "only if the type is wider than MinSize"? But I still don't think you really need a MinSize parameter. Instead I think you could change the legalization rules to something like:
		  getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})
		    .legalFor({S32, S16, V2S16})
		    .minScalar(0, S16)
		    .clampMaxNumElements(0, S16, 2)
		    .widenScalarToNextMultipleOf(0, 32)
		    .maxScalar(0, S32)
		    .scalarize(0);
		Would that work?
		LegalizeRuleSet &widenScalarToNextMultipleOf(unsigned TypeIdx,
		unsigned Size) {
		using namespace LegalityPredicates;
		return actionIf(
		LegalizeAction::WidenScalar, sizeNotMultipleOf(typeIdx(TypeIdx), Size),
		LegalizeMutations::widenScalarOrEltToNextMultipleOf(TypeIdx, Size));
		mbrkusanin: Should we have a separate rule widenScalarToNextMultipleOfIf so we can take out this scalarWiderThan check? We need this check for targets that can work with s16.
		}

/// Widen the scalar or vector element type to the next power of two that is		/// Widen the scalar or vector element type to the next power of two that is
/// at least MinSize. No effect if the scalar size is a power of two.		/// at least MinSize. No effect if the scalar size is a power of two.
LegalizeRuleSet &widenScalarOrEltToNextPow2(unsigned TypeIdx,		LegalizeRuleSet &widenScalarOrEltToNextPow2(unsigned TypeIdx,
unsigned MinSize = 0) {		unsigned MinSize = 0) {
using namespace LegalityPredicates;		using namespace LegalityPredicates;
return actionIf(		return actionIf(
LegalizeAction::WidenScalar, scalarOrEltSizeNotPow2(typeIdx(TypeIdx)),		LegalizeAction::WidenScalar, scalarOrEltSizeNotPow2(typeIdx(TypeIdx)),
LegalizeMutations::widenScalarOrEltToNextPow2(TypeIdx, MinSize));		LegalizeMutations::widenScalarOrEltToNextPow2(TypeIdx, MinSize));
[...]

llvm/lib/CodeGen/GlobalISel/LegalityPredicates.cpp

	[...]

	LegalityPredicate LegalityPredicates::scalarOrEltSizeNotPow2(unsigned TypeIdx) {			LegalityPredicate LegalityPredicates::scalarOrEltSizeNotPow2(unsigned TypeIdx) {
	return [=](const LegalityQuery &Query) {			return [=](const LegalityQuery &Query) {
	const LLT QueryTy = Query.Types[TypeIdx];			const LLT QueryTy = Query.Types[TypeIdx];
	return !isPowerOf2_32(QueryTy.getScalarSizeInBits());			return !isPowerOf2_32(QueryTy.getScalarSizeInBits());
	};			};
	}			}

				LegalityPredicate LegalityPredicates::sizeNotMultipleOf(unsigned TypeIdx,
				unsigned Size) {
				return [=](const LegalityQuery &Query) {
				const LLT QueryTy = Query.Types[TypeIdx];
				foad: Nit: don't need the parentheses but please do add a `!= 0`.
				return QueryTy.isScalar() && QueryTy.getSizeInBits() % Size != 0;
				};
				}

	LegalityPredicate LegalityPredicates::sizeNotPow2(unsigned TypeIdx) {			LegalityPredicate LegalityPredicates::sizeNotPow2(unsigned TypeIdx) {
	return [=](const LegalityQuery &Query) {			return [=](const LegalityQuery &Query) {
	const LLT QueryTy = Query.Types[TypeIdx];			const LLT QueryTy = Query.Types[TypeIdx];
	return QueryTy.isScalar() && !isPowerOf2_32(QueryTy.getSizeInBits());			return QueryTy.isScalar() && !isPowerOf2_32(QueryTy.getSizeInBits());
	};			};
	}			}

	LegalityPredicate LegalityPredicates::sizeIs(unsigned TypeIdx, unsigned Size) {			LegalityPredicate LegalityPredicates::sizeIs(unsigned TypeIdx, unsigned Size) {
	[...]
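
The difference between the new `sizeNotMultipleOf` predicate and the existing `sizeNotPow2` can be sketched outside LLVM as follows. This is only an illustration: `ToyType` is a hypothetical stand-in for `llvm::LLT` that carries just the two properties the predicates read, and the free functions take the type directly instead of returning a `LegalityPredicate` lambda.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::LLT (assumption, not the real class).
struct ToyType {
  bool IsScalar;
  unsigned SizeInBits;
};

// Same condition as the body of sizeNotMultipleOf in the diff: true only
// for scalars whose bit width is not a multiple of Size.
inline bool sizeNotMultipleOf(ToyType Ty, unsigned Size) {
  assert(Size != 0 && "Size must be non-zero");
  return Ty.IsScalar && Ty.SizeInBits % Size != 0;
}

// Same condition as sizeNotPow2: true only for scalars whose bit width is
// not a power of two (zero counts as "not a power of two").
inline bool sizeNotPow2(ToyType Ty) {
  unsigned S = Ty.SizeInBits;
  bool IsPow2 = S != 0 && (S & (S - 1)) == 0;
  return Ty.IsScalar && !IsPow2;
}
```

Under this model a 96-bit scalar satisfies `sizeNotPow2` but not `sizeNotMultipleOf(..., 32)`, so a rule guarded by the latter leaves s96 alone; neither predicate matches a vector type.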

llvm/lib/CodeGen/GlobalISel/LegalizeMutations.cpp

[...]
LegalizeMutation LegalizeMutations::widenScalarOrEltToNextPow2(unsigned TypeIdx,
return [=](const LegalityQuery &Query) {		return [=](const LegalityQuery &Query) {
const LLT Ty = Query.Types[TypeIdx];		const LLT Ty = Query.Types[TypeIdx];
unsigned NewEltSizeInBits =		unsigned NewEltSizeInBits =
std::max(1u << Log2_32_Ceil(Ty.getScalarSizeInBits()), Min);		std::max(1u << Log2_32_Ceil(Ty.getScalarSizeInBits()), Min);
return std::make_pair(TypeIdx, Ty.changeElementSize(NewEltSizeInBits));		return std::make_pair(TypeIdx, Ty.changeElementSize(NewEltSizeInBits));
};		};
}		}

		LegalizeMutation
		LegalizeMutations::widenScalarOrEltToNextMultipleOf(unsigned TypeIdx,
		unsigned Size) {
		return [=](const LegalityQuery &Query) {
		const LLT Ty = Query.Types[TypeIdx];
		unsigned NewEltSizeInBits = alignTo(Ty.getScalarSizeInBits(), Size);
		foad: Use `alignTo(Ty.getScalarSizeInBits(), 32)`.
		return std::make_pair(TypeIdx, Ty.changeElementSize(NewEltSizeInBits));
		};
		}

LegalizeMutation LegalizeMutations::moreElementsToNextPow2(unsigned TypeIdx,		LegalizeMutation LegalizeMutations::moreElementsToNextPow2(unsigned TypeIdx,
unsigned Min) {		unsigned Min) {
return [=](const LegalityQuery &Query) {		return [=](const LegalityQuery &Query) {
const LLT VecTy = Query.Types[TypeIdx];		const LLT VecTy = Query.Types[TypeIdx];
unsigned NewNumElements =		unsigned NewNumElements =
std::max(1u << Log2_32_Ceil(VecTy.getNumElements()), Min);		std::max(1u << Log2_32_Ceil(VecTy.getNumElements()), Min);
return std::make_pair(		return std::make_pair(
TypeIdx, LLT::fixed_vector(NewNumElements, VecTy.getElementType()));		TypeIdx, LLT::fixed_vector(NewNumElements, VecTy.getElementType()));
};		};
}		}

LegalizeMutation LegalizeMutations::scalarize(unsigned TypeIdx) {		LegalizeMutation LegalizeMutations::scalarize(unsigned TypeIdx) {
return [=](const LegalityQuery &Query) {		return [=](const LegalityQuery &Query) {
return std::make_pair(TypeIdx, Query.Types[TypeIdx].getElementType());		return std::make_pair(TypeIdx, Query.Types[TypeIdx].getElementType());
};		};
}		}
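
The arithmetic behind the two widening mutations can be compared with a small standalone sketch. The helper names `alignToMultiple` and `nextPow2` are made up for illustration; they are assumed to compute the same values as `alignTo(...)` and `1u << Log2_32_Ceil(...)` in the diff for inputs of at least 1, but they are not the LLVM functions themselves.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (the rounding used by
// widenScalarOrEltToNextMultipleOf in the diff).
inline uint64_t alignToMultiple(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "Align must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Round Value up to the next power of two; a power of two maps to itself.
// Intended for Value >= 1.
inline uint64_t nextPow2(uint64_t Value) {
  uint64_t P = 1;
  while (P < Value)
    P <<= 1;
  return P;
}
```

For a 33-bit element both roundings give 64, but for 96 bits the multiple-of-32 rounding keeps 96 while the power-of-two rounding goes to 128, which is the case raised in the inline comment on `v_mul_i96`.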

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

[...]
getActionDefinitionsBuilder(G_PHI)
.clampMaxNumElements(0, S32, 16)		.clampMaxNumElements(0, S32, 16)
.moreElementsIf(isSmallOddVector(0), oneMoreElement(0))		.moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
.scalarize(0);		.scalarize(0);

if (ST.hasVOP3PInsts() && ST.hasAddNoCarry() && ST.hasIntClamp()) {		if (ST.hasVOP3PInsts() && ST.hasAddNoCarry() && ST.hasIntClamp()) {
// Full set of gfx9 features.		// Full set of gfx9 features.
getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})		getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})
.legalFor({S32, S16, V2S16})		.legalFor({S32, S16, V2S16})
.clampScalar(0, S16, S32)		.minScalar(0, S16)
.clampMaxNumElements(0, S16, 2)		.clampMaxNumElements(0, S16, 2)
.scalarize(0)		.widenScalarToNextMultipleOf(0, 32)
.widenScalarToNextPow2(0, 32);		.maxScalar(0, S32)
		.scalarize(0);

getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT, G_SADDSAT, G_SSUBSAT})		getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT, G_SADDSAT, G_SSUBSAT})
.legalFor({S32, S16, V2S16}) // Clamp modifier		.legalFor({S32, S16, V2S16}) // Clamp modifier
.minScalarOrElt(0, S16)		.minScalarOrElt(0, S16)
.clampMaxNumElements(0, S16, 2)		.clampMaxNumElements(0, S16, 2)
.scalarize(0)		.scalarize(0)
.widenScalarToNextPow2(0, 32)		.widenScalarToNextPow2(0, 32)
.lower();		.lower();
} else if (ST.has16BitInsts()) {		} else if (ST.has16BitInsts()) {
getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})		getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})
.legalFor({S32, S16})		.legalFor({S32, S16})
.clampScalar(0, S16, S32)		.minScalar(0, S16)
.scalarize(0)		.widenScalarToNextMultipleOf(0, 32)
		foad: For this case (have 16 insts but no 16 bit packed insts) I think the order should be something like:
		  .legalFor({S32, S16})
		  .minScalar(0, S16)
		  .scalarize(0)
		  .widenScalarToNextMultipleOf(0, 32)
		  .maxScalar(0, S32)
		Otherwise v2s16 would be widened to v2s32 and then scalarized to s32. Instead we want to scalarize it to s16.
		mbrkusanin: llvm/test/CodeGen/AMDGPU/GlobalISel/mul.v2i16.ll has those tests and current version of the patch does not seem to affect it. It is still scalarized to s16. Either way scalarize() will be called first. widenScalarToNextMultipleOf() checks if type is scalar.
		foad: > Either way scalarize() will be called first. widenScalarToNextMultipleOf() checks if type is scalar.
		Oh yes, I missed that. This is OK then.
.widenScalarToNextPow2(0, 32); // FIXME: min should be 16		.maxScalar(0, S32)
		.scalarize(0);

// Technically the saturating operations require clamp bit support, but this		// Technically the saturating operations require clamp bit support, but this
// was introduced at the same time as 16-bit operations.		// was introduced at the same time as 16-bit operations.
getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT})		getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT})
.legalFor({S32, S16}) // Clamp modifier		.legalFor({S32, S16}) // Clamp modifier
.minScalar(0, S16)		.minScalar(0, S16)
.scalarize(0)		.scalarize(0)
.widenScalarToNextPow2(0, 16)		.widenScalarToNextPow2(0, 16)
.lower();		.lower();

// We're just lowering this, but it helps get a better result to try to		// We're just lowering this, but it helps get a better result to try to
// coerce to the desired type first.		// coerce to the desired type first.
getActionDefinitionsBuilder({G_SADDSAT, G_SSUBSAT})		getActionDefinitionsBuilder({G_SADDSAT, G_SSUBSAT})
.minScalar(0, S16)		.minScalar(0, S16)
.scalarize(0)		.scalarize(0)
.lower();		.lower();
} else {		} else {
getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})		getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL})
.legalFor({S32})		.legalFor({S32})
		.widenScalarToNextMultipleOf(0, 32)
.clampScalar(0, S32, S32)		.clampScalar(0, S32, S32)
.scalarize(0);		.scalarize(0);

if (ST.hasIntClamp()) {		if (ST.hasIntClamp()) {
getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT})		getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT})
.legalFor({S32}) // Clamp modifier.		.legalFor({S32}) // Clamp modifier.
.scalarize(0)		.scalarize(0)
.minScalarOrElt(0, S32)		.minScalarOrElt(0, S32)
[...]
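
How the rule order in the `has16BitInsts()` branch treats odd scalar widths can be approximated with a toy model. This sketch assumes that the first matching rule is applied and the result is then queried again; it only tracks the width of a single scalar, so `scalarize` and the splitting into several s32 pieces done by narrowing are not modeled. The function names are hypothetical.

```cpp
#include <cassert>

// One legalization step for a scalar of Bits bits, following the new
// (right-hand) rule order of the has16BitInsts() branch:
//   legalFor({S32, S16}), minScalar(0, S16),
//   widenScalarToNextMultipleOf(0, 32), maxScalar(0, S32).
inline unsigned legalizeStep(unsigned Bits) {
  assert(Bits != 0 && "zero-width scalars are not modeled");
  if (Bits == 16 || Bits == 32)
    return Bits;                   // legalFor: already legal
  if (Bits < 16)
    return 16;                     // minScalar(0, S16)
  if (Bits % 32 != 0)
    return (Bits + 31) / 32 * 32;  // widenScalarToNextMultipleOf(0, 32)
  return 32;                       // maxScalar(0, S32): narrow to s32 pieces
}

// Repeat single steps until the width is legal.
inline unsigned legalizeScalar(unsigned Bits) {
  while (Bits != 16 && Bits != 32)
    Bits = legalizeStep(Bits);
  return Bits;
}
```

In this model s33 is first widened to s64 and then narrowed to s32 pieces, which matches the shape of the `test_mul_s33` checks (a 64-bit multiply expanded into s32 operations), while s8 is widened to s16 and s24 to s32.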

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-mul.mir

[...]
bb.0:
%1:_(s32) = COPY $vgpr1		%1:_(s32) = COPY $vgpr1
%2:_(s24) = G_TRUNC %0		%2:_(s24) = G_TRUNC %0
%3:_(s24) = G_TRUNC %1		%3:_(s24) = G_TRUNC %1
%4:_(s24) = G_MUL %2, %3		%4:_(s24) = G_MUL %2, %3
%5:_(s32) = G_ANYEXT %4		%5:_(s32) = G_ANYEXT %4
$vgpr0 = COPY %5		$vgpr0 = COPY %5
...		...

# FIXME:		---
# ---		name: test_mul_s33
# name: test_mul_s33		body: \|
# body: \|		bb.0:
# bb.0:		liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
# liveins: $vgpr0_vgpr1, $vgpr2_vgpr3		; GFX6-LABEL: name: test_mul_s33
		; GFX6: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
# %0:_(s64) = COPY $vgpr0_vgpr1		; GFX6: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
# %1:_(s64) = COPY $vgpr2_vgpr3		; GFX6: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
# %2:_(s33) = G_TRUNC %0		; GFX6: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
# %3:_(s33) = G_TRUNC %1		; GFX6: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
# %4:_(s33) = G_MUL %2, %3		; GFX6: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV2]]
# %5:_(s64) = G_ANYEXT %4		; GFX6: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV3]]
# $vgpr0_vgpr1 = COPY %5		; GFX6: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
# ...		; GFX6: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[MUL1]], [[MUL2]]
		; GFX6: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[ADD]], [[UMULH]]
		; GFX6: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[MUL]](s32), [[ADD1]](s32)
		; GFX6: $vgpr0_vgpr1 = COPY [[MV]](s64)
		; GFX8-LABEL: name: test_mul_s33
		; GFX8: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
		; GFX8: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
		; GFX8: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
		; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
		; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
		; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV2]]
		; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV3]]
		; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
		; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[MUL1]], [[MUL2]]
		; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[ADD]], [[UMULH]]
		; GFX8: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[MUL]](s32), [[ADD1]](s32)
		; GFX8: $vgpr0_vgpr1 = COPY [[MV]](s64)
		; GFX9-LABEL: name: test_mul_s33
		; GFX9: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
		; GFX9: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
		; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
		; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
		; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
		; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV2]]
		; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV3]]
		; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
		; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[MUL1]], [[MUL2]]
		; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[ADD]], [[UMULH]]
		; GFX9: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[MUL]](s32), [[ADD1]](s32)
		; GFX9: $vgpr0_vgpr1 = COPY [[MV]](s64)
		%0:_(s64) = COPY $vgpr0_vgpr1
		%1:_(s64) = COPY $vgpr2_vgpr3
		%2:_(s33) = G_TRUNC %0
		%3:_(s33) = G_TRUNC %1
		%4:_(s33) = G_MUL %2, %3
		%5:_(s64) = G_ANYEXT %4
		$vgpr0_vgpr1 = COPY %5
		...

---		---
name: test_mul_s96		name: test_mul_s96
body: \|		body: \|
bb.0:		bb.0:
liveins: $vgpr0_vgpr1_vgpr2, $vgpr3_vgpr4_vgpr5		liveins: $vgpr0_vgpr1_vgpr2, $vgpr3_vgpr4_vgpr5

; GFX6-LABEL: name: test_mul_s96		; GFX6-LABEL: name: test_mul_s96
[...]

llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll

	[...]
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mul_lo_u32 v0, v0, v2			; GFX10-NEXT: v_mul_lo_u32 v0, v0, v2
	; GFX10-NEXT: v_mul_lo_u32 v1, v1, v3			; GFX10-NEXT: v_mul_lo_u32 v1, v1, v3
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%result = mul <2 x i32> %num, %den			%result = mul <2 x i32> %num, %den
	ret <2 x i32> %result			ret <2 x i32> %result
	}			}

				define amdgpu_cs i33 @s_mul_i33(i33 inreg %num, i33 inreg %den) {
				; GFX7-LABEL: s_mul_i33:
				; GFX7: ; %bb.0:
				; GFX7-NEXT: v_mov_b32_e32 v0, s2
				; GFX7-NEXT: v_mul_hi_u32 v0, s0, v0
				; GFX7-NEXT: s_mul_i32 s4, s0, s2
				; GFX7-NEXT: s_mul_i32 s1, s1, s2
				; GFX7-NEXT: s_mul_i32 s0, s0, s3
				; GFX7-NEXT: s_add_i32 s1, s1, s0
				; GFX7-NEXT: v_add_i32_e32 v0, vcc, s1, v0
				; GFX7-NEXT: v_readfirstlane_b32 s1, v0
				; GFX7-NEXT: s_mov_b32 s0, s4
				; GFX7-NEXT: ; return to shader part epilog
				;
				; GFX8-LABEL: s_mul_i33:
				; GFX8: ; %bb.0:
				; GFX8-NEXT: v_mov_b32_e32 v0, s2
				; GFX8-NEXT: v_mul_hi_u32 v0, s0, v0
				; GFX8-NEXT: s_mul_i32 s4, s0, s2
				; GFX8-NEXT: s_mul_i32 s1, s1, s2
				; GFX8-NEXT: s_mul_i32 s0, s0, s3
				; GFX8-NEXT: s_add_i32 s1, s1, s0
				; GFX8-NEXT: v_add_u32_e32 v0, vcc, s1, v0
				; GFX8-NEXT: v_readfirstlane_b32 s1, v0
				; GFX8-NEXT: s_mov_b32 s0, s4
				; GFX8-NEXT: ; return to shader part epilog
				;
				; GFX9-LABEL: s_mul_i33:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_mul_i32 s1, s1, s2
				; GFX9-NEXT: s_mul_i32 s3, s0, s3
				; GFX9-NEXT: s_mul_i32 s4, s0, s2
				; GFX9-NEXT: s_mul_hi_u32 s0, s0, s2
				; GFX9-NEXT: s_add_i32 s1, s1, s3
				; GFX9-NEXT: s_add_i32 s1, s1, s0
				; GFX9-NEXT: s_mov_b32 s0, s4
				; GFX9-NEXT: ; return to shader part epilog
				;
				; GFX10-LABEL: s_mul_i33:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_mul_i32 s1, s1, s2
				; GFX10-NEXT: s_mul_i32 s3, s0, s3
				; GFX10-NEXT: s_mul_hi_u32 s4, s0, s2
				; GFX10-NEXT: s_add_i32 s1, s1, s3
				; GFX10-NEXT: s_mul_i32 s0, s0, s2
				; GFX10-NEXT: s_add_i32 s1, s1, s4
				; GFX10-NEXT: ; return to shader part epilog
				%result = mul i33 %num, %den
				ret i33 %result
				}

	define amdgpu_ps i64 @s_mul_i64(i64 inreg %num, i64 inreg %den) {			define amdgpu_ps i64 @s_mul_i64(i64 inreg %num, i64 inreg %den) {
	; GFX7-LABEL: s_mul_i64:			; GFX7-LABEL: s_mul_i64:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: v_mov_b32_e32 v0, s2			; GFX7-NEXT: v_mov_b32_e32 v0, s2
	; GFX7-NEXT: v_mul_hi_u32 v0, s0, v0			; GFX7-NEXT: v_mul_hi_u32 v0, s0, v0
	; GFX7-NEXT: s_mul_i32 s4, s0, s2			; GFX7-NEXT: s_mul_i32 s4, s0, s2
	; GFX7-NEXT: s_mul_i32 s1, s1, s2			; GFX7-NEXT: s_mul_i32 s1, s1, s2
	; GFX7-NEXT: s_mul_i32 s0, s0, s3			; GFX7-NEXT: s_mul_i32 s0, s0, s3
	[...]
	; GFX9-NEXT: v_add_u32_e32 v2, v2, v10			; GFX9-NEXT: v_add_u32_e32 v2, v2, v10
	; GFX9-NEXT: v_add_u32_e32 v3, v8, v9			; GFX9-NEXT: v_add_u32_e32 v3, v8, v9
	; GFX9-NEXT: v_add3_u32 v1, v2, v5, v1			; GFX9-NEXT: v_add3_u32 v1, v2, v5, v1
	; GFX9-NEXT: v_add3_u32 v2, v1, v0, v3			; GFX9-NEXT: v_add3_u32 v2, v1, v0, v3
	; GFX9-NEXT: v_mov_b32_e32 v0, v6			; GFX9-NEXT: v_mov_b32_e32 v0, v6
	; GFX9-NEXT: v_mov_b32_e32 v1, v7			; GFX9-NEXT: v_mov_b32_e32 v1, v7
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: v_mul_i96:			; GFX10-LABEL: v_mul_i96:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mul_lo_u32 v6, v1, v3			; GFX10-NEXT: v_mul_lo_u32 v6, v1, v3
	; GFX10-NEXT: v_mul_lo_u32 v7, v0, v4			; GFX10-NEXT: v_mul_lo_u32 v7, v0, v4
	; GFX10-NEXT: v_mul_hi_u32 v8, v0, v3			; GFX10-NEXT: v_mul_hi_u32 v8, v0, v3
	; GFX10-NEXT: v_mul_lo_u32 v9, v1, v4			; GFX10-NEXT: v_mul_lo_u32 v9, v1, v4
	; GFX10-NEXT: v_mul_lo_u32 v2, v2, v3			; GFX10-NEXT: v_mul_lo_u32 v2, v2, v3
	; GFX10-NEXT: v_mul_lo_u32 v5, v0, v5			; GFX10-NEXT: v_mul_lo_u32 v5, v0, v5
	; GFX10-NEXT: v_mul_hi_u32 v4, v0, v4			; GFX10-NEXT: v_mul_hi_u32 v4, v0, v4
	; GFX10-NEXT: v_mul_lo_u32 v0, v0, v3			; GFX10-NEXT: v_mul_lo_u32 v0, v0, v3
	; GFX10-NEXT: v_add_co_u32 v6, s4, v6, v7			; GFX10-NEXT: v_add_co_u32 v6, s4, v6, v7
	; GFX10-NEXT: v_mul_hi_u32 v7, v1, v3			; GFX10-NEXT: v_mul_hi_u32 v7, v1, v3
	; GFX10-NEXT: v_cndmask_b32_e64 v10, 0, 1, s4			; GFX10-NEXT: v_cndmask_b32_e64 v10, 0, 1, s4
	; GFX10-NEXT: v_add_nc_u32_e32 v2, v2, v9			; GFX10-NEXT: v_add_nc_u32_e32 v2, v2, v9
	; GFX10-NEXT: v_add_co_u32 v1, s4, v6, v8			; GFX10-NEXT: v_add_co_u32 v1, s4, v6, v8
	; GFX10-NEXT: v_cndmask_b32_e64 v6, 0, 1, s4			; GFX10-NEXT: v_cndmask_b32_e64 v6, 0, 1, s4
	; GFX10-NEXT: v_add3_u32 v2, v2, v5, v7			; GFX10-NEXT: v_add3_u32 v2, v2, v5, v7
	; GFX10-NEXT: v_add_nc_u32_e32 v3, v10, v6			; GFX10-NEXT: v_add_nc_u32_e32 v3, v10, v6
	; GFX10-NEXT: v_add3_u32 v2, v2, v4, v3			; GFX10-NEXT: v_add3_u32 v2, v2, v4, v3
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
				mbrkusanin: Now s96 is widened to 128 and then truncated down to 96, which is why those add3 instructions are gone. They will only be selected for the most significant register/bits. Here these registers will end up dead after the trunc. A rule to widen to the next multiple of 32 might be better than the next power of 2 (it might not make sense for scalars smaller than 16, because we want s16 in some cases). This way scalars in range (65,96) will be widened into 96, not 128. Same for anything above 128: we don't need to go from 4x32 to 8x32. So a rule like widenToNextMultipleOf32 followed by the clampScalar(0, S16, S32) that is already there should do the trick. What do you think @foad?
	%result = mul i96 %num, %den			%result = mul i96 %num, %den
	ret i96 %result			ret i96 %result
	}			}

	define amdgpu_ps <4 x i32> @s_mul_i128(i128 inreg %num, i128 inreg %den) {			define amdgpu_ps <4 x i32> @s_mul_i128(i128 inreg %num, i128 inreg %den) {
	; GFX7-LABEL: s_mul_i128:			; GFX7-LABEL: s_mul_i128:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: v_mov_b32_e32 v0, s4			; GFX7-NEXT: v_mov_b32_e32 v0, s4
	[...]

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Legalize G_MUL for non-standard types
ClosedPublic
