This is an archive of the discontinued LLVM Phabricator instance.

This is NFC. The condition was "not a power of two and not a multiple of 16". However, the type was already clamped to at least S32 on line 1616, and every power of two of at least 32 bits is a multiple of 16. The condition therefore simplifies to "not a multiple of 16".
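The NFC claim can be checked exhaustively on plain bit widths. The sketch below uses hypothetical stand-alone helpers (not LLVM APIs) that mirror the old and new predicates. It assumes the lower bound of 32 from the earlier clamp and the MaxScalar of 1024 mentioned later in this review. It compiles as C++14 or later, and the proof runs at compile time.

```cpp
#include <cstdint>

// Hypothetical helper, standing in for llvm::has_single_bit<uint32_t>.
constexpr bool isPowerOf2(uint32_t Size) {
  return Size != 0 && (Size & (Size - 1)) == 0;
}

// Old predicate: "not a power of two and not a multiple of 16".
constexpr bool oldNeedsWidening(uint32_t Size) {
  return !isPowerOf2(Size) && Size % 16 != 0;
}

// New predicate: "not a multiple of 16".
constexpr bool newNeedsWidening(uint32_t Size) { return Size % 16 != 0; }

// Returns true if both predicates agree on every width in [MinSize, MaxSize].
constexpr bool predicatesAgree(uint32_t MinSize, uint32_t MaxSize) {
  for (uint32_t Size = MinSize; Size <= MaxSize; ++Size)
    if (oldNeedsWidening(Size) != newNeedsWidening(Size))
      return false;
  return true;
}

// Assumed range: clamped to at least 32 bits, at most MaxScalar (1024).
static_assert(predicatesAgree(32, 1024), "NFC for widths of 32 bits or more");
// Without the clamp the extra clause would matter: 8 is a power of two but
// not a multiple of 16, so only the new predicate would widen it.
static_assert(oldNeedsWidening(8) != newNeedsWidening(8), "differs below 16");
```

The power-of-two clause only changes the result for powers of two that are not multiples of 16, that is 1, 2, 4 and 8. All of these are removed by the clamp.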

However, I am sceptical that this condition is what was intended, since it doesn't seem to match the widening rule of "Pick the next power of 2, or a multiple of 64 over 128".
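The mismatch is easier to see by evaluating the predicate and the mutation side by side on a few widths. This is a minimal sketch on plain integers. `log2Ceil` and `alignTo64` are hypothetical stand-ins for `Log2_32_Ceil` and `alignTo<64>`. The "take the smaller" step is an assumption drawn from the "Whichever is smaller" comment, because the rest of the lambda is not shown in the diff.

```cpp
#include <cstdint>

// Stand-in for Log2_32_Ceil, valid for Value >= 1: smallest K with 2^K >= Value.
constexpr uint32_t log2Ceil(uint32_t Value) {
  uint32_t K = 0;
  while ((uint64_t{1} << K) < Value)
    ++K;
  return K;
}

// Stand-in for alignTo<64>: round Value up to a multiple of 64.
constexpr uint32_t alignTo64(uint32_t Value) { return (Value + 63) / 64 * 64; }

// Predicate after this patch.
constexpr bool needsWidening(uint32_t Size) { return Size % 16 != 0; }

// Mutation: the smallest power of two strictly above Size, or, once that
// reaches 256, the smallest multiple of 64 strictly above Size if that is
// smaller (assumed from the "Whichever is smaller" comment).
constexpr uint32_t widenedSize(uint32_t Size) {
  uint32_t NewSize = uint32_t{1} << log2Ceil(Size + 1);
  if (NewSize >= 256) {
    uint32_t RoundedTo = alignTo64(Size + 1);
    if (RoundedTo < NewSize)
      NewSize = RoundedTo;
  }
  return NewSize;
}

// 40 bits: widened to 64, as the comment describes.
static_assert(needsWidening(40) && widenedSize(40) == 64, "");
// 48 bits: never widened, although the mutation would pick 64.
static_assert(!needsWidening(48) && widenedSize(48) == 64, "");
// 264 bits: widened to 320 (a multiple of 64) rather than 512.
static_assert(needsWidening(264) && widenedSize(264) == 320, "");
// Because of the "+ 1", an exact power of two would be doubled.
static_assert(widenedSize(64) == 128, "");
```

The last case suggests why the old predicate excluded powers of two: feeding one to this mutation would double it rather than leave it unchanged.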

We probably only want 32-bit multiple merge results (except maybe with true16)
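A rule that only produces merge results that are multiples of 32 bits could look like the following sketch. Both functions are hypothetical and ignore the true16 exception. Unlike the predicate in the diff, which leaves a 48-bit type alone, this rule would widen it to 64.

```cpp
#include <cstdint>

// Hypothetical predicate: widen anything that is not a multiple of 32 bits.
constexpr bool needsWidening32(uint32_t Size) { return Size % 32 != 0; }

// Hypothetical mutation: round up to the next multiple of 32
// (multiples of 32 are returned unchanged).
constexpr uint32_t widenTo32Multiple(uint32_t Size) {
  return (Size + 31) / 32 * 32;
}

static_assert(needsWidening32(48) && widenTo32Multiple(48) == 64, "");
static_assert(!needsWidening32(96) && widenTo32Multiple(96) == 96, "");
static_assert(widenTo32Multiple(264) == 288, "");
```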

This revision is now accepted and ready to land. (Feb 17 2023, 3:04 AM)

Harbormaster completed remote builds in B214354: Diff 498293. (Feb 17 2023, 3:11 AM)

In D144250#4134530, @arsenm wrote:

We probably only want 32-bit multiple merge results (except maybe with true16)

AMDGPUInstructionSelector::selectG_UNMERGE_VALUES seems to assume that the source type has a corresponding register class. We certainly don't have register classes for every multiple of 32 up to MaxScalar (currently 1024). How should this work? Should the legalization be guided by exactly which register classes exist?

This revision was landed with ongoing or failed builds. (Feb 17 2023, 3:12 AM)

Closed by commit rG62e4f81c6793: [AMDGPU] Simplify widenScalar condition for BigTy for G_(UN)MERGE_VALUES (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG62e4f81c6793: [AMDGPU] Simplify widenScalar condition for BigTy for G_(UN)MERGE_VALUES.

In D144250#4134550, @foad wrote:

In D144250#4134530, @arsenm wrote:

We probably only want 32-bit multiple merge results (except maybe with true16)

AMDGPUInstructionSelector::selectG_UNMERGE_VALUES seems to assume that the source type has a corresponding register class. We certainly don't have register classes for every multiple of 32 up to MaxScalar (currently 1024). How should this work? Should the legalization be guided by exactly which register classes exist?

The point of the legalizer is to match the register classes, which would imply rounding up in the non-existent cases. I was thinking we would eventually just have register classes for every multiple of 32.
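"Rounding up in the non-existent cases" amounts to picking the smallest width that does have a register class. The sketch below assumes a hypothetical, sorted table of widths. The table is illustrative only and is not the actual set of AMDGPU register classes.

```cpp
#include <cstdint>

// Illustrative only: a hypothetical sorted list of widths with a register
// class. NOT the real AMDGPU register class list.
constexpr uint32_t KnownClassSizes[] = {32, 64, 96, 128, 160, 256, 512, 1024};

// Returns the smallest listed width that can hold Size bits, or 0 if no
// listed class is wide enough.
constexpr uint32_t roundUpToRegClass(uint32_t Size) {
  for (uint32_t ClassSize : KnownClassSizes)
    if (ClassSize >= Size)
      return ClassSize;
  return 0;
}

static_assert(roundUpToRegClass(96) == 96, "existing class is kept");
static_assert(roundUpToRegClass(192) == 256, "missing 192 rounds up");
static_assert(roundUpToRegClass(2048) == 0, "wider than any class");
```

If the table contained every multiple of 32 up to 1024, as suggested above, this lookup would reduce to rounding up to the next multiple of 32.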

Revision Contents

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (size: 3 lines)

Diff 498307

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

```diff
 [1,622 lines not shown; enclosing scope: if (Op == G_MERGE_VALUES) {]
           return Ty.getSizeInBits() < 32;
         },
         changeTo(LitTyIdx, S32));
     }
 
     Builder.widenScalarIf(
       [=](const LegalityQuery &Query) {
         const LLT Ty = Query.Types[BigTyIdx];
-        return !llvm::has_single_bit<uint32_t>(Ty.getSizeInBits()) &&
-               Ty.getSizeInBits() % 16 != 0;
+        return Ty.getSizeInBits() % 16 != 0;
       },
       [=](const LegalityQuery &Query) {
         // Pick the next power of 2, or a multiple of 64 over 128.
         // Whichever is smaller.
         const LLT &Ty = Query.Types[BigTyIdx];
         unsigned NewSizeInBits = 1 << Log2_32_Ceil(Ty.getSizeInBits() + 1);
         if (NewSizeInBits >= 256) {
           unsigned RoundedTo = alignTo<64>(Ty.getSizeInBits() + 1);
 [4,180 lines not shown]
```

[AMDGPU] Simplify widenScalar condition for BigTy for G_(UN)MERGE_VALUES (Closed, Public)