This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: Fix narrowScalar for load/store with different mem size
Closed · Public

Authored by arsenm on Jan 21 2019, 9:53 AM.

Details

Reviewers

dsanders
volkan
aditya_nandakumar
aemerson

Summary

This was ignoring the memory size, and producing multiple loads/stores
if the operand size was different from the memory size.

I assume this is the intent of not having an explicit G_ANYEXTLOAD
(although I think that would probably be better).

Event Timeline

arsenm created this revision. Jan 21 2019, 9:53 AM

Herald added subscribers: kristof.beyls, rovka, nhaehnle and 2 others. Jan 21 2019, 9:53 AM

arsenm added a parent revision: D55814: GlobalISel: Support narrowing zextload/sextload. Jan 21 2019, 9:55 AM

In what situations can these kinds of loads be generated?

At least for AArch64, the extending loads combiner only runs before legalisation, so it wouldn't have a chance to combine these. In the absence of a G_ANYEXTLOAD opcode, I think having an explicit G_ZEXTLOAD is preferable here even though it's slightly pessimistic.

@dsanders thoughts?

Herald added a subscriber: Petar.Avramovic. Jan 28 2019, 10:18 AM

In D57029#1373957, @aemerson wrote:

In what situations can these kinds of loads be generated?

At least for AArch64, the extending loads combiner only runs before legalisation, so it wouldn't have a chance to combine these. In the absence of a G_ANYEXTLOAD opcode, I think having an explicit G_ZEXTLOAD is preferable here even though it's slightly pessimistic.

@dsanders thoughts?

I'm operating under the assumption that eventually we'll have canonicalizations that look like what SelectionDAG does today, which involves producing extloads like this.

On AMDGPU, in some cases on some subtargets, there is a codegen difference between zextload and aextload, so I think we should have a way to distinguish these. I think the current apparent representation choice is bug-prone. I have a few patches I haven't posted yet that fix legalization bugs caused by inconsistently assuming the result size is the same as the memory size.

In D57029#1373957, @aemerson wrote:

In what situations can these kinds of loads be generated?

At least for AArch64, the extending loads combiner only runs before legalisation, so it wouldn't have a chance to combine these. In the absence of a G_ANYEXTLOAD opcode, I think having an explicit G_ZEXTLOAD is preferable here even though it's slightly pessimistic.

@dsanders thoughts?

I haven't had a chance to read the code (I'm juggling quite a few tasks right now), but an any-extending G_LOAD is a load extended with undefined bits. It seems reasonable for narrowScalar to just change the result type so long as the result type is at least as wide as the memory access. If it's narrower, then it needs to start splitting the memory access into multiple G_LOADs too.
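The rule in the paragraph above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not LLVM code: `NarrowPlan` and `planAnyExtLoadNarrowing` are hypothetical names, and widths are plain bit counts rather than `LLT` values.

```cpp
// Hypothetical model (not LLVM API) of narrowing an any-extending load.
enum class NarrowPlan {
  RetypeOnly,  // keep one memory access, just produce a narrower result
  SplitAccess  // the access itself must become several smaller loads
};

// narrowBits: width of the requested narrower result type.
// memBits:    width of the memory access described by the memory operand.
constexpr NarrowPlan planAnyExtLoadNarrowing(unsigned narrowBits,
                                             unsigned memBits) {
  // If the narrower result still covers every loaded bit, only undefined
  // extension bits are being dropped, so a single load is enough.
  return narrowBits >= memBits ? NarrowPlan::RetypeOnly
                               : NarrowPlan::SplitAccess;
}

// An s64 result loaded from 1 byte, narrowed to s32: retype only.
static_assert(planAnyExtLoadNarrowing(32, 8) == NarrowPlan::RetypeOnly, "");
// An s64 result loaded from 8 bytes, narrowed to s32: split the access.
static_assert(planAnyExtLoadNarrowing(32, 64) == NarrowPlan::SplitAccess, "");
```

The checks are written as `static_assert` so the model verifies itself at compile time.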

Regarding the comment about not having a chance to combine them: I don't think it's particularly important how the extending G_LOAD came to exist in the MIR, as the legalizer's job is to take any/all inputs and constrain them. If the target needs to narrow them, there should be a path to do so. Regarding G_ZEXTLOAD being preferable, I think that's target specific. For example, some targets may have loads that target subregisters, effectively any-extending them. In that case, a zextload requires explicit additional code in the output to zero the remainder of the register. In general, I think it's better that the generic code tries not to constrain the code more than necessary. A given target can always map an any-extending load to a zextload if the any-extending load is pointless for them, but they can't drop the zextload once it's there.
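The asymmetry between the two kinds of load can be illustrated with a minimal standalone sketch. It assumes an 8-bit access extended to 32 bits; all function names are hypothetical and none of this is LLVM API. Any-extension only fixes the low bits, so both a zero-extended and a sign-extended value are acceptable results, whereas zero-extension also fixes the high bits.

```cpp
#include <cstdint>

// Zero-extend an 8-bit loaded value to 32 bits.
constexpr std::uint32_t zext8(std::uint8_t v) { return v; }

// Sign-extend an 8-bit loaded value to 32 bits (done arithmetically).
constexpr std::uint32_t sext8(std::uint8_t v) {
  return (v & 0x80u) ? (0xFFFFFF00u | v) : std::uint32_t{v};
}

// Any-extension: the high 24 bits are unspecified, so a result is valid
// whenever its low 8 bits match the loaded byte.
constexpr bool isValidAnyExt8(std::uint8_t loaded, std::uint32_t result) {
  return (result & 0xFFu) == loaded;
}

// Zero-extension: the high 24 bits must additionally be zero.
constexpr bool isValidZExt8(std::uint8_t loaded, std::uint32_t result) {
  return result == loaded;
}

// A zero-extended value always satisfies the any-extend contract...
static_assert(isValidAnyExt8(0x80, zext8(0x80)), "");
// ...and so does a sign-extended one,
static_assert(isValidAnyExt8(0x80, sext8(0x80)), "");
// but a sign-extended value does not satisfy the zero-extend contract.
static_assert(!isValidZExt8(0x80, sext8(0x80)), "");
```

This is why a target can always implement an any-extending load as a zextload, while the reverse substitution is not generally safe.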

I assume this is the intent of not having an explicit G_ANYEXTLOAD
(although I think that would probably be better).

FWIW, I have a slight preference for a separate opcode but it's also quite nice to not have to check/modify the opcode when fixing up types.

LGTM then.

This revision is now accepted and ready to land. Jan 28 2019, 11:59 AM

r352523

Revision Contents

Path                                                 Size
lib/CodeGen/GlobalISel/LegalizerHelper.cpp           29 lines
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp            28 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load.mir     68 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-store.mir    64 lines

Diff 182811

lib/CodeGen/GlobalISel/LegalizerHelper.cpp

[473 unchanged lines hidden; context: LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalar(MachineInstr &MI,]
   }
   case TargetOpcode::G_LOAD: {
     // FIXME: add support for when SizeOp0 isn't an exact multiple of
     // NarrowSize.
     if (SizeOp0 % NarrowSize != 0)
       return UnableToLegalize;

     const auto &MMO = **MI.memoperands_begin();
+    unsigned DstReg = MI.getOperand(0).getReg();
+    LLT DstTy = MRI.getType(DstReg);
+
+    if (8 * MMO.getSize() != DstTy.getSizeInBits()) {
+      unsigned TmpReg = MRI.createGenericVirtualRegister(NarrowTy);
+      auto &MMO = **MI.memoperands_begin();
+      MIRBuilder.buildLoad(TmpReg, MI.getOperand(1).getReg(), MMO);
+      MIRBuilder.buildAnyExt(DstReg, TmpReg);
+      MI.eraseFromParent();
+      return Legalized;
+    }

     // This implementation doesn't work for atomics. Give up instead of doing
     // something invalid.
     if (MMO.getOrdering() != AtomicOrdering::NotAtomic ||
         MMO.getFailureOrdering() != AtomicOrdering::NotAtomic)
       return UnableToLegalize;

     int NumParts = SizeOp0 / NarrowSize;
     LLT OffsetTy = LLT::scalar(
[13 unchanged lines hidden; context: for (int i = 0; i < NumParts; ++i) {]

       MIRBuilder.materializeGEP(SrcReg, MI.getOperand(1).getReg(), OffsetTy,
                                 Adjustment);

       MIRBuilder.buildLoad(DstReg, SrcReg, *SplitMMO);

       DstRegs.push_back(DstReg);
     }
-    unsigned DstReg = MI.getOperand(0).getReg();
-    if(MRI.getType(DstReg).isVector())
+    if (DstTy.isVector())
       MIRBuilder.buildBuildVector(DstReg, DstRegs);
     else
       MIRBuilder.buildMerge(DstReg, DstRegs);
     MI.eraseFromParent();
     return Legalized;
   }
   case TargetOpcode::G_ZEXTLOAD:
   case TargetOpcode::G_SEXTLOAD: {
[24 unchanged lines hidden; context: LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalar(MachineInstr &MI,]
   }
   case TargetOpcode::G_STORE: {
     // FIXME: add support for when SizeOp0 isn't an exact multiple of
     // NarrowSize.
     if (SizeOp0 % NarrowSize != 0)
       return UnableToLegalize;

     const auto &MMO = **MI.memoperands_begin();
+
+    unsigned SrcReg = MI.getOperand(0).getReg();
+    LLT SrcTy = MRI.getType(SrcReg);
+
+    if (8 * MMO.getSize() != SrcTy.getSizeInBits()) {
+      unsigned TmpReg = MRI.createGenericVirtualRegister(NarrowTy);
+      auto &MMO = **MI.memoperands_begin();
+      MIRBuilder.buildTrunc(TmpReg, SrcReg);
+      MIRBuilder.buildStore(TmpReg, MI.getOperand(1).getReg(), MMO);
+      MI.eraseFromParent();
+      return Legalized;
+    }

     // This implementation doesn't work for atomics. Give up instead of doing
     // something invalid.
     if (MMO.getOrdering() != AtomicOrdering::NotAtomic ||
         MMO.getFailureOrdering() != AtomicOrdering::NotAtomic)
       return UnableToLegalize;

     int NumParts = SizeOp0 / NarrowSize;
     LLT OffsetTy = LLT::scalar(
[888 unchanged lines hidden]

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

[30 unchanged lines hidden; context: AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST,]
   auto GetAddrSpacePtr = [&TM](unsigned AS) {
     return LLT::pointer(AS, TM.getPointerSizeInBits(AS));
   };

   const LLT S1 = LLT::scalar(1);
   const LLT S16 = LLT::scalar(16);
   const LLT S32 = LLT::scalar(32);
   const LLT S64 = LLT::scalar(64);
+  const LLT S128 = LLT::scalar(128);
   const LLT S256 = LLT::scalar(256);
   const LLT S512 = LLT::scalar(512);

   const LLT V2S16 = LLT::vector(2, 16);
   const LLT V4S16 = LLT::vector(4, 16);
   const LLT V8S16 = LLT::vector(8, 16);

   const LLT V2S32 = LLT::vector(2, 32);
[110 unchanged lines hidden; context: getActionDefinitionsBuilder(G_FSUB)]
     .clampScalar(0, S32, S64);

   setAction({G_FCMP, S1}, Legal);
   setAction({G_FCMP, 1, S32}, Legal);
   setAction({G_FCMP, 1, S64}, Legal);

   getActionDefinitionsBuilder({G_SEXT, G_ZEXT, G_ANYEXT})
     .legalFor({{S64, S32}, {S32, S16}, {S64, S16},
-               {S32, S1}, {S64, S1}, {S16, S1}});
+               {S32, S1}, {S64, S1}, {S16, S1},
+               // FIXME: Hack
+               {S128, S32}});

   setAction({G_FPTOSI, S32}, Legal);
   setAction({G_FPTOSI, 1, S32}, Legal);

   setAction({G_SITOFP, S32}, Legal);
   setAction({G_SITOFP, 1, S32}, Legal);

   setAction({G_UITOFP, S32}, Legal);
[33 unchanged lines hidden; context: getActionDefinitionsBuilder(G_INTTOPTR)]
     });

   getActionDefinitionsBuilder(G_PTRTOINT)
     .legalIf([](const LegalityQuery &Query) {
       return true;
     });

   getActionDefinitionsBuilder({G_LOAD, G_STORE})
+    .narrowScalarIf([](const LegalityQuery &Query) {
+        unsigned Size = Query.Types[0].getSizeInBits();
+        unsigned MemSize = Query.MMODescrs[0].SizeInBits;
+        return (Size > 32 && MemSize < Size);
+      },
+      [](const LegalityQuery &Query) {
+        return std::make_pair(0, LLT::scalar(32));
+      })
     .legalIf([=, &ST](const LegalityQuery &Query) {
         const LLT &Ty0 = Query.Types[0];

+        unsigned Size = Ty0.getSizeInBits();
+        unsigned MemSize = Query.MMODescrs[0].SizeInBits;
+        if (Size > 32 && MemSize < Size)
+          return false;
+
+        if (Ty0.isVector() && Size != MemSize)
+          return false;
+
         // TODO: Decompose private loads into 4-byte components.
         // TODO: Illegal flat loads on SI
-        switch (Ty0.getSizeInBits()) {
+        switch (MemSize) {
+        case 8:
+        case 16:
         case 32:
         case 64:
         case 128:
           return true;

         case 96:
           // XXX hasLoadX3
           return (ST.getGeneration() >= AMDGPUSubtarget::SEA_ISLANDS);

         case 256:
         case 512:
           // TODO: constant loads
         default:
           return false;
         }
-      });
+      })
+    .clampScalar(0, S32, S64);


   auto &ExtLoads = getActionDefinitionsBuilder({G_SEXTLOAD, G_ZEXTLOAD})
     .legalForTypesWithMemSize({
       {S32, GlobalPtr, 8},
       {S32, GlobalPtr, 16},
       {S32, LocalPtr, 8},
       {S32, LocalPtr, 16},
[173 unchanged lines hidden]
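
The G_LOAD/G_STORE rules in this file can be read as a small decision function. The following standalone sketch mirrors the two predicates from the diff under stated assumptions: `MemQuery`, `Action`, and `classify` are hypothetical names, the subtarget check for 96-bit accesses is reduced to a `hasLoadX3` flag (named after the XXX comment in the source), and anything that falls through to later rules such as `clampScalar` is lumped into `Other`.

```cpp
// Hypothetical stand-in for the parts of LegalityQuery used by the rules.
struct MemQuery {
  unsigned sizeBits;  // size of the register type (Types[0])
  unsigned memBits;   // size of the memory access (MMODescrs[0].SizeInBits)
  bool isVector;      // whether Types[0] is a vector
  bool hasLoadX3;     // assumption: models the SEA_ISLANDS generation check
};

enum class Action { NarrowScalarTo32, Legal, Other };

// Mirrors the narrowScalarIf predicate: a wide register with a smaller access.
constexpr bool wantsNarrowing(MemQuery q) {
  return q.sizeBits > 32 && q.memBits < q.sizeBits;
}

// Mirrors the switch on MemSize inside legalIf.
constexpr bool isLegalMemSize(MemQuery q) {
  return q.memBits == 8 || q.memBits == 16 || q.memBits == 32 ||
         q.memBits == 64 || q.memBits == 128 ||
         (q.memBits == 96 && q.hasLoadX3);
}

// Rules are tried in order, so the narrowing rule takes priority.
constexpr Action classify(MemQuery q) {
  return wantsNarrowing(q)                      ? Action::NarrowScalarTo32
         : (q.isVector && q.sizeBits != q.memBits) ? Action::Other
         : isLegalMemSize(q)                    ? Action::Legal
                                                : Action::Other;
}

// s64 register, 1-byte access: narrowed to s32 first.
static_assert(classify(MemQuery{64, 8, false, false}) ==
                  Action::NarrowScalarTo32, "");
// s32 register, 1-byte access: legal as an extending load.
static_assert(classify(MemQuery{32, 8, false, false}) == Action::Legal, "");
// <3 x s32> with a 96-bit access: depends on the subtarget flag.
static_assert(classify(MemQuery{96, 96, true, true}) == Action::Legal, "");
static_assert(classify(MemQuery{96, 96, true, false}) == Action::Other, "");
```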

test/CodeGen/AMDGPU/GlobalISel/legalize-load.mir

[123 unchanged lines hidden; context: bb.0:]
     ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
     ; CHECK: [[LOAD:%[0-9]+]]:_(<3 x s32>) = G_LOAD [[COPY]](p1) :: (load 12, align 4, addrspace 1)
     ; CHECK: $vgpr0_vgpr1_vgpr2 = COPY [[LOAD]](<3 x s32>)
     %0:_(p1) = COPY $vgpr0_vgpr1
     %1:_(<3 x s32>) = G_LOAD %0 :: (load 12, align 4, addrspace 1)

     $vgpr0_vgpr1_vgpr2 = COPY %1
 ...
+
+---
+name: test_ext_load_global_s64_from_1_align1
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1
+
+    ; CHECK-LABEL: name: test_ext_load_global_s64_from_1_align1
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (load 1, align 4, addrspace 1)
+    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[LOAD]](s32)
+    ; CHECK: $vgpr0_vgpr1 = COPY [[ANYEXT]](s64)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s64) = G_LOAD %0 :: (load 1, addrspace 1, align 4)
+
+    $vgpr0_vgpr1 = COPY %1
+...
+
+---
+name: test_ext_load_global_s64_from_2_align2
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1
+
+    ; CHECK-LABEL: name: test_ext_load_global_s64_from_2_align2
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (load 2, align 4, addrspace 1)
+    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[LOAD]](s32)
+    ; CHECK: $vgpr0_vgpr1 = COPY [[ANYEXT]](s64)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s64) = G_LOAD %0 :: (load 2, addrspace 1, align 4)
+
+    $vgpr0_vgpr1 = COPY %1
+...
+
+---
+name: test_ext_load_global_s64_from_4_align4
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1
+
+    ; CHECK-LABEL: name: test_ext_load_global_s64_from_4_align4
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (load 4, addrspace 1)
+    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[LOAD]](s32)
+    ; CHECK: $vgpr0_vgpr1 = COPY [[ANYEXT]](s64)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s64) = G_LOAD %0 :: (load 4, addrspace 1, align 4)
+
+    $vgpr0_vgpr1 = COPY %1
+...
+
+---
+name: test_ext_load_global_s128_from_4_align4
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1
+
+    ; CHECK-LABEL: name: test_ext_load_global_s128_from_4_align4
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p1) :: (load 4, addrspace 1)
+    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s128) = G_ANYEXT [[LOAD]](s32)
+    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[ANYEXT]](s128)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s128) = G_LOAD %0 :: (load 4, addrspace 1, align 4)
+
+    $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %1
+...

test/CodeGen/AMDGPU/GlobalISel/legalize-store.mir

[114 unchanged lines hidden; context: bb.0:]
     ; CHECK-LABEL: name: test_store_global_v3s32
     ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
     ; CHECK: [[COPY1:%[0-9]+]]:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4
     ; CHECK: G_STORE [[COPY1]](<3 x s32>), [[COPY]](p1) :: (store 12, align 4, addrspace 1)
     %0:_(p1) = COPY $vgpr0_vgpr1
     %1:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4
     G_STORE %1, %0 :: (store 12, align 4, addrspace 1)
 ...
+
+---
+name: test_truncestore_global_s64_to_s8
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
+
+    ; CHECK-LABEL: name: test_truncestore_global_s64_to_s8
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
+    ; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64)
+    ; CHECK: G_STORE [[TRUNC]](s32), [[COPY]](p1) :: (store 1, addrspace 1)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s64) = COPY $vgpr2_vgpr3
+    G_STORE %1, %0 :: (store 1, addrspace 1)
+...
+
+---
+name: test_truncestore_global_s64_to_s16
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
+
+    ; CHECK-LABEL: name: test_truncestore_global_s64_to_s16
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
+    ; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64)
+    ; CHECK: G_STORE [[TRUNC]](s32), [[COPY]](p1) :: (store 1, addrspace 1)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s64) = COPY $vgpr2_vgpr3
+    G_STORE %1, %0 :: (store 1, addrspace 1)
+...
+
+---
+name: test_truncestore_global_s64_to_s32
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
+
+    ; CHECK-LABEL: name: test_truncestore_global_s64_to_s32
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
+    ; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s64)
+    ; CHECK: G_STORE [[TRUNC]](s32), [[COPY]](p1) :: (store 4, addrspace 1)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s64) = COPY $vgpr2_vgpr3
+    G_STORE %1, %0 :: (store 4, addrspace 1)
+...
+
+---
+name: test_truncestore_global_s128_to_s16
+body: |
+  bb.0:
+    liveins: $vgpr0_vgpr1, $vgpr2_vgpr3_vgpr4_vgpr5
+
+    ; CHECK-LABEL: name: test_truncestore_global_s128_to_s16
+    ; CHECK: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
+    ; CHECK: [[COPY1:%[0-9]+]]:_(s128) = COPY $vgpr2_vgpr3_vgpr4_vgpr5
+    ; CHECK: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY1]](s128)
+    ; CHECK: G_STORE [[TRUNC]](s32), [[COPY]](p1) :: (store 1, addrspace 1)
+    %0:_(p1) = COPY $vgpr0_vgpr1
+    %1:_(s128) = COPY $vgpr2_vgpr3_vgpr4_vgpr5
+    G_STORE %1, %0 :: (store 1, addrspace 1)
+...
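
The store tests above rely on a G_TRUNC to s32 followed by a store of the original memory size writing the same bits as the original wide store. A minimal standalone sketch of that reasoning follows. It assumes a truncating store writes the low-order `memBytes` bytes of its operand; `storedLowBits` and `truncTo32` are hypothetical helpers, not LLVM API.

```cpp
#include <cstdint>

// Assumption: a store with a memory size smaller than its operand writes the
// low-order memBytes bytes of the value. This returns those bits.
constexpr std::uint64_t storedLowBits(std::uint64_t value, unsigned memBytes) {
  return memBytes >= 8
             ? value
             : (value & ((std::uint64_t{1} << (8 * memBytes)) - 1));
}

// Models G_TRUNC from s64 to s32.
constexpr std::uint32_t truncTo32(std::uint64_t v) {
  return static_cast<std::uint32_t>(v);
}

constexpr std::uint64_t kValue = 0x1122334455667788ULL;

// For 1-, 2- and 4-byte accesses, truncating first changes nothing stored.
static_assert(storedLowBits(kValue, 1) == storedLowBits(truncTo32(kValue), 1), "");
static_assert(storedLowBits(kValue, 2) == storedLowBits(truncTo32(kValue), 2), "");
static_assert(storedLowBits(kValue, 4) == storedLowBits(truncTo32(kValue), 4), "");
// The equivalence breaks once the access is wider than the narrow type.
static_assert(storedLowBits(kValue, 8) != storedLowBits(truncTo32(kValue), 8), "");
```

In this model the rewrite is only sound while the memory access is no wider than the narrow type, which matches the cases covered by the tests (1-byte and 4-byte stores through an s32 value).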