llvm/lib/Target/AMDGPU/AMDGPUCombine.td
48–52	This is generic
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
162–164	The DAG treats this as an initial canonicalization, so the obvious codegen benefit isn't so important
166–170	Don't see why this would restrict the vector type

Pierre-vh added inline comments. Oct 13 2022, 12:46 AM

llvm/lib/Target/AMDGPU/AMDGPUCombine.td
48–52	Do you mean it should go in the generic combiner? I'm worried that if we put it there and remove the filtering on small vectors/hasScalarPackInsts, all INSERT_VECTOR_ELT instructions will become SHUFFLE_VECTOR, and some targets won't like it?

If we remove

    if (!MI.getMF()->getSubtarget<GCNSubtarget>().hasScalarPackInsts())
      return false;

    // TODO: Only on small vectors?
    LLT VecTy = MRI.getType(MI.getOperand(0).getReg());
    if (VecTy.getElementType() != LLT::scalar(16) || (VecTy.getSizeInBits() % 32) != 0)
      return false;

I would leave it in the AMDGPUCombiner. If we want to make it generic, I would at least add some safeguard so it doesn't turn every INSERT_VECTOR_ELT into a shuffle; maybe only do it for 2-element vectors? Alternatively, we could just not add it to "all_combines": we put it in the generic helper, but it's opt-in and targets have to add the combine to their pipeline.
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
166–170	Shouldn't this just be on 2-element vectors?

nhaehnle removed a subscriber: nhaehnle. Oct 13 2022, 1:29 AM

What is the motivation for this?

Can you add a comment somewhere explaining what the combine does?

Relax some restrictions on the combine and add comment to describe why the current restrictions are in place
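
The restrictions that remain in the match step after this update can be summarized in a small standalone sketch. It mirrors the checks in matchInsertVectorEltToShuffle as shown in Diff 468794, but it does not use LLVM types: `InsertEltInfo` and `wouldCombineToShuffle` are hypothetical names introduced here for illustration, and the constant-index lookup is reduced to a boolean field.

```cpp
// Hypothetical, simplified stand-in for what the match step inspects.
// None of these names exist in LLVM; they only model the conditions.
struct InsertEltInfo {
  unsigned EltSizeInBits; // size of one vector element
  unsigned NumElts;       // number of elements (fixed-width vectors only)
  bool IsScalable;        // true for scalable vectors
  bool HasConstIdx;       // true if the insert index is a known constant
};

// Returns true when the sketch would rewrite the insert as a shuffle.
bool wouldCombineToShuffle(const InsertEltInfo &I, bool HasScalarPackInsts) {
  // Sub-32-bit elements are only handled when scalar pack instructions exist.
  if (I.EltSizeInBits < 32 && !HasScalarPackInsts)
    return false;
  // Scalable vectors and vectors with more than 4 elements are skipped.
  if (I.IsScalable || I.NumElts > 4)
    return false;
  // The insert index must be a compile-time constant.
  return I.HasConstIdx;
}
```

Under these assumptions, a <2 x s16> insert is rewritten only when scalar pack instructions are available, while a <2 x s32> insert is rewritten either way, which is consistent with the GFX9PLUS/VI/CHECK split in the test file of this diff.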

Harbormaster completed remote builds in B192918: Diff 468794. Oct 19 2022, 12:07 AM

In D135145#3861774, @foad wrote:

What is the motivation for this?

Can you add a comment somewhere explaining what the combine does?

We want to use SHUFFLE_VECTOR (which is always lowered during legalization anyway) as the canonical form for this kind of operation (INSERT_VECTOR_ELT w/ a constant index on small vectors). It benefits mad_mix codegen.
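
As a rough illustration of the canonicalization described above (not the patch's code), the following standalone sketch shows how an insert at a constant index maps to a two-source shuffle mask. Plain `std::vector<int>` stands in for vector registers, and `insertEltShuffleMask` and `applyShuffle` are hypothetical names introduced here. The second source holds the inserted element in lane 0, so mask value `NumElts` selects it, following the usual convention that mask values at or above the first source's length index into the second source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the mask for "insert at constant Idx".
// Lane Idx takes element 0 of the second source (mask value NumElts);
// every other lane keeps the corresponding element of the first source.
std::vector<int> insertEltShuffleMask(unsigned NumElts, unsigned Idx) {
  std::vector<int> Mask;
  Mask.reserve(NumElts);
  for (unsigned I = 0; I < NumElts; ++I)
    Mask.push_back(I == Idx ? static_cast<int>(NumElts) : static_cast<int>(I));
  return Mask;
}

// Reference model of a two-source shuffle: mask values below A.size()
// pick from A, the rest pick from B. Assumes every mask value is in range.
std::vector<int> applyShuffle(const std::vector<int> &A,
                              const std::vector<int> &B,
                              const std::vector<int> &Mask) {
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int M : Mask) {
    std::size_t Lane = static_cast<std::size_t>(M);
    Out.push_back(Lane < A.size() ? A[Lane] : B[Lane - A.size()]);
  }
  return Out;
}
```

For a 2-element vector, indices 0 and 1 give the masks (2, 1) and (0, 2), which match the shufflemask operands in the GFX9PLUS checks of the test file in this diff. The sketch assumes `Idx < NumElts`; the out-of-range case is not modeled.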

Rebase

Harbormaster completed remote builds in B193429: Diff 469479. Oct 21 2022, 12:36 AM

foad added inline comments. Oct 24 2022, 7:29 AM

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
162	"G_SHUFFLE_VECTOR"

Comment
@arsenm please review so D134354 can land?

Harbormaster completed remote builds in B194145: Diff 470445. Oct 25 2022, 6:21 AM

ping

arsenm added inline comments. Oct 27 2022, 8:12 AM

llvm/lib/Target/AMDGPU/AMDGPUCombine.td
48–52	Yes. This is a generic combine as it is. What the target directly wants isn't necessarily the point. A larger shuffle should be legalizable to what the target does want, and is a better canonical form
48–52	By as-is I mean DAGCombiner

Rebase on D136922, make combine generic

Pierre-vh retitled this revision from "[AMDGPU][GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR" to "[GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR". Oct 28 2022, 1:17 AM

Pierre-vh edited the summary of this revision.

Pierre-vh added a parent revision: D136922: [AMDGPU][GISel] Widen s16 SHUFFLE_VECTOR where there are no scalar pack insts.

Harbormaster completed remote builds in B194850: Diff 471430. Oct 28 2022, 2:19 AM

arsenm added inline comments. Nov 1 2022, 2:36 PM

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2688–2694 ↗	(On Diff #471430)	There are cases where insert_vector_elts combine to form shuffles but you don't seem to be handling those. This looks like you're just handling basic cases that can use build_vector (which is already implemented in matchCombineInsertVecElts). I'm not following what the shuffles are adding here
2697 ↗	(On Diff #471430)	I'm not really sure why this eraseInst helper exists
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-insertvecelt-to-shufflevector.mir
2 ↗	(On Diff #471430)	Don't need -global-isel with -run-pass

Pierre-vh mentioned this in D136922: [AMDGPU][GISel] Widen s16 SHUFFLE_VECTOR where there are no scalar pack insts. Nov 2 2022, 2:10 AM

Comments + rebase

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2688–2694 ↗	(On Diff #471430)	I thought we ultimately wanted insert_vector_elt to be lowered to shuffle_vector, as the latter is easier to handle (and is needed for mad_mix selection)? Is that not the case? It's the reason why we're lowering shuffle vectors now, no? In any case, I'm not sure I understand the issue: Is it that the combine is unnecessary? (Then why are we working towards this? Why did we lower shuffle vector in the Legalizer?) Is it that the combine as-is is fine, but should be handling more (like chained insert_vector_elt)? But then, what makes it different from matchCombineInsertVecElts? Note that, IIRC, matchCombineInsertVecElts only handles chains of insert_vector_elt. If there's a single one, it doesn't touch it. This combine is targeted towards single insert_vector_elts.
2697 ↗	(On Diff #471430)	Not sure either, it's a Combiner helper. I thought it was doing some other things like notifying the observer but it really just calls MI.eraseFromParent(). I've removed this use and will propose a patch to remove it entirely.

Harbormaster completed remote builds in B195651: Diff 472543. Nov 2 2022, 2:58 AM

Not sure yet this is the right thing to do, I can resurrect the diff later if we still want to do it.

Pierre-vh mentioned this in D134354: [AMDGPU][GlobalISel] Support mad/fma_mix selection. Nov 6 2022, 11:48 PM

Diff 468794

llvm/lib/Target/AMDGPU/AMDGPUCombine.td

[... 39 lines not shown ...]
 def cvt_f32_ubyteN : GICombineRule<
   (defs root:$cvt_f32_ubyteN, cvt_f32_ubyteN_matchdata:$matchinfo),
   (match (wip_match_opcode G_AMDGPU_CVT_F32_UBYTE0,
                            G_AMDGPU_CVT_F32_UBYTE1,
                            G_AMDGPU_CVT_F32_UBYTE2,
                            G_AMDGPU_CVT_F32_UBYTE3):$cvt_f32_ubyteN,
          [{ return PostLegalizerHelper.matchCvtF32UByteN(*${cvt_f32_ubyteN}, ${matchinfo}); }]),
   (apply [{ PostLegalizerHelper.applyCvtF32UByteN(*${cvt_f32_ubyteN}, ${matchinfo}); }])>;
 
+def insert_vec_elt_to_shuffle : GICombineRule<
+  (defs root:$insertelt, unsigned_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_INSERT_VECTOR_ELT):$insertelt,
+         [{ return PreLegalizerHelper.matchInsertVectorEltToShuffle(*${insertelt}, ${matchinfo}); }]),
+  (apply [{ PreLegalizerHelper.applyInsertVectorEltToShuffle(*${insertelt}, ${matchinfo}); }])>;
+
 def clamp_i64_to_i16_matchdata : GIDefMatchData<"AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo">;
 
 def clamp_i64_to_i16 : GICombineRule<
   (defs root:$clamp_i64_to_i16, clamp_i64_to_i16_matchdata:$matchinfo),
   (match (wip_match_opcode G_TRUNC):$clamp_i64_to_i16,
          [{ return PreLegalizerHelper.matchClampI64ToI16(${clamp_i64_to_i16}, MRI, MF, ${matchinfo}); }]),
   (apply [{ PreLegalizerHelper.applyClampI64ToI16(*${clamp_i64_to_i16}, ${matchinfo}); }])>;
 
[... 48 lines not shown ...]
   (match (wip_match_opcode G_FNEG):$ffn,
          [{ return Helper.matchFoldableFneg(*${ffn}, ${matchinfo}); }]),
   (apply [{ Helper.applyFoldableFneg(*${ffn}, ${matchinfo}); }])>;
 
 // Combines which should only apply on SI/VI
 def gfx6gfx7_combines : GICombineGroup<[fcmp_select_to_fmin_fmax_legacy]>;
 
 def AMDGPUPreLegalizerCombinerHelper: GICombinerHelper<
   "AMDGPUGenPreLegalizerCombinerHelper",
-  [all_combines, clamp_i64_to_i16, foldable_fneg]> {
+  [all_combines, clamp_i64_to_i16, foldable_fneg, insert_vec_elt_to_shuffle]> {
   let DisableRuleOption = "amdgpuprelegalizercombiner-disable-rule";
   let StateClass = "AMDGPUPreLegalizerCombinerHelperState";
   let AdditionalArguments = [];
 }
 
 def AMDGPUPostLegalizerCombinerHelper: GICombinerHelper<
   "AMDGPUGenPostLegalizerCombinerHelper",
   [all_combines, gfx6gfx7_combines,
[... 15 lines not shown ...]

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp

[... 49 lines not shown ...]
 public:
   };
 
   bool matchClampI64ToI16(MachineInstr &MI, MachineRegisterInfo &MRI,
                           MachineFunction &MF,
                           ClampI64ToI16MatchInfo &MatchInfo);
 
   void applyClampI64ToI16(MachineInstr &MI,
                           const ClampI64ToI16MatchInfo &MatchInfo);
+
+  bool matchInsertVectorEltToShuffle(MachineInstr &MI, unsigned &Idx);
+  void applyInsertVectorEltToShuffle(MachineInstr &MI, unsigned &Idx);
 };
 
 bool AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16(
     MachineInstr &MI, MachineRegisterInfo &MRI, MachineFunction &MF,
     ClampI64ToI16MatchInfo &MatchInfo) {
   assert(MI.getOpcode() == TargetOpcode::G_TRUNC && "Invalid instruction!");
 
   // Try to find a pattern where an i64 value should get clamped to short.
[... 83 lines not shown ...]
   auto Med3 = B.buildInstr(
       {MinBoundaryDst.getReg(0), Bitcast.getReg(0), MaxBoundaryDst.getReg(0)},
       MI.getFlags());
 
   B.buildTrunc(MI.getOperand(0).getReg(), Med3);
 
   MI.eraseFromParent();
 }
 
+bool AMDGPUPreLegalizerCombinerHelper::matchInsertVectorEltToShuffle(
+    MachineInstr &MI, unsigned &Idx) {
+  // Transfroms a G_INSERT_VECTOR_ELT into an equivalent G_SHUFFLE_MASK if:
+  // - Scalar Pack insts are present (for <32 bits element types)
+  // - The vector has <= 4 elements.
+  // as this is a preferred canonical form of the operation.
+  //
+  // Note that both restrictions are arbitrary. Currently, it's mostly targeted
+  // towards 2x16 vectors. Restrictions could be relaxed or entirely removed in
+  // the future if codegen can handle it without causing regressions.
+
+  LLT VecTy = MRI.getType(MI.getOperand(0).getReg());
+  const unsigned EltSize = VecTy.getElementType().getSizeInBits();
+  if (EltSize < 32 &&
+      !MI.getMF()->getSubtarget<GCNSubtarget>().hasScalarPackInsts())
+    return false;
+
+  if (VecTy.isScalable() || VecTy.getNumElements() > 4)
+    return false;
+
+  Optional<ValueAndVReg> MaybeIdxVal =
+      getIConstantVRegValWithLookThrough(MI.getOperand(3).getReg(), MRI);
+  if (!MaybeIdxVal)
+    return false;
+
+  Idx = MaybeIdxVal->Value.getZExtValue();
+  return true;
+}
+
+void AMDGPUPreLegalizerCombinerHelper::applyInsertVectorEltToShuffle(
+    MachineInstr &MI, unsigned &Idx) {
+  B.setInstrAndDebugLoc(MI);
+
+  Register Ins = MI.getOperand(2).getReg();
+  Register Vec = MI.getOperand(1).getReg();
+  Register Dst = MI.getOperand(0).getReg();
+
+  LLT VecTy = MRI.getType(Dst);
+  LLT EltTy = VecTy.getElementType();
+  const unsigned NumElts = VecTy.getNumElements();
+
+  const auto Undef = MRI.createGenericVirtualRegister(EltTy);
+  B.buildUndef(Undef);
+
+  const auto OtherVec = MRI.createGenericVirtualRegister(VecTy);
+
+  SmallVector<Register, 4> Srcs;
+  Srcs.push_back(Ins);
+  for (unsigned K = 1; K < NumElts; ++K)
+    Srcs.push_back(Undef);
+
+  B.buildBuildVector(OtherVec, Srcs);
+
+  // NumElts == Ins in OtherVec
+  // 0...(NumElts-1) = Original elements
+  SmallVector<int, 4> ShuffleMask;
+  for (unsigned CurIdx = 0; CurIdx < NumElts; ++CurIdx) {
+    if (CurIdx == Idx)
+      ShuffleMask.push_back(NumElts);
+    else
+      ShuffleMask.push_back(CurIdx);
+  }
+
+  B.buildShuffleVector(Dst, Vec, OtherVec, ShuffleMask);
+  Helper.eraseInst(MI);
+}
+
 class AMDGPUPreLegalizerCombinerHelperState {
 protected:
   AMDGPUCombinerHelper &Helper;
   AMDGPUPreLegalizerCombinerHelper &PreLegalizerHelper;
 
 public:
   AMDGPUPreLegalizerCombinerHelperState(
       AMDGPUCombinerHelper &Helper,
[... 137 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/prelegalizer-combiner-insertvecelt-to-shufflevector.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -run-pass=amdgpu-prelegalizer-combiner -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=CHECK,GFX9PLUS
				# RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -run-pass=amdgpu-prelegalizer-combiner -verify-machineinstrs -o - %s | FileCheck %s --check-prefixes=CHECK,VI

				---
				name: test_v2s16_idx0
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0
				; GFX9PLUS-LABEL: name: test_v2s16_idx0
				; GFX9PLUS: liveins: $vgpr0
				; GFX9PLUS-NEXT: {{ $}}
				; GFX9PLUS-NEXT: %src:_(<2 x s16>) = COPY $vgpr0
				; GFX9PLUS-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; GFX9PLUS-NEXT: [[DEF:%[0-9]+]]:_(s16) = G_IMPLICIT_DEF
				; GFX9PLUS-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR %elt(s16), [[DEF]](s16)
				; GFX9PLUS-NEXT: %ins:_(<2 x s16>) = G_SHUFFLE_VECTOR %src(<2 x s16>), [[BUILD_VECTOR]], shufflemask(2, 1)
				; GFX9PLUS-NEXT: $vgpr0 = COPY %ins(<2 x s16>)
				; VI-LABEL: name: test_v2s16_idx0
				; VI: liveins: $vgpr0
				; VI-NEXT: {{ $}}
				; VI-NEXT: %src:_(<2 x s16>) = COPY $vgpr0
				; VI-NEXT: %idx:_(s32) = G_CONSTANT i32 0
				; VI-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; VI-NEXT: %ins:_(<2 x s16>) = G_INSERT_VECTOR_ELT %src, %elt(s16), %idx(s32)
				; VI-NEXT: $vgpr0 = COPY %ins(<2 x s16>)
				%src:_(<2 x s16>) = COPY $vgpr0
				%idx:_(s32) = G_CONSTANT i32 0
				%elt:_(s16) = G_CONSTANT i16 42
				%ins:_(<2 x s16>) = G_INSERT_VECTOR_ELT %src, %elt, %idx
				$vgpr0 = COPY %ins
				...

				---
				name: test_v2s16_idx1
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0
				; GFX9PLUS-LABEL: name: test_v2s16_idx1
				; GFX9PLUS: liveins: $vgpr0
				; GFX9PLUS-NEXT: {{ $}}
				; GFX9PLUS-NEXT: %src:_(<2 x s16>) = COPY $vgpr0
				; GFX9PLUS-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; GFX9PLUS-NEXT: [[DEF:%[0-9]+]]:_(s16) = G_IMPLICIT_DEF
				; GFX9PLUS-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR %elt(s16), [[DEF]](s16)
				; GFX9PLUS-NEXT: %ins:_(<2 x s16>) = G_SHUFFLE_VECTOR %src(<2 x s16>), [[BUILD_VECTOR]], shufflemask(0, 2)
				; GFX9PLUS-NEXT: $vgpr0 = COPY %ins(<2 x s16>)
				; VI-LABEL: name: test_v2s16_idx1
				; VI: liveins: $vgpr0
				; VI-NEXT: {{ $}}
				; VI-NEXT: %src:_(<2 x s16>) = COPY $vgpr0
				; VI-NEXT: %idx:_(s32) = G_CONSTANT i32 1
				; VI-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; VI-NEXT: %ins:_(<2 x s16>) = G_INSERT_VECTOR_ELT %src, %elt(s16), %idx(s32)
				; VI-NEXT: $vgpr0 = COPY %ins(<2 x s16>)
				%src:_(<2 x s16>) = COPY $vgpr0
				%idx:_(s32) = G_CONSTANT i32 1
				%elt:_(s16) = G_CONSTANT i16 42
				%ins:_(<2 x s16>) = G_INSERT_VECTOR_ELT %src, %elt, %idx
				$vgpr0 = COPY %ins
				...

				---
				name: test_v2s16_idx2_nofold
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0
				; CHECK-LABEL: name: test_v2s16_idx2_nofold
				; CHECK: liveins: $vgpr0
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: %ins:_(<2 x s16>) = G_IMPLICIT_DEF
				; CHECK-NEXT: $vgpr0 = COPY %ins(<2 x s16>)
				%src:_(<2 x s16>) = COPY $vgpr0
				%idx:_(s32) = G_CONSTANT i32 2
				%elt:_(s16) = G_CONSTANT i16 42
				%ins:_(<2 x s16>) = G_INSERT_VECTOR_ELT %src, %elt, %idx
				$vgpr0 = COPY %ins
				...

				---
				name: test_v3s16_idx2
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1_vgpr2
				; GFX9PLUS-LABEL: name: test_v3s16_idx2
				; GFX9PLUS: liveins: $vgpr0_vgpr1_vgpr2
				; GFX9PLUS-NEXT: {{ $}}
				; GFX9PLUS-NEXT: %src:_(<3 x s32>) = COPY $vgpr0_vgpr1_vgpr2
				; GFX9PLUS-NEXT: %truncsrc:_(<3 x s16>) = G_TRUNC %src(<3 x s32>)
				; GFX9PLUS-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; GFX9PLUS-NEXT: [[DEF:%[0-9]+]]:_(s16) = G_IMPLICIT_DEF
				; GFX9PLUS-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<3 x s16>) = G_BUILD_VECTOR %elt(s16), [[DEF]](s16), [[DEF]](s16)
				; GFX9PLUS-NEXT: %ins:_(<3 x s16>) = G_SHUFFLE_VECTOR %truncsrc(<3 x s16>), [[BUILD_VECTOR]], shufflemask(0, 1, 3)
				; GFX9PLUS-NEXT: %zextins:_(<3 x s32>) = G_ZEXT %ins(<3 x s16>)
				; GFX9PLUS-NEXT: $vgpr0_vgpr1_vgpr2 = COPY %zextins(<3 x s32>)
				; VI-LABEL: name: test_v3s16_idx2
				; VI: liveins: $vgpr0_vgpr1_vgpr2
				; VI-NEXT: {{ $}}
				; VI-NEXT: %src:_(<3 x s32>) = COPY $vgpr0_vgpr1_vgpr2
				; VI-NEXT: %truncsrc:_(<3 x s16>) = G_TRUNC %src(<3 x s32>)
				; VI-NEXT: %idx:_(s32) = G_CONSTANT i32 2
				; VI-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; VI-NEXT: %ins:_(<3 x s16>) = G_INSERT_VECTOR_ELT %truncsrc, %elt(s16), %idx(s32)
				; VI-NEXT: %zextins:_(<3 x s32>) = G_ZEXT %ins(<3 x s16>)
				; VI-NEXT: $vgpr0_vgpr1_vgpr2 = COPY %zextins(<3 x s32>)
				%src:_(<3 x s32>) = COPY $vgpr0_vgpr1_vgpr2
				%truncsrc:_(<3 x s16>) = G_TRUNC %src
				%idx:_(s32) = G_CONSTANT i32 2
				%elt:_(s16) = G_CONSTANT i16 42
				%ins:_(<3 x s16>) = G_INSERT_VECTOR_ELT %truncsrc, %elt, %idx
				%zextins:_(<3 x s32>) = G_ZEXT %ins
				$vgpr0_vgpr1_vgpr2 = COPY %zextins
				...

				---
				name: test_v2s32_idx1
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1
				; CHECK-LABEL: name: test_v2s32_idx1
				; CHECK: liveins: $vgpr0_vgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: %src:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; CHECK-NEXT: %elt:_(s32) = G_CONSTANT i32 42
				; CHECK-NEXT: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
				; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR %elt(s32), [[DEF]](s32)
				; CHECK-NEXT: %ins:_(<2 x s32>) = G_SHUFFLE_VECTOR %src(<2 x s32>), [[BUILD_VECTOR]], shufflemask(0, 2)
				; CHECK-NEXT: $vgpr0_vgpr1 = COPY %ins(<2 x s32>)
				%src:_(<2 x s32>) = COPY $vgpr0_vgpr1
				%idx:_(s32) = G_CONSTANT i32 1
				%elt:_(s32) = G_CONSTANT i32 42
				%ins:_(<2 x s32>) = G_INSERT_VECTOR_ELT %src, %elt, %idx
				$vgpr0_vgpr1 = COPY %ins
				...

				---
				name: test_v4s16_idx3
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1
				; GFX9PLUS-LABEL: name: test_v4s16_idx3
				; GFX9PLUS: liveins: $vgpr0_vgpr1
				; GFX9PLUS-NEXT: {{ $}}
				; GFX9PLUS-NEXT: %src:_(<4 x s16>) = COPY $vgpr0_vgpr1
				; GFX9PLUS-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; GFX9PLUS-NEXT: [[DEF:%[0-9]+]]:_(s16) = G_IMPLICIT_DEF
				; GFX9PLUS-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s16>) = G_BUILD_VECTOR %elt(s16), [[DEF]](s16), [[DEF]](s16), [[DEF]](s16)
				; GFX9PLUS-NEXT: %ins:_(<4 x s16>) = G_SHUFFLE_VECTOR %src(<4 x s16>), [[BUILD_VECTOR]], shufflemask(0, 1, 2, 4)
				; GFX9PLUS-NEXT: $vgpr0_vgpr1 = COPY %ins(<4 x s16>)
				; VI-LABEL: name: test_v4s16_idx3
				; VI: liveins: $vgpr0_vgpr1
				; VI-NEXT: {{ $}}
				; VI-NEXT: %src:_(<4 x s16>) = COPY $vgpr0_vgpr1
				; VI-NEXT: %idx:_(s32) = G_CONSTANT i32 3
				; VI-NEXT: %elt:_(s16) = G_CONSTANT i16 42
				; VI-NEXT: %ins:_(<4 x s16>) = G_INSERT_VECTOR_ELT %src, %elt(s16), %idx(s32)
				; VI-NEXT: $vgpr0_vgpr1 = COPY %ins(<4 x s16>)
				%src:_(<4 x s16>) = COPY $vgpr0_vgpr1
				%idx:_(s32) = G_CONSTANT i32 3
				%elt:_(s16) = G_CONSTANT i16 42
				%ins:_(<4 x s16>) = G_INSERT_VECTOR_ELT %src, %elt, %idx
				$vgpr0_vgpr1 = COPY %ins
				...

This is an archive of the discontinued LLVM Phabricator instance.

[GISel] Combine G_INSERT_VECTOR_ELT to G_SHUFFLE_VECTOR
Abandoned, Public
