This is an archive of the discontinued LLVM Phabricator instance.

[GISel] Rework trunc/shl combine in a generic trunc/shift combine
ClosedPublic

Authored by Pierre-vh on Oct 20 2022, 12:36 AM.

Details

Summary

This combine previously handled only left shifts; now it can handle right shifts as well. Right shifts are handled conservatively: they are only truncated to the size returned by TLI.

For instance, AMDGPU benefits from always lowering shifts to 32 bits, while AArch64 would rather keep them at 64 bits.
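As context for the summary: a minimal standalone C++ model, not the LLVM combine itself, of why the two shift directions differ; the function names and constants are illustrative only.

    #include <cassert>
    #include <cstdint>

    // trunc(shl(x, c)) == shl(trunc(x), c): bit i (i < 32) of the result
    // depends only on bit i - c of x, which survives the truncation, so a
    // left shift can always be narrowed to the destination width.
    uint32_t trunc_of_shl(uint64_t x, unsigned c) { return (uint32_t)(x << c); }
    uint32_t shl_of_trunc(uint64_t x, unsigned c) { return (uint32_t)x << c; }

    // trunc(lshr(x, c)): bit i of the result is bit i + c of x, so the shift
    // may only be narrowed to a type that still contains bits [c, c + 32).
    uint32_t trunc_of_lshr(uint64_t x, unsigned c) { return (uint32_t)(x >> c); }

    int main() {
      const uint64_t x = 0x0123456789ABCDEFull;
      for (unsigned c = 0; c < 32; ++c)
        assert(trunc_of_shl(x, c) == shl_of_trunc(x, c));
      // Naively narrowing the right shift to 32 bits drops bits: for c = 8,
      // bits 32..39 of x must reach the result, but a 32-bit shift loses them.
      assert(trunc_of_lshr(x, 8) != (uint32_t)x >> 8);
      return 0;
    }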

Diff Detail

Event Timeline

Pierre-vh created this revision. Oct 20 2022, 12:36 AM
Herald added a project: Restricted Project. Oct 20 2022, 12:36 AM
Pierre-vh requested review of this revision. Oct 20 2022, 12:36 AM
Pierre-vh updated this revision to Diff 469138. Oct 20 2022, 2:19 AM

Remove dead var

The generic combiner already has trunc(shl); this should go there along with it (preferably with fewer hardcoded sizes).

Pierre-vh updated this revision to Diff 469523. Oct 21 2022, 2:59 AM

Move to generic combine, rework

Pierre-vh retitled this revision from [AMDGPU][GISel] Add trunc/shr combine to [GISel] Rework trunc/shl combine in a generic trunc/shift combine. Oct 21 2022, 3:00 AM
Pierre-vh edited the summary of this revision.
foad added inline comments. Oct 21 2022, 5:27 AM
llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
415 ↗(On Diff #469524)

Interesting - this seems like an abuse of getPreferredShiftAmountTy, but I guess it works out OK in practice for the one case we care about (converting 64-bit shifts to 32-bit shifts).

arsenm added inline comments. Oct 21 2022, 10:27 AM
llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
415 ↗(On Diff #469524)

This is definitely not what getPreferredShiftAmountTy is for; it is only supposed to help produce already-legal shift RHS operands.

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2283 ↗(On Diff #469524)

This is what getPreferredShiftAmountTy is for
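For readers of the archive, a hedged sketch of that intended use, assuming the usual GlobalISel builder API; illustrative only, not the code under review.

    #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
    #include "llvm/CodeGen/TargetLowering.h"
    using namespace llvm;

    // The hook picks the type of the shift *amount* operand when building a
    // new shift; it says nothing about the type of the shifted value.
    static void buildShlWithPreferredAmtTy(MachineIRBuilder &B,
                                           const TargetLowering &TLI,
                                           Register Dst, Register Src,
                                           uint64_t Amt) {
      LLT ValTy = B.getMRI()->getType(Src);
      LLT AmtTy = TLI.getPreferredShiftAmountTy(ValTy);
      B.buildShl(Dst, Src, B.buildConstant(AmtTy, Amt));
    }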

2297 ↗(On Diff #469524)

This usage of getPreferredShiftAmountTy doesn't make sense, but you need something to pick an intermediate type here. Without adding anything, I guess you could bisect types until you found a legal shift? Creating the minimum required shift also "should" work out, but may create more legalizer and optimizer work. It would be nice if we could directly inspect the list of legal shift types, but the API doesn't have that.

Does just reusing the truncating type work as well?
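A hedged sketch of that bisection idea, with isLegalShiftAt standing in for a legality query against the target's rules; purely illustrative, not the patch.

    #include <functional>

    // Halve the width from the source type down toward the destination type,
    // keeping the narrowest width at which the shift is still legal.
    static unsigned pickIntermediateWidth(
        unsigned SrcBits, unsigned DstBits,
        const std::function<bool(unsigned)> &isLegalShiftAt) {
      unsigned Best = SrcBits; // the original shift exists, so start there
      for (unsigned W = SrcBits / 2; W >= DstBits; W /= 2) {
        if (!isLegalShiftAt(W))
          break;
        Best = W;
      }
      return Best;
    }

As the reply below notes, a purely legality-driven choice can still pick a width (e.g. 16 bits on AArch64) that is legal yet slower than the wider shift.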

Pierre-vh added inline comments. Oct 23 2022, 11:38 PM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2297 ↗(On Diff #469524)

Reusing the truncating type doesn't work as well, and trying to guess a legal one doesn't either: on AArch64, 16-bit shifts seem to be legal but are suboptimal compared to 64-bit (word-size) shifts.

My first version actually did a simplified bisection to find a good type, but AArch64 suffered, which is why I dropped it.

I think a new TLI hook would be best so targets can just choose whatever they prefer. I'll post a version with that and we can discuss there.
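A hypothetical shape for such a hook; the thread later drops this idea in favour of hardcoding 32 bits, so the class and method names here are invented for illustration.

    // Invented interface: targets would override this to say how narrow a
    // truncated right shift may become.
    class HypotheticalShiftHooks {
    public:
      virtual unsigned getPreferredTruncatedShiftWidth(unsigned SrcBits,
                                                       unsigned DstBits) const {
        return SrcBits; // conservative default: keep the original width
      }
      virtual ~HypotheticalShiftHooks() = default;
    };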

Pierre-vh marked 4 inline comments as done. Oct 24 2022, 12:50 AM
Pierre-vh added inline comments.
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2297 ↗(On Diff #469524)

Never mind; I found that AArch64 "suffers" because the combine added in D109419 gets confused by the added trunc on the right-shift operands. I'm looking into fixing it.

Pierre-vh updated this revision to Diff 470082. Oct 24 2022, 1:51 AM
  • Remove the misuse of getPreferredShiftAmountTy.
  • Only truncate right shifts to 32 bits for now. Truncating to 16 bits doesn't seem to benefit any target; I only observed regressions in our backend and no clear gains.
  • Don't truncate right shifts if the trunc has any store users, to avoid blocking the truncstore combine (a sketch of that check follows this list).
    • I looked into teaching the truncstore combine the patterns generated by this new combine, but it's a complex combine, so it's not immediately clear what needs to change. I also didn't observe any regressions caused by adding this restriction, so it's not high priority IMO.
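A hedged sketch of the store-user check from the third bullet, assuming the usual MachineRegisterInfo API; it follows GlobalISel conventions but is not the literal patch.

    #include "llvm/CodeGen/MachineRegisterInfo.h"
    #include "llvm/CodeGen/TargetOpcodes.h"
    using namespace llvm;

    // Returns true if any non-debug user of the trunc's result is a G_STORE;
    // in that case the right-shift narrowing is skipped so the truncstore
    // combine can still match the pattern.
    static bool truncHasStoreUser(Register TruncDst,
                                  const MachineRegisterInfo &MRI) {
      for (const MachineInstr &UseMI : MRI.use_nodbg_instructions(TruncDst))
        if (UseMI.getOpcode() == TargetOpcode::G_STORE)
          return true;
      return false;
    }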

Not sure why you deleted the test

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2262 ↗(On Diff #470082)

Start with lowercase letter

2313–2316 ↗(On Diff #470082)

I don't understand special-casing this; I'd rather just ignore it.

All combines are always on GISEL opcodes? There are no AMDGPU native combines?

arsenm added inline comments. Nov 16 2022, 9:18 AM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2313 ↗(On Diff #470082)

Do this first then?

All combines are always on GISEL opcodes? There are no AMDGPU native combines?

Yes. You can also define target-specific "generic" opcodes.

You can define generic opcodes in the *amdgpu* namespace that represent AMDGPU native instructions?

You can define generic opcodes in the *amdgpu* namespace that represent AMDGPU native instructions?

Yes. They are regbankselectable and have the same restrictions as G_* opcodes.

Pierre-vh updated this revision to Diff 476064. Nov 17 2022, 2:50 AM
Pierre-vh marked 3 inline comments as done.

Comments

Not sure why you deleted the test

The test cases are in the new file; not sure why it shows as deleted.

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
2313–2316 ↗(On Diff #470082)

I spent a while trying to get the truncstore combine to work properly in such cases, but it's really not easy; the matching logic is a bit complex.

There's also a comment in that combine's matcher indicating it may be moved to a separate pass in the future to improve its matching logic, so I think it's better to just leave this special case in for now, since:

  • Fixing the truncstore combine seems to be hard.
  • This is a niche case, and the special case doesn't look like it causes any harm; on the contrary, it keeps the combine from interfering with truncstore.

The detailed reason is that the truncstore combine calculates the number of stores it should merge from the type of the source it matches.

  • If we always ignore the trunc, it breaks the combine on AArch64 for 16-bit sources, as the source is always going to be a trunc of a w register (see the trunc_i16_to_i8 test case).
    • It would always take the 32-bit value as the source, but in those cases the 16-bit source is the correct one.
  • Conditionally ignoring the trunc (e.g. only when the source was matched from a shift) can cause premature matching of the combine.
    • e.g. if this (right-shift trunc) combine kicks in on trunc_i64_to_i16, it makes the shift's source 32 bits wide. We then see that we need 2 or 4 stores, depending on which source we use. As the combiner works top-down within a block, we'll find 2 stores before finding 4 of them, and the matcher has to conservatively assume it won't find more. It ends up merging 2 stores instead of 4 (see the arithmetic sketch after this list).
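To make the 2-versus-4 arithmetic concrete, an illustrative helper (not the actual matcher code):

    // The truncstore matcher sizes its merge by the width of the source it
    // matched: a 64-bit source split into 16-bit stores means 4 stores, but
    // a source already narrowed to 32 bits accounts for only 2 of them.
    unsigned storesToMerge(unsigned SrcBits, unsigned StoreBits) {
      return SrcBits / StoreBits; // storesToMerge(64, 16) == 4; (32, 16) == 2
    }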

TL;DR: I would rather leave the special case in for now, as removing it isn't worth the effort IMO (and it doesn't seem to improve codegen quality significantly either).

arsenm accepted this revision. Dec 7 2022, 8:33 AM
This revision is now accepted and ready to land. Dec 7 2022, 8:33 AM
Pierre-vh updated this revision to Diff 481177. Dec 8 2022, 12:41 AM

@arsenm I had accidentally dropped the special case that avoids folding in the presence of G_STORE users; that's why some AArch64 tests were failing, see my previous comments.
Is it still good to land even with the special case left back in?

arsenm added a comment. Dec 8 2022, 7:49 PM

@arsenm I had accidentally dropped the special case that avoids folding in the presence of G_STORE users; that's why some AArch64 tests were failing, see my previous comments.
Is it still good to land even with the special case left back in?

I forgot about that. Can you file an issue to pick it up later?

This revision was landed with ongoing or failed builds. Dec 9 2022, 1:46 AM
This revision was automatically updated to reflect the committed changes.