Details

Reviewers

foad
arsenm
Petar.Avramovic
mbrkusanin

Summary

Combine V_RCP and V_SQRT into V_RSQ on AMDGPU for GlobalISel.
A similar combiner already exists for SDAG.
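
As a rough numeric illustration of why the two operations can be fused, the sketch below compares rcp(sqrt(x)) and sqrt(rcp(x)) in plain `float` arithmetic. It is a host-side approximation only: the helper names are hypothetical, and it does not model the precision or denormal behaviour of the hardware V_RSQ instruction.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: the rcp(sqrt(x)) operand order.
inline float rcpOfSqrt(float X) { return 1.0f / std::sqrt(X); }

// Hypothetical helper: the sqrt(rcp(x)) operand order.
inline float sqrtOfRcp(float X) { return std::sqrt(1.0f / X); }

// True when both orders agree to within a few float ulps (relative error).
// The two forms are mathematically equal for x > 0 but may round differently.
inline bool agreeWithinTolerance(float X) {
  float A = rcpOfSqrt(X);
  float B = sqrtOfRcp(X);
  return std::fabs(A - B) <= 4.0f * FLT_EPSILON * std::fabs(A);
}
```

Because the two orders are only equal up to rounding, replacing either with a single approximate instruction is a fast-math style transformation, which is why the per-instruction flags come up later in the review.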

Diff Detail

Event Timeline

matejam created this revision. Sep 20 2021, 8:16 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Sep 20 2021, 8:16 AM

matejam requested review of this revision. Sep 20 2021, 8:16 AM

Herald added a subscriber: wdng. Sep 20 2021, 8:16 AM

Harbormaster completed remote builds in B124669: Diff 373594. Sep 20 2021, 9:00 AM

arsenm added inline comments. Sep 20 2021, 2:07 PM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 830: I don't understand this change. Are you saying this is a dead selection pattern for the DAG? Should we be doing this in the combiner instead and just delete this? That way we could consider the fast math flags and not rely on the function attribute.

matejam added inline comments. Sep 21 2021, 5:54 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 830: I am: with or without this pattern, SDAG combines v_sqrt + v_rcp into v_rsq. I'm not sure which would be better: leaving this as a pattern or writing a combiner for it. In fact, SDAG doesn't even need any flags to combine into v_rsq.

arsenm added inline comments. Sep 21 2021, 8:59 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 830: If this is a dead pattern in the DAG, I would just delete it. When you say without flags, I assume you mean with the unsafe attribute? I'm a bit worried this pattern is just broken as-is. This depends on the denormal mode, and also could be augmented to use the per-instruction flags. I think it's safer to move this to a combine.

matejam added inline comments. Oct 4 2021, 6:20 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 830: I tried deleting the SDAG combiner (SITargetLowering::performRcpCombine()) for v_rsq, and then SDAG uses this pattern instead. So I assume it's either this pattern without the SDAG rcp combiner, or the SDAG rcp combiner plus a new GlobalISel combiner?

Instead of a pattern, use a combiner on AMDGPU for GlobalISel.

matejam edited the summary of this revision. Oct 7 2021, 5:43 AM

Harbormaster completed remote builds in B127496: Diff 377812. Oct 7 2021, 6:20 AM

arsenm requested changes to this revision. Oct 20 2021, 4:56 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 832: Can probably delete the definition of RsqPat too.
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.mir
Lines 15–17: This looks like it lost the fast math flags.

This revision now requires changes to proceed. Oct 20 2021, 4:56 PM

Delete the RsqPat pattern definition and its uses, and copy the flags (including the fast math flags) from the original instruction to the newly built instruction.

Harbormaster completed remote builds in B131189: Diff 383014. Oct 28 2021, 6:57 AM

arsenm accepted this revision. Oct 28 2021, 7:03 AM

This revision is now accepted and ready to land. Oct 28 2021, 7:03 AM

Do we also need to handle:

sqrt(rcp(x)) as well as rcp(sqrt(x)) ?
1.0 / x as well as llvm.amdgcn.rcp(x) ?

Added implementation for all possible cases which should be combined into rsq (rcp(sqrt(x)), sqrt(rcp(x)), 1/sqrt(x), sqrt(1/x)).

Harbormaster completed remote builds in B132196: Diff 384418. Nov 3 2021, 7:31 AM

Formatting.

Harbormaster completed remote builds in B132249: Diff 384483. Nov 3 2021, 9:58 AM

Formatting.

Harbormaster completed remote builds in B132268: Diff 384511. Nov 3 2021, 11:06 AM

> Added implementation for all possible cases which should be combined into rsq (rcp(sqrt(x)), sqrt(rcp(x)), 1/sqrt(x), sqrt(1/x)).

I thought this would be two separate combines:

(1.0 / x) -> (rcp x)
(sqrt (rcp x)) or (rcp (sqrt x)) -> (rsq x)

Is there some reason we don't implement the first combine, e.g. because the precision of the rcp instruction is not good enough? What does SelectionDAG do?

In D110076#3108478, @foad wrote:

>> Added implementation for all possible cases which should be combined into rsq (rcp(sqrt(x)), sqrt(rcp(x)), 1/sqrt(x), sqrt(1/x)).
>
> I thought this would be two separate combines:
>
> (1.0 / x) -> (rcp x)
> (sqrt (rcp x)) or (rcp (sqrt x)) -> (rsq x)
>
> Is there some reason we don't implement the first combine, e.g. because the precision of the rcp instruction is not good enough? What does SelectionDAG do?

If we run an .ll test which has (1.0 / x), by the time it gets to the amdgpu-postlegalizer-combiner it will be combined into rcp, just like SDAG.
This is a 'fake' case of a .mir test, where we put the (1.0 / x) in the test and let the combiner take care of that.
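
The two-step structure discussed in this exchange can be sketched on a toy expression tree. This is a standalone illustration, not LLVM code: `Expr`, `make`, `combine` and `print` are hypothetical names, and the sketch deliberately ignores fast-math flags and legality checks, which a real combine would need to consider.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (hypothetical; not an LLVM type).
struct Expr {
  enum Kind { Var, One, FDiv, Rcp, Sqrt, Rsq } K;
  std::shared_ptr<Expr> L, R; // L: sole operand or numerator; R: denominator
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make(Expr::Kind K, ExprPtr L = nullptr, ExprPtr R = nullptr) {
  return std::make_shared<Expr>(Expr{K, std::move(L), std::move(R)});
}

// Rewrites bottom-up with two separate rules:
//   1. (fdiv 1.0, x)                      -> (rcp x)
//   2. (rcp (sqrt x)) or (sqrt (rcp x))   -> (rsq x)
ExprPtr combine(const ExprPtr &E) {
  if (!E)
    return E;
  ExprPtr L = combine(E->L), R = combine(E->R);
  if (E->K == Expr::FDiv && L && L->K == Expr::One)
    return combine(make(Expr::Rcp, R)); // rule 1, then retry rule 2
  if (E->K == Expr::Rcp && L && L->K == Expr::Sqrt)
    return make(Expr::Rsq, L->L); // rule 2, rcp on the outside
  if (E->K == Expr::Sqrt && L && L->K == Expr::Rcp)
    return make(Expr::Rsq, L->L); // rule 2, sqrt on the outside
  return make(E->K, L, R);
}

std::string print(const ExprPtr &E) {
  switch (E->K) {
  case Expr::Var:  return "x";
  case Expr::One:  return "1.0";
  case Expr::FDiv: return "(fdiv " + print(E->L) + ", " + print(E->R) + ")";
  case Expr::Rcp:  return "(rcp " + print(E->L) + ")";
  case Expr::Sqrt: return "(sqrt " + print(E->L) + ")";
  case Expr::Rsq:  return "(rsq " + print(E->L) + ")";
  }
  return "";
}
```

With the first rule applied earlier in the pipeline, as described above for the .ll case, the second rule only ever needs to recognise the rcp forms.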

foad added inline comments. Nov 25 2021, 4:41 AM

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
Line 217: I still think it's wrong to handle G_FDIV here. It's unnecessary, because we are running post-legalizer and G_FDIV will always get legalized to something else. Even if G_FDIV did appear here, I don't think it should be combined into an rsq instruction without checking for all the fast/unsafe math flags, like in AMDGPULegalizerInfo::legalizeFastUnsafeFDIV. I think we just need an IR test to check that `fdiv float 1.0, %x1` with appropriate fast math flags gets combined with `@llvm.sqrt` to generate a v_rsq instruction.

Added .ll test. Don't cover the G_FDIV + G_FSQRT case, only with rcp intrinsic (by the time it gets to the postlegalizer it will be transformed to that).

LGTM, thanks!

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
Line 234: I'm not sure whether it's best to copy flags from MI or RcpSrcMI or somehow combine both. I guess this is fine for now.
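
The open question in this comment can be made concrete with a small sketch of two candidate policies. The flag bits and function names below are hypothetical and do not correspond to LLVM's MIFlag values; only the first policy (take the root instruction's flags) reflects what the reviewed diff does via `setMIFlags(MI.getFlags())`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fast-math flag bits, for illustration only.
enum ToyFlags : uint16_t {
  FlagNone = 0,
  FlagAfn = 1 << 0,
  FlagNnan = 1 << 1,
};

// Policy in the reviewed diff: the new rsq takes the root instruction's flags.
constexpr uint16_t flagsFromRoot(uint16_t Root, uint16_t /*Inner*/) {
  return Root;
}

// A more conservative alternative (an assumption, not part of the patch):
// keep only the flags that both matched instructions carry.
constexpr uint16_t flagsIntersect(uint16_t Root, uint16_t Inner) {
  return static_cast<uint16_t>(Root & Inner);
}
```

Under the root policy, an `afn` rcp wrapping a plain G_FSQRT yields an `afn` rsq, while a plain G_FSQRT wrapping an `afn` rcp yields an rsq with no flags, which matches the two cases in combine-rsq.mir.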

Harbormaster completed remote builds in B136242: Diff 390062. Nov 26 2021, 8:36 AM

commit ca57b80cd6767b97477fd157831a2b099b5f8f75

Diff 390062

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h

Context not available.
   return UnaryOp_match<SrcTy, TargetOpcode::COPY>(std::forward<SrcTy>(Src));
 }

+template <typename SrcTy>
+inline UnaryOp_match<SrcTy, TargetOpcode::G_FSQRT> m_GFSqrt(const SrcTy &Src) {
+  return UnaryOp_match<SrcTy, TargetOpcode::G_FSQRT>(Src);
+}
+
 // General helper for generic MI compares, i.e. G_ICMP and G_FCMP
 // TODO: Allow checking a specific predicate.
 template <typename Pred_P, typename LHS_P, typename RHS_P, unsigned Opcode>
Context not available.

llvm/lib/Target/AMDGPU/AMDGPUCombine.td

Context not available.
          [{ return PostLegalizerHelper.matchUCharToFloat(*${itofp}); }]),
   (apply [{ PostLegalizerHelper.applyUCharToFloat(*${itofp}); }])>;

+
+def rcp_sqrt_to_rsq : GICombineRule<
+  (defs root:$rcp, build_fn_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_INTRINSIC, G_FSQRT):$rcp,
+         [{ return PostLegalizerHelper.matchRcpSqrtToRsq(*${rcp}, ${matchinfo}); }]),
+  (apply [{ Helper.applyBuildFn(*${rcp}, ${matchinfo}); }])>;
+

 def cvt_f32_ubyteN_matchdata : GIDefMatchData<"AMDGPUPostLegalizerCombinerHelper::CvtF32UByteMatchInfo">;

 def cvt_f32_ubyteN : GICombineRule<
Context not available.
 def AMDGPUPostLegalizerCombinerHelper: GICombinerHelper<
   "AMDGPUGenPostLegalizerCombinerHelper",
   [all_combines, gfx6gfx7_combines,
-   uchar_to_float, cvt_f32_ubyteN, remove_fcanonicalize, foldable_fneg]> {
+   uchar_to_float, cvt_f32_ubyteN, remove_fcanonicalize, foldable_fneg,
+   rcp_sqrt_to_rsq]> {
   let DisableRuleOption = "amdgpupostlegalizercombiner-disable-rule";
   let StateClass = "AMDGPUPostLegalizerCombinerHelperState";
   let AdditionalArguments = [];
Context not available.

llvm/lib/Target/AMDGPU/AMDGPUInstructions.td

Context not available.
   (RcpInst $src)
 >;

-class RsqPat<Instruction RsqInst, ValueType vt> : AMDGPUPat <
-  (AMDGPUrcp (fsqrt vt:$src)),
-  (RsqInst $src)
->;
-
 // Instructions which select to the same v_min_f*
 def fminnum_like : PatFrags<(ops node:$src0, node:$src1),
   [(fminnum_ieee node:$src0, node:$src1),
Context not available.

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp

Context not available.
 #include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
 #include "llvm/CodeGen/MachineDominators.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/Target/TargetMachine.h"

 #define DEBUG_TYPE "amdgpu-postlegalizer-combiner"
Context not available.
   bool matchUCharToFloat(MachineInstr &MI);
   void applyUCharToFloat(MachineInstr &MI);

+  bool matchRcpSqrtToRsq(MachineInstr &MI,
+                         std::function<void(MachineIRBuilder &)> &MatchInfo);
+
   // FIXME: Should be able to have 2 separate matchdatas rather than custom
   // struct boilerplate.
   struct CvtF32UByteMatchInfo {
Context not available.
   MI.eraseFromParent();
 }

+bool AMDGPUPostLegalizerCombinerHelper::matchRcpSqrtToRsq(
+    MachineInstr &MI, std::function<void(MachineIRBuilder &)> &MatchInfo) {
+
+  auto getRcpSrc = [=](const MachineInstr &MI) {
+    MachineInstr *ResMI = nullptr;
+    if (MI.getOpcode() == TargetOpcode::G_INTRINSIC &&
+        MI.getIntrinsicID() == Intrinsic::amdgcn_rcp)
+      ResMI = MRI.getVRegDef(MI.getOperand(2).getReg());
+
+    return ResMI;
+  };
+
+  auto getSqrtSrc = [=](const MachineInstr &MI) {
+    MachineInstr *SqrtSrcMI = nullptr;
+    mi_match(MI.getOperand(0).getReg(), MRI, m_GFSqrt(m_MInstr(SqrtSrcMI)));
+    return SqrtSrcMI;
+  };
+
+  MachineInstr *RcpSrcMI = nullptr, *SqrtSrcMI = nullptr;
+  // rcp(sqrt(x))
+  if ((RcpSrcMI = getRcpSrc(MI)) && (SqrtSrcMI = getSqrtSrc(*RcpSrcMI))) {
+    MatchInfo = [SqrtSrcMI, &MI](MachineIRBuilder &B) {
+      B.buildIntrinsic(Intrinsic::amdgcn_rsq, {MI.getOperand(0)}, false)
+          .addUse(SqrtSrcMI->getOperand(0).getReg())
+          .setMIFlags(MI.getFlags());
+    };
+    return true;
+  }
+
+  // sqrt(rcp(x))
+  if ((SqrtSrcMI = getSqrtSrc(MI)) && (RcpSrcMI = getRcpSrc(*SqrtSrcMI))) {
+    MatchInfo = [RcpSrcMI, &MI](MachineIRBuilder &B) {
+      B.buildIntrinsic(Intrinsic::amdgcn_rsq, {MI.getOperand(0)}, false)
+          .addUse(RcpSrcMI->getOperand(0).getReg())
+          .setMIFlags(MI.getFlags());
+    };
+    return true;
+  }
+
+  return false;
+}
+
 bool AMDGPUPostLegalizerCombinerHelper::matchCvtF32UByteN(
     MachineInstr &MI, CvtF32UByteMatchInfo &MatchInfo) {
   Register SrcReg = MI.getOperand(1).getReg();
Context not available.

llvm/lib/Target/AMDGPU/CaymanInstructions.td

Context not available.
 def COS_cm : COS_Common<0x8E>;
 } // End isVector = 1

-def : RsqPat<RECIPSQRT_IEEE_cm, f32>;
-
 def : SqrtPat<RECIPSQRT_IEEE_cm, RECIP_IEEE_cm>;

 def : POW_Common <LOG_IEEE_cm, EXP_IEEE_cm, MUL>;
Context not available.

llvm/lib/Target/AMDGPU/EvergreenInstructions.td

Context not available.
 def LOG_IEEE_eg : LOG_IEEE_Common<0x83>;
 def RECIP_CLAMPED_eg : RECIP_CLAMPED_Common<0x84>;
 def RECIPSQRT_IEEE_eg : RECIPSQRT_IEEE_Common<0x89>;
-def : RsqPat<RECIPSQRT_IEEE_eg, f32>;
 def : SqrtPat<RECIPSQRT_IEEE_eg, RECIP_IEEE_eg>;

 def SIN_eg : SIN_Common<0x8D>;
Context not available.

llvm/lib/Target/AMDGPU/R600Instructions.td

Context not available.
 defm DIV_r600 : DIV_Common<RECIP_IEEE_r600>;
 def : POW_Common <LOG_IEEE_r600, EXP_IEEE_r600, MUL>;

-def : RsqPat<RECIPSQRT_IEEE_r600, f32>;
 def : SqrtPat<RECIPSQRT_IEEE_r600, RECIP_IEEE_r600>;

 def R600_ExportSwz : ExportSwzInst {
Context not available.

llvm/lib/Target/AMDGPU/SIInstructions.td

Context not available.

 let OtherPredicates = [UnsafeFPMath] in {

-//defm : RsqPat<V_RSQ_F32_e32, f32>;
-
-def : RsqPat<V_RSQ_F32_e32, f32>;
-
 // Convert (x - floor(x)) to fract(x)
 def : GCNPat <
   (f32 (fsub (f32 (VOP3Mods f32:$x, i32:$mods)),
Context not available.

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

define amdgpu_cs float @div_sqrt(float inreg %arg1) {
; GCN-LABEL: div_sqrt:
; GCN: ; %bb.0: ; %.entry
; GCN-NEXT: v_rsq_f32_e32 v0, s0
; GCN-NEXT: ; return to shader part epilog
.entry:
  %a = call float @llvm.sqrt.f32(float %arg1)
  %b = fdiv afn float 1.000000e+00, %a
  ret float %b
}

define amdgpu_cs float @sqrt_div(float inreg %arg1) {
; GCN-LABEL: sqrt_div:
; GCN: ; %bb.0: ; %.entry
; GCN-NEXT: v_rsq_f32_e32 v0, s0
; GCN-NEXT: ; return to shader part epilog
.entry:
  %a = fdiv afn float 1.000000e+00, %arg1
  %b = call float @llvm.sqrt.f32(float %a)
  ret float %b
}

define amdgpu_cs float @rcp_sqrt(float inreg %arg1) {
; GCN-LABEL: rcp_sqrt:
; GCN: ; %bb.0: ; %.entry
; GCN-NEXT: v_rsq_f32_e32 v0, s0
; GCN-NEXT: ; return to shader part epilog
.entry:
  %a = call float @llvm.sqrt.f32(float %arg1)
  %b = call float @llvm.amdgcn.rcp.f32(float %a)
  ret float %b
}

define amdgpu_cs float @sqrt_rcp(float inreg %arg1) {
; GCN-LABEL: sqrt_rcp:
; GCN: ; %bb.0: ; %.entry
; GCN-NEXT: v_rsq_f32_e32 v0, s0
; GCN-NEXT: ; return to shader part epilog
.entry:
  %a = call float @llvm.amdgcn.rcp.f32(float %arg1)
  %b = call float @llvm.sqrt.f32(float %a)
  ret float %b
}


declare float @llvm.sqrt.f32(float)
declare float @llvm.amdgcn.rcp.f32(float)

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -global-isel -march=amdgcn -mcpu=gfx1010 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

---
name: rcp_sqrt_test
body: |
  bb.0:
    liveins: $sgpr0

    ; CHECK: $vgpr0 = COPY %3
    ; CHECK: SI_RETURN_TO_EPILOG implicit $vgpr0
    ; GCN-LABEL: name: rcp_sqrt_test
    ; GCN: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
    ; GCN: [[INT:%[0-9]+]]:_(s32) = afn G_INTRINSIC intrinsic(@llvm.amdgcn.rsq), [[COPY]](s32)
    ; GCN: $vgpr0 = COPY [[INT]](s32)
    ; GCN: SI_RETURN_TO_EPILOG implicit $vgpr0
    %0:_(s32) = COPY $sgpr0
    %2:_(s32) = G_FSQRT %0:_
    %3:_(s32) = afn G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), %2:_(s32)
    $vgpr0 = COPY %3:_(s32)
    SI_RETURN_TO_EPILOG implicit $vgpr0

...

---
name: sqrt_rcp_test
body: |
  bb.0:
    liveins: $sgpr0

    ; GCN-LABEL: name: sqrt_rcp_test
    ; GCN: [[COPY:%[0-9]+]]:_(s32) = COPY $sgpr0
    ; GCN: [[INT:%[0-9]+]]:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.rsq), [[COPY]](s32)
    ; GCN: $vgpr0 = COPY [[INT]](s32)
    ; GCN: SI_RETURN_TO_EPILOG implicit $vgpr0
    %0:_(s32) = COPY $sgpr0
    %2:_(s32) = afn G_INTRINSIC intrinsic(@llvm.amdgcn.rcp), %0:_(s32)
    %3:_(s32) = G_FSQRT %2:_
    $vgpr0 = COPY %3:_(s32)
    SI_RETURN_TO_EPILOG implicit $vgpr0

...

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Code quality: Combine V_RSQ
Closed · Public

Details

Diff Detail

Event Timeline

Revision Contents

Diff 390062

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h

llvm/lib/Target/AMDGPU/AMDGPUCombine.td

llvm/lib/Target/AMDGPU/AMDGPUInstructions.td

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp

llvm/lib/Target/AMDGPU/CaymanInstructions.td

llvm/lib/Target/AMDGPU/EvergreenInstructions.td

llvm/lib/Target/AMDGPU/R600Instructions.td

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-rsq.mir
