This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: override shouldNormalizeToSelectSequence
AbandonedPublic

Authored by nhaehnle on Jul 25 2016, 3:33 AM.

Details

Reviewers

tstellarAMD
arsenm

Summary

Prefer to keep logic operations on i1 flags so that they get lowered to the
corresponding SALU instructions instead of the equivalent v_cndmask. The idea
is to get a better balance of SALU and VALU instructions, especially when
combined with https://reviews.llvm.org/D22747.

Event Timeline

nhaehnle updated this revision to Diff 65323. Jul 25 2016, 3:33 AM

nhaehnle retitled this revision from to AMDGPU: override shouldNormalizeToSelectSequence.

nhaehnle updated this object.

nhaehnle added reviewers: arsenm, tstellarAMD.

nhaehnle added a subscriber: llvm-commits.

Herald added subscribers: kzhuravl, arsenm. Jul 25 2016, 3:33 AM

I think you should try https://reviews.llvm.org/D9419 instead (which will have the same result). Unlike this change, HasMultipleConditionRegisters also impacts CodeGenPrepare. I never had time to look at the regression Tom found.

Whether or not this is really profitable depends on whether the select will be a scalar or vector select. CodeGenPrepare currently does something like this depending on HasMultipleConditionRegisters. I want to split out this code (which I also think is redundant with FlattenCFG) into a utility, and then have AMDGPUCodeGenPrepare decide to do this based on DivergenceAnalysis. The problem now is that we never use scalar selects, so that codegen problem should probably be fixed first.

D9419 doesn't have the same result, actually, but your point about scalar selects makes a lot of sense, so I'm dropping this for now.

In D22748#495810, @nhaehnle wrote:

D9419 doesn't have the same result, actually, but your point about scalar selects makes a lot of sense, so I'm dropping this for now.

How does it differ? Does your testcase change?

Yes. With shouldNormalizeToSelectSequence = false, the test cases generate

v_cmp
v_cmp
s_and/s_or
v_cndmask

With shouldNormalizeToSelectSequence = true (i.e., the default), they generate

v_cmp
v_cmp
v_cndmask
v_cndmask

setHasMultipleConditionRegisters doesn't make a difference.

I believe the first sequence is better, since we usually don't use many SALU instructions, and the s_and/s_or gives us a better balance between SALU and VALU. But of course that only applies when the select operates on VGPRs...
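
To make the two listings above concrete, here is a small toy model in plain C++ (not LLVM code). It assumes the normalized form is the usual chain of two selects, which matches the two v_cndmask instructions in the second listing; all function and type names are hypothetical. The instruction counts simply read v_-prefixed mnemonics as VALU and s_-prefixed ones as SALU.

```cpp
#include <cassert>

// Toy model only: none of these names exist in LLVM.

// Form kept when shouldNormalizeToSelectSequence returns false:
// combine the two i1 flags first, then perform a single select.
float selectWithFlagAnd(bool c1, bool c2, float a, float b) {
  bool cc = c1 && c2;  // corresponds to s_and_b64 in the first listing
  return cc ? a : b;   // corresponds to the single v_cndmask
}

float selectWithFlagOr(bool c1, bool c2, float a, float b) {
  bool cc = c1 || c2;  // corresponds to s_or_b64 in the first listing
  return cc ? a : b;   // corresponds to the single v_cndmask
}

// Assumed normalized form (the default): a chain of two selects.
float selectSequenceAnd(bool c1, bool c2, float a, float b) {
  float inner = c2 ? a : b;  // first v_cndmask
  return c1 ? inner : b;     // second v_cndmask
}

float selectSequenceOr(bool c1, bool c2, float a, float b) {
  float inner = c2 ? a : b;  // first v_cndmask
  return c1 ? a : inner;     // second v_cndmask
}

// Instruction mix of the two listings quoted in the thread.
struct InstrMix {
  int valu;
  int salu;
};
constexpr InstrMix kFlagLogicMix{3, 1};       // v_cmp, v_cmp, s_and/s_or, v_cndmask
constexpr InstrMix kSelectSequenceMix{4, 0};  // v_cmp, v_cmp, v_cndmask, v_cndmask

// Exhaustively check that both forms pick the same value for every
// combination of the two conditions.
bool formsAgree() {
  const float a = 1.0f, b = 2.0f;
  for (int bits = 0; bits < 4; ++bits) {
    bool c1 = (bits & 1) != 0;
    bool c2 = (bits & 2) != 0;
    if (selectWithFlagAnd(c1, c2, a, b) != selectSequenceAnd(c1, c2, a, b))
      return false;
    if (selectWithFlagOr(c1, c2, a, b) != selectSequenceOr(c1, c2, a, b))
      return false;
  }
  return true;
}
```

Both forms compute the same result and use four instructions in total; the only difference in this model is that the flag-logic form moves one of them from the VALU to the SALU, which is the balance argument made above.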

Revision Contents

Path                                     Size
lib/Target/AMDGPU/SIISelLowering.h       2 lines
lib/Target/AMDGPU/SIISelLowering.cpp     6 lines
test/CodeGen/AMDGPU/select-andor.ll      28 lines

Diff 65323

lib/Target/AMDGPU/SIISelLowering.h

 public:
   bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                          Type *Ty) const override;

   bool isTypeDesirableForOp(unsigned Op, EVT VT) const override;

   bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;

+  bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;
+
   SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
                                bool isVarArg,
                                const SmallVectorImpl<ISD::InputArg> &Ins,
                                const SDLoc &DL, SelectionDAG &DAG,
                                SmallVectorImpl<SDValue> &InVals) const override;

   SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
                       const SmallVectorImpl<ISD::OutputArg> &Outs,

lib/Target/AMDGPU/SIISelLowering.cpp

 bool
 SITargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const {
   // We can fold offsets for anything that doesn't require a GOT relocation.
   return GA->getAddressSpace() == AMDGPUAS::GLOBAL_ADDRESS &&
          !shouldEmitGOTReloc(GA->getGlobal(), getTargetMachine());
 }

+bool SITargetLowering::shouldNormalizeToSelectSequence(LLVMContext &,
+                                                       EVT) const {
+  // Prefer to keep i1 flags around so that boolean logic is done with SALU.
+  return false;
+}
+
 static SDValue buildPCRelGlobalAddress(SelectionDAG &DAG, const GlobalValue *GV,
                                        SDLoc DL, unsigned Offset, EVT PtrVT,
                                        unsigned GAFlags = SIInstrInfo::MO_NONE) {
   // In order to support pc-relative addressing, the PC_ADD_REL_OFFSET SDNode is
   // lowered to the following code sequence:
   // s_getpc_b64 s[0:1]
   // s_add_u32 s0, s0, $symbol
   // s_addc_u32 s1, s1, 0

test/CodeGen/AMDGPU/select-andor.ll

This file was added.

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s

; CHECK-LABEL: {{^}}select_and:
; CHECK: v_cmp_lt
; CHECK-NEXT: v_cmp_lt
; CHECK-NEXT: s_and_b64
; CHECK-NEXT: v_cndmask
define amdgpu_vs float @select_and(i32 %cond1, i32 %cond2, float %a, float %b) nounwind {
  %cc1 = icmp ugt i32 %cond1, 5
  %cc2 = icmp ugt i32 %cond2, 7
  %cc = and i1 %cc1, %cc2
  %sel = select i1 %cc, float %a, float %b
  ret float %sel
}

; CHECK-LABEL: {{^}}select_or:
; CHECK: v_cmp_lt
; CHECK-NEXT: v_cmp_lt
; CHECK-NEXT: s_or_b64
; CHECK-NEXT: v_cndmask
define amdgpu_vs float @select_or(i32 %cond1, i32 %cond2, float %a, float %b) nounwind {
  %cc1 = icmp ugt i32 %cond1, 5
  %cc2 = icmp ugt i32 %cond2, 7
  %cc = or i1 %cc1, %cc2
  %sel = select i1 %cc, float %a, float %b
  ret float %sel
}