This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1.
ClosedPublic

Authored by alex-t on Aug 27 2018, 11:33 AM.

Download Raw Diff

Details

Reviewers

Summary

This patch contains changes that are necessary to handle new patterns that will appear when the main patch for divergence driven ISel will be applied. We need this change before to avoid breaking tests.

Testing: lit tests Codegen/AMDGPU passed.

Diff Detail

Event Timeline

alex-t created this revision.Aug 27 2018, 11:33 AM

Herald added subscribers: t-tye, tpr, dstuttard and 6 others. · View Herald TranscriptAug 27 2018, 11:33 AM

alex-t retitled this revision from [AMDGPU] Preliminary patch for divergence driven instruction selection to [AMDGPU] Preliminary patch for divergence driven instruction selection. NFC..Aug 27 2018, 11:34 AM

alex-t edited the summary of this revision. (Show Details)

alex-t added a reviewer: rampitec.

Needs tests and comments.

I think I know what cases you are trying to solve, and there's usually a better way. Do you have a specific example?

lib/Target/AMDGPU/SIFoldOperands.cpp
387	Commented out code

rampitec added inline comments.Aug 27 2018, 2:27 PM

lib/Target/AMDGPU/SIFoldOperands.cpp
385	80 chars per line please.

These are 3 different changes. You can also write mir tests for these even if it does not appear in current codegen.

Change has been split to several separate.
MIR test added

In D51316#1214831, @arsenm wrote:

Needs tests and comments.

I think I know what cases you are trying to solve, and there's usually a better way. Do you have a specific example?

This is NFC. It does nothing until the TD files are not changed.

My change in the TD files contain re-worked isVGPRImm predicate function.
It extends the set of the immediate operands selected to S_MOV. The goal is to save VGPRs.
If it detects that scalar operand is not valid in current position it asks SIInstrInfo if the user is commutable and if the scalar operand will be valid when commuted.
If so it selects to S_MOV.

The aim of the change is to workaround the COPY inserted by the generic ISel code.
It inserts the COPY SGPR to VGPR because it does not aware that we will commute further.

It is not really an NFC. It changes codegen even if you cannot forge such an IR right now. You cannot write a test for an NFC. NFC is something like changing comment or formatting, or changing a vector with list. Also NFC does not usually need a review. Please change subject.

Also fix formatting.

test/CodeGen/AMDGPU/fold-imm-copy.mir
6	I do not see this copy even with current trunk, so the test needs to be different, at least test checks.

alex-t added inline comments.Aug 28 2018, 10:20 AM

test/CodeGen/AMDGPU/fold-imm-copy.mir
6	That is correct. You cannot see the COPY since it is created by general ISel code with new isVGPRImm predicate function. Current trunk contains old one. So no COPY so far. The test checks that the SREG is folded. When I submit the rest of the changes it won't need to be updated.

rampitec added inline comments.Aug 28 2018, 10:39 AM

test/CodeGen/AMDGPU/fold-imm-copy.mir
6	This does not sound relevant. This is a mir test and ISel is not involved, you can create any artificial MIR you want to handle.

alex-t added inline comments.Aug 29 2018, 4:25 AM

test/CodeGen/AMDGPU/fold-imm-copy.mir
6	MIR test was generated by the new ISel and contains COPY: line 32: %9:vgpr_32 = COPY %8 Thus, the aim of the check is that SIFoldOperands with the change is able to fold the SGPR with the COPY in between.

alex-t retitled this revision from [AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1. NFC. to [AMDGPU] Preliminary patch for divergence driven instruction selection. Operands Folding 1..Aug 29 2018, 4:26 AM

rampitec added inline comments.Aug 29 2018, 8:26 AM

test/CodeGen/AMDGPU/fold-imm-copy.mir
6	The problem is there is no copy even without the optimization. There is v_mov_b32 instruction though. Did you mean to check that mov was eliminated?

Formatting. Test corrected.

test/CodeGen/AMDGPU/fold-imm-copy.mir
6	Oh... Yes. I understand :) No COPY after folding but V_MOV instead. V_MOV will be cleared by DCE further if no uses that require VGPR exist. The GCN-NOT COPY is odd.

LGTM

This revision is now accepted and ready to land.Aug 29 2018, 11:32 AM

arsenm added inline comments.Aug 29 2018, 10:06 PM

test/CodeGen/AMDGPU/fold-imm-copy.mir
10–19	You should be able to drop the registers section

'registers' section dropped in test

comitted: r341068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341068 91177308-0d34-0410-b5e6-96231b3b80d8

alex-t closed this revision.Sep 13 2018, 5:52 AM

Revision Contents

Path

Size

lib/

Target/

AMDGPU/

SIFoldOperands.cpp

27 lines

test/

CodeGen/

AMDGPU/

fold-imm-copy.mir

22 lines

Diff 163303

lib/Target/AMDGPU/SIFoldOperands.cpp

Context not available.

	bool FoldingImm = OpToFold.isImm();	bool FoldingImm = OpToFold.isImm();

	// In order to fold immediates into copies, we need to change the
	// copy to a MOV.
	if (FoldingImm && UseMI->isCopy()) {	if (FoldingImm && UseMI->isCopy()) {
	unsigned DestReg = UseMI->getOperand(0).getReg();	unsigned DestReg = UseMI->getOperand(0).getReg();
	const TargetRegisterClass *DestRC	const TargetRegisterClass *DestRC
Context not available.
	MRI->getRegClass(DestReg) :	MRI->getRegClass(DestReg) :
	TRI->getPhysRegClass(DestReg);	TRI->getPhysRegClass(DestReg);

		unsigned SrcReg = UseMI->getOperand(1).getReg();
		if (TargetRegisterInfo::isVirtualRegister(DestReg) &&
		TargetRegisterInfo::isVirtualRegister(SrcReg)) {
		const TargetRegisterClass * SrcRC = MRI->getRegClass(SrcReg);
		if (TRI->isSGPRClass(SrcRC) && TRI->hasVGPRs(DestRC)) {
		MachineRegisterInfo::use_iterator NextUse;
		SmallVector<FoldCandidate, 4> CopyUses;
		for (MachineRegisterInfo::use_iterator
		Use = MRI->use_begin(DestReg), E = MRI->use_end();
		Use != E; Use = NextUse) {
		NextUse = std::next(Use);
		FoldCandidate FC = FoldCandidate(Use->getParent(),
		Use.getOperandNo(), &UseMI->getOperand(1));
		CopyUses.push_back(FC);
		rampitecUnsubmitted Done Reply Inline Actions 80 chars per line please. rampitec: 80 chars per line please.
		}
		for (auto & F : CopyUses) {
		arsenmUnsubmitted Done Reply Inline Actions Commented out code arsenm: Commented out code
		foldOperand(*F.OpToFold, F.UseMI, F.UseOpNo,
		FoldList, CopiesToReplace);
		}
		}
		}

		// In order to fold immediates into copies, we need to change the
		// copy to a MOV.

	unsigned MovOp = TII->getMovOpcode(DestRC);	unsigned MovOp = TII->getMovOpcode(DestRC);
	if (MovOp == AMDGPU::COPY)	if (MovOp == AMDGPU::COPY)
	return;	return;
Context not available.

test/CodeGen/AMDGPU/fold-imm-copy.mir

This file was added.

				# RUN: llc -march=amdgcn -run-pass si-fold-operands %s -o - \| FileCheck -check-prefix=GCN %s

				# GCN-LABEL: name: fold-imm-copy
				# GCN: [[SREG:%[0-9+]]]:sreg_32_xm0 = S_MOV_B32 65535
				# GCN: V_AND_B32_e32 [[SREG]]

				rampitecUnsubmitted Not Done Reply Inline Actions I do not see this copy even with current trunk, so the test needs to be different, at least test checks. rampitec: I do not see this copy even with current trunk, so the test needs to be different, at least…
				alex-tAuthorUnsubmitted Not Done Reply Inline Actions That is correct. You cannot see the COPY since it is created by general ISel code with new isVGPRImm predicate function. Current trunk contains old one. So no COPY so far. The test checks that the SREG is folded. When I submit the rest of the changes it won't need to be updated. alex-t: That is correct. You cannot see the COPY since it is created by general ISel code with new…
				rampitecUnsubmitted Not Done Reply Inline Actions This does not sound relevant. This is a mir test and ISel is not involved, you can create any artificial MIR you want to handle. rampitec: This does not sound relevant. This is a mir test and ISel is not involved, you can create any…
				alex-tAuthorUnsubmitted Not Done Reply Inline Actions MIR test was generated by the new ISel and contains COPY: line 32: %9:vgpr_32 = COPY %8 Thus, the aim of the check is that SIFoldOperands with the change is able to fold the SGPR with the COPY in between. alex-t: MIR test was generated by the new ISel and contains COPY: line 32: %9:vgpr_32 = COPY %8 Thus…
				rampitecUnsubmitted Not Done Reply Inline Actions The problem is there is no copy even without the optimization. There is v_mov_b32 instruction though. Did you mean to check that mov was eliminated? rampitec: The problem is there is no copy even without the optimization. There is v_mov_b32 instruction…
				alex-tAuthorUnsubmitted Done Reply Inline Actions Oh... Yes. I understand :) No COPY after folding but V_MOV instead. V_MOV will be cleared by DCE further if no uses that require VGPR exist. The GCN-NOT COPY is odd. alex-t: Oh... Yes. I understand :) No COPY after folding but V_MOV instead. V_MOV will be cleared by…
				name: fold-imm-copy
				body: \|
				bb.0:
				liveins: $vgpr0, $sgpr0_sgpr1
				%0:vgpr_32 = COPY $vgpr0
				%1:sgpr_64 = COPY $sgpr0_sgpr1
				%2:sreg_128 = S_LOAD_DWORDX4_IMM %1, 9, 0
				%3:sreg_32_xm0 = S_MOV_B32 2
				%4:vgpr_32 = V_LSHLREV_B32_e64 killed %3, %0, implicit $exec
				%5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				%6:vreg_64 = REG_SEQUENCE killed %4, %subreg.sub0, killed %5, %subreg.sub1
				%7:vgpr_32 = BUFFER_LOAD_DWORD_ADDR64 %6, %2, 0, 4, 0, 0, 0, implicit $exec
				%8:sreg_32_xm0 = S_MOV_B32 65535
				arsenmUnsubmitted Done Reply Inline Actions You should be able to drop the registers section arsenm: You should be able to drop the registers section
				%9:vgpr_32 = COPY %8
				%10:vgpr_32 = V_AND_B32_e32 %7, %9, implicit $exec
				...