This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Optimize adjacent s_nop instructions
Closed, Public

Authored by tstellarAMD on Mar 30 2016, 8:21 AM.

Details

Reviewers

nhaehnle
arsenm

Summary

Combine adjacent s_nop instructions, using the immediate operand to encode
how long to wait. This is somewhat distasteful, since it would be better to
just emit s_nop with the right argument in the first place. That would require
changing TII::insertNoop so it can emit a nop with operand N, which would be easy.
Slightly more problematic is that the post-RA scheduler and hazard recognizer
represent nops as a single null node, so they would need another way of
representing N nops.

Patch by: Matt Arsenault
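
For reference, the alternative the summary describes (emitting s_nop with the right operand up front) could look roughly like the sketch below. It is standalone C++, not LLVM code: `nopOperandsFor` and `MaxCyclesPerNop` are hypothetical names, and the limit of 8 cycles per s_nop is an assumption taken from the `<= 8` bound in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: one s_nop waits at most 8 cycles (operand 0..7), matching the
// "<= 8" bound checked in this patch.
constexpr unsigned MaxCyclesPerNop = 8;

// Hypothetical helper, not part of LLVM: returns the s_nop operands needed to
// wait Cycles cycles, using as few instructions as possible.
std::vector<uint8_t> nopOperandsFor(unsigned Cycles) {
  std::vector<uint8_t> Operands;
  while (Cycles > 0) {
    unsigned Chunk = std::min(Cycles, MaxCyclesPerNop);
    assert(Chunk >= 1 && Chunk <= MaxCyclesPerNop);
    // The operand is the cycle count minus one: operand 0 waits 1 cycle.
    Operands.push_back(static_cast<uint8_t>(Chunk - 1));
    Cycles -= Chunk;
  }
  return Operands;
}
```

Under these assumptions, a request for 10 cycles yields two operands, 7 and 1, instead of ten separate single-cycle nops.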

Event Timeline

• tstellarAMD updated this revision to Diff 52064. Mar 30 2016, 8:21 AM

• tstellarAMD retitled this revision to AMDGPU/SI: Optimize adjacent s_nop instructions.

• tstellarAMD updated this object.

• tstellarAMD added a reviewer: arsenm.

• tstellarAMD added a subscriber: llvm-commits.

Herald added a subscriber: arsenm. Mar 30 2016, 8:21 AM

Ping.

Looks mostly good, just one comment.

lib/Target/AMDGPU/SIShrinkInstructions.cpp
Lines 251–252 (nhaehnle): I think you need to guard against the case where Next == MBB.end(). In that case, NextMI.getOpcode() (or, I guess, technically already the assignment to NextMI) seems to invoke undefined behavior.

Fix possible undefined behavior.
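
The hazard is easy to reproduce outside LLVM. The following is a minimal sketch, assuming a `std::list` of opcodes as a stand-in for a MachineBasicBlock; `Opcode` and `isFollowedByNop` are hypothetical names used only for illustration. The point is the order of the tests: the comparison against `end()` has to come before any dereference of the successor iterator.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in for instruction opcodes; not LLVM's enum.
enum Opcode { OtherOp, NopOp };

// Returns true when the element at I and its successor are both NopOp.
// Next != Block.end() is tested first; because && short-circuits, *Next is
// never evaluated when I is the last element, so end() is not dereferenced.
bool isFollowedByNop(const std::list<Opcode> &Block,
                     std::list<Opcode>::const_iterator I) {
  assert(I != Block.end() && "I must point at a real element");
  auto Next = std::next(I);
  return *I == NopOp && Next != Block.end() && *Next == NopOp;
}
```

Calling this on the last element of a block simply returns false. Without the middle test, the same call would dereference the end iterator.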

LGTM

This revision is now accepted and ready to land. Apr 22 2016, 3:54 PM

r267456

Revision Contents

Path: lib/Target/AMDGPU/SIShrinkInstructions.cpp
Size: 26 lines

Diff 54679

lib/Target/AMDGPU/SIShrinkInstructions.cpp

[235 lines not shown; enclosing loop: for (I = MBB.begin(); I != MBB.end(); I = Next) {]
             if (ReverseImm >= -16 && ReverseImm <= 64) {
               MI.setDesc(TII->get(AMDGPU::V_BFREV_B32_e32));
               Src.setImm(ReverseImm);
               continue;
             }
           }
         }
       }
+      // Combine adjacent s_nops to use the immediate operand encoding how long
+      // to wait.
+      //
+      // s_nop N
+      // s_nop M
+      // =>
+      // s_nop (N + M)
+      if (MI.getOpcode() == AMDGPU::S_NOP &&
+          Next != MBB.end() &&
+          (*Next).getOpcode() == AMDGPU::S_NOP) {
+
+        MachineInstr &NextMI = *Next;
+        // The instruction encodes the amount to wait with an offset of 1,
+        // i.e. 0 is wait 1 cycle. Convert both to cycles and then convert back
+        // after adding.
+        uint8_t Nop0 = MI.getOperand(0).getImm() + 1;
+        uint8_t Nop1 = NextMI.getOperand(0).getImm() + 1;
+
+        // Make sure we don't overflow the bounds.
+        if (Nop0 + Nop1 <= 8) {
+          NextMI.getOperand(0).setImm(Nop0 + Nop1 - 1);
+          MI.eraseFromParent();
+        }
+
+        continue;
+      }

       // FIXME: We also need to consider movs of constant operands since
       // immediate operands are not folded if they have more than one use, and
       // the operand folding pass is unaware if the immediate will be free since
       // it won't know if the src == dest constraint will end up being
       // satisfied.
       if (MI.getOpcode() == AMDGPU::S_ADD_I32 ||
           MI.getOpcode() == AMDGPU::S_MUL_I32) {
[137 lines not shown]
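
To make the arithmetic in the added hunk easier to check, here is a small standalone model of the same combine. It is a sketch, not the LLVM implementation: `Instr`, `combineAdjacentNops`, and `MaxNopCycles` are hypothetical, a `std::list` stands in for the basic block, and the limit of 8 cycles is assumed from the bound in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>

// Hypothetical miniature instruction; not LLVM's MachineInstr.
struct Instr {
  bool IsNop;   // true for s_nop, false for anything else
  uint8_t Imm;  // s_nop operand, which encodes (cycles - 1)
};

// Assumed cap on a single s_nop, taken from the "<= 8" check in the patch.
constexpr unsigned MaxNopCycles = 8;

// Folds each s_nop into the s_nop that follows it when the combined wait
// still fits in one instruction; otherwise leaves both in place.
void combineAdjacentNops(std::list<Instr> &Block) {
  for (auto I = Block.begin(), Next = I; I != Block.end(); I = Next) {
    Next = std::next(I);
    // Check for end() before touching *Next.
    if (!I->IsNop || Next == Block.end() || !Next->IsNop)
      continue;
    assert(I->Imm < MaxNopCycles && Next->Imm < MaxNopCycles &&
           "s_nop operand outside the assumed 0..7 range");
    // Convert operands to cycles, add, then convert back to an operand.
    unsigned Cycles0 = I->Imm + 1u;
    unsigned Cycles1 = Next->Imm + 1u;
    if (Cycles0 + Cycles1 <= MaxNopCycles) {
      Next->Imm = static_cast<uint8_t>(Cycles0 + Cycles1 - 1);
      Block.erase(I); // std::list::erase leaves Next valid
    }
  }
}
```

Because the merged count is written into the following instruction, a run of three or more nops keeps folding forward on later iterations, as long as the running total stays within the limit.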