This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Optimize adjacent s_nop instructions
ClosedPublic

Authored by • tstellarAMD on Mar 30 2016, 8:21 AM.

Details

Reviewers

nhaehnle
arsenm

Summary

Combine adjacent s_nop instructions by using the operand that encodes how
long to wait. This is somewhat distasteful, since it would be better to just
emit s_nop with the right argument in the first place. That would require
changing TII::insertNoop to emit s_nop with an operand N, which would be easy.
Slightly more problematic is that the post-RA scheduler and hazard recognizer
represent nops as a single null node, so they would need
another way of representing N nops.

Patch by: Matt Arsenault
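
The arithmetic behind the combine can be sketched in isolation. The block below is a minimal illustration, not code from the patch: mergeNopImms and MaxNopCycles are hypothetical names, the 8-cycle limit is taken from the bound checked in the diff, and std::optional assumes C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical constant: the largest wait a single s_nop is assumed to
// encode, matching the "<= 8" check in the diff.
constexpr unsigned MaxNopCycles = 8;

// Hypothetical helper: given the immediates of two adjacent s_nop
// instructions, return the immediate of a single equivalent s_nop, or
// std::nullopt if the combined wait does not fit in one instruction.
std::optional<uint8_t> mergeNopImms(uint8_t Imm0, uint8_t Imm1) {
  assert(Imm0 < MaxNopCycles && Imm1 < MaxNopCycles && "immediate out of range");

  // The immediate is biased by one: an immediate of 0 waits 1 cycle.
  unsigned Cycles = (Imm0 + 1u) + (Imm1 + 1u);
  if (Cycles > MaxNopCycles)
    return std::nullopt;

  // Convert the cycle count back to the biased encoding.
  return static_cast<uint8_t>(Cycles - 1);
}
```

For example, immediates 2 and 3 stand for 3 and 4 cycles, so the merged instruction waits 7 cycles and carries immediate 6. Immediates 7 and 0 would need 9 cycles, so the pair is left alone.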

Diff Detail

Event Timeline

• tstellarAMD updated this revision to Diff 52064. Mar 30 2016, 8:21 AM

• tstellarAMD retitled this revision to AMDGPU/SI: Optimize adjacent s_nop instructions.

• tstellarAMD updated this object.

• tstellarAMD added a reviewer: arsenm.

• tstellarAMD added a subscriber: llvm-commits.

Herald added a subscriber: arsenm. Mar 30 2016, 8:21 AM

Ping.

Looks mostly good, just one comment.

lib/Target/AMDGPU/SIShrinkInstructions.cpp
260–261	I think you need to guard against the case where Next == MBB.end() - in that case, NextMI.getOpcode() (or I guess technically already the assignment to NextMI) seems to invoke undefined behavior.

Fix possible undefined behavior.
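
One way such a guard could look is sketched below on a toy model. This is an assumption-laden illustration, not the change that landed: Inst, Opcode and combineAdjacentNops are hypothetical stand-ins, and a std::list replaces the real MachineBasicBlock. The point is that Next is compared against end() before it is dereferenced.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-ins for MachineInstr and its opcode.
enum class Opcode { Nop, Other };
struct Inst {
  Opcode Op;
  unsigned Imm; // for Nop: biased wait count, 0 means 1 cycle
};

// Merge runs of adjacent nops in a block modelled as a std::list.
void combineAdjacentNops(std::list<Inst> &Block) {
  for (auto I = Block.begin(); I != Block.end();) {
    auto Next = std::next(I);

    // Guard: Next may be Block.end() on the last instruction, so test it
    // before dereferencing instead of binding a reference up front.
    if (I->Op == Opcode::Nop && Next != Block.end() &&
        Next->Op == Opcode::Nop) {
      assert(I->Imm < 8 && Next->Imm < 8 && "immediate out of range");
      unsigned Cycles = (I->Imm + 1) + (Next->Imm + 1);
      if (Cycles <= 8) {
        Next->Imm = Cycles - 1;
        // Erasing from a std::list invalidates only I; Next stays valid.
        Block.erase(I);
      }
    }
    I = Next;
  }
}
```

With a block of three 1-cycle nops, one other instruction, and a trailing 8-cycle nop, the three leading nops collapse into one nop with immediate 2, and the trailing nop is visited without reading past the end of the list.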

LGTM

This revision is now accepted and ready to land. Apr 22 2016, 3:54 PM

r267456

Revision Contents

Path	Size
lib/Target/AMDGPU/SIShrinkInstructions.cpp	24 lines

Diff 52064

lib/Target/AMDGPU/SIShrinkInstructions.cpp

(207 preceding lines not shown) bool SIShrinkInstructions::runOnMachineFunction(MachineFunction &MF) {
   for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
        BI != BE; ++BI) {

     MachineBasicBlock &MBB = *BI;
     MachineBasicBlock::iterator I, Next;
     for (I = MBB.begin(); I != MBB.end(); I = Next) {
       Next = std::next(I);
       MachineInstr &MI = *I;
+      MachineInstr &NextMI = *Next;

       // Try to use S_MOVK_I32, which will save 4 bytes for small immediates.
       if (MI.getOpcode() == AMDGPU::S_MOV_B32) {
         const MachineOperand &Src = MI.getOperand(1);

         if (Src.isImm()) {
           if (isInt<16>(Src.getImm()) && !TII->isInlineConstant(Src, 4))
             MI.setDesc(TII->get(AMDGPU::S_MOVK_I32));
(20 lines not shown) for (I = MBB.begin(); I != MBB.end(); I = Next) {
             if (ReverseImm >= -16 && ReverseImm <= 64) {
               MI.setDesc(TII->get(AMDGPU::V_BFREV_B32_e32));
               Src.setImm(ReverseImm);
               continue;
             }
           }
         }
       }
+      // Combine adjacent s_nops to use the immediate operand encoding how long
+      // to wait.
+      //
+      // s_nop N
+      // s_nop M
+      // =>
+      // s_nop (N + M)
+      if (MI.getOpcode() == AMDGPU::S_NOP &&
+          NextMI.getOpcode() == AMDGPU::S_NOP) {
+        // The instruction encodes the amount to wait with an offset of 1,
+        // i.e. 0 is wait 1 cycle. Convert both to cycles and then convert back
+        // after adding.
+        uint8_t Nop0 = MI.getOperand(0).getImm() + 1;
+        uint8_t Nop1 = NextMI.getOperand(0).getImm() + 1;
+
+        // Make sure we don't overflow the bounds.
+        if (Nop0 + Nop1 <= 8) {
+          NextMI.getOperand(0).setImm(Nop0 + Nop1 - 1);
+          MI.eraseFromParent();
+        }
+
+        continue;
+      }

       if (!TII->hasVALU32BitEncoding(MI.getOpcode()))
         continue;

       if (!canShrink(MI, TII, TRI, MRI)) {
         // Try commuting the instruction and see if that enables us to shrink
         // it.
         if (!MI.isCommutable() || !TII->commuteInstruction(&MI) ||
(96 following lines not shown)