This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
Closed · Public

Authored by rampitec on Jun 16 2017, 1:23 PM.

Details

Reviewers

arsenm
vpykhtin

Summary

If there is an immediate operand, we must not shrink V_SUBB_U32 or V_ADDC_U32, because the result does not fit the e32 encoding.
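
The rule in the summary can be sketched in isolation. The following is a toy model, not the LLVM code: `OperandKind`, `CarryOp`, and `canShrinkCarryOp` are hypothetical names introduced for illustration. It assumes, as the two-line change in this revision does, that an immediate in src1 rules out the e32 form. The VCC requirement on the carry registers is left out, and so is operand commuting, which the addc2 test output suggests the pass performs.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for machine operands; not LLVM types.
enum class OperandKind { Register, Immediate };

struct CarryOp {
  OperandKind Src0;
  OperandKind Src1;
};

// Toy version of the check this patch adds: refuse to shrink when src1 is an
// immediate. An immediate in src0 does not block shrinking by this rule.
constexpr bool canShrinkCarryOp(const CarryOp &Op) {
  return Op.Src1 != OperandKind::Immediate;
}

// Small self-check of the two interesting operand orders.
inline void checkCarryExamples() {
  assert(canShrinkCarryOp({OperandKind::Immediate, OperandKind::Register}));
  assert(!canShrinkCarryOp({OperandKind::Register, OperandKind::Immediate}));
}
```

Only the src1 position is inspected here because that is the operand the patch queries through `getNamedOperand`.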

Diff Detail

Repository: rL LLVM

Event Timeline

rampitec created this revision. Jun 16 2017, 1:23 PM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Jun 16 2017, 1:23 PM

The same problem exists with V_ADDC_U32; I was able to reproduce it with a MIR test.

rampitec added a child revision: D34300: [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc. Jun 16 2017, 3:54 PM

rampitec added a reviewer: vpykhtin.

arsenm added inline comments. Jun 16 2017, 4:16 PM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	Can you add a test using a frame index and a global address? Those should have the same issue.

rampitec added inline comments. Jun 16 2017, 4:46 PM

lib/Target/AMDGPU/SIShrinkInstructions.cpp

I really doubt they can directly reach either of these two instructions, and I'm probably unable to forge such a source. Then if I do it in MIR...

        %vreg4<def>, %vreg5<def> = V_SUBBREV_U32_e64 <ga:@arr>, %vreg0, %vreg3, %EXEC<imp-use>; VGPR_32:%vreg4,%vreg0 SReg_64:%vreg5,%vreg3

*** Bad machine code: Illegal immediate value for operand. ***
- function:    subbrev
- basic block: BB#0  (0x58eb3e0)
- instruction: %vreg4<def>, %vreg5<def> = V_SUBBREV_U32_e64

The placement in src0 is legal; it is the global address itself that is not legal for the operand, so I guess it cannot even reach the pass.

This is the OperandInfo:

static const MCOperandInfo OperandInfo243[] = {
  { AMDGPU::VGPR_32RegClassID, 0, MCOI::OPERAND_REGISTER, 0 },
  { AMDGPU::SReg_64RegClassID, 0, MCOI::OPERAND_REGISTER, 0 },
  { AMDGPU::VS_32RegClassID, 0, AMDGPU::OPERAND_REG_INLINE_C_INT32, 0 },
  { AMDGPU::VS_32RegClassID, 0, AMDGPU::OPERAND_REG_INLINE_C_INT32, 0 },
  { AMDGPU::SReg_64RegClassID, 0, AMDGPU::OPERAND_REG_INLINE_C_INT64, 0 },
};

verifyInstruction:

case AMDGPU::OPERAND_REG_INLINE_C_INT32:
case AMDGPU::OPERAND_REG_INLINE_C_FP32:
case AMDGPU::OPERAND_REG_INLINE_C_INT64:
case AMDGPU::OPERAND_REG_INLINE_C_FP64:
case AMDGPU::OPERAND_REG_INLINE_C_INT16:
case AMDGPU::OPERAND_REG_INLINE_C_FP16: {
  const MachineOperand &MO = MI.getOperand(i);
  if (!MO.isReg() && (!MO.isImm() || !isInlineConstant(MI, i))) {
    ErrInfo = "Illegal immediate value for operand.";
    return false;
  }
  break;
}

So neither is expected as such an input.
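
The verifier rule quoted above can be restated as a small standalone predicate. This is a sketch under stated assumptions, not LLVM code: `ToyKind`, `ToyOperand`, `isInlineInt`, and `isLegalInlineOperand` are hypothetical, and the inline-constant test is reduced to the integer range -16..64, ignoring the floating-point inline constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand model; the real MachineOperand has many more kinds.
enum class ToyKind { Register, Immediate, GlobalAddress, FrameIndex };

struct ToyOperand {
  ToyKind Kind;
  std::int64_t Imm; // Only meaningful when Kind == ToyKind::Immediate.
};

// Assumption: only the integer inline-constant range is modeled here.
constexpr bool isInlineInt(std::int64_t V) { return V >= -16 && V <= 64; }

// Negation of the quoted error condition
//   !isReg && (!isImm || !isInlineConstant)
// which by De Morgan is: isReg || (isImm && isInlineConstant).
constexpr bool isLegalInlineOperand(const ToyOperand &Op) {
  return Op.Kind == ToyKind::Register ||
         (Op.Kind == ToyKind::Immediate && isInlineInt(Op.Imm));
}

inline void checkVerifierExamples() {
  assert(isLegalInlineOperand({ToyKind::Register, 0}));
  assert(isLegalInlineOperand({ToyKind::Immediate, 0}));
  assert(!isLegalInlineOperand({ToyKind::Immediate, 1000}));
  // A global address is neither a register nor an immediate, so it is rejected.
  assert(!isLegalInlineOperand({ToyKind::GlobalAddress, 0}));
}
```

Under this model a frame index is rejected for the same reason as a global address, which matches the conclusion that neither is expected as an input to these operands.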

arsenm added inline comments. Jun 19 2017, 10:58 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	Oh right, this is the inline constant case.
95	This is the wrong place to handle this. It is exactly the same as the CNDMASK case, where this needs to be VCC. The handling should be added there, so that the regalloc hints are added and there is a better chance of shrinking in the post-RA run.
test/CodeGen/AMDGPU/shrink-carry.mir
7–8	It's easier to read the test if these are put next to the function rather than all clustered at the top

rampitec added inline comments. Jun 19 2017, 11:07 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	I'm not arguing about hints, but I do not see why we should allow illegal shrinking. A rock-solid solution is to call isOperandLegal, but it needs an instruction that is already built. That would mean building an instruction, checking it, and deleting it, which is far from optimal.

Moved around check statements in the test.

test/CodeGen/AMDGPU/shrink-carry.mir
7–8	OK. To me it was easier to read when they were at the top, but it is not really important.

arsenm added inline comments. Jun 19 2017, 11:21 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	It's not allowing illegal shrinking. It just defers the VCC-specific check until later, in order to apply the register allocator hints and for some other reason I forgot. canShrink could use a better name.

rampitec added inline comments. Jun 19 2017, 11:24 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	The shrink pass runs both before and after RA. It is after RA that it breaks, so hints will not help.

arsenm added inline comments. Jun 19 2017, 11:34 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	The hints aren't supposed to fix the problem; they are supposed to increase the chance that shrinking will be possible in the post-RA run.

rampitec added inline comments. Jun 19 2017, 11:36 AM

lib/Target/AMDGPU/SIShrinkInstructions.cpp
95	That is what I mean. I agree they are helpful, but that should be a separate patch.
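
The exchange above turns on the pass running twice. A rough model of that flow follows, as a sketch only: `CarryReg`, `ToyCarryInst`, `Action`, and `decide` are hypothetical names, and the behaviour (hint toward VCC while the carry register is still virtual, shrink only once it is VCC) is an assumption drawn from the comments in this review rather than a copy of the pass. Operand commuting is not modeled.

```cpp
#include <cassert>

// Hypothetical model of where the carry operand lives.
enum class CarryReg { Virtual, VCC, OtherPhysical };

enum class Action { Skip, HintVCC, Shrink };

struct ToyCarryInst {
  bool Src1IsImm; // The legality condition this patch checks.
  CarryReg Carry; // Virtual before RA, physical afterwards.
};

// Legality comes first; the hint only improves the odds of a later shrink.
constexpr Action decide(const ToyCarryInst &I) {
  return I.Src1IsImm                    ? Action::Skip
         : I.Carry == CarryReg::Virtual ? Action::HintVCC
         : I.Carry == CarryReg::VCC     ? Action::Shrink
                                        : Action::Skip;
}

inline void checkDecideExamples() {
  // Pre-RA: a legal candidate only receives a hint.
  assert(decide({false, CarryReg::Virtual}) == Action::HintVCC);
  // Post-RA: the same candidate shrinks if the allocator picked VCC.
  assert(decide({false, CarryReg::VCC}) == Action::Shrink);
  // An immediate in src1 is skipped regardless of the carry register.
  assert(decide({true, CarryReg::VCC}) == Action::Skip);
}
```

With this ordering a hint is never a substitute for the legality check: an instruction with an immediate in src1 is skipped both before and after register allocation.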

LGTM.

This revision is now accepted and ready to land. Jun 20 2017, 7:42 AM

rL305840

Revision Contents

Path                                          Size
lib/Target/AMDGPU/SIShrinkInstructions.cpp    2 lines
test/CodeGen/AMDGPU/shrink-carry.mir          101 lines

Diff 103080

lib/Target/AMDGPU/SIShrinkInstructions.cpp

 static bool canShrink(MachineInstr &MI, const SIInstrInfo *TII,
   // a register allocation hint pre-regalloc and then do the shrinking
   // post-regalloc.
   if (Src2) {
     switch (MI.getOpcode()) {
       default: return false;

       case AMDGPU::V_ADDC_U32_e64:
       case AMDGPU::V_SUBB_U32_e64:
+        if (TII->getNamedOperand(MI, AMDGPU::OpName::src1)->isImm())
+          return false;
         // Additional verification is needed for sdst/src2.
         return true;

       case AMDGPU::V_MAC_F32_e64:
       case AMDGPU::V_MAC_F16_e64:
         if (!isVGPR(Src2, TRI, MRI) ||
             TII->hasModifiersSet(MI, AMDGPU::OpName::src2_modifiers))
           return false;

test/CodeGen/AMDGPU/shrink-carry.mir

This file was added.

				# RUN: llc -march=amdgcn -verify-machineinstrs -start-before si-shrink-instructions -stop-before si-insert-skips -o - %s | FileCheck -check-prefix=GCN %s

				# GCN-LABEL: name: subbrev{{$}}
				# GCN: V_SUBBREV_U32_e64 0, undef %vgpr0, killed %vcc, implicit %exec

				---
				name: subbrev
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vgpr_32 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: sreg_64 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: sreg_64 }
				body: |
				  bb.0:

				    %0 = IMPLICIT_DEF
				    %1 = IMPLICIT_DEF
				    %2 = IMPLICIT_DEF
				    %3 = V_CMP_GT_U32_e64 %0, %1, implicit %exec
				    %4, %5 = V_SUBBREV_U32_e64 0, %0, %3, implicit %exec
				    S_ENDPGM

				...

				# GCN-LABEL: name: subb{{$}}
				# GCN: V_SUBB_U32_e64 undef %vgpr0, 0, killed %vcc, implicit %exec

				---
				name: subb
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vgpr_32 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: sreg_64 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: sreg_64 }
				body: |
				  bb.0:

				    %0 = IMPLICIT_DEF
				    %1 = IMPLICIT_DEF
				    %2 = IMPLICIT_DEF
				    %3 = V_CMP_GT_U32_e64 %0, %1, implicit %exec
				    %4, %5 = V_SUBB_U32_e64 %0, 0, %3, implicit %exec
				    S_ENDPGM

				...

				# GCN-LABEL: name: addc{{$}}
				# GCN: V_ADDC_U32_e32 0, undef %vgpr0, implicit-def %vcc, implicit killed %vcc, implicit %exec

				---
				name: addc
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vgpr_32 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: sreg_64 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: sreg_64 }
				body: |
				  bb.0:

				    %0 = IMPLICIT_DEF
				    %1 = IMPLICIT_DEF
				    %2 = IMPLICIT_DEF
				    %3 = V_CMP_GT_U32_e64 %0, %1, implicit %exec
				    %4, %5 = V_ADDC_U32_e64 0, %0, %3, implicit %exec
				    S_ENDPGM

				...

				# GCN-LABEL: name: addc2{{$}}
				# GCN: V_ADDC_U32_e32 0, undef %vgpr0, implicit-def %vcc, implicit killed %vcc, implicit %exec

				---
				name: addc2
				tracksRegLiveness: true
				registers:
				- { id: 0, class: vgpr_32 }
				- { id: 1, class: vgpr_32 }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: sreg_64 }
				- { id: 4, class: vgpr_32 }
				- { id: 5, class: sreg_64 }
				body: |
				  bb.0:

				    %0 = IMPLICIT_DEF
				    %1 = IMPLICIT_DEF
				    %2 = IMPLICIT_DEF
				    %3 = V_CMP_GT_U32_e64 %0, %1, implicit %exec
				    %4, %5 = V_ADDC_U32_e64 %0, 0, %3, implicit %exec
				    S_ENDPGM

				...
