This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc
ClosedPublic

Authored by rampitec on Jun 16 2017, 3:53 PM.

Details

Summary

This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Jun 16 2017, 3:53 PM
arsenm edited edge metadata.Jun 16 2017, 4:15 PM

Why does this need to be a target combine? Can't we use the sub with overflow generic nodes?

Why does this need to be a target combine? Can't we use the sub with overflow generic nodes?

Surprisingly ISD::ADDE and ISD::SUBE expect two operands, regardless of what is written in documentation. See for example this pattern with two inputs:

let Uses = [SCC] in { // Carry in comes from SCC
let isCommutable = 1 in {
def S_ADDC_U32 : SOP2_32 <"s_addc_u32",
  [(set i32:$sdst, (adde (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End isCommutable = 1

So I did it the same way as ARM.

rampitec updated this revision to Diff 103124.Jun 19 2017, 4:48 PM

Fixed typo in the test.

Why does this need to be a target combine? Can't we use the sub with overflow generic nodes?

Surprisingly ISD::ADDE and ISD::SUBE expect two operands, regardless of what is written in documentation. See for example this pattern with two inputs:

let Uses = [SCC] in { // Carry in comes from SCC
let isCommutable = 1 in {
def S_ADDC_U32 : SOP2_32 <"s_addc_u32",
  [(set i32:$sdst, (adde (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End isCommutable = 1

So I did it the same way as ARM.

I meant the other sets of add/sub nodes, like ISD::SADDO. The input is a n allocatable register bool like how the instructions work rather than a flag relying on glue like ADDE

Why does this need to be a target combine? Can't we use the sub with overflow generic nodes?

Surprisingly ISD::ADDE and ISD::SUBE expect two operands, regardless of what is written in documentation. See for example this pattern with two inputs:

let Uses = [SCC] in { // Carry in comes from SCC
let isCommutable = 1 in {
def S_ADDC_U32 : SOP2_32 <"s_addc_u32",
  [(set i32:$sdst, (adde (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End isCommutable = 1

So I did it the same way as ARM.

I meant the other sets of add/sub nodes, like ISD::SADDO. The input is a n allocatable register bool like how the instructions work rather than a flag relying on glue like ADDE

It produces flag, not consumes it. Also it distinguish between signed/unsigned.

rampitec updated this revision to Diff 103125.Jun 19 2017, 5:11 PM

Removed unused variable.

Why does this need to be a target combine? Can't we use the sub with overflow generic nodes?

Surprisingly ISD::ADDE and ISD::SUBE expect two operands, regardless of what is written in documentation. See for example this pattern with two inputs:

let Uses = [SCC] in { // Carry in comes from SCC
let isCommutable = 1 in {
def S_ADDC_U32 : SOP2_32 <"s_addc_u32",
  [(set i32:$sdst, (adde (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End isCommutable = 1

So I did it the same way as ARM.

I meant the other sets of add/sub nodes, like ISD::SADDO. The input is a n allocatable register bool like how the instructions work rather than a flag relying on glue like ADDE

It produces flag, not consumes it. Also it distinguish between signed/unsigned.

I think the right one is ADDCARRY, SUBCARRY:

The use of this opcode is preferable to adde/sube if the

/// target supports it, as the carry is a regular value rather than a
/// glue, which allows further optimisation.

Why does this need to be a target combine? Can't we use the sub with overflow generic nodes?

Surprisingly ISD::ADDE and ISD::SUBE expect two operands, regardless of what is written in documentation. See for example this pattern with two inputs:

let Uses = [SCC] in { // Carry in comes from SCC
let isCommutable = 1 in {
def S_ADDC_U32 : SOP2_32 <"s_addc_u32",
  [(set i32:$sdst, (adde (i32 SSrc_b32:$src0), (i32 SSrc_b32:$src1)))]>;
} // End isCommutable = 1

So I did it the same way as ARM.

I meant the other sets of add/sub nodes, like ISD::SADDO. The input is a n allocatable register bool like how the instructions work rather than a flag relying on glue like ADDE

It produces flag, not consumes it. Also it distinguish between signed/unsigned.

I think the right one is ADDCARRY, SUBCARRY:

The use of this opcode is preferable to adde/sube if the

/// target supports it, as the carry is a regular value rather than a
/// glue, which allows further optimisation.

It seems to be so new that I do not even see a def for them in TargTargetSelectionDAG.td. Am I missing something?

I think the right one is ADDCARRY, SUBCARRY:

The use of this opcode is preferable to adde/sube if the

/// target supports it, as the carry is a regular value rather than a
/// glue, which allows further optimisation.

It seems to be so new that I do not even see a def for them in TargTargetSelectionDAG.td. Am I missing something?

They do not look like normal nodes. They would require custom select I guess.

vpykhtin accepted this revision.Jun 20 2017, 3:31 AM

LGTM.

This revision is now accepted and ready to land.Jun 20 2017, 3:31 AM

I think the right one is ADDCARRY, SUBCARRY:

The use of this opcode is preferable to adde/sube if the

/// target supports it, as the carry is a regular value rather than a
/// glue, which allows further optimisation.

It seems to be so new that I do not even see a def for them in TargTargetSelectionDAG.td. Am I missing something?

They do not look like normal nodes. They would require custom select I guess.

Looks like I can base AMDGPUadde and AMDGPUsube on these ISD opcodes instead of AMDGPUISD though. That way we will have selection working.

rampitec updated this revision to Diff 103245.Jun 20 2017, 11:51 AM

Switched to ISD opcodes ADDCARYY/SUBCARRY and MVT::i1 instead of MVT::Glue.

arsenm added inline comments.Jun 20 2017, 12:14 PM
lib/Target/AMDGPU/AMDGPUInstrInfo.td
186

Weird that the td generic nodes don't exist already for this. Looks like this was added only in April, so you should probably add the generic addcarry/subcarry to TargetSelectionDAG.td

lib/Target/AMDGPU/SIISelLowering.cpp
476–477

ADDCARRY/SUBCARRY should be marked as legal so more combines work on them

4869

This is likely any i1 source. We have some other intrinsics that return i1 values, so maybe a todo?

rampitec added inline comments.Jun 20 2017, 12:20 PM
lib/Target/AMDGPU/AMDGPUInstrInfo.td
186

I'm not sure really. The intent is to gradually switch from addc/subc, so probably new td defs were not assumed.
I think we need to use our nodes for now and remove it in the future. I can add todo here.

lib/Target/AMDGPU/SIISelLowering.cpp
4869

I have initially tried to use it with just any i1 input. That is worse though.
In case if we have i1 argument to a function for example we would create an extra compare if I just perform it with any i1. It looks like it is only beneficial in some cases, like setcc.

rampitec updated this revision to Diff 103258.Jun 20 2017, 12:37 PM
rampitec marked 2 inline comments as done.

Made operations legal.
Added comment.

rampitec added inline comments.Jun 20 2017, 12:38 PM
lib/Target/AMDGPU/SIISelLowering.cpp
476–477

Done, but I do not see any visible difference.

rampitec added inline comments.Jun 20 2017, 3:14 PM
lib/Target/AMDGPU/AMDGPUInstrInfo.td
186

I have created D34423 to add addcarry and subcarry nodes.
Let's see how it goes, and it will be trivial to replace what I have defined here with those if submitted. Meanwhile I guess that is not a blocker for this change.

This revision was automatically updated to reflect the committed changes.