This is an archive of the discontinued LLVM Phabricator instance.

For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes, same as already done for ARM and Thumb2.
ClosedPublic

Authored by tyomitch on Feb 27 2017, 3:41 AM.

Details

Summary

Unfortunately, due to the Thumb1 idiosyncrasy where the instructions
can be *either* flag-setting *or* conditional, this is not expressible
with TableGen patterns, so we have to go for the custom C++ lowering.

Diff Detail

Repository
rL LLVM

Event Timeline

tyomitch created this revision.Feb 27 2017, 3:41 AM
efriedma added inline comments.
lib/Target/ARM/ARMISelDAGToDAG.cpp
3306 ↗(On Diff #89861)

This assertion seems suspicious... why is it true in general?

tyomitch updated this revision to Diff 89993.Feb 28 2017, 2:41 AM

Thanks Eli!
Indeed the assertion was wrong; this also shows how insufficient our tests for long adds/subracts were.
Updating the patch to address both these points.

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

Few nits, but otherwise, looks good.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3240 ↗(On Diff #89993)

Why can't you leave this as an early break?

3272 ↗(On Diff #89993)

use LLVM_FALLTHROUGH

tyomitch added inline comments.Feb 28 2017, 6:36 AM
lib/Target/ARM/ARMISelDAGToDAG.cpp
3240 ↗(On Diff #89993)

Exactly because I want it to fall through to the next case, if the condition doesn't hold.

Ok, just adding LLVM_FALLTHROUGH should be fine for me. I'll let Eli have a final look and approve.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3240 ↗(On Diff #89993)

right, I thought as much.

tyomitch updated this revision to Diff 90031.Feb 28 2017, 7:35 AM

Added LLVM_FALLTHROUGH

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

Are you sure we can't use the same codepath we currently use for Thumb2/ARM here?

test/CodeGen/Thumb/long.ll
80 ↗(On Diff #90031)

I'd also like to see some tests here for subtraction with an immediate amount. ("add i64 %y, -10" etc.)

Are you sure we can't use the same codepath we currently use for Thumb2/ARM here?

I don't think we can.
The existing codepath is itself quite hairy: quoting a comment in ARMInstrInfo.td,

// Currently, ADDS/SUBS are pseudo opcodes that exist only in the
// selection DAG. They are "lowered" to real ADD/SUB opcodes by
// AdjustInstrPostInstrSelection where we determine whether or not to
// set the "s" bit based on CPSR liveness.
//
// FIXME: Eliminate ADDS/SUBS pseudo opcodes after adding tablegen
// support for an optional CPSR definition that corresponds to the DAG
// node's second value. We can then eliminate the implicit def of CPSR.

For the Thumb1 instructions, we cannot choose "whether or not to set the "s" bit"; it's implicitly set iff the instruction isn't predicated.

For the Thumb1 instructions, we cannot choose "whether or not to set the "s" bit"; it's implicitly set iff the instruction isn't predicated.

I think it works out anyway; outside of Thumb1 mode, we want to avoid clobbering CPSR when we don't need to, but it's perfectly legal to produce a dead definition of CPSR.

clobbering CPSR when we don't need to is the least of the problems; what we have in ARM and Thumb2 is that ADD and ADDS are defined separately, the former producing one result (to match an ADD node), and the latter producing two (to match an ADDC node). In Thumb1, we cannot define them separately, so tADD MIs are defined with an OptionalDef for CPSR. The ISel patterns won't let me match an MI with one result value (and an OptionalDef) to an ISD node producing two results. Redefining tADD to always produce two results doesn't work either, because it's assumed, by many layers including AsmParser / AsmPrinter, to still have the OptionalDef for CPSR; and the InstrEmitter won't let me have CPSR as both an OptionalDef and an actual result in the same MI.
Handwave handwave, I cannot really prove that it cannot be done, but I mean I had tried, and I couldn't.

test/CodeGen/Thumb/long.ll
80 ↗(On Diff #90031)

Indeed, subtracting immediates wasn't handled well; I'll upload the updated patch.

tyomitch updated this revision to Diff 90159.Mar 1 2017, 4:17 AM

Patch updated

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

In Thumb1, we cannot define them separately

Why not? "t2ADDSrr" is a pseudo-instruction, not an actual encoding.

lib/Target/ARM/ARMISelDAGToDAG.cpp
3318 ↗(On Diff #90159)

The old patterns don't handle SUBC with an immediate. You can produce this situation with something like this:

long long x(long long a, int b) {
  return a - (((long long)b << 32) | -1U);
}

I think the handling here is correct, but please change it in a separate patch.

Why not? "t2ADDSrr" is a pseudo-instruction, not an actual encoding.

Right; but t2ADC / t2SBC are actual encodings (non-predicable, with non-optional def for CPSR), unlike tADC / tSBC (predicable, with an OptionalDef for CPSR).

It might be possible to do a hybrid implementation, using tPseudoInsts for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC; although this feels like, out of two evils, choosing both.
It would also require duplicating a substantial portion of the code in ARMTargetLowering::AdjustInstrPostInstrSelection to take care of Thumb1 instructions separately, because their MIs have a different structure: in particular, the cc_out operand that AdjustInstrPostInstrSelection is adding must, in Thumb1 instructions, be not last but 1st (and MachineInstr doesn't even have an API to insert a new operand into the middle of an existing instruction).

lib/Target/ARM/ARMISelDAGToDAG.cpp
3318 ↗(On Diff #90159)

The old patterns lower this code into:

movs    r3, #0
mvns    r3, r3
subs    r0, r0, r3
sbcs    r1, r2

on Thumb1, and into much more compact

subs.w  r0, r0, #-1
sbcs    r1, r2

on Thumb2. The new code lowers it into

adds    r0, r0, #1
sbcs    r1, r2

which is equivalent, and even a bit more compact.
I don't really see what the problem is, either with the old patterns or with the new code.

t2ADC / t2SBC are actual encodings (non-predicable

t2ADC should be predicable? At least, there isn't any restriction imposed by the architecture.

The rest makes sense; I'll stop pushing.

I don't really see what the problem is, either with the old patterns or with the new code.

I don't really want to mix multiple orthogonal changes, especially without any test coverage.

t2ADC should be predicable?

I'd think so too! As you see, long addition/subtraction is not the neatest part of the ARM backend :-)

I don't really want to mix multiple orthogonal changes, especially without any test coverage.

Right, now I see what you mean.

These changes are rather intertwined (it's easier to handle both ADDC and SUBC in the same branch of the switch block, than to separate them and faithfully replicate the old behaviour) but I will certainly add a test case for SUBC with immediate.

tyomitch updated this revision to Diff 90404.Mar 2 2017, 2:56 PM

Added tests for SUBC with immediate

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

tyomitch updated this revision to Diff 90476.Mar 3 2017, 7:14 AM

Select(RHS.getNode()) must be deferred until RHS has users; otherwise, if Select() converts RHS into a duplicate of an existing node, then the DAG automatically updates all uses of RHS to use the existing node instead, and deletes the RHS's own node.
If we call Select(RHS.getNode()) when RHS doesn't yet have any users, then nothing gets updated, RHS's node gets deleted, and we end up adding uses to a deleted node. Boom!

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

efriedma added inline comments.Mar 3 2017, 11:40 AM
lib/Target/ARM/ARMISelDAGToDAG.cpp
3299 ↗(On Diff #90476)

Do you actually need to call Select() explicitly here? Instruction selection should pick it up automatically, I think.

tyomitch added inline comments.Mar 3 2017, 12:28 PM
lib/Target/ARM/ARMISelDAGToDAG.cpp
3299 ↗(On Diff #90476)

No, it doesn't re-lower nodes created by ARMDAGToDAGISel::Select(): it is assumed to only output lowered nodes.

efriedma added inline comments.Mar 3 2017, 12:44 PM
lib/Target/ARM/ARMISelDAGToDAG.cpp
3299 ↗(On Diff #90476)

Okay.

The lowering for ISD::AND has some code which deals with a similar situation, but in a different way. Could you refactor to share the same code?

tyomitch updated this revision to Diff 90537.Mar 3 2017, 2:54 PM

Copying the trick that the lowering for ISD::AND uses to create and lower a constant node

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

tyomitch added inline comments.Mar 7 2017, 3:00 PM
test/CodeGen/Thumb/long.ll
56 ↗(On Diff #90537)

Now I see that lowering an (ADDE x, y, (ADDC z, t)) into a chain of (CopyFromReg CPSR, (tADD z, t)), (CopyFromReg CPSR, (tADC x, y, (CopyToReg CPSR))), with the CPSR-copying nodes glued to the arithmetic nodes, -- doesn't prevent LLVM from scheduling CPSR-clobbering operations in between the converted ADDC and the converted ADDE, -- such as in this test case, where a flag-setting tMOVi8 is inserted in the middle.

An ugly patch is certainly better than an incorrect one, so I decided to go back and finish the "hybrid implementation" using tPseudoInsts with two integer outputs each for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC.

tyomitch updated this revision to Diff 90947.Mar 7 2017, 3:07 PM

Hybrid implementation

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

Hmm... given that you've done most of the work of fixing AdjustInstrPostInstrSelection, how hard would it be to add tADCS/tSBCS pseudo-instructions and send them through AdjustInstrPostInstrSelection, as opposed to using custom selection code in C++? I'm sort of concerned you could run into the same scheduling problem for 128-bit addition.

tyomitch updated this revision to Diff 91182.Mar 9 2017, 9:51 AM

Adding tADCS/tSBCS pseudo-instructions does indeed let
simplify the custom selection code quite a bit, but
doesn't get rid of it entirely, as the negative-immediate
operand still needs a "recursive lowering" which cannot
be specified with ISel patterns. (This is similar to how
ISD::AND needs the custom lowering into a tBIC.)

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

but doesn't get rid of it entirely, as the negative-immediate operand still needs a "recursive lowering" which cannot be specified with ISel patterns.

Could you do this as a DAGCombine instead?

tyomitch updated this revision to Diff 91225.Mar 9 2017, 3:17 PM

Lowering the negative-immediate operand as a DAGCombine instead

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

efriedma accepted this revision.Mar 9 2017, 3:30 PM

LGTM, with a few minor tweaks.

lib/Target/ARM/ARMISelLowering.cpp
9075 ↗(On Diff #91225)

Check isThumb1Only() rather than MCID->getSize()?

9129 ↗(On Diff #91225)

Check isThumb1Only() rather than MCID->getSize()?

test/CodeGen/Thumb/long.ll
3 ↗(On Diff #91225)

Add -verify-machineinstrs to the RUN lines.

This revision is now accepted and ready to land.Mar 9 2017, 3:30 PM
tyomitch updated this revision to Diff 91231.Mar 9 2017, 3:55 PM

The minor tweaks

Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,

same as already done for ARM and Thumb2.

This revision was automatically updated to reflect the committed changes.