Unfortunately, due to the Thumb1 idiosyncrasy where the instructions
can be *either* flag-setting *or* conditional, this is not expressible
with TableGen patterns, so we have to go for the custom C++ lowering.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3306 ↗ | (On Diff #89861) | This assertion seems suspicious... why is it true in general? |
Thanks Eli!
Indeed the assertion was wrong; this also shows how insufficient our tests for long adds/subracts were.
Updating the patch to address both these points.
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3240 ↗ | (On Diff #89993) | Exactly because I want it to fall through to the next case, if the condition doesn't hold. |
Ok, just adding LLVM_FALLTHROUGH should be fine for me. I'll let Eli have a final look and approve.
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3240 ↗ | (On Diff #89993) | right, I thought as much. |
Added LLVM_FALLTHROUGH
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
Are you sure we can't use the same codepath we currently use for Thumb2/ARM here?
test/CodeGen/Thumb/long.ll | ||
---|---|---|
80 ↗ | (On Diff #90031) | I'd also like to see some tests here for subtraction with an immediate amount. ("add i64 %y, -10" etc.) |
Are you sure we can't use the same codepath we currently use for Thumb2/ARM here?
I don't think we can.
The existing codepath is itself quite hairy: quoting a comment in ARMInstrInfo.td,
// Currently, ADDS/SUBS are pseudo opcodes that exist only in the // selection DAG. They are "lowered" to real ADD/SUB opcodes by // AdjustInstrPostInstrSelection where we determine whether or not to // set the "s" bit based on CPSR liveness. // // FIXME: Eliminate ADDS/SUBS pseudo opcodes after adding tablegen // support for an optional CPSR definition that corresponds to the DAG // node's second value. We can then eliminate the implicit def of CPSR.
For the Thumb1 instructions, we cannot choose "whether or not to set the "s" bit"; it's implicitly set iff the instruction isn't predicated.
For the Thumb1 instructions, we cannot choose "whether or not to set the "s" bit"; it's implicitly set iff the instruction isn't predicated.
I think it works out anyway; outside of Thumb1 mode, we want to avoid clobbering CPSR when we don't need to, but it's perfectly legal to produce a dead definition of CPSR.
clobbering CPSR when we don't need to is the least of the problems; what we have in ARM and Thumb2 is that ADD and ADDS are defined separately, the former producing one result (to match an ADD node), and the latter producing two (to match an ADDC node). In Thumb1, we cannot define them separately, so tADD MIs are defined with an OptionalDef for CPSR. The ISel patterns won't let me match an MI with one result value (and an OptionalDef) to an ISD node producing two results. Redefining tADD to always produce two results doesn't work either, because it's assumed, by many layers including AsmParser / AsmPrinter, to still have the OptionalDef for CPSR; and the InstrEmitter won't let me have CPSR as both an OptionalDef and an actual result in the same MI.
Handwave handwave, I cannot really prove that it cannot be done, but I mean I had tried, and I couldn't.
test/CodeGen/Thumb/long.ll | ||
---|---|---|
80 ↗ | (On Diff #90031) | Indeed, subtracting immediates wasn't handled well; I'll upload the updated patch. |
Patch updated
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
In Thumb1, we cannot define them separately
Why not? "t2ADDSrr" is a pseudo-instruction, not an actual encoding.
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3318 ↗ | (On Diff #90159) | The old patterns don't handle SUBC with an immediate. You can produce this situation with something like this: long long x(long long a, int b) { return a - (((long long)b << 32) | -1U); } I think the handling here is correct, but please change it in a separate patch. |
Why not? "t2ADDSrr" is a pseudo-instruction, not an actual encoding.
Right; but t2ADC / t2SBC are actual encodings (non-predicable, with non-optional def for CPSR), unlike tADC / tSBC (predicable, with an OptionalDef for CPSR).
It might be possible to do a hybrid implementation, using tPseudoInsts for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC; although this feels like, out of two evils, choosing both.
It would also require duplicating a substantial portion of the code in ARMTargetLowering::AdjustInstrPostInstrSelection to take care of Thumb1 instructions separately, because their MIs have a different structure: in particular, the cc_out operand that AdjustInstrPostInstrSelection is adding must, in Thumb1 instructions, be not last but 1st (and MachineInstr doesn't even have an API to insert a new operand into the middle of an existing instruction).
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3318 ↗ | (On Diff #90159) | The old patterns lower this code into: movs r3, #0 mvns r3, r3 subs r0, r0, r3 sbcs r1, r2 on Thumb1, and into much more compact subs.w r0, r0, #-1 sbcs r1, r2 on Thumb2. The new code lowers it into adds r0, r0, #1 sbcs r1, r2 which is equivalent, and even a bit more compact. |
t2ADC / t2SBC are actual encodings (non-predicable
t2ADC should be predicable? At least, there isn't any restriction imposed by the architecture.
The rest makes sense; I'll stop pushing.
I don't really see what the problem is, either with the old patterns or with the new code.
I don't really want to mix multiple orthogonal changes, especially without any test coverage.
t2ADC should be predicable?
I'd think so too! As you see, long addition/subtraction is not the neatest part of the ARM backend :-)
I don't really want to mix multiple orthogonal changes, especially without any test coverage.
Right, now I see what you mean.
These changes are rather intertwined (it's easier to handle both ADDC and SUBC in the same branch of the switch block, than to separate them and faithfully replicate the old behaviour) but I will certainly add a test case for SUBC with immediate.
Added tests for SUBC with immediate
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
Select(RHS.getNode()) must be deferred until RHS has users; otherwise, if Select() converts RHS into a duplicate of an existing node, then the DAG automatically updates all uses of RHS to use the existing node instead, and deletes the RHS's own node.
If we call Select(RHS.getNode()) when RHS doesn't yet have any users, then nothing gets updated, RHS's node gets deleted, and we end up adding uses to a deleted node. Boom!
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3299 ↗ | (On Diff #90476) | Do you actually need to call Select() explicitly here? Instruction selection should pick it up automatically, I think. |
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3299 ↗ | (On Diff #90476) | No, it doesn't re-lower nodes created by ARMDAGToDAGISel::Select(): it is assumed to only output lowered nodes. |
lib/Target/ARM/ARMISelDAGToDAG.cpp | ||
---|---|---|
3299 ↗ | (On Diff #90476) | Okay. The lowering for ISD::AND has some code which deals with a similar situation, but in a different way. Could you refactor to share the same code? |
Copying the trick that the lowering for ISD::AND uses to create and lower a constant node
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
test/CodeGen/Thumb/long.ll | ||
---|---|---|
56 ↗ | (On Diff #90537) | Now I see that lowering an (ADDE x, y, (ADDC z, t)) into a chain of (CopyFromReg CPSR, (tADD z, t)), (CopyFromReg CPSR, (tADC x, y, (CopyToReg CPSR))), with the CPSR-copying nodes glued to the arithmetic nodes, -- doesn't prevent LLVM from scheduling CPSR-clobbering operations in between the converted ADDC and the converted ADDE, -- such as in this test case, where a flag-setting tMOVi8 is inserted in the middle. An ugly patch is certainly better than an incorrect one, so I decided to go back and finish the "hybrid implementation" using tPseudoInsts with two integer outputs each for tADDS / tSUBS, and custom C++ lowering for tADC / tSBC. |
Hybrid implementation
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
Hmm... given that you've done most of the work of fixing AdjustInstrPostInstrSelection, how hard would it be to add tADCS/tSBCS pseudo-instructions and send them through AdjustInstrPostInstrSelection, as opposed to using custom selection code in C++? I'm sort of concerned you could run into the same scheduling problem for 128-bit addition.
Adding tADCS/tSBCS pseudo-instructions does indeed let
simplify the custom selection code quite a bit, but
doesn't get rid of it entirely, as the negative-immediate
operand still needs a "recursive lowering" which cannot
be specified with ISel patterns. (This is similar to how
ISD::AND needs the custom lowering into a tBIC.)
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
but doesn't get rid of it entirely, as the negative-immediate operand still needs a "recursive lowering" which cannot be specified with ISel patterns.
Could you do this as a DAGCombine instead?
Lowering the negative-immediate operand as a DAGCombine instead
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.
The minor tweaks
Updating D30400: For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.