[ARM] Optimize {s,u}{add,sub}.with.overflow
ClosedPublic

Authored by jgalenson on Jul 19 2017, 10:12 AM.

Details

Summary

The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during ISel. This commit ports that code to the ARM backend.

Diff Detail

Repository
rL LLVM


There doesn't seem to be any test of the inversion logic.

Yes, I can't get a testcase to trigger those paths. Do you have any suggestions for how to do so?

Please include full context when you upload patches to Phabricator; see http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface.

This sort of overlaps with https://reviews.llvm.org/D35192.

lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

We should be able to optimize away the AND here using known bits; does that not happen on trunk?

test/CodeGen/ARM/su_addsub_overflow.ll
20 ↗(On Diff #107331)

These CHECK lines are not very good; try generating checks with utils/update_llc_test_checks.py. Also, please commit this first and rebase your patch on top of it; that makes it easy to see what happens to the generated code.

jgalenson added inline comments.Jul 19 2017, 11:27 AM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

It doesn't because ARM doesn't call setBooleanContents. If there's a way to set that correctly, it would indeed be better. Do you think we could do that?

test/CodeGen/ARM/su_addsub_overflow.ll
20 ↗(On Diff #107331)

That tool does make the changes to the testfiles more obvious, but it also generates overly specific testcases that seem more brittle. I'll see if I can find a nice middle ground and update it in a bit.

efriedma added inline comments.Jul 19 2017, 11:33 AM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

Really? Wow, that's a big oversight. We should fix that.

This is the code for my testcase before my patch. I'll now update it after my patch.

Improve CHECK lines on the test.

jgalenson added inline comments.Jul 19 2017, 12:58 PM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

Do you know the correct way to set it? I tried, but a few tests failed, and I don't know the architecture well enough to know if that was my fault or if the tests just needed to be tweaked.

If we do make that change, then I'll be able to simplify this code, of course.

test/CodeGen/ARM/su_addsub_overflow.ll
20 ↗(On Diff #107331)

Okay, I've uploaded both the old and new versions of the test here. Thanks for that tool, by the way; it's pretty convenient.

efriedma added inline comments.Jul 21 2017, 4:08 PM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

It's likely the tests just need to be tweaked; a lot of them are pretty sensitive to the exact compiler output.

jgalenson added inline comments.Jul 21 2017, 5:42 PM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

So which argument to setBooleanContents is correct for ARM? Is it ZeroOrOne?

efriedma added inline comments.Jul 21 2017, 5:59 PM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

ZeroOrOne is probably best; among other things, it matches the AAPCS calling convention rules for bool.

jgalenson added inline comments.Jul 21 2017, 6:04 PM
lib/Target/ARM/ARMISelLowering.cpp
4484 ↗(On Diff #107331)

Okay, thanks. I'll look into a patch that simply adds that and then rebase this commit on top of it, which will allow me to remove the AND-checking logic here.

jgalenson updated this revision to Diff 112190.Aug 22 2017, 9:51 AM
jgalenson added a reviewer: efriedma.

Now that the setBooleanContents patch has landed, I can remove the AND checks here.

I also added the same optimization to BRCOND, which allows it to optimize another case.

efriedma added inline comments.Aug 23 2017, 12:27 PM
test/CodeGen/ARM/su-addsub-overflow.ll
49 ↗(On Diff #112190)

This looks weird; we're generating a cmp, then a sub with exactly the same operands?

jgalenson added inline comments.Aug 23 2017, 12:33 PM
test/CodeGen/ARM/su-addsub-overflow.ll
49 ↗(On Diff #112190)

Well, as you can see in diff 2, we're currently doing that with a lot of other instructions. This removes most of those other instructions. I have another patch I was going to send after this one that fixes this.

Specifically, ARMBaseInstrInfo::optimizeCompareInstr tries to remove the cmp, but it runs right after the MachineSink pass, which sinks the sub into the cont basic block, which stops the optimization from working.

Friendly ping.

If it would help, I can submit my follow-up patch now, although I don't think this should depend on it.

I'm kind of waiting for https://reviews.llvm.org/D35192 to land here... unfortunately, it's been bouncing in and out of the tree due to regressions, but once it lands, I'd like to see what it does to our lowering here.

I think it might make more sense to wait until we have ARMISD::ADDC nodes in the DAG, then try to optimize away the ARMISD::CMP nodes. That way, you don't have to worry about trying to optimize away redundant instructions after SelectionDAG.

jgalenson updated this revision to Diff 115914.Sep 19 2017, 4:10 PM

Now that https://reviews.llvm.org/D35192 has landed, I've rebased this patch on top of it.

The existing patch actually applied cleanly and worked. However, the two unsigned cases generated slightly worse code (still better than in-tree, however). I managed to improve the uadd case by having getARMXALUOOp return an ADDC instead of an ADD, but this did not work for the usub case (although I left the change in). There's probably a way to fix the usub case, but I'm not worried much about it because my follow-up patch that modifies ARMBaseInstrInfo::optimizeCompareInstr handles it.

So does this change to getARMXALUOOp look correct?

The change to getARMXALUOOp is wrong; ADDC produces two results, so you're making a node with the wrong type.


So on trunk, for the llvm.uadd.with.overflow.i32 case, we produce a sequence like this:

adds    r0, r0, r1
mov     r2, #0
adc     r1, r2, #0
cmp     r1, #1

This is obviously not great... but the ARMISD::ADDE+ARMISD::CMP pattern is something you could DAGCombine away after legalization. I would prefer to do that, rather than try to clean it up after isel. Everything gets more complicated when you're dealing with MachineInstrs (you might end up optimizing simple cases, but not more complex ones), and the pattern works automatically with llvm.uadd.with.overflow.i64 etc.

I haven't thought through how exactly that extends to signed overflow, though.

jgalenson updated this revision to Diff 115940.Sep 19 2017, 5:52 PM

Thanks for the info about ADC. How come no test/assert picked that up?

Here's the rebased patch without that change. You can see that uadd is slightly worse than sadd, but it's still an improvement. I'll look into DAGCombine tomorrow, although I do think that it's fine to do at least some of it at the MI level, since the code there already handles some cases and I'm extending it to catch a few more that it misses. But if it can be done at the DAG level, that's worth doing.

In terms of assertions, we have checks for a lot of nodes in SelectionDAG::getNode... but not all (and I guess ISD::ADDC isn't one of the ones we check). And legalization isn't very picky either.

In terms of tests, the ISD::ADDC probably got lowered to something else before it could do any damage. :)

jgalenson updated this revision to Diff 116084.Sep 20 2017, 2:56 PM

Here's a modification of my incorrect commit from yesterday afternoon that correctly extracts the value from the ADDC.

efriedma added inline comments.Sep 20 2017, 3:08 PM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

Still the wrong type... you need to call getVTList to get the right type for an ADDC.

Also, are you sure you don't want an ARMISD::ADDC, rather than an ISD::ADDC?

jgalenson updated this revision to Diff 116090.Sep 20 2017, 3:19 PM

Oops, sorry, I must have lost that bit of my commit somehow.

Is it really necessary to have two different copies of almost identical code to generate an ARMISD::BRCOND? (I would rather have an explicit check for an AND than two versions of the code, if that's the issue.)

Is it really necessary to have two different copies of almost identical code to generate an ARMISD::BRCOND? (I would rather have an explicit check for an AND than two versions of the code, if that's the issue.)

Both of them are used in the attached testcase. The br_cc case handles almost everything, but the brcond case is needed once (when the brcond isn't combined into a br_cc).

We could outline the two cases into a helper function, although they're different enough that I'm not sure that it would help too much. What do you think?

Friendly ping.

when the brcond isn't combined into a br_cc

BRCOND isn't legal on ARM; it will always eventually get transformed to BR_CC.

when the brcond isn't combined into a br_cc

BRCOND isn't legal on ARM; it will always eventually get transformed to BR_CC.

True. In most of my testcases, brcond is combined into br_cc, so we hit my new br_cc case. However, one time the brcond is not combined into br_cc. It is legalized into a br_cc shortly afterwards, but we lower saddo/uaddo/etc. the normal way before we lower that new br_cc node (which is what would hit my new optimized case). That's why I needed the brcond case.

What is the best way to handle that? Adding code to the normal saddo/etc. lowering code not to lower it if there's a br_cc seems the wrong way to go about it (and I don't think it's even possible). And I don't know how to change the order we see nodes.

but then we lower saddo/uaddo/etc. the normal way before we lower the new br_cc node

Oh, hmm, someone else hit the same issue recently (in a different context). It should be possible to fix; SelectionDAG::Legalize is the relevant code. It doesn't quite work the way you want it to because we don't update the ordering when we add new nodes to the DAG. But I haven't thought through how to compute the right order to visit without recomputing the entire topological order from scratch.

Alternatively, you could try DAGCombining ARMISD::BRCOND after legalization?

It doesn't quite work the way you want it to because we don't update the ordering when we add new nodes to the DAG. But I haven't thought through how to compute the right order to visit without recomputing the entire topological order from scratch.

This does sound like it would solve the problem (and solve other problems as well). It seems a bit out of scope for this commit, though.

Alternatively, you could try DAGCombining ARMISD::BRCOND after legalization?

Good idea. However, the ARMISD::BRCOND isn't combined until after saddo is lowered.

For reference, here's the sequence of events I'm seeing in this one example:

brcond is legalized to br_cc
saddo is legalized
br_cc is legalized to ARMISD::BRCOND
ARMISD::BRCOND is legalized
everything is combined

So combining ARMISD::BRCOND runs into the same problem as using br_cc; they're both too late. Thus without changing the ordering of new DAG nodes (which seems a bit difficult), I don't see how I can do this efficiently without keeping the brcond case.

Good idea. However, the ARMISD::BRCOND isn't combined until after saddo is lowered.

That shouldn't completely block all transforms; you could pattern-match the ARMISD nodes. But I guess the patterns become a lot more complicated, so maybe not worth doing.

lib/Target/ARM/ARMISelLowering.cpp
4582 ↗(On Diff #116090)

Is this assert actually guaranteed somehow? I mean, it should be possible to transform any relevant condition to an equivalent SETEQ or SETNE, but I don't see any code to actually ensure this.

4601 ↗(On Diff #116090)

I would probably write "if ((CC == ISD::SETNE) != isOneConstant(RHS))".

jgalenson updated this revision to Diff 119716.Oct 20 2017, 2:40 PM

Good idea. However, the ARMISD::BRCOND isn't combined until after saddo is lowered.

That shouldn't completely block all transforms; you could pattern-match the ARMISD nodes. But I guess the patterns become a lot more complicated, so maybe not worth doing.

I'm not sure what you mean. After the ARMISD::BRCOND is created, only a few unrelated instructions are matched before the saddo is transformed. So without the reordering we've discussed, I don't see how this would help. I'm probably missing something, but it sounds like it might well not be worth doing.

jgalenson marked an inline comment as done.Oct 20 2017, 2:43 PM
jgalenson added inline comments.
lib/Target/ARM/ARMISelLowering.cpp
4582 ↗(On Diff #116090)

Good point. I actually copied that part of the code over from the AArch64 backend. I would guess that the assumption is that you should only be checking if the overflow bit is set or unset, hence EQ or NE, but I could imagine something transforming those into other operations. I changed it to be part of the condition to avoid the problem. Do you think it's worth doing something similar in the AArch64 backend?

I'm not sure what you mean. After the ARMISD::BRCOND is created, only a few unrelated instructions are matched before the saddo is transformed. So without the reordering we've discussed, I don't see how this would help. I'm probably missing something, but it sounds like it might well not be worth doing.

I mean you could pattern match after the uaddo/saddo is transformed. For unsigned add, you get something like ARMISD::BRCOND(ARMISD::CMPZ(ARMISD::ADDE(0, 0, N), 1)), which you can transform by eliminating the CMPZ+ADDE. The pattern for signed add is slightly different, but similar.

lib/Target/ARM/ARMISelLowering.cpp
4582 ↗(On Diff #116090)

Yes.

I mean you could pattern match after the uaddo/saddo is transformed. For unsigned add, you get something like ARMISD::BRCOND(ARMISD::CMPZ(ARMISD::ADDE(0, 0, N), 1)), which you can transform by eliminating the CMPZ+ADDE. The pattern for signed add is slightly different, but similar.

Ah yes. I considered that, but as you said, it would have more complicated matching and it would be more brittle if the saddo lowering changes (like it just did), so I preferred doing it this way.

lib/Target/ARM/ARMISelLowering.cpp
4582 ↗(On Diff #116090)

Okay, I'll work on submitting a patch there (it should be NFC).

jgalenson retitled this revision from Optimize {s,u}{add,sub}.with.overflow on ARM to [ARM] Optimize {s,u}{add,sub}.with.overflow.Nov 9 2017, 9:31 AM

Are there any other comments? What can I do to get this approved?

rogfer01 added inline comments.Nov 29 2017, 10:16 AM
lib/Target/ARM/ARMISelLowering.cpp
4542–4543 ↗(On Diff #119716)

Apologies if this question looks a bit clueless about all the transformation, but why do you need to reverse the condition code here?

jgalenson added inline comments.Nov 29 2017, 11:20 AM
lib/Target/ARM/ARMISelLowering.cpp
4542–4543 ↗(On Diff #119716)

No, it's a good question, and I'm a bit confused about this myself.

The getARMXALUOOp function seems to return the condition for when there is *no* overflow. It doesn't seem to be documented anywhere, but for example, in its SADDO case, it uses the VC flag, which is the "no overflow" case.

Assuming that's right, then since these branches are branching when an overflow occurs, I need to invert the condition.

Does that logic seem right?

rogfer01 added inline comments.Dec 7 2017, 6:23 AM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

This comment now looks odd because I had to revert D35192 (still working on it, though).

What was the reason to use a target-specific node here? Did you want to save some round trips lowering this, or did you need it for better codegen?

3916 ↗(On Diff #119716)

We could return std::tuple<SDValue, SDValue, SDValue> here, I think, but we'd better leave this for a later change.

4542–4543 ↗(On Diff #119716)

It is certainly confusing.

It returns three things: the arithmetic operation itself (in Value), the comparison that checks for overflow (in OverflowCmp), and a condition code related to OverflowCmp (in ARMcc). For unsigned addition (UADDO) it does the following

Value: add z, x, y
OverflowCmp: cmp z, x
ARMcc: HS

The comparison will compute z - x and update the flags. Using the HS condition code means z >= x and this is exactly *no overflow* (because z ← x + y).

Similarly when doing a signed addition (SADDO)

Value: add z, x, y
OverflowCmp: cmp z, x
ARMcc: VC

This case is less obvious, but if there is no overflow in the add, there must not be overflow in the comparison either (VC): since cmp computes z - x to update the flags, and z = x + y did not overflow, the subtraction yields exactly y, which does not overflow.

Similarly for USUBO and SSUBO, but here the comparison is cmp x, y (instead of cmp z, x), which is (surprisingly) more intuitive: unsigned x - y does not overflow if x >= y, and the signed case holds trivially, as not overflowing is exactly VC.

So after this wall of text, I believe your logic is sensible here.

Given that getARMXALUOOp is now going to be used in three places after your change, would it be possible to add a comment to that function like the one below (or an improved wording if you come up with a better one!)

// This function returns three things: the arithmetic computation itself
// (Value), a comparison (OverflowCmp) and a condition code (ARMcc).  The
// comparison and the condition code define the case in which the arithmetic
// computation *does not* overflow.
std::pair<SDValue, SDValue>
ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
                                 SDValue &ARMcc) const {
...
jgalenson added inline comments.Dec 7 2017, 10:06 AM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

I used the target-specific node here because efriedma suggested it, but using the generic ISD::ADDC does seem to work (it passes all the tests). Should I change it? I don't know much about the difference between them.

It is interesting that this still works after you reverted your patch.

3916 ↗(On Diff #119716)

Yes, that does seem cleaner to me. I can make a follow-up patch after this lands.

4542–4543 ↗(On Diff #119716)

Thanks for the long double-checking! That comment looks good to me, so I'll add it.

jgalenson updated this revision to Diff 125991.Dec 7 2017, 10:23 AM

Added rogfer01's comment to getARMXALUOOp.

rogfer01 added inline comments.Dec 8 2017, 12:44 AM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

My understanding is that several generic nodes (ISD::*) have to be lowered at some point to Arm-specific ones (ARMISD::*). ISD::ADD and ISD::SUB are among those nodes. Using Arm-specific nodes earlier might bring finer control but may miss some of the generic combiners done on generic nodes.

But it is true that when this function is used, the generic node will be found among specific ones (like ARMISD::CMP), so generic combiners are unlikely to kick in here, which might be a reason to use the specific one directly and not bother with the generic one (which I presume will require an extra iteration replacing it with a specific one).

Changing just this case will make this function look a bit odd. If Eli's comment was along the lines of "let's make this function only use ARMISD" (instead of ISD::ADD and ISD::SUB), we can make this change in a later patch and leave this function untouched (except for the comment).

@efriedma am I missing something?

jgalenson updated this revision to Diff 126175.Dec 8 2017, 10:08 AM

I've removed the modification to getARMXALUOOp that uses ADDC instead of ADD. From the discussion, it's not clear exactly how that part should work, and it's no longer needed, so this makes things simpler.

Now that D35192 has re-landed I need to update this patch.

Note that I added back the use of ARMISD::ADDC, which we were discussing a bit before.

efriedma added inline comments.Dec 12 2017, 3:52 PM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

Please don't use ISD::ADDC here; now that we're using ADDCARRY, we shouldn't produce that node at all on ARM. (LowerADDC_ADDE_SUBC_SUBE should have been removed as part of D35192. ARMISD::ADDC is a different node which is still useful.)

jgalenson updated this revision to Diff 126642.Dec 12 2017, 4:01 PM

Add the ARMISD::ADDC node for real. Thanks for catching that!

rogfer01 added inline comments.Dec 13 2017, 12:28 AM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

Thanks @efriedma. Now I see that I should have removed LowerADDC_ADDE_SUBC_SUBE in D35192.

That said, I fail to understand why we want to update this function in this patch. @jgalenson, does keeping ISD::ADD (not to be confused with ISD::ADDC) impact the code generation?

I mean, I understand we can use ISD::ADDCARRY here but maybe we want to do this in a later patch?

jgalenson added inline comments.Dec 13 2017, 8:54 AM
lib/Target/ARM/ARMISelLowering.cpp
3942 ↗(On Diff #116084)

Yes, with D35192 landed using ISD::ADD produces worse code. It's possible there's a different way to fix that, but doing this works as well.

Ping.

What else should I do to get this approved? It seems like there's agreement that this patch looks at least mostly good, and it (and the follow-ups) significantly improve the performance of overflow-checked code for ARM.

efriedma accepted this revision.Dec 20 2017, 1:37 PM

LGTM.

This revision is now accepted and ready to land.Dec 20 2017, 1:37 PM
This revision was automatically updated to reflect the committed changes.

Thanks for all the comments on this!