The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during ISel. This commit ports that code to the ARM backend.
Yes, I can't get a testcase to trigger those paths. Do you have any suggestions for how to do so?
Please include full context when you upload patches to Phabricator; see http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface.
This sort of overlaps with https://reviews.llvm.org/D35192.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | We should be able to optimize away the AND here using known bits; does that not happen on trunk? | |
test/CodeGen/ARM/su_addsub_overflow.ll | ||
20 ↗ | (On Diff #107331) | These CHECK lines are not very good; try generating checks with utils/update_llc_test_checks.py. Also, please commit this first and rebase your patch on top of it; that makes it easy to see what happens to the generated code. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | It doesn't because ARM doesn't call setBooleanContents. If there's a way to set that correctly, it would indeed be better. Do you think we could do that? | |
test/CodeGen/ARM/su_addsub_overflow.ll | ||
20 ↗ | (On Diff #107331) | That tool does make the changes to the testfiles more obvious, but it also generates overly-specific testcases that seem more brittle. I'll see if I can make a nice middle-ground and update it in a bit. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | Really? Wow, that's a big oversight. We should fix that. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | Do you know the correct way to set it? I tried, but a few tests failed, and I don't know the architecture well enough to know if that was my fault or if the tests just needed to be tweaked. If we do make that change, then I'll be able to simplify this code, of course. | |
test/CodeGen/ARM/su_addsub_overflow.ll | ||
20 ↗ | (On Diff #107331) | Okay, I've uploaded both the old and new versions of the test here. Thanks for that tool, by the way; it's pretty convenient. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | It's likely the tests just need to be tweaked; a lot of them are pretty sensitive to the exact compiler output. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | So which argument to setBooleanContents is correct for ARM? Is it ZeroOrOne? |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | ZeroOrOne is probably best; among other things, it matches the AAPCS calling convention rules for bool. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4484 | Okay, thanks. I'll look into a patch that simply adds that and then rebase this commit on top of it, which will allow me to remove the AND-checking logic here. |
Now that the setBooleanContents patch has landed, I can remove the AND checks here.
I also added the same optimization to BRCOND, which allows it to optimize another case.
test/CodeGen/ARM/su-addsub-overflow.ll | ||
---|---|---|
50 | This looks weird; we're generating a cmp, then a sub with exactly the same operands? |
test/CodeGen/ARM/su-addsub-overflow.ll | ||
---|---|---|
50 | Well, as you can see in diff 2, we're currently doing that with a lot of other instructions. This removes most of those other instructions. I have another patch I was going to send after this one that fixes this. Specifically, ARMBaseInstrInfo::optimizeCompareInstr tries to remove the cmp, but it runs right after the MachineSink pass, which sinks the sub into the cont basic block, which stops the optimization from working. |
Friendly ping.
If it would help, I can submit my follow-up patch now, although I don't think this should depend on it.
I'm kind of waiting for https://reviews.llvm.org/D35192 to land here... unfortunately, it's been bouncing in and out of the tree due to regressions, but once it lands, I'd like to see what it does to our lowering here.
I think it might make more sense to wait until we have ARMISD::ADDC nodes in the DAG, then try to optimize away the ARMISD::CMP nodes. That way, you don't have to worry about trying to optimize away redundant instructions after SelectionDAG.
Now that https://reviews.llvm.org/D35192 has landed, I've rebased this patch on top of it.
The existing patch actually applied cleanly and worked. However, the two unsigned cases generated slightly worse code (still better than in-tree, however). I managed to improve the uadd case by having getARMXALUOOp return an ADDC instead of an ADD, but this did not work for the usub case (although I left the change in). There's probably a way to fix the usub case, but I'm not worried much about it because my follow-up patch that modifies ARMBaseInstrInfo::optimizeCompareInstr handles it.
So does this change to getARMXALUOOp look correct?
The change to getARMXALUOOp is wrong; ADDC produces two results, so you're making a node with the wrong type.
So on trunk, for the llvm.uadd.with.overflow.i32 case, we produce a sequence like this:
  adds r0, r0, r1
  mov r2, #0
  adc r1, r2, #0
  cmp r1, #1
This is obviously not great... but the ARMISD::ADDE+ARMISD::CMP pattern is something you could DAGCombine away after legalization. I would prefer to do that, rather than try to clean it up after isel. Everything gets more complicated when you're dealing with MachineInstrs (you might end up optimizing simple cases, but not more complex ones), and the pattern works automatically with llvm.uadd.with.overflow.i64 etc.
I haven't thought through how exactly that extends to signed overflow, though.
Thanks for the info about ADC. How come no test/assert picked that up?
Here's the rebased patch without that change. You can see that uadd is slightly worse than sadd, but it's still an improvement. I'll look into DAGCombine tomorrow, although I do think that it's fine to do at least some of it at the MI level, since the code there already handles some cases and I'm extending it to catch a few more that it misses. But if it can be done at the DAG level, that's worth doing.
In terms of assertions, we have checks for a lot of nodes in SelectionDAG::getNode... but not all (and I guess ISD::ADDC isn't one of the ones we check). And legalization isn't very picky either.
In terms of tests, the ISD::ADDC probably got lowered to something else before it could do any damage. :)
Here's a modification of my incorrect commit from yesterday afternoon that properly extracts the result value from the ADDC.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3942 | Still the wrong type... you need to call getVTList to get the right type for an ADDC. Also, are you sure you don't want an ARMISD::ADDC, rather than an ISD::ADDC? |
Is it really necessary to have two different versions of almost identical code to generate an ARMISD::BRCOND? (I would rather have an explicit check for an AND than two versions of the code, if that's the issue.)
Both of them are used in the attached testcase. The br_cc case handles almost everything, but the brcond case is needed once (when the brcond isn't combined into a br_cc).
We could outline the two cases into a helper function, although they're different enough that I'm not sure that it would help too much. What do you think?
when the brcond isn't combined into a br_cc
BRCOND isn't legal on ARM; it will always eventually get transformed to BR_CC.
True. In most of my testcases, brcond is combined into br_cc, so we hit my new br_cc case. However, in one case the brcond is not combined into br_cc. It is legalized into it shortly afterwards, but then we lower saddo/uaddo/etc. the normal way before we lower the new br_cc node (where my new optimized case would kick in). That's why I needed the brcond case.
What is the best way to handle that? Adding code to the normal saddo/etc. lowering to skip it when there's a br_cc user seems like the wrong way to go about it (and I don't think it's even possible). And I don't know how to change the order in which we see nodes.
but then we lower saddo/uaddo/etc. the normal way before we lower the new br_cc node
Oh, hmm, someone else hit the same issue recently (in a different context). It should be possible to fix; SelectionDAG::Legalize is the relevant code. It doesn't quite work the way you want it to because we don't update the ordering when we add new nodes to the DAG. But I haven't thought through how to compute the right order to visit without recomputing the entire topological order from scratch.
Alternatively, you could try DAGCombining ARMISD::BRCOND after legalization?
It doesn't quite work the way you want it to because we don't update the ordering when we add new nodes to the DAG. But I haven't thought through how to compute the right order to visit without recomputing the entire topological order from scratch.
This does sound like it would solve the problem (and solve other problems as well). It seems a bit out of scope for this commit, though.
Alternatively, you could try DAGCombining ARMISD::BRCOND after legalization?
Good idea. However, the ARMISD::BRCOND isn't combined until after saddo is lowered.
For reference, here's the sequence of events I'm seeing in this one example:
brcond is legalized to br_cc
saddo is legalized
br_cc is legalized to ARMISD::BRCOND
ARMISD::BRCOND is legalized
everything is combined
So combining ARMISD::BRCOND runs into the same problem as using br_cc; they're both too late. Thus without changing the ordering of new DAG nodes (which seems a bit difficult), I don't see how I can do this efficiently without keeping the brcond case.
Good idea. However, the ARMISD::BRCOND isn't combined until after saddo is lowered.
That shouldn't completely block all transforms; you could pattern-match the ARMISD nodes. But I guess the patterns become a lot more complicated, so maybe not worth doing.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4581 | Is this assert actually guaranteed somehow? I mean, it should be possible to transform any relevant condition to an equivalent SETEQ or SETNE, but I don't see any code to actually ensure this. | |
4600 | I would probably write "if ((CC == ISD::SETNE) != isOneConstant(RHS))". |
Good idea. However, the ARMISD::BRCOND isn't combined until after saddo is lowered.
That shouldn't completely block all transforms; you could pattern-match the ARMISD nodes. But I guess the patterns become a lot more complicated, so maybe not worth doing.
I'm not sure what you mean. After the ARMISD::BRCOND is created, only a few unrelated instructions are matched before the saddo is transformed. So without the reordering we've discussed, I don't see how this would help. I'm probably missing something, but it sounds like it might well not be worth doing.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4581 | Good point. I actually copied that part of the code over from the AArch64 backend. I would guess that the assumption is that you should only be checking if the overflow bit is set or unset, hence EQ or NE, but I could imagine something transforming those into other operations. I changed it to be part of the condition to avoid the problem. Do you think it's worth doing something similar in the AArch64 backend? |
I'm not sure what you mean. After the ARMISD::BRCOND is created, only a few unrelated instructions are matched before the saddo is transformed. So without the reordering we've discussed, I don't see how this would help. I'm probably missing something, but it sounds like it might well not be worth doing.
I mean you could pattern match after the uaddo/saddo is transformed. For unsigned add, you get something like ARMISD::BRCOND(ARMISD::CMPZ(ARMISD::ADDE(0, 0, N), 1)), which you can transform by eliminating the CMPZ+ADDE. The pattern for signed add is slightly different, but similar.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4581 | Yes. |
I mean you could pattern match after the uaddo/saddo is transformed. For unsigned add, you get something like ARMISD::BRCOND(ARMISD::CMPZ(ARMISD::ADDE(0, 0, N), 1)), which you can transform by eliminating the CMPZ+ADDE. The pattern for signed add is slightly different, but similar.
Ah yes. I considered that, but as you said, it would have more complicated matching and it would be more brittle if the saddo lowering changes (like it just did), so I preferred doing it this way.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4581 | Okay, I'll work on submitting a patch there (it should be NFC). |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4542–4543 | Apologies if this question looks a bit clueless about the transformation, but why do you need to reverse the condition code here? |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
4542–4543 | No, it's a good question, and I'm a bit confused about this myself. The getARMXALUOOp function seems to return the condition for whether there is not an overflow. It doesn't seem to be documented anywhere, but for example, in its SADDO case, it uses the VC flag, which is the "no overflow" case. Assuming that's right, then since these branches are branching when an overflow occurs, I need to invert the condition. Does that logic seem right? |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3916 | We could return std::tuple<SDValue, SDValue, SDValue> here, I think, but it is better to leave this for a later change. | |
3942 | This comment now looks odd because I had to revert D35192 (still working on it, though). What was the reason to use a target-specific node here? Did you want to save some round trips lowering this, or did you need it for better codegen? | |
4542–4543 | It is certainly confusing. It returns three things: the intended arithmetic operation (in Value), the comparison that computes the overflow (in OverflowCmp) and a condition code related to OverflowCmp (in ARMcc).

For unsigned addition (UADDO) it does the following:

  Value:       add z, x, y
  OverflowCmp: cmp z, x
  ARMcc:       HS

The comparison computes z - x and updates the flags. The HS condition code means z >= x (unsigned), and this is exactly *no overflow* (because z = x + y).

Similarly, for signed addition (SADDO):

  Value:       add z, x, y
  OverflowCmp: cmp z, x
  ARMcc:       VC

This case is less obvious, but if the add did not overflow, then the comparison must not overflow either (VC): since the cmp computes z - x to update the flags, it would compute y, which does not overflow.

USUBO and SSUBO are similar, but here the comparison is cmp x, y (instead of cmp z, x), which is (surprisingly) more intuitive: unsigned x - y does not overflow exactly when x >= y, and trivially for the signed case not overflowing is exactly VC.

So after this wall of text, I believe your logic is sensible here. Given that getARMXALUOOp is now going to be used in three places after your change, would it be possible to add a comment to that function like the one below (or improved wording if you come up with something better)?

  // This function returns three things: the arithmetic computation itself
  // (Value), a comparison (OverflowCmp) and a condition code (ARMcc). The
  // comparison and the condition code define the case in which the arithmetic
  // computation *does not* overflow.
  std::pair<SDValue, SDValue>
  ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
                                   SDValue &ARMcc) const {
    ...
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3916 | Yes, that does seem cleaner to me. I can make a follow-up patch after this lands. | |
3942 | I used the target-specific node here because efriedma suggested it, but using the generic ISD::ADDC does seem to work (it passes all the tests). Should I change it? I don't know much about the difference between them. It is interesting that this still works after you reverted your patch. | |
4542–4543 | Thanks for the long double-checking! That comment looks good to me, so I'll add it. |
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3942 | My understanding is that several generic nodes (ISD::*) have to be lowered at some point to Arm-specific ones (ARMISD::*). ISD::ADD and ISD::SUB are among those nodes. Using Arm-specific nodes earlier might give finer control but may miss some of the combines done on generic nodes. But it is true that when this function is used, the generic node will be found among specific ones (like ARMISD::CMP), so generic combines are unlikely to kick in here, which might be a reason to use the specific one directly and not bother with the generic one (which I presume would require an extra iteration replacing it with a specific one). Changing just this case will make this function look a bit odd. If Eli's comment was along the lines of "let's make this function use only ARMISD nodes" (instead of ISD::ADD and ISD::SUB), we can make this change in a later patch and leave this function untouched (except for the comment). @efriedma am I missing something? |
I've removed the modification to getARMXALUOOp that uses ADDC instead of ADD. From the discussion, it's not clear exactly how that part should work, and it's no longer needed, so this makes things simpler.
Now that D35192 has re-landed I need to update this patch.
Note that I added back the use of ARMISD::ADDC, which we were discussing a bit before.
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3942 | Thanks @efriedma. Now I see that I should have removed LowerADDC_ADDE_SUBC_SUBE in D35192. That said, I fail to understand why we want to update this function in this patch. @jgalenson does keeping ISD::ADD (not to be confused with ISD::ADDC) impact the code generation? I mean, I understand we can use ISD::ADDCARRY here, but maybe we want to do this in a later patch? |
Ping.
What else should I do to get this approved? It seems like there's agreement that this patch looks at least mostly good, and it (and the follow-ups) significantly improve the performance of overflow-checked code for ARM.