Page MenuHomePhabricator

[DAGCombine][ARM] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold
Needs ReviewPublic

Authored by lebedev.ri on May 24 2019, 7:37 AM.

Details

Summary

DAGCombiner::visitADDLikeCommutative() already has a sibling fold: (add X, Carry) -> (addcarry X, 0, Carry)
This fold, as suggested by @efriedma, helps recover from some of the regressions of D62266

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 24 2019, 7:37 AM

It looks like we don't have great test coverage here. In general, we probably need to restrict getAsCarry depending on whether the intended use is an an addition or subtraction, at least for ARM.

test/CodeGen/ARM/addsubcarry-promotion.ll
28

The redundant sxth here is unfortunate, but probably orthogonal. Maybe a missing SimplifyDemandedBits case?

lebedev.ri marked an inline comment as done.May 24 2019, 3:52 PM

It looks like we don't have great test coverage here.

Yes, indeed, that is the only affected test in the entire check-llvm-codegen.

In general, we probably need to restrict getAsCarry depending on whether the intended use is an an addition or subtraction, at least for ARM.

I'm sorry, but i'm again not sure what that means (:
Restrict how?

I mean, we basically need two methods: getAsAddCarry (carry produced by UADDO/ADDCARRY) and getAsSubCarry (carry produced by USUBO/SUBCARRY), depending on how you're planning to use the carry bit, so we don't end up with same sort of mismatch we ran into before on ARM.

I mean, we basically need two methods: getAsAddCarry (carry produced by UADDO/ADDCARRY) and getAsSubCarry (carry produced by USUBO/SUBCARRY), depending on how you're planning to use the carry bit, so we don't end up with same sort of mismatch we ran into before on ARM.

Thank you for the explanation.

Is that required to be part of the follow-up fixes for D62266?
It sounds, while not unrelated, not exactly specific to the regressions there.
I don't know much about these 'carry' ops, so unless it's required to
fix those regressions, it might not be best for me to look into that..

If you don't want to continue working on this, that's okay, I think. D62450 looks like it's the more interesting fix.

If you don't want to continue working on this, that's okay, I think.

I'm just really unfamiliar with carry-nodes (and with ARM),
so it is not obvious to me what needs fixing, and in what way,
if there are no existing tests that regress due to my change.

D62450 looks like it's the more interesting fix.

These two are intertwined.
D62450 folds constant from setcc into parent addcarry/subcarry.
But without this patch there is no addcarry/subcarry...

Oh, I see...

I'll see if I can find some time to add some more testcases to the tree.

Oh, I see...

I'll see if I can find some time to add some more testcases to the tree.

Yes, please do, that would be helpful.

Maybe it is worth adding some platform dependent check to actually make sure turning the carry into a scalar is expensive? Or is it a reasonable assumption to make that it expensive on all plateforms?

In any case, this is more likely than not that this will optimize better down the road anyways, so maybe, if such plateform exist, we may want to delegate the cleanup to plateform specific transforms.

I think it would be beneficial to have an X86 test case for this pattern.

Maybe it is worth adding some platform dependent check to actually make sure turning the carry into a scalar is expensive? Or is it a reasonable assumption to make that it expensive on all plateforms?

I honestly don't know the answers to these questions, carry nodes are not my strong side,
If you want (given your recent carry patches), you can totally take this patch over.

In any case, this is more likely than not that this will optimize better down the road anyways, so maybe, if such plateform exist, we may want to delegate the cleanup to plateform specific transforms.

I think it would be beneficial to have an X86 test case for this pattern.

@lebedev.ri I think these are reasonable assumptions. Just add a test case for X86 and this is good to go as far as I'm concerned.