This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Combine lshr of add that intends to get the carry as llvm.uadd.with.overflow
AbandonedPublic

Authored by abinavpp on Aug 5 2021, 3:53 AM.

Details

Diff Detail

Event Timeline

abinavpp created this revision.Aug 5 2021, 3:53 AM
abinavpp requested review of this revision.Aug 5 2021, 3:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2021, 3:53 AM
abinavpp edited the summary of this revision. (Show Details)Aug 5 2021, 3:56 AM
foad added a subscriber: foad.Aug 5 2021, 4:34 AM
dmgreen added a subscriber: dmgreen.Aug 6 2021, 2:14 AM

I was trying out some examples, to see how this performs on ARM architectures. This pattern was relatively rare and didn't come up a lot in the benchmarks I tried (or optimized to be the same performance/codesize).

Testing the specific pattern, the first one I tried was i8's, a type not legal on those archs. It doesn't do great with this (although aarch64 does do OK given the instructions it can use):
https://godbolt.org/z/hbP445xsT
i32 looks OK for the same thing. AArch64 even looks better, I think because ARM already uses addcarry and splits things in a way that turns out to be equivalent.
https://godbolt.org/z/3zbhKPs1e

I was trying to look at across basic block too, but didn't do very well at getting something that wasn't just optimized away:
i8: https://godbolt.org/z/n8sxv3rqE
i32: https://godbolt.org/z/1sEa7Wd68
And then there was also this, but it crashed putting instructions in the wrong places:
https://godbolt.org/z/48abYPq8M

I have no strong objections to creating uadd_with_overflow in instcombine, but we have not done that in the past and I don't know if I see a huge reason to start now. It appears that we should at least be limiting it to legal types.

abinavpp updated this revision to Diff 365119.Aug 9 2021, 1:06 AM

Rebased; Addressed review comments.

It appears that we should at least be limiting it to legal types.

I agree. The i8 cases doesn't look good on thumbv6m and thumbv7m. I've added DL.isLegalInteger() checks now.

Most of non-correctness bailouts here don't really make sense to me.
Presumably, if it is profitable for a particular backend, said backend should be expanding whatever "bad" intrinsics into it's original form.

Most of non-correctness bailouts here don't really make sense to me.
Presumably, if it is profitable for a particular backend, said backend should be expanding whatever "bad" intrinsics into it's original form.

I agree that we shouldn't ideally add non-correctness bailouts, but I would like to do whatever I can in this patch to avoid regressing other backends.

abinavpp updated this revision to Diff 365130.Aug 9 2021, 2:36 AM

Fixed the case in which the add of the uaddo won't dominate the original trunc use. For e.g.:

target datalayout = "n32"

declare void @true()
declare void @false()

define i32 @lshr_32_add_zext_trunc(i32 %a, i32 %b) {

%zext.a = zext i32 %a to i64
%zext.b = zext i32 %b to i64
%add = add i64 %zext.a, %zext.b
%trunc.add = trunc i64 %add to i32
%cmp = icmp ult i32 %a, %b
br i1 %cmp, label %brTrue, label %brFalse

brTrue:

%shr = lshr i64 %add, 32
%trunc.shr = trunc i64 %shr to i32
%retTrue = add i32 %trunc.add, %trunc.shr
call void @true()
br label %brMerge

brFalse:

%retFalse = add i32 %trunc.add, 12
call void @false()
br label %brMerge

brMerge:

%ret = phi i32 [ %retTrue, %brTrue ], [ %retFalse, %brFalse ]
ret i32 %ret

}

lebedev.ri added inline comments.Aug 9 2021, 4:05 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1060

I really don't think we should be doing this..

1069

Early return

1083–1084

Presumably you should just set the expansion point to be right after the old add?

craig.topper added inline comments.Aug 9 2021, 9:50 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1105

We probably need to do something here to schedule the dead truncates to revisited to remove them. But I'm not sure.

craig.topper added inline comments.Aug 9 2021, 9:54 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1105

I think this should use InstCombinerImpl::replaceInstUsesWith rather than Instruction::replaceAllUsesWith. Then I think it should call InstCombinerImpl::eraseInstFromFunction.

@spatel or @lebedev.ri does that sound right?

lebedev.ri added inline comments.Aug 9 2021, 10:23 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1105

Right, the pass-specific call should be used.
I don't think we need to call eraseInstFromFunction() explcitly,
that should be handled automatically once you return non-null from this function.
Generally, i would think not having an intermediate vector would be better, but not sure.

craig.topper added inline comments.Aug 9 2021, 10:33 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1105

The truncates aren’t directly part of the chain of instructions we’re changing so I’m not sure if the get queued properly for DCE once we return.

lebedev.ri added inline comments.Aug 9 2021, 10:37 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1105

InstCombinerImpl::replaceInstUsesWith() adds the instruction uses of which we replaced into worklist,
so the next time it is revisited, it will be dropped.

abinavpp updated this revision to Diff 365652.Aug 10 2021, 8:41 PM
abinavpp marked 4 inline comments as done.

Rebased; Addressed review comments.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1060

I think I'm being too cautious here. I just realized that InstCombinerImpl::foldICmpWithConstant() can produce sadd_with_overflow, if I read correctly it doesn't seem to worry about cross basic-block use. I have removed this check now.

Matt added a subscriber: Matt.Aug 11 2021, 9:05 AM

I was trying out some examples, to see how this performs on ARM architectures. This pattern was relatively rare and didn't come up a lot in the benchmarks I tried (or optimized to be the same performance/codesize).

Testing the specific pattern, the first one I tried was i8's, a type not legal on those archs. It doesn't do great with this (although aarch64 does do OK given the instructions it can use):
https://godbolt.org/z/hbP445xsT
i32 looks OK for the same thing. AArch64 even looks better, I think because ARM already uses addcarry and splits things in a way that turns out to be equivalent.
https://godbolt.org/z/3zbhKPs1e

I was trying to look at across basic block too, but didn't do very well at getting something that wasn't just optimized away:
i8: https://godbolt.org/z/n8sxv3rqE
i32: https://godbolt.org/z/1sEa7Wd68
And then there was also this, but it crashed putting instructions in the wrong places:
https://godbolt.org/z/48abYPq8M

I have no strong objections to creating uadd_with_overflow in instcombine, but we have not done that in the past and I don't know if I see a huge reason to start now. It appears that we should at least be limiting it to legal types.

It's arguably a worse canonical form since fewer optimizations will understand this (e.g. ScalarEvolution, though not sure it really will matter here).

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1056

Demorgan this

1059–1060

I also think a legality check doesn't belong here. I also don't think the legal types in the datalayout is particularly well defined

1063

demorgan

Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 5:34 AM
Herald added a subscriber: StephenFan. · View Herald Transcript