Inst-combine version of https://reviews.llvm.org/D106139.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I was trying out some examples, to see how this performs on ARM architectures. This pattern was relatively rare and didn't come up a lot in the benchmarks I tried (or optimized to be the same performance/codesize).
Testing the specific pattern, the first one I tried was i8's, a type not legal on those archs. It doesn't do great with this (although aarch64 does do OK given the instructions it can use):
https://godbolt.org/z/hbP445xsT
i32 looks OK for the same thing. AArch64 even looks better, I think because ARM already uses addcarry and splits things in a way that turns out to be equivalent.
https://godbolt.org/z/3zbhKPs1e
I was trying to look at across basic block too, but didn't do very well at getting something that wasn't just optimized away:
i8: https://godbolt.org/z/n8sxv3rqE
i32: https://godbolt.org/z/1sEa7Wd68
And then there was also this, but it crashed putting instructions in the wrong places:
https://godbolt.org/z/48abYPq8M
I have no strong objections to creating uadd_with_overflow in instcombine, but we have not done that in the past and I don't know if I see a huge reason to start now. It appears that we should at least be limiting it to legal types.
I agree. The i8 cases doesn't look good on thumbv6m and thumbv7m. I've added DL.isLegalInteger() checks now.
Most of non-correctness bailouts here don't really make sense to me.
Presumably, if it is profitable for a particular backend, said backend should be expanding whatever "bad" intrinsics into it's original form.
I agree that we shouldn't ideally add non-correctness bailouts, but I would like to do whatever I can in this patch to avoid regressing other backends.
Fixed the case in which the add of the uaddo won't dominate the original trunc use. For e.g.:
target datalayout = "n32"
declare void @true()
declare void @false()
define i32 @lshr_32_add_zext_trunc(i32 %a, i32 %b) {
%zext.a = zext i32 %a to i64 %zext.b = zext i32 %b to i64 %add = add i64 %zext.a, %zext.b %trunc.add = trunc i64 %add to i32 %cmp = icmp ult i32 %a, %b br i1 %cmp, label %brTrue, label %brFalse
brTrue:
%shr = lshr i64 %add, 32 %trunc.shr = trunc i64 %shr to i32 %retTrue = add i32 %trunc.add, %trunc.shr call void @true() br label %brMerge
brFalse:
%retFalse = add i32 %trunc.add, 12 call void @false() br label %brMerge
brMerge:
%ret = phi i32 [ %retTrue, %brTrue ], [ %retFalse, %brFalse ] ret i32 %ret
}
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1105 | We probably need to do something here to schedule the dead truncates to revisited to remove them. But I'm not sure. |
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1105 | I think this should use InstCombinerImpl::replaceInstUsesWith rather than Instruction::replaceAllUsesWith. Then I think it should call InstCombinerImpl::eraseInstFromFunction. @spatel or @lebedev.ri does that sound right? |
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1105 | Right, the pass-specific call should be used. |
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1105 | The truncates aren’t directly part of the chain of instructions we’re changing so I’m not sure if the get queued properly for DCE once we return. |
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1105 | InstCombinerImpl::replaceInstUsesWith() adds the instruction uses of which we replaced into worklist, |
Rebased; Addressed review comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1060 | I think I'm being too cautious here. I just realized that InstCombinerImpl::foldICmpWithConstant() can produce sadd_with_overflow, if I read correctly it doesn't seem to worry about cross basic-block use. I have removed this check now. |
It's arguably a worse canonical form since fewer optimizations will understand this (e.g. ScalarEvolution, though not sure it really will matter here).
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp | ||
---|---|---|
1056 | Demorgan this | |
1059–1060 | I also think a legality check doesn't belong here. I also don't think the legal types in the datalayout is particularly well defined | |
1063 | demorgan |
Demorgan this