This is an alternative to D122825. Instead of trying to account for possible EFLAGS clobber by constant materialization, remove the with.overflow + select fold entirely, on the premise that non-trivial/fragile folds like this don't belong in FastISel. I've kept the with.overflow + br fold for now, as we don't have any known issues for that one.
Details
Diff Detail
Unit Tests
Event Timeline
Please can we see the effect of removing the with.overflow + br case and the foldX86XALUIntrinsic method entirely?
@RKSimon Here's how dropping it entirely looks like: https://gist.github.com/nikic/9e59e8e7b635a54f0d75aad89b2632d6
I think I prefer getting rid of it entirely, but I'd accept just this patch if others want to keep some eflags folding behaviour.
@pengfei @craig.topper any thoughts?
If I read the patches correct, this patch and further removing the whole folding are just for theoretical potential issues but we cannot think out for now, right?
Given we don't do much optimizations with FastISel (which means it's rare we meet another fail in future), I prefer to D122825 (maybe with comments for future information). Furthermore, I think we can even check if the constant is 0.
OK, I've no strong objections to D122825 instead if there's a motivation to retain fast-isel optimizations
You can't check if the constant is 0. It could be a ConstantExpr that includes a use of a 0. The ConstantExpr will get expanded to individual instructions and that might create a mov of 0. It could also expand to an expression that includes an add or other instruction which will also clobber the flags. For example, this test case is broken today because it creates an add instruction between the overflow add and the cmov.
@a = global i32 0, align 4 define i64 @adder(i64 %lhs, i64 %rhs) { %res = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %lhs, i64 %rhs) %errorbit = extractvalue { i64, i1 } %res, 1 %errorval = select i1 %errorbit, i64 148, i64 add (i64 ptrtoint (i32* @a to i64), i64 5) ret i64 %errorval } declare { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
Thanks, this is a nice additional test case. I've committed it in https://github.com/llvm/llvm-project/commit/8ae33cb300262738d9798edb612c7431612f56f6. (Either approach would fix that one as well.)
Does D122825 still work if we don't have any constant operand? https://godbolt.org/z/5r6TvGxaP