This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] substitute equivalent constant to reduce logic-of-icmps
ClosedPublic

Authored by spatel on Apr 21 2020, 1:05 PM.

Details

Summary

(X == C) && (Y Pred1 X) --> (X == C) && (Y Pred1 C)
(X != C) || (Y Pred1 X) --> (X != C) || (Y Pred1 C)

This is not a complete alternate to D78430, but it is a more general transform that gets us most of the expected simplifications and several other improvements. It's a smaller patch too, so hopefully, less chance to go wrong assuming the underlying logic is correct:
http://volta.cs.utah.edu:8080/z/5gxjjc

PR45618:
https://bugs.llvm.org/show_bug.cgi?id=45618

Diff Detail

Event Timeline

spatel created this revision.Apr 21 2020, 1:05 PM

Does this supersede any of the existing folds?

nikic added a comment.Apr 21 2020, 1:52 PM

I like this approach. I guess the main thing we lose over the more limited handling in InstSimplify is the multi-use case?

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1151

Wouldn't you just be replacing one non-constant with another non-constant in that case?

Does this supersede any of the existing folds?

I suspect it does, but I didn't try to find those yet (the "AndOrICmps" chunk of instcombine is already several hundred lines of code). I'll skim through.

I like this approach. I guess the main thing we lose over the more limited handling in InstSimplify is the multi-use case?

Yes, the 1-use check is the only loss that I know of vs. what's in InstSimplify currently.
Also, D78430 has matching for:
(X == MAX) || (X >= Y) --> X >= Y
...and related patterns, and that is not handled here.
It's generally a gray area as to how to divide this kind of functionality, but we could do both if we want to be sure that nothing falls through the cracks.

I like this approach too because it is simpler.
Is it possible to apply this approach to InstSimplify as well? For example, when (x == c && x < y) is given, we can recursively call simplify icmp of (ult, c, y) and return its result.

spatel updated this revision to Diff 259291.Apr 22 2020, 8:12 AM
spatel marked 2 inline comments as done.

Patch updated:

  1. Call SimplifyICmpInst() to determine if the transform truly requires a new instruction; if not, we do not need a hasOneUse() restriction.
  2. Remove unlikely TODO comment.

The internal use of Simplify* here should make it possible to delete existing code in InstSimplify as discussed in D78430 (and definitely makes the new code in that patch unnecessary unless I've missed some loophole).

Also with this patch applied, I did an audit of the existing AndOrOfICmps* logic by commenting out chunks of code and seeing if tests fail. Everything else is apparently still necessary; this patch does not provide a complete superset for anything else we do in InstCombine. So we can't delete anything from InstCombine in this patch.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1151

Yes, seems unlikely that variable -> variable substitution would be worth doing. I'll remove the TODO note.

lebedev.ri requested changes to this revision.Apr 23 2020, 2:05 AM

Patch doesn't apply for me.
Does this go ontop of git master, or D78430?

This revision now requires changes to proceed.Apr 23 2020, 2:05 AM
lebedev.ri accepted this revision.Apr 23 2020, 2:27 AM
lebedev.ri added a reviewer: nlopes.

Same preexisting soundness problem as visible in D78430, but other than that LG.

llvm/test/Transforms/InstCombine/and-or-icmp-nullptr.ll
155–253

Likewise with D78430, alive says all these are preexisting miscompiles:

----------------------------------------
define i1 @ule_or_min(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ule_or_min(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=1, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 4        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ule_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ule_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 0        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ule_swap_or_min(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ule_swap_or_min(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 4        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ule_swap_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ule_swap_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=1, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ugt_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ugt_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)


----------------------------------------
define i1 @ugt_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ugt_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)


----------------------------------------
define i1 @ugt_swap_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ugt_swap_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)


----------------------------------------
define i1 @ugt_swap_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ugt_swap_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=1, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 4        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)
This revision is now accepted and ready to land.Apr 23 2020, 2:27 AM

Hi, as I just left at D8540, Alive2 currently approximates comparison with null pointer, so please ignore the case. I'll rerun the unit tests after it is correctly supported.

Hi, as I just left at D8540, Alive2 currently approximates comparison with null pointer, so please ignore the case. I'll rerun the unit tests after it is correctly supported.

Thanks.

Sorry if the patches are not applying cleanly. I thought this 1 should still apply independently to trunk as shown, but the overlapping test diffs are probably conflicting. I'll push the InstSimplify change first because that is an underlying analysis for InstCombine and then regenerate the test diffs.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2020, 7:33 AM
nikic added a comment.Apr 23 2020, 9:49 AM

It looks like this caused a +0.40% text size increase on kimwitu++. There's a corresponding > 10% compile-time increase on the main.cc file, so that's likely the relevant one.

If you have time, it would be good to double check what's going on there, as text size increase usually means we're losing optimizations.

spatel added a comment.EditedApr 23 2020, 11:00 AM

It looks like this caused a +0.40% text size increase on kimwitu++. There's a corresponding > 10% compile-time increase on the main.cc file, so that's likely the relevant one.

If you have time, it would be good to double check what's going on there, as text size increase usually means we're losing optimizations.

I'm going to expose my lack of git knowledge here, but how do we know this commit is to blame for the difference?
How do I determine the build flags that are needed to repro?
I tried reverting this patch locally and compiling /testsuite/trunk/MultiSource/Applications/kimwitu++/main.cc, and the IR is identical to pre-revert compiled at -O2 or -O3.

It looks like this caused a +0.40% text size increase on kimwitu++. There's a corresponding > 10% compile-time increase on the main.cc file, so that's likely the relevant one.

If you have time, it would be good to double check what's going on there, as text size increase usually means we're losing optimizations.

I'm going to expose my lack of git knowledge here, but how do we know this commit is to blame for the difference?

Better link: http://llvm-compile-time-tracker.com/compare.php?from=7003a1da37b2aae5b17460922efde9efde2c229d&to=62da6ecea298739ad59c0563ce6d9493804ef1f0&stat=size-text This one only contains this change in the diff range. (The previous one had an additional MLIR change, which can't make a difference.)

How do I determine the build flags that are needed to repro?

The flags are determined by test-suite cmake/caches/O3.cmake configuration. I get the flags by running ninja kc -v and sticking -emit-llvm on the right one, which would be for me:

/home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -I/home/nikic/llvm-test-suite/MultiSource/Applications/kimwitu++ -DYYDEBUG=1 -MD -MT MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/util.cc.o -MF MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o.d -o MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o -c ../MultiSource/Applications/kimwitu++/main.cc -emit-llvm

I tried reverting this patch locally and compiling /testsuite/trunk/MultiSource/Applications/kimwitu++/main.cc, and the IR is identical to pre-revert compiled at -O2 or -O3.

Hm, are you sure you reverted the right one? I also didn't see a difference, until I realized I reverted the InstSimplify change instead of the InstCombine change, duh. After fixing that, I get these two files and diff: https://gist.github.com/nikic/ecbe2482367be25b17ab982c2346b661

It looks like this caused a +0.40% text size increase on kimwitu++. There's a corresponding > 10% compile-time increase on the main.cc file, so that's likely the relevant one.

If you have time, it would be good to double check what's going on there, as text size increase usually means we're losing optimizations.

I'm going to expose my lack of git knowledge here, but how do we know this commit is to blame for the difference?

Better link: http://llvm-compile-time-tracker.com/compare.php?from=7003a1da37b2aae5b17460922efde9efde2c229d&to=62da6ecea298739ad59c0563ce6d9493804ef1f0&stat=size-text This one only contains this change in the diff range. (The previous one had an additional MLIR change, which can't make a difference.)

How do I determine the build flags that are needed to repro?

The flags are determined by test-suite cmake/caches/O3.cmake configuration. I get the flags by running ninja kc -v and sticking -emit-llvm on the right one, which would be for me:

/home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -I/home/nikic/llvm-test-suite/MultiSource/Applications/kimwitu++ -DYYDEBUG=1 -MD -MT MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/util.cc.o -MF MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o.d -o MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o -c ../MultiSource/Applications/kimwitu++/main.cc -emit-llvm

I tried reverting this patch locally and compiling /testsuite/trunk/MultiSource/Applications/kimwitu++/main.cc, and the IR is identical to pre-revert compiled at -O2 or -O3.

Hm, are you sure you reverted the right one? I also didn't see a difference, until I realized I reverted the InstSimplify change instead of the InstCombine change, duh. After fixing that, I get these two files and diff: https://gist.github.com/nikic/ecbe2482367be25b17ab982c2346b661

I'm looking into this, and i believe those are improvements, not regressions.


As it can be seen on llvm-diff, we inlined more:

$ llvm-diff-11 old.ll new.ll 
function @_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_PKS5_ exists only in left module
function @_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ exists only in left module
function @_ZSt19__throw_logic_errorPKc exists only in left module
in function main:
  in block %entry:
    >   %__dnew.i.i.i.i.i2584 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2481 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2448 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2360 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2319 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2216 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2099 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i2046 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1873 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1840 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1789 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1736 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1561 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1528 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1477 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1424 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1237 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1204 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1171 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i1118 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i915 = alloca i64, align 8
    >   %__dnew.i.i.i.i.i = alloca i64, align 8
    >   %__dnew.i.i.i.i.i692.i = alloca i64, align 8
    >   %__dnew.i.i.i.i.i662.i = alloca i64, align 8
    >   %__dnew.i.i.i.i.i633.i = alloca i64, align 8
    >   %__dnew.i.i.i.i.i.i = alloca i64, align 8
  in block %for.body196.lr.ph.i:
    >   %106 = bitcast %"class.std::__cxx11::basic_string.4"* %ref.tmp222.i to i8*
    >   %107 = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp222.i, i64 0, i32 2
        %106 = bitcast %"class.std::__cxx11::basic_string"* %ref.tmp222.i to i8*
    >   %109 = bitcast i64* %__dnew.i.i.i.i.i.i to i8*
    >   %110 = bitcast %union.anon.6* %107 to i8*
        %_M_p.i.i625.i = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp222.i, i64 0, i32 0, i32 0
    >   %_M_allocated_capacity.i.i.i.i.i.i625.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp222.i, i64 0, i32 2, i32 0
    >   %_M_string_length.i.i.i.i.i.i.i.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp222.i, i64 0, i32 1
    >   %111 = bitcast %"class.std::__cxx11::basic_string.4"* %ref.tmp227.i to i8*
    >   %112 = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp227.i, i64 0, i32 2
        %107 = bitcast %"class.std::__cxx11::basic_string"* %ref.tmp227.i to i8*
    >   %114 = bitcast i64* %__dnew.i.i.i.i.i633.i to i8*
    >   %115 = bitcast %union.anon.6* %112 to i8*
        %_M_p.i.i626.i = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp227.i, i64 0, i32 0, i32 0
    >   %_M_allocated_capacity.i.i.i.i.i.i638.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp227.i, i64 0, i32 2, i32 0
    >   %_M_string_length.i.i.i.i.i.i.i644.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp227.i, i64 0, i32 1
    >   %116 = bitcast %"class.std::__cxx11::basic_string.4"* %ref.tmp235.i to i8*
    >   %117 = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp235.i, i64 0, i32 2
        %108 = bitcast %"class.std::__cxx11::basic_string"* %ref.tmp235.i to i8*
    >   %119 = bitcast i64* %__dnew.i.i.i.i.i662.i to i8*
    >   %120 = bitcast %union.anon.6* %117 to i8*
        %_M_p.i.i627.i = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp235.i, i64 0, i32 0, i32 0
    >   %_M_allocated_capacity.i.i.i.i.i.i667.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp235.i, i64 0, i32 2, i32 0
    >   %_M_string_length.i.i.i.i.i.i.i673.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp235.i, i64 0, i32 1
        %109 = bitcast %"class.std::__cxx11::basic_string"* %ref.tmp244.i to i8*
    <   %_M_p.i.i628.i = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp244.i, i64 0, i32 0, i32 0
        %110 = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp244.i, i64 0, i32 2
    >   %123 = bitcast %"class.std::__cxx11::basic_string.4"* %ref.tmp244.i to %union.anon.6**
    >   %124 = bitcast i64* %__dnew.i.i.i.i.i692.i to i8*
        %arraydecay.i.i.i.i630.i = bitcast %union.anon* %110 to i8*
    >   %_M_p.i18.i.i.i.i.i696.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp244.i, i64 0, i32 0, i32 0
    >   %_M_allocated_capacity.i.i.i.i.i.i697.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp244.i, i64 0, i32 2, i32 0
    >   %_M_string_length.i.i.i.i.i.i.i703.i = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp244.i, i64 0, i32 1
    <   %111 = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp235.i, i64 0, i32 2
    <   %arraydecay.i.i.i.i636.i = bitcast %union.anon* %111 to i8*
    <   %112 = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp227.i, i64 0, i32 2
    <   %arraydecay.i.i.i.i642.i = bitcast %union.anon* %112 to i8*
    <   %113 = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp222.i, i64 0, i32 2
    <   %arraydecay.i.i.i.i648.i = bitcast %union.anon* %113 to i8*
  in block %if.else221.i:
    >   call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %106) #22
    >   store %union.anon.6* %107, %union.anon.6** %108, align 8, !tbaa !21, !alias.scope !41
    >   %157 = load i8*, i8** getelementptr inbounds (%struct.cmdline_options.3, %struct.cmdline_options.3* @g_options, i64 0, i32 22, i32 0, i32 0), align 8, !tbaa !23, !noalias !41
    >   %158 = load i64, i64* getelementptr inbounds (%struct.cmdline_options.3, %struct.cmdline_options.3* @g_options, i64 0, i32 22, i32 1), align 8, !tbaa !10, !noalias !41
    >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %109) #22, !noalias !41
    >   store i64 %158, i64* %__dnew.i.i.i.i.i.i, align 8, !tbaa !44, !noalias !41
    >   %cmp3.i.i.i.i.i.i = icmp ugt i64 %158, 15
    >   br i1 %cmp3.i.i.i.i.i.i, label %if.then4.i.i.i.i.i.i, label %if.end6.i.i.i.i.i.i
    <   call void @llvm.lifetime.start.p0i8(i64 32, i8* nonnull %106) #23
    <   call void @_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_PKS5_(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %ref.tmp222.i, %"class.std::__cxx11::basic_string"* dereferenceable(32) getelementptr inbounds (%struct.cmdline_options, %struct.cmdline_options* @g_options, i64 0, i32 22), i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.74, i64 0, i64 0))
    <   %145 = load i8*, i8** %_M_p.i.i625.i, align 8, !tbaa !23
    <   %call224.i = call i32 @strcmp(i8* nonnull dereferenceable(1) %mybasename.0.i624.i, i8* nonnull dereferenceable(1) %145) #22
    <   %cmp225.i = icmp eq i32 %call224.i, 0
    <   br i1 %cmp225.i, label %cleanup.done284.i, label %lor.lhs.false226.i
  in block %if.end193:
    >   %598 = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp, i64 0, i32 2
    >   %599 = bitcast %"class.std::__cxx11::basic_string.4"* %ref.tmp to %union.anon.6**
    >   store %union.anon.6* %598, %union.anon.6** %599, align 8, !tbaa !21, !alias.scope !109
    >   %600 = load i8*, i8** getelementptr inbounds (%struct.cmdline_options.3, %struct.cmdline_options.3* @g_options, i64 0, i32 22, i32 0, i32 0), align 8, !tbaa !23, !noalias !109
    >   %601 = load i64, i64* getelementptr inbounds (%struct.cmdline_options.3, %struct.cmdline_options.3* @g_options, i64 0, i32 22, i32 1), align 8, !tbaa !10, !noalias !109
    >   %602 = bitcast i64* %__dnew.i.i.i.i.i to i8*
    >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %602) #22, !noalias !109
    >   store i64 %601, i64* %__dnew.i.i.i.i.i, align 8, !tbaa !44, !noalias !109
    >   %cmp3.i.i.i.i.i = icmp ugt i64 %601, 15
    >   br i1 %cmp3.i.i.i.i.i, label %if.then4.i.i.i.i.i, label %if.end.if.end6_crit_edge.i.i.i.i.i
    <   call void @_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_PKS5_(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %ref.tmp, %"class.std::__cxx11::basic_string"* dereferenceable(32) getelementptr inbounds (%struct.cmdline_options, %struct.cmdline_options* @g_options, i64 0, i32 22), i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.12, i64 0, i64 0))
    <   invoke void @_ZN14kc_filePrinter4initEPKcS1_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE(%class.kc_filePrinter* nonnull @v_hfile_printer, i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.10, i64 0, i64 0), i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.11, i64 0, i64 0), %"class.std::__cxx11::basic_string"* nonnull dereferenceable(32) %ref.tmp)
          to label %invoke.cont unwind label %lpad

in function _ZN2kcL8openfileEPKcS1_:
  in block %entry:
    >   %__dnew.i.i.i.i.i = alloca i64, align 8
  in block %if.else:
    >   %2 = getelementptr inbounds %"class.std::__cxx11::basic_string.4", %"class.std::__cxx11::basic_string.4"* %ref.tmp2, i64 0, i32 2
    >   %3 = bitcast %"class.std::__cxx11::basic_string.4"* %ref.tmp2 to %union.anon.6**
    >   store %union.anon.6* %2, %union.anon.6** %3, align 8, !tbaa !2, !alias.scope !7
    >   %4 = load i8*, i8** getelementptr inbounds (%struct.cmdline_options.3, %struct.cmdline_options.3* @g_options, i64 0, i32 24, i32 0, i32 0), align 8, !tbaa !10, !noalias !7
    >   %5 = load i64, i64* getelementptr inbounds (%struct.cmdline_options.3, %struct.cmdline_options.3* @g_options, i64 0, i32 24, i32 1), align 8, !tbaa !13, !noalias !7
    >   %6 = bitcast i64* %__dnew.i.i.i.i.i to i8*
    >   call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %6) #21, !noalias !7
    >   store i64 %5, i64* %__dnew.i.i.i.i.i, align 8, !tbaa !14, !noalias !7
    >   %cmp3.i.i.i.i.i = icmp ugt i64 %5, 15
    >   br i1 %cmp3.i.i.i.i.i, label %if.then4.i.i.i.i.i, label %if.end.if.end6_crit_edge.i.i.i.i.i
    <   call void @_ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_PKS5_(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %ref.tmp2, %"class.std::__cxx11::basic_string"* dereferenceable(32) getelementptr inbounds (%struct.cmdline_options, %struct.cmdline_options* @g_options, i64 0, i32 24), i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.15, i64 0, i64 0))
    <   %call.i.i.i = call i64 @strlen(i8* nonnull dereferenceable(1) %file) #22, !noalias !2
    <   %_M_string_length.i.i.i.i = getelementptr inbounds %"class.std::__cxx11::basic_string", %"class.std::__cxx11::basic_string"* %ref.tmp2, i64 0, i32 1
    <   %2 = load i64, i64* %_M_string_length.i.i.i.i, align 8, !tbaa !5, !noalias !2
    <   %sub3.i.i.i = sub i64 4611686018427387903, %2
    <   %cmp.i.i.i = icmp ult i64 %sub3.i.i.i, %call.i.i.i
    <   br i1 %cmp.i.i.i, label %if.then.i.i.i, label %_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6appendEPKc.exit.i

in function _GLOBAL__sub_I_main.cc:
  in block %entry:
    >   tail call void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init.0"* nonnull @_ZStL8__ioinit)
    >   %0 = tail call i32 @__cxa_atexit(void (i8*)* bitcast (void (%"class.std::ios_base::Init.0"*)* @_ZNSt8ios_base4InitD1Ev to void (i8*)*), i8* getelementptr inbounds (%"class.std::ios_base::Init.0", %"class.std::ios_base::Init.0"* @_ZStL8__ioinit, i64 0, i32 0), i8* nonnull @__dso_handle) #21
    <   tail call void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"* nonnull @_ZStL8__ioinit)
    <   %0 = tail call i32 @__cxa_atexit(void (i8*)* bitcast (void (%"class.std::ios_base::Init"*)* @_ZNSt8ios_base4InitD1Ev to void (i8*)*), i8* getelementptr inbounds (%"class.std::ios_base::Init", %"class.std::ios_base::Init"* @_ZStL8__ioinit, i64 0, i32 0), i8* nonnull @__dso_handle) #22

It looks like this caused a +0.40% text size increase on kimwitu++. There's a corresponding > 10% compile-time increase on the main.cc file, so that's likely the relevant one.

If you have time, it would be good to double check what's going on there, as text size increase usually means we're losing optimizations.

I'm going to expose my lack of git knowledge here, but how do we know this commit is to blame for the difference?

Better link: http://llvm-compile-time-tracker.com/compare.php?from=7003a1da37b2aae5b17460922efde9efde2c229d&to=62da6ecea298739ad59c0563ce6d9493804ef1f0&stat=size-text This one only contains this change in the diff range. (The previous one had an additional MLIR change, which can't make a difference.)

How do I determine the build flags that are needed to repro?

The flags are determined by test-suite cmake/caches/O3.cmake configuration. I get the flags by running ninja kc -v and sticking -emit-llvm on the right one, which would be for me:

/home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -I/home/nikic/llvm-test-suite/MultiSource/Applications/kimwitu++ -DYYDEBUG=1 -MD -MT MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/util.cc.o -MF MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o.d -o MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o -c ../MultiSource/Applications/kimwitu++/main.cc -emit-llvm

I tried reverting this patch locally and compiling /testsuite/trunk/MultiSource/Applications/kimwitu++/main.cc, and the IR is identical to pre-revert compiled at -O2 or -O3.

Hm, are you sure you reverted the right one? I also didn't see a difference, until I realized I reverted the InstSimplify change instead of the InstCombine change, duh. After fixing that, I get these two files and diff: https://gist.github.com/nikic/ecbe2482367be25b17ab982c2346b661

I'm looking into this, and i believe those are improvements, not regressions.

Thanks! I double-checked my setup, and I'm still not able to repro. I wonder if platform makes a difference (I'm testing on macOS)?
Just staring at some of those math/logic+icmp combos made me think we're missing some underlying icmp transforms, so I'm working on a patch for at least 1 of those.

It looks like this caused a +0.40% text size increase on kimwitu++. There's a corresponding > 10% compile-time increase on the main.cc file, so that's likely the relevant one.

If you have time, it would be good to double check what's going on there, as text size increase usually means we're losing optimizations.

I'm going to expose my lack of git knowledge here, but how do we know this commit is to blame for the difference?

Better link: http://llvm-compile-time-tracker.com/compare.php?from=7003a1da37b2aae5b17460922efde9efde2c229d&to=62da6ecea298739ad59c0563ce6d9493804ef1f0&stat=size-text This one only contains this change in the diff range. (The previous one had an additional MLIR change, which can't make a difference.)

How do I determine the build flags that are needed to repro?

The flags are determined by test-suite cmake/caches/O3.cmake configuration. I get the flags by running ninja kc -v and sticking -emit-llvm on the right one, which would be for me:

/home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -I/home/nikic/llvm-test-suite/MultiSource/Applications/kimwitu++ -DYYDEBUG=1 -MD -MT MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/util.cc.o -MF MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o.d -o MultiSource/Applications/kimwitu++/CMakeFiles/kc.dir/main.cc.o -c ../MultiSource/Applications/kimwitu++/main.cc -emit-llvm

I tried reverting this patch locally and compiling /testsuite/trunk/MultiSource/Applications/kimwitu++/main.cc, and the IR is identical to pre-revert compiled at -O2 or -O3.

Hm, are you sure you reverted the right one? I also didn't see a difference, until I realized I reverted the InstSimplify change instead of the InstCombine change, duh. After fixing that, I get these two files and diff: https://gist.github.com/nikic/ecbe2482367be25b17ab982c2346b661

I'm looking into this, and i believe those are improvements, not regressions.

Thanks! I double-checked my setup, and I'm still not able to repro. I wonder if platform makes a difference (I'm testing on macOS)?

Absolutely, more specifically libc++ (you) vs libstdc++ (me) makes difference.

Just staring at some of those math/logic+icmp combos made me think we're missing some underlying icmp transforms, so I'm working on a patch for at least 1 of those.

Nice!

nikic added a comment.Apr 24 2020, 2:38 PM

I'm looking into this, and i believe those are improvements, not regressions.

Yeah, I agree that the changes look okay. Always hard to tell when some innocuous single instruction change triggers a snowball effect because it trips an inlining heuristic somewhere.

Absolutely, more specifically libc++ (you) vs libstdc++ (me) makes difference.

Oh, good point!