Introduces a TargetLowering::optimizeFallthroughsEarly() switch to
stop SelectionDAG from flipping jump targets to create fallthroughs. Optimizing
for fallthroughs early can negatively affect canonicalization, and we can
usually leave the decision to MachineBlockPlacement.
Motivation
On most targets, comparison instructions produce results for multiple
relations, so a program like if (x == C) { ... } else if (x > C) { ... }
can use a single compare instruction for two conditional branches. This pattern
is widespread in one of our systems (a decref operation with support for
immortal objects).
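For illustration, a minimal sketch of the pattern; the constant and the
commented x86 sequence are assumptions about typical output, not code from
the actual system:

```cpp
// One compare whose flags feed both conditional branches, roughly:
//   cmp $1000, %edi   // single compare...
//   je  .equal        // ...serves the == branch
//   jg  .greater      // ...and the > branch
int classify(int x) {
  const int C = 1000; // placeholder constant
  if (x == C)
    return 0;
  else if (x > C)
    return 1;
  return 2;
}
```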
Machine-CSE would not consistently trigger, because
SelectionDAG would often optimize for fallthroughs,
flipping true/false and negating the condition:
x > C --> !(x > C) with true/false branches flipped --> x <= C --> x < (C + 1) after canonicalization
So we end up with a different constant, inhibiting CSE!
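To make the chain concrete, a hedged example with C = 42; the comments
restate the transformation above, and the compare immediates are assumed
x86 output rather than verbatim compiler output:

```cpp
int f(int x) {
  if (x == 42)      // cmp $42, %edi ; je
    return 1;
  if (x > 42)       // flipped for fallthrough: !(x > 42) --> x <= 42,
    return 2;       // then canonicalized: x < 43 --> cmp $43, %edi
  return 0;         // 42 vs. 43: different immediates, so MachineCSE
}                   // cannot merge the two compares.
```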
However, with MachineBlockPlacement optimizing control flow
and fallthroughs anyway, there is no real benefit to
SelectionDAG optimizing early when it negatively affects
canonicalization.
Target Specifics
I did not apply the change to most targets:
- ARM: The IfConverter pass currently relies on fallthroughs existing.
- AArch64: Has a custom AArch64ConditionOptimizer pass to CSE comparisons. Strangely, the unit tests do not show consistent behavior with and without these changes. Glancing at the code, I wonder if the algorithm only checks control flow along the "true" direction of a conditional branch and misses opportunities in the "false" direction.
- PowerPC+Mips: It seems that in some tests `not` instructions are inserted when optimizing for fallthrough instead of flipping true/false.
- BPF+WebAssembly: The TargetInstrInfo::analyzeBranch implementation appears to be incomplete, so MachineBlockPlacement cannot optimize for fallthroughs.
- Other targets: Test regressions that I did not understand.
The changes are enabled for X86, RISCV, Lanai, and XCore, and by default
for new targets.
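For reference, a self-contained mock of how the switch might look; the exact
signature and default are assumptions based on the description, and the names
TargetLoweringSketch and SomeTargetLowering are placeholders, not the verbatim
patch:

```cpp
// Mock of the hook's assumed shape.
struct TargetLoweringSketch {
  virtual ~TargetLoweringSketch() = default;
  // Assumed new default: leave fallthrough optimization to
  // MachineBlockPlacement.
  virtual bool optimizeFallthroughsEarly() const { return false; }
};

// A target that keeps the old SelectionDAG behavior, e.g. because a later
// pass (such as ARM's IfConverter) relies on fallthroughs existing.
struct SomeTargetLowering : TargetLoweringSketch {
  bool optimizeFallthroughsEarly() const override { return true; }
};
```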
Test Changes
There is a lot of test churn. Most of it is benign and caused by the
different canonicalization (which in fact makes it more likely to see the
same constant in LLVM-IR and assembly now).
There are a number of tests containing unoptimized conditional branches
jumping on true/false that get better or worse. But those are just
artifacts of bugpoint reduction or manual construction and
won't happen in practice after InstCombine:
test/CodeGen/X86/callbr-asm-blockplacement.ll
test/CodeGen/X86/pr29170.ll
test/CodeGen/X86/pr46585.ll
test/CodeGen/X86/setcc-freeze.ll
test/CodeGen/X86/2008-04-17-CoalescerBug.ll
test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
test/CodeGen/X86/2007-03-01-SpillerCrash.ll
test/CodeGen/X86/2007-12-18-LoadCSEBug.ll
test/CodeGen/X86/legalize-shift-64.ll
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/X86/shrink-compare-pgso.ll
These tests show improved codegen because duplicated CMPs get
eliminated:
test/CodeGen/X86/speculative-load-hardening-indirect.ll
test/CodeGen/X86/switch-density.ll
test/CodeGen/X86/cse-cmp.ll (new test)
These appear to improve because of different canonicalization:
test/CodeGen/X86/cmp-bool.ll
test/CodeGen/X86/fp128-select.ll
test/CodeGen/X86/xmulo.ll