We rely on LCSSA in case when the uniform value defined inside the loop with divergent exit is used outside the loop.
In some cases EarlyCSE eliminates LCSSA PHIs because of the algorithm used in SimplifyPHINode function.
It always assumes more then one incoming values and mistakenly removes LCSSA PHIs.
This causes generation of incorrect code in AMDGPU backend.
Details
- Reviewers
rampitec nhaehnle hakzsam lebedev.ri spatel
Diff Detail
Event Timeline
This causes generation of incorrect code in AMDGPU backend.
This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)
That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.
Manually written IR w/o lcssa phis will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....
Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.
- You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
- How does this affect other targets (backends)? Does this need some TLI hook?
Sure! Good point. Thanks.
Although adding target specific information to such a general part requires a lot of changes to pass TTI from the Pass level to all the calls.
In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?
Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.
Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis
Okay. You are right for this concrete example. I also agree that LCSSA need to be simplified in trivial cases.
The problem is that this algorithm eliminates them in all cases. It looks like it was designed to detect PHIs with equal inputs and eliminates LCSSA unintentionally.
Anyway, I've got what is your objection about.
That means I have to change all the function declarations on the way to be able to pass TTI to check if it is the "divergent" target (
Can't you just schedule an LCSSAPass somewhere late in pipeline, before SelectionDAG and after early-cse?
Since this problem is being encountered, i'm guessing that even without this workaround,
the same problem can still be encountered if the original IR is not in LCSSA form to begin with,
i.e. i think the D60834 sounds like the properer fix (ignoring the roadblocks on the way there.)
Agree - InstSimplify is a very early (but frequently used) target-independent (no TTI/TLI) canonicalization pass/analysis. I don't see why it should be crippled as proposed in this patch.
The problem is that target-independent IR canonicalization passes like EarlyCSE are running late in the IR codegen pipeline?
Also, this code change causes lots of existing middle-end test failures, so it doesn't appear to be some harmless/degenerate pattern:
Failing Tests (43):
LLVM :: Transforms/CodeGenPrepare/X86/computedgoto.ll LLVM :: Transforms/Coroutines/ArgAddr.ll LLVM :: Transforms/CorrelatedValuePropagation/select.ll LLVM :: Transforms/GlobalDCE/complex-constantexpr.ll LLVM :: Transforms/IndVarSimplify/rewrite-loop-exit-value.ll LLVM :: Transforms/Inline/AArch64/phi.ll LLVM :: Transforms/Inline/bfi-update.ll LLVM :: Transforms/Inline/inline-fast-math-flags.ll LLVM :: Transforms/Inline/inline_constprop.ll LLVM :: Transforms/InstCombine/gep-combine-loop-invariant.ll LLVM :: Transforms/InstCombine/gepphigep.ll LLVM :: Transforms/InstMerge/st_sink_bugfix_22613.ll LLVM :: Transforms/InstSimplify/phi.ll LLVM :: Transforms/LICM/2003-05-02-LoadHoist.ll LLVM :: Transforms/LoopInstSimplify/basic.ll LLVM :: Transforms/LoopSimplify/ashr-crash.ll LLVM :: Transforms/LoopSimplify/merge-exits.ll LLVM :: Transforms/LoopUnroll/runtime-loop-multiple-exits.ll LLVM :: Transforms/LoopUnroll/runtime-loop4.ll LLVM :: Transforms/LoopUnroll/runtime-multiexit-heuristic.ll LLVM :: Transforms/LoopUnroll/runtime-unroll-remainder.ll LLVM :: Transforms/LoopUnroll/unroll-cleanup.ll LLVM :: Transforms/LoopUnswitch/2007-07-12-ExitDomInfo.ll LLVM :: Transforms/LoopUnswitch/unswitch-equality-undef.ll LLVM :: Transforms/LoopVectorize/AMDGPU/packed-math.ll LLVM :: Transforms/LoopVectorize/X86/already-vectorized.ll LLVM :: Transforms/LoopVectorize/X86/float-induction-x86.ll LLVM :: Transforms/LoopVectorize/X86/reg-usage.ll LLVM :: Transforms/LoopVectorize/X86/small-size.ll LLVM :: Transforms/LoopVectorize/X86/unroll-pm.ll LLVM :: Transforms/LoopVectorize/first-order-recurrence.ll LLVM :: Transforms/LoopVectorize/gcc-examples.ll LLVM :: Transforms/LoopVectorize/if-conversion.ll LLVM :: Transforms/LoopVectorize/opt.ll LLVM :: Transforms/LoopVectorize/read-only.ll LLVM :: Transforms/LoopVectorize/reduction.ll LLVM :: Transforms/LoopVectorize/value-ptr-bug.ll LLVM :: Transforms/LoopVectorize/vectorize-once.ll LLVM :: Transforms/PhaseOrdering/reassociate-after-unroll.ll LLVM :: Transforms/PhaseOrdering/simplifycfg-options.ll LLVM :: Transforms/SimpleLoopUnswitch/2007-07-12-ExitDomInfo.ll LLVM :: Transforms/SimpleLoopUnswitch/update-scev.ll LLVM :: Transforms/ThinLTOBitcodeWriter/no-type-md.ll