If consecutive select instructions are lowered separately in CGP, it will introduce redundant condition check and branches that cannot be removed by later optimization phases. This patch lowers all consecutive select instructions at the same to to avoid inefficent code as demonstrated in https://llvm.org/bugs/show_bug.cgi?id=29095
Details
Diff Detail
Event Timeline
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4545 | Assert SI1->getParent() == SI2->getParent()? | |
4549 | Use 'auto It', a const_iterator, and clang-format? | |
4553 | Is SI2 not dependent on SI1 if, e.g SI2->getTrueValue() == (add SI 1)? I'm fuzzy on what 'dependent' means in this context, sorry! | |
4556 | This would be better off as an llvm_unreachable; that should let you remove the return. | |
4571 | I'm worried that this is quadratic in the number of SelectInsts in a BB. Is there a way to perform the isDependent check by looking at the users of a SelectInst instead of traversing the BB? | |
test/CodeGen/X86/codegen-prepare-select.ll | ||
8 ↗ | (On Diff #70041) | Please move these checks closer to the IR which produces the expected assembly. Similarly, please move the NOT tests closer to the IR which used to produce the corresponding assembly. |
40 ↗ | (On Diff #70041) | It would help me follow along if you could reduce this test a bit more. E.g, besides the branch_weights, I'm not sure if any of the metadata here is really needed. Similarly, it's not clear if/why some of the operations in the loop are really needed (mul, ashr, sext). If they really are needed, it would help to have a small explanatory comment. |
Thanks! I found this much easier to follow.
It might be worth adding negative tests (for >1 jae), where the conditions of adjacent SelectInst's differ, or where there is overlap between the true/false values.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4549 | Always push SI first and start from the next instruction? | |
4558 | Early return true if ASI size is 1? | |
4559 | Probably use a different variable name from the input parameter. | |
4560 | I1 --> TI | |
4564 | Add a comment here for explanation. | |
4613 | Instead of skipping the optimization, It is better to keep the current behavior if those selects can not be grouped -- optimize them one by one. Skipping optimization completely need more discussion and can be done as follow up. |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
4738 | There is probably no need to do reverse iteration. I suggest create a helper function Value *getTrueOrFalseValue(SelectInst *SI, bool isTrue, const SmallPtrSet<...>& Selects) { Value *V; do { V = (isTrue? SI->getTrueValue() : SI->getFalseValue()); VI = dyn_cast<SelectInst>(V); if (!VI || !Selects.count(VI)) break; SI = VI; } while (true); return V; } Then the code below can be simplified a lot and easier to read: PN->addIncoming(getTrueOrFalseValue(SI, true, INS), TrueBlock); |
Assert SI1->getParent() == SI2->getParent()?