Sometimes SimplifyCFG is too aggressive in converting branches to selects (if-conversion). The cost of speculation is underestimated, which can cause significant overhead. This patch sets a minimum cost for speculating any instruction in the context of if-conversion. This speeds up std::map::lower_bound by 2X.
lib/Transforms/Utils/SimplifyCFG.cpp:325
Why can't ComputeSpeculationCost return the correct cost here?
lib/Transforms/Utils/SimplifyCFG.cpp:255
My real question is why TTI.getUserCost(I) returns the wrong cost here :)
lib/Transforms/Utils/SimplifyCFG.cpp:255
For this specific case, it's that the GEP operator is mostly considered free (as long as the addressing mode is legal). The reasoning is that if you dereference the output of a GEP, the memory instruction can encode the addressing. But here the GEP is not immediately used by a memory op. Instead, it's used later in another basic block, and the GEP actually lowers to a real LEA instruction, so it is not free. I don't think it's quite right to treat most GEP operators as 0-cost, but just by looking at the GEP itself, it's hard to tell whether it will be folded into the memory op's addressing mode or will need a separate instruction. In this case, it would be better to be conservative. So what should I do? Change the GEP operator's cost to TCC_Basic for all cases?
lib/Transforms/Utils/SimplifyCFG.cpp:255
Please don't do that; you'll be making the common case worse. The right answer is to look at the addressing mode and make a more intelligent decision. As it turns out, I recall having a patch that did something like this (see gep-add-cost-v2.patch in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140303/206866.html). We never really finished discussing it, but you might see if something like that patch helps here too.
I think the correct heuristic change is: a GEP's cost can be considered 0 only when its result has a single use which is also the address of a memory operation. A more elaborate heuristic is that if a GEP has a single use which is another GEP it can be merged with (and folded), its cost can also be considered zero. See the related bug, where GEP merging does not behave the way it should when there are multiple uses (and folding does not happen with merging): https://llvm.org/bugs/show_bug.cgi?id=23163
The lower_bound case is quite interesting. If the input value is close to first or last, which means branch prediction within the loop will be mostly correct, the non-if-converted version is 2X faster than the if-converted version. If the input value is completely randomly distributed, the if-converted version is 2X faster than the non-if-converted version.
So overall, it's hard to make the decision without looking at the branch misprediction rate for the specific branch.
Branch probability info needs to be used here for guidance if possible:
- a highly biased branch should be marked as predictable and skipped;
- the cost of speculation needs to be weighted using the probability info.
Regarding the iterative nature of SampleFDO, I think the right approach is for the second-iteration build to force if-conversion if the first iteration did it (which leads to missing branch probability data in the second iteration).
David