This is an archive of the discontinued LLVM Phabricator instance.

Set minimum cost of speculating an instruction.
AbandonedPublic

Authored by danielcdh on Feb 17 2016, 2:48 PM.

Details

Reviewers
davidxl
Summary

Sometime SimplifyCFG is too aggressive in converting branch to select (if-conversion). The cost of speculation is under-estimated and would cause significant overhead. This patch set the minimum cost for speculating any instruction in the context of if-conversion. This speeds up stl map::lower_bound by 2X.

Diff Detail

Event Timeline

danielcdh updated this revision to Diff 48244.Feb 17 2016, 2:48 PM
danielcdh retitled this revision from to Set minimum cost of speculating an instruction..
danielcdh updated this object.
danielcdh added a reviewer: davidxl.
danielcdh added a subscriber: llvm-commits.
davidxl added inline comments.Feb 17 2016, 2:59 PM
lib/Transforms/Utils/SimplifyCFG.cpp
325

Why can't ComputeSpeculationCost return the correct cost here?

danielcdh updated this revision to Diff 48263.Feb 17 2016, 4:54 PM

Move the cost check to callee.

davidxl added inline comments.Feb 17 2016, 5:25 PM
lib/Transforms/Utils/SimplifyCFG.cpp
255

My real question is why TTI.getUserCost(I) returns the wrong cost here :)

danielcdh added inline comments.Feb 17 2016, 9:43 PM
lib/Transforms/Utils/SimplifyCFG.cpp
255

For this specific case, it's that GEP operator is mostly considered free (as soon as the addressing mode is legal). The reasoning is that if you dereference the output of GEP operator, the memory instruction can have the addressing encoded. But here GEP operator is not immediately used by a memory op. Instead, it's used later in other basic block, and the GEP operation actually translated to a real LEA instruction, thus it is not free.

I don't think it's quite right to set most GEP operator as 0-cost. But just by looking at the GEP operator itself, it's hard to tell whether it will be encoded into the memory op, or need to have separate instruction for it. In this case, it would be better to be conservative.

So what should I do? Change GEP operator's cost to TCC_Basic for all cases?

hfinkel added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
255

So what should I do? Change GEP operator's cost to TCC_Basic for all cases?

Please don't do that; you'll be making the common case worse. The right answer is to look at the addressing mode and make a more-intelligent decision.

As it turns out, I recall having a patch that did something like this (see gep-add-cost-v2.patch in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140303/206866.html). We never really finished discussing it, but you might see if something like that patch helps here too.

davidxl edited edge metadata.Feb 17 2016, 10:25 PM

I think the correct heuristic change is : GEP cost can be considered 0 only when its result has a single use which is also the address of a memory operation. A more elaborate heuristic is that if GEP has a single use which is another GEP it can be merged with (and folded), its cost can be considered zero. See related bug when GEP merging does not do what it does when there are multiple uses (and folding does not happen with merging): https://llvm.org/bugs/show_bug.cgi?id=23163

The lower_bound case is quite interesting. If the input value is close to first or last, which means branch prediction within the loop will be mostly correct, the non-if-converted version is 2X faster than if-converted version. If the the input value is completely randomly distributed, the if-converted version is 2X faster than non-if-converted version.

So overall, it's hard to make the decision without looking at the branch misprediction rate for the specific branch.

Branch probability info needs to be used here for guidance if possible:

  1. highly biased branch should be marked as predictable and skipped
  2. the cost of speculation needs to be weighted using probability info.

Regarding the iterative nature of the SampleFDO, I think the right
approach is for the second iteration build to force ifcvt if the first
iteration does it (and leading to missing branch prob data in the
second iteration).

David

danielcdh abandoned this revision.Feb 8 2017, 4:13 PM