As highlighted in D82438, the speculation cost isn't accumulated in SpeculativelyExecuteBB. So now include the selects, the speculated instruction and any constant expressions against the set threshold.
Details
Diff Detail
Event Timeline
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
2099–2104 | Please add a new cl::opt for that. So i'd go with 20. |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
2099–2104 | Most of the cost modelling framework actually tries to assume that branches are always predicated though, and I don't understand why making arbitrary decisions about machine details would be a good idea here. Plus, it's also completely irrelevant to code size... It would seem to be better to query the latency / size of the branch, but I highly doubt that would produce results that anyone would want... |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
2099–2104 | I'll rephrase: given that the cost modelling logic changed, |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
2099–2104 | Okay... so I guess we need to start at: What is the purpose of this code? I think it's trying to convert any number of phis and speculate, at most, a single instruction (and maybe try to figure out something with constexprs). If PHINodeFoldingThreshold is supposed to represent the cost of a cmp + sel, this doesn't feel completely broken to me but the threshold could just be PHINodeFoldingThreshold * NumPHIs? |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
2099–2104 | I have no idea, it's you who is changing/fixing this, I'm just saying that since the entire cost-modelling logic |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
2099–2104 | Yeah, so I think multiplying by the number of phis should bring us back inline with what we were doing before. |
Rebased after reorganising the two parts of code that evaluate phis. The budget now allows a basic cost for each phi within the block, plus a basic cost for the one instruction that we may speculate.
Please add a new cl::opt for that.
I believe, the threshold should be the branch misprediction cost,
i.e. if it takes less cost (latency) to execute the branch
then to wrongly predict that it won't be executed,
but then rewind and execute it anyways, we should just execute it.
So i'd go with 20.