Not actively working on LLVM at the moment
User Details
- User Since
- Aug 13 2013, 3:10 PM (527 w, 5 d)
Wed, Sep 20
Mon, Sep 11
Tweak comments for weight constants.
Rework formula for prolog/epilog loop-backedge weights as suggested by @wenlei.
It seems there is no further feedback. Landing this now.
Tue, Sep 5
For example, when emitMemRuntimeChecks is done, we really just don't know how likely it is that these arrays overlap.
A slightly off-topic point from the change itself, but to make this future-proof, I'm also wondering whether we should eventually have the BranchInst::Create API for conditional branches always ask for branch weights if the parent function has a non-zero entry count.
This is already abandoned, so I will not pursue it further. But for the sake of discussion:
Fri, Sep 1
I have a feeling @mtrofin would prefer pass-specific weights rather than a unified notion of "likely"/"unlikely"... So with the latest round of patches it's probably best to abandon this for now.
factor out weight constants in a similar way to D158642
Factor out weight constants to match the style used in D158642
factor out branch weight constants.
Thu, Aug 31
Aug 25 2023
Nice catch!
Aug 24 2023
Aug 23 2023
I'd also posit that, since we're changing this anyway, maybe we should reevaluate the numbers we use as defaults.
And as another strawman / discussion-starter, I put up D158680, where I use !{"branch_weights", i32 1, i32 0} to represent likely branches, and the actual "LikelyWeight" mostly becomes an internal implementation detail of the BranchProbabilityInfo class... This seemed like a neat way to get an abstract "likely"/"unlikely" notation, but I'm not sure of the effects of no longer having "truly zero" weights, since they would now be interpreted differently...
And here's another straw-man proposal to get a discussion going, because I am having trouble saying what the difference between branch_weights 1, 0 and llvm.expect even is: this change turns any branch_weights X, 0 or branch_weights 0, X into an abstract representation of likely/unlikely that BranchProbabilityInfo will interpret when computing the analysis...
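To make the straw-man concrete, here is a hypothetical Python sketch of the interpretation rule it describes: a literal 0 weight becomes an abstract likely/unlikely marker that the analysis expands using an internal ratio. The function name and the internal weight value are made up for illustration; in the proposal the real logic would live inside BranchProbabilityInfo.

```python
# Made-up stand-in for the internal "LikelyWeight" implementation detail.
INTERNAL_LIKELY_WEIGHT = 127

def interpret_weights(w_true, w_false):
    """Hypothetical sketch: expand abstract 0-weights into concrete ratios."""
    if w_true == 0 and w_false > 0:
        return (1, INTERNAL_LIKELY_WEIGHT)   # abstract "unlikely" branch
    if w_false == 0 and w_true > 0:
        return (INTERNAL_LIKELY_WEIGHT, 1)   # abstract "likely" branch
    return (w_true, w_false)                 # concrete profile weights pass through
```

Under this rule, branch_weights 1, 0 would no longer mean "the false edge is never taken", only "the true edge is likely", which is exactly the "no truly zero weights" concern raised above.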
On the other hand, I dislike exposing some internal detail of a pass.
My initial reaction to this was that we should keep the --unlikely-branch-weights flag available
- Fixed overflow test when scaling up weights.
- Still need a better way than hardcoding 127 as the "likely" branch weight; hoping that people can give me some hints in the discussion I started in https://reviews.llvm.org/D158642
Maybe we should expose the "LikelyBranchWeight" of LowerExpectIntrinsic.cpp in an internal LLVM header? So frontends are still forced to use the llvm.expect abstractions while LLVM-internal code can do the quicker thing and add likely branch weights directly...
I guess we may need a discussion about how to express these "nearly always taken" / "nearly never taken" branch weights. The hardcoded 1:127 ratio in the current patch is pretty arbitrary.
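For reference on what the 1:127 ratio means in practice: branch weights turn into probabilities by dividing each weight by the sum of all weights on the terminator. A minimal sketch (the helper name is mine, not an LLVM API):

```python
def weights_to_probs(w_taken, w_not_taken):
    """Convert a pair of branch weights into branch probabilities
    (each weight divided by the sum, as branch-weight metadata is read)."""
    total = w_taken + w_not_taken
    return (w_taken / total, w_not_taken / total)
```

So the hardcoded 1:127 ratio corresponds to a taken probability of 127/128, i.e. about 99.2%.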
Aug 18 2023
Aug 16 2023
Change code to avoid zero branch-weights.
Aug 15 2023
Aug 14 2023
Aug 9 2023
Decided to just use "utils/update_cc_test_checks.py" so we can inspect the generated llvm.expect calls directly.
Aug 8 2023
LGTM
As you can see, there are still many redundant operations. For example, the last iteration only checks whether a fixed point has been reached, and so on.
Thanks, but there's no need to wait for me; land it. Given that the previous test case works, this should be good. We have extensive nightly tests running with our codebase, but if something breaks again our oncall will contact you :)
Aug 3 2023
Thanks for looking into this! Sounds like we would need unreasonably large inputs, so going without a test should be okay. Either as a separate patch or merged with this one, whatever works best for you.
Aug 2 2023
On another side note: the quadratic behavior for blocks with multiple predecessors would not happen in a "by-the-book" dataflow implementation where the IN and OUT sets are stored separately. The fact that we compute the IN set on the fly here but don't store it in the BlockData led to the runtime explosion with the worklist algorithm...
as in https://reviews.llvm.org/D156850 that is.
I finally understand what happened. It turns out the runtime of the worklist algorithm on the loop variant is dominated by an exception-handling block that has N predecessors for a genX input of size N. We updated each of those N predecessors, and each update re-visited that block and merged the info of all N predecessors. Not sure what to make of this... it's surely partly an artifact of those artificially big inputs, but we happen to avoid the problem by just doing multiple rounds of RPO over the whole function, as in https://reviews.llvm.org/D156850, so let's go with that.
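The quadratic pattern described here can be sketched with a toy model (not the actual pass; the counting function is made up): one successor block with N predecessors, whose IN set is recomputed from scratch every time any single predecessor's OUT set changes.

```python
def merge_work(n):
    """Toy model: one block with n predecessors whose IN set is
    recomputed on the fly at every worklist visit."""
    preds = [f"b{i}" for i in range(n)]
    out = {p: set() for p in preds}  # OUT set per predecessor
    work = 0
    for i, p in enumerate(preds):
        out[p].add(i)        # one predecessor's OUT set changes...
        # ...which re-enqueues the successor; recomputing its IN set
        # on the fly walks every predecessor again:
        for _q in preds:
            work += 1
    return work              # n visits * n predecessors = n * n merges
```

Caching the merged IN set per block (or doing whole-function RPO rounds, as in the linked change) reduces this to linear work per round.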
I checked, and the number of blocks visited only increases linearly. So that's all expected. There are some effects leading to quadratic compile time, as the Consume/Kill BitVectors also get bigger with the number of basic blocks, but we can't avoid that, and it's independent of the dataflow solving order.
Curious... With a single loop (and no nested loops) I would have expected the worklist algorithm to converge quickly enough (maybe double the time of the loop-free program in practice). I'd still be interested to learn why this goes wrong, as I think of the worklist algorithm as the standard way to solve dataflow problems...
No problem, thanks for following up!