This is an archive of the discontinued LLVM Phabricator instance.

LoopVectorize: Set branch_weight for conditional branches
AcceptedPublic

Authored by MatzeB on Aug 31 2023, 6:02 PM.

Details

Summary

Consistently add branch_weights metadata in any condition branch created from LoopVectorize.cpp:

  • Will only add metadata if the original loop-latch branch had metadata assigned.
  • Most checks should rarely trigger so I am using a 127:1 ratio.
  • For the middle block we assume an equal distribution of modulo results.

Diff Detail

Event Timeline

MatzeB created this revision.Aug 31 2023, 6:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2023, 6:02 PM
MatzeB requested review of this revision.Aug 31 2023, 6:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2023, 6:02 PM

For the discussion about the 127:1 ratio see also D158668 (and discussion in D158642 and D157462)

For the discussion about the 127:1 ratio see also D158668 (and discussion in D158642 and D157462)

added a comment in D158642 (api stuff, not actual weight values. xur@ might have an opinion about those)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1934

This could be const, looks like, correct?

2875

worth factoring the 3 lines in a little utility?

3111

can either of the VF methods return 0?

MatzeB updated this revision to Diff 555482.Sep 1 2023, 12:58 PM
MatzeB marked an inline comment as done.
MatzeB marked 2 inline comments as done.
MatzeB added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3111

I don't think vectorizing with a vector width of 0 (or a minimum of 0) can happen.

mtrofin accepted this revision.Sep 1 2023, 1:03 PM
mtrofin added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3111

Ack, maybe an assert about that might be well placed here?

This revision is now accepted and ready to land.Sep 1 2023, 1:03 PM
MatzeB updated this revision to Diff 555537.Sep 1 2023, 5:15 PM

factor out weight constants in a similar way to D158642

wenlei added a comment.Sep 4 2023, 2:00 PM

This is a good change. Thanks.

Most checks should rarely trigger so I am using a 127:1 ratio.

Intuitively this feels reasonable, but I just don't feel confident enough.. like when emitMemRuntimeChecks is done, we really just don't know how likely these arrays overlap or not.

Is the plan to fill in something that's somewhat reasonable first which is basically a best effort guess, and then later fine tune that? This case is different from the general likely/unlikely weights, which is more straightforward.

Slightly different off-topic from the change itself, but for this to be future proof, I'm also wondering that eventually should we have the BranchInst::Create API for cond branch to always ask for branch weights if the parent function has non-zero entry count.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
395

Suggest add comments for these constants - what they are for and the reasoning for the selected weights.

400

nit: this is not really unique to loop vectorize. Either move it to common header, and use it consistently across the code base, or stick with the current convention, which achieves the same thing with this:

Br->setMetadata(LLVMContext::MD_prof, MDBuilder(Br->getContext())
                         .createBranchWeights(TrueWeight, FalseWeight));

I'd avoid inconsistency if possible.

MatzeB added a comment.Sep 5 2023, 1:47 PM

Slightly different off-topic from the change itself, but for this to be future proof, I'm also wondering that eventually should we have the BranchInst::Create API for cond branch to always ask for branch weights if the parent function has non-zero entry count.

I don't think we can enforce this. As when the API itself creates an instruction without adding it to a basic block, so you don't know about the surrounding function. Also forcing things would disallow some programming patterns like adding the weights later (I can't think of a use case for that admittedly, but I also wouldn't say this will never be useful)...

Admittedly I was contemplating whether adding an optional parameter to the API would make sense but it feels inconsistent... We have lots of XXXInst::Create functions but they all are like a low-level API that take the minimum amount of parameters necessary to construction an instruction and I would feel odd to start being opinionated and allow some metadata for convenience. Also usually we have the "convenience" style APIs in IRBuilder::createXXX already the code just didn't happen to use them here...

MatzeB added a comment.Sep 5 2023, 1:53 PM

like when emitMemRuntimeChecks is done, we really just don't know how likely these arrays overlap or not.

Is the plan to fill in something that's somewhat reasonable first which is basically a best effort guess, and then later fine tune that? This case is different from the general likely/unlikely weights, which is more straightforward.

I don't know either how likely these cases are. And in the end it will depend heavily on the programming patterns and style of the source code. The weights may be right for one program while the next program might repeat a pattern where the weights are wrong.

However I am convinced that the current weights are better than the default 50:50 assumption that we would get for not setting any weights so it's good to set them. I don't really have data on how good/bad the choice here is and we should definitely keep tuning this (or ideally find ways to get FDO feedback, even though admittedly I don't know how FDO/PGO feedback would work as that seems a tricky problem when mapping backwards, but it's something I at least want to start thinking deeper about :) )

MatzeB added inline comments.Sep 5 2023, 1:58 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
400

Well honestly I did not want to create a helper function, because I knew reviewers would ask me to apply to the whole project. I also tried applying it to the whole project, but there are a lot of varying patterns in the codebase that I would regress when forcing them to this API:

  • Some cases construct the weights node upfront and re-use it within a loop.
  • Some instances already have MDBuilder or Context references around and want to re-use them.
  • Some instances don't have BI added to the basic block yet so BI.getContext() call will fail.
  • Some instances rather use the IRBuilder::createCondBr etc.

So I did not feel like I created consistency but just added to the amount of inconsistency by adding one more helper. Hence I kept this local...

MatzeB added inline comments.Sep 5 2023, 1:59 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
400

And for the record: There may be value in changing bigger parts of the codebase, but it may be bigger than it looks and is certainly out of scope for this change.

wenlei accepted this revision.Sep 18 2023, 9:57 AM

lgtm assuming comments for constants will be added. thanks.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
395

We discussed this topic on D158642, and it applies here too :)

400

There may be value in changing bigger parts of the codebase, but it may be bigger than it looks and is certainly out of scope for this change.

Agreed, and yes that should be a separate change.

Not a deal breaker, but I'd prefer defer all this to a bigger change for the codebase later, and just use the one liner here..

So I did not feel like I created consistency but just added to the amount of inconsistency by adding one more helper. Hence I kept this local...

The thing is.. later people may add another static/local setBranchWeights to another file, because they look at this change and be like - oh, there's precedence of doing that and I didn't create that inconsistency.. :)

If you really feel strongly about keeping this local helper, I'm fine.. But I suggest just use that one-liner without the helper.