This is an archive of the discontinued LLVM Phabricator instance.

[DomTree][NFC] Introduce function that finds common dom of multiple instructions
AbandonedPublic

Authored by mkazantsev on Nov 18 2020, 10:38 PM.

Details

Summary

This patch adds function findNearestCommonDominatorInst that, for a given
set of instructions, finds their nearest dominating instruction.

Diff Detail

Event Timeline

mkazantsev created this revision.Nov 18 2020, 10:38 PM
mkazantsev requested review of this revision.Nov 18 2020, 10:38 PM
mkazantsev planned changes to this revision.Nov 19 2020, 2:09 AM

Fixed bug with duplicating instructions in the list. Added relative checks.

skatkov added inline comments.Nov 19 2020, 2:48 AM
llvm/lib/IR/Dominators.cpp
348

What if you meet PHi node?

352

Introduce a variable for: CommonDom->getParent()
It appears three time in a loop.

mkazantsev marked 2 inline comments as done.

Added elaborate comments & tests with Phis.

mkazantsev added inline comments.Nov 19 2020, 4:18 AM
llvm/lib/IR/Dominators.cpp
348

Added comment explaining how we handle phis, added respective tests.

Algorithm lgtm.
+ @kuhar for additional review.

skatkov accepted this revision.EditedNov 19 2020, 6:38 PM

Thanks! LGTM.

Probably it makes sense to wait a bit for @kuhar review as @asbirlea requested.

This revision is now accepted and ready to land.Nov 19 2020, 6:38 PM
kuhar added a comment.Nov 20 2020, 8:59 AM

As an alternative implementation, have you considered teaching DomTree::findNCD to work with multiple basic blocks first, and then using this to implement the multi-instruction findNCD?
The algorithm would be like this: create a vector VBB of BB of all instructions, find its NCD, case split on the NCD and VBB being the same block or not.

llvm/include/llvm/IR/Dominators.h
206–207

Does this work for invoke instructions? I'm thinking of a situation when two instructions are in a block whose immediate dominator ends with an invoke but the invoke result is not available for these two instructions.

kuhar requested changes to this revision.Nov 20 2020, 9:02 AM
kuhar added inline comments.
llvm/lib/IR/Dominators.cpp
344

nit: please add some vowels. Why not Instructions?

373

Maybe assert that CommDom actually dominates all Insns when EXPENSIVE_CHECKS are enabled?

This revision now requires changes to proceed.Nov 20 2020, 9:02 AM
mkazantsev added inline comments.Nov 22 2020, 8:52 PM
llvm/include/llvm/IR/Dominators.h
206–207

We don't guarantee that the result will be available. We only guarantee that this invoke will be executed before we reach each of the points in arg list.

As an alternative implementation, have you considered teaching DomTree::findNCD to work with multiple basic blocks first, and then using this to implement the multi-instruction findNCD?
The algorithm would be like this: create a vector VBB of BB of all instructions, find its NCD, case split on the NCD and VBB being the same block or not.

I was considering that, but don't really see how it's useful. The existing code makes N iterations, for which of them it either calls comesBefore or findNCD (findNCD will be called as many times as many different blocks there is - 1).

In the proposed solution, I'll need to:

  • FIll the set of blocks (because API of findNCD needs blocks)
  • Call multi-arg findNCD (which will call two-arg NCD same amount of times)
  • make comesBefore queries for all instructions in topmost block.

Effectively it's always the same number of findNCD queries, overhead on set creation and potentially less number of comesBefore queries (in my solution we do it in all blocks). So overhead on set filling doesn't seem worth saving some number of cheap comesBefore queries in general case. So I just don't see why this alternative solution is better.

I was considering that, but don't really see how it's useful. The existing code makes N iterations, for which of them it either calls comesBefore or findNCD (findNCD will be called as many times as many different blocks there is - 1).

In the proposed solution, I'll need to:

  • FIll the set of blocks (because API of findNCD needs blocks)
  • Call multi-arg findNCD (which will call two-arg NCD same amount of times)
  • make comesBefore queries for all instructions in topmost block.

Effectively it's always the same number of findNCD queries, overhead on set creation and potentially less number of comesBefore queries (in my solution we do it in all blocks). So overhead on set filling doesn't seem worth saving some number of cheap comesBefore queries in general case. So I just don't see why this alternative solution is better.

I was thinking more in terms of API design, not algorithmic efficiency. To me it seems like there's a clear boundary between block-level and instruction-level analysis here and I propose to separate the concerns more if this doesn't complicate the implementation. I think that finding the NCD of multiple blocks should be useful by itself; there was even a question about it on llvm-dev fairly recently: https://groups.google.com/g/llvm-dev/c/TZ3MCpwO5Yg/m/Z4MfZz_jBAAJ.

llvm/include/llvm/IR/Dominators.h
206–207

I'd find it very surprising if the result of findNCDInst returned an instruction that didn't dominate an argument.

Note that dominates is defined above as:

/// In particular, it is worth noting that:
///  * Non-instruction Defs dominate everything.
///  * Def does not dominate a use in Def itself (outside of degenerate cases
///    like unreachable code or trivial phi cycles).
///  * Invoke/callbr Defs only dominate uses in their default destination.
bool dominates(const Value *Def, const Use &U) const;
skatkov added inline comments.Nov 22 2020, 10:29 PM
llvm/include/llvm/IR/Dominators.h
206–207

As I understand this function is not about def and use at all while in the doc you mentioned it is about def-use chain.

And why do you think that result of this function does not dominate any of arguments? It does to my understanding if consider this as an extension of Basic Block domination definition.

skatkov added inline comments.Nov 22 2020, 10:37 PM
llvm/include/llvm/IR/Dominators.h
206–207

Does it make sense to explicitly mention in the doc function that it is not about def-use chain but about instructions and use an example with invoke to emphasize it to avoid confusion and incorrect usage of this utility function?

kuhar added inline comments.Nov 22 2020, 10:44 PM
llvm/include/llvm/IR/Dominators.h
206–207

As I understand this function is not about def and use at all while in the doc you mentioned it is about def-use chain.

Yeah, you are right.

And why do you think that result of this function does not dominate any of arguments?

By 'an argument' I meant any instruction in Insns, not the type Argument.

It does to my understanding if consider this as an extension of Basic Block domination definition.

I got tripped of that this generalizes some existing function that checks if an instruction dominates another instruction. But now that you pointed out that the two-argument dominates operates on instructions and *uses*, I see that this is not the case.

Where do you plan to use findNCDInst? I'm trying to understand why bool dominates(Inst, Inst) and Inst findNCDInst(Inst, Inst) haven't been used before.

You can find the planned user in dependencies stack: https://reviews.llvm.org/D90456

I need the common dom to find the latest context where we can prove predicate so that it's also true for all given instructions.

mkazantsev added inline comments.Nov 22 2020, 10:52 PM
llvm/include/llvm/IR/Dominators.h
206–207

The user of this: D90456.

I don't need def-use notion here, just precedence in execution flow.

You can find the planned user in dependencies stack: https://reviews.llvm.org/D90456

I need the common dom to find the latest context where we can prove predicate so that it's also true for all given instructions.

What do you think about defining and implementing bool dominates(Inst, Inst) at the same time (or before) findNCDInst? IMO it's awkward to explain what finding the NCD of instructions means without first defining dominance between instructions. Then you can use that definition as a post-condition for findNCDInst and check that the result dominates all Insns. Similarly, I'd also implement findNCD for multiple blocks first and then use it in the version for instructions. Otherwise I'd see it as building something very specific without laying out all the foundations first.

You can find the planned user in dependencies stack: https://reviews.llvm.org/D90456

I need the common dom to find the latest context where we can prove predicate so that it's also true for all given instructions.

What do you think about defining and implementing bool dominates(Inst, Inst) at the same time (or before) findNCDInst?

It will have a signature clash with existing dominates(def, use). For that, we first need to rename existing dominates (including all its users across the code) into something like defUseAvailable, and only then implement this. It's a massive refactoring of whole DomTree which I definitely don't want to be blocked on. As you see, as set of changes is currently waiting for this patch.

We can make this, but separately, as follow-up, after the changes are unblocked, if it makes sense.

mkazantsev added inline comments.Nov 22 2020, 11:50 PM
llvm/lib/IR/Dominators.cpp
373

dominates(def, use) has a slightly different semantics and fails for dominates(x, x). I also think ther will be tricky corner cases when it will fail when both args are Phis from the same block.

I see the deep API problem here now. Current dominates(Value *, Instruction *) should not be called like that (at least because it returns false for dominates(X, X)). It has a different semantics, and should be called something like "canBeUserOf".

The clear definition of dominance between two instructions is also something fishy. Specifically, it is fishy for Phi nodes from the same block (there is no precedence between them). Maybe the best name to express what we need in this patch would be comesBefore.

It looks like a vast rework of whole DomTree API. I'm OK doing that, but do not want to have my patches waiting for this API rework (they honestly don't need). Let's unblock this branch first and then think how to name it better.

mkazantsev marked 2 inline comments as done.

Added expensive asserts;
Renamed Insns->Instructions

Further rework, I think, should go separately, because it affects code outside this one.

Fixed check (as expected, dominates fails for two Phis from the same block).

kuhar added a comment.Nov 23 2020, 8:32 AM

I see the deep API problem here now. Current dominates(Value *, Instruction *) should not be called like that (at least because it returns false for dominates(X, X)). It has a different semantics, and should be called something like "canBeUserOf".

The clear definition of dominance between two instructions is also something fishy. Specifically, it is fishy for Phi nodes from the same block (there is no precedence between them). Maybe the best name to express what we need in this patch would be comesBefore.

It looks like a vast rework of whole DomTree API. I'm OK doing that, but do not want to have my patches waiting for this API rework (they honestly don't need). Let's unblock this branch first and then think how to name it better.

I see three ways forward:

  1. Propose the renaming of the existing .dominates functions on llvm-dev and to that before this work.
  2. Add instruction-level dominance function, but call it differently to avoid name collision, e.g., dominatesInst. Use this in this revision and propose refactoring afterwards.
  3. Implement findNCDInst as a free function where it's actually necessary. Maybe this notion of dominance and NCD of instructions is domain-specific enough that there's not much value in adding this to DominatorTree?

I'd be most comfortable with (2) or (3).

mkazantsev abandoned this revision.Nov 23 2020, 8:54 PM

Let's go with 3. The idea was to share this code if someone might need it. I'm totally fine if it's written every time in different places whenever someone needs this logic.