This is an archive of the discontinued LLVM Phabricator instance.

[DomTree] Accept Value as Def (NFC)
ClosedPublic

Authored by nikic on Oct 17 2020, 12:01 PM.

Details

Summary

Non-instruction defs like arguments, constants or global values always dominate all instructions/uses inside the function. This case currently needs to be treated separately by the caller, see https://reviews.llvm.org/D89623#inline-832818 for an example. This patch makes the dominator tree APIs accept a Value instead of an Instruction and always returns true for the non-Instruction case.

A complication here is that BasicBlocks are also Values. For that reason we can't support the dominates(Value *, BasicBlock *) variant, as it would conflict with dominates(BasicBlock *, BasicBlock *), which has different semantics. For the other two APIs we assert that the passed value is not a BasicBlock.

Diff Detail

Event Timeline

nikic created this revision.Oct 17 2020, 12:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2020, 12:01 PM
nikic requested review of this revision.Oct 17 2020, 12:01 PM

I like this. might be able to clean up some call sites with this. I wait a bit for other opinions.

Interesting idea.

The first question that comes to my mind is what does it mean for and value to dominate a use.
The function-level comment says:

/// Return true if Def dominates a use in User.

this is clear when thinking about CFG-level graph dominance, but globals and arguments are not really present in the CFG. I guess the intention is to actually check if a value is always 'available' before a use, i.e., either executes (as an instruction) in the same function, or 'statically' available in the module. Is this consistent with your patch and understanding, @nikic?

If that's the case, I guess a theoretical way to tie it back to dominance would be to say that these values are defined in the virtual CFG entry which dominates all 'real' function entries (currently exactly one in LLVM IR). We don't currently have virtual entry nodes in dominators, but adding them would have other benefits too (faster re-rooting, support for multiple entry points).

Interesting idea.

The first question that comes to my mind is what does it mean for and value to dominate a use.
The function-level comment says:

/// Return true if Def dominates a use in User.

this is clear when thinking about CFG-level graph dominance, but globals and arguments are not really present in the CFG. I guess the intention is to actually check if a value is always 'available' before a use, i.e., either executes (as an instruction) in the same function, or 'statically' available in the module. Is this consistent with your patch and understanding, @nikic?

Yes, that's the interpretation I have in mind. I think this is also consistent with the interpretation of existing DominatorTree methods. The CFG-level concept of graph dominance is only really applicable to the base API operating on basic blocks, while other APIs perform reasoning in terms of value availability. For example, if you ask whether an invoke dominates an instruction in the unwind block, then the graph-level interpretation would tell you "yes". However, the DomTree API will actually tell you "no", because it knows that the invoke result is only available in the success block.

kuhar accepted this revision.Oct 19 2020, 12:57 PM

OK, thanks for the clarification. Can you update the function-level comment with an explanation like this?

This revision is now accepted and ready to land.Oct 19 2020, 12:57 PM
nikic updated this revision to Diff 299453.Oct 20 2020, 1:21 PM

Extend API comments a bit.

kuhar accepted this revision.Oct 20 2020, 1:48 PM

Thanks for updating the comments.

One corner case: how does dominance work over global values (possibly with initializers)? Is it possible to have invalid IR where a global has a broken initializer, but dominates answers true? I don't know this part of IR very well and am not sure if such issues are possible or not.

nikic added a comment.Oct 20 2020, 3:21 PM

One corner case: how does dominance work over global values (possibly with initializers)? Is it possible to have invalid IR where a global has a broken initializer, but dominates answers true? I don't know this part of IR very well and am not sure if such issues are possible or not.

I can't say I'm particularly familiar with details of globals either. Here's the relevant bit of LangRef on the question of globals and dominance:

As SSA values, global variables define pointer values that are in scope (i.e. they dominate) all basic blocks in the program. Global variables always define a pointer to their “content” type because they describe a region of memory, and all memory objects in LLVM are accessed through pointers.

Because globals (as SSA values) refer to the pointer to the global, rather than the value of the global, I don't think the well-formedness of the initializer really comes into play here (though I'm also not sure in what way globals initializers can be "broken" in LLVM).

This revision was automatically updated to reflect the committed changes.

Allowing the dominates call to accept any Value can lead to unexpected errors, because there can be classes that are derived from Value that have nothing to do with the IR or the DT in some cases, and for those the result will always be true instead of an error or an assert. Pruning all such cases to include in the internal assert also doesn't make sense to me.

For example, all MemoryAccesss in MemorySSA are Values, but their use-user chains are isolated to their own "island" (a use of a MemoryAccess must be another MemoryAccess). A DT.dominate() call with a MemoryAccess as the first argument wouldn't make any sense, and should not be allowed, but the new API will accept it and return true.
Again, this is just an example, and many more may be added in the future, so it wouldn't make sense to assert inside the DT dominate call the DefV is not a MemoryAccess.

I think a better option is to add APIs for some of the specific cases you're looking for, for example a Constant (GlobalValue is subclassed already) or an Argument.
Even a generic API for Value should check for the specific cases known to be ok and return true for those, and then assert for anything else.

nikic added a comment.Oct 22 2020, 1:47 PM

Allowing the dominates call to accept any Value can lead to unexpected errors, because there can be classes that are derived from Value that have nothing to do with the IR or the DT in some cases, and for those the result will always be true instead of an error or an assert. Pruning all such cases to include in the internal assert also doesn't make sense to me.

For example, all MemoryAccesss in MemorySSA are Values, but their use-user chains are isolated to their own "island" (a use of a MemoryAccess must be another MemoryAccess). A DT.dominate() call with a MemoryAccess as the first argument wouldn't make any sense, and should not be allowed, but the new API will accept it and return true.
Again, this is just an example, and many more may be added in the future, so it wouldn't make sense to assert inside the DT dominate call the DefV is not a MemoryAccess.

I think a better option is to add APIs for some of the specific cases you're looking for, for example a Constant (GlobalValue is subclassed already) or an Argument.
Even a generic API for Value should check for the specific cases known to be ok and return true for those, and then assert for anything else.

APIs for specific cases don't really make sense to me: If I know in advance that something is an Argument or Constant, there's no point in performing a domination check, as the result is known in advance. The point of accepting a Value* here is that you can pick an arbitrary IR value and perform a dominance check on it, without having to perform dyn_casts in advance.

Of course, this is only meaningful for values that appear as normal instruction operands, i.e. other instructions, arguments and constants. Using this with anything else is not meaningful, and I do agree that the assertion should reflect that. I've adjusted it to check for Constant/Argument only in https://github.com/llvm/llvm-project/commit/c0e8c94373b4e97d2eecc53bb16d22e085b73948.

With the stricter assertion, does this seem reasonable to you? We have many APIs that accept Value*s, and I daresay that nearly all of them wouldn't know what to do with a MemoryAccess*, so I don't see this as a fundamental problem as long as such arguments are properly rejected rather than silently misbehaving.

Thank you for the fix.