Implements RFC discussed in: https://llvm.discourse.group/t/rfc-operationinstancesinterface-or-any-better-name/2158/10
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Follow block parents to compute number of block executions + compute number of executions for ForOp with constant bounds
Very nice!
I have a bunch of nits and several generalization suggestions.
mlir/include/mlir/Analysis/NumberOfExecutions.h | ||
---|---|---|
38 | Nit: explicit? | |
69 | There doesn't seem to be an accessor for this field, so IMO it can be dropped (unless there is an anticipated need for it). At which point, we can also consider using BlockNumberOfExecutionsInfo = Optional<int64_t>, which can be nicely constructed from an int64_t or from llvm::None. | |
mlir/include/mlir/Dialect/Async/IR/AsyncOps.td | ||
86–88 | It should be possible to do DeclareOpInterfaceMethods<RegionBranchOpInterface, ["getNumRegionInvocations"]> in the def and avoid this explicit declaration. | |
mlir/include/mlir/Interfaces/ControlFlowInterfaces.td | ||
135 | Nit: I'd be explicit and say something like "Populates countPerRegion with the number..." | |
137 | Nit: could we have a named constant instead of -1? | |
149 | Nit: countPerRegion.resize(numRegions, -1) avoids the loop | |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
45–46 | I wonder if this can be generalized to work on all constant-like operations. Currently, we only have a ConstantLike trait, so we cannot get the constant value as attribute without knowing the concrete op type. Maybe we can turn ConstantLike into an interface with value being an interface method. Out of scope for this commit probably. | |
53 | Could we rather query the op? Hardcoding FuncOp makes this analysis non-applicable to custom function types (we have some, e.g., in LLVM dialect and in Flang). | |
58 | Let's also be future-proof and guard against graph regions (https://mlir.llvm.org/docs/LangRef/#graph-regions). They don't have blocks (yet), but if they do, the logic of following block successors below will not necessarily apply. An assertion could do. | |
66 | Nit: we usually don't declare function stack variables as const in MLIR | |
71 | Nit: I would have just used auto and kept the original lambda type here, it's not like we are passing it around. | |
mlir/lib/Dialect/SCF/SCF.cpp | ||
303–306 | Could we rather extend the one we already have in MathExtras.h to support negative RHS? |
Address PR comments
mlir/include/mlir/Analysis/NumberOfExecutions.h | ||
---|---|---|
69 | I added a custom non-default constructible type to catch accidental default constructions of None instead of a failure. And maybe more advanced analysis can add some other fields in the future. Added this to class documentation for the next reader and removed unused Block*. | |
mlir/include/mlir/Dialect/Async/IR/AsyncOps.td | ||
86–88 | Cool, learning something new every day :) | |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
45–46 | Yeah, that would be useful also in the context of TF graphs with tf.Const ops, but I did not find any relevant examples and chose the easiest path. | |
53 | It guess in can just implement RegionBranchOpInterface, I decided to special case it because it's "built in" and not defined in any of the dialect .td files. I expect all other FuncOpLike things to implement RegionBranchOpInterface. | |
71 | Unfortunately lambdas do not support recursion without ugly tricks. |
LGTM. Please wait for one more reviewer to double-check the logic according to the forum discussion.
mlir/include/mlir/Analysis/NumberOfExecutions.h | ||
---|---|---|
69 | Ack. (Phabricator doesn't let anybody other than the author to mark comments as done). | |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
53 | ODS is not mandatory to implement an interface, ops defined in C++ can derive Traits<InterfaceName::Trait> and get the same functionality. I'm fine with your current implementation given the explanation provided. We should look into having FuncOp implement the interface separately and consult with @rriddle about future plans regarding it being built-in. |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
53 | Yeah, I know, I just don't know is it "ok" to add it to built in type :) Assuming FuncOp is as fundamental as Region I think it's ok to have a special case, but I'd like to hear opinion from someone with a better judgement. |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
53 | FuncOp isn't as fundamental as Region, I rather treat FuncOp as any other op. For example LLVM, SPIRV, GPU dialects aren't using FuncOp. |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
53 | I've looked into adding RegionBranchOpInterface to FuncOp`, but it feels weird because the FuncOp does not execute the region, only the CallOp does. For example: getSuccessorEntryOperands: "Returns the operands of this operation used as the entry arguments", but the FuncOp is OpTrait::ZeroOperands. Maybe add a special trait, something like OpTrait::InvokesRegionsOnce, an additional piece of information just like OpTrait::OneRegion. Or just rely on OpTrait::FunctionLike here, assuming that all functions execute attached region once. |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
14 | llvm includes go after MLIR includes. | |
19 | Is the dependency on standard ops necessary? | |
23 | Please trim some of these includes, several are unnecessary. | |
36 | Missing static here? | |
66 | This is not correct and seems like it would result in incorrect usages of the analysis. If you want to estimate function executions, you'll need to use the callgraph. | |
75 | Drop trivial braces. | |
122 | Nit: Flip the condition of this if, and remove the trivial braces. if (!...) blockInfo.insert | |
mlir/lib/Dialect/Async/IR/Async.cpp | ||
125 | nit: Seems like push_back(1) is better than an explicit resize and assignment. | |
mlir/lib/Interfaces/ControlFlowInterfaces.cpp | ||
78 | Prefer explicit qualification, i.e. mlir::kUnknownNumRegionInvocations |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | Why should functions be default to executed once? That doesn't seem like a conservatively safe default. |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | I agree, there seems to be a code smell here. One solution is for queries to be of the form Optional<int64_t> getNumberOfExecutions(Block *block, Region *assumingThisRegionIsEnteredOnce). So the analysis could hold a DenseMap<Block *, NumberOfExecutionsAssumingDirectParentRegionEnteredOnce>. Then on a query, you would walk up parents, multiplying NumberOfExecutionsAssumingDirectParentRegionEnteredOnce at each step until we reach assumingThisRegionIsEnteredOnce (and until we implement a call graph analysis, we will return None if assumingThisRegionIsEnteredOnce is not an ancestor of block) And the body of a module should probably be an illegal Region to pass as assumingThisRegionIsEnteredOnce, because it is a "declarative region" and never "executes" (we might need a new IR property to represent this?) test-print-number-of-executions would have some hardcoded policy to print values assuming assumingThisRegionIsEnteredOnce == enclosingFunc.getBody() |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | what about a function like def f(i): print("here") # How many executions will this be assigned? if i > 0: f(i - 1) I think we need to be more precise about what we mean by "execution" here. Also, does this pass correctly handle a case like? def g(i): if i == 0: abort() print("here") # How many executions? |
At the moment it is somewhat unclear at what scope this analysis is intended to be constructed and used. It seems like this currently doesn't handle things related to the callgraph, doesn't handle control flow well, and assumes that it only operates locally within a single function? I'm slightly concerned that we are building an analysis that is very tied with the constraints on async tokens as determined by D90716, without much reusability outside of those constraints. Along with Sean's point, I don't think the concept of "Execution" is very well scoped here and leads to some weird code smells with functions/modules/etc.
mlir/include/mlir/Dialect/SCF/SCFOps.td | ||
---|---|---|
201 | Replace -1 here with kUnknownNumRegionInvocations. | |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
54 | Use m_Constant here instead, we should avoid hardcoding operation types whenever possible. |
I'd say that call graph is irrelevant for this analysis. Call operations "consumes" SSA values and then it is the responsibility of the callee to do the right thing. But I'm speaking from the point of view of async-ref-counting, I'd love to hear what @ftynse can tell about his need for register assignment.
Maybe there is a better terminology then "number of executions" that is less confusing.
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | #1. 1 execution of the basic block that contains print operations. Function call is irrelevant here. #2. Unknown, because it involves dynamic control flow CFG will look something like this: ^bb0 %cond = cmp i == 0 cond_br %cond ^abort, ^print ^abort br ^print ^print def g(i): abort() print("here") # is a bit more tricky, but let's assume this is an # estimate given that no operations can terminate a program
"How many times this operation will be executed in this region assuming region invoked N times". Call operation is about completely different region (different instance of a region). |
FWIW Original async ref counting PR uses "number of operation instances in the region"
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | I see that the current code actually does walk parents in the way I described. Sorry. I thought somewhat about the definition of "execution" that we mean here. It's really "for *each* time that Region is entered, how many times will given operation be executed". That makes the definition work for my case f above, since it yields getNumberOfExecutions(thePrintOp, f.getBody()) == 1. Also, because of the possibility of aborting or calling a function that cannot be proven to terminate in the middle of a block, I think it is best for getNumberOfExecutions to take an Operation * instead of a Block, so that cases like this behave as intended, and users don't have to do their own within-a-block check: def h(): print("here1") # Number of executions is 1 per entry to h.getBody() call @weProvedThatItAbortsOrInfiniteLoops() print("here") # Number of executions is 0 per entry to h.getBody() As such, could we do something like an interface to this analysis that looks like: Optional<IntegerRange> getNumberOfExecutions(Operation *op, Region *perEntryOfThisRegion) IntegerRange is just a pair of integers that represent a lower and upper bound on the number of executions, which composes nicely to handle cases where we can't prove that the statement executes at all (like those tricky abort/infinite-loop cases) by having a lower bound of 0 (we could do something more precise than an interval, such as an arbitrary set of integers, but that might get unwieldy). To handle this correctly, getNumRegionInvocations will need to be wordsmithed to something like "number of times the body will be invoked if the body yields normally (i.e. doesn't abort or invoke an infinite loop)". Then this analysis can do a bottom up traversal of the region tree, doing the op-level analysis within each block looking for possibly aborting/infiniteLooping operations and correctly handling any ops that follow ops that might abort/infiniteLoop, and transitively propagating that up the region tree. We can make this more precise with interprocedural analysis that does the equivalent of inferring attributes like LLVM's norecurse/willreturn/mustprogress (see https://llvm.org/docs/LangRef.html#function-attributes) Under this definition, we have: def f(i): print("here") # Number of executions is 1 per entry to f.getBody() if i > 0: f(i - 1) def g(i): if i == 0: abort() print("here") # Number of executions is 0 or 1 per entry to g.getBody() def h(): print("here1") # Number of executions is 1 per entry to h.getBody() call @weProvedThatItAbortsOrInfiniteLoops() print("here") # Number of executions is 0 per entry to h.getBody() def i(): print("here1") # Number of executions is 1 per entry to i.getBody() call @cannotProveItDoesntAbortOrInfiniteLoop() print("here") # Number of executions is 0 or 1 per entry to i.getBody() |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | I updated documentation for the interface and NumberOfExecutions to mention "yields normally" and "each time" properties. I like this API ... Optional<IntegerRange> getNumberOfExecutions(Operation *op, Region *perEntryOfThisRegion) ... but for now computing the IntegerRange is not really needed and it adds a lot of complexity. I plan to add API with Optional<int64_t> to represent "known fixed number" vs "everything else" Optional<int64_t> getNumberOfExecutions(Operation *op, Region *perEntryOfThisRegion); and assume that all operations are "normal": do not abort, or go into infinite loop == "completed in finite amount of time". |
Implment Optional<int64_t> getNumberOfExecutions(Operation *op, Region *perEntryOfThisRegion) API
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
66 | Thanks. I think with the new API, we don't need these special cases. FunctionLike already didn't make sense because an op is not "executed" by the module region (we should probably return "unknown" (or an error) for getNumberOfExecutions(someOpInAFunction, module.getBody()). OpTrait::OneRegion doesn't guarantee anything about the number of invocations. E.g. scf::ForOp implements OpTrait::OneRegion. | |
136 | The note that all operates are assumed to not abort/infloop should be in the documentation comment for users of the API; it is not an implementation detail because it makes the analysis incorrect for some potential users. |
Thanks Eugene. All I have left are nits. This LGTM.
We probably want to wait on River to give final sign-off. In particular surrounding this analysis not being conservative w.r.t. potentially aborting/nonterminating computations.
mlir/include/mlir/Support/MathExtras.h | ||
---|---|---|
32 | nit: do we have a unittest you could extend? (or add? :) ) |
The new wording makes more sense to me and clears the confusion I had previously, thanks! LGTM after the remaining comments.
mlir/include/mlir/Analysis/NumberOfExecutions.h | ||
---|---|---|
38 | nit: in a finite amount | |
98 | nit: Use /// for comments here. | |
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
16 | Can this function include be removed? | |
39 | This seems like it would break when a graph region is nested within a CFG region. Can we avoid the direct assert here and just use kUnknownNumRegionInvocations instead? I could see an assert on the top level operation, but is there a reason to assert on nested operations? | |
51–60 | ||
129 | nit: in a finite amount of time | |
209–222 |
Address PR comments
mlir/lib/Analysis/NumberOfExecutions.cpp | ||
---|---|---|
209–222 | I had to walk block and then operations to get sensible output for filecheck, otherwise it prints inner operation before its parent. |
Nit: explicit?