This revision adds the initial pass for performing SCCP generically in MLIR. SCCP is an algorithm for propagating constants across control flow, and optimistically assumes all values to be constant unless proven otherwise. It currently supports branching control, with support for regions and inter-procedural propagation being added in followups.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I think the test cases are missing one of the key scenarios where a control flow path / block isn't executed because the predicate is a constant but that is only known by detecting that the same constant propagates across the back edge as well as the other predecessor. And to know that the same constant propagates, you need to know that the aforementioned control flow path doesn't execute - sort of a catch22. An example (mixed basic blocks with some C for brevity).
func @foo ^bb0: c1 = constant 1 br ^bb1(c1) ^bb1(x1) if x1 < 20 x2 = 1 br ^bb1(x2) else x3 = 50 br ^bb1(x3)
mlir/include/mlir/Transforms/Passes.td | ||
---|---|---|
282 | remove -> removed | |
mlir/test/Transforms/sccp.mlir | ||
38 | Nit: No phi's please! PHI values -> arguments ? |
Thanks for the review Uday!
I added the test case as simple_loop_inner_control_flow, let me know if that is what you had in mind or if there are any other tests you'd like to see.
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
44–45 | Nit: A value with dynamic values can be confusing - instead, A value that cannot statically be determined to be a constant | |
130 | Could you expand this comment? | |
157 | Nit: do either backticks and regular single quotes work for parameters in doc comments? | |
314 | You can with the last output parameter on tryToFold - is this what you were looking for? bool inPlaceUpdate; FoldUtils::tryToFold(op, nullptr, nullptr, &inPlaceUpdate); // operation is folded if inPlaceUpdate is false | |
345–364 | Could this reuse tryToFold? | |
374–378 | Trivial braces. | |
mlir/test/Transforms/sccp.mlir | ||
101 | Mention here to the effect that this is not just inner control flow but you are testing sensitivity to non-executable edges in the CFG. | |
122–124 | You could make this a little stronger/realistic by just doing an IV increment here. %iv_inc = addi %iv, %cst_1 br ^bb1(%iv_inc) Still the same result. | |
126–127 | This looks good, but reading the test case in isolation won't make it clear as to what happens to the non-executable edge/path post SCCP on this test case. Does it stay as is, or becomes a dead/unreachable block, or is deleted? Add additional checks depending on what it is? (I just jumped here without looking at all of the actual code) |
The test case looks good - thanks. A round of mostly superficial comments. Btw, why is the build failing? I can't tell by chasing the links here:
https://reviews.llvm.org/harbormaster/build/61458/ and the status here is fine as well: https://buildkite.com/mlir/mlir-core
(This is for my future reference as well.)
Thanks for the review!
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
157 | I'm not sure as we don't generate doxygen ATM for me to look at. The codebase is fairly inconsistent to either, but changed all within this file to consistently use ''. We should figure that out though, and standardize to using one. | |
314 | This is a bit of a tricky situation. We don't actually want to fold here, this is essentially just simulated execution with constant parameters. The constants we have here aren't guaranteed to be those at runtime, so it's better to avoid the extra overhead of OperationFolder as we don't want to generated anything at this point. | |
345–364 | Same comment above. We don't want to generate any constant operations here, we just want to know what the output would be. As stated above, the constant values are the current values from the lattice not the current IR. So even if we did use it, it wouldn't give us the result we want. | |
374–378 | Thanks for the catch, they weren't trivial at one point but I forgot to cleanup. | |
mlir/test/Transforms/sccp.mlir | ||
126–127 | Added a few extra checks for making sure various things get folded. ATM we just insert constants. I've debated back and forth about adding in the CFG canonicalizations here, but given that canonicalize can already take care of those I decided to keep the initial versions focused on the propagation. |
Btw, why is the build failing? I can't tell by chasing the links here:
https://reviews.llvm.org/harbormaster/build/61458/ and the status here is fine as well: https://buildkite.com/mlir/mlir-core
(This is for my future reference as well.)
I tried looking at the log, but it seems like it's just failing to create a local git branch to apply the diff to. I have been seeing a lot of harbormaster failures lately that just result to "failed to apply patch". I've just been ignoring these failures if local testing works.
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
314 | Right - while the algorithm is still running, a current 'constant' lattice value could be later lowered to the bottom symbol ('overdefined'). But once the algorithm has converged, all 'Unknown' lattice values will become either 'constant' or 'overdefined' at which point we could fold the relevant ops. Do you want to collect such ops? You'll still ideally need a new pattern rewriting driver that takes in a list of ops to process as the starting point. In any case, the comment above on "Don't try to fold the results ... won't be in-place" needs to be fixed - it's misleading because tryToFold does in-place updates as well. Instead, change to "can't guarantee that ... will be out of place"? |
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
314 | Yeah, this is what I mentioned in the comment in the test file. I'm not sure all of the types of foldings we want to do within SCCP itself at this point. Right now we replace any results that were found to be constant and erase any operations that we can, but that doesn't cover all of the potential simplifications we could do. Right now my main goal is to get the constant propagation working across regions and inter-procedurally. After that I think we can tune the amount of additional simplifications we do here based on how it fits into user pipelines. Also, thank you for the comment suggestion! |
Yes, I think nearly all the harbormaster builds from at least the last few days are failing (all for me) - this is also the reason many commits in the last few days broke builds - everyone is ignoring notifications. This is a slippery slope esp with the revisions that are changing CMake files.
Some more comments. Again, I've been looking at things a bit locally. Will take a global look in the next round. Are your test cases still missing scenarios like:
- paths propagating different constants meeting (you have one with back edges (the last one) but none with the straightforward if/else style).
- a test case where you have say a multi operand op (say addi) that has one operand coming directly from a constant op and the other via a block argument whose both predecessors send in the same constant.
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
324 | But getResults() is empty! | |
354 | Nit: Try to fold -> Simulate folding .... This is the part that's contradictory in the face of FoldUtils::tryToFold and the comment above "Don't try to fold ..." | |
359 | Why isn't this conservative? If you mark it overdefined already, it can never be highered again. What if some of its operand lattice values are later lowered from unknown to constant? Are you treating both unknown and overdefined in the same way here for simulated folding purposes? |
Added a test case for 1, 2 doesn't seem to cover any new code paths. There is already a test for folding an addi that has an input from a loop iv.
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
324 | getResults isn't necessarily empty if the operations has regions. | |
359 | It already is conservative, if you look above we only get to this point if all of the operands have been resolved. At this point the operands can only go to overdefined, so there isn't a way that more information can be propagated to this point. |
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
324 | But you've already handled the > 0 regions case in the block above and have returned for all those cases. You don't need the '|| conditional' altogether here. |
Resolve comments
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
324 | Oh duh, sorry about that. Thanks for pointing that out. Accidentally missed that when changing region handling. |
Since the 'S' in SCCP is for sparse because this is for SSA representations and given that MLIR is all already SSA, there won't be a CCP. So you could just call this file ConditionalConstantPropagation.cpp if you prefer too instead of adding yet another hard to type four letter uppercase acronym. The CL flag could continue to use -sccp or -ccp.
Overall, this code looks really good. I can do a complete review unless Jacques/Mehdi who were originally added as reviewers plan to review it anyway.
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
90 | The standard term for this in the SCCP paper and the literature is meet - these are the meet rules. mergeIn -> meet? | |
132 | Typo: any -> many. | |
359 | Thanks - that's what I missed - the check above for unknown operands. | |
456 | argLattice is actually invariant? Hoist it out of the loop, and use it in the block above where you are marking it overdefined. |
Resolve comments
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
456 | It is recomputed in the loop to avoid potential iterator invalidation w.r.t the lattice for the branch operand. Refactored to avoid eagerly constructing the branch operand lattice. |
Conceptually this looks very similar to LLVM's SCCP. Do you anticipate using any MLIR specific properties to cover more cases than the LLVM version? I am not really up-to-speed with the latest on MLIR dialects, but to me it seems like there is a set of transformations where there is currently no additional information to use from MLIR dialects compared to LLVM IR, like SCCP or GVN.
Viewed very simplistically, they only require the following from an IR: SSA, a way to traverse a function along CFG edges, a way to simplify instructions and a way to replace values. Do you think it would be feasible to provide some kind of interface to abstract those and then work towards sharing the implementations of the underlying algorithms between LLVM & MLIR? Or is the plan to duplicate various passes from LLVM also in MLIR?
Thanks for the comment Florian! I would be very +1 on sharing common implementations of the more generalized algorithms if we can. From a technical perspective, the major problems I've encountered with bridging the gap are a few impedance mismatches between LLVM and MLIR that arise from different implementation decisions:
- The multi-level aspect leads to special handling for various constructs in MLIR(e.g. regions/structured control flow/etc.) that aren't present in LLVM.
- MLIR globals(like functions) are not SSA values, and require specific handling because they do not use or interact with the traditional SSA use list.
- Block arguments vs PHIs
When adding things in MLIR that already exist in LLVM, at least for me, it is a cost-benefit computation between the amount of work it would take to refactor the thing in LLVM to be usable by MLIR. By amount of work I mean not only the technical aspects, but also the effort to convince the community that it is the right thing to do. I'd love to share as much as possible(as we do for things like ADT/Support/etc) and I'm happy to collaborate in that direction, but if I'm driving it alone the cost has often out-weighed the benefits.
That'll be an interesting and likely recurring question moving forward. I wonder however if when applicable LLVM would be OK to take a slow-down from generalizing passes to MLIR and going through interfaces for the sake of sharing the implementation?
Thanks for sharing the list. It seems like handling nested regions/structured control flow would be the most tricky to abstract. I am not really up-to-date on what is possible with nested regions, but would the interactions allowed between different regions be similar across all types of regions or could it be dependent on the region type? I.e. is it safe to propagate constants to all regions or is it unsafe to propagate constants between certain types of regions?
I think the other 2 issues should be fairly straight-forward to abstract.
When adding things in MLIR that already exist in LLVM, at least for me, it is a cost-benefit computation between the amount of work it would take to refactor the thing in LLVM to be usable by MLIR. By amount of work I mean not only the technical aspects, but also the effort to convince the community that it is the right thing to do. I'd love to share as much as possible(as we do for things like ADT/Support/etc) and I'm happy to collaborate in that direction, but if I'm driving it alone the cost has often out-weighed the benefits.
I think that's a reasonable trade-off. I mostly wanted to get a discussions started on that topic, which would definitely not be a short-term kind of project. I'm not too familiar with the MLIR side of things, but I might have a bit of time to collaborate on the LLVM side of such a project. It might be good to sync up/discuss ideas somewhere else than the current review ;)
I think the answer here depends a lot on how this would look like in practice. Are you referring to slow-down in terms of runtime or development?
In terms of run-time overhead, the abstraction would have to be very cheap I think, but that should be feasible as long as it is limited to a few key aspects (like traversing functions/blocks/regions and an interface to simplify instructions given a set of input values (which LLVM has)). In terms of slowing down development, I am personally not too concerned. The passes that seem likely candidates (SCCP, GVN) don't seem to see a huge amount of ongoing development.
I think sharing implementations could be beneficial for both LLVM and MLIR, as it would hopefully mean more users = more testing = more people willing to work on fixes/improvements.
mlir/lib/Transforms/SCCP.cpp | ||
---|---|---|
157 | The doxygen is actually generated - it's here. https://mlir.llvm.org/doxygen/ | |
308 | If the value is an op result, do you want to / can you erase this op? You already have a pre inc iterator at its call site. |
remove -> removed