This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
33 | Is "lattice point" a well known term in the literature? Otherwise it seems to me that "program point" may be more descriptive here | |
168 | #if LLVM_ENABLE_ABI_BREAKING_CHECKS | |
182 | Why not PointT *point ? | |
mlir/lib/Analysis/DataFlowFramework.cpp | ||
36 ↗ | (On Diff #433272) | Nit: Spurious braces multiple times, here and elsewhere. |
rename LatticePoint to ProgramPoint and LatticeState to AnalysisState
to avoid confusion.
also fix formatting issues
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
67 | When reading the code, I find it confusing that sometimes it's in ProgramPoint::states and sometimes it's in DataFlowSolver::analysisStates. Can we unify to DataFlowSolver::analysisStates? | |
80 | I checked your other code, and BaseT seems to always be ProgramPoint. Is it possible for BaseT to be something else? If not, can we let ProgramPointBase always inherit ProgramPoint? | |
143 | I think the naming is a bit confusing here: Point and ProgramPoint have pretty much the same meaning. Maybe ProgramPoint and AbstractCustomProgramPoint? | |
263–268 | Can we make the ownership clearer: DenseMap<std::pair<Point, TypeID>, std::unique_ptr<AnalysisState>> analysisStates; I understand that you might not want ProgramPoint::states to own these AnalysisStates, so like what the other comment on ProgramPoint::states suggests, I would recommend unifying to DataFlowSolver::AnalysisState. |
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
67 | I did this for performance reasons. Let me check again what the difference is. |
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
67 | I'll revert this back to what it was before (as you suggested). I made some recent changes to the sparse analysis that makes this optimization redundant. I checked that, as things are now, it's no longer needed. | |
80 | This is a convention for storage uniquer types, but I'll make the change as I can't imagine scenarios in which the base type would be anything else. |
I think River's up next :)
But sure!
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
143 | I've named them ProgramPoint and GenericProgramPoint. |
(No need to block on me, I can't make next week, and I'll be OOO for a few weeks after)
Took a quick scan, will look more in detail tomorrow. Really appreciate the detailed documentation.
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
9–12 | ||
50 | Can this constructor be protected? (You can't create an instance of this class anyways). | |
129 | Drop the llvm:: here, same applies to various other things in this file (take a look at what's provided by LLVM.h) | |
196–200 | The use of draw here is kind of confusing, is this adding a dependency? | |
242–243 | Can we make this protected? | |
285 | Is the use of child necessary here? I would assume that we can have some analyses that are "parents" in some situations, and "children" in another. | |
291–292 | ||
294 | ||
302 | Are these defined in this commit? I can't find what these are supposed to be referencing. | |
314 | Could we just call this initialize? | |
322 | ? To avoid "update .. update" | |
mlir/lib/Analysis/DataFlowFramework.cpp | ||
13 ↗ | (On Diff #434216) | Why? Please prefer using LLVM_DEBUG directly instead. |
19 ↗ | (On Diff #434216) | Please prefer using namespace mlir; instead. |
121–126 ↗ | (On Diff #434216) | |
144 ↗ | (On Diff #434216) | Can you just inline these into the header? |
mlir/lib/Analysis/DataFlowFramework.cpp | ||
---|---|---|
13 ↗ | (On Diff #434216) | It was a little annoying wrapping it with LLVM_ENABLE_ABI_BREAKING_CHECKS everywhere... |
review comments
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
9–12 | https://jakubmarian.com/comma-before-that-and-which/ Odd to be digging into a grammar debate here... but I've seen this mistake a few times in the code base's comments :P | |
302 | soon^TM |
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
9–12 | Not sure that applies here given that you could wrap that portion within parentheses and it would make more sense. The sentence as written has a run-on feel of "and .. and" which apply to different parts of the sentence. Can you reword the sentence then? Right now it's confusing to read. |
mlir/lib/Analysis/DataFlowFramework.cpp | ||
---|---|---|
13 ↗ | (On Diff #434216) | Not sure what you mean, this is a .cpp you don't need to use LLVM_ENABLE_ABI_BREAKING_CHECKS anywhere... |
mlir/lib/Analysis/DataFlowFramework.cpp | ||
---|---|---|
13 ↗ | (On Diff #434216) | At least, I'm not sure what situations we have that set NDEBUG but not LLVM_ENABLE_ABI_BREAKING_CHECKS. |
mlir/lib/Analysis/DataFlowFramework.cpp | ||
---|---|---|
13 ↗ | (On Diff #434216) | Internally at google, LLVM_ENABLE_ABI_BREAKING_CHECKS is always turned off, even when NDEBUG is not set (the two debugName fields are not compiled when LLVM_ENABLE_ABI_BREAKING_CHECKS is not set). Alternatively, I can make the debugName fields always compile. |
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
121 | It's a content key. KeyTy is a required typedef of StorageUniquer::BaseStorage. I'll rename some of the stuff to make this more obvious. |
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
292 | I've been thinking about how to get rid of the need for propagateIfChanged(). One way to do it is to reintroduce the separation of "LatticeValue" and "LatticeElement", and here let's call them "AnalysisState" and "AnalysisStateElement" (we should try to find a better name than "Element"...). // User should override this. class AnalysisState { public: virtual ~AnalysisState(); virtual bool isUninitialized() const = 0; virtual ChangeResult defaultInitialize() = 0; virtual void print(raw_ostream &os) const = 0; protected: AnalysisState(ProgramPoint point) : point(point) {} StringRef debugName; }; // User should not override this. // AnalysisStateElement (instead of AnalysisState) is what's stored in DataFlowSolver::analysisStates. template <typename State> class AnalysisStateElement { public: // Read-only. const State &get() const { return state; } // User must return a ChangeResult, and the API does the // "propagateIfChanged". // // Usage: // update([&](State *state, Set<WorkItem> *added_dependees) { // ... // }); void update(function_ref<ChangeResult(State *, Set<WorkItem> *)> update_func) { SetVector<WorkItem> added_dependees; auto changed = update_func(&state, &added_dependees); dependees.insert(added_dependees.begin(), added_dependees.end()); if (changed) { for (auto &dependee : dependees) { solver.enqueue(dependee); } } } private: ProgramPoint point; State state; SetVector<WorkItem> dependees; DataFlowSolver &solver; }; This would address the following issues at the same time:
AnalysisStateElement is not to be overridden by the user. In particular, this does mean that we can't have the custom onUpdate() anymore. In my opinion, such logic belongs to visit(): we shouldn't spread analysis logic in both DataFlowAnalysis::visit() and onUpdate(). |
mlir/include/mlir/Analysis/DataFlowFramework.h | ||
---|---|---|
292 | (Note: "dependee" is a misnomer, it should be "dependent"...) I do appreciate that the separation of analysis state and element makes the API clearer, and I am aware that AnalysisState, DataFlowSolver, and DataFlowAnalysis are all tightly coupled, but there is a major problem with this approach. update_func is not going to know what the dependents are. The dependents could be analyses and states of which the writer of update_func is not aware. This will make any analyses written in this style not be naturally composable. For a sparse analysis, this is straightforward: getAnalysisElement<ConstantValue>(value).update( [&](ConstantValue *state, SetVector<WorkItem &> dependents) { do_update(state); for (Operation *user : value.getUsers()) dependents.insert({this, user}); }); But what happens if I write an AnalysisB that wants to depend on this state? The above analysis will need to know about that. This is why dependency management is the responsibility of the solver (and the states combined). Individual analyses can ask the solver, "hey, when this state gets update, invoke me again" or, "hey, when you get updated, invoke me on the users of the value". There are also performance implications. The dependents can be stored on the solver, but I moved them to the analysis state class to skip a map lookup when propagateIfChanged is called (which also improves cache locality). And onUpdate exists so that states can implement dependency logic outside of the big hash map, e.g. by using use-def chains. This results in significant performance boosts to constant propagation so that it matches the performance (within 5%, despite the analysis being split in two) of the current implementation. Separating the state itself and the dependency logic into separate classes would mean that both would have to be overwritten... I would agree that the API surface isn't perfect. addDependency is a function in DataFlowSolver but it just accesses private members of` AnalysisState`. That's a minor change and I should probably do that. |
(The reason I didn't have a backreference to DataFlowSolver in each AnalysisState instead of making the classes friends with each other is because I didn't want to increase the size of AnalysisState, but I haven't checked the memory usage on large programs yet so this point is a little moot).
I think this is in a good enough state that we can land and iterate on in-tree.
mlir/lib/Analysis/DataFlowFramework.cpp | ||
---|---|---|
13 ↗ | (On Diff #434216) | Setting aside how horribly broken of a behavior that is (that isn't your fault), I support your original patch of having a wrapper DEBUG macro. Having to worry about differences between NDEBUG/LLVM_ENABLE_ABI_BREAKING_CHECKS is really hacky.... |
53 ↗ | (On Diff #435376) | nit: Can you put comment blocks between the different class implementations? Makes the file a little easier to read. |
make AnalysisState and DataFlowAnalysis constructors public to avoid confusion
with mscv inheritance rules