This is an archive of the discontinued LLVM Phabricator instance.


Differential D126751

[mlir] Add a generic data-flow analysis framework
Closed, Public

Authored by Mogball on May 31 2022, 6:53 PM.

Details

Reviewers

phisiart
rriddle
mehdi_amini
aartbik

Commits

rGead75d9434ec: (Reland)[mlir] Add a generic data-flow analysis framework
rG9dea11728340: [mlir] Add a generic data-flow analysis framework

Summary

This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analyses. Lattice states and points are fully extensible to support highly-customizable analyses.
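To make the two ideas in the summary concrete, here is a minimal, self-contained C++ sketch of a worklist-based fixed-point iteration in which reading a state records a dependency edge. It is a toy model under stated assumptions, not the MLIR API: points are plain ints, each point carries a single integer state joined with `max`, and `ToySolver`, `join`, `visit` and `run` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <vector>

enum class ChangeResult { NoChange, Change };

// Hypothetical toy model, not the MLIR API: program points are ints, each
// point has one integer state that only grows (join = max), and the transfer
// function at a point reads the states of its predecessors.
struct ToySolver {
  std::map<int, std::vector<int>> preds;   // point -> points it reads
  std::map<int, int> seed;                 // initial contribution per point
  std::map<int, int> state;                // current lattice value per point
  std::map<int, std::set<int>> dependents; // point -> readers of its state
  std::deque<int> worklist;

  // Join `value` into the state at `p`; report whether the state changed.
  ChangeResult join(int p, int value) {
    int &current = state[p];
    if (value <= current)
      return ChangeResult::NoChange;
    current = value;
    return ChangeResult::Change;
  }

  // Transfer function. Reading a predecessor's state records a dependency,
  // so this point is revisited whenever that state changes later.
  void visit(int p) {
    int value = seed[p];
    for (int q : preds[p]) {
      dependents[q].insert(p);
      value = std::max(value, state[q]);
    }
    if (join(p, value) == ChangeResult::Change)
      for (int d : dependents[p])
        worklist.push_back(d);
  }

  // Visit every point once, then re-run dependents until nothing changes.
  void run(const std::vector<int> &points) {
    for (int p : points)
      worklist.push_back(p);
    while (!worklist.empty()) {
      int p = worklist.front();
      worklist.pop_front();
      visit(p);
    }
  }
};
```

For a cycle 1 -> 2 -> 3 -> 1 seeded with 5 at point 1 and 7 at point 3, every point settles at 7. In this toy, termination follows because each state only grows and is bounded by the largest seed.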

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Mogball created this revision. May 31 2022, 6:53 PM

Herald added a project: Restricted Project. May 31 2022, 6:53 PM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 21 others.

Mogball requested review of this revision. May 31 2022, 6:53 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Mogball planned changes to this revision. May 31 2022, 6:54 PM

Harbormaster completed remote builds in B167187: Diff 433272. May 31 2022, 7:07 PM

mehdi_amini added inline comments. Jun 1 2022, 8:41 AM

mlir/include/mlir/Analysis/DataFlowFramework.h
34	Is "lattice point" a well known term in the literature? Otherwise it seems to me that "program point" may be more descriptive here
169	`#if LLVM_ENABLE_ABI_BREAKING_CHECKS`
183	Why not `PointT *point` ?
mlir/lib/Analysis/DataFlowFramework.cpp
37	Nit: Spurious braces multiple times, here and elsewhere.

rename LatticePoint to ProgramPoint and LatticeState to AnalysisState
to avoid confusion.

also fix formatting issues

Mehdi's review comments

Mogball planned changes to this revision. Jun 1 2022, 12:34 PM

Harbormaster completed remote builds in B167326: Diff 433494. Jun 1 2022, 12:46 PM

phisiart requested changes to this revision. Jun 2 2022, 4:51 PM

phisiart added a subscriber: phisiart.

phisiart added inline comments. Jun 2 2022, 4:52 PM

mlir/include/mlir/Analysis/DataFlowFramework.h
68	When reading the code, I find it confusing that sometimes it's in `ProgramPoint::states` and sometimes it's in `DataFlowSolver::analysisStates`. Can we unify to `DataFlowSolver::analysisStates`?
81	I checked your other code, and `BaseT` seems to always be `ProgramPoint`. Is it possible for `BaseT` to be something else? If not, can we let `ProgramPointBase` always inherit `ProgramPoint`?
144	I think the naming is a bit confusing here: Point and ProgramPoint have pretty much the same meaning. Maybe `ProgramPoint` and `AbstractCustomProgramPoint`?
264–269	Can we make the ownership clearer: DenseMap<std::pair<Point, TypeID>, std::unique_ptr<AnalysisState>> analysisStates; I understand that you might not want `ProgramPoint::states` to own these `AnalysisState`s, so like what the other comment on `ProgramPoint::states` suggests, I would recommend unifying to `DataFlowSolver::AnalysisState`.

Mogball added inline comments. Jun 2 2022, 8:52 PM

mlir/include/mlir/Analysis/DataFlowFramework.h
68	I did this for performance reasons. Let me check again what the difference is.

Mogball added inline comments. Jun 3 2022, 12:17 PM

mlir/include/mlir/Analysis/DataFlowFramework.h
68	I'll revert this back to what it was before (as you suggested). I made some recent changes to the sparse analysis that makes this optimization redundant. I checked that, as things are now, it's no longer needed.
81	This is a convention for storage uniquer types, but I'll make the change as I can't imagine scenarios in which the base type would be anything else.

review comments.

Also adds an onUpdate function to analysis states

Mogball added reviewers: rriddle, mehdi_amini. Jun 3 2022, 5:32 PM

Harbormaster completed remote builds in B167864: Diff 434216. Jun 3 2022, 6:17 PM

Now we need an ODM :)

I think River's up next :)

But sure!

mlir/include/mlir/Analysis/DataFlowFramework.h
144	I've named them `ProgramPoint` and `GenericProgramPoint`.

In D126751#3558372, @Mogball wrote:

I think River's up next :)

But sure!

(No need to block on me, I can't make next week, and I'll be OOO for a few weeks after)

Mogball added a child revision: D127064: [mlir] Add Dead Code Analysis. Jun 4 2022, 7:45 PM

Took a quick scan, will look more in detail tomorrow. Really appreciate the detailed documentation.

mlir/include/mlir/Analysis/DataFlowFramework.h
10–13
51	Can this constructor be protected? (You can't create an instance of this class anyways).
130	Drop the llvm:: here, same applies to various other things in this file (take a look at what's provided by LLVM.h)
197–201	The use of `draw` here is kind of confusing, is this adding a dependency?
243–244	Can we make this protected?
286	Is the use of child necessary here? I would assume that we can have some analyses that are "parents" in some situations, and "children" in another.
292–293
295
303	Are these defined in this commit? I can't find what these are supposed to be referencing.
315	Could we just call this `initialize`?
323	? To avoid "update .. update"
mlir/lib/Analysis/DataFlowFramework.cpp
14	Why? Please prefer using `LLVM_DEBUG` directly instead.
20	Please prefer `using namespace mlir;` instead.
122–127
145	Can you just inline these into the header?

Mogball added inline comments. Jun 6 2022, 8:35 AM

mlir/lib/Analysis/DataFlowFramework.cpp
14	It was a little annoying wrapping it with `LLVM_ENABLE_ABI_BREAKING_CHECKS` everywhere...

review comments

mlir/include/mlir/Analysis/DataFlowFramework.h
10–13	https://jakubmarian.com/comma-before-that-and-which/ Odd to be digging into a grammar debate here... but I've seen this mistake a few times in the code base's comments :P
303	soon^TM

Harbormaster completed remote builds in B168080: Diff 434505. Jun 6 2022, 9:36 AM

rriddle added inline comments. Jun 6 2022, 11:10 PM

mlir/include/mlir/Analysis/DataFlowFramework.h
10–13	Not sure that applies here given that you could wrap that portion within parentheses and it would make more sense. The sentence as written has a run-on feel of "and .. and" which apply to different parts of the sentence. Can you reword the sentence then? Right now it's confusing to read.

rriddle added inline comments. Jun 6 2022, 11:11 PM

mlir/lib/Analysis/DataFlowFramework.cpp
14	Not sure what you mean, this is a .cpp you don't need to use LLVM_ENABLE_ABI_BREAKING_CHECKS anywhere...

rriddle added inline comments. Jun 6 2022, 11:15 PM

mlir/lib/Analysis/DataFlowFramework.cpp
14	At least, I'm not sure what situations we have that set NDEBUG but not LLVM_ENABLE_ABI_BREAKING_CHECKS.

Mogball marked 3 inline comments as done. Jun 7 2022, 10:09 AM

Mogball added inline comments.

mlir/lib/Analysis/DataFlowFramework.cpp
14	Internally at Google, LLVM_ENABLE_ABI_BREAKING_CHECKS is always turned off, even when NDEBUG is not set (the two `debugName` fields are not compiled when LLVM_ENABLE_ABI_BREAKING_CHECKS is not set). Alternatively, I can make the `debugName` fields always compile.

clang-format

Herald added a reviewer: aartbik. Jun 7 2022, 10:19 AM

Harbormaster completed remote builds in B168340: Diff 434865. Jun 7 2022, 10:20 AM

fix diff

Harbormaster completed remote builds in B168345: Diff 434870. Jun 7 2022, 10:56 AM

Sorry for the delay, I'll leave some more comments tomorrow.

mlir/include/mlir/Analysis/DataFlowFramework.h
119	Throughout the class this is called `key` so maybe `getKey()`?
122	"key and contents" doesn't seem to match the code.

Mogball added inline comments. Jun 8 2022, 8:59 AM

mlir/include/mlir/Analysis/DataFlowFramework.h
122	It's a content key. `KeyTy` is a required typedef of `StorageUniquer::BaseStorage`. I'll rename some of the stuff to make this more obvious.

change "key" to "value" in GenericProgramPointBase

Mogball marked 2 inline comments as done. Jun 8 2022, 4:03 PM

Harbormaster completed remote builds in B168709: Diff 435376. Jun 8 2022, 4:27 PM

phisiart added inline comments. Jun 9 2022, 12:58 AM

mlir/include/mlir/Analysis/DataFlowFramework.h
293	I've been thinking about how to get rid of the need for `propagateIfChanged()`. One way to do it is to reintroduce the separation of "LatticeValue" and "LatticeElement", and here let's call them "AnalysisState" and "AnalysisStateElement" (we should try to find a better name than "Element"...).

  // User should override this.
  class AnalysisState {
  public:
    virtual ~AnalysisState();
    virtual bool isUninitialized() const = 0;
    virtual ChangeResult defaultInitialize() = 0;
    virtual void print(raw_ostream &os) const = 0;

  protected:
    AnalysisState(ProgramPoint point) : point(point) {}
    ProgramPoint point;
    StringRef debugName;
  };

  // User should not override this.
  // AnalysisStateElement (instead of AnalysisState) is what's stored in
  // DataFlowSolver::analysisStates.
  template <typename State>
  class AnalysisStateElement {
  public:
    // Read-only.
    const State &get() const { return state; }

    // User must return a ChangeResult, and the API does the
    // "propagateIfChanged".
    //
    // Usage:
    //   update([&](State *state, SetVector<WorkItem> *added_dependees) {
    //     ...
    //   });
    void update(function_ref<ChangeResult(State *, SetVector<WorkItem> *)> update_func) {
      SetVector<WorkItem> added_dependees;
      ChangeResult changed = update_func(&state, &added_dependees);
      dependees.insert(added_dependees.begin(), added_dependees.end());
      if (changed == ChangeResult::Change)
        for (auto &dependee : dependees)
          solver.enqueue(dependee);
    }

  private:
    ProgramPoint point;
    State state;
    SetVector<WorkItem> dependees;
    DataFlowSolver &solver;
  };

This would address the following issues at the same time:

- Currently `AnalysisState` is a somewhat awkward abstraction: it contains logic on dependency management. I think that `AnalysisState` should be a standalone class that only represents the stored state itself (i.e. the lattice value); all dependency management logic is considered part of the analysis and should appear in `DataFlowAnalysis`, which the user overrides.
- By introducing `AnalysisStateElement` we force all state updates to go through a single entry point (`update()` in the example code I wrote). This way we don't need to expose `propagateIfChanged` and the user will never forget to call it.
- From a reader's perspective, currently `AnalysisState` is heavily coupled with `DataFlowSolver`, which is a big class. In particular, the reader must understand what they are expected to do with `DataFlowSolver` in `onUpdate`. By creating a separate `AnalysisStateElement` class, `AnalysisState` becomes more readable.

`AnalysisStateElement` is not to be overridden by the user. In particular, this does mean that we can't have the custom `onUpdate()` anymore. In my opinion, such logic belongs to `visit()`: we shouldn't spread analysis logic in both `DataFlowAnalysis::visit()` and `onUpdate()`.
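A self-contained C++ approximation of this proposal, with the user-defined state reduced to a plain value and every write funneled through `update()`. Assumptions: `ToySolver`, `MaxState` and integer work items are hypothetical stand-ins, `std::function` replaces `llvm::function_ref`, and `std::set` (sorted order) replaces `SetVector` (insertion order).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>

enum class ChangeResult { NoChange, Change };

// Hypothetical stand-ins: a work item is an integer id and the solver is
// just a queue.
using WorkItem = int;
struct ToySolver {
  std::deque<WorkItem> worklist;
  void enqueue(WorkItem item) { worklist.push_back(item); }
};

// The user-defined part of the proposal: a plain value, no dependency logic.
struct MaxState {
  int value = 0;
};

// The framework-owned wrapper: the only way to mutate the state is update(),
// which records new dependents and enqueues all of them if the state changed.
template <typename State>
class AnalysisStateElement {
public:
  explicit AnalysisStateElement(ToySolver &solver) : solver(solver) {}

  const State &get() const { return state; }

  void update(
      const std::function<ChangeResult(State *, std::set<WorkItem> *)> &fn) {
    std::set<WorkItem> added;
    ChangeResult changed = fn(&state, &added);
    dependents.insert(added.begin(), added.end());
    if (changed == ChangeResult::Change)
      for (WorkItem item : dependents)
        solver.enqueue(item);
  }

private:
  State state;
  std::set<WorkItem> dependents;
  ToySolver &solver;
};
```

Note that the callback itself has to report every dependent it knows about; that is exactly the composability concern raised in the reply.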

Mogball added inline comments. Jun 9 2022, 12:02 PM

mlir/include/mlir/Analysis/DataFlowFramework.h
293	(Note: "dependee" is a misnomer, it should be "dependent"...) I do appreciate that the separation of analysis state and element makes the API clearer, and I am aware that `AnalysisState`, `DataFlowSolver`, and `DataFlowAnalysis` are all tightly coupled, but there is a major problem with this approach. `update_func` is not going to know what the dependents are. The dependents could be analyses and states of which the writer of `update_func` is not aware. This will make any analyses written in this style not be naturally composable. For a sparse analysis, this is straightforward:

  getAnalysisElement<ConstantValue>(value).update(
      [&](ConstantValue *state, SetVector<WorkItem> *dependents) {
        do_update(state);
        for (Operation *user : value.getUsers())
          dependents->insert({this, user});
      });

But what happens if I write an `AnalysisB` that wants to depend on this state? The above analysis will need to know about that. This is why dependency management is the responsibility of the solver (and the states combined). Individual analyses can ask the solver, "hey, when this state gets updated, invoke me again" or, "hey, when you get updated, invoke me on the users of the value".

There are also performance implications. The dependents can be stored on the solver, but I moved them to the analysis state class to skip a map lookup when `propagateIfChanged` is called (which also improves cache locality). And `onUpdate` exists so that states can implement dependency logic outside of the big hash map, e.g. by using use-def chains. This results in significant performance boosts to constant propagation so that it matches the performance (within 5%, despite the analysis being split in two) of the current implementation. Separating the state itself and the dependency logic into separate classes would mean that both would have to be overridden...

I would agree that the API surface isn't perfect. `addDependency` is a function in `DataFlowSolver` but it just accesses private members of `AnalysisState`. That's a minor change and I should probably do that.
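The alternative described in this reply, where dependents live on the state itself and any analysis can subscribe to any state through the solver, can be sketched in the same toy style. `ToySolver`, `ToyState`, `ConstantValueState`, the string analysis names and the `users` vector (standing in for `value.getUsers()`) are all hypothetical; only the roles of `addDependency`, `propagateIfChanged` and `onUpdate` mirror the header.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

enum class ChangeResult { NoChange, Change };

// Hypothetical work item: (analysis name, program point id).
using WorkItem = std::pair<std::string, int>;

struct ToySolver;

// Dependents are stored on the state itself, so propagating a change needs
// no lookup in a solver-wide map. onUpdate lets a state push extra work.
struct ToyState {
  std::vector<WorkItem> dependents;
  virtual ~ToyState() = default;
  virtual void onUpdate(ToySolver &) const {}
};

struct ToySolver {
  std::deque<WorkItem> worklist;

  // Any analysis can subscribe to any state; the state's writer need not
  // know about it. Duplicates are skipped and insertion order is kept.
  void addDependency(ToyState &state, WorkItem reader) {
    for (const WorkItem &existing : state.dependents)
      if (existing == reader)
        return;
    state.dependents.push_back(std::move(reader));
  }

  // Enqueue every subscriber, then let the state add IR-derived work.
  void propagateIfChanged(ToyState &state, ChangeResult changed) {
    if (changed != ChangeResult::Change)
      return;
    for (const WorkItem &item : state.dependents)
      worklist.push_back(item);
    state.onUpdate(*this);
  }
};

// A value's constant state that requeues the value's users through a use
// list (standing in for value.getUsers()) instead of storing one dependency
// entry per user.
struct ConstantValueState : ToyState {
  int value = 0;
  std::vector<int> users;
  void onUpdate(ToySolver &solver) const override {
    for (int user : users)
      solver.worklist.push_back({"sccp", user});
  }
};
```

Here a second analysis subscribes without `ConstantValueState` knowing about it, while the use-def style requeueing happens in `onUpdate` rather than through per-user dependency entries.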

Herald added a subscriber: Peiming. Jun 9 2022, 12:02 PM

(The reason I didn't have a backreference to DataFlowSolver in each AnalysisState instead of making the classes friends with each other is because I didn't want to increase the size of AnalysisState, but I haven't checked the memory usage on large programs yet so this point is a little moot).

Mogball marked an inline comment as done. Jun 9 2022, 12:45 PM

I think this is in a good enough state that we can land and iterate on in-tree.

mlir/lib/Analysis/DataFlowFramework.cpp
14	Setting aside how horribly broken of a behavior that is (that isn't your fault), I support your original patch of having a wrapper DEBUG macro. Having to worry about differences between NDEBUG/LLVM_ENABLE_ABI_BREAKING_CHECKS is really hacky....
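A minimal sketch of such a wrapper macro, assuming a hypothetical `TOY_ABI_BREAKING_CHECKS` flag in place of `LLVM_ENABLE_ABI_BREAKING_CHECKS` and a made-up `DATAFLOW_DEBUG` name. The point is that code touching `debugName` must be keyed on the same flag that compiles the field in, not on `NDEBUG`. In an LLVM build the enabled branch would presumably forward to `LLVM_DEBUG`, which additionally honors `NDEBUG` and `-debug-only`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for LLVM_ENABLE_ABI_BREAKING_CHECKS, normally set by
// the build system; defaulted here so the sketch compiles on its own.
#ifndef TOY_ABI_BREAKING_CHECKS
#define TOY_ABI_BREAKING_CHECKS 1
#endif

struct ToyState {
#if TOY_ABI_BREAKING_CHECKS
  // Only present in builds with ABI-breaking checks, whatever NDEBUG says.
  std::string debugName;
#endif
};

// Debug code may mention debugName, so it is compiled in or out by the same
// flag that controls the field. In LLVM, the enabled branch would presumably
// expand to LLVM_DEBUG(X) instead of running X unconditionally.
#if TOY_ABI_BREAKING_CHECKS
#define DATAFLOW_DEBUG(X) do { X; } while (false)
#else
#define DATAFLOW_DEBUG(X) do { } while (false)
#endif
```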
54	nit: Can you put comment blocks between the different class implementations? Makes the file a little easier to read.

review comments

phisiart accepted this revision. Jun 13 2022, 3:03 PM

This revision is now accepted and ready to land. Jun 13 2022, 3:03 PM

Harbormaster completed remote builds in B169575: Diff 436579. Jun 13 2022, 5:10 PM

fix windows

Harbormaster completed remote builds in B169623: Diff 436655. Jun 13 2022, 10:43 PM

windows

Harbormaster completed remote builds in B169626: Diff 436658. Jun 13 2022, 11:07 PM

windows

Harbormaster completed remote builds in B169628: Diff 436662. Jun 13 2022, 11:35 PM

make AnalysisState and DataFlowAnalysis constructors public to avoid confusion
with MSVC inheritance rules

Harbormaster completed remote builds in B169629: Diff 436665. Jun 14 2022, 12:02 AM

This revision was landed with ongoing or failed builds. Jun 14 2022, 9:54 AM

Closed by commit rG9dea11728340: [mlir] Add a generic data-flow analysis framework (authored by Mogball).

This revision was automatically updated to reflect the committed changes.

Mogball added a commit: rG9dea11728340: [mlir] Add a generic data-flow analysis framework.

frgossen added a reverting change: rGa6fa12ab3b47: Revert "[mlir] Add a generic data-flow analysis framework". Jun 14 2022, 2:14 PM

Mogball added a commit: rGead75d9434ec: (Reland)[mlir] Add a generic data-flow analysis framework. Jun 14 2022, 2:33 PM

Revision Contents

mlir/include/mlir/Analysis/DataFlowFramework.h (454 lines)
mlir/lib/Analysis/CMakeLists.txt (1 line)
mlir/lib/Analysis/DataFlowFramework.cpp (161 lines)
mlir/test/Analysis/test-foo-analysis.mlir (95 lines)
mlir/test/lib/Analysis/CMakeLists.txt (1 line)
mlir/test/lib/Analysis/TestDataFlowFramework.cpp (188 lines)
mlir/tools/mlir-opt/mlir-opt.cpp (2 lines)
utils/bazel/llvm-project-overlay/mlir/BUILD.bazel (5 lines)
utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel (1 line)

Diff 436665

mlir/include/mlir/Analysis/DataFlowFramework.h

This file was added.

//===- DataFlowFramework.h - A generic framework for data-flow analysis ---===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

// This file defines a generic framework for writing data-flow analysis in MLIR.

// The framework consists of a solver, which runs the fixed-point iteration and

// manages analysis dependencies, and a data-flow analysis class used to

// implement specific analyses.

rriddle:

//===----------------------------------------------------------------------===//

// This file defines a generic framework for writing data-flow analysis in MLIR.

- // The framework consists of a solver that runs the fixed-point iteration and

- // manages analysis dependencies and a data-flow analysis class for implementing

+ // The framework consists of a solver, that runs the fixed-point iteration and

+ // manages analysis dependencies, and a data-flow analysis class for implementing

// specific analyses.

//===----------------------------------------------------------------------===//

Mogball: https://jakubmarian.com/comma-before-that-and-which/ Odd to be digging into a grammar debate here... but I've seen this mistake a few times in the code base's comments :P

rriddle: Not sure that applies here given that you could wrap that portion within parentheses and it would make more sense. The sentence as written has a run-on feel of "and .. and" which apply to different parts of the sentence. Can you reword the sentence then? Right now it's confusing to read.

//===----------------------------------------------------------------------===//

#ifndef MLIR_ANALYSIS_DATAFLOWFRAMEWORK_H

#define MLIR_ANALYSIS_DATAFLOWFRAMEWORK_H

#include "mlir/Analysis/DataFlowAnalysis.h"

#include "mlir/IR/Operation.h"

#include "mlir/Support/StorageUniquer.h"

#include "llvm/ADT/SetVector.h"

#include "llvm/Support/TypeName.h"

#include <queue>

namespace mlir {

/// Forward declare the analysis state class.

class AnalysisState;

//===----------------------------------------------------------------------===//

// GenericProgramPoint

//===----------------------------------------------------------------------===//

mehdi_amini: Is "lattice point" a well known term in the literature? Otherwise it seems to me that "program point" may be more descriptive here

/// Abstract class for generic program points. In classical data-flow analysis,

/// program points represent positions in a program to which lattice elements

/// are attached. In sparse data-flow analysis, these can be SSA values, and in

/// dense data-flow analysis, these are the program points before and after

/// every operation.

///

/// In the general MLIR data-flow analysis framework, program points are an

/// extensible concept. Program points are uniquely identifiable objects to

/// which analysis states can be attached. The semantics of program points are

/// defined by the analyses that specify their transfer functions.

///

/// Program points are implemented using MLIR's storage uniquer framework and

/// type ID system to provide RTTI.

class GenericProgramPoint : public StorageUniquer::BaseStorage {

public:

virtual ~GenericProgramPoint();

rriddle: Can this constructor be protected? (You can't create an instance of this class anyways).

/// Get the abstract program point's type identifier.

TypeID getTypeID() const { return typeID; }

/// Get a derived source location for the program point.

virtual Location getLoc() const = 0;

/// Print the program point.

virtual void print(raw_ostream &os) const = 0;

protected:

/// Create an abstract program point with type identifier.

explicit GenericProgramPoint(TypeID typeID) : typeID(typeID) {}

private:

/// The type identifier of the program point.

TypeID typeID;

};
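`GenericProgramPoint` stores a `TypeID` so that subclasses can implement `classof` by comparing IDs instead of relying on C++ RTTI. A self-contained approximation of that pattern follows; `ToyTypeID` (an address-of-a-static trick, much simpler than MLIR's `TypeID`), the two block point classes and the `isa`/`dyn_cast` helpers are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal type ID: the address of a per-type static variable is
// distinct for each T. MLIR's TypeID is more elaborate than this.
class ToyTypeID {
public:
  template <typename T>
  static ToyTypeID get() {
    static char anchor;
    return ToyTypeID(&anchor);
  }
  bool operator==(ToyTypeID other) const { return ptr == other.ptr; }

private:
  explicit ToyTypeID(const void *ptr) : ptr(ptr) {}
  const void *ptr;
};

// Analogue of GenericProgramPoint: abstract, remembers its concrete type.
class ToyGenericPoint {
public:
  virtual ~ToyGenericPoint() = default;
  ToyTypeID getTypeID() const { return typeID; }
  virtual std::string print() const = 0;

protected:
  explicit ToyGenericPoint(ToyTypeID typeID) : typeID(typeID) {}

private:
  ToyTypeID typeID;
};

// Two hypothetical concrete point kinds. classof compares type IDs, so no
// dynamic_cast is needed.
class BlockEntryPoint : public ToyGenericPoint {
public:
  explicit BlockEntryPoint(int block)
      : ToyGenericPoint(ToyTypeID::get<BlockEntryPoint>()), block(block) {}
  static bool classof(const ToyGenericPoint *p) {
    return p->getTypeID() == ToyTypeID::get<BlockEntryPoint>();
  }
  std::string print() const override {
    return "entry of block " + std::to_string(block);
  }
  int block;
};

class BlockExitPoint : public ToyGenericPoint {
public:
  explicit BlockExitPoint(int block)
      : ToyGenericPoint(ToyTypeID::get<BlockExitPoint>()), block(block) {}
  static bool classof(const ToyGenericPoint *p) {
    return p->getTypeID() == ToyTypeID::get<BlockExitPoint>();
  }
  std::string print() const override {
    return "exit of block " + std::to_string(block);
  }
  int block;
};

// LLVM-style isa/dyn_cast built only on classof.
template <typename To>
bool isa(const ToyGenericPoint *p) { return To::classof(p); }

template <typename To>
const To *dyn_cast(const ToyGenericPoint *p) {
  return isa<To>(p) ? static_cast<const To *>(p) : nullptr;
}
```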

phisiart: When reading the code, I find it confusing that sometimes it's in `ProgramPoint::states` and sometimes it's in `DataFlowSolver::analysisStates`. Can we unify to `DataFlowSolver::analysisStates`?

Mogball: I did this for performance reasons. Let me check again what the difference is.

Mogball: I'll revert this back to what it was before (as you suggested). I made some recent changes to the sparse analysis that makes this optimization redundant. I checked that, as things are now, it's no longer needed.

//===----------------------------------------------------------------------===//

// GenericProgramPointBase

//===----------------------------------------------------------------------===//

/// Base class for generic program points based on a concrete program point

/// type and a content key. This class defines the common methods required for

/// operability with the storage uniquer framework.

///

/// The provided key type uniquely identifies the concrete program point

/// instance and is the data member of the class.

template <typename ConcreteT, typename Value>

class GenericProgramPointBase : public GenericProgramPoint {

phisiart: I checked your other code, and `BaseT` seems to always be `ProgramPoint`. Is it possible for `BaseT` to be something else? If not, can we let `ProgramPointBase` always inherit `ProgramPoint`?

Mogball: This is a convention for storage uniquer types, but I'll make the change as I can't imagine scenarios in which the base type would be anything else.

public:

/// The concrete key type used by the storage uniquer. This class is uniqued

/// by its contents.

using KeyTy = Value;

/// Alias for the base class.

using Base = GenericProgramPointBase<ConcreteT, Value>;

/// Construct an instance of the program point using the provided value and

/// the type ID of the concrete type.

template <typename ValueT>

explicit GenericProgramPointBase(ValueT &&value)

: GenericProgramPoint(TypeID::get<ConcreteT>()),

value(std::forward<ValueT>(value)) {}

/// Get a uniqued instance of this program point class with the given

/// arguments.

template <typename... Args>

static ConcreteT *get(StorageUniquer &uniquer, Args &&...args) {

return uniquer.get<ConcreteT>(/*initFn=*/{}, std::forward<Args>(args)...);

}

/// Allocate space for a program point and construct it in-place.

template <typename ValueT>

static ConcreteT *construct(StorageUniquer::StorageAllocator &alloc,

ValueT &&value) {

return new (alloc.allocate<ConcreteT>())

ConcreteT(std::forward<ValueT>(value));

}

/// Two program points are equal if their values are equal.

bool operator==(const Value &value) const { return this->value == value; }

/// Provide LLVM-style RTTI using type IDs.

static bool classof(const GenericProgramPoint *point) {

return point->getTypeID() == TypeID::get<ConcreteT>();

}

/// Get the contents of the program point.

phisiart: Throughout the class this is called `key` so maybe `getKey()`?

const Value &getValue() const { return value; }

private:

phisiart: "key and contents" doesn't seem to match the code.

Mogball: It's a content key. `KeyTy` is a required typedef of `StorageUniquer::BaseStorage`. I'll rename some of the stuff to make this more obvious.

/// The program point value.

Value value;

};
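The uniquing contract of `GenericProgramPointBase` (one instance per concrete type and key, looked up through `get`) can be approximated without MLIR as follows. `ToyUniquer` is a hypothetical stand-in for `StorageUniquer` that keeps one `std::map` per concrete type and ignores arena allocation, hashing and thread safety; `LoopHeaderPoint` and its `(int, int)` key are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical uniquer: at most one instance per (concrete type, key).
class ToyUniquer {
public:
  template <typename ConcreteT>
  ConcreteT *get(const typename ConcreteT::KeyTy &key) {
    auto &table = tableFor<ConcreteT>();
    auto it = table.find(key);
    if (it == table.end())
      it = table.emplace(key, std::make_unique<ConcreteT>(key)).first;
    return it->second.get();
  }

private:
  template <typename ConcreteT>
  using Table = std::map<typename ConcreteT::KeyTy, std::unique_ptr<ConcreteT>>;

  // One type-erased table per concrete type, owned by this uniquer.
  template <typename ConcreteT>
  Table<ConcreteT> &tableFor() {
    std::shared_ptr<void> &slot = tables[std::type_index(typeid(ConcreteT))];
    if (!slot)
      slot = std::make_shared<Table<ConcreteT>>();
    return *static_cast<Table<ConcreteT> *>(slot.get());
  }

  std::map<std::type_index, std::shared_ptr<void>> tables;
};

// Analogue of GenericProgramPointBase: the key type is the stored value.
template <typename ConcreteT, typename ValueT>
class ToyPointBase {
public:
  using KeyTy = ValueT;
  using Base = ToyPointBase<ConcreteT, ValueT>;

  explicit ToyPointBase(ValueT value) : value(std::move(value)) {}

  // Build the key from the arguments and return the uniqued instance.
  template <typename... Args>
  static ConcreteT *get(ToyUniquer &uniquer, Args &&...args) {
    return uniquer.get<ConcreteT>(KeyTy(std::forward<Args>(args)...));
  }

  const ValueT &getValue() const { return value; }

private:
  ValueT value;
};

// Hypothetical concrete point keyed on (block id, loop depth).
struct LoopHeaderPoint : ToyPointBase<LoopHeaderPoint, std::pair<int, int>> {
  using Base::Base;
};
```

Because identical keys yield the same pointer, program points can be compared and hashed by address, which is what lets them serve as map keys for analysis states.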

//===----------------------------------------------------------------------===//

// ProgramPoint

//===----------------------------------------------------------------------===//

rriddle: Drop the `llvm::` here, same applies to various other things in this file (take a look at what's provided by LLVM.h)

/// Fundamental IR components are supported as first-class program points.

struct ProgramPoint : public PointerUnion<GenericProgramPoint *, Operation *,

Value, Block *, Region *> {

using ParentTy = PointerUnion<GenericProgramPoint *, Operation *, Value,

Block *, Region *>;

/// Inherit constructors.

using ParentTy::PointerUnion;

/// Allow implicit conversion from the parent type.

ProgramPoint(ParentTy point = nullptr) : ParentTy(point) {}

/// Print the program point.

void print(raw_ostream &os) const;

/// Get the source location of the program point.

phisiart: I think the naming is a bit confusing here: Point and ProgramPoint have pretty much the same meaning. Maybe `ProgramPoint` and `AbstractCustomProgramPoint`?

Mogball: I've named them `ProgramPoint` and `GenericProgramPoint`.

Location getLoc() const;

};
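`ProgramPoint` is a tagged union over the IR entities that can carry state. As a portable approximation, the sketch below uses `std::variant` where the header uses `llvm::PointerUnion`; `ToyOp`, `ToyBlock`, `ToyRegion` and `printPoint` are hypothetical, and the `Value` alternative is omitted. A `PointerUnion` packs its discriminator into spare low bits of the pointer and stays pointer-sized, whereas a `std::variant` stores the tag separately and is larger.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical toy IR entities standing in for Operation, Block, Region.
struct ToyOp { std::string name; };
struct ToyBlock { int id; };
struct ToyRegion { int id; };

// Analogue of ProgramPoint; std::monostate plays the role of the null
// default used by ProgramPoint(ParentTy point = nullptr).
using ToyProgramPoint =
    std::variant<std::monostate, ToyOp *, ToyBlock *, ToyRegion *>;

// Dispatch on the active alternative, like ProgramPoint::print.
std::string printPoint(const ToyProgramPoint &point) {
  struct Printer {
    std::string operator()(std::monostate) const { return "<null>"; }
    std::string operator()(ToyOp *op) const { return "op " + op->name; }
    std::string operator()(ToyBlock *b) const {
      return "block #" + std::to_string(b->id);
    }
    std::string operator()(ToyRegion *r) const {
      return "region #" + std::to_string(r->id);
    }
  };
  return std::visit(Printer{}, point);
}
```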

/// Forward declaration of the data-flow analysis class.

class DataFlowAnalysis;

//===----------------------------------------------------------------------===//

// DataFlowSolver

//===----------------------------------------------------------------------===//

/// The general data-flow analysis solver. This class is responsible for

/// orchestrating child data-flow analyses, running the fixed-point iteration

/// algorithm, managing analysis state and program point memory, and tracking

/// dependencies between analyses, program points, and analysis states.

///

/// Steps to run a data-flow analysis:

///

/// 1. Load and initialize child analyses. Child analyses are instantiated

/// in the solver and initialized, building their dependency relations.

/// 2. Configure and run the analysis. The solver invokes the child analyses

/// according to their dependency relations until a fixed point is reached.

/// 3. Query analysis state results from the solver.

///

/// TODO: Optimize the internal implementation of the solver.

class DataFlowSolver {

mehdi_amini: `#if LLVM_ENABLE_ABI_BREAKING_CHECKS`

public:

/// Load an analysis into the solver. Return the analysis instance.

template <typename AnalysisT, typename... Args>

AnalysisT *load(Args &&...args);

/// Initialize the children analyses starting from the provided top-level

/// operation and run the analysis until fixpoint.

LogicalResult initializeAndRun(Operation *top);

/// Lookup an analysis state for the given program point. Returns null if one

/// does not exist.

template <typename StateT, typename PointT>

const StateT *lookupState(PointT point) const {

auto it = analysisStates.find({point, TypeID::get<StateT>()});

mehdi_amini: Why not `PointT *point`?

if (it == analysisStates.end())

return nullptr;

return static_cast<const StateT *>(it->second.get());

}

/// Get a uniqued program point instance. If one is not present, it is

/// created with the provided arguments.

template <typename PointT, typename... Args>

PointT *getProgramPoint(Args &&...args) {

return PointT::get(uniquer, std::forward<Args>(args)...);

}

/// A work item on the solver queue is a program point, child analysis pair.

/// Each item is processed by invoking the child analysis at the program

/// point.

using WorkItem = std::pair<ProgramPoint, DataFlowAnalysis *>;

/// Push a work item onto the worklist.

void enqueue(WorkItem item) { worklist.push(std::move(item)); }

rriddle: The use of `draw` here is kind of confusing, is this adding a dependency?

protected:

/// Get the state associated with the given program point. If it does not

/// exist, create an uninitialized state.

template <typename StateT, typename PointT>

StateT *getOrCreateState(PointT point);

/// Propagate an update to an analysis state if it changed by pushing

/// dependent work items to the back of the queue.

void propagateIfChanged(AnalysisState *state, ChangeResult changed);

/// Add a dependency to an analysis state on a child analysis and program

/// point. If the state is updated, the child analysis must be invoked on the

/// given program point again.

void addDependency(AnalysisState *state, DataFlowAnalysis *analysis,

ProgramPoint point);

private:

/// The solver's work queue. Work items can be inserted to the front of the

/// queue to be processed greedily, speeding up computations that otherwise

/// quickly degenerate to quadratic due to propagation of state updates.

std::queue<WorkItem> worklist;

/// Type-erased instances of the children analyses.

SmallVector<std::unique_ptr<DataFlowAnalysis>> childAnalyses;

/// The storage uniquer instance that owns the memory of the allocated program

/// points.

StorageUniquer uniquer;

/// A type-erased map of program points to associated analysis states for

/// first-class program points.

DenseMap<std::pair<ProgramPoint, TypeID>, std::unique_ptr<AnalysisState>>

analysisStates;

/// Allow the base child analysis class to access the internals of the solver.

friend class DataFlowAnalysis;

};
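The three steps in the `DataFlowSolver` documentation (load, initialize and run, query) can be sketched with a toy solver whose state map is keyed by `(point, state type)` like `analysisStates`. All names here are hypothetical; points are plain ints, `std::type_index` stands in for `TypeID`, the worklist iteration is omitted, and `getOrCreateState` is public rather than protected for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

struct ToyState {
  virtual ~ToyState() = default;
};

class ToySolver;

struct ToyAnalysis {
  virtual ~ToyAnalysis() = default;
  virtual void initialize(ToySolver &solver, const std::vector<int> &points) = 0;
};

class ToySolver {
public:
  // Step 1: load an analysis; the solver owns it.
  template <typename AnalysisT, typename... Args>
  AnalysisT *load(Args &&...args) {
    analyses.push_back(std::make_unique<AnalysisT>(std::forward<Args>(args)...));
    return static_cast<AnalysisT *>(analyses.back().get());
  }

  // Step 2: initialize every analysis. A real solver would then drain its
  // worklist to a fixpoint; that part is omitted here.
  void initializeAndRun(const std::vector<int> &points) {
    for (auto &analysis : analyses)
      analysis->initialize(*this, points);
  }

  // Step 3: query results; null if no state of that type exists at `point`.
  template <typename StateT>
  const StateT *lookupState(int point) const {
    auto it = states.find({point, std::type_index(typeid(StateT))});
    return it == states.end() ? nullptr
                              : static_cast<const StateT *>(it->second.get());
  }

  // Used by analyses: create the state on first access.
  template <typename StateT>
  StateT *getOrCreateState(int point) {
    std::unique_ptr<ToyState> &slot =
        states[{point, std::type_index(typeid(StateT))}];
    if (!slot)
      slot = std::make_unique<StateT>();
    return static_cast<StateT *>(slot.get());
  }

private:
  std::vector<std::unique_ptr<ToyAnalysis>> analyses;
  std::map<std::pair<int, std::type_index>, std::unique_ptr<ToyState>> states;
};

// Hypothetical example state and analysis for the usage below.
struct ReachedState : ToyState {
  bool reached = false;
};

struct MarkAllAnalysis : ToyAnalysis {
  void initialize(ToySolver &solver, const std::vector<int> &points) override {
    for (int p : points)
      solver.getOrCreateState<ReachedState>(p)->reached = true;
  }
};
```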

//===----------------------------------------------------------------------===//

// AnalysisState

//===----------------------------------------------------------------------===//

rriddle: Can we make this protected?
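For a concrete instance of the three `AnalysisState` requirements documented below (uninitialized before a visit, monotone updates that reach a fixpoint, and forced default initialization), here is a toy constant lattice. It is a hypothetical standalone class, not a subclass of the real `AnalysisState`, and `join` is an assumed update method name.

```cpp
#include <cassert>
#include <optional>

enum class ChangeResult { NoChange, Change };

// Hypothetical three-level constant lattice:
//   uninitialized < constant(c) < overdefined.
// Updates only move a state upward and the lattice has finite height, so
// every state stops changing after at most two updates (its fixpoint).
class ToyConstantState {
  enum class Kind { Uninitialized, Constant, Overdefined };
  Kind kind = Kind::Uninitialized;
  int constant = 0;

public:
  // Requirement 1: a state nobody has written to yet is uninitialized.
  bool isUninitialized() const { return kind == Kind::Uninitialized; }

  // Requirement 3: force a pessimistic default ("could be anything").
  ChangeResult defaultInitialize() {
    if (!isUninitialized())
      return ChangeResult::NoChange;
    kind = Kind::Overdefined;
    return ChangeResult::Change;
  }

  // Requirement 2: join in a newly observed constant and report any change,
  // so a solver can decide whether dependents need to be revisited.
  ChangeResult join(int c) {
    switch (kind) {
    case Kind::Uninitialized:
      kind = Kind::Constant;
      constant = c;
      return ChangeResult::Change;
    case Kind::Constant:
      if (constant == c)
        return ChangeResult::NoChange;
      kind = Kind::Overdefined;
      return ChangeResult::Change;
    case Kind::Overdefined:
      break;
    }
    return ChangeResult::NoChange;
  }

  std::optional<int> getConstant() const {
    if (kind == Kind::Constant)
      return constant;
    return std::nullopt;
  }
};
```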

/// Base class for generic analysis states. Analysis states contain data-flow

/// information that is attached to program points and that evolves as the

/// analysis iterates.

///

/// This class places no restrictions on the semantics of analysis states beyond

/// these requirements.

///

/// 1. Querying the state of a program point prior to visiting that point

/// results in uninitialized state. Analyses must be aware of uninitialized

/// states.

/// 2. Analysis states can reach fixpoints, where subsequent updates will never

/// trigger a change in the state.

/// 3. Analysis states that are uninitialized can be forcefully initialized to a

/// default value.

class AnalysisState {

public:

virtual ~AnalysisState();

/// Create the analysis state at the given program point.

AnalysisState(ProgramPoint point) : point(point) {}

/// Returns true if the analysis state is uninitialized.

virtual bool isUninitialized() const = 0;

/// Force an uninitialized analysis state to initialize itself with a default

phisiart: Can we make the ownership clearer:

DenseMap<std::pair<Point, TypeID>, std::unique_ptr<AnalysisState>> analysisStates;

I understand that you might not want ProgramPoint::states to own these AnalysisStates, so like what the other comment on ProgramPoint::states suggests, I would recommend unifying to DataFlowSolver::AnalysisState.

/// value.

virtual ChangeResult defaultInitialize() = 0;

/// Print the contents of the analysis state.

virtual void print(raw_ostream &os) const = 0;

protected:

/// This function is called by the solver when the analysis state is updated

/// to optionally enqueue more work items. For example, if a state tracks

/// dependents through the IR (e.g. use-def chains), this function can be

/// implemented to push those dependents on the worklist.

virtual void onUpdate(DataFlowSolver *solver) const {}

/// The dependency relations originating from this analysis state. An entry

/// `state -> (analysis, point)` is created when `analysis` queries `state`

/// when updating `point`.

///

rriddle: Is the use of child necessary here? I would assume that we can have some analyses that are "parents" in some situations, and "children" in others.

/// When this state is updated, all dependent child analysis invocations are

/// pushed to the back of the queue. Use a `SetVector` to keep the analysis

/// deterministic.

///

/// Store the dependents on the analysis state for efficiency.

SetVector<DataFlowSolver::WorkItem> dependents;

rriddle (suggested edit):

/// define explicit transfer functions between input states and output states.
- /// But in this framework, the dependency graph can change during the analysis.
- /// And the transfer functions are opaque in that the solver doesn't know what
+ /// In this framework, however, the dependency graph can change during the analysis,
+ /// and transfer functions are opaque such that the solver doesn't know what
/// states calling `visit` on an analysis will be updated. This allows multiple

phisiart:

I've been thinking about how to get rid of the need for propagateIfChanged(). One way to do it is to reintroduce the separation of "LatticeValue" and "LatticeElement", and here let's call them "AnalysisState" and "AnalysisStateElement" (we should try to find a better name than "Element"...).

// User should override this.
class AnalysisState {
public:
  virtual ~AnalysisState();
  virtual bool isUninitialized() const = 0;
  virtual ChangeResult defaultInitialize() = 0;
  virtual void print(raw_ostream &os) const = 0;

protected:
  AnalysisState(ProgramPoint point) : point(point) {}
  ProgramPoint point;
  StringRef debugName;
};

// User should not override this.
// AnalysisStateElement (instead of AnalysisState) is what's stored in DataFlowSolver::analysisStates.
template <typename State>
class AnalysisStateElement {
 public:
  // Read-only.
  const State &get() const {
    return state;
  }

  // User must return a ChangeResult, and the API does the
  // "propagateIfChanged".
  //
  // Usage:
  // update([&](State *state, SetVector<WorkItem> *added_dependees) {
  //   ...
  // });
  void update(function_ref<ChangeResult(State *, SetVector<WorkItem> *)> update_func) {
    SetVector<WorkItem> added_dependees;
    auto changed = update_func(&state, &added_dependees);
    dependees.insert(added_dependees.begin(), added_dependees.end());

    if (changed == ChangeResult::Change) {
      for (auto &dependee : dependees) {
        solver.enqueue(dependee);
      }
    }
  }

 private:
  ProgramPoint point;
  State state;
  SetVector<WorkItem> dependees;
  DataFlowSolver &solver;
};

This would address the following issues at the same time:

Currently AnalysisState is a somewhat awkward abstraction: it contains logic on dependency management. I think that AnalysisState should be a standalone class that only represents the stored state itself (i.e. the lattice value); all dependency management logic is considered part of the analysis and should appear in DataFlowAnalysis which the user overrides.

By introducing AnalysisStateElement we force all state updates to go through a single entry point (update() in the example code I wrote). This way we don't need to expose propagateIfChanged and the user will never forget to call it.

From a reader's perspective, currently AnalysisState is heavily coupled with DataFlowSolver, which is a big class. In particular, the reader must understand what they are expected to do with DataFlowSolver in onUpdate. By creating a separate AnalysisStateElement class, AnalysisState becomes more readable.

AnalysisStateElement is not to be overridden by the user. In particular, this does mean that we can't have the custom onUpdate() anymore. In my opinion, such logic belongs to visit(): we shouldn't spread analysis logic in both DataFlowAnalysis::visit() and onUpdate().


Mogball (author):

(Note: "dependee" is a misnomer, it should be "dependent"...)

I do appreciate that the separation of analysis state and element makes the API clearer, and I am aware that AnalysisState, DataFlowSolver, and DataFlowAnalysis are all tightly coupled, but there is a major problem with this approach.

`update_func` is not going to know what the dependents are. The dependents could be analyses and states of which the writer of `update_func` is not aware, so analyses written in this style would not be naturally composable. For a sparse analysis, this is straightforward:

getAnalysisElement<ConstantValue>(value).update(
  [&](ConstantValue *state, SetVector<WorkItem> *dependents) {
    ChangeResult result = do_update(state);
    for (Operation *user : value.getUsers()) dependents->insert({user, this});
    return result;
});

But what happens if I write an AnalysisB that wants to depend on this state? The above analysis will need to know about that. This is why dependency management is the responsibility of the solver (and the states combined). Individual analyses can ask the solver, "hey, when this state gets updated, invoke me again" or, "hey, when you get updated, invoke me on the users of the value".

There are also performance implications. The dependents could be stored on the solver, but I moved them to the analysis state class to skip a map lookup when `propagateIfChanged` is called (which also improves cache locality). And `onUpdate` exists so that states can implement dependency logic outside of the big hash map, e.g. by using use-def chains. This results in significant performance boosts to constant propagation, so that it matches the performance of the current implementation (within 5%, despite the analysis being split in two). Separating the state itself and the dependency logic into separate classes would mean that both would have to be overridden...

I would agree that the API surface isn't perfect. `addDependency` is a function in `DataFlowSolver` but it just accesses private members of `AnalysisState`. That's a minor change and I should probably do that.


/// The program point to which the state belongs.

ProgramPoint point;

rriddle (suggested edit):

/// states calling `visit` on an analysis will be updated. This allows multiple
- /// analysis to plug in and provide values for the same state.
+ /// analyses to plug in and provide values for the same state.
///
/// Generally, when an analysis queries an uninitialized state, it is expected

#if LLVM_ENABLE_ABI_BREAKING_CHECKS

/// When compiling with debugging, keep a name for the analysis state.

StringRef debugName;

#endif // LLVM_ENABLE_ABI_BREAKING_CHECKS

/// Allow the framework to access the dependents.

friend class DataFlowSolver;

rriddle: Are these defined in this commit? I can't find what these are supposed to be referencing.

Mogball (author): soon^TM

};
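The three requirements listed in the `AnalysisState` comment can be seen in a small lattice. The sketch below is standalone C++ rather than MLIR code: `ConstantLattice`, its methods, and the choice of "overdefined" as the default are hypothetical, and only the `ChangeResult` name mirrors the framework. Because the lattice has finite height (uninitialized, then one constant, then overdefined), every chain of joins stops reporting `Change` after at most two steps, which is what requirement 2 asks of a state.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Mirrors the framework's two-valued change indicator.
enum class ChangeResult { NoChange, Change };

// Hypothetical constant-propagation lattice:
//   uninitialized < constant(c) < overdefined
class ConstantLattice {
public:
  // Requirement 1: a freshly created state is uninitialized.
  bool isUninitialized() const { return kind == Kind::Uninitialized; }
  bool isOverdefined() const { return kind == Kind::Overdefined; }
  std::optional<std::int64_t> getConstant() const {
    if (kind == Kind::Constant)
      return value;
    return std::nullopt;
  }

  // Requirement 2: join only moves up a finite-height lattice, so repeated
  // updates eventually stop reporting Change.
  ChangeResult join(std::int64_t c) {
    switch (kind) {
    case Kind::Uninitialized:
      kind = Kind::Constant;
      value = c;
      return ChangeResult::Change;
    case Kind::Constant:
      if (value == c)
        return ChangeResult::NoChange;
      kind = Kind::Overdefined;
      return ChangeResult::Change;
    case Kind::Overdefined:
      return ChangeResult::NoChange;
    }
    return ChangeResult::NoChange;
  }

  // Requirement 3: a state that is still uninitialized can be forced to a
  // conservative default, here overdefined.
  ChangeResult defaultInitialize() {
    if (!isUninitialized())
      return ChangeResult::NoChange;
    kind = Kind::Overdefined;
    return ChangeResult::Change;
  }

private:
  enum class Kind { Uninitialized, Constant, Overdefined };
  Kind kind = Kind::Uninitialized;
  std::int64_t value = 0;
};
```

Picking overdefined as the default is the pessimistic choice: a state that no analysis could compute is treated as "could be anything" rather than as some specific constant.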

//===----------------------------------------------------------------------===//

// DataFlowAnalysis

//===----------------------------------------------------------------------===//

/// Base class for all data-flow analyses. A child analysis is expected to build

/// an initial dependency graph (and optionally provide an initial state) when

/// initialized and define transfer functions when visiting program points.

///

/// In classical data-flow analysis, the dependency graph is fixed and analyses

/// define explicit transfer functions between input states and output states.

rriddle: Could we just call this `initialize`?

/// In this framework, however, the dependency graph can change during the

/// analysis, and transfer functions are opaque such that the solver doesn't

/// know what states calling `visit` on an analysis will be updated. This allows

/// multiple analyses to plug in and provide values for the same state.

///

/// Generally, when an analysis queries an uninitialized state, it is expected

/// to "bail out", i.e., not provide any updates. When the value is initialized,

/// the solver will re-invoke the analysis. If the solver exhausts its worklist,

rriddle (suggested edit, to avoid "update .. update"):

/// The function is expected to create dependencies on queried states and
- /// propagate updates on updated states. A dependency can be created by
+ /// propagate updates on changed states. A dependency can be created by
/// calling `drawDependency` between the input state and a program point,

/// however, and there are still uninitialized states, the solver "nudges" the

/// analyses by default-initializing those states.

class DataFlowAnalysis {

public:

virtual ~DataFlowAnalysis();

/// Create an analysis with a reference to the parent solver.

explicit DataFlowAnalysis(DataFlowSolver &solver);

/// Initialize the analysis from the provided top-level operation by building

/// an initial dependency graph between all program points of interest. This

/// can be implemented by calling `visit` on all program points of interest

/// below the top-level operation.

///

/// An analysis can optionally provide initial values to certain analysis

/// states to influence the evolution of the analysis.

virtual LogicalResult initialize(Operation *top) = 0;

/// Visit the given program point. This function is invoked by the solver on

/// this analysis with a given program point when a dependent analysis state

/// is updated. The function is similar to a transfer function; it queries

/// certain analysis states and sets other states.

///

/// The function is expected to create dependencies on queried states and

/// propagate updates on changed states. A dependency can be created by

/// calling `addDependency` between the input state and a program point,

/// indicating that, if the state is updated, the solver should invoke `visit`

/// on the program point. The dependent point does not have to be the same as

/// the provided point. An update to a state is propagated by calling

/// `propagateIfChanged` on the state. If the state has changed, then all its

/// dependents are placed on the worklist.

///

/// The dependency graph does not need to be static. Each invocation of

/// `visit` can add new dependencies, but these dependencies will not be

/// dynamically added to the worklist because the solver doesn't know what

/// will provide a value for them.

virtual LogicalResult visit(ProgramPoint point) = 0;

protected:

/// Create a dependency between the given analysis state and program point

/// on this analysis.

void addDependency(AnalysisState *state, ProgramPoint point);

/// Propagate an update to a state if it changed.

void propagateIfChanged(AnalysisState *state, ChangeResult changed);

/// Register a custom program point class.

template <typename PointT>

void registerPointKind() {

solver.uniquer.registerParametricStorageType<PointT>();

}

/// Get or create a custom program point.

template <typename PointT, typename... Args>

PointT *getProgramPoint(Args &&...args) {

return solver.getProgramPoint<PointT>(std::forward<Args>(args)...);

}

/// Get the analysis state associated with the program point. The returned

/// state is expected to be "write-only", and any updates need to be

/// propagated by `propagateIfChanged`.

template <typename StateT, typename PointT>

StateT *getOrCreate(PointT point) {

return solver.getOrCreateState<StateT>(point);

}

/// Get a read-only analysis state for the given point and create a dependency

/// on `dependent`. If the returned state is updated elsewhere, this analysis is

/// re-invoked on the dependent.

template <typename StateT, typename PointT>

const StateT *getOrCreateFor(ProgramPoint dependent, PointT point) {

StateT *state = getOrCreate<StateT>(point);

addDependency(state, dependent);

return state;

}

#if LLVM_ENABLE_ABI_BREAKING_CHECKS

/// When compiling with debugging, keep a name for the analysis.

StringRef debugName;

#endif // LLVM_ENABLE_ABI_BREAKING_CHECKS

private:

/// The parent data-flow solver.

DataFlowSolver &solver;

/// Allow the data-flow solver to access the internals of this class.

friend class DataFlowSolver;

};

template <typename AnalysisT, typename... Args>

AnalysisT *DataFlowSolver::load(Args &&...args) {

childAnalyses.emplace_back(new AnalysisT(*this, std::forward<Args>(args)...));

#if LLVM_ENABLE_ABI_BREAKING_CHECKS

childAnalyses.back().get()->debugName = llvm::getTypeName<AnalysisT>();

#endif // LLVM_ENABLE_ABI_BREAKING_CHECKS

return static_cast<AnalysisT *>(childAnalyses.back().get());

}

template <typename StateT, typename PointT>

StateT *DataFlowSolver::getOrCreateState(PointT point) {

std::unique_ptr<AnalysisState> &state =

analysisStates[{ProgramPoint(point), TypeID::get<StateT>()}];

if (!state) {

state = std::unique_ptr<StateT>(new StateT(point));

#if LLVM_ENABLE_ABI_BREAKING_CHECKS

state->debugName = llvm::getTypeName<StateT>();

#endif // LLVM_ENABLE_ABI_BREAKING_CHECKS

}

return static_cast<StateT *>(state.get());

}
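`getOrCreateState` keys its storage on the pair (program point, state `TypeID`), so several analyses can attach different kinds of state to the same point, and a repeated lookup returns the same object. A minimal standalone sketch of that pattern follows, assuming `std::map` and `std::type_index` as stand-ins for `DenseMap` and `TypeID`; `StateBase`, `StateStore`, `Liveness`, and `Constness` are hypothetical names, not MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Stand-in for AnalysisState: a polymorphic base so that one map can own
// states of different concrete types.
struct StateBase {
  explicit StateBase(const void *point) : point(point) {}
  virtual ~StateBase() = default;
  const void *point;
};

// Stand-in for the solver's state storage.
class StateStore {
public:
  // Return the StateT attached to `point`, creating it on first use. Later
  // calls with the same (point, type) pair return the same object.
  template <typename StateT>
  StateT *getOrCreate(const void *point) {
    Key key{reinterpret_cast<std::uintptr_t>(point),
            std::type_index(typeid(StateT))};
    std::unique_ptr<StateBase> &slot = states[key];
    if (!slot)
      slot = std::make_unique<StateT>(point);
    return static_cast<StateT *>(slot.get());
  }

  std::size_t size() const { return states.size(); }

private:
  // Keyed on (point, state type), like the solver's analysisStates map.
  using Key = std::pair<std::uintptr_t, std::type_index>;
  std::map<Key, std::unique_ptr<StateBase>> states;
};

// Two hypothetical state kinds that may be attached to the same point.
struct Liveness : StateBase {
  using StateBase::StateBase;
  bool live = false;
};
struct Constness : StateBase {
  using StateBase::StateBase;
  int value = 0;
};
```

Because the map owns the states through `std::unique_ptr<StateBase>`, the base destructor has to be virtual, just as `AnalysisState` declares `virtual ~AnalysisState()`.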

inline raw_ostream &operator<<(raw_ostream &os, const AnalysisState &state) {

state.print(os);

return os;

}

inline raw_ostream &operator<<(raw_ostream &os, ProgramPoint point) {

point.print(os);

return os;

}

} // end namespace mlir

namespace llvm {

/// Allow hashing of program points.

template <>

struct DenseMapInfo<mlir::ProgramPoint>

: public DenseMapInfo<mlir::ProgramPoint::ParentTy> {};

} // end namespace llvm

#endif // MLIR_ANALYSIS_DATAFLOWFRAMEWORK_H

mlir/lib/Analysis/CMakeLists.txt

set(LLVM_OPTIONAL_SOURCES
AliasAnalysis/LocalAliasAnalysis.cpp
)

add_mlir_library(MLIRAnalysis
AliasAnalysis.cpp
BufferViewFlowAnalysis.cpp
CallGraph.cpp
DataFlowAnalysis.cpp
+ DataFlowFramework.cpp
DataLayoutAnalysis.cpp
IntRangeAnalysis.cpp
Liveness.cpp
SliceAnalysis.cpp

AliasAnalysis/LocalAliasAnalysis.cpp

ADDITIONAL_HEADER_DIRS

mlir/lib/Analysis/DataFlowFramework.cpp

This file was added.

//===- DataFlowFramework.cpp - A generic framework for data-flow analysis -===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

#include "mlir/Analysis/DataFlowFramework.h"

#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "dataflow"

#if LLVM_ENABLE_ABI_BREAKING_CHECKS

#define DATAFLOW_DEBUG(X) LLVM_DEBUG(X)

rriddle: Why? Please prefer using `LLVM_DEBUG` directly instead.

Mogball (author): It was a little annoying wrapping it with `LLVM_ENABLE_ABI_BREAKING_CHECKS` everywhere...

rriddle: Not sure what you mean, this is a .cpp; you don't need to use `LLVM_ENABLE_ABI_BREAKING_CHECKS` anywhere...

rriddle: At least, I'm not sure what situations we have that set `NDEBUG` but not `LLVM_ENABLE_ABI_BREAKING_CHECKS`.

Mogball (author): Internally at Google, `LLVM_ENABLE_ABI_BREAKING_CHECKS` is always turned off, even when `NDEBUG` is not set (the two `debugName` fields are not compiled when `LLVM_ENABLE_ABI_BREAKING_CHECKS` is not set).

Alternatively, I can make the `debugName` fields always compile.

rriddle: Setting aside how horribly broken of a behavior that is (that isn't your fault), I support your original patch of having a wrapper DEBUG macro. Having to worry about differences between `NDEBUG`/`LLVM_ENABLE_ABI_BREAKING_CHECKS` is really hacky....

#else

#define DATAFLOW_DEBUG(X)

#endif // LLVM_ENABLE_ABI_BREAKING_CHECKS
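The thread above hinges on one coupling: `debugName` only exists when `LLVM_ENABLE_ABI_BREAKING_CHECKS` is set, while `LLVM_DEBUG` is keyed on `NDEBUG`, so a debug statement that names the field has to vanish under the same condition as the field itself. The standalone sketch below reproduces that with a hypothetical `SKETCH_ENABLE_DEBUG_NAMES` switch and a hypothetical `SKETCH_DEBUG` macro (neither is an LLVM name). With the switch at 0, the field and every statement that mentions it compile away together.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical switch playing the role of LLVM_ENABLE_ABI_BREAKING_CHECKS.
#ifndef SKETCH_ENABLE_DEBUG_NAMES
#define SKETCH_ENABLE_DEBUG_NAMES 0
#endif

// Debug statements are kept or dropped under the same condition as the
// field they reference, so a build with the switch off never sees a use of
// the missing `debugName` member.
#if SKETCH_ENABLE_DEBUG_NAMES
#define SKETCH_DEBUG(X)                                                        \
  do {                                                                         \
    X;                                                                         \
  } while (false)
#else
#define SKETCH_DEBUG(X)                                                        \
  do {                                                                         \
  } while (false)
#endif

struct Analysis {
#if SKETCH_ENABLE_DEBUG_NAMES
  const char *debugName = "Analysis";
#endif
};

// Counts how many debug statements actually executed.
int debugStatementsRun = 0;

void visit(const Analysis &analysis) {
  (void)analysis; // Unused when the switch is off.
  SKETCH_DEBUG(++debugStatementsRun;
               std::cerr << "Invoking " << analysis.debugName << "\n");
}
```

Had `SKETCH_DEBUG` been keyed on `NDEBUG` instead, a build with `NDEBUG` unset and the switch at 0 would fail to compile, which is the configuration described in the thread.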

using namespace mlir;

rriddle: Please prefer `using namespace mlir;` instead.

//===----------------------------------------------------------------------===//

// GenericProgramPoint

//===----------------------------------------------------------------------===//

GenericProgramPoint::~GenericProgramPoint() = default;

//===----------------------------------------------------------------------===//

// AnalysisState

//===----------------------------------------------------------------------===//

AnalysisState::~AnalysisState() = default;

//===----------------------------------------------------------------------===//

// ProgramPoint

//===----------------------------------------------------------------------===//

void ProgramPoint::print(raw_ostream &os) const {

mehdi_amini: Nit: Spurious braces multiple times, here and elsewhere.

if (isNull()) {

os << "<NULL POINT>";

return;

}

if (auto *programPoint = dyn_cast<GenericProgramPoint *>())

return programPoint->print(os);

if (auto *op = dyn_cast<Operation *>())

return op->print(os);

if (auto value = dyn_cast<Value>())

return value.print(os);

if (auto *block = dyn_cast<Block *>())

return block->print(os);

auto *region = get<Region *>();

os << "{\n";

for (Block &block : *region) {

block.print(os);

os << "\n";

rriddle: nit: Can you put comment blocks between the different class implementations? Makes the file a little easier to read.

}

os << "}";

}

Location ProgramPoint::getLoc() const {

if (auto *programPoint = dyn_cast<GenericProgramPoint *>())

return programPoint->getLoc();

if (auto *op = dyn_cast<Operation *>())

return op->getLoc();

if (auto value = dyn_cast<Value>())

return value.getLoc();

if (auto *block = dyn_cast<Block *>())

return block->getParent()->getLoc();

return get<Region *>()->getLoc();

}

//===----------------------------------------------------------------------===//

// DataFlowSolver

//===----------------------------------------------------------------------===//

LogicalResult DataFlowSolver::initializeAndRun(Operation *top) {

// Initialize the analyses.

for (DataFlowAnalysis &analysis : llvm::make_pointee_range(childAnalyses)) {

DATAFLOW_DEBUG(llvm::dbgs()

<< "Priming analysis: " << analysis.debugName << "\n");

if (failed(analysis.initialize(top)))

return failure();

}

// Run the analysis until fixpoint.

ProgramPoint point;

DataFlowAnalysis *analysis;

do {

// Exhaust the worklist.

while (!worklist.empty()) {

std::tie(point, analysis) = worklist.front();

worklist.pop();

DATAFLOW_DEBUG(llvm::dbgs() << "Invoking '" << analysis->debugName

<< "' on: " << point << "\n");

if (failed(analysis->visit(point)))

return failure();

}

// "Nudge" the state of the analysis by forcefully initializing states that

// are still uninitialized. All uninitialized states in the graph can be

// initialized in any order because the analysis reached fixpoint, meaning

// that there are no work items that would have further nudged the analysis.

for (AnalysisState &state :

llvm::make_pointee_range(llvm::make_second_range(analysisStates))) {

if (!state.isUninitialized())

continue;

DATAFLOW_DEBUG(llvm::dbgs() << "Default initializing " << state.debugName

<< " of " << state.point << "\n");

propagateIfChanged(&state, state.defaultInitialize());

}

// Iterate until all states are in some initialized state and the worklist

// is exhausted.

} while (!worklist.empty());

return success();

}
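`initializeAndRun` alternates two phases: drain the worklist, then default-initialize whatever is still uninitialized and go around again if that produced new work. The standalone sketch below reproduces that control flow on the `test_fork` CFG from `test-foo-analysis.mlir`, flattening each block entry and each tagged op into one program point. Everything here (the `Solver` struct, integer point IDs, the `visit` lambda, and recomputing each point from scratch instead of joining cumulatively) is a hypothetical simplification, not the MLIR API; the final states are meant to agree with the test's `init -> 1`, `a -> 3`, `b -> 5`, and `end -> 6` expectations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical, MLIR-free model of the solver loop. Program points are
// integer IDs and there is only one analysis, so a work item is just a point.
struct Solver {
  std::vector<std::optional<std::uint64_t>> states; // nullopt: uninitialized
  std::vector<std::vector<int>> dependents;         // point -> points to revisit
  std::queue<int> worklist;
  std::function<void(Solver &, int)> visit;         // the transfer function

  explicit Solver(int numPoints) : states(numPoints), dependents(numPoints) {}

  // Read `point` on behalf of `dependent` and record the dependency, roughly
  // what getOrCreateFor/addDependency do. Duplicate edges are skipped.
  const std::optional<std::uint64_t> &readFor(int dependent, int point) {
    std::vector<int> &deps = dependents[point];
    if (std::find(deps.begin(), deps.end(), dependent) == deps.end())
      deps.push_back(dependent);
    return states[point];
  }

  // Write a state and enqueue its dependents only if the value changed,
  // roughly what propagateIfChanged does.
  void set(int point, std::optional<std::uint64_t> value) {
    if (states[point] == value)
      return;
    states[point] = value;
    for (int dependent : dependents[point])
      worklist.push(dependent);
  }

  void run() {
    const int numPoints = static_cast<int>(states.size());
    // Stand-in for initialize(): visit every point once to build the graph.
    for (int p = 0; p < numPoints; ++p)
      visit(*this, p);
    do {
      // Exhaust the worklist.
      while (!worklist.empty()) {
        int p = worklist.front();
        worklist.pop();
        visit(*this, p);
      }
      // Nudge: default-initialize anything still uninitialized (to 0 here).
      for (int p = 0; p < numPoints; ++p)
        if (!states[p])
          set(p, std::uint64_t{0});
    } while (!worklist.empty());
  }
};

// test_fork, flattened: each point XORs the join of its predecessors with its
// `foo` value. Points: 0 entry, 1 "init", 2 ^bb0, 3 "a", 4 ^bb1, 5 "b",
// 6 ^bb2, 7 "end".
std::vector<std::optional<std::uint64_t>> runTestFork() {
  const std::vector<std::vector<int>> preds = {{},  {0}, {1},    {2},
                                               {1}, {4}, {3, 5}, {6}};
  const std::vector<std::uint64_t> foo = {0, 1, 0, 2, 0, 4, 0, 0};
  Solver solver(8);
  solver.visit = [&](Solver &s, int p) {
    if (preds[p].empty())
      return; // The entry state is left for the nudge.
    std::optional<std::uint64_t> joined;
    for (int q : preds[p]) {
      const std::optional<std::uint64_t> &in = s.readFor(p, q);
      if (in)
        joined = joined ? (*joined ^ *in) : *in;
    }
    if (!joined)
      return; // Bail out: no initialized input yet.
    s.set(p, *joined ^ foo[p]);
  };
  solver.run();
  return solver.states;
}
```

XOR is not a monotone join, so this toy only reaches a fixpoint on acyclic graphs such as `test_fork`; on a loop like the one in `test_simple_loop` its recomputed states would keep changing. In a well-formed analysis, requirement 2 on `AnalysisState` is what rules that out.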

void DataFlowSolver::propagateIfChanged(AnalysisState *state,

ChangeResult changed) {

if (changed == ChangeResult::Change) {

DATAFLOW_DEBUG(llvm::dbgs() << "Propagating update to " << state->debugName

<< " of " << state->point << "\n"

<< "Value: " << *state << "\n");

for (const WorkItem &item : state->dependents)

enqueue(item);

rriddle (suggested edit):

(void)inserted;
- if (inserted) {
- DATAFLOW_DEBUG(llvm::dbgs()
+ DATAFLOW_DEBUG({
+ if (inserted) {
+ llvm::dbgs()
<< "Creating dependency between " << state->debugName
<< " of " << state->point << "\nand " << analysis->debugName
- << " on " << point << "\n");
+ << " on " << point << "\n";
- }
+ }
+ });
}

DataFlowAnalysis::~DataFlowAnalysis() = default;

state->onUpdate(this);

}

void DataFlowSolver::addDependency(AnalysisState *state,

DataFlowAnalysis *analysis,

ProgramPoint point) {

auto inserted = state->dependents.insert({point, analysis});

(void)inserted;

DATAFLOW_DEBUG({

if (inserted) {

llvm::dbgs() << "Creating dependency between " << state->debugName

<< " of " << state->point << "\nand " << analysis->debugName

<< " on " << point << "\n";

}

});

}
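`addDependency` relies on `SetVector::insert` reporting whether the item was new, so a repeated query from the same analysis at the same point neither duplicates a work item nor prints a second debug line, while iteration order stays deterministic. The standalone sketch below mimics that with a `std::vector` plus a `std::set`; `InsertionOrderedSet`, the `State` struct, and the string-tagged work items are hypothetical stand-ins for `SetVector<DataFlowSolver::WorkItem>`. It also illustrates the composability point from the review: two unrelated analyses subscribe to the same state, and one update re-enqueues both without either knowing about the other.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A (point, analysis) work item, with both halves modeled as strings.
using WorkItem = std::pair<std::string, std::string>;

// Minimal insertion-ordered set: unique elements, deterministic iteration.
class InsertionOrderedSet {
public:
  // Returns true only if `item` was not already present (like SetVector).
  bool insert(const WorkItem &item) {
    if (!seen.insert(item).second)
      return false;
    order.push_back(item);
    return true;
  }
  const std::vector<WorkItem> &items() const { return order; }

private:
  std::set<WorkItem> seen;
  std::vector<WorkItem> order;
};

// A state that stores its own dependents, as AnalysisState does.
struct State {
  InsertionOrderedSet dependents;
};

// Analogue of propagateIfChanged: on change, push every dependent onto the
// worklist in insertion order.
void propagateIfChanged(const State &state, bool changed,
                        std::vector<WorkItem> &worklist) {
  if (!changed)
    return;
  for (const WorkItem &item : state.dependents.items())
    worklist.push_back(item);
}
```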

rriddle: Can you just inline these into the header?

//===----------------------------------------------------------------------===//

// DataFlowAnalysis

//===----------------------------------------------------------------------===//

DataFlowAnalysis::~DataFlowAnalysis() = default;

DataFlowAnalysis::DataFlowAnalysis(DataFlowSolver &solver) : solver(solver) {}

void DataFlowAnalysis::addDependency(AnalysisState *state, ProgramPoint point) {

solver.addDependency(state, this, point);

}

void DataFlowAnalysis::propagateIfChanged(AnalysisState *state,

ChangeResult changed) {

solver.propagateIfChanged(state, changed);

}

mlir/test/Analysis/test-foo-analysis.mlir

This file was added.

				// RUN: mlir-opt -split-input-file -pass-pipeline='func.func(test-foo-analysis)' %s 2>&1 | FileCheck %s

				// CHECK-LABEL: function: @test_default_init
				func.func @test_default_init() -> () {
				// CHECK: a -> 0
				"test.foo"() {tag = "a"} : () -> ()
				return
				}

				// -----

				// CHECK-LABEL: function: @test_one_join
				func.func @test_one_join() -> () {
				// CHECK: a -> 0
				"test.foo"() {tag = "a"} : () -> ()
				// CHECK: b -> 1
				"test.foo"() {tag = "b", foo = 1 : ui64} : () -> ()
				return
				}

				// -----

				// CHECK-LABEL: function: @test_two_join
				func.func @test_two_join() -> () {
				// CHECK: a -> 0
				"test.foo"() {tag = "a"} : () -> ()
				// CHECK: b -> 1
				"test.foo"() {tag = "b", foo = 1 : ui64} : () -> ()
				// CHECK: c -> 0
				"test.foo"() {tag = "c", foo = 1 : ui64} : () -> ()
				return
				}

				// -----

				// CHECK-LABEL: function: @test_fork
				func.func @test_fork() -> () {
				// CHECK: init -> 1
				"test.branch"() [^bb0, ^bb1] {tag = "init", foo = 1 : ui64} : () -> ()

				^bb0:
				// CHECK: a -> 3
				"test.branch"() [^bb2] {tag = "a", foo = 2 : ui64} : () -> ()

				^bb1:
				// CHECK: b -> 5
				"test.branch"() [^bb2] {tag = "b", foo = 4 : ui64} : () -> ()

				^bb2:
				// CHECK: end -> 6
				"test.foo"() {tag = "end"} : () -> ()
				return

				}

				// -----

				// CHECK-LABEL: function: @test_simple_loop
				func.func @test_simple_loop() -> () {
				// CHECK: init -> 1
				"test.branch"() [^bb0] {tag = "init", foo = 1 : ui64} : () -> ()

				^bb0:
				// CHECK: a -> 1
				"test.foo"() {tag = "a", foo = 3 : ui64} : () -> ()
				"test.branch"() [^bb0, ^bb1] : () -> ()

				^bb1:
				// CHECK: end -> 3
				"test.foo"() {tag = "end"} : () -> ()
				return
				}

				// -----

				// CHECK-LABEL: function: @test_double_loop
				func.func @test_double_loop() -> () {
				// CHECK: init -> 2
				"test.branch"() [^bb0] {tag = "init", foo = 2 : ui64} : () -> ()

				^bb0:
				// CHECK: a -> 1
				"test.foo"() {tag = "a", foo = 3 : ui64} : () -> ()
				"test.branch"() [^bb0, ^bb1] : () -> ()

				^bb1:
				// CHECK: b -> 4
				"test.foo"() {tag = "b", foo = 5 : ui64} : () -> ()
				"test.branch"() [^bb0, ^bb2] : () -> ()

				^bb2:
				// CHECK: end -> 4
				"test.foo"() {tag = "end"} : () -> ()
				return
				}

mlir/test/lib/Analysis/CMakeLists.txt

# Exclude tests from libMLIR.so
add_mlir_library(MLIRTestAnalysis
TestAliasAnalysis.cpp
TestCallGraph.cpp
TestDataFlow.cpp
+ TestDataFlowFramework.cpp
TestLiveness.cpp
TestMatchReduction.cpp
TestMemRefBoundCheck.cpp
TestMemRefDependenceCheck.cpp
TestMemRefStrideCalculation.cpp
TestSlice.cpp

mlir/test/lib/Analysis/TestDataFlowFramework.cpp

This file was added.

				//===- TestDataFlowFramework.cpp - Test data-flow analysis framework ------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Analysis/DataFlowFramework.h"
				#include "mlir/Dialect/Func/IR/FuncOps.h"
				#include "mlir/Pass/Pass.h"

				using namespace mlir;

				namespace {
				/// This analysis state represents an integer that is XOR'd with other states.
				class FooState : public AnalysisState {
				public:
				MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(FooState)

				using AnalysisState::AnalysisState;

				/// Default-initialize the state to zero.
				ChangeResult defaultInitialize() override { return join(0); }

				/// Returns true if the state is uninitialized.
				bool isUninitialized() const override { return !state; }

				/// Print the integer value or "none" if uninitialized.
				void print(raw_ostream &os) const override {
				if (state)
				os << *state;
				else
				os << "none";
				}

				/// Join the state with another. If either is uninitialized, take the
				/// initialized value. Otherwise, XOR the integer values.
				ChangeResult join(const FooState &rhs) {
				if (rhs.isUninitialized())
				return ChangeResult::NoChange;
				return join(*rhs.state);
				}
				ChangeResult join(uint64_t value) {
				if (isUninitialized()) {
				state = value;
				return ChangeResult::Change;
				}
				uint64_t before = *state;
				state = before ^ value;
				return before == *state ? ChangeResult::NoChange : ChangeResult::Change;
				}

				/// Set the value of the state directly.
				ChangeResult set(const FooState &rhs) {
				if (state == rhs.state)
				return ChangeResult::NoChange;
				state = rhs.state;
				return ChangeResult::Change;
				}

				/// Returns the integer value of the state.
				uint64_t getValue() const { return *state; }

				private:
				/// An optional integer value.
				Optional<uint64_t> state;
				};

				/// This analysis computes `FooState` across operations and control-flow edges.
				/// If an op specifies a `foo` integer attribute, the contained value is XOR'd
				/// with the value before the operation.
				class FooAnalysis : public DataFlowAnalysis {
				public:
				MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(FooAnalysis)

				using DataFlowAnalysis::DataFlowAnalysis;

				LogicalResult initialize(Operation *top) override;
				LogicalResult visit(ProgramPoint point) override;

				private:
				void visitBlock(Block *block);
				void visitOperation(Operation *op);
				};

				struct TestFooAnalysisPass
				: public PassWrapper<TestFooAnalysisPass, OperationPass<func::FuncOp>> {
				MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(TestFooAnalysisPass)

				StringRef getArgument() const override { return "test-foo-analysis"; }

				void runOnOperation() override;
				};
				} // namespace

				LogicalResult FooAnalysis::initialize(Operation *top) {
				if (top->getNumRegions() != 1)
				return top->emitError("expected a single region top-level op");

				// Initialize the top-level state.
				getOrCreate<FooState>(&top->getRegion(0).front())->join(0);

				// Visit all nested blocks and operations.
				for (Block &block : top->getRegion(0)) {
				visitBlock(&block);
				for (Operation &op : block) {
				if (op.getNumRegions())
				return op.emitError("unexpected op with regions");
				visitOperation(&op);
				}
				}
				return success();
				}

				LogicalResult FooAnalysis::visit(ProgramPoint point) {
				if (auto *op = point.dyn_cast<Operation *>()) {
				visitOperation(op);
				return success();
				}
				if (auto *block = point.dyn_cast<Block *>()) {
				visitBlock(block);
				return success();
				}
				return emitError(point.getLoc(), "unknown point kind");
				}

				void FooAnalysis::visitBlock(Block *block) {
				if (block->isEntryBlock()) {
				// This is the initial state. Let the framework default-initialize it.
				return;
				}
				FooState *state = getOrCreate<FooState>(block);
				ChangeResult result = ChangeResult::NoChange;
				for (Block *pred : block->getPredecessors()) {
				// Join the state at the terminators of all predecessors.
				const FooState *predState =
				getOrCreateFor<FooState>(block, pred->getTerminator());
				result |= state->join(*predState);
				}
				propagateIfChanged(state, result);
				}

				void FooAnalysis::visitOperation(Operation *op) {
				FooState *state = getOrCreate<FooState>(op);
				ChangeResult result = ChangeResult::NoChange;

				// Copy the state across the operation.
				const FooState *prevState;
				if (Operation *prev = op->getPrevNode())
				prevState = getOrCreateFor<FooState>(op, prev);
				else
				prevState = getOrCreateFor<FooState>(op, op->getBlock());
				result |= state->set(*prevState);

				// Modify the state with the attribute, if specified.
				if (auto attr = op->getAttrOfType<IntegerAttr>("foo")) {
				uint64_t value = attr.getUInt();
				result |= state->join(value);
				}
				propagateIfChanged(state, result);
				}

				void TestFooAnalysisPass::runOnOperation() {
				func::FuncOp func = getOperation();
				DataFlowSolver solver;
				solver.load<FooAnalysis>();
				if (failed(solver.initializeAndRun(func)))
				return signalPassFailure();

				raw_ostream &os = llvm::errs();
				os << "function: @" << func.getSymName() << "\n";

				func.walk([&](Operation *op) {
				auto tag = op->getAttrOfType<StringAttr>("tag");
				if (!tag)
				return;
				const FooState *state = solver.lookupState<FooState>(op);
				assert(state && !state->isUninitialized());
				os << tag.getValue() << " -> " << state->getValue() << "\n";
				});
				}

				namespace mlir {
				namespace test {
				void registerTestFooAnalysisPass() { PassRegistration<TestFooAnalysisPass>(); }
				} // namespace test
				} // namespace mlir
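
The solver drives `FooAnalysis` to a fixed point: `visitBlock` joins the states at the predecessors' terminators, `visitOperation` carries the state across an op and joins in its `foo` value, and `propagateIfChanged` revisits the dependents of a state only when that state changed. Below is a self-contained sketch of that iteration on a toy CFG; it is not the `DataFlowSolver` implementation. It tracks one state per block rather than per operation and assumes a lattice that is either uninitialized or a bitmask joined by bitwise OR. `ToyBlock`, `joinInto`, and `solve` are hypothetical names, and `FooState`'s actual join is not shown in this excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical toy block: the "foo" value of each op (0 if the op has no
// attribute) and the indices of the block's predecessors.
struct ToyBlock {
  std::vector<uint64_t> fooValues;
  std::vector<int> preds;
};

// Hypothetical lattice: std::nullopt is "uninitialized"; otherwise a bitmask
// joined with bitwise OR. Returns true on change (cf. ChangeResult::Change).
static bool joinInto(std::optional<uint64_t> &state, uint64_t value) {
  if (!state) {
    state = value;
    return true;
  }
  uint64_t before = *state;
  *state |= value;
  return *state != before;
}

// Iterates to a fixed point and returns the state at the end of each block.
std::vector<std::optional<uint64_t>> solve(const std::vector<ToyBlock> &blocks) {
  // As with MLIR entry blocks, block 0 must have no predecessors.
  assert(blocks.empty() || blocks[0].preds.empty());
  std::size_t n = blocks.size();
  std::vector<std::optional<uint64_t>> inState(n), outState(n);

  // A block's exit state is read by its successors, so they are the
  // dependents to revisit when that state changes.
  std::vector<std::vector<int>> succs(n);
  for (std::size_t b = 0; b < n; ++b)
    for (int p : blocks[b].preds)
      succs[p].push_back(static_cast<int>(b));

  if (n > 0)
    inState[0] = 0; // Seed the entry block, like initialize() joining 0.
  std::deque<int> worklist;
  for (std::size_t b = 0; b < n; ++b)
    worklist.push_back(static_cast<int>(b));

  while (!worklist.empty()) {
    int b = worklist.front();
    worklist.pop_front();
    // visitBlock: join the exit states of all initialized predecessors.
    for (int p : blocks[b].preds)
      if (outState[p])
        joinInto(inState[b], *outState[p]);
    if (!inState[b])
      continue; // Nothing has reached this block yet.
    // visitOperation: carry the state across each op, joining its value.
    uint64_t state = *inState[b];
    for (uint64_t v : blocks[b].fooValues)
      state |= v;
    // propagateIfChanged: revisit dependents only if the state changed.
    if (joinInto(outState[b], state))
      for (int s : succs[b])
        worklist.push_back(s);
  }
  return outState;
}
```

On a loop where `^bb0` (foo = 1) branches to a header `^bb1` (no attribute), which branches to a body `^bb2` (foo = 2) that jumps back, and to an exit `^bb3` (foo = 4), `solve` yields exit states 1, 3, 3 and 7. The header must be revisited once the body's state is known, which is the re-enqueueing that `propagateIfChanged` performs.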

mlir/tools/mlir-opt/mlir-opt.cpp

@@ ... @@
 void registerTestGpuSerializeToHsacoPass();
 void registerTestDataFlowPass();
 void registerTestDataLayoutQuery();
 void registerTestDecomposeCallGraphTypes();
 void registerTestDiagnosticsPass();
 void registerTestDominancePass();
 void registerTestDynamicPipelinePass();
 void registerTestExpandMathPass();
+void registerTestFooAnalysisPass();
 void registerTestComposeSubView();
 void registerTestMultiBuffering();
 void registerTestIntRangeInference();
 void registerTestIRVisitorsPass();
 void registerTestGenericIRVisitorsPass();
 void registerTestGenericIRVisitorsInterruptPass();
 void registerTestInterfaces();
 void registerTestLinalgCodegenStrategy();
@@ ... @@ #if MLIR_ROCM_CONVERSIONS_ENABLED
   mlir::test::registerTestGpuSerializeToHsacoPass();
 #endif
   mlir::test::registerTestDecomposeCallGraphTypes();
   mlir::test::registerTestDataFlowPass();
   mlir::test::registerTestDataLayoutQuery();
   mlir::test::registerTestDominancePass();
   mlir::test::registerTestDynamicPipelinePass();
   mlir::test::registerTestExpandMathPass();
+  mlir::test::registerTestFooAnalysisPass();
   mlir::test::registerTestComposeSubView();
   mlir::test::registerTestMultiBuffering();
   mlir::test::registerTestIntRangeInference();
   mlir::test::registerTestIRVisitorsPass();
   mlir::test::registerTestGenericIRVisitorsPass();
   mlir::test::registerTestInterfaces();
   mlir::test::registerTestLinalgCodegenStrategy();
   mlir::test::registerTestLinalgElementwiseFusion();
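
The mlir-opt hunk only declares `registerTestFooAnalysisPass()` and calls it alongside the other test registrations; that call is what makes the pass's command-line argument known to the driver. As a rough, hypothetical model of the pattern (not MLIR's `PassRegistration` implementation), the sketch keeps a registry mapping an argument string to a factory. `ToyPass`, `registerToyPass`, `createToyPass`, and the argument `"test-foo-analysis"` are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a pass; not MLIR's Pass class.
struct ToyPass {
  virtual ~ToyPass() = default;
  virtual std::string name() const = 0;
};

using ToyPassFactory = std::function<std::unique_ptr<ToyPass>()>;

// Registry from command-line argument to factory. The function-local static
// is created on first use, avoiding static initialization order issues.
static std::map<std::string, ToyPassFactory> &toyPassRegistry() {
  static std::map<std::string, ToyPassFactory> registry;
  return registry;
}

// Rough analogue of constructing PassRegistration<T>: record a factory for T
// under its argument string.
template <typename T>
void registerToyPass() {
  assert(!toyPassRegistry().count(T::argument) && "pass registered twice");
  toyPassRegistry()[T::argument] = [] { return std::make_unique<T>(); };
}

// What a driver does when it sees `-<argument>` on its command line.
std::unique_ptr<ToyPass> createToyPass(const std::string &argument) {
  auto it = toyPassRegistry().find(argument);
  if (it == toyPassRegistry().end())
    return nullptr;
  return it->second();
}

// Hypothetical pass; the argument string is assumed, not taken from the patch.
struct ToyFooAnalysisPass : ToyPass {
  static constexpr const char *argument = "test-foo-analysis";
  std::string name() const override { return "ToyFooAnalysisPass"; }
};

// Same shape as mlir::test::registerTestFooAnalysisPass(): an explicit hook
// that the driver calls from main().
void registerToyFooAnalysisPass() { registerToyPass<ToyFooAnalysisPass>(); }
```

Registering through an explicit function call, as mlir-opt does, avoids relying on global constructors, which the LLVM Coding Standards discourage.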

utils/bazel/llvm-project-overlay/mlir/BUILD.bazel

@@ ... @@ cc_library(
     name = "Analysis",
     srcs = glob(
         [
             "lib/Analysis/*.cpp",
             "lib/Analysis/*.h",
             "lib/Analysis/*/*.cpp",
             "lib/Analysis/*/*.h",
         ],
-        exclude = [
-            "lib/Analysis/Vector*.cpp",
-            "lib/Analysis/Vector*.h",
-        ],
     ),
     hdrs = glob(
         [
             "include/mlir/Analysis/*.h",
             "include/mlir/Analysis/*/*.h",
         ],
-        exclude = ["include/mlir/Analysis/Vector*.h"],
     ),
     includes = ["include"],
     deps = [
         ":CallOpInterfaces",
         ":ControlFlowInterfaces",
         ":DataLayoutInterfaces",
         ":IR",
         ":InferIntRangeInterface",
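
The `srcs` and `hdrs` globs in this rule rely on Bazel's rule that `*` matches within a single path segment, so `lib/Analysis/*.cpp` picks up files directly in `lib/Analysis`, while `lib/Analysis/*/*.cpp` picks up files exactly one directory deeper. The sketch below models only that rule; it ignores `**`, `exclude`, and package boundaries, and `globMatches` is a hypothetical helper, not a Bazel API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a path such as "lib/Analysis/Foo.cpp" into its '/'-separated segments.
static std::vector<std::string> splitPath(const std::string &path) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t slash = path.find('/', start);
    parts.push_back(path.substr(start, slash - start));
    if (slash == std::string::npos)
      break;
    start = slash + 1;
  }
  return parts;
}

// Matches one segment; '*' matches any run of characters, including none.
static bool matchSegment(const std::string &pat, const std::string &seg) {
  std::size_t p = 0, s = 0, starP = std::string::npos, starS = 0;
  while (s < seg.size()) {
    if (p < pat.size() && pat[p] == '*') {
      starP = p++; // Remember the star; first try matching zero characters.
      starS = s;
    } else if (p < pat.size() && pat[p] == seg[s]) {
      ++p;
      ++s;
    } else if (starP != std::string::npos) {
      p = starP + 1; // Backtrack: let the last star absorb one more character.
      s = ++starS;
    } else {
      return false;
    }
  }
  while (p < pat.size() && pat[p] == '*')
    ++p;
  return p == pat.size();
}

// Hypothetical helper, not a Bazel API. Because segments are compared
// pairwise, a '*' can never match across a '/'.
bool globMatches(const std::string &pattern, const std::string &path) {
  assert(pattern.find("**") == std::string::npos && "'**' is not modeled");
  std::vector<std::string> pats = splitPath(pattern);
  std::vector<std::string> segs = splitPath(path);
  if (pats.size() != segs.size())
    return false;
  for (std::size_t i = 0; i < pats.size(); ++i)
    if (!matchSegment(pats[i], segs[i]))
      return false;
  return true;
}
```

Under this model, files two or more directories below `lib/Analysis` match neither pattern; Bazel's `**` would be needed to reach them.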

utils/bazel/llvm-project-overlay/mlir/test/BUILD.bazel

@@ ... @@ cc_library(
     srcs = glob(["lib/Analysis/*.cpp"]),
     includes = ["lib/Dialect/Test"],
     deps = [
         ":TestDialect",
         "//llvm:Support",
         "//mlir:Affine",
         "//mlir:AffineAnalysis",
         "//mlir:Analysis",
+        "//mlir:FuncDialect",
         "//mlir:IR",
         "//mlir:MemRefDialect",
         "//mlir:Pass",
         "//mlir:Support",
     ],
 )
 
 td_library(

Diff 436665
