This is an archive of the discontinued LLVM Phabricator instance.

[mlir][mlir-query] Introduce mlir-query tool with autocomplete support
ClosedPublic

Authored by devajithvs on Jul 12 2023, 3:06 PM.

Details

Summary

This commit adds the initial version of the mlir-query tool, which leverages the pre-existing matchers defined in mlir/include/mlir/IR/Matchers.h

The tool provides the following set of basic queries:

hasOpAttrName(string)
hasOpName(string)
isConstantOp()
isNegInfFloat()
isNegZeroFloat()
isNonZero()
isOne()
isOneFloat()
isPosInfFloat()
isPosZeroFloat()
isZero()
isZeroFloat()

Diff Detail

Event Timeline

devajithvs created this revision.Jul 12 2023, 3:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 3:06 PM
devajithvs edited the summary of this revision. (Show Details)Jul 12 2023, 3:14 PM
devajithvs published this revision for review.Jul 12 2023, 3:57 PM

Thanks for the contribution!

As a high-level remark, I would note that this is a bit monolithic where everything is in the tools, whereas in general we approach it from the point of view that the tool should be a minimal driver around a library implementation.

Thanks for the contribution!

As a high-level remark, I would note that this is a bit monolithic where everything is in the tools, whereas in general we approach it from the point of view that the tool should be a minimal driver around a library implementation.

Thanks Mehdi, I'll move all the implementation from the tools directory to keep it minimal, but where should it go?

Fix clang-format issue

I'll move all the implementation from the tools directory to keep it minimal, but where should it go?

A new library maybe?

Also we discussed the possibility to reuse PDLL: what's missing from PDLL to use it instead of a custom language here?

I'll move all the implementation from the tools directory to keep it minimal, but where should it go?

A new library maybe?

Also we discussed the possibility to reuse PDLL: what's missing from PDLL to use it instead of a custom language here?

I see that orthogonal: that's a different input language. The initial usage was similar to clang-query where one constructs a matcher incrementally for use in C++. So you'd built and could then pretty much copy and paste reuse. Useful to do with PDLL too but not replacement IMHO.

I wouldn't consider this orthogonal: this is a positioning of the tool, it's raison d'être here. I've never heard that the use case for this was "help the development of pattern matcher" for example. Is this really the main motivation? I'm questioning the need for this really.

I wouldn't consider this orthogonal: this is a positioning of the tool, it's raison d'être here. I've never heard that the use case for this was "help the development of pattern matcher" for example. Is this really the main motivation? I'm questioning the need for this really.

This is what I was doing when writing clang matchers quite a lot :) And a few blogs also point at it. While writing LLVM backends though it was more dumpr and equivalent to see trees, where I was simulating a workflow like this. I can't claim to have done a survey though.

I don't think writing PDL by hand is yet as concise or familiar as this. So even if PDL was used to implement some of these, I see plain C++ matchers as very easy way to add these and very useful. I think PDL matching for regions or variadics is not as good yet. With PDL as input one can reuse a parser though.

mlir/include/mlir/Tools/mlir-query/Matcher/Diagnostics.h
1

You'll need file type specifier here to differentiate between C & C++

jpienaar added inline comments.Jul 17 2023, 4:03 PM
mlir/include/mlir/Tools/mlir-query/Matcher/Diagnostics.h
14

MATCHER_DIAGNOSTICS ?

29

Wouldn't you need file too? Or is expectation here that all in one file as mlir doesn't have include? (although you could have .mlir file with locations associated)

50

Could you add a section comment per and sort underneath?

88

Prefer not to use unsigned to signal non-negative

135

Just print?

137

Do we need this vs a user using raw_string_ostream ? I haven't seen how often used (a "dump" variant that use print with llvm::errs() is useful for quick debugging too)

mlir/include/mlir/Tools/mlir-query/Matcher/Marshallers.h
31

You could use the c++17 nested namespace designators here

mlir/include/mlir/Tools/mlir-query/Matcher/MatchersInternal.h
1

Same here. re file type.

Also if this is an implementation, should this be in lib/ ?

26

Don't use a using statement in a header.

Move implementation to lib and minor changes following review

devajithvs added inline comments.Jul 17 2023, 6:04 PM
mlir/include/mlir/Tools/mlir-query/Matcher/Diagnostics.h
1

Amended all headers with C++ type specifier

14

Done

29

The diagnostic here is for detecting errors within the parser query, there won't be a file. Sorry for the misleading comment; I am updating it.

The locations here represent the positions where errors occur for the queries and not within MLIR itself. When it comes to printing the MLIR source location for matches, existing diagnostics in MLIR are used.

50

Added a comment per section and sorted the values underneath

88

Changed it here. Should other uses of unsigned be replaced with int as well?

135

Renamed.

137

Removed toString() function

mlir/include/mlir/Tools/mlir-query/Matcher/MatchersInternal.h
1

Moved all implementations to the lib folder and made the tools directory minimal.

26

Removed all llvm::dbgs

Update all namespaces to use the c++17 nested namespace designator

I wouldn't consider this orthogonal: this is a positioning of the tool, it's raison d'être here. I've never heard that the use case for this was "help the development of pattern matcher" for example. Is this really the main motivation? I'm questioning the need for this really.

This is what I was doing when writing clang matchers quite a lot :) And a few blogs also point at it. While writing LLVM backends though it was more dumpr and equivalent to see trees, where I was simulating a workflow like this. I can't claim to have done a survey though.

I don't think writing PDL by hand is yet as concise or familiar as this. So even if PDL was used to implement some of these, I see plain C++ matchers as very easy way to add these and very useful. I think PDL matching for regions or variadics is not as good yet. With PDL as input one can reuse a parser though.

Are you referring to PDL or PDLL?

Can we get examples and comparisons here?
Also even if we want to support an extra syntax, the library should likely be structured in a decoupled way from the syntax.

mlir/tools/mlir-query/mlir-query.cpp
36

That shouldn't be necessary, let's avoid hard-coding this kind of flags please.

I wouldn't consider this orthogonal: this is a positioning of the tool, it's raison d'être here. I've never heard that the use case for this was "help the development of pattern matcher" for example. Is this really the main motivation? I'm questioning the need for this really.

This is what I was doing when writing clang matchers quite a lot :) And a few blogs also point at it. While writing LLVM backends though it was more dumpr and equivalent to see trees, where I was simulating a workflow like this. I can't claim to have done a survey though.

I don't think writing PDL by hand is yet as concise or familiar as this. So even if PDL was used to implement some of these, I see plain C++ matchers as very easy way to add these and very useful. I think PDL matching for regions or variadics is not as good yet. With PDL as input one can reuse a parser though.

Are you referring to PDL or PDLL?

Can we get examples and comparisons here?
Also even if we want to support an extra syntax, the library should likely be structured in a decoupled way from the syntax.

As for PDLL, let's take the query m hasOpName("arith.addf").

I'm not sure how the matcher implemented in pdll would look like, even though we can have something like the following, it's not a dynamic matcher (op-name need to be inferred at runtime).

/// match is not supported, only rewriting/erasing supported
Pattern HasOpNamePattern => match op<arith.addf>;

As for PDL, let's say we have the following PDL matcher (created at runtime) in MLIR:

module {
    pdl.pattern @HasOpNamePattern : benefit(0) {
	%0 = operands loc(#loc1)
	%1 = types loc(#loc1)
	%2 = operation "arith.addf"(%0 : !pdl.range<value>) -> (%1 : !pdl.range<type>) loc(#loc1)
	match %2 /// match doesn't exist yet.
    } loc(#loc)
}

PDL has a rewrite/erase within the pattern and can simply load these patterns into a FrozenRewritePatternSet and then use applyPatternsAndFoldGreedily()

However, for the mlir-query case, we only need to return matches. So, a separate implementation like applyPatternsAndReturnMatches() will be required, as I couldn't find an equivalent function that simply returns matches.

This might not be the correct/best way to do this. Please let me know if there's a better way to do this.

I'll try to find ways to decouple the parser from the matchers.

Remove hardcoded allowUnregisteredDialects(true)

Remove hardcoded values and add them as commandline options

Also even if we want to support an extra syntax, the library should likely be structured in a decoupled way from the syntax.

Thats a good idea, I'd be pro getting the basic structure correct and then generalizing (e.g., a bit of land and iterate). One easy way I could see here is to make the parsing as much an implementation detail as possible (not exposed in public headers etc). We could start with having a "parse to matcher" call that takes a string and returns a matcher representation that could then be fed to a match call which returns array of operation*. I think that is roughly then that the mlir-query driver registers matchers (well we could have default set and modify inside lib/ as the "language" is relatively fixed, but it feels like something outside lib for me), autocompletes/error checks in tool, calls the parse to match method (which will do redundant checks here probably as tool isn't only input and autocomplete + user experience is better with previous checking), calls the match and then walks the results printing them out. This would result in moving more parts out of lib/ and into tools/ I believe. Is that what you were thiking?

mlir/include/mlir/Query/Matcher/Diagnostics.h
1 ↗(On Diff #542179)

What parts are needed to expose to users to define a matcher? And Mehdi's suggestion about separating syntax/parsing with behavior is a useful guide here too.

39 ↗(On Diff #542179)

Do we need to even expose the diagnostics of the parser outside the parser lib/ ?

45 ↗(On Diff #542179)

Could we move these to enum classes, then we cold drop the ET_ and CT_ prefixes.

mlir/include/mlir/Query/Matcher/Parser.h
1 ↗(On Diff #542179)

I don't think this needs to be exposed as a public header.

mlir/lib/Query/Matcher/Registry.cpp
63 ↗(On Diff #542179)

Could these be in tools/mlir-query ?

  • Move MatchFinder implementation to a separate file
  • Add new -c option to run commands without prompt
  • Add a simple test case
  • Minor cleanups
devajithvs added a comment.EditedJul 28 2023, 4:40 PM

Also we discussed the possibility to reuse PDLL: what's missing from PDLL to use it instead of a custom language here?

During the last week, I've been looking at PDLL and trying to incorporate it with mlir-query.

One issue I faced is that the Ahead-of-Time (AOT) compiled PDLL is not as powerful or flexible as the existing matchers we have. For example, it cannot match operations based on their names.

As an example of what we can do with PDLL, let's consider an isConstantOp matcher, which can be defined as follows:

Constraint IsConstantConstraint(op: Op) [{
  return mlir::success(op->hasTrait<mlir::OpTrait::ConstantLike>());
}];

Pattern isConstantOp {
  let op = op<>;
  IsConstantConstraint(op);
  erase op;
}

This constraint is similar to what we have in regular MLIR matchers.

I've separated the MatchFinder into a separate file to make it easier to add matchFinder support for PDLL Matchers.

To add PDLL matcher support, we could update the MatchFinder with a function that accepts PDLByteCode and returns matches:

static std::vector<Operation *> getMatches(Operation *root, const mlir::detail::PDLByteCode *bytecode) {
  std::vector<Operation *> matches;

  SmallVector<mlir::detail::PDLByteCode::MatchResult, 4> pdlMatches;
  // Simple match finding with walk.
  root->walk([&](Operation *op) {
    bytecode->match(op, rewriter, pdlMatches, *mutableByteCodeState);
    // ... Some logic to check if the size of pdlMatches has changed and append
    // the corresponding match to matches appropriately ...
  });

  return matches;
}

The parser will remain the same.

As for the registry, it has a lookupMatcherCtor(StringRef MatcherName) function, which returns a matcherCtor (constructor) that helps infer the types and number of arguments and checks them against the parsed arguments (inside marshaller).

Now constructMatcher takes this matcherCtor and the parsed argument values to return a VariantMatcher instance. The VariantMatcher here is created with the create function inside the implementation of the abstract class MatcherDescriptor (inside marshaller). The create method here takes the MLIR matcher function pointers, creates a DynMatcher with that, and then creates a VariantMatcher.

If we are adding support for PDLL Matchers, we can add a new implementation of abstract MatcherDescriptor class for PDLByteCode.

The idea for VariantMatcher class was borrowed from ASTMatchers and provides a nice abstraction for simple, variadic, and polymorphic matchers (since they have different types of nodes).

In our case, we could utilize this abstraction for wrapped MLIR Matchers (called DynMatcher), variadic matchers (e.g., AllOf, AnyOf), and PDLMatchers (as PDLByteCode).

devajithvs added a comment.EditedJul 28 2023, 4:44 PM

Also even if we want to support an extra syntax, the library should likely be structured in a decoupled way from the syntax.

Thats a good idea, I'd be pro getting the basic structure correct and then generalizing (e.g., a bit of land and iterate). One easy way I could see here is to make the parsing as much an implementation detail as possible (not exposed in public headers etc). We could start with having a "parse to matcher" call that takes a string and returns a matcher representation that could then be fed to a match call which returns array of operation*. I think that is roughly then that the mlir-query driver registers matchers (well we could have default set and modify inside lib/ as the "language" is relatively fixed, but it feels like something outside lib for me), autocompletes/error checks in tool, calls the parse to match method (which will do redundant checks here probably as tool isn't only input and autocomplete + user experience is better with previous checking), calls the match and then walks the results printing them out. This would result in moving more parts out of lib/ and into tools/ I believe. Is that what you were thiking?

We have something similar to a "parse to matcher" function, parseMatcherExpression returns the wrapped matcher (after going through all the mentioned steps). Inside the function, it calculates the VariantValue by parsing. The VariantValue has a method getMatcher() which returns the corresponding VariantMatcher. Once we have that (if we were to also do PDLL), we can check if it's a DynMatcher or a PDLMatcher and handle it appropriately.

EDIT: I'm working on the review comments, I'll make an update again with that and tests, which I missed.

devajithvs updated this revision to Diff 546493.Aug 2 2023, 8:39 AM

Move all internal details to internal namespace
Convert all enums to enum class
Cleanup diagnostics only keeping minimal features needed for now
Move register matchers to tools/mlir-query

devajithvs added inline comments.Aug 4 2023, 5:47 AM
mlir/include/mlir/Query/Matcher/Diagnostics.h
1 ↗(On Diff #542179)

Moved diagnostics to internal namespace. None of these are used directly to define a matcher.

39 ↗(On Diff #542179)

No, we don't need to expose diagnostics outside lib. Moved diagnostics to internal namespace.

45 ↗(On Diff #542179)

Replaced all enums to enum classes and dropped the prefixes. Also trimmed diagnostics (removed some features that are only needed for future reviews)

mlir/include/mlir/Query/Matcher/Parser.h
1 ↗(On Diff #542179)

Moved Parser implementation to internal namespace.

mlir/lib/Query/Matcher/Registry.cpp
63 ↗(On Diff #542179)

Reworked registry to register matchers in tools/mlir-query/mlir-query.cpp

Thanks for all the changes!

mlir/include/mlir/Query/Matcher/Diagnostics.h
39 ↗(On Diff #542179)

Why not just move it there rather than put in internal namespace here?

84 ↗(On Diff #546493)

Nit: MLIR mostly uses os rather than OS

mlir/include/mlir/Query/Matcher/Marshallers.h
82 ↗(On Diff #546493)

argNo

83 ↗(On Diff #546493)

argKinds

118 ↗(On Diff #546493)

Could we use more descriptive name than func?

157 ↗(On Diff #546493)

Elide trivial braces

167 ↗(On Diff #546493)

Given the if returns, then you can elide the else and just return

mlir/include/mlir/Query/Matcher/MatchFinder.h
1 ↗(On Diff #546493)

rm empty line

mlir/include/mlir/Query/Matcher/MatchersInternal.h
1 ↗(On Diff #546493)

Can this also move to lib/ ? (include/ are for public headers unless exceptional cases)

mlir/include/mlir/Query/Matcher/Parser.h
58 ↗(On Diff #546493)

What else besides the registry? (What is the more of and more?)

mlir/include/mlir/Query/Matcher/VariantValue.h
52 ↗(On Diff #546493)

These are among the dex doxygen comments, why are they highlighted?

mlir/include/mlir/Query/Query.h
29 ↗(On Diff #546493)

Can we use LogicalResult for this?

mlir/include/mlir/Query/QueryParser.h
27 ↗(On Diff #546493)

Could we just expose these two functions in the header rather than the entire QueryParser class?

mlir/include/mlir/Query/QuerySession.h
33 ↗(On Diff #546493)

Why is a shared_ptr needed?

36 ↗(On Diff #546493)

Should all of these be exposed?

mlir/lib/Query/Matcher/VariantValue.cpp
1 ↗(On Diff #546493)

File type indicator is only needed for .h files

I'm working on the changes now.

mlir/include/mlir/Query/Matcher/MatchersInternal.h
1 ↗(On Diff #546493)

I'll try to decouple some of these and try to move most of the headers to lib/, I thought all the headers should go in the include/

The problem now that since registry initialization is moved to tools/mlir-query/mlir-query.cpp, atleast the registry needs to be in /include, I'll try to rework some of it and try to reduce the dependencies for registry.

mlir/include/mlir/Query/Matcher/Parser.h
58 ↗(On Diff #546493)

This is inspired from clang-query, they also have a SEMA instance that does the same. SEMA here only connect to the registry, but we could intercept and also have additional checks/logic on the matchers, like binding matcher to an ID.

mlir/include/mlir/Query/Matcher/VariantValue.h
52 ↗(On Diff #546493)

That was a miss sorry, I'll remove the extra /

  • Rename OS variables and other clang-tidy checks
  • Prepend llvm:: to StringRefs and ArrayRefs
  • Make run method use LogicalResult instead of bool
  • Remove unneeded file type indicator for cpp filesx
  • Remove shared_ptr and add make some variables private in QuerySession
  • Move files from public header directory to lib
  • Refractoring and some renamings

Thanks for reviewing :), I've made the suggested changes.

mlir/include/mlir/Query/Matcher/Marshallers.h
82 ↗(On Diff #546493)

Done

83 ↗(On Diff #546493)

Done

118 ↗(On Diff #546493)

Renamed func to matcherFunc

157 ↗(On Diff #546493)

Done

167 ↗(On Diff #546493)

Done

mlir/include/mlir/Query/Matcher/MatchFinder.h
1 ↗(On Diff #546493)

Done

mlir/include/mlir/Query/Matcher/MatchersInternal.h
1 ↗(On Diff #546493)

I've moved and reworked other parts out of the include/ dir, but this header is used at many places, especially the DynMatcher class (provides a convenient way for storing matcher implementation in a DynMatcher class).

I can't seem find a way to move this out from the include directory.

mlir/include/mlir/Query/Query.h
29 ↗(On Diff #546493)

Done

mlir/include/mlir/Query/QuerySession.h
33 ↗(On Diff #546493)

parseSourceFileForTool function in mlir/Tools/ParseUtilities.h was used to parse the input MLIR file and it required the sourceMgr to be a shared_ptr.

Reworked and now simply use a sourceMgr reference instead.

36 ↗(On Diff #546493)

Made them private and added a few getter functions.

mlir/lib/Query/Matcher/VariantValue.cpp
1 ↗(On Diff #546493)

Done

jpienaar accepted this revision.Aug 15 2023, 7:53 AM

I think this looks good (usable already!) and we can land and iterate.

This revision is now accepted and ready to land.Aug 15 2023, 7:53 AM

I think this looks good (usable already!) and we can land and iterate.

I've rebased it and ran the tests against the latest llvm version. Everything works without needing any change. It would be great to have this merged if everything is ok (before phabricator is made a read only HTML copy)