This is an archive of the discontinued LLVM Phabricator instance.

[mlir:PDLL] Add support for C++ generation
ClosedPublic

Authored by rriddle on Feb 14 2022, 2:07 PM.

Details

Summary

This commits adds a C++ generator to PDLL that generates wrapper PDL patterns
directly usable in C++ code, and also generates the definitions of native constraints/rewrites
that have code bodies specified in PDLL. This generator is effectively the PDLL equivalent of
the current DRR generator, and will allow easy replacement of DRR patterns with PDLL patterns.
A followup will start to utilize this for end-to-end integration testing and show case how to
use this as a drop-in replacement for DRR tablegen usage.

Diff Detail

Event Timeline

rriddle created this revision.Feb 14 2022, 2:07 PM
rriddle requested review of this revision.Feb 14 2022, 2:07 PM
mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
97

As currently implemented, we'd end up with C++ interspersed with PDL string representation.
I fear this will grow large and hard to manage.

Thinking a bit more, is this actually related to PDLL or just to PDL?
Wouldn't it be better if the struct took a ModuleOp + PdlPatternName and was always an information-less wrapper ?

This way there would be a single MLIR source of truth one can go look at.
The short-term downside is that we would need to handle files more explicitly (i.e. materialize the PDL file) and C++ would need to parse that file (i.e. at compiler runtime).

OTOH, this would allow behaviors that seem nicer in MLIR rather than PDLL.
I imagine we will soon need to separate multiple PDL .mlir files logically and start importing them from each other and from executable MLIR.
The notion of DCE is then dependent on context and since it is already builtin MLIR, this sounds preferable.
Then we can separate PDLL files and mix PDLL + hand-written MLIR-PDL transparently.

None of this would require looking at any generated C++.

188

Nothing here touches constParams.
I imagine this is related to the comment // FIXME: We currently do not have a modeling for the "constant params" in D119779 ?

mehdi_amini added inline comments.Feb 15 2022, 10:25 AM
mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
97

We should print generic here, I still to have a build of MLIR that strips the custom assembly support.

Ideally we would even not including a string to be parsed at runtime, but emit the C++ that construct the IR! :)

rriddle updated this revision to Diff 409072.Feb 15 2022, 3:20 PM
rriddle edited the summary of this revision. (Show Details)
mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
97

Ideally we would even not including a string to be parsed at runtime, but emit the C++ that construct the IR! :)

This sounds like a scary amount of duplication to me ..
I'd prefer we have a way to load all patterns we want dynamically and just call then by name.

jpienaar added inline comments.Feb 16 2022, 8:33 AM
mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
8

File description?

71

OOC should there be a guard against a pattern named GeneratedPDLLPattern2? :)

93

Nit: this form won't trigger the syntax highlighing added to vscode. These for me are just MLIR snippets inside C++ code and the fact that they include PDL ops isn't that interesting at this level (at least not to the level that I'd want to special case detection or would me act differently)

97

generic

That shouldn't really matter, these should only be generated and consumed at the same rev, this should be a build system artifact. That being said, this could reduce number of test case updates if syntax changes.

includes

Having something like #include "<pattern_name>.mlir here could be a compromise. Or are you proposing even beyond that : from a single generated pattern file one has these patterns merely refer into it?

If you had this in bytecode format, would that be too late for pattern optimization? (e.g., we want this in this form C++ side, vs serializing the bytecode variant here and then keeping the "raw" pattern in external file for dynamic loading etc)

mehdi_amini added inline comments.Feb 16 2022, 12:13 PM
mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
97

This sounds like a scary amount of duplication to me ..

I'm not sure I see why? Talking about this with River I even think we could have a generic backend for mlir-translate that given an op emits the C++ calls to build it (including nested region).

I'd prefer we have a way to load all patterns we want dynamically and just call then by name.

Sure: the thing is that when you say "load all patterns dynamically" it begs the question about what is the storage format for the patterns. Ideally the engine does not really care: what it wants is a "in-memory" representation of a pattern a way to load it.
There can be multiple way to store this and get the in-memory representation form, above what I was pointing at is that "string" isn't a great production artifact for the storage.

rriddle updated this revision to Diff 409416.Feb 16 2022, 2:42 PM
rriddle marked 4 inline comments as done.
rriddle added inline comments.Feb 16 2022, 2:43 PM
mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
71

Haha, added.

93

Yeah, I had this written before we added support for that. Updated.

97

There are various different things at play here. There is the generated C++ API that is exposed to users, and the internal storage of the patterns that we use. As Mehdi mentions, the storage representation is generally opaque to the user, whether that be an inlined string/bitcode blob/directly using the builder API/etc.

As currently implemented, we'd end up with C++ interspersed with PDL string representation.
I fear this will grow large and hard to manage.

Thinking a bit more, is this actually related to PDLL or just to PDL?
Wouldn't it be better if the struct took a ModuleOp + PdlPatternName and was always an information-less wrapper ?

This way there would be a single MLIR source of truth one can go look at.
The short-term downside is that we would need to handle files more explicitly (i.e. materialize the PDL file) and C++ would need to parse that file (i.e. at compiler runtime).

OTOH, this would allow behaviors that seem nicer in MLIR rather than PDLL.

What behaviors do you have in mind here? The only thing PDLL is doing is generating MLIR and splatting that to the source file, all analysis/optimization/transformation of the PDL happens at runtime.

I imagine we will soon need to separate multiple PDL .mlir files logically and start importing them from each other and from executable MLIR.
The notion of DCE is then dependent on context and since it is already builtin MLIR, this sounds preferable.
Then we can separate PDLL files and mix PDLL + hand-written MLIR-PDL transparently.

None of this would require looking at any generated C++.

You can already mix PDLL generated patterns with PDL patterns from other places. Maybe I'm not understanding what kind of mixing you refer to here. The way that PDL works is that when the rewrite pattern set is frozen (i.e. when you create a FrozenRewritePatternSet) all of the PDL patterns are merged together into a single giant module, and then lowered/optimized together. Generating separate patterns or one giant pattern module has no effect on the end result, it's only about how you expose things to the user. Generating separate C++ patterns (one per PDLL pattern) allows for the user to directly reference specific patterns by name in the C++ API; e.g. if you had one .pdll file for canonicalization patterns of a dialect, you'd want to separate out the patterns for specific operations when adding them to the canonicalization pattern list for that op.

188

Yep, also added a comment here.

ping. Any updates or fundamental blockers?

jpienaar accepted this revision.Feb 22 2022, 4:32 PM

Looks good, I think this is fine starting point and can be refined from here.

mlir/lib/Tools/PDLL/CodeGen/CPPGen.cpp
108

Nice touch, if have forgotten this.

This revision is now accepted and ready to land.Feb 22 2022, 4:32 PM
This revision was automatically updated to reflect the committed changes.