This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Add a library-based entry point for parsing td files
ClosedPublic

Authored by rriddle on Feb 15 2022, 3:20 PM.

Details

Summary

This commit adds a new TableGenParseFile entry point for tablegen
that parses an input buffer and invokes a callback function with
a record keeper (notably without an output buffer). This kind of entry
point is very useful for tablegen consuming tools that don't create
output, and want invoke tablegen multiple times. The current way
that we interact with tablegen is via relative includes to
TGParser(not great).

Diff Detail

Event Timeline

rriddle created this revision.Feb 15 2022, 3:20 PM
rriddle requested review of this revision.Feb 15 2022, 3:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2022, 3:20 PM

Essentially the same justification as when I submitted D108934, we essentially have a growing desire to interact with tablegen as a library (see the child revision for an example). The current situation (as noted in the description) is that we use relative include to TGParser, but this isn't really tenable when adding usages upstream. This commit hopefully starts to rectify that, by exposing a minimal interface into the TG parser.

I'm in favor of exposing this, I actually ran into this problem just yesterday in an external project.

However, the proposed interface using a callback is strange. How about returning an Expected<RecordKeeper> instead?

I'm in favor of exposing this, I actually ran into this problem just yesterday in an external project.

However, the proposed interface using a callback is strange. How about returning an Expected<RecordKeeper> instead?

Ah, I think I would be fine going with something like that. The thing I like about the callback approach is that it has more explicit expectations around lifetime, which tablegen right now has a horrible notion of. Not shown here is that to actually properly invoke tablegen multiple times, you need to call llvm_shutdown/reset the global SrcMgr/etc.

I'm in favor of exposing this, I actually ran into this problem just yesterday in an external project.

However, the proposed interface using a callback is strange. How about returning an Expected<RecordKeeper> instead?

Ah, I think I would be fine going with something like that. The thing I like about the callback approach is that it has more explicit expectations around lifetime, which tablegen right now has a horrible notion of. Not shown here is that to actually properly invoke tablegen multiple times, you need to call llvm_shutdown/reset the global SrcMgr/etc.

Should all of that perhaps be happening inside this function? E.g., could we have this be "hermetic" and have the lifetime of everything tblgen side scoped to the call (e.g., you can walk over all the records inside but not keep any references to it). That way if this gets changed tblgen side, the users of this function don't need to be updated.

rriddle updated this revision to Diff 409418.Feb 16 2022, 2:42 PM
rriddle edited the summary of this revision. (Show Details)

I'm in favor of exposing this, I actually ran into this problem just yesterday in an external project.

However, the proposed interface using a callback is strange. How about returning an Expected<RecordKeeper> instead?

Ah, I think I would be fine going with something like that. The thing I like about the callback approach is that it has more explicit expectations around lifetime, which tablegen right now has a horrible notion of. Not shown here is that to actually properly invoke tablegen multiple times, you need to call llvm_shutdown/reset the global SrcMgr/etc.

Should all of that perhaps be happening inside this function? E.g., could we have this be "hermetic" and have the lifetime of everything tblgen side scoped to the call (e.g., you can walk over all the records inside but not keep any references to it). That way if this gets changed tblgen side, the users of this function don't need to be updated.

Good idea, moved things to here. I also added a helper "reset" function to remove the need to call llvm_destroy (which can likely destroy more than what we need here).

rriddle added inline comments.Feb 16 2022, 2:48 PM
llvm/lib/TableGen/Parser.cpp
36

We could do this at the beginning and allow returning the RecordKeeper, but users would have to keep in mind that any successive call to this function will destroy any previously returned RecordKeeper. The current lifetime model of TableGen makes this quite annoying (that is more easily fixable now than it used to be, but someone still needs to put in a lot of the work to plumb through a context everywhere in TableGen code).

Could the RecordKeeper object be the tablegen context? Still need to plumb it through everything.

Could the RecordKeeper object be the tablegen context? Still need to plumb it through everything.

Conceptually yeah, we could use the RecordKeeper as the context itself (I mean that is basically what it is right now for the most part). I didn't mean to imply above that we explicitly need a new class (though we could have one), we just need something akin to LLVMContext/MLIRContext to plumb through the API.

nhaehnle added inline comments.Feb 17 2022, 9:41 AM
llvm/lib/TableGen/Parser.cpp
36

This current version seems a reasonable compromise to me as long as all the issues around global state in TableGen haven't been addresses.

lattner accepted this revision.Feb 17 2022, 9:06 PM

I agree with nhaehnle and others that Tblgen's internal implementation and lifetime model is really problematic. Until and if someone feels compelled to rewrite tblgen from scratch (any takers ;-) following best practices, this seems like a reasonable step forward. This lgtm. Thanks!

This revision is now accepted and ready to land.Feb 17 2022, 9:06 PM
jpienaar accepted this revision.Feb 19 2022, 10:02 AM

Nice (I've wanted to use tblgen as library in little rewriter I was playing around with, so this will be useful there even just for experiments, thanks :-))

This revision was landed with ongoing or failed builds.Mar 3 2022, 4:14 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 4:14 PM

The unit test was failing for me (in the cmake and bazel builds) due to other files already being present in the SrcMgr - so I tried this patch: d60a65abb6b050e10d9efdbc56dcb2e2e4772af1 to address that. No idea if it's right/good, but I guess if this function can reset the SrcMgr at the end, it's probably OK to reset it at the start too?

The unit test was failing for me (in the cmake and bazel builds) due to other files already being present in the SrcMgr - so I tried this patch: d60a65abb6b050e10d9efdbc56dcb2e2e4772af1 to address that. No idea if it's right/good, but I guess if this function can reset the SrcMgr at the end, it's probably OK to reset it at the start too?

Weird. That fix looks appropriate to me, thanks for submitting it! Realistically we should be able to reset at the start, we can just plumb any initialization of the SrcMgr (e.g. diag handler) as params to this function.