This commit adds a new TableGenParseFile entry point for tablegen
that parses an input buffer and invokes a callback function with
a record keeper (notably without an output buffer). This kind of entry
point is very useful for tablegen consuming tools that don't create
output, and want invoke tablegen multiple times. The current way
that we interact with tablegen is via relative includes to
TGParser(not great).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Essentially the same justification as when I submitted D108934, we essentially have a growing desire to interact with tablegen as a library (see the child revision for an example). The current situation (as noted in the description) is that we use relative include to TGParser, but this isn't really tenable when adding usages upstream. This commit hopefully starts to rectify that, by exposing a minimal interface into the TG parser.
I'm in favor of exposing this, I actually ran into this problem just yesterday in an external project.
However, the proposed interface using a callback is strange. How about returning an Expected<RecordKeeper> instead?
Ah, I think I would be fine going with something like that. The thing I like about the callback approach is that it has more explicit expectations around lifetime, which tablegen right now has a horrible notion of. Not shown here is that to actually properly invoke tablegen multiple times, you need to call llvm_shutdown/reset the global SrcMgr/etc.
Should all of that perhaps be happening inside this function? E.g., could we have this be "hermetic" and have the lifetime of everything tblgen side scoped to the call (e.g., you can walk over all the records inside but not keep any references to it). That way if this gets changed tblgen side, the users of this function don't need to be updated.
Good idea, moved things to here. I also added a helper "reset" function to remove the need to call llvm_destroy (which can likely destroy more than what we need here).
llvm/lib/TableGen/Parser.cpp | ||
---|---|---|
35 | We could do this at the beginning and allow returning the RecordKeeper, but users would have to keep in mind that any successive call to this function will destroy any previously returned RecordKeeper. The current lifetime model of TableGen makes this quite annoying (that is more easily fixable now than it used to be, but someone still needs to put in a lot of the work to plumb through a context everywhere in TableGen code). |
Could the RecordKeeper object be the tablegen context? Still need to plumb it through everything.
Conceptually yeah, we could use the RecordKeeper as the context itself (I mean that is basically what it is right now for the most part). I didn't mean to imply above that we explicitly need a new class (though we could have one), we just need something akin to LLVMContext/MLIRContext to plumb through the API.
llvm/lib/TableGen/Parser.cpp | ||
---|---|---|
35 | This current version seems a reasonable compromise to me as long as all the issues around global state in TableGen haven't been addresses. |
I agree with nhaehnle and others that Tblgen's internal implementation and lifetime model is really problematic. Until and if someone feels compelled to rewrite tblgen from scratch (any takers ;-) following best practices, this seems like a reasonable step forward. This lgtm. Thanks!
Nice (I've wanted to use tblgen as library in little rewriter I was playing around with, so this will be useful there even just for experiments, thanks :-))
The unit test was failing for me (in the cmake and bazel builds) due to other files already being present in the SrcMgr - so I tried this patch: d60a65abb6b050e10d9efdbc56dcb2e2e4772af1 to address that. No idea if it's right/good, but I guess if this function can reset the SrcMgr at the end, it's probably OK to reset it at the start too?
Weird. That fix looks appropriate to me, thanks for submitting it! Realistically we should be able to reset at the start, we can just plumb any initialization of the SrcMgr (e.g. diag handler) as params to this function.
We could do this at the beginning and allow returning the RecordKeeper, but users would have to keep in mind that any successive call to this function will destroy any previously returned RecordKeeper. The current lifetime model of TableGen makes this quite annoying (that is more easily fixable now than it used to be, but someone still needs to put in a lot of the work to plumb through a context everywhere in TableGen code).