This is an archive of the discontinued LLVM Phabricator instance.

Let tablegen generate property definitions
ClosedPublic

Authored by JDevlieghere on Jul 23 2019, 8:14 PM.

Details

Summary

Property definitions are currently defined in a PropertyDefinition array and have a corresponding enum to index in this array. Unfortunately this is quite error prone. Indeed, just today we found an incorrect merge where a discrepancy between the order of the enum values and their definition caused the test suite to fail spectacularly.

Tablegen can streamline the process of generating the property definition table while at the same time guaranteeing that the enums stay in sync. That's exactly what this patch does. It adds a new tablegen file for the properties, building on top of the infrastructure that Raphael added recently for the command options. It also introduces two new tablegen backends: one for the property definitions and one for their corresponding enums.

It might be worth mentioning that I generated most of the tablegen definitions from the existing property definitions, by adding a dump method to the struct. This seems both more efficient and less error prone that copying everything over by hand. Only Enum properties needed manual fixup for the EnumValues and DefaultEnumValue fields.

Diff Detail

Event Timeline

JDevlieghere created this revision.Jul 23 2019, 8:14 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: mgorny. · View Herald Transcript
JDevlieghere edited the summary of this revision. (Show Details)Jul 23 2019, 8:18 PM

My dream is to one day be able to define a property by simply declaring a variable somewhere (say: Property<T> Foo(ParentProperty, "foo", "description of foo", DefaultValue);), and that one could just get/set them via something like context[Foo] = new_value. In that world, to tablegenning would hopefully be necessary, but that world is still pretty far away, and your approach does help with eliminating the redundancy, so maybe it's worth doing anyway... I dunno...

Regarding the patch itself, I have two questions/comments:

  • you put all property definitions into a single file, including those coming from "plugins". This is kind of bad as the knowledge of plugin internals leaks out. It may not be too bad, if the same effect could be achieved by just putting the definitions into a separate file and running tablegen twice, and this is just a way of avoiding that. It looks like that is the case here, but I'm not 100% sure. It's something to keep an eye on, at least, particularly, if we ever want to make "real" plugins.
  • I'm wondering if the same effect could not be achieved in a more low-cost way via a .def header file. I believe @aprantl had a patch like that at one point. The .def file could just contain something like: LLDB_PROPERTY(eFoo, "foo", "description", default). When you want to define the enum, you just have the macro expand to "eFoo", when you define the property itself, you have it expand to the whole {eFoo, ...} blurb... The advantage of tablegen is that it allows you to define the property list via some non-trivial algorithm, but it's not clear to me whether this is needed/useful here...

My dream is to one day be able to define a property by simply declaring a variable somewhere (say: Property<T> Foo(ParentProperty, "foo", "description of foo", DefaultValue);), and that one could just get/set them via something like context[Foo] = new_value. In that world, to tablegenning would hopefully be necessary, but that world is still pretty far away, and your approach does help with eliminating the redundancy, so maybe it's worth doing anyway... I dunno...

Regarding the patch itself, I have two questions/comments:

  • you put all property definitions into a single file, including those coming from "plugins". This is kind of bad as the knowledge of plugin internals leaks out. It may not be too bad, if the same effect could be achieved by just putting the definitions into a separate file and running tablegen twice, and this is just a way of avoiding that. It looks like that is the case here, but I'm not 100% sure. It's something to keep an eye on, at least, particularly, if we ever want to make "real" plugins.

Yep, but I'm also happy to have separate files for the plugins. They have only a few options, so maybe tablegen is fast enough that it doesn't really matter.

  • I'm wondering if the same effect could not be achieved in a more low-cost way via a .def header file. I believe @aprantl had a patch like that at one point. The .def file could just contain something like: LLDB_PROPERTY(eFoo, "foo", "description", default). When you want to define the enum, you just have the macro expand to "eFoo", when you define the property itself, you have it expand to the whole {eFoo, ...} blurb... The advantage of tablegen is that it allows you to define the property list via some non-trivial algorithm, but it's not clear to me whether this is needed/useful here...

The def file is really nice when everything has the exact same structure, like things in Dwarf.def. The advantage of using tablegen here is that you can omit fields, like the default string value when they're not needed and do some sanity checking. It also has the advantage we can change the representation down the road, without having to modify the tablegen files.

My dream is to one day be able to define a property by simply declaring a variable somewhere (say: Property<T> Foo(ParentProperty, "foo", "description of foo", DefaultValue);), and that one could just get/set them via something like context[Foo] = new_value. In that world, to tablegenning would hopefully be necessary, but that world is still pretty far away, and your approach does help with eliminating the redundancy, so maybe it's worth doing anyway... I dunno...

Regarding the patch itself, I have two questions/comments:

  • you put all property definitions into a single file, including those coming from "plugins". This is kind of bad as the knowledge of plugin internals leaks out. It may not be too bad, if the same effect could be achieved by just putting the definitions into a separate file and running tablegen twice, and this is just a way of avoiding that. It looks like that is the case here, but I'm not 100% sure. It's something to keep an eye on, at least, particularly, if we ever want to make "real" plugins.

Yep, but I'm also happy to have separate files for the plugins. They have only a few options, so maybe tablegen is fast enough that it doesn't really matter.

Cool. Glad we're on the same page.

  • I'm wondering if the same effect could not be achieved in a more low-cost way via a .def header file. I believe @aprantl had a patch like that at one point. The .def file could just contain something like: LLDB_PROPERTY(eFoo, "foo", "description", default). When you want to define the enum, you just have the macro expand to "eFoo", when you define the property itself, you have it expand to the whole {eFoo, ...} blurb... The advantage of tablegen is that it allows you to define the property list via some non-trivial algorithm, but it's not clear to me whether this is needed/useful here...

The def file is really nice when everything has the exact same structure, like things in Dwarf.def. The advantage of using tablegen here is that you can omit fields, like the default string value when they're not needed and do some sanity checking. It also has the advantage we can change the representation down the road, without having to modify the tablegen files.

.def files can omit fields too: #define BOOL_PROPERTY(name, global, default, desc) PROPERTY(name, OptionValue::eTypeBoolean, global, default, nullptr, {}, desc). Some sanity checking sounds like it could be useful, but I'm not exactly sure what kind of checks you have in mind. Being able to change the representation is nice, but I expect most of those changes would also be achievable with the def files. More radical changes (like the variable thing I mentioned) would probably require changes regardless of how the properties are generated...

Have separate tablegen files for plugins.

.def files can omit fields too: #define BOOL_PROPERTY(name, global, default, desc) PROPERTY(name, OptionValue::eTypeBoolean, global, default, nullptr, {}, desc). Some sanity checking sounds like it could be useful, but I'm not exactly sure what kind of checks you have in mind. Being able to change the representation is nice, but I expect most of those changes would also be achievable with the def files. More radical changes (like the variable thing I mentioned) would probably require changes regardless of how the properties are generated...

I agree with you and I'm not opposed to def-files at all. I think tablegen and .def files have different trade-offs and while the latter could probably work for properties, I have the feeling that tablegen is a better fit. The things are mentioned before are just a few things that came to mind.

To give an example of sanity checking: this isn't in the patch (yet) but with tablegen we can ensure that every option has either a default unsigned or string value. In the table you can't differentiate between a default 0 and an explicit default value of 0.

.def files can omit fields too: #define BOOL_PROPERTY(name, global, default, desc) PROPERTY(name, OptionValue::eTypeBoolean, global, default, nullptr, {}, desc). Some sanity checking sounds like it could be useful, but I'm not exactly sure what kind of checks you have in mind. Being able to change the representation is nice, but I expect most of those changes would also be achievable with the def files. More radical changes (like the variable thing I mentioned) would probably require changes regardless of how the properties are generated...

I agree with you and I'm not opposed to def-files at all. I think tablegen and .def files have different trade-offs and while the latter could probably work for properties, I have the feeling that tablegen is a better fit. The things are mentioned before are just a few things that came to mind.

To give an example of sanity checking: this isn't in the patch (yet) but with tablegen we can ensure that every option has either a default unsigned or string value. In the table you can't differentiate between a default 0 and an explicit default value of 0.

If you used something like the BOOL_PROPERTY macro from my previous comment, then there would be no way to explicitly specify a default string value for a boolean property (and similarly for other property types too).

Overall, I'm not strongly opposed to this, but I am not convinced that the added complexity of tablegen ("#including a .def file" vs. "running a functional program which generates a def file, which is then #included") is worth the benefits. I suggest getting someone else to review this to break the stalemate. :)

Add sanity check that I mentioned. This actually caught a few properties that didn't have defaults.

xiaobai accepted this revision.Jul 25 2019, 12:29 PM

I think this is fine. Using .def files would be okay too, but I like the sanity checks that Jonas introduced in the LLDBPropertyDefEmitter

This revision is now accepted and ready to land.Jul 25 2019, 12:29 PM
friss added a comment.Jul 25 2019, 1:21 PM

When we hit that merge conflict issue 2 days ago (which mismatched a property and its enum), the first thing that came to my mind was that a tablegen approach would have prevented it. For some reason it seemed more natural to me, and it still does.

I might have liked a .def file approach as much if Jonas did it that way. Now that this is implemented however, I'm not sure I see the appeal of redoing it as a .def file.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2019, 2:37 PM
sammccall added inline comments.
lldb/trunk/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
77 ↗(On Diff #211824)

These unqualified #include directives are causing us some problems downstream. What do you think about qualifying these like other includes in the file? i.e. #include "Plugins/DynamicLoader/Darwin-Kernel/Properties.inc"

The problem is that these "local" includes can only resolve to the right thing if the build system specifies a different -I line to each relevant cpp file (or the header lives next to the source file, which isn't the case for generated files).
Many build systems (including both CMake and bazel) only support doing this for separate libraries, however lldb's source isn't sufficiently clearly layered to allow splitting into fine-grained libraries in bazel (CMake is much more lax here).

While CMake is the gold standard here, it'd be helpful to make it possible to build with other systems out-of-tree. @chandlerc was looking at this issue and has more context and opinions here.

It also seems a bit less magic and more consistent with the rest of the includes in the file, as well as with the project. (Most, though sadly not all, .inc files in llvm are included by qualified paths).

I'll send a patch shortly after verifying it fixes the issue.