This is an archive of the discontinued LLVM Phabricator instance.

[mlir][tblgen] Improve detecting default-prototype builder for AttrOrTypeDef
AbandonedPublic

Authored by wrengr on Jul 6 2022, 6:31 PM.

Details

Summary

First attempt at resolving https://llvm.org/PR56415 , though it's still a bit WIP at the moment.

Diff Detail

Event Timeline

wrengr created this revision.Jul 6 2022, 6:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2022, 6:31 PM
wrengr requested review of this revision.Jul 6 2022, 6:31 PM
wrengr added a comment.Jul 6 2022, 6:36 PM

This differential is still a bit WIP regarding style (i.e., whether to make the new functions methods or not, names of things, etc). Also I still need to add some unit tests to ensure this works as intended.

@rriddle let me know if this general approach looks reasonable to you, and if so then offer any thoughts you have re the style portion of things.

jpienaar added inline comments.Jul 13 2022, 2:06 PM
mlir/lib/TableGen/AttrOrTypeDef.cpp
35

Yeah folks were mentioning desire to move away from strings in that context and more similar to how this is handled in arguments.

48

Could we inline the call here? That seems like it would be easier to follow and avoid the need for introducing K3

63

Are there any warnings flagged here?

rriddle added inline comments.Jul 13 2022, 2:10 PM
mlir/lib/TableGen/AttrOrTypeDef.cpp
35

Yeah, I think for now it would be nice if we just check endsWith instead of ==. That might cover most uses, where the generic builders have namespaces (e.g. "::mlir::MLIRContext *":$ctx) and the user written builders don't (e.g. "MLIRContext *":$ctx).

Mogball added inline comments.
mlir/lib/TableGen/AttrOrTypeDef.cpp
44

Checking for the string equality of the C++ types can be fragile, because it doesn't account for namespace differences, e.g. ::mlir::Type vs Type, and cannot account for implicit C++ conversions, e.g. SmallVectorImpl<T> & and ArrayRef<T>.

wrengr updated this revision to Diff 445922.Jul 19 2022, 1:15 PM

Lots of changes/improvements.

Okay, so this is starting to turn into a rabbit hole, even without getting into any more sophisticated string comparisons. Here are the problems I'm running into at the moment:

  1. Errors emitted by the AttrOrTypeDef ctor occur before the errors emitted by mlir::tblgen::generateAttrOrTypeFormat, therefore the test/mlir-tblgen/attr-or-type-format-invalid.td file can't include tests for errors emitted by both sources (without implicitly ordering the seemingly-independent tests so that their errors occur in FileCheck order).
  1. Fatal errors emitted by AttrOrTypeDef don't respect the -asmformat-error-is-fatal=false commandline option, so they cause mlir-tblgen to exit before even calling generateAttrOrTypeFormat. The mlir/lib/TableGen/AttrOrTypeDef.cpp file isn't allowed to include the mlir/tools/mlir-tblgen/FormatGen.h header, so it can't access that commandline option directly; so if we want these errors to respect that commandline option, then the only solution I can think of is for the AttrOrTypeDef ctor to take an extra bool argument, and then have the callers thread things through. However, such explicit threading of state is a code smell and will quickly become gross. Alas, without somehow respecting the commandline option, each test record that expects a fatal error would have to be given their own separate files.
  1. Relatedly, since the "unsure" case could still have too many false-positives, ideally we'd have some other commandline option for indicating whether the user desires the warnings or not. Which runs into the same issues re threading state around.
  1. The biggest issue, however, is that the AttrOrTypeDef ctor is called repeatedly for the same Record. This means that any expensive checks will be run repeatedly, and any non-fatal messages will also be reported repeatedly. I think this one has a feasible solution, though I haven't tried it out yet. Namely: the AttrOrTypeDef ctor seems to only be called by mlir/tools/mlir-tblgen/AttrOrTypeDefGen.cpp:collectAllDefs, and that function seems to only be called by mlir/tools/mlir-tblgen/AttrOrTypeDefGen.cpp:DefGenerator, which only calls it with the DefGenerator::defRecords field, which itself doesn't seem to be used anywhere else. If that chain of references is indeed the only one, then we could have DefGenerator store the vector of AttrOrTypeDef objects in lieu of the vector of Record*, and adjust the type of collectAllDefs accordingly. This way, the AttrOrTypeDef ctor is only called once on any given record, and the results are cached in the DefGenerator for it to reuse for both decls and defs.
wrengr marked 2 inline comments as done.Jul 19 2022, 1:29 PM
wrengr added inline comments.
mlir/lib/TableGen/AttrOrTypeDef.cpp
44

Oh I'm all too aware of the many fragilities of using string comparisons to try emulating type comparisons :) E.g., there's also things like whitespace differences. I have some preliminary work to deal with whitespace and K3 matching of namespaces, but don't want to spend too much time working on that until after determining if we even want to stick with this approach at all (as opposed to, say, adding a new bit hasDefaultPrototypeCustomBuilder field for users to explicitly say when they have some custom builder they expect to match).

Or were you suggesting I explicitly mention namespaces and implicit conversions in the documentation?

48

I was hoping that separating them would help make the logic easier to follow, since the for-loop + switch is just the K3 version of llvm::any_of. (And factoring out the builderHasParameters predicate follows the general style guide of factoring out predicates in order to reduce the boolean-blindness problem). But since this is the only place we use K3 or builderHasParameters, I suppose splitting them out doesn't help too much. Though if we want to try better comparisons between each parameter string, then that'll also probably want K3...

The main issue with inlining the function here is that builderHasParameters can short-circuit its for-loop, thus skipping over the return K3::True; path after the loop. However, this is nested within the for-loop of checkForBuilderWithDefaultPrototype, and since C++ doesn't have labeled-loops like Java, that means some ugliness to capture the "break out of the inner loop and then/also continue the outer loop" semantics. For a boolean comparison between parameter strings, I think I've managed to find a not-too-ugly way to get the same semantics when inlining builderHasParameters. Though for a K3-comparison between parameter strings, inlining that too starts to get quite ugly IMO.

Will post a new version inlining builderHasParameters and sticking with a boolean-comparison, for now. Can revisit the topic if/when we switch to using a K3-comparison.

63

None that I've seen so far, but I haven't run the full test suite yet. I do have some tests which trigger the warning in the expected situations; though they're currently commented out due to issues I discussed in the most recent update. (N.B., the version of this differential you commented on had some logic bugs re when to enter the state where it emits a warning. Which have been fixed in the most recent update)

wrengr marked 2 inline comments as done.Jul 19 2022, 2:47 PM
wrengr added inline comments.
mlir/lib/TableGen/AttrOrTypeDef.cpp
63

After running check-mlir and check-mlir-integration I didn't see any warnings or errors. But CMake might've hidden them from me. Is there any special CMake incantation to get the complete stderr logs from mlir-tblgen runs? The only logs I can find are for the autoconfig-like stuff before actually building anything.

wrengr edited the summary of this revision. (Show Details)Jul 19 2022, 4:10 PM
wrengr updated this revision to Diff 445976.Jul 19 2022, 4:11 PM

git-clang-format

Mogball requested changes to this revision.Jul 20 2022, 9:32 PM
Mogball added inline comments.
mlir/lib/TableGen/AttrOrTypeDef.cpp
32

Drop username

51

Drop const

64

Spell out the auto? Also drop const

71–76
80

llvm::zip?

81

I think checking the number of parameters is robust and will always give useful errors. String matching is a rabbit hole for sure, but I'm not currently aware of anyone relying on implicit conversions, so we can stick with this until someone complains.

I would take Rivers suggestion and check "ends with" or even "contains" but on both the cppType and cppStorageType.

89
102

Drop username. Also, can the check be moved someone else? (Into AttrOrTypeDefGen.cpp) then you can reference the flag and ensure the checks are performed at most once

172

Don't add usernames to FIXMEs or TODOs

mlir/test/mlir-tblgen/attr-or-type-format-invalid.td
136

Drop username

This revision now requires changes to proceed.Jul 20 2022, 9:32 PM
wrengr abandoned this revision.Mar 21 2023, 5:44 PM