Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[UTC] Add fallback support for specific metadata, and check their defs
Needs RevisionPublic

Authored by hnrklssn on Apr 13 2023, 4:50 AM.

Details

Summary

This prevents update_cc_tests.py from emitting hard-coded identifiers
for metadata (global variable checkers still check hard-coded
identifiers). Instead it emits regex checkers that match even if the
identifiers change. Also adds a new mode for --check-globals: instead of
simply being on or off, it now has the options 'none', 'smart' and
'all', with 'none' and 'all' corresponding to the previous modes.

The 'smart' mode only emits checks for global definitions referenced
in the IR or other metadata that itself has a definition checker
emitted, making the rule transitive. It does not emit checks for
attribute sets, since that is better checked by --check-attributes. This
mode is made the new default. To make the change in default mode
backwards compatible a version bump is introduced (to v3), and the
default remains 'none' in v1 & v2.

This will result in metadata checks being emitted more often, so filters
are added to not check absolute file paths and compiler version git
hashes.

rdar://105239218

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change looks good to me, but if clang can't emit these yet we should probably wait until committing this? Do you have a link to a review that uses this metadata?

It can emit them, we just use them a bit more frequently downstream. I've found an example and added a lit test as an example of how it works.

We should probably just use the META fallback for all metadata not already explicitly handled. I think there's pretty little value in adding a special name for each metadata kind.

I added a fallback, and some logic to extract the metadata kind name from the IR, because I do find it useful to emit checks named after the type of IR they match against.

delcypher added inline comments.
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

@hnrklssn I just noticed we don't have a CHECK for what META2 actually refers to. Should we?

Not something that has to be fixed in this patch, more just an observation.

llvm/utils/UpdateTestChecks/common.py
873–874

Is this call to re.search(...) guaranteed to always succeed?

978

There may be some value in having !annotations matched explicitly as well as there being a fallback. In the current patch it looks like:

  • metadata is assigned variables with the META prefix on CHECK lines
  • !annotation is assigned variables with the META prefix on CHECK lines
  • Anything else that matches r'![a-z.]+ ' is assigned variables with the META prefix on CHECK lines

When looking a large test having everything lumped into META variables could make the generated tests a little harder to read.

hnrklssn added inline comments.Apr 26 2023, 6:23 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

Indeed this is true for metadata in general, presumably because the RHS often refer to things like other metadata identifiers. In the case of annotations they seem to always refer to simple strings however, so it would be feasible to do a straight match without having to do recursive matching or complex regexes to determine which part of the metadata to match against.

In many cases with metadata attached to IR nodes, multiple nodes refer to the same metadata node, so at least you verify that they still are consistent. But I agree that verifying the content would be a great future addition.

llvm/utils/UpdateTestChecks/common.py
873–874

Yes. match is a match object from a combined regex concatenated from both the IR prefix (![a-z.]+ ) and the IR regexp (![0-9]+), so it will always contain a substring matching the IR prefix.

978

In a way, yes. At the same time, since we're not matching against the definition of the metadata node, all uses (while named META#) will still be prefixed by !annotation. On the other hand again, we have explicitly enumerated so many other types of metadata, so it is a bit inconsistent.

nikic added inline comments.Apr 26 2023, 6:28 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

You need to pass --check-globals to check the actual metadata.

hnrklssn added inline comments.May 4 2023, 8:14 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

When I add that to this test case it adds

//.
// CHECK: attributes #0 = { noinline nounwind optnone "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+mmx,+sse,+sse2,+x87" }
//.
// CHECK: !0 = !{i32 1, !"wchar_size", i32 4}
// CHECK: !1 = !{!"clang version 17.0.0 (git@github.com:llvm/llvm-project.git 684914f47cf59e9ab6d8b0f73c58ca6272ea28d4)"}
// CHECK: !2 = !{!"auto-init"}
//.

So it seems to just be doing a simple literal matching on all metadata, regardless of whether we captured that metadata in any filecheck variable. And it still doesn't use the META2 variable to match the definition. Am I missing something? If we use the literal metadata names instead of variable matching for the definitions, there isn't much point in doing variable matching for the metadata uses either, since the test still very much relies on the metadata numbering being stable.

hnrklssn added inline comments.May 22 2023, 7:58 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

@nikic Do you have more information to add about how metadata definition matchers can be generated without hardcoding everything (which is kind of the opposite of what this patch is trying to do), or in general if you're happy with the state of the PR?

nikic added inline comments.May 22 2023, 12:07 PM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

This works fine with update_test_checks, so it must be some bug in update_cc_test_checks in particular. From a quick look, I suspect it's because https://github.com/llvm/llvm-project/blob/3d05ab6d3e24e76ff53b8d7d623c436b4be5b809/llvm/utils/update_cc_test_checks.py#L447 hardcodes a True value, while update_test_checks makes this dependent on --preserve-names, which is disabled by default there.

nikic added inline comments.May 23 2023, 1:54 PM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

Or maybe just test this with update_test_checks? This change is valuable for that script as well, and it doesn't have this extra issue.

hnrklssn updated this revision to Diff 525174.May 24 2023, 7:25 AM

Overhauled how update_cc_tests.py checks globals

It now:

  • no longer matches the literal names, like update_test_checks.py does unless --preserve-names (which doesn't seem to see much use, at least not upstream, so did not add option to update_cc_tests.py) is set
  • (along with update_test_checks.py) now adds checks for any definitions of globals referenced in the IR (by default, can still check all globals using "--check-globals all")
  • (along with update_test_checks.py) can also be configured to emit checks for any definitions of globals referenced by other globals referenced in the IR, using "--check-globals transitive"
  • (along with update_test_checks.py) filters out unstable metadata output, so that literal directory names and git hashes are not matched against
hnrklssn added inline comments.May 24 2023, 7:35 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

I couldn't find any uses of
--preserve-names in the entire repo (not even in tests for update_test_checks), so I went ahead and changed update_cc_test_checks to pass True instead.

While at it I added 2 new options to match some, but not necessarily all, globals. I set the least verbose one (other than "none") as the default, emitting checks for the definitions of all globals that we emit matches for in the IR. Do you agree that this is reasonable?

hnrklssn marked an inline comment as not done.May 24 2023, 7:35 AM
nikic added inline comments.May 24 2023, 8:36 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

That sounds like a nice approach. My main question would be whether --check-globals seen is the right default: My gut feeling here is that this will introduce additional checks for attributes/metadata in tests that don't actually care about it, but I'm not particularly sure about that.

You might want to do some mass test regeneration and take a look over the diff to see how useful the changes would be.

If we do change the default, the default change likely need to be part of --version 3 to avoid substantial test churn.

hnrklssn added inline comments.May 25 2023, 8:32 AM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

I've regenerated checks before and after my patch and compared them. Some findings:

  • there's a surprising number of custom metadata types that didn't get variable checkers but instead checked hardcoded metadata IDs (!sloc, !pcsections, !alias.scope, !noalias, !nonnull, !noundef, !dereferenceable, !nontemporal, !fpmath etc.). The tests often have checkers, but were removed when regenerating them before my patch, so something seems to have broken.
  • I'm split on whether global variables should match the exact name or get a matcher. Right now the patch handles them like any other global, but I'm assuming that the name actually carries significance for linking?
  • matching attributes seems useful in some cases, not matter too much in others, but often there aren't that many attributes so it does not pollute much. Tests where attribute matchers pollute should probably use --scrub-attributes in update_check_tests (does not exist in update_cc_check_tests) anyways
  • tests that include debug info generally do so deliberately, so matching against that seems reasonable
  • TBAA metadata doesn't give too much insight unless transitive/all. That said not all TBAA tests check metadata at all today (at least not llvm/test/Transforms/ArgumentPromotion/reserve-tbaa.ll)
  • Similar goes for !llvm.loop, the useful info is pointed to indirectly
  • Profile info doesn't seem to be included very often. It's probably reasonable to match against it when it's there. Found one test that was specifically created to check profile info metadata but lost the check during conversion to update_check_tests (llvm/test/Transforms/ArgumentPromotion/profile.ll)

Since the transitive setting provides substantially more insight than the seen setting for loop and tbaa, and only really explodes for debug info (which is rarely included, and deliberately so when done) I'm open to making it the default instead. For opt based tests I kind of think that all should be the default since any metadata in the output was either created by the tested passes, or explicitly included in the input.

nikic added inline comments.May 25 2023, 12:59 PM
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

I've regenerated checks before and after my patch and compared them. Some findings:

  • there's a surprising number of custom metadata types that didn't get variable checkers but instead checked hardcoded metadata IDs (!sloc, !pcsections, !alias.scope, !noalias, !nonnull, !noundef, !dereferenceable, !nontemporal, !fpmath etc.). The tests often have checkers, but were removed when regenerating them before my patch, so something seems to have broken.

Yes, which is why I'm happy to have the generic implementation here -- many of these were handled by manually adjusting the update_test_checks output...

  • I'm split on whether global variables should match the exact name or get a matcher. Right now the patch handles them like any other global, but I'm assuming that the name actually carries significance for linking?

I'm thinking we should probably retain the name of globals, the same way we do for functions. Unlike things like local variable names, and attribute/metadata IDs, these tend to be stable and semantically meaningful. (Also the wildcard generated for them breaks syntax highlighting in vim. Grumble, grumble.)

  • matching attributes seems useful in some cases, not matter too much in others, but often there aren't that many attributes so it does not pollute much. Tests where attribute matchers pollute should probably use --scrub-attributes in update_check_tests (does not exist in update_cc_check_tests) anyways
  • tests that include debug info generally do so deliberately, so matching against that seems reasonable
  • TBAA metadata doesn't give too much insight unless transitive/all. That said not all TBAA tests check metadata at all today (at least not llvm/test/Transforms/ArgumentPromotion/reserve-tbaa.ll)
  • Similar goes for !llvm.loop, the useful info is pointed to indirectly
  • Profile info doesn't seem to be included very often. It's probably reasonable to match against it when it's there. Found one test that was specifically created to check profile info metadata but lost the check during conversion to update_check_tests (llvm/test/Transforms/ArgumentPromotion/profile.ll)

Since the transitive setting provides substantially more insight than the seen setting for loop and tbaa, and only really explodes for debug info (which is rarely included, and deliberately so when done) I'm open to making it the default instead. For opt based tests I kind of think that all should be the default since any metadata in the output was either created by the tested passes, or explicitly included in the input.

I think there's broadly two cases:

  1. There are globals/attributes/metadata in the input, which are needed to make the test work, but not important for the test output, because they will just be preserved. E.g. an InstSimplify test might use globals/attributes/metadata, but won't ever change them, and checking that fact is not particularly useful.
  1. Tests for passes that do modify/add attributes/metadata, such as IPO inference passes. It's worth noting that there is a separate --check-attributes flag that covers some of these cases, which will match the attributes printed before the function definition, which is generally more convenient than matching the attributes at the end of the module.

The first the use case we generally do not want to match any globals, and for the latter we probably want either --check-attributes (if testing function attributes in particular) or --check-globals transitive (if testing something metadata or call-site attributes).

I'm still not really sure what the right default would be. I guess one could argue that for the former case, having a few extra attributes at the test isn't a big deal -- one can just ignore them. But that does also mean that whenever we for example change attributes on an intrinsic, this will now suddenly affect hundreds of tests that encode those attributes.

hnrklssn updated this revision to Diff 526606.May 30 2023, 7:36 AM

Revert checker behaviour for global variables to check hard-coded identifiers
Remove 'seen' mode
Replace 'transitive' mode with 'smart', where attributes are no longer checked as metadata
Introduce version bump to v3, and make 'smart' mode the default in v3

hnrklssn marked an inline comment as done.May 30 2023, 7:46 AM
hnrklssn added inline comments.
clang/test/utils/update_cc_test_checks/Inputs/annotations.c.expected
13

I replaced --checked-globals transitive with --checked-globals smart, with the addition that it doesn't check attributes, and made it the default mode in the new version 3. If attributes need to be checked --check-attributes can be used, or --check-globals all.

Emitting checks for metadata that hasn't been changed by the passes when testing with opt may result in some extra noise in the test, but is unlikely to result in significant extra work to maintain, since if the metadata format for that metadata kind changes, the metadata input needs to change also. For test cases where it adds significant noise --check-globals none is always an option.

Also reverted the behaviour for global variables to check the exact identifier again.

hnrklssn marked an inline comment as done.Jun 7 2023, 6:22 AM

@nikic Did you have a look at the new patch?

Could you please rebase the patch to current main? I wasn't able to apply it.

The behavior you describe sounds reasonable to me.

llvm/utils/UpdateTestChecks/common.py
1710

We should also not print the all argument for --check-globals argument for version < 3, otherwise that will introduce a spurious change in all such tests.

could you put a little more information in the commit message please. "It won't do X when we do Y", could mean a lot of things. We don't do Y anymore, or we do X' now, with various choices for X'.

hnrklssn updated this revision to Diff 532649.Jun 19 2023, 7:39 AM

Rebase after reformatting with Black, update commit message

could you put a little more information in the commit message please. "It won't do X when we do Y", could mean a lot of things. We don't do Y anymore, or we do X' now, with various choices for X'.

I've updated the commit message to clarify what it means to not emit hardcoded identifiers, and that we instead emit regex checkers. Was that what you were aiming for, or are there other parts that you would like me to clarify?

llvm/utils/UpdateTestChecks/common.py
1710

I don't know how to dynamically change the --check-globals option between store_true and choices, since the behaviour can itself be affected by which argument is passed to the --version option. So the way I see it there's no way of knowing how to parse between two different option types ahead of time. For the default when no option is specified the parsing is the same however, so we can simply infer later what option is implied based on the version.

nikic added inline comments.Jun 19 2023, 8:27 AM
llvm/utils/UpdateTestChecks/common.py
1710

It's not necessary to change the option, just how it is printed. I.e. add something like this:

if args.version < 3 and value == "all":
    # Don't include argument value in older versions.
    autogenerated_note_args += "--check-globals "
    continue
nikic added inline comments.Jun 19 2023, 8:53 AM
llvm/utils/UpdateTestChecks/common.py
1710

Ah, I get what you're saying now -- old UTC_ARGS are not accepted at all anymore, and trying to update existing tests will just fail with an error!

We do need old UTC_ARGS to work. I think you can make the argument value optional using these options:

nargs="?", 
const="default",
default="default",
choices=["none", "smart", "all", "default"],

Or possibly omit const/default and handle None instead.

nikic added inline comments.Jun 19 2023, 9:02 AM
llvm/utils/UpdateTestChecks/common.py
1710

Correction, I think the right options are:

nargs="?",
const="all",
default="default",
choices=["none", "smart", "all"],

That is, use "default" is it's not specified at all, use "all" if only --check-globals is passed and also accept --check-globals none|smart|all.

hnrklssn updated this revision to Diff 532719.Jun 19 2023, 11:27 AM

Keep supporting --check-globals without explicit level as meaning '--check-globals all'

hnrklssn marked an inline comment as done.Jun 19 2023, 11:29 AM
hnrklssn added inline comments.
llvm/utils/UpdateTestChecks/common.py
1710

Ah nice, I did not expect the library to support this use case!

nikic added a comment.Jun 20 2023, 1:39 AM

There are some test failures.

I believe there is one bug with the handling of unnamed globals. Previously we produced this:

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals --version 2
; RUN: opt -S < %s | FileCheck %s
@0 = global i32 0

;.
; CHECK: @[[GLOB0:[0-9]+]] = global i32 0
;.
define i32 @test() {
; CHECK-LABEL: define i32 @test() {
; CHECK-NEXT:    [[V:%.*]] = load i32, ptr @[[GLOB0]], align 4
; CHECK-NEXT:    ret i32 [[V]]
;
  %v = load i32, ptr @0
  ret i32 %v
}

And now we instead check CHECK: @0 = global i32 0, so there is a mismatch between definition and use. I believe it does make sense to keep the wildcard names for unnamed globals.

On the same test case, if I keep rerunning update_test_checks on the same file, it keeps adding extra CHECK: @0 = global i32 0 lines at the start (before the first ;.. That didn't happen previously.

clang/test/utils/update_cc_test_checks/Inputs/annotations.c
2

It would be more typical to pass --version 3 when calling update_cc_test_checks. (Note that this is only a quirk of the test setup. Outside tests the --version 3 is implied for new tests.)

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.funcattrs.expected
9 ↗(On Diff #532719)

Huh, where does this change come from?

llvm/utils/UpdateTestChecks/common.py
1016

Does this still comply with the formatter?

hnrklssn updated this revision to Diff 532868.Jun 20 2023, 5:02 AM
hnrklssn marked 4 inline comments as done.

Added back regex matchers for anonymous global values

There are some test failures.

I believe there is one bug with the handling of unnamed globals. Previously we produced this:

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals --version 2
; RUN: opt -S < %s | FileCheck %s
@0 = global i32 0

;.
; CHECK: @[[GLOB0:[0-9]+]] = global i32 0
;.
define i32 @test() {
; CHECK-LABEL: define i32 @test() {
; CHECK-NEXT:    [[V:%.*]] = load i32, ptr @[[GLOB0]], align 4
; CHECK-NEXT:    ret i32 [[V]]
;
  %v = load i32, ptr @0
  ret i32 %v
}

And now we instead check CHECK: @0 = global i32 0, so there is a mismatch between definition and use. I believe it does make sense to keep the wildcard names for unnamed globals.

On the same test case, if I keep rerunning update_test_checks on the same file, it keeps adding extra CHECK: @0 = global i32 0 lines at the start (before the first ;.. That didn't happen previously.

Fixed. Does this test case exist somewhere? I couldn't find it upstream.

llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/check_attrs.ll.funcattrs.expected
9 ↗(On Diff #532719)

I thought it was an existing test failure upstream, but it seems to have disappeared after doing a full rebuild clang+llvm.

llvm/utils/UpdateTestChecks/common.py
1016

No, I guess maintaining this structure isn't worth it long term with the autoformatter

@nikic Are you happy with the current patchset?

nikic accepted this revision.Jul 4 2023, 7:44 AM

LGTM

Fixed. Does this test case exist somewhere? I couldn't find it upstream.

That wasn't an existing test case. Though if you add a --check-globals all variant to generated-funcs.c, I believe that would cover that case.

This revision is now accepted and ready to land.Jul 4 2023, 7:44 AM
hnrklssn updated this revision to Diff 537286.Jul 5 2023, 3:26 AM

Add unnamed global test case

Added cases in generated-funcs.c with --check-globals all, testing that named and unnamed globals are matched differently.

nikic added a comment.Jul 5 2023, 3:33 AM

(The patch description could use an update...)

hnrklssn updated this revision to Diff 537299.Jul 5 2023, 4:55 AM

Update test cases after rebasing on ToT

hnrklssn retitled this revision from Add support for annotations in UpdateTestChecks (NFC) to [UTC] Add fallback support for specific metadata, and check their defs.Jul 5 2023, 4:56 AM
hnrklssn edited the summary of this revision. (Show Details)
jdoerfert closed this revision.Jul 7 2023, 11:57 AM

When you commit things, can you please add the Differential Revision into the commit message so it is closed here and we can easily find the review for a change.

Committed as 8a3fdf7b908978625e9a7e57fbb443e4e6f98976.

This revision is now accepted and ready to land.Jul 7 2023, 12:08 PM
jdoerfert requested changes to this revision.Jul 10 2023, 12:22 PM

I might pull this later today until a fix is available. It makes updates of files with globals super cumbersome (you need to manually remove the duplicates).

This revision now requires changes to proceed.Jul 10 2023, 12:22 PM

I might pull this later today until a fix is available. It makes updates of files with globals super cumbersome (you need to manually remove the duplicates).

FWIW, the duplication happens if you pass -p already. It's an existing problem made worse by effectively forcing -p for globals.