Page MenuHomePhabricator

[UpdateTestChecks] Match unnamed values like "@[0-9]+" and "![0-9]+"

Authored by jdoerfert on Aug 2 2020, 1:21 PM.



With this patch we will match most *uses* of "temporary" named things in
the IR via regular expressions, not their name at creation time. The new
"values" we match are:

  • "unnamed" globals: @[0-9]+
  • debug metadata: !dbg ![0-9]+
  • loop metadata: !loop ![0-9]+
  • tbaa metadata: !tbaa ![0-9]+
  • range metadata: !range ![0-9]+
  • generic metadata: metadata ![0-9]+
  • attributes groups: #[0-9]

We still don't match the declarations but that can be done later. This
patch can introduce churn when existing check lines contain the old
hardcoded versions of the above "values". We can add a flag to opt-out,
or opt-in, if necessary.

Diff Detail

Event Timeline

jdoerfert created this revision.Aug 2 2020, 1:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2020, 1:21 PM
Herald added subscribers: bollu, kosarev. · View Herald Transcript
jdoerfert requested review of this revision.Aug 2 2020, 1:21 PM
jdoerfert edited the summary of this revision. (Show Details)Aug 2 2020, 1:24 PM

Will update the clang tests shortly.


Maybe I add matchers for these next.

lgtm, apart from clang tests. Someone else should take a look as well, though.


typo: unnamed.


typo: group

MaskRay added inline comments.Aug 3 2020, 9:38 PM

Is this a capturing group? If not, use `(?:\s+)

The 3 in the range(3, match.lastindex): below looks strange. I hope we can get rid if it.


Use non-capturing group


Do you have an example that not substituting repeatedly can fail?

jdoerfert added inline comments.Aug 4 2020, 5:18 AM

It is, we use in 386 for example. I did not change that part.

The 3 is a bit magic, I agree. It basically is to skip this group, the outer group of all "IR values", which I could avoid and the entire match. We have to have some hardcoded values there though, like before. Though I can make it a variable to explain better.


We use this in 386 and other places.


Line 8 in llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/various_ir_values.ll.funcsig.expected :)

MaskRay added inline comments.Aug 4 2020, 2:37 PM

[[ATTRIBUTES0:#.*]] is very long. I wonder whether we should abbreviate it to ATTR0 or the like.


Instead of Structure of arrays, would it be better organizing these constants as Array of structures?

For example, you can write ['TMP', '%', r'[\w.-]+?'], ['GLOBAL', '@', ...], ...


Tuples should be better: [('TMP', '%', r'[\w.-]+?'), ('GLOBAL', '@', ...), ...]

jdoerfert added inline comments.Aug 4 2020, 2:48 PM

I thought about it afterwards as well. Will make it a class so it's easier to see what is what.

arichardson added inline comments.Aug 6 2020, 2:52 AM

Since we are capturing the attribute number instead of using a #{{[0-9]+}} regex, should we be using a non-capturing [[ATTRIBUTES0]] after the first time?

I guess this would require something like global_vars_seen in addition to vars_seen

jdoerfert added inline comments.Aug 6 2020, 8:42 AM

Argh, good catch. Will fix this for sure.

jdoerfert updated this revision to Diff 284492.Aug 10 2020, 1:30 PM
jdoerfert marked 9 inline comments as done.

global_seen_vars, better grouping of inormation, addressed comments


I think here it actually made sense to capture the name again, it is a different prefix after all. However, I did introduce the global_vars_seen now changed the test to include two functions. All but %{{.*}} values are considered global and we will reuse the existing mapping from the same prefix. You can see tbaa is reused in the test.

jdoerfert updated this revision to Diff 284537.Aug 10 2020, 5:49 PM

Update clang tests

Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2020, 5:49 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jdoerfert edited the summary of this revision. (Show Details)Aug 10 2020, 5:52 PM

Mostly looks good.

nameless_values = [
    NamelessValue(r'TMP',  r'%',            r'[\w.-]+?'),
    NamelessValue(r'TMP',  r'%',            r'[\w.-]+?'),

Can the suffix use (?:...)?


This is indeed strange. If we don't have a good explanation, probably worth adding count=1

jdoerfert updated this revision to Diff 284543.Aug 10 2020, 6:27 PM
jdoerfert marked an inline comment as done.

added @MaskRays's tweaks

(I'm going to sneak in !range ! as well)


We use the captured value later, and I guess people want to keep the exact same "spaces". TBH, it won't make much difference to the logic anywhere (we'd still use match.lastindex). All it would do is it would make us "unify" the output, unsure if that is what we want.

MaskRay accepted this revision.Aug 10 2020, 9:37 PM

LGTM. I don't have further comments. Worth waiting for other opinions.

This revision is now accepted and ready to land.Aug 10 2020, 9:37 PM
thakis added a subscriber: thakis.Aug 12 2020, 9:40 AM
thakis added inline comments.

This is Py3.4+. Per we still support 2.7, which is why all my bots run that, which is why this breaks my bots :) (...and everyone still using Python 2.7 locally). Can you do this some other way? Maybe just surround the re with "^..$'?

jdoerfert added inline comments.Aug 12 2020, 3:52 PM

Addressed and restored actual functionality in 07448c550457d2afb1e7d69254a58ad44dece3d2