This is an archive of the discontinued LLVM Phabricator instance.

FileCheck Enhancement - pattern templates.
Needs RevisionPublic

Authored by eklepilkina on Jul 15 2016, 2:35 AM.

Details

Summary

This patch adds opportunity to describe some patterns as named template with or without parameters by using new directive CHECK-PATTERN.

Diff Detail

Event Timeline

eklepilkina retitled this revision from to FileCheck Enhancement - pattern templates..
eklepilkina updated this object.
eklepilkina changed the edit policy from "All Users" to "Administrators".
eklepilkina added a subscriber: vsk.

I'll glad if there will be some suggestions how make syntax better. The main idea of this syntax was to use characters with \ for avoiding escaping these characters in standart patterns.

vsk added a comment.Jul 15 2016, 10:42 AM

This looks very useful!

My first round of comments do not really address the correctness of this patch (although I do think I found one correctness issue). Please clang-format the diff and upload a diff with more context (e.g git diff -U1000 ...). I can take a closer look once these issues are fixed up :).

test/FileCheck/check-pattern.txt
11

What are the exact semantics of "\:", "\#", and "\=" in the CHECK-PATTERN and CHECK contexts? Please document this and add the documentation to this patch.

I think I have "\#" figured out, but "\:" is confusing (does it make a captured word available via the specified label?). I can't really work out exactly what "\=" is doing -- I'd rather have a spec before picking through the patch :).

12

Could you add a test that shows this behaves properly when different prefixes are in use? E.g;

// RUN: FileCheck -input-file %s %s -check-prefix=FOO
// RUN: FileCheck -input-file %s %s -check-prefix=BAR

// FOO-PATTERN thing: foo-thing
// BAR-PATTERN thing: bar-thing

foo-thing
bar-thing

// FOO: {{\:(thing)}}
// BAR: {{\:(thing)}}
utils/FileCheck/FileCheck.cpp
136

Needs a doxygen comment.

220

I suggest calling this "getNumParameters", and dropping the redundant "\return" doxygen line.

241

Typo, "Parameter". The same typo occurs elsewhere in the diff.

247

Use emplace.

255

Return a StringRef or a const std::string &.

260

Use llvm::SmallSet.

261

I think braces around single-statement loops are usually dropped.

286

Instead of repeatedly re-writing Regex, I suggest creating a SmallVector of expression components and then doing llvm::join(Components).

290

This looks like you're setting yourself up for a use-after-free. This should return a std::string.

295

Avoid static class members.

311

Avoid globals. Also, use a llvm::StringMap<PatternTemplate*> here.

326

NULL is discouraged, use nullptr.

Also, this does 2 table lookups but only needs to do one. E.g something like;

const auto PatternIT = Templates.find(TemplateName);
if (PatternIT == Templates.end())
  return nullptr;
return *PatternIT;
823

Could you elaborate?

eklepilkina marked 10 inline comments as done.Jul 19 2016, 5:15 AM
eklepilkina added inline comments.
test/FileCheck/check-pattern.txt
12

Thank you for this test! There was no problem with prefixes, but was problem with fixed strings instead of patterns. I fixed it.

eklepilkina marked 3 inline comments as done.Jul 19 2016, 6:09 AM
eklepilkina added inline comments.
utils/FileCheck/FileCheck.cpp
823

This means that there may be used described before template as variable.

eklepilkina marked 2 inline comments as done.
aprantl added inline comments.Jul 19 2016, 11:13 AM
docs/CommandGuide/FileCheck.rst
494 ↗(On Diff #64480)

The CHECK-PATTERN directive is only mentioned in examples but never defined properly.

542 ↗(On Diff #64480)

The new syntax should be introduced before the examples. This makes it easier to look up the syntax when you use the document as a reference.

A couple of general remarks about the proposed syntax:

Using CHECK-lines to define meta-variables without checking anything diverges from FileCheck's usual behavior. I find the name CHECK-PATTERN (I know that CHECK can be renamed, but still) particularly misleading and confusing. I would expect something more like CHECK-DEFINE-PATTERN or even passing the patterns to FileCheck on the command line instead.

The syntax of \#, #:, \= is different from what we usually have in FileCheck where we usually use double-brackets "[[" to escape. Why is it so different? Is this borrowed from some other language that users can be expected to have some familiarity with?

docs/CommandGuide/FileCheck.rst
529 ↗(On Diff #64480)

I don't understand this example. Is it really equivalent to the next example as claimed in the text, or does it have side-effects? If so, what are the side-effects?

vsk added a comment.Jul 19 2016, 11:46 AM

More mostly surface-level comments. I think Adrian and I both have concerns over the syntax, so I'll wait a bit before diving into the code.

docs/CommandGuide/FileCheck.rst
522 ↗(On Diff #64480)

mane -> many

529 ↗(On Diff #64480)

Is the extra space in \#( reg_hex_num) necessary? I don't want to be too picky / focused on minutiae, but I think this could legitimately be a point of confusion.

529 ↗(On Diff #64480)

Have you considered changing the syntax for parameter substitution? IMO \:(regName)\=(reg) has two problems: (1) it uses two separate syntactic constructs to do one thing, and (2) it overloads \:, making its meaning context-dependent. I suggest something like this: \=(regName:reg).

540 ↗(On Diff #64480)

Specify whether or not this can occur in a CHECK-PATTERN line.

test/FileCheck/check-pattern-prefixes.txt
3 ↗(On Diff #64480)

Check for conflicting parameter definitions:

// RUN: not FileCheck -input-file %s %s -check-prefixes=FOO,BAR

utils/FileCheck/FileCheck.cpp
137

Should return a StringRef. If the caller needs a mutable copy, it can then do: getRegExStr().str().

140

Ditto, return a StringRef (avoids a copy).

194

Since there is only one in-out parameter here, I think the API would be clearer as:

Expected<std::string> ReplaceTemplates(StringRef Str, SourceMgr &SM) (see llvm/Support/Error.h)

203

Use llvm/ADT/IndexedMap.h.

206

StringRef Name

220

Could you drop the \return doxygen line here? I don't think it adds very much.

246

Can drop StringReg(...) if getRegExStr() returns a StringRef.

307

Use std::unique_ptr<PatternTemplate>.

315

Can you find a way to instantiate a TemplatesCollection earlier, and to then pass it by reference into methods/classes which need it? I don't think a singleton approach is much better than having a global.

321

StringRef's are by nature already references. There's no need to pass them by-reference, just copy them.

1015

Use llvm::make_unique<PatternTemplate>.

Using CHECK-lines to define meta-variables without checking anything diverges from FileCheck's usual behavior.

Execuse me, but I didn't understand what do you mean. FileCheck variables are defined in check strings too. And about "without checking": there is checking of earlier definition of this pattern. What do you want to check more?

I would expect something more like CHECK-DEFINE-PATTERN

There is no problem to rename.

or even passing the patterns to FileCheck on the command line instead.

There may be a lot of patterns. Do you mean passing file with patterns?

The syntax of \#, #:, \= is different from what we usually have in FileCheck where we usually use double-brackets "[[" to escape. Why is it so different? Is this borrowed from some other language that users can be expected to have some familiarity with?

This syntax was based on simple regular expressions where \ is escaping character. Syntax is a difficult question. Do you suggest using something like #pattern_name, [[:variable_name]], [[=variable_value]]?
I don't know how users will be better so I would like to know more opinions.

eklepilkina marked 9 inline comments as done.Jul 20 2016, 1:28 AM
eklepilkina added inline comments.
docs/CommandGuide/FileCheck.rst
529 ↗(On Diff #64480)

Ok. I'll change syntax when there is one decision, because there are more suggestions about changing it.

utils/FileCheck/FileCheck.cpp
203

"The index map template takes two types. The first is the mapped type and the second is a functor that maps its argument to a size_t.".
I need to remember variable position in string. As I uderstood functor should have constructor without parameters, but if I write functor it should have field with position. I can't define position using only parameter name.

eklepilkina marked 5 inline comments as done.Jul 20 2016, 6:03 AM
eklepilkina added inline comments.
docs/CommandGuide/FileCheck.rst
529 ↗(On Diff #64480)

There was extra space. Without it, test is equivalen.

test/FileCheck/check-pattern-prefixes.txt
3 ↗(On Diff #64480)

Why should it fail? There will be warning that template with same name exists. And pattern will be assosiated with foo-thing.

utils/FileCheck/FileCheck.cpp
315

If I do so a lot of functions will get one more parameter although many functions will only pass this parameter further. Most of these functions get quite big number of parameters now.

About syntax.
There is an idea to do as you advised.
Suggested new syntax:
[['#template_name]] - use of template 'template_name'. It can occur in CHECK-PATTERN line, when description of one template includes other templates described before. (Without quote, I don't know how escape # here)
[[:Variable_name]] - template variable with name 'variable_name'
[[:variable_name=value]] - current value of template variable(it's needed when you use template with variables).
Is this syntax better? Should I do such way?

About syntax.
There is an idea to do as you advised.
Suggested new syntax:
[['#template_name]] - use of template 'template_name'. It can occur in CHECK-PATTERN line, when description of one template includes other templates described before. (Without quote, I don't know how escape # here)
[[:Variable_name]] - template variable with name 'variable_name'
[[:variable_name=value]] - current value of template variable(it's needed when you use template with variables).
Is this syntax better? Should I do such way?

Let me start off by saying thanks for working on this!

For reference, here is what I remember the syntax of FileCheck CHECK-lines to be currently:

ACTION <- CHECK ':' MATCH '\n' ;
MATCH <- ;
MATCH <- TEXT MATCH ;
MATCH <- REGEX MATCH ;
MATCH <- VAR MATCH ;
REGEX <- '{{' POSIX_REGEX '}}' ;
VAR <- '[[' IDENT ':' POSIX_REGEX ']]' ;
VAR <- '[[' IDENT ']]' ;

Is there a benefit in (or need for) distinguishing between variable uses and pattern uses, or could we piggyback the existing syntax?

As for pattern variables, I really think this review would benefit if we could split out pattern variables into a separate review. I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck? If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?

Is there a benefit in (or need for) distinguishing between variable uses and pattern uses, or could we piggyback the existing syntax?

There is difference between patterns and variables so we should understand where variable is, where pattern is.

As for pattern variables, I really think this review would benefit if we could split out pattern variables into a separate review. I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck? If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?

I a bit misuderstand how can I split out pattern variables into a separate review, because this review need all these changes. We can't use pattern variable without patterns, so separate review will duplicate part of changes of pattern review.
Pattren variables supposed to help create one pattern when you have a lot of similar checks.
For example

CHECK-DEFINE-PATTERN reg: {{\:(RegName)0x[0-9a-fA-F]+}}

This is pattern for defining register, and name is parameter. Template is saved in such way.
Then you can use it.
CHECK: {{\#(reg)\:(RegName)\=(REG)}}
When you write such check, FileCheck will replace it with template where variable name is changed to it's value. Variable name is some kind of named spaceholder.
The previous check-line will be equivalent to
CHECK: {{REG0x[0-9a-fA-F]+}}
We should have difference between pattern variable and global variables because we can use global variable in pattern or as value of pattern variable.

Is there a benefit in (or need for) distinguishing between variable uses and pattern uses, or could we piggyback the existing syntax?

There is difference between patterns and variables so we should understand where variable is, where pattern is.

They are also both placeholders for a pattern. Can you give an example where it is useful to tell the difference between a use of a pattern and a use of a plain old variable? Does it make the test more readable?
I'm not saying that we have to share the syntax, but adding more syntax increases the cognitive load on testcase authors and we should make sure we are doing it for the right reasons.

As for pattern variables, I really think this review would benefit if we could split out pattern variables into a separate review.

I a bit misuderstand how can I split out pattern variables into a separate review, because this review need all these changes.

If you feel like splitting out the pattern variables into a separate review ins't feasible that's fine with me.

Could you please answer the following questions specifically?

I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck? If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?

Pattren variables supposed to help create one pattern when you have a lot of similar checks.
For example

CHECK-DEFINE-PATTERN reg: {{\:(RegName)0x[0-9a-fA-F]+}}

This is pattern for defining register, and name is parameter. Template is saved in such way.
Then you can use it.
// CHECK: {{\#(reg)\:(RegName)\=(REG)}}
When you write such check, FileCheck will replace it with template where variable name is changed to it's value. Variable name is some kind of named spaceholder.

In the documentation the example showed a pattern use on its own, here it appears inside of a regexp. To make sure I understand the proposal correctly :-) it might help if you could add your new syntax to the grammar I posted earlier.

thanks!
adrian

They are also both placeholders for a pattern. Can you give an example where it is useful to tell the difference between a use of a pattern and a use of a plain old variable? Does it make the test more readable?
I'm not saying that we have to share the syntax, but adding more syntax increases the cognitive load on testcase authors and we should make sure we are doing it for the right reasons.

Firstly, variables should be defined in some check-line. So we can't create variables separately. In case of patterns you can create a lot of frequently used patterns once (for example in separate file and then include this file where you need (see patch https://reviews.llvm.org/D22500)). Moreover pattern can have their own variables.

I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck?

Variables are arguments. I don't understand what the global state do you mean.

If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?

Do you mean something like \#pattern(a, b)? Or what? There is no order of variables during defining it. One variable can be used in pattern several times in different places. I just need variable name to find place in pattern and its value.

In the documentation the example showed a pattern use on its own, here it appears inside of a regexp. To make sure I understand the proposal correctly :-)

Yes, I have a mistake in documentation. Now pattern can be used only in regular expressions. But if we change syntax to use [[ ]], I'll can add possible use of them in fixed strings in line-checks.

it might help if you could add your new syntax to the grammar I posted earlier.

Syntax as is now.

ACTION <- CHECK-DEFINE-PATTERN IDENT ':' MATCH '\n' ;
ACTION <- CHECK-DEFINE-PATTERN IDENT ':' REGEX_DEF '\n' ;
PATTERN_VAR <-;
PATTERN_VAR <-' \:('IDENT')';
REGEX_VARS <- POSIX_REGEX;
REGEX_VARS <- REGEX_VARS PATTERN_VAR REGEX_VARS;
REGEX_DEF <- '{{' REGEX_VARS '}}';
VAR_VALUE <-;
VAR_VALUE <- '\:('IDENT')\=('TEXT')' VAR_VALUE;
PATTERN_USE <- POSIX_REGEX;
PATTERN_USE <- POSIX_REGEX '\#('IDENT')'VAR_VALUE PATTERN_USE;
REGEX <- '{{' PATTERN_USE '}}';
eklepilkina updated this revision to Diff 70696.Sep 8 2016, 7:12 AM
eklepilkina edited reviewers, added: vsk, alexfh; removed: dblaikie.
eklepilkina removed a subscriber: vsk.

Change syntax to new one suggested by Vedant Kumar.

ACTION <- CHECK ':' MATCH '\n' ;

ACTION <- CHECK-DEFINE-PATTERN ':' IDENT PARAMLIST? ':' PATTERN_ELEMENT* '\n' ;
PARAMLIST <- '(' IDENT (',' IDENT)* ')' ;
PATTERN_ELEMENT <- IDENT | REGEX;
MATCH <- (TEXT | REGEX | PATTERN_USE | VAR)* ;
REGEX <- '{{' POSIX_REGEX '}}' ;
PATTERN_USE <- '[[' '@' IDENT  ARGLIST? ']]' ;
VAR <- '[[' IDENT ':' POSIX_REGEX ']]' ;
VAR <- '[[' IDENT '@' IDENT ARGLIST? ']]' ;
ARGLIST <- '(' ARG (',' ARG)* ')' ;
ARG <- "([^"]|\\")*" ;
vsk edited edge metadata.Sep 9 2016, 2:27 PM

I've added a few comments inline which I hope will be useful. However, after looking this patch over a few times, I've found that it's too large for me to review constructively.

The main issue is that the grammar you've implemented doesn't correspond to the one we discussed. E.g your PATTERN_ELEMENT includes PATTERN_USE, and your ARG is not handled correctly. I don't want to say "no" to new features, but I would much rather have them land incrementally, with more testing for edge-cases and more documentation. Can you scale back this patch to just cover these new bits of the grammar?

ACTION <- CHECK-DEFINE-PATTERN ':' IDENT ':' PATTERN_ELEMENT* '\n' ;
PATTERN_ELEMENT <- IDENT | REGEX;
MATCH <- PATTERN_USE ;
PATTERN_USE <- '[[' '@' IDENT ']]' ;
VAR <- '[[' IDENT '@' IDENT ']]' ;
docs/CommandGuide/FileCheck.rst
488 ↗(On Diff #70696)

Please use the same formatting rules as the rest of this document. Can you trim the twiddles ("~~~") to match the section name's length and line-wrap text that isn't in a code block?

495 ↗(On Diff #70696)

Please summarize the new syntax in the paragraph above, and use small examples here instead.

test/FileCheck/check-pattern.txt
25

Add tests for:

  1. All error messages you've introduced
  2. Allow -check-prefix=FOO, FOO-DEFINE-PATTERN
  3. Disallow BAR-DEFINE-PATTERN and FOO-DEFINE-PATTERN for the same definition name
  4. Capture '@LINE' in a variable
  5. Disallow using a pattern captured with a FOO-definition in a BAR check
  6. Verify CHECK-{NEXT, SAME, DAG} etc. work with captured patterns

Are quotes necessary for parameters? Everything is string in FileCheck.
Using patterns and variables was in old grammer. I thought that we only change syntax, so I added this feature too.
Ok, I'll remove this from this patch. But then I want to add using patterns and variables in definitions later. On my opinion these features are quite useful.

vsk added a comment.Sep 10 2016, 12:00 PM

Are quotes necessary for parameters? Everything is string in FileCheck.

Yes, the quotes are necessary. Without them the grammar is ambiguous. Consider the string '@foo(a, b)'. Is there just one argument to foo (the string "a, b"), or are there two arguments ("a" and "b")?

Using patterns and variables was in old grammer. I thought that we only change syntax, so I added this feature too.
Ok, I'll remove this from this patch. But then I want to add using patterns and variables in definitions later. On my opinion these features are quite useful.

Sounds great :).

eklepilkina marked an inline comment as done.Sep 15 2016, 1:47 AM
eklepilkina added inline comments.
test/FileCheck/check-pattern.txt
25

Capture '@LINE' in a variable

I don't understand what do you mean. How can I do this? CHECK: [[Number: [[@LINE]]]]?
Or do you mean text '@LINE'?

eklepilkina edited edge metadata.

Patch without modifiers for pattern parameters and using pattern and variables inside definition of other patterns.

alexfh requested changes to this revision.May 11 2017, 5:30 AM

IIUC, there was no consensus about usefulness of this change.

This revision now requires changes to proceed.May 11 2017, 5:30 AM