This patch adds opportunity to describe some patterns as named template with or without parameters by using new directive CHECK-PATTERN.
Details
Diff Detail
Event Timeline
I'll glad if there will be some suggestions how make syntax better. The main idea of this syntax was to use characters with \ for avoiding escaping these characters in standart patterns.
This looks very useful!
My first round of comments do not really address the correctness of this patch (although I do think I found one correctness issue). Please clang-format the diff and upload a diff with more context (e.g git diff -U1000 ...). I can take a closer look once these issues are fixed up :).
| test/FileCheck/check-pattern.txt | ||
|---|---|---|
| 12 | What are the exact semantics of "\:", "\#", and "\=" in the CHECK-PATTERN and CHECK contexts? Please document this and add the documentation to this patch. I think I have "\#" figured out, but "\:" is confusing (does it make a captured word available via the specified label?). I can't really work out exactly what "\=" is doing -- I'd rather have a spec before picking through the patch :). | |
| 13 | Could you add a test that shows this behaves properly when different prefixes are in use? E.g; // RUN: FileCheck -input-file %s %s -check-prefix=FOO
// RUN: FileCheck -input-file %s %s -check-prefix=BAR
// FOO-PATTERN thing: foo-thing
// BAR-PATTERN thing: bar-thing
foo-thing
bar-thing
// FOO: {{\:(thing)}}
// BAR: {{\:(thing)}} | |
| utils/FileCheck/FileCheck.cpp | ||
| 136 | Needs a doxygen comment. | |
| 220 | I suggest calling this "getNumParameters", and dropping the redundant "\return" doxygen line. | |
| 241 | Typo, "Parameter". The same typo occurs elsewhere in the diff. | |
| 247 | Use emplace. | |
| 255 | Return a StringRef or a const std::string &. | |
| 260 | Use llvm::SmallSet. | |
| 261 | I think braces around single-statement loops are usually dropped. | |
| 286 | Instead of repeatedly re-writing Regex, I suggest creating a SmallVector of expression components and then doing llvm::join(Components). | |
| 290 | This looks like you're setting yourself up for a use-after-free. This should return a std::string. | |
| 295 | Avoid static class members. | |
| 311 | Avoid globals. Also, use a llvm::StringMap<PatternTemplate*> here. | |
| 326 | NULL is discouraged, use nullptr. Also, this does 2 table lookups but only needs to do one. E.g something like; const auto PatternIT = Templates.find(TemplateName); if (PatternIT == Templates.end()) return nullptr; return *PatternIT; | |
| 823 | Could you elaborate? | |
| test/FileCheck/check-pattern.txt | ||
|---|---|---|
| 13 | Thank you for this test! There was no problem with prefixes, but was problem with fixed strings instead of patterns. I fixed it. | |
| utils/FileCheck/FileCheck.cpp | ||
|---|---|---|
| 823 | This means that there may be used described before template as variable. | |
A couple of general remarks about the proposed syntax:
Using CHECK-lines to define meta-variables without checking anything diverges from FileCheck's usual behavior. I find the name CHECK-PATTERN (I know that CHECK can be renamed, but still) particularly misleading and confusing. I would expect something more like CHECK-DEFINE-PATTERN or even passing the patterns to FileCheck on the command line instead.
The syntax of \#, #:, \= is different from what we usually have in FileCheck where we usually use double-brackets "[[" to escape. Why is it so different? Is this borrowed from some other language that users can be expected to have some familiarity with?
| docs/CommandGuide/FileCheck.rst | ||
|---|---|---|
| 529 | I don't understand this example. Is it really equivalent to the next example as claimed in the text, or does it have side-effects? If so, what are the side-effects? | |
More mostly surface-level comments. I think Adrian and I both have concerns over the syntax, so I'll wait a bit before diving into the code.
| docs/CommandGuide/FileCheck.rst | ||
|---|---|---|
| 522 | mane -> many | |
| 529 | Is the extra space in \#( reg_hex_num) necessary? I don't want to be too picky / focused on minutiae, but I think this could legitimately be a point of confusion. | |
| 529 | Have you considered changing the syntax for parameter substitution? IMO \:(regName)\=(reg) has two problems: (1) it uses two separate syntactic constructs to do one thing, and (2) it overloads \:, making its meaning context-dependent. I suggest something like this: \=(regName:reg). | |
| 540 | Specify whether or not this can occur in a CHECK-PATTERN line. | |
| test/FileCheck/check-pattern-prefixes.txt | ||
| 3 | Check for conflicting parameter definitions: // RUN: not FileCheck -input-file %s %s -check-prefixes=FOO,BAR | |
| utils/FileCheck/FileCheck.cpp | ||
| 137 | Should return a StringRef. If the caller needs a mutable copy, it can then do: getRegExStr().str(). | |
| 140 | Ditto, return a StringRef (avoids a copy). | |
| 194 | Since there is only one in-out parameter here, I think the API would be clearer as: Expected<std::string> ReplaceTemplates(StringRef Str, SourceMgr &SM) (see llvm/Support/Error.h) | |
| 203 | Use llvm/ADT/IndexedMap.h. | |
| 206 | StringRef Name | |
| 220 | Could you drop the \return doxygen line here? I don't think it adds very much. | |
| 246 | Can drop StringReg(...) if getRegExStr() returns a StringRef. | |
| 307 | Use std::unique_ptr<PatternTemplate>. | |
| 315 | Can you find a way to instantiate a TemplatesCollection earlier, and to then pass it by reference into methods/classes which need it? I don't think a singleton approach is much better than having a global. | |
| 321 | StringRef's are by nature already references. There's no need to pass them by-reference, just copy them. | |
| 1015 | Use llvm::make_unique<PatternTemplate>. | |
Execuse me, but I didn't understand what do you mean. FileCheck variables are defined in check strings too. And about "without checking": there is checking of earlier definition of this pattern. What do you want to check more?
I would expect something more like CHECK-DEFINE-PATTERN
There is no problem to rename.
or even passing the patterns to FileCheck on the command line instead.
There may be a lot of patterns. Do you mean passing file with patterns?
The syntax of \#, #:, \= is different from what we usually have in FileCheck where we usually use double-brackets "[[" to escape. Why is it so different? Is this borrowed from some other language that users can be expected to have some familiarity with?
This syntax was based on simple regular expressions where \ is escaping character. Syntax is a difficult question. Do you suggest using something like #pattern_name, [[:variable_name]], [[=variable_value]]?
I don't know how users will be better so I would like to know more opinions.
| docs/CommandGuide/FileCheck.rst | ||
|---|---|---|
| 529 | Ok. I'll change syntax when there is one decision, because there are more suggestions about changing it. | |
| utils/FileCheck/FileCheck.cpp | ||
| 203 | "The index map template takes two types. The first is the mapped type and the second is a functor that maps its argument to a size_t.". | |
| docs/CommandGuide/FileCheck.rst | ||
|---|---|---|
| 529 | There was extra space. Without it, test is equivalen. | |
| test/FileCheck/check-pattern-prefixes.txt | ||
| 3 | Why should it fail? There will be warning that template with same name exists. And pattern will be assosiated with foo-thing. | |
| utils/FileCheck/FileCheck.cpp | ||
| 315 | If I do so a lot of functions will get one more parameter although many functions will only pass this parameter further. Most of these functions get quite big number of parameters now. | |
About syntax.
There is an idea to do as you advised.
Suggested new syntax:
[['#template_name]] - use of template 'template_name'. It can occur in CHECK-PATTERN line, when description of one template includes other templates described before. (Without quote, I don't know how escape # here)
[[:Variable_name]] - template variable with name 'variable_name'
[[:variable_name=value]] - current value of template variable(it's needed when you use template with variables).
Is this syntax better? Should I do such way?
Let me start off by saying thanks for working on this!
For reference, here is what I remember the syntax of FileCheck CHECK-lines to be currently:
ACTION <- CHECK ':' MATCH '\n' ;
MATCH <- ;
MATCH <- TEXT MATCH ;
MATCH <- REGEX MATCH ;
MATCH <- VAR MATCH ;
REGEX <- '{{' POSIX_REGEX '}}' ;
VAR <- '[[' IDENT ':' POSIX_REGEX ']]' ;
VAR <- '[[' IDENT ']]' ;Is there a benefit in (or need for) distinguishing between variable uses and pattern uses, or could we piggyback the existing syntax?
As for pattern variables, I really think this review would benefit if we could split out pattern variables into a separate review. I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck? If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?
There is difference between patterns and variables so we should understand where variable is, where pattern is.
As for pattern variables, I really think this review would benefit if we could split out pattern variables into a separate review. I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck? If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?
I a bit misuderstand how can I split out pattern variables into a separate review, because this review need all these changes. We can't use pattern variable without patterns, so separate review will duplicate part of changes of pattern review.
Pattren variables supposed to help create one pattern when you have a lot of similar checks.
For example
CHECK-DEFINE-PATTERN reg: {{\:(RegName)0x[0-9a-fA-F]+}}
This is pattern for defining register, and name is parameter. Template is saved in such way.
Then you can use it.
 CHECK: {{\#(reg)\:(RegName)\=(REG)}}
When you write such check, FileCheck will replace it with template where variable name is changed to it's value. Variable name is some kind of named spaceholder.
The previous check-line will be equivalent to
 CHECK: {{REG0x[0-9a-fA-F]+}}
We should have difference between pattern variable and global variables because we can use global variable in pattern or as value of pattern variable.
They are also both placeholders for a pattern. Can you give an example where it is useful to tell the difference between a use of a pattern and a use of a plain old variable? Does it make the test more readable?
I'm not saying that we have to share the syntax, but adding more syntax increases the cognitive load on testcase authors and we should make sure we are doing it for the right reasons.
As for pattern variables, I really think this review would benefit if we could split out pattern variables into a separate review.
I a bit misuderstand how can I split out pattern variables into a separate review, because this review need all these changes.
If you feel like splitting out the pattern variables into a separate review ins't feasible that's fine with me.
Could you please answer the following questions specifically?
I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck? If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?
Pattren variables supposed to help create one pattern when you have a lot of similar checks.
For exampleCHECK-DEFINE-PATTERN reg: {{\:(RegName)0x[0-9a-fA-F]+}}
This is pattern for defining register, and name is parameter. Template is saved in such way.
Then you can use it.
// CHECK: {{\#(reg)\:(RegName)\=(REG)}}
When you write such check, FileCheck will replace it with template where variable name is changed to it's value. Variable name is some kind of named spaceholder.
In the documentation the example showed a pattern use on its own, here it appears inside of a regexp. To make sure I understand the proposal correctly :-) it might help if you could add your new syntax to the grammar I posted earlier.
thanks!
adrian
Firstly, variables should be defined in some check-line. So we can't create variables separately. In case of patterns you can create a lot of frequently used patterns once (for example in separate file and then include this file where you need (see patch https://reviews.llvm.org/D22500)). Moreover pattern can have their own variables.
I didn't yet understand how pattern variables are supposed to work: Is \:(variable)\=(value) supposed to be an argument to a pattern use that immediately precedes it? Or is it modifying global state similar to regular variable definition in FileCheck?
Variables are arguments. I don't understand what the global state do you mean.
If the latter is the case, why does it come after the pattern in the example? In the former case, wouldn't it be better to have syntax that looks more like passing in an argument?
Do you mean something like \#pattern(a, b)? Or what? There is no order of variables during defining it. One variable can be used in pattern several times in different places. I just need variable name to find place in pattern and its value.
In the documentation the example showed a pattern use on its own, here it appears inside of a regexp. To make sure I understand the proposal correctly :-)
Yes, I have a mistake in documentation. Now pattern can be used only in regular expressions. But if we change syntax to use [[ ]], I'll can add possible use of them in fixed strings in line-checks.
it might help if you could add your new syntax to the grammar I posted earlier.
Syntax as is now.
ACTION <- CHECK-DEFINE-PATTERN IDENT ':' MATCH '\n' ;
ACTION <- CHECK-DEFINE-PATTERN IDENT ':' REGEX_DEF '\n' ;
PATTERN_VAR <-;
PATTERN_VAR <-' \:('IDENT')';
REGEX_VARS <- POSIX_REGEX;
REGEX_VARS <- REGEX_VARS PATTERN_VAR REGEX_VARS;
REGEX_DEF <- '{{' REGEX_VARS '}}';
VAR_VALUE <-;
VAR_VALUE <- '\:('IDENT')\=('TEXT')' VAR_VALUE;
PATTERN_USE <- POSIX_REGEX;
PATTERN_USE <- POSIX_REGEX '\#('IDENT')'VAR_VALUE PATTERN_USE;
REGEX <- '{{' PATTERN_USE '}}';Change syntax to new one suggested by Vedant Kumar.
ACTION <- CHECK ':' MATCH '\n' ;
ACTION <- CHECK-DEFINE-PATTERN ':' IDENT PARAMLIST? ':' PATTERN_ELEMENT* '\n' ;
PARAMLIST <- '(' IDENT (',' IDENT)* ')' ;
PATTERN_ELEMENT <- IDENT | REGEX;
MATCH <- (TEXT | REGEX | PATTERN_USE | VAR)* ;
REGEX <- '{{' POSIX_REGEX '}}' ;
PATTERN_USE <- '[[' '@' IDENT  ARGLIST? ']]' ;
VAR <- '[[' IDENT ':' POSIX_REGEX ']]' ;
VAR <- '[[' IDENT '@' IDENT ARGLIST? ']]' ;
ARGLIST <- '(' ARG (',' ARG)* ')' ;
ARG <- "([^"]|\\")*" ;I've added a few comments inline which I hope will be useful. However, after looking this patch over a few times, I've found that it's too large for me to review constructively.
The main issue is that the grammar you've implemented doesn't correspond to the one we discussed. E.g your PATTERN_ELEMENT includes PATTERN_USE, and your ARG is not handled correctly. I don't want to say "no" to new features, but I would much rather have them land incrementally, with more testing for edge-cases and more documentation. Can you scale back this patch to just cover these new bits of the grammar?
ACTION <- CHECK-DEFINE-PATTERN ':' IDENT ':' PATTERN_ELEMENT* '\n' ; PATTERN_ELEMENT <- IDENT | REGEX; MATCH <- PATTERN_USE ; PATTERN_USE <- '[[' '@' IDENT ']]' ; VAR <- '[[' IDENT '@' IDENT ']]' ;
| docs/CommandGuide/FileCheck.rst | ||
|---|---|---|
| 488 | Please use the same formatting rules as the rest of this document. Can you trim the twiddles ("~~~") to match the section name's length and line-wrap text that isn't in a code block? | |
| 495 | Please summarize the new syntax in the paragraph above, and use small examples here instead. | |
| test/FileCheck/check-pattern.txt | ||
| 25 | Add tests for: 
 | |
Are quotes necessary for parameters? Everything is string in FileCheck. 
Using patterns and variables was in old grammer. I thought that we only change syntax, so I added this feature too.
Ok, I'll remove this from this patch. But then I want to add using patterns and variables in definitions later. On my opinion these features are quite useful.
Yes, the quotes are necessary. Without them the grammar is ambiguous. Consider the string '@foo(a, b)'. Is there just one argument to foo (the string "a, b"), or are there two arguments ("a" and "b")?
Using patterns and variables was in old grammer. I thought that we only change syntax, so I added this feature too.
Ok, I'll remove this from this patch. But then I want to add using patterns and variables in definitions later. On my opinion these features are quite useful.
Sounds great :).
| test/FileCheck/check-pattern.txt | ||
|---|---|---|
| 25 | 
 I don't understand what do you mean. How can I do this? CHECK: [[Number: [[@LINE]]]]? | |
Patch without modifiers for pattern parameters and using pattern and variables inside definition of other patterns.
Please use the same formatting rules as the rest of this document. Can you trim the twiddles ("~~~") to match the section name's length and line-wrap text that isn't in a code block?