This is an archive of the discontinued LLVM Phabricator instance.

[FileCheck] Annotate input dump (1/7)
ClosedPublic

Authored by jdenny on Oct 8 2018, 2:25 PM.

Details

Summary

Extend FileCheck to dump its input annotated with FileCheck's
diagnostics: errors, good matches if -v, and additional information if
-vv. The goal is to make it easier to visualize FileCheck's matching
behavior when debugging.

Each patch in this series implements input annotations for a
particular category of FileCheck diagnostics. While the first few
patches alone are somewhat useful, the annotations become much more
useful as later patches implement annotations for -v and -vv
diagnostics, which show the matching behavior leading up to the error.

This first patch implements boilerplate plus input annotations for
error diagnostics reporting that no matches were found for a
directive. These annotations mark the search ranges of the failed
directives. Instead of using the usual ^~~, which is used by later
patches for good matches, these annotations use X~~ so that this
category of errors is visually distinct.

For example:

$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:

  - L:     labels line number L of the input file
  - T:L    labels the match result for a pattern of type T from line L of
           the check file
  - X~~    marks search range when no match is found
  - colors error

If you are not seeing color above or in input dumps, try: -color

$ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^Input file/,$p'
Input file: <stdin>
Check file: check1

-dump-input=help describes the format of the following dump.

Full input was:
<<<<<<
        1: ; abc def
        2: ; ghI jkl
next:3     X~~~~~~~~ error: no match found
>>>>>>

$ cat check1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl

$ cat input1
; abc def
; ghI jkl

Some additional details related to the boilerplate:

  • Enabling: The annotated input dump is enabled by -dump-input, which can also be set via the FILECHECK_OPTS environment variable. Accepted values are help, always, fail, or never. As shown above, help describes the format of the dump. always is helpful when you want to investigate a successful FileCheck run, perhaps for an unexpected pass. -dump-input-on-failure and FILECHECK_DUMP_INPUT_ON_FAILURE remain as a deprecated alias for -dump-input=fail.
  • Diagnostics: The usual diagnostics are not suppressed in this mode and are printed first. For brevity in the example above, I've omitted them using a sed command. Sometimes they're perfectly sufficient, and then they make debugging quicker than if you were forced to hunt through a dump of long input looking for the error. If you think they'll get in the way sometimes, keep in mind that it's pretty easy to grep for the start of the input dump, which is <<<.
  • Colored Annotations: The annotated input is colored if colors are enabled (enabling colors can be forced using -color). For example, errors are red. However, as in the above example, colors are not vital to reading the annotations.

I don't know how to test color in the output, so any hints here would
be appreciated.

Diff Detail

Event Timeline

jdenny created this revision.Oct 8 2018, 2:25 PM

I don't know how to test color in the output, so any hints here would be appreciated.

I think almost everyone reads FileCheck output through lit, which strips away all color =(

I am also somewhat uneasy about renaming the option, since it is already contained in some configurations. Could we accept both?

In the long term, I think we should introduce the LIT_DEBUG option, which would set both, so typing a long option name would not be necessary.

jdenny added a comment.Oct 8 2018, 3:13 PM

I don't know how to test color in the output, so any hints here would be appreciated.

I think almost everyone reads FileCheck output through lit, which strips away all color =(

When I'm really baffled by a FileCheck diagnostic, I find myself running the commands manually, so color is useful then. We could also consider teaching lit not to strip away color. In any case, the annotations are usable without color. Color just makes them easier to read.

But does anyone know of a good way to test colored output? My test cases here currently only exercise the dump without color.

I am also somewhat uneasy about renaming the option, since it is already contained in some configurations. Could we accept both?

I'm fine with that. To avoid questions of precedence, we could make it an error to specify both versions at the same time.

In the long term, I think we should introduce the LIT_DEBUG option, which would set both, so typing a long option name would not be necessary.

Sure. In this patch, I wasn't worried about the length of the option so much. It's just that the old name doesn't make sense for always mode.

I've added some nits inline.
Overall, this is a huge patch, and I still have trouble understanding all of what it is doing, so I think it should be broken up.

As to testing colors, I think that if the platform is fixed, just checking the color codes should be fine?

llvm/include/llvm/Support/FileCheck.h
131

IMO to be consistent, this parameter should also be a reference

llvm/lib/Support/FileCheck.cpp
1062

Five lines seem to be duplicated with the section above. Could that be extracted to a function? I also think that braces should not be avoided when "else" is present.

llvm/utils/FileCheck/FileCheck.cpp
136

Should this be an enum then?

183

with / changeColor block is duplicated many times, should be extracted into a function.

jdenny added a comment.Oct 8 2018, 9:21 PM

I've added some nits inline.

Thanks for reviewing.

Overall, this is a huge patch, and I still have trouble understanding all of what it is doing, so I think it should be broken up.

I'm happy to do so, but let's agree on the right way to break it up before I put in the effort. Here's my proposed patch sequence:

  1. Make changes to include/llvm/Support/FileCheck.h and lib/Support/FileCheck.cpp, which gather the list of diagnostics.
  2. Make changes to utils/FileCheck/FileCheck.cpp, which converts the list of diagnostics to annotations on the input.
  3. Adjust command-line option and env var.
  4. Expose colorsEnabled in WithColor.h.

On second thought, these are not logically distinct changes as none of them really has much purpose on its own. You might just consider reviewing files in the above order. Your call. Let me know.

As to testing colors, I think that if the platform is fixed, just checking the color codes should be fine?

You mean ANSI escape sequences? That would make for some rather unreadable expected output. There must be a better way. If only we had a tool that converts those sequences into something human-readable, like XML tags.

llvm/include/llvm/Support/FileCheck.h
131

It might be a nullptr.

llvm/lib/Support/FileCheck.cpp
1062

Five lines seem to be duplicated with the section above. Could that be extracted to a function?

Will do.

I also think that braces should not be avoided when "else" is present.

That's a new rule to me. It doesn't appear to be followed elsewhere in this file. Is it followed elsewhere in LLVM?

llvm/utils/FileCheck/FileCheck.cpp
136

Will do.

183

Will do.

jdenny updated this revision to Diff 168744.Oct 8 2018, 9:32 PM

Made most of the reviewer suggestions.

jdenny marked 4 inline comments as done.Oct 8 2018, 9:34 PM
jdenny added a comment.Oct 8 2018, 9:48 PM
  1. Make changes to include/llvm/Support/FileCheck.h and lib/Support/FileCheck.cpp, which gather the list of diagnostics.
  2. Make changes to utils/FileCheck/FileCheck.cpp, which converts the list of diagnostics to annotations on the input.
  3. Adjust command-line option and env var.
  4. Expose colorsEnabled in WithColor.h.

On second thought, these are not logically distinct changes as none of them really has much purpose on its own. You might just consider reviewing files in the above order. Your call. Let me know.

Actually, 3 and 4 are logically distinct. There just isn't much code associated with them, so I'm not sure splitting them off would address your concern. Still, let me know what would help you to review.

jdenny updated this revision to Diff 168864.Oct 9 2018, 2:14 PM
jdenny edited the summary of this revision. (Show Details)

Restore -dump-input-on-failure as requested, and clean up a little.

I decided it's easiest to keep -dump-input-on-failure as an independent option that does not include annotations. I marked it deprecated.

@jdenny Sorry I'm still struggling to understand what exactly are you doing.
You have one example in the revision description, and some examples in tests. The examples in tests are hard to read because they are also tested using FileCheck.

Could you add a few more examples in the revision description, and add a detailed explanation of what your feature is doing, and what is the exact semantics of added annotations?
(what exactly is "chk"? I guess short for "check"? And "sam" short for "same"? The semantics of ~, ^ and x also eludes me).

jdenny edited the summary of this revision. (Show Details)Oct 9 2018, 7:28 PM

@jdenny Sorry I'm still struggling to understand what exactly are you doing.

Sorry. It's obviously not as self-explanatory as I was hoping. I've probably been staring at it too long. Thanks for letting me know.

You have one example in the revision description, and some examples in tests. The examples in tests are hard to read because they are also tested using FileCheck.

Could you add a few more examples in the revision description, and add a detailed explanation of what your feature is doing, and what is the exact semantics of added annotations?
(what exactly is "chk"? I guess short for "check"? And "sam" short for "same"? The semantics of ~, ^ and x also eludes me).

I've extended the description as you suggest. Please let me know if it's still confusing.

OK that's somewhat more clear, but I'm still somewhat confused. Line by line:

$ FileCheck -v -dump-input=always checks1 < input1 |& sed -n '/^Key for/,$p'

I assume sed is there to suppress all output before the legend is printed?

  • Should legend be always printed? Wouldn't it make more sense to dump it only if requested?
Key for input dump annotations:

  - L:     labels input line number L

In general, from your legend it's hard to figure out what line refers to what.
Here I assume that this item refers to line numbers from the matched files,
but it takes some guessing and looking at the output.

- T:L    labels the only match result for a pattern of type T from line L
- T:L'N  labels the Nth match result for a pattern of type T from line L

I do not understand what N'th match is.

- ^~~    marks good match (requires -v)
- !~~    marks bad match

I could not understand what "bad match" is, could only get it from a more detailed description later

- X~~    marks search range when no match

"when no match is found" ?

- ?      marks fuzzy match when no match

I don't understand this line. Is it a best-effort match when no match is found? Where the question mark is situated then?

Detailed description of currently enabled markers:

Should the description of markers be always duplicated?

- ^~~    marks the final match for an expected pattern
- !~~    marks either:
         - the final match for an excluded pattern
         - the final but illegal match for an expected pattern

The explanation is not clear. What is "final"? It's better to clarify that excluded means supplied with "NOT".
It's not clear what "illegal" means here either.

- X~~    marks the search range for an unmatched expected pattern

Where is X located? Just at the start of the range?

- ?      marks a fuzzy match start for an otherwise unmatched pattern

What's the difference between X and a question mark?

Full input was:
<<<<<<
         1: ; abc def

Line numbering may become ambiguous with the input, especially the space after the colon.
Is line numbering required? Should there be a better separation?

chk:1         ^~~

It's confusing that (from my understanding) line numbers above refer to line numbers in the input document,
but line numbers here refer to line numbers in the file with FileCheck directives.

sam:2             ^~~

sam is not the best abbreviation for "SAME". Maybe spare another letter? Or use "sme" or something?

         2: ; ghI jkl
nxt:3'0     X~~~~~~~~

I don't understand the semantics here. What's '0?
Why X is below the semicolon? If it's always at the start of the line, should it be there at all?

nxt:3'1       ?

What is the purpose of this question mark? If we have already failed the search at this point because the previous pattern failed,
does it convey any information to put the question mark at the start of the line?
I don't understand what '1 means here either.

>>>>>>

$ cat checks1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl

$ cat input1
; abc def
; ghI jkl
george.karpenkov requested changes to this revision.Oct 10 2018, 10:56 AM
This revision now requires changes to proceed.Oct 10 2018, 10:56 AM

OK that's somewhat more clear, but I'm still somewhat confused. Line by line:

$ FileCheck -v -dump-input=always checks1 < input1 |& sed -n '/^Key for/,$p'

I assume sed is there to suppress all output before the legend is printed?

Yes. If you think all the usual diagnostics should be included here, I can remove the sed command.

  • Should legend be always printed? Wouldn't it make more sense to dump it only if requested?

I considered that. However, the output is already typically long given that we're dumping the input and probably have -v enabled. The legend doesn't add much more, so I don't see the harm in including it always. Why make life more difficult by forcing the user to try again with another option? It seems best to always put the information close to where the user needs it.

Key for input dump annotations:

  - L:     labels input line number L

In general, from your legend it's hard to figure out what line refers to what.
Here I assume that this item refers to line numbers from the matched files,
but it takes some guessing and looking at the output.

It says "input line", so I thought that would make it clear.

- T:L    labels the only match result for a pattern of type T from line L

Maybe it would be clearer if this were to say "checks line"?

  • T:L'N labels the Nth match result for a pattern of type T from line L
I do not understand what N'th match is.

It's an Nth match *result*. For example, one result for a pattern might be that it has no match in a particular search range. Another result might be the fuzzy match that FileCheck sometimes reports after that. For CHECK-DAG and -vv, you can also have a series of discarded matches.

Should it say "diagnostic" instead of "match result"?

- ^~~    marks good match (requires -v)
- !~~    marks bad match

I could not understand what "bad match" is, could only get it from a more detailed description later

Yes, I meant this as a high-level summary before you read all the details. I also thought it would be a nice reminder for someone who has read all this before. If you have suggestions on a better organization, please let me know.

- X~~    marks search range when no match

"when no match is found" ?

Just trying to be succinct. I can spell it out if that helps.

- ?      marks fuzzy match when no match

I don't understand this line. Is it a best-effort match when no match is found?

Yes. FileCheck already produces all these diagnostics. I'm just representing them as annotations.

Where the question mark is situated then?

The start of the fuzzy match. FileCheck currently doesn't report the full range of a fuzzy match, and I didn't try to change that.

Detailed description of currently enabled markers:

Should the description of markers be always duplicated?

I originally just had the detailed description but felt it was too intimidating, and I thought the short version before it helped to give a sense of the markers before the details. Again, I'm open to suggestions on how to improve it.

- ^~~    marks the final match for an expected pattern
- !~~    marks either:
         - the final match for an excluded pattern
         - the final but illegal match for an expected pattern

The explanation is not clear. What is "final"?

Not a discarded match, which sometimes happens for CHECK-DAG.

It's better to clarify that excluded means supplied with "NOT".

I was trying to keep it general in case we grow other directives with excluded patterns.

It's not clear what "illegal" means here either.

CHECK-NEXT and CHECK-SAME can match patterns and then complain they're illegal.

For those cases, I'll add examples of directives that are affected.

- X~~    marks the search range for an unmatched expected pattern

Where is X located? Just at the start of the range?

Yes. I thought it would be intuitive to LLVM developers (especially those who have read FileCheck's existing diagnostics, which already include ^~~) that X~~ and !~~ mark ranges like ^~~ but for different purposes.

- ?      marks a fuzzy match start for an otherwise unmatched pattern

What's the difference between X and a question mark?

Search range vs fuzzy match.

Full input was:
<<<<<<
         1: ; abc def

Line numbering may become ambiguous with the input, especially the space after the colon.

How? Every input line begins with a line number, colon, and space. It's so unambiguous, you can write a simple sed expression to extract just the input text from the annotated dump. Maybe you're saying I should mention the space in the legend?

Is line numbering required?

Without line numbering, you cannot always distinguish input lines and annotation lines. Moreover, the line numbers help when you see a diagnostic before the dump and want to search for the corresponding line.

Should there be a better separation?

I'm open to suggestions, but this seems to me like the most obvious way to do it.

chk:1         ^~~

It's confusing that (from my understanding) line numbers above refer to line numbers in the input document,
but line numbers here refer to line numbers in the file with FileCheck directives.

That's true throughout existing FileCheck diagnostics, and I don't know what to do about it. The user must remember that input and checks are (sometimes) in different files.

sam:2             ^~~

sam is not the best abbreviation for "SAME". Maybe spare another letter? Or use "sme" or something?

sme is fine by me.

         2: ; ghI jkl
nxt:3'0     X~~~~~~~~

I don't understand the semantics here. What's '0?

Diagnostic 0 for the CHECK-NEXT on line 3.

Why X is below the semicolon?

Because that's the start of the search range.

If it's always at the start of the line, should it be there at all?

Sometimes the search range doesn't start at the start of the line because the last match didn't end at the end of the previous line.

nxt:3'1       ?

What is the purpose of this question mark? If we have already failed the search at this point because the previous pattern failed,
does it convey any information to put the question mark at the start of the line?

It's not at the start of the line. It's at the start of the fuzzy match, which is later.

I don't understand what '1 means here either.

Diagnostic 1 for the CHECK-NEXT on line 3.

>>>>>>

$ cat checks1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl

$ cat input1
; abc def
; ghI jkl

Thanks for all your helpful comments. Please keep them coming, and I'll apply changes as we agree on them.

jdenny updated this revision to Diff 169320.Oct 11 2018, 3:34 PM
jdenny edited the summary of this revision. (Show Details)Oct 11 2018, 3:34 PM

Address reviewer concerns:

  • Add examples of match result types.
  • Document space that follows an input line number.
  • Clarify input file vs. check file for line numbers.
  • "when no match" -> "when no match is found".
  • Don't abbreviate directive types.

That's everything I know to do for now.

jdenny updated this revision to Diff 170220.Oct 19 2018, 11:16 AM
jdenny edited the summary of this revision. (Show Details)

Adjusted to take advantage of D53419, which extends WithColor.

I think that finishes separating out all logically distinct change sets. I'm willing to split up the patch more if it helps the review.

@jdenny

I'm willing to split up the patch more if it helps the review

Yes please, to be honest I can't still fully understand the feature description, let alone the code.
I have asked my colleague to take a glance, and his verdict was the same, so I'm not alone there.

You have a lot of new features in your FileCheck output; could you think of a way to separate them into simple, small, understandable chunks?

george.karpenkov requested changes to this revision.Oct 22 2018, 10:54 AM
This revision now requires changes to proceed.Oct 22 2018, 10:54 AM

@jdenny

I'm willing to split up the patch more if it helps the review

Yes please, to be honest I can't still fully understand the feature description, let alone the code.
I have asked my colleague to take a glance, and his verdict was the same, so I'm not alone there.

You have a lot of new features in your FileCheck output; could you think of a way to separate them into simple, small, understandable chunks?

OK, but please work with me to plan a way to break up the series that would actually prove helpful to you. Here are two proposals:

  1. Here's the plan I proposed in an earlier comment, but I've updated it for recent changes:
    1. Make changes to include/llvm/Support/FileCheck.h and lib/Support/FileCheck.cpp, which gather the list of diagnostics.
    2. Make changes to utils/FileCheck/FileCheck.cpp, which converts the list of diagnostics to annotations on the input. Include test and doc changes in this patch because this is where they become effective.

      Because you see FileCheck's *output* as many new features, I'm afraid all the parts you find confusing would all appear in the second patch, so I'm not sure this plan would prove helpful to you.
  1. This patch introduces input annotations for existing diagnostics. Those diagnostics are enumerated in MatchType in FileCheckDiag in llvm/include/llvm/Support/FileCheck.h. Would it be helpful to have each patch introduce annotations for one MatchType member? The first patch would likely be the largest as it would introduce boilerplate.

Please let me know which of these proposals you prefer, or propose something else.

Thanks for your help.

  1. This patch introduces input annotations for existing diagnostics. Those diagnostics are enumerated in MatchType in FileCheckDiag in llvm/include/llvm/Support/FileCheck.h. Would it be helpful to have each patch introduce annotations for one MatchType member? The first patch would likely be the largest as it would introduce boilerplate.

Unless someone (immediately) says it won't be helpful, I'm going to put some effort into this approach at splitting up the patch.

jdenny updated this revision to Diff 171782.Oct 30 2018, 1:56 PM
jdenny retitled this revision from [FileCheck] Annotate input dump to [FileCheck] Annotate input dump (1/7).
jdenny edited the summary of this revision. (Show Details)
  • Split up the patch. This is now the first in the series. Each patch has a description and example of the diagnostics for which it implements input annotations.
  • Add the input file name and check file name to the annotations description to further address a reviewer concern.
  • Don't use the word "key" to identify the annotations description because technically that's not what it is.
jdenny edited the summary of this revision. (Show Details)Oct 30 2018, 1:57 PM
jdenny set the repository for this revision to rL LLVM.Oct 30 2018, 2:05 PM
zturner added inline comments.
llvm/docs/CommandGuide/FileCheck.rst
83–85

I haven't been following this too closely, but I'm wondering, are the 3 modes actually necessary? It sounds like the main use case here is that the user has a failure and wants to get more info. So they will set it to either fail or always. But do they care which? Basically, what I'm wondering is why not just make this be a binary on flag?

It just seems simpler to say that if you want to dump the input, pass --dump-input, and if you don't want to dump the input, pass nothing.

llvm/test/FileCheck/match-full-lines.txt
4–7 ↗(On Diff #171782)

You could also achieve this without the command line flag by writing env FILECHECK_DUMP_INPUT= not FileCheck .... Pretty sure you can unset a variable and run a command this way.

jdenny added inline comments.Oct 30 2018, 3:11 PM
llvm/docs/CommandGuide/FileCheck.rst
83–85

-dump-input=always is helpful when you're debugging individual tests and encounter FileCheck successes you don't understand.

I was thinking -dump-input=fail (via an env var) is better when running full test suites, perhaps from a bot or IDE. Imagine a test with many successful FileCheck commands before the failed one. -dump-input=always might produce massive output that will be logged and that must be scrolled/grepped through to find the failure, depending on how you like to interact with test suite logs.

In any case, -dump-input=fail was inspired by the existing -dump-input-on-failure, which I believe @george.karpenkov said is used in bots. Is that right, George?

llvm/test/FileCheck/match-full-lines.txt
4–7 ↗(On Diff #171782)

I don't recall why I did it that way, and your way does seem more obvious. I'll change it.

While we're talking about this, I'd eventually like to handle this issue in another way entirely. In FileCheck's own test suite, I'd like to ban direct FileCheck calls. Instead, I'd like to require test authors to choose one of the following for each call:

  • %FileCheckee: Used for FileCheck calls whose exit status and output are being checked. Normal FILECHECK_* environment variables are cleared within these calls. This avoids problems where people have these variables set for general testing purposes and thus change the FileCheck output being tested, creating spurious fails or passes. If these calls need to test env vars, then we could just have an alternate set of environment variables (FILECHECKEE_* maybe) that %FileCheckee copies to the normal variables.
  • %FileChecker: Used for FileCheck calls that check the output of %FileCheckee calls (or any other output under test). FILECHECK_* environment variables are obeyed just as they are obeyed for FileCheck calls in other test suites.

Of course, this would be another patch, which I don't have time for right now. Do you think the idea has merit?

jdenny updated this revision to Diff 171998.Oct 31 2018, 1:05 PM
  • Address zturner's comment: prefer FILECHECK_DUMP_INPUT=never over -dump-input=never when avoiding user environment variables.
  • Set FILECHECK_DUMP_INPUT=never in test/FileCheck/verbose.txt as well. Somehow I didn't notice this one was failing before.
jdenny updated this revision to Diff 173040.Nov 7 2018, 2:59 PM
jdenny marked 2 inline comments as done.
jdenny edited the summary of this revision. (Show Details)
  • Rebased.
  • Removed FILECHECK_DUMP_INPUT env var. The new FILECHECK_OPTS is sufficient.
  • Removed unsetting of FILECHECK_* env vars in tests. That should be handled more carefully in a separate patch.

It really seems like DiagList and AnnotationList ought to be vectors, not lists. They are append-only, and AnnotationList gets converted to a vector anyway to sort it. The code doesn't depend on the element-pointer stability guarantee of a list, except in one place noted below which can be fixed.
It's quite possible a vector would perform less well, in the face of many diags/annotations, but as the diags/annotations are the unusual case, performance is not really a big consideration.

llvm/utils/FileCheck/FileCheck.cpp
245

range-for here?

283

This appears to be the only place that functionally depends on AnnotationList being a <list>. But if you built B as a stack instance first, then you can push_back when you're done, and then AnnotationList can be a <vector> instead.

Regarding the bit about environment variables, probably the right thing to do is add a lit.local.cfg to the FileCheck test directory, that zaps the environment variables. Then most individual FileCheck tests can assume a default environment, and tests for specific non-defaults can set the environment variables directly in the test file.
It would mean you can't run the FileCheck tests *from lit* with an environment variable set. Unless we invented another env var that lit.local.cfg would look for... but that's for another patch, at this point.

It really seems like DiagList and AnnotationList ought to be vectors, not lists. They are append-only, and AnnotationList gets converted to a vector anyway to sort it. The code doesn't depend on the element-pointer stability guarantee of a list, except in one place noted below which can be fixed.
It's quite possible a vector would perform less well, in the face of many diags/annotations, but as the diags/annotations are the unusual case, performance is not really a big consideration.

I was thinking of many diags (-v with many checks) because that's where the annotations have proven most helpful to me as a visualization. I was thinking the conversion to vector would be a small penalty in comparison to the potentially many copies during reiszing the vector. I was more concerned about the time impact than the space impact, which I guessed would be small on a modern system.

Having said all that, I don't have a strong opinion here, and we can change it later if someone does some performance measurements, so I'd be happy to use vector now with no further discussion if my above arguments aren't convincing. Just say the word.

Regarding the bit about environment variables, probably the right thing to do is add a lit.local.cfg to the FileCheck test directory, that zaps the environment variables. Then most individual FileCheck tests can assume a default environment, and tests for specific non-defaults can set the environment variables directly in the test file.
It would mean you can't run the FileCheck tests *from lit* with an environment variable set. Unless we invented another env var that lit.local.cfg would look for... but that's for another patch, at this point.

That's fine as a first step to addressing that issue. However we do it, I think it should move to a separate patch as the issue exists independently of this patch series. Agreed?

By the way, this issue also exists in the lit test suite.

llvm/utils/FileCheck/FileCheck.cpp
245

D53893 (later patch in this series) makes use of the iterators. Let me know if you think there's a better way.

jdenny updated this revision to Diff 174784.Nov 20 2018, 8:45 AM
  • Rebase, and extend for the new CHECK-COUNT-<num> directive.
  • Convert DiagList and AnnotationList to std::vector, as probinson recommended.
jdenny marked an inline comment as done.Nov 20 2018, 8:50 AM

Apologies for the delay. I haven't been ignoring this series; I was having internal qualms about the amount of effort to produce extensive annotations, and the value they might provide. But I've come down in favor of doing it.

llvm/docs/CommandGuide/FileCheck.rst
92

in favor of --dump-input=fail.

llvm/utils/FileCheck/FileCheck.cpp
102

There's a way to make the argument be an enum, which has a variety of advantages. Please do it that way.

159

I'd omit the S part. 6: is clearly a line number, you don't need to document that the colon has a space after it.

496

DumpInput == "never" ? nullptr : &Diags
so we don't bother collecting diags that we will never print. Saves a small bit of time and memory, but this tool is used a *lot* in the default "never" mode and it's worth doing that small optimization.

497

So, I can say -dump-input-on-failure -dump-input=fail and it will dump the input twice? I think -dump-input-on-failure should just set -dump-input=fail (if -dump-input didn't appear separately, i.e. the new option takes precedence) and you only get one dump.

499

The detailed description of the annotations becomes long enough that I think including it with the dumped input starts to get in the way. Maybe have a -dump-input=help that will print the description and quit, or something along those lines.

Apologies for the delay. I haven't been ignoring this series; I was having internal qualms about the amount of effort to produce extensive annotations, and the value they might provide.

No problem. Thanks for the careful consideration.

But I've come down in favor of doing it.

Great! I'll work on your suggestions soon.

llvm/utils/FileCheck/FileCheck.cpp
497

So, I can say -dump-input-on-failure -dump-input=fail and it will dump the input twice?

I was thinking -dump-input-on-failure would be removed eventually, so I decided to be lazy. I'll change it.

I think -dump-input-on-failure should just set -dump-input=fail (if -dump-input didn't appear separately, i.e. the new option takes precedence) and you only get one dump.

Do you want -dump-input=never -dump-input-on-failure to be the same as -dump-input=never or -dump-input=fail?

499

The detailed description of the annotations becomes long enough that I think including it with the dumped input starts to get in the way. Maybe have a -dump-input=help that will print the description and quit, or something along those lines.

If the input dump is short, I usually don't need the dump. I usually need the dump when it's long, and then the description is relatively tiny and doesn't feel like it's in the way. Moreover, I don't want to force users to remember how to obtain that description (FileCheck isn't even normally in my PATH, so that's another barrier), so I think it's more convenient just to print it with the dump.

However, you're the second person with your opinion, so I'm outvoted, and I'm fine with that. Any objection to always dumping a reminder that -dump-input=help exists?

probinson added inline comments.Nov 30 2018, 12:44 PM
llvm/utils/FileCheck/FileCheck.cpp
497

cl::opt doesn't support command-line-order checks, so what I'd do is have the -dump-input enum have a Default. Then if -dump-input is Default, you set it to Never or Fail depending on -dump-input-on-failure. If/when we get rid of -dump-input-on-failure, we can set the -dump-input default to Never and get rid of the Default enum too.

499

A reminder that -dump-input=help exists would be totally appropriate.

jdenny added inline comments.Nov 30 2018, 1:09 PM
llvm/utils/FileCheck/FileCheck.cpp
499

A reminder that -dump-input=help exists would be totally appropriate.

I'm assuming that -dump-input=fail might be used by bots, and I'm thinking about what happens when someone is reading without a terminal handy to run FileCheck -dump-input=help. Should we assume such people will quickly become familiar enough with these annotations that they don't need the description, or should we offer something more?

Perhaps the description should also appear in rst/html documentation, and perhaps the reminder should be a pointer to that instead of -dump-input=help because the former is more universally accessible.

What do you think?

probinson added inline comments.Nov 30 2018, 1:37 PM
llvm/utils/FileCheck/FileCheck.cpp
499

If a test failure is so involved that the annotations would be helpful, I think people would be running the test locally to try to debug it. So, getting the help from the tool should be fine.

jdenny updated this revision to Diff 176895.Dec 5 2018, 4:42 PM
jdenny edited the summary of this revision. (Show Details)
jdenny set the repository for this revision to rL LLVM.

Made changes suggested by probinson:

  • In docs, say -dump-input-on-failure is deprecated in favor of -dump-input=fail not -dump-input.
  • Use cl::values (enum) for -dump-input value.
  • In annotation description, don't document space after line number.
  • Don't bother collecting diagnostics if -dump-input=never.
  • Instead of leaving -dump-input-on-failure as a separate feature, make it an alias for -dump-input=fail but only when -dump-input is not otherwise specified.
  • Move annotation description to a -dump-input=help option.

I also removed the detailed description of the currently enabled
markers. In place of that, I added a few notes to the brief
description that preceded it. Here's why:

  • The reviews so far have led me to believe that having two descriptions was more confusing than helpful. George questioned whether the marker descriptions should be repeated (D52999#1260736). George didn't understand the short version of the !~~ description (D52999#1260736). Paul questioned why !~~ should represent both a discarded match and actual errors (D53898#1317412). (Perhaps we really need an additional marker, but my assumption is that the docs just aren't clear.)
  • The detailed description added complexity to multiple parts of the existing implementation (MatchType, MatchTypeStyle, and of course DumpInputAnnotationHelp).
  • Previously the detailed description changed based on whether color was enabled and based on the verbosity level. That seemed to make sense when the description printed with the dump. Now it requires the user to be sure to specify certain options the same when requesting help as when requesting the dump. The options in the latter case might be buried in a FILECHECK_OPTS in a bot, IDE, or script, so getting the options right for help could be error-prone.

If you feel that part of this change makes things worse, I can revert
it. Hopefully it's a helpful simplification and can be tweaked to
address any further problems.

jdenny marked 21 inline comments as done.Dec 5 2018, 4:50 PM
jdenny added inline comments.
llvm/docs/CommandGuide/FileCheck.rst
83–85

Seeing no followup for a while, I'm assuming my response here was satisfactory, and I'm marking this done.

llvm/lib/Support/FileCheck.cpp
1062

Seeing no followup for a while, I'm assuming my response here was satisfactory, and I'm marking this done.

llvm/utils/FileCheck/FileCheck.cpp
245

Seeing no followup for a while, I'm assuming my response here was satisfactory, and I'm marking this done.

jdenny marked 3 inline comments as done.Dec 5 2018, 4:52 PM
jdenny added a comment.EditedDec 5 2018, 4:56 PM

Why did I have to set the repository again? I already did that on 10/30.

jdenny added a comment.Dec 7 2018, 7:51 AM

Another way to improve the clarity of markers (to avoid the confusion discussed in, for example, D53898#1317412) occurred to me: Except for markers indicating successful completion of a directive (green ^~~ for most directives, and green X~~ for CHECK-NOT), I could add a note at the end of the marker. For example:

$ FileCheck -vv -dump-input=always check4a < input4a |& sed -n '/^<<<</,$p'
<<<<<<
         1: 01234
check:1     ^~
not:2         X~~
         2: 56789
not:2       ~~~~~
         3: abcdef
dag:3       ^~~~
dag:4'0       !~~~ discard: overlaps earlier match
         4: cdefgh
dag:4'1     ^~~~
next:5          !~ error: same line as previous match
>>>>>>

$ cat check4a
CHECK: 01
CHECK-NOT: foobar
CHECK-DAG: abcd
CHECK-DAG: cdef
CHECK-NEXT: gh

$ cat input4a 
01234
56789
abcdef
cdefgh

This shouldn't add much complexity to the implementation. Unless someone is opposed, I'll try to work on it soon.

jdenny updated this revision to Diff 177380.Dec 7 2018, 10:14 PM
jdenny edited the summary of this revision. (Show Details)

Add explanatory note to any marker not indicating successful completion of a directive (that is, anything not green).

probinson accepted this revision.Dec 10 2018, 1:31 PM

I like having the extra explanations at the end of the tilde lines, that's a great idea.
One unnecessary include left, and LGTM.

llvm/include/llvm/Support/FileCheck.h
21

I believe you don't need <list> anymore.

I like having the extra explanations at the end of the tilde lines, that's a great idea.
One unnecessary include left, and LGTM.

Thanks! I'll fix the include soon.

I won't commit this immediately as I'm thinking it's best to commit the entire patch series at the same time. Let me know if you feel differently.

jdenny updated this revision to Diff 177621.Dec 10 2018, 4:58 PM
jdenny set the repository for this revision to rL LLVM.

Remove unnecessary include.

jdenny marked an inline comment as done.Dec 10 2018, 5:02 PM

A couple of nits I didn't notice until looking at the next patch.

llvm/include/llvm/Support/FileCheck.h
162

Comments should be proper sentences.

167

The style guide doesn't actually say this, but it's pretty much universal in the code base to declare each member separately.

jdenny marked an inline comment as done.Dec 12 2018, 10:37 AM
jdenny added inline comments.
llvm/include/llvm/Support/FileCheck.h
162

Thanks. Before I adjust all the patches, does the following work, or was this more than a style issue?

/// Indicates no match for an expected pattern.
probinson added inline comments.Dec 12 2018, 11:50 AM
llvm/include/llvm/Support/FileCheck.h
162

Just style. What you propose is fine.

jdenny updated this revision to Diff 177923.Dec 12 2018, 3:11 PM

Adjust comment, as suggested by probinson.

Change GetMarker's parameter type from unsigned to FileCheckDiag::MatchType.

jdenny marked 2 inline comments as done.Dec 12 2018, 3:14 PM
jdenny marked an inline comment as done.Dec 12 2018, 3:18 PM
jdenny added inline comments.
llvm/include/llvm/Support/FileCheck.h
167

Ah, I missed that. I'll get it before committing.

jdenny marked 2 inline comments as done.Dec 15 2018, 5:54 AM
jdenny added inline comments.
llvm/include/llvm/Support/FileCheck.h
167

Done in D55738.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 17 2018, 4:05 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/FileCheck/dump-input-enable.txt