Page MenuHomePhabricator

[FileCheck] Annotate input dump (1/7)
ClosedPublic

Authored by jdenny on Oct 8 2018, 2:25 PM.

Details

Summary

Extend FileCheck to dump its input annotated with FileCheck's
diagnostics: errors, good matches if -v, and additional information if
-vv. The goal is to make it easier to visualize FileCheck's matching
behavior when debugging.

Each patch in this series implements input annotations for a
particular category of FileCheck diagnostics. While the first few
patches alone are somewhat useful, the annotations become much more
useful as later patches implement annotations for -v and -vv
diagnostics, which show the matching behavior leading up to the error.

This first patch implements boilerplate plus input annotations for
error diagnostics reporting that no matches were found for a
directive. These annotations mark the search ranges of the failed
directives. Instead of using the usual ^~~, which is used by later
patches for good matches, these annotations use X~~ so that this
category of errors is visually distinct.

For example:

$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:

  - L:     labels line number L of the input file
  - T:L    labels the match result for a pattern of type T from line L of
           the check file
  - X~~    marks search range when no match is found
  - colors error

If you are not seeing color above or in input dumps, try: -color

$ FileCheck -v -dump-input=always check1 < input1 |& sed -n '/^Input file/,$p'
Input file: <stdin>
Check file: check1

-dump-input=help describes the format of the following dump.

Full input was:
<<<<<<
        1: ; abc def
        2: ; ghI jkl
next:3     X~~~~~~~~ error: no match found
>>>>>>

$ cat check1
CHECK: abc
CHECK-SAME: def
CHECK-NEXT: ghi
CHECK-SAME: jkl

$ cat input1
; abc def
; ghI jkl

Some additional details related to the boilerplate:

  • Enabling: The annotated input dump is enabled by -dump-input, which can also be set via the FILECHECK_OPTS environment variable. Accepted values are help, always, fail, or never. As shown above, help describes the format of the dump. always is helpful when you want to investigate a successful FileCheck run, perhaps for an unexpected pass. -dump-input-on-failure and FILECHECK_DUMP_INPUT_ON_FAILURE remain as a deprecated alias for -dump-input=fail.
  • Diagnostics: The usual diagnostics are not suppressed in this mode and are printed first. For brevity in the example above, I've omitted them using a sed command. Sometimes they're perfectly sufficient, and then they make debugging quicker than if you were forced to hunt through a dump of long input looking for the error. If you think they'll get in the way sometimes, keep in mind that it's pretty easy to grep for the start of the input dump, which is <<<.
  • Colored Annotations: The annotated input is colored if colors are enabled (enabling colors can be forced using -color). For example, errors are red. However, as in the above example, colors are not vital to reading the annotations.

I don't know how to test color in the output, so any hints here would
be appreciated.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jdenny edited the summary of this revision. (Show Details)Oct 11 2018, 3:34 PM

Address reviewer concerns:

  • Add examples of match result types.
  • Document space that follows an input line number.
  • Clarify input file vs. check file for line numbers.
  • "when no match" -> "when no match is found".
  • Don't abbreviate directive types.

That's everything I know to do for now.

jdenny updated this revision to Diff 170220.Oct 19 2018, 11:16 AM
jdenny edited the summary of this revision. (Show Details)

Adjusted to take advantage of D53419, which extends WithColor.

I think that finishes separating out all logically distinct change sets. I'm willing to split up the patch more if it helps the review.

@jdenny

I'm willing to split up the patch more if it helps the review

Yes please, to be honest I can't still fully understand the feature description, let alone the code.
I have asked my colleague to take a glance, and his verdict was the same, so I'm not alone there.

You have a lot of new features in your FileCheck output; could you think of a way to separate them into simple, small, understandable chunks?

george.karpenkov requested changes to this revision.Oct 22 2018, 10:54 AM
This revision now requires changes to proceed.Oct 22 2018, 10:54 AM

@jdenny

I'm willing to split up the patch more if it helps the review

Yes please, to be honest I can't still fully understand the feature description, let alone the code.
I have asked my colleague to take a glance, and his verdict was the same, so I'm not alone there.

You have a lot of new features in your FileCheck output; could you think of a way to separate them into simple, small, understandable chunks?

OK, but please work with me to plan a way to break up the series that would actually prove helpful to you. Here are two proposals:

  1. Here's the plan I proposed in an earlier comment, but I've updated it for recent changes:
    1. Make changes to include/llvm/Support/FileCheck.h and lib/Support/FileCheck.cpp, which gather the list of diagnostics.
    2. Make changes to utils/FileCheck/FileCheck.cpp, which converts the list of diagnostics to annotations on the input. Include test and doc changes in this patch because this is where they become effective.

      Because you see FileCheck's *output* as many new features, I'm afraid all the parts you find confusing would all appear in the second patch, so I'm not sure this plan would prove helpful to you.
  1. This patch introduces input annotations for existing diagnostics. Those diagnostics are enumerated in MatchType in FileCheckDiag in llvm/include/llvm/Support/FileCheck.h. Would it be helpful to have each patch introduce annotations for one MatchType member? The first patch would likely be the largest as it would introduce boilerplate.

Please let me know which of these proposals you prefer, or propose something else.

Thanks for your help.

  1. This patch introduces input annotations for existing diagnostics. Those diagnostics are enumerated in MatchType in FileCheckDiag in llvm/include/llvm/Support/FileCheck.h. Would it be helpful to have each patch introduce annotations for one MatchType member? The first patch would likely be the largest as it would introduce boilerplate.

Unless someone (immediately) says it won't be helpful, I'm going to put some effort into this approach at splitting up the patch.

jdenny updated this revision to Diff 171782.Oct 30 2018, 1:56 PM
jdenny retitled this revision from [FileCheck] Annotate input dump to [FileCheck] Annotate input dump (1/7).
jdenny edited the summary of this revision. (Show Details)
  • Split up the patch. This is now the first in the series. Each patch has a description and example of the diagnostics for which it implements input annotations.
  • Add the input file name and check file name to the annotations description to further address a reviewer concern.
  • Don't use the word "key" to identify the annotations description because technically that's not what it is.
jdenny edited the summary of this revision. (Show Details)Oct 30 2018, 1:57 PM
jdenny set the repository for this revision to rL LLVM.Oct 30 2018, 2:05 PM
zturner added inline comments.
llvm/docs/CommandGuide/FileCheck.rst
80–82

I haven't been following this too closely, but I'm wondering, are the 3 modes actually necessary? It sounds like the main use case here is that the user has a failure and wants to get more info. So they will set it to either fail or always. But do they care which? Basically, what I'm wondering is why not just make this be a binary on flag?

It just seems simpler to say that if you want to dump the input, pass --dump-input, and if you don't want to dump the input, pass nothing.

llvm/test/FileCheck/match-full-lines.txt
4–7

You could also achieve this without the command line flag by writing env FILECHECK_DUMP_INPUT= not FileCheck .... Pretty sure you can unset a variable and run a command this way.

jdenny added inline comments.Oct 30 2018, 3:11 PM
llvm/docs/CommandGuide/FileCheck.rst
80–82

-dump-input=always is helpful when you're debugging individual tests and encounter FileCheck successes you don't understand.

I was thinking -dump-input=fail (via an env var) is better when running full test suites, perhaps from a bot or IDE. Imagine a test with many successful FileCheck commands before the failed one. -dump-input=always might produce massive output that will be logged and that must be scrolled/grepped through to find the failure, depending on how you like to interact with test suite logs.

In any case, -dump-input=fail was inspired by the existing -dump-input-on-failure, which I believe @george.karpenkov said is used in bots. Is that right, George?

llvm/test/FileCheck/match-full-lines.txt
4–7

I don't recall why I did it that way, and your way does seem more obvious. I'll change it.

While we're talking about this, I'd eventually like to handle this issue in another way entirely. In FileCheck's own test suite, I'd like to ban direct FileCheck calls. Instead, I'd like to require test authors to choose one of the following for each call:

  • %FileCheckee: Used for FileCheck calls whose exit status and output are being checked. Normal FILECHECK_* environment variables are cleared within these calls. This avoids problems where people have these variables set for general testing purposes and thus change the FileCheck output being tested, creating spurious fails or passes. If these calls need to test env vars, then we could just have an alternate set of environment variables (FILECHECKEE_* maybe) that %FileCheckee copies to the normal variables.
  • %FileChecker: Used for FileCheck calls that check the output of %FileCheckee calls (or any other output under test). FILECHECK_* environment variables are obeyed just as they are obeyed for FileCheck calls in other test suites.

Of course, this would be another patch, which I don't have time for right now. Do you think the idea has merit?

jdenny updated this revision to Diff 171998.Oct 31 2018, 1:05 PM
  • Address zturner's comment: prefer FILECHECK_DUMP_INPUT=never over -dump-input=never when avoiding user environment variables.
  • Set FILECHECK_DUMP_INPUT=never in test/FileCheck/verbose.txt as well. Somehow I didn't notice this one was failing before.
jdenny updated this revision to Diff 173040.Nov 7 2018, 2:59 PM
jdenny marked 2 inline comments as done.
jdenny edited the summary of this revision. (Show Details)
  • Rebased.
  • Removed FILECHECK_DUMP_INPUT env var. The new FILECHECK_OPTS is sufficient.
  • Removed unsetting of FILECHECK_* env vars in tests. That should be handled more carefully in a separate patch.

It really seems like DiagList and AnnotationList ought to be vectors, not lists. They are append-only, and AnnotationList gets converted to a vector anyway to sort it. The code doesn't depend on the element-pointer stability guarantee of a list, except in one place noted below which can be fixed.
It's quite possible a vector would perform less well, in the face of many diags/annotations, but as the diags/annotations are the unusual case, performance is not really a big consideration.

llvm/utils/FileCheck/FileCheck.cpp
225

range-for here?

263

This appears to be the only place that functionally depends on AnnotationList being a <list>. But if you built B as a stack instance first, then you can push_back when you're done, and then AnnotationList can be a <vector> instead.

Regarding the bit about environment variables, probably the right thing to do is add a lit.local.cfg to the FileCheck test directory, that zaps the environment variables. Then most individual FileCheck tests can assume a default environment, and tests for specific non-defaults can set the environment variables directly in the test file.
It would mean you can't run the FileCheck tests *from lit* with an environment variable set. Unless we invented another env var that lit.local.cfg would look for... but that's for another patch, at this point.

It really seems like DiagList and AnnotationList ought to be vectors, not lists. They are append-only, and AnnotationList gets converted to a vector anyway to sort it. The code doesn't depend on the element-pointer stability guarantee of a list, except in one place noted below which can be fixed.
It's quite possible a vector would perform less well, in the face of many diags/annotations, but as the diags/annotations are the unusual case, performance is not really a big consideration.

I was thinking of many diags (-v with many checks) because that's where the annotations have proven most helpful to me as a visualization. I was thinking the conversion to vector would be a small penalty in comparison to the potentially many copies during reiszing the vector. I was more concerned about the time impact than the space impact, which I guessed would be small on a modern system.

Having said all that, I don't have a strong opinion here, and we can change it later if someone does some performance measurements, so I'd be happy to use vector now with no further discussion if my above arguments aren't convincing. Just say the word.

Regarding the bit about environment variables, probably the right thing to do is add a lit.local.cfg to the FileCheck test directory, that zaps the environment variables. Then most individual FileCheck tests can assume a default environment, and tests for specific non-defaults can set the environment variables directly in the test file.
It would mean you can't run the FileCheck tests *from lit* with an environment variable set. Unless we invented another env var that lit.local.cfg would look for... but that's for another patch, at this point.

That's fine as a first step to addressing that issue. However we do it, I think it should move to a separate patch as the issue exists independently of this patch series. Agreed?

By the way, this issue also exists in the lit test suite.

llvm/utils/FileCheck/FileCheck.cpp
225

D53893 (later patch in this series) makes use of the iterators. Let me know if you think there's a better way.

jdenny updated this revision to Diff 174784.Nov 20 2018, 8:45 AM
  • Rebase, and extend for the new CHECK-COUNT-<num> directive.
  • Convert DiagList and AnnotationList to std::vector, as probinson recommended.
jdenny marked an inline comment as done.Nov 20 2018, 8:50 AM

Apologies for the delay. I haven't been ignoring this series; I was having internal qualms about the amount of effort to produce extensive annotations, and the value they might provide. But I've come down in favor of doing it.

llvm/docs/CommandGuide/FileCheck.rst
83

in favor of --dump-input=fail.

llvm/utils/FileCheck/FileCheck.cpp
96

There's a way to make the argument be an enum, which has a variety of advantages. Please do it that way.

139

I'd omit the S part. 6: is clearly a line number, you don't need to document that the colon has a space after it.

671

DumpInput == "never" ? nullptr : &Diags
so we don't bother collecting diags that we will never print. Saves a small bit of time and memory, but this tool is used a *lot* in the default "never" mode and it's worth doing that small optimization.

672

So, I can say -dump-input-on-failure -dump-input=fail and it will dump the input twice? I think -dump-input-on-failure should just set -dump-input=fail (if -dump-input didn't appear separately, i.e. the new option takes precedence) and you only get one dump.

674

The detailed description of the annotations becomes long enough that I think including it with the dumped input starts to get in the way. Maybe have a -dump-input=help that will print the description and quit, or something along those lines.

Apologies for the delay. I haven't been ignoring this series; I was having internal qualms about the amount of effort to produce extensive annotations, and the value they might provide.

No problem. Thanks for the careful consideration.

But I've come down in favor of doing it.

Great! I'll work on your suggestions soon.

llvm/utils/FileCheck/FileCheck.cpp
672

So, I can say -dump-input-on-failure -dump-input=fail and it will dump the input twice?

I was thinking -dump-input-on-failure would be removed eventually, so I decided to be lazy. I'll change it.

I think -dump-input-on-failure should just set -dump-input=fail (if -dump-input didn't appear separately, i.e. the new option takes precedence) and you only get one dump.

Do you want -dump-input=never -dump-input-on-failure to be the same as -dump-input=never or -dump-input=fail?

674

The detailed description of the annotations becomes long enough that I think including it with the dumped input starts to get in the way. Maybe have a -dump-input=help that will print the description and quit, or something along those lines.

If the input dump is short, I usually don't need the dump. I usually need the dump when it's long, and then the description is relatively tiny and doesn't feel like it's in the way. Moreover, I don't want to force users to remember how to obtain that description (FileCheck isn't even normally in my PATH, so that's another barrier), so I think it's more convenient just to print it with the dump.

However, you're the second person with your opinion, so I'm outvoted, and I'm fine with that. Any objection to always dumping a reminder that -dump-input=help exists?

probinson added inline comments.Nov 30 2018, 12:44 PM
llvm/utils/FileCheck/FileCheck.cpp
672

cl::opt doesn't support command-line-order checks, so what I'd do is have the -dump-input enum have a Default. Then if -dump-input is Default, you set it to Never or Fail depending on -dump-input-on-failure. If/when we get rid of -dump-input-on-failure, we can set the -dump-input default to Never and get rid of the Default enum too.

674

A reminder that -dump-input=help exists would be totally appropriate.

jdenny added inline comments.Nov 30 2018, 1:09 PM
llvm/utils/FileCheck/FileCheck.cpp
674

A reminder that -dump-input=help exists would be totally appropriate.

I'm assuming that -dump-input=fail might be used by bots, and I'm thinking about what happens when someone is reading without a terminal handy to run FileCheck -dump-input=help. Should we assume such people will quickly become familiar enough with these annotations that they don't need the description, or should we offer something more?

Perhaps the description should also appear in rst/html documentation, and perhaps the reminder should be a pointer to that instead of -dump-input=help because the former is more universally accessible.

What do you think?

probinson added inline comments.Nov 30 2018, 1:37 PM
llvm/utils/FileCheck/FileCheck.cpp
674

If a test failure is so involved that the annotations would be helpful, I think people would be running the test locally to try to debug it. So, getting the help from the tool should be fine.

jdenny updated this revision to Diff 176895.Dec 5 2018, 4:42 PM
jdenny edited the summary of this revision. (Show Details)
jdenny set the repository for this revision to rL LLVM.

Made changes suggested by probinson:

  • In docs, say -dump-input-on-failure is deprecated in favor of -dump-input=fail not -dump-input.
  • Use cl::values (enum) for -dump-input value.
  • In annotation description, don't document space after line number.
  • Don't bother collecting diagnostics if -dump-input=never.
  • Instead of leaving -dump-input-on-failure as a separate feature, make it an alias for -dump-input=fail but only when -dump-input is not otherwise specified.
  • Move annotation description to a -dump-input=help option.

I also removed the detailed description of the currently enabled
markers. In place of that, I added a few notes to the brief
description that preceded it. Here's why:

  • The reviews so far have led me to believe that having two descriptions was more confusing than helpful. George questioned whether the marker descriptions should be repeated (D52999#1260736). George didn't understand the short version of the !~~ description (D52999#1260736). Paul questioned why !~~ should represent both a discarded match and actual errors (D53898#1317412). (Perhaps we really need an additional marker, but my assumption is that the docs just aren't clear.)
  • The detailed description added complexity to multiple parts of the existing implementation (MatchType, MatchTypeStyle, and of course DumpInputAnnotationHelp).
  • Previously the detailed description changed based on whether color was enabled and based on the verbosity level. That seemed to make sense when the description printed with the dump. Now it requires the user to be sure to specify certain options the same when requesting help as when requesting the dump. The options in the latter case might be buried in a FILECHECK_OPTS in a bot, IDE, or script, so getting the options right for help could be error-prone.

If you feel that part of this change makes things worse, I can revert
it. Hopefully it's a helpful simplification and can be tweaked to
address any further problems.

jdenny marked 21 inline comments as done.Dec 5 2018, 4:50 PM
jdenny added inline comments.
llvm/docs/CommandGuide/FileCheck.rst
80–82

Seeing no followup for a while, I'm assuming my response here was satisfactory, and I'm marking this done.

llvm/lib/Support/FileCheck.cpp
1039

Seeing no followup for a while, I'm assuming my response here was satisfactory, and I'm marking this done.

llvm/utils/FileCheck/FileCheck.cpp
225

Seeing no followup for a while, I'm assuming my response here was satisfactory, and I'm marking this done.

jdenny marked 3 inline comments as done.Dec 5 2018, 4:52 PM
jdenny added a comment.EditedDec 5 2018, 4:56 PM

Why did I have to set the repository again? I already did that on 10/30.

jdenny added a comment.Dec 7 2018, 7:51 AM

Another way to improve the clarity of markers (to avoid the confusion discussed in, for example, D53898#1317412) occurred to me: Except for markers indicating successful completion of a directive (green ^~~ for most directives, and green X~~ for CHECK-NOT), I could add a note at the end of the marker. For example:

$ FileCheck -vv -dump-input=always check4a < input4a |& sed -n '/^<<<</,$p'
<<<<<<
         1: 01234
check:1     ^~
not:2         X~~
         2: 56789
not:2       ~~~~~
         3: abcdef
dag:3       ^~~~
dag:4'0       !~~~ discard: overlaps earlier match
         4: cdefgh
dag:4'1     ^~~~
next:5          !~ error: same line as previous match
>>>>>>

$ cat check4a
CHECK: 01
CHECK-NOT: foobar
CHECK-DAG: abcd
CHECK-DAG: cdef
CHECK-NEXT: gh

$ cat input4a 
01234
56789
abcdef
cdefgh

This shouldn't add much complexity to the implementation. Unless someone is opposed, I'll try to work on it soon.

jdenny updated this revision to Diff 177380.Dec 7 2018, 10:14 PM
jdenny edited the summary of this revision. (Show Details)

Add explanatory note to any marker not indicating successful completion of a directive (that is, anything not green).

probinson accepted this revision.Dec 10 2018, 1:31 PM

I like having the extra explanations at the end of the tilde lines, that's a great idea.
One unnecessary include left, and LGTM.

llvm/include/llvm/Support/FileCheck.h
21

I believe you don't need <list> anymore.

I like having the extra explanations at the end of the tilde lines, that's a great idea.
One unnecessary include left, and LGTM.

Thanks! I'll fix the include soon.

I won't commit this immediately as I'm thinking it's best to commit the entire patch series at the same time. Let me know if you feel differently.

jdenny updated this revision to Diff 177621.Dec 10 2018, 4:58 PM
jdenny set the repository for this revision to rL LLVM.

Remove unnecessary include.

jdenny marked an inline comment as done.Dec 10 2018, 5:02 PM

A couple of nits I didn't notice until looking at the next patch.

llvm/include/llvm/Support/FileCheck.h
143

Comments should be proper sentences.

148

The style guide doesn't actually say this, but it's pretty much universal in the code base to declare each member separately.

jdenny marked an inline comment as done.Dec 12 2018, 10:37 AM
jdenny added inline comments.
llvm/include/llvm/Support/FileCheck.h
143

Thanks. Before I adjust all the patches, does the following work, or was this more than a style issue?

/// Indicates no match for an expected pattern.
probinson added inline comments.Dec 12 2018, 11:50 AM
llvm/include/llvm/Support/FileCheck.h
143

Just style. What you propose is fine.

jdenny updated this revision to Diff 177923.Dec 12 2018, 3:11 PM

Adjust comment, as suggested by probinson.

Change GetMarker's parameter type from unsigned to FileCheckDiag::MatchType.

jdenny marked 2 inline comments as done.Dec 12 2018, 3:14 PM
jdenny marked an inline comment as done.Dec 12 2018, 3:18 PM
jdenny added inline comments.
llvm/include/llvm/Support/FileCheck.h
148

Ah, I missed that. I'll get it before committing.

jdenny marked 2 inline comments as done.Dec 15 2018, 5:54 AM
jdenny added inline comments.
llvm/include/llvm/Support/FileCheck.h
148

Done in D55738.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 17 2018, 4:05 PM
Closed by commit rL349418: [FileCheck] Annotate input dump (1/7) (authored by jdenny, committed by ). · Explain Why
This revision was automatically updated to reflect the committed changes.