This is an archive of the discontinued LLVM Phabricator instance.

[FileCheck] Implement -dump-input-context
ClosedPublic

Authored by jdenny on Jun 19 2020, 9:31 AM.

Details

Summary

This patch is motivated by discussions at each of:

When input is dumped as specified by -dump-input=fail, this patch
filters the dump to show only input lines that are the starting lines
of error diagnostics plus the number of contextual lines specified
-dump-input-context (defaults to 5).

When -dump-input=always, there might be not be any errors, so all
input lines are printed, as without this patch.

Here's some sample output with -dump-input-context=3 -vv:

<<<<<<
           .
           .
           .
          13: foo
          14: foo
          15: hello world
check:1       ^~~~~~~~~~~
          16: foo
check:2'0     X~~ error: no match found
          17: foo
check:2'0     ~~~
          18: foo
check:2'0     ~~~
          19: foo
check:2'0     ~~~
           .
           .
           .
          27: foo
check:2'0     ~~~
          28: foo
check:2'0     ~~~
          29: foo
check:2'0     ~~~
          30: goodbye word
check:2'0     ~~~~~~~~~~~~
check:2'1     ?            possible intended match
          31: foo
check:2'0     ~~~
          32: foo
check:2'0     ~~~
          33: foo
check:2'0     ~~~
           .
           .
           .
>>>>>>

Diff Detail

Event Timeline

jdenny created this revision.Jun 19 2020, 9:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2020, 9:31 AM
Herald added a subscriber: thopre. · View Herald Transcript
jdenny added inline comments.Jun 19 2020, 9:35 AM
llvm/utils/FileCheck/FileCheck.cpp
538

Oops, it's not always 10.

I'm thinking there could also be a -dump-input-filter command-line option to describe what lines of input should be sought, and -dump-input-context would describe how much context around those lines should be printed. It might have values like:

  • all: include all input lines. The effect is to disable -dump-input-context. This is probably best for CI.
  • annotation: include only input lines with annotations.
  • error: include only input lines with error diagnostics. This is very nearly what this patch implements now. It would probably be the default.
  • input: include only raw input lines and no annotations so you can more easily try that exact input again with FileCheck. This also would disable -dump-input-context.
arsenm added a subscriber: arsenm.Jun 19 2020, 5:34 PM
arsenm added inline comments.
llvm/utils/FileCheck/FileCheck.cpp
136

Does the description need to explicitly call out the default? I would expect that to be part of the standard opt dump, and it would avoid it getting out of sync with the actual value

jdenny marked an inline comment as done.Jun 19 2020, 6:03 PM
jdenny added inline comments.
llvm/utils/FileCheck/FileCheck.cpp
136

-help isn't printing it. Am I doing something wrong?

jdenny updated this revision to Diff 272259.Jun 20 2020, 7:57 AM
jdenny retitled this revision from [FileCheck] Implement -dump-input-context, and default to 5 to [FileCheck] Implement -dump-input-context and -dump-input-filter.
jdenny edited the summary of this revision. (Show Details)

With this patch in place, I tried to write some tests and quickly became frustrated with the missing context, so I implemented the -dump-input-filter idea I mentioned to make things more tunable. I also adjusted it so it doesn't elide lines if the ellipsis would occupy more lines.

jdenny marked an inline comment as done.Jun 20 2020, 8:01 AM

Are you still playing with this or are you looking for reviews? Or do you need more general feedback? (just making sure you're not blocked waiting on anything)

No, I'm not blocked. Feedback is welcome with the caveat that details might evolve as I write tests. I posted early in case anyone has high-level suggestions. Thanks for the support so far. :-)

Not looked at the code at all, but I like the idea of being able to limit the context. grep can do it, and FileCheck is basically a glorified grep, so it seems like a logical (and useful) extension!

rsmith added a subscriber: rsmith.Jun 22 2020, 2:26 PM

This direction looks great to me! The recent changes to FileCheck have made its output useless for many Clang tests (so much output is produced that the actual error message is not even in my scrollback!) and it looks like this would give a much better balance between terseness and including useful information, while still letting us produce the full output as part of CI.

Thanks for the feedback. I'll try to invest some more time in this tomorrow.

Sorry, late reply, but many thanks for working on this! I think the example in the description looks great, and this change would make dumping the input a very useful feature.

jdenny updated this revision to Diff 273059.Jun 24 2020, 9:37 AM
jdenny edited the summary of this revision. (Show Details)
jdenny set the repository for this revision to rG LLVM Github Monorepo.

Thanks for the feedback so far. This is ready for a detailed review now. I've made the following changes:

  • I've updated/extended FileCheck's test suite.
  • -dump-input-filter still defaults to error when -dump-input=fail (the default for -dump-input). However, it now defaults to all when -dump-input=always. The reason is that -dump-input=always is specifically useful for producing an input dump when there's no error, but -dump-input-filter=error always elides the entire input in that case.
  • -dump-input-filter and -dump-input-context can now be specified multiple times, and the value that requests the most informative/verbose input dump has precedence. This mimics the existing -dump-input behavior, which has proven to be useful when a test author specifies one of these options in a test case but a test runner specifies the same option with a different value in FILECHECK_OPTS.

I've added many reviewers to make sure people who've participated in this discussion previously have a chance to comment. I apologize if I'm spamming or missing anyone.

jdenny marked 2 inline comments as done.Jun 24 2020, 9:42 AM
jdenny added inline comments.
llvm/utils/FileCheck/FileCheck.cpp
136

This comment should be moot now that we're accepting multiple instances of the option and selecting the most informative/verbose value.

arsenm added inline comments.Jun 24 2020, 9:47 AM
llvm/utils/FileCheck/FileCheck.cpp
790–794

I already disliked the extra noise this message adds to the failure, and this adds additional lines. Can we only print this kind of information in a help mode?

jdenny marked an inline comment as done.Jun 24 2020, 9:50 AM
jdenny added inline comments.
llvm/utils/FileCheck/FileCheck.cpp
790–794

Sure, I can move the new lines into the -dump-input=help output.

To reduce the amount the output, could the input dump be restricted to the range between two labels? The rationale is that those ranges are logically independent blocks (such as a function) and the mismatch should be contained within such a block.

To reduce the amount the output, could the input dump be restricted to the range between two labels? The rationale is that those ranges are logically independent blocks (such as a function) and the mismatch should be contained within such a block.

Do you mean this as an alternative to the patch under review? As others have pointed out, the range between two labels can be large, and sometimes there are no labels and the input is large. Either way, its seems that restricting the dump to the range between two labels would often not solve the verbosity problem people are reporting, so I don't think that should be the default behavior. The patch under review should do better at limiting the verbosity per error.

Do you instead mean this as an additional opt-in feature? I'm not opposed if people feel that would be helpful, but I think it should be a later review. It might be enabled by -dump-input-context=label.

The LABEL can be large as well (or the test does not have labels), but I think it would a nice feature to have in addition to line limits since it is not as arbitrary as a number of lines. Both approaches can be combined, i.e. restrict emitted lines to within two labels, even if -dump-input-context alone would print lines before and after.

Just asking to consider this, it might be too difficult to implement.

The LABEL can be large as well (or the test does not have labels), but I think it would a nice feature to have in addition to line limits since it is not as arbitrary as a number of lines. Both approaches can be combined, i.e. restrict emitted lines to within two labels, even if -dump-input-context alone would print lines before and after.

A question would be which label section. Given the patch under review, I think -dump-input-filter=error is an intuitive way to answer that question, and -dump-input-context=label is an intuitive way to specify the full label section should be dumped as the context of the error. Something like -dump-input-context=label+10 could be a way to indicate additional context outside the label section, if that's useful.

Just asking to consider this, it might be too difficult to implement.

I think the main challenge would be that label information is normally not available to the input dump implementation unless -v is specified, but we could make -dump-input-context=label enable it.

jdenny updated this revision to Diff 273443.Jun 25 2020, 10:27 AM

Rebased.

Applied reviewer suggestion to move input-dump-related command-line options to -dump-input=help output.

jdenny marked an inline comment as done.Jun 25 2020, 10:30 AM

High level question - do we need additional command line flags here? Why not build this into the default actions that FileCheck takes? These detailed flags are unlikely to be used by people in practice, and this expands the surface area of FileCheck that needs to be maintained - it also expands the testing surface of FileCheck itself.

High level question - do we need additional command line flags here? Why not build this into the default actions that FileCheck takes?

By default, we'd have -dump-input-filter=error and -dump-input-context=5. That will hopefully address the concern that the current default (without this patch) is too verbose for many people.

These detailed flags are unlikely to be used by people in practice, and this expands the surface area of FileCheck that needs to be maintained - it also expands the testing surface of FileCheck itself.

I'm already using the patch under review while writing tests for other work, and the defaults I mentioned above are sometimes just right but other times are not nearly verbose enough. I frequently need more context to understand what has gone wrong in a test, so I find myself adjusting these options via FILECHECK_OPTS. I hope CI configurations will set FILECHECK_OPTS=-dump-input-filter=all -vv (maybe -color too) so that no one will ever miss context they need to debug a failure.

A major lesson I've learned from this discussion is that different LLVM developers have significantly different needs here, depending on what they're testing and probably depending on personal preferences. Tolerable defaults plus configurability seems key to addressing the various use cases.

By the way, I'm working on splitting up this patch to make it easier to review. Maybe it won't seem as complex after that.

Ok, but instead of allowing "fine grained control", why not allow a "tasteful default" and "all" option? The "all" option seems perfectly reasonable for the weird case, and eliminates a big configuration matrix. This would just entail turning the command line option into a magic number / enum.

-Chris

Ok, but instead of allowing "fine grained control", why not allow a "tasteful default" and "all" option? The "all" option seems perfectly reasonable for the weird case,

Hopefully we can reach a consensus on what's tasteful. I figured there would always be someone unhappy with the default, but then they'd have a way out via these options and FILECHECK_OPTS.

and eliminates a big configuration matrix. This would just entail turning the command line option into a magic number / enum.

If the configuration space would reduce to a bool that either selects the default or all, there would be two main changes in terms of implementation and testing:

  1. The elimination of -dump-input-context would mean that DumpInputContext would be replaced by 5 (or whatever value people ultimately agree on). That doesn't seem like much of a win. I will also need to rewrite the test dump-input-context.txt, which checks that the right amount of context is shown vs. elided in various boundary cases. It currently tests this by varying -dump-input-context, but I'll need to change it to vary the amount of input instead. At first glance, that seems more complex to write and maintain than what's there now, especially if the magic 5 should ever change, but maybe I'm overlooking a trick.
  1. The change to -dump-input-filter would mean that FindInputLineInFilter would handle only two cases instead of four. However, the infrastructure to support those two cases is the same infrastructure needed to support all four. The result is that only this one already simple function would shrink. Again, that doesn't seem like a win. The test dump-input-filter.txt will also get a little smaller, but not much less complex, in my opinion.

If the win is that the configuration space would be significantly easier to understand for a user, I might be able to buy that for #2 but not for #1.

A middle ground here might be to drop -dump-input-filter but keep -dump-input-context as is. People can specify -dump-input-context=99999 (or maybe some magic string like all) if they want all the context.

What do you think? Thanks for your feedback.

jdenny updated this revision to Diff 274456.Jun 30 2020, 6:54 AM

Fixed a FileCheck test broken by my last update.

jdenny updated this revision to Diff 275237.Jul 2 2020, 2:26 PM
jdenny retitled this revision from [FileCheck] Implement -dump-input-context and -dump-input-filter to [FileCheck] Implement -dump-input-context.
jdenny edited the summary of this revision. (Show Details)

I've split this review into three:

  • D83091: [FileCheck] Improve -dump-input documentation: This extracts some documentation changes that stand on their own.
  • D82203: [FileCheck] Implement -dump-input-context: This is the current review. I've removed -dump-input-filter so we can decide whether it's worthwhile separately. I've kept -dump-input-context. I think it is frequently useful, it adds very little complexity to the implementation and interface, and it actually seems to make testing the elision of the input easier.
  • D83097: [FileCheck] Implement -dump-input-filter

For the first and third reviews, I've added only a subset of the reviewers here as this review is the focus, but obviously everyone is welcome on all three.

jdenny updated this revision to Diff 276880.Jul 9 2020, 5:26 PM

In an effort to facilitate the review, I've extracted the logic that elides input lines only when the ellipsis is shorter. That's now in D83526.

I've also rebased.

jdenny added a comment.Jul 9 2020, 5:33 PM

I have no more simplifications planned, and there are no outstanding dependencies. This patch is ready for a review.

mehdi_amini accepted this revision.Jul 9 2020, 8:26 PM
This revision is now accepted and ready to land.Jul 9 2020, 8:26 PM

Thanks for the reviews!

This revision was automatically updated to reflect the committed changes.