Page MenuHomePhabricator

Change filecheck default to dump input on failure
ClosedPublic

Authored by mehdi_amini on Jun 8 2020, 12:44 PM.

Details

Summary

Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps to understand failures.

Remove -dump-input-on-failure and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.

Diff Detail

Event Timeline

mehdi_amini created this revision.Jun 8 2020, 12:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2020, 12:44 PM
probinson added subscribers: jdenny, probinson.

I don't remember the exact reasoning but I believe it had something to do with bot logs? @jdenny or @thopre might remember.

Update tests

Herald added a reviewer: herhut. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: silvas. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
Herald added subscribers: Restricted Project, cfe-commits, msifontes and 20 others. · View Herald Transcript
aartbik accepted this revision.Jun 8 2020, 1:45 PM

Yes, assuming nobody remembers an adverse side-effect, I think this change will help debugging future failures.

This revision is now accepted and ready to land.Jun 8 2020, 1:45 PM
mehdi_amini added a comment.EditedJun 8 2020, 1:46 PM

I don't remember the exact reasoning but I believe it had something to do with bot logs? @jdenny or @thopre might remember.

It'd be interesting to hear about it: having the bot log including the input on failure could indeed help debugging when something fails in flaky way, or is dependent on the environment (like the path where the code is checked out, etc.).
At least in the past it could have saved me trouble when only a single bot was failing in mysterious way and the bot owner had to be involved to help figure out.

(also right now we're inconsistent: many tests are adding the flag explicitly)

eugenis accepted this revision.Jun 8 2020, 1:59 PM
eugenis added a subscriber: eugenis.

This would help debugging sanitizer failures on the bots a lot.

jpienaar accepted this revision.Jun 8 2020, 2:11 PM

I think this is a good default given it provides very useful info in failure case. For cases where folks expect it to fail + don't want it logged, then setting never (if that is a concern) seems better behavior IMHO.

thopre accepted this revision.Jun 8 2020, 3:09 PM

I don't remember the exact reasoning but I believe it had something to do with bot logs? @jdenny or @thopre might remember.

I guess -dump-input just followed the existing default behavior when -dump-input-on-failure was introduced. There was several mentions of whether to make it the default when that was introduced [1][2] and my understanding was the worry was on whether it would make some FileCheck test fail. I think with FileCheckOpt this is no longer a concern. I would have personally benefited from having this the default so I support this change.

[1] https://reviews.llvm.org/D49328?id=155527#inline-433596
[2] https://reviews.llvm.org/D49328#1170186

jdenny added a comment.Jun 8 2020, 3:43 PM

I don't remember the exact reasoning but I believe it had something to do with bot logs? @jdenny or @thopre might remember.

The only problem I've seen is the extreme verbosity when there are many failures. For example, I have experienced cases where that verbosity hung an IDE that was collecting the output. But if people have problems, they can set FILECHECK_OPTS=-dump-input=never, so it's probably fine.

It'd be interesting to hear about it: having the bot log including the input on failure could indeed help debugging when something fails in flaky way, or is dependent on the environment (like the path where the code is checked out, etc.).
At least in the past it could have saved me trouble when only a single bot was failing in mysterious way and the bot owner had to be involved to help figure out.

(also right now we're inconsistent: many tests are adding the flag explicitly)

I imagined more and more bots would set this eventually via FILECHECK_OPTS. Changing the default seems fine too.

llvm/test/FileCheck/dump-input-enable.txt
78

Are the tests in this section passing for you now that the default is fail?

127–128

The preceding section checks -dump-input=fail, so you can probably just remove this -dump-input-on-failure section.

128

Why is -dump-input=never needed here?

jdenny added a comment.Jun 8 2020, 3:48 PM

Also, please grep for FILECHECK_DUMP_INPUT_ON_FAILURE. There are a few more occurrences to remove from the repo.

jdenny added inline comments.Jun 8 2020, 3:58 PM
llvm/utils/FileCheck/FileCheck.cpp
117

Please mention in the patch summary that -dump-input-on-failure and FILECHECK_DUMP_INPUT_ON_FAILURE are being dropped.

mehdi_amini marked 3 inline comments as done.

Fix more tests

mehdi_amini added inline comments.Jun 8 2020, 6:20 PM
llvm/test/FileCheck/dump-input-enable.txt
78

Now they do :)
I originally pushed to Phab to get the pre-merge testing running while I was still fixing a few things locally, I didn't expect such fast reviews!
Should be all good now.

jdenny added inline comments.Jun 9 2020, 7:51 AM
llvm/test/FileCheck/dump-input-enable.txt
77

The previous section already covers -dump-input=never. This section was covering the case where -dump-input isn't specified. Maybe you should just update it to check that the default is now fail instead of never.

llvm/test/FileCheck/lit.local.cfg
42

Please add -dump-input=never to this first FileCheck call. Otherwise, the example is broken because FileCheck never prints expected string found in input when -dump-input=fail regardless of -v.

llvm/utils/FileCheck/FileCheck.cpp
117

This is marked as done, but I don't see the change.

jdenny added a comment.Jun 9 2020, 8:00 AM

By the way, -vv combined with -dump-input=fail provides a lot of additional information that can be helpful when debugging. Bots could pass that via FILECHECK_OPTS, or it could become the default when -dump-input is not never. In the latter case, I suggest a separate patch to make the review easier. What do people think?

mehdi_amini edited the summary of this revision. (Show Details)Jun 9 2020, 9:41 AM
mehdi_amini marked 4 inline comments as done.
mehdi_amini edited the summary of this revision. (Show Details)

Address @jdenny's comments:

  • fix the example in lit.local.cfg
  • Test the default value for dump-input
llvm/test/FileCheck/lit.local.cfg
42

Well spotted!

llvm/utils/FileCheck/FileCheck.cpp
117

That's because I amended my commit locally but arc does not propagate the update to Phabricator apparently: https://phabricator.kde.org/T7711 ; I'll do it manually

jdenny accepted this revision.Jun 9 2020, 11:20 AM

LGTM. Thanks!

This revision was automatically updated to reflect the committed changes.
arsenm added a subscriber: arsenm.Jun 9 2020, 5:17 PM

I think this is a worse default for development for large tests. For some generated check lines, I'm seeing many thousand of line dumps on failure, which just makes it harder to see the point it actually failed at. Can we move this default into the buildbot defaults or something so the log is clearer when a handful of tests fails? When working on patches, there are often a larger number of failures

mehdi_amini added a comment.EditedJun 9 2020, 5:45 PM

I think this is a worse default for development for large tests.

Maybe the issue is with large tests that needs to be broken up?
To me this is a general improvement during development at my desk and reading a test output to understand the failure: this goes beyond build bots.

(you have the environment variable you can set locally if you don't want the behavior though)

arsenm added a comment.Jun 9 2020, 6:30 PM

I think this is a worse default for development for large tests.

Maybe the issue is with large tests that needs to be broken up?

This isn't really manageable, especially with the trend of using update_* test checks scripts. Stuff like legalization tests just have to stress every combination of inputs. The -dump-input-on-failure could also be smarter about how much context it prints

To me this is a general improvement during development at my desk and reading a test output to understand the failure: this goes beyond build bots.

(you have the environment variable you can set locally if you don't want the behavior though)

The environment variable was removed though? I would also at least expect this to be an option I can set at cmake time and never have to think about again

jdenny added a comment.Jun 9 2020, 7:54 PM

The environment variable was removed though? I would also at least expect this to be an option I can set at cmake time and never have to think about again

Could you set FILECHECK_OPTS=-dump-input=never in your ~/.profile?

mehdi_amini added a comment.EditedJun 9 2020, 8:32 PM

I think this is a worse default for development for large tests.

Maybe the issue is with large tests that needs to be broken up?

This isn't really manageable, especially with the trend of using update_* test checks scripts. Stuff like legalization tests just have to stress every combination of inputs.

This is something that the script generating every combination could manage to split as well?
Alternatively, can these very large test be appended --dump-input=never on the RUN line? (maybe the test generator can do this?)

dstenb added a subscriber: dstenb.Jun 10 2020, 4:46 AM

I'm thinking about a possible improvement here: we could have FileCheck dump the input for the current CHECK-LABEL section only: it seems like it could reduce the output drastically while still preserving how useful the information is?

I'm thinking about a possible improvement here: we could have FileCheck dump the input for the current CHECK-LABEL section only: it seems like it could reduce the output drastically while still preserving how useful the information is?

That would help some. In my situation I've been making a lot of changes that touch dozens of generated MIR tests, and change most of the functions. If the dump constrained itself to just the first -LABEL section it would be more manageable

I'm thinking about a possible improvement here: we could have FileCheck dump the input for the current CHECK-LABEL section only: it seems like it could reduce the output drastically while still preserving how useful the information is?

One FileCheck invocation can report multiple errors, one per CHECK-LABEL section, and I prefer to see all errors every time.

I also prefer to see the entire input dump (with annotations produced by -vv) as sometimes the error FileCheck reports is far away from the actual error. For example, maybe the CHECK-LABEL directives matched text at unexpected locations in the input, and then the directives in their sections failed as a result. It might be hard to determine that the CHECK-LABEL directives were at fault without seeing the text it should have matched outside of its (incorrect) section.

If you choose to limit the dump to one CHECK-LABEL section, please make that behavior optional via a command-line option (which can be set via FILECHECK_OPTS).

I'm thinking about a possible improvement here: we could have FileCheck dump the input for the current CHECK-LABEL section only: it seems like it could reduce the output drastically while still preserving how useful the information is?

One FileCheck invocation can report multiple errors, one per CHECK-LABEL section, and I prefer to see all errors every time.

Yeah I had in mind to show all the errors for the current section, then the input for the section, before moving to the next section.

I also prefer to see the entire input dump (with annotations produced by -vv) as sometimes the error FileCheck reports is far away from the actual error. For example, maybe the CHECK-LABEL directives matched text at unexpected locations in the input, and then the directives in their sections failed as a result. It might be hard to determine that the CHECK-LABEL directives were at fault without seeing the text it should have matched outside of its (incorrect) section.

Sure: but that seems like not the most common case to me: if we start by showing the section we matched it provides already a pretty good indication.

If you choose to limit the dump to one CHECK-LABEL section, please make that behavior optional via a command-line option (which can be set via FILECHECK_OPTS).

I thought it would be a better default actually, leaving the full-full as opt-in

I'm thinking about a possible improvement here: we could have FileCheck dump the input for the current CHECK-LABEL section only: it seems like it could reduce the output drastically while still preserving how useful the information is?

One FileCheck invocation can report multiple errors, one per CHECK-LABEL section, and I prefer to see all errors every time.

Yeah I had in mind to show all the errors for the current section, then the input for the section, before moving to the next section.

I'm not sure I understand. Currently, (1) FileCheck prints all errors up front in case that's sufficient without the dump, and then (2) FileCheck prints the dump with annotations showing the errors again (and successful directives, including labels, if -vv). Are you talking about interleaving 1 and 2? Does that actually make it easier to read? Maybe just omit lines from the dump instead?

I also prefer to see the entire input dump (with annotations produced by -vv) as sometimes the error FileCheck reports is far away from the actual error. For example, maybe the CHECK-LABEL directives matched text at unexpected locations in the input, and then the directives in their sections failed as a result. It might be hard to determine that the CHECK-LABEL directives were at fault without seeing the text it should have matched outside of its (incorrect) section.

Sure: but that seems like not the most common case to me: if we start by showing the section we matched it provides already a pretty good indication.

If you choose to limit the dump to one CHECK-LABEL section, please make that behavior optional via a command-line option (which can be set via FILECHECK_OPTS).

I thought it would be a better default actually, leaving the full-full as opt-in

I'm not sure how common it is, and I don't know what the default should be. From the command line, I can always run again with different options to get missing info. From the bots, it's harder to get missing info (which is why I suggested -vv for the bots).

I'm growing to dislike the new behavior even more as I use it. I don't think restricting the dump to CHECK-LABEL prefixes will even help. The amount of context printed previously was good enough for development use. If I ever can't figure it out from the diff, I want to look at the output completely separate from the terminal. This is adding a lot of noise that doesn't help show me the failure point, and often fills my terminal scrollback buffer.

For AMDGPU test cases for example, every function has a long chunk of metadata that is irrelevant to the majority of tests. This floods my terminal even if a single function failed between check labels. I think this should revert to the old behavior. I think dump full on input would only be a reasonable default on build bots (where even then it might need some restrictions to avoid gigantic build logs if a really broken patch were committed).

FWIW I didn't see this change initially but was pleasantly surprised with the extra output when debugging a newly added (relatively small) test.

Perhaps llvm-dev is the right place to discuss the default behavior.

Regardless of that default, perhaps there should be an option that limits the dump to N lines that always encloses the first error.

mehdi_amini added a comment.EditedJun 15 2020, 3:20 PM

The amount of context printed previously was good enough for development use. If I ever can't figure it out from the diff, I want to look at the output completely separate from the terminal.

I have the totally opposite experience: I can't do any debugging efficiently without this option. Empirically, most new-comers to writing tests with FileCheck (in MLIR, TensorFlow, XLA, IREE, etc.) have been manually adding -dump-input-on-failure hard-coded into the test themselves, which is a pretty good indication.

Having something contextual with some limited scope around the failures would work for me as well (the previous situation was just not usable for me however)

I would also expect a simple command line flag to llvm-lit to be able to control this, rather than having to set an environment variable

mlir/test/mlir-tblgen/op-format-spec.td
1

All of these MLIR tests are microscopic and I don't think this is a representative sample across all the projects. Most testcases are significantly larger, and have hundreds if not thousands of lines of output

mehdi_amini marked an inline comment as done.Jun 16 2020, 2:36 PM

I would also expect a simple command line flag to llvm-lit to be able to control this, rather than having to set an environment variable

Would you like a lit command line flag that set the environment variable? I think that's easily doable

mlir/test/mlir-tblgen/op-format-spec.td
1

Well, if this can act as an extra incentive to break-up such large tests, it see it as a win: debugging failing patterns in the middle of such a large test output is not nice regardless.