This is an archive of the discontinued LLVM Phabricator instance.

Improve lit.py's usability by highlighting failing lines.
AbandonedPublic

Authored by varungandhi-apple on May 28 2020, 12:07 AM.

Details

Reviewers
yln
Summary
  • Before:
    • default (no flags): no script, no command output
    • -v: full script, no command output
    • -vv: full script, shows command output up to and including failing line
  • After:
    • default (no flags): no script, show only failing line in command output
    • -v: full script, highlight failing line in script, shows command output up to and including failing line, highlight failing output
    • -vv: alias to -v (preserve backwards compatibility)

Background discussion: http://lists.llvm.org/pipermail/llvm-dev/2019-September/135098.html
This patch doesn't follow the exact suggestion in that thread, but I think it's pretty close and arguably better.

I do have a couple of questions:

  1. Is it ok to rename echo_all_commands to runWithCommandOutput? I think the latter represents the meaning better (since the behavior is no longer controlled by the --echo-all-commands flag), but that might break compatibility for out-of-tree users (should be a small fix though).
  2. Should TerminalController be split out into a different file?

I don't feel strongly about the colors/styling. Happy to change it if there are common color schemes where it would become unreadable.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 12:07 AM
varungandhi-apple added a comment.EditedMay 28 2020, 12:12 AM

Consider the test file:

# RUN: true
# RUN: false
# RUN: true

After this patch, the default output becomes:

******************** TEST 'lit :: dummy.py' FAILED ********************

Command Output (stdout, truncated):
--
$ ":" "RUN: at line 2"
$ "false"
note: command had no output on stdout or stderr
error: command failed with exit status: 1

--
(NOTE: The failure may depend on preceding RUN lines.
       Use --verbose to see preceding RUN lines and outputs.)

If you use -v/--verbose, it becomes:

varungandhi-apple edited the summary of this revision. (Show Details)May 28 2020, 12:15 AM

Ah, I just realized there are some inconsistent variable names, some are snake_case, some are camelCase. Should I be consistently using camelCase? I did see a bunch of snake_case variables, that's probably what tripped me up...

Fixed some variable names, added some doc comments, cleaned up some logic, added some more CHECK lines to tests.

I know this patch didn't introduce this, but I'm curious what the meaning/purpose is of the ":" in the -vv output?

Command Output (stdout, truncated):
--
$ ":" "RUN: at line 2"
  ^^^

@JDevlieghere, that is part of the code that is writing out the command line.

# TestRunner.py
def executeScriptInternal(test, litConfig, tmpBase, commands, cwd):
    # ... skipped
    for i,result in enumerate(results):
        # Write the command line run.
        out += '$ %s\n' % (' '.join('"%s"' % s
                                    for s in result.command.args),)

The commit which added it has the following message:

commit 591838d1d33bc0dc5d2ffc66dd22176f280cc625
Author: Daniel Dunbar <daniel@zuster.org>
Date:   Thu Jun 2 23:49:42 2016 +0000

    [lit] Improve readability of failing scripts.

     - This only applies to scripts executed by the _internal_ shell script
       interpreter.

     - This patch reworks the log to look more like a shell transcript, and be less
       verbose (but in the interest of calling attention to the important parts).

    Here is an example of the new format, for commands with/without failures and
    with/without output:

    $ true
    $ echo hi
    hi

    $ false
    note: command had no output on stdout or stderr
    error: command failed with exit status 1

While the commit message makes it seem like the double-quotes
should not be present, the commit does add tests with double-quotes,
so I'm guessing it is intentional and not accidental. I'm guessing the
answer to your question is probably the "but in the interest of calling
attention to the important parts", which involves specifying each arg.

llvm/utils/lit/lit/TestRunner.py
1647–1650

The out if out else err might seem a bit strange (this was tripping me up for a while). For reasons I haven't tried digging into (perhaps it has to do with internal vs external shell), looks like sometimes the command output that's applicable is coming from stdout and sometimes it is coming from stderr. You can look at the shtest-run-at-line.py test case to see this in action. So I ended up going with a best effort "try this or that". Maybe we can make this more precise.

[Now I'm wondering if this ^ should be a TODO in the code instead of a review comment...]

Hmm, Phabricator attached my inline comment to my other larger comment, that's confusing.
I'm guessing I was supposed to make an out-of-line comment paired with the inline one,
that's what the "not submitted" was about.

yln added a comment.May 28 2020, 4:05 PM

I think this is a good improvement for the usability of lit. Thanks for working on this!

  • default (no flags): no script, show only failing line in command output
  • -v: full script, highlight failing line in script, shows command output up to and including failing line, highlight failing output
  • -vv: alias to -v (preserve backwards compatibility)

Quick question: We show the command output up to and including the failing line. Should we do the same for the script part, i.e., skip the un-executed part of the script? Is it ever useful to know which script line/command would have been executed next?

Linux/Windows/environments without support for non-printable characters:
Did you try this out in an environment with a "dumb" console, e.g., build bot? Ideally, someone helps us test this on Linux and Windows before landing.

Please also update these docs: https://llvm.org/docs/CommandGuide/lit.html
I am okay with doing this in a separate patch.

Is it ok to rename echo_all_commands to runWithCommandOutput? I think the latter represents the meaning better (since the behavior is no longer controlled by the --echo-all-commands flag), but that might break compatibility for out-of-tree users (should be a small fix though).

I am in favor of this. Would it be possible to break the rename out into a separate patch to reduce the size of this one? Also should make it easer for downstream users to adopt/or in case we have to revert the rename temporarily.

Should TerminalController be split out into a different file?

I am okay with the current state. Let's move it to its own file in case we need to extend/touch it.

Ah, I just realized there are some inconsistent variable names, some are snake_case, some are camelCase. Should I be consistently using camelCase? I did see a bunch of snake_case variables, that's probably what tripped me up...

Yes, the mixture/mess is unfortunate. :(

I don't think we have a documented style guide (as we do for LLVM C++ code), but for new code I think we should follow official Python style: https://www.python.org/dev/peps/pep-0008/
This means snake_case for variable and function names. I use a linter to try to avoid introducing additional style issues and cleanup issues in code I touch, but usually avoid unrelated cleanups that would make my patches larger. And of course: use your own good judgement.

llvm/utils/lit/lit/LitConfig.py
30

snake_case

llvm/utils/lit/lit/OutputSettings.py
3

full unused.

7

The name of the two enums seems to be never used (only instances are compared).

11

I don't understand why we need a separate FullScriptWithHighlighting.

llvm/utils/lit/lit/TestRunner.py
1480

Phew, this function is pretty complicated. Please make sure the major cases are covered by tests.

1496

Can we add a test + comment for the failing case so the behavior is documented.

1528

Please split up this patch to change the behavior of -v first and add the "highlighting" feature in a follow-up. Also: do we need to highlight the failing output considering that it will always be the last thing printed?

1564

Why is there a separate case for highlighted script output? For command output with highlight (if supported). I think we should do the same for script output.

llvm/utils/lit/lit/cl_arguments.py
57

I think we can add the "aliases" to the above call to add_argument() now.

171

I think it should be possible to remove most of this code by setting the defaults/destination in add_argument().

yln added inline comments.May 28 2020, 4:22 PM
llvm/utils/lit/lit/TestRunner.py
1647–1650

The code below makes me think that we should try to find the RUN line (locateLastFailingRunLine(cmdOutput)) in both stdout and stderr.

varungandhi-apple marked 6 inline comments as done.May 28 2020, 5:19 PM

Quick question: We show the command output up to and including the failing line. Should we do the same for the script part, i.e., skip the un-executed part of the script? Is it ever useful to know which script line/command would have been executed next?

In my experience, if there are similar lines later on, sometimes it is good to spot them (especially in CI). Not something you want often though, but I think it's good to have.

Linux/Windows/environments without support for non-printable characters:
Did you try this out in an environment with a "dumb" console, e.g., build bot? Ideally, someone helps us test this on Linux and Windows before landing.

I have only tried it locally. I was assuming that there is some CI which is running, based on seeing the "Harbormaster failed remote builds in B58171: Diff 266757!" message earlier. However, that seems to have failed with a 503 error which I don't know what to do with. Is there no running CI?

I am in favor of this. Would it be possible to break the rename out into a separate patch to reduce the size of this one? Also should make it easer for downstream users to adopt/or in case we have to revert the rename temporarily.

I can break it up; although at this stage, it's unclear to me how many patches you'd like me to break this into, given the other comments.

Please also update these docs: https://llvm.org/docs/CommandGuide/lit.html
I am okay with doing this in a separate patch.

Please split up this patch to change the behavior of -v first and add the "highlighting" feature in a follow-up

Patch 1: Change behavior of -v. (and default and -vv behavior)
Patch 2: Add highlighting/colored output.
Patch 3: Updating https://llvm.org/docs/CommandGuide/lit.html
Patch 4: Rename echo_all_commands to runWithCommandOutput.

Can you clarify how many different patches you'd like to see and what the contents should be? Is it what I stated above?

Also, when you say "patch", do you mean different change requests (with their own reviews), or different commits but in this current change request (all in the same review)?

but for new code I think we should follow official Python style: https://www.python.org/dev/peps/pep-0008/
This means snake_case for variable and function names.

Ok, that means I'll need to change all of it... I saw a bunch of recent code using camelCase, so I used camelCase. :(

llvm/utils/lit/lit/OutputSettings.py
7

I used it for print debugging; looks like I should've defined __str__ instead for that.

llvm/utils/lit/lit/TestRunner.py
1480

Do you mean adding unit tests for testing this function by itself (I don't recall seeing any unit tests in the code)? Or do you mean integration tests?

1496

I will add the test. Could you elaborate on the comment part; I don't follow. What additional information should that comment describe?

1564

I don't understand the question. Could you rephrase? If there's no failure, there's nothing to be highlighted. Similarly, if someone is using -a, we are not doing any highlighting (i.e. we get FullScript). We only highlight when we actually have a failure and we were asked to highlight the output.

1647–1650

We still need to break the tie somehow though; what happens if both the stdout and stderr have RUN: at line in them? Is your suggestion that we try finding in one, if that fails, then try finding it in the other?

llvm/utils/lit/lit/cl_arguments.py
171

Do we expect the order of verbosity arguments to affect what the behavior is? The current code is saying that it should not affect the behavior. If you specify -a anywhere, you get all the output. We do have tests doing -a -v which seem to be expecting all the output, so that would need to be changed as well.

varungandhi-apple marked 6 inline comments as not done.May 28 2020, 5:43 PM
yln added a comment.May 29 2020, 11:16 AM

Is there no running CI?

There are bots (managed by different people) but usually lit is only tested as part of check-all. We want to avoid breaking too many people, especially because this is a feature interacting with the external shell I can see it causing problems on different platforms/environments.

Patch 1: Change behavior of -v. (and default and -vv behavior)
Patch 2: Add highlighting/colored output.
Patch 3: Updating https://llvm.org/docs/CommandGuide/lit.html
Patch 4: Rename echo_all_commands to runWithCommandOutput.

That sounds like a good plan.

Also, when you say "patch", do you mean different change requests (with their own reviews), or different commits but in this current change request (all in the same review)?

Yes, different "revisions" with their own review each.

llvm/utils/lit/lit/TestRunner.py
1480

I just meant: we have 4 different returns and should make sure that all code paths are tested.

1496

I meant: add the test to "pin down" the behavior (even if it's not the optimal desired) and also put this explaining comment there.

1564

I think we should always highlight the failure if there is one (just as we do with command output).

1647–1650

That sounds reasonable. How do you propose to break the tie? I would probably go stderr -> stdout since errors are more likely in the stderr?!

llvm/utils/lit/lit/cl_arguments.py
171

I think we should try to preserve existing behavior, if the new/other behavior isn't a clear improvement. Here, I don't think the order should matter.

I just meant that this block of code could probably be simpler if we use the appropriate arpgarse features.

varungandhi-apple marked 3 inline comments as done.May 31 2020, 7:11 PM

On second thought, I'm going to update the documentation alongside the behavior change, to make sure the two are in sync.
Patch 1: Change behavior of -v. (and default and -vv behavior). Update https://llvm.org/docs/CommandGuide/lit.html as necessary.
Patch 2: Add highlighting/colored output. Update https://llvm.org/docs/CommandGuide/lit.html as necessary.
Patch 3: Rename echo_all_commands to runWithCommandOutput.

llvm/utils/lit/lit/TestRunner.py
1564

Hmm. I think maybe we should have way to turn off the highlighting (both for the command output and the script), in case the heuristic returns the wrong result when someone is iterating on a bug. We can discuss this more in the subsequent revisions I send after breaking this one up.

1647–1650

Ok, that makes sense. I'll make it default stderr -> stdout.

llvm/utils/lit/lit/cl_arguments.py
171

I think we should try to preserve existing behavior, if the new/other behavior isn't a clear improvement. Here, I don't think the order should matter.

I just meant that this block of code could probably be simpler if we use the appropriate arpgarse features.

The problem with using the argparse features is that the mutation effect (for setting the value for the destination) will take place when the argument is parsed (which I'm guessing will happen left-to-right). Hence the final value will be different if the arguments appear in a different order.

I've split this patch up into four different patches. Here is the suggested review order:

  1. D82808: Small, adds some documentation for -s.
  2. D82791: Largish, adds the logic and test cases of finding the failing line.
  3. D82811: Small, adds coloring on top of the previous one.
  4. D82814: Small, renames echo_all_commands field.

I will close this revision.

varungandhi-apple abandoned this revision.Jun 29 2020, 3:09 PM