Page MenuHomePhabricator

Add an error message to the default SIGPIPE handler
ClosedPublic

Authored by dmcyk on Apr 28 2022, 4:49 AM.

Details

Summary

UNIX03 conformance requires utilities to flush stdout before exiting and raise
an error if writing fails. Flushing already happens on a call to exit
and thus automatically on a return from main. Write failure is then
detected by LLVM's default SIGPIPE handler. The handler already exits with
a non-zero code, but conformance additionally requires an error message.

rdar://89760992

Diff Detail

Event Timeline

dmcyk created this revision.Apr 28 2022, 4:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 4:49 AM
dmcyk requested review of this revision.Apr 28 2022, 4:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 4:49 AM
dmcyk updated this revision to Diff 425816.Apr 28 2022, 9:30 AM
  • rebase main
t.p.northover accepted this revision.May 3 2022, 7:56 AM

Apparently needed because tools like nm are specified to some degree by POSIX and we supply that on some platforms, so it seems reasonable to me.

This revision is now accepted and ready to land.May 3 2022, 7:56 AM
jhenderson requested changes to this revision.May 4 2022, 12:57 AM

Test case?

This revision now requires changes to proceed.May 4 2022, 12:57 AM
dmcyk updated this revision to Diff 426960.May 4 2022, 4:05 AM
  • Add llvm-nm test case for SIGPIPE handler
dmcyk updated this revision to Diff 426967.May 4 2022, 5:10 AM
  • Windows unsupported; rebase main
dmcyk added a comment.May 4 2022, 6:09 AM

Build failure is unrelated to my changes I think, only one test had a timeout and it was TIMEOUT: libFuzzer :: large.test (72097 of 72097).

jhenderson added a comment.EditedMay 5 2022, 4:45 AM

I'm wondering if it would be better to use python to drive this test rather than a somewhat non-trivial shell script? I'm mostly concerned about the potential flakiness of the sleep 2 command, since on an overloaded system, it's possible this won't have the desired effect. If you use python, you could attach a reader to the stdout of the llvm-nm process, that only read a bit of output (e.g. 1 byte) before stopping and closing itself. This in turn should trigger the error on stderr, if I'm following the behaviour right.

Here's about what I'm thinking (make sure to check this works before using):

# RUN: not %python %s llvm-nm 2>&1 | FIleCheck %s
# CHECK: write on a pipe with no reader

import subprocess
import sys

with subprocess.Popen([sys.argv[1]], stdout=subprocess.PIPE) as process:
  process.stdout.read(1)
  process.stdout.close()
sys.exit(process.returncode)
dmcyk updated this revision to Diff 427301.May 5 2022, 6:46 AM
  • Use python script instead of shell
dmcyk added a comment.May 5 2022, 6:47 AM

Yup, the python script works and it's better idd, thanks!

jhenderson accepted this revision.May 5 2022, 6:50 AM

LGTM!

llvm/test/tools/llvm-nm/unix03-sigpipe-exit.test
8 ↗(On Diff #427301)

Nit: I'd check the "error:" prefix here too.

This revision is now accepted and ready to land.May 5 2022, 6:50 AM
dmcyk updated this revision to Diff 427304.May 5 2022, 6:52 AM
  • Check full error message
dmcyk marked an inline comment as done.May 5 2022, 6:52 AM
dmcyk added a comment.May 5 2022, 10:12 AM

The builds look good, maybe @jhenderson or @t.p.northover, can you land this for me, please? I don't have commit access.

The builds look good, maybe @jhenderson or @t.p.northover, can you land this for me, please? I don't have commit access.

Although I do have a Linux VM, it's slow, and I'd prefer to avoid having to do a full build and test cycle on it (and I can't usefully use my Windows one). @t.p.northover, please could you take care of this?

dmcyk added a comment.May 16 2022, 1:02 AM

@t.p.northover made the commit, but it broke clang-ppc64le-linux-multistage buildbot, so it's been reverted.
In the test the input to FileCheck was empty, so maybe the signal handler didn't trigger for some reason.

The test does work on x86 Linux and macOS, so I've tried a ppc64le VM running CentOS 8.
The VM is too slow to compile the entire llvm, but I've tested a simple program with a SIGPIPE signal handler, the same python script and that worked as expected.
So the failure seems to be specific to RHEL or the buildbot environment. Maybe someone from #powerpc can help test this?

@t.p.northover made the commit, but it broke clang-ppc64le-linux-multistage buildbot, so it's been reverted.
In the test the input to FileCheck was empty, so maybe the signal handler didn't trigger for some reason.

The test does work on x86 Linux and macOS, so I've tried a ppc64le VM running CentOS 8.
The VM is too slow to compile the entire llvm, but I've tested a simple program with a SIGPIPE signal handler, the same python script and that worked as expected.
So the failure seems to be specific to RHEL or the buildbot environment. Maybe someone from #powerpc can help test this?

The issue does not seem to be related to PowerPC. The problem appears to be that llvm-nm finishes writing too fast for this to trigger the signal handler. With a shared library build (such as this bot), there are only 551 lines of output when running $LLVM_BUILD/bin/llvm-nm $LLVM_BUILD/bin/llvm-nm. That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output. When sorting for length of the output of llvm-nm for all the binaries in the bin directory, it appears that llvm-tblgen is among the longest ones. And certainly, if I change:

# RUN: not %python %s llvm-nm llvm-nm 2>&1 | FileCheck %s

to

# RUN: not %python %s llvm-nm llvm-tblgen 2>&1 | FileCheck %s

The test case succeeds. However, it would probably be ideal to devise a way to test this in a way that is guaranteed to invoke the signal handler.
It's up to you how you want to approach this.

Both llvm-symbolizer and llvm-cxxfilt, if called without positional arguments, are run interactively, essentially taking a line of input then writing some output before waiting for more input. This would allow you to force output to be written (by writing to stdin) after the pipe has been closed. Maybe that'll solve the issue, although I'm not sure I understand why the python script doesn't work in its current form: llvm-nm should try to write many bytes of output (> 1 byte), but the pipe is closed after the very first byte is written. Before going and changing the script, I'd like to understand whether this statement is actually true, and if so, why:

That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output.

Both llvm-symbolizer and llvm-cxxfilt, if called without positional arguments, are run interactively, essentially taking a line of input then writing some output before waiting for more input. This would allow you to force output to be written (by writing to stdin) after the pipe has been closed. Maybe that'll solve the issue, although I'm not sure I understand why the python script doesn't work in its current form: llvm-nm should try to write many bytes of output (> 1 byte), but the pipe is closed after the very first byte is written. Before going and changing the script, I'd like to understand whether this statement is actually true, and if so, why:

That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output.

If you have an idea for how I can verify this, I'd love to help. Without any tracing/debugging as proof, the way I guess I reason about it is that the process writes to the pipe asynchronously without regard to how much is read from the pipe. So if the write finishes before the pipe is closed, it finishes. Namely, reading a single byte from the pipe rather than all of them does not prevent all the bytes from being written to it before it is closed.
As I said, I am not sure why the test only works when there is more output. All I know for sure is that I've tried it on multiple machines with a shared library build with the same results.

FWIW, this test fails exactly the same way on my MacBook if I use llvm-nm on llvm-config which produces shorter output (although it is successful in its current form). So this seems to be completely pervasive and not at all restricted to PowerPC (or even Linux).

dmcyk updated this revision to Diff 433346.Wed, Jun 1, 3:28 AM
  • rebase
  • Use llvm-cxxfilt to test SIGPIPE handler
dmcyk added a comment.Wed, Jun 1, 3:29 AM

Thanks for trying it out @nemanjai, I can reproduce this locally indeed.

Both llvm-symbolizer and llvm-cxxfilt, if called without positional arguments, are run interactively, essentially taking a line of input then writing some output before waiting for more input. This would allow you to force output to be written (by writing to stdin) after the pipe has been closed. Maybe that'll solve the issue, although I'm not sure I understand why the python script doesn't work in its current form: llvm-nm should try to write many bytes of output (> 1 byte), but the pipe is closed after the very first byte is written. Before going and changing the script, I'd like to understand whether this statement is actually true, and if so, why:

That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output.

If you have an idea for how I can verify this, I'd love to help. Without any tracing/debugging as proof, the way I guess I reason about it is that the process writes to the pipe asynchronously without regard to how much is read from the pipe. So if the write finishes before the pipe is closed, it finishes. Namely, reading a single byte from the pipe rather than all of them does not prevent all the bytes from being written to it before it is closed.
As I said, I am not sure why the test only works when there is more output. All I know for sure is that I've tried it on multiple machines with a shared library build with the same results.

I think you're right about llvm-nm being able to write out all the bytes before the pipe is closed. Kernel buffers writes to a pipe, if nm's output doesn't exceed the buffer size it may finish running before stdout is closed.

In my system it looks like the kernel's pipe buffer fits 65k bytes, here's how I tested it:

$ cat sigtest.c
#include <signal.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

void Action(int sig, siginfo_t* info, void* context) {
  fprintf(stderr, "sig action\n");
  exit(78);
}

int main(int argc, char** argv) {
  struct sigaction orig;
  struct sigaction handler;
  handler.sa_flags = SA_SIGINFO;
  handler.sa_sigaction = Action;
  sigaction(SIGPIPE, &handler, &orig);

  int limit = 1000;
  if (argc > 1) {
    limit = atoi(argv[1]);
  }
  for (int i = 0; i < limit; ++i) {
    write(1, "a", 1);
    fprintf(stderr, "wrote nth: %d\n", i + 1);
  }
}

$ cat test.py
import subprocess
import sys
import time

with subprocess.Popen([sys.argv[1], sys.argv[2]], stdout=subprocess.PIPE) as process:
  time.sleep(5);
  print("- finished sleep")
  process.stdout.read(1)
  process.stdout.close()

print("- exit: ", process.returncode)
sys.exit(process.returncode)
$ clang sigtest.c
$ python3 test.py $PWD/a.out 65536
wrote nth: 65535
wrote nth: 65536
- finished sleep
- exit:  0

This finishes writing all the bytes before the first byte is even read and always exits with a 0 code.
But when I bump the number of bytes it hangs trying to write the 65537th byte, sometimes it still manages to write it before the pipe is closed:

$ python3 test.py $PWD/a.out 65537
wrote nth: 65535
wrote nth: 65536
- finished sleep
wrote nth: 65537
- exit:  0

and sometimes stdout is closed faster:

$ python3 test.py $PWD/a.out 65537
wrote nth: 65536
- finished sleep
sig action
- exit:  78

llvm-nm won't get the signal without a guarantee that stdout is closed before it tries to write to it, so I've moved the test case to use llvm-cxxfilt. Thanks for the tip @jhenderson.

jhenderson added inline comments.Mon, Jun 6, 3:08 AM
llvm/test/tools/llvm-cxxfilt/unix03-sigpipe-exit.test
1 ↗(On Diff #433346)

Comment needs updating.

I actually think the test as a whole should be moved to llvm/test/Support, since it's really testing a generic piece of behaviour, rather than nm's/cxxfilt's behaviour.

Unrelated, but I often encourage using double comment markers (i.e. ##) for true comments, versus lit/FileCheck directives, so that they stand out better.

23 ↗(On Diff #433346)
25 ↗(On Diff #433346)

No need for pass when there's a statement here.

dmcyk updated this revision to Diff 434741.Tue, Jun 7, 2:04 AM
dmcyk marked 3 inline comments as done.
  • Move test case to support; update comments
jhenderson accepted this revision.Tue, Jun 7, 5:43 AM

Latest updates look good again.

dmcyk closed this revision.Thu, Jun 23, 3:31 AM