Download Raw Diff

Details

Reviewers

davide
ab
jhenderson
t.p.northover

Summary

UNIX03 conformance requires utilities to flush stdout before exiting and raise
an error if writing fails. Flushing already happens on a call to exit
and thus automatically on a return from main. Write failure is then
detected by LLVM's default SIGPIPE handler. The handler already exits with
a non-zero code, but conformance additionally requires an error message.

rdar://89760992

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmcyk created this revision.Apr 28 2022, 4:49 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 4:49 AM

Herald added subscribers: dexonsmith, hiraditya. · View Herald Transcript

dmcyk requested review of this revision.Apr 28 2022, 4:49 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 4:49 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B161776: Diff 425746.Apr 28 2022, 5:35 AM

rebase main

Harbormaster completed remote builds in B161820: Diff 425816.Apr 28 2022, 10:20 AM

dmcyk added a reviewer: t.p.northover.May 3 2022, 7:45 AM

Apparently needed because tools like nm are specified to some degree by POSIX and we supply that on some platforms, so it seems reasonable to me.

This revision is now accepted and ready to land.May 3 2022, 7:56 AM

Test case?

This revision now requires changes to proceed.May 4 2022, 12:57 AM

Add llvm-nm test case for SIGPIPE handler

Herald added subscribers: rupprecht, MaskRay. · View Herald TranscriptMay 4 2022, 4:05 AM

Harbormaster completed remote builds in B162649: Diff 426960.May 4 2022, 4:56 AM

Windows unsupported; rebase main

Harbormaster completed remote builds in B162653: Diff 426967.May 4 2022, 5:53 AM

Build failure is unrelated to my changes I think, only one test had a timeout and it was TIMEOUT: libFuzzer :: large.test (72097 of 72097).

dexonsmith removed a subscriber: dexonsmith.May 4 2022, 6:28 PM

I'm wondering if it would be better to use python to drive this test rather than a somewhat non-trivial shell script? I'm mostly concerned about the potential flakiness of the sleep 2 command, since on an overloaded system, it's possible this won't have the desired effect. If you use python, you could attach a reader to the stdout of the llvm-nm process, that only read a bit of output (e.g. 1 byte) before stopping and closing itself. This in turn should trigger the error on stderr, if I'm following the behaviour right.

Here's about what I'm thinking (make sure to check this works before using):

# RUN: not %python %s llvm-nm 2>&1 | FIleCheck %s
# CHECK: write on a pipe with no reader

import subprocess
import sys

with subprocess.Popen([sys.argv[1]], stdout=subprocess.PIPE) as process:
  process.stdout.read(1)
  process.stdout.close()
sys.exit(process.returncode)

Use python script instead of shell

Yup, the python script works and it's better idd, thanks!

LGTM!

llvm/test/tools/llvm-nm/unix03-sigpipe-exit.test
8 ↗	(On Diff #427301)	Nit: I'd check the "error:" prefix here too.

This revision is now accepted and ready to land.May 5 2022, 6:50 AM

Check full error message

dmcyk marked an inline comment as done.May 5 2022, 6:52 AM

Harbormaster completed remote builds in B162905: Diff 427304.May 5 2022, 7:35 AM

The builds look good, maybe @jhenderson or @t.p.northover, can you land this for me, please? I don't have commit access.

In D124607#3494410, @dmcyk wrote:

The builds look good, maybe @jhenderson or @t.p.northover, can you land this for me, please? I don't have commit access.

Although I do have a Linux VM, it's slow, and I'd prefer to avoid having to do a full build and test cycle on it (and I can't usefully use my Windows one). @t.p.northover, please could you take care of this?

@t.p.northover made the commit, but it broke clang-ppc64le-linux-multistage buildbot, so it's been reverted.
In the test the input to FileCheck was empty, so maybe the signal handler didn't trigger for some reason.

The test does work on x86 Linux and macOS, so I've tried a ppc64le VM running CentOS 8.
The VM is too slow to compile the entire llvm, but I've tested a simple program with a SIGPIPE signal handler, the same python script and that worked as expected.
So the failure seems to be specific to RHEL or the buildbot environment. Maybe someone from #powerpc can help test this?

In D124607#3515118, @dmcyk wrote:

@t.p.northover made the commit, but it broke clang-ppc64le-linux-multistage buildbot, so it's been reverted.
In the test the input to FileCheck was empty, so maybe the signal handler didn't trigger for some reason.

The test does work on x86 Linux and macOS, so I've tried a ppc64le VM running CentOS 8.
The VM is too slow to compile the entire llvm, but I've tested a simple program with a SIGPIPE signal handler, the same python script and that worked as expected.
So the failure seems to be specific to RHEL or the buildbot environment. Maybe someone from #powerpc can help test this?

The issue does not seem to be related to PowerPC. The problem appears to be that llvm-nm finishes writing too fast for this to trigger the signal handler. With a shared library build (such as this bot), there are only 551 lines of output when running $LLVM_BUILD/bin/llvm-nm $LLVM_BUILD/bin/llvm-nm. That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output. When sorting for length of the output of llvm-nm for all the binaries in the bin directory, it appears that llvm-tblgen is among the longest ones. And certainly, if I change:

# RUN: not %python %s llvm-nm llvm-nm 2>&1 | FileCheck %s

# RUN: not %python %s llvm-nm llvm-tblgen 2>&1 | FileCheck %s

The test case succeeds. However, it would probably be ideal to devise a way to test this in a way that is guaranteed to invoke the signal handler.
It's up to you how you want to approach this.

Both llvm-symbolizer and llvm-cxxfilt, if called without positional arguments, are run interactively, essentially taking a line of input then writing some output before waiting for more input. This would allow you to force output to be written (by writing to stdin) after the pipe has been closed. Maybe that'll solve the issue, although I'm not sure I understand why the python script doesn't work in its current form: llvm-nm should try to write many bytes of output (> 1 byte), but the pipe is closed after the very first byte is written. Before going and changing the script, I'd like to understand whether this statement is actually true, and if so, why:

That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output.

In D124607#3536329, @jhenderson wrote:

Both llvm-symbolizer and llvm-cxxfilt, if called without positional arguments, are run interactively, essentially taking a line of input then writing some output before waiting for more input. This would allow you to force output to be written (by writing to stdin) after the pipe has been closed. Maybe that'll solve the issue, although I'm not sure I understand why the python script doesn't work in its current form: llvm-nm should try to write many bytes of output (> 1 byte), but the pipe is closed after the very first byte is written. Before going and changing the script, I'd like to understand whether this statement is actually true, and if so, why:

That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output.

If you have an idea for how I can verify this, I'd love to help. Without any tracing/debugging as proof, the way I guess I reason about it is that the process writes to the pipe asynchronously without regard to how much is read from the pipe. So if the write finishes before the pipe is closed, it finishes. Namely, reading a single byte from the pipe rather than all of them does not prevent all the bytes from being written to it before it is closed.
As I said, I am not sure why the test only works when there is more output. All I know for sure is that I've tried it on multiple machines with a shared library build with the same results.

FWIW, this test fails exactly the same way on my MacBook if I use llvm-nm on llvm-config which produces shorter output (although it is successful in its current form). So this seems to be completely pervasive and not at all restricted to PowerPC (or even Linux).

rebase
Use llvm-cxxfilt to test SIGPIPE handler

Thanks for trying it out @nemanjai, I can reproduce this locally indeed.

In D124607#3536577, @nemanjai wrote:

In D124607#3536329, @jhenderson wrote:

Both llvm-symbolizer and llvm-cxxfilt, if called without positional arguments, are run interactively, essentially taking a line of input then writing some output before waiting for more input. This would allow you to force output to be written (by writing to stdin) after the pipe has been closed. Maybe that'll solve the issue, although I'm not sure I understand why the python script doesn't work in its current form: llvm-nm should try to write many bytes of output (> 1 byte), but the pipe is closed after the very first byte is written. Before going and changing the script, I'd like to understand whether this statement is actually true, and if so, why:

That does not appear to be enough. You can either run a tool that runs longer or run llvm-nm with something that produces more output.

If you have an idea for how I can verify this, I'd love to help. Without any tracing/debugging as proof, the way I guess I reason about it is that the process writes to the pipe asynchronously without regard to how much is read from the pipe. So if the write finishes before the pipe is closed, it finishes. Namely, reading a single byte from the pipe rather than all of them does not prevent all the bytes from being written to it before it is closed.
As I said, I am not sure why the test only works when there is more output. All I know for sure is that I've tried it on multiple machines with a shared library build with the same results.

I think you're right about llvm-nm being able to write out all the bytes before the pipe is closed. Kernel buffers writes to a pipe, if nm's output doesn't exceed the buffer size it may finish running before stdout is closed.

In my system it looks like the kernel's pipe buffer fits 65k bytes, here's how I tested it:

$ cat sigtest.c
#include <signal.h>
#include <sys/signal.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

void Action(int sig, siginfo_t* info, void* context) {
  fprintf(stderr, "sig action\n");
  exit(78);
}

int main(int argc, char** argv) {
  struct sigaction orig;
  struct sigaction handler;
  handler.sa_flags = SA_SIGINFO;
  handler.sa_sigaction = Action;
  sigaction(SIGPIPE, &handler, &orig);

  int limit = 1000;
  if (argc > 1) {
    limit = atoi(argv[1]);
  }
  for (int i = 0; i < limit; ++i) {
    write(1, "a", 1);
    fprintf(stderr, "wrote nth: %d\n", i + 1);
  }
}

$ cat test.py
import subprocess
import sys
import time

with subprocess.Popen([sys.argv[1], sys.argv[2]], stdout=subprocess.PIPE) as process:
  time.sleep(5);
  print("- finished sleep")
  process.stdout.read(1)
  process.stdout.close()

print("- exit: ", process.returncode)
sys.exit(process.returncode)
$ clang sigtest.c
$ python3 test.py $PWD/a.out 65536
wrote nth: 65535
wrote nth: 65536
- finished sleep
- exit:  0

This finishes writing all the bytes before the first byte is even read and always exits with a 0 code.
But when I bump the number of bytes it hangs trying to write the 65537th byte, sometimes it still manages to write it before the pipe is closed:

$ python3 test.py $PWD/a.out 65537
wrote nth: 65535
wrote nth: 65536
- finished sleep
wrote nth: 65537
- exit:  0

and sometimes stdout is closed faster:

$ python3 test.py $PWD/a.out 65537
wrote nth: 65536
- finished sleep
sig action
- exit:  78

llvm-nm won't get the signal without a guarantee that stdout is closed before it tries to write to it, so I've moved the test case to use llvm-cxxfilt. Thanks for the tip @jhenderson.

Harbormaster completed remote builds in B167225: Diff 433346.Jun 1 2022, 4:03 AM

jhenderson added inline comments.Jun 6 2022, 3:08 AM

llvm/test/tools/llvm-cxxfilt/unix03-sigpipe-exit.test
1 ↗	(On Diff #433346)	Comment needs updating. I actually think the test as a whole should be moved to llvm/test/Support, since it's really testing a generic piece of behaviour, rather than nm's/cxxfilt's behaviour. Unrelated, but I often encourage using double comment markers (i.e. `##`) for true comments, versus lit/FileCheck directives, so that they stand out better.
23 ↗	(On Diff #433346)
25 ↗	(On Diff #433346)	No need for `pass` when there's a statement here.

Move test case to support; update comments

Harbormaster completed remote builds in B168250: Diff 434741.Jun 7 2022, 2:55 AM

Latest updates look good again.

dmcyk closed this revision.Jun 23 2022, 3:31 AM

This is an archive of the discontinued LLVM Phabricator instance.

Add an error message to the default SIGPIPE handler
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 434741

llvm/lib/Support/Unix/Signals.inc

llvm/test/Support/unix03-sigpipe-exit.test

This is an archive of the discontinued LLVM Phabricator instance.

Add an error message to the default SIGPIPE handlerClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 434741

llvm/lib/Support/Unix/Signals.inc

llvm/test/Support/unix03-sigpipe-exit.test

Add an error message to the default SIGPIPE handler
ClosedPublic