This is an archive of the discontinued LLVM Phabricator instance.

[Commandline] Clear error on raw_ostream when there is parsing error.
AbandonedPublic

Authored by ychen on Sep 27 2019, 3:11 PM.

Details

Summary

Otherwise report_fatal_error will be triggered in
raw_fd_ostream::~raw_fd_ostream for llvm::errs since it is a static object.

Exposed by llvm/test/Other/close-stderr.ll where 2>&- is used.

Event Timeline

ychen created this revision.Sep 27 2019, 3:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2019, 3:11 PM
ruiu added a comment.Sep 29 2019, 9:35 PM

Can you add a testcase to demonstrate the issue?

POSIX has stated that file descriptors 0, 1 or 2 should not be closed. The is stated in many parts, e.g.

Applications should not execute programs with file descriptor 0 not open for reading or with file descriptor 1 or 2 not open for writing, as this might cause the executed program to misbehave. In order not to pass on these file descriptors to an executed program, applications should not just close them but should reopen them on, for example, /dev/null.

The change is justified if you can give an example of a different file descriptor.

ruiu added a comment.Sep 29 2019, 10:41 PM

POSIX has stated that file descriptors 0, 1 or 2 should not be closed. The is stated in many parts, e.g.

Applications should not execute programs with file descriptor 0 not open for reading or with file descriptor 1 or 2 not open for writing, as this might cause the executed program to misbehave. In order not to pass on these file descriptors to an executed program, applications should not just close them but should reopen them on, for example, /dev/null.

The change is justified if you can give an example of a different file descriptor.

I believe that sentence is talking about stdin/stdout/stderr and forking a new process. As long as you don't create a new process, I think you are free to close 0, 1 and 2.

That said I agree with you that we shouldn't close 0, 1 and 2 on exit. I'm not sure if we are doing this at the moment, but if we do, I think we should stop doing that.

POSIX has stated that file descriptors 0, 1 or 2 should not be closed. The is stated in many parts, e.g.

Applications should not execute programs with file descriptor 0 not open for reading or with file descriptor 1 or 2 not open for writing, as this might cause the executed program to misbehave. In order not to pass on these file descriptors to an executed program, applications should not just close them but should reopen them on, for example, /dev/null.

The change is justified if you can give an example of a different file descriptor.

I believe that sentence is talking about stdin/stdout/stderr and forking a new process. As long as you don't create a new process, I think you are free to close 0, 1 and 2.

That said I agree with you that we shouldn't close 0, 1 and 2 on exit. I'm not sure if we are doing this at the moment, but if we do, I think we should stop doing that.

Comments in close-stderr.ll says "Test that the error handling when writing to stderr fails exits the program cleanly rather than aborting." We shouldn't close it and I believe we're not doing it although users could do it at the commandline if they wish and we should exit cleanly.

ychen added a comment.Sep 30 2019, 9:01 PM

Can you add a testcase to demonstrate the issue?

The exit is called twice and somehow the test exit cleanly. I don't know why but it does not look right. If D67847 is landed without this change, the test would fail because of abort.

➜  llvm-project git:(master) gdb --args /home/ychen2/Bld/llvm-clang-lld-x86_64-scei-ps4-ubuntu-Debug-with-gcc/bin/opt --reject-this-option 2>&-
(gdb) b exit
Breakpoint 1 at 0x1c55ac0
(gdb) r
Starting program: /home/ychen2/Bld/llvm-clang-lld-x86_64-scei-ps4-ubuntu-Debug-with-gcc/bin/opt --reject-this-option
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, __GI_exit (status=1) at exit.c:139
(gdb) bt
#0  __GI_exit (status=1) at exit.c:139
#1  0x0000555559b20f76 in (anonymous namespace)::CommandLineParser::ParseCommandLineOptions (this=0x55555ecb6490, argc=2, argv=0x7fffffffcff0,
    Overview="llvm .bc -> .bc modular optimizer and analysis printer\n", Errs=0x55555ec8d420 <llvm::errs()::S>, LongOptionsUseDoubleDash=false)
    at /home/ychen2/Src/llvm-project/llvm/lib/Support/CommandLine.cpp:1588
#2  0x0000555559b1f36c in llvm::cl::ParseCommandLineOptions (argc=2, argv=0x7fffffffd978, Overview="llvm .bc -> .bc modular optimizer and analysis printer\n", Errs=0x0, EnvVar=0x0,
    LongOptionsUseDoubleDash=false) at /home/ychen2/Src/llvm-project/llvm/lib/Support/CommandLine.cpp:1243
#3  0x00005555571e6652 in main (argc=2, argv=0x7fffffffd978) at /home/ychen2/Src/llvm-project/llvm/tools/opt/opt.cpp:543
(gdb) c
Continuing.

Breakpoint 1, __GI_exit (status=1) at exit.c:139
139     in exit.c
(gdb) bt
#0  __GI_exit (status=1) at exit.c:139
#1  0x0000555559b4030a in llvm::report_fatal_error (Reason=, GenCrashDiag=false) at /home/ychen2/Src/llvm-project/llvm/lib/Support/ErrorHandling.cpp:125
#2  0x0000555559b40107 in llvm::report_fatal_error (Reason="IO failure on output stream: Bad file descriptor", GenCrashDiag=false) at /home/ychen2/Src/llvm-project/llvm/lib/Support/ErrorHandling.cpp:86
#3  0x0000555559ba6636 in llvm::raw_fd_ostream::~raw_fd_ostream (this=0x55555ec8d420 <llvm::errs()::S>, __in_chrg=<optimized out>) at /home/ychen2/Src/llvm-project/llvm/lib/Support/raw_ostream.cpp:633
#4  0x00007ffff687e041 in __run_exit_handlers (status=1, listp=0x7ffff6c26718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#5  0x00007ffff687e13a in __GI_exit (status=<optimized out>) at exit.c:139
#6  0x0000555559b20f76 in (anonymous namespace)::CommandLineParser::ParseCommandLineOptions (this=0x55555ecb6490, argc=2, argv=0x7fffffffcff0,
    Overview="llvm .bc -> .bc modular optimizer and analysis printer\n", Errs=0x55555ec8d420 <llvm::errs()::S>, LongOptionsUseDoubleDash=false)
    at /home/ychen2/Src/llvm-project/llvm/lib/Support/CommandLine.cpp:1588
#7  0x0000555559b1f36c in llvm::cl::ParseCommandLineOptions (argc=2, argv=0x7fffffffd978, Overview="llvm .bc -> .bc modular optimizer and analysis printer\n", Errs=0x0, EnvVar=0x0,
    LongOptionsUseDoubleDash=false) at /home/ychen2/Src/llvm-project/llvm/lib/Support/CommandLine.cpp:1243
#8  0x00005555571e6652 in main (argc=2, argv=0x7fffffffd978) at /home/ychen2/Src/llvm-project/llvm/tools/opt/opt.cpp:543
(gdb) c
Continuing.
[Inferior 1 (process 28650) exited with code 01]
ruiu added a comment.Sep 30 2019, 9:13 PM

So, could you add a testcase to demonstrate the issue?

If an error occurs, I believe this piece of code will try to call report_fatal_error and that will fail too for the same reason (because if the code fails, it is likely that we can't write anything to stderr), and the process will exit with exit code 1. So, is the new behavior implemented in this patch distinguishable from the existing behavior? If a command behaves differently with this patch, please add a test to demonstrate it.

ychen added a comment.Sep 30 2019, 9:30 PM

So, could you add a testcase to demonstrate the issue?

If an error occurs, I believe this piece of code will try to call report_fatal_error and that will fail too for the same reason (because if the code fails, it is likely that we can't write anything to stderr), and the process will exit with exit code 1. So, is the new behavior implemented in this patch distinguishable from the existing behavior? If a command behaves differently with this patch, please add a test to demonstrate it.

The new behavior is not distinguishable from the existing behavior from the user's perspective. The existing behavior does exit(1)-> report_fatal_error -> exit(1) and new behavior just exit(1) cleanly.

ychen added a comment.Sep 30 2019, 9:43 PM

So, could you add a testcase to demonstrate the issue?

If an error occurs, I believe this piece of code will try to call report_fatal_error and that will fail too for the same reason (because if the code fails, it is likely that we can't write anything to stderr), and the process will exit with exit code 1. So, is the new behavior implemented in this patch distinguishable from the existing behavior? If a command behaves differently with this patch, please add a test to demonstrate it.

The new behavior is not distinguishable from the existing behavior from the user's perspective. The existing behavior does exit(1)-> report_fatal_error -> exit(1) and new behavior just exit(1) cleanly.

Another perspective is that the report_fatal_error in raw_fd_ostream is to make sure errors are not silently ignored before the destructor is called, which is exactly what this patch is for.

ruiu added a comment.Sep 30 2019, 10:26 PM

The result of calling exit() more than once is an undefined behavior, so I agree with you that we should avoid that even if that's currently not observable.

Another way of fixing this issue is handling 0, 1 and 2 as special FDs, as they are actually special. If FD is equal or less than 2, we can avoid calling exit even if reading/writing it results in an error. That's perhaps more general way of fixing this issue than clearing an error condition at this particular place, I guess?

MaskRay requested changes to this revision.EditedSep 30 2019, 10:28 PM

The exit is called twice and somehow the test exit cleanly.

The current behavior is undefined. POSIX says "If exit() is called more than once, the behavior is undefined." Changing exit to abort in D67847 will fix the issue. Thanks.

The test ; RUN: sh -c 'opt --reject-this-option 2>&-; echo $?; opt -o /dev/null /dev/null 2>&-; echo $?;' is testing an undefined behavior, but I can see 2>&- is a simple alternative to a full-fledged test that makes write operations fail with some other errno. My feeling is that you can just update close-stderr.ll to (not --crash opt --reject-this-option 2>&-) in D67847 and be prepared to update it if build bots on some platforms have different behaviors.

This revision now requires changes to proceed.Sep 30 2019, 10:28 PM

The exit is called twice and somehow the test exit cleanly.

The current behavior is undefined. POSIX says "If exit() is called more than once, the behavior is undefined." Changing exit to abort in D67847 will fix the issue. Thanks.

The test ; RUN: sh -c 'opt --reject-this-option 2>&-; echo $?; opt -o /dev/null /dev/null 2>&-; echo $?;' is testing an undefined behavior, but I can see 2>&- is a simple alternative to a full-fledged test that makes write operations fail with some other errno. My feeling is that you can just update close-stderr.ll to (not --crash opt --reject-this-option 2>&1) in D67847 and be prepared to update it if build bots on some platforms have different behaviors.

sorry, I'm really confused now. Based on the comment in close-stderr.ll, the test is to make sure abort does not happen? Why is it checking the undefined behavior?

ruiu added a comment.Sep 30 2019, 10:43 PM

The exit is called twice and somehow the test exit cleanly.

The current behavior is undefined. POSIX says "If exit() is called more than once, the behavior is undefined." Changing exit to abort in D67847 will fix the issue. Thanks.

The test ; RUN: sh -c 'opt --reject-this-option 2>&-; echo $?; opt -o /dev/null /dev/null 2>&-; echo $?;' is testing an undefined behavior, but I can see 2>&- is a simple alternative to a full-fledged test that makes write operations fail with some other errno. My feeling is that you can just update close-stderr.ll to (not --crash opt --reject-this-option 2>&1) in D67847 and be prepared to update it if build bots on some platforms have different behaviors.

sorry, I'm really confused now. Based on the comment in close-stderr.ll, the test is to make sure abort does not happen? Why is it checking the undefined behavior?

I think the current implementation is wrong in the sense that it calls exit more than once, but it happens to work just fine on most platforms, so the test is not failing at the moment. But in theory the test could hang instead of existing with 1, for example.

The exit is called twice and somehow the test exit cleanly.

The current behavior is undefined. POSIX says "If exit() is called more than once, the behavior is undefined." Changing exit to abort in D67847 will fix the issue. Thanks.

The test ; RUN: sh -c 'opt --reject-this-option 2>&-; echo $?; opt -o /dev/null /dev/null 2>&-; echo $?;' is testing an undefined behavior, but I can see 2>&- is a simple alternative to a full-fledged test that makes write operations fail with some other errno. My feeling is that you can just update close-stderr.ll to (not --crash opt --reject-this-option 2>&-) in D67847 and be prepared to update it if build bots on some platforms have different behaviors.

sorry, I'm really confused now. Based on the comment in close-stderr.ll, the test is to make sure abort does not happen? Why is it checking the undefined behavior?

close-stderr.ll was added to test a particular behavior of an undefined behavior. There was some churn to mark it as fail on Windows or under valgrind. This is a signal that it really should not test the particular behavior. As this patch suggests, to maintain that particular behavior, we would have to make the code complex. Adding a virtual function for this tricky case is something I really do not want to happen... We probably should just use not --crash opt --reject-this-option 2>&-.

ychen added a comment.EditedSep 30 2019, 11:11 PM

The exit is called twice and somehow the test exit cleanly.

The current behavior is undefined. POSIX says "If exit() is called more than once, the behavior is undefined." Changing exit to abort in D67847 will fix the issue. Thanks.

The test ; RUN: sh -c 'opt --reject-this-option 2>&-; echo $?; opt -o /dev/null /dev/null 2>&-; echo $?;' is testing an undefined behavior, but I can see 2>&- is a simple alternative to a full-fledged test that makes write operations fail with some other errno. My feeling is that you can just update close-stderr.ll to (not --crash opt --reject-this-option 2>&-) in D67847 and be prepared to update it if build bots on some platforms have different behaviors.

sorry, I'm really confused now. Based on the comment in close-stderr.ll, the test is to make sure abort does not happen? Why is it checking the undefined behavior?

close-stderr.ll was added to test a particular behavior of an undefined behavior.

By undefined I don't know what kind of consistent behavior could be checked.

There was some churn to mark it as fail on Windows or under valgrind. This is a signal that it really should not test the particular behavior.

I suspend some of this are related to double exit. The rest are related to platform / software specific handling of exit / crash.

As this patch suggests, to maintain that particular behavior, we would have to make the code complex. Adding a virtual function for this tricky case is something I really do not want to happen...

I see what you’re saying but the vtable is already there and error handling is a slow path.

We probably should just use not --crash opt --reject-this-option 2>&-.

This is against user experience and the test could be removed if we just let it crash in this case. I think it totally possible that the user doesn't really want to see anything from stderr and managed to give a wrong option. They could use &> /dev/null too but...

I agree with @ruiu that this test passes by coincidence and we should address it in some way besides D67847.

MaskRay resigned from this revision.Oct 18 2019, 7:22 PM
ychen abandoned this revision.Mar 25 2020, 11:06 AM

llvm/test/Other/close-stderr.ll is removed in rG4ad768525844 which makes this patch unnecessary.