This is an archive of the discontinued LLVM Phabricator instance.

[clang] diagnose function fallthrough
Needs Review · Public

Authored by nickdesaulniers on Mar 20 2023, 4:01 PM.

Details

Summary

Debugging functions that have fallen through to the next function is not
fun. So much so that the Linux kernel has a host utility called objtool
that can report such cases of fallthrough (for architectures objtool
supports). Example:

drivers/media/i2c/m5mols/m5mols.o: warning: objtool:
m5mols_set_fmt() falls through to next function __cfi_m5mols_open()

With this new diagnostic, -Wfunction-fallthrough, we can produce a
similar diagnostic:

drivers/media/i2c/m5mols/m5mols_core.c:573:12: warning: function
'm5mols_set_fmt' falls through to next function
[-Wfunction-fallthrough]
static int m5mols_set_fmt(struct v4l2_subdev *sd,
           ^

This can help diagnose the improper usage of __builtin_unreachable(). It
can also help us spot potential codegen bugs in LLVM itself.
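
A minimal sketch (not from the patch) of the kind of code the diagnostic targets: if the promise encoded by __builtin_unreachable() is ever violated, the generated code can run straight off the end of the function into whatever the linker placed next.

/* Assumes f() is only ever called with 0 or 1; if that assumption is
 * wrong, control reaches __builtin_unreachable() and simply falls off
 * the end of f() at runtime. */
int f(int x) {
    switch (x) {
    case 0: return 10;
    case 1: return 20;
    }
    __builtin_unreachable(); /* "this cannot happen" */
}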

Example issues faced in the past:
https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aissue+falls+label%3A%22%5BTOOL%5D+objtool%22+
Feature request:
https://lore.kernel.org/llvm/CAHk-=wgTSdKYbmB1JYM5vmHMcD9J9UZr0mn7BOYM_LudrP+Xvw@mail.gmail.com/

Event Timeline

Herald added a project: Restricted Project. · Mar 20 2023, 4:01 PM
Herald added a subscriber: hiraditya.
nickdesaulniers requested review of this revision. · Mar 20 2023, 4:01 PM
Herald added projects: Restricted Project, Restricted Project. · Mar 20 2023, 4:01 PM
nickdesaulniers added inline comments.
llvm/test/CodeGen/AArch64/function-fallthrough.ll:7

TODO: add test that the final MBB is a call to a noreturn function followed by unreachable

I'm concerned about the potential for false positives:

  • If a user doesn't mark a function noreturn, but it doesn't actually ever return.
  • A function is conditionally broken, but the condition is never actually true (for example, jump threading creates a dead codepath).
  • A function might actually be unconditionally broken, but it doesn't matter because it never actually gets called. This can easily happen with C++ templates, or an "outlining" optimization on LLVM IR.

For some of these situations, there are somewhat straightforward workarounds we can suggest for users... but in some cases, there aren't. I really don't want to implement warnings where the best suggestion we can make to fix the warning is "Try refactoring your code; it might go away".

Yeah, we already have the return type warning and UBSan for the dynamic version of this check - seems OK? That a function lowers to unreachable isn't a bug - there's a ton of them in LLVM, quite intentionally as unimplemented functions that should never be called.

If the issue is with the actual fallthrough - that's been discussed elsewhere recently, and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

If the issue is with the actual fallthrough - that's been discussed elsewhere recently, and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

As I understand things, folks are confused about the runtime behavior of falling through and magically calling some other function. If we can warn about that situation from the FE without false positives, then huttah, I think that's sufficient. But I don't think we know of a way to do that from the FE, so I think ensuring a runtime trap is more helpful to folks than the status quo. (Also, in general, I think we want to avoid emitting diagnostics from the backend -- that has usability issues where clean -fsyntax-only builds suddenly get warnings when doing full codegen, or in circumstances where the optimization level changes the diagnostic output, etc.)

I'm concerned about the potential for false positives:

  • If a user doesn't mark a function noreturn, but it doesn't actually ever return.

Ok, but a user can fix that in their code. Also, I think we could possibly help by adding diagnostics for this. Unconditionally calling a noreturn function or having an obviously infinite loop should trigger such a diagnostic, with a note encouraging the user to mark their function noreturn as well?
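
For instance, a hypothetical sketch of candidates for such a note (the code is well-formed C; the suggested diagnostic is the speculative part):

#include <stdlib.h>

/* Never returns, but isn't marked noreturn; a companion diagnostic
 * could suggest adding _Noreturn / __attribute__((noreturn)). */
void spin(void) {
    for (;;)
        ;
}

/* Unconditionally calls a noreturn function (abort), so it can never
 * return either; the same suggested note applies. */
void fail(void) {
    abort();
}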

  • A function is conditionally broken, but the condition is never actually true (for example, jump threading creates a dead codepath).
  • A function might actually be unconditionally broken, but it doesn't matter because it never actually gets called. This can easily happen with C++ templates, or an "outlining" optimization on LLVM IR.

These sound like bugs in LLVM. Even if we don't expose function fallthrough to users as a diagnostic warning (via -Wfunction-fallthrough), perhaps a -mllvm flag is still handy for llvm developers to spot places where we could perhaps improve codegen.

I don't think we should ever emit a branch to an unreachable block; perhaps we need to be more aggressive about cleaning those up? Otherwise it seems like a waste of instructions, which pollute I$.

For some of these situations, there are somewhat straightforward workarounds we can suggest for users... but in some cases, there aren't. I really don't want to implement warnings where the best suggestion we can make to fix the warning is "Try refactoring your code; it might go away".

Contrast that with the intent of the feature request: diagnosing when a function may fall through to another. That is quite painful to debug when it occurs at runtime. How do we reconcile?

Yeah, we already have the return type warning

Which warning?

and UBSan for the dynamic version of this check - seems OK?

Which ubsan check?

That a function lowers to unreachable isn't a bug - there's a ton of them in LLVM, quite intentionally as unimplemented functions that should never be called.

IDK about an entire function lowering to unreachable; it's more that there exists a branch to an unreachable basic block (or simply an unreachable inst) that results in one function falling through to another.

If the issue is with the actual fallthrough - that's been discussed elsewhere recently,

Link?

and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

Right, at this point, I'm probably just going to turn on -mllvm -trap-unreachable in the Linux kernel. I might create a -f flag for it, since I do my best to avoid using -mllvm in the kernel's build system. I think this is unfortunate because I suspect it will increase the kernel image size and hide codegen bugs in LLVM, particularly when sanitizers are enabled. (i.e. places where we generate branches to unreachable).
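
To sketch what -trap-unreachable buys (illustrative only; the exact lowering is target-dependent, e.g. ud2 or int3 on x86):

/* Without the flag, the impossible default path below may fall off the
 * end of g(); with -mllvm -trap-unreachable it ends in a trap
 * instruction instead, so a violated assumption faults here rather
 * than in the next function in the image. */
int g(int x) {
    switch (x) {
    case 0: return -1;
    case 1: return 1;
    }
    __builtin_unreachable();
}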

If the issue is with the actual fallthrough - that's been discussed elsewhere recently, and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

As I understand things, folks are confused about the runtime behavior of falling through and magically calling some other function. If we can warn about that situation from the FE without false positives, then huttah, I think that's sufficient. But I don't think we know of a way to do that from the FE, so I think ensuring a runtime trap is more helpful to folks than the status quo. (Also, in general, I think we want to avoid emitting diagnostics from the backend -- that has usability issues where clean -fsyntax-only builds suddenly get warnings when doing full codegen, or in circumstances where the optimization level changes the diagnostic output, etc.)

The issue is that there are problems that may occur as a result of optimizations; if the front end does not do optimizations, it cannot diagnose these. Some examples include:

  • -Wframe-larger-than= (useful for embedded systems without "nearly unlimited" stack, though an imperfect proxy for maximal stack depth, which is what these systems actually want. Helpful for spotting large stack allocations that should come from the heap or static memory.)
  • -Wattribute-warning (some implementations of _FORTIFY_SOURCE, such as the Linux kernel's, rely on this; specifically, on calls to functions with this attribute being optimized out. This provides significant compile-time protection when optimizations expose an out-of-bounds access, compared to implementations that simply use _Static_assert.)
  • -Wfunction-fallthrough (the hypothetical new diagnostic this patch is adding)

Even if Clang had a high level IR and did optimizations, it still could not guarantee the results of LLVM's subsequent optimizations.

Does that mean that -Wframe-larger-than= (and the like) should not exist because it's not diagnosable via -fsyntax-only?

I do think there is a class of diagnostics, useful to users, that depends on optimizations being run. While a policy that new diagnostics must be diagnosable via -fsyntax-only is a simplistic litmus test, it leaves our competition free to implement diagnostics that may be more useful to users. That hamstrings our competitiveness, IMO.
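
Regarding the -Wattribute-warning bullet above, here is a hedged sketch of the kernel-style _FORTIFY_SOURCE pattern (the names __fortify_report and copy_checked are hypothetical, not the kernel's): the warning fires only if the call survives optimization, which is exactly the optimizer-dependence being debated.

extern void __fortify_report(void)
    __attribute__((warning("possible out-of-bounds copy")));

/* If the optimizer proves n is a compile-time constant exceeding
 * dst_sz, the call to __fortify_report() is emitted and
 * -Wattribute-warning fires; otherwise the branch folds away and no
 * diagnostic is produced. */
static inline void copy_checked(char *dst, unsigned long dst_sz,
                                const char *src, unsigned long n) {
    if (__builtin_constant_p(n) && n > dst_sz)
        __fortify_report();
    __builtin_memcpy(dst, src, n);
}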

I'm concerned about the potential for false positives:

  • If a user doesn't mark a function noreturn, but it doesn't actually ever return.

Ok, but a user can fix that in their code. Also, I think we could possibly help by adding diagnostics for this. Unconditionally calling a noreturn function or having an obviously infinite loop should trigger such a diagnostic, with a note encouraging the user to mark their function noreturn as well?

But that's a false positive - there was no bug in their code. We try to avoid those in compile-time diagnostics. Maybe the noreturn function is in a library they're calling that's outside their control.

  • A function is conditionally broken, but the condition is never actually true (for example, jump threading creates a dead codepath).
  • A function might actually be unconditionally broken, but it doesn't matter because it never actually gets called. This can easily happen with C++ templates, or an "outlining" optimization on LLVM IR.

These sound like bugs in LLVM. Even if we don't expose function fallthrough to users as a diagnostic warning (via -Wfunction-fallthrough), perhaps a -mllvm flag is still handy for llvm developers to spot places where we could perhaps improve codegen.

I don't understand why these would be bugs in LLVM - the LLVM source code itself has many functions that are intentionally unconditionally broken if you ever call them (they have just llvm_unreachable as their body) - and they are, in practice, never called. These aren't bugs - there isn't anything to "fix" in this code - and a warning/error that flagged them as such would be noise and an inconvenience to LLVM developers.

A sanitizer error if they're ever called would be suitable, and does exist - if a developer introduced a bug where one of these got called, it could be diagnosed by -fsanitize=undefined.

I don't think we should ever emit a branch to an unreachable block; perhaps we need to be more aggressive about cleaning those up? Otherwise it seems like a waste of instructions, which pollute I$.

For some of these situations, there are somewhat straightforward workarounds we can suggest for users... but in some cases, there aren't. I really don't want to implement warnings where the best suggestion we can make to fix the warning is "Try refactoring your code; it might go away".

Contrast that with the intent of the feature request: diagnosing when a function may fall through to another. That is quite painful to debug when it occurs at runtime. How do we reconcile?

Generally the solution LLVM/Clang prefers is compile-time diagnostics that can be implemented reliably/deterministically in the frontend, and sanitizers that fail fast at runtime for ~everything else.

Yeah, we already have the return type warning

Which warning?

-Wreturn-type: https://godbolt.org/z/cde5MGeac

and UBSan for the dynamic version of this check - seems OK?

Which ubsan check?

https://godbolt.org/z/n1YGWc9Wb - "runtime error: execution reached the end of a value-returning function without returning a value"
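
The case both of these cover looks roughly like this (a minimal sketch; the godbolt links show the real output):

/* -Wreturn-type warns at compile time that control can reach the end
 * of a value-returning function; UBSan's missing-return check reports
 * the same situation at runtime instead. */
int sign(int x) {
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
}   /* x == 0 falls off the end */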

That a function lowers to unreachable isn't a bug - there's a ton of them in LLVM, quite intentionally as unimplemented functions that should never be called.

IDK about an entire function lowering to unreachable; it's more that there exists a branch to an unreachable basic block (or simply an unreachable inst) that results in one function falling through to another.

If the issue is with the actual fallthrough - that's been discussed elsewhere recently,

Link?

https://discourse.llvm.org/t/can-we-keep-must-progress-optimizations-around-infinite-loop-in-c-while-avoiding-some-surprising-behavior/69205/3

and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

Right, at this point, I'm probably just going to turn on -mllvm -trap-unreachable in the Linux kernel. I might create a -f flag for it, since I do my best to avoid using -mllvm in the kernel's build system. I think this is unfortunate because I suspect it will increase the kernel image size and hide codegen bugs in LLVM, particularly when sanitizers are enabled. (i.e. places where we generate branches to unreachable).

What sort of codegen bugs would this hide? What's the problematic case with sanitizers you're picturing? I don't quite understand it - perhaps a godbolt link/example?

If this does increase size at all beyond the necessary cases (adding an int3 on an empty and never-called function, or a never-taken branch that would've fallen through), those are bugs we could fix.

That's partly what I wanted to do when I started looking at enabling TrapOnUnreachable in general - trying to get to the minimal number of traps necessary, so code with an infinite-but-side-effecting (so valid C++/IR) loop shouldn't need a trap after it, and nothing branches to the unreachable so it's fine to omit any trap there. But I never could figure out how to do that nicely (basically: is the last instruction before the end, where the trap would go, an unconditional jump? Then there's no need for a trap, probably... (I mean, someone else might be jumping into this tail, so it's a bit trickier than just that, and even just that wasn't obvious to me))

If the issue is with the actual fallthrough - that's been discussed elsewhere recently, and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

As I understand things, folks are confused about the runtime behavior of falling through and magically calling some other function. If we can warn about that situation from the FE without false positives, then huttah, I think that's sufficient. But I don't think we know of a way to do that from the FE, so I think ensuring a runtime trap is more helpful to folks than the status quo. (Also, in general, I think we want to avoid emitting diagnostics from the backend -- that has usability issues where clean -fsyntax-only builds suddenly get warnings when doing full codegen, or in circumstances where the optimization level changes the diagnostic output, etc.)

The issue is that there are problems that may occur as a result of optimizations; if the front end does not do optimizations, it cannot diagnose these. Some examples include:

  • -Wframe-larger-than= (useful for embedded systems without "nearly unlimited" stack, though an imperfect proxy for maximal stack depth, which is what these systems actually want. Helpful for spotting large stack allocations that should come from the heap or static memory.)
  • -Wattribute-warning (some implementations of _FORTIFY_SOURCE, such as the Linux kernel's, rely on this; specifically, on calls to functions with this attribute being optimized out. This provides significant compile-time protection when optimizations expose an out-of-bounds access, compared to implementations that simply use _Static_assert.)
  • -Wfunction-fallthrough (the hypothetical new diagnostic this patch is adding)

Even if Clang had a high level IR and did optimizations, it still could not guarantee the results of LLVM's subsequent optimizations.

Does that mean that -Wframe-larger-than= (and the like) should not exist because it's not diagnosable via -fsyntax-only?

I do think there is a class of diagnostics, useful to users, that depends on optimizations being run. While a policy that new diagnostics must be diagnosable via -fsyntax-only is a simplistic litmus test, it leaves our competition free to implement diagnostics that may be more useful to users. That hamstrings our competitiveness, IMO.

Different tools for different folks - I don't think it's outright unacceptable to have optimizer-driven diagnostics, but I think we're understandably cautious of them.

I wouldn't totally mind if frame-larger-than were a sanitizer failure only if/when the function is called. Doesn't seem so bad. But it's not the worst thing as it is today either.

I'm concerned about the potential for false positives:

  • If a user doesn't mark a function noreturn, but it doesn't actually ever return.

Ok, but a user can fix that in their code. Also, I think we could possibly help by adding diagnostics for this. Unconditionally calling a noreturn function or having an obviously infinite loop should trigger such a diagnostic, with a note encouraging the user to mark their function noreturn as well?

But that's a false positive - there was no bug in their code. We try to avoid those in compile-time diagnostics. Maybe the noreturn function is in a library they're calling that's outside their control.

Can you or @efriedma demonstrate a concrete case you're thinking of? I've tried:

void x (void) {
    for (;;);
}
int y (int x) {
    if (x)
        __builtin_unreachable();
    return 42;
}
__attribute__((noreturn))
extern void z (void);

void w (void) { z(); }

with this patch applied, and none of the above produces such a false positive at any optimization level (i.e. no -Wfunction-fallthrough is produced; you'll notice I explicitly check for calls to noreturn functions). So I'd like to make sure we're not arguing against a strawman here. I'm happy to add more test cases to this patch to assure reviewers there are no false positives.

  • A function is conditionally broken, but the condition is never actually true (for example, jump threading creates a dead codepath).
  • A function might actually be unconditionally broken, but it doesn't matter because it never actually gets called. This can easily happen with C++ templates, or an "outlining" optimization on LLVM IR.

These sound like bugs in LLVM. Even if we don't expose function fallthrough to users as a diagnostic warning (via -Wfunction-fallthrough), perhaps a -mllvm flag is still handy for llvm developers to spot places where we could perhaps improve codegen.

I don't understand why these would be bugs in LLVM - the LLVM source code itself has many functions that are intentionally unconditionally broken if you ever call them (they have just llvm_unreachable as their body) - and they are, in practice, never called. These aren't bugs - there isn't anything to "fix" in this code - and a warning/error that flagged them as such would be noise and an inconvenience to LLVM developers.

I'm not sure why you're referring to calls. This diagnostic is strictly about intra-function control flow that results in a branch to unreachable, which is generally lowered to a branch off the end of the function. I would think that would be considered unacceptable.

A sanitizer error if they're ever called would be suitable, and does exist - if a developer introduced a bug where one of these got called, it could be diagnosed by -fsanitize=undefined.
Generally the solution LLVM/Clang prefers is compile-time diagnostics that can be implemented reliably/deterministically in the frontend, and sanitizers that fail fast at runtime for ~everything else.

Yes, but kernel developers would prefer to diagnose issues at compile time when possible, rather than at runtime. Function fallthrough is diagnosable at compile time, as this patch demonstrates.

Yeah, we already have the return type warning

Which warning?

-Wreturn-type: https://godbolt.org/z/cde5MGeac

Strange; we get different behavior between C and C++ here in clang with respect to whether a ret is inserted or not. But for C, I think it demonstrates that the diagnostic is not enough to catch function fallthrough; try adding __builtin_unreachable() and notice how -Wreturn-type is of no help.
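
Concretely, a small sketch of that point:

/* Adding __builtin_unreachable() silences -Wreturn-type, but if the
 * "x is never 0" assumption is ever wrong, the function still falls
 * through at runtime; only something like -Wfunction-fallthrough has
 * anything to say here. */
int sign(int x) {
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    __builtin_unreachable(); /* asserts x != 0 */
}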

and UBSan for the dynamic version of this check - seems OK?

Which ubsan check?

https://godbolt.org/z/n1YGWc9Wb - "runtime error: execution reached the end of a value-returning function without returning a value"

(Strange again; we get different behavior here between C and C++ in clang)

That said, it would be nice to diagnose function fallthrough at compile time rather than having to have a distinct UBSan build catch this at runtime.

That a function lowers to unreachable isn't a bug - there's a ton of them in LLVM, quite intentionally as unimplemented functions that should never be called.

IDK about an entire function lowering to unreachable; it's more that there exists a branch to an unreachable basic block (or simply an unreachable inst) that results in one function falling through to another.

If the issue is with the actual fallthrough - that's been discussed elsewhere recently,

Link?

https://discourse.llvm.org/t/can-we-keep-must-progress-optimizations-around-infinite-loop-in-c-while-avoiding-some-surprising-behavior/69205/3

Yes, this seems similar. Thanks for the link; I'll watch that thread and think about whether there's a common solution to both problems here.

and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

Right, at this point, I'm probably just going to turn on -mllvm -trap-unreachable in the Linux kernel. I might create a -f flag for it, since I do my best to avoid using -mllvm in the kernel's build system. I think this is unfortunate because I suspect it will increase the kernel image size and hide codegen bugs in LLVM, particularly when sanitizers are enabled. (i.e. places where we generate branches to unreachable).

What sort of codegen bugs would this hide? What's the problematic case with sanitizers you're picturing? I don't quite understand it - perhaps a godbolt link/example?

Here's a reduced test case from a recent kernel incident:
https://godbolt.org/z/q8TWrjMf6
Note how __ubsan_handle_out_of_bounds is literally marked RECOVERABLE in https://github.com/llvm/llvm-project/blob/fb8d894f23c5e805f0c87d89fb9d6c0eed3a0e72/compiler-rt/lib/ubsan/ubsan_handlers.h#L90-L91, and yet we generate a call to it that will fall off the end of the function when that callback returns.

Note: that test case has an off-by-one bug in the do-while loop that was since fixed; but it's additionally problematic that LLVM generated code that was not "recoverable."
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=efbcbb12ee99f750c9f25c873b55ad774871de2a
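
The shape of the bug was roughly this (a hedged reconstruction, not the kernel's actual code):

/* Built with -fsanitize=array-bounds: the final iteration reads a[4],
 * the "recoverable" __ubsan_handle_out_of_bounds callback fires, and
 * because the post-UB path was folded to unreachable, control can fall
 * off the end of the function when the callback returns. */
int sum(void) {
    static const int a[4] = {1, 2, 3, 4};
    int i = 0, s = 0;
    do {
        s += a[i];
    } while (i++ < 4); /* off-by-one: should be i++ < 3 */
    return s;
}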

If this does increase size at all beyond the necessary cases (adding an int3 on an empty and never-called function, or a never-taken branch that would've fallen through), those are bugs we could fix.

Quick stats for a kernel build (x86 defconfig):
Default:
85635360 B == ~81.66 MiB
Default + -mllvm -trap-unreachable:
85826000 B == ~81.85 MiB
(so a ~0.22% increase in binary size)

That's partly what I wanted to do when I started looking at enabling TrapOnUnreachable in general - trying to get to the minimal number of traps necessary, so code with an infinite-but-side-effecting (so valid C++/IR) loop shouldn't need a trap after it, and nothing branches to the unreachable so it's fine to omit any trap there. But I never could figure out how to do that nicely (basically: is the last instruction before the end, where the trap would go, an unconditional jump? Then there's no need for a trap, probably... (I mean, someone else might be jumping into this tail, so it's a bit trickier than just that, and even just that wasn't obvious to me))

As you note, it is interesting to see TargetOptions::TrapUnreachable enabled by default for specific targets. It is probably worth pursuing enabling it for more targets. Were there patches to that effect?

If the issue is with the actual fallthrough - that's been discussed elsewhere recently, and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

As I understand things, folks are confused about the runtime behavior of falling through and magically calling some other function. If we can warn about that situation from the FE without false positives, then huttah, I think that's sufficient. But I don't think we know of a way to do that from the FE, so I think ensuring a runtime trap is more helpful to folks than the status quo. (Also, in general, I think we want to avoid emitting diagnostics from the backend -- that has usability issues where clean -fsyntax-only builds suddenly get warnings when doing full codegen, or in circumstances where the optimization level changes the diagnostic output, etc.)

The issue is that there are problems that may occur as a result of optimizations; if the front end does not do optimizations, it cannot diagnose these. Some examples include:

  • -Wframe-larger-than= (useful for embedded systems without "nearly unlimited" stack, though an imperfect proxy for maximal stack depth, which is what these systems actually want. Helpful for spotting large stack allocations that should come from the heap or static memory.)
  • -Wattribute-warning (some implementations of _FORTIFY_SOURCE, such as the Linux kernel's, rely on this; specifically, on calls to functions with this attribute being optimized out. This provides significant compile-time protection when optimizations expose an out-of-bounds access, compared to implementations that simply use _Static_assert.)
  • -Wfunction-fallthrough (the hypothetical new diagnostic this patch is adding)

Even if Clang had a high level IR and did optimizations, it still could not guarantee the results of LLVM's subsequent optimizations.

Does that mean that -Wframe-larger-than= (and the like) should not exist because it's not diagnosable via -fsyntax-only?

I do think there is a class of diagnostics, useful to users, that depends on optimizations being run. While a policy that new diagnostics must be diagnosable via -fsyntax-only is a simplistic litmus test, it leaves our competition free to implement diagnostics that may be more useful to users. That hamstrings our competitiveness, IMO.

Different tools for different folks - I don't think it's outright unacceptable to have optimizer-driven diagnostics, but I think we're understandably cautious of them.

I wouldn't totally mind if frame-larger-than were a sanitizer failure only if/when the function is called. Doesn't seem so bad. But it's not the worst thing as it is today either.

Honestly, I should probably write a new diagnostic for a single object being stack allocated if it's greater than a given threshold; that's probably more useful for what the kernel is looking for; otherwise -Wframe-larger-than= can be death by a thousand papercuts due to inlining decisions.

Making -Wframe-larger-than into a sanitizer is perhaps interesting; it would probably require careful interaction with the runtime to know what the "current stack depth" was at all times. Though, "we prefer to diagnose things at compile time, when possible." @kees might know if the kernel has any protection to detect a potential stack overflow at runtime, or what the requirements for some such thing might look like.

Yes, but kernel developers would prefer to diagnose issues at compile time when possible, rather than at runtime. Function fallthrough is diagnosable at compile time, as this patch demonstrates.

I'm not saying the diagnostic is useless. I'm saying users are going to find cases where their perfectly good code blows up, and so the warning will get disabled for every large codebase except the Linux kernel. And every user that goes through the process of seeing the false positive and disabling the warning will start distrusting other warnings that are actually reliable.

Note how __ubsan_handle_out_of_bounds is literally marked RECOVERABLE in https://github.com/llvm/llvm-project/blob/fb8d894f23c5e805f0c87d89fb9d6c0eed3a0e72/compiler-rt/lib/ubsan/ubsan_handlers.h#L90-L91, and yet we generate a call to it that will fall off the end of the function when that callback returns.

Using "recoverable" UBSan doesn't actually affect the code we generate after the diagnostic, and it doesn't suppress compiler optimizations that turn provably undefined code into "unreachable". So often the code blows up anyway. Maybe we should try to do something different here in some cases. For out-of-bounds specifically, I'm not sure what a fix for that would look like, though; there isn't any intuitive behavior for an out-of-bounds access.

Can you or @efriedma demonstrate a concrete case you're thinking of? I've tried:

The cases I'm thinking of are cases which *do* actually appear to fall through, but don't because of invariants that aren't visible to the compiler. Functions that look like void f() { __builtin_unreachable(); } actually exist in LLVM. (The pattern tends to show up with virtual functions...) Or a function may or may not be noreturn depending on the argument values. Or jump threading generated a codepath that has unconditional undefined behavior, but we can't actually prove it because there's a call to a function that isn't marked willreturn.

Have you tried building LLVM with this to see what shows up?
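
For example, the placeholder shape being described might look like this (a hypothetical sketch):

/* Intentionally "broken" if called; by project invariant it never is.
 * This is the C analogue of an LLVM method whose whole body is
 * llvm_unreachable(), and a fallthrough warning here would be noise. */
void unimplemented_hook(void) {
    __builtin_unreachable();
}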

ojeda added a subscriber: ojeda. · Mar 23 2023, 2:37 AM

If the issue is with the actual fallthrough - that's been discussed elsewhere recently, and I think it'd be reasonable to turn on TrapOnUnreachable everywhere, basically (with NoTrapAfterNoreturn too, perhaps, to keep the cost low) so that these trivial fallthroughs hit int3 instead of going on to the next function.

As I understand things, folks are confused about the runtime behavior of falling through and magically calling some other function. If we can warn about that situation from the FE without false positives, then huttah, I think that's sufficient. But I don't think we know of a way to do that from the FE, so I think ensuring a runtime trap is more helpful to folks than the status quo. (Also, in general, I think we want to avoid emitting diagnostics from the backend -- that has usability issues where clean -fsyntax-only builds suddenly get warnings when doing full codegen, or in circumstances where the optimization level changes the diagnostic output, etc.)

The issue is that there are problems that may occur as a result of optimizations; if the front end does not do optimizations, it cannot diagnose these. Some examples include:

  • -Wframe-larger-than= (useful for embedded systems without "nearly unlimited" stack, though an imperfect proxy for maximal stack depth, which is what these systems actually want. Helpful for spotting large stack allocations that should come from the heap or static memory.)
  • -Wattribute-warning (some implementations of _FORTIFY_SOURCE, such as the Linux kernel's, rely on this; specifically, on calls to functions with this attribute being optimized out. This provides significant compile-time protection when optimizations expose an out-of-bounds access, compared to implementations that simply use _Static_assert.)
  • -Wfunction-fallthrough (the hypothetical new diagnostic this patch is adding)

Even if Clang had a high level IR and did optimizations, it still could not guarantee the results of LLVM's subsequent optimizations.

Does that mean that -Wframe-larger-than= (and the like) should not exist because it's not diagnosable via -fsyntax-only?

I do think there is a class of diagnostics, useful to users, that depends on optimizations being run. While a policy that new diagnostics must be diagnosable via -fsyntax-only is a simplistic litmus test, it leaves our competition free to implement diagnostics that may be more useful to users. That hamstrings our competitiveness, IMO.

Okay, I was not very thorough in explaining my concerns about diagnostics coming out of the optimizer. -fsyntax-only behavior is one facet of the issue, but the problem boils down to "mysterious compiler behavior" as far as the user is concerned. Other examples of this same problem are:

  • User is compiling everything locally with their debug build (-O0) and everything compiles fine. They commit it and it gets built as a release build (-O2 or -O3), and now they get alerted to problems.
  • User is compiling everything locally with their optimized build and everything compiles fine. They commit it and it gets built on another machine with significantly more resources than the developer's machine had, so optimization decisions are different and now they get alerted to problems.
  • User is compiling everything locally with their optimized build and everything compiles fine, they close their web browser, fix a typo in a comment, and recompile and now they get alerted to problems.
  • Any use of -Werror with optimization-based diagnostics is effectively playing compiler Russian roulette.

The root of the problem is that we want diagnostics to be reproducible, and diagnostics generated by the optimizer often are not reliably reproducible. GCC has already gone down this path, and it causes confusion in practice:

https://stackoverflow.com/questions/59388740/gcc-shows-different-warnings-depending-on-optimisation-level
https://stackoverflow.com/questions/65049644/gcc-warning-about-unitialized-values-caused-by-turning-optimization-on
https://developers.redhat.com/blog/2019/03/13/understanding-gcc-warnings-part-2#

That's why we've traditionally limited the backend to producing remarks instead of warnings or errors. It's a user-experience tradeoff. That's not to say there's never a valid reason to diagnose from the backend, but I think we need to be very mindful about the user experience when we add them. Linux kernel developers are one class of user (and likely don't have too much confusion over optimization-based diagnostics), but we've got plenty of users for whom the compiler is effectively a black box, and those are the folks I worry about with optimization-based diagnostics.

Note how __ubsan_handle_out_of_bounds is literally marked RECOVERABLE in https://github.com/llvm/llvm-project/blob/fb8d894f23c5e805f0c87d89fb9d6c0eed3a0e72/compiler-rt/lib/ubsan/ubsan_handlers.h#L90-L91, and yet we generate a call to it that will fall off the end of the function when that callback returns.

Using "recoverable" UBSan doesn't actually affect the code we generate after the diagnostic, and it doesn't suppress compiler optimizations that turn provably undefined code into "unreachable". So often the code blows up anyway. Maybe we should try to do something different here in some cases. For out-of-bounds specifically, I'm not sure what a fix for that would look like, though; there isn't any intuitive behavior for an out-of-bounds access.

Recoverable is an important concept for ubsan callbacks; after some, the program is expected to continue, and after others it is not. If we enable UBSan to help us find UB in our program, but the presence of said UB causes our program to go off the rails, then that would not be considered recoverable. A less charitable interpretation would be to call that sanitizer broken.

I wonder if the issue I'm seeing in https://godbolt.org/z/aaj6KTrPe is actually something better solved by additional use of freeze on poison? (uh oh, I've summoned Cthulhu (and the Cenobites) by uttering those two terms in the same sentence! A blood sacrifice is necessary!)

For example, where does the unreachable come from in that example?

SCCPPass is turning a br i1 poison, ... into an unreachable.

Where did the branch on poison come from?

LoopFullUnrollPass introduced a bunch of branches on poison when unrolling the loop.

I'd bet that if we "froze" the poison first before branching on it (and only did so when -fsanitize=array-bounds was enabled) then we'd never end up replacing the branch with unreachable, which is destroying our control flow and making our sanitizer not actually recoverable.


Come to think of it, the other case of "broken code" I've seen in the past (and alluded to upthread) was related to the divide-by-zero sanitizer.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc3&id=e5d523f1ae8f2cef01f8e071aeee432654166708
I should reduce a test case from that, and see if perhaps freeze is a solution there, too. Then maybe warning on fallthrough becomes moot (or less interesting) if these are simply codegen bugs in the various UBSanitizers.

There are limits to how much we can do by just changing the code clang generates... but the two particular cases you mention probably could be "fixed" by messing with the IR generated by clang. Sure, that probably makes sense to pursue. (If you're going to pick an arbitrary value, zero is probably a better choice than "freeze poison".)

There are limits to how much we can do by just changing the code clang generates... but the two particular cases you mention probably could be "fixed" by messing with the IR generated by clang. Sure, that probably makes sense to pursue. (If you're going to pick an arbitrary value, zero is probably a better choice than "freeze poison".)

I don't think that could be done in clang's codegen of the IR. What I had in mind is closer to:

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 916c02692823..aa1f288b51ee 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -236,9 +236,17 @@ void llvm::simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI,
   SmallVector<WeakTrackingVH, 16> DeadInsts;
   for (BasicBlock *BB : L->getBlocks()) {
     for (Instruction &Inst : llvm::make_early_inc_range(*BB)) {
-      if (Value *V = simplifyInstruction(&Inst, {DL, nullptr, DT, AC}))
-        if (LI->replacementPreservesLCSSAForm(&Inst, V))
+      if (Value *V = simplifyInstruction(&Inst, {DL, nullptr, DT, AC})) {
+        // TODO: check if sanitizer is enabled
+        if (isa<PoisonValue>(V) && isa<LoadInst>(Inst)) {
+          errs() << "XXX: found poison (replacing a load)!\n";
+          FreezeInst *FI = new FreezeInst(V);
+          FI->insertInto(BB, Inst.getIterator());
+          if (LI->replacementPreservesLCSSAForm(&Inst, FI))
+            Inst.replaceAllUsesWith(FI);
+        } else if (LI->replacementPreservesLCSSAForm(&Inst, V))
           Inst.replaceAllUsesWith(V);
+      }
       if (isInstructionTriviallyDead(&Inst))
         DeadInsts.emplace_back(&Inst);
     }

Which seems to do exactly what I want (minus only doing so when -fsanitize=array-bounds is enabled; not sure we leave such breadcrumbs behind in IR; still looking to see if we can tell in LLVM when specific sanitizers are enabled)

The problem with a change like that is that it's not clear what the underlying semantic model is. If we add a flag that says "try to generate code that matches Linux kernel developers' mental model of the underlying machine", or "loop unrolling should try to preserve undefined behavior", or "out-of-bounds memory accesses are well-defined", the semantics are unclear. The point of having well-defined IR is that a developer can tell whether a transform is legal or not based on reading LangRef, as opposed to trying to blindly guess whether a transform is going to break some user's code.

Making clang generate different IR for specific load/division/etc. operators is much easier to specify; IR has the same meaning it always has, we're just generating IR that's well-defined in more cases.

The problem with a change like that is that it's not clear what the underlying semantic model is. If we add a flag that says "try to generate code that matches Linux kernel developers' mental model of the underlying machine", or "loop unrolling should try to preserve undefined behavior", or "out-of-bounds memory accesses are well-defined", the semantics are unclear. The point of having well-defined IR is that a developer can tell whether a transform is legal or not based on reading LangRef, as opposed to trying to blindly guess whether a transform is going to break some user's code.

I think it needs to make specific cases of UB defined when the specific sanitizers trying to catch those UB cases are enabled, if those sanitizer callbacks are expected to be recoverable (i.e. the ones that are not noreturn). (This reminds me of a transform I think LLVM does where the blockaddress of a removed basic block gets replaced with 1.)

Making clang generate different IR for specific load/division/etc. operators is much easier to specify; IR has the same meaning it always has, we're just generating IR that's well-defined in more cases.

So perhaps if -fsanitize=array-bounds was enabled, clang would add metadata to each load to say "this is an array bounds sanitizer load" (and metadata for udiv when -fsanitize=integer-divide-by-zero)? I think then we'd still need to teach simplifyInstruction to look for such metadata and do something different (like replace with freeze or 0 or 1 or whatever).

Or did you have something else in mind?

(Is it worth moving this discussion to another RFC? I would like to ask folks from https://users.cs.utah.edu/~regehr/papers/undef-pldi17.pdf their thoughts on freeze wrt this issue.)

nikic added a subscriber: nikic. · Mar 23 2023, 1:14 PM

@nickdesaulniers I don't think we want to handle that on the LLVM side. That will be fundamentally unreliable. If Clang wishes to make "recoverable" ubsan have arbitrary but well-defined behavior in the case of UB, it needs to generate appropriate IR to model that. For example, instead of generating

if (x == 0) {
  report_div_by_zero();
}
res = y / x;

it needs to generate

if (x == 0) {
  report_div_by_zero();
  res = 0; // or any other value
} else {
  res = y / x;
}

And similarly for other cases.
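
The analogous shape for -fsanitize=array-bounds might be (a sketch in the same style; the recovery value is arbitrary):

if (i >= n) {
  report_out_of_bounds();
  res = 0; // arbitrary but well-defined recovery value
} else {
  res = a[i];
}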

(recoverable feels like a bit of a distraction here? recoverable just means you've asked ubsan not to trap/stop on failure - but to let the program continue and do whatever it would've done without the sanitizer enabled - sometimes that's crash/trap anyway, sometimes it's something less bad... but that's all that's being asked for: "keep going/do whatever you'd do without the sanitizer enabled, rather than hard stop as soon as the sanitizer detects a problem" - no, we shouldn't recover differently/more safely with sanitizers enabled (don't want to create a language variant/encourage people to build incorrect programs with sanitizers and run them that way because "it works"))

but we could/should trap at the end of a function that lacks a valid return, if the end is reachable by local reasoning (ie: maybe we still trap after noreturn calls - alternatively, we could guarantee a trap at the end of every noreturn function so it can't return, instead of trapping after noreturn calls? (wouldn't be 100% rigorous, because you could be mix-and-matching compilers, but might be the right tradeoff in terms of size/safety))

I think the reason "recoverable" ubsan causes trouble is that it introduces branches that subsequent optimizations can abuse. So without ubsan, we just have a udiv instruction. With ubsan, we conveniently have a branch on exactly the condition that would make the udiv undefined, so we can easily prove control flow doesn't continue after the ubsan handler. Subsequent optimizations take advantage of that, so ubsan "breaks" code. (So the code was never actually correct according to the semantic model, but it was broken in a way the compiler is less likely to optimize.)

I think the reason "recoverable" ubsan causes trouble is that it introduces branches that subsequent optimizations can abuse. So without ubsan, we just have a udiv instruction. With ubsan, we conveniently have a branch on exactly the condition that would make the udiv undefined, so we can easily prove control flow doesn't continue after the ubsan handler. Subsequent optimizations take advantage of that, so ubsan "breaks" code. (So the code was never actually correct according to the semantic model, but it was broken in a way the compiler is less likely to optimize.)

Ah, interesting - thanks for the explanation!

(but, yeah, not sure we can/should provide any further guarantees beyond "does /something/ after the sanitizer failure" - you don't get stronger guarantees than without the sanitizer - though it's worth documenting that the sanitizer could make the problems after "recovery" worse than without the sanitizer enabled)