Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I couldn't find any other non-controversial places to change this: this seems to be all the cases where something is marked as unreachable but using our current macros.
My understanding is that the behavior of llvm_unreachable is undefined in a Release build. If that's the case, this change makes things objectively worse for no benefit that I can see.
If it's guaranteed to abort then all that we lose it file and line number information so it's not as bad. Still, what's the benefit?
I would strongly contest the statement that making unreachable spaces clearly undefined behaviour to the compiler provides no benefit. It allows significant optimisations to be made that are not possible under the as-if rule if the compiler cannot prove that the code is unreachable. Allowing such optimisations is entirely the point of the existence of the llvm_unreachable macro in LLVM.
Hi David, what *is* the runtime behavior of llvm_unreachable in a program compiled with RELEASE?
Anyway, why not modify DIE by appending llvm_unreachable after the call to Fortran::common::die?
I feel like there are a number of conflated issues here.
We should use llvm_unreachable in cases consistent with how LLVM and Clang use llvm_unreachable. I describe these as are places where, if the logic of the code is correct, no user input should trigger the condition even though that might not be locally obvious. In release mode, where we do care about the performance of the Compiler's -fsyntax-only mode, these are compiled out allowing for more-aggressive optimization.
The string, however, in llvm_unreachable should indicate why the condition should not have been reached, so that if it is reached during debugging, it's easier to figure out where the bug is. We should not have llvm_unreachable("unreachable"). llvm_unreachable("can't happen"); is not good either.
For the cases like this:
bool Pre(const parser::MainProgram &) { DIE("unreachable"); }
are these unreachable because they correspond to things that just haven't been implemented yet? If so, the message should say something about being unimplemented (and I don't see much value in making these llvm_unreachable because we might want them to trigger explicitly even if we try compiling things in Release mode).
In release mode, where we do care about the performance of the Compiler's -fsyntax-only mode, these are compiled out allowing for more-aggressive optimization.
This is the crux of my concerns. It has been determined, presumably by measurement, that clang's -fsyntax-only benefits enough from this optimization that it is worth the cost of a worse user experience when the internal error occurs (undefined behavior vs. internal error message with file and line number). There is no evidence that changing any occurrence of DIE in flang would have similar benefits. It's simply premature optimization.
This optimization decision should be up to the programmer, not a project-wide policy. Now in flang we have the choice of reporting an internal error with DIE when it's low-cost to be triggered in a Release build, or llvm_unreachable for those (IMO rare) cases where you know there is a worthwhile performance benefit. Similar to std::vector::at vs. std::vector::operator[].
I agree with @hfinkel and should have changed these strings to be more descriptive; I just wanted to get this merged as soon as possible so that the general discussion about error handling could happen. I thought we had agreed to go ahead with this change as a prelude to discussing error handling in general.
Hi David, what *is* the runtime behavior of llvm_unreachable in a program compiled with RELEASE?
The runtime behaviour when compiled with RELEASE is undefined. As I and @hfinkel mentioned, the point of that is optimisations that cannot otherwise be made by the compiler. I would say that runtime performance of release builds should be one of our main priorities...
are these unreachable because they correspond to things that just haven't been implemented yet?
They're unreachable because they can't actually be arrived at with the current design of traversing the parse tree. I'm not sure why the functions exist in this case though to be honest, since they can't be called. Someone with more knowledge of this code than me can probably answer that question better.
This is the crux of my concerns. It has been determined, presumably by measurement, that clang's -fsyntax-only benefits enough from this optimization that it is worth the cost of a worse user experience when the internal error occurs (undefined behavior vs. internal error message with file and line number).
There can be no internal error in any of these cases. We're not using llvm_unreachable to state that these shouldn't be unreachable. They actually are not reachable through any code path, other than the Pre functions which you will discover you have called incorrectly very quickly in Debug builds. It's also not simply a premature optimisation; there's a huge amount of prior art for unreachable optimisations being very effective.
There's also the secondary point that we should align with the rest of the project on this rather than rolling our own solution. I think this is quite an important point that cannot just be dismissed by "we think our way is better". It needs a discussion on llvm-dev about what is wrong about the current approach of the rest of the project, and movement there. I believed that that discussion was predicated on us making this change which is why I wanted to push it through. If that's not the case and this change needs to be part of that discussion, can someone that thinks the current approach of the rest of the project is wrong start that as soon as possible? There's no point in me trying to start it as I don't disagree with the rest of the project.
Anyway, why not modify DIE by appending llvm_unreachable after the call to Fortran::common::die?
This was originally the approach taken by Kiran from AMD in this patch: https://github.com/flang-compiler/f18/pull/1057
It was rejected pending the aforementioned discussion about doing things the way LLVM does them currently vs changing the LLVM policy.
Yep, just chiming in with a +1 for this general direction, modulo points already made that I'll reiterate:
- More specific messages on unreachable
- Unreachable only in "intended to/best of the developers knowledge that these are unreachable by any user input" (ie: if they are reached, it's because there's a bug in the compiler's source code)
- /maybe/ use "report_fatal_error" for temporary "quick (rather than having to add in error handling/propagating code paths) error for features not implemented yet but that are intended to be implemented Real Soon Now"
This is documented in the LLVM Coding Standards here: https://llvm.org/docs/CodingStandards.html#assert-liberally and the Programmers Guide here: https://llvm.org/docs/ProgrammersManual.html#programmatic-errors
If you want to ensure a good user experience for end users when the compiler violates its invariants - implement something like Clang's crash reporting wrapper (or, I think there's an in-process/single-process version of it now too). You can get a reproduction from the user, then run it with an assertions-enabled build to get specific details about the failure suitable for compiler engineers to drill-down on the failure.
That doesn't seem to be the case. Quoting:
...that there is no benefit of partially replacing die function with llvm_unreachable. I will go ahead and close this PR.
+1 for good messages.
I think there's universal agreement that code that cannot be reached with user input should be marked as unreachable. The point of contention is whether or not code in a RELEASE build is allowed to catch & report that situation.
I'd generally consider that inconsistent with LLVM's style and what I was trying to avoid by having the discussion about this issue prior to flang being integrated into the project - I think it'd be a hinderance to the LLVM project if the unreachable-equivalent were always-on in release builds & thus had different performance characteristics (and then need for an extra unreachable taxonomy - those cheap enough to keep in release builds and those that are too expensive). LLVM's "expensive checks" is already an awkward extra variant/build mode - I think it's important we don't create more of those if possible.
Is there reason to believe that a subset of asserts/unreachable that would be always-on would improve Clang significantly, for instance? If not, then I don't think flang should be making a different tack here.
There's other technical issues with this. Firstly, an approach like this wouldn't work because DIE is used in places that we don't necessarily want to signal as unreachable.
Secondly, the LLVM_BUILTIN_UNREACHABLE doesn't actually do anything here as Fortran::common::die() is already marked [[no return]] so any code after it is unreachable anyway.
Ultimately the problem with using this here would be the same as using the current DIE macro; by inserting a function call you're not only not telling the compiler that area is unreachable but you're actively telling the compiler that you think it can be reached, since this function should be called if it is.
Is there reason to believe that a subset of asserts/unreachable that would be always-on would improve Clang significantly, for instance? If not, then I don't think flang should be making a different tack here.
+1 on this. Clang is a tried and tested compiler that many people are already shipping downstream commercial compilers based on.
If there's a significant problem with how it does error handling it should be discussed there as well as here and fixed for both.
If not, we should do the same thing as clang does here.
Release builds of the compiler must have well-defined behavior, and that behavior must be to crash on a fatal internal error rather than to silently proceed to generate incorrect code, which is the worst thing a compiler can do to a user. Trading off protection against silent generation of bad code to gain some unmeasured performance improvement would be insanely irresponsible.
If this is the case then we shouldn't be writing this in a language that frequently explicitly leaves things as undefined behaviour for performance reasons, which is something that both the C and C++ standards do. Undefined behaviour is both unavoidable in C++ and necessary for many of the optimisations that C++ compilers perform. Given the language we're writing in it is never going to be possible to have fully well defined behaviour for every runtime path in the compiler.
Regardless, as I've said before in this thread, there is no undefined behaviour in any of these cases; none of these macro calls are actually reachable; in some cases this is trivial, in some cases this is asserted by comments already and so those comments should be changed if this is not true. We're simply attempting to signal that to the compiler in cases that it cannot necessarily prove it itself, so that it can perform optimisations based on that fact.
That seems inconsistent with the way Clang is written (are you suggesting Clang is "insanely irresponsible" - or that many/most other software that uses assertions that are no-op'd in release builds are so?), which seems like a counterproof. Indeed it's impossible to test all invariants at all times in C++ - so which ones do we test? Do we test/fatal-error all array accesses? (do we test them at every layer of API abstraction - because a caller might violate the contract at any layer?). It doesn't seem practical to create such software - we make assumptions that code behaves according to its contract all the time & should continue to do so.
To be clear, we're all experienced compiler developers here producing production software. No one is suggesting anything that would be "insanely irresponsible." Obviously, we make all kinds of trade-offs here for performance, and correctness is simultaneously critically important, and we have general coding conventions that favor for performance understanding that the integrated effect of microoptimizations matters, even if the individual effect of every individual choice is practically unmeasurable. We use vector's operator [] instead of at() with exceptions, for example. While silently generating incorrect code is clearly unacceptable, the use of llvm_unreachable is one of many cases where we depend on internal invariants and, in places where risk is high, we should have better testing and restructure the code to help ensure correctness.
One more thought: We might split this into two patches. First, make the strings more informative. Then a patch for llvm_unreachable on top of that.
An another thought: we might have a separate cmake flag to transform DIE -> unreachable, or some form of DIE into unreachable, so that we can test the performance difference easily going forward.
Maybe? I wouldn't hold up that approach if you're really going for it - but I think it loses some of the benefits of/reason for the style guide: To make LLVM project code consistent & thus easier for LLVM developers to move between projects - being able to understand and write appropriate code in different areas.
I agree with @dblaikie, Clang and LLVM use this scheme successfully. That said, there is also llvm/Support/ErrorHandling.h (to handle user input errors).
There was a recent discussion on when to use what mechanism, I will link here once I find it again. I want to paraphrase some comments ppl made on this in IRC:
"assert/llvm_unreachable if it can't happen. if it happens then either fix the code so it really doesn't happen or use report_runtime_error. if the error really can happen then use report_runtime_error"
We updated the coding standard about this: https://llvm.org/docs/CodingStandards.html#assert-liberally
Both DIE and llvm_unreachable are for reporting internal errors. This discussion isn't about user errors.
I agree with @dblaikie, @jdoerfert and @aaron.ballman here.
I believe consistency is key, which is why I was pushing this change in the first place. Many of us work on multiple LLVM sub-projects and having to remember different coding guidelines for different files in the same repository is fairly irritating, and I think there's a barrier to entry for getting people from other subprojects to help us fix things if we're going to be rejecting their patches to flang because they don't follow the written guidelines the whole project should follow and the style that the subproject they usually work on follows.
I haven't seen a reason articulated as to why this change shouldn't be made that doesn't apply to the project as a whole; I'd like to know what makes flang exceptional here that we should do things differently. If there's a reason that this design choice should only apply to flang and not e.g. clang then I'm happy to drop the consistency argument.
It boils down to the fact that our code should follow the coding guidelines unless there's a specific, exceptional reason that applies to our subproject that we can't do so. I strongly believe the burden of argument in these cases should be on those arguing that we should diverge from the coding guidelines, not on those who are simply following the written rules of the project that we are part of. Any other way of working will only cause us issues going forward when (hopefully) we get people working with us cross-project.
I haven't seen a reason articulated as to why this change shouldn't be made that doesn't apply to the project as a whole;
I think that this is exactly right. Moreover, if your goal is to make release builds more robust in reporting internal errors, you don't just want to enable asserts to Flang, but also in MLIR and LLVM itself. FWIW, all of my production builds for my organization shipped with asserts enabled (LLVM_ENABLE_ASSERTIONS) for many years, precisely because that risk tradeoff seemed more appropriate for that environment. If you're in a similar place in the risk-tradeoff space, I recommend building with Release/LLVM_ENABLE_ASSERTIONS=ON.
That all having been said, I'm open to having more granular flags for essentially selecting NDEBUG on a subproject level when configuring. I can certainly see why that might be convenient while working on development of a particular subproject for running test suites and the like.
That all having been said, I'm open to having more granular flags for essentially selecting NDEBUG on a subproject level when configuring. I can certainly see why that might be convenient while working on development of a particular subproject for running test suites and the like.
I'm fine with this, it's something that @mehdi_amini and I both proposed as a way forward in a previous discussion on this: (see here https://github.com/flang-compiler/f18/issues/966). It's also not an issue building flang without NDEBUG and llvm with; we had accidentally been doing this since merging until I fixed the discrepancy yesterday (see here https://reviews.llvm.org/rG81bf1e29aae5992db318803ba4722a585ad64638).
I would want this toggleable flag to still be toggling the standard LLVM error handling mechanisms though, namely llvm_unreachable for unreachable cases and assert for function preconditions, so that people submitting code for flang can follow the LLVM coding guidelines and not have their patches rejected on those grounds.
Since flang will not generate code without LLVM, and since LLVM is using assert/unreachable everywhere, the *only* possibility of having a build of flang that is "hardened" with the property that @sscalpone/@tskeith/@klausler is to build LLVM with assertions enabled.
Note also that all the uses of MLIR in Flang will also come with plenty of assert/unreachable calls that won't be able to be turned into "DIE" unless you build with assertions.
So I don't understand why the frontend here would be more specific than the rest of the compiler? The divergence is not solving the problem as I understand it.
Release builds of the compiler must have well-defined behavior, and that behavior must be to crash on a fatal internal error rather than to silently proceed to generate incorrect code, which is the worst thing a compiler can do to a user. Trading off protection against silent generation of bad code to gain some unmeasured performance improvement would be insanely irresponsible.
Shipping with LLVM_ENABLE_ASSERTIONS=ON wouldn't be enough probably, you likely want to ship with UBSAN enabled. The good news is that no one prevents you from doing so!
The most recent discussions was likely: https://github.com/flang-compiler/f18/issues/966
I'll quote that I was OK with the merge of f18 into LLVM under the condition that:
I think it is not a blocker as long as there is agreement that merging right now implies the LLVM ways of modeling assertions, and having a "release_assert" is something that you'll need to convince to change on llvm-dev@ later.
So right now the default should be the LLVM policy.
flang has no guidelines that say don't use llvm_unreachable and no changes have been rejected because of its use.
I haven't seen a reason articulated as to why this change shouldn't be made that doesn't apply to the project as a whole; I'd like to know what makes flang exceptional here that we should do things differently.
flang is not different. I think a developer in any part of the project should have the option of using a version of llvm_unreachable (called DIE or something else) that is defined to produce an internal error even in Release builds. I object to this particular change because whoever wrote the code chose to use DIE, which has well-defined behavior in Release builds and you haven't given a good reason (IMO) to override that decision and make it undefined. The reason for the change seems to be that someone new to flang might encounter DIE and not know at first what it does (!?); the same would seem to be true of thousands of other names they would encounter.
Except this one. Flang's implicit guidelines seem to have some idea about when DIE is appropriate and distinct from llvm_unreachable.
I haven't seen a reason articulated as to why this change shouldn't be made that doesn't apply to the project as a whole; I'd like to know what makes flang exceptional here that we should do things differently.
flang is not different. I think a developer in any part of the project should have the option of using a version of llvm_unreachable (called DIE or something else) that is defined to produce an internal error even in Release builds.
That is different from the rest of LLVM - it's a way you would like the LLVM project to work, but it's not the way it works at the moment & flang was accepted into the LLVM project with an explicit understanding that this part of the LLVM style would be adhered to.
I object to this particular change because whoever wrote the code chose to use DIE, which has well-defined behavior in Release builds and you haven't given a good reason (IMO) to override that decision and make it undefined. The reason for the change seems to be that someone new to flang might encounter DIE and not know at first what it does (!?); the same would seem to be true of thousands of other names they would encounter.
It represents a distinct philosophy about error handling that isn't one shared by the LLVM project (as described in the LLVM coding standards and programmers guide). I appreciate different folks have different takes on this - but the LLVM project is pretty clear on this in its documentation, and in communications with the flang community prior to accepting the project.
flang is not different. I think a developer in any part of the project should have the option of using a version of llvm_unreachable (called DIE or something else) that is defined to produce an internal error even in Release builds.
This is not without bound: a developer has the option to use it *withing the guidelines*. Here is just isn't in line with the LLVM coding standard and the LLVM usual use of asserts/unreachable.
I object to this particular change because whoever wrote the code chose to use DIE, which has well-defined behavior in Release builds and you haven't given a good reason (IMO) to override that decision and make it undefined.
The reason for the change seems to be that someone new to flang might encounter DIE and not know at first what it does (!?); the same would seem to be true of thousands of other names they would encounter.
The reason is not the name, otherwise we would rename it as report_fatal_error which should provide the same functionality as DIE. The reason it is the policy given above (link to coding standard on assert) + the fact that this was a condition for flang to merge in LLVM as I pointed out above: the ticket was part of the dashboard of things that will be fixed in-tree after landing.
You can't use the fact that DIE was used before flang was in merged in LLVM to prevent updating it, while this was identified before merge as something that would need to change.
I'm clearly +1 with David here:
It represents a distinct philosophy about error handling that isn't one shared by the LLVM project (as described in the LLVM coding standards and programmers guide). I appreciate different folks have different takes on this - but the LLVM project is pretty clear on this in its documentation, and in communications with the flang community prior to accepting the project.
This is neither what I said nor what I meant. My argument about consistency is that someone working in the LLVM project and who has read and is familiar with the LLVM Coding Guidelines should feel free to apply those guidelines when crossing between subprojects. That would mean that the use of assert and llvm_unreachable as is strongly stated in those guidelines, regardless of whether this subproject thinks that debug-only assertions are acceptable or not. It's entirely possible that someone working in a file flang may have never seen the DIE macro and it may not be present in the file in which they're working. Should they still have to know it exists in this case?
It seems clear to me that if this change I'm proposing (which, might I reiterate, only actually introduces llvm_unreachable in cases that are either demonstrably not reachable or are stated to not be reachable in comments) is rejected on these grounds then any patch containing llvm_unreachable is likely to be rejected for the same reason; especially if it's being used in a case that is believed to be, but isn't provably, unreachable (which would still be in line with the LLVM Coding Guidelines suggested usage).
The argument that "this hasn't happened yet so it isn't a problem" is not one I think has any relevance if we actually intend to increase the number of people working on this subproject.
Summary of discussion in F18 developers call on 15/06/20
- Flang’s current use of asserts in these specific cases (‘provably’ unreachable code not possible to hit from user error) clearly doesn’t match the LLVM guidelines. This patch brings it in line in that particular regard.
- The arguments against this patch going in would need to either:
- Argue a specific technical reason why Flang should not follow the guidelines for cases like this
- Argue that a change in LLVM coding guidelines is needed so that Flang becomes compliant.
- No argument has been put forward of these types so the patch should be submitted to align the code to the guidelines
- This does not block either of the two above arguments being made in future.
I said this on today's call but wanted to also add a note here.
We should split this into two separate discussions. Both points here are valid concerns but I would propose we separate implementation practices from the requirements.
I think we should follow the established LLVM practices here. From this perspective this patch should proceed.
However, we need to make certain it is possible to build a release of the compiler that enables the execution of internal checks (even at the cost of some performance) as a build option. I will go so far as to suggest that this should be a requirement as it matches many uses cases that I'm aware of (as also pointed out by @hfinkel). I'm not certain how far away we are from supporting this across clang, flang, mlir, llvm, etc. but the LLVM_ENABLE_ASSERTIONS path seems like at least a place to start.
I fully support having a requirement that LLVM should support being shipped in Release with LLVM_ENABLE_ASSERTIONS configurations as a first class citizen, and would support an RFC on this on the LLVM mailing list. As a part of this I think we should consider cases where particularly expensive checks or ABI-incompatible changes are hidden behind NDEBUG as bugs that need fixing.
FYI the Release+Assert config is well supported: this my default local config and this the one used by many bots already.
As others pointed out, this build configuration is fully supported, many bots use it, many developers use it every day. However, there's a difference between "you can build it this way if you want" and "this is what we package and deliver in our releases." If the flang project is packaging its own releases, it can arrange to build LLVM as Release+Asserts as well as the flang front-end project. I believe that is *not* how the standard Clang release packaging works, however.
As a part of this I think we should consider cases where particularly expensive checks or ABI-incompatible changes are hidden behind NDEBUG as bugs that need fixing.
ABI-breaking changes are supposed to be under a separate configuration flag, LLVM_ABI_BREAKING_CHECKS. "Expensive" checks similarly have their own flag, LLVM_ENABLE_EXPENSIVE_CHECKS (and there is at least one bot enabling that flag).
So, it is already the case that if such things are merely under NDEBUG then those are bugs that need fixing.