This is an archive of the discontinued LLVM Phabricator instance.

[CMake] Make omitting CMAKE_BUILD_TYPE an error
ClosedPublic

Authored by thieta on Apr 21 2022, 12:56 AM.

Details

Summary

After a lot of discussion in this diff the consensus was that it is really hard to guess the users intention with their LLVM build. Instead of trying to guess if Debug or Release is the correct default option we opted for just not specifying CMAKE_BUILD_TYPE a error.

Discussion on discourse here:
https://discourse.llvm.org/t/rfc-select-a-better-linker-by-default-or-warn-about-using-bfd

Diff Detail

Event Timeline

thieta created this revision.Apr 21 2022, 12:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:56 AM
Herald added a subscriber: mgorny. · View Herald Transcript
thieta requested review of this revision.Apr 21 2022, 12:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 12:56 AM
thieta updated this revision to Diff 424117.Apr 21 2022, 12:57 AM

Add release notes

thieta retitled this revision from [CMake] Change default CMAKE_BULID_TYPE to Release to [CMake] Change default CMAKE_BUILD_TYPE to Release.Apr 21 2022, 12:58 AM
thieta updated this revision to Diff 424118.Apr 21 2022, 12:58 AM

Remove spelling mistake

fhahn added a subscriber: fhahn.Apr 21 2022, 1:10 AM

I think if Release is the new default we should also enable assertions by default, as this should be closer to the Debug configuration and at least ensure that -debug works.

This makes sense but I think the main justification is that many projects do this and therefore defaulting to Release more likely meets users' expectation.

E.g. tensorflow lite has something like

if(NOT CMAKE_BUILD_TYPE)
  message(STATUS "Setting build type to Release, for debug builds use"
    "'-DCMAKE_BUILD_TYPE=Debug'.")
  set(CMAKE_BUILD_TYPE "Release")
endif()

I think if Release is the new default we should also enable assertions by default, as this should be closer to the Debug configuration and at least ensure that -debug works.

+1 For keeping assertions enabled by default.

hans added a comment.Apr 21 2022, 4:06 AM

I think if Release is the new default we should also enable assertions by default, as this should be closer to the Debug configuration and at least ensure that -debug works.

I think this is a good point.

However I don't think we want to change the behaviour for "cmake -DCMAKE_BUILD_TYPE=Release ..." which today has asserts disabled, and which I presume people use for their release builds.

Enabling asserts when the user doesn't set CMAKE_BUILD_TYPE explicitly sounds good to me though.

I think if Release is the new default we should also enable assertions by default, as this should be closer to the Debug configuration and at least ensure that -debug works.

I think this is a good point.

However I don't think we want to change the behaviour for "cmake -DCMAKE_BUILD_TYPE=Release ..." which today has asserts disabled, and which I presume people use for their release builds.

Enabling asserts when the user doesn't set CMAKE_BUILD_TYPE explicitly sounds good to me though.

I see two options:

  • We can flip the default for LLVM_ENABLE_ASSERTIONS to ON instead of OFF - this is easy and uncomplicated
  • We can flip it to ON if you don't specify CMAKE_BUILD_TYPE and OFF if you pass Release and ON when passing debug. But this seems a bit ... convoluted? Is that really what end users expect?

I think I prefer to set it to ON by default and then have people specify it if they want to change it to OFF.

hans added a comment.Apr 21 2022, 4:44 AM

I think I prefer to set it to ON by default and then have people specify it if they want to change it to OFF.

I fear that this would silently regress the performance of release builds for some people.

  • We can flip it to ON if you don't specify CMAKE_BUILD_TYPE and OFF if you pass Release and ON when passing debug. But this seems a bit ... convoluted? Is that really what end users expect?

Actually I think that's exactly how it works today :-)

I see your point about the logic getting convoluted, but I don't think it'd be too bad, especially since we're printing a message about it anyway. It could be "No build type selected, default to Release with asserts enabled".

Are there any numbers as to how much slower an assertion build in Release mode is? I can see some benefits in having assertions turned on if no build type is specified:

  • Assertion failures have nicer error messages in bug reports than segmentation faults caused by violated pre-conditions
  • It could potentially catch miscompiles or similar that would otherwise go unnoticed

I definitely am against having it default to on when a user specified the build type to be release explicitly however.

Are there any numbers as to how much slower an assertion build in Release mode is? I can see some benefits in having assertions turned on if no build type is specified:

  • Assertion failures have nicer error messages in bug reports than segmentation faults caused by violated pre-conditions
  • It could potentially catch miscompiles or similar that would otherwise go unnoticed

Yes, when using a random git build of llvm, enabling assertions is kinda important as bugs otherwise are much harder to notice. If a user builds a final release, it's less important though.

Anecdotally, I measured the slowdown incurred by assertions in a release build a couple years ago; it was in the order of magnitude of 30% slower than a regular Release build. I.e. measurably and noticably slower, but still tolerably slower given that you're running a bleeding edge, possibly broken compiler.

I definitely am against having it default to on when a user specified the build type to be release explicitly however.

Indeed, that would break dozens of distribution packagers who just do a cmake -DCMAKE_BUILD_TYPE=Release build without explicitly setting anything about assertions.

thieta updated this revision to Diff 424169.Apr 21 2022, 5:56 AM

Enable assertions by default in no CMAKE_BUILD_TYPE is specified

Here is an attempt at the logic described above - here is what it does now:

➜ cmake -GNinja ../llvm > /dev/null && grep LLVM_ENABLE_ASSERTIONS CMakeCache.txt && rm CMakeCache.txt
LLVM_ENABLE_ASSERTIONS:BOOL=ON
➜ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ../llvm > /dev/null && grep LLVM_ENABLE_ASSERTIONS CMakeCache.txt && rm CMakeCache.txt
<nothing here means OFF>
➜ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON ../llvm > /dev/null && grep LLVM_ENABLE_ASSERTIONS CMakeCache.txt && rm CMakeCache.txt
LLVM_ENABLE_ASSERTIONS:INTERNAL=ON
➜ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm > /dev/null && grep LLVM_ENABLE_ASSERTIONS CMakeCache.txt && rm CMakeCache.txt
LLVM_ENABLE_ASSERTIONS:BOOL=ON

Does that look right?

jhenderson added inline comments.Apr 21 2022, 8:02 AM
llvm/docs/ReleaseNotes.rst
72–74

Probably should mention the assertion situation here too, as people won't expect the defaulted Release build to include assertions (because Release builds don't usually have them!)

thieta updated this revision to Diff 424212.Apr 21 2022, 8:48 AM

Added release note about assertions.

thieta marked an inline comment as done.Apr 21 2022, 8:48 AM

Makes sense to me, thanks for proposing and working on this!

Would you be able to update relevant docs as well? For example, from Getting Started with the LLVM System:

-DCMAKE_BUILD_TYPE=type --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.

Not sure about LLVM Testing Infrastructure Guide:

% cmake -DCMAKE_BUILD_TYPE="Release" -DLLVM_ENABLE_ASSERTIONS=On

Thanks!

I don't understand the rational about turning assertions on. To me changing the default is about making it easier for users who just want to use LLVM to get the right version running, while assertions are targeting developers.

If we can't resolve the targeted audience, what about erroring out when CMAKE_BUILD_TYPE isn't defined?

$ cmake ../llvm
Error: please specify a build type:
 -DCMAKE_BUILD_TYPE=Debug  # Unoptimized, with debug info (large binaries)
 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON  # Optimized, no debug info, with assertions
 -DCMAKE_BUILD_TYPE=Release  # Optimized, no debug info, no assertions
MaskRay added a comment.EditedApr 21 2022, 9:17 AM

I don't understand the rational about turning assertions on. To me changing the default is about making it easier for users who just want to use LLVM to get the right version running, while assertions are targeting developers.

If we can't resolve the targeted audience, what about erroring out when CMAKE_BUILD_TYPE isn't defined?

$ cmake ../llvm
Error: please specify a build type:
 -DCMAKE_BUILD_TYPE=Debug  # Unoptimized, with debug info (large binaries)
 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON  # Optimized, no debug info, with assertions
 -DCMAKE_BUILD_TYPE=Release  # Optimized, no debug info, no assertions

I agree that we should not turn assertions on when CMAKE_BUILD_TYPE is unspecified.

My rationale is more about: we decide to let empty CMAKE_BUILD_TYPE behave like Release, I hope it looks like just CMAKE_BUILD_TYPE=Release. Adding more magic behind it may be adding confusion.

  • We can flip it to ON if you don't specify CMAKE_BUILD_TYPE and OFF if you pass Release and ON when passing debug. But this seems a bit ... convoluted? Is that really what end users expect?

Actually I think that's exactly how it works today :-)

I see your point about the logic getting convoluted, but I don't think it'd be too bad, especially since we're printing a message about it anyway. It could be "No build type selected, default to Release with asserts enabled".

Agreed. CMake's default compiler flags for CMAKE_BUILD_TYPE=Release are -O3 -DNDEBUG, so one would assume that this already disables assertions. If we want to default to Release but leave assertions on we're kind of subverting user expectations of how these build profiles behave.

If we can't resolve the targeted audience, what about erroring out when CMAKE_BUILD_TYPE isn't defined?

Sounds reasonable to me. Or at least print a warning. But I don't think there is a good default, simply because we don't have good insight into why people compile LLVM and what they would want. And even if we had, this change shows that defaults can change.

Not sure if that could cause issues for multi-configuration users (if that's at all possible in LLVM), maybe @aaron.ballman can chime in (assuming that you use the Visual Studio generator)?

llvm/CMakeLists.txt
67–75

Can we make this a warning? I don't think users should rely on the default being whatever it currently is.

mstorsjo added inline comments.Apr 21 2022, 12:50 PM
llvm/CMakeLists.txt
67–75

Should we also mention how to disable the assertions in the same sentence?

I don't understand the rational about turning assertions on. To me changing the default is about making it easier for users who just want to use LLVM to get the right version running, while assertions are targeting developers.

If we can't resolve the targeted audience, what about erroring out when CMAKE_BUILD_TYPE isn't defined?

$ cmake ../llvm
Error: please specify a build type:
 -DCMAKE_BUILD_TYPE=Debug  # Unoptimized, with debug info (large binaries)
 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON  # Optimized, no debug info, with assertions
 -DCMAKE_BUILD_TYPE=Release  # Optimized, no debug info, no assertions

That sounds reasonable to me too, and avoids surprising inconsistencies.

We could also point to https://llvm.org/docs/CMake.html if no CMAKE_BUILD_TYPE is given. Depending on the use case (e.g. debug info might actually be wanted) other flags might be helpful like BUILD_SHARED_LIBS or LLVM_USE_SPLIT_DWARF. Or if you're working on frontend topics, set LLVM_TARGETS_TO_BUILD=Native. Some of these have a greater effect on build time/size than CMAKE_BUILD_TYPE=Release in my experience, and the page lists these options quite nicely.

I disagree about erroring out when CMAKE_BUILD_TYPE is undefined. We want to make the build easy for new users, so I think it's important that the default build (cmake with no arguments) just works.

I also think the default should be release build with asserts disabled.

I disagree about erroring out when CMAKE_BUILD_TYPE is undefined. We want to make the build easy for new users, so I think it's important that the default build (cmake with no arguments) just works.

Why isn’t it easy to copy/paste one option clearly presented by the cmake invocation? That seems like a good user experience to me: there is a fairly simple choice to be made and we expose it concisely and directly, it’s not like there is an obscure error and the user needs to read a doc…
It’s not like every command line tool works without mandatory options.

I also think the default should be release build with asserts disabled.

I’d be fine with this though.

I disagree about erroring out when CMAKE_BUILD_TYPE is undefined. We want to make the build easy for new users, so I think it's important that the default build (cmake with no arguments) just works.

Why isn’t it easy to copy/paste one option clearly presented by the cmake invocation? That seems like a good user experience to me: there is a fairly simple choice to be made and we expose it concisely and directly, it’s not like there is an obscure error and the user needs to read a doc…
It’s not like every command line tool works without mandatory options.

My experience tells me that if we present 3 options to choose from, many users will just randomly pick one without thinking about it, and we'll still get bug reports about debug builds using too much memory. To me it makes the most sense to just default to Release builds with no asserts, this is the easiest to implement and it will give new users the best experience in my opinion.

keith added a subscriber: keith.Apr 21 2022, 9:33 PM

If we can't resolve the targeted audience, what about erroring out when CMAKE_BUILD_TYPE isn't defined?

Not sure if that could cause issues for multi-configuration users (if that's at all possible in LLVM), maybe @aaron.ballman can chime in (assuming that you use the Visual Studio generator)?

I'd have to test against the patch that turns it into an error to know for sure. I predominately use the MSVC built-in cmake support these days, and that has a separate way to set CMAKE_BUILD_TYPE that always defaults to *something* as best I can tell (it seems to set it to Debug, but I don't know if that's because of our CMake settings or not). So I wouldn't think this would be an issue if we went this route.

Personally, I think the default depends entirely on who is doing the download and from where. If people are doing a git clone of the main branch, I think they're more often developers and would want a debug build and switching to release will be a burden/surprise/annoyance. If they're snagging source from the release downloads page, they're probably not doing active development with that and would want a release build of some kind and the current behavior is a burden/surprise/annoyance. Without some measurements of how users are getting to the source in the first place, I don't think we can reasonably pick a default for all situations and call it better than the current one. It'll just be different.

Personally, I think the default depends entirely on who is doing the download and from where. If people are doing a git clone of the main branch, I think they're more often developers and would want a debug build and switching to release will be a burden/surprise/annoyance. If they're snagging source from the release downloads page, they're probably not doing active development with that and would want a release build of some kind and the current behavior is a burden/surprise/annoyance. Without some measurements of how users are getting to the source in the first place, I don't think we can reasonably pick a default for all situations and call it better than the current one. It'll just be different.

I don't think this is 100% true. The downside for a user doing a debug build by default is that they very often run out of RAM and diskspace because the debug build is bigger and more RAM hungry when not using a decent linker.

The downside for a developer cloning and building a release build is often not utter confusion and sending questions in discord / discourse, but just rebuild with debug turned on.

I agree that it might be hard to guess "right" but the consequences are different and more confusing for people that are just getting started with LLVM.

Personally, I think the default depends entirely on who is doing the download and from where. If people are doing a git clone of the main branch, I think they're more often developers and would want a debug build and switching to release will be a burden/surprise/annoyance. If they're snagging source from the release downloads page, they're probably not doing active development with that and would want a release build of some kind and the current behavior is a burden/surprise/annoyance. Without some measurements of how users are getting to the source in the first place, I don't think we can reasonably pick a default for all situations and call it better than the current one. It'll just be different.

I don't think this is 100% true. The downside for a user doing a debug build by default is that they very often run out of RAM and diskspace because the debug build is bigger and more RAM hungry when not using a decent linker.

The downside for a developer cloning and building a release build is often not utter confusion and sending questions in discord / discourse, but just rebuild with debug turned on.

I think you're hand-waving away the burden this adds to folks new to the community. The downside to the developer situation is that they waste a few hours doing a build that they can't debug, then they have to track down someone willing to tell them what they've done wrong, and then they have to do another build that will take even longer. Each one of these hurdles is a place where people can get frustrated and give up.

I agree that it might be hard to guess "right" but the consequences are different and more confusing for people that are just getting started with LLVM.

I think both of the situations are basically the same with basically the same outcomes -- if the user picks the "correct" build target (either through the default or explicitly), they're happy, if not, they're unhappy.

I think both of the situations are basically the same with basically the same outcomes -- if the user picks the "correct" build target (either through the default or explicitly), they're happy, if not, they're unhappy.

That's a fair point.

I actually think that erroring out if you don't specify a CMAKE_BUILD_TYPE is not a bad alternative. If we write a good error message describing the different options in user-friendly terms I think it might be the option that's the least confusing for new users.

The only real downside to it I can see right now is that I haven't seen many other projects behave in this way - so it would set LLVM apart, but on the other hand - as you point out: LLVM is big and building the "wrong" configuration has bigger ramification than for example "zlib".

My experience tells me that if we present 3 options to choose from, many users will just randomly pick one without thinking about it, and we'll still get bug reports about debug builds using too much memory.

But @mehdi_amini's suggestion was to print -DCMAKE_BUILD_TYPE=Debug # Unoptimized, with debug info (large binaries), so they made a choice and knew what it would entail.

To me it makes the most sense to just default to Release builds with no asserts, this is the easiest to implement and it will give new users the best experience in my opinion.

Nobody has any doubts that Release builds provide the best experience for users, but wouldn't most users simply take prebuilt binaries instead of cloning a 1G repository and running a build that can take a good hour or even more, depending on how much you want to build? Maybe most people building their own LLVM want to investigate a bug or develop a new feature?

Chances are that changing the defaults will just make a different set of people mad, disappointed that they had to rebuild with debug info or that they don't see assertions by default.

My experience tells me that if we present 3 options to choose from, many users will just randomly pick one without thinking about it, and we'll still get bug reports about debug builds using too much memory.

But @mehdi_amini's suggestion was to print -DCMAKE_BUILD_TYPE=Debug # Unoptimized, with debug info (large binaries), so they made a choice and knew what it would entail.

To me it makes the most sense to just default to Release builds with no asserts, this is the easiest to implement and it will give new users the best experience in my opinion.

Nobody has any doubts that Release builds provide the best experience for users, but wouldn't most users simply take prebuilt binaries instead of cloning a 1G repository and running a build that can take a good hour or even more, depending on how much you want to build? Maybe most people building their own LLVM want to investigate a bug or develop a new feature?

Chances are that changing the defaults will just make a different set of people mad, disappointed that they had to rebuild with debug info or that they don't see assertions by default.

+1 to this

Rather than changing the default and annoying or confusing people, improve the documentation! Have cmake print a link to the awesome documentation.

Rather than changing the default and annoying or confusing people, improve the documentation! Have cmake print a link to the awesome documentation.

Are you suggesting that it should print the link to the documentation if there is no build_type specified or do you have another suggestion when the link should be printed?

While I always think the documentation could be improved I am not sure just improving documentation is the solution here, unless we pair it with erroring out if no explicit build_type is set. It won't be discovered otherwise.

The target audience for the default configuration should be people who are new to the project, it could be users or it could be developers. It also could be people with very little experience programming or building from code from source. We want to keep the barrier for entry as low as possible for this group of people. We can have the best documentation or have verbose error messages to help guide them in the right direction, but the best way to communicate to new users about how to build the project is to have a sensible default configuration.

I also don't think we should worry about people who complain about the default build configuration. If someone has enough understanding of the project to know that the default configuration is not the best for them, then they have enough understanding to change the configuration. The defaults should be designed for people who may not (yet) know anything about build types, debug info, assertions, and who just want to get started using and participating in the project.

keith removed a subscriber: keith.Apr 22 2022, 9:26 AM

The target audience for the default configuration should be people who are new to the project, it could be users or it could be developers. It also could be people with very little experience programming or building from code from source. We want to keep the barrier for entry as low as possible for this group of people. We can have the best documentation or have verbose error messages to help guide them in the right direction, but the best way to communicate to new users about how to build the project is to have a sensible default configuration.

I agree with almost every point, right up until "have a sensible default configuration"; I don't think it's possible to have a default configuration that is sensible to both users and developers. Well, I take that back. I think it *is* possible, but I think we would have to invest significant effort in improving the LLVM compilation performance issues so that you don't need a super computer to build the project. My understanding is *that* is what's most off-putting to new folks, whether they're users or developers, and that's somewhat (but not entirely) orthogonal to the default build mode.

I think the current build mode is *defensible* (as is switching to release instead of debug by default), which is about the best we'll get.

I also don't think we should worry about people who complain about the default build configuration. If someone has enough understanding of the project to know that the default configuration is not the best for them, then they have enough understanding to change the configuration. The defaults should be designed for people who may not (yet) know anything about build types, debug info, assertions, and who just want to get started using and participating in the project.

Strong +1.

My recommendation is: leave the default build type alone; there's existing documentation, blog posts, stack overflow questions, etc that assume the default build type as it is today, so let's not invalidate that (which would likely add to confusion for folks new to the project hunting around for answers in their favorite search engine). Instead, let's improve our own documentation as best we can under the assumption that the most likely place a confused user will find an answer to build configuration questions will be our own docs.

The target audience for the default configuration should be people who are new to the project, it could be users or it could be developers. It also could be people with very little experience programming or building from code from source. We want to keep the barrier for entry as low as possible for this group of people. We can have the best documentation or have verbose error messages to help guide them in the right direction, but the best way to communicate to new users about how to build the project is to have a sensible default configuration.

I agree with almost every point, right up until "have a sensible default configuration"; I don't think it's possible to have a default configuration that is sensible to both users and developers. Well, I take that back. I think it *is* possible, but I think we would have to invest significant effort in improving the LLVM compilation performance issues so that you don't need a super computer to build the project. My understanding is *that* is what's most off-putting to new folks, whether they're users or developers, and that's somewhat (but not entirely) orthogonal to the default build mode.

I don't see it as choosing between users and developers, I think new developers and new users approach the codebase in the same way. Step 1: Build it. Step 2: Figure out what I can do with it. Step 1 is the same for both groups and that's why I think it's important to have the default configuration be the one that makes building the project the easiest.

The target audience for the default configuration should be people who are new to the project, it could be users or it could be developers. It also could be people with very little experience programming or building from code from source. We want to keep the barrier for entry as low as possible for this group of people. We can have the best documentation or have verbose error messages to help guide them in the right direction, but the best way to communicate to new users about how to build the project is to have a sensible default configuration.

I agree with almost every point, right up until "have a sensible default configuration"; I don't think it's possible to have a default configuration that is sensible to both users and developers. Well, I take that back. I think it *is* possible, but I think we would have to invest significant effort in improving the LLVM compilation performance issues so that you don't need a super computer to build the project. My understanding is *that* is what's most off-putting to new folks, whether they're users or developers, and that's somewhat (but not entirely) orthogonal to the default build mode.

I don't see it as choosing between users and developers, I think new developers and new users approach the codebase in the same way. Step 1: Build it. Step 2: Figure out what I can do with it. Step 1 is the same for both groups and that's why I think it's important to have the default configuration be the one that makes building the project the easiest.

Step 0 for a significant number of people is "be told by your boss to go fix/implement something in the project", so I'm not certain I agree that Step 1 is the same for both groups in practice. I think it's highly dependent on who is doing the downloading, and I don't think we'll be able to pick a default that's "right" without data as to who is downloading and why (which would be very difficult data to collect, I think). That's why I'm not seeing a strong enough motivation for this change -- we don't know what the impacts will be in practice, but we know there will be impacts.

We have some pieces of data though: the number of people who complain about the build filling up their disk and the link taking forever (or OOM). On the other hand the people who "know" are already setting up an explicit value...
That said I tend to think that the audience is heterogeneous enough that I agree it seems hard to make sure we "pick a default that's "right" " in the absolute (hence my proposal to not pick any).

@aaron.ballman There's some people that can't build in debug mode at all because there machine is not powerful enough. For me, this is enough to disqualify it from being the default, because it's creating a barrier of entry for new users/developers.

We have some pieces of data though: the number of people who complain about the build filling up their disk and the link taking forever (or OOM). On the other hand the people who "know" are already setting up an explicit value...
That said I tend to think that the audience is heterogeneous enough that I agree it seems hard to make sure we "pick a default that's "right" " in the absolute (hence my proposal to not pick any).

I think we can pick a 'right' default, we just need to define 'right' differently to make this possible. :) I think the 'right' default is the one that is easiest to build, which is the release build. It takes less resources (CPU, memory, and disk space) and takes less time to build.

@aaron.ballman There's some people that can't build in debug mode at all because there machine is not powerful enough. For me, this is enough to disqualify it from being the default, because it's creating a barrier of entry for new users/developers.

The same is true for making release builds; you can OOM a machine trivially when linking. To me, the barrier to entry isn't that the default build mode is wrong, it's that we've not done a good job of making our project buildable without a build farm or super computer. I think changing the default build mode to release is a bit like using a teaspoon to bail out a leaking boat; it's incrementally better than nothing, but it's really not addressing the problem or likely to make a significant difference.

@aaron.ballman There's some people that can't build in debug mode at all because there machine is not powerful enough. For me, this is enough to disqualify it from being the default, because it's creating a barrier of entry for new users/developers.

The same is true for making release builds; you can OOM a machine trivially when linking. To me, the barrier to entry isn't that the default build mode is wrong, it's that we've not done a good job of making our project buildable without a build farm or super computer. I think changing the default build mode to release is a bit like using a teaspoon to bail out a leaking boat; it's incrementally better than nothing, but it's really not addressing the problem or likely to make a significant difference.

I think we pretty much agree on everything here, except I do think build mode will actually help reduce this problem a lot.

I think we pretty much agree on everything here, except I do think build mode will actually help reduce this problem a lot.

And part of the reason I think this is because I spent way too much time trying to get this debug build configuration[1] running on the 2 CPU, 7 GB runners provided by GitHub, and getting the release build configurations working was very easy.

[1] https://github.com/llvm/llvm-project/blob/release/14.x/.github/workflows/llvm-tests.yml#L116

@aaron.ballman There's some people that can't build in debug mode at all because there machine is not powerful enough. For me, this is enough to disqualify it from being the default, because it's creating a barrier of entry for new users/developers.

The same is true for making release builds; you can OOM a machine trivially when linking. To me, the barrier to entry isn't that the default build mode is wrong, it's that we've not done a good job of making our project buildable without a build farm or super computer. I think changing the default build mode to release is a bit like using a teaspoon to bail out a leaking boat; it's incrementally better than nothing, but it's really not addressing the problem or likely to make a significant difference.

I think we pretty much agree on everything here, except I do think build mode will actually help reduce this problem a lot.

I won't block the change because I don't have any actual data to refute a claim that this is an improvement, but I continue to think it's rearranging deck chairs on the Titanic and will cause more problems than it solves.

I can also confirm that a Release build is a LOT easier to build than a debug build.

  • Linker OOMs are mainly caused by large object files, both due to code size of unoptimized code and especially due to the extra debug info. Hence the first recommendation in any of the forums when people show a linker OOM is to make sure they're using a release build, if that is what they intended, and to use LLD or Gold (that is a separate topic however)
  • A debug build on Windows using MSVC can take around 70 to roughly 100 GB of storage space, which are always surprising and may also lead to out of disk space errors. Release builds are comparatively just a few Gigabytes. I believe they are also multiply tens of Gigabytes on other OSs
  • Due to object file limits it is actually impossible to build Clang in debug mode when using a MinGW toolchain without building shared libs as the final linked clang executable is larger than 4GB.

I sadly don't currently have access to my Windows workstation where I actually build some of these configurations to have the precise numbers, but I think others will be able to confirm that they've had similar experiences.

While it is true that LLVM does require somewhat of a supercomputer to build, a release build is somewhat doable for the average user. My ultrabook that I use for when I am out and about has only 8 GB of RAM, which is absolutely enough to build a release build of LLVM on Fedora 35 without ever having RAM OOM issues or having to tweak any settings like job count.

This change is going to inconvenience A group, and I think that if we decide we want a 'default', there is value in discussing who we care the most about/who is affected 'the most'. I would posit that a vast majority of 'users' of our compiler don't use git + cmake + build, they download a release from their distro, download it from the release page, or pop up their example in godbolt. I would also posit that a vast majority of the folks that do git + cmake + build are developers, who I would assume want Debug or Release+Asserts.

I don't understand the rational about turning assertions on. To me changing the default is about making it easier for users who just want to use LLVM to get the right version running, while assertions are targeting developers.

If we can't resolve the targeted audience, what about erroring out when CMAKE_BUILD_TYPE isn't defined?

$ cmake ../llvm
Error: please specify a build type:
 -DCMAKE_BUILD_TYPE=Debug  # Unoptimized, with debug info (large binaries)
 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON  # Optimized, no debug info, with assertions
 -DCMAKE_BUILD_TYPE=Release  # Optimized, no debug info, no assertions

I like the idea of erroring with NO options. While this inconveniences EVERYONE, it does so "only a little bit". With a default, we are going to inconvenience 1 group "a ton" (that is, they only discover their mistake after a time-consuming build vs 'right away'). So I think this is, by all intents, the least harm overall (assuming you believe the 'git + cmake + build' for USE group exists in any meaningful count).

I think both of the situations are basically the same with basically the same outcomes -- if the user picks the "correct" build target (either through the default or explicitly), they're happy, if not, they're unhappy.

That's a fair point.

I actually think that erroring out if you don't specify a CMAKE_BUILD_TYPE is not a bad alternative. If we write a good error message describing the different options in user-friendly terms I think it might be the option that's the least confusing for new users.

The only real downside to it I can see right now is that I haven't seen many other projects behave in this way - so it would set LLVM apart, but on the other hand - as you point out: LLVM is big and building the "wrong" configuration has bigger ramification than for example "zlib".

Agreed completely. Being 'slightly different' in our CMake invocation is completely defensible in this situation. For someone on a reasonable 'home' machine, LLVM takes literal hours to build. Wasting that much time to just discover I have the wrong build config would be 1^SIZE_MAX worse than 20 seconds of cmake to tell me that.

We have some pieces of data though: the number of people who complain about the build filling up their disk and the link taking forever (or OOM). On the other hand the people who "know" are already setting up an explicit value...
That said I tend to think that the audience is heterogeneous enough that I agree it seems hard to make sure we "pick a default that's "right" " in the absolute (hence my proposal to not pick any).

I think we can pick a 'right' default, we just need to define 'right' differently to make this possible. :) I think the 'right' default is the one that is easiest to build, which is the release build. It takes less resources (CPU, memory, and disk space) and takes less time to build.

If you want a 'right' default, it has to be 'the one that most users of cmake WANT'. I posit that is 'default' or 'release with debug info + asserts', since MOST direct users of our cmake scripts are going to be developers. I'd find 'no default' to be a vast improvement over 'give me the wrong one'.

If you want a 'right' default, it has to be 'the one that most users of cmake WANT'. I posit that is 'default' or 'release with debug info + asserts', since MOST direct users of our cmake scripts are going to be developers. I'd find 'no default' to be a vast improvement over 'give me the wrong one'.

I think the default should be for people who don't have the experience or the knowledge (yet) to know which configuration works best for them or even how to select a different one. Once people have enough knowledge to know what configuration they want, then they can pass those options to CMake and the default is no longer relevant to them. Along the same lines, the reason I don't like having no default, is you are forcing someone new to the project to pick a configuration when they may not have any idea which one is best for them. It's especially an issue when one of the options (Debug build) has a good chance to not be buildable on their system.

If you want a 'right' default, it has to be 'the one that most users of cmake WANT'. I posit that is 'default' or 'release with debug info + asserts', since MOST direct users of our cmake scripts are going to be developers. I'd find 'no default' to be a vast improvement over 'give me the wrong one'.

I think the default should be for people who don't have the experience or the knowledge (yet) to know which configuration works best for them or even how to select a different one.

Right, and that includes 'new' developers trying to develop for the 1st time on LLVM, who, I again, posit WANT a debug build.

Once people have enough knowledge to know what configuration they want, then they can pass those options to CMake and the default is no longer relevant to them. Along the same lines, the reason I don't like having no default, is you are forcing someone new to the project to pick a configuration when they may not have any idea which one is best for them. It's especially an issue when one of the options (Debug build) has a good chance to not be buildable on their system.

Your concern is 'fixed' by the documentation/diagnostic message being sufficiently detailed I believe. We're 'forcing' them to identify whether they want to 'use' the compiler, or 'develop' on the compiler. The 'right' answer is opposite for both. As long as you assert we have BOTH groups in any significance, there is no valid default.

As far as:

is you are forcing someone new to the project to pick a configuration when they may not have any idea which one is best for them.

Our users are unique in that they are likely developers/technical themselves. If they don't know the general differences in use cases between a release and a debug build, I'm not sure picking a default will help them so much as put them in even more pain when they discover we picked wrong.

For what it's worth, I'm somewhat surprised reading this discussion -- I was under the impression that developers generally use Release+Asserts builds rather than Debug builds. Apart from being slow to build, debug builds also make running tests prohibitively slow. Personally, while I do make use of LLVM Debug builds from time to time, I'd consider this to be a specialized configuration, similar to sanitizer or fuzzer instrumented builds. It's not a configuration I would recommend for routine development, and certainly not something I would recommend to new developers.

For what it's worth, I'm somewhat surprised reading this discussion -- I was under the impression that developers generally use Release+Asserts builds rather than Debug builds. Apart from being slow to build, debug builds also make running tests prohibitively slow. Personally, while I do make use of LLVM Debug builds from time to time, I'd consider this to be a specialized configuration, similar to sanitizer or fuzzer instrumented builds. It's not a configuration I would recommend for routine development, and certainly not something I would recommend to new developers.

In the CFE, my team has found that the overhead of build & test running is worth it, as Clang ends up being near-undebuggable otherwise. We DEFINITELY use it for routine dev across my organization. We are punished a little LESS in that we work on 12 core + server hardware, but we DO actually use gold or lld thanks to the OOM issue.

For what it's worth, I'm somewhat surprised reading this discussion -- I was under the impression that developers generally use Release+Asserts builds rather than Debug builds. Apart from being slow to build, debug builds also make running tests prohibitively slow. Personally, while I do make use of LLVM Debug builds from time to time, I'd consider this to be a specialized configuration, similar to sanitizer or fuzzer instrumented builds. It's not a configuration I would recommend for routine development, and certainly not something I would recommend to new developers.

FWIW, my experience has been to exclusively use debug builds when working on Clang. RelWithDebInfo is not up to the task of daily development work, and I don't think I've ever made a Release + Asserts built except accidentally. Also, because I primarily use Visual Studio, debug builds have a *huge* benefit of enabling debug support in the STL, which catches very real issues with invalid iterators or other things that are not caught in release builds (even with asserts enabled).

  • Linker OOMs are mainly caused by large object files, both due to code size of unoptimized code and especially due to the extra debug info. Hence the first recommendation in any of the forums when people show a linker OOM is to make sure they're using a release build, if that is what they intended, and to use LLD or Gold (that is a separate topic however)

Dropping debug info solves the problem, but then you don't have debug info. You can still build with debug info if you use BUILD_SHARED_LIBS=ON and/or LLVM_USE_SPLIT_DWARF=ON on platforms that support it.

Another helpful option is LLVM_PARALLEL_LINK_JOBS (only works with Ninja though).

Apart from being slow to build, debug builds also make running tests prohibitively slow.

It depends. I work on specific warnings usually and then do llvm-lit -sv $(git grep -l Wwarning clang/test/). That takes a couple of seconds regardless of the build profile. Running all tests takes a few minutes in Release, I don't have the patience for that. ;)

Otherwise I run larger subsets, like all Sema tests or all Parser tests and so on. Similarly if you're doing backend work in LLVM you might subset to that backend.

[Debug is] not a configuration I would recommend for routine development, and certainly not something I would recommend to new developers.

Quite a number of people I work with are completely lost if they can't step through code. And since we're talking about newbies here... it takes some experience to find bugs without a debugger.

The only real downside to it I can see right now is that I haven't seen many other projects behave in this way - so it would set LLVM apart, but on the other hand - as you point out: LLVM is big and building the "wrong" configuration has bigger ramification than for example "zlib".

Well put. We're probably the outlier if we were to force a choice. But the costs of getting it wrong are high and so I think we're acting in the best interests of all users, including new ones. LLVM isn't a quick build anyway, so let them take a minute to think about what they want.

Then I have to repeat myself. If you google for LLVM and CMAKE_BUILD_TYPE, you will only find a list of the types. There is no documentation about the trade-offs or recommendations.

Personally, I'd expect any new developer in LLVM to want Debug configuration, unless they're fairly new developers who don't know how to use a debugger and rely solely on things like print statements to debug. I doubt that they'd want "RelWithDebInfo", since the optimizations would likely completely throw them off. Certainly, when I first started LLVM, and indeed pretty much when I start any project for development, I use Debug configuration almost exclusively. That being said, I am a Windows developer who almost exclusively builds Visual Studio projects these days, so I have the luxury of being able to click a drop-down menu and change the config that way.

As @tschuett says, I think we need to improve our Getting Started docs to indicate which config to pick for different use-cases. This possibly applies regardless of what else we do (or don't do). I also think having an error when none is specified is reasonable, as long as the error points to the aforementioned docs explaining which to choose.

One thing we all seem to agree on is that documentation could be better - so I took a stab at that here - I think it's connected to this discussion, but I don't think it's the only fix, please see the doc suggestions here: https://reviews.llvm.org/D124367

I am currently leaning towards:

  • Improve documentation
  • Make omitting CMAKE_BUILD_TYPE a error. I am also fine with changing it to Release, but I don't have a strong opinion either way. I do think keeping Debug is worse than both options. But it can be made better by point 3 here:
  • Add an error when trying to use Debug + bfd linker to catch the most common problem, especially if we keep Debug the default.

One thing we all seem to agree on is that documentation could be better - so I took a stab at that here - I think it's connected to this discussion, but I don't think it's the only fix, please see the doc suggestions here: https://reviews.llvm.org/D124367

I am currently leaning towards:

  • Improve documentation
  • Make omitting CMAKE_BUILD_TYPE a error. I am also fine with changing it to Release, but I don't have a strong opinion either way. I do think keeping Debug is worse than both options. But it can be made better by point 3 here:
  • Add an error when trying to use Debug + bfd linker to catch the most common problem, especially if we keep Debug the default.

I'm OK with #1 and #2, but note that the combo in #3 is not ALWAYS broken, just if you don't have enough memory or are running too many link jobs at the same time. I'm not sure we want it to be an 'error'.

One thing we all seem to agree on is that documentation could be better - so I took a stab at that here - I think it's connected to this discussion, but I don't think it's the only fix, please see the doc suggestions here: https://reviews.llvm.org/D124367

I am currently leaning towards:

  • Improve documentation
  • Make omitting CMAKE_BUILD_TYPE a error. I am also fine with changing it to Release, but I don't have a strong opinion either way. I do think keeping Debug is worse than both options. But it can be made better by point 3 here:

Even though I think making Release the default build type is better than making omitting it an error. I agree that both of these options are better than keeping Debug the default.

  • Add an error when trying to use Debug + bfd linker to catch the most common problem, especially if we keep Debug the default.

This is still a valid build configuration, so I do not think this should be an error. The best way to help new users avoid this is to just change the default.

I'm OK with #1 and #2, but note that the combo in #3 is not ALWAYS broken, just if you don't have enough memory or are running too many link jobs at the same time. I'm not sure we want it to be an 'error'.

In my anecdotal information this configuration is almost ALWAYS broken. I just tested this on my system with 32GB RAM. With all defaults and it can't build correctly and dies when the linker runs out of memory. So our stock configuration is bad even on a pretty high-end system with bfd. This is using GCC 11.2.1 and binutils 2.38 - so pretty recent stuff.

Maybe we can tweak the default configuration to not make this happen, but if we don't I really would like that error there (which can be circumvented with a flag).

I'm OK with #1 and #2, but note that the combo in #3 is not ALWAYS broken, just if you don't have enough memory or are running too many link jobs at the same time. I'm not sure we want it to be an 'error'.

In my anecdotal information this configuration is almost ALWAYS broken. I just tested this on my system with 32GB RAM. With all defaults and it can't build correctly and dies when the linker runs out of memory. So our stock configuration is bad even on a pretty high-end system with bfd. This is using GCC 11.2.1 and binutils 2.38 - so pretty recent stuff.

Maybe we can tweak the default configuration to not make this happen, but if we don't I really would like that error there (which can be circumvented with a flag).

I rarely, if ever, build the whole of LLVM on my Linux VM, because I don't need to, as I tend to work on only a small subset of the LLVM tools. This means that I rarely run into linker issues due to exhausted memory. I don't believe I ever bothered changing my linker default. Whilst my usage may not be the same as others, I want to raise it to highlight that the configuration isn't always broken - it mostly works just fine if you don't try and build everything at once.

  • A debug build on Windows using MSVC can take around 70 to roughly 100 GB of storage space, which are always surprising and may also lead to out of disk space errors. Release builds are comparatively just a few Gigabytes. I believe they are also multiply tens of Gigabytes on other OSs

As someone who recently did his first patch, this was very surprising. I had a ~155 GB VS2022 generator debug build folder, which was a lot more than 15-20 GB mentioned on https://llvm.org/docs/GettingStarted.html

I rarely, if ever, build the whole of LLVM on my Linux VM, because I don't need to, as I tend to work on only a small subset of the LLVM tools. This means that I rarely run into linker issues due to exhausted memory. I don't believe I ever bothered changing my linker default. Whilst my usage may not be the same as others, I want to raise it to highlight that the configuration isn't always broken - it mostly works just fine if you don't try and build everything at once.

Yeah I don't doubt that there are many usages or other options to set where it will work, for experienced developers I don't see this as a problem.

But the point of this change would be to help users that haven't worked with LLVM before and just following a guide to build it - and with the default settings as they are now, it's more likely than not that they end up with a broken build because the memory runs out and it's not obvious unless you have a lot of experience.

Reading back over the comments above it seems like there is:

  • Everyone agrees we can do better with documentation. Which is why I have a second review for some documentation updates here: https://reviews.llvm.org/D124367
  • Pretty strong support for switching the default build_type to Release, but there is also some pretty strong pushback on this. Where the main disagreement is that it will just create a different problem where we put people that want a Debug build in a similar bad situation.
  • Pretty strong support for requiring a CMAKE_BUILD_TYPE set - and error out if it's not set. The main push back on this one is that it would make our build "pretty different" from other CMake based projects.
  • Mixed support for warning / erroring out when using DEBUG + BFD linker.

With this I suggest we should go ahead and:

  • Improve documentation - Please help out in the review posted above.
  • Make omitting CMAKE_BUILD_TYPE an error and link to the improved documentation.

Personally I still feel like a warning/error for Debug + bfd is still a good idea, but I think it's less important if we decide to make omitting CMAKE_BUILD_TYPE an error and we can test that first for a release and see how / if it changes the number of people complaining about the problem.

Any big disagreement on this plan?

With this I suggest we should go ahead and:

  • Improve documentation - Please help out in the review posted above.
  • Make omitting CMAKE_BUILD_TYPE an error and link to the improved documentation.

I would be happy with this path forward.

kippesp added a subscriber: kippesp.EditedApr 28 2022, 8:50 AM

That being said, I am a Windows developer who almost exclusively builds Visual Studio projects these days, so I have the luxury of being able to click a drop-down menu and change the config that way.

If CMAKE_BUILD_TYPE is specified in a multi-configuration build (like visual studio), the alternative build types won't be available. Usually when using MSBUILD, CMAKE_BUILD_TYPE should go unspecified at configuration time and the build type given at build time. Multi-configuration build types should probably omit this requirement.

cmake path/to/llvm
cmake --build . --config Debug
cmake --build . --config Release
etc

Clarification: CMAKE_BUILD_TYPE isn't (or, at least, shouldn't be) actionable with multi-configuration build types; the drop-down in the visual studio IDE is unaffected (with all build types still available). My comment is strictly limited to multiconfig.

If CMAKE_BUILD_TYPE is specified in a multi-configuration build (like visual studio), the alternative build types won't be available.

I'm not on Windows, but the CMake documentation says:

For multi-configuration generators like Visual Studio, Xcode, and Ninja Multi-Config, the configuration is chosen by the user at build time and CMAKE_BUILD_TYPE is ignored.

But it's pointless to require something that will be ignored, so I agree that it makes sense to have the requirement dependent on GENERATOR_IS_MULTI_CONFIG.

With this I suggest we should go ahead and:

  • Improve documentation - Please help out in the review posted above.
  • Make omitting CMAKE_BUILD_TYPE an error and link to the improved documentation.

This seems like a good compromise +1.

thieta updated this revision to Diff 426006.Apr 29 2022, 2:44 AM

Make omitting CMAKE_BUILD_TYPE an error.

Link will be updated when the updated documentation is merged.

thieta retitled this revision from [CMake] Change default CMAKE_BUILD_TYPE to Release to [CMake] Make omitting CMAKE_BUILD_TYPE an error.Apr 29 2022, 2:45 AM
thieta edited the summary of this revision. (Show Details)
aaron.ballman added inline comments.Apr 29 2022, 5:19 AM
llvm/CMakeLists.txt
70–75

Minor wording suggestions.

llvm/docs/ReleaseNotes.rst
72–74
thieta updated this revision to Diff 426055.Apr 29 2022, 7:37 AM

Rewording.

mehdi_amini accepted this revision.Apr 29 2022, 11:23 AM

LGTM, but please wait for others!

This revision is now accepted and ready to land.Apr 29 2022, 11:23 AM
thieta added a comment.May 2 2022, 5:17 AM

Friendly ping on this one - I would love to get feedback positive or negative on this latest revision. I think it's important that we don't have anyone strongly objecting to this approach.

aaron.ballman accepted this revision.May 2 2022, 5:21 AM

LGTM!

I applied the patch locally and tried it out with the MSVC built-in CMake support and confirmed that everything still works as I'd expect.

thieta added a comment.May 2 2022, 5:22 AM

LGTM!

I applied the patch locally and tried it out with the MSVC built-in CMake support and confirmed that everything still works as I'd expect.

Ah yes - I forgot to comment that I already have tested this! But yeah the check already did test for multi-configuration generators.

hans accepted this revision.May 2 2022, 5:56 AM

lgtm

llvm/docs/ReleaseNotes.rst
74

Should we add a paragraph which includes the motivation, explaining the problem, and perhaps linking to the docs which will help the user make the right choice?

LGTM, aside from @hans's comment.

llvm/docs/ReleaseNotes.rst
74

+1

thieta updated this revision to Diff 426591.May 3 2022, 12:47 AM

Added more details to Release Notes

thieta marked 3 inline comments as done.May 3 2022, 12:48 AM

Added more detail to the release notes. Let me know if anyone else have any concerns - otherwise I plan to land this tomorrow morning CEST.

thieta updated this revision to Diff 426592.May 3 2022, 12:49 AM

Update URL to documentation that will point directly to CMAKE_BUILD_TYPE when https://reviews.llvm.org/D124367 has landed, which I plan to land at the same time.

jhenderson added inline comments.May 3 2022, 12:54 AM
llvm/docs/ReleaseNotes.rst
74–78

Various grammar nits and suggestions.

thieta updated this revision to Diff 426596.May 3 2022, 1:16 AM

Various grammar nits

jhenderson accepted this revision.May 3 2022, 1:17 AM

Thanks for seeing this through!

Could you also update the CMake invocation in https://github.com/llvm/llvm-project/blob/be656df18721dc55a1de2eea64a3f73b6afa29a2/llvm/docs/GettingStarted.rst#getting-the-source-code-and-building-llvm? Or https://clang.llvm.org/get_started.html? There's probably going to be many more places like this.

MaskRay accepted this revision.May 3 2022, 2:16 PM
thieta updated this revision to Diff 426950.May 4 2022, 2:18 AM

Added CMAKE_BUILD_TYPE in several places in the documentation.

thieta added a comment.May 4 2022, 2:21 AM

Thanks for seeing this through!

Could you also update the CMake invocation in https://github.com/llvm/llvm-project/blob/be656df18721dc55a1de2eea64a3f73b6afa29a2/llvm/docs/GettingStarted.rst#getting-the-source-code-and-building-llvm? Or https://clang.llvm.org/get_started.html? There's probably going to be many more places like this.

I have hit a lot of these places in my latest revision now. But the clang website seems to be a checked in html file in the repo - I was unsure if this was generated or not.

I am tempted to merge this now and handle any other updates as other patches as we find them instead of holding this up any longer. Any objections @awarzynski ?

awarzynski accepted this revision.May 4 2022, 4:19 AM

I am tempted to merge this now and handle any other updates as other patches as we find them instead of holding this up any longer. Any objections @awarzynski ?

No objections, just lots of gratitude :) Thanks for doing this, @thieta, and for addressing my comments !

This revision was landed with ongoing or failed builds.May 4 2022, 5:01 AM
This revision was automatically updated to reflect the committed changes.
thieta added a comment.May 4 2022, 5:02 AM

Thanks everyone for your input! Let's hope this makes it easier for new users to pick the right defaults!