Page MenuHomePhabricator

[RFC] Change how we deal with optional dependencies
ClosedPublic

Authored by JDevlieghere on Dec 10 2019, 3:17 PM.

Details

Summary

Recently there has been some discussion about how we deal with optional dependencies in LLDB. The common approach in LLVM is to make things work out of the box. If the dependency isn't there, we just move on. This is not true in LLDB. Unless you explicitly disable the dependency with LLDB_DISABLE_*, you'll get a configuration-time error.

When a dependency is not found in LLVM, it prints a status message. This is something I don't like, because there are quite a lot, and they're easy to miss. I'd rather not build all of LLDB only to find out that it couldn't find Python and now I can't run the test suite. Given how "important" some of our dependencies really are, I think a warning isn't unreasonable.

For the first approach we'd need 3 variables for every CMake option.

  1. The default value.
  2. The user provided value, which can override the default value. If the user says they want to disable Python, we shouldn't print the aforementioned warning.
  3. The value actually used by LLDB. This needs to be different from the one provided by the user, because maybe they enabled Python, but the library couldn't be found.

What I don't like about this approach is there needs to be a discrepancy between the variable provided by the user and the one used in the code. If I pass LLDB_DISABLE_PYTHON I want to grep for that and see what it affects.

The second approach is printing the warning once, and modifying the user-provided variable in the cache. So if you pass LLDB_DISABLE_PYTHON=OFF and CMake can't find it, it would update LLDB_DISABLE_PYTHON=ON, which is what you'd probably do in the current situation. This also means the warning is only printed once, because once LLDB_DISABLE_PYTHON is set, you wouldn't try looking for it again.

What I don't like about this approach is that this is messing with my user provided variables behind my back. Sure it prints a warning, but if there were other configuration issue I might need to rerun CMake and be oblivious what happened.

The third and final approach is doing what LLVM does. If the dependency is found, turn the option on (unless the value is overwritten by the user) and vice versa. This is by far the cleanest approach from a CMake and UX perspective, but I've already stated what I don't like about this.

PS: The diff shows what (2) would look like.

Diff Detail

Event Timeline

JDevlieghere created this revision.Dec 10 2019, 3:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2019, 3:17 PM
Herald added a subscriber: abidh. · View Herald Transcript

I personally prefer the third approach. To make sure I understand correctly, I'll write it in my own words so you can correct me if I misunderstood.
Try to find the dependency, and if we find it then use it. If not, then we can print out something like "Didn't find DEPENDENCY" and continue on our merry way. If the user overwrites the values and something goes wrong, send a fatal error and tell them that what the value they set isn't going to work without further work (e.g. explicitly enable python support but didn't find python? tell the user that you couldn't find python and maybe suggest setting some other CMake variables to help CMake find python).

I think that we should emphasize UX here. While I think that everybody could benefit from understanding the build system better, helping the user understand that their settings don't work (and why they don't work) goes a long way.

I'm going to be blunt but if you dislike what LLVM does, improve that rather than proliferating inconsistencies between subprojects.

That said, there are few things that could improve this. We could print a nice summary of which dependencies were found and which were not at the end of CMake run. We could move variables for controlling deps (or maybe even all user-set variables) to a separate file (meson does something like this, i think), so they're easier to find and track.

I'm going to be blunt but if you dislike what LLVM does, improve that rather than proliferating inconsistencies between subprojects.

I think LLVM's behavior makes sense for LLVM but makes less sense for LLDB. I'm all for consistency, but that doesn't mean we should shoehorn anything just because that's how it's done in LLVM.

That said, there are few things that could improve this. We could print a nice summary of which dependencies were found and which were not at the end of CMake run. We could move variables for controlling deps (or maybe even all user-set variables) to a separate file (meson does something like this, i think), so they're easier to find and track.

It's true that external dependencies are a lot more critical for lldb than llvm. That might justify some deviation from it, but I would certainly try to minimize that. I am a big anti-fan of overwriting user-provided values, so I'd really try to avoid that method.

Besides lldb developers, another category we should consider are llvm (or clang or lld, ...) developers who know nothing about lldb, but they ended up building it because they specified LLVM_ENABLE_PROJECTS=all. These people probably don't want to figure out how to install libedit, or how to disable it, but if the default config "just works" for them, they'd be happy to build lldb and make sure their patch doesn't break it. Optimizing for this experience may be even more important than core lldb developers, since there's fewer of the latter and they usually just configure lldb once.

Ignoring inconsistency with llvm, the approach I would like the most is to have the dependency options be ternary. "ON" means the dependency must be there, and if it isn't you get an error. "OFF" means no dependency, we don't even try to look for it. The third option (let's call it "AUTO") would try to look for the dependency, and use it if it's there -- if not, you get the usual status message.

This isn't entirely consistent with llvm, but it's not that far from it either -- llvm essentially just has two of these options: "OFF" and "AUTO", only auto is called "ON". If we wanted to be more consistent, we could even rename these options to "OFF", "ON" and "FORCE_ON"...

People who want to be sure they are using python could just set this variable to "FORCE_ON", for instance in their cache file. Would that be enough to address your desire for explicitness?

As for the number of variables, I don't think we really need that many of them. I think the default value should always be "auto". The only reason we spend so much effort in computing the default value is because some platforms don't ever have a certain dependency, and we wanted the build to "just work" there. If "auto" just works, this can all be thrown out the window.

And I don't think we need a separate /names/ for the "value selected by user" and "the effective value". We can reuse the "cache" vs "regular variable" distinction here (as you have accidentally done in this patch). While shadowing cache variables is a bit "naughty", I think it can actually be helpful in some cases. For example, nothing except the code which actually handles the dependency search should ever care about the setting value "as provided by user". They should only ever care about the effective value -- shadowing can ensure that. And this is a practice successfully used elsewhere in llvm -- e.g., we create a shadow for LLVM_TARGETS_TO_BUILD, which expands the "all" pseudo-target provided by the user into the actual list of targets. However, I don't recall seeing this approach in dependency management code...

The implementation of something like this could be:

# This assumes that LLDB_DISABLE_PYTHON is renamed to LLDB_ENABLE_PYTHON across the board, which is something I've wanted to do for a while..
set(LLDB_ENABLE_PYTHON "ON" CACHE STRING "On, Off or Auto")
string(TOLOWER "${LLDB_ENABLE_PYTHON}" LLDB_ENABLE_PYTHON)
if (LLDB_ENABLE_PYTHON)
  if ("${LLDB_ENABLE_PYTHON}" STREQUAL "auto")
    set(maybe_required)
  else
    set(maybe_required REQUIRED)
  endif()
  find_package(Python ${maybe_required})
  set(LLDB_ENABLE_PYTHON ${PYTHON_FOUND})
endif()

I personally prefer the third approach. To make sure I understand correctly, I'll write it in my own words so you can correct me if I misunderstood.
Try to find the dependency, and if we find it then use it. If not, then we can print out something like "Didn't find DEPENDENCY" and continue on our merry way. If the user overwrites the values and something goes wrong, send a fatal error and tell them that what the value they set isn't going to work without further work (e.g. explicitly enable python support but didn't find python? tell the user that you couldn't find python and maybe suggest setting some other CMake variables to help CMake find python).

How exactly does this "overwriting" work? Could you point me to the code that does this? I don't remember seeing anything like this, but the llvm build is not entirely consistent either, so it's possible we're looking at different things...

lldb/cmake/modules/LLDBConfig.cmake
132

I don't think this does what you wanted it to do. This leaves the cache variable intact, and sets the local cmake variable with the same name to "ON;FORCE".

shafik added a subscriber: shafik.Dec 11 2019, 9:02 AM

Besides lldb developers, another category we should consider are llvm (or clang or lld, ...) developers who know nothing about lldb, but they ended up building it because they specified LLVM_ENABLE_PROJECTS=all. These people probably don't want to figure out how to install libedit, or how to disable it, but if the default config "just works" for them, they'd be happy to build lldb and make sure their patch doesn't break it. Optimizing for this experience may be even more important than core lldb developers, since there's fewer of the latter and they usually just configure lldb once.

I want to seconded this, as someone who supports expression parsing which is depends on clang components which may be under active development such as ASTImporter having them being able to build lldb w/o a hassle is an important consideration if we want to them to be proactive and run the lldb test suite etc...

I personally prefer the third approach. To make sure I understand correctly, I'll write it in my own words so you can correct me if I misunderstood.
Try to find the dependency, and if we find it then use it. If not, then we can print out something like "Didn't find DEPENDENCY" and continue on our merry way. If the user overwrites the values and something goes wrong, send a fatal error and tell them that what the value they set isn't going to work without further work (e.g. explicitly enable python support but didn't find python? tell the user that you couldn't find python and maybe suggest setting some other CMake variables to help CMake find python).

How exactly does this "overwriting" work? Could you point me to the code that does this? I don't remember seeing anything like this, but the llvm build is not entirely consistent either, so it's possible we're looking at different things...

By overwrite, I meant that the user would explicitly set the value to ON. In the terms you used, this would be FORCE_ON.

emaste added a subscriber: emaste.Dec 16 2019, 12:52 PM

Implement Pavel's suggestion.

mgorny requested changes to this revision.Dec 18 2019, 10:53 AM
mgorny added inline comments.
lldb/cmake/modules/LLDBConfig.cmake
24

Any reason not to use the regular NOT logic here? It would make it accept all kinds of false values just like you accept all kinds of true values below.

This revision now requires changes to proceed.Dec 18 2019, 10:53 AM
JDevlieghere marked an inline comment as done.
mgorny added inline comments.Dec 18 2019, 12:10 PM
lldb/cmake/modules/LLDBConfig.cmake
27

Now you broke it the other way around. Is there any reason you can't just ${${variable}} here?

JDevlieghere marked an inline comment as done.Dec 18 2019, 12:41 PM
JDevlieghere added inline comments.
lldb/cmake/modules/LLDBConfig.cmake
27

Can you please describe the desired behavior rather than the symptom? I must be missing what you're trying to achieve. :-)

Maybe there's a gap in my CMake knowledge, but wouldn't if(${${variable}}) evaluate to true for any string value?

mgorny added inline comments.Dec 18 2019, 12:55 PM
lldb/cmake/modules/LLDBConfig.cmake
27

The desired behavior is to keep standard CMake behavior for any value other than AUTO, and by standard behavior I mean: https://cmake.org/cmake/help/v3.0/command/if.html the first definition on the list.

Particularly, in Gentoo we default to passing yes/no for historical reasons. I would find it irritating if different places in LLVM accepted different constants.

JDevlieghere marked an inline comment as done.

Accept usual constant in addition to auto.

lldb/cmake/modules/LLDBConfig.cmake
27

Got it, thanks for the explanation!

JDevlieghere marked 2 inline comments as done.Dec 18 2019, 2:45 PM
mgorny accepted this revision.Dec 19 2019, 12:07 AM

Besides missing LZMA, looks good to me. However, I suspect you may want to wait for a second opinion ;-).

lldb/cmake/modules/LLDBConfig.cmake
397–398

Did you omit LLDB_ENABLE_LZMA on purpose or accidentally?

This revision is now accepted and ready to land.Dec 19 2019, 12:07 AM
labath requested changes to this revision.Dec 19 2019, 2:50 AM

I like where this is going, but I think this still needs some work wrt. the panel library (and being able to customize the dependency search more).

lldb/cmake/modules/LLDBConfig.cmake
22

Use the same capitalization of "auto" as the description

35–41

I don't care that much about that, but I don't see the advantage in canonicalizing to ON/OFF (the other variables don't go through the canonicalization process, so only their default values will be ON/OFF -- the actual values will be whatever flavour of "true" the user chooses to pass to cmake)

489–490

I was wondering above if the name we pass to the find_package provides sufficient customization, and I think this shows it doesn't. Ideally, the panel (sub)library would behave the same way as the curses library and respect all three values of the cache variable. This one does not support the "force on" mode.

Having the cache variable handling code centralized in a single function is a worthy goal, but I am not sure that this is really possible, given the current cmake abilities. e.g. we'd need some sort of a indirect call so we can implement custom searching logic, but there aren't any particularly nice ways of achieving that. The only way I know of achieving indirection is to put the logic in a separate file, and then use either an include, find_package or something else to load it.

If you want to go the file loading approach, or "outline" than function (at least some parts of it), then I think both are fine, but I don't think leaving this code in here is good, since the whole point of this exercise is to standardize things.

This revision now requires changes to proceed.Dec 19 2019, 2:50 AM
JDevlieghere marked an inline comment as done.
  • Extract finding curses and panel into FindCursesAndPanel
  • Include LZMA
mgorny added inline comments.Dec 19 2019, 8:43 PM
lldb/cmake/modules/LLDBConfig.cmake
489–490

Isn't this redundant now?

labath accepted this revision.Dec 20 2019, 1:42 AM

Looks fine to me. Just remove the redundant panel searching code, and please make sure this works correctly from a clean build, per the inline comment.

lldb/cmake/modules/FindCursesAndPanel.cmake
9

It's not fully clear to me what will happen when this code is run for the first time (when CURSES_INCLUDE_DIRS, etc. is not defined yet). Who will set CURSES_PANEL_FOUND in that case? Could you make sure this works correctly when run for the first time on a fully clean build?

I don't know whether this is the standard way of writing find_package files, but I'd consider just removing the caching and letting find_package(Curses) and find_library(panel) just run every time -- they already contain some internal caching so we're not saving much here anyway...

lldb/cmake/modules/LLDBConfig.cmake
489–490

yep. it sure is.

This revision is now accepted and ready to land.Dec 20 2019, 1:42 AM
JDevlieghere marked 2 inline comments as done.Dec 20 2019, 10:05 AM
JDevlieghere added inline comments.
lldb/cmake/modules/FindCursesAndPanel.cmake
9

I inspired myself on other FindPackage code. I have to admit that I don't know how it works exactly, but it behaves the way you expect. I did a clean build to verify.

This revision was automatically updated to reflect the committed changes.
JDevlieghere marked an inline comment as done.