This is primarily motivated by the desire to move from Python2 to
Python3. PYTHON_EXECUTABLE is ambiguous. This explicitly identifies
the python interpreter in use. Since the LLVM build seems to be able to
completed successfully with python3, use that across the build. The old
path aliases PYTHON_EXECUTABLE to be treated as Python3.
Details
- Reviewers
smeenai ldionne beanz JDevlieghere
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
After a discussion with @smeenai switch immediately to python3 as this may otherwise be treated as a regression for people on CMake>=3.12 and already using python3. For those who are using older releases, the old behaviour is preserved with an alternate variable name being referenced.
Although I'd be happy to move everything over to Python 3, the consensus from http://lists.llvm.org/pipermail/llvm-dev/2020-January/138730.html was to still support Python 2.7 till Jan 2021 or so, so it's important to have a fallback to Python 2 here if Python 3 isn't found, I think.
Include clang as some of the CI uses the unified build which fails without the update.
I would strongly prefer to do this differently. While we hope to drop Python 2 support in LLDB as soon as possible, we are not there yet. This patch as it stands will make it really hard for us to keep doing so (even internally) if upstream decides to drop support. I have looked into this change myself a while ago with these limitations in mind, and believe we can achieve the same in a way that would make this easier.
My suggestion is to have 4 new CMake variable that abstract over this: LLVM_PYTHON_EXECUTABLE, LLVM_PYTHON_LIBRARIES, LLVM_PYTHON_INCLUDE_DIRS and finally LLVM_PYTHON_HINT. We can then populate the first three depending on what machinery we want to use, i.e. the old CMake way of finding Python (until we bump the requirement across llvm), as well as the new FindPython3 and FindPython2. Similarly for the HINT variable, having our own abstraction means we don't clobber any of the variables used internally by CMake.
What do you think?
@JDevlieghere I think that adding another mechanism for finding the python executable is not the right approach. You already have the variables that you are talking about, you just need to specify it in triplicate if you want compatibility across all the versions, but specifying -DPython3_EXECUTABLE= gives you the control over the executable for python3. If you want to use python2, -DPython3_EXECUTABLE= and -DPython2_EXECUTABLE= will ensure that python2 is always used. If you are using an older CMake, specifying -DPython3_EXECUTABLE=, -DPython2_EXECUTABLE= and -DPYTHON_EXECUTABLE= will ensure that the specified version is used. I'm not sure what the scenario is that you believe will be made more difficult with this approach.
This is not better IMHO since it assumes that all subprojects are using the LLVM_meow variable to refer to the Python executable.
It only works because you're setting Python3_EXECUTABLE to Python2_EXECUTABLE.
set(Python3_EXECUTABLE ${Python2_EXECUTABLE})
This will be completely opaque to lldb and we'll have to struggle again to reverse engineer which Python is used, leaving us in the same situation as today. How are we going to choose between Python 2 and Python 3 based on that variable? What I envision is to find Python once, like we do with other dependencies in the LLVM project, and then use that (interpreter, libraries, include paths) in LLDB. This has to work for in-tree-builds as well as out-of-tree builds (I'm not a fan but hey they're still supported).
This patch is already changing the variable. How is this different/worse from using Python3_EXECUTABLE and having it maybe point to a Python 2 interpreter?
clang/CMakeLists.txt | ||
---|---|---|
154 | Python2? |
@JDevlieghere I think that for LLDB, which this patch does not touch, the proper thing to do is to have 2 different paths with the ability to explicitly opt into the path, i.e. LLDB_USE_PYTHON2:BOOL and LLDB_USE_PYTHON3:BOOL with CMakeDependentOption to prevent the two from being set to TRUE simultaneously. If two paths need to be supported, then you must have two paths.
I talked to Saleem and he cleared up some of my concerns. Given that the community seems to have agreed to support only Python 3, this change seems a lot more reasonable. My earlier comment was made under the impression that we were going to continue supporting Python 2. With that in mind, the burden should fall on LLDB if we want to continue supporting it. Python 2 will be a "special case" then, rather than another first class citizen. With all that said, this LGTM.
To be clear, the plan is to support Python 2 till Jan 2021 or so. Given that this has the fallback though, it LGTM (with the comments addressed).
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138730.html is the discussion for LLVM's Python 2/3 plans.
llvm/CMakeLists.txt | ||
---|---|---|
696 | Python2 |
llvm/CMakeLists.txt | ||
---|---|---|
696 | Yeah, I have that fixed locally, there are 2 instances of it. |
Is there more background on this somewhere? What's the advantage of using Python3_EXECUTABLE instead of just PYTHON_EXECUTABLE? Having Python3_EXECUTABLE be python 2 seems pretty surprising to me.
Also, this doesn't update some PYTHON_EXECUTABLEs, e.g. the one in lld/test. Is that intentional?
I was trying to do the minimal change to cover llvm. I definitely care about LLD and will do that in a subsequent change.
As to the benefit of this, it is primarily that the detection here explicitly ensures that python3 is used rather than python2. The fallback of aliasing is the easiest way to achieve this without having a lot of complicated logic at each site:
if(NOT Python3_Interpreter_FOUND) add_custom_command(...) else() add_custom_command(...) endif()
which would be the way to handle that. Creating another wrapper function is a terrible thing just to filter out incidental python usage would not only be extremely complicated but a large amount of logic that would also increase configure times unnecessarily.
The detection here is far more robust and precise. Python3_EXECUTABLE is the CMake variable name, not something that I chose.
I believe this breaks check-builtins on macOS. See https://bugs.chromium.org/p/chromium/issues/detail?id=1076480 for details.
The split() on the subprocess output throws since it wants a b'.' on py3, but that code has an unqualified except block. That means (I think) we always believe we run on macOS 10.0.0 under py3. I don't understand how we then fail to convert that back to an int (see linked bug).
Anyhoo, please take a look, and if it takes a while to fix we might have to go through yet another reland cycled :/
I would've expected we'd just set some new variable, eg LLVM_PYTHON_INTERPRETER to either the py2 or py3 interpreter and use that everywhere. Seems a bit less confusing, and shouldn't make anything else more complicated. But it's all going away in a few months anyways, so I don't have a strong opinion.
What's the new cmake flag to set the python executable, and should we update llvm/docs/GettingStarted.rst to not refer to PYTHON_EXECUTABLE anymore?
Python2?