This is an archive of the discontinued LLVM Phabricator instance.

build: use `find_package(Python3)` if available
ClosedPublic

Authored by compnerd on Apr 23 2020, 2:50 PM.

Details

Summary

This is primarily motivated by the desire to move from Python2 to
Python3. PYTHON_EXECUTABLE is ambiguous. This explicitly identifies
the python interpreter in use. Since the LLVM build seems to be able to
completed successfully with python3, use that across the build. The old
path aliases PYTHON_EXECUTABLE to be treated as Python3.

Diff Detail

Event Timeline

compnerd created this revision.Apr 23 2020, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2020, 2:50 PM
compnerd updated this revision to Diff 259720.Apr 23 2020, 3:02 PM
compnerd retitled this revision from build: use `find_package(Python2)` if available to build: use `find_package(Python3)` if available.
compnerd edited the summary of this revision. (Show Details)

After a discussion with @smeenai switch immediately to python3 as this may otherwise be treated as a regression for people on CMake>=3.12 and already using python3. For those who are using older releases, the old behaviour is preserved with an alternate variable name being referenced.

Although I'd be happy to move everything over to Python 3, the consensus from http://lists.llvm.org/pipermail/llvm-dev/2020-January/138730.html was to still support Python 2.7 till Jan 2021 or so, so it's important to have a fallback to Python 2 here if Python 3 isn't found, I think.

ldionne requested changes to this revision.Apr 24 2020, 4:10 AM
ldionne added inline comments.
llvm/CMakeLists.txt
697

We need to fall back to Python 2 here per @smeenai 's comment.

This revision now requires changes to proceed.Apr 24 2020, 4:10 AM
compnerd updated this revision to Diff 259895.Apr 24 2020, 8:23 AM

Add fallback to python2 in case python3 is not found.

compnerd updated this revision to Diff 259923.Apr 24 2020, 10:29 AM

Include clang as some of the CI uses the unified build which fails without the update.

Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2020, 10:29 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
JDevlieghere requested changes to this revision.Apr 24 2020, 11:13 AM

I would strongly prefer to do this differently. While we hope to drop Python 2 support in LLDB as soon as possible, we are not there yet. This patch as it stands will make it really hard for us to keep doing so (even internally) if upstream decides to drop support. I have looked into this change myself a while ago with these limitations in mind, and believe we can achieve the same in a way that would make this easier.

My suggestion is to have 4 new CMake variable that abstract over this: LLVM_PYTHON_EXECUTABLE, LLVM_PYTHON_LIBRARIES, LLVM_PYTHON_INCLUDE_DIRS and finally LLVM_PYTHON_HINT. We can then populate the first three depending on what machinery we want to use, i.e. the old CMake way of finding Python (until we bump the requirement across llvm), as well as the new FindPython3 and FindPython2. Similarly for the HINT variable, having our own abstraction means we don't clobber any of the variables used internally by CMake.

What do you think?

This revision now requires changes to proceed.Apr 24 2020, 11:13 AM
compnerd added a comment.EditedApr 24 2020, 12:00 PM

@JDevlieghere I think that adding another mechanism for finding the python executable is not the right approach. You already have the variables that you are talking about, you just need to specify it in triplicate if you want compatibility across all the versions, but specifying -DPython3_EXECUTABLE= gives you the control over the executable for python3. If you want to use python2, -DPython3_EXECUTABLE= and -DPython2_EXECUTABLE= will ensure that python2 is always used. If you are using an older CMake, specifying -DPython3_EXECUTABLE=, -DPython2_EXECUTABLE= and -DPYTHON_EXECUTABLE= will ensure that the specified version is used. I'm not sure what the scenario is that you believe will be made more difficult with this approach.

compnerd updated this revision to Diff 259987.Apr 24 2020, 2:03 PM

Adjust clang-tools-extra at the same time

compnerd updated this revision to Diff 260010.Apr 24 2020, 3:28 PM

Add missed case of PYTHON_EXECUTABLE

ldionne accepted this revision.Apr 27 2020, 8:06 AM

My suggestion is to have 4 new CMake variable that abstract over this: LLVM_PYTHON_EXECUTABLE, LLVM_PYTHON_LIBRARIES, LLVM_PYTHON_INCLUDE_DIRS and finally LLVM_PYTHON_HINT. We can then populate the first three depending on what machinery we want to use, i.e. the old CMake way of finding Python (until we bump the requirement across llvm), as well as the new FindPython3 and FindPython2. Similarly for the HINT variable, having our own abstraction means we don't clobber any of the variables used internally by CMake.

What do you think?

This is not better IMHO since it assumes that all subprojects are using the LLVM_meow variable to refer to the Python executable.

@JDevlieghere I think that adding another mechanism for finding the python executable is not the right approach. You already have the variables that you are talking about, you just need to specify it in triplicate if you want compatibility across all the versions, but specifying -DPython3_EXECUTABLE= gives you the control over the executable for python3. If you want to use python2, -DPython3_EXECUTABLE= and -DPython2_EXECUTABLE= will ensure that python2 is always used. If you are using an older CMake, specifying -DPython3_EXECUTABLE=, -DPython2_EXECUTABLE= and -DPYTHON_EXECUTABLE= will ensure that the specified version is used. I'm not sure what the scenario is that you believe will be made more difficult with this approach.

It only works because you're setting Python3_EXECUTABLE to Python2_EXECUTABLE.

set(Python3_EXECUTABLE ${Python2_EXECUTABLE})

This will be completely opaque to lldb and we'll have to struggle again to reverse engineer which Python is used, leaving us in the same situation as today. How are we going to choose between Python 2 and Python 3 based on that variable? What I envision is to find Python once, like we do with other dependencies in the LLVM project, and then use that (interpreter, libraries, include paths) in LLDB. This has to work for in-tree-builds as well as out-of-tree builds (I'm not a fan but hey they're still supported).

My suggestion is to have 4 new CMake variable that abstract over this: LLVM_PYTHON_EXECUTABLE, LLVM_PYTHON_LIBRARIES, LLVM_PYTHON_INCLUDE_DIRS and finally LLVM_PYTHON_HINT. We can then populate the first three depending on what machinery we want to use, i.e. the old CMake way of finding Python (until we bump the requirement across llvm), as well as the new FindPython3 and FindPython2. Similarly for the HINT variable, having our own abstraction means we don't clobber any of the variables used internally by CMake.

What do you think?

This is not better IMHO since it assumes that all subprojects are using the LLVM_meow variable to refer to the Python executable.

This patch is already changing the variable. How is this different/worse from using Python3_EXECUTABLE and having it maybe point to a Python 2 interpreter?

clang/CMakeLists.txt
154

Python2?

compnerd added a comment.EditedApr 27 2020, 9:11 AM

@JDevlieghere I think that for LLDB, which this patch does not touch, the proper thing to do is to have 2 different paths with the ability to explicitly opt into the path, i.e. LLDB_USE_PYTHON2:BOOL and LLDB_USE_PYTHON3:BOOL with CMakeDependentOption to prevent the two from being set to TRUE simultaneously. If two paths need to be supported, then you must have two paths.

JDevlieghere accepted this revision.Apr 27 2020, 2:34 PM

I talked to Saleem and he cleared up some of my concerns. Given that the community seems to have agreed to support only Python 3, this change seems a lot more reasonable. My earlier comment was made under the impression that we were going to continue supporting Python 2. With that in mind, the burden should fall on LLDB if we want to continue supporting it. Python 2 will be a "special case" then, rather than another first class citizen. With all that said, this LGTM.

This revision is now accepted and ready to land.Apr 27 2020, 2:34 PM

I talked to Saleem and he cleared up some of my concerns. Given that the community seems to have agreed to support only Python 3, this change seems a lot more reasonable. My earlier comment was made under the impression that we were going to continue supporting Python 2. With that in mind, the burden should fall on LLDB if we want to continue supporting it. Python 2 will be a "special case" then, rather than another first class citizen. With all that said, this LGTM.

To be clear, the plan is to support Python 2 till Jan 2021 or so. Given that this has the fallback though, it LGTM (with the comments addressed).

http://lists.llvm.org/pipermail/llvm-dev/2020-January/138730.html is the discussion for LLVM's Python 2/3 plans.

llvm/CMakeLists.txt
696

Python2

compnerd marked an inline comment as done.Apr 27 2020, 5:59 PM
compnerd added inline comments.
llvm/CMakeLists.txt
696

Yeah, I have that fixed locally, there are 2 instances of it.

thakis added a subscriber: thakis.Apr 27 2020, 6:47 PM

Is there more background on this somewhere? What's the advantage of using Python3_EXECUTABLE instead of just PYTHON_EXECUTABLE? Having Python3_EXECUTABLE be python 2 seems pretty surprising to me.

Also, this doesn't update some PYTHON_EXECUTABLEs, e.g. the one in lld/test. Is that intentional?

I was trying to do the minimal change to cover llvm. I definitely care about LLD and will do that in a subsequent change.

As to the benefit of this, it is primarily that the detection here explicitly ensures that python3 is used rather than python2. The fallback of aliasing is the easiest way to achieve this without having a lot of complicated logic at each site:

if(NOT Python3_Interpreter_FOUND)
  add_custom_command(...)
else()
  add_custom_command(...)
endif()

which would be the way to handle that. Creating another wrapper function is a terrible thing just to filter out incidental python usage would not only be extremely complicated but a large amount of logic that would also increase configure times unnecessarily.

The detection here is far more robust and precise. Python3_EXECUTABLE is the CMake variable name, not something that I chose.

I believe this breaks check-builtins on macOS. See https://bugs.chromium.org/p/chromium/issues/detail?id=1076480 for details.

The split() on the subprocess output throws since it wants a b'.' on py3, but that code has an unqualified except block. That means (I think) we always believe we run on macOS 10.0.0 under py3. I don't understand how we then fail to convert that back to an int (see linked bug).

Anyhoo, please take a look, and if it takes a while to fix we might have to go through yet another reland cycled :/

I was trying to do the minimal change to cover llvm. I definitely care about LLD and will do that in a subsequent change.

As to the benefit of this, it is primarily that the detection here explicitly ensures that python3 is used rather than python2. The fallback of aliasing is the easiest way to achieve this without having a lot of complicated logic at each site:

if(NOT Python3_Interpreter_FOUND)
  add_custom_command(...)
else()
  add_custom_command(...)
endif()

I would've expected we'd just set some new variable, eg LLVM_PYTHON_INTERPRETER to either the py2 or py3 interpreter and use that everywhere. Seems a bit less confusing, and shouldn't make anything else more complicated. But it's all going away in a few months anyways, so I don't have a strong opinion.

hans added a subscriber: hans.Feb 14 2022, 10:47 AM

What's the new cmake flag to set the python executable, and should we update llvm/docs/GettingStarted.rst to not refer to PYTHON_EXECUTABLE anymore?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2022, 10:47 AM