Page MenuHomePhabricator

llvmbuildectomy
AbandonedPublic

Authored by serge-sans-paille on Oct 9 2020, 9:50 AM.

Details

Summary

Replace LLVMBuild.txt and llvm-build script by LLVMBuild.cmake and a cmake script.

As a side effect, prune now unused parts of llvm-build, like the ability to generate Makefile Fragment.
It also removes LLVMBuild.txt for binaries (and the Tool componenet type) as they were not used anymore.
This removes a Python dependency for the build system.

Diff Detail

Event Timeline

serge-sans-paille requested review of this revision.Oct 9 2020, 9:50 AM
asl added a subscriber: asl.Oct 9 2020, 10:00 AM

Can we fold all these LLVMBuild.cmake files into the corresponding cmakelists.txt?

Wow pretty neat to see llvmbuild being removed!

That said we are still depending on it (we generate Bazel build file from this and use this in many projects using MLIR like TensorFlow, XLA, IREE...).
We need to move away from it: can you ping me and/or @GMNGeoffrey when you get closer to have this in a landing state?

In D89142#2322179, @asl wrote:

Can we fold all these LLVMBuild.cmake files into the corresponding cmakelists.txt?

I think we *can* but I'd like to do that in another patch, just for the sake of incremental steps.
The nice thing with this patch is that the content of LLVMBuild.cmake and LLVMBuild.txt are pretty similar (the .cmake was of course automatically generated from the .txt)

That said we are still depending on it (we generate Bazel build file from this and use this in many projects using MLIR like TensorFlow, XLA, IREE...).

I feared that scenario so much :-)

We need to move away from it: can you ping me and/or @GMNGeoffrey when you get closer to have this in a landing state?

Deal.

thieta added a subscriber: thieta.Oct 10 2020, 2:47 AM

Wow pretty neat to see llvmbuild being removed!

That said we are still depending on it (we generate Bazel build file from this and use this in many projects using MLIR like TensorFlow, XLA, IREE...).
We need to move away from it: can you ping me and/or @GMNGeoffrey when you get closer to have this in a landing state?

Could the generation of Bazel files be done by parsing the compile_commands.json maybe? that seems like a more "portable" approach since it's a pretty standard format these days.

Wow pretty neat to see llvmbuild being removed!

That said we are still depending on it (we generate Bazel build file from this and use this in many projects using MLIR like TensorFlow, XLA, IREE...).
We need to move away from it: can you ping me and/or @GMNGeoffrey when you get closer to have this in a landing state?

Could the generation of Bazel files be done by parsing the compile_commands.json maybe? that seems like a more "portable" approach since it's a pretty standard format these days.

It isn't clear to me how: the compile_commands.json seems much lower level and does not contain the information we need which is about libraries dependencies. I don't think the library dependency is expressed in the individual compile commands or easy to recover from this.
Bazel is a more "strict" build system where you can't even include a header from another library if you don't declare the dependency.

Wow pretty neat to see llvmbuild being removed!

That said we are still depending on it (we generate Bazel build file from this and use this in many projects using MLIR like TensorFlow, XLA, IREE...).
We need to move away from it: can you ping me and/or @GMNGeoffrey when you get closer to have this in a landing state?

Could the generation of Bazel files be done by parsing the compile_commands.json maybe? that seems like a more "portable" approach since it's a pretty standard format these days.

No. compile_commands.json records compile actions but there is no organization of libraries and no dependency information.

Wow pretty neat to see llvmbuild being removed!

That said we are still depending on it (we generate Bazel build file from this and use this in many projects using MLIR like TensorFlow, XLA, IREE...).
We need to move away from it: can you ping me and/or @GMNGeoffrey when you get closer to have this in a landing state?

Could the generation of Bazel files be done by parsing the compile_commands.json maybe? that seems like a more "portable" approach since it's a pretty standard format these days.

We're working on hand-written Bazel BUILD files at https://github.com/google/llvm-bazel, so the generation from LLVMBuild.txt (or something else) shouldn't be necessary. Of course automatically converting between systems would be nice, but is practically pretty difficult, especially since Bazel tends to track more things than CMake, so inherently CMake doesn't have all the information Bazel needs. Anyway we should be able to shift people over to the new ones, but they're not quite ready yet (in particular working on detecting and setting all the configuration options for various platforms is WIP).

  • Improve cmake function documentation
  • Cleanup code
  • Remove obsolete reference

I tried to organize the code so that it's possible for a third part tool to « consume » the LLVMBuild.cmake files in order to generate their own requirements.
Note that to the opposite of LLVMBuild.txt which described Tool and Libraries, we currently only describe Libraries. The cmake generator was not using the Tool part anymore, but this may still be relevant for other build generators (?)

Finish documentation update

@mehdi_amini and @GMNGeoffrey : no longer a WIP, I'll be happy to discuss how this affects bazel / mlir builds.

Thanks serge to work on this. I am pretty excited to get rid of llvm-build. I applied the patch on Windows and got two failures:

FAIL: LLVM :: tools/llvm-config/booleans.test (37437 of 39803)
******************** TEST 'LLVM :: tools/llvm-config/booleans.test' FAILED ********************
C:\Users\meinersbur\src\llvm-project\llvm\test\tools\llvm-config\booleans.test:25:20: error: CHECK-SHARED-MODE: expected string not found in input
CHECK-SHARED-MODE: {{static|shared}}
                   ^
<stdin>:1:1: note: scanning from here
llvm-config: unknown component name: all
^
<stdin>:1:16: note: possible intended match here
llvm-config: unknown component name: all
               ^

Input file: <stdin>
Check file: C:\Users\meinersbur\src\llvm-project\llvm\test\tools\llvm-config\booleans.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: llvm-config: unknown component name: all
check:25'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
check:25'1                    ?                         possible intended match
>>>>>>

error: command failed with exit status: 1

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: LLVM :: tools/llvm-config/system-libs.windows.test (37441 of 39803)
******************** TEST 'LLVM :: tools/llvm-config/system-libs.windows.test' FAILED ********************
C:\Users\meinersbur\src\llvm-project\llvm\test\tools\llvm-config\system-libs.windows.test:5:8: error: CHECK: expected string not found in input
CHECK: psapi.lib shell32.lib ole32.lib uuid.lib advapi32.lib
       ^
<stdin>:1:1: note: scanning from here
llvm-config: unknown component name: all
^
<stdin>:1:6: note: possible intended match here
llvm-config: unknown component name: all
     ^

Input file: <stdin>
Check file: C:\Users\meinersbur\src\llvm-project\llvm\test\tools\llvm-config\system-libs.windows.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: llvm-config: unknown component name: all
check:5'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
check:5'1          ?                                   possible intended match
>>>>>>

error: command failed with exit status: 1

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: tools/llvm-config/booleans.test
  LLVM :: tools/llvm-config/system-libs.windows.test


Testing Time: 480.42s
  Unsupported      : 16954
  Passed           : 22759
  Expectedly Failed:    88
  Failed           :     2
serge-sans-paille edited the summary of this revision. (Show Details)

Fix issues spotted by @Meinersbur

LLVM's policy so far was that downstream concerns should not block upstream progress. That being said, if it's just some time needed for downstream to prepare, we could consider waiting a courtesy.

mehdi_amini added a comment.EditedOct 20 2020, 7:13 PM

LLVM's policy so far was that downstream concerns should not block upstream progress. That being said, if it's just some time needed for downstream to prepare, we could consider waiting a courtesy.

Absolutely, we don't have a hard blocker, but the timing is just a bit annoying for us as we are in the middle of the migration. Ideally if we could wait a few weeks that would be better for us (and for all the downstream projects building with Bazel).

(Right now the patch hasn't been reviewed/accepted yet though?)

Do we really need a LLVMBuild.cmake for every component? I hoped it would be possible to derive that information from CMake's dependency graph (which would avoid them being out-of-sync). If it's not feasible to get that information from CMake, why not use add_llvm_library in the CMakeLists.txt to pass that information?

llvm/cmake/modules/LLVM-Build.cmake
17

Even if already covered in LLVMBuild.rst, could you document which arguments are accepted here as well?

tschuett added a comment.EditedOct 24 2020, 1:17 PM

How about keeping the Bazel files in-tree?

  • more customers
  • better visibility
  • lower maintenance costs
  • ...

How about keeping the Bazel files in-tree?

  • more customers
  • better visibility
  • lower maintenance costs
  • ...

RFC coming soon :-)

This comment was removed by tschuett.

Do we really need a LLVMBuild.cmake for every component? I hoped it would be possible to derive that information from CMake's dependency graph (which would avoid them being out-of-sync). If it's not feasible to get that information from CMake, why not use add_llvm_library in the CMakeLists.txt to pass that information?

That sounds feasible, but:

  • An upside of current approach is that a third-part system can run cmake independently to read the config files
  • I wanted to keep the change relatively minimal. So I'd rather do that in another commit
llvm/cmake/modules/LLVM-Build.cmake
17

I started doing that, but felt it's ultra-redundant. I'll reference the original doc to avoid duplication.

  • An upside of current approach is that a third-part system can run cmake independently to read the config files

Sound like the intended use case for https://cmake.org/cmake/help/latest/manual/cmake-server.7.html

  • I wanted to keep the change relatively minimal. So I'd rather do that in another commit

Since it's rewriting each LLVMBuild.txt, I wouldn't call it minimal. It may however, it reduce risks by keeping the solutions similar to each other, but also also reduces the benefit. That is, what is the advantage of replacing scripts written in Python by the same scripts rewritten in CMake? Practically, we still need Python for llvm-lit, extract_symbols.py, libc (gen_hdr.py), libcxxabi, compiler-rt (gen_dynamic_list.py), ...

I see the biggest disadvantage of LLVMBuild that one has to redundantly specify library dependencies that must be kept consistent and other subprojects (such as clang) don't even use. This approach doesn't fix that.

Since it's rewriting each LLVMBuild.txt, I wouldn't call it minimal. It may however, it reduce risks by keeping the solutions similar to each other, but also also reduces the benefit. That is, what is the advantage of replacing scripts written in Python by the same scripts rewritten in CMake? Practically, we still need Python for llvm-lit, extract_symbols.py, libc (gen_hdr.py), libcxxabi, compiler-rt (gen_dynamic_list.py), ...

I see the biggest disadvantage of LLVMBuild that one has to redundantly specify library dependencies that must be kept consistent and other subprojects (such as clang) don't even use. This approach doesn't fix that.

Current step also removes (and not replace) a few LLVMBuild.txt, whenever it was only used to describe tools dependencies. And it makes the scope of the change well defined and easier to tackle than a whole python package.

That bein g said, I agree it's only the first step toward removing the LLVMBuild.* files, and the redunduncy between these files and the regular CMakeFiles.txt. I just don't want to do all the changes at once, and this one is already a big change!

[style] The indentation convention for CMake files is two spaces.

That bein g said, I agree it's only the first step toward removing the LLVMBuild.* files, and the redunduncy between these files and the regular CMakeFiles.txt. I just don't want to do all the changes at once, and this one is already a big change!

It's still a detour. More work to get the intermediate state to a good state, its risk to break something, the confusion it creates to contributors. In case of Bazel, the intermediate state might also cause additional work (Bazel people may want to comment on that). Now, you already did the work with this patch (which is of high quality), but as a means to explore the end solution, we don't actually need to commit it.

It would be great to get some other opinions as well that see more value than I do.

llvm/cmake/modules/LLVM-Build.cmake
17

Like with doxygen comments, it's the first point of reference for the definition, independent of whether additional documentation exists. IMHO a short description is helpful.

176–177

[nit] surplus space

Since it's rewriting each LLVMBuild.txt, I wouldn't call it minimal. It may however, it reduce risks by keeping the solutions similar to each other, but also also reduces the benefit. That is, what is the advantage of replacing scripts written in Python by the same scripts rewritten in CMake? Practically, we still need Python for llvm-lit, extract_symbols.py, libc (gen_hdr.py), libcxxabi, compiler-rt (gen_dynamic_list.py), ...

I see the biggest disadvantage of LLVMBuild that one has to redundantly specify library dependencies that must be kept consistent and other subprojects (such as clang) don't even use. This approach doesn't fix that.

Current step also removes (and not replace) a few LLVMBuild.txt, whenever it was only used to describe tools dependencies. And it makes the scope of the change well defined and easier to tackle than a whole python package.

That bein g said, I agree it's only the first step toward removing the LLVMBuild.* files, and the redunduncy between these files and the regular CMakeFiles.txt. I just don't want to do all the changes at once, and this one is already a big change!

What's the next step after this current patch?

[style] The indentation convention for CMake files is two spaces.

That bein g said, I agree it's only the first step toward removing the LLVMBuild.* files, and the redunduncy between these files and the regular CMakeFiles.txt. I just don't want to do all the changes at once, and this one is already a big change!

It's still a detour. More work to get the intermediate state to a good state, its risk to break something, the confusion it creates to contributors. In case of Bazel, the intermediate state might also cause additional work (Bazel people may want to comment on that).

It won't make a different for our Bazel generator script I believe.

About the incrementality, I assume this is the main point:

Do we really need a LLVMBuild.cmake for every component? I hoped it would be possible to derive that information from CMake's dependency graph (which would avoid them being out-of-sync). If it's not feasible to get that information from CMake, why not use add_llvm_library in the CMakeLists.txt to pass that information?

Is there an agreement that we want to remove these LLVMBuild.cmake files and move everything to the main CMakeLists.txt files?

It isn't clear why we can't just do it immediately, but I haven't thought it through really. If Serge perceives the current patch as a useful incremental step before addressing to the next steps, I don't mind doing it this way either.

What's the next step after this current patch?

I'll take the example of LLVMVEctorize. Current build specification (including this patch) it split in two:

add_llvm_component_library(LLVMVectorize
  LoadStoreVectorizer.cpp
  LoopVectorizationLegality.cpp
  LoopVectorize.cpp
  SLPVectorizer.cpp
  Vectorize.cpp
  VectorCombine.cpp
  VPlan.cpp
  VPlanHCFGBuilder.cpp
  VPlanPredicator.cpp
  VPlanSLP.cpp
  VPlanTransforms.cpp
  VPlanVerifier.cpp

  ADDITIONAL_HEADER_DIRS
  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Vectorize

  DEPENDS
  intrinsics_gen
  )

and

LLVMBuildComponent(
    TYPE Library
    NAME Vectorize
    PARENT Transforms
    LIBRARY_NAME Vectorize
    REQUIRED_LIBRARIES Analysis Core Support TransformUtils
)

Next step is two merge these two calls into a single one, at the add_llvm_component_library level. Thanks to the translation of the Python script into cmake script, this merge should be relatively easy. See how the intermediate step looks useful ;-) ?

I've posted a WIP approach in https://reviews.llvm.org/D90848 . It showcases what the final merge could look like, and I *really* like what it looks like in the end.
I don't mind *not* applying that patch and merging the two in one single patch, if that better suits the reviewers. The intermediate step was important to me, but may not matter from an commit history point of view.