This is an archive of the discontinued LLVM Phabricator instance.

Adding cmake to the exported list of tarballs
Abandoned Public

Authored by kwk on Jan 26 2022, 7:45 AM.

Details

Summary

Since 0eed292fbae22a8856682b07e1cb968424b49941 or
https://reviews.llvm.org/D88458 there exists a top-level cmake directory
that is shared by some projects when they add the containing Modules
directory to the CMAKE_MODULE_PATH.

When you're doing standalone builds of LLVM and its subprojects, there's
currently no way to get a tarball for this cmake directory. This patch
changes this. It generates a git archive tarball for the cmake
directory (~5.0K in size).

If you don't want to consume the all-in-one llvm-project tarball (~101M
in size) for a small subproject, then we need this change.

Event Timeline

kwk created this revision. Jan 26 2022, 7:45 AM
kwk requested review of this revision. Jan 26 2022, 7:45 AM
Herald added a project: Restricted Project.

@kkleine I have not tested that approach, but would it be possible to bundle the content of the cmake directory within each of the other tarballs, so that the existence of cmake/ is transparent to downstream users?

I'm not 100% sure it's a good thing to do, as it makes building from tarball even more different from building from the mono repo, but as a packager I do see value in the approach.

kwk added a comment. Jan 28 2022, 7:03 AM

@kkleine I have not tested that approach, but would it be possible to bundle the content of the cmake directory within each of the other tarballs, so that the existence of cmake/ is transparent to downstream users?

I'm not 100% sure it's a good thing to do, as it makes building from tarball even more different from building from the mono repo, but as a packager I do see value in the approach.

First answering your question about the technical possibility:

git archive has an option (--add-file) to add a file to the generated archive. You can use that option multiple times to add more than one file. Each added file is placed at the root of the tarball. I think it is best to show you how this could be done. For simplicity I'll just use tar and not tar.xz:

# Let's create the shared cmake tarball once for all projects to consume it:
cd llvm-project/
cd cmake/
git archive --prefix=cmake-shared/ HEAD . > cmake-shared.tar

# Let's pick a small project (`libcxx`) so it's quick and easy to get results.
# We'll create the tarball for that project (analogous to how we do it in `export.sh`).
cd ../libcxx
git archive --add-file=../cmake/cmake-shared.tar --prefix=libcxx/ HEAD . > libcxx-HEAD.tar

# Now let's print out the top level elements in the resulting tarball
tar -tf libcxx-HEAD.tar | cut -d "/" -f 1 | sort | uniq

# Output:
# cmake-shared.tar
# libcxx

NOTICE: The resulting tarball contains the cmake-shared.tar as a standalone file. This is the cleanest way I could find to get the shared cmake directory into the outer-tarball.
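For a downstream consumer, the nested layout means two extraction steps. A minimal sketch of that follows. It simulates the tarball layout with plain tar instead of git archive so that it runs without a checkout; the file contents and the `unpacked` directory name are made up for illustration:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Simulate the layout shown above (hypothetical file contents):
# an outer tarball holding libcxx/ plus cmake-shared.tar as a plain file.
mkdir -p src/cmake-shared/Modules src/libcxx
echo "# module" > src/cmake-shared/Modules/ExtendPath.cmake
echo "project(libcxx)" > src/libcxx/CMakeLists.txt
tar -cf "$work/src/cmake-shared.tar" -C "$work/src" cmake-shared
tar -cf "$work/libcxx-HEAD.tar" -C "$work/src" cmake-shared.tar libcxx

# Consumer side: unpack the outer tarball, then the nested one.
mkdir "$work/unpacked"
tar -xf "$work/libcxx-HEAD.tar" -C "$work/unpacked"
tar -xf "$work/unpacked/cmake-shared.tar" -C "$work/unpacked"
rm "$work/unpacked/cmake-shared.tar"

# List the top-level entries after both extraction steps.
ls "$work/unpacked"
```

The second extraction is the extra step every packager would have to know about with this layout.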

Then, there's another way that looked promising but produced a rather odd-looking tree structure in the end.

Instead of adding a single nested tarball, we could add all the files from the top-level cmake directory individually:

git archive $(for f in $(find ../cmake/ -type f); do echo "--add-file=$f"; done) --prefix=libcxx/ HEAD . > libcxx-HEAD.tar

tar -tf libcxx-HEAD.tar | cut -d "/" -f 1 | sort | uniq

# Output:
# CheckLinkerFlag.cmake
# EnableLanguageNolink.cmake
# ExtendPath.cmake
# FindPrefixFromConfig.cmake
# HandleCompilerRT.cmake
# HandleOutOfTreeLLVM.cmake
# libcxx
# README.rst
# SetPlatformToolchainTools.cmake

NOTICE: The cmake files are no longer nested. Their placement in the subtree is gone.
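One possible way around the flattening, assuming a Git version whose git archive supports --add-file and, as its documentation describes, gives each added file the last --prefix that precedes it: emit a --prefix/--add-file pair per file. The sketch below builds a tiny throwaway repository with hypothetical contents instead of using llvm-project. It is an illustration only, not something export.sh does:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical miniature monorepo standing in for llvm-project.
git init -q .
mkdir -p cmake/Modules libcxx
echo "# module" > cmake/Modules/ExtendPath.cmake
echo "readme" > cmake/README.rst
echo "project(libcxx)" > libcxx/CMakeLists.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "init"

cd libcxx

# Build one "--prefix=<dir>/ --add-file=<file>" pair per cmake file.
# Each --add-file picks up the --prefix given right before it, so the
# directory part of the path survives in the archive.
set --
for f in $(cd .. && find cmake -type f | sort); do
    set -- "$@" "--prefix=$(dirname "$f")/" "--add-file=../$f"
done

# The last --prefix applies to the tracked files of the project itself.
git archive "$@" --prefix=libcxx/ HEAD . > "$work/libcxx-HEAD.tar"

listing=$(tar -tf "$work/libcxx-HEAD.tar")
printf '%s\n' "$listing"
```

The loop relies on word splitting, so it assumes file names without whitespace, which holds for the cmake directory listed above.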

Now that I've presented what's possible, I want to talk a bit about the scenario and whether it is really such a big problem to include the cmake tarball in packages that need it.

  1. Yes, it's bad that packages need to know about this. But the time had to come eventually for projects to factor out shared pieces.
  2. The llvm subproject is needed by many other projects. Do we include it in them? No. Sure, it is much bigger in size, but I'm just looking at what the current practice is.
  3. Do we really want to include the shared cmake tree in projects that don't use it (yet)? What if they suddenly do: do we refactor this script again to include them? Why not make this a pull-in approach and leave it to the packagers to deal with? For clang in Fedora we already consume clang and clang-tools-extra, and I think it doesn't necessarily hurt to include a third tarball there.

kwk added a comment. Jan 28 2022, 7:46 AM

One thing I have to try is adding a file to the end of the archive. On tar level that should just work.

kwk added a comment. Edited Jan 28 2022, 8:28 AM

One thing I have to try is adding a file to the end of the archive. On tar level that should just work.

As long as we're happy with not doing everything in one bash pipe but instead:

  1. create a plain *.tar archive with git archive
  2. append the cmake directory structure to the tarball as a top-level directory
  3. then compress the tarball with xz

Then we can do it:

cd llvm-project/libcxx

# Remove any leftover artifact (-f: don't fail if there is none)
rm -f libcxx-HEAD.tar

# Create pure tar archive
git archive --prefix=libcxx/ -o libcxx-HEAD.tar HEAD .

# Show last files in tarball
# Notice the absence of anything starting with cmake/

tar -tf libcxx-HEAD.tar | tail

# Output:
# libcxx/utils/libcxx/test/params.py
# libcxx/utils/libcxx/test/target_info.py
# libcxx/utils/libcxx/util.py
# libcxx/utils/merge_archives.py
# libcxx/utils/run.py
# libcxx/utils/ssh.py
# libcxx/utils/sym_diff.py
# libcxx/utils/symcheck-blacklists/
# libcxx/utils/symcheck-blacklists/linux_blacklist.txt
# libcxx/utils/symcheck-blacklists/osx_blacklist.txt


# Show that the git PAX header can properly be read before appending to the archive
cat libcxx-HEAD.tar | git get-tar-commit-id 

# Output:
# d22958975bf8b55ac50aed6b0068e7fb69cff2c2

# Append the cmake tree
tar --append -f libcxx-HEAD.tar ../cmake

# Output:
# tar: Removing leading `../' from member names
# tar: Removing leading `../' from hard link targets

# Show that the cmake/ tree was added 
tar -tf libcxx-HEAD.tar | tail

# Output: 
# cmake/
# cmake/Modules/
# cmake/Modules/CheckLinkerFlag.cmake
# cmake/Modules/EnableLanguageNolink.cmake
# cmake/Modules/ExtendPath.cmake
# cmake/Modules/FindPrefixFromConfig.cmake
# cmake/Modules/HandleCompilerRT.cmake
# cmake/Modules/HandleOutOfTreeLLVM.cmake
# cmake/Modules/SetPlatformToolchainTools.cmake
# cmake/README.rst

# Show that libcxx is still there
tar -tf libcxx-HEAD.tar | head -n1

# Output:
# libcxx/

# Show that the Git PAX header hasn't changed:
cat libcxx-HEAD.tar | git get-tar-commit-id 

# Output:
# d22958975bf8b55ac50aed6b0068e7fb69cff2c2
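The three steps can also be condensed into one short script. The following is a sketch under assumptions rather than the content of any patch: it uses a throwaway repository with hypothetical contents, passes -C to tar so that no leading `../` has to be stripped, and only compresses when xz happens to be installed:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical miniature monorepo standing in for llvm-project.
git init -q .
mkdir -p cmake/Modules libcxx
echo "# module" > cmake/Modules/ExtendPath.cmake
echo "project(libcxx)" > libcxx/CMakeLists.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "init"

out="$work/libcxx-HEAD.tar"

# 1. Create a plain tar archive of the subproject.
(cd libcxx && git archive --prefix=libcxx/ -o "$out" HEAD .)

# 2. Append cmake/ as a top-level directory. -C switches to the
#    repository root first, so member names start with cmake/.
tar -rf "$out" -C "$work" cmake

# The commit id stored by git archive should still be readable.
commit_id=$(git get-tar-commit-id < "$out")
listing=$(tar -tf "$out")

# 3. Compress, keeping the uncompressed tarball for inspection.
if command -v xz >/dev/null 2>&1; then
    xz -k "$out"
fi

printf '%s\n' "$listing"
printf 'commit: %s\n' "$commit_id"
```

Appending has to happen before compression, which is why this cannot stay a single pipe into xz.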

I've created the patch D118481 for this and we can see with which one we want to proceed.

serge-sans-paille added a comment. Edited Feb 1 2022, 9:19 AM

As a packager, https://reviews.llvm.org/D118481 looks like a better approach because it removes a build dependency, but it also makes each tarball differ slightly (by a small margin, I guess). @tstellar, I'm curious about your opinion.

kwk abandoned this revision. Feb 11 2022, 2:59 AM

Abandoning this patch in favor of D118481.