This is an archive of the discontinued LLVM Phabricator instance.

[llvm] Improve export.sh with help and snapshot
ClosedPublic

Authored by kwk on Apr 28 2021, 5:20 AM.

Details

Summary

This change adds the ability to create source tarballs for unreleased or untagged code by providing the --git-ref <GIT_REF> flag to the llvm/utils/release/export.sh script. This is useful for creating daily snapshot tarballs that can easily be consumed by packagers who want to build a daily snapshot.

The default behavior of export.sh hasn't changed.

You may also provide a --template argument to say how the artifacts
are supposed to be named (as suggested by @hans).

The -help output of export.sh was changed quite significantly to look like this:

Export the Git sources and build tarballs from them.

Usage: export.sh [-release|--release <major>.<minor>.<patch>]
                      [-rc|--rc <num>]
                      [-final|--final]
                      [-git-ref|--git-ref <git-ref>]
                      [-template|--template <template>]

Flags:

  -release  | --release <major>.<minor>.<patch>    The version number of the release
  -rc       | --rc <num>                           The release candidate number
  -final    | --final                              When provided, this option will disable the rc flag
  -git-ref  | --git-ref <git-ref>                  (optional) Use <git-ref> to determine the release and don't export the test-suite files
  -template | --template <template>                (optional) Possible placeholders: $PROJECT $YYYYMMDD $GIT_REF $RELEASE $RC.
                                                   Defaults to '${PROJECT}-${RELEASE}${RC}.src.tar.xz'.

The following list shows the filenames (with <placeholders>) for the artifacts
that are being generated (given that you don't touch --template).

  * llvm-<RELEASE><RC>.src.tar.xz
  * clang-<RELEASE><RC>.src.tar.xz
  * compiler-rt-<RELEASE><RC>.src.tar.xz
  * libcxx-<RELEASE><RC>.src.tar.xz
  * libcxxabi-<RELEASE><RC>.src.tar.xz
  * libclc-<RELEASE><RC>.src.tar.xz
  * clang-tools-extra-<RELEASE><RC>.src.tar.xz
  * polly-<RELEASE><RC>.src.tar.xz
  * lldb-<RELEASE><RC>.src.tar.xz
  * lld-<RELEASE><RC>.src.tar.xz
  * openmp-<RELEASE><RC>.src.tar.xz
  * libunwind-<RELEASE><RC>.src.tar.xz
  * flang-<RELEASE><RC>.src.tar.xz

Additional files being generated:

  * llvm-project-<RELEASE><RC>.src.tar.xz    (the complete LLVM source project)
  * test-suite-<RELEASE><RC>.src.tar.xz      (only when not using --git-ref)

To ease the creation of snapshot builds, we also provide these files

  * llvm-release-<YYYYMMDD>.txt        (contains the <RELEASE> as a text)
  * llvm-rc-<YYYYMMDD>.txt             (contains the rc version passed to the invocation of export.sh)
  * llvm-git-revision-<YYYYMMDD>.txt   (contains the current git revision sha1)

Example values for the placeholders:

  * <RELEASE>  -> 13.0.0
  * <YYYYMMDD> -> 20210414
  * <RC>       -> rc4        (will be empty when using --git-ref)

In order to generate snapshots of the upstream main branch you could do this for example:

  export.sh --git-ref upstream/main --template '${PROJECT}-${YYYYMMDD}.src.tar.xz'
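The placeholder substitution described in the help text can be sketched in plain bash. This is only an illustration: expand_template and its NAME=value argument convention are hypothetical, the sketch handles only the ${NAME} form of the placeholders, and it is not claimed to be how export.sh implements --template.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from export.sh): replace ${NAME} placeholders
# in a template with values passed as NAME=value arguments.
expand_template() {
    local out=$1
    shift
    local pair name value placeholder
    for pair in "$@"; do
        name=${pair%%=*}               # text before the first '='
        value=${pair#*=}               # text after the first '='
        placeholder='${'"$name"'}'     # literal ${NAME}
        # Quoting "$placeholder" makes bash match it literally, not as a glob.
        out=${out//"$placeholder"/$value}
    done
    printf '%s\n' "$out"
}
```

With the default template and the example values above, expand_template '${PROJECT}-${RELEASE}${RC}.src.tar.xz' PROJECT=llvm RELEASE=13.0.0 RC=rc4 prints llvm-13.0.0rc4.src.tar.xz.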

Event Timeline

kwk created this revision.Apr 28 2021, 5:20 AM
kwk requested review of this revision.Apr 28 2021, 5:20 AM
Herald added a project: Restricted Project.
kwk updated this revision to Diff 341159.Apr 28 2021, 5:22 AM
  • Fixup
kwk updated this revision to Diff 341163.Apr 28 2021, 5:41 AM
  • Fixup
kwk edited the summary of this revision.Apr 28 2021, 5:41 AM
kwk added inline comments.
llvm/utils/release/export.sh
31

This wasn't used anywhere.

Harbormaster completed remote builds in B101387: Diff 341158.
Harbormaster completed remote builds in B101390: Diff 341163.
kwk edited the summary of this revision.May 5 2021, 1:04 AM
kwk added reviewers: aaronpuchert, sscalpone, hans.
hans added a comment.May 5 2021, 2:06 AM

Is having <yyyymmdd> in the filename really the most useful thing? I'd imagine having the git ref directly might be more useful, or perhaps the output of "git-describe" on it. Or maybe it should have both the git-ref and date in the filename?
And when using the date, perhaps it should use the date of the commit (commitdate, not authordate I guess) rather than the date when the script is run?
Or maybe the script should take a template argument for the filenames?
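The two alternatives raised here, the committer date of the exported commit and the output of git describe, can both be obtained with standard git commands. The following is a minimal sketch; commit_yyyymmdd and describe_ref are hypothetical helper names and are not part of export.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the alternatives to "date of the script run".

# Print the committer date of <ref> in <repo> as YYYYMMDD.
commit_yyyymmdd() {
    local repo=$1 ref=$2
    git -C "$repo" log -1 --format=%cd --date=format:%Y%m%d "$ref"
}

# Print the nearest tag (plus distance and abbreviated hash, if any);
# --always falls back to the abbreviated hash when no tag is reachable.
describe_ref() {
    local repo=$1 ref=$2
    git -C "$repo" describe --tags --always "$ref"
}
```

Both values depend only on the commit, so two runs on the same ref yield the same name, whereas a run-date stamp changes from day to day.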

llvm/utils/release/export.sh
37

The test-release.sh script (and maybe others?) has a "-git-ref" option. Maybe that's a better name than "-snapshot", for consistency?

57

Additional to what?

This is useful for creating daily snapshot tarballs that can easily be consumed by packagers who want to build a daily snapshot.

My understanding (derived from the containing directory's name) is that this script is used for creating the official release tarballs to be published on releases.llvm.org or GitHub. Are there plans to publish snapshots?

Short of that it's just a "create tarball for a given git hash", which isn't really LLVM-specific. Packagers will often have their own preferred way of creating snapshots, e.g. in Open Build Service I'd use a source service to create the snapshots automatically.

But that's not up to me. If @hans and @tstellar are fine with this, it's good to go.

Is having <yyyymmdd> in the filename really the most useful thing? I'd imagine having the git ref directly might be more useful, or perhaps the output of "git-describe" on it. Or maybe it should have both the git-ref and date in the filename?

Makes sense to me, after all the result should ideally be reproducible. (Prior to compression at least.)

llvm/utils/release/export.sh
95–101

Seems like this duplicates 77-83, maybe remove those lines?

kwk updated this revision to Diff 343509.May 6 2021, 3:20 PM
  • Rename outward facing option --snapshot to --git-ref
kwk edited the summary of this revision.May 6 2021, 3:21 PM
kwk updated this revision to Diff 343510.May 6 2021, 3:22 PM
  • Remove duplicate code
kwk added a comment.May 6 2021, 3:23 PM

Is having <yyyymmdd> in the filename really the most useful thing? I'd imagine having the git ref directly might be more useful, or perhaps the output of "git-describe" on it. Or maybe it should have both the git-ref and date in the filename?
And when using the date, perhaps it should use the date of the commit (commitdate, not authordate I guess) rather than the date when the script is run?
Or maybe the script should take a template argument for the filenames?

Okay, the question is valid, but I have a good reason for wanting the date on which the script is run. As a package maintainer I already generate daily source tarballs and keep the last three days in this pre-release of my fork: https://github.com/kwk/llvm-project/releases/tag/source-snapshot.

Now, when I want to build a package today, I can create the download URL simply by putting the current date into this URL: https://github.com/kwk/llvm-project/releases/download/source-snapshot/llvm-<YYYYMMDD>.src.tar.xz. And if I want to know which git revision or LLVM version this was, I can query these URLs respectively:

If the date in here were anything other than the current date, I could not construct the download URL. Having the date seems a bit old-school, yes, but it is certainly easier to consume than the date of a commit, which you cannot know up front.
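The point about constructing the URL from nothing but today's date can be sketched as follows. snapshot_url is a hypothetical helper; the base URL is the fork's pre-release mentioned in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the snapshot tarball URL for a given day.
# Defaults to today's date, which is the only input a consumer needs.
snapshot_url() {
    local base=https://github.com/kwk/llvm-project/releases/download/source-snapshot
    local yyyymmdd=${1:-$(date +%Y%m%d)}
    printf '%s/llvm-%s.src.tar.xz\n' "$base" "$yyyymmdd"
}
```

A consumer could then download today's tarball with curl -fLO "$(snapshot_url)", without first asking the repository which commit was exported.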

This is useful for creating daily snapshot tarballs that can easily be consumed by packagers who want to build a daily snapshot.

My understanding (derived from the containing directory's name) is that this script is used for creating the official release tarballs to be published on releases.llvm.org or GitHub. Are there plans to publish snapshots?

Like I mentioned to @hans above, I think we could utilize GitHub to generate daily source tarballs for us and store them in a pre-release. The advantage of a pre-release in this case is that it doesn't show up on the repo's front page. I'm using my own fork's pre-release: https://github.com/kwk/llvm-project/releases/tag/source-snapshot and I'm using my own GitHub workflow to generate this on a schedule: https://github.com/kwk/llvm-project/blob/snapshot/.github/workflows/re-build-source-snapshot.yml. Here you can see the individual runs of this action: https://github.com/kwk/llvm-project/actions/workflows/re-build-source-snapshot.yml.

Short of that it's just a "create tarball for a given git hash", which isn't really LLVM-specific. Packagers will often have their own preferred way of creating snapshots, e.g. in Open Build Service I'd use a source service to create the snapshots automatically.

But that's not up to me. If @hans and @tstellar are fine with this, it's good to go.

Is having <yyyymmdd> in the filename really the most useful thing? I'd imagine having the git ref directly might be more useful, or perhaps the output of "git-describe" on it. Or maybe it should have both the git-ref and date in the filename?

Makes sense to me, after all the result should ideally be reproducible. (Prior to compression at least.)

I agree that it's a bit of a sacrifice. When you run the export script twice a day you get two different source tarballs for the same day that carry the same name. But I'd say that with the additional files, you can always know which revision went into the tarball: https://github.com/kwk/llvm-project/releases/download/source-snapshot/llvm-git-revision-<YYYYMMDD>.txt.

I've played with both solutions, i.e. having the git checkout and tarball creation in my snapshot pipeline or outsourcing it. I must say that I liked the outsourced version better, simply because it makes packaging a breeze. All I have to do is take our regular RPM spec files and prefix the content with something like this, which is easily generated, given the llvm-git-revision-<YYYYMMDD>.txt and llvm-version-<YYYYMMDD>.txt files:

################################################################################
# BEGIN SNAPSHOT PREFIX
################################################################################

%global _with_snapshot_build 1
%bcond_with snapshot_build

%if %{with snapshot_build}
%global llvm_snapshot_yyyymmdd 20210506
%global llvm_snapshot_version 13.0.0
%global llvm_snapshot_git_revision 207b08a9130bd8167851f9e053f5d67bfd1969c8

# Split version
%global llvm_snapshot_version_major %{lua: print(string.match(rpm.expand("%{llvm_snapshot_version}"), "[0-9]+"));}
%global llvm_snapshot_version_minor %{lua: print(string.match(rpm.expand("%{llvm_snapshot_version}"), "%p([0-9]+)%p"));}
%global llvm_snapshot_version_patch %{lua: print(string.match(rpm.expand("%{llvm_snapshot_version}"), "%p([0-9]+)$"));}

# Shorten git revision
%global llvm_snapshot_git_revision_short %{lua: print(string.sub(rpm.expand("%llvm_snapshot_git_revision"), 0, 14));}
%endif

################################################################################
# END SNAPSHOT PREFIX
################################################################################
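How the three snapshot-specific %global lines of such a prefix might be generated from the metadata files is sketched below. write_snapshot_prefix is a hypothetical helper, and the file names are an assumption taken from the help text in the summary (llvm-release-<YYYYMMDD>.txt and llvm-git-revision-<YYYYMMDD>.txt); this comment calls the first file llvm-version-<YYYYMMDD>.txt instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the snapshot-specific %global lines of the RPM
# prefix from the metadata files found in <dir> for day <yyyymmdd>.
write_snapshot_prefix() {
    local dir=$1 yyyymmdd=$2
    local version revision
    # Strip the trailing newline (and any stray whitespace) from each file.
    version=$(tr -d '[:space:]' < "$dir/llvm-release-$yyyymmdd.txt") || return 1
    revision=$(tr -d '[:space:]' < "$dir/llvm-git-revision-$yyyymmdd.txt") || return 1
    cat <<EOF
%global llvm_snapshot_yyyymmdd $yyyymmdd
%global llvm_snapshot_version $version
%global llvm_snapshot_git_revision $revision
EOF
}
```

The output can be concatenated with the static parts of the prefix and the regular spec file.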

All we needed to do was to have special snapshot treatment fenced with %if %{with snapshot_build}:

%if %{with snapshot_build}
Source0:        https://github.com/kwk/llvm-project/releases/download/source-snapshot/llvm-%{llvm_snapshot_yyyymmdd}.src.tar.xz
%else
Source0:        https://github.com/llvm/llvm-project/releases/download/llvmorg-%{version}%{?rc_ver:-rc%{rc_ver}}/%{llvm_srcdir}.tar.xz
Source1:        https://github.com/llvm/llvm-project/releases/download/llvmorg-%{version}%{?rc_ver:-rc%{rc_ver}}/%{llvm_srcdir}.tar.xz.sig
Source2:        tstellar-gpg-key.asc
%endif

Sorry to bore you with this RPM stuff. But I think it shows a lot of benefit.

llvm/utils/release/export.sh
37

Agreed. Renamed.

57

Additional files to the source tarballs.

95–101

Oh, sure. I must have missed that. Done.

kwk updated this revision to Diff 343512.May 6 2021, 3:24 PM
  • Fixup renaming --snapshot to --git-ref

Somehow the patch got messed up, I think. The changes that I see look like the changes since the previous patch, not the new patch.

Like I mentioned to @hans above, I think we could utilize GitHub to generate daily source tarballs for us and store them in a pre-release.

So you want to build daily snapshots as RPM, and for that need the sources to be on GitHub to have them official?

Sorry to bore you with this RPM stuff. But I think it shows a lot of benefit.

Not boring me, openSUSE works with RPM as well. Though our build service can generate tarballs from git repositories automatically, so if I wanted to build snapshots I would probably just use that.

(I think you have to squash the changes, Arcanist seems to take only the last commit.)

kwk updated this revision to Diff 343928.May 9 2021, 1:20 PM
  • Restoring patch
kwk added a comment.May 9 2021, 1:27 PM

Somehow the patch got messed up, I think. The changes that I see look like the changes since the previous patch, not the new patch.

This should be fixed now. Sorry for the mess.

Like I mentioned to @hans above, I think we could utilize GitHub to generate daily source tarballs for us and store them in a pre-release.

So you want to build daily snapshots as RPM, and for that need the sources to be on GitHub to have them official?

In short, yes.

Sorry to bore you with this RPM stuff. But I think it shows a lot of benefit.

Not boring me, openSUSE works with RPM as well. Though our build service can generate tarballs from git repositories automatically, so if I wanted to build snapshots I would probably just use that.

Generating the tarballs takes twelve minutes or so. For myself I wanted to move this out of the build pipeline for the RPMs. From my short past experience with building snapshots, it wasn't LLVM in particular that needed to be adjusted in order for a build to go through. It was more about fixing paths or removing patches that had already landed upstream. The feedback time for building an RPM that eventually fails was drastically reduced for me by relying on pre-built source tarballs. One other great benefit is that I can re-use the same RPM spec file for building snapshots and regular releases. This makes it easier to track changes that go into the next release.

(I think you have to squash the changes, Arcanist seems to take only the last commit.)

Please let me know if this is fixed now.

kwk edited the summary of this revision.May 9 2021, 1:27 PM
hans added a comment.May 10 2021, 6:44 AM

I don't feel strongly about this one way or the other.

Tom is the main user of this script, so I think it's up to him to decide what to do here.

I'm fine with the concept of having an option for generating a snapshot tarball. I understand that it may not be useful to everyone, but we already ship tarballs as part of our releases and if we were to do official snapshots for the project, we would want to generate the tarballs in the same way. I do agree with Hans that it would be nice if there was some way to uniquely identify the tarball (e.g. with the git hash) otherwise if you do multiple tarballs a day, they would all have the same name.

kwk updated this revision to Diff 345168.May 13 2021, 9:05 AM

Introducing --template <template> to get what reviewers want.

kwk updated this revision to Diff 345169.May 13 2021, 9:06 AM

Fixup YYYYMMD -> YYYYMMDD

kwk edited the summary of this revision.May 13 2021, 9:18 AM
kwk marked 3 inline comments as done.May 19 2021, 1:31 PM

@hans @tstellar @aaronpuchert Can you please give this some final review? I've addressed your comments from before about a template for the filename for example.

hans added a comment.May 25 2021, 1:23 AM

Looks okay to me, but I think Tom still needs to sign off on this since he's the main user of the script.

tstellar added inline comments.Jul 29 2021, 7:28 AM
llvm/utils/release/export.sh
83

This is getting the release version from the LLVM source that contains export.sh, but it should be getting the version from the source that will be packaged.

kwk added inline comments.Aug 13 2021, 4:52 AM
llvm/utils/release/export.sh
83

Oh boy, that's a very good catch!

kwk added a comment.Aug 24 2021, 8:16 AM

@tstellar What do you think of fetching the CMakeLists.txt file from GitHub?

llvm/utils/release/export.sh
83

I wonder if we can live with a solution that relies on the fact that the LLVM source lives on GitHub. Then it is a matter of fetching a branch or a particular revision (NOT the local mirror/branch naming). To do a snapshot of one particular revision we could do this:

curl -s https://raw.githubusercontent.com/llvm/llvm-project/bcc29e0fcf24a74ef0ec68365afb020787ab0a88/llvm/CMakeLists.txt | grep -ioP 'set\(\s*LLVM_VERSION_(MAJOR|MINOR|PATCH)\s\K[0-9]+' | paste -sd '.'

And for the main branch we would do this:

curl -s https://raw.githubusercontent.com/llvm/llvm-project/main/llvm/CMakeLists.txt | grep -ioP 'set\(\s*LLVM_VERSION_(MAJOR|MINOR|PATCH)\s\K[0-9]+' | paste -sd '.'

Feel free to copy this to your local shell to try it out. Of course, the help text needs to be adjusted so that no one uses their local remote and branch naming schemes.

The limitation of this approach is that you cannot release something that only you have locally, which is probably not a good practice anyway.

kwk planned changes to this revision.Aug 24 2021, 8:41 AM
kwk added inline comments.
llvm/utils/release/export.sh
83

Or we could do this instead:

# Get CMakeLists.txt for a particular revision.

t=$(mktemp)
mv /path/to/llvm-project/llvm/CMakeLists.txt "$t"
git checkout <git-ref> -- /path/to/llvm-project/llvm/CMakeLists.txt

# Grep the version from CMakeLists.txt
# ... the code exists here already.

# Restore from the backed up file:
mv "$t" /path/to/llvm-project/llvm/CMakeLists.txt

The downside is that we need to overwrite code in the local repo for a limited time frame.

You can also use git show <treeish>:<file>, e.g. git show llvmorg-12.0.1:llvm/CMakeLists.txt.

kwk added a comment.Aug 25 2021, 6:25 AM

You can also use git show <treeish>:<file>, e.g. git show llvmorg-12.0.1:llvm/CMakeLists.txt.

This looks lovely. That's what I'll go for. Thank you so much @aaronpuchert !

kwk updated this revision to Diff 368621.Aug 25 2021, 6:33 AM
kwk edited the summary of this revision.

Using git show to get the CMakeLists.txt file in the correct version.
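Reading the version from the ref that is being packaged, rather than from the working tree that contains export.sh, can be sketched as below. version_at_ref is a hypothetical helper, not code from the patch, and it uses sed -E instead of the grep -P expression quoted earlier so that it does not depend on a grep built with PCRE support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print <major>.<minor>.<patch> as recorded in
# llvm/CMakeLists.txt at <ref>, without touching the working tree.
version_at_ref() {
    local repo=$1 ref=$2
    # <ref>:<path> names the file as stored in that revision's tree.
    git -C "$repo" show "$ref:llvm/CMakeLists.txt" |
        sed -nE 's/^[[:space:]]*set\([[:space:]]*LLVM_VERSION_(MAJOR|MINOR|PATCH)[[:space:]]+([0-9]+).*/\2/p' |
        paste -sd. -
}
```

Because git show reads from the object database, local edits to llvm/CMakeLists.txt do not influence the result.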

kwk updated this revision to Diff 368626.Aug 25 2021, 6:51 AM

Rebased onto upstream/main

kwk planned changes to this revision.Aug 26 2021, 1:26 AM

Need to take care of this error: fatal: path '/home/runner/work/llvm-project/llvm-project/llvm/CMakeLists.txt' exists on disk, but not in 'upstream/main'

kwk updated this revision to Diff 368830.Aug 26 2021, 1:40 AM

Deal with fatal: path '/home/runner/work/llvm-project/llvm-project/llvm/CMakeLists.txt' exists on disk, but not in 'upstream/main'
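The absolute path in this message suggests one likely cause: in git show <ref>:<path>, the path is resolved against the root of the revision's tree, so a filesystem path is not found there. The sketch below assumes that reading; show_at_ref is a hypothetical helper and is not claimed to be the fix that landed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print <file> as it exists at <ref>, where <file> is a
# filesystem path inside a git working tree.
show_at_ref() {
    local file=$1 ref=$2 dir prefix
    dir=$(dirname "$file")
    # --show-prefix prints the directory's path relative to the repository
    # root (for example "llvm/"), which is what <ref>:<path> expects.
    prefix=$(git -C "$dir" rev-parse --show-prefix) || return 1
    git -C "$dir" show "$ref:$prefix$(basename "$file")"
}
```

Passing the repository-relative path directly, as in git show <ref>:llvm/CMakeLists.txt, avoids the problem in the same way.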

kwk added a comment.Aug 26 2021, 1:53 AM

@tstellar the error you've found about the CMakeLists.txt is now fixed.

tstellar accepted this revision.Sep 24 2021, 3:26 PM

LGTM, thanks.

This revision is now accepted and ready to land.Sep 24 2021, 3:26 PM
This revision was landed with ongoing or failed builds.Sep 24 2021, 3:38 PM
This revision was automatically updated to reflect the committed changes.