This is an archive of the discontinued LLVM Phabricator instance.

[bazel] Port zstd support
ClosedPublic

Authored by aaronmondal on Feb 5 2023, 7:49 AM.

Details

Summary

Originally added in D128465. Used by llvm:Support and lld:ELF.

Enabled by default. Disable with --@llvm_zstd//:llvm_enable_zstd=false.

Diff Detail

Event Timeline

aaronmondal created this revision.Feb 5 2023, 7:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2023, 7:49 AM
aaronmondal requested review of this revision.Feb 5 2023, 7:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2023, 7:49 AM
  • attempt to make patch applicable
  • Fix typo (zlib -> zstd).
  • Remove redundant llvm_zstd_enabled config setting.

Does this need a rebase?

MaskRay accepted this revision.Mar 28 2023, 10:15 AM

Thanks!

This revision is now accepted and ready to land.Mar 28 2023, 10:15 AM

Rerun against tests. Previous failure was most likely unrelated to this change,
but let's make sure.

GMNGeoffrey accepted this revision.Mar 28 2023, 2:53 PM
GMNGeoffrey added a subscriber: GMNGeoffrey.

Couple nits, but generally LGTM

utils/bazel/WORKSPACE
126

Can we use new_git_repository here instead?

utils/bazel/third_party_build/zstd.BUILD
18–21

I think I commented on this in another change, but can we make this a positive setting rather than a negative? I think it's easier to grok if you don't have to apply a negation.

54

We could try strip_include_prefix here. I think it avoids some of the problems with "includes" that make them pollute the whole dependency graph.

Failure fixed. Let's try again.

aaronmondal added inline comments.Mar 28 2023, 3:08 PM
utils/bazel/WORKSPACE
126

This was because of the hash failure for archives with github right? I'm not sure whether new_git_repository is a good idea here since we are using a stable archive (which AFAIK is guaranteed to be hash-stable). At least according to the bazel docs

Prefer http_archive to git_repository. The reasons are:

  • Git repository rules depend on system git(1) whereas the HTTP downloader is built into Bazel and has no system dependencies.
  • http_archive supports a list of urls as mirrors, and git_repository supports only a single remote.
  • http_archive works with the repository cache, but not git_repository. See #5116 for more information.

I'm only really worried about the third point here. You know the best practices better though, so I'm happy to change this if you prefer new_git_repository.

  • Make the config_setting positive to reduce cognitive overhead
  • Change include to strip_include_prefix so that we don't accidentally add a -Ilib which might lead issues down the line.
aaronmondal marked 2 inline comments as done and an inline comment as not done.Mar 28 2023, 3:22 PM

strip_include_prefix needs a string, not a list of strings.

GMNGeoffrey accepted this revision.Mar 28 2023, 3:57 PM
GMNGeoffrey added inline comments.
utils/bazel/WORKSPACE
126

Oh I actually didn't know about that third point. Yeah that sucks. You make good points that this is pretty stable. I guess I'm used to frequently-updating dependencies like LLVM :-) Let's keep it as http_archive.

An aside on hash stability:

Those archives are only sort-of guaranteed to be hash stable because everyone depends on that. gzip doesn't actually guarantee hash-stability by the spec and neither does git guarantee git archive stability. It recently changed the algorithm in a way that changed the hash. That broke the world when GitHub updated its git version: https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes. It seems that GitHub had a guarantee that "release assets" would be hash-stable and many people interpreted that to include the "source archive" link present on the release page (right next to the things manually uploaded), but GitHub didn't. Separately, someone from Bazel had received a private commitment from support that those would remain stable, but obviously that commitment didn't involve anyone who actually worked on the relevant code :-D Anyway it was a fun dumpster fire, but I still think that it would be better if Bazel provided a way that didn't rely on these unstable hashes. Probably using the GitHub API with the ref arg, as suggested in those GitHub docs, would provide a way to at least have a new_github_repository rule. Anyway, this is all fine for now :-)

aaronmondal added inline comments.Mar 28 2023, 4:25 PM
utils/bazel/WORKSPACE
126

Thanks for linking that article I didn't know about this.

If you rely on stable archives for security (ensuring you don’t accidentally trigger a tarbomb, for example), we recommend you switch to release assets instead of using source downloads.

Well looks like those living at head will seriously need to look for new options 🤣

it would be better if Bazel provided a way that didn't rely on these unstable hashes

Yep. This is definitely a big issue. I'm actually surprised that the git_archive variants are *less* cacheable than http_archives. My intuition was like "hey if a new commit comes in just treat it like a git pull" or something, but apparently that's not the case. I'll keep track of https://github.com/bazelbuild/bazel/pull/15536 so that we can switch when the git repo rules allow caching.

This revision was automatically updated to reflect the committed changes.

Looks like we didn't make this easy for downstream users to opt out of zstd: they still have to configure it (even if they were to pass the flag). We should add this to llvm_disable_optional_support_deps as well: https://github.com/llvm/llvm-project/blob/b66a6e4b104b245ed448e3dea944fa7e842a96b2/utils/bazel/configure.bzl#L178

aaronmondal added a comment.EditedMar 29 2023, 4:04 PM

Forced 3x speedup across the board xD. Jk this is not good. I think I wrote this after we already had a variant of https://reviews.llvm.org/D143320 running. I'll pick that back up. D143320 was lacking proper support for environment variables. I think fixing that should resolve this issue. Do you want me to revert for now or do you think this is ok until tomorrow?

I think tomorrow is fine. For people coming here because they don't have zstd, your options are:

  1. Copy the stanza from this WORKSPACE file into your own:
maybe(
    http_archive,
    name = "llvm_zstd",
    build_file = "@llvm-raw//utils/bazel/third_party_build:zstd.BUILD",
    sha256 = "7c42d56fac126929a6a85dbc73ff1db2411d04f104fae9bdea51305663a83fd0",
    strip_prefix = "zstd-1.5.2",
    urls = [
        "https://github.com/facebook/zstd/releases/download/v1.5.2/zstd-1.5.2.tar.gz"
    ],
)
  1. Create an empty @llvm_ztd//:zstd library
cc_library(name = "zstd")

and configure it in the WORKSPACE. For instance, use local_repository pointing at a directory containing the BUILD.bazel file and an empty WORKSPACE file, or use new_local_repository referencing that BUILD.bazel file and empty workspace file content:

# WORKSPACE
local_repository(
  name = "llvm_zstd",
  path = "third_party/zstd",
)

where that build file with the empty cc_library is at third_party/zstd/BUILD.bazel and there's an empty file at third_party/zstd/WORKSPACE

OR

# WORKSPACE
new_local_repository(
  name = "llvm_zstd",
  build_file_content="""cc_library(name="zstd")""",
  workspace_file_content="",
  path="<some empty directory. Any empty directory>"
)

Where you have to actually have some empty directory to point it, which is annoying.

@GMNGeoffrey Unfortunately I think reverting might be better after all. Making this work with WORKSPACES requires wayy too much boilerplate. Same issue for D143320.

Adding the http_archive somewhere with a maybe will break bzlmod builds in the future as module extensions intentionally don't support maybe. The current implementation doesn't work well with WORKSPACEs, so that's also not good. Using config_settings seems like a good way in general, but they probably should be in the llvm/BUILD.bazel and not in third_party/zstd.BUILD. After playing around with this for way too long I'm now concluding that having the selects in the llvm BUILD files might be the best way to go.

I also think we need to have a clear way of how external deps in general should be handled. I think we *either* need to enforce configuration via .bazelrc *or* configuration via the llvm_configure rule. I'm leaning towards handling it in llvm_configure, but then we'll need to change the attributes of that rule.

Maybe we should set a "best effort" goal of replicating the CMake configuration flags in llvm_configure and additional bazel specific flags in .bazelrc. This way we could stay close to the CMake build, let downstream users to declare a specific configuration via llvm_configure and let users that depend on bazel-specific config options (e.g. custom dep overrides) to do this via .bazelrc.

Overall there are just too many things problematic with the current state of this patch to roll forward.

Hmm I'm against reverting after all.

This issue is one of the reasons why bzlmod was created in the first place. See https://bazel.build/external/overview#shortcomings_of_the_workspace_system.

Going from there, I think the current workaround is actually the best way forward. It's not pretty, but it seems about as good as it gets with WORKSPACEs. External dependencies *should* be explicitly specified in the workspace file unless we want to adopt a deps.bzl pattern which we'd have to remove again when Bazel 7 deprecates WORKSPACEs (also this pattern has been discouraged for a long time now already).

What is currently posted as workaround is probably the best way to handle all external dependencies at the moment and I think this will lead to less friction once we get bzlmod ready.