This is an archive of the discontinued LLVM Phabricator instance.

Add use_default_shell_env = True to ctx.actions.run
ClosedPublic

Authored by Flamefire on Sep 16 2021, 3:26 AM.

Details

Summary

When building a tool in a non-standard environment (e.g. custom
compiler path -> LD_LIBRARY_PATH set) then
use_default_shell_env = True is required to run that tool in the same
environment or otherwise the build will fail due to missing symbols.
See https://github.com/google/jax/issues/7842 for this issue and
https://github.com/tensorflow/tensorflow/pull/44549 for related fix in
TF.

Diff Detail

Event Timeline

Flamefire created this revision.Sep 16 2021, 3:26 AM
Flamefire requested review of this revision.Sep 16 2021, 3:26 AM

I've done quite a bit of hunting around and reading about default_shell_env (why wasn't this in the documentation at https://docs.bazel.build/versions/main/skylark/lib/actions.html#run.use_default_shell_env?). From my reading, it appears that use_default_shell_env will include the entire local shell environment as part of each Bazel invocation. This has implications for build hermeticness and especially for shared caching, so I'm a bit reluctant to do it. AFAICT if you set --action_env=LD_LIBRARY_PATH and --host_action_env=LD_LIBRARY_PATH then actions (in target and host configuration, respectively) *should* have the LD_LIBRARY_PATH in their env. That's a decision that can be made in a more localized way so that individual users/projects can make their own tradeoff between hermeticness and actually getting their build to work :-P Following the thread on https://github.com/bazelbuild/bazel/issues/12579 and https://github.com/tensorflow/tensorflow/pull/44549 it seems that --host_action_env is broken and was fixed by https://github.com/bazelbuild/bazel/commit/e6670825b1 and then, if I'm reading GitHub UI's correctly, backported to the 4.x LTS release with https://github.com/bazelbuild/bazel/commit/80cf90e51a. Further, I believe that --incompatible_strict_action_env means that the default shell env *doesn't* inherit PATH, LD_LIBRARY_PATH, etc, so there is still a way to recover the more cache-friendly and hermetic behavior. I'm actually not sure in that case what the default_shell_env *does* contain and what shell env the action gets if not the default one. None, I guess? So still some confusion there.

So to evaluate the best path forward, could you try out whether building with Bazel 4.2 and --host_action_env=LD_LIBRARY_PATH fixes the issue for you? I'll also do a bit more investigation of what's going on with default_shell_env

GMNGeoffrey edited the summary of this revision. (Show Details)Sep 16 2021, 4:29 PM
Flamefire added a comment.EditedSep 17 2021, 1:57 AM

AFAICT if you set --action_env=LD_LIBRARY_PATH and --host_action_env=LD_LIBRARY_PATH then actions (in target and host configuration, respectively) *should* have the LD_LIBRARY_PATH in their env.

jax builds with --distinct_host_configuration=false which means all action_env should also be passed to host_action_env. We also pass e.g. --action_env=PYTHONPATH for our builds.
Yet as you can see in https://github.com/google/jax/issues/7842 the env is literally empty. So I'm pretty sure the --host_action_env=LD_LIBRARY_PATH won't work.
Furthermore we can't really use Bazel 4.2 as the underlying TensorFlow isn't compatible with Bazel 4.x yet: https://github.com/tensorflow/tensorflow/blob/v2.6.0/configure.py#L53

I'm actually not sure in that case what the default_shell_env *does* contain and what shell env the action gets if not the default one. None, I guess? So still some confusion there.

Yeah, very much looks like None while as far as I can tell stuff like LD_LIBRARY_PATH and action_envs are passed if use_default_shell_env is set. The TF guys accepted my PR which adds this to the TF repo, so I guess even they don't know a better way.

Re --incompatible_strict_action_env I found https://github.com/bazelbuild/bazel/issues/6648 that mentions that --action_env *should* be passed, but clearly is not -.-
However if you search the Bazel repo that whole thing is a mess:

See https://github.com/bazelbuild/bazel/issues/3320, https://github.com/bazelbuild/bazel/issues/12059, https://github.com/bazelbuild/bazel/issues/11163 to name a few general ones
https://github.com/bazelbuild/bazel/issues/3057:

Bazel does not forward --action_env=NAME, but it does forward --action_env=NAME=VALUE.

What??? As this is a repository rule I tried --action_env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH but still nothing passed

https://github.com/bazelbuild/bazel/issues/6392:

Are you sure that params passed to --action_env are passed to repository execute calls? That sounds completely wrong.

I can't follow that reasoning -.-

So in conclusion: I don't see any other option but this.

Edit: Same results with Bazel 4.2.1

GMNGeoffrey accepted this revision.Sep 17 2021, 10:21 AM

(╯°□°)╯︵ ┻━┻

Ugh yeah let's just add this for now at least. I'll follow up to see if we can get some clarity on wtf is going on with action environments. I assume you don't have commit access and would like me to land this for you?

This revision is now accepted and ready to land.Sep 17 2021, 10:21 AM

And if so, please lmk what you'd like me to put for the git author field (name and email)

mehdi_amini added inline comments.Sep 17 2021, 10:59 AM
utils/bazel/llvm-project-overlay/mlir/tblgen.bzl
172

Can you capture the rational presented above as a comment before this option?

keith added a subscriber: keith.Sep 17 2021, 12:23 PM

@GMNGeoffrey Updated the diff to include a comment. And yes please land this. Author tag is "Alexander Grund <alexander.grund@tu-dresden.de>", thanks.

This revision was automatically updated to reflect the committed changes.

Landed. I added a bit more detail to the comment.

Oh and I asked for more explanation for all this on https://groups.google.com/g/bazel-discuss/c/jHaj11mm6sU