This is an archive of the discontinued LLVM Phabricator instance.

Differential D143295

[bazel] Move bazel configuration to a Python script
AbandonedPublic

Authored by thopre on Feb 3 2023, 2:06 PM.

Details

Reviewers

chapuni
mehdi_amini
GMNGeoffrey
MaskRay
thieta
rupprecht
thakis
aaronmondal

Summary

Move the logic that calls the bazel overlay script and creates targets.bzl and
vars.bzl into a Python script, to be used when building LLVM from an
external WORKSPACE, such as when doing a repository override for
llvm-project in the tensorflow MLIR backend. This makes the overlay easier
to set up and keeps the instructions stable across changes in
LLVM.
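
As a rough illustration of the two generated files the summary mentions, the sketch below renders their text from plain Python values. The helper names `render_vars_bzl` and `render_targets_bzl` are hypothetical, and the version numbers are made up for the example; the layout loosely follows `_write_dict_to_file` and the targets.bzl code in this revision.

```python
def render_vars_bzl(settings, source="llvm/CMakeLists.txt"):
    """Return vars.bzl text: one top-level assignment per key, then a dict."""
    lines = [f"# Generated from {source}", ""]
    lines += [f'{key} = "{value}"' for key, value in settings.items()]
    lines += ["", "llvm_vars = {"]
    lines += [f'    "{key}": "{value}",' for key, value in settings.items()]
    lines += ["}"]
    return "\n".join(lines) + "\n"


def render_targets_bzl(targets):
    """Return targets.bzl text: a single Starlark list of target names."""
    quoted = ", ".join(f'"{target}"' for target in targets)
    return f"llvm_targets = [{quoted}]\n"


# Illustrative values only; the real ones are read from llvm/CMakeLists.txt.
example_settings = {"CMAKE_CXX_STANDARD": "17", "LLVM_VERSION": "17.0.0"}
vars_text = render_vars_bzl(example_settings)
targets_text = render_targets_bzl(["AArch64", "X86"])
print(vars_text)
print(targets_text)
```

Both files are valid Starlark, so BUILD files in the overlay can `load()` them once they exist in the repository.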

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

thopre created this revision. Feb 3 2023, 2:06 PM

Herald added a project: Restricted Project. Feb 3 2023, 2:06 PM

Herald added subscribers: bzcheeseman, rriddle.

thopre requested review of this revision. Feb 3 2023, 2:06 PM

Herald added a project: Restricted Project. Feb 3 2023, 2:06 PM

Herald added subscribers: llvm-commits, stephenneuendorffer.

Harbormaster completed remote builds in B211795: Diff 494729. Feb 3 2023, 2:39 PM

aaronmondal mentioned this in D143351: [bazel] Remove unused dependency on libxml2. Feb 6 2023, 2:35 PM

Can we use some kind of formatting/linting for this script, potentially at a later time, unrelated to this relocation commit?

Running something like wemake-python-styleguide on this currently paints the entire file red. I think for now the following edits would be nice for some basic PEP8 compliance.

--- old.py        2023-02-07 00:29:01.552414476 +0100
+++ new.py        2023-02-07 00:29:02.787444050 +0100
@@ -65,7 +65,7 @@
         subprocess.run(cmd, check=True)
     except subprocess.CalledProcessError as e:
         cmd_str = " ".join([str(arg) for arg in cmd])
-        print(f"Failed to execute overlay script: '{cmd}'\n" +
+        print(f"Failed to execute overlay script: '{cmd_str}'\n" +
               f"Exited with code {e.returncode}\n" +
               f"stdout:\n{e.stdout}\n" +
               f"stderr:\n{e.stderr}\n", file=sys.stderr)
@@ -111,7 +111,7 @@
             # Skip if `CMAKE_CXX_STANDARD` is set with
             # `LLVM_REQUIRED_CXX_STANDARD`.
             # Then `v` will not be desired form, like "${...} CACHE"
-            if c[k] != None:
+            if c[k] is not None:
                 continue
 
             # Pick up 1st word as the value.
@@ -153,9 +153,9 @@
     vars = _extract_cmake_settings(llvm_cmake_path)
 
     _write_dict_to_file(
-        filepath = os.path.join(args.dst, "vars.bzl"),
-        header = "# Generated from {}\n\n".format(llvm_cmake),
-        vars = vars,
+        filepath=os.path.join(args.dst, "vars.bzl"),
+        header="# Generated from {}\n\n".format(llvm_cmake),
+        vars=vars,
     )
 
     # Create a starlark file with the requested LLVM targets.
@@ -164,6 +164,7 @@
     with open(targets_bzl_path, "w") as tgt_file:
         print(f"llvm_targets = [{targets_list_str}]", end="", file=tgt_file)
 
+
 if __name__ == "__main__":
-  _check_python_version()
-  main(parse_arguments())
+    _check_python_version()
+    main(parse_arguments())

I'd be happy to add additional edits later in a different commit if that's ok.

Also: Is there a particular reason behind _overlay_directories invoking a python subprocess instead of invoking the script in a starlark rule?

Edit: Found the suggest edit button and added the edits as suggestions 😊
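
Two of the suggested edits change behavior rather than style, and a small standalone sketch may make the difference easier to see. The command list and the `AlwaysEqual` class are hypothetical and exist only for this illustration.

```python
cmd = ["python3", "overlay_directories.py", "--target", "."]

# Interpolating the list itself prints its repr, brackets and quotes included.
raw_message = f"Failed to execute overlay script: '{cmd}'"

# Joining first yields a line that can be pasted into a shell.
cmd_str = " ".join(str(arg) for arg in cmd)
joined_message = f"Failed to execute overlay script: '{cmd_str}'"


class AlwaysEqual:
    """Hypothetical type whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
# `!=` defers to the object's own comparison, so it can be fooled;
# `is` / `is not` compare identity and cannot be overridden.
looks_like_none = not (value != None)  # noqa: E711
is_really_none = value is None

print(raw_message)
print(joined_message)
print(looks_like_none, is_really_none)
```

For the dictionary in `_extract_cmake_settings` the values are only ever `None` or strings, so the `is not None` change is about convention there, not a behavioral fix.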

aaronmondal added inline comments. Feb 6 2023, 3:45 PM

utils/bazel/configure_overlay.py: lines 69, 115, 157–159, 169–170

aaronmondal requested changes to this revision. Feb 6 2023, 3:46 PM

This revision now requires changes to proceed. Feb 6 2023, 3:46 PM

I have been using "black" to do python formatting in other places. Maybe we should make that part of the LLVM dev guide. But I think it's better if we make formatting changes separately from the functional changes.

Address black's linting errors
Call overlay code in overlay_directories directly instead of using a subprocess
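
A minimal sketch of what calling the overlay code directly looks like, assuming a function with the same shape as `overlay_directories(src, overlay, target)` from this revision. The temporary directory layout and file names are made up for the demonstration, and `os.symlink` may need extra privileges on Windows.

```python
import os
import tempfile


def overlay_directories(src, overlay, target):
    """Symlink overlay files into target, then fill the gaps from src."""
    for root, dirs, files in os.walk(overlay):
        rel_root = os.path.relpath(root, start=overlay)
        if rel_root != ".":
            os.mkdir(os.path.join(target, rel_root))
        for name in files:
            rel = os.path.join(rel_root, name)
            os.symlink(os.path.abspath(os.path.join(overlay, rel)),
                       os.path.abspath(os.path.join(target, rel)))
        for entry in os.listdir(os.path.join(src, rel_root)):
            # Directories present in the overlay are handled by the walk.
            if entry not in dirs:
                rel = os.path.join(rel_root, entry)
                os.symlink(os.path.abspath(os.path.join(src, rel)),
                           os.path.abspath(os.path.join(target, rel)))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    overlay = os.path.join(tmp, "overlay")
    target = os.path.join(tmp, "target")
    os.makedirs(os.path.join(src, "llvm"))
    os.makedirs(os.path.join(overlay, "llvm"))
    os.mkdir(target)
    open(os.path.join(src, "llvm", "CMakeLists.txt"), "w").close()
    open(os.path.join(overlay, "llvm", "BUILD.bazel"), "w").close()

    # A plain function call: no interpreter lookup, no argv round trip.
    overlay_directories(src, overlay, target)
    merged = sorted(os.listdir(os.path.join(target, "llvm")))

print(merged)
```

Errors then surface as ordinary Python exceptions with a traceback instead of a non-zero exit code that has to be reported separately.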

Excuse me, I am not a fan of this.
Could you leave the vars.bzl logic as it is, unless the Python script itself needs the vars?

I am planning for vars.bzl to provide configuration to build rules (and repository rules).

FYI, regarding _extract_cmake_settings:

It will go away once a central configuration database shared among the build systems is introduced.
It could be rewritten more cleanly in Python; I wrote it under the restrictions of Starlark.

Thank you.
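
One way the parser might look once Starlark's restrictions no longer apply is sketched below with a regular expression. This is an assumption about a possible rewrite, not code from the revision: the names `_SET_RE`, `_WANTED` and `extract_cmake_settings` are hypothetical, the sample CMake text is made up, and it still only handles single unquoted values, as the original does.

```python
import re

# Matches `set(KEY VALUE ...` where VALUE is a single unquoted word.
_SET_RE = re.compile(r"^\s*set\s*\(\s*(\w+)\s+([^\s)]+)", re.IGNORECASE)

_WANTED = (
    "CMAKE_CXX_STANDARD",
    "LLVM_VERSION_MAJOR",
    "LLVM_VERSION_MINOR",
    "LLVM_VERSION_PATCH",
)


def extract_cmake_settings(cmake_text):
    """Collect selected `set(...)` values from CMakeLists.txt text."""
    found = dict.fromkeys(_WANTED)
    for line in cmake_text.splitlines():
        match = _SET_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "LLVM_REQUIRED_CXX_STANDARD":
            # Takes precedence over any CMAKE_CXX_STANDARD seen so far.
            found["CMAKE_CXX_STANDARD"] = value
        elif key in found and found[key] is None:
            # The first assignment wins; later ones are ignored.
            found[key] = value
    found["LLVM_VERSION"] = "{}.{}.{}".format(
        found["LLVM_VERSION_MAJOR"],
        found["LLVM_VERSION_MINOR"],
        found["LLVM_VERSION_PATCH"],
    )
    return found


# Made-up excerpt in the style of llvm/CMakeLists.txt.
sample = """\
if(NOT DEFINED LLVM_VERSION_MAJOR)
  set(LLVM_VERSION_MAJOR 17)
endif()
set(LLVM_VERSION_MINOR 0)
set(LLVM_VERSION_PATCH 0)
set(LLVM_REQUIRED_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD ${LLVM_REQUIRED_CXX_STANDARD} CACHE STRING "C++ standard")
"""

settings = extract_cmake_settings(sample)
print(settings)
```

The regular expression replaces the chain of `partition` and `find` calls, while the precedence rule for `LLVM_REQUIRED_CXX_STANDARD` is kept.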

Harbormaster completed remote builds in B212354: Diff 495482. Feb 7 2023, 5:32 AM

In D143295#4109933, @chapuni wrote:

Excuse me, I am not a fan of this.
Could you leave the vars.bzl logic as it is, unless the Python script itself needs the vars?

I am planning for vars.bzl to provide configuration to build rules (and repository rules).

My point is to simplify the work needed to use LLVM as an external bazel dependency with a repository override, as documented in Tensorflow MLIR backend for instance: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/README.md
Note that those instructions are not complete as it stands because of vars.bzl which LLVM bazel build system expects but is not created if using a repository override. That's why I wanted to move it into a Python script to be able to call it instead of doing all those manual steps.

FYI, regarding _extract_cmake_settings:

It will go away once a central configuration database shared among the build systems is introduced.

What's the timeline for that to happen? Is there a patch under review already?

It could be rewritten more cleanly in Python; I wrote it under the restrictions of Starlark.

I'm happy to simplify it. Is there a Python module for handling CMakeLists.txt, or should I just improve the parsing code?

Thank you.

Thanks, no more issues from my side :) I also think it's a lot better to do this without the subprocess module 👍

I'll have to agree with @chapuni though, Bazel has exceptional support for bash scripts etc, anything that could be expressed in starlark with genrules and bash scripts seems much better to me than python. E.g. overlaying/moving files is like 10x faster in bash than it is in python.

Also, removing the python dependency would potentially make the build much more portable. I don't think this commit would be the right place to do this and I understand that pytorch/tensorflow devs may prefer using python in general.

However, I think the current functionality of the script is not something where python is actually needed. I'm not sure where else we'd need python for the LLVM build, but I'd say that requiring a rather large python dependency in every CI image is not very economical when we already have starlark and bash.

Python is already a hard requirement for many things in the LLVM ecosystem, so I don't think that is an argument. Other than that I don't know enough about Bazel to know if this is good or not.

In D143295#4110430, @aaronmondal wrote:

I'll have to agree with @chapuni though, Bazel has exceptional support for bash scripts etc, anything that could be expressed in starlark with genrules and bash scripts seems much better to me than python. E.g. overlaying/moving files is like 10x faster in bash than it is in python.

Also, removing the python dependency would potentially make the build much more portable. I don't think this commit would be the right place to do this and I understand that pytorch/tensorflow devs may prefer using python in general.

However, I think the current functionality of the script is not something where python is actually needed. I'm not sure where else we'd need python for the LLVM build, but I'd say that requiring a rather large python dependency in every CI image is not very economical when we already have starlark and bash.

Does the bazel CI run any test? If so you would already need Python for lit anyway. Also the overlay script is already in Python so continuing in Python makes sense to me. And the code I've moved to Python is very much not compute heavy so I don't think speed is really a concern.

I had assumed llvm_configure was the only public entry point for users.
I won't object if overlay_directories.py is treated as a second public entry point.

One concern: this change seems prone to missing updates to llvm/CMakeLists.txt
(for example, bazel build doesn't detect that llvm/CMakeLists.txt has changed).
That said, utils/bazel has already been flaky when changes to the directory structure need to be mirrored.
We could run bazel sync to pick up such changes, I think.

In D143295#4121029, @chapuni wrote:

I had assumed llvm_configure was the only public entry point for users.

It is, but its implementation function is not executed when using --override_repository on the name associated with its repository_rule, as documented in tensorflow's MLIR readme. I drew the wrong conclusion from that, though: I now think the problem lies in that documentation. Tensorflow has a separate external repository for LLVM before the overlay, and that is what should be overridden. I've created a PR against tensorflow to that effect. I'll keep this patch in "plan changes" state for the time being in case there is a reason for tensorflow to recommend overriding the repository defined by llvm_configure.

Thanks for all the feedback!

I won't object if overlay_directories.py is treated as a second public entry point.

One concern: this change seems prone to missing updates to llvm/CMakeLists.txt
(for example, bazel build doesn't detect that llvm/CMakeLists.txt has changed).
That said, utils/bazel has already been flaky when changes to the directory structure need to be mirrored.
We could run bazel sync to pick up such changes, I think.

As mentioned, I went for a change in how tensorflow uses a local LLVM repo instead: https://github.com/tensorflow/tensorflow/commit/64a1638818af21c33280287a95ea88ea62ea9237

Thus closing.

Revision Contents

Path                                  Size
utils/bazel/configure.bzl             113 lines
utils/bazel/configure_overlay.py      148 lines
utils/bazel/overlay_directories.py    21 lines

Diff 495482

utils/bazel/configure.bzl

@@ 27 lines not shown, context: DEFAULT_TARGETS = [ @@
     "Sparc",
     "SystemZ",
     "VE",
     "WebAssembly",
     "X86",
     "XCore",
 ]

-def _overlay_directories(repository_ctx):
+def _llvm_configure_impl(repository_ctx):
     src_path = repository_ctx.path(Label("//:WORKSPACE")).dirname
     bazel_path = src_path.get_child("utils").get_child("bazel")
-    overlay_path = bazel_path.get_child("llvm-project-overlay")
-    script_path = bazel_path.get_child("overlay_directories.py")
+    script_path = bazel_path.get_child("configure_overlay.py")

     python_bin = repository_ctx.which("python3")
     if not python_bin:
         # Windows typically just defines "python" as python3. The script itself
         # contains a check to ensure python3.
         python_bin = repository_ctx.which("python")

     if not python_bin:
         fail("Failed to find python3 binary")

+    # Create a starlark file with the requested LLVM targets.
+    targets = repository_ctx.attr.targets
     cmd = [
         python_bin,
         script_path,
-        "--src",
-        src_path,
-        "--overlay",
-        overlay_path,
-        "--target",
+        "--dst",
         ".",
-    ]
+        "--targets",
+    ] + targets
     exec_result = repository_ctx.execute(cmd, timeout = 20)

     if exec_result.return_code != 0:
-        fail(("Failed to execute overlay script: '{cmd}'\n" +
+        fail(("Failed to execute overlay configure script: '{cmd}'\n" +
               "Exited with code {return_code}\n" +
               "stdout:\n{stdout}\n" +
               "stderr:\n{stderr}\n").format(
             cmd = " ".join([str(arg) for arg in cmd]),
             return_code = exec_result.return_code,
             stdout = exec_result.stdout,
             stderr = exec_result.stderr,
         ))

-def _extract_cmake_settings(repository_ctx, llvm_cmake):
-    # The list to be written to vars.bzl
-    # `CMAKE_CXX_STANDARD` may be used from WORKSPACE for the toolchain.
-    c = {
-        "CMAKE_CXX_STANDARD": None,
-        "LLVM_VERSION_MAJOR": None,
-        "LLVM_VERSION_MINOR": None,
-        "LLVM_VERSION_PATCH": None,
-    }
-
-    # It would be easier to use external commands like sed(1) and python.
-    # For portability, the parser should run on Starlark.
-    llvm_cmake_path = repository_ctx.path(Label("//:" + llvm_cmake))
-    for line in repository_ctx.read(llvm_cmake_path).splitlines():
-        # Extract "set ( FOO bar ... "
-        setfoo = line.partition("(")
-        if setfoo[1] != "(":
-            continue
-        if setfoo[0].strip().lower() != "set":
-            continue
-
-        # `kv` is assumed as \s*KEY\s+VAL\s*\).*
-        # Typical case is like
-        # LLVM_REQUIRED_CXX_STANDARD 17)
-        # Possible case -- It should be ignored.
-        # CMAKE_CXX_STANDARD ${...} CACHE STRING "...")
-        kv = setfoo[2].strip()
-        i = kv.find(" ")
-        if i < 0:
-            continue
-        k = kv[:i]
-
-        # Prefer LLVM_REQUIRED_CXX_STANDARD instead of CMAKE_CXX_STANDARD
-        if k == "LLVM_REQUIRED_CXX_STANDARD":
-            k = "CMAKE_CXX_STANDARD"
-            c[k] = None
-        if k not in c:
-            continue
-
-        # Skip if `CMAKE_CXX_STANDARD` is set with
-        # `LLVM_REQUIRED_CXX_STANDARD`.
-        # Then `v` will not be desired form, like "${...} CACHE"
-        if c[k] != None:
-            continue
-
-        # Pick up 1st word as the value.
-        # Note: It assumes unquoted word.
-        v = kv[i:].strip().partition(")")[0].partition(" ")[0]
-        c[k] = v
-
-    # Synthesize `LLVM_VERSION` for convenience.
-    c["LLVM_VERSION"] = "{}.{}.{}".format(
-        c["LLVM_VERSION_MAJOR"],
-        c["LLVM_VERSION_MINOR"],
-        c["LLVM_VERSION_PATCH"],
-    )
-
-    return c
-
-def _write_dict_to_file(repository_ctx, filepath, header, vars):
-    # (fci + individual vars) + (fcd + dict items) + (fct)
-    fci = header
-    fcd = "\nllvm_vars={\n"
-    fct = "}\n"
-
-    for k, v in vars.items():
-        fci += '{} = "{}"\n'.format(k, v)
-        fcd += ' "{}": "{}",\n'.format(k, v)
-
-    repository_ctx.file(filepath, content = fci + fcd + fct)
-
-def _llvm_configure_impl(repository_ctx):
-    _overlay_directories(repository_ctx)
-
-    llvm_cmake = "llvm/CMakeLists.txt"
-    vars = _extract_cmake_settings(
-        repository_ctx,
-        llvm_cmake,
-    )
-
-    _write_dict_to_file(
-        repository_ctx,
-        filepath = "vars.bzl",
-        header = "# Generated from {}\n\n".format(llvm_cmake),
-        vars = vars,
-    )
-
-    # Create a starlark file with the requested LLVM targets.
-    targets = repository_ctx.attr.targets
-    repository_ctx.file(
-        "llvm/targets.bzl",
-        content = "llvm_targets = " + str(targets),
-        executable = False,
-    )
-
 llvm_configure = repository_rule(
     implementation = _llvm_configure_impl,
     local = True,
     configure = True,
     attrs = {
         "targets": attr.string_list(default = DEFAULT_TARGETS),
     },
 )
@@ 22 lines not shown @@

utils/bazel/configure_overlay.py

This file was added.

#!/bin/python3
# This file is licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
"""
Configure an overlay of the LLVM source tree that this file is part of and
the bazel files available in a subdirectory and generate the extra files
needed by the bazel build system.
"""

import os
import sys
import argparse

from overlay_directories import overlay_directories


def _check_python_version():
    if sys.version_info[:2] < (3, 6):
        raise RuntimeError(
            "Must be invoked with a python 3.6+ interpreter but was {}"
            + "Python {}".format(sys.version_info)
        )


def parse_arguments():
    parser = argparse.ArgumentParser(
        description="""
Configure an overlay of the LLVM source tree that this file is part of and
the bazel files available in a subdirectory and generate the extra files
needed by the bazel build system.
"""
    )
    parser.add_argument(
        "--dst",
        required=True,
        help="Directory in which to place the overlay.",
    )
    parser.add_argument(
        "--targets",
        required=True,
        nargs="+",
        help="Space separated list of targets to build LLVM for.",
    )
    args = parser.parse_args()
    return args


def _extract_cmake_settings(llvm_cmake_path):
    # The list to be written to vars.bzl
    # `CMAKE_CXX_STANDARD` may be used from WORKSPACE for the toolchain.
    c = {
        "CMAKE_CXX_STANDARD": None,
        "LLVM_VERSION_MAJOR": None,
        "LLVM_VERSION_MINOR": None,
        "LLVM_VERSION_PATCH": None,
    }

    with open(llvm_cmake_path, "r") as f:
        for line in f:
            # Extract "set ( FOO bar ... "
            setfoo = line.partition("(")
            if setfoo[1] != "(":
                continue
            if setfoo[0].strip().lower() != "set":
                continue

            # `kv` is assumed as \s*KEY\s+VAL\s*\).*
            # Typical case is like
            # LLVM_REQUIRED_CXX_STANDARD 17)
            # Possible case -- It should be ignored.
            # CMAKE_CXX_STANDARD ${...} CACHE STRING "...")
            kv = setfoo[2].strip()
            i = kv.find(" ")
            if i < 0:
                continue
            k = kv[:i]

            # Prefer LLVM_REQUIRED_CXX_STANDARD instead of CMAKE_CXX_STANDARD
            if k == "LLVM_REQUIRED_CXX_STANDARD":
                k = "CMAKE_CXX_STANDARD"
                c[k] = None
            if k not in c:
                continue

            # Skip if `CMAKE_CXX_STANDARD` is set with
            # `LLVM_REQUIRED_CXX_STANDARD`.
            # Then `v` will not be desired form, like "${...} CACHE"
            if c[k] is not None:
                continue

            # Pick up 1st word as the value.
            # Note: It assumes unquoted word.
            v = kv[i:].strip().partition(")")[0].partition(" ")[0]
            c[k] = v

    # Synthesize `LLVM_VERSION` for convenience.
    c["LLVM_VERSION"] = "{}.{}.{}".format(
        c["LLVM_VERSION_MAJOR"],
        c["LLVM_VERSION_MINOR"],
        c["LLVM_VERSION_PATCH"],
    )

    return c


def _write_dict_to_file(filepath, header, vars):
    # (fci + individual vars) + (fcd + dict items) + (fct)
    fci = header
    fcd = "\nllvm_vars={\n"
    fct = "}\n"

    for k, v in vars.items():

        fci += f'{k} = "{v}"\n'
        fcd += f' "{k}": "{v}",\n'

    with open(filepath, "w") as f:
        f.write(fci + fcd + fct)


def main(args):
    file_dir = os.path.dirname(os.path.abspath(__file__))
    src_root = os.path.abspath(os.path.join(file_dir, "../.."))
    overlay_path = os.path.join(file_dir, "llvm-project-overlay")
    overlay_directories(src_root, overlay_path, args.dst)

    llvm_cmake = "llvm/CMakeLists.txt"
    llvm_cmake_path = os.path.join(src_root, llvm_cmake)
    vars = _extract_cmake_settings(llvm_cmake_path)

    _write_dict_to_file(
        filepath=os.path.join(args.dst, "vars.bzl"),
        header="# Generated from {}\n\n".format(llvm_cmake),
        vars=vars,
    )

    # Create a starlark file with the requested LLVM targets.
    targets_bzl_path = os.path.join(args.dst, "llvm", "targets.bzl")
    targets_list_str = ", ".join([f'"{tgt}"' for tgt in args.targets])
    with open(targets_bzl_path, "w") as tgt_file:
        print(f"llvm_targets = [{targets_list_str}]", end="", file=tgt_file)


if __name__ == "__main__":
    _check_python_version()
    main(parse_arguments())

utils/bazel/overlay_directories.py

@@ 58 lines not shown, context: def parse_arguments(): @@
     return args


 def _symlink_abs(from_path, to_path):
     os.symlink(os.path.abspath(from_path), os.path.abspath(to_path))


-def main(args):
-    for root, dirs, files in os.walk(args.overlay):
+def overlay_directories(src, overlay, target):
+    for root, dirs, files in os.walk(overlay):
         # We could do something more intelligent here and only symlink individual
         # files if the directory is present in both overlay and src. This could also
         # be generalized to an arbitrary number of directories without any
         # "src/overlay" distinction. In the current use case we only have two and
         # the overlay directory is always small, so putting that off for now.
-        rel_root = os.path.relpath(root, start=args.overlay)
+        rel_root = os.path.relpath(root, start=overlay)
         if rel_root != ".":
-            os.mkdir(os.path.join(args.target, rel_root))
+            os.mkdir(os.path.join(target, rel_root))

         for file in files:
             relpath = os.path.join(rel_root, file)
-            _symlink_abs(os.path.join(args.overlay, relpath),
-                         os.path.join(args.target, relpath))
+            _symlink_abs(os.path.join(overlay, relpath),
+                         os.path.join(target, relpath))

-        for src_entry in os.listdir(os.path.join(args.src, rel_root)):
+        for src_entry in os.listdir(os.path.join(src, rel_root)):
             if src_entry not in dirs:
                 relpath = os.path.join(rel_root, src_entry)
-                _symlink_abs(os.path.join(args.src, relpath),
-                             os.path.join(args.target, relpath))
+                _symlink_abs(os.path.join(src, relpath),
+                             os.path.join(target, relpath))


 if __name__ == "__main__":
     _check_python_version()
-    main(parse_arguments())
+    args = parse_arguments()
+    overlay_directories(args.src, args.overlay, args.target)
