This is an archive of the discontinued LLVM Phabricator instance.

Added Dockerfiles to build clang from sources.
ClosedPublic

Authored by ilya-biryukov on Jun 14 2017, 1:01 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

ilya-biryukov created this revision.Jun 14 2017, 1:01 AM
klimek added inline comments.Jun 14 2017, 1:15 AM
docs/Docker.rst
8–9 ↗(On Diff #102498)

Grammatically a bit weird. Perhaps "You can find a number of sources to build docker images for clang in llvm/utils/docker".

14 ↗(On Diff #102498)

placeholders

30 ↗(On Diff #102498)

Perhaps add ", especially to maintain releases for software deployed to large distributed fleets." or somesuch.

utils/docker/build_docker_image.sh
40 ↗(On Diff #102498)

Perhaps check that docker is installed at the beginning. Also, did you by chance try all this on a mac?

utils/docker/debian8/build/Dockerfile
12 ↗(On Diff #102498)

I'd have expected "LLVM Developers <...>"? Chandler, opinions?

utils/docker/nvidia-cuda/build/build_install_clang.sh
1 ↗(On Diff #102498)

Are these build_install_clang.sh scripts actually different at all?

Updated docs to include non-obvious decisions made due to Docker restrictions.

ilya-biryukov marked 3 inline comments as done.

Updated docs to address review comments.

ilya-biryukov marked an inline comment as done.Jun 14 2017, 2:15 AM
ilya-biryukov added inline comments.
utils/docker/nvidia-cuda/build/build_install_clang.sh
1 ↗(On Diff #102498)

They're all the same because Docker doesn't allow symlinked files to be added to docker images.
Added an explanation for this in docs/Docker.rst.
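For illustration only (the paths are hypothetical, not the exact ones from this patch): because ADD/COPY won't dereference a symlink into the image, the build setup has to place a real copy of the shared script next to each Dockerfile, e.g.:

  # Sketch: materialize a real copy of the shared script in each build context,
  # since docker build would otherwise ship only a (dangling) symlink.
  cp -L scripts/build_install_clang.sh debian8/build/build_install_clang.sh
  cp -L scripts/build_install_clang.sh nvidia-cuda/build/build_install_clang.sh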

ilya-biryukov marked an inline comment as done.

Changed maintainer label to "LLVM Developers".

ilya-biryukov added inline comments.Jun 14 2017, 2:20 AM
utils/docker/debian8/build/Dockerfile
12 ↗(On Diff #102498)

Makes sense.
Changed the label to "LLVM Developers", without an e-mail address.

Check if docker is installed in build_docker_image.sh

ilya-biryukov added inline comments.Jun 14 2017, 2:31 AM
utils/docker/build_docker_image.sh
40 ↗(On Diff #102498)

Added the check.
Haven't tried on a Mac yet. Will do that.
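A check along these lines would suffice (a sketch, not necessarily the exact code added to build_docker_image.sh):

  # Sketch of a "docker is installed" guard near the top of the script.
  command -v docker >/dev/null 2>&1 || {
    echo "error: docker is not installed or not in PATH" >&2
    exit 1
  }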

This revision is now accepted and ready to land.Jun 14 2017, 4:40 AM
jlebar added a subscriber: jlebar.Jun 14 2017, 9:32 AM
jlebar added inline comments.
docs/Docker.rst
2 ↗(On Diff #102506)

(I got cc'ed because you used the word "cuda". :)

One thing that would be helpful to me, as someone coming in with zero context, is: What is the difference between the two docker images? I have a vague idea from the name, but does this mean that I can't use the build generated via debian8 for cuda? Or, do I have to run the debian8 build *on* debian8? It would be helpful to me to have answers to those questions in the document.

Added docs providing guidance on which docker image to choose.

ilya-biryukov added inline comments.Jun 16 2017, 2:06 AM
docs/Docker.rst
2 ↗(On Diff #102506)

You can use clang inside the debian8 image to compile for cuda, but you won't have access to the GPU if you run the image you've built in a container; i.e. you can build the cuda binary, but you can't really run it on the GPU.
Nvidia has its own wrapper around docker for running containers that do have access to the GPU, and you should also use their base images for that to work.
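For reference, the wrapper mentioned here is nvidia-docker; a rough sketch of the difference (the clang image name is illustrative):

  # Plain docker: fine for building CUDA code with clang, but no GPU at runtime.
  docker run --rm -it clang-debian8 bash
  # nvidia-docker wrapper: exposes the host GPU devices and driver to the
  # container; requires an nvidia/cuda-based image. Canonical smoke test:
  nvidia-docker run --rm nvidia/cuda nvidia-smi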

Added the docs explaining which image one would want to choose.

jlebar added inline comments.Jun 16 2017, 8:14 AM
docs/Docker.rst
2 ↗(On Diff #102506)

The explanation is helpful, thank you!

mehdi_amini requested changes to this revision.Jun 16 2017, 10:43 PM
mehdi_amini added a subscriber: mehdi_amini.

I'm interested in seeing this land! However, I also have some concerns right now:

  • The immediate added value for LLVM I could see right now is low as is. It could be useful for newcomers to get an easy setup to build and play with LLVM. But for this we also need to include this in http://llvm.org/docs/GettingStarted.html
  • The scripts are fairly trivial and as such can't be useful to everyone: but adding more options would require a careful design (see inline comment). This is also adding some maintenance burden on the community. I actually think we should try to have *less* hard-coded logic in the build_install_clang.sh script in favor of some arguments. For instance it could take the list of projects to checkout and the cmake argument for the invocation. This is the easiest way I can imagine to be able to support as much use-case as possible while keeping the maintenance of this script as low as possible.
docs/Docker.rst
58 ↗(On Diff #102790)

Why does it install gcc? Why not apt-get install clang?

162 ↗(On Diff #102790)

That's going to become obsolete quickly (if it isn't already); docker allows squashing images now.

utils/docker/debian8/build/build_install_clang.sh
62 ↗(On Diff #102790)

That's where it starts to get interesting: why only cfe?
Why wouldn't we package libc++, compiler-rt, lld, lldb, etc.?
I understand you may not care, or that it may even be a burden (extra size), but as LLVM developers it isn't clear to me why we wouldn't include everything by default.

It also seems to me that this build_install_clang.sh script is yet another build script to maintain, while we already have some in the repo.

I suspect everyone will have different needs, and we will end up with a very complicated script to handle all possibilities (I for one am interested in using this, but with a very different config: bootstrap+LTO+lld+only some targets to begin with).

We really need more thoughts around this.

81 ↗(On Diff #102790)

LLVM has a bootstrap process in a single cmake invocation, and can even build LLD during stage 1 and use it to link stage 2 (with LTO for example).
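For reference, a minimal sketch of such a single-invocation bootstrap (the exact options are an assumption, not something prescribed in this review; options prefixed with BOOTSTRAP_ are forwarded to the stage-2 configuration):

  # Sketch: stage 1 builds clang+lld, stage 2 is built with LTO and linked
  # with the freshly built lld.
  cmake -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DCLANG_ENABLE_BOOTSTRAP=ON \
    -DBOOTSTRAP_LLVM_ENABLE_LTO=ON \
    -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
    ../llvm
  ninja stage2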

utils/docker/nvidia-cuda/build/build_install_clang.sh
1 ↗(On Diff #102498)

I don't understand?
It does not seem OK to me to duplicate this. Can't you pass a relative path to this file and store it in a common directory?

This revision now requires changes to proceed.Jun 16 2017, 10:43 PM
ilya-biryukov added a comment.EditedJun 19 2017, 2:17 AM

Thanks for the comments.

  • The immediate added value for LLVM I could see right now is low as is. It could be useful for newcomers to get an easy setup to build and play with LLVM. But for this we also need to include this in http://llvm.org/docs/GettingStarted.html

We hope the added value would be more significant after we improve the scripts and address your other comments (mostly, configurability).

  • The scripts are fairly trivial and as such can't be useful to everyone: but adding more options would require a careful design (see inline comment). This is also adding some maintenance burden on the community. I actually think we should try to have *less* hard-coded logic in the build_install_clang.sh script in favor of some arguments. For instance it could take the list of projects to checkout and the cmake argument for the invocation. This is the easiest way I can imagine to be able to support as much use-case as possible while keeping the maintenance of this script as low as possible.

I would advocate for an easier interface to the build script, even if it would add some maintenance costs (that said, I'm ready to maintain it).
I think having a script accepting parameters like 'make a 2-stage bootstrap and install lld+clang' is a better interface than 'checkout projects cfe, lld, pass arguments "-X -Y -Z" to cmake'. Keeping the interface to the build script simple (while also not letting the maintenance costs blow up) should be a priority here, IMO.

I've also requested this in the inline comment, but wanted to repeat myself here: could you elaborate more on your specific use-case? (i.e. which LLVM tools do you want installed on your machine? Only lld, compiled using a 2-stage bootstrap and LTO?)

docs/Docker.rst
58 ↗(On Diff #102790)

No specific reason; since we do a 2-stage build and end up with "clang, compiled with clang" in the end, this shouldn't matter.

162 ↗(On Diff #102790)

Thanks, I wasn't aware of the '--squash' flag coming in Docker 1.13.
It makes sense to go with a single image instead of build+release, and to suggest using the flag and a newer Docker version in the docs.
I'll add changes to incorporate that.
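For reference, a sketch of what the single-image approach with that flag looks like (the image name is illustrative; --squash requires the daemon's experimental mode in Docker 1.13):

  # Sketch: squash all intermediate layers into one so sources and build
  # artifacts removed in later steps don't inflate the final image.
  docker build --squash -t clang-debian8:latest debian8/build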

utils/docker/debian8/build/build_install_clang.sh
62 ↗(On Diff #102790)

The maintenance cost is exactly the reason why we hard-coded the script to build only clang. It makes the script very simple. That said, I'm in favour of more configurability in the future; building only clang is just a first iteration.

I can spend some time making the script configurable enough to cover your use-case too. Then we could iterate to make it even more configurable.

I suspect everyone will have different needs, and we will end up with a very complicated script to handle all possibilities

Could you elaborate more on what exactly are 'only some targets to begin with'?

81 ↗(On Diff #102790)

Thanks, will look into using it.

utils/docker/nvidia-cuda/build/build_install_clang.sh
1 ↗(On Diff #102498)

Docker also requires all the files you ADD to an image to be under the 'build context' (i.e. the dir you pass to docker build). Currently, the script passes a specific image directory (i.e. debian8 or nvidia-cuda) as the build context, therefore we need to have a copy of the script in each of those dirs.

I agree with you, though, duplicating the script is not ideal. I'll change the scripts to remove copies of build_install_clang.sh. Thanks for the comment.
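For illustration, a sketch of the de-duplicated setup this implies (paths assume the llvm/utils/docker layout): pass the common parent directory as the build context and select the per-image Dockerfile with -f, so a single copy of the script is visible to every image.

  # Sketch: use llvm/utils/docker as the shared build context.
  cd llvm/utils/docker
  docker build -t clang-debian8 -f debian8/build/Dockerfile .
  docker build -t clang-cuda    -f nvidia-cuda/build/Dockerfile .
  # Inside each Dockerfile, COPY/ADD paths are then relative to llvm/utils/docker,
  # e.g.: COPY scripts/build_install_clang.sh /tmp/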

I would advocate for an easier interface to the build script, even if it would add some maintenance costs (that said, I'm ready to maintain it).
I think having a script accepting parameters like 'make a 2-stage bootstrap and install lld+clang' is a better interface than 'checkout projects cfe, lld, pass arguments "-X -Y -Z" to cmake'. Keeping the interface to the build script simple (while also not letting the maintenance costs blow up) should be a priority here, IMO.

Feel free to advocate for it, but beyond stating an opinion above, I'm missing your arguments.

You're stating some priority above without clearly defining the goals. I'm missing the big picture: who is the target of these images? What problem are we addressing? In my previous comment I put forward that one of the main use cases for the LLVM project would be to help people start using LLVM (which is why I mentioned http://llvm.org/docs/GettingStarted.html )

Now why should the interface for this script be different from the build interface we *already* have?
You mention that you're looking for a 'make a 2-stage bootstrap and install lld+clang' kind of interface, but that's already possible with our cmake AFAIK; it should be something like: cmake -DCLANG_ENABLE_BOOTSTRAP=On -DLLVM_ENABLE_PROJECTS="lld;clang" && make install

docs/Docker.rst
58 ↗(On Diff #102790)

Of course it shouldn't matter, but if both gcc and clang are equally available, I always tend to take clang.

utils/docker/debian8/build/build_install_clang.sh
62 ↗(On Diff #102790)

I'd rather not land this before we settle on this point (I don't see any urgency to land it that would motivate doing otherwise).

Could you elaborate more on what exactly are 'only some targets to begin with'?

-DLLVM_TARGETS_TO_BUILD=X86

81 ↗(On Diff #102790)
ilya-biryukov edited edge metadata.

Changed the build script to always run from the llvm/utils/docker folder, to remove duplicate copies of build_install_clang.sh.

Feel free to advocate for it, but beyond stating an opinion above, I'm missing your arguments.

For a newcomer or even a contributor not already familiar with LLVM's build system, it's not at all obvious which arguments one must pass to cmake to make the build work for them.
Given that the docker images we're producing are a distribution of LLVM tools, rather than a build/testing framework, there's no point in coupling their interface to the LLVM build system, effectively making it as complicated as the build system itself.

E.g., say someone would like to provide deb packages for clang, lld, etc. and is willing to use the Dockerfiles we provide. Say the person is not an LLVM contributor.
They won't really want to dive into the complexity of building LLVM, but it won't take them long to figure out they need to run build_docker_image.sh debian8 myrepo/clang_deb staging --with_clang --with_lld --with_libc++.
For software distributions, the things that really matter are:

  • Which components do you include in the distribution, i.e. clang, lld, libc++, sanitizers, etc.
  • Configurations of those components, i.e. whether to build with sanitizers, whether to use LTO, which backends to include, etc. (This is the trickiest part from my perspective)
  • SVN revision/branch to build from.

Everything else can be figured out by the scripts: the SVN projects you need to check out, whether you should use the 2-stage build setup from 'Advanced Build Configuration' you pointed out or just run cmake twice, which cmake targets to build/install for each of the components, which cmake flags to pass, etc.

I do think you make a very good point that all of this 'is already there' in the cmake scripts and providing an interface to them gives greater flexibility.
I think all of those arguments are also true for standalone LLVM builds outside docker containers. But in docker we have a very controllable environment, which makes it possible to make things that actually work and have a nice interface.

You're stating some priority above without clearly defining the goals. I'm missing the big picture: who is the target of these images? What problem are we addressing? In my previous comment I put forward that one of the main use cases for the LLVM project would be to help people start using LLVM (which is why I mentioned http://llvm.org/docs/GettingStarted.html )

Our goal would be to build docker images, containing fresh versions of clang (and later, possibly, other LLVM tools). These Dockerfiles are a step in that direction.
We have plans to add sanitizers into those images afterwards, maybe some other LLVM tools as well.

Now why should the interface for this script be different from the build interface we *already* have?
You mention that you're looking for a 'make a 2-stage bootstrap and install lld+clang' kind of interface, but that's already possible with our cmake AFAIK; it should be something like: cmake -DCLANG_ENABLE_BOOTSTRAP=On -DLLVM_ENABLE_PROJECTS="lld;clang" && make install

It is possible to do this (and other things, of course) with cmake; my point is that cmake is not great at giving you a good and discoverable interface to all of that.

BTW, it would be very nice for us if we had some scripts for our images, and then we could iterate to make them more configurable/flexible and discuss/implement alternative designs.

utils/docker/debian8/build/build_install_clang.sh
62 ↗(On Diff #102790)

Got it, backend targets. Thanks for elaborating on that.

81 ↗(On Diff #102790)

I've already found that, but thanks for the link nevertheless.

Feel free to advocate for it, but beyond stating an opinion above, I'm missing your arguments.

For a newcomer or even a contributor not already familiar with LLVM's build system, it's not at all obvious which arguments one must pass to cmake to make the build work for them.

OK so we're focusing on newcomers right?

Given that the docker images we're producing are a distribution of LLVM tools, rather than a build/testing framework, there's no point in coupling their interface to the LLVM build system, effectively making it as complicated as the build system itself.

Or as simple / expressive...

E.g., say someone would like to provide deb packages for clang, lld, etc. and is willing to use the Dockerfiles we provide. Say the person is not an LLVM contributor.
They won't really want to dive into the complexity of building LLVM,

If someone is only interested in getting a standard release build of LLVM, we already have the release script in the repository: /llvm/utils/release/test-release.sh

but it won't take them long to figure out they need to run build_docker_image.sh debian8 myrepo/clang_deb staging --with_clang --with_lld --with_libc++.

I totally disagree with this.
If you need to understand which subcomponent you want, I don't see why having --with-clang is a better interface than -DLLVM_ENABLE_PROJECTS=clang; this is the same level of abstraction. The syntax bikeshed is not adding value here, and I'd even argue that it is more valuable for newcomers to use the same syntax they'll have to use if they want to upgrade to more direct interaction with LLVM.

Also, if you want to provide an abstraction over the build system, I don't see any reason to couple this to Docker; this seems like a separate discussion from having Docker images. This should be implemented orthogonally: if such a script is useful, it should live separately in the repo and be usable to build within or outside of Docker. The docker invocation can invoke this script or not (let's punt on the exact interface), the same way a build outside of Docker could.

For software distributions, the things that really matter are:

  • Which components do you include in the distribution, i.e. clang, lld, libc++, sanitizers, etc.
  • Configurations of those components, i.e. whether to build with sanitizers, whether to use LTO, which backends to include, etc. (This is the trickiest part from my perspective)

Right, same as above: CMake provides options for these. If there are options missing, we can add them.

Another kind of CMake interface is the preconfigured cache files, for example: clang/cmake/caches/Apple-stage2-ThinLTO.cmake
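For reference, cache files are consumed with cmake's -C option; a minimal sketch (the relative paths assume a typical side-by-side checkout):

  # Sketch: seed the CMake cache from a preconfigured cache file, then
  # add or override individual options on the command line.
  cmake -G Ninja \
    -C ../clang/cmake/caches/Apple-stage2-ThinLTO.cmake \
    -DCMAKE_INSTALL_PREFIX=/usr/local \
    ../llvm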

For software distributions, the things that really matter are:

  • Which components do you include in the distribution, i.e. clang, lld, libc++, sanitizers, etc.
  • Configurations of those components, i.e. whether to build with sanitizers, whether to use LTO, which backends to include, etc. (This is the trickiest part from my perspective)

Right, same as above: CMake provides options for these. If there are options missing, we can add them.
Another kind of CMake interface is the preconfigured cache files, for example: clang/cmake/caches/Apple-stage2-ThinLTO.cmake

My point was not that CMake can't do that, but rather that it makes those things complicated.

OK so we're focusing on newcomers right?

We're planning to build a clang distribution in docker images using these scripts. It was not our original intention to focus on newcomers.

Given that the docker images we're producing are a distribution of LLVM tools, rather than a build/testing framework, there's no point in coupling their interface to the LLVM build system, effectively making it as complicated as the build system itself.

Or as simple / expressive...

Of course cmake is as expressive as you can get when building LLVM.
After playing around with cmake and multistage builds, I actually agree with you. It's so much less hassle to reuse the bootstrapping logic inside cmake scripts.
And a single cmake invocation seems to be enough to do basically everything one could want.

Also, if you want to provide an abstraction over the build system, I don't see any reason to couple this to Docker; this seems like a separate discussion from having Docker images. This should be implemented orthogonally: if such a script is useful, it should live separately in the repo and be usable to build within or outside of Docker. The docker invocation can invoke this script or not (let's punt on the exact interface), the same way a build outside of Docker could.

I totally agree with both points. There's no reason it should be coupled to Docker and it's actually a separate discussion.
Since cmake is already there, let's use it to give full flexibility.
I would still argue that a higher-level interface similar to llvm/utils/release/test-release.sh has its uses.

Added a prototype that allows passing cmake args to the build.
Docs aren't updated yet and only the debian8 image was tested.

Provide an interface to run a custom CMake invocation when building Docker images.

Pass --trust-server-cert to svn checkout.
LLVM's SSL certificate doesn't work without that.
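For reference, a sketch of the checkout this refers to (the URL is the historical LLVM SVN location; the exact flags in the script may differ):

  # Sketch: non-interactive checkout that accepts the server certificate
  # without prompting, so the build doesn't hang inside a container.
  svn checkout --non-interactive --trust-server-cert \
    https://llvm.org/svn/llvm-project/llvm/trunk llvm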

I've now finalized the rewrite to allow custom CMake invocations, following the discussion with @mehdi_amini.
This should be good enough for both our use-cases. Please speak up if you see anything that should be improved.
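As an illustration only (the flag names below are hypothetical; consult the updated docs/Docker.rst for the actual interface), the intended shape is roughly that everything after -- is forwarded verbatim to the CMake invocation inside the container:

  # Hypothetical invocation sketch, not the exact interface of this patch.
  ./llvm/utils/docker/build_docker_image.sh \
      --source debian8 \
      --docker-repository clang-debian8 --docker-tag "staging" \
      -- \
      -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86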

@mehdi_amini, have you had a chance to look at the revamped scripts?

mehdi_amini added inline comments.Jun 28 2017, 12:52 PM
docs/Docker.rst
109 ↗(On Diff #103197)

It is very confusing why we need three scripts. I especially don't understand the introduction of the build_clang_image.sh script here.
Can we just get rid of this third script, which (after looking at the code) is a thin wrapper, and just give the example invocation of build_docker_image.sh that does that?

utils/docker/build_docker_image.sh
19 ↗(On Diff #103197)

s/buildsciprt_args/buildscript_args/

utils/docker/debian8/release/Dockerfile
21 ↗(On Diff #103197)

I'd put the RUN line here *before* the ADD of clang.tar.gz so that we don't always rebuild this layer when creating a new clang.

utils/docker/scripts/build_install_llvm.sh
128 ↗(On Diff #103197)

Isn't the checkout gonna fail if a project is specified multiple times?

ilya-biryukov added inline comments.Jun 29 2017, 1:06 AM
docs/Docker.rst
109 ↗(On Diff #103197)

It is indeed just a thin wrapper whose sole purpose is to include a proper "recipe" for a 2-stage clang build.
Is there any harm in including it? It's a very small and simple script. Do you find it confusing to have it beside the main script to build docker images?

ilya-biryukov marked 2 inline comments as done.

Addressed review comments.

ilya-biryukov marked an inline comment as done.Jun 29 2017, 1:33 AM
ilya-biryukov added inline comments.
utils/docker/scripts/build_install_llvm.sh
128 ↗(On Diff #103197)

Added a check for that.
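A sketch of what such a guard might look like in a bash checkout loop (variable names are illustrative, not the actual ones in build_install_llvm.sh):

  # Sketch: reject a project that is specified more than once before
  # attempting to check it out into the same directory.
  SEEN_PROJECTS=""
  for proj in $LLVM_PROJECTS; do
    if echo "$SEEN_PROJECTS" | grep -qw "$proj"; then
      echo "error: project '$proj' is specified more than once" >&2
      exit 1
    fi
    SEEN_PROJECTS="$SEEN_PROJECTS $proj"
  done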

mehdi_amini added inline comments.Jun 29 2017, 8:47 AM
docs/Docker.rst
109 ↗(On Diff #103197)

Adding 90 lines of script for something that is 3 lines of documentation that the user should copy/paste? Doesn't justify itself to me.

mehdi_amini added inline comments.Jun 29 2017, 8:57 AM
docs/Docker.rst
109 ↗(On Diff #103197)

Also, as mentioned in a previous review: we don't have a script that is "build clang" today (other than the release script). This is an orthogonal concern to Docker, so I'm against tying it to Docker here.

mehdi_amini added inline comments.Jun 29 2017, 10:24 AM
docs/Docker.rst
109 ↗(On Diff #103197)

Also mentioned in a previous review: the cmake preconfigured cache files. They are superior to a script IMO because the user can more easily reuse them (you can include and override, for instance).

ilya-biryukov marked an inline comment as done.

Removed build_clang_image.sh, updated the docs accordingly.

Thanks! LGTM.

I'll be happy to iterate separately on providing an easy way to build a bootstrap of clang (that would be easy to use with or without Docker)!

mehdi_amini accepted this revision.Jun 29 2017, 10:31 AM
This revision is now accepted and ready to land.Jun 29 2017, 10:31 AM

@mehdi_amini, this should hopefully address your comments.
Could you take a final look?

docs/Docker.rst
109 ↗(On Diff #103197)

I've removed the script and updated the docs accordingly. They don't mention the cmake preconfigured cache files, but I hope that's a minor thing that could be fixed in further commits.

Thanks for the review! Will land the revision tomorrow.

This revision was automatically updated to reflect the committed changes.
klimek added a comment.Jul 4 2017, 1:47 AM

Thanks! LGTM.

I'll be happy to iterate separately on providing an easy way to build a bootstrap of clang (that would be easy to use with or without Docker)!

As a data point - I just tried to build the docker image and without Ilya's help I wouldn't have known to look at the docs (usually docker build envs for projects just work out of the box to give me a full image; I don't know whether you have a different experience). We'll improve comments on the script itself, but as a user I'd really like to have something that I can run that will Just Work :)

Thanks! LGTM.

I'll be happy to iterate separately on providing an easy way to build a bootstrap of clang (that would be easy to use with or without Docker)!

As a data point - I just tried to build the docker image and without Ilya's help I wouldn't have known to look at the docs (usually docker build envs for projects just work out of the box to give me a full image; I don't know whether you have a different experience). We'll improve comments on the script itself, but as a user I'd really like to have something that I can run that will Just Work :)

I don't disagree that having something that "just works" would be nice. However you have to take into account that LLVM is a very complex project and there is not a "single image" that can fit all.

If we start with an "easy to build" full image, it should be the same as the official packages: http://releases.llvm.org ; this is *not* what was proposed here.

Thanks! LGTM.

I'll be happy to iterate separately on providing an easy way to build a bootstrap of clang (that would be easy to use with or without Docker)!

As a data point - I just tried to build the docker image and without Ilya's help I wouldn't have known to look at the docs (usually docker build envs for projects just work out of the box to give me a full image; I don't know whether you have a different experience). We'll improve comments on the script itself, but as a user I'd really like to have something that I can run that will Just Work :)

I don't disagree that having something that "just works" would be nice. However you have to take into account that LLVM is a very complex project and there is not a "single image" that can fit all.

If we start with an "easy to build" full image, it should be the same as the official packages: http://releases.llvm.org ; this is *not* what was proposed here.

I don't understand yet why we would need something that is the same as the official packages. I'd expect something more like the experience a distro would give you.

If we start with an "easy to build" full image, it should be the same as the official packages: http://releases.llvm.org ; this is *not* what was proposed here.

I don't understand yet why we would need something that is the same as the official packages. I'd expect something more like the experience a distro would give you.

Can you clarify:

  1. What is your expectation in terms of the "experience a distro would give" compared to the official packages?
  2. Why are we shipping the official packages as they are today instead of providing what you're expecting in 1)?