This is an archive of the discontinued LLVM Phabricator instance.

A logic to copy LLVM licences into docker images.
Needs Review · Public

Authored by ilya-biryukov on Sep 26 2017, 4:18 PM.

Event Timeline

ilya-biryukov created this revision. Sep 26 2017, 4:18 PM
mehdi_amini edited edge metadata. Sep 26 2017, 4:19 PM

Can you elaborate why this script is desirable/needed?

Sure, sorry for leaving out the context.

Including licences is technically required in any binary distribution of LLVM, including ones inside Docker images.
Copying the licenses could be made optional, but always putting the licenses into Docker images should not hurt either.

klimek added inline comments. Sep 27 2017, 4:18 AM
utils/docker/scripts/build_install_llvm.sh
200

Somewhat unrelated, but I just noticed this: if we have more pieces from LLVM than just Clang in that directory, perhaps call it LLVM_BUILD_DIR? (A different patch is obviously fine.)

utils/docker/scripts/llvm_checksum/collect_licenses.py
1 ↗(On Diff #116736)

Isn't that actually the same as this?
find . -iname '*license*' | while read -r f; do mkdir -p "$TARGET_DIR/$(dirname "$f")" && cp "$f" "$TARGET_DIR/$f"; done

ilya-biryukov added inline comments. Sep 27 2017, 8:25 AM
utils/docker/scripts/build_install_llvm.sh
200

Makes sense, will do in a separate commit.

utils/docker/scripts/llvm_checksum/collect_licenses.py
1 ↗(On Diff #116736)

Almost.
LLVM also embeds some licences into the source tree (MD5, Copyright.regex).
I was initially thinking of parsing references to other licences inside LICENSE.txt, so I chose Python, but then I decided to go with a hard-coded list of files for now.

Python is probably too much for this, will replace with a bash script :-)
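For reference, the approach described above (a case-insensitive name match plus a hard-coded list of extra files) could be sketched roughly as follows. This is only an illustration, not the code from the diff: the function name `collect_licenses`, its parameters, and the `extra_files` argument are hypothetical, and the real paths of the embedded license files are not spelled out in this thread.

```python
import fnmatch
import os
import shutil


def collect_licenses(source_root, target_dir, extra_files=()):
    """Copy license files from source_root into target_dir, keeping relative paths.

    extra_files is a hypothetical hard-coded list of paths (relative to
    source_root) for licenses whose file names do not contain "license".
    """
    selected = []
    # Gather the full list first, so files copied later are never re-scanned.
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            if fnmatch.fnmatch(name.lower(), "*license*"):
                full_path = os.path.join(dirpath, name)
                selected.append(os.path.relpath(full_path, source_root))
    # Add the explicitly listed files, silently skipping any that are missing.
    for rel in extra_files:
        rel = os.path.normpath(rel)
        if os.path.isfile(os.path.join(source_root, rel)) and rel not in selected:
            selected.append(rel)
    # Copy everything, recreating the directory layout under target_dir.
    for rel in selected:
        dest = os.path.join(target_dir, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(source_root, rel), dest)
    return sorted(selected)
```

Unlike the `find` one-liner, this only matches regular files, so a directory whose name happens to contain "license" would not make the copy step fail.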

  • Copy licenses in bash instead of Python.
ilya-biryukov marked 2 inline comments as done. Sep 27 2017, 11:20 AM

OK, but why is this logic applicable to clang/llvm built with the docker script and not for any build?

It seems to me that if this is deemed useful/necessary, that should be part of make install and controlled by a CMake flag (-DLLVM_INSTALL_LICENCES=ON) which could be enabled by default or not.

Docker is great for managing dependencies and providing build reproducibility: it is great to provide scripts that make it easier to use for building clang/LLVM. However, I'm worried about embedding any information in these scripts about build / release / packaging that wouldn't be Docker specific.

klimek edited edge metadata. Sep 28 2017, 12:42 AM

"make install" is not primarily a distribution mechanism, docker is (most folks upload their docker images to public registries without thinking much about this).
Docker to me is more similar to having debian or rpm generating rules; if we had those, I'd also argue that we should include the licenses in the right places of these.

"make install" is not primarily a distribution mechanism, docker is (most folks upload their docker images to public registries without thinking much about this).

I'm not sure why "primarily a distribution mechanism" matters? How is one supposed to distribute?

Docker to me is more similar to having debian or rpm generating rules; if we had those, I'd also argue that we should include the licenses in the right places of these.

I'm using Docker in my CI purely for reproducibility. Again separation of concerns: mixing the tool (Docker) with the purpose (distribution) does not seem right to me.

I just checked and the pre-built releases on llvm.org don't ship with the license files (but one, by accident I guess).

So again a CMake option with make install seems a far better place to me. Second to this, release scripts in llvm, decoupled from Docker.

"make install" is not primarily a distribution mechanism, docker is (most folks upload their docker images to public registries without thinking much about this).

I'm not sure why "primarily a distribution mechanism" matters? How is one supposed to distribute?

docker push to a public registry?

Docker to me is more similar to having debian or rpm generating rules; if we had those, I'd also argue that we should include the licenses in the right places of these.

I'm using Docker in my CI purely for reproducibility. Again separation of concerns: mixing the tool (Docker) with the purpose (distribution) does not seem right to me.

Well, we definitely want a license label in the docker image / want the right license field set for .deb / other distribution mechanisms, so I don't think there's full separation of concerns anyway.

I just checked and the pre-built releases on llvm.org don't ship with the license files (but one, by accident I guess).

So again a CMake option with make install seems a far better place to me. Second to this, release scripts in llvm, decoupled from Docker.

Where would license files go on a make install?

Putting it into release scripts is obviously fine, especially as you pointed out that we'll probably want to include them somehow in the prebuilt binary distributions.

"make install" is not primarily a distribution mechanism, docker is (most folks upload their docker images to public registries without thinking much about this).

I'm not sure why "primarily a distribution mechanism" matters? How is one supposed to distribute?

docker push to a public registry?

Not much different from pushing a .tar.gz resulting from the install to llvm.org (which is actually how I package my internal distribution right now; luckily I don't distribute outside the company, so I haven't hit the license problem...).

Docker to me is more similar to having debian or rpm generating rules; if we had those, I'd also argue that we should include the licenses in the right places of these.

I'm using Docker in my CI purely for reproducibility. Again separation of concerns: mixing the tool (Docker) with the purpose (distribution) does not seem right to me.

Well, we definitely want a license label in the docker image / want the right license field set for .deb / other distribution mechanisms, so I don't think there's full separation of concerns anyway.

We want a license in every distribution: using docker, debian or not. I'm not sure why duplicating the logic everywhere would be a good thing?

I just checked and the pre-built releases on llvm.org don't ship with the license files (but one, by accident I guess).

So again a CMake option with make install seems a far better place to me. Second to this, release scripts in llvm, decoupled from Docker.

Where would license files go on a make install?

"$CLANG_INSTALL_DIR/share/llvm/licenses"? Eventually make the path an option?

"make install" is not primarily a distribution mechanism, docker is (most folks upload their docker images to public registries without thinking much about this).

I'm not sure why "primarily a distribution mechanism" matters? How is one supposed to distribute?

docker push to a public registry?

Not much different from pushing a .tar.gz resulting from the install to llvm.org (which is actually how I package my internal distribution right now; luckily I don't distribute outside the company, so I haven't hit the license problem...).

True. I'm surprised every time I see somebody distributing tar.gz's with binaries :), but given that people do it, a script that puts the licenses somewhere makes sense; it does only replace 5 lines of code, though, so I'm not sure how urgent that is (but I'm totally in agreement it's an improvement :)

Docker to me is more similar to having debian or rpm generating rules; if we had those, I'd also argue that we should include the licenses in the right places of these.

I'm using Docker in my CI purely for reproducibility. Again separation of concerns: mixing the tool (Docker) with the purpose (distribution) does not seem right to me.

Well, we definitely want a license label in the docker image / want the right license field set for .deb / other distribution mechanisms, so I don't think there's full separation of concerns anyway.

We want a license in every distribution: using docker, debian or not. I'm not sure why duplicating the logic everywhere would be a good thing?

I just checked and the pre-built releases on llvm.org don't ship with the license files (but one, by accident I guess).

So again a CMake option with make install seems a far better place to me. Second to this, release scripts in llvm, decoupled from Docker.

Where would license files go on a make install?

"$CLANG_INSTALL_DIR/share/llvm/licenses"? Eventually make the path an option?

I agree that putting this into CMake and enforcing that every LLVM project lists its licenses there is probably the right approach.
While easy to duplicate, license-copy scripts are somewhat hard to keep up-to-date.

That said, I think doing this properly in CMake would require some discussion and changes to all LLVM projects.
And it would be nice to have this change right now as a quick workaround.

And it would be nice to have this change right now as a quick workaround.

That isn't how LLVM development works AFAIK: we merge things when they are ready. And when we identify the right way of doing things, we try to build an incremental path in this direction instead of building technical debt.
I can send an RFC to LLVM if needed.

That isn't how LLVM development works AFAIK: we merge things when they are ready. And when we identify the right way of doing things, we try to build an incremental path in this direction instead of building technical debt.

Point taken, your proposed solution would certainly help every LLVM distribution, not only Docker images.
I don't see how a workaround like that builds up that much technical debt, though. But we can do that in our particular distribution if you're opposed to having that upstream now.

I can send an RFC to LLVM if needed.

That would certainly be very helpful, if you have spare cycles to work on that, of course.