This is an archive of the discontinued LLVM Phabricator instance.

docker images for mlir-nvidia
ClosedPublic

Authored by kuhnel on Jun 12 2020, 7:03 AM.

Details

Summary

A bunch of people at Google are working on standardising the way we run our buildbots. We also want to add new ones. As part of this effort we want to upstream the configuration and documentation of our buildbots. This way other buildbot owners can benefit from our setup.

This first patch adds the docker image and some scripts for mlir-nvidia buildbot.

Future patches will add more documentation, Terraform/kubernetes configuration files, scripts, additional build bots

Please check if

  • Is this the right repository for such things?
  • Is this the right place in the folder structure?

Event Timeline

kuhnel created this revision.Jun 12 2020, 7:03 AM
kuhnel edited the summary of this revision. (Show Details)Jun 12 2020, 7:07 AM
kuhnel added reviewers: gkistanova, PaulkaToast, tra.
kuhnel updated this revision to Diff 270390.Jun 12 2020, 7:11 AM

removed stray edit

I don't know where is the best place for the Docker config to live, but there are some in the monorepo already FYI:

llvm-project/libcxx/utils/docker/debian9/buildbot/Dockerfile
llvm-project/clang/tools/clang-fuzzer/Dockerfile
llvm-project/libc/utils/buildbot/Dockerfile
llvm-project/llvm/utils/docker/debian8/Dockerfile
llvm-project/llvm/utils/docker/example/Dockerfile
llvm-project/llvm/utils/docker/nvidia-cuda/Dockerfile

tra added a comment.Jun 12 2020, 10:20 AM

LGTM.

buildbot/google/docker/build_deploy.sh
23–26

Nit: This looks somewhat fragile -- anyone building from the same checked out tree will end up with the same version and will potentially clobber each other.
I think it would be more robust to always use checked-in version (or just git hash of the commit) and add "+local changes" to the tag/version if the checked out tree has been modified.
Not a big deal in practice considering that there's likely to be only one owner.

buildbot/google/docker/buildbot-mlir-nvidia/run.sh
15

We should probably describe what's expected to be in the secrets directory in the README file.

I don't know where is the best place for the Docker config to live, but there are some in the monorepo already FYI:

llvm-project/libcxx/utils/docker/debian9/buildbot/Dockerfile
llvm-project/clang/tools/clang-fuzzer/Dockerfile
llvm-project/libc/utils/buildbot/Dockerfile
llvm-project/llvm/utils/docker/debian8/Dockerfile
llvm-project/llvm/utils/docker/example/Dockerfile
llvm-project/llvm/utils/docker/nvidia-cuda/Dockerfile

I think it is debatable where the Dockerfiles should live. As mendi mentioned there are examples in the monorepo.
This links them closely with the project the belong to which might make more sense for contributors to individual projects.
However I also see the benefit of this centralized location as well for standardizing the setup.

I think there is an importance to keeping the Dockerfiles decoupled from the Google specific infrastructure. This means that if someone outside of Google would like to setup their own buildbot, they can re-use the project-specific Dockerfiles instead of starting from scratch. They might wish to set it up without terraform or GCP.
If our Dockerfiles need something google specific (e.g our particular secret handling) we could make use of the FROM directive to pull in the generic docker config for the llvm project and then customize it specifically to our needs.

buildbot/google/docker/README.md
24

nit: newline.

kuhnel added a subscriber: asl.Jun 15 2020, 5:04 AM

I don't know where is the best place for the Docker config to live, but there are some in the monorepo already FYI:

Yes, I'm aware of this. @asl recommended putting it in llvm-zorg, as this is there buildbot related things should go.
He also proposed moving the other buildbot Dockerfiles here.

kuhnel updated this revision to Diff 270745.Jun 15 2020, 7:21 AM
kuhnel marked 3 inline comments as done.
kuhnel edited the summary of this revision. (Show Details)
  • replaced version file with number from registry
  • updated readme on handling secrets
buildbot/google/docker/build_deploy.sh
23–26

I can get the version number from the registry. Will add a patch for that. However then it's more difficult to trace changes between the docker repo and the git commits.

kuhnel marked an inline comment as done.Jun 15 2020, 7:22 AM
tra added inline comments.Jun 15 2020, 10:34 AM
buildbot/google/docker/build_deploy.sh
23–26

I don't think it's worth complicating it yet. I guess something as simple as adding a hash-based tag + 'dirty' bit would do.
This should allow unambiguously identifying uploaded images.
e.g. something like this:

tags = ${IMAGE_NAME}:latest ${IMAGE_NAME}:${VERSION} ${IMAGE_NAME}:${HASH} 
if <tree is dirty>; then
  tags += ${IMAGE_NAME}:${HASH}-${USER}-${BUILDNO}
  BUILDNO++
fi
36

What if I happen to run it in a tree I haven't updated for a while? We probably don't want accidentally overwriting the currently used images with an out-of-date version. Should we check if the image with particular version already exists and inform the user that they are about to overwrite it?

I don't know where is the best place for the Docker config to live, but there are some in the monorepo already FYI:

Yes, I'm aware of this. @asl recommended putting it in llvm-zorg, as this is there buildbot related things should go.
He also proposed moving the other buildbot Dockerfiles here.

To me this is fine, as long as it provides easy way for reproduction locally when a failure happens: I don't mind checking another repo to get the Docker images, but ideally I should just run a script and point to my local repo to get a reproduced build.

kuhnel updated this revision to Diff 270984.Jun 16 2020, 2:01 AM
kuhnel marked 3 inline comments as done.

improved tagging of images

Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2020, 2:01 AM
kuhnel updated this revision to Diff 270985.Jun 16 2020, 2:03 AM

fixed last patch

buildbot/google/docker/build_deploy.sh
36

For that we do need to check the remote repository. So I'll keep that code.

It's surprisingly hard to come up with a solid way to tag images :)

Harbormaster failed remote builds in B60433: Diff 270984!
kuhnel accepted this revision.Jun 19 2020, 2:08 AM
This revision is now accepted and ready to land.Jun 19 2020, 2:08 AM
kuhnel closed this revision.Jun 19 2020, 2:08 AM