This is an archive of the discontinued LLVM Phabricator instance.

initial terraform configuration for Google buildbot workers
ClosedPublic

Authored by kuhnel on Jun 12 2020, 7:09 AM.

Details

Summary

This patch adds an initial terraform configuration for the new buildbot workers run at Google, on top of D81737.

This defines the cluster configuration and how the Docker images are deployed to the cluster.

Event Timeline

kuhnel created this revision.Jun 12 2020, 7:09 AM
kuhnel abandoned this revision.Jun 12 2020, 7:27 AM
kuhnel updated this revision to Diff 270406.Jun 12 2020, 8:10 AM
  • first version of terraform configuration
kuhnel updated this revision to Diff 270408.Jun 12 2020, 8:12 AM

cleaned up dependencies

kuhnel retitled this revision from docker images for mlir-nvidia Summary: Created folders to keep buildbot configuration for buildbots owned at Google. First patch: add docker image and scripts for mlir-nvidia buildbot. Future patches will add more documentation, Terraform/kubernetes... to initial terraform configuration for Google buildbot workers.Jun 12 2020, 8:15 AM
kuhnel edited the summary of this revision.
kuhnel added reviewers: gkistanova, PaulkaToast, tra.
Harbormaster completed remote builds in B60124: Diff 270408.
tra added inline comments.Jun 12 2020, 10:42 AM
buildbot/google/README.md
16

past -> paste

buildbot/google/gcloud_config.sh
4 ↗(On Diff #270408)

config.sh?

If the script is executed while the current directory is not buildbot/google, you may need to infer the full path of the config file.
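
One common way to address this is to resolve the config path relative to the script's own location instead of the caller's working directory. The following is only a minimal sketch: the helper name `config_path_for` is hypothetical, and `config.sh` is assumed as the file name based on the comment above. It also assumes the script is executed rather than sourced, since `$0` does not point at a sourced file.

```shell
#!/bin/sh
# Print the absolute path of a file that lives next to the given script.
# $1: path of the script, $2: name of the file located beside it.
# (Hypothetical helper; not taken from the patch.)
config_path_for() {
  script_dir="$(cd -- "$(dirname -- "$1")" && pwd)" || return 1
  printf '%s/%s\n' "${script_dir}" "$2"
}

# "config.sh" is an assumed file name.
CONFIG_FILE="$(config_path_for "$0" "config.sh")"

if [ -f "${CONFIG_FILE}" ]; then
  # Load the settings regardless of the caller's current directory.
  . "${CONFIG_FILE}"
else
  echo "config not found: ${CONFIG_FILE}" >&2
fi
```

With this approach the script resolves the same config file whether it is started from buildbot/google or from any other directory.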

buildbot/google/terraform/README.md
16 ↗(On Diff #270408)

This will not work as it's implemented now, because config will not be found. See the comment in gcloud_config.sh

28 ↗(On Diff #270408)

*to* enable Kubernetes?

40 ↗(On Diff #270408)

What's expected to be in the file name?
Is buildbot-token-mlir-nvidia special in any way? I do not see it mentioned anywhere else in the patch. Who/where/how is going to use this secret?

buildbot/google/terraform/main.tf
121 ↗(On Diff #270408)

Does it have something to do with the buildbot-token-mlir-nvidia we create as described in the Secrets section of the README?
I'm not quite sure how exactly those two are connected. Some comments with the details would be helpful here.

124–125 ↗(On Diff #270408)

Pardon my ignorance -- what is taint and why does it matter for us?

PaulkaToast added inline comments.Jun 12 2020, 2:01 PM
buildbot/google/README.md
32

I'd maybe add a note here about the ability to locally reproduce failures the buildbot encounters, using the docker containers for debugging. That is, if an LLVM contributor accidentally breaks a Google buildbot, they can try to reproduce it locally without having access to our infrastructure.

40

nit: add a newline at the end of files. (:

buildbot/google/terraform/README.md
28 ↗(On Diff #270408)

If I'm not mistaken, for new GCP projects the Container Registry API must be enabled once as well.

kuhnel updated this revision to Diff 270762.Jun 15 2020, 8:31 AM
kuhnel marked 14 inline comments as done.
  • improved documentation of secrets and taints
  • fixed missing new lines
  • fixed config file paths
buildbot/google/terraform/README.md
16 ↗(On Diff #270408)

fixed the script.

40 ↗(On Diff #270408)

I fixed the name of the secrets and added more documentation on how they work.

buildbot/google/terraform/main.tf
121 ↗(On Diff #270408)

Yes, that should be buildbot-token-mlir-nvidia; I seem to have mixed different versions of the files...

124–125 ↗(On Diff #270408)

tl;dr: This is a safeguard to deploy containers that require GPUs only to machines with GPUs.

The long story:
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

I added more documentation there.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 19 2020, 1:02 AM
This revision was automatically updated to reflect the committed changes.