This is an archive of the discontinued LLVM Phabricator instance.

[buildbot] Added builders and slaves for the new CUDA build/test bots.
ClosedPublic

Authored by tra on Jun 3 2020, 2:12 PM.

Event Timeline

tra created this revision. Jun 3 2020, 2:12 PM
tra updated this revision to Diff 268307. Jun 3 2020, 2:25 PM

Changed bot/slave structure.

tra updated this revision to Diff 268308. Jun 3 2020, 2:27 PM

Fixed notification list.

Harbormaster completed remote builds in B58989: Diff 268308.
gkistanova requested changes to this revision. Jun 7 2020, 11:20 PM

Hello Artem.

I have commented on getCUDAAnnotatedBuildFactory, but it doesn't seem that you need a special build factory for your builders.
Just use the existing AnnotatedBuilder.getAnnotatedBuildFactory in the builders.py for your build configurations.

zorg/buildbot/builders/CUDATestsuiteBuilder.py
224

You don't need to handle is_legacy_mode, so it's better to skip it here and let the default machinery handle it.

227

is_legacy_mode is used twice.

254

bot_dir is not defined.

257

You can skip it and let the LLVMBuildFactory handle it by default.

258

Should it be clean=clean here instead?

263

You do not use the builddir property. Even if you needed an absolute path to your build directory, you would not need a build step. There is a workdir property; just use that instead. For example, something like

WithProperties("%(workdir)s/" + f.obj_dir + "/bin/clang++")

would give you a fully qualified path to the installed clang++.
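To illustrate the substitution, here is a minimal sketch that imitates the %(name)s interpolation with plain Python string formatting. It is not buildbot code: the render helper, the "build" value standing in for f.obj_dir, and the "/b/clang-cuda" workdir are all hypothetical, and in a real builder WithProperties is rendered by buildbot at build time.

```python
# Illustrative only: imitates the dictionary-style %(name)s substitution
# with ordinary Python string formatting. Not buildbot's implementation.
def render(template, properties):
    """Fill %(name)s placeholders in template from the properties dict."""
    return template % properties


obj_dir = "build"  # hypothetical stand-in for f.obj_dir
template = "%(workdir)s/" + obj_dir + "/bin/clang++"

# Hypothetical workdir value; the real one is supplied by the build.
clang_path = render(template, {"workdir": "/b/clang-cuda"})
print(clang_path)
```

The template is assembled once when the builder is configured, while the workdir value is only known per build, which is why the placeholder is left in the string rather than filled in up front.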

This revision now requires changes to proceed. Jun 7 2020, 11:20 PM
tra updated this revision to Diff 269302. Jun 8 2020, 11:53 AM
tra marked 7 inline comments as done.

Addressed review comments.

tra updated this revision to Diff 269351. Jun 8 2020, 1:55 PM

Use AnnotatedBuilder with an external build script.

tra added a comment. Jun 8 2020, 1:58 PM

Hello Artem.

I have commented on getCUDAAnnotatedBuildFactory, but it doesn't seem that you need a special build factory for your builders.
Just use the existing AnnotatedBuilder.getAnnotatedBuildFactory in the builders.py for your build configurations.

Thank you for pointing this out. I had missed the fact that AnnotatedBuildFactory already includes LLVMBuildFactory.
I've changed the patch to use AnnotatedBuildFactory plus a script that launches an external command.

Thanks for updating the patch, Artem.

Could you elaborate why you need the script launcher (zorg/buildbot/builders/annotated/external.py), please? You can use your cuda-related scripts directly with the annotated builder without having an extra layer.

tra added a comment. Jun 8 2020, 4:18 PM

Thanks for updating the patch, Artem.

Could you elaborate why you need the script launcher (zorg/buildbot/builders/annotated/external.py), please? You can use your cuda-related scripts directly with the annotated builder without having an extra layer.

The main reason at the moment is that I want to decouple LLVM buildbot from the details of the slave implementation. It will give me the ability to tinker with what the slaves are doing without having to make the round-trip to the zorg repo. I.e. I'll land this patch and stop bothering you with reviews of every little WIP change until I have something working.

The second reason is that the details of the slave implementation may be of no use to anyone else. I'm likely to use some things that are Google-specific. If the final version of the script ends up being fairly generic, I'll send a follow-up patch to add it to the zorg repo and switch the bots to use it.

tra added a comment. Jun 12 2020, 12:18 PM

ping! Will the current version of the patch do?

I can temporarily set you up like this in staging. You could do all the experiments you are after, tinker, and so on. Once you are happy, you could prepare a final patch for review.
How long do you expect to be tinkering?

tra added a comment. Jun 12 2020, 3:55 PM

I can temporarily set you up like this in staging. You could do all the experiments you are after, tinker, and so on. Once you are happy, you could prepare a final patch for review.
How long do you expect to be tinkering?

There are two goals -- get CUDA testing back online ASAP, and improve build/test across all the bots we have (i.e. deduplicate builds, optimize cloud use costs, etc.). The former I can do relatively quickly (weeks?); the latter will likely take much longer.

My rough plan is:

  • set up these annotated build slaves in staging
  • set up a bot version to roughly match basic CUDA functionality testing that used to be done by the old CUDA bot
  • replace the old/dead CUDA bot with a replica of the new one, but also keep a copy of the bots in staging, so I can continue tinkering with them.

Does it make sense?

Staging is ready for your experiments.

tra added a comment. Jun 24 2020, 2:37 PM

Staging is ready for your experiments.

Thank you.

It appears I'll need to land external.py in zorg -- AnnotatedBuild factory always checks out a fresh copy of llvm-zorg.
I've attempted to manually copy the script to where the bot is looking for it, but it's a rather fragile arrangement, as the buildmaster apparently ends up cleaning the repo directories now and then and removes it.

Are you OK with me landing the external.py alone for now?

You could put that script somewhere local and give me the fully qualified path to it. I'll set up staging accordingly.

Thanks

Galina

tra added a comment. Jun 24 2020, 3:52 PM

You could put that script somewhere local and give me the fully qualified path to it. I'll set up staging accordingly.

The local copy lives in /buildbot/external.py.

tra added a comment. Jun 25 2020, 1:03 PM

Now the script runs before the LLVM source tree has been checked out, so there are no sources for me to work with. AFAICT, the revision hash is not available in the environment, so there's no way for me to check out the correct revision.

As for this patch in general, I think the best thing to do here is to teach AnnotatedBuilder to accept an external script to launch if it is specified with an absolute path. This way we'll not need external.py at all, and the builders could just be set up with getAnnotatedBuildFactory(script="/buildbot/cuda-build").
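The proposed rule could be sketched roughly as follows. This is only an illustration of the idea, not the AnnotatedBuilder code: resolve_script, annotated_dir, and the "/zorg/annotated" directory are hypothetical names, and posixpath is used because the paths in this thread are POSIX paths.

```python
import posixpath


def resolve_script(script, annotated_dir):
    """Pick the script to launch.

    An absolute path is treated as an external script and used as-is;
    anything else is looked up inside annotated_dir (hypothetical
    location of the in-tree annotated scripts).
    """
    if posixpath.isabs(script):
        return script
    return posixpath.join(annotated_dir, script)


# External script given with an absolute path: returned unchanged.
print(resolve_script("/buildbot/cuda-build", "/zorg/annotated"))
# Relative name: resolved against the in-tree directory.
print(resolve_script("some-script.py", "/zorg/annotated"))
```

With a rule like this the external script never has to live inside the llvm-zorg checkout, so a cleanup of the repo directories would not remove it.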

tra added a comment. Jun 25 2020, 1:09 PM
In D81118#2115258, @tra wrote:

AFAICT, revision hash is not available in the environment, so there's no way for me to check out the correct revision.

BUILDBOT_REVISION= appears to be set for some builds, but not others.
E.g. http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/313/steps/annotate/logs/stdio does not have it
but http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/376/steps/annotate/logs/stdio does.

I guess I can live with that, but it would be great to let the external script run after the source code has been checked out.
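Because the variable is present for some builds and missing for others, an external script would have to treat it as optional. A minimal sketch, assuming only the BUILDBOT_REVISION name quoted above; the requested_revision helper and the sample hash "abc123" are hypothetical:

```python
import os


def requested_revision(environ=os.environ):
    """Return the revision from BUILDBOT_REVISION, or None if it is
    unset or empty (both cases are treated the same here)."""
    revision = environ.get("BUILDBOT_REVISION", "").strip()
    return revision or None


# Simulated environments; a real script would call requested_revision()
# with no arguments to read the process environment.
print(requested_revision({"BUILDBOT_REVISION": "abc123"}))
print(requested_revision({}))
```

When the helper returns None, the script has no revision to check out on its own and can only build whatever sources are already present.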

gkistanova added a comment. Edited Jun 25 2020, 6:29 PM

Now the script runs before the LLVM source tree has been checked out, so there are no sources for me to work with. AFAICT, the revision hash is not available in the environment, so there's no way for me to check out the correct revision.

That was my error. Sorry. Now the source code should be checked out for you by the bot.

tra updated this revision to Diff 276822. Jul 9 2020, 1:14 PM

Updated CUDA bot setup.

tra updated this revision to Diff 277478. Jul 13 2020, 10:13 AM

Removed external.py.

tra added a comment. Jul 13 2020, 10:16 AM

@gkistanova: I think the bots are in reasonable shape now and are ready to move to the normal build master.
This patch simplifies things a bit, based on the changes I've added in D83503, which lets the annotated builder use external scripts (it sounds like that's what you may have done on the staging bot already) and adds an option to control whether the source repo is checked out.

Hello Artem,

Good. Now, since you are done with the experiments, please make sure all your scripts are in llvm-zorg, and update this patch accordingly.

tra added a comment. Jul 22 2020, 11:14 AM

Hello Artem,

Good. Now, since you are done with the experiments, please make sure all your scripts are in llvm-zorg, and update this patch accordingly.

Done. Please see the patches in the stack.

This revision is now accepted and ready to land. Jul 22 2020, 4:07 PM
tra updated this revision to Diff 279967. Jul 22 2020, 4:37 PM

Updated status.py with the new builder names.

This revision was automatically updated to reflect the committed changes.