⚙ D81118 [buildbot] Added builders and slaves for the new CUDA build/test bots.

tra created this revision. Jun 3 2020, 2:12 PM

Herald added subscribers: sanjoy.google, bixia, yaxunl. Jun 3 2020, 2:12 PM

Harbormaster completed remote builds in B58984: Diff 268297. Jun 3 2020, 2:20 PM

Changed bot/slave structure

Fixed notification list.

Harbormaster completed remote builds in B58988: Diff 268307. Jun 3 2020, 2:56 PM

Harbormaster completed remote builds in B58989: Diff 268308.

Hello Artem.

I have commented on getCUDAAnnotatedBuildFactory, but it doesn't seem that you need a special build factory for your builders.
Just use the existing AnnotatedBuilder.getAnnotatedBuildFactory in builders.py for your build configurations.

zorg/buildbot/builders/CUDATestsuiteBuilder.py
224 ↗	(On Diff #268308)	You don't need to handle `is_legacy_mode`, so it is better to skip it here and let the default machinery handle it.
227 ↗	(On Diff #268308)	`is_legacy_mode` is used twice.
254 ↗	(On Diff #268308)	`bot_dir` is not defined.
257 ↗	(On Diff #268308)	You can skip it and let the `LLVMBuildFactory` handle it by default.
258 ↗	(On Diff #268308)	Should it be `clean=clean` here instead?
263 ↗	(On Diff #268308)	You do not use the `builddir` property. Even if you needed an absolute path into your build directory, you would not need a build step for it. There is a `workdir` property; just use that instead. For example, something like `WithProperties("%(workdir)s/" + f.obj_dir + "/bin/clang++")` would give you a fully qualified path to the installed clang++.
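
A minimal sketch of the `workdir` suggestion above. Buildbot's `WithProperties` fills `%(name)s` placeholders from the build's properties; the snippet below only imitates that with plain Python `%`-formatting so it runs without buildbot installed. The `render` helper, the property value, and `obj_dir = "build"` are assumptions for illustration and are not part of zorg.

```python
# Stand-in for buildbot property interpolation: WithProperties-style
# templates use %(name)s placeholders filled from the build properties.
def render(template, properties):
    """Fill %(name)s placeholders from a properties mapping."""
    return template % properties


# Hypothetical values; on a real worker buildbot supplies `workdir`.
properties = {"workdir": "/b/clang-cuda-build"}
obj_dir = "build"  # assumed stand-in for f.obj_dir

# Same string concatenation as in the review comment.
clang_template = "%(workdir)s/" + obj_dir + "/bin/clang++"
clang_path = render(clang_template, properties)
print(clang_path)
```

The point of the design is that the path is resolved when the build runs, so no separate build step is needed just to discover the directory.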

This revision now requires changes to proceed. Jun 7 2020, 11:20 PM

Herald added a reviewer: aaron.ballman. Jun 7 2020, 11:20 PM

Addressed review comments.

Harbormaster completed remote builds in B59518: Diff 269302. Jun 8 2020, 12:08 PM

Use AnnotatedBuilder with an external build script.

Harbormaster completed remote builds in B59546: Diff 269351. Jun 8 2020, 1:55 PM

In D81118#2078854, @gkistanova wrote:

Hello Artem.

I have commented on getCUDAAnnotatedBuildFactory, but it doesn't seem that you need a special build factory for your builders.
Just use the existing AnnotatedBuilder.getAnnotatedBuildFactory in builders.py for your build configurations.

Thank you for pointing this out. I missed the fact that AnnotatedBuildFactory already includes LLVMBuildFactory.
I've changed the patch to use AnnotatedBuildFactory plus a script that launches an external command.

Thanks for updating the patch, Artem.

Could you elaborate why you need the script launcher (zorg/buildbot/builders/annotated/external.py), please? You can use your cuda-related scripts directly with the annotated builder without having an extra layer.

In D81118#2081147, @gkistanova wrote:

Thanks for updating the patch, Artem.

Could you elaborate why you need the script launcher (zorg/buildbot/builders/annotated/external.py), please? You can use your cuda-related scripts directly with the annotated builder without having an extra layer.

The main reason at the moment is that I want to decouple LLVM buildbot from the details of the slave implementation. It will give me the ability to tinker with what the slaves are doing without having to make the round-trip to the zorg repo. I.e. I'll land this patch and stop bothering you with reviews of every little WIP change until I have something working.

The second reason is that the details of the slave implementation may be of no use to anyone else. I'm likely to use some things that are Google-specific. If the final version of the script ends up being fairly generic, I'll send a follow-up patch to add it to the zorg repo and switch the bots to use it.

ping! Will the current version of the patch do?

I can temporarily set you up like this in the staging. You would do all the experiments you are after, tinker, and such. Once you are happy you could prepare a final patch for the review.
For how long do you expect tinkering?

In D81118#2090901, @gkistanova wrote:

I can temporarily set you up like this in the staging. You would do all the experiments you are after, tinker, and such. Once you are happy you could prepare a final patch for the review.
For how long do you expect tinkering?

There are two goals: get CUDA testing back online ASAP, and improve build/test across all the bots we have (i.e. deduplicate builds, optimize cloud use costs, etc.). The former I can do relatively quickly (weeks?); the latter will likely take much longer.

My rough plan is:

1. Set up these annotated build slaves in staging.
2. Set up a bot version that roughly matches the basic CUDA functionality testing that used to be done by the old CUDA bot.
3. Replace the old/dead CUDA bot with a replica of the new one, but also keep a copy of the bots in staging, so I can continue tinkering with them.

Does it make sense?

Staging is ready for your experiments.

In D81118#2092283, @gkistanova wrote:

Staging is ready for your experiments.

Thank you.

It appears I'll need to land external.py in zorg, because the AnnotatedBuild factory always checks out a fresh copy of llvm-zorg.
I've attempted to manually copy the script to where the bot is looking for it, but it's a rather fragile arrangement: the buildmaster apparently cleans the repo directories now and then, which removes the script.

Are you OK with me landing the external.py alone for now?

You could put that script somewhere local and give me the fully qualified path to that script. I'll set the staging accordingly.

Thanks

Galina

In D81118#2112774, @gkistanova wrote:

You could put that script somewhere local and give me the fully qualified path to that script. I'll set the staging accordingly.

The local copy lives in /buildbot/external.py

Here's an example of the failure to find external.py: http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/303/steps/annotate/logs/stdio

I have changed the staging to invoke /buildbot/external.py.
http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/313/steps/annotate/logs/stdio

In D81118#2113147, @gkistanova wrote:

I have changed the staging to invoke /buildbot/external.py.
http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/313/steps/annotate/logs/stdio

Now the script runs before the LLVM source tree has been checked out, so there are no sources for me to work with. AFAICT, the revision hash is not available in the environment, so there's no way for me to check out the correct revision.

As for this patch in general, I think the best thing to do here is to teach AnnotatedBuilder to accept external scripts to launch if they are specified with an absolute path. This way we'll not need external.py at all, and the builders could be set up with just getAnnotatedBuildFactory(script="/buildbot/cuda-build").
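
One way the absolute-path idea could look, as a sketch only. The helper name `resolve_script` and the script name `example-build.py` are hypothetical, and POSIX-style paths are assumed for the workers; the actual AnnotatedBuilder change may well differ.

```python
import posixpath  # assumption: workers use POSIX-style paths

# Directory of the bundled annotated scripts inside the llvm-zorg checkout,
# as named earlier in this thread.
ANNOTATED_DIR = "zorg/buildbot/builders/annotated"


def resolve_script(script, annotated_dir=ANNOTATED_DIR):
    """Return the path the annotate step should run.

    Absolute paths are used as-is (an external script that already lives
    on the worker); anything else is looked up in the llvm-zorg checkout.
    """
    if posixpath.isabs(script):
        return script
    return posixpath.join(annotated_dir, script)


print(resolve_script("/buildbot/cuda-build"))
print(resolve_script("example-build.py"))  # hypothetical bundled script
```

With a rule like this, existing builders that pass a bare script name keep their behavior, and only builders that opt in with an absolute path bypass the checkout.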

In D81118#2115258, @tra wrote:

AFAICT, revision hash is not available in the environment, so there's no way for me to check out the correct revision.

BUILDBOT_REVISION= appears to be set for some builds, but not others.
E.g. http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/313/steps/annotate/logs/stdio does not have it
but http://lab.llvm.org:8014/builders/clang-cuda-gce-build/builds/376/steps/annotate/logs/stdio does.

I guess I can live with that, but it would be great to let the external script run after the source code has been checked out.
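
Since `BUILDBOT_REVISION` is only set for some builds, an external script probably has to treat it as optional. A minimal sketch, assuming the script falls back to whatever is already checked out when the variable is missing or empty; `pick_revision` and the `"HEAD"` fallback are illustrative choices, not something the bot defines.

```python
import os


def pick_revision(environ=os.environ, fallback="HEAD"):
    """Return the revision to build, tolerating a missing or empty variable."""
    revision = environ.get("BUILDBOT_REVISION", "").strip()
    return revision or fallback


# Explicit mappings are passed here so the example does not depend on
# the real process environment.
print(pick_revision({"BUILDBOT_REVISION": "ddbbbd3e88c0"}))
print(pick_revision({}))
```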

Now the script runs before the LLVM source tree has been checked out, so there are no sources for me to work with. AFAICT, the revision hash is not available in the environment, so there's no way for me to check out the correct revision.

That was my error. Sorry. Now the source code should be checked out for you by the bot.

Updated CUDA bot setup.

Harbormaster completed remote builds in B63632: Diff 276822. Jul 9 2020, 1:14 PM

tra added a parent revision: D83503: [buildbot] Annotated builder tweaks. Jul 9 2020, 1:14 PM

Removed external.py

Harbormaster completed remote builds in B63993: Diff 277478. Jul 13 2020, 10:13 AM

@gkistanova: I think the bots are in reasonable shape now and are ready to move to the normal build master.
This patch simplifies things a bit based on the changes I've made in D83503, which lets the annotated builder use external scripts (it sounds like that's what you may have done on the staging bot already) and adds an option to control whether the source repo is checked out.

Hello Artem,

Good. Now, since you are done with the experiments, please make sure all your scripts are in llvm-zorg, and update this patch accordingly.

In D81118#2149139, @gkistanova wrote:

Hello Artem,

Good. Now, since you are done with the experiments, please make sure all your scripts are in llvm-zorg, and update this patch accordingly.

Done. Please see the patches in the stack.

LGTM

This revision is now accepted and ready to land. Jul 22 2020, 4:07 PM

Updated status.py with the new builder names.

Harbormaster completed remote builds in B65302: Diff 279967. Jul 22 2020, 4:37 PM

Closed by commit rZORGddbbbd3e88c0: [buildbot] Add Builders and slaves for the new CUDA bot. (authored by tra). Jul 22 2020, 4:53 PM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

Revision Contents

Diff 279970

buildbot/osuosl/master/config/builders.py

buildbot/osuosl/master/config/slaves.py

buildbot/osuosl/master/config/status.py