This is an archive of the discontinued LLVM Phabricator instance.

[zorg] Add buildbot for OpenMP on AMDGPU
ClosedPublic

Authored by david-salinas on Jul 27 2021, 6:01 PM.

Details

Summary

Build OpenMP for AMDGPU target

Event Timeline

david-salinas created this revision.Jul 27 2021, 6:01 PM
david-salinas requested review of this revision.Jul 27 2021, 6:01 PM

nice start David, i am not a buildbot expert, so i will defer to Galina and Johannes
also adding Michael Kruse would be good. @Meinersbur

Very cool, is this currently passing?

ashi1 added a comment.Jul 28 2021, 8:29 AM

Same question as @jdoerfert , and are we passing SOLVVEVV inside llvm-testsuite?

buildbot/osuosl/master/config/builders.py
1375

Why do we need this separate build, if we already have openmp-offload-amdgpu-runtime below? Is there a benefit to running this without -DLLVM_ENABLE_RUNTIMES=openmp?

buildbot/osuosl/master/config/status.py
268

please remove this comment

275

builders should be the builder name, either openmp-offload-amdgpu-project or openmp-offload-amdgpu-runtime, or both in this case.

buildbot/osuosl/master/config/workers.py
257

please remove this comment

Responded to some comments. I will publish a new patch with changes to address the comments shortly.

buildbot/osuosl/master/config/builders.py
1375

There is an potential race condition where the runtime build needs 'opt' already built, and if it isn't then the runtime build fails. So, we suspect doing it this way alleviates the potential for this race condition.

buildbot/osuosl/master/config/status.py
268

ACK

buildbot/osuosl/master/config/workers.py
257

ACK

Patch 2: addressed comments from ashi1

@david-salinas: Any reason to remove the offloading part? I added it to the CUDA builder to make clear that the other OpenMP builders (such as openmp-gcc-x86_64-linux-debian) don't test libomptarget, not even host-offloading. If you argue that that information is redundant, I would remove it from the CUDA builders as well to have some uniformity.

Same question as @jdoerfert , and are we passing SOLVVEVV inside llvm-testsuite?

We are not. I found it useful to track progress on standard-conformance. If everything other than SOLLVE-V&V would pass, it would mark the build as warning. However, @gkistanova seems to be against builds in a permanent warnings state (compiler warnings don't trigger that state) for in-production builds. Hence, I would remove the SOLLVE-V&V steps for openmp-offload-cuda-project and openmp-offload-cuda-runtime if moving to production. Therefore I suggest to set testsuite_sollvevv=False set as well.

buildbot/osuosl/master/config/builders.py
1375

For openmp-offload-cuda, there is a LLVM_ENABLE_PROJECTS=openmp and a LLVM_ENABLE_RUNTIMES=openmp build because I want to ensure that both supported build configurations are working.

Whether we want to drop support for LLVM_ENABLE_PROJECTS=openmp is a different subject. Personally, I find it very useful for tools (code completion, include-what-you-use, sanitizer builds, code coverage, static analyzers, ccache, IDE integration, ...) that are difficult or need explicit support to do on nested cmake builds.

Meinersbur added inline comments.Jul 28 2021, 11:52 PM
buildbot/osuosl/master/config/builders.py
1383

[serious] Don't set the install prefix. It is also set by getOpenMPCMakeBuildFactory

1385

Use depends_on_projects=[''llvm,'clang','openmp','lld'] instead

1388

I suspect this will not work. It would require that the command line is executed by a shell. There is the WithProperties class to pass worker-side properties.

For incremental builds (clean=False), using ccache has little value, maybe if there is a revert. But I suspect that the increased overhead by ccache would not make worth it.

1389

You may want to build LLVM itself using lld as well (assuming it is available on the build systen): -DLLVM_ENABLE_LLD=ON"

1390

LLVM_PARALLEL_LINK_JOBS is likely not needed. If your system has sufficient memory, I suggest to remove it.

1406

The LLVM_ENABLE_RUNTIMES build does not work well with incremental builds at the moment. See D106781.

1416

LLVM_ENABLE_RUNTIMES is redundant with enable_runtimes=['openmp']

buildbot/osuosl/master/config/workers.py
257

How much CPU/GPU RAM?

Patch 3: addressed comments from Meinersbur

david-salinas marked 13 inline comments as done.Jul 29 2021, 10:30 AM

Uploaded new patch (3) to address comments from Meinersbur, and mark comments Done.

@david-salinas: Any reason to remove the offloading part? I added it to the CUDA builder to make clear that the other OpenMP builders (such as openmp-gcc-x86_64-linux-debian) don't test libomptarget, not even host-offloading. If you argue that that information is redundant, I would remove it from the CUDA builders as well to have some uniformity.

Same question as @jdoerfert , and are we passing SOLVVEVV inside llvm-testsuite?

We are not. I found it useful to track progress on standard-conformance. If everything other than SOLLVE-V&V would pass, it would mark the build as warning. However, @gkistanova seems to be against builds in a permanent warnings state (compiler warnings don't trigger that state) for in-production builds. Hence, I would remove the SOLLVE-V&V steps for openmp-offload-cuda-project and openmp-offload-cuda-runtime if moving to production. Therefore I suggest to set testsuite_sollvevv=False set as well.

Sorry @Meinersbur , I didn't see the comment initially. I initially thought it was a bit redundant. But I see your point. I'll add back "offload" to the name in a new patch.

Patch 4: rename builders to include offload in name

@gkistanova and @Meinersbur I've pushed a new patch, re-adding "offload" to the Build names. Do you think you could take a look at this today? We're hoping to get this buildbot up and running on Staging today if possible. Thanks!

JonChesterfield added inline comments.
buildbot/osuosl/master/config/builders.py
1375

There's a patch at https://reviews.llvm.org/D107156 which may have fixed the opt dependency. If it has, we should add llvm-link in similar fashion, and then all may be well.

1375

The challenge for ENABLE_PROJECTS is the clang used has to be a very close match to the one that corresponds to the deviceRTL, sometimes the exact same hash from the monorepo when the internal API between the two changes. That makes it very error prone to use. We could mostly resolve that hazard by having cmake check for an exact hash/version of clang when building via ENABLE_PROJECTS

1390

Comment further down claims 16gb which is unlikely to be sufficient

ronlieb added inline comments.Aug 2 2021, 5:32 PM
buildbot/osuosl/master/config/builders.py
1390

64 GB for CPU
16 GB for GPU

Patch 6: correct memory details in workers.py

david-salinas marked 2 inline comments as done.Aug 3 2021, 9:20 AM
david-salinas added inline comments.
buildbot/osuosl/master/config/builders.py
1390

@JonChesterfield I've posted a new patch with the memory clarified.

@gkistanova @jdoerfert Could you take a look at my latest patch. I think I've addressed all issues. Thanks, Dave

This revision is now accepted and ready to land.Aug 4 2021, 12:27 AM
david-salinas closed this revision.Aug 4 2021, 10:08 AM