This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Change nvcc compilation to use a unity build
ClosedPublic

Authored by JonChesterfield on Oct 27 2019, 4:02 PM.

Details

Summary

[libomptarget] Change nvcc compilation to use a unity build

This allows nvcc to inline functions between what would otherwise be distinct
translation units, which in turn removes any runtime cost from implementing
functions in source files (as opposed to inline in headers).

This will then allow the circular dependencies in deviceRTL to be readily
broken and individual components more easily shared between architectures.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2019, 4:02 PM

My test coverage for nvptx is quite poor so I'd be interested if this breaks downstream tests.

JonChesterfield edited the summary of this revision. (Show Details)Oct 27 2019, 4:08 PM

Given that the build model I will advocate includes linking the device RTL into the application (as IR), I don't see missing incremental builds as a real drawback.

I think we should compile unity.cu also when we create a bitcode RTL. (This will simplify the cmake file, remove the need for llvm-link, ...)

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
60

Why is this still needed, or put differently, why do we not compile the unity.cu into bitcode where this variable (cuda_src_files) is still used below?

62

same as above (I think)

My understanding of this cmake is that nvptx is built as both a static archive and as a llvm-link'ed bitcode archive. The former suggests a toolchain that may not be capable of LTO, the latter suggests a toolchain that definitely is. When llvm-link is available, so is opt.

I'd like to compile the translation units separately when we can. Incremental builds don't matter hugely as the build time is negligible. However separate compilation means we don't pick up spurious relationships between source files. E.g. if data_sharing.cu adds a static function that gets called from loop.cu, the concatenated build will work just fine but a standalone one wouldn't. It also means headers must be present in all compilation units that use them, instead of included from at least one unit that gets #included earlier.

Strongly in agreement with your build model of linking the deviceRTL with the application code. I see the header/source/library boundaries as useful for developers and necessary to erase at compile time.

jdoerfert accepted this revision.Oct 30 2019, 4:04 PM

I see. I think we can keep both building modes for now. LGTM.

This revision is now accepted and ready to land.Oct 30 2019, 4:04 PM

Cool. I view this change as pretty low risk - we can move to always using the unity build or back to separate compilation easily.

This revision was automatically updated to reflect the committed changes.