This is an archive of the discontinued LLVM Phabricator instance.

Differential D69489

[libomptarget] Change nvcc compilation to use a unity build
Closed, Public

Authored by JonChesterfield on Oct 27 2019, 4:02 PM.

Details

Reviewers

ABataev
jdoerfert
grokos
RaviNarayanaswamy
hfinkel
ronlieb
gregrodgers

Commits

rGe9f9dfab82bb: [libomptarget] Change nvcc compilation to use a unity build

Summary

This allows nvcc to inline functions between what would otherwise be distinct
translation units, which in turn removes any runtime cost from implementing
functions in source files (as opposed to inline in headers).

This will then allow the circular dependencies in deviceRTL to be readily
broken and individual components more easily shared between architectures.
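As a rough illustration of the inlining point, the sketch below mimics what the compiler sees once a unity file has pulled two source files into one translation unit. The file names (`thread_id.h`, `thread_id.cu`, `loop.cu`) and the functions `get_thread_id` and `first_iteration` are hypothetical and not taken from the deviceRTL. Plain C++ stands in for CUDA so the listing compiles with any host compiler.

```cpp
#include <cassert>

// Sketch only: every file and function name below is hypothetical.

// ---- would be thread_id.h: declaration only ----
int get_thread_id(int block, int lane);

// ---- would be thread_id.cu: out-of-line definition ----
int get_thread_id(int block, int lane) {
  assert(lane >= 0 && lane < 32); // 32 lanes per block is an illustrative constant
  return block * 32 + lane;
}

// ---- would be loop.cu: a caller living in a different source file ----
// Under separate compilation without link time optimisation, the compiler
// sees only the declaration at this point and has to emit a real call.
// In the unity translation unit the definition above is visible, so the
// call becomes a candidate for inlining.
int first_iteration(int block, int lane, int chunk) {
  return get_thread_id(block, lane) * chunk;
}
```

Whether the call is actually inlined remains the optimiser's decision. The unity build only makes the definition visible to it.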

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JonChesterfield created this revision. Oct 27 2019, 4:02 PM

Herald added a project: Restricted Project. Oct 27 2019, 4:02 PM

Herald added subscribers: openmp-commits, mgorny.

Harbormaster completed remote builds in B40114: Diff 226594. Oct 27 2019, 4:02 PM

My test coverage for nvptx is quite poor, so I'd be interested to know whether this breaks downstream tests.

JonChesterfield edited the summary of this revision. Oct 27 2019, 4:08 PM

Given that the build model I will advocate includes linking the device RTL into the application (as IR), I don't see missing incremental builds as a real drawback.

I think we should also compile unity.cu when we create a bitcode RTL. (This will simplify the cmake file, remove the need for llvm-link, ...)

Inline comments on openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt:
Line 60: Why is this still needed, or put differently, why do we not compile the unity.cu into bitcode where this variable (cuda_src_files) is still used below?
Line 62: same as above (I think)

My understanding of this cmake is that nvptx is built both as a static archive and as an llvm-link'ed bitcode archive. The former suggests a toolchain that may not be capable of LTO; the latter suggests a toolchain that definitely is. When llvm-link is available, so is opt.

I'd like to compile the translation units separately when we can. Incremental builds don't matter hugely, as the build time is negligible. However, separate compilation means we don't pick up spurious relationships between source files. E.g. if data_sharing.cu adds a static function that gets called from loop.cu, the concatenated build will work just fine but a standalone one wouldn't. It also means each compilation unit must include the headers it uses itself, instead of relying on a unit that gets #included earlier to have pulled them in.
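The spurious-relationship case can be sketched in a single C++ file that stands in for the concatenated build. The helper `slot_for` and the caller `loop_start` are hypothetical names invented for this illustration; only the file names data_sharing.cu and loop.cu come from the discussion.

```cpp
#include <cassert>

// Sketch only: slot_for and loop_start are hypothetical names.

// ---- would be data_sharing.cu ----
// `static` gives the helper internal linkage, so it is meant to be private
// to this source file.
static int slot_for(int tid) {
  assert(tid >= 0);
  return tid % 32;
}

// ---- would be loop.cu ----
// No declaration of slot_for is included here. Compiled as its own
// translation unit, this file would be rejected because slot_for is
// undeclared. In the concatenated build the definition above happens to be
// in scope, so the accidental dependency compiles without complaint.
int loop_start(int tid, int chunk) {
  return slot_for(tid) * chunk;
}
```

Keeping a separate-compilation configuration alongside the unity build is one way to have the compiler flag such a dependency, which is the motivation given in the comment above.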

Strongly in agreement with your build model of linking the deviceRTL with the application code. I see the header/source/library boundaries as useful for developers and necessary to erase at compile time.

I see. I think we can keep both building modes for now. LGTM.

This revision is now accepted and ready to land. Oct 30 2019, 4:04 PM

Cool. I view this change as pretty low risk - we can move to always using the unity build or back to separate compilation easily.

Closed by commit rGe9f9dfab82bb: [libomptarget] Change nvcc compilation to use a unity build (authored by JonChesterfield). Oct 30 2019, 7:00 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt    2 lines
openmp/libomptarget/deviceRTLs/nvptx/unity.cu          25 lines

Diff 227211

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt

[First 51 lines not shown; enclosing context: set(cuda_src_files]
     src/data_sharing.cu
     src/libcall.cu
     src/loop.cu
     src/omptarget-nvptx.cu
     src/parallel.cu
     src/reduction.cu
     src/sync.cu
     src/task.cu
 )
[Inline comment, jdoerfert: Why is this still needed, or put differently, why do we not compile the unity.cu into bitcode where this variable (cuda_src_files) is still used below?]

 set(omp_data_objects src/omp_data.cu)
[Inline comment, jdoerfert: same as above (I think)]

 # Get the compute capability the user requested or use SM_35 by default.
 # SM_35 is what clang uses by default.
 set(default_capabilities 35)
 if (DEFINED LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
   set(default_capabilities ${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY})
   libomptarget_warning_say("LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY is deprecated, please use LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES")
 endif()
[12 lines not shown; enclosing context: if(${LIBOMPTARGET_NVPTX_DEBUG})]
   set(CUDA_DEBUG -DOMPTARGET_NVPTX_DEBUG=-1 -g --ptxas-options=-v)
 endif()

 # NVPTX runtime library has to be statically linked. Dynamic linking is not
 # yet supported by the CUDA toolchain on the device.
 set(BUILD_SHARED_LIBS OFF)
 set(CUDA_SEPARABLE_COMPILATION ON)
 list(APPEND CUDA_NVCC_FLAGS -I${devicertl_base_directory})
-cuda_add_library(omptarget-nvptx STATIC ${cuda_src_files} ${omp_data_objects}
+cuda_add_library(omptarget-nvptx STATIC unity.cu
     OPTIONS ${CUDA_ARCH} ${CUDA_DEBUG})

 # Install device RTL under the lib destination folder.
 install(TARGETS omptarget-nvptx ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")

 target_link_libraries(omptarget-nvptx ${CUDA_LIBRARIES})

[91 further lines not shown]

openmp/libomptarget/deviceRTLs/nvptx/unity.cu

This file was added.

//===------ unity.cu - Unity build of NVPTX deviceRTL ------------ CUDA -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Support compilers, specifically NVCC, which have not implemented link time
// optimisation. This removes the runtime cost of moving inline functions into
// source files in exchange for preventing efficient incremental builds.
//
//===----------------------------------------------------------------------===//

#include "src/cancel.cu"
#include "src/critical.cu"
#include "src/data_sharing.cu"
#include "src/libcall.cu"
#include "src/loop.cu"
#include "src/omp_data.cu"
#include "src/omptarget-nvptx.cu"
#include "src/parallel.cu"
#include "src/reduction.cu"
#include "src/sync.cu"
#include "src/task.cu"