
[CMake][OpenMP] Customize default offloading arch
Closed, Public

Authored by Hahnfeld on Oct 13 2017, 8:07 AM.

Details

Summary

For the shuffle instructions in reductions we need at least sm_30
but the user may want to customize the default architecture.
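As a sketch of the intended usage (the exact CMake variable name below is an assumption based on the patch title, not confirmed by this summary), the default offloading architecture could be overridden at configure time, or per invocation on the clang command line:

```shell
# Hypothetical configure-time override of the default OpenMP offloading
# architecture (variable name assumed for illustration):
cmake -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_60 ../llvm

# Per-invocation equivalent on the clang command line:
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      -Xopenmp-target -march=sm_60 test.c
```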

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld created this revision.Oct 13 2017, 8:07 AM
tra added a subscriber: tra.Oct 13 2017, 9:29 AM
tra added inline comments.
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

I'd keep this code. It appears to serve a useful purpose: it requires the CUDA installation to have at least some libdevice library in it. This gives us a chance to find a valid installation, instead of failing some time later when we ask for a libdevice file and there are none.

556 ↗(On Diff #118912)

This sets the default GPU arch for *everyone* based on OpenMP requirements. Perhaps this should be predicated on the compilation being an OpenMP compilation.

IMO, to avoid unnecessary surprises, the default for CUDA compilation should follow the defaults of nvcc; sm_30 becomes the default only in CUDA 9.

Hahnfeld marked an inline comment as done.Oct 13 2017, 10:15 AM
Hahnfeld added inline comments.
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

We had some internal discussions about this after I submitted the patch here.

The main question is: Do we want to support CUDA installations without libdevice, and are there use cases for that? I'd say the user should be able to use a toolchain without libdevice together with -nocudalib.
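A sketch of that use case (the installation path here is hypothetical): point clang at a CUDA toolchain that lacks libdevice and explicitly opt out of linking it.

```shell
# --cuda-path selects the CUDA installation; -nocudalib tells clang not to
# look for (or link against) the libdevice bitcode library.
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      --cuda-path=/opt/cuda-minimal -nocudalib test.c
```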

540 ↗(On Diff #118912)

This check guards the whole block.

556 ↗(On Diff #118912)

This branch is only executed for OpenMP; see above.

tra added inline comments.Oct 13 2017, 10:52 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Sounds reasonable. How about keeping the code but putting it under if (!hasArg(nocudalib))?

556 ↗(On Diff #118912)

OK. I've missed that 'if'.

Hahnfeld marked 4 inline comments as done.Oct 13 2017, 10:58 AM
Hahnfeld added inline comments.
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Ok, I'll do that in a separate patch and keep the code here for now.

gtbercea added inline comments.Oct 13 2017, 11:02 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

The problem with -nocudalib is this: suppose you write a test that verifies some device-facing feature requiring libdevice to be found (so you don't want to use -nocudalib). It will probably work on your machine, which has a correct CUDA setup, but fail on another machine that does not (which is where you would want to use -nocudalib). You can see the contradiction there.

gtbercea added inline comments.Oct 13 2017, 11:04 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Just to be clear I am arguing for keeping this code :)

gtbercea added inline comments.Oct 13 2017, 11:05 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

I would also like to keep the spirit of this code if not in this exact form at least something that performs the same functionality.

tra added inline comments.Oct 13 2017, 11:11 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

@gtbercea: I'm not sure I follow your example. If you're talking about clang tests, we have a fake CUDA installation set up under test/Driver/Inputs, which removes the dependency on whatever CUDA you may or may not have installed on your machine. I also don't see a contradiction: if you do need libdevice, there is no point in picking a broken CUDA installation that does not have any libdevice files. If you explicitly tell the compiler that you don't need libdevice, that makes a CUDA installation without libdevice acceptable. And with --cuda-path you do have a way to tell clang which installation to use. What am I missing?

tra added inline comments.Oct 13 2017, 11:13 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Ah, you were arguing against @Hahnfeld's -nocudalib example. Then I guess we're in violent agreement.

gtbercea added inline comments.Oct 13 2017, 11:16 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

I fully agree with this: "if you do need libdevice, there is no point in picking a broken CUDA installation that does not have any libdevice files. If you explicitly tell the compiler that you don't need libdevice, that makes a CUDA installation without libdevice acceptable."

I was trying to give an example of a situation where your code compiles on one machine without -nocudalib, and then the same code errors on another machine that requires the -nocudalib flag to make up for the absence of libdevice.

gtbercea added inline comments.Oct 13 2017, 11:17 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Yes, it was a counter-argument to that! :)

tra added a reviewer: tra.Oct 13 2017, 11:19 AM
tra removed a subscriber: tra.
gtbercea added inline comments.Oct 13 2017, 11:19 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

@tra, what's your opinion on this code? Should it stay, stay but be modified to be more robust, or be taken out completely?

tra added inline comments.Oct 13 2017, 11:25 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

There are currently no users of this. In general, I would rather not have a magically-changing default GPU arch based on how broken your CUDA installation is. IMO it would be better to keep the defaults static and fail if the prerequisites are not met.

gtbercea added inline comments.Oct 13 2017, 11:39 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

I would have thought that it is up to the compiler to select, as the default, the lowest viable compute capability. That is what this code aims to do (whether it actually does is a separate issue :) ).

gtbercea added inline comments.Oct 13 2017, 11:41 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

The reason I added this code in the first place was to overcome the fact that a default such as sm_30 may work on the K40, but once you move to newer Pascal or Volta GPUs, you need a higher minimum supported compute capability.

Hahnfeld updated this revision to Diff 118961.Oct 13 2017, 12:55 PM
Hahnfeld marked an inline comment as done.
Hahnfeld edited the summary of this revision. (Show Details)

Check that the user didn't specify a value lower than sm_30, and re-add some code as discussed.

tra added inline comments.Oct 13 2017, 1:13 PM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

Should this stay, stay but modified to be more robust or taken out completely?

I'd take it out, at least for now, as you've removed the only user of that function.

In general, though, compilers tend to use conservative defaults, and for CUDA that would be the lowest GPU variant supported by the compiler. In CUDA's case that is determined by the CUDA SDK version. Figuring out the lowest supported version via the libdevice mapping we've created is wrong: e.g. with this patch and -nocudalib, you may end up using CUDA 9 without any libdevice and would return sm_20.

If/when we need to figure out minimum supported version, it should be based directly on the value returned by version().

tra accepted this revision.Oct 16 2017, 1:48 PM
This revision is now accepted and ready to land.Oct 16 2017, 1:48 PM
This revision was automatically updated to reflect the committed changes.