This is an archive of the discontinued LLVM Phabricator instance.


Differential D108291

[clang-nvlink-wrapper] Wrapper around nvlink for archive files
Closed, Public

Authored by saiislam on Aug 18 2021, 4:50 AM.

Details

Reviewers

jdoerfert
ye-luo
grokos
ABataev
JonChesterfield

Commits

rG83f3782c6129: [clang-nvlink-wrapper] Wrapper around nvlink for archive files

Summary

nvlink does not support linking cubin files that are packed in an archive.
This tool extracts all the cubin files from the given device-specific archive
and passes them to nvlink. It is required for linking static device libraries.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
33,160 ms	x64 debian > libFuzzer.libFuzzer::fork.test

Event Timeline

saiislam created this revision. Aug 18 2021, 4:50 AM

Herald added subscribers: mstorsjo, mgorny. Aug 18 2021, 4:50 AM

saiislam requested review of this revision. Aug 18 2021, 4:50 AM

Herald added a project: Restricted Project. Aug 18 2021, 4:50 AM

Herald added a subscriber: cfe-commits.

Tests. The usual bug in this area is that an archive can contain multiple files with the same name which clobber each other if extracted to the same directory. It looks to me like this implementation will fail that test.

Also, motivation? Seems this can be worked around by not putting cubin files in an archive.

Edit: this will fail to implement the archive semantics of only pulling in used files, i.e. it is implicitly -whole-archive. Suggest error unless whole-archive is requested, as then it does the right thing today while leaving us the option to implement the correct linker semantics later.

Side point: really not a fan of incorrectly implementing bits of a linker as standalone tools called from clang. Can we raise a bug report against nvlink instead of doing this?

JonChesterfield added a reviewer: JonChesterfield. Aug 18 2021, 5:04 AM

saiislam mentioned this in D105191: [Clang][OpenMP] Add partial support for Static Device Libraries. Aug 18 2021, 5:06 AM

In D108291#2951969, @JonChesterfield wrote:

> Tests. The usual bug in this area is that an archive can contain multiple files with the same name which clobber each other if extracted to the same directory. It looks to me like this implementation will fail that test.

While extracting cubin files from the archive, each output file gets a new name from llvm::sys::fs::createTemporaryFile, so there won't be any clobbering.
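The no-clobber argument can be illustrated with a minimal sketch in standard C++. It does not call llvm::sys::fs::createTemporaryFile; the hypothetical helper `assignTempPaths` stands in for it and uses a running index where the real function is assumed to pick a unique name of its own. The `/tmp` default is likewise an assumption. The only point is that two members named `a.cubin` end up at distinct paths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for per-member temporary file creation: every member
// gets its own path, even when two members share a file name.
std::vector<std::string>
assignTempPaths(const std::vector<std::string> &MemberNames,
                const std::string &TmpDir = "/tmp") {
  std::vector<std::string> Paths;
  for (std::size_t I = 0; I < MemberNames.size(); ++I) {
    const std::string &Name = MemberNames[I];
    // Split "a.cubin" into prefix "a" and suffix "cubin" at the first '.'.
    std::size_t Dot = Name.find('.');
    std::string Prefix = Name.substr(0, Dot);
    std::string Suffix =
        Dot == std::string::npos ? std::string() : Name.substr(Dot + 1);
    // The running index keeps paths unique regardless of the member name.
    std::string Path = TmpDir + "/" + Prefix + "-" + std::to_string(I);
    if (!Suffix.empty())
      Path += "." + Suffix;
    Paths.push_back(Path);
  }
  return Paths;
}
```

With members `a.cubin`, `a.cubin` and `b`, the sketch yields three different paths, so the second `a.cubin` cannot overwrite the first.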

> Also, motivation? Seems this can be worked around by not putting cubin files in an archive.

It is required for linking static device libraries on nvptx (D105191). Given a fat heterogeneous archive, clang-offload-bundler will create a device-specific archive, which will be passed to llvm-link for amdgpu and to nvlink-wrapper for nvptx.

> Edit: this will fail to implement the archive semantics of only pulling in used files, i.e. it is implicitly -whole-archive. Suggest error unless whole-archive is requested, as then it does the right thing today while leaving us the option to implement the correct linker semantics later.

llvm-link and nvlink-wrapper both work with whole-archive semantics.

> Side point: really not a fan of incorrectly implementing bits of a linker as standalone tools called from clang. Can we raise a bug report against nvlink instead of doing this?

I also don't like adding a new tool just for one additional layer of wrapping, but I couldn't find a feasible solution on my own. Another alternative is to extend llvm-ar to support an "output directory" option, but the suggestion in last week's multi-company meeting was to go with nvlink-wrapper instead of extending llvm-ar. Let's file a bug report against nvlink for supporting archive files; as soon as that is available we can drop this tool. Dropping it requires only a one-line change in Cuda.cpp.

JonChesterfield mentioned this in D93525: [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives. Aug 18 2021, 5:19 AM

Harbormaster completed remote builds in B120103: Diff 367179. Aug 18 2021, 5:49 AM

These are the working steps in the linking script.

$ clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 -O3 -DNDEBUG CMakeFiles/cxx.complex_reduction.x.dir/complex_reduction.cpp.o -o ../bin/cxx.complex_reduction.x -v

clang-offload-bundler (host,device)
in: complex_reduction.cpp.o
out: complex_reduction-494ba8.o, complex_reduction-5aba63.cubin

nvlink (device)
in: complex_reduction-5aba63.cubin
out: complex_reduction-b1898c.out

clang-offload-wrapper (device)
in: complex_reduction-b1898c.out
out: cxx-a8318a.bc

clang (device)
in: cxx-a8318a.bc cxx-e54e6f.o

ld (host, device)
in: complex_reduction-494ba8.o, cxx-e54e6f.o
out: executable

I don't quite understand what this wrapper replaces and why.
"It is required for linking static device libraries on nvptx" does not explain what fails with the existing steps or what clang-nvlink-wrapper changes to make it work. Please elaborate.

clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
42	Typo: "arcive" -> "archive". <objects> is already an input, so what does "The wrapper extracts any arcive objects" mean? In "call nvlink with the individual files", which individual files? What is the output? Please make this documentation clearer.

Discussed at the multicompany meeting today. Consensus reached is that the whole-archive/no-whole-archive distinction is unimportant - we can ship a toolchain that has whole-archive semantics and later change the default when doing more work in the linker. It's not my idea of backwards compatibility but I'm sympathetic to the argument that there's enough churn in gpu offloading that this particular point is lost in the noise.
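For readers unfamiliar with the distinction, here is a toy model in standard C++ of the two behaviours being compared. It is only a sketch: `Member`, `selectWholeArchive` and `selectUsedMembers` are hypothetical names, symbols are plain strings, and no real linker resolves symbols this simply.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy archive member: a name plus the symbols it defines and the ones it needs.
struct Member {
  std::string Name;
  std::set<std::string> Defines;
  std::set<std::string> Needs;
};

// Whole-archive semantics: every member is linked, used or not.
std::vector<std::string>
selectWholeArchive(const std::vector<Member> &Archive) {
  std::vector<std::string> Out;
  for (const Member &M : Archive)
    Out.push_back(M.Name);
  return Out;
}

// Conventional archive semantics: pull a member only if it defines a symbol
// that is still undefined, and repeat until nothing new is pulled in.
std::vector<std::string>
selectUsedMembers(const std::vector<Member> &Archive,
                  std::set<std::string> Undefined) {
  std::vector<std::string> Out;
  std::set<std::string> Defined;
  std::vector<bool> Taken(Archive.size(), false);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Archive.size(); ++I) {
      if (Taken[I])
        continue;
      bool Resolves = false;
      for (const std::string &Sym : Archive[I].Defines)
        if (Undefined.count(Sym))
          Resolves = true;
      if (!Resolves)
        continue;
      Taken[I] = true;
      Changed = true;
      Out.push_back(Archive[I].Name);
      // Symbols defined by this member are now resolved.
      for (const std::string &Sym : Archive[I].Defines) {
        Defined.insert(Sym);
        Undefined.erase(Sym);
      }
      // Its own unresolved references may pull in further members.
      for (const std::string &Sym : Archive[I].Needs)
        if (!Defined.count(Sym))
          Undefined.insert(Sym);
    }
  }
  return Out;
}
```

Given an archive where `a.cubin` defines `foo` and needs `bar`, `b.cubin` defines `bar`, and `c.cubin` defines only an unused symbol, a link that needs `foo` takes all three members under whole-archive semantics but only the first two otherwise.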

The code itself looks basically OK to me: it extracts files and passes them unchanged into nvlink along with the other files. Error handling is incomplete - we detect some things going wrong, but end up returning 0 anyway. Bunch of comments inline.

It would be nicer from a unix perspective if the tool read the archives and returned a list of files to pass to nvlink, in pipeline fashion, but that'll make temporary file cleanup hazardous. Forking nvlink seems better for that reason.

Might be a nice usability feature for --help to print some stuff then invoke nvlink with --help itself. If this is ~ a perfect filter, aside from expanding archives, it could conceivably be named nvlink and used wherever nvlink is.

clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
28	what's unistd used for here? can we drop this?
124	should lose the commented out code
128	this seems a fairly likely failure mode - perhaps we should find nvlink first, and only start writing archives members into temporary files etc after establishing that it exists
137	This unconditionally writes an nvlink invocation to stderr. Not good post debugging the first implementation, programs should execute silently when successful. We could look for a debugging flag / environment variable if necessary for debugging this in the field
138	there's a risk that the large number of arguments that results here exceeds the platform limitations. I don't know offhand if ExecuteAndWait handles that (e.g. by creating response files), or if nvlink understands response files
157	there's a using namespace llvm above, can remove a lot of llvm:: prefixes
197	does this name the file it couldn't delete?
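The argument-count concern on line 138 could be handled along the following lines. This is a sketch in standard C++ under loud assumptions: `commandLineLength` and `maybeUseResponseFile` are hypothetical helpers, the length limit is whatever the caller picks, and, as the comment itself says, it is not established here that nvlink accepts `@file` response files.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Approximate size of the command line: each argument plus one separator.
std::size_t commandLineLength(const std::vector<std::string> &Args) {
  std::size_t Len = 0;
  for (const std::string &A : Args)
    Len += A.size() + 1;
  return Len;
}

// Returns Args unchanged when they fit within MaxLen; otherwise returns a
// single "@<RspPath>" argument. In the second case the caller is expected to
// write Args into the file at RspPath before launching the tool.
std::vector<std::string>
maybeUseResponseFile(const std::vector<std::string> &Args, std::size_t MaxLen,
                     const std::string &RspPath) {
  if (commandLineLength(Args) <= MaxLen)
    return Args;
  return {"@" + RspPath};
}
```

Checking the length before spawning keeps the common case (a handful of cubin files) unchanged and only takes the response-file path for very large archives.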

Added documentation.
Improved error handling.

Herald added a subscriber: sstefan1. Aug 23 2021, 1:28 PM

Harbormaster completed remote builds in B120859: Diff 368202. Aug 23 2021, 2:01 PM

ye-luo added inline comments. Aug 23 2021, 2:11 PM

clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
19	Right now clang-offload-bundler is only used to create an object file for the host and a cubin file for the device. The cubin files are then passed to nvlink. This is different from what you described: "clang-offload-bundler creates a device specific archive of cubin files. Such an archive is then passed to this tool to extract cubin files before passing to nvlink." Is this caused by changes in https://reviews.llvm.org/D105191? Do you have any reading material that documents the whole linking flow of D105191?

saiislam marked an inline comment as done. Aug 24 2021, 12:35 AM

saiislam added inline comments.

clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
19	Yes, this patch is required for D105191 to work correctly on nvptx. Once this patch lands, I will update D105191 to call "clang-nvlink-wrapper" instead of "nvlink" in clang/lib/Driver/ToolChains/Cuda.cpp::NVPTX::OpenMPLinker::ConstructJob(). Greg Rodgers presented on static device libraries at last year's LLVM-CTH Workshop. In summary, the clang driver generates the following commands to deal with heterogeneous device libraries:
	device-specific-archive.a <== clang-offload-bundler(heterogeneous-device-archive.a, current-device)
	If (amdgpu)
		linked-output <== llvm-link(device-specific-archive.a)
	If (nvptx)
		extracted-cubins.cubin <== nvlink-wrapper(device-specific-archive.a)
		linked-output <== nvlink(extracted-cubins.cubin)
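The flow described in this reply can be restated as a small sketch in standard C++. The tool names come from the reply; the `DeviceKind` enum and the `deviceLinkTools` function are hypothetical and do not correspond to actual driver code.

```cpp
#include <string>
#include <vector>

// Hypothetical device categories, matching the two branches in the reply.
enum class DeviceKind { AMDGPU, NVPTX };

// Lists, in order, the tools that handle a heterogeneous device archive.
std::vector<std::string> deviceLinkTools(DeviceKind Kind) {
  // Both targets first unbundle the archive for the current device.
  std::vector<std::string> Tools = {"clang-offload-bundler"};
  if (Kind == DeviceKind::AMDGPU) {
    // amdgpu: llvm-link consumes the device-specific archive directly.
    Tools.push_back("llvm-link");
  } else {
    // nvptx: the wrapper extracts the cubin files and hands them to nvlink.
    Tools.push_back("clang-nvlink-wrapper");
    Tools.push_back("nvlink");
  }
  return Tools;
}
```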

Ping. Any more suggestions/questions?
I will update requested changes/documentation for D105191 once this gets through.

Documentation is much improved. LGTM.

This revision is now accepted and ready to land. Aug 31 2021, 10:10 AM

saiislam mentioned this in rG83f3782c6129: [clang-nvlink-wrapper] Wrapper around nvlink for archive files. Sep 1 2021, 3:33 AM

Maybe add this to the release notes of clang 14?

aaron.ballman added a subscriber: aaron.ballman. Sep 2 2021, 2:20 PM

aaron.ballman added inline comments.

clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp
63	The `-1` is odd here -- this argument is a Boolean for whether the buffer is text or not, and this introduced a new MSVC warning (`'argument': truncation from 'int' to 'bool'`).
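The warning comes from an ordinary implicit conversion: an `int` passed where a `bool` is expected compiles, and any non-zero value, `-1` included, becomes `true`. The sketch below shows this with a hypothetical function, `describeOpen`, that takes a name followed by two Boolean flags. It is not the LLVM API, and the flag names are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical function: a file name followed by two Boolean flags.
std::string describeOpen(const std::string &Name, bool IsText,
                         bool RequiresNullTerminator) {
  return Name + (IsText ? ":text" : ":binary") +
         (RequiresNullTerminator ? ":nul" : ":raw");
}

// Passing an int where a bool is expected compiles, but the value is
// converted: any non-zero int, including -1, becomes true.
std::string openWithMinusOne(const std::string &Name) {
  int NotReallyABool = -1;
  return describeOpen(Name, NotReallyABool, false);
}
```

So a call written with `-1` in that position behaves as if `true` had been passed, which is probably not what the author intended for a binary archive.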

In D108291#2980737, @sylvestre.ledru wrote:

> Maybe add this to the release notes of clang 14?

@sylvestre.ledru and @aaron.ballman, please have a look at D109225. I have made the suggested changes.
I would also like to include this wrapper in llvm-13, because it is actually a fix required for D105191, which is part of a feature that has been in progress for over a year (D81109 and D93525).

This patch has landed as 83f3782c6129e7a5df3faaf0ae576611d16a8d49, but that is not reflected on Phabricator.

saiislam closed this revision. Sep 5 2021, 11:16 PM

saiislam added a commit: rG83f3782c6129: [clang-nvlink-wrapper] Wrapper around nvlink for archive files.

Revision Contents

Path	Size
clang/tools/CMakeLists.txt	1 line
clang/tools/clang-nvlink-wrapper/CMakeLists.txt	25 lines
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp	164 lines

Diff 368202

clang/tools/CMakeLists.txt

  create_subdirectory_options(CLANG TOOL)

  add_clang_subdirectory(diagtool)
  add_clang_subdirectory(driver)
  add_clang_subdirectory(apinotes-test)
  add_clang_subdirectory(clang-diff)
  add_clang_subdirectory(clang-format)
  add_clang_subdirectory(clang-format-vs)
  add_clang_subdirectory(clang-fuzzer)
  add_clang_subdirectory(clang-import-test)
+ add_clang_subdirectory(clang-nvlink-wrapper)
  add_clang_subdirectory(clang-offload-bundler)
  add_clang_subdirectory(clang-offload-wrapper)
  add_clang_subdirectory(clang-scan-deps)
  add_clang_subdirectory(clang-repl)

  add_clang_subdirectory(c-index-test)

  add_clang_subdirectory(clang-rename)

clang/tools/clang-nvlink-wrapper/CMakeLists.txt

This file was added.

set(LLVM_LINK_COMPONENTS BitWriter Core Object Support)

if(NOT CLANG_BUILT_STANDALONE)
  set(tablegen_deps intrinsics_gen)
endif()

add_clang_executable(clang-nvlink-wrapper
  ClangNvlinkWrapper.cpp

  DEPENDS
  ${tablegen_deps}
  )

set(CLANG_NVLINK_WRAPPER_LIB_DEPS
  clangBasic
  )

add_dependencies(clang clang-nvlink-wrapper)

target_link_libraries(clang-nvlink-wrapper
  PRIVATE
  ${CLANG_NVLINK_WRAPPER_LIB_DEPS}
  )

install(TARGETS clang-nvlink-wrapper RUNTIME DESTINATION bin)

clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp

This file was added.

//===-- clang-nvlink-wrapper/ClangNvlinkWrapper.cpp - wrapper over nvlink-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===---------------------------------------------------------------------===//
///
/// \file
/// This tool works as a wrapper over nvlink program. It transparently passes
/// every input option and objects to nvlink except archive files. It reads
/// each input archive file to extract archived cubin files as temporary files.
/// These temp (*.cubin) files are passed to nvlink, because nvlink does not
/// support linking of archive files implicitly.
///
/// During linking of heteregenous device archive libraries, the
/// clang-offload-bundler creates a device specific archive of cubin files.
/// Such an archive is then passed to this tool to extract cubin files before
/// passing to nvlink.
///
/// Example:
/// clang-nvlink-wrapper -o a.out-openmp-nvptx64 /tmp/libTest-nvptx-sm_50.a
///
/// 1. Extract (libTest-nvptx-sm_50.a) => /tmp/a.cubin /tmp/b.cubin
/// 2. nvlink -o a.out-openmp-nvptx64 /tmp/a.cubin /tmp/b.cubin
//===---------------------------------------------------------------------===//

#include "llvm/Object/Archive.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Errc.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/Program.h"
#include "llvm/Support/Signals.h"
#include "llvm/Support/StringSaver.h"
#include "llvm/Support/WithColor.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

static cl::opt<bool> Help("h", cl::desc("Alias for -help"), cl::Hidden);

static Error runNVLink(std::string NVLinkPath,
                       SmallVectorImpl<std::string> &Args) {
  std::vector<StringRef> NVLArgs;
  NVLArgs.push_back(NVLinkPath);
  for (auto &Arg : Args) {
    NVLArgs.push_back(Arg);
  }

  if (sys::ExecuteAndWait(NVLinkPath.c_str(), NVLArgs))
    return createStringError(inconvertibleErrorCode(), "'nvlink' failed");
  return Error::success();
}

static Error extractArchiveFiles(StringRef Filename,
                                 SmallVectorImpl<std::string> &Args,
                                 SmallVectorImpl<std::string> &TmpFiles) {
  std::vector<std::unique_ptr<MemoryBuffer>> ArchiveBuffers;

  ErrorOr<std::unique_ptr<MemoryBuffer>> BufOrErr =
      MemoryBuffer::getFileOrSTDIN(Filename, -1, false);
  if (std::error_code EC = BufOrErr.getError())
    return createFileError(Filename, EC);

  ArchiveBuffers.push_back(std::move(*BufOrErr));
  Expected<std::unique_ptr<llvm::object::Archive>> LibOrErr =
      object::Archive::create(ArchiveBuffers.back()->getMemBufferRef());
  if (!LibOrErr)
    return LibOrErr.takeError();

  auto Archive = std::move(*LibOrErr);

  Error Err = Error::success();
  auto ChildEnd = Archive->child_end();
  for (auto ChildIter = Archive->child_begin(Err); ChildIter != ChildEnd;
       ++ChildIter) {
    if (Err)
      return Err;
    auto ChildNameOrErr = (*ChildIter).getName();
    if (!ChildNameOrErr)
      return ChildNameOrErr.takeError();

    StringRef ChildName = sys::path::filename(ChildNameOrErr.get());

    auto ChildBufferRefOrErr = (*ChildIter).getMemoryBufferRef();
    if (!ChildBufferRefOrErr)
      return ChildBufferRefOrErr.takeError();

    auto ChildBuffer =
        MemoryBuffer::getMemBuffer(ChildBufferRefOrErr.get(), false);
    auto ChildNameSplit = ChildName.split('.');

    SmallString<16> Path;
    int FileDesc;
    if (std::error_code EC = sys::fs::createTemporaryFile(
            (ChildNameSplit.first), (ChildNameSplit.second), FileDesc, Path))
      return createFileError(ChildName, EC);

    std::string TmpFileName(Path.str());
    Args.push_back(TmpFileName);
    TmpFiles.push_back(TmpFileName);
    std::error_code EC;
    raw_fd_ostream OS(Path.c_str(), EC, sys::fs::OF_None);
    if (EC)
      return createFileError(TmpFileName, errc::io_error);
    OS << ChildBuffer->getBuffer();
    OS.close();
  }
  return Err;
}

static Error cleanupTmpFiles(SmallVectorImpl<std::string> &TmpFiles) {
  for (auto &TmpFile : TmpFiles) {
    if (std::error_code EC = sys::fs::remove(TmpFile))
      return createFileError(TmpFile, errc::no_such_file_or_directory);
  }
  return Error::success();
}

int main(int argc, const char **argv) {
  sys::PrintStackTraceOnErrorSignal(argv[0]);

  if (Help) {
    cl::PrintHelpMessage();
    return 0;
  }

  auto reportError = [argv](Error E) {
    logAllUnhandledErrors(std::move(E), WithColor::error(errs(), argv[0]));
    exit(1);
  };

  ErrorOr<std::string> NvlinkPath = sys::findProgramByName("nvlink");
  if (!NvlinkPath) {
    reportError(createStringError(NvlinkPath.getError(),
                                  "unable to find 'nvlink' in path"));
  }

  SmallVector<const char *, 0> Argv(argv, argv + argc);
  SmallVector<std::string, 0> ArgvSubst;
  SmallVector<std::string, 0> TmpFiles;
  BumpPtrAllocator Alloc;
  StringSaver Saver(Alloc);
  cl::ExpandResponseFiles(Saver, cl::TokenizeGNUCommandLine, Argv);

  for (size_t i = 1; i < Argv.size(); ++i) {
    std::string Arg = Argv[i];
    if (sys::path::extension(Arg) == ".a") {
      if (Error Err = extractArchiveFiles(Arg, ArgvSubst, TmpFiles))
        reportError(std::move(Err));
    } else {
      ArgvSubst.push_back(Arg);
    }
  }

  if (Error Err = runNVLink(NvlinkPath.get(), ArgvSubst))
    reportError(std::move(Err));
  if (Error Err = cleanupTmpFiles(TmpFiles))
    reportError(std::move(Err));

  return 0;
}
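The argument rewriting done in main() can be modelled without LLVM as a rough sketch in standard C++. `isArchive`, `Extractor` and `substituteArchives` are hypothetical names; the suffix check only approximates the sys::path::extension test, and the extractor is injected so the logic can be exercised without touching real archives.

```cpp
#include <functional>
#include <string>
#include <vector>

// Approximation of the ".a" extension test used in main().
bool isArchive(const std::string &Arg) {
  return Arg.size() >= 2 && Arg.compare(Arg.size() - 2, 2, ".a") == 0;
}

// Hypothetical callback: given an archive path, returns the paths of the
// temporary files its members were extracted to.
using Extractor =
    std::function<std::vector<std::string>(const std::string &)>;

// Replaces each archive argument with its extracted members and passes every
// other argument through unchanged, preserving order.
std::vector<std::string>
substituteArchives(const std::vector<std::string> &Args,
                   const Extractor &Extract) {
  std::vector<std::string> Out;
  for (const std::string &Arg : Args) {
    if (!isArchive(Arg)) {
      Out.push_back(Arg);
      continue;
    }
    for (const std::string &Member : Extract(Arg))
      Out.push_back(Member);
  }
  return Out;
}
```

With an extractor that returns `/tmp/a.cubin` and `/tmp/b.cubin`, the arguments `-o a.out-openmp-nvptx64 /tmp/libTest-nvptx-sm_50.a` become `-o a.out-openmp-nvptx64 /tmp/a.cubin /tmp/b.cubin`, matching the example in the file header.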