This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin
ClosedPublic

Authored by rohany on Sep 8 2023, 9:57 AM.

Details

Reviewers

bondhugula
ThomasRaoux
nicolasvasilache
herhut

Commits

rG71bdd2c2380d: mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin…

Summary

This commit changes how the SerializeToCubin pass manages CUDA contexts.
Instead of creating a new CUDA context on each invocation of SerializeToCubin,
it retains and uses the primary context of device 0. This substantially
reduces compile time, especially when an application (such as a JIT compiler)
calls SerializeToCubin repeatedly.
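
To illustrate why reusing the primary context can help, here is a toy C++ model. It is not the CUDA driver API: `ToyDevice`, `ScopedPrimaryContext`, `serializeOnce`, and `countInitializations` are hypothetical names introduced only for this sketch. The model assumes that the primary context is reference-counted, that the expensive setup happens only when the count goes from zero to one, and that the context is torn down when the last reference is released. Under those assumptions, repeated calls are cheap only while something else in the process (for example the host JIT or the CUDA runtime) keeps the primary context retained. The guard also shows one possible way to keep retain and release paired when a call returns early on an error.

```cpp
#include <cassert>

// Toy stand-in for a device's primary context. NOT the CUDA driver API.
// Assumption: setup cost is paid only when the reference count goes 0 -> 1.
struct ToyDevice {
  int refCount = 0;        // outstanding retains
  int initializations = 0; // how many times the expensive setup ran

  void retain() {
    if (refCount == 0)
      ++initializations; // first reference: context must be set up
    ++refCount;
  }

  void release() {
    assert(refCount > 0 && "release without matching retain");
    --refCount; // at zero, the modeled context is considered torn down
  }
};

// Hypothetical RAII guard: retains on construction, releases on destruction,
// so every exit path (including early returns) stays balanced.
class ScopedPrimaryContext {
public:
  explicit ScopedPrimaryContext(ToyDevice &d) : device(d) { device.retain(); }
  ~ScopedPrimaryContext() { device.release(); }
  ScopedPrimaryContext(const ScopedPrimaryContext &) = delete;
  ScopedPrimaryContext &operator=(const ScopedPrimaryContext &) = delete;

private:
  ToyDevice &device;
};

// Models one serialization call. `failLink` simulates an error that
// triggers an early return after the context was retained.
bool serializeOnce(ToyDevice &device, bool failLink) {
  ScopedPrimaryContext guard(device);
  if (failLink)
    return false; // guard's destructor still releases the context
  return true;
}

// Counts how often the expensive setup runs across `calls` invocations.
// `hostHoldsContext` models an application that keeps its own reference.
int countInitializations(int calls, bool hostHoldsContext) {
  ToyDevice device;
  if (hostHoldsContext)
    device.retain();
  for (int i = 0; i < calls; ++i)
    serializeOnce(device, /*failLink=*/false);
  if (hostHoldsContext)
    device.release();
  return device.initializations;
}
```

In this model, `countInitializations(100, true)` runs the setup once, while `countInitializations(100, false)` runs it on every call, because each call drops the last reference.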

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rohany created this revision. Sep 8 2023, 9:57 AM

Herald added a reviewer: bondhugula. Sep 8 2023, 9:57 AM

Herald added a reviewer: ThomasRaoux.

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, Moerafaat, zero9178 and 22 others.

rohany requested review of this revision. Sep 8 2023, 9:57 AM

Herald added a reviewer: nicolasvasilache. Sep 8 2023, 9:57 AM

Herald added a reviewer: herhut.

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Sorry, didn't see the github migration. Will move this to github.

Harbormaster completed remote builds in B256872: Diff 556277. Sep 8 2023, 12:36 PM

This revision was not accepted when it landed; it landed in state Needs Review. Oct 20 2023, 10:35 AM

Closed by commit rG71bdd2c2380d: mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin… (authored by rohany, committed by GitHub <noreply@github.com>).

This revision was automatically updated to reflect the committed changes.

GitHub <noreply@github.com> added a commit: rG71bdd2c2380d: mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin….

Revision Contents

Path: mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp

Size: 11 lines

Diff 557814

mlir/lib/Dialect/GPU/Transforms/SerializeToCubin.cpp

[104 unchanged lines hidden] SerializeToCubinPass::serializeISA(const std::string &isa) {
   char jitErrorBuffer[4096] = {0};

   RETURN_ON_CUDA_ERROR(cuInit(0));

   // Linking requires a device context.
   CUdevice device;
   RETURN_ON_CUDA_ERROR(cuDeviceGet(&device, 0));
   CUcontext context;
-  RETURN_ON_CUDA_ERROR(cuCtxCreate(&context, 0, device));
+  // Use the primary context.
+  RETURN_ON_CUDA_ERROR(cuDevicePrimaryCtxRetain(&context, device));
+  // Push the primary context so that the next CUDA operations
+  // actually use it.
+  RETURN_ON_CUDA_ERROR(cuCtxPushCurrent(context));
   CUlinkState linkState;

   CUjit_option jitOptions[] = {CU_JIT_ERROR_LOG_BUFFER,
                                CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES};
   void *jitOptionsVals[] = {jitErrorBuffer,
                             reinterpret_cast<void *>(sizeof(jitErrorBuffer))};

   RETURN_ON_CUDA_ERROR(cuLinkCreate(2, /* number of jit options */
[19 unchanged lines hidden] SerializeToCubinPass::serializeISA(const std::string &isa) {
   RETURN_ON_CUDA_ERROR(cuLinkComplete(linkState, &cubinData, &cubinSize));

   char *cubinAsChar = static_cast<char *>(cubinData);
   auto result =
       std::make_unique<std::vector<char>>(cubinAsChar, cubinAsChar + cubinSize);

   // This will also destroy the cubin data.
   RETURN_ON_CUDA_ERROR(cuLinkDestroy(linkState));
-  RETURN_ON_CUDA_ERROR(cuCtxDestroy(context));
+  // Pop and release the primary context.
+  CUcontext poppedContext;
+  RETURN_ON_CUDA_ERROR(cuCtxPopCurrent(&poppedContext));
+  RETURN_ON_CUDA_ERROR(cuDevicePrimaryCtxRelease(device));

   return result;
 }

 // Register pass to serialize GPU kernel functions to a CUBIN binary annotation.
 void mlir::registerGpuSerializeToCubinPass() {
   PassRegistration<SerializeToCubinPass> registerSerializeToCubin(
       [] { return std::make_unique<SerializeToCubinPass>(); });
[14 unchanged lines hidden]