This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] CUDA provides a memcpy and memset
Abandoned · Public

Authored by jdoerfert on Mar 14 2021, 10:40 AM.


Details

Reviewers

tra
bollu

Summary

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jdoerfert created this revision. · Mar 14 2021, 10:40 AM

Herald added a reviewer: bollu. · Mar 14 2021, 10:40 AM

Herald added subscribers: hiraditya, yaxunl.

jdoerfert requested review of this revision. · Mar 14 2021, 10:40 AM

Herald added a project: Restricted Project. · Mar 14 2021, 10:40 AM

Harbormaster completed remote builds in B93717: Diff 330521. · Mar 14 2021, 10:40 AM

tra added a comment.

It would be good to add a test.

Both NVCC and clang currently lower memcpy to an explicit loop. I'm not sure what effect (if any) allowing memcpy/memset libcalls would have on performance. We may want to benchmark it before landing.
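For readers unfamiliar with the distinction in the comment above: a copy can either be expanded inline into explicit load/store code, or emitted as a call to a library memcpy/memset. The following is a minimal host-side C++ sketch of the two shapes, assuming a byte-wise loop is representative of the inline expansion. The helper names `copy_as_loop`, `copy_as_libcall`, `set_as_loop` and `set_as_libcall` are hypothetical, and the real NVPTX lowering works on LLVM IR and may use wider accesses. The sketch shows semantics only; it says nothing about which shape is faster on a GPU, which is the open benchmarking question.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: the "explicit loop" shape. Every byte is copied by
// inline code; no library routine is called.
void copy_as_loop(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
}

// Hypothetical helper: the "libcall" shape. The copy is delegated to
// whatever memcpy the runtime library provides.
void copy_as_libcall(void *dst, const void *src, std::size_t n) {
  std::memcpy(dst, src, n);
}

// The same two shapes for memset: an explicit store loop...
void set_as_loop(void *dst, unsigned char value, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = value;
}

// ...versus a call into the library memset.
void set_as_libcall(void *dst, unsigned char value, std::size_t n) {
  std::memset(dst, value, n);
}
```

Both shapes produce identical bytes; what differs is code size and who owns the copy routine. An optimizing compiler may also turn `copy_as_loop` back into a memcpy; LLVM's LoopIdiomRecognize pass, for instance, only does so when TargetLibraryInfo reports memcpy/memset as available, which is the information this revision changes for NVPTX.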

In D98607#2626634, @tra wrote:

It would be good to add a test.

Both NVCC and clang currently lower memcpy to an explicit loop. I'm not sure what effect (if any) allowing memcpy/memset libcalls would have on performance. We may want to benchmark it before landing.

I doubt I have the proper setup to do such benchmarking. I care about malloc/free; this was just a follow-up, because the same paragraph of the CUDA documentation says memcpy and memset are available as well.
I'm fine with dropping this for now.

I'd incorporate the changes into your free/malloc patch, but leave them commented out with a TODO explaining that they are available, but disabled until we can prove that they are beneficial.

jdoerfert mentioned this in D98606: [NVPTX] CUDA does provide malloc/free since compute capability 2.X. · Mar 15 2021, 2:00 PM

"merged" into D98606

Revision Contents

Path                                        Size
llvm/lib/Analysis/TargetLibraryInfo.cpp     2 lines

Diff 330521

llvm/lib/Analysis/TargetLibraryInfo.cpp

In static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T, ...) (lines starting with + are added by this diff):

   //
   // FIXME: Having no standard library prevents e.g. many fastmath
   // optimizations, so this situation should be fixed.
   if (T.isNVPTX()) {
     TLI.disableAllFunctions();
     TLI.setAvailable(LibFunc_nvvm_reflect);
     TLI.setAvailable(llvm::LibFunc_malloc);
     TLI.setAvailable(llvm::LibFunc_free);
+    TLI.setAvailable(llvm::LibFunc_memcpy);
+    TLI.setAvailable(llvm::LibFunc_memset);
   } else {
     TLI.setUnavailable(LibFunc_nvvm_reflect);
   }
 
   // These vec_malloc/free routines are only available on AIX.
   if (!T.isOSAIX()) {
     TLI.setUnavailable(LibFunc_vec_calloc);
     TLI.setUnavailable(LibFunc_vec_malloc);
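The NVPTX branch in the diff follows an allow-list pattern: every library function is first disabled, and only the ones the CUDA device runtime provides are then re-enabled. A toy C++ model of that pattern is sketched below. `ToyLibraryInfo`, `initializeToy` and the `enableMemFns` flag are hypothetical and much simpler than LLVM's TargetLibraryInfoImpl, which also tracks custom function names and vector mappings. `enableMemFns` is an assumption used only to contrast this diff (true) with the review suggestion to keep the memcpy/memset lines commented out (false).

```cpp
#include <array>
#include <cstddef>

// Hypothetical subset of library functions; not LLVM's real LibFunc enum.
enum LibFunc : std::size_t {
  LibFunc_nvvm_reflect,
  LibFunc_malloc,
  LibFunc_free,
  LibFunc_memcpy,
  LibFunc_memset,
  NumLibFuncs
};

// Hypothetical toy: one availability bit per function, all available by default.
class ToyLibraryInfo {
public:
  ToyLibraryInfo() { available.fill(true); }
  void disableAllFunctions() { available.fill(false); }
  void setAvailable(LibFunc F) { available[F] = true; }
  void setUnavailable(LibFunc F) { available[F] = false; }
  bool has(LibFunc F) const { return available[F]; }

private:
  std::array<bool, NumLibFuncs> available;
};

// Mirrors the shape of the NVPTX branch in the diff. enableMemFns is a
// hypothetical switch: true = this diff, false = lines left commented out.
void initializeToy(ToyLibraryInfo &TLI, bool isNVPTX, bool enableMemFns) {
  if (isNVPTX) {
    TLI.disableAllFunctions();            // start from "nothing available"
    TLI.setAvailable(LibFunc_nvvm_reflect);
    TLI.setAvailable(LibFunc_malloc);     // device malloc/free are provided
    TLI.setAvailable(LibFunc_free);
    if (enableMemFns) {
      TLI.setAvailable(LibFunc_memcpy);   // the two lines this diff adds
      TLI.setAvailable(LibFunc_memset);
    }
  } else {
    TLI.setUnavailable(LibFunc_nvvm_reflect);
  }
}
```

With `enableMemFns` false, an NVPTX query for memcpy reports it unavailable even though malloc/free are available, which is the "available but disabled until proven beneficial" state suggested in the review.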