This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] CUDA provides a memcpy and memset
Abandoned · Public

Authored by jdoerfert on Mar 14 2021, 10:40 AM.

Event Timeline

jdoerfert created this revision. Mar 14 2021, 10:40 AM
jdoerfert requested review of this revision. Mar 14 2021, 10:40 AM
tra added a comment. Mar 15 2021, 10:43 AM

It would be good to add a test.

Both NVCC and clang currently lower memcpy to an explicit loop. I'm not sure what effect (if any) allowing memcpy/memset libcall would have on performance. We may want to benchmark it before landing.
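
For readers unfamiliar with the distinction, the two forms can be sketched in host-side C++. This is only an illustration, assuming a simple byte-wise loop; the helper names `copy_bytes_loop` and `copy_bytes_libcall` are hypothetical, and the code is not taken from the patch, NVCC, or clang.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not an LLVM or CUDA API): the copy written out as an
// explicit byte-wise loop, roughly what an inline expansion of memcpy does.
inline void *copy_bytes_loop(void *dst, const void *src, std::size_t n) {
  unsigned char *d = static_cast<unsigned char *>(dst);
  const unsigned char *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}

// Hypothetical helper: the same copy expressed as a call into the library
// routine, which is the "libcall" form discussed in this review.
inline void *copy_bytes_libcall(void *dst, const void *src, std::size_t n) {
  return std::memcpy(dst, src, n);
}
```

Both helpers produce the same bytes in the destination. Which form is faster on a GPU is the open question here, and this sketch says nothing about it.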

In D98607#2626634, @tra wrote:

> It would be good to add a test.
>
> Both NVCC and clang currently lower memcpy to an explicit loop. I'm not sure what effect (if any) allowing memcpy/memset libcall would have on performance. We may want to benchmark it before landing.

I doubt I have the proper setup to do such benchmarking. I care about malloc/free; this was just a follow-up because the same CUDA documentation paragraph says memcpy and memset are available as well.
I'm fine with dropping this for now.

tra added a comment. Mar 15 2021, 1:13 PM

I'd incorporate the changes into your free/malloc patch, but leave them commented out with a TODO explaining that they are available but disabled until we can show that enabling them is beneficial.

jdoerfert abandoned this revision. Mar 15 2021, 2:00 PM

"merged" into D98606