This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Avoid temp copy of byval kernel parameters.
ClosedPublic

Authored by tra on Mar 11 2021, 4:31 PM.

Details

Summary

Avoid making a temporary copy of byval argument if all accesses are loads and therefore the pointer to the parameter can not escape.

This avoids excessive global memory accesses when each kernel makes its own copy.

Diff Detail

Event Timeline

tra created this revision.Mar 11 2021, 4:31 PM
tra requested review of this revision.Mar 11 2021, 4:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2021, 4:31 PM

This is going to be so great. Thanks, Art.

llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp
165–166

Looking at https://llvm.org/docs/LangRef.html#load-instruction, there's a lot of additional metadata that might be on a load instruction. Do we need to copy it? (Is there no LLVM utility that says, "create a new load in a different address space?)

Also, should we disallow atomic loads or otherwise handle them?

179

Is it safe to write this recursively? (Stack overflow?)

189

same -- is recursion ok here?

205

Nit: run-on sentence. s/which/. This/

asbirlea added inline comments.Mar 11 2021, 5:05 PM
llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp
165–166

AFAIK there is no such utility.

AAMDNodes AATags;
LI->getAAMetadata(AATags);
NewLI->setAAMetadata(AATags);

Perhaps also set debug location, e.g. NewLI->setDebugLoc(DebugLoc());

tra updated this revision to Diff 330112.Mar 11 2021, 5:18 PM
tra edited the summary of this revision. (Show Details)

Do not clone LoadInst, just replace its operand.

tra added inline comments.Mar 11 2021, 5:19 PM
llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp
165–166

Good point. I do not even need to clone it -- it does not care about address space itself. Its pointer operand deals with it.

179

It could be a problem.

LLVM's manual does not mention anything about that, but the examples are all 'flat'.
https://llvm.org/docs/ProgrammersManual.html#basic-inspection-and-traversal-routines

Shouldn't be hard to rewrite it as a loop.

@asbirlea : is recursion OK here, or do I need to use a loop instead?

tra updated this revision to Diff 330349.Mar 12 2021, 12:55 PM

Replaced recursive iteration with loop-based one.

tra marked 5 inline comments as done.Mar 12 2021, 12:57 PM
jlebar accepted this revision.Mar 12 2021, 1:31 PM

...in which we all remember why we like writing this recursively. :D

🚢

llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp
161

I think there was a clang-tidy warning at one point saying this should be lower-case c?

183

needed?

This revision is now accepted and ready to land.Mar 12 2021, 1:31 PM
tra updated this revision to Diff 330375.Mar 12 2021, 2:37 PM
tra marked an inline comment as done.

Fixed few tests affected by the patch.

tra added inline comments.Mar 12 2021, 2:54 PM
llvm/test/CodeGen/NVPTX/lower-kernel-ptr-arg.ll
35 ↗(On Diff #330375)

With a temporary copy of the argument, the alignment used to be inferred to be 8 based on the struct alignment and that allowed LLVM to generate 64-bit load. W/o a temporary copy, we're only looking at the load's alignment of 4, which probably evolved from the original 32-bit IR.
Making load alignment to be 8 makes the test work predictably regardless of whether we have a temp copy.

Updates look good.

This revision was landed with ongoing or failed builds.Mar 15 2021, 2:28 PM
This revision was automatically updated to reflect the committed changes.