This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Use LDG for pointer induction variables
ClosedPublic

Authored by broune on Aug 5 2015, 11:33 AM.

Details

Summary

Make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables.

Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads.

This brings a 30% performance gain on some eigen3-based Google-internal CUDA benchmarks, where LLVM introduces a pointer induction variable and then previously couldn't use LDG.

Diff Detail

Event Timeline

broune updated this revision to Diff 31382.Aug 5 2015, 11:33 AM
broune retitled this revision from to [NVPTX] Use LDG for pointer induction variables.
broune updated this object.
broune added reviewers: jholewinski, jingyue.
broune added subscribers: llvm-commits, meheff, eliben.
broune added a subscriber: wengxt.Aug 5 2015, 11:33 AM
eliben added inline comments.Aug 5 2015, 11:40 AM
lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
551

Can't you get the data layout from F now?

560

maybe Objs for coding style (elsewhere too)?

561

How come const_cast is needed now but not before?

564

Seems like a good candidate for auto, since the type is repeated in the cast?

jingyue added inline comments.Aug 5 2015, 11:48 AM
test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
216

This comment doesn't parse to me. Do you mean

we can only know that the parameter is global
but *don't* know it's never written to from the caller?

wengxt added inline comments.Aug 5 2015, 11:59 AM
test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
222

from need to be a global address space pointer to make this check meaningful.

broune updated this revision to Diff 31387.Aug 5 2015, 12:13 PM
broune marked 4 inline comments as done.

Address eliben's comments.

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
551

Good point. Done.

560

Done. Actually lots of variables in this file are leading-lower-case independent of my change. I'll fix that later, but it will change lots of lines, so I'll keep it separate from this change.

561

GetUnderlyingObject has an overload that takes a const Value * and just wraps the non-const version using const_cast. There is no such overload for GetUnderlyingObjects, which makes sense, since one of the parameters is a container of Value * that would have to become a container of const Value *. Converting between such containers is problematic. So I think it's reasonable not to have the const overload, even though it does push const_cast into the callers. I checked previously what other callers of GetUnderlyingObjects do in this case and they also use const_cast.

broune updated this revision to Diff 31388.Aug 5 2015, 1:14 PM

Add @notkernel2 test.

test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
216

We also don't know that the parameter is global. "only" applies to both sides of the "and", so the statement is that we don't know either one of those, since @notkernel is not a kernel function. A more detailed way of stating what I'm getting at:

In order to know that a parameter is a global pointer from within a non-kernel function, we would have to do inter-procedural analysis to see where the values for that parameter come from. This present function is not a kernel function, and we don't do inter-procedural analysis (and it doesn't have any callers anyway), so we cannot infer that it is global in this case. Thus we cannot use LDG.

The same statement applies if you replace "global pointer" with "never written to".

"never written to" is intended to include everything that happens for the duration of the kernel call (not just this function call), though I suppose "never" actually covers more than that, so I clarified it in the comment.

222

The meaning of the test is to capture what actually happens for a device function, where there are currently two independent reasons that LDG cannot be used.

Though it's a good suggestion to have a test where the parameter is explicitly marked as global, so I added that. Thanks for the suggestion.

jingyue accepted this revision.Aug 5 2015, 3:36 PM
jingyue edited edge metadata.

LGTM

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
552

I'd copy some comments at load-with-non-coherent-cache.ll:216 here too, so that people reading source code will quickly understand why we only do this for kernel functions.

test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
216

Thanks. Makes sense now.

This revision is now accepted and ready to land.Aug 5 2015, 3:36 PM
jingyue added inline comments.Aug 5 2015, 3:39 PM
test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
243

FYI, adding the keyword ptx_kernel to the function definition also makes isKernelFunction to return true. In that way, you don't need a long list of metadata. I noticed that only recently.

broune updated this revision to Diff 31405.Aug 5 2015, 3:41 PM
broune edited edge metadata.

Tiny comment improvement. Thanks to Jingyue for pointing out that the comment was ambiguous.

broune added inline comments.Aug 5 2015, 3:43 PM
test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
243

Thanks, that's good to know. I'll fix that in the whole file my follow-up clean-up cl.

jholewinski added inline comments.Aug 5 2015, 3:44 PM
test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
243

That works, but I would like to deprecate it. The PTX calling conventions are legacy cruft from the old back-end.

broune updated this revision to Diff 31411.Aug 5 2015, 3:57 PM

Another tiny improvement to the comments.

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
552

I added a more thorough comment explaining the conditions for using cached loads.

broune added inline comments.Aug 5 2015, 4:03 PM
test/CodeGen/NVPTX/load-with-non-coherent-cache.ll
243

OK, then I'll leave it out of my follow-up clean-up patch :)

jingyue added a subscriber: jingyue.Aug 5 2015, 4:11 PM

Hi Justin,

Why are we deprecating ptx_kernel? What's bad about it?

Jingyue