With this change (plus some changes to prevent !invariant.load metadata from being
clobbered within llvm), clang will be able to model the __ldg CUDA
builtin as an invariant load, rather than as a target-specific llvm
intrinsic. This will let the optimizer work on these loads --
specifically, we should be able to vectorize them in the load-store
vectorizer.
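As a rough sketch of the idea (not the exact IR clang emits), a load tagged with !invariant.load metadata looks like this; the function name and types here are illustrative:

```llvm
; Hypothetical IR for __ldg(p) modeled as an invariant load from the
; global address space (addrspace(1)), instead of an @llvm.nvvm.ldg.*
; intrinsic. The !invariant.load metadata tells the optimizer the
; memory is never written during the program's execution, so the load
; can be freely reordered, hoisted, or vectorized.
define float @read_ldg(float addrspace(1)* %p) {
  %v = load float, float addrspace(1)* %p, !invariant.load !0
  ret float %v
}

!0 = !{}
```

Because the load is now ordinary IR rather than an opaque intrinsic, passes like the load-store vectorizer can combine adjacent invariant loads without needing target-specific knowledge.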
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
You may want to add a change to make sure explicit invariant loads work within kernels, too.
llvm/test/CodeGen/NVPTX/ldg-invariant.ll
Lines 21–22 (On Diff #67941): You may want to add a test case for an invariant load from a non-global address space.