This is an archive of the discontinued LLVM Phabricator instance.

Differential D119247

[NVPTX] Use align attribute for kernel pointer arg alignment
Closed · Public

Authored by nikic on Feb 8 2022, 7:31 AM.

Details

Reviewers

tra

Group Reviewers

Restricted Project

Commits

rG1c729d719a34: [NVPTX] Use align attribute for kernel pointer arg alignment

Summary

Instead of determining the alignment based on the pointer element type (which is incompatible with opaque pointers), make use of alignment annotations added by the frontend.

In particular, clang will add alignment attributes to OpenCL kernels since D118894.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. · Feb 8 2022, 7:31 AM

Herald added subscribers: asavonic, hiraditya, Anastasia, jholewinski. · Feb 8 2022, 7:31 AM

nikic requested review of this revision. · Feb 8 2022, 7:31 AM

Herald added a project: Restricted Project. · Feb 8 2022, 7:31 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B148247: Diff 406818. · Feb 8 2022, 8:57 AM

make use of alignment annotations added by the frontend.

Does it mean that we now require explicit alignment annotation to generate reasonable code? What happens if alignment is not specified explicitly? Do we fall back to unaligned/naturally aligned/something else?

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	What's expected to happen if the alignment is not specified explicitly? It may be worth adding a test case for that.

Test absence of align attribute.

In D119247#3305444, @tra wrote:

make use of alignment annotations added by the frontend.

Does it mean that we now require explicit alignment annotation to generate reasonable code? What happens if alignment is not specified explicitly? Do we fall back to unaligned/naturally aligned/something else?

If the frontend does not add an explicit alignment annotation (and no alignment is inferred during optimization), then LLVM assumes that pointers are unaligned (alignment 1).
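
As a rough illustration of that fallback, the sketch below models an optional alignment with `std::optional` and builds the `.align` fragment in the shape the patched printer emits. `MaybeAlignModel`, `valueOrOne`, `emitAlignDirective`, and `alignDirectiveExamples` are hypothetical stand-ins written for this example; only the `valueOrOne()` name and the `.align N ` output shape are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for llvm::MaybeAlign: an alignment that may be absent.
using MaybeAlignModel = std::optional<std::uint64_t>;

// Assumed to mirror MaybeAlign::valueOrOne(): a missing alignment means 1.
constexpr std::uint64_t valueOrOne(MaybeAlignModel A) { return A.value_or(1); }

// Builds the ".align N " fragment printed for a kernel pointer parameter.
std::string emitAlignDirective(MaybeAlignModel ParamAlign) {
  std::ostringstream OS;
  OS << ".align " << valueOrOne(ParamAlign) << " ";
  return OS.str();
}

// Usage: an annotated parameter keeps its alignment, a bare one falls back to 1.
inline void alignDirectiveExamples() {
  assert(emitAlignDirective(32) == ".align 32 ");
  assert(emitAlignDirective(std::nullopt) == ".align 1 ");
}
```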

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	If no alignment is specified, then the default is 1. I've extended the test to check for this.

Harbormaster completed remote builds in B148434: Diff 407086. · Feb 9 2022, 2:59 AM

tra added inline comments. · Feb 9 2022, 10:41 AM

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	Always defaulting to unaligned access would likely be a performance regression. LLVM IR spec generally defaults to `If the alignment is not specified, then the code generator makes a target-specific assumption.` I think we do need to infer assumed alignment from the data layout, if we do know the type.

nikic added inline comments. · Feb 9 2022, 11:53 AM

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	There may be some confusion here: LLVM defaults to ABI alignment for alignment of memory accesses like loads and stores (or rather, these accesses always have explicit alignment, but this default is used when parsing textual IR). However, for pointer arguments, the default alignment is 1 if no `align` attribute is provided (and it's not a `byval` parameter, but that's explicitly excluded here). There should be no regressions here as long as the frontend adds the necessary alignment annotations, which was done for OpenCL kernels in D118894. As this code is only used for NVCL and not CUDA I assume that is sufficient, but please correct me if I'm wrong.
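
To make the distinction concrete, here is a small self-contained model of the two defaults described above. The type list and its ABI alignments are illustrative assumptions (real values come from the module's data layout), and `abiAlign`, `parsedAccessAlign`, and `pointerArgAlign` are hypothetical helpers, not LLVM functions.

```cpp
#include <optional>

// Illustrative scalar types. The ABI alignments below are assumed values;
// in real LLVM they are derived from the data layout.
enum class ScalarTy { I8, I32, Float, Double };

constexpr unsigned abiAlign(ScalarTy T) {
  switch (T) {
  case ScalarTy::I8:     return 1;
  case ScalarTy::I32:    return 4;
  case ScalarTy::Float:  return 4;
  case ScalarTy::Double: return 8;
  }
  return 1; // Not reached for the enumerators above.
}

// A load or store written without "align" in textual IR is parsed with the
// ABI alignment of the accessed type.
constexpr unsigned parsedAccessAlign(ScalarTy T,
                                     std::optional<unsigned> Explicit) {
  return Explicit.value_or(abiAlign(T));
}

// A pointer argument without an "align" attribute is only known to be
// 1-byte aligned; no pointee type is consulted.
constexpr unsigned pointerArgAlign(std::optional<unsigned> Attr) {
  return Attr.value_or(1);
}

static_assert(parsedAccessAlign(ScalarTy::I32, std::nullopt) == 4);
static_assert(pointerArgAlign(std::nullopt) == 1);
static_assert(pointerArgAlign(32u) == 32);
```

The second helper takes no type at all, which is the property that keeps the attribute-based rule usable once pointers no longer carry a pointee type.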

tra added inline comments. · Feb 9 2022, 12:07 PM

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	The nominal goal of this patch is to allow dealing with opaque pointer types. It does not require defaulting to alignment of 1. Sticking with a known working default alignment for the non-opaque pointers seems to be a better choice. I agree that it's more or less a no-op if IR is generated by clang. However, there are other LLVM IR users that generate IR themselves. E.g. I have no idea whether TensorFlow's XLA, or Julia, or any other LLVM-based JIT always passes alignment info for pointer arguments.

nikic added a reviewer: Restricted Project. · Feb 9 2022, 12:23 PM

nikic added inline comments.

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	If other frontends don't generate the necessary alignment information, then we want to break those frontends ASAP, so they can be adjusted. As a matter of general policy, we always make changes necessary for opaque pointers unconditionally, so that we can deal with their effects in an isolated fashion. Making behavior changes part of the opaque pointer switch itself means that it will be very hard to root-cause the reason for any particular change. FWIW the equivalent change has already been made in the AMDGPU target.

LGTM.

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll
8	After reading https://llvm.org/docs/OpaquePointers.html, I see your point.

This revision is now accepted and ready to land. · Feb 9 2022, 1:13 PM

This revision was landed with ongoing or failed builds. · Feb 10 2022, 2:57 AM

Closed by commit rG1c729d719a34: [NVPTX] Use align attribute for kernel pointer arg alignment (authored by nikic).

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG1c729d719a34: [NVPTX] Use align attribute for kernel pointer arg alignment.

Revision Contents

Path	Size
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp	32 lines
llvm/test/CodeGen/NVPTX/nvcl-param-align.ll	12 lines

Diff 407450

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

[1,329 lines not shown; context: case Type::FixedVectorTyID:]
     }
     O << "]";
     break;
   default:
     llvm_unreachable("type not supported yet");
   }
 }

-static unsigned int getOpenCLAlignment(const DataLayout &DL, Type *Ty) {
-  if (Ty->isSingleValueType())
-    return DL.getPrefTypeAlignment(Ty);
-
-  auto *ATy = dyn_cast<ArrayType>(Ty);
-  if (ATy)
-    return getOpenCLAlignment(DL, ATy->getElementType());
-
-  auto *STy = dyn_cast<StructType>(Ty);
-  if (STy) {
-    unsigned int alignStruct = 1;
-    // Go through each element of the struct and find the
-    // largest alignment.
-    for (unsigned i = 0, e = STy->getNumElements(); i != e; i++) {
-      Type *ETy = STy->getElementType(i);
-      unsigned int align = getOpenCLAlignment(DL, ETy);
-      if (align > alignStruct)
-        alignStruct = align;
-    }
-    return alignStruct;
-  }
-
-  auto *FTy = dyn_cast<FunctionType>(Ty);
-  if (FTy)
-    return DL.getPointerPrefAlignment().value();
-  return DL.getPrefTypeAlignment(Ty);
-}
-
 void NVPTXAsmPrinter::printParamName(Function::const_arg_iterator I,
                                      int paramIndex, raw_ostream &O) {
   getSymbol(I->getParent())->print(O, MAI);
   O << "_param_" << paramIndex;
 }

 void NVPTXAsmPrinter::emitFunctionParamList(const Function *F, raw_ostream &O) {
   const DataLayout &DL = getDataLayout();
[75 lines not shown; context: if (!PAL.hasParamAttr(paramIndex, Attribute::ByVal)) {]
       auto *PTy = dyn_cast<PointerType>(Ty);
       if (isKernelFunc) {
         if (PTy) {
           // Special handling for pointer arguments to kernel
           O << "\t.param .u" << thePointerTy.getSizeInBits() << " ";

           if (static_cast<NVPTXTargetMachine &>(TM).getDrvInterface() !=
               NVPTX::CUDA) {
-            Type *ETy = PTy->getPointerElementType();
             int addrSpace = PTy->getAddressSpace();
             switch (addrSpace) {
             default:
               O << ".ptr ";
               break;
             case ADDRESS_SPACE_CONST:
               O << ".ptr .const ";
               break;
             case ADDRESS_SPACE_SHARED:
               O << ".ptr .shared ";
               break;
             case ADDRESS_SPACE_GLOBAL:
               O << ".ptr .global ";
               break;
             }
-            O << ".align " << (int)getOpenCLAlignment(DL, ETy) << " ";
+            Align ParamAlign = I->getParamAlign().valueOrOne();
+            O << ".align " << ParamAlign.value() << " ";
           }
           printParamName(I, paramIndex, O);
           continue;
         }

         // non-pointer scalar to kernel func
         O << "\t.param .";
         // Special case: predicate operands become .u8 types
[670 lines not shown]

llvm/test/CodeGen/NVPTX/nvcl-param-align.ll

 ; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s

 target triple = "nvptx-unknown-nvcl"

+define void @foo(i64 %img, i64 %sampler, <5 x float>* align 32 %v1, i32* %v2) {
+; The parameter alignment is determined by the align attribute (default 1).
 ; CHECK-LABEL: .entry foo(
-define void @foo(i64 %img, i64 %sampler, <5 x float>* %v) {
-; The parameter alignment should be the next power of 2 of 5xsizeof(float),
-; which is 32.
 ; CHECK: .param .u32 .ptr .align 32 foo_param_2
+; CHECK: .param .u32 .ptr .align 1 foo_param_3
   ret void
 }

 !nvvm.annotations = !{!1, !2, !3}
-!1 = !{void (i64, i64, <5 x float>*)* @foo, !"kernel", i32 1}
-!2 = !{void (i64, i64, <5 x float>*)* @foo, !"rdoimage", i32 0}
-!3 = !{void (i64, i64, <5 x float>*)* @foo, !"sampler", i32 1}
+!1 = !{void (i64, i64, <5 x float>*, i32*)* @foo, !"kernel", i32 1}
+!2 = !{void (i64, i64, <5 x float>*, i32*)* @foo, !"rdoimage", i32 0}
+!3 = !{void (i64, i64, <5 x float>*, i32*)* @foo, !"sampler", i32 1}
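
The comment removed from the old test says the type-derived alignment of the `<5 x float>*` parameter was the next power of 2 of 5 x sizeof(float). A quick sketch of that arithmetic, assuming a 4-byte float; `nextPowerOfTwo`, `FloatBytes`, and `VecBytes` are hypothetical names written for this example and are not the LLVM routines involved.

```cpp
#include <cstdint>

// Smallest power of two that is >= N, for N >= 1. Hypothetical helper.
constexpr std::uint64_t nextPowerOfTwo(std::uint64_t N) {
  std::uint64_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

constexpr std::uint64_t FloatBytes = 4;            // assumed size of float
constexpr std::uint64_t VecBytes = 5 * FloatBytes; // <5 x float>: 20 bytes

// 20 rounds up to 32, matching the ".align 32" the old test expected.
static_assert(VecBytes == 20);
static_assert(nextPowerOfTwo(VecBytes) == 32);
```

With the patch applied, that same 32 has to come from an explicit `align 32` on the argument, as in the updated test.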