This is an archive of the discontinued LLVM Phabricator instance.


Differential D129291

[NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing
Closed · Public

Authored by kjetilkjeka on Jul 7 2022, 7:29 AM.


Details

Reviewers

jholewinski
tra

Commits

rGff1920d106b5: [NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing

Summary

Today, llc crashes when non-power-of-two integer types are used as function arguments or return values. This patch enables passing non-standard integer values to and from functions by promoting them before the store and truncating them after the load.

The main motivation for this change is that Rust casts small structs (smaller than pointer size) into an integer of the same size. For example, a struct containing three u8 fields is passed as an i24. This patch is a step towards enabling Rust compilation to PTX while retaining the target-independent optimizations.
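As a rough illustration of the promote-then-truncate idea, the sketch below models a three-byte struct as a 24-bit value held in a 32-bit container. `ThreeBytes`, `promoteI24`, and `truncateToI24` are hypothetical names, and the little-endian byte order is an assumption made for the example; none of this is code from the patch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a Rust struct with three u8 fields.
using ThreeBytes = std::array<std::uint8_t, 3>;

// "Promote before store": place the 24 payload bits in the low bits of a
// 32-bit container, leaving the upper 8 bits zero (zero extension).
// Byte 0 is assumed to be the least significant byte.
inline std::uint32_t promoteI24(const ThreeBytes &S) {
  return std::uint32_t{S[0]} | (std::uint32_t{S[1]} << 8) |
         (std::uint32_t{S[2]} << 16);
}

// "Truncate after load": discard whatever sits in the upper 8 bits and
// recover the three original bytes.
inline ThreeBytes truncateToI24(std::uint32_t Promoted) {
  const std::uint32_t V = Promoted & 0x00FFFFFFu;
  return ThreeBytes{{static_cast<std::uint8_t>(V),
                     static_cast<std::uint8_t>(V >> 8),
                     static_cast<std::uint8_t>(V >> 16)}};
}
```

Round-tripping a value through `promoteI24` and `truncateToI24` returns the original bytes, which is the property the patch relies on when it widens a value for the parameter store and narrows it again after the load.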

The tests reflect that it is mostly the multiple-of-8 integers smaller than 64 bits that are of interest. I have locally tested some non-multiple-of-eight integers but decided against writing tests for them, as nothing should really rely on them. Let me know if you want a few of those in addition.

More context can be found in my original GitHub issue.

This is my first LLVM contribution and I hope I have followed your contribution guide. Let me know if I should fix anything and I will do my best to get it done. I'm also not very familiar with the LLVM codebase yet, and I have not received any external input on the content of this patch beyond the tests passing. If anything looks fishy, it is probably something I have misunderstood.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kjetilkjeka created this revision. Jul 7 2022, 7:29 AM

kjetilkjeka requested review of this revision. Jul 7 2022, 7:29 AM

Harbormaster completed remote builds in B174155: Diff 442901. Jul 7 2022, 8:22 AM

When I run the failing test on my machine with <build>/tools/clang/tools/extra/clangd/unittests/./ClangdTests --gtest_filter=TUSchedulerTests.PreambleThrottle it passes even for the exact same pre-merge revision as the CI uses. Is this some sort of sporadic failure?

(Also, if I understand LLVM correctly, the result of this test should not be affected by the NVPTX changes I made.)

kjetilkjeka added a reviewer: tra. Jul 11 2022, 8:55 AM

General nit: the patch should be submitted on Phabricator using large context. Please see: https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

All i*-param.ll tests should probably be combined into one test file. They do pretty much the same thing.

Also, there should probably be test cases for integer sizes that are not multiples of 8, e.g., i49.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

1500–1503

Size roundup calculation appears to be repeated in many places and could be extracted into a helper function.
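A shared helper of this kind could look roughly like the sketch below. The name `promoteScalarArgumentSize` is the one used later in this review, but the body is an assumption inferred from the call sites in the diff (scalars of up to 32 bits get a 32-bit slot, scalars of up to 64 bits get a 64-bit slot, wider sizes pass through unchanged), not a copy of the committed code.

```cpp
// Sketch of a size-rounding helper; input and output are sizes in bits.
// Assumed behavior (inferred from the call sites, not taken from the patch):
//   1..32 bits  -> 32
//   33..64 bits -> 64
//   otherwise   -> unchanged
inline unsigned promoteScalarArgumentSize(unsigned SizeInBits) {
  if (SizeInBits <= 32)
    return 32;
  if (SizeInBits <= 64)
    return 64;
  return SizeInBits;
}
```

Call sites that work in bytes can convert around the call, as in `TypeSize = promoteScalarArgumentSize(TypeSize * 8) / 8;`, so a 3-byte scalar would be declared with a 4-byte slot under these assumptions.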

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

216

This looks a bit odd -- checking for the same boundary in two places, implicitly skipping power-of-2 sized integers...

I'd restructure it a bit to look like this:

if (VT.isScalarInteger()) {
  auto n = VT.getFixedSizeInBits();
  if (isPowerOf2_32(n))
    return false;
  switch (PowerOf2Ceil(n)) {
    default: return false; // Covers i1 and integers larger than i64
    case 2:
    case 4:
    case 8:  *PromotedVT = MVT::i8; break;
    case 16: *PromotedVT = MVT::i16; break;
    case 32: *PromotedVT = MVT::i32; break;
    case 64: *PromotedVT = MVT::i64; break;
  }
  return true;
}

kjetilkjeka updated this revision to Diff 443965. Jul 12 2022, 8:47 AM

Thanks for the review!

I have tried to address all the requested changes and created the new diff with the large context.

I added a couple of non-standard tests, but LLVM seems to handle the promote/truncate in many different ways for the different non-standard integers, so I'm not sure at what level of detail it makes sense to test for here. Let me know if you want more tests.

Harbormaster completed remote builds in B174891: Diff 443965. Jul 12 2022, 9:50 AM

I have tried to address all the requested changes and created the new diff with the large context.

Thank you. It makes patch reviewing much easier.

I added a couple of non-standard tests, but LLVM seems to handle the promote/truncate in many different ways for the different non-standard integers, so I'm not sure at what level of detail it makes sense to test for here. Let me know if you want more tests.

We want to make sure that param loads/stores are done using correct sizes. We do not really care how LLVM does extension/truncation of non-power-of-2 sized integers.
So, in practice we only need to check for relevant ld.param/st.param instructions.
We don't need to enumerate all possible sizes. Something like i3/i11/i23/i47 should be sufficient to exercise the code in this patch.
We do have test/CodeGen/NVPTX/param-load-store.ll so the new test cases should probably go there as well.

BTW, for IR test cases here it's convenient to get the function to call itself, so you do not need both caller and callee and can test all relevant functionality in one place.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
217

I think we may simplify it a bit further. Get rid of this check, always set PromotedVT to the appropriate size and return true if the type changed.

switch (PowerOf2Ceil(n)) {
  default: llvm_unreachable("Promotion is not suitable for scalars of size larger than 64-bits");
  case 1:  *PromotedVT = MVT::i1; break;
  case 2:
  case 4:
  case 8:  *PromotedVT = MVT::i8; break;
  case 16: *PromotedVT = MVT::i16; break;
  case 32: *PromotedVT = MVT::i32; break;
  case 64: *PromotedVT = MVT::i64; break;
}
return EVT(*PromotedVT) != VT;

1327–1329

Use promoteScalarArgumentSize() here, too?

1378–1380

ditto.

1550–1555

This could also use promoteScalarArgumentSize().

if (VT.isInteger() || VT.isFloatingPoint())
  TypeSize = promoteScalarArgumentSize(TypeSize * 8) / 8;

1683–1685

ditto.
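To make the "always set the type, report whether it changed" logic concrete outside of LLVM, here is a minimal standalone sketch. `SimpleVT`, `bitWidth`, `powerOf2Ceil`, and `promoteScalarInteger` are hypothetical stand-ins for `MVT`, its size query, `PowerOf2Ceil`, and the helper under review, and `assert` stands in for `llvm_unreachable`. It assumes the input width is at least 1 bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the handful of MVT integer types involved.
enum class SimpleVT { i1, i8, i16, i32, i64 };

// Bit width of each stand-in type.
inline unsigned bitWidth(SimpleVT VT) {
  switch (VT) {
  case SimpleVT::i1:  return 1;
  case SimpleVT::i8:  return 8;
  case SimpleVT::i16: return 16;
  case SimpleVT::i32: return 32;
  case SimpleVT::i64: return 64;
  }
  return 0;
}

// Smallest power of two that is >= N, for N >= 1.
inline std::uint64_t powerOf2Ceil(std::uint64_t N) {
  std::uint64_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Always writes the container type for a scalar integer of width Bits and
// returns true only if that container is wider than the original type.
// Widths above 64 bits trip the assertion (fail fast, as argued in review).
inline bool promoteScalarInteger(unsigned Bits, SimpleVT &Promoted) {
  switch (powerOf2Ceil(Bits)) {
  case 1:  Promoted = SimpleVT::i1; break;
  case 2:
  case 4:
  case 8:  Promoted = SimpleVT::i8; break;
  case 16: Promoted = SimpleVT::i16; break;
  case 32: Promoted = SimpleVT::i32; break;
  case 64: Promoted = SimpleVT::i64; break;
  default:
    assert(false && "promotion is not suitable for scalars wider than 64 bits");
    return false;
  }
  return bitWidth(Promoted) != Bits;
}
```

With this shape, an i24 maps to i32 and reports a change, while an i32 maps to itself and reports none, so callers need no separate power-of-two check.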

Thanks for being so responsive and helpful! It helps a lot being my first time attempting to contribute to LLVM.

I added two small questions to your comments as inline comments.

We want to make sure that param loads/stores are done using correct sizes. We do not really care how LLVM does extension/truncation of non-power-of-2 sized integers.
So, in practice we only need to check for relevant ld.param/st.param instructions.
We don't need to enumerate all possible sizes. Something like i3/i11/i23/i47 should be sufficient to exercise the code in this patch.

Should I then also remove the attempts of testing the truncating/extending to make all tests the same?

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
217

I will fix this together with the change from the other comment once I have your opinion on where to put the helper, but first a question about the `case default`. Isn't it better to keep the `case default: return false;` so that a caller can attempt to promote the EVT and do something else if there is no proper way to do it? I'm thinking of a future scenario where things like `i99` and `i1000` are supported: you might first try to promote the integer type and, if that fails, split it up into parts instead. I guess it doesn't make a big difference for this patch, because things like `i99` or `i1000` will fail to compile both before and after the patch anyway. The only practical difference is the type of error you get if you feed these kinds of integers to `llc`.

1327–1329

We would then need to put `promoteScalarArgumentSize` somewhere visible from both files (`NVPTXISelLowering.cpp` and `NVPTXAsmPrinter.cpp`). Would it make sense to use `NVPTXUtilities.cpp` for that?

In D129291#3647790, @kjetilkjeka wrote:

Should I then also remove the attempts of testing the truncating/extending to make all tests the same?

If you want to. Or it could be cleaned up in a separate patch. Leaving existing tests as is is fine, too.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
217

Isn't it better to keep the case default: return false; to allow for attempting to promote the EVT and doing something else if there is not a proper way to do it?

If there's a valid use case for it, then sure. On the other hand, if we are only expected to handle a specific set of types (i1-i64), then I'd prefer to know right away if we ever end up with something that we can't handle. It's better to fail right away rather than ignore it and then have to debug it the hard way when things go wrong somewhere downstream. If we do grow a legitimate need to pass through other types unpromoted, it will be easy enough to change.

1327–1329

I think you could just put it as an inline function into `NVPTXUtilities.h`

I have implemented your review comments.

Looking at the tests from test/CodeGen/NVPTX/param-load-store.ll helped a lot, and I think the quality of the tests is better now.

While implementing the i2 and i3 tests I also discovered a bug where the type was promoted to i8 but an i16 was used as the container. It should be fixed now.

Harbormaster completed remote builds in B175410: Diff 444662. Jul 14 2022, 10:49 AM

tra added inline comments. Jul 14 2022, 1:57 PM

llvm/test/CodeGen/NVPTX/param-load-store.ll
254–257

This, and other tests dealing with multiple loads/stores of parts of the argument, may be fragile. First, the order of loads would not necessarily be guaranteed. Second, the way LLVM reconstructs the value from parts may also change. I would stick with a set of `CHECK-DAG: ld.param.XX {{.*}} [test_i23_param_0+Y]`. All we care about here is that we load the right set of bits. We can assume that reconstructing the integer is handled by appropriate tests already.
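The property being asked for is order independence: each expected load must appear somewhere in the output, but not at a fixed position. The sketch below illustrates only that idea with plain substring matching. `containsAllInAnyOrder` is a hypothetical helper, and this is a simplification for illustration, not how FileCheck implements `CHECK-DAG` (which uses patterns and has further rules about match regions).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if every expected fragment occurs in at least one output
// line, regardless of the order in which the lines appear.
inline bool containsAllInAnyOrder(const std::vector<std::string> &OutputLines,
                                  const std::vector<std::string> &Fragments) {
  return std::all_of(
      Fragments.begin(), Fragments.end(), [&](const std::string &Fragment) {
        return std::any_of(OutputLines.begin(), OutputLines.end(),
                           [&](const std::string &Line) {
                             return Line.find(Fragment) != std::string::npos;
                           });
      });
}
```

A check written this way keeps passing if the backend reorders the partial loads, and still fails if one of the expected parameter offsets is never loaded.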

I would stick with a set of CHECK-DAG: ld.param.XX {{.*}} [test_i23_param_0+Y]. All we care about here is that we load the right set of bits. We can assume that reconstructing the integer is handled by appropriate tests already.

I realize that you say that all we care about is loading the right set of bits. But I assume we would like to check it for returns/args on both the caller and callee side? These four cases are basically what I'm doing now, in addition to checking that the function is actually called. I have removed the checking of truncating/promoting as you mentioned.

More specifically, I'm checking with CHECK-DAG that the reads from the parameter are done with the correct number of bytes, while the return value is passed with the promoted type. I'm also checking that passing to and returning from the call are done with the correctly promoted type. I'm not checking anything related to truncating or promoting the integers before or after the load/store to param space.

Let me know if you want even fewer checks in the tests, for example only checking that the read is done with the correct number of bytes, and I will fix it.

Harbormaster completed remote builds in B175616: Diff 444947. Jul 15 2022, 8:05 AM

LGTM with one test nit.

In D129291#3654804, @kjetilkjeka wrote:

I would stick with a set of CHECK-DAG: ld.param.XX {{.*}} [test_i23_param_0+Y]. All we care about here is that we load the right set of bits. We can assume that reconstructing the integer is handled by appropriate tests already.

I realize that you say that all we care about is loading the right set of bits. But I assume we would like to check it for returns/args on both the caller and callee side? These four cases are basically what I'm doing now, in addition to checking that the function is actually called. I have removed the checking of truncating/promoting as you mentioned.

Let me rephrase -- we only care about loads/stores from the param space. That applies both to loading parameters from the function arguments and to storing them to pass parameters and return values.
I think we are in agreement on what we need to check. The comment was largely about the checks for the cvt, shl and or instructions we match (e.g., a few still remain in test_i24).

llvm/test/CodeGen/NVPTX/param-load-store.ll
271

We want CHECK-DAG on loads here. Nit: no need to check logical ops.

This revision is now accepted and ready to land. Jul 15 2022, 12:35 PM

Seems like I forgot changing the i24 tests. It should be fixed now.

I still do not know why these CI tests are failing? I don't think it should have anything to do with this change though? I assume it's OK since you have not commented on it.

I hope that you will apply the patch to the code-base since I obviously do not have write access.

Thanks for helping and reviewing!

Harbormaster completed remote builds in B175807: Diff 445212. Jul 16 2022, 3:19 AM

Sanitizer test failures are AFAICT unrelated.

I will land your patch a bit later this week. Thank you for contributing these changes.

This revision was landed with ongoing or failed builds. Jul 22 2022, 2:15 PM

Closed by commit rGff1920d106b5: [NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing (authored by kjetilkjeka, committed by tra).

This revision was automatically updated to reflect the committed changes.

tra added a commit: rGff1920d106b5: [NVPTX] Promote i24, i40, i48 and i56 to next power-of-two register when passing.

Revision Contents

Path                                            Size
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp       14 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp     95 lines
llvm/lib/Target/NVPTX/NVPTXUtilities.h          10 lines
llvm/test/CodeGen/NVPTX/param-load-store.ll     230 lines

Diff 446978

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

@@ 350 lines hidden @@ if (Ty->isFloatingPointTy() || (Ty->isIntegerTy() && !Ty->isIntegerTy(128))) {
       size = ITy->getBitWidth();
     } else {
       assert(Ty->isFloatingPointTy() && "Floating point type expected here");
       size = Ty->getPrimitiveSizeInBits();
     }
     // PTX ABI requires all scalar return values to be at least 32
     // bits in size. fp16 normally uses .b16 as its storage type in
     // PTX, so its size must be adjusted here, too.
-    if (size < 32)
-      size = 32;
+    size = promoteScalarArgumentSize(size);

     O << ".param .b" << size << " func_retval0";
   } else if (isa<PointerType>(Ty)) {
     O << ".param .b" << TLI->getPointerTy(DL).getSizeInBits()
       << " func_retval0";
   } else if (Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128)) {
     unsigned totalsz = DL.getTypeAllocSize(Ty);
     unsigned retAlignment = 0;
@@ 12 lines hidden @@ for (unsigned i = 0, e = vtparts.size(); i != e; ++i) {
   EVT elemtype = vtparts[i];
   if (vtparts[i].isVector()) {
     elems = vtparts[i].getVectorNumElements();
     elemtype = vtparts[i].getVectorElementType();
   }

   for (unsigned j = 0, je = elems; j != je; ++j) {
     unsigned sz = elemtype.getSizeInBits();
-    if (elemtype.isInteger() && (sz < 32))
-      sz = 32;
+    if (elemtype.isInteger())
+      sz = promoteScalarArgumentSize(sz);
     O << ".reg .b" << sz << " func_retval" << idx;
     if (j < je - 1)
       O << ", ";
     ++idx;
   }
   if (i < e - 1)
     O << ", ";
 }
@@ 1,094 lines hidden @@ if (isKernelFunction(*F)) {
         O << "\t.param .u64 .ptr .texref ";
       else
         O << "\t.param .texref ";
       CurrentFnSym->print(O, MAI);
       O << "_param_" << paramIndex;
     }
   } else {
     if (hasImageHandles)
       O << "\t.param .u64 .ptr .samplerref ";
     else
       O << "\t.param .samplerref ";
     CurrentFnSym->print(O, MAI);
     O << "_param_" << paramIndex;
   }
   continue;
 }
 }

 auto getOptimalAlignForParam = [TLI, &DL, &PAL, F,
                                 paramIndex](Type *Ty) -> Align {
@@ 58 lines hidden @@ if (!PAL.hasParamAttr(paramIndex, Attribute::ByVal)) {
     printParamName(I, paramIndex, O);
     continue;
   }
   // Non-kernel function, just print .param .b<size> for ABI
   // and .reg .b<size> for non-ABI
   unsigned sz = 0;
   if (isa<IntegerType>(Ty)) {
     sz = cast<IntegerType>(Ty)->getBitWidth();
-    if (sz < 32)
-      sz = 32;
+    sz = promoteScalarArgumentSize(sz);
   } else if (isa<PointerType>(Ty))
     sz = thePointerTy.getSizeInBits();
   else if (Ty->isHalfTy())
     // PTX ABI requires all scalar parameters to be at least 32
     // bits in size. fp16 normally uses .b16 as its storage type
     // in PTX, so its size must be adjusted here, too.
     sz = 32;
   else
@@ 47 lines hidden @@ if (isABI || isKernelFunc) {
   EVT elemtype = vtparts[i];
   if (vtparts[i].isVector()) {
     elems = vtparts[i].getVectorNumElements();
     elemtype = vtparts[i].getVectorElementType();
   }

   for (unsigned j = 0, je = elems; j != je; ++j) {
     unsigned sz = elemtype.getSizeInBits();
-    if (elemtype.isInteger() && (sz < 32))
-      sz = 32;
+    if (elemtype.isInteger())
+      sz = promoteScalarArgumentSize(sz);
     O << "\t.reg .b" << sz << " ";
     printParamName(I, paramIndex, O);
     if (j < je - 1)
       O << ",\n";
     ++paramIndex;
   }
   if (i < e - 1)
     O << ",\n";

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

@@ 200 lines hidden @@ for (unsigned i = 0, e = TempVTs.size(); i != e; ++i) {
     } else {
       ValueVTs.push_back(VT);
       if (Offsets)
         Offsets->push_back(Off);
     }
   }
 }

+/// PromoteScalarIntegerPTX
+/// Used to make sure the arguments/returns are suitable for passing
+/// and promote them to a larger size if they're not.
+///
+/// The promoted type is placed in \p PromoteVT if the function returns true.
+static bool PromoteScalarIntegerPTX(const EVT &VT, MVT *PromotedVT) {
+  if (VT.isScalarInteger()) {
+    switch (PowerOf2Ceil(VT.getFixedSizeInBits())) {
+    default:
+      llvm_unreachable(
+          "Promotion is not suitable for scalars of size larger than 64-bits");
+    case 1:
+      *PromotedVT = MVT::i1;
+      break;
+    case 2:
+    case 4:
+    case 8:
+      *PromotedVT = MVT::i8;
+      break;
+    case 16:
+      *PromotedVT = MVT::i16;
+      break;
+    case 32:
+      *PromotedVT = MVT::i32;
+      break;
+    case 64:
+      *PromotedVT = MVT::i64;
+      break;
+    }
+    return EVT(*PromotedVT) != VT;
+  }
+  return false;
+}
+
 // Check whether we can merge loads/stores of some of the pieces of a
 // flattened function parameter or return value into a single vector
 // load/store.
 //
 // The flattened parameter is represented as a list of EVTs and
 // offsets, and the whole structure is aligned to ParamAlignment. This
 // function determines whether we can load/store pieces of the
 // parameter starting at index Idx using a single vectorized op of
@@ 1,068 lines hidden @@ if (retTy->isFloatingPointTy() || (retTy->isIntegerTy() && !retTy->isIntegerTy(128))) {
       size = ITy->getBitWidth();
     } else {
       assert(retTy->isFloatingPointTy() &&
              "Floating point type expected here");
       size = retTy->getPrimitiveSizeInBits();
     }
     // PTX ABI requires all scalar return values to be at least 32
     // bits in size. fp16 normally uses .b16 as its storage type in
     // PTX, so its size must be adjusted here, too.
-    if (size < 32)
-      size = 32;
+    size = promoteScalarArgumentSize(size);

     O << ".param .b" << size << " _";
   } else if (isa<PointerType>(retTy)) {
     O << ".param .b" << PtrVT.getSizeInBits() << " _";
   } else if (retTy->isAggregateType() || retTy->isVectorTy() ||
              retTy->isIntegerTy(128)) {
     O << ".param .align " << (retAlignment ? retAlignment->value() : 0)
       << " .b8 _[" << DL.getTypeAllocSize(retTy) << "]";
   } else {
@@ 32 lines hidden @@ if (!Outs[OIdx].Flags.isByVal()) {
   }
   // i8 types in IR will be i16 types in SDAG
   assert((getValueType(DL, Ty) == Outs[OIdx].VT ||
           (getValueType(DL, Ty) == MVT::i8 && Outs[OIdx].VT == MVT::i16)) &&
          "type mismatch between callee prototype and arguments");
   // scalar type
   unsigned sz = 0;
   if (isa<IntegerType>(Ty)) {
     sz = cast<IntegerType>(Ty)->getBitWidth();
-    if (sz < 32)
-      sz = 32;
+    sz = promoteScalarArgumentSize(sz);
   } else if (isa<PointerType>(Ty)) {
     sz = PtrVT.getSizeInBits();
   } else if (Ty->isHalfTy())
     // PTX ABI requires all scalar parameters to be at least 32
     // bits in size. fp16 normally uses .b16 as its storage type
     // in PTX, so its size must be adjusted here, too.
     sz = 32;
   else
     sz = Ty->getPrimitiveSizeInBits();
@@ 153 lines hidden @@ if (IsByVal ||
         Chain, DAG.getConstant(ArgAlign.value(), dl, MVT::i32),
         DAG.getConstant(ParamCount, dl, MVT::i32),
         DAG.getConstant(TypeSize, dl, MVT::i32), InFlag};
     Chain = DAG.getNode(NVPTXISD::DeclareParam, dl, DeclareParamVTs,
                         DeclareParamOps);
     NeedAlign = true;
   } else {
     // declare .param .b<size> .param<n>;
-    if ((VT.isInteger() || VT.isFloatingPoint()) && TypeSize < 4) {
+    if (VT.isInteger() || VT.isFloatingPoint()) {
       // PTX ABI requires integral types to be at least 32 bits in
       // size. FP16 is loaded/stored using i16, so it's handled
       // here as well.
-      TypeSize = 4;
+      TypeSize = promoteScalarArgumentSize(TypeSize * 8) / 8;
     }
     SDValue DeclareScalarParamOps[] = {
         Chain, DAG.getConstant(ParamCount, dl, MVT::i32),
         DAG.getConstant(TypeSize * 8, dl, MVT::i32),
         DAG.getConstant(0, dl, MVT::i32), InFlag};
     Chain = DAG.getNode(NVPTXISD::DeclareScalarParam, dl, DeclareParamVTs,
                         DeclareScalarParamOps);
     NeedAlign = false;
   }
@@ 19 lines hidden @@ for (unsigned j = 0, je = VTs.size(); j != je; ++j) {
   if (VectorInfo[j] & PVF_FIRST) {
     assert(StoreOperands.empty() && "Unfinished preceding store.");
     StoreOperands.push_back(Chain);
     StoreOperands.push_back(DAG.getConstant(ParamCount, dl, MVT::i32));
     StoreOperands.push_back(DAG.getConstant(CurOffset, dl, MVT::i32));
   }

   SDValue StVal = OutVals[OIdx];

+  MVT PromotedVT;
+  if (PromoteScalarIntegerPTX(EltVT, &PromotedVT)) {
+    EltVT = EVT(PromotedVT);
+  }
+  if (PromoteScalarIntegerPTX(StVal.getValueType(), &PromotedVT)) {
+    llvm::ISD::NodeType Ext =
+        Outs[OIdx].Flags.isSExt() ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
+    StVal = DAG.getNode(Ext, dl, PromotedVT, StVal);
+  }
+
   if (IsByVal) {
     auto PtrVT = getPointerTy(DL);
     SDValue srcAddr = DAG.getNode(ISD::ADD, dl, PtrVT, StVal,
                                   DAG.getConstant(CurOffset, dl, PtrVT));
     StVal = DAG.getLoad(EltVT, dl, TempChain, srcAddr, MachinePointerInfo(),
                         PartAlign);
   } else if (ExtendIntegerParam) {
     assert(VTs.size() == 1 && "Scalar can't have multiple parts.");
[65 lines not shown]	if (Ins.size() > 0) {
// .param .align 16 .b8 retval0[<size-in-bytes>], or		// .param .align 16 .b8 retval0[<size-in-bytes>], or
// .param .b<size-in-bits> retval0		// .param .b<size-in-bits> retval0
unsigned resultsz = DL.getTypeAllocSizeInBits(RetTy);		unsigned resultsz = DL.getTypeAllocSizeInBits(RetTy);
// Emit ".param .b<size-in-bits> retval0" instead of byte arrays only for		// Emit ".param .b<size-in-bits> retval0" instead of byte arrays only for
// these three types to match the logic in		// these three types to match the logic in
// NVPTXAsmPrinter::printReturnValStr and NVPTXTargetLowering::getPrototype.		// NVPTXAsmPrinter::printReturnValStr and NVPTXTargetLowering::getPrototype.
// Plus, this behavior is consistent with nvcc's.		// Plus, this behavior is consistent with nvcc's.
if (RetTy->isFloatingPointTy() || RetTy->isPointerTy() ||		if (RetTy->isFloatingPointTy() || RetTy->isPointerTy() ||
(RetTy->isIntegerTy() && !RetTy->isIntegerTy(128))) {		(RetTy->isIntegerTy() && !RetTy->isIntegerTy(128))) {
// Scalar needs to be at least 32bit wide		resultsz = promoteScalarArgumentSize(resultsz);
if (resultsz < 32)
resultsz = 32;
SDVTList DeclareRetVTs = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList DeclareRetVTs = DAG.getVTList(MVT::Other, MVT::Glue);
		tra: ditto.
SDValue DeclareRetOps[] = { Chain, DAG.getConstant(1, dl, MVT::i32),		SDValue DeclareRetOps[] = { Chain, DAG.getConstant(1, dl, MVT::i32),
DAG.getConstant(resultsz, dl, MVT::i32),		DAG.getConstant(resultsz, dl, MVT::i32),
DAG.getConstant(0, dl, MVT::i32), InFlag };		DAG.getConstant(0, dl, MVT::i32), InFlag };
Chain = DAG.getNode(NVPTXISD::DeclareRet, dl, DeclareRetVTs,		Chain = DAG.getNode(NVPTXISD::DeclareRet, dl, DeclareRetVTs,
DeclareRetOps);		DeclareRetOps);
InFlag = Chain.getValue(1);		InFlag = Chain.getValue(1);
} else {		} else {
retAlignment = getArgumentAlignment(Callee, CB, RetTy, 0, DL);		retAlignment = getArgumentAlignment(Callee, CB, RetTy, 0, DL);
[120 lines not shown]	if (Ins.size() > 0) {
bool ExtendIntegerRetVal =		bool ExtendIntegerRetVal =
RetTy->isIntegerTy() && DL.getTypeAllocSizeInBits(RetTy) < 32;		RetTy->isIntegerTy() && DL.getTypeAllocSizeInBits(RetTy) < 32;

for (unsigned i = 0, e = VTs.size(); i != e; ++i) {		for (unsigned i = 0, e = VTs.size(); i != e; ++i) {
bool needTruncate = false;		bool needTruncate = false;
EVT TheLoadType = VTs[i];		EVT TheLoadType = VTs[i];
EVT EltType = Ins[i].VT;		EVT EltType = Ins[i].VT;
Align EltAlign = commonAlignment(RetAlign, Offsets[i]);		Align EltAlign = commonAlignment(RetAlign, Offsets[i]);
		MVT PromotedVT;

		if (PromoteScalarIntegerPTX(TheLoadType, &PromotedVT)) {
		TheLoadType = EVT(PromotedVT);
		EltType = EVT(PromotedVT);
		needTruncate = true;
		}

if (ExtendIntegerRetVal) {		if (ExtendIntegerRetVal) {
TheLoadType = MVT::i32;		TheLoadType = MVT::i32;
EltType = MVT::i32;		EltType = MVT::i32;
needTruncate = true;		needTruncate = true;
} else if (TheLoadType.getSizeInBits() < 16) {		} else if (TheLoadType.getSizeInBits() < 16) {
if (VTs[i].isInteger())		if (VTs[i].isInteger())
needTruncate = true;		needTruncate = true;
EltType = MVT::i16;		EltType = MVT::i16;
[764 lines not shown]	if (!PAL.hasParamAttr(i, Attribute::ByVal)) {
SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, LoadVT, P,		SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, LoadVT, P,
DAG.getIntPtrConstant(j, dl));		DAG.getIntPtrConstant(j, dl));
// We've loaded i1 as an i8 and now must truncate it back to i1		// We've loaded i1 as an i8 and now must truncate it back to i1
if (EltVT == MVT::i1)		if (EltVT == MVT::i1)
Elt = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, Elt);		Elt = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, Elt);
// v2f16 was loaded as an i32. Now we must bitcast it back.		// v2f16 was loaded as an i32. Now we must bitcast it back.
else if (EltVT == MVT::v2f16)		else if (EltVT == MVT::v2f16)
Elt = DAG.getNode(ISD::BITCAST, dl, MVT::v2f16, Elt);		Elt = DAG.getNode(ISD::BITCAST, dl, MVT::v2f16, Elt);

		// If a promoted integer type is used, truncate down to the original
		MVT PromotedVT;
		if (PromoteScalarIntegerPTX(EltVT, &PromotedVT)) {
		Elt = DAG.getNode(ISD::TRUNCATE, dl, EltVT, Elt);
		}

// Extend the element if necessary (e.g. an i8 is loaded		// Extend the element if necessary (e.g. an i8 is loaded
// into an i16 register)		// into an i16 register)
if (Ins[InsIdx].VT.isInteger() &&		if (Ins[InsIdx].VT.isInteger() &&
Ins[InsIdx].VT.getFixedSizeInBits() >		Ins[InsIdx].VT.getFixedSizeInBits() >
LoadVT.getFixedSizeInBits()) {		LoadVT.getFixedSizeInBits()) {
unsigned Extend = Ins[InsIdx].Flags.isSExt() ? ISD::SIGN_EXTEND		unsigned Extend = Ins[InsIdx].Flags.isSExt() ? ISD::SIGN_EXTEND
: ISD::ZERO_EXTEND;		: ISD::ZERO_EXTEND;
Elt = DAG.getNode(Extend, dl, Ins[InsIdx].VT, Elt);		Elt = DAG.getNode(Extend, dl, Ins[InsIdx].VT, Elt);
[53 lines not shown]	NVPTXTargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
Type *RetTy = MF.getFunction().getReturnType();		Type *RetTy = MF.getFunction().getReturnType();

bool isABI = (STI.getSmVersion() >= 20);		bool isABI = (STI.getSmVersion() >= 20);
assert(isABI && "Non-ABI compilation is not supported");		assert(isABI && "Non-ABI compilation is not supported");
if (!isABI)		if (!isABI)
return Chain;		return Chain;

const DataLayout &DL = DAG.getDataLayout();		const DataLayout &DL = DAG.getDataLayout();
		SmallVector<SDValue, 16> PromotedOutVals;
SmallVector<EVT, 16> VTs;		SmallVector<EVT, 16> VTs;
SmallVector<uint64_t, 16> Offsets;		SmallVector<uint64_t, 16> Offsets;
ComputePTXValueVTs(*this, DL, RetTy, VTs, &Offsets);		ComputePTXValueVTs(*this, DL, RetTy, VTs, &Offsets);
assert(VTs.size() == OutVals.size() && "Bad return value decomposition");		assert(VTs.size() == OutVals.size() && "Bad return value decomposition");

		for (unsigned i = 0, e = VTs.size(); i != e; ++i) {
		SDValue PromotedOutVal = OutVals[i];
		MVT PromotedVT;
		if (PromoteScalarIntegerPTX(VTs[i], &PromotedVT)) {
		VTs[i] = EVT(PromotedVT);
		}
		if (PromoteScalarIntegerPTX(PromotedOutVal.getValueType(), &PromotedVT)) {
		llvm::ISD::NodeType Ext =
		Outs[i].Flags.isSExt() ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
		PromotedOutVal = DAG.getNode(Ext, dl, PromotedVT, PromotedOutVal);
		}
		PromotedOutVals.push_back(PromotedOutVal);
		}

auto VectorInfo = VectorizePTXValueVTs(		auto VectorInfo = VectorizePTXValueVTs(
VTs, Offsets,		VTs, Offsets,
RetTy->isSized() ? getFunctionParamOptimizedAlign(&F, RetTy, DL)		RetTy->isSized() ? getFunctionParamOptimizedAlign(&F, RetTy, DL)
: Align(1));		: Align(1));

// PTX Interoperability Guide 3.3(A): [Integer] Values shorter than		// PTX Interoperability Guide 3.3(A): [Integer] Values shorter than
// 32-bits are sign extended or zero extended, depending on whether		// 32-bits are sign extended or zero extended, depending on whether
// they are signed or unsigned types.		// they are signed or unsigned types.
bool ExtendIntegerRetVal =		bool ExtendIntegerRetVal =
RetTy->isIntegerTy() && DL.getTypeAllocSizeInBits(RetTy) < 32;		RetTy->isIntegerTy() && DL.getTypeAllocSizeInBits(RetTy) < 32;

SmallVector<SDValue, 6> StoreOperands;		SmallVector<SDValue, 6> StoreOperands;
for (unsigned i = 0, e = VTs.size(); i != e; ++i) {		for (unsigned i = 0, e = VTs.size(); i != e; ++i) {
// New load/store. Record chain and offset operands.		// New load/store. Record chain and offset operands.
if (VectorInfo[i] & PVF_FIRST) {		if (VectorInfo[i] & PVF_FIRST) {
assert(StoreOperands.empty() && "Orphaned operand list.");		assert(StoreOperands.empty() && "Orphaned operand list.");
StoreOperands.push_back(Chain);		StoreOperands.push_back(Chain);
StoreOperands.push_back(DAG.getConstant(Offsets[i], dl, MVT::i32));		StoreOperands.push_back(DAG.getConstant(Offsets[i], dl, MVT::i32));
}		}

SDValue RetVal = OutVals[i];		SDValue OutVal = OutVals[i];
		SDValue RetVal = PromotedOutVals[i];

if (ExtendIntegerRetVal) {		if (ExtendIntegerRetVal) {
RetVal = DAG.getNode(Outs[i].Flags.isSExt() ? ISD::SIGN_EXTEND		RetVal = DAG.getNode(Outs[i].Flags.isSExt() ? ISD::SIGN_EXTEND
: ISD::ZERO_EXTEND,		: ISD::ZERO_EXTEND,
dl, MVT::i32, RetVal);		dl, MVT::i32, RetVal);
} else if (RetVal.getValueSizeInBits() < 16) {		} else if (OutVal.getValueSizeInBits() < 16) {
// Use 16-bit registers for small load-stores as it's the		// Use 16-bit registers for small load-stores as it's the
// smallest general purpose register size supported by NVPTX.		// smallest general purpose register size supported by NVPTX.
RetVal = DAG.getNode(ISD::ANY_EXTEND, dl, MVT::i16, RetVal);		RetVal = DAG.getNode(ISD::ANY_EXTEND, dl, MVT::i16, RetVal);
}		}

// Record the value to return.		// Record the value to return.
StoreOperands.push_back(RetVal);		StoreOperands.push_back(RetVal);
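
The promote-before-store and truncate-after-load idea used in this file can be modelled with plain integers, outside of SelectionDAG. The sketch below is an illustration only and is not code from the patch: `promotedWidth`, `truncateTo`, `extendTo` and `roundTrips` are hypothetical helpers. It assumes that a scalar integer of 1 to 64 bits is widened to the next of 8, 16, 32 or 64 bits, that the extension kind follows the `sext` flag, and that the receiver truncates back to the original width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: register width an iN value is widened to.
// Assumes 1 <= bits <= 64.
unsigned promotedWidth(unsigned bits) {
  if (bits <= 8) return 8;
  if (bits <= 16) return 16;
  if (bits <= 32) return 32;
  return 64;
}

// Keep only the low `bits` bits of v (models ISD::TRUNCATE).
uint64_t truncateTo(uint64_t v, unsigned bits) {
  return bits >= 64 ? v : (v & ((uint64_t{1} << bits) - 1));
}

// Widen a `from`-bit value to `to` bits (models ISD::SIGN_EXTEND when
// isSigned is true and ISD::ZERO_EXTEND otherwise). Assumes from >= 1.
uint64_t extendTo(uint64_t v, unsigned from, unsigned to, bool isSigned) {
  v = truncateTo(v, from);
  bool negative = isSigned && from < 64 && ((v >> (from - 1)) & 1);
  if (negative)
    v |= ~uint64_t{0} << from; // fill the upper bits with ones
  return truncateTo(v, to);
}

// Store side extends, load side truncates; the value must survive.
bool roundTrips(uint64_t v, unsigned bits, bool isSigned) {
  uint64_t stored = extendTo(v, bits, promotedWidth(bits), isSigned);
  return truncateTo(stored, bits) == truncateTo(v, bits);
}
```

Under these assumptions an i24 with its top bit set, such as `0x800001`, is stored as `0x00800001` when zero-extended and as `0xFF800001` when sign-extended, and truncation recovers the same 24 bits in both cases.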

[2,529 lines not shown]

llvm/lib/Target/NVPTX/NVPTXUtilities.h

	[53 lines not shown]

	bool getMinCTASm(const Function &, unsigned &);			bool getMinCTASm(const Function &, unsigned &);
	bool getMaxNReg(const Function &, unsigned &);			bool getMaxNReg(const Function &, unsigned &);
	bool isKernelFunction(const Function &);			bool isKernelFunction(const Function &);

	bool getAlign(const Function &, unsigned index, unsigned &);			bool getAlign(const Function &, unsigned index, unsigned &);
	bool getAlign(const CallInst &, unsigned index, unsigned &);			bool getAlign(const CallInst &, unsigned index, unsigned &);

				// PTX ABI requires all scalar argument/return values to have
				// bit-size as a power of two of at least 32 bits.
				inline unsigned promoteScalarArgumentSize(unsigned size) {
				if (size <= 32)
				return 32;
				else if (size <= 64)
				return 64;
				else
				return size;
				}
	}			}

	#endif			#endif
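
To make the rounding rule concrete, the helper can be exercised outside of LLVM. The sketch below repeats the function body shown above in a standalone file; marking it `constexpr` and adding the `static_assert` checks are additions made for illustration and are not part of the patch.

```cpp
#include <cassert>

// Standalone copy of the helper from the diff, made constexpr here only
// so that it can be checked at compile time.
constexpr unsigned promoteScalarArgumentSize(unsigned size) {
  if (size <= 32)
    return 32;
  else if (size <= 64)
    return 64;
  else
    return size;
}

// Widths exercised by param-load-store.ll in this revision.
static_assert(promoteScalarArgumentSize(2) == 32, "i2 is declared as .b32");
static_assert(promoteScalarArgumentSize(24) == 32, "i24 is declared as .b32");
static_assert(promoteScalarArgumentSize(40) == 64, "i40 is declared as .b64");
static_assert(promoteScalarArgumentSize(56) == 64, "i56 is declared as .b64");
// Sizes above 64 bits are returned unchanged.
static_assert(promoteScalarArgumentSize(128) == 128, "no promotion above 64");
```

Note that the function does not round sizes above 64 bits; it returns them as they are.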

llvm/test/CodeGen/NVPTX/param-load-store.ll

[126 lines not shown]
; CHECK-DAG: st.param.b8 [func_retval0+0], [[RE0]]		; CHECK-DAG: st.param.b8 [func_retval0+0], [[RE0]]
; CHECK-DAG: st.param.b8 [func_retval0+4], [[RE4]];		; CHECK-DAG: st.param.b8 [func_retval0+4], [[RE4]];
; CHECK-NEXT: ret;		; CHECK-NEXT: ret;
define <5 x i1> @test_v5i1(<5 x i1> %a) {		define <5 x i1> @test_v5i1(<5 x i1> %a) {
%r = tail call <5 x i1> @test_v5i1(<5 x i1> %a);		%r = tail call <5 x i1> @test_v5i1(<5 x i1> %a);
ret <5 x i1> %r;		ret <5 x i1> %r;
}		}

		; CHECK: .func (.param .b32 func_retval0)
		; CHECK-LABEL: test_i2(
		; CHECK-NEXT: .param .b32 test_i2_param_0
		; CHECK: ld.param.u8 {{%rs[0-9]+}}, [test_i2_param_0];
		; CHECK: .param .b32 param0;
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK: test_i2,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i2 @test_i2(i2 %a) {
		%r = tail call i2 @test_i2(i2 %a);
		ret i2 %r;
		}

		; CHECK: .func (.param .b32 func_retval0)
		; CHECK-LABEL: test_i3(
		; CHECK-NEXT: .param .b32 test_i3_param_0
		; CHECK: ld.param.u8 {{%rs[0-9]+}}, [test_i3_param_0];
		; CHECK: .param .b32 param0;
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK: test_i3,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i3 @test_i3(i3 %a) {
		%r = tail call i3 @test_i3(i3 %a);
		ret i3 %r;
		}

; Unsigned i8 is loaded directly into 32-bit register.		; Unsigned i8 is loaded directly into 32-bit register.
; CHECK: .func (.param .b32 func_retval0)		; CHECK: .func (.param .b32 func_retval0)
; CHECK-LABEL: test_i8(		; CHECK-LABEL: test_i8(
; CHECK-NEXT: .param .b32 test_i8_param_0		; CHECK-NEXT: .param .b32 test_i8_param_0
; CHECK: ld.param.u8 [[A8:%rs[0-9]+]], [test_i8_param_0];		; CHECK: ld.param.u8 [[A8:%rs[0-9]+]], [test_i8_param_0];
; CHECK: cvt.u32.u16 [[A32:%r[0-9]+]], [[A8]];		; CHECK: cvt.u32.u16 [[A32:%r[0-9]+]], [[A8]];
; CHECK: and.b32 [[A:%r[0-9]+]], [[A32]], 255;		; CHECK: and.b32 [[A:%r[0-9]+]], [[A32]], 255;
; CHECK: .param .b32 param0;		; CHECK: .param .b32 param0;
[69 lines not shown]	define <4 x i8> @test_v4i8(<4 x i8> %a) {
%r = tail call <4 x i8> @test_v4i8(<4 x i8> %a);		%r = tail call <4 x i8> @test_v4i8(<4 x i8> %a);
ret <4 x i8> %r;		ret <4 x i8> %r;
}		}

; CHECK: .func (.param .align 8 .b8 func_retval0[8])		; CHECK: .func (.param .align 8 .b8 func_retval0[8])
; CHECK-LABEL: test_v5i8(		; CHECK-LABEL: test_v5i8(
; CHECK-NEXT: .param .align 8 .b8 test_v5i8_param_0[8]		; CHECK-NEXT: .param .align 8 .b8 test_v5i8_param_0[8]
; CHECK-DAG: ld.param.u8 [[E4:%rs[0-9]+]], [test_v5i8_param_0+4];		; CHECK-DAG: ld.param.u8 [[E4:%rs[0-9]+]], [test_v5i8_param_0+4];
; CHECK-DAG: ld.param.v4.u8 {[[E0:%rs[0-9]+]], [[E1:%rs[0-9]+]], [[E2:%rs[0-9]+]], [[E3:%rs[0-9]+]]}, [test_v5i8_param_0]		; CHECK-DAG: ld.param.v4.u8 {[[E0:%rs[0-9]+]], [[E1:%rs[0-9]+]], [[E2:%rs[0-9]+]], [[E3:%rs[0-9]+]]}, [test_v5i8_param_0]
; CHECK: .param .align 8 .b8 param0[8];		; CHECK: .param .align 8 .b8 param0[8];
; CHECK-DAG: st.param.v4.b8 [param0+0], {[[E0]], [[E1]], [[E2]], [[E3]]};		; CHECK-DAG: st.param.v4.b8 [param0+0], {[[E0]], [[E1]], [[E2]], [[E3]]};
; CHECK-DAG: st.param.b8 [param0+4], [[E4]];		; CHECK-DAG: st.param.b8 [param0+4], [[E4]];
		tra: This, and other tests dealing with multiple loads/stores of parts of the argument, may be fragile. First, the order of loads would not necessarily be guaranteed. Second, the way LLVM reconstructs the value from parts may also change. I would stick with a set of `CHECK-DAG: ld.param.XX {{.}} [test_i23_param_0+Y]`. All we care about here is that we load the right set of bits. We can assume that reconstructing the integer is handled by appropriate tests already.
; CHECK: .param .align 8 .b8 retval0[8];		; CHECK: .param .align 8 .b8 retval0[8];
; CHECK: call.uni (retval0),		; CHECK: call.uni (retval0),
; CHECK-NEXT: test_v5i8,		; CHECK-NEXT: test_v5i8,
; CHECK-DAG: ld.param.v4.b8 {[[RE0:%rs[0-9]+]], [[RE1:%rs[0-9]+]], [[RE2:%rs[0-9]+]], [[RE3:%rs[0-9]+]]}, [retval0+0];		; CHECK-DAG: ld.param.v4.b8 {[[RE0:%rs[0-9]+]], [[RE1:%rs[0-9]+]], [[RE2:%rs[0-9]+]], [[RE3:%rs[0-9]+]]}, [retval0+0];
; CHECK-DAG: ld.param.b8 [[RE4:%rs[0-9]+]], [retval0+4];		; CHECK-DAG: ld.param.b8 [[RE4:%rs[0-9]+]], [retval0+4];
; CHECK-DAG: st.param.v4.b8 [func_retval0+0], {[[RE0]], [[RE1]], [[RE2]], [[RE3]]}		; CHECK-DAG: st.param.v4.b8 [func_retval0+0], {[[RE0]], [[RE1]], [[RE2]], [[RE3]]}
; CHECK-DAG: st.param.b8 [func_retval0+4], [[RE4]];		; CHECK-DAG: st.param.b8 [func_retval0+4], [[RE4]];
; CHECK-NEXT: ret;		; CHECK-NEXT: ret;
define <5 x i8> @test_v5i8(<5 x i8> %a) {		define <5 x i8> @test_v5i8(<5 x i8> %a) {
%r = tail call <5 x i8> @test_v5i8(<5 x i8> %a);		%r = tail call <5 x i8> @test_v5i8(<5 x i8> %a);
ret <5 x i8> %r;		ret <5 x i8> %r;
}		}

; CHECK: .func (.param .b32 func_retval0)		; CHECK: .func (.param .b32 func_retval0)
		tra: We want CHECK-DAG on loads here. Nit: no need to check logical ops.
		; CHECK-LABEL: test_i11(
		; CHECK-NEXT: .param .b32 test_i11_param_0
		; CHECK: ld.param.u16 {{%rs[0-9]+}}, [test_i11_param_0];
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i11,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i11 @test_i11(i11 %a) {
		%r = tail call i11 @test_i11(i11 %a);
		ret i11 %r;
		}

		; CHECK: .func (.param .b32 func_retval0)
; CHECK-LABEL: test_i16(		; CHECK-LABEL: test_i16(
; CHECK-NEXT: .param .b32 test_i16_param_0		; CHECK-NEXT: .param .b32 test_i16_param_0
; CHECK: ld.param.u16 [[E16:%rs[0-9]+]], [test_i16_param_0];		; CHECK: ld.param.u16 [[E16:%rs[0-9]+]], [test_i16_param_0];
; CHECK: cvt.u32.u16 [[E32:%r[0-9]+]], [[E16]];		; CHECK: cvt.u32.u16 [[E32:%r[0-9]+]], [[E16]];
; CHECK: .param .b32 param0;		; CHECK: .param .b32 param0;
; CHECK: st.param.b32 [param0+0], [[E32]];		; CHECK: st.param.b32 [param0+0], [[E32]];
; CHECK: .param .b32 retval0;		; CHECK: .param .b32 retval0;
; CHECK: call.uni (retval0),		; CHECK: call.uni (retval0),
[224 lines not shown]
; CHECK-DAG: st.param.b16 [func_retval0+16], [[R8]];		; CHECK-DAG: st.param.b16 [func_retval0+16], [[R8]];
; CHECK: ret;		; CHECK: ret;
define <9 x half> @test_v9f16(<9 x half> %a) {		define <9 x half> @test_v9f16(<9 x half> %a) {
%r = tail call <9 x half> @test_v9f16(<9 x half> %a);		%r = tail call <9 x half> @test_v9f16(<9 x half> %a);
ret <9 x half> %r;		ret <9 x half> %r;
}		}

; CHECK: .func (.param .b32 func_retval0)		; CHECK: .func (.param .b32 func_retval0)
		; CHECK-LABEL: test_i19(
		; CHECK-NEXT: .param .b32 test_i19_param_0
		; CHECK-DAG: ld.param.u16 {{%r[0-9]+}}, [test_i19_param_0];
		; CHECK-DAG: ld.param.u8 {{%r[0-9]+}}, [test_i19_param_0+2];
		; CHECK: .param .b32 param0;
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i19,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i19 @test_i19(i19 %a) {
		%r = tail call i19 @test_i19(i19 %a);
		ret i19 %r;
		}

		; CHECK: .func (.param .b32 func_retval0)
		; CHECK-LABEL: test_i23(
		; CHECK-NEXT: .param .b32 test_i23_param_0
		; CHECK-DAG: ld.param.u16 {{%r[0-9]+}}, [test_i23_param_0];
		; CHECK-DAG: ld.param.u8 {{%r[0-9]+}}, [test_i23_param_0+2];
		; CHECK: .param .b32 param0;
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i23,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i23 @test_i23(i23 %a) {
		%r = tail call i23 @test_i23(i23 %a);
		ret i23 %r;
		}

		; CHECK: .func (.param .b32 func_retval0)
		; CHECK-LABEL: test_i24(
		; CHECK-NEXT: .param .b32 test_i24_param_0
		; CHECK-DAG: ld.param.u8 {{%r[0-9]+}}, [test_i24_param_0+2];
		; CHECK-DAG: ld.param.u16 {{%r[0-9]+}}, [test_i24_param_0];
		; CHECK: .param .b32 param0;
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i24,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i24 @test_i24(i24 %a) {
		%r = tail call i24 @test_i24(i24 %a);
		ret i24 %r;
		}

		; CHECK: .func (.param .b32 func_retval0)
		; CHECK-LABEL: test_i29(
		; CHECK-NEXT: .param .b32 test_i29_param_0
		; CHECK: ld.param.u32 {{%r[0-9]+}}, [test_i29_param_0];
		; CHECK: .param .b32 param0;
		; CHECK: st.param.b32 [param0+0], {{%r[0-9]+}};
		; CHECK: .param .b32 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i29,
		; CHECK: ld.param.b32 {{%r[0-9]+}}, [retval0+0];
		; CHECK: st.param.b32 [func_retval0+0], {{%r[0-9]+}};
		; CHECK-NEXT: ret;
		define i29 @test_i29(i29 %a) {
		%r = tail call i29 @test_i29(i29 %a);
		ret i29 %r;
		}

		; CHECK: .func (.param .b32 func_retval0)
; CHECK-LABEL: test_i32(		; CHECK-LABEL: test_i32(
; CHECK-NEXT: .param .b32 test_i32_param_0		; CHECK-NEXT: .param .b32 test_i32_param_0
; CHECK: ld.param.u32 [[E:%r[0-9]+]], [test_i32_param_0];		; CHECK: ld.param.u32 [[E:%r[0-9]+]], [test_i32_param_0];
; CHECK: .param .b32 param0;		; CHECK: .param .b32 param0;
; CHECK: st.param.b32 [param0+0], [[E]];		; CHECK: st.param.b32 [param0+0], [[E]];
; CHECK: .param .b32 retval0;		; CHECK: .param .b32 retval0;
; CHECK: call.uni (retval0),		; CHECK: call.uni (retval0),
; CHECK-NEXT: test_i32,		; CHECK-NEXT: test_i32,
[77 lines not shown]
; CHECK: st.param.f32 [func_retval0+0], [[R]];		; CHECK: st.param.f32 [func_retval0+0], [[R]];
; CHECK-NEXT: ret;		; CHECK-NEXT: ret;
define float @test_f32(float %a) {		define float @test_f32(float %a) {
%r = tail call float @test_f32(float %a);		%r = tail call float @test_f32(float %a);
ret float %r;		ret float %r;
}		}

; CHECK: .func (.param .b64 func_retval0)		; CHECK: .func (.param .b64 func_retval0)
		; CHECK-LABEL: test_i40(
		; CHECK-NEXT: .param .b64 test_i40_param_0
		; CHECK-DAG: ld.param.u8 {{%rd[0-9]+}}, [test_i40_param_0+4];
		; CHECK-DAG: ld.param.u32 {{%rd[0-9]+}}, [test_i40_param_0];
		; CHECK: .param .b64 param0;
		; CHECK: st.param.b64 [param0+0], {{%rd[0-9]+}};
		; CHECK: .param .b64 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i40,
		; CHECK: ld.param.b64 {{%rd[0-9]+}}, [retval0+0];
		; CHECK: st.param.b64 [func_retval0+0], {{%rd[0-9]+}};
		; CHECK-NEXT: ret;
		define i40 @test_i40(i40 %a) {
		%r = tail call i40 @test_i40(i40 %a);
		ret i40 %r;
		}

		; CHECK: .func (.param .b64 func_retval0)
		; CHECK-LABEL: test_i47(
		; CHECK-NEXT: .param .b64 test_i47_param_0
		; CHECK-DAG: ld.param.u16 {{%rd[0-9]+}}, [test_i47_param_0+4];
		; CHECK-DAG: ld.param.u32 {{%rd[0-9]+}}, [test_i47_param_0];
		; CHECK: .param .b64 param0;
		; CHECK: st.param.b64 [param0+0], {{%rd[0-9]+}};
		; CHECK: .param .b64 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i47,
		; CHECK: ld.param.b64 {{%rd[0-9]+}}, [retval0+0];
		; CHECK: st.param.b64 [func_retval0+0], {{%rd[0-9]+}};
		; CHECK-NEXT: ret;
		define i47 @test_i47(i47 %a) {
		%r = tail call i47 @test_i47(i47 %a);
		ret i47 %r;
		}

		; CHECK: .func (.param .b64 func_retval0)
		; CHECK-LABEL: test_i48(
		; CHECK-NEXT: .param .b64 test_i48_param_0
		; CHECK-DAG: ld.param.u16 {{%rd[0-9]+}}, [test_i48_param_0+4];
		; CHECK-DAG: ld.param.u32 {{%rd[0-9]+}}, [test_i48_param_0];
		; CHECK: .param .b64 param0;
		; CHECK: st.param.b64 [param0+0], {{%rd[0-9]+}};
		; CHECK: .param .b64 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i48,
		; CHECK: ld.param.b64 {{%rd[0-9]+}}, [retval0+0];
		; CHECK: st.param.b64 [func_retval0+0], {{%rd[0-9]+}};
		; CHECK-NEXT: ret;
		define i48 @test_i48(i48 %a) {
		%r = tail call i48 @test_i48(i48 %a);
		ret i48 %r;
		}

		; CHECK: .func (.param .b64 func_retval0)
		; CHECK-LABEL: test_i51(
		; CHECK-NEXT: .param .b64 test_i51_param_0
		; CHECK-DAG: ld.param.u8 {{%rd[0-9]+}}, [test_i51_param_0+6];
		; CHECK-DAG: ld.param.u16 {{%rd[0-9]+}}, [test_i51_param_0+4];
		; CHECK-DAG: ld.param.u32 {{%rd[0-9]+}}, [test_i51_param_0];
		; CHECK: .param .b64 param0;
		; CHECK: st.param.b64 [param0+0], {{%rd[0-9]+}};
		; CHECK: .param .b64 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i51,
		; CHECK: ld.param.b64 {{%rd[0-9]+}}, [retval0+0];
		; CHECK: st.param.b64 [func_retval0+0], {{%rd[0-9]+}};
		; CHECK-NEXT: ret;
		define i51 @test_i51(i51 %a) {
		%r = tail call i51 @test_i51(i51 %a);
		ret i51 %r;
		}

		; CHECK: .func (.param .b64 func_retval0)
		; CHECK-LABEL: test_i56(
		; CHECK-NEXT: .param .b64 test_i56_param_0
		; CHECK-DAG: ld.param.u8 {{%rd[0-9]+}}, [test_i56_param_0+6];
		; CHECK-DAG: ld.param.u16 {{%rd[0-9]+}}, [test_i56_param_0+4];
		; CHECK-DAG: ld.param.u32 {{%rd[0-9]+}}, [test_i56_param_0];
		; CHECK: .param .b64 param0;
		; CHECK: st.param.b64 [param0+0], {{%rd[0-9]+}};
		; CHECK: .param .b64 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i56,
		; CHECK: ld.param.b64 {{%rd[0-9]+}}, [retval0+0];
		; CHECK: st.param.b64 [func_retval0+0], {{%rd[0-9]+}};
		; CHECK-NEXT: ret;
		define i56 @test_i56(i56 %a) {
		%r = tail call i56 @test_i56(i56 %a);
		ret i56 %r;
		}

		; CHECK: .func (.param .b64 func_retval0)
		; CHECK-LABEL: test_i57(
		; CHECK-NEXT: .param .b64 test_i57_param_0
		; CHECK: ld.param.u64 {{%rd[0-9]+}}, [test_i57_param_0];
		; CHECK: .param .b64 param0;
		; CHECK: st.param.b64 [param0+0], {{%rd[0-9]+}};
		; CHECK: .param .b64 retval0;
		; CHECK: call.uni (retval0),
		; CHECK-NEXT: test_i57,
		; CHECK: ld.param.b64 {{%rd[0-9]+}}, [retval0+0];
		; CHECK: st.param.b64 [func_retval0+0], {{%rd[0-9]+}};
		; CHECK-NEXT: ret;
		define i57 @test_i57(i57 %a) {
		%r = tail call i57 @test_i57(i57 %a);
		ret i57 %r;
		}
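
The `ld.param` expectations in the tests above follow a recognisable pattern: the argument is read in power-of-two sized pieces, largest first, at increasing byte offsets. The sketch below is a model of the CHECK lines only and makes no claim about how the lowering code arrives at them; `PartLoad` and `splitIntoLoads` are hypothetical names, and the model assumes a scalar integer of at most 64 bits.

```cpp
#include <cassert>
#include <vector>

// One expected ld.param: byte offset into the parameter and load width.
struct PartLoad {
  unsigned offsetBytes;
  unsigned widthBits;
};

// Hypothetical model: round the width up to whole bytes, then cover those
// bytes greedily with 8-, 4-, 2- and 1-byte loads. Assumes bits <= 64.
std::vector<PartLoad> splitIntoLoads(unsigned bits) {
  unsigned remaining = (bits + 7) / 8; // bytes still to be covered
  unsigned offset = 0;
  std::vector<PartLoad> parts;
  for (unsigned chunk = 8; chunk >= 1; chunk /= 2) {
    if (remaining >= chunk) {
      parts.push_back({offset, chunk * 8});
      offset += chunk;
      remaining -= chunk;
    }
  }
  return parts;
}
```

For i24 this model yields a 16-bit load at offset 0 and an 8-bit load at offset 2; for i56 it yields loads of 32, 16 and 8 bits at offsets 0, 4 and 6; for i57 it yields a single 64-bit load. These agree with the `CHECK-DAG` lines for `test_i24`, `test_i56` and `test_i57`. Because the relative order of the part loads is not guaranteed, matching them with `CHECK-DAG` rather than `CHECK` keeps the tests independent of that order.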

		; CHECK: .func (.param .b64 func_retval0)
; CHECK-LABEL: test_i64(		; CHECK-LABEL: test_i64(
; CHECK-NEXT: .param .b64 test_i64_param_0		; CHECK-NEXT: .param .b64 test_i64_param_0
; CHECK: ld.param.u64 [[E:%rd[0-9]+]], [test_i64_param_0];		; CHECK: ld.param.u64 [[E:%rd[0-9]+]], [test_i64_param_0];
; CHECK: .param .b64 param0;		; CHECK: .param .b64 param0;
; CHECK: st.param.b64 [param0+0], [[E]];		; CHECK: st.param.b64 [param0+0], [[E]];
; CHECK: .param .b64 retval0;		; CHECK: .param .b64 retval0;
; CHECK: call.uni (retval0),		; CHECK: call.uni (retval0),
; CHECK-NEXT: test_i64,		; CHECK-NEXT: test_i64,
[371 lines not shown]

Revision Contents

Diff 446978
