This is an archive of the discontinued LLVM Phabricator instance.

Differential D103655

[AMDGPU] Handle constant LDS uses from different kernels
ClosedPublic

Authored by rampitec on Jun 3 2021, 4:07 PM.

Details

Reviewers

hsmhsm
arsenm
JonChesterfield

Commits

rG05289dfb6246: [AMDGPU] Handle constant LDS uses from different kernels

Summary

This allows lowering an LDS variable into a kernel structure
even if there is a constant expression used from different
kernels.

Diff Detail

Unit Tests: Failed

	Time	Test
	450 ms	x64 debian > LLVM.CodeGen/AMDGPU::ds_read2.ll
	360 ms	x64 debian > LLVM.CodeGen/AMDGPU::ds_write2.ll
	90 ms	x64 debian > LLVM.CodeGen/AMDGPU::lower-kernel-lds-constexpr.ll

Event Timeline

rampitec created this revision. Jun 3 2021, 4:07 PM

Herald added subscribers: foad, kerbowa, hiraditya and 7 others. Jun 3 2021, 4:07 PM

rampitec requested review of this revision. Jun 3 2021, 4:07 PM

Herald added a project: Restricted Project. Jun 3 2021, 4:07 PM

Herald added a subscriber: wdng.

@hsmhsm It depends on the convertConstantExprsToInstructions() you have implemented in the D103225. I would appreciate if you extract the helper into a separate review. Kudos for the helper BTW.

Harbormaster completed remote builds in B107575: Diff 349721. Jun 3 2021, 4:25 PM

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.

rampitec added a subscriber: tra. Jun 3 2021, 4:35 PM

hsmhsm mentioned this in D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions. Jun 3 2021, 6:28 PM

rampitec added a parent revision: D103661: [IR] Add utility to convert constant expression operands (of an instruction) to instructions. Jun 3 2021, 6:48 PM

hsmhsm mentioned this in D103431: [AMDGPU] Fix missing lowering of LDS used in global scope. Jun 3 2021, 10:01 PM

hsmhsm added inline comments. Jun 4 2021, 1:20 AM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
276	Probably, we can change `Instruction *I = dyn_cast<Instruction>(U.getUser()); return I && I->getFunction() == F;` to `Instruction *I = cast<Instruction>(U.getUser()); return I->getFunction() == F;`
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
51	We should probably avoid recursion where it is possible to do so.
54–60	Within the pointer replacement patch https://reviews.llvm.org/D103225, we already have this functionality in place - please take a look at the utility function getFunctionToInstsMap(). Probably we can reuse it here. Once we get FunctionToInstsMap, we can consider only key F, and ignore all others. If you think it is a good idea, then, I can create new patch by taking out this utility function from https://reviews.llvm.org/D103225. I leave it to you.
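
The recursion concern raised for line 51 can be illustrated with an explicit worklist. The following is only a minimal sketch under stated assumptions: `Node`, `Kind`, and `collectUsesIn` are hypothetical toy names standing in for `llvm::User` and the helper under review, not LLVM API, and the toy graph is assumed to be acyclic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for llvm::User (hypothetical, not LLVM API).
enum class Kind { Instruction, ConstantExpr, Other };

struct Node {
  Kind kind;
  std::string function;      // owning function; meaningful for instructions only
  std::vector<Node *> users; // nodes that use this node
};

// Walk the user graph with an explicit stack instead of recursion and collect
// every instruction in `fn` that uses `root`, directly or through nested
// constant expressions. Each instruction is recorded once.
std::vector<Node *> collectUsesIn(Node *root, const std::string &fn) {
  assert(root && "expected a starting node");
  std::vector<Node *> result;
  std::vector<Node *> stack{root};
  while (!stack.empty()) {
    Node *n = stack.back();
    stack.pop_back();
    if (n->kind == Kind::Instruction) {
      if (n->function == fn) {
        bool seen = false;
        for (Node *r : result)
          seen = seen || (r == n);
        if (!seen)
          result.push_back(n);
      }
      continue; // instructions terminate the walk
    }
    if (n->kind != Kind::ConstantExpr)
      continue; // other users are ignored
    for (Node *u : n->users)
      stack.push_back(u);
  }
  return result;
}
```

The linear duplicate check keeps the sketch dependency-free; the patch itself uses a `SetVector` for the same purpose.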

In D103655#2797744, @rampitec wrote:

@hsmhsm It depends on the convertConstantExprsToInstructions() you have implemented in the D103225. I would appreciate if you extract the helper into a separate review. Kudos for the helper BTW.

Pushed new patch at https://reviews.llvm.org/D103661

In D103655#2797809, @rampitec wrote:

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.

I am not sure we can completely eliminate module LDS. My understanding is that it is still required to handle LDS used within non-kernel functions. But, instead of dealing with LDS directly, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.

foad added inline comments. Jun 4 2021, 6:44 AM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
266	Can this be `for (auto &U : make_early_inc_range(GV->users()))`?
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45	Why do %4, %5, %6 need to be Instructions? Couldn't they be left as ConstantExprs?
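
For readers unfamiliar with the `make_early_inc_range` suggestion on line 266: the adaptor advances the iterator before the loop body runs, so the body may remove the current element. A minimal sketch of the same idea on a `std::list` follows; `eraseEvens` is a hypothetical name used only for illustration and does not come from the patch.

```cpp
#include <cassert>
#include <list>

// Remove every even value while iterating. The iterator is advanced before
// the current element is erased, which is the idea that
// llvm::make_early_inc_range packages as a range adaptor.
int eraseEvens(std::list<int> &values) {
  int erased = 0;
  for (auto it = values.begin(); it != values.end();) {
    auto cur = it++; // step first, then it is safe to erase *cur
    if (*cur % 2 == 0) {
      values.erase(cur);
      ++erased;
    }
  }
  return erased;
}
```

In the pass, the loop body may rewrite the users of `GV` while the user list is being walked, which is why the early increment matters.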

Addressed review comments.

In D103655#2797809, @rampitec wrote:

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.

I am not sure we can completely eliminate module LDS. My understanding is that it is still required to handle LDS used within non-kernel functions. But, instead of dealing with LDS directly, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.

Not completely, but we can omit module lds for functions which are only used from a single kernel at the very least.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
266	Thanks!
276	Not really, there can be non-instruction uses from outside of the kernel. In fact assert above is also not correct, we may not have converted non-kernel constant exprs.
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
54–60	It is not the same, I am only interested in instructions with constant expr uses, a much smaller set.
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45	It converts the whole constant expr, this is how D103661 works (and you have already commented there). I am not sure that is important though. Do you see any benefits of having long constant expressions vs instructions? We have to admit the testcase is quite degenerate too, more a torture test than a real life use.
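
The reply on line 276 hinges on the difference between a checked and an unchecked cast: when a user may not be an instruction, the predicate has to test for null rather than assume the type. A rough sketch, using `dynamic_cast` as a stand-in for `dyn_cast` (LLVM does not use C++ RTTI for this) and hypothetical toy types `ToyUser`, `ToyInstruction`, and `ToyConstantExpr`:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy class hierarchy (hypothetical; not the LLVM classes).
struct ToyUser {
  virtual ~ToyUser() = default;
};
struct ToyInstruction : ToyUser {
  std::string function;
  explicit ToyInstruction(std::string f) : function(std::move(f)) {}
};
struct ToyConstantExpr : ToyUser {};

// Returns true only when `u` is an instruction located in function `f`.
// A non-instruction user yields a null pointer and is rejected instead of
// tripping an assertion, which is what an unchecked cast would do.
bool usedInFunction(const ToyUser *u, const std::string &f) {
  assert(u && "expected a user");
  const auto *i = dynamic_cast<const ToyInstruction *>(u);
  return i && i->function == f;
}
```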

Harbormaster completed remote builds in B107719: Diff 349932. Jun 4 2021, 12:25 PM

rampitec added inline comments. Jun 4 2021, 2:16 PM

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45	Actually these have to be instructions. Reading this huge expression... It uses two lds globals, both constexprs are the innermost. If we replace an innermost expression with an instruction we then have to replace all constantexpr uses with instructions too since a constantexpr cannot use an instruction.

rampitec added inline comments. Jun 4 2021, 2:29 PM

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45	Moreover, in our case we always replace an innermost expression, as it all ends up with a GlobalVariable. D103661 could probably stop replacing down the operands once it met the CE we requested to replace, but that is never our scenario. In this example we would always ask to replace the users of @kern and @both, i.e. the expression passed to the helper is 'i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*)' for the first occurrence. It shall trigger replacement of the whole operand expression. However, the helper function could be improved to stop earlier if somebody called it with a bigger expression, say 'i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*)'. In that case convertConstantExprsToInstructions() might just stop after producing the addrspacecast instruction and not convert the inner bitcast.
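
The constraint discussed here (a constant expression cannot use an instruction, so every expression enclosing a rewritten operand must itself become an instruction) can be sketched on a toy expression tree. This is an illustration only: `Expr`, `usesGlobal`, and `collectToConvert` are hypothetical names, and the tree shape is an assumption rather than the actual test case.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy constant-expression tree (hypothetical). A node with no operands
// models a global variable; any other node models a constant expression.
struct Expr {
  std::string name;
  std::vector<std::shared_ptr<Expr>> operands;
};

// Returns true if `e` is, or transitively uses, the global named `target`.
bool usesGlobal(const Expr &e, const std::string &target) {
  if (e.operands.empty())
    return e.name == target;
  for (const auto &op : e.operands)
    if (usesGlobal(*op, target))
      return true;
  return false;
}

// Collect, innermost first, the constant expressions on the path from the
// root down to `target`. Subtrees that never reach `target` are left out,
// so they can stay constant.
void collectToConvert(const Expr &e, const std::string &target,
                      std::vector<std::string> &out) {
  if (e.operands.empty() || !usesGlobal(e, target))
    return;
  for (const auto &op : e.operands)
    collectToConvert(*op, target, out);
  out.push_back(e.name);
}
```

For a root expression with one operand reaching `@kern` and another reaching only `@both`, a request for `@kern` returns the `@kern` branch plus the root, and nothing from the `@both` branch.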

LGTM except for the minor comment below,

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
36	Not sure. What is the real difference between `SmallVector<User *> Stack({U});` and `SmallVector<User *> Stack{U};`?

This revision is now accepted and ready to land. Jun 6 2021, 8:23 AM

Changed initializer style.

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
36	No real difference except that I am not using C++11 syntax. Let's go in style and save a couple of symbols anyway.
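
The two spellings can be compared on `std::vector`, used here as an assumed stand-in for `SmallVector` (both provide an initializer-list constructor). `viaParens` and `viaBraces` are hypothetical helper names for this sketch.

```cpp
#include <cassert>
#include <vector>

// Parenthesized form: the braced list {p} is passed to the
// initializer_list constructor.
std::vector<int *> viaParens(int *p) {
  std::vector<int *> s({p});
  return s;
}

// Braced form: list-initialization selects the same constructor.
std::vector<int *> viaBraces(int *p) {
  std::vector<int *> s{p};
  return s;
}
```

Both helpers return a one-element container holding `p`.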

Harbormaster completed remote builds in B108028: Diff 350351. Jun 7 2021, 10:36 AM

Rebased.

rampitec marked an inline comment as done. Jun 7 2021, 12:45 PM

rampitec added inline comments.

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45	foad wrote: "Why do %4, %5, %6 need to be Instructions? Couldn't they be left as ConstantExprs?" With the change in D103661 we no longer convert @both, which I believe is the right thing; it stays in module LDS. Only the tree to @kern is converted.

Harbormaster completed remote builds in B108052: Diff 350389. Jun 7 2021, 1:02 PM

hsmhsm mentioned this in rG3af5f3e69247: [IR] Add utility to convert constant expression operands (of an instruction) to…. Jun 7 2021, 2:53 PM

Rebased.

This revision was landed with ongoing or failed builds. Jun 7 2021, 3:44 PM

Closed by commit rG05289dfb6246: [AMDGPU] Handle constant LDS uses from different kernels (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rG05289dfb6246: [AMDGPU] Handle constant LDS uses from different kernels.

Harbormaster completed remote builds in B108088: Diff 350435. Jun 7 2021, 3:52 PM

FYI. I've just got an assertion in the pass. I'll post a reduced reproducer when I have it.
Meanwhile here's the crash info:

F0616 14:20:09.488221 1150352 logging.cc:107] assert.h assertion failed at third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:269 in typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::PointerType, Y = llvm::Type]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!"
*** Check failure stack trace: ***
    @     0x55555d4253df  absl::logging_internal::LogMessage::Die()
    @     0x55555d424e54  absl::logging_internal::LogMessage::SendToLog()
    @     0x55555d424b7f  absl::logging_internal::LogMessage::Flush()
    @     0x55555d425ae9  absl::logging_internal::LogMessageFatal::~LogMessageFatal()
    @     0x55555d4239c4  __assert_fail
    @     0x555558fddf77  llvm::GetElementPtrInst::Create()
    @     0x55555d1077d8  llvm::ConstantExpr::getAsInstruction()
    @     0x55555d201d42  llvm::convertConstantExprsToInstructions()
    @     0x55555d20125c  llvm::convertConstantExprsToInstructions()
    @     0x55555b47e1bd  llvm::AMDGPU::replaceConstantUsesInFunction()
    @     0x55555b2a40d5  (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS()
    @     0x55555b2a31df  (anonymous namespace)::AMDGPULowerModuleLDS::runOnModule()
    @     0x55555d1bff34  llvm::legacy::PassManagerImpl::run()
    @     0x555558fae1b9  (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager()
    @     0x555558fa9537  clang::EmitBackendOutput()
    @     0x555558fa66a5  clang::BackendConsumer::HandleTranslationUnit()
    @     0x555559c210b4  clang::ParseAST()
    @     0x5555599cb106  clang::FrontendAction::Execute()
    @     0x55555993fdcf  clang::CompilerInstance::ExecuteAction()
    @     0x555558bdbff3  clang::ExecuteCompilerInvocation()
    @     0x555558bcfd54  cc1_main()
    @     0x555558bcd6e7  ExecuteCC1Tool()
    @     0x555558bcd3fd  main
    @     0x7ffff7d29bbd  __libc_start_main
    @     0x555558bca0a9  _start

In D103655#2823172, @tra wrote:

FYI. I've just got an assertion in the pass. I'll post a reduced reproducer when I have it.

Thanks for the report! Reproducer will be handy.

Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50

$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll

In D103655#2823247, @tra wrote:
Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50
$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll

Thanks, reproduced.

In D103655#2823248, @rampitec wrote:
Thanks, reproduced.

D104425

foad added inline comments. Jun 17 2021, 1:44 AM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
264	There is no need to create any Instructions. You are just replacing one Constant (GV) with another (GEP).
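
The point about replacing one constant with another under a predicate can be sketched as follows. This toy does not settle whether the conversion to instructions is needed in the pass; it only shows per-use conditional replacement in the style of `replaceUsesWithIf`. `ToyUse` and this free function are hypothetical, not the LLVM API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a use (hypothetical): the function containing the user
// and the name of the value it currently refers to.
struct ToyUse {
  std::string function;
  std::string value;
};

// Redirect uses of `from` to `to`, but only where `shouldReplace` agrees.
// Returns the number of uses rewritten.
int replaceUsesWithIf(std::vector<ToyUse> &uses, const std::string &from,
                      const std::string &to,
                      const std::function<bool(const ToyUse &)> &shouldReplace) {
  assert(from != to && "replacing a value with itself is pointless");
  int replaced = 0;
  for (ToyUse &u : uses) {
    if (u.value == from && shouldReplace(u)) {
      u.value = to;
      ++replaced;
    }
  }
  return replaced;
}
```

A predicate that tests the containing function redirects only that kernel's uses and leaves the same global untouched elsewhere.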

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

AMDGPULowerModuleLDSPass.cpp

12 lines

Utils/

AMDGPULDSUtils.h

7 lines

AMDGPULDSUtils.cpp

42 lines

test/

CodeGen/

AMDGPU/

lower-kernel-lds-constexpr.ll

39 lines

lower-module-lds-constantexpr.ll

14 lines

Diff 350389

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

Show First 20 Lines • Show All 255 Lines • ▼ Show 20 Lines	bool processUsedLDS(Module &M, Function *F = nullptr) {
// Replace uses of ith variable with a constantexpr to the ith field of the		// Replace uses of ith variable with a constantexpr to the ith field of the
// instance that will be allocated by AMDGPUMachineFunction		// instance that will be allocated by AMDGPUMachineFunction
Type *I32 = Type::getInt32Ty(Ctx);		Type *I32 = Type::getInt32Ty(Ctx);
for (size_t I = 0; I < LocalVars.size(); I++) {		for (size_t I = 0; I < LocalVars.size(); I++) {
GlobalVariable *GV = LocalVars[I];		GlobalVariable *GV = LocalVars[I];
Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};		Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx);		Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx);
if (F) {		if (F) {
		// Replace all constant uses with instructions if they belong to the
		foad: There is no need to create any Instructions. You are just replacing one Constant (GV) with another (GEP).
		// current kernel.
		for (User *U : make_early_inc_range(GV->users())) {
		foad: Can this be `for (auto &U : make_early_inc_range(GV->users()))`?
		rampitec: Thanks!
		if (ConstantExpr *C = dyn_cast<ConstantExpr>(U))
		AMDGPU::replaceConstantUsesInFunction(C, F);
		}

		GV->removeDeadConstantUsers();

GV->replaceUsesWithIf(GEP, [F](Use &U) {		GV->replaceUsesWithIf(GEP, [F](Use &U) {
return AMDGPU::isUsedOnlyFromFunction(U.getUser(), F);		Instruction *I = dyn_cast<Instruction>(U.getUser());
		return I && I->getFunction() == F;
});		});
		hsmhsm: Probably, we can change `Instruction *I = dyn_cast<Instruction>(U.getUser()); return I && I->getFunction() == F;` to `Instruction *I = cast<Instruction>(U.getUser()); return I->getFunction() == F;`
		rampitec: Not really, there can be non-instruction uses from outside of the kernel. In fact assert above is also not correct, we may not have converted non-kernel constant exprs.
} else {		} else {
GV->replaceAllUsesWith(GEP);		GV->replaceAllUsesWith(GEP);
}		}
if (GV->use_empty()) {		if (GV->use_empty()) {
UsedList.erase(GV);		UsedList.erase(GV);
GV->eraseFromParent();		GV->eraseFromParent();
}		}
}		}
Show All 38 Lines

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.h

	Show All 11 Lines

	#ifndef LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPULDSUTILS_H			#ifndef LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPULDSUTILS_H
	#define LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPULDSUTILS_H			#define LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPULDSUTILS_H

	#include "AMDGPU.h"			#include "AMDGPU.h"

	namespace llvm {			namespace llvm {

				class ConstantExpr;

	namespace AMDGPU {			namespace AMDGPU {

	bool isKernelCC(const Function *Func);			bool isKernelCC(const Function *Func);

	Align getAlign(DataLayout const &DL, const GlobalVariable *GV);			Align getAlign(DataLayout const &DL, const GlobalVariable *GV);

	/// \returns true if an LDS global requres lowering to a module LDS structure			/// \returns true if an LDS global requres lowering to a module LDS structure
	/// if \p F is not given. If \p F is given it must be a kernel and function			/// if \p F is not given. If \p F is given it must be a kernel and function
	/// \returns true if an LDS global is directly used from that kernel and it			/// \returns true if an LDS global is directly used from that kernel and it
	/// is safe to replace its uses with a kernel LDS structure member.			/// is safe to replace its uses with a kernel LDS structure member.
	/// \p UsedList contains a union of llvm.used and llvm.compiler.used variables			/// \p UsedList contains a union of llvm.used and llvm.compiler.used variables
	/// which do not count as a use.			/// which do not count as a use.
	bool shouldLowerLDSToStruct(const SmallPtrSetImpl<GlobalValue *> &UsedList,			bool shouldLowerLDSToStruct(const SmallPtrSetImpl<GlobalValue *> &UsedList,
	const GlobalVariable &GV,			const GlobalVariable &GV,
	const Function *F = nullptr);			const Function *F = nullptr);

	std::vector<GlobalVariable *>			std::vector<GlobalVariable *>
	findVariablesToLower(Module &M, const SmallPtrSetImpl<GlobalValue *> &UsedList,			findVariablesToLower(Module &M, const SmallPtrSetImpl<GlobalValue *> &UsedList,
	const Function *F = nullptr);			const Function *F = nullptr);

	SmallPtrSet<GlobalValue *, 32> getUsedList(Module &M);			SmallPtrSet<GlobalValue *, 32> getUsedList(Module &M);

	/// \returns true if all uses of \p U end up in a function \p F.			/// Replace all uses of constant \p C with instructions in \p F.
	bool isUsedOnlyFromFunction(const User *U, const Function *F);			void replaceConstantUsesInFunction(ConstantExpr *C, const Function *F);

	} // end namespace AMDGPU			} // end namespace AMDGPU

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPULDSUTILS_H			#endif // LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPULDSUTILS_H

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp

//===- AMDGPULDSUtils.cpp -------------------------------------------------===//		//===- AMDGPULDSUtils.cpp -------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// AMDGPU LDS related helper utility functions.		// AMDGPU LDS related helper utility functions.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPULDSUtils.h"		#include "AMDGPULDSUtils.h"
#include "Utils/AMDGPUBaseInfo.h"		#include "Utils/AMDGPUBaseInfo.h"
		#include "llvm/ADT/SetVector.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
		#include "llvm/IR/ReplaceConstant.h"

using namespace llvm;		using namespace llvm;

namespace llvm {		namespace llvm {

namespace AMDGPU {		namespace AMDGPU {

bool isKernelCC(const Function *Func) {		bool isKernelCC(const Function *Func) {
return AMDGPU::isModuleEntryFunctionCC(Func->getCallingConv());		return AMDGPU::isModuleEntryFunctionCC(Func->getCallingConv());
}		}

Align getAlign(DataLayout const &DL, const GlobalVariable *GV) {		Align getAlign(DataLayout const &DL, const GlobalVariable *GV) {
return DL.getValueOrABITypeAlignment(GV->getPointerAlignment(DL),		return DL.getValueOrABITypeAlignment(GV->getPointerAlignment(DL),
GV->getValueType());		GV->getValueType());
}		}

bool isUsedOnlyFromFunction(const User *U, const Function *F) {		static void collectFunctionUses(User *U, const Function *F,
		SetVector<Instruction *> &InstUsers) {
		SmallVector<User *> Stack{U};
		hsmhsm: Not sure. What is the real difference between `SmallVector<User *> Stack({U});` and `SmallVector<User *> Stack{U};`?
		rampitec: No real difference except that I am not using C++11 syntax. Let's go in style and save a couple of symbols anyway.

		while (!Stack.empty()) {
		U = Stack.pop_back_val();

if (auto *I = dyn_cast<Instruction>(U)) {		if (auto *I = dyn_cast<Instruction>(U)) {
return I->getFunction() == F;		if (I->getFunction() == F)
		InstUsers.insert(I);
		continue;
}		}

if (isa<ConstantExpr>(U)) {		if (!isa<ConstantExpr>(U))
return all_of(U->users(),		continue;
[F](const User *U) { return isUsedOnlyFromFunction(U, F); });
		append_range(Stack, U->users());
		}
		hsmhsm: We probably better avoid recursion when it is possible to avoid.
}		}

return false;		void replaceConstantUsesInFunction(ConstantExpr *C, const Function *F) {
		SetVector<Instruction *> InstUsers;

		collectFunctionUses(C, F, InstUsers);
		for (Instruction *I : InstUsers) {
		convertConstantExprsToInstructions(I, C);
		}
		hsmhsm: Within the pointer replacement patch https://reviews.llvm.org/D103225, we already have this functionality in place - please take a look at the utility function getFunctionToInstsMap(). Probably we can reuse it here. Once we get FunctionToInstsMap, we can consider only key F, and ignore all others. If you think it is a good idea, then, I can create new patch by taking out this utility function from https://reviews.llvm.org/D103225. I leave it to you.
		rampitec: It is not the same, I am only interested in instructions with constant expr uses, a much smaller set.
}		}

bool shouldLowerLDSToStruct(const SmallPtrSetImpl<GlobalValue *> &UsedList,		bool shouldLowerLDSToStruct(const SmallPtrSetImpl<GlobalValue *> &UsedList,
const GlobalVariable &GV, const Function *F) {		const GlobalVariable &GV, const Function *F) {
// Any LDS variable can be lowered by moving into the created struct		// Any LDS variable can be lowered by moving into the created struct
// Each variable so lowered is allocated in every kernel, so variables		// Each variable so lowered is allocated in every kernel, so variables
// whose users are all known to be safe to lower without the transform		// whose users are all known to be safe to lower without the transform
// are left unchanged.		// are left unchanged.
Show All 20 Lines	if (auto *I = dyn_cast<Instruction>(V)) {
Ret = true;		Ret = true;
} else if (!F) {		} else if (!F) {
Ret |= !isKernelCC(UF);		Ret |= !isKernelCC(UF);
}		}
continue;		continue;
}		}

if (auto *E = dyn_cast<ConstantExpr>(V)) {		if (auto *E = dyn_cast<ConstantExpr>(V)) {
if (F) {
// Any use which does not end up an instruction disqualifies a
// variable to be put into a kernel's LDS structure because later
// we will need to replace only this kernel's uses for which we
// need to identify a using function.
if (!isUsedOnlyFromFunction(E, F))
return false;
}
for (const User *U : E->users()) {		for (const User *U : E->users()) {
if (Visited.insert(U).second) {		if (Visited.insert(U).second) {
Stack.push_back(U);		Stack.push_back(U);
}		}
}		}
continue;		continue;
}		}

▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll

		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --check-globals
; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds < %s | FileCheck %s		; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds < %s | FileCheck %s
; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds < %s | FileCheck %s		; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds < %s | FileCheck %s

; CHECK: %llvm.amdgcn.kernel.k2.lds.t = type { i32 }
; CHECK-NOT: %llvm.amdgcn.kernel.k4.lds.t

@lds.1 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 1		@lds.1 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 1

; Use constant from different kernels		; CHECK: %llvm.amdgcn.kernel.k0.lds.t = type { [2 x i8] }
		; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [2 x i8] }
		; CHECK: %llvm.amdgcn.kernel.k2.lds.t = type { i32 }
		; CHECK: %llvm.amdgcn.kernel.k3.lds.t = type { [32 x i8] }
		; CHECK: %llvm.amdgcn.kernel.k4.lds.t = type { [2 x i8] }
;.		;.
; CHECK: @lds.1 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 1		; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 1
		; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 1
; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 4		; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 4
		; CHECK: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 1
		; CHECK: @llvm.amdgcn.kernel.k4.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k4.lds.t undef, align 1
;.		;.

		; Use constant from different kernels
define amdgpu_kernel void @k0(i64 %x) {		define amdgpu_kernel void @k0(i64 %x) {
; CHECK-LABEL: @k0(		; CHECK-LABEL: @k0(
; CHECK-NEXT: %ptr = getelementptr inbounds i8, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(3)* @lds.1, i32 0, i32 0) to i8*), i64 %x		; CHECK-NEXT: %1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, %llvm.amdgcn.kernel.k0.lds.t addrspace(3)* @llvm.amdgcn.kernel.k0.lds, i32 0, i32 0), i32 0, i32 0
		; CHECK-NEXT: %2 = addrspacecast i8 addrspace(3)* %1 to i8*
		; CHECK-NEXT: %ptr = getelementptr inbounds i8, i8* %2, i64 %x
; CHECK-NEXT: store i8 1, i8* %ptr, align 1		; CHECK-NEXT: store i8 1, i8* %ptr, align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%ptr = getelementptr inbounds i8, i8* addrspacecast ([2 x i8] addrspace(3)* @lds.1 to i8*), i64 %x		%ptr = getelementptr inbounds i8, i8* addrspacecast ([2 x i8] addrspace(3)* @lds.1 to i8*), i64 %x
store i8 1, i8 addrspace(0)* %ptr, align 1		store i8 1, i8 addrspace(0)* %ptr, align 1
ret void		ret void
}		}

define amdgpu_kernel void @k1(i64 %x) {		define amdgpu_kernel void @k1(i64 %x) {
; CHECK-LABEL: @k1(		; CHECK-LABEL: @k1(
; CHECK-NEXT: %ptr = getelementptr inbounds i8, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(3)* @lds.1, i32 0, i32 0) to i8*), i64 %x		; CHECK-NEXT: %1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, %llvm.amdgcn.kernel.k1.lds.t addrspace(3)* @llvm.amdgcn.kernel.k1.lds, i32 0, i32 0), i32 0, i32 0
		; CHECK-NEXT: %2 = addrspacecast i8 addrspace(3)* %1 to i8*
		; CHECK-NEXT: %ptr = getelementptr inbounds i8, i8* %2, i64 %x
; CHECK-NEXT: store i8 1, i8* %ptr, align 1		; CHECK-NEXT: store i8 1, i8* %ptr, align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%ptr = getelementptr inbounds i8, i8* addrspacecast ([2 x i8] addrspace(3)* @lds.1 to i8*), i64 %x		%ptr = getelementptr inbounds i8, i8* addrspacecast ([2 x i8] addrspace(3)* @lds.1 to i8*), i64 %x
store i8 1, i8 addrspace(0)* %ptr, align 1		store i8 1, i8 addrspace(0)* %ptr, align 1
ret void		ret void
}		}

Show All 15 Lines	;
ret void		ret void
}		}

@lds.3 = internal unnamed_addr addrspace(3) global [32 x i8] undef, align 1		@lds.3 = internal unnamed_addr addrspace(3) global [32 x i8] undef, align 1

; Use constant twice from the same kernel but a different other constant.		; Use constant twice from the same kernel but a different other constant.
define amdgpu_kernel void @k3(i64 %x) {		define amdgpu_kernel void @k3(i64 %x) {
; CHECK-LABEL: @k3(		; CHECK-LABEL: @k3(
; CHECK-NEXT: %ptr1 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0, i32 16) to i64 addrspace(3)*) to i64*		; CHECK-NEXT: %1 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0), i32 0, i32 16
		; CHECK-NEXT: %2 = bitcast i8 addrspace(3)* %1 to i64 addrspace(3)*
		; CHECK-NEXT: %ptr1 = addrspacecast i64 addrspace(3)* %2 to i64*
; CHECK-NEXT: store i64 1, i64* %ptr1, align 1		; CHECK-NEXT: store i64 1, i64* %ptr1, align 1
; CHECK-NEXT: %ptr2 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0, i32 24) to i64 addrspace(3)*) to i64*		; CHECK-NEXT: %3 = getelementptr inbounds [32 x i8], [32 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k3.lds.t, %llvm.amdgcn.kernel.k3.lds.t addrspace(3)* @llvm.amdgcn.kernel.k3.lds, i32 0, i32 0), i32 0, i32 24
		; CHECK-NEXT: %4 = bitcast i8 addrspace(3)* %3 to i64 addrspace(3)*
		; CHECK-NEXT: %ptr2 = addrspacecast i64 addrspace(3)* %4 to i64*
; CHECK-NEXT: store i64 2, i64* %ptr2, align 1		; CHECK-NEXT: store i64 2, i64* %ptr2, align 1
		; CHECK-NEXT: ret void
;		;
%ptr1 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 16) to i64 addrspace(3)*) to i64*		%ptr1 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 16) to i64 addrspace(3)*) to i64*
store i64 1, i64* %ptr1, align 1		store i64 1, i64* %ptr1, align 1
%ptr2 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 24) to i64 addrspace(3)*) to i64*		%ptr2 = addrspacecast i64 addrspace(3)* bitcast (i8 addrspace(3)* getelementptr inbounds ([32 x i8], [32 x i8] addrspace(3)* @lds.3, i32 0, i32 24) to i64 addrspace(3)*) to i64*
store i64 2, i64* %ptr2, align 1		store i64 2, i64* %ptr2, align 1
ret void		ret void
}		}

; @lds.1 is used from constant expressions in different kernels.		; @lds.1 is used from constant expressions in different kernels.
; Make sure we do not create a structure for it as we cannot handle it yet.
define amdgpu_kernel void @k4(i64 %x) {		define amdgpu_kernel void @k4(i64 %x) {
; CHECK-LABEL: @k4(		; CHECK-LABEL: @k4(
; CHECK-NEXT: %ptr = getelementptr inbounds i8, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([2 x i8], [2 x i8] addrspace(3)* @lds.1, i32 0, i32 0) to i8*), i64 %x		; CHECK-NEXT: %1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k4.lds.t, %llvm.amdgcn.kernel.k4.lds.t addrspace(3)* @llvm.amdgcn.kernel.k4.lds, i32 0, i32 0), i32 0, i32 0
		; CHECK-NEXT: %2 = addrspacecast i8 addrspace(3)* %1 to i8*
		; CHECK-NEXT: %ptr = getelementptr inbounds i8, i8* %2, i64 %x
; CHECK-NEXT: store i8 1, i8* %ptr, align 1		; CHECK-NEXT: store i8 1, i8* %ptr, align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%ptr = getelementptr inbounds i8, i8* addrspacecast ([2 x i8] addrspace(3)* @lds.1 to i8*), i64 %x		%ptr = getelementptr inbounds i8, i8* addrspacecast ([2 x i8] addrspace(3)* @lds.1 to i8*), i64 %x
store i8 1, i8 addrspace(0)* %ptr, align 1		store i8 1, i8 addrspace(0)* %ptr, align 1
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll

	Show All 31 Lines
	define void @set_func(i32 %x) local_unnamed_addr #1 {			define void @set_func(i32 %x) local_unnamed_addr #1 {
	entry:			entry:
	store i32 %x, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4			store i32 %x, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: @timestwo()			; CHECK-LABEL: @timestwo()
	; CHECK: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]			; CHECK: call void @llvm.donothing() [ "ExplicitUse"(%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds) ]
	; CHECK: %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (%llvm.amdgcn.kernel.timestwo.lds.t addrspace(3)* @llvm.amdgcn.kernel.timestwo.lds to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4			; CHECK: %1 = bitcast float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.timestwo.lds.t, %llvm.amdgcn.kernel.timestwo.lds.t addrspace(3)* @llvm.amdgcn.kernel.timestwo.lds, i32 0, i32 0) to i32 addrspace(3)*
				; CHECK: %2 = addrspacecast i32 addrspace(3)* %1 to i32*
				; CHECK: %3 = ptrtoint i32* %2 to i64
				; CHECK: %4 = add i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds to i32 addrspace(3)*) to i32*) to i64), %3
				; CHECK: %5 = inttoptr i64 %4 to i32*
				; CHECK: %ld = load i32, i32* %5, align 4
				foad: Why do %4, %5, %6 need to be Instructions? Couldn't they be left as ConstantExprs?
				rampitec: It converts the whole constant expression; this is how D103661 works (and you have already commented there). I am not sure that is important, though. Do you see any benefit in having long constant expressions rather than instructions? Admittedly the testcase is quite degenerate too, more a torture test than a real-life use.
				rampitec: Actually, these have to be instructions. Reading this huge expression: it uses two LDS globals, and both constant expressions that use them are the innermost ones. If we replace an innermost expression with an instruction, we then have to replace all of its ConstantExpr users with instructions too, since a ConstantExpr cannot use an instruction.
				rampitec: Moreover, in our case we always replace an innermost expression, as it all ends up at a GlobalVariable. D103661 could probably stop descending into the operands once it meets the CE we already asked it to replace, but that is never our scenario. In this example we would always ask to replace the users of @kern and @both, i.e. the expression passed to the helper is 'i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*)' for the first occurrence. That triggers replacement of the whole operand expression. However, the helper function could be improved to stop earlier if somebody called it with a bigger expression, say 'i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*)'. In that case convertConstantExprsToInstructions() could stop after producing the addrspacecast instruction and not convert the inner bitcast.
				rampitec: > Why do %4, %5, %6 need to be Instructions? Couldn't they be left as ConstantExprs? With the change in D103661 we no longer convert @both, which I believe is the right thing: it stays in module LDS. Only the tree leading to @kern is converted.
	; CHECK: %mul = mul i32 %ld, 2			; CHECK: %mul = mul i32 %ld, 2
	; CHECK: store i32 %mul, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (%llvm.amdgcn.kernel.timestwo.lds.t addrspace(3)* @llvm.amdgcn.kernel.timestwo.lds to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4			; CHECK: %6 = bitcast float addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.timestwo.lds.t, %llvm.amdgcn.kernel.timestwo.lds.t addrspace(3)* @llvm.amdgcn.kernel.timestwo.lds, i32 0, i32 0) to i32 addrspace(3)*
				; CHECK: %7 = addrspacecast i32 addrspace(3)* %6 to i32*
				; CHECK: %8 = ptrtoint i32* %7 to i64
				; CHECK: %9 = add i64 %8, ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (%llvm.amdgcn.module.lds.t addrspace(3)* @llvm.amdgcn.module.lds to i32 addrspace(3)*) to i32*) to i64)
				; CHECK: %10 = inttoptr i64 %9 to i32*
				; CHECK: store i32 %mul, i32* %10, align 4
	define amdgpu_kernel void @timestwo() {			define amdgpu_kernel void @timestwo() {
	%ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4			%ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
	%mul = mul i32 %ld, 2			%mul = mul i32 %ld, 2
	store i32 %mul, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4			store i32 %mul, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
	ret void			ret void
	}			}
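
The inline discussion on the @timestwo checks rests on one rule: a ConstantExpr cannot have an instruction as an operand. Once the innermost expression that uses @kern becomes an instruction, every constant expression above it has to become an instruction too, while a sibling subtree that only uses @both can stay constant. The following is a minimal sketch of that rule as a toy model. It is not the code from D103661 and uses no LLVM API: `Expr`, `global`, `unary`, `binary`, `usesGlobal`, `printConst`, `lower` and the simplified textual output (types omitted) are all hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a constant expression tree (hypothetical; this is not
// llvm::ConstantExpr). A node without operands is a leaf naming a global.
struct Expr {
  std::string Op;             // e.g. "bitcast", "add"; empty for leaves
  std::string Name;           // global name; used only for leaves
  std::vector<Expr> Operands;
};

Expr global(std::string Name) { return Expr{"", std::move(Name), {}}; }

Expr unary(std::string Op, Expr A) {
  Expr E{std::move(Op), "", {}};
  E.Operands.push_back(std::move(A));
  return E;
}

Expr binary(std::string Op, Expr A, Expr B) {
  Expr E = unary(std::move(Op), std::move(A));
  E.Operands.push_back(std::move(B));
  return E;
}

// True if the subtree rooted at E references the global named Target.
bool usesGlobal(const Expr &E, const std::string &Target) {
  if (E.Operands.empty())
    return E.Name == Target;
  for (const Expr &Op : E.Operands)
    if (usesGlobal(Op, Target))
      return true;
  return false;
}

// Prints a subtree that is left alone as one nested constant expression.
std::string printConst(const Expr &E) {
  if (E.Operands.empty())
    return "@" + E.Name;
  std::string S = E.Op + " (";
  for (std::size_t I = 0; I < E.Operands.size(); ++I) {
    if (I)
      S += ", ";
    S += printConst(E.Operands[I]);
  }
  return S + ")";
}

// Turns every node on a path to Target into a numbered "instruction"
// appended to Out, and returns the operand text a user of E would see.
// Subtrees that never reach Target are returned as constants, unchanged.
std::string lower(const Expr &E, const std::string &Target,
                  std::vector<std::string> &Out) {
  if (!usesGlobal(E, Target))
    return printConst(E); // untouched subtree stays a constant
  if (E.Operands.empty())
    return "@" + E.Name;  // the target global itself
  std::string Args;
  for (std::size_t I = 0; I < E.Operands.size(); ++I) {
    if (I)
      Args += ", ";
    Args += lower(E.Operands[I], Target, Out);
  }
  // Number the result only after the operands have been emitted.
  std::string Reg = "%" + std::to_string(Out.size() + 1);
  Out.push_back(Reg + " = " + E.Op + " " + Args);
  return Reg;
}
```

Lowering the address expression of the load in @timestwo with "kern" as the target yields five instructions in this model, the same count as %1 to %5 in the updated check lines, and the @both operand of the add remains a nested constant, as in the `%4 = add i64 ptrtoint (...), %3` check.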