This is an archive of the discontinued LLVM Phabricator instance.

Differential D140560

AMDGPU: Fix broken opaque pointer handling in printf pass
Closed, Public

Authored by arsenm on Dec 22 2022, 8:59 AM.

Details

Reviewers

sameerds
yaxunl
vikramRH
rampitec
ronlieb

Group Reviewers

Restricted Project

Summary

This was directly considering the pointee type, and also applying
special semantics to constant address space.

@vikramRH since you're looking at this in D138702, can you please do something
about the test coverage before touching this pass? There's no coverage of any of the
format string handling and I see obvious bugs in every part of it. The same initializer bugs I fixed
for the format string in D140558 are repeated for print of string later.
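
As a rough illustration of the change described above, the sketch below models the old and new string check on a toy type. `ToyType`, `shouldPrintAsStrOld` and `shouldPrintAsStrNew` are hypothetical stand-ins, not the LLVM API. The old check's pointee-type test is left out because an opaque pointer carries no pointee type to inspect.

```cpp
// Hypothetical stand-in for llvm::Type; only the two properties that matter
// for this check are modeled.
struct ToyType {
  bool IsPointer;
  unsigned AddressSpace; // meaningful only when IsPointer is true
};

// AMDGPU's constant address space is number 4.
constexpr unsigned ConstantAddressSpace = 4;

// Old behavior (pointee-type inspection omitted): only pointers into the
// constant address space were treated as strings.
constexpr bool shouldPrintAsStrOld(char Specifier, ToyType T) {
  return Specifier == 's' && T.IsPointer &&
         T.AddressSpace == ConstantAddressSpace;
}

// New behavior in this revision: any pointer passed to %s is a string.
constexpr bool shouldPrintAsStrNew(char Specifier, ToyType T) {
  return Specifier == 's' && T.IsPointer;
}
```

Under this model a `%s` argument in address space 1 or 5 is rejected by the old predicate and accepted by the new one, which is the situation the added `string_address_space1` and `string_pointee_type` tests exercise.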

Diff Detail

Event Timeline

arsenm created this revision. Dec 22 2022, 8:59 AM

Herald added a project: Restricted Project. Dec 22 2022, 8:59 AM

Herald added subscribers: kosarev, foad, kerbowa and 6 others.

arsenm requested review of this revision. Dec 22 2022, 8:59 AM

Herald added a subscriber: wdng.

arsenm added a parent revision: D140558: AMDGPU: Fix broken and permissive handling of printf format strings. Dec 22 2022, 8:59 AM

Harbormaster completed remote builds in B204597: Diff 484856. Dec 22 2022, 9:00 AM

arsenm added inline comments. Dec 22 2022, 9:06 AM

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
333	Should use GLOBAL_ADDRESS, this is repeated several other times in this function
384–385	Should be using IRBuilder, this is unconditionally creating no-op bitcasts with opaque pointers
387	Should create stores with explicit alignments
420	None of these paths are tested and I think this is broken for undef/poison. Also need some half handling
426–435	This is all wrong for the same reasons as the format string
446	No raw news. Use SmallString
467	Don't see why we need to use ptrtoint, store of pointer works just as well
472	Should at least not die on scalable vector IR
473	This is broken for vectors of pointers
520	Should use getTypeStoreSize
525–526	Another unneeded bitcast with opaque pointers
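
The "No raw news" comment above concerns a scratch buffer that holds the string plus its terminator, rounded up to a multiple of 4 bytes. A minimal standalone sketch of the same padding with an owning buffer follows. It uses `std::string` as a stand-in for `llvm::SmallString`, and `alignUp` and `padToDword` are hypothetical helper names, not functions from the pass.

```cpp
#include <cstddef>
#include <string>

// Round Size up to the next multiple of Align (Align must be non-zero).
constexpr std::size_t alignUp(std::size_t Size, std::size_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// Copy S plus its NUL terminator into an owning buffer, zero-padded to a
// multiple of 4 bytes. No new[]/delete[] pair is needed; the buffer frees
// itself when it goes out of scope.
inline std::string padToDword(const std::string &S) {
  std::string Buf = S;
  Buf.resize(alignUp(S.size() + 1, 4), '\0');
  return Buf;
}
```

With this helper a 3-character string occupies 4 bytes and a 4-character string occupies 8, since the terminator pushes it into the next dword.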

arsenm added inline comments. Dec 22 2022, 9:07 AM

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
326–327	Can add a lot of other attributes

arsenm added inline comments. Dec 22 2022, 9:26 AM

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
426	I don't see why this can't just emit a memcpy

ping

arsenm added a reviewer: ronlieb. Jan 3 2023, 10:34 AM

This pass is used only by OpenCL on AMDGPU. It's meant to be deprecated with @vikramRH working on a replacement that unifies HIP and OpenCL.

@arsenm, The change looks straightforward enough, but is it known to pass on internal OpenCL builds?

LGTM, assuming it does not break internal OpenCL builds.

This revision is now accepted and ready to land. Jan 4 2023, 10:12 PM

In D140560#4027791, @sameerds wrote:

LGTM, assuming it does not break internal OpenCL builds.

It passes. A hacky workaround for this slipped in one of the merge commits into the internal branches

7c327c2fbb8af6ff6eeb93cb976212068044a9e2

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

In D140560#4030857, @saghir wrote:

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

I have a patch ready for this, need to write some more tests for it

In D140560#4031989, @arsenm wrote:

I have a patch ready for this, need to write some more tests for it

Try 40078a6b713730ffc164d4c0733d26352eb1e236

In D140560#4032567, @arsenm wrote:

Try 40078a6b713730ffc164d4c0733d26352eb1e236

Thanks for taking a look. Your patch seems to have fixed the error seen before but it fails at a later check now:

/llvm/test/CodeGen/AMDGPU/opencl-printf.ll:1874:13: error: GCN-NEXT: expected string not found in input
; GCN-NEXT: store i32 1953722977, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4

Logs can be found here: https://lab.llvm.org/buildbot/#/builders/231/builds/7035/steps/6/logs/FAIL__LLVM__opencl-printf_ll

Jake-Egan added a subscriber: Jake-Egan. Jan 9 2023, 6:21 AM

In D140560#4036217, @saghir wrote:

Thanks for taking a look. Your patch seems to have fixed the error seen before but it fails at a later check now:
/llvm/test/CodeGen/AMDGPU/opencl-printf.ll:1874:13: error: GCN-NEXT: expected string not found in input
; GCN-NEXT: store i32 1953722977, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
Logs can be found here: https://lab.llvm.org/buildbot/#/builders/231/builds/7035/steps/6/logs/FAIL__LLVM__opencl-printf_ll

Since this is still causing failures (i.e. DataExtractor does not reorder bytes for getBytes() as it does for getU...()), we can fix the problem as follows:

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
index 18dbf5d..63a365e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
@@ -435,19 +435,31 @@ bool AMDGPUPrintfRuntimeBindingImpl::lowerPrintfForGpu(Module &M) {
             DataExtractor Extractor(S, /*IsLittleEndian=*/true, 8);
             DataExtractor::Cursor Offset(0);
             while (Offset && Offset.tell() < S.size()) {
-              StringRef ReadBytes = Extractor.getBytes(
-                  Offset, std::min(ReadSize, S.size() - Offset.tell()));
+              uint64_t ReadNow = std::min(ReadSize, S.size() - Offset.tell());
+              uint64_t ReadBytes = 0;
+              switch (ReadNow) {
+              default: llvm_unreachable("min(4, X) > 4?");
+              case 1:
+                ReadBytes = Extractor.getU8(Offset);
+                break;
+              case 2:
+                ReadBytes = Extractor.getU16(Offset);
+                break;
+              case 3:
+                ReadBytes = Extractor.getU24(Offset);
+                break;
+              case 4:
+                ReadBytes = Extractor.getU32(Offset);
+                break;
+              }
 
               cantFail(Offset.takeError(),
                        "failed to read bytes from constant array");
 
-              APInt IntVal(8 * ReadBytes.size(), 0);
-              LoadIntFromMemory(
-                  IntVal, reinterpret_cast<const uint8_t *>(ReadBytes.data()),
-                  ReadBytes.size());
+              APInt IntVal(8 * ReadSize, ReadBytes);
 
               // TODO: Should not bothering aligning up.
-              if (ReadBytes.size() < ReadSize)
+              if (ReadNow < ReadSize)
                 IntVal = IntVal.zext(8 * ReadSize);
 
               Type *IntTy = Type::getIntNTy(Ctx, IntVal.getBitWidth());

@arsenm Would a fix similar to this be acceptable?
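
The byte-order point in the comment above can be shown outside LLVM. The sketch below is a standalone illustration, not the code from the patch: `packLittleEndianWords` is a hypothetical helper that builds each 32-bit word with shifts, so the first byte always lands in the least significant position regardless of host byte order. Copying the raw bytes into an integer with a host-order load would give a different constant on a big-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack Bytes into 32-bit words, treating the first byte of each group of four
// as the least significant. A short final group is zero-extended.
inline std::vector<std::uint32_t>
packLittleEndianWords(const std::string &Bytes) {
  std::vector<std::uint32_t> Words;
  for (std::size_t I = 0; I < Bytes.size(); I += 4) {
    std::uint32_t Word = 0;
    for (std::size_t J = 0; J < 4 && I + J < Bytes.size(); ++J) {
      // Go through unsigned char so bytes >= 0x80 are not sign-extended.
      std::uint32_t Byte = static_cast<unsigned char>(Bytes[I + J]);
      Word |= Byte << (8 * J);
    }
    Words.push_back(Word);
  }
  return Words;
}
```

Packing the four bytes `arst` this way yields 1953722977, the constant in the failing check quoted above, and the three bytes `???` yield 4144959, the constant that appears in the updated test expectations.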

In D140560#4040447, @nemanjai wrote:

@arsenm Would a fix similar to this be acceptable?

Yes, I was hoping to extend this to up to 16-bytes at a time but I guess I can figure that out then (we should probably just be emitting a memcpy/strcpy anyway)

In D140560#4040448, @arsenm wrote:

@arsenm Would a fix similar to this be acceptable?

Yes, I was hoping to extend this to up to 16-bytes at a time but I guess I can figure that out then (we should probably just be emitting a memcpy/strcpy anyway)

Did you want me to go ahead with this fix now to unblock the bots and then you guys can finalize the fixes with memcpy/strcpy at a later date?

In D140560#4040714, @nemanjai wrote:

In D140560#4040448, @arsenm wrote:

@arsenm Would a fix similar to this be acceptable?

Yes, I was hoping to extend this to up to 16-bytes at a time but I guess I can figure that out then (we should probably just be emitting a memcpy/strcpy anyway)

Did you want me to go ahead with this fix now to unblock the bots and then you guys can finalize the fixes with memcpy/strcpy at a later date?

Yes, thanks

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp	11 lines
llvm/test/CodeGen/AMDGPU/opencl-printf.ll	136 lines

Diff 484856

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp

[128 lines hidden] if (ArgDump)
       OpConvSpecifiers.push_back(Fmt[CurFmtSpecifierIdx]);

     PrevFmtSpecifierIdx = ++CurFmtSpecifierIdx;
   }
 }

 bool AMDGPUPrintfRuntimeBindingImpl::shouldPrintAsStr(char Specifier,
                                                       Type *OpType) const {
-  if (Specifier != 's')
-    return false;
-  const PointerType *PT = dyn_cast<PointerType>(OpType);
-  if (!PT || PT->getAddressSpace() != AMDGPUAS::CONSTANT_ADDRESS)
-    return false;
-  Type *ElemType = PT->getContainedType(0);
-  if (ElemType->getTypeID() != Type::IntegerTyID)
-    return false;
-  IntegerType *ElemIType = cast<IntegerType>(ElemType);
-  return ElemIType->getBitWidth() == 8;
+  return Specifier == 's' && isa<PointerType>(OpType);
 }

 static void diagnoseInvalidFormatString(const CallBase *CI) {
   DiagnosticInfoUnsupported UnsupportedFormatStr(
       *CI->getParent()->getParent(),
       "printf format string must be a trivially resolved constant string "
       "global variable",
       CI->getDebugLoc());
[172 lines hidden] for (char C : Str) {
         break;
       }
     }

     // Insert the printf_alloc call
     Builder.SetInsertPoint(CI);
     Builder.SetCurrentDebugLocation(CI->getDebugLoc());

     AttributeList Attr = AttributeList::get(Ctx, AttributeList::FunctionIndex,
                                             Attribute::NoUnwind);
[arsenm, inline] Can add a lot of other attributes

     Type *SizetTy = Type::getInt32Ty(Ctx);

     Type *Tys_alloc[1] = {SizetTy};
     Type *I8Ty = Type::getInt8Ty(Ctx);
     Type *I8Ptr = PointerType::get(I8Ty, 1);
[arsenm, inline] Should use GLOBAL_ADDRESS, this is repeated several other times in this function
     FunctionType *FTy_alloc = FunctionType::get(I8Ptr, Tys_alloc, false);
     FunctionCallee PrintfAllocFn =
         M.getOrInsertFunction(StringRef("__printf_alloc"), FTy_alloc, Attr);

     LLVM_DEBUG(dbgs() << "Printf metadata = " << Sizes.str() << '\n');
     std::string fmtstr = itostr(++UniqID) + ":" + Sizes.str();
     MDString *fmtStrArray = MDString::get(Ctx, fmtstr);

[34 lines hidden] for (auto *CI : Printfs) {
     Builder.SetInsertPoint(Brnch);

     // store unique printf id in the buffer
     //
     GetElementPtrInst *BufferIdx = GetElementPtrInst::Create(
         I8Ty, pcall, ConstantInt::get(Ctx, APInt(32, 0)), "PrintBuffID", Brnch);

     Type *idPointer = PointerType::get(I32Ty, AMDGPUAS::GLOBAL_ADDRESS);
     Value *id_gep_cast =
         new BitCastInst(BufferIdx, idPointer, "PrintBuffIdCast", Brnch);
[arsenm, inline] Should be using IRBuilder, this is unconditionally creating no-op bitcasts with opaque pointers

     new StoreInst(ConstantInt::get(I32Ty, UniqID), id_gep_cast, Brnch);
[arsenm, inline] Should create stores with explicit alignments

     // 1st 4 bytes hold the printf_id
     // the following GEP is the buffer pointer
     BufferIdx = GetElementPtrInst::Create(I8Ty, pcall,
                                           ConstantInt::get(Ctx, APInt(32, 4)),
                                           "PrintBuffGep", Brnch);

     Type *Int32Ty = Type::getInt32Ty(Ctx);
[16 lines hidden] for (unsigned ArgCount = 1;
             IType = Int32Ty;
           } else if (auto *FpExt = dyn_cast<FPExtInst>(Arg)) {
             if (FpExt->getType()->isDoubleTy() &&
                 FpExt->getOperand(0)->getType()->isFloatTy()) {
               Arg = FpExt->getOperand(0);
               IType = Int32Ty;
             }
           }
         }
[arsenm, inline] None of these paths are tested and I think this is broken for undef/poison. Also need some half handling
         Arg = new BitCastInst(Arg, IType, "PrintArgFP", Brnch);
         WhatToStore.push_back(Arg);
       } else if (ArgType->getTypeID() == Type::PointerTyID) {
         if (shouldPrintAsStr(OpConvSpecifiers[ArgCount - 1], ArgType)) {
           const char *S = NonLiteralStr;
           if (auto *ConstExpr = dyn_cast<ConstantExpr>(Arg)) {
[arsenm, inline] I don't see why this can't just emit a memcpy
             auto *GV = dyn_cast<GlobalVariable>(ConstExpr->getOperand(0));
             if (GV && GV->hasInitializer()) {
               Constant *Init = GV->getInitializer();
               bool IsZeroValue = Init->isZeroValue();
               auto *CA = dyn_cast<ConstantDataArray>(Init);
               if (IsZeroValue || (CA && CA->isString())) {
                 S = IsZeroValue ? "" : CA->getAsCString().data();
               }
             }
[arsenm, inline] This is all wrong for the same reasons as the format string
           }
           size_t SizeStr = strlen(S) + 1;
           size_t Rem = SizeStr % DWORD_ALIGN;
           size_t NSizeStr = 0;
           if (Rem) {
             NSizeStr = SizeStr + (DWORD_ALIGN - Rem);
           } else {
             NSizeStr = SizeStr;
           }
           if (S[0]) {
             char *MyNewStr = new char[NSizeStr]();
[arsenm, inline] No raw news. Use SmallString
             strcpy(MyNewStr, S);
             int NumInts = NSizeStr / 4;
             int CharC = 0;
             while (NumInts) {
               int ANum = *(int *)(MyNewStr + CharC);
               CharC += 4;
               NumInts--;
               Value *ANumV = ConstantInt::get(Int32Ty, ANum, false);
               WhatToStore.push_back(ANumV);
             }
             delete[] MyNewStr;
           } else {
             // Empty string, give a hint to RT it is no NULL
             Value *ANumV = ConstantInt::get(Int32Ty, 0xFFFFFF00, false);
             WhatToStore.push_back(ANumV);
           }
         } else {
           uint64_t Size = TD->getTypeAllocSizeInBits(ArgType);
           assert((Size == 32 || Size == 64) && "unsupported size");
           Type *DstType = (Size == 32) ? Int32Ty : Int64Ty;
           Arg = new PtrToIntInst(Arg, DstType, "PrintArgPtr", Brnch);
[arsenm, inline] Don't see why we need to use ptrtoint, store of pointer works just as well
           WhatToStore.push_back(Arg);
         }
       } else if (isa<FixedVectorType>(ArgType)) {
         Type *IType = nullptr;
         uint32_t EleCount = cast<FixedVectorType>(ArgType)->getNumElements();
[arsenm, inline] Should at least not die on scalable vector IR
         uint32_t EleSize = ArgType->getScalarSizeInBits();
[arsenm, inline] This is broken for vectors of pointers
         uint32_t TotalSize = EleCount * EleSize;
         if (EleCount == 3) {
           ShuffleVectorInst *Shuffle =
               new ShuffleVectorInst(Arg, Arg, ArrayRef<int>{0, 1, 2, 2});
           Shuffle->insertBefore(Brnch);
           Arg = Shuffle;
           ArgType = Arg->getType();
           TotalSize += EleSize;
[30 lines hidden] for (unsigned ArgCount = 1;
         }
         Arg = new BitCastInst(Arg, IType, "PrintArgVect", Brnch);
         WhatToStore.push_back(Arg);
       } else {
         WhatToStore.push_back(Arg);
       }
       for (unsigned I = 0, E = WhatToStore.size(); I != E; ++I) {
         Value *TheBtCast = WhatToStore[I];
         unsigned ArgSize = TD->getTypeAllocSizeInBits(TheBtCast->getType()) / 8;
[arsenm, inline] Should use getTypeStoreSize
         SmallVector<Value *, 1> BuffOffset;
         BuffOffset.push_back(ConstantInt::get(I32Ty, ArgSize));

         Type *ArgPointer = PointerType::get(TheBtCast->getType(), 1);
         Value *CastedGEP =
             new BitCastInst(BufferIdx, ArgPointer, "PrintBuffPtrCast", Brnch);
[arsenm, inline] Another unneeded bitcast with opaque pointers
         StoreInst *StBuff = new StoreInst(TheBtCast, CastedGEP, Brnch);
         LLVM_DEBUG(dbgs() << "inserting store to printf buffer:\n"
                           << *StBuff << '\n');
         (void)StBuff;
         if (I + 1 == E && ArgCount + 1 == CI->arg_size())
           break;
         BufferIdx = GetElementPtrInst::Create(I8Ty, BufferIdx, BuffOffset,
                                               "PrintBuffNextPtr", Brnch);
[62 lines hidden]
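
The buffer layout the lowering code above writes (a 4-byte printf id followed by each stored value) also explains the size passed to `__printf_alloc`. The sketch below is only a worked version of that arithmetic. `printfBufferSize` is a hypothetical helper, and it assumes the argument sizes simply add up with no extra padding, which matches the two cases visible in this revision's test.

```cpp
#include <cstdint>
#include <initializer_list>

// Bytes requested from __printf_alloc in this sketch: a 4-byte printf id
// followed by the size in bytes of every value stored after it.
inline std::uint32_t
printfBufferSize(std::initializer_list<std::uint32_t> ArgSizes) {
  std::uint32_t Total = 4; // the first 4 bytes hold the printf id
  for (std::uint32_t Size : ArgSizes)
    Total += Size;
  return Total;
}
```

For `test_kernel`, the old lowering stored the `%s` pointer as an `i64` (8 bytes) plus the `i32` argument, giving 4 + 8 + 4 = 16. The new lowering stores one dword of string data plus the `i32`, giving 4 + 4 + 4 = 12, in line with the change from `@__printf_alloc(i32 16)` to `@__printf_alloc(i32 12)`.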

llvm/test/CodeGen/AMDGPU/opencl-printf.ll

[9 lines hidden]
 ; R600-NEXT: entry:
 ; R600-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)
 ; R600-NEXT: [[CALL1:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(5) [[STR]], i32 [[N:%.*]])
 ; R600-NEXT: ret void
 ;
 ; GCN-LABEL: @test_kernel(
 ; GCN-NEXT: entry:
 ; GCN-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)
-; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 16)
+; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
 ; GCN-NEXT: br label [[ENTRY_SPLIT:%.*]]
 ; GCN: entry.split:
 ; GCN-NEXT: [[TMP0:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
 ; GCN-NEXT: br i1 [[TMP0]], label [[TMP1:%.*]], label [[TMP2:%.*]]
 ; GCN: 1:
 ; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0
 ; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)
 ; GCN-NEXT: store i32 1, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4
 ; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
-; GCN-NEXT: [[PRINTARGPTR:%.*]] = ptrtoint ptr addrspace(5) [[STR]] to i64
 ; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)
-; GCN-NEXT: store i64 [[PRINTARGPTR]], ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
+; GCN-NEXT: store i32 4144959, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
-; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 8
+; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 4
 ; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)
 ; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4
 ; GCN-NEXT: br label [[TMP2]]
 ; GCN: 2:
 ; GCN-NEXT: ret void
 ;
 entry:
   %str = alloca [9 x i8], align 1, addrspace(5)
   %call1 = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(5) %str, i32 %n)
   ret void
 }

				define amdgpu_kernel void @string_pointee_type(i32 %n) {
				; R600-LABEL: @string_pointee_type(
				; R600-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)
				; R600-NEXT: [[CALL1:%.]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(5) [[STR]], i32 [[N:%.]])
				; R600-NEXT: ret void
				;
				; GCN-LABEL: @string_pointee_type(
				; GCN-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)
				; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
				; GCN-NEXT: br label [[DOTSPLIT:%.*]]
				; GCN: .split:
				; GCN-NEXT: [[TMP1:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
				; GCN-NEXT: br i1 [[TMP1]], label [[TMP2:%.]], label [[TMP3:%.]]
				; GCN: 2:
				; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0
				; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)
				; GCN-NEXT: store i32 2, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)
				; GCN-NEXT: store i32 4144959, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)
				; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4
				; GCN-NEXT: br label [[TMP3]]
				; GCN: 3:
				; GCN-NEXT: ret void
				;
				%str = alloca [9 x i8], align 1, addrspace(5)
				%call1 = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(5) %str, i32 %n)
				ret void
				}

				define amdgpu_kernel void @string_address_space4(i32 %n, ptr addrspace(4) %str) {
				; R600-LABEL: @string_address_space4(
				; R600-NEXT: [[CALL1:%.]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(4) [[STR:%.]], i32 [[N:%.*]])
				; R600-NEXT: ret void
				;
				; GCN-LABEL: @string_address_space4(
				; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
				; GCN-NEXT: br label [[DOTSPLIT:%.*]]
				; GCN: .split:
				; GCN-NEXT: [[TMP1:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
				; GCN-NEXT: br i1 [[TMP1]], label [[TMP2:%.]], label [[TMP3:%.]]
				; GCN: 2:
				; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0
				; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)
				; GCN-NEXT: store i32 3, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)
				; GCN-NEXT: store i32 4144959, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)
				; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4
				; GCN-NEXT: br label [[TMP3]]
				; GCN: 3:
				; GCN-NEXT: ret void
				;
				%call1 = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(4) %str, i32 %n)
				ret void
				}

				define amdgpu_kernel void @string_address_space1(i32 %n, ptr addrspace(1) %str) {
				; R600-LABEL: @string_address_space1(
				; R600-NEXT: [[CALL1:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(1) [[STR:%.*]], i32 [[N:%.*]])
				; R600-NEXT: ret void
				;
				; GCN-LABEL: @string_address_space1(
				; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
				; GCN-NEXT: br label [[DOTSPLIT:%.*]]
				; GCN: .split:
				; GCN-NEXT: [[TMP1:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
				; GCN-NEXT: br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP3:%.*]]
				; GCN: 2:
				; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0
				; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)
				; GCN-NEXT: store i32 4, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)
				; GCN-NEXT: store i32 4144959, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)
				; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4
				; GCN-NEXT: br label [[TMP3]]
				; GCN: 3:
				; GCN-NEXT: ret void
				;
				%call1 = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, ptr addrspace(1) %str, i32 %n)
				ret void
				}

				define amdgpu_kernel void @string_format_passed_i32(i32 %n, i32 %str) {
				; R600-LABEL: @string_format_passed_i32(
				; R600-NEXT: [[CALL1:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, i32 [[STR:%.*]], i32 [[N:%.*]])
				; R600-NEXT: ret void
				;
				; GCN-LABEL: @string_format_passed_i32(
				; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
				; GCN-NEXT: br label [[DOTSPLIT:%.*]]
				; GCN: .split:
				; GCN-NEXT: [[TMP1:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
				; GCN-NEXT: br i1 [[TMP1]], label [[TMP2:%.*]], label [[TMP3:%.*]]
				; GCN: 2:
				; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0
				; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)
				; GCN-NEXT: store i32 5, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)
				; GCN-NEXT: store i32 [[STR:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
				; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 4
				; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)
				; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4
				; GCN-NEXT: br label [[TMP3]]
				; GCN: 3:
				; GCN-NEXT: ret void
				;
				%call1 = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) @.str, i32 %str, i32 %n)
				ret void
				}


	@str.as1 = private unnamed_addr addrspace(1) constant [6 x i8] c"%s:%d\00", align 1			@str.as1 = private unnamed_addr addrspace(1) constant [6 x i8] c"%s:%d\00", align 1

	define amdgpu_kernel void @test_kernel_addrspacecasted_format_str(i32 %n) {			define amdgpu_kernel void @test_kernel_addrspacecasted_format_str(i32 %n) {
	; R600-LABEL: @test_kernel_addrspacecasted_format_str(			; R600-LABEL: @test_kernel_addrspacecasted_format_str(
	; R600-NEXT: entry:			; R600-NEXT: entry:
	; R600-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)			; R600-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)
	; R600-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [9 x i8], ptr addrspace(5) [[STR]], i32 0, i32 0			; R600-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [9 x i8], ptr addrspace(5) [[STR]], i32 0, i32 0
	; R600-NEXT: [[CALL1:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) addrspacecast (ptr addrspace(1) @str.as1 to ptr addrspace(4)), ptr addrspace(5) [[ARRAYDECAY]], i32 [[N:%.*]])			; R600-NEXT: [[CALL1:%.*]] = call i32 (ptr addrspace(4), ...) @printf(ptr addrspace(4) addrspacecast (ptr addrspace(1) @str.as1 to ptr addrspace(4)), ptr addrspace(5) [[ARRAYDECAY]], i32 [[N:%.*]])
	; R600-NEXT: ret void			; R600-NEXT: ret void
	;			;
	; GCN-LABEL: @test_kernel_addrspacecasted_format_str(			; GCN-LABEL: @test_kernel_addrspacecasted_format_str(
	; GCN-NEXT: entry:			; GCN-NEXT: entry:
	; GCN-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)			; GCN-NEXT: [[STR:%.*]] = alloca [9 x i8], align 1, addrspace(5)
	; GCN-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [9 x i8], ptr addrspace(5) [[STR]], i32 0, i32 0			; GCN-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [9 x i8], ptr addrspace(5) [[STR]], i32 0, i32 0
	; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 16)			; GCN-NEXT: [[PRINTF_ALLOC_FN:%.*]] = call ptr addrspace(1) @__printf_alloc(i32 12)
	; GCN-NEXT: br label [[ENTRY_SPLIT:%.*]]			; GCN-NEXT: br label [[ENTRY_SPLIT:%.*]]
	; GCN: entry.split:			; GCN: entry.split:
	; GCN-NEXT: [[TMP0:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null			; GCN-NEXT: [[TMP0:%.*]] = icmp ne ptr addrspace(1) [[PRINTF_ALLOC_FN]], null
	; GCN-NEXT: br i1 [[TMP0]], label [[TMP1:%.*]], label [[TMP2:%.*]]			; GCN-NEXT: br i1 [[TMP0]], label [[TMP1:%.*]], label [[TMP2:%.*]]
	; GCN: 1:			; GCN: 1:
	; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0			; GCN-NEXT: [[PRINTBUFFID:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 0
	; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)			; GCN-NEXT: [[PRINTBUFFIDCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFID]] to ptr addrspace(1)
	; GCN-NEXT: store i32 2, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4			; GCN-NEXT: store i32 6, ptr addrspace(1) [[PRINTBUFFIDCAST]], align 4
	; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4			; GCN-NEXT: [[PRINTBUFFGEP:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTF_ALLOC_FN]], i32 4
	; GCN-NEXT: [[PRINTARGPTR:%.*]] = ptrtoint ptr addrspace(5) [[ARRAYDECAY]] to i64
	; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)			; GCN-NEXT: [[PRINTBUFFPTRCAST:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFGEP]] to ptr addrspace(1)
	; GCN-NEXT: store i64 [[PRINTARGPTR]], ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4			; GCN-NEXT: store i32 4144959, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4
	; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 8			; GCN-NEXT: [[PRINTBUFFNEXTPTR:%.*]] = getelementptr i8, ptr addrspace(1) [[PRINTBUFFGEP]], i32 4
	; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)			; GCN-NEXT: [[PRINTBUFFPTRCAST1:%.*]] = bitcast ptr addrspace(1) [[PRINTBUFFNEXTPTR]] to ptr addrspace(1)
	; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4			; GCN-NEXT: store i32 [[N:%.*]], ptr addrspace(1) [[PRINTBUFFPTRCAST1]], align 4
	; GCN-NEXT: br label [[TMP2]]			; GCN-NEXT: br label [[TMP2]]
	; GCN: 2:			; GCN: 2:
	; GCN-NEXT: ret void			; GCN-NEXT: ret void
	;			;
	entry:			entry:
	%str = alloca [9 x i8], align 1, addrspace(5)			%str = alloca [9 x i8], align 1, addrspace(5)