This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix broken opaque pointer handling in printf pass
ClosedPublic

Authored by arsenm on Dec 22 2022, 8:59 AM.

Details

Reviewers
sameerds
yaxunl
vikramRH
rampitec
ronlieb
Group Reviewers
Restricted Project
Summary

This was directly considering the pointee type, and also applying
special semantics to constant address space.

@vikramRH since you're looking at this in D138702, can you please do something
about the test coverage before touching this pass? There's no coverage of any of the
format string handling and I see obvious bugs in every part of it. The same initializer bugs I fixed
for the format string in D140558 are repeated for print of string later.

Diff Detail

Event Timeline

arsenm created this revision.Dec 22 2022, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2022, 8:59 AM
arsenm requested review of this revision.Dec 22 2022, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2022, 8:59 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Dec 22 2022, 9:06 AM
llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
333

Should use GLOBAL_ADDRESS, this is repeated several other times in this function

384–385

Should be using IRBuilder, this is unconditionally creating no-op bitcasts with opaque pointers

387

Should create stores with explicit alignments

420

None of these paths are tested and I think this is broken for undef/poison. Also need some half handling

426–435

This is all wrong for the same reasons as the format string

446

No raw news. Use SmallString

467

Don't see why we need to use ptrtoint, store of pointer works just as well

472

Should at least not die on scalable vector IR

473

This is broken for vectors of pointers

520

Should use getTypeStoreSize

525–526

Another unneeded bitcast with opaque pointers

arsenm added inline comments.Dec 22 2022, 9:07 AM
llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
326–327

Can add a lot of other attributes

arsenm added inline comments.Dec 22 2022, 9:26 AM
llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
426

I don't see why this can't just emit a memcpy

This pass is used only by OpenCL on AMDGPU. It's meant to be deprecated with @vikramRH working on a replacement that unifies HIP and OpenCL.

@arsenm, The change looks straightforward enough, but is it known to pass on internal OpenCL builds?

sameerds accepted this revision.Jan 4 2023, 10:12 PM

LGTM, assuming it does not break internal OpenCL builds.

This revision is now accepted and ready to land.Jan 4 2023, 10:12 PM
arsenm added a comment.Jan 5 2023, 4:48 AM

LGTM, assuming it does not break internal OpenCL builds.

It passes. A hacky workaround for this slipped in one of the merge commits into the internal branches

saghir added a subscriber: saghir.Jan 6 2023, 2:24 AM

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

I have a patch ready for this, need to write some more tests for it

arsenm added a comment.Jan 6 2023, 2:24 PM

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

I have a patch ready for this, need to write some more tests for it

Try 40078a6b713730ffc164d4c0733d26352eb1e236

saghir added a comment.Jan 9 2023, 6:17 AM

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

I have a patch ready for this, need to write some more tests for it

Try 40078a6b713730ffc164d4c0733d26352eb1e236

Thanks for taking a look. Your patch seems to have fixed the error seen before but it fails at a later check now:

/llvm/test/CodeGen/AMDGPU/opencl-printf.ll:1874:13: error: GCN-NEXT: expected string not found in input
; GCN-NEXT: store i32 1953722977, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4

Logs can be found here: https://lab.llvm.org/buildbot/#/builders/231/builds/7035/steps/6/logs/FAIL__LLVM__opencl-printf_ll

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

I have a patch ready for this, need to write some more tests for it

Try 40078a6b713730ffc164d4c0733d26352eb1e236

Thanks for taking a look. Your patch seems to have fixed the error seen before but it fails at a later check now:

/llvm/test/CodeGen/AMDGPU/opencl-printf.ll:1874:13: error: GCN-NEXT: expected string not found in input
; GCN-NEXT: store i32 1953722977, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4

Logs can be found here: https://lab.llvm.org/buildbot/#/builders/231/builds/7035/steps/6/logs/FAIL__LLVM__opencl-printf_ll

Since this is still causing failures (i.e. DataExtractor does not reorder bytes for getBytes() as it does for getU...()), we can fix the problem as follows:

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
index 18dbf5d..63a365e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
@@ -435,19 +435,31 @@ bool AMDGPUPrintfRuntimeBindingImpl::lowerPrintfForGpu(Module &M) {
             DataExtractor Extractor(S, /*IsLittleEndian=*/true, 8);
             DataExtractor::Cursor Offset(0);
             while (Offset && Offset.tell() < S.size()) {
-              StringRef ReadBytes = Extractor.getBytes(
-                  Offset, std::min(ReadSize, S.size() - Offset.tell()));
+              uint64_t ReadNow = std::min(ReadSize, S.size() - Offset.tell());
+              uint64_t ReadBytes = 0;
+              switch (ReadNow) {
+              default: llvm_unreachable("min(4, X) > 4?");
+              case 1:
+                ReadBytes = Extractor.getU8(Offset);
+                break;
+              case 2:
+                ReadBytes = Extractor.getU16(Offset);
+                break;
+              case 3:
+                ReadBytes = Extractor.getU24(Offset);
+                break;
+              case 4:
+                ReadBytes = Extractor.getU32(Offset);
+                break;
+              }
 
               cantFail(Offset.takeError(),
                        "failed to read bytes from constant array");
 
-              APInt IntVal(8 * ReadBytes.size(), 0);
-              LoadIntFromMemory(
-                  IntVal, reinterpret_cast<const uint8_t *>(ReadBytes.data()),
-                  ReadBytes.size());
+              APInt IntVal(8 * ReadSize, ReadBytes);
 
               // TODO: Should not bothering aligning up.
-              if (ReadBytes.size() < ReadSize)
+              if (ReadNow < ReadSize)
                 IntVal = IntVal.zext(8 * ReadSize);
 
               Type *IntTy = Type::getIntNTy(Ctx, IntVal.getBitWidth());

@arsenm Would a fix similar to this be acceptable?

Hi, this seems to be causing our bots to fail. One example is https://lab.llvm.org/buildbot/#/builders/231/builds/6915. Can you please take a look? Thanks!

I have a patch ready for this, need to write some more tests for it

Try 40078a6b713730ffc164d4c0733d26352eb1e236

Thanks for taking a look. Your patch seems to have fixed the error seen before but it fails at a later check now:

/llvm/test/CodeGen/AMDGPU/opencl-printf.ll:1874:13: error: GCN-NEXT: expected string not found in input
; GCN-NEXT: store i32 1953722977, ptr addrspace(1) [[PRINTBUFFPTRCAST]], align 4

Logs can be found here: https://lab.llvm.org/buildbot/#/builders/231/builds/7035/steps/6/logs/FAIL__LLVM__opencl-printf_ll

Since this is still causing failures (i.e. DataExtractor does not reorder bytes for getBytes() as it does for getU...()), we can fix the problem as follows:

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
index 18dbf5d..63a365e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
@@ -435,19 +435,31 @@ bool AMDGPUPrintfRuntimeBindingImpl::lowerPrintfForGpu(Module &M) {
             DataExtractor Extractor(S, /*IsLittleEndian=*/true, 8);
             DataExtractor::Cursor Offset(0);
             while (Offset && Offset.tell() < S.size()) {
-              StringRef ReadBytes = Extractor.getBytes(
-                  Offset, std::min(ReadSize, S.size() - Offset.tell()));
+              uint64_t ReadNow = std::min(ReadSize, S.size() - Offset.tell());
+              uint64_t ReadBytes = 0;
+              switch (ReadNow) {
+              default: llvm_unreachable("min(4, X) > 4?");
+              case 1:
+                ReadBytes = Extractor.getU8(Offset);
+                break;
+              case 2:
+                ReadBytes = Extractor.getU16(Offset);
+                break;
+              case 3:
+                ReadBytes = Extractor.getU24(Offset);
+                break;
+              case 4:
+                ReadBytes = Extractor.getU32(Offset);
+                break;
+              }
 
               cantFail(Offset.takeError(),
                        "failed to read bytes from constant array");
 
-              APInt IntVal(8 * ReadBytes.size(), 0);
-              LoadIntFromMemory(
-                  IntVal, reinterpret_cast<const uint8_t *>(ReadBytes.data()),
-                  ReadBytes.size());
+              APInt IntVal(8 * ReadSize, ReadBytes);
 
               // TODO: Should not bothering aligning up.
-              if (ReadBytes.size() < ReadSize)
+              if (ReadNow < ReadSize)
                 IntVal = IntVal.zext(8 * ReadSize);
 
               Type *IntTy = Type::getIntNTy(Ctx, IntVal.getBitWidth());

@arsenm Would a fix similar to this be acceptable?

Yes, I was hoping to extend this to up to 16-bytes at a time but I guess I can figure that out then (we should probably just be emitting a memcpy/strcpy anyway)

@arsenm Would a fix similar to this be acceptable?

Yes, I was hoping to extend this to up to 16-bytes at a time but I guess I can figure that out then (we should probably just be emitting a memcpy/strcpy anyway)

Did you want me to go ahead with this fix now to unblock the bots and then you guys can finalize the fixes with memcpy/strcpy at a later date?

@arsenm Would a fix similar to this be acceptable?

Yes, I was hoping to extend this to up to 16-bytes at a time but I guess I can figure that out then (we should probably just be emitting a memcpy/strcpy anyway)

Did you want me to go ahead with this fix now to unblock the bots and then you guys can finalize the fixes with memcpy/strcpy at a later date?

Yes, thanks