Page MenuHomePhabricator

[AMDGPU] Non hostcall printf support for HIP
Needs ReviewPublic

Authored by vikramRH on Thu, May 11, 11:32 PM.

Details

Summary

This is an alternative to currently existing hostcall implementation and uses printf buffer similar to OpenCL,
The data stored in the buffer (i.e the data frame) for each printf call are as follows,

  1. Control DWord - contains info regarding stream, format string constness and size of data frame
  2. Hash of the format string (if constant) else the format string itself
  3. Printf arguments (each aligned to 8 byte boundary)

The format string Hash is generated using LLVM's MD5 Message-Digest Algorithm implementation and only low 64 bits are used.
The implementation still uses amdhsa metadata and hash is stored as part of format string itself to ensure
minimal changes in runtime.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
yaxunl added inline comments.Fri, May 12, 9:28 AM
clang/include/clang/Basic/LangOptions.def
274 ↗(On Diff #521570)

This should be a target option like https://reviews.llvm.org/D91546 instead of a language option since it is target specific.

clang/lib/Driver/ToolChains/Clang.cpp
4700–4701

You can check whether target triple is amdgpu instead.

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
208–209

Please add some comments about the transformations done for buffered and non-buffered cases. Better provide a simple example.

yaxunl added inline comments.Fri, May 12, 9:34 AM
clang/include/clang/Driver/Options.td
1042

as a target option, usually it starts with -m and does not contain target name, e.g. -mprintf=.

Better describe allowed values and their effect from users point of view, e.g. hostcall outputs immediately during kernel execution but requires some hardware feature.

Where does the runtime implementation of this live? I'm not very familiar with the HIP / hostcall ecosystem.

arsenm added inline comments.Fri, May 12, 10:42 AM
clang/include/clang/Basic/LangOptions.def
274 ↗(On Diff #521570)

Should be -m option

clang/lib/Driver/ToolChains/Clang.cpp
4711

Capitalize, typo 'pritnf'

4712

Typo 'will should'

Don't really understand the TODO, this should trigger for OpenCL as it is

clang/test/CodeGenHIP/printf_nonhostcall.cpp
6

Do we need a test with -fno-builtin?

70

Doesn't cover the full range of printable types. need some other non-string pointers and different address spaces, some FP promotions, 16 and 64 bit integers

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
257–259

Don't understand the point of the comment, I would assume anything involving analysis of a constant string has ignorable compile time

369–382

Builder.CreateMemCpy

400

Just use the type store size

401

You don't need a SmallVector to push back a single entry

422

Don't need llvm::

424

No auto

452

Do we really need another ockl control variable for this? Why isn't it a parameter? printf=stdout always

467

Don't need llvm::

492

Backwards conditional

vikramRH updated this revision to Diff 522954.EditedWed, May 17, 1:54 AM

Handled review comments

vikramRH marked 10 inline comments as done.Wed, May 17, 1:57 AM

Where does the runtime implementation of this live? I'm not very familiar with the HIP / hostcall ecosystem.

Currently there is a rocclr review up internally for the runtime side support

vikramRH added inline comments.Wed, May 17, 2:05 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
452

There are certain HIP API's such as "hip_assert" that output to stderr. currently such API's are supported via hostcalls. Although this implementation does not currently support the API's ,its kept as an option.

sameerds added inline comments.Wed, May 17, 3:11 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
210

you can just say "struct StringData"

216–217

Every argument has a default value. Why not simply assign them as member initializers?

236

Should this say 4?

254

move this closer to first use

311

function argument should use SmallVectorImpl, and not SmallVector

319

This "if (...) { ... }" has a really long body. Can it be moved into a separate function for readability?

322

Try to move declarations to the smallest scope that they are used in. For ReadSize, this seems to be the while loop later.

350

typo: bother

432–433

So we cannot use a buffered printf in HIP?

457

FmtStr.empty() gets checked in multiple places. Can this be captured as one or more boolean variables with meaningful names?

493

Probably not a good choice if it shows up where the user can see it. It's also not very informative. A more descriptive string will be better.

arsenm added inline comments.Wed, May 17, 3:42 AM
clang/include/clang/Basic/TargetOptions.h
99

Typo pritnf

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
452

Right but the way to handle that would be a parameter for where to output, not an externally set global

sameerds added inline comments.Wed, May 17, 5:25 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
184–185

Seems like the only purpose of this change is to make the new StringRef available outside. It will be less confusing to just move the call to getConstantStringInfo() out of this function, and pass the StringRef as an input parameter instead of output. Also, try not to change existing variable names unless it is really relevant. That reduces the number of lines affected by mostly non-functional changes.

221

Needs consistent capitalization in the argument names. Since the rest of the code seems to prefer TitleCase, maybe alignment should start with "A"?

367

Avoid first person comments.

vikramRH updated this revision to Diff 523344.EditedThu, May 18, 5:03 AM

Handled Review comments from @sameerds

vikramRH marked 13 inline comments as done.Thu, May 18, 5:07 AM
vikramRH added inline comments.Thu, May 18, 5:13 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
432–433

I guess hostcall version returns number of bytes printed as the printf return value, This is not done in Buffered case since we do not print stuff in "real time", instead I have decided to stick with OpenCL standard (return 0 on success and -1 on failure)

vikramRH added inline comments.Thu, May 18, 5:16 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
452

I am not clear here, you expect additional inputs to device lib function ?

vikramRH marked 5 inline comments as done.Mon, May 22, 10:04 PM
sameerds added inline comments.Mon, May 22, 11:01 PM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
184–185

Where is Fmt used in this function? Also, StringRef is passed by value. It's already a lightweight reference, as demonstrated by its name.

210

typedef is not required for this in C++

452

@arsenm, this "control word" written into the buffer. In that sense, it is indeed a parameter passed from device to host as part of the printf packet. It is not yet another global variable.

489–490

else should be on the same line as the closing brace. Please do run clang-format on the entire change.

vikramRH updated this revision to Diff 524590.Tue, May 23, 12:48 AM

Handled review comments and rebase

vikramRH marked 3 inline comments as done.Tue, May 23, 12:58 AM
arsenm added inline comments.Tue, May 23, 1:21 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
220

MathExtras already has alignTo

391

Remove this newline

393

You can use one of the CreateConstGEPs. I also suspect this can be inbounds

399

Remove this newline

461

There's a getFalse

486

CreateConstGEP

arsenm added inline comments.Tue, May 23, 3:22 AM
clang/test/CodeGenHIP/printf_nonhostcall.cpp
171–172

Just store the double, don't need the bitcast?

vikramRH updated this revision to Diff 525082.Wed, May 24, 2:51 AM

Handled review comments and rebase

vikramRH marked 7 inline comments as done.Wed, May 24, 2:54 AM
vikramRH updated this revision to Diff 525946.Thu, May 25, 10:51 PM

rebase and ping, apologies for the short frequency.

arsenm added inline comments.Tue, May 30, 3:19 AM
clang/include/clang/Driver/Options.td
1036

I'm a bit worried this is introducing a side-ABI option not captured in the triple or at least module flags

clang/lib/Driver/ToolChains/Clang.cpp
4703

Braces

clang/test/CodeGenHIP/printf_nonhostcall.cpp
65

Can directly store the pointer, don't ptrtoint? Should have some other pointer address spaces tested, the spec is quiet on how %p is supposed to work with different sized pointers

229

Need some vector tests, especially 3 x vectors

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
211

Don't need = ""

212

Move last

224

ArrayRef should be passed by value

235

No auto

235–238

No auto

294

I suppose you can't access the AMDGPU enum from Utils. You could use DL.getDefaultGlobalsAddrSpace

367

You have better alignment info than Align(1)

381

Don't see why isDoubleTy would be special cased here and not part of fitArgInto64Bits

395

This should probably be getTypeAllocSize

452

I'm not following why this part requires introducing another call into device libs. It's expanding this not-really defined API. I'd expect this to be a trivial function we can expand inline here and write directly into the parameter buffer?

arsenm added inline comments.Tue, May 30, 3:20 AM
llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
211

Can't you just use the raw StringRef out of getConstantStringInfo

vikramRH updated this revision to Diff 526963.Wed, May 31, 2:16 AM

Handled most review comments

vikramRH marked 11 inline comments as done.Wed, May 31, 2:35 AM
vikramRH added inline comments.
clang/test/CodeGenHIP/printf_nonhostcall.cpp
65

Added few more address space cases in foo2(), do you think we need more ?

229

vectors are not yet supported as in the hostcall case since the code currently runs for HIP only. I plan to port the OpenCL lowering also here along with hostcall support . we would need to extend hostcall and this implementation to support vectors as part of the porting activity.

llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
211

The structure just caches the string calculated from getConstantStringInfo

381

The redundant cast instructions and this case were due to function "fitArgInto64Bits", which I reused from hostcall Implementation and did not want to modify. This dependency has been removed now and I have inlined the code

452

I started the implementation keeping in mind the device side asserts currenlty supported via hostcall. They use explicit calls to such device lib functions in their definitions. I could not support them now but these would be required if I ever decide to. also I never ended up inlining this call. do you think its really necessary ?

vikramRH marked 2 inline comments as done.Wed, May 31, 2:43 AM
vikramRH added inline comments.
clang/include/clang/Driver/Options.td
1036

I did not understand the concern here. I agree The usage of option itself is not the most robust solution and it would have been better if we could do this without user intervention. but I do not see ways to do that now (atleast not immediately).