Add builtins which accept floats for these instructions. A user is requesting to have permlane builtins for floats without use of casts.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for your comment Stas!
I think it would be ideal if clang inserted pure bitcasts for floats instead of fptoui when passed as operands to these builtins. My concern is -- Do you think we need to preserve the implicit casting behavior for compatibility?
There is a benefit to not having bitcast noise in the IR
llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
1964–1965 | Should use type mangling for the existing intrinsics rather than introducing new typed copies |
Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404
No, I'm thinking of clients that use IRBuilder to create intrinsic calls programmatically.
Thanks @foad for pointing that out.
Inherit https://reviews.llvm.org/D86154 -- the main idea is: if we are going to break calls to CreateIntrinsic, we might as well break them all in a single commit.
I have also reworked ISel for permlane intrinsics s.t. we consistently (w.r.t implementation in D86154) rely on LateCodeGenPrepare for legalization, rather than select into a new MI.
Please make sure we can consume old IR using unmangled llvm.amdgcn.permlanex16 by fixing auto upgrade. Otherwise, it may cause regressions for device libs or existing apps. Thanks.
Hi @yaxunl thanks for the comment.
I looked into AutoUpgrade, and it catches upgrades of this type (unmangled -> mangled) by default (via remangleIntrinsicFunction). I've included tests to demonstrate. Is this the desired effect, or are you asking to blacklist these intrinstics from being upgraded?
D84639 is an old version of this. It has some additional tests not covered here, can you copy them?
Add tests from D84639 + actually support ptr type during legalization (i.e. AMDGPULateCodeGenPrepare)
I think this may not hard break mesa. I believe mesa bypasses the intrinsic creation API, and just declares the string name of the intrinsic. The type name mangling suffix is technically irrelevant, and as long as you use a consistent type with a consistent suffix things should work out (and the null suffix also works). After committing mesa should still move to adding the type suffix
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll | ||
---|---|---|
22 | This is a hint mesa won't break |
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
213 | Please use poison wherever possible. In this case it seems it's just a placeholder, so it can be poison. |
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
209 | isIntegerTy(16). Also, just check the bitsize is 16. Might as well also handle bfloat | |
301–310 | Just let pointer types pass through to codegen, we try really hard to never introduce ptrtoint/inttoptr | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll | ||
5 | Add bfloat and <2 x i16>, <2 x half>, <2 x bfloat> tests |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll | ||
---|---|---|
5 | Also p2, p3, p5, p6 |
I can echo this sentiment.
The main issues arises when there are untyped calls to CreateIntrinsic, as the intrinsics are no longer defined with a type.
For {read, readfirst, write, perm}lanes, Mesa uses LLVMAddFunction and LLVMBuildCall2 APIs under its own ac_build_intrinsic -- these calls are all typed in the current implementation. Also, (as expected) the implementation inserts bitcasts to cast to Int32Ty before inserting these calls since only that version of the intrinsic currently exists. This also implies they wont have an issue with intrinsic / type declarations.
Unless I have missed something, I don't see why switching to type-mangling would cause an issue with Mesa's current implementation.
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
187 | You're not relying on this for correctness are you? This is an optimization pass, you can't lower here. You also shouldn't need to handle this in the IR, it should codegen normally |
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
187 | This is the legalization for non 32bit types -- I don't exactly know why it wasn't handled via the normal codegen / selection process. @nhaehnle , I believe you tried this in https://reviews.llvm.org/D86154 -- do you happen to remember why we do legalization this way? If not, I'll rework the approach. |
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp | ||
---|---|---|
187 | CodeGenPrepare/LateCodeGenPrepare can't be used for lowering, they're optimization passes. Legalization needs to be handled in the codegen |
Should use type mangling for the existing intrinsics rather than introducing new typed copies