This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add type mangling for {read, write, readfirst, perm}lane intrinsics
Needs RevisionPublic

Authored by jrbyrnes on Apr 6 2023, 11:58 AM.

Details

Summary

Add builtins which accept floats for these instructions. A user is requesting to have permlane builtins for floats without use of casts.

Diff Detail

Event Timeline

jrbyrnes created this revision.Apr 6 2023, 11:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2023, 11:58 AM
jrbyrnes requested review of this revision.Apr 6 2023, 11:58 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 6 2023, 11:58 AM

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?

Thanks for your comment Stas!

I think it would be ideal if clang inserted pure bitcasts for floats instead of fptoui when passed as operands to these builtins. My concern is -- Do you think we need to preserve the implicit casting behavior for compatibility?

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?

Thanks for your comment Stas!

I think it would be ideal if clang inserted pure bitcasts for floats instead of fptoui when passed as operands to these builtins. My concern is -- Do you think we need to preserve the implicit casting behavior for compatibility?

You can manually lower builtin to a proper cast and intrinsic in the CGBuiltin.cpp.

arsenm requested changes to this revision.Apr 7 2023, 3:43 PM

There is a benefit to not having bitcast noise in the IR

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1964–1965

Should use type mangling for the existing intrinsics rather than introducing new typed copies

This revision now requires changes to proceed.Apr 7 2023, 3:43 PM
jrbyrnes updated this revision to Diff 513386.Apr 13 2023, 4:42 PM
jrbyrnes marked an inline comment as done.

Use type mangling

foad added a comment.Apr 14 2023, 2:17 AM

Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

It can be fixed with IR autoupgrade I suppose.

foad added a comment.Apr 14 2023, 9:27 AM

Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

It can be fixed with IR autoupgrade I suppose.

No, I'm thinking of clients that use IRBuilder to create intrinsic calls programmatically.

jrbyrnes updated this revision to Diff 521786.May 12 2023, 1:48 PM

Thanks @foad for pointing that out.

Inherit https://reviews.llvm.org/D86154 -- the main idea is: if we are going to break calls to CreateIntrinsic, we might as well break them all in a single commit.

I have also reworked ISel for permlane intrinsics s.t. we consistently (w.r.t implementation in D86154) rely on LateCodeGenPrepare for legalization, rather than select into a new MI.

jrbyrnes retitled this revision from [AMDGPU] Add f32 permlane{16, x16} builtin variants to [AMDGPU] Add type mangling for {read, write, readfirst, perm}lane intrinsics.May 12 2023, 1:49 PM
jrbyrnes added a reviewer: nhaehnle.
jrbyrnes updated this revision to Diff 521793.May 12 2023, 2:10 PM

clang-format + newlines

Please make sure we can consume old IR using unmangled llvm.amdgcn.permlanex16 by fixing auto upgrade. Otherwise, it may cause regressions for device libs or existing apps. Thanks.

jrbyrnes updated this revision to Diff 522267.May 15 2023, 11:00 AM

Hi @yaxunl thanks for the comment.

I looked into AutoUpgrade, and it catches upgrades of this type (unmangled -> mangled) by default (via remangleIntrinsicFunction). I've included tests to demonstrate. Is this the desired effect, or are you asking to blacklist these intrinstics from being upgraded?

Hi @yaxunl thanks for the comment.

I looked into AutoUpgrade, and it catches upgrades of this type (unmangled -> mangled) by default (via remangleIntrinsicFunction). I've included tests to demonstrate. Is this the desired effect, or are you asking to blacklist these intrinstics from being upgraded?

That is the desired effect. Thanks.

arsenm added a comment.Jun 9 2023, 7:05 PM

D84639 is an old version of this. It has some additional tests not covered here, can you copy them?

jrbyrnes updated this revision to Diff 530688.Jun 12 2023, 3:35 PM

Add tests from D84639 + actually support ptr type during legalization (i.e. AMDGPULateCodeGenPrepare)

I think this may not hard break mesa. I believe mesa bypasses the intrinsic creation API, and just declares the string name of the intrinsic. The type name mangling suffix is technically irrelevant, and as long as you use a consistent type with a consistent suffix things should work out (and the null suffix also works). After committing mesa should still move to adding the type suffix

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
22

This is a hint mesa won't break

nlopes added inline comments.Jun 20 2023, 7:17 AM
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
213

Please use poison wherever possible. In this case it seems it's just a placeholder, so it can be poison.
We're trying to get rid of poison. Thanks!

arsenm added inline comments.Jun 20 2023, 7:21 AM
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
209

isIntegerTy(16). Also, just check the bitsize is 16. Might as well also handle bfloat

301–310

Just let pointer types pass through to codegen, we try really hard to never introduce ptrtoint/inttoptr

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
5

Add bfloat and <2 x i16>, <2 x half>, <2 x bfloat> tests

arsenm added inline comments.Jun 20 2023, 7:24 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
5

Also p2, p3, p5, p6

jrbyrnes updated this revision to Diff 533080.Jun 20 2023, 4:32 PM
jrbyrnes marked 5 inline comments as done.

Address comments + enable selection of ptr types

I think this may not hard break mesa. I believe mesa bypasses the intrinsic creation API, and just declares the string name of the intrinsic. The type name mangling suffix is technically irrelevant, and as long as you use a consistent type with a consistent suffix things should work out (and the null suffix also works). After committing mesa should still move to adding the type suffix

I can echo this sentiment.

The main issues arises when there are untyped calls to CreateIntrinsic, as the intrinsics are no longer defined with a type.

For {read, readfirst, write, perm}lanes, Mesa uses LLVMAddFunction and LLVMBuildCall2 APIs under its own ac_build_intrinsic -- these calls are all typed in the current implementation. Also, (as expected) the implementation inserts bitcasts to cast to Int32Ty before inserting these calls since only that version of the intrinsic currently exists. This also implies they wont have an issue with intrinsic / type declarations.

Unless I have missed something, I don't see why switching to type-mangling would cause an issue with Mesa's current implementation.

arsenm requested changes to this revision.Jul 6 2023, 12:25 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
187

You're not relying on this for correctness are you? This is an optimization pass, you can't lower here. You also shouldn't need to handle this in the IR, it should codegen normally

This revision now requires changes to proceed.Jul 6 2023, 12:25 PM
jrbyrnes added inline comments.Jul 6 2023, 2:25 PM
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
187

This is the legalization for non 32bit types -- I don't exactly know why it wasn't handled via the normal codegen / selection process. @nhaehnle , I believe you tried this in https://reviews.llvm.org/D86154 -- do you happen to remember why we do legalization this way? If not, I'll rework the approach.

arsenm added inline comments.Jul 24 2023, 9:00 AM
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
187

CodeGenPrepare/LateCodeGenPrepare can't be used for lowering, they're optimization passes. Legalization needs to be handled in the codegen

llvm/test/CodeGen/AMDGPU/permlane-ptr.ll