In emitting metadata for implicit kernel arguments, we need to be in sync with the actual loads
to align the implicit kernel argument segment to 8 byte boundary. In this work, we simply force
this alignment through the first implicit argument.
In addition, we don't emit metadata for any implicit kernel argument if none of them is actually used.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
[AMD Official Use Only]
Hi, Matt:
Do you have any suggestion about the patch ? I need the permission to integrate.
Thanks
Changpeng
Get Outlook for iOShttps://aka.ms/o0ukef
From: Brian Sumner via Phabricator <reviews@reviews.llvm.org>
Sent: Thursday, April 7, 2022 7:35:02 PM
To: Fang, Changpeng <Changpeng.Fang@amd.com>; Arsenault, Matthew <Matthew.Arsenault@amd.com>; Sumner, Brian <Brian.Sumner@amd.com>
Cc: Zhuravlyov, Konstantin <Konstantin.Zhuravlyov@amd.com>; jv356@scarletmail.rutgers.edu <jv356@scarletmail.rutgers.edu>; nhaehnle@gmail.com <nhaehnle@gmail.com>; Liu, Yaxun (Sam) <Yaxun.Liu@amd.com>; Stuttard, David <David.Stuttard@amd.com>; Tye, Tony <Tony.Tye@amd.com>; Kerbow, Austin <Austin.Kerbow@amd.com>; llvm-commits@lists.llvm.org <llvm-commits@lists.llvm.org>; jay.foad@gmail.com <jay.foad@gmail.com>; wei.ding2@amd.com <wei.ding2@amd.com>; mahesha.comp@gmail.com <mahesha.comp@gmail.com>; acmerzhangxipeng@163.com <acmerzhangxipeng@163.com>; Kumar N, Bhuvanendra <Bhuvanendra.KumarN@amd.com>; wangyihansh@outlook.com <wangyihansh@outlook.com>; yanliang.mu@intel.com <yanliang.mu@intel.com>; aeubanks@google.com <aeubanks@google.com>; dougpuob@gmail.com <dougpuob@gmail.com>; michael.hliao@gmail.com <michael.hliao@gmail.com>; Neubauer, Sebastian <Sebastian.Neubauer@amd.com>; Song, Ruiling <Ruiling.Song@amd.com>; Saleil, Baptiste <Baptiste.Saleil@amd.com>; Lambert, Jacob <Jacob.Lambert@amd.com>
Subject: [PATCH] D123346: AMDGPU: Align the implicit kernel argument segment to 8 bytes for v5
[CAUTION: External Email]
b-sumner added a comment.
Looks fine to me!
CHANGES SINCE LAST ACTION
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Freviews.llvm.org%2FD123346%2Fnew%2F&data=04%7C01%7Cchangpeng.fang%40amd.com%7C9948ae9fa2004f0af0c008da190862bc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637849821660760975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=HGOsFLQqbL4gLMC8SA4TXspkWjTbp%2FvLUyKG70XLfcw%3D&reserved=0
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
991 | It would be better to compute the actual alignment based on the known offset plus kernarg segment base alignment, if only to assert that it's at least 8 |
Use alignTo to force the alignment for the implicit kernarg segment:
Offset = alignTo(Offset, ST.getAlignmentForImplicitArgPtr());
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
991 | We can use the alignTo to align the Offset at the very beginning. We don't need to worry about 8 bytes: |
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
804 | This isn't the real alignment. You should assert the alignment you compute using the kernarg base alignment is at least getAlignmentForImplicitArgPtr |
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
804 | I think this is to align Offset to ST.getAlignmentForImplicitArgPtr which is 8 bytes. We do need this alignment to be in sync with the actual load of the implicit kernel argument. |
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
804 | You do not want to use alignTo. You are not adjusting the alignment or offset here. You want to use commonAlignment(16, Offset), and assert this is >= ST.getAlignmentForImplicitArgPtr() |
llvm/lib/Target/AMDGPU/AMDGPUHSAMetadataStreamer.cpp | ||
---|---|---|
804 | OK, you are re-doing the same full layout recomputation in this case, so you do need to alignTo the implicitarg align |
Sorry, implicit-kernel-argument-alignment.ll should be a new addition of the file.
I will correct that. Thanks.
Ping
Now that we agreed that we have to use alignTo to align the Offset to what the implicitarg_ptr requires
and the LIT tests have been updated to show the alignment related layout. Thanks.
This isn't the real alignment. You should assert the alignment you compute using the kernarg base alignment is at least getAlignmentForImplicitArgPtr