This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true
ClosedPublic

Authored by cfang on Aug 5 2022, 11:28 AM.

Details

Summary

Under code object version 5, ockl_get_local_size returns the value computed by the expression:
workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder
For functions with the attribute uniform-work-group-size=true. we can evaluate workgroup_id < hidden_block_count
as true, and thus hidden_group_size is returned for
ockl_get_local_size.

With uniform-workgroup-size=true, this work also set all remainders to zero, and if there
is reqd_work_group_size, we also set work-group-size to the required value from the metadata.

Diff Detail

Event Timeline

cfang created this revision.Aug 5 2022, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 11:28 AM
cfang requested review of this revision.Aug 5 2022, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 11:28 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Aug 5 2022, 12:11 PM
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
79

You've repeated most of the body of the existing function when only the core piece of the match differs.

387–389

You're missing a code object version check since these inputs don't exist for < v5

I also think we're still missing a module flag to indicate the code object version

b-sumner added a comment.EditedAug 5 2022, 12:43 PM

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

cfang added a comment.Aug 5 2022, 2:37 PM

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

I think the flag is distinct from the metadata. The module metadata just records which version is being used. You can access it in the code:

auto Ver = mdconst::dyn_extract_or_null<ConstantInt>(M.getModuleFlag("amdgpu_code_object_version"))

and, then if Ver != 5, then there is no need to execute the loop with the calls to processImplicitArgUse().

cfang added a comment.Aug 5 2022, 10:22 PM

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

I think the flag is distinct from the metadata. The module metadata just records which version is being used. You can access it in the code:

auto Ver = mdconst::dyn_extract_or_null<ConstantInt>(M.getModuleFlag("amdgpu_code_object_version"))

and, then if Ver != 5, then there is no need to execute the loop with the calls to processImplicitArgUse().

This should work, Because most LIT tests have been manually written, amdgpu_code_object_version should not be existing in the module flag
for most, if not all, the tests. It is to say, Ver = null. So what should be the default value?

cfang updated this revision to Diff 450474.Aug 5 2022, 11:46 PM

Do the pattern matching optimizations based on their existence with code object version.
Get code object version from amdhsa-code-object-version module flag. Note that now
I just declare a static function to do as a proof of concept because we don't know what is the default
code object version if the module flag does not exists.
Also, it is beyond this work to get code object version from module flag everywhere in the compiler.

cfang marked an inline comment as done.Aug 5 2022, 11:50 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
79

Actually that is not pure repeat even though the logic is the same. The fields, offsets and even the base pointers are different.

cfang updated this revision to Diff 450883.Aug 8 2022, 11:09 AM
cfang marked an inline comment as done.

Still use the existing AMDGPU::getAmdhsaCodeObjectVersion() to check code object version.
This is for consistency in the backend. Plan to use module flag in a later patch for all cases.

arsenm requested changes to this revision.Sep 16 2022, 9:09 AM

This should really merge the processUse logic together

llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
79

The entire core logic is the same. You can select between the pointers, fields and offsets within the same function.

This revision now requires changes to proceed.Sep 16 2022, 9:09 AM
cfang updated this revision to Diff 461417.Sep 19 2022, 4:56 PM

Merge two ProcessUse functions, and do selection based on code object version.

arsenm added inline comments.Sep 20 2022, 11:16 AM
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
47–49

Should be named HIDDEN_BLOCK_*?

91–94

Hidden?

326–327

Looks like something weird is happening with the return indentation

370

This isn't reading the IR/module flag

374

else if

cfang marked 6 inline comments as done.Sep 20 2022, 2:29 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
47–49

Ok, named with hidden_ prefix to reflect the field names. Thanks.

91–94

These are just temporary variable names, and some of them are shared with pre-v5 implementation. I am thinking it
is better not to add hidden_ prefix.

326–327

Fixed.

370

Waiting for clang to generate a complete code object versions (only v5 at this moment). This is be updated with a separate task (patch).

374

Done. Thanks.

cfang updated this revision to Diff 461707.Sep 20 2022, 2:32 PM
cfang marked 5 inline comments as done.

Update based on arsenm's comments.

arsenm added inline comments.Sep 20 2022, 4:24 PM
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
340

Typo ImplicitArf

343–350

These can reuse the same set and user loop

cfang marked 2 inline comments as done.Sep 20 2022, 5:12 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
340

Thanks.

343–350

Done. Thanks for the suggestions.

cfang updated this revision to Diff 461765.Sep 20 2022, 5:14 PM
cfang marked 2 inline comments as done.

Updated based on arsenm's comment to merge two cases.

arsenm accepted this revision.Sep 20 2022, 5:16 PM

LGTM

This revision is now accepted and ready to land.Sep 20 2022, 5:16 PM
This revision was landed with ongoing or failed builds.Sep 20 2022, 5:27 PM
This revision was automatically updated to reflect the committed changes.