Page MenuHomePhabricator

AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true
Needs ReviewPublic

Authored by cfang on Fri, Aug 5, 11:28 AM.

Details

Reviewers
bcahoon
arsenm
b-sumner
Group Reviewers
Restricted Project
Summary

Under code object version 5, ockl_get_local_size returns the value computed by the expression:
workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder
For functions with the attribute uniform-work-group-size=true. we can evaluate workgroup_id < hidden_block_count
as true, and thus hidden_group_size is returned for
ockl_get_local_size.

With uniform-workgroup-size=true, this work also set all remainders to zero, and if there
is reqd_work_group_size, we also set work-group-size to the required value from the metadata.

Diff Detail

Unit TestsFailed

TimeTest
60,140 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp
200 msx64 debian > LLVM.CodeGen/AMDGPU::implicit-arg-v5-opt.ll
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -mtriple=amdgcn-amd-amdhsa -S -passes=amdgpu-lower-kernel-attributes,instcombine /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/implicit-arg-v5-opt.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck -enable-var-scope -check-prefix=GCN /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/implicit-arg-v5-opt.ll

Event Timeline

cfang created this revision.Fri, Aug 5, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Aug 5, 11:28 AM
cfang requested review of this revision.Fri, Aug 5, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Aug 5, 11:28 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Fri, Aug 5, 12:11 PM
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
79

You've repeated most of the body of the existing function when only the core piece of the match differs.

387–389

You're missing a code object version check since these inputs don't exist for < v5

I also think we're still missing a module flag to indicate the code object version

b-sumner added a comment.EditedFri, Aug 5, 12:43 PM

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

cfang added a comment.Fri, Aug 5, 2:37 PM

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

I think the flag is distinct from the metadata. The module metadata just records which version is being used. You can access it in the code:

auto Ver = mdconst::dyn_extract_or_null<ConstantInt>(M.getModuleFlag("amdgpu_code_object_version"))

and, then if Ver != 5, then there is no need to execute the loop with the calls to processImplicitArgUse().

cfang added a comment.Fri, Aug 5, 10:22 PM

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

I think the flag is distinct from the metadata. The module metadata just records which version is being used. You can access it in the code:

auto Ver = mdconst::dyn_extract_or_null<ConstantInt>(M.getModuleFlag("amdgpu_code_object_version"))

and, then if Ver != 5, then there is no need to execute the loop with the calls to processImplicitArgUse().

This should work, Because most LIT tests have been manually written, amdgpu_code_object_version should not be existing in the module flag
for most, if not all, the tests. It is to say, Ver = null. So what should be the default value?

cfang updated this revision to Diff 450474.Fri, Aug 5, 11:46 PM

Do the pattern matching optimizations based on their existence with code object version.
Get code object version from amdhsa-code-object-version module flag. Note that now
I just declare a static function to do as a proof of concept because we don't know what is the default
code object version if the module flag does not exists.
Also, it is beyond this work to get code object version from module flag everywhere in the compiler.

cfang marked an inline comment as done.Fri, Aug 5, 11:50 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
79

Actually that is not pure repeat even though the logic is the same. The fields, offsets and even the base pointers are different.

cfang updated this revision to Diff 450883.Mon, Aug 8, 11:09 AM
cfang marked an inline comment as done.

Still use the existing AMDGPU::getAmdhsaCodeObjectVersion() to check code object version.
This is for consistency in the backend. Plan to use module flag in a later patch for all cases.