This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Handle "uniform-work-group-size" attribute
ClosedPublic

Authored by aakanksha555 on Aug 2 2018, 1:26 PM.

Details

Reviewers

arsenm
b-sumner
aakanksha555

Commits

rG729309cc8956: [AMDGPU] Support for "uniform-work-group-size" attribute
rL348971: [AMDGPU] Support for "uniform-work-group-size" attribute

Summary

Updated the annotate-kernel-features pass to support propagation of the "uniform-work-group-size" attribute. The attribute propagates top-down, from callers to callees. The pass maintains a list of nodes in increasing order of uses (caller to callee) and, for each node, propagates the attribute to the functions it calls. Any function that does not have the attribute will have it added with a "false" value by this pass.
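
The rule described in the summary can be sketched as follows. This is a minimal standard-library model, not the pass itself: the call graph is a plain edge list assumed to be ordered caller-first, attributes live in a `std::map`, and the names `AttrMap`, `CallEdge`, and `propagateTopDown` are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: function name -> value of "uniform-work-group-size".
// A function with no entry does not carry the attribute.
using AttrMap = std::map<std::string, std::string>;
// A call edge (caller, callee).
using CallEdge = std::pair<std::string, std::string>;

// Propagate the attribute along each edge, visiting edges in the given
// order (assumed caller-first). Returns true if any attribute changed.
bool propagateTopDown(const std::vector<CallEdge> &Edges, AttrMap &Attrs) {
  bool Changed = false;
  // Record a value and remember whether it differs from what was there.
  auto Set = [&](const std::string &F, const std::string &V) {
    auto It = Attrs.find(F);
    if (It == Attrs.end() || It->second != V) {
      Attrs[F] = V;
      Changed = true;
    }
  };
  for (const CallEdge &E : Edges) {
    const std::string &Caller = E.first;
    const std::string &Callee = E.second;
    // A caller without the attribute is treated as "false".
    if (!Attrs.count(Caller))
      Set(Caller, "false");
    if (Attrs[Caller] == "true") {
      // Only fill in a missing value; never upgrade an existing "false".
      if (!Attrs.count(Callee))
        Set(Callee, "true");
    } else {
      Set(Callee, "false");
    }
  }
  return Changed;
}
```

In this model a callee ends up "true" only if every caller visited so far was "true", and every function touched by the walk ends up with an explicit value.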

Event Timeline

aakanksha555 created this revision. Aug 2 2018, 1:26 PM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Aug 2 2018, 1:26 PM

arsenm requested changes to this revision. Aug 2 2018, 1:29 PM

arsenm added inline comments.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
807 ↗	(On Diff #158825)	We cannot rely on function names, or introduce functions such as this. This is also not necessary for optimizing this

This revision now requires changes to proceed. Aug 2 2018, 1:29 PM

A more correct way to optimize this would be to have a CallGraphSCC pass that propagates the uniform-work-group-size attribute to callees only reachable from kernels with uniform-work-group-size

aakanksha555 updated this revision to Diff 164503. Sep 7 2018, 1:39 PM

aakanksha555 edited the summary of this revision.

Herald added a subscriber: jvesely. Sep 7 2018, 1:39 PM

Added the tests.

arsenm added inline comments. Sep 10 2018, 8:49 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
280	OS check isn't necessary
283–284	I don't see how this prevents propagating the attribute if other callers do not have it
286–288	Should only be looking for == "true"
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
684	Extra space at end
771–772	Delete
test/CodeGen/AMDGPU/uniform-workgroup-test1.ll
1 ↗	(On Diff #164518)	Tests are missing run lines
25–30 ↗	(On Diff #164518)	This includes attributes added by the pass

aakanksha555 marked 4 inline comments as done. Sep 11 2018, 10:00 AM

aakanksha555 added inline comments.

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
283–284	By other callers, do you mean other kernel functions? AttrNames[] does not include "uniform-work-group-size", so it wouldn't get copied to other kernel functions from the callee function.

Added support to ensure the attribute propagates from the caller to the function even within nested function calls.

Updated Test5

Updated the patch to prevent propagating the attribute if other callers do not have it.
Updated Test 5 to show the update.

Found 2 failing tests. Updated code to fix them.

I think you need to split this into a separate loop before the propagate attributes function instead of adding the recursive call at the same time. This is different from the other attributes because it is inferred top down. You should have a first loop over the CallGraphSCC that adds this. Since the CallGraphSCC should have all of the nodes reachable from each other, this should be some set building / checks from there. You shouldn't need to be looking at the instructions inside the functions and looking for specific call sites

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
227–228	You don't need to check if it has the attribute and check the value == false. Just check the value == true
238	This shouldn't use recursion. This will fail when there's a recursive call. It also shouldn't be necessary, because callees should be called before callers in an SCC pass
test/CodeGen/AMDGPU/uniform-work-group-test1.ll
1 ↗	(On Diff #166333)	GCN check prefix is usually used for ISA check lines. It doesn't really matter, but would require more changes if for some reason in the future an llc run line were added
4–5 ↗	(On Diff #166333)	Functions and attribute group variables could use better names (as well as the test file). It would also help to add a comment at the top of each test file for what case this is supposed to be
test/CodeGen/AMDGPU/uniform-work-group-test5.ll
1 ↗	(On Diff #166333)	Triple is broken
23 ↗	(On Diff #166333)	This is accepted by the IR parser? I would remove all of these empty attribute groups
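
To illustrate the inline comment about recursion above: a recursive walk over callees does not terminate when functions call each other, whereas an explicit worklist with a visited set does. The following is a minimal standard-library sketch, not code from the patch; `CallGraph` and `reachableFrom` are hypothetical names, and the graph is modeled as a map from a function name to the names it calls.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a call graph: function name -> names it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect every function reachable from Root without using recursion.
// The visited set guarantees termination even if the graph has cycles
// (direct or mutual recursion between functions).
std::set<std::string> reachableFrom(const CallGraph &CG,
                                    const std::string &Root) {
  std::set<std::string> Visited{Root};
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    auto It = CG.find(F);
    if (It == CG.end())
      continue; // No entry in the model: no known callees.
    for (const std::string &Callee : It->second)
      if (Visited.insert(Callee).second) // True only on first visit.
        Worklist.push_back(Callee);
  }
  return Visited;
}
```

A set built this way for each kernel is one possible reading of the "set building" suggested above: a callee could be marked "true" only if every kernel that reaches it carries "true".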

aakanksha555 accepted this revision. Oct 30 2018, 1:04 PM

aakanksha555 marked 5 inline comments as done.

aakanksha555 updated this revision to Diff 171769. Oct 30 2018, 1:17 PM

aakanksha555 edited the summary of this revision.

arsenm added inline comments. Oct 30 2018, 2:24 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
83–85	No globals
217	I don't understand the point of anything this function is doing. You're just copying the same information that's already present in the SCC into a slightly different structure
220	Range loop
223	This leaks and should also be unnecessary
230	std::make_pair
239	Typo propagte
337	Comment should be capitalized, have a space after // and be punctuated
test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll
22	Should also test a recursive loop

aakanksha555 marked 8 inline comments as done. Nov 5 2018, 2:29 PM

Got rid of extra structures and added a recursive test.

arsenm added inline comments. Nov 6 2018, 9:10 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
49	llvm:: unnecessary
189	Random whitespace change
220	Space before (
221	You aren't using this as a stack, so this is weird. You could just do a range loop on reverse(SortedNodeList)
338	Comment is misleading since you aren't really sorting it. Also still not clear why the number of uses matter. Is it just to find unused functions?
351	Extra whitespace change
test/CodeGen/AMDGPU/uniform-work-group-resursion-test.ll
1	Missing amdhsa in triple. Test name spelling: resursion
7	Better check variable name
test/CodeGen/AMDGPU/uniform-work-group-test.ll
33	A testcase with the attribute on an external function might also be useful

aakanksha555 added inline comments. Nov 6 2018, 9:26 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
338	I didn't need to sort it explicitly, as push_back does that for me. The result is a list ordered from the most uses to the fewest. The uses help to identify the callers and the callees. For example, kernel functions that only call other functions have just one use, and that is where the attribute propagation needs to start. This ensures that attributes get propagated through nested function calls.

aakanksha555 added inline comments. Nov 6 2018, 9:29 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
221	I used pop_back since I wanted the last element first. I'll change it to a range loop.
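
For reference, the difference between the two loop styles can be sketched with the standard library alone. A `pop_back()` loop visits elements last-to-first but consumes the container; iterating with reverse iterators gives the same order and leaves it intact. In LLVM itself, `llvm::reverse` from `llvm/ADT/STLExtras.h` wraps this as a range usable in a range-based `for`; the function name `visitInReverse` below is hypothetical.

```cpp
#include <string>
#include <vector>

// Visit the elements last-to-first without modifying the container, and
// return the order in which they were visited.
std::vector<std::string> visitInReverse(const std::vector<std::string> &Nodes) {
  std::vector<std::string> Order;
  for (auto It = Nodes.rbegin(), End = Nodes.rend(); It != End; ++It)
    Order.push_back(*It);
  return Order;
}
```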

-Updated a test case to include an external function.
-Uploaded the correct version of the recursion test.
-Fixed other small errors.

arsenm added inline comments. Nov 14 2018, 10:04 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
226–228	You should just deal with this fixme and add a test for it. This should be as easy as checking Callee->mayBeRedefined?
253	It would be cleaner to pull all of this into a function which returns true on changed

aakanksha555 added inline comments. Nov 14 2018, 1:53 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
226–228	Did you mean Callee->mayBeDerefined() ?

Added a check for external functions and modified the test for it

arsenm added inline comments. Nov 29 2018, 11:34 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
49	Remove Sorted from the name now?
226–228	I think so? I'm not sure what the difference is from isInterposable. This also should check F.hasAddressTaken.
234	You don't need this, which was the point of putting this into a function
242–243	You can just return true directly
253	You can just return true
257	You can just return true
260–262	Can you avoid adding the attribute if not originally present?

aakanksha555 added inline comments. Nov 29 2018, 11:55 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
260–262	If I don't add the attribute when it is not originally present, it can create a discrepancy in certain scenarios. For example, suppose a function is called by two kernels, one without the attribute and the other with "uniform-work-group-size" = "true". The function would be set to "uniform-work-group-size" = "true", which may not be correct.
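
The scenario in this reply can be modeled in a few lines. This is a hedged sketch using only the standard library, not the code under review; `AttrMap`, `CallEdge`, `calleeValue`, and the `AbsentMeansFalse` switch are hypothetical. With the switch off, callers that lack the attribute are skipped, which reproduces the discrepancy described above.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: function name -> value of "uniform-work-group-size".
using AttrMap = std::map<std::string, std::string>;
// A call edge (caller, callee).
using CallEdge = std::pair<std::string, std::string>;

// Returns the value Callee ends up with after walking all edges, or an
// empty string if it never received one. Attrs is taken by value so the
// caller's map is left untouched.
std::string calleeValue(const std::vector<CallEdge> &Edges, AttrMap Attrs,
                        const std::string &Callee, bool AbsentMeansFalse) {
  for (const CallEdge &E : Edges) {
    auto CallerIt = Attrs.find(E.first);
    if (CallerIt == Attrs.end()) {
      if (!AbsentMeansFalse)
        continue; // Skip callers that carry no attribute.
      CallerIt = Attrs.emplace(E.first, "false").first;
    }
    if (CallerIt->second == "true")
      Attrs.emplace(E.second, "true"); // No effect if a value is already set.
    else
      Attrs[E.second] = "false"; // A "false" caller always wins.
  }
  auto It = Attrs.find(Callee);
  return It == Attrs.end() ? std::string() : It->second;
}
```

With edges from a kernel without the attribute and from a kernel marked "true" into the same function, the function ends up "false" when absent callers count as "false" and "true" when they are skipped.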

-Directly returning "true" from the propagateAttribute function as per the comments.
-Renamed the list to NodeList, dropped "Sorted" from the name

LGTM with the long line fixed

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
233	I think this goes over the column limit

This revision is now accepted and ready to land. Dec 10 2018, 12:17 PM

Closed by commit rL348971: [AMDGPU] Support for "uniform-work-group-size" attribute (authored by aakanksha555). Dec 12 2018, 12:54 PM

This revision was automatically updated to reflect the committed changes.

Hi,

This patch breaks RADV (and probably RadeonSI as well). Here's a backtrace of the problem:

$ gdb --args ./deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./deqp-vk...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/hakzsam/programming/VK-GL-CTS/build/external/vulkancts/modules/vulkan/deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Writing test log into TestResults.qpa
dEQP Core git-12aa347f43c85df3a0daf930739551d3f53d3d48 (0x12aa347f) starting..
target implementation = 'Default'
[New Thread 0x7ffff1968700 (LWP 4723)]
[Thread 0x7ffff1968700 (LWP 4723) exited]
[New Thread 0x7ffff1968700 (LWP 4724)]

Thread 1 "deqp-vk" received signal SIGSEGV, Segmentation fault.
0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
745 *List = this;
(gdb) bt
#0 0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
#1 0x00007ffff572fd87 in llvm::ValueHandleBase::ValueHandleBase (RHS=..., Kind=llvm::ValueHandleBase::WeakTracking, this=0x7fffffffbc50) at ../include/llvm/ADT/PointerIntPair.h:150
#2 llvm::WeakTrackingVH::WeakTrackingVH (RHS=..., this=0x7fffffffbc50) at ../include/llvm/IR/ValueHandle.h:187
#3 std::pair<llvm::WeakTrackingVH, llvm::CallGraphNode*>::pair (this=0x7fffffffbc50) at /usr/include/c++/8.2.1/bits/stl_pair.h:303
#4 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::processUniformWorkGroupAttribute (this=0x555557e87570) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:224
#5 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::runOnSCC (this=<optimized out>, SCC=...) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:355
#6 0x00007ffff51d3d47 in (anonymous namespace)::CGPassManager::RunPassOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CallGraphUpToDate=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., P=0x555557e87570, this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:141
#7 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:442
#8 (anonymous namespace)::CGPassManager::runOnModule (this=0x555557e87650, M=...) at ../lib/Analysis/CallGraphSCCPass.cpp:498
#9 0x00007ffff4428fba in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x555557e65c20) at ../lib/IR/LegacyPassManager.cpp:1744
#10 llvm::legacy::PassManagerImpl::run (this=0x555557e657b0, M=...) at ../lib/IR/LegacyPassManager.cpp:1857
#11 0x00007ffff61fa7a6 in ac_compile_module_to_binary (p=0x555557e65750, module=module@entry=0x555557eb65a0, binary=binary@entry=0x7fffffffc080) at /home/hakzsam/install/llvm/debug/master/include/llvm/IR/Module.h:889
#12 0x00007ffff61b6e2b in radv_llvm_per_thread_info::compile_to_memory_buffer (this=<optimized out>, binary=0x7fffffffc080, module=0x555557eb65a0) at radv_llvm_helper.cpp:97
#13 radv_compile_to_binary (info=info@entry=0x7fffffffc050, module=module@entry=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080) at radv_llvm_helper.cpp:97
#14 0x00007ffff61b0d81 in ac_llvm_compile (ac_llvm=0x7fffffffc050, binary=0x7fffffffc080, M=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>) at radv_nir_to_llvm.c:3660
#15 ac_compile_llvm_module (ac_llvm=ac_llvm@entry=0x7fffffffc050, llvm_module=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080, config=0x7fffffffc080, config@entry=0x555557f71778, stage=MESA_SHADER_COMPUTE, options=0x7fffffffc140, shader_info=<optimized out>, shader_info=<optimized out>) at radv_nir_to_llvm.c:3684
#16 0x00007ffff61b65d0 in radv_compile_nir_shader (ac_llvm=ac_llvm@entry=0x7fffffffc050, binary=binary@entry=0x7fffffffc080, config=config@entry=0x555557f71778, shader_info=shader_info@entry=0x555557f717a0, nir=nir@entry=0x7fffffffc388, nir_count=nir_count@entry=1, options=0x7fffffffc140) at radv_nir_to_llvm.c:3808
#17 0x00007ffff61c56db in shader_variant_create (device=device@entry=0x555557e4d920, module=0x7fffffffcd40, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, stage=MESA_SHADER_COMPUTE, options=options@entry=0x7fffffffc140, gs_copy_shader=false, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:612
#18 0x00007ffff61c5b04 in radv_shader_variant_create (device=device@entry=0x555557e4d920, module=<optimized out>, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, layout=<optimized out>, key=key@entry=0x7fffffffc810, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:666
#19 0x00007ffff61b8aa6 in radv_create_shaders (pipeline=0x555557ef5ca0, device=<optimized out>, cache=0x555557e4d998, key=<optimized out>, pStages=<optimized out>, flags=<optimized out>) at radv_pipeline.c:2151
#20 0x00007ffff61bf7eb in radv_compute_pipeline_create (pPipeline=0x555557e4ef70, pAllocator=<optimized out>, pCreateInfo=0x7fffffffcbd0, _cache=<optimized out>, _device=0x555557e4d920) at radv_pipeline.c:3787
#21 radv_CreateComputePipelines (_device=_device@entry=0x555557e4d920, pipelineCache=pipelineCache@entry=0x555557e4d998, count=count@entry=1, pCreateInfos=pCreateInfos@entry=0x7fffffffcbd0, pAllocator=pAllocator@entry=0x0, pPipelines=pPipelines@entry=0x555557e4ef70) at radv_pipeline.c:3817
#22 0x00007ffff619653a in radv_device_init_meta_itob_state (device=0x555557e4d920) at radv_private.h:1986
#23 radv_device_init_meta_bufimage_state (device=device@entry=0x555557e4d920) at radv_meta_bufimage.c:1489
#24 0x00007ffff6175a4a in radv_device_init_meta (device=device@entry=0x555557e4d920) at radv_meta.c:365
#25 0x00007ffff61680d0 in radv_CreateDevice (physicalDevice=0x555557d7c0e0, pCreateInfo=0x7fffffffd0d0, pAllocator=<optimized out>, pDevice=0x555557d82ec0) at radv_device.c:1702
#26 0x00007ffff640c574 in ?? () from /usr/lib/libvulkan.so.1
#27 0x00007ffff641599b in ?? () from /usr/lib/libvulkan.so.1
#28 0x00007ffff6419b29 in vkCreateDevice () from /usr/lib/libvulkan.so.1
#29 0x0000555556942f7d in vk::createDevice(vk::PlatformInterface const&, vk::VkInstance_s*, vk::InstanceInterface const&, vk::VkPhysicalDevice_s*, vk::VkDeviceCreateInfo const*, vk::VkAllocationCallbacks const*) ()
#30 0x00005555558a3384 in vkt::DefaultDevice::DefaultDevice(vk::PlatformInterface const&, tcu::CommandLine const&) ()
#31 0x00005555558a40e5 in vkt::Context::Context(tcu::TestContext&, vk::PlatformInterface const&, vk::ProgramCollection<vk::ProgramBinary, vk::BinaryBuildOptions>&) ()
#32 0x000055555588c3e2 in vkt::TestCaseExecutor::TestCaseExecutor(tcu::TestContext&) ()
#33 0x000055555588c552 in vkt::TestPackage::createExecutor() const ()
#34 0x0000555556e04964 in tcu::TestSessionExecutor::iterate() ()
#35 0x0000555556dd89a9 in tcu::App::iterate() ()
#36 0x000055555587e4e8 in main ()

Can you look into it?

Thanks!

In D50200#1329491, @hakzsam wrote:
This patch breaks RADV (and probably RadeonSI as well). [...] Can you look into it?

Sorry about that. I have reverted the changes until I fix this.

In D50200#1329491, @hakzsam wrote:
This patch breaks RADV (and probably RadeonSI as well). [...] Can you look into it?

Would you be able to provide LLVM IR? I am trying to reproduce this issue locally but it is taking some time as I haven't worked with RADV yet.

Thanks.

Unfortunately, I can't get any LLVM IR because it crashes too early in the process. You can also reproduce the problem with RadeonSI btw.

Thanks for the revert!

Added a fix for the failing deqp-vk test.

Revision Contents

Path                                                                       Size
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp                         57 lines
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp                                  3 lines
test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll                   48 lines
test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll                18 lines
test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll            24 lines
test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll    25 lines
test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll              20 lines
test/CodeGen/AMDGPU/uniform-work-group-resursion-test.ll                   39 lines
test/CodeGen/AMDGPU/uniform-work-group-test.ll                             36 lines

Diff 172657

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp

Show All 40 Lines

using namespace llvm;		using namespace llvm;

namespace {		namespace {

class AMDGPUAnnotateKernelFeatures : public CallGraphSCCPass {		class AMDGPUAnnotateKernelFeatures : public CallGraphSCCPass {
private:		private:
const TargetMachine *TM = nullptr;		const TargetMachine *TM = nullptr;
		llvm::SmallVector<CallGraphNode*, 8> SortedNodeList;
		arsenm (Done): llvm:: unnecessary
		arsenm (Done): Remove Sorted from the name now?

bool addFeatureAttributes(Function &F);		bool addFeatureAttributes(Function &F);
		bool propagateUniformWorkGroupAttribute();

public:		public:
static char ID;		static char ID;

AMDGPUAnnotateKernelFeatures() : CallGraphSCCPass(ID) {}		AMDGPUAnnotateKernelFeatures() : CallGraphSCCPass(ID) {}

bool doInitialization(CallGraph &CG) override;		bool doInitialization(CallGraph &CG) override;
bool runOnSCC(CallGraphSCC &SCC) override;		bool runOnSCC(CallGraphSCC &SCC) override;
Show All 14 Lines
};		};

} // end anonymous namespace		} // end anonymous namespace

char AMDGPUAnnotateKernelFeatures::ID = 0;		char AMDGPUAnnotateKernelFeatures::ID = 0;

char &llvm::AMDGPUAnnotateKernelFeaturesID = AMDGPUAnnotateKernelFeatures::ID;		char &llvm::AMDGPUAnnotateKernelFeaturesID = AMDGPUAnnotateKernelFeatures::ID;

INITIALIZE_PASS(AMDGPUAnnotateKernelFeatures, DEBUG_TYPE,		INITIALIZE_PASS(AMDGPUAnnotateKernelFeatures, DEBUG_TYPE,
"Add AMDGPU function attributes", false, false)		"Add AMDGPU function attributes", false, false)

		arsenm (Done): No globals

// The queue ptr is only needed when casting to flat, not from it.		// The queue ptr is only needed when casting to flat, not from it.
static bool castRequiresQueuePtr(unsigned SrcAS) {		static bool castRequiresQueuePtr(unsigned SrcAS) {
return SrcAS == AMDGPUAS::LOCAL_ADDRESS || SrcAS == AMDGPUAS::PRIVATE_ADDRESS;		return SrcAS == AMDGPUAS::LOCAL_ADDRESS || SrcAS == AMDGPUAS::PRIVATE_ADDRESS;
}		}

static bool castRequiresQueuePtr(const AddrSpaceCastInst *ASC) {		static bool castRequiresQueuePtr(const AddrSpaceCastInst *ASC) {
return castRequiresQueuePtr(ASC->getSrcAddressSpace());		return castRequiresQueuePtr(ASC->getSrcAddressSpace());
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines
}		}

static bool handleAttr(Function &Parent, const Function &Callee,		static bool handleAttr(Function &Parent, const Function &Callee,
StringRef Name) {		StringRef Name) {
if (Callee.hasFnAttribute(Name)) {		if (Callee.hasFnAttribute(Name)) {
Parent.addFnAttr(Name);		Parent.addFnAttr(Name);
return true;		return true;
}		}

arsenm (Not Done): Random whitespace change
return false;		return false;
}		}

static void copyFeaturesToFunction(Function &Parent, const Function &Callee,		static void copyFeaturesToFunction(Function &Parent, const Function &Callee,
bool &NeedQueuePtr) {		bool &NeedQueuePtr) {
// X ids unnecessarily propagated to kernels.		// X ids unnecessarily propagated to kernels.
static const StringRef AttrNames[] = {		static const StringRef AttrNames[] = {
{ "amdgpu-work-item-id-x" },		{ "amdgpu-work-item-id-x" },
Show All 10 Lines	static void copyFeaturesToFunction(Function &Parent, const Function &Callee,

if (handleAttr(Parent, Callee, "amdgpu-queue-ptr"))		if (handleAttr(Parent, Callee, "amdgpu-queue-ptr"))
NeedQueuePtr = true;		NeedQueuePtr = true;

for (StringRef AttrName : AttrNames)		for (StringRef AttrName : AttrNames)
handleAttr(Parent, Callee, AttrName);		handleAttr(Parent, Callee, AttrName);
}		}

		bool AMDGPUAnnotateKernelFeatures::propagateUniformWorkGroupAttribute() {
		arsenm (Done): I don't understand the point of anything this function is doing. You're just copying the same information that's already present in the SCC into a slightly different structure
		bool Changed = false;

		while(!SortedNodeList.empty()) {
		arsenm (Done): Range loop
		arsenm (Done): Space before (
		CallGraphNode *Node = SortedNodeList.pop_back_val();
		arsenm (Done): You aren't using this as a stack, so this is weird. You could just do a range loop on reverse(SortedNodeList)
		aakanksha555 (Author, Not Done): I used pop_back since I wanted the last element first. I'll change it to a range loop.
		Function *Caller = Node->getFunction();

		arsenm (Done): This leaks and should also be unnecessary
		for (auto Iter = Node->begin(), End = Node->end(); Iter != End; ++Iter) {
		Function *Callee = std::get<1>(*Iter)->getFunction();

		// Check if the Caller has the attribute
		if (Caller->hasFnAttribute("uniform-work-group-size")) {
		arsenm (Done): You don't need to check if it has the attribute and check the value == false. Just check the value == true
		arsenm (Done): You should just deal with this fixme and add a test for it. This should be as easy as checking Callee->mayBeRedefined?
		aakanksha555 (Author, Not Done): Did you mean Callee->mayBeDerefined()?
		arsenm (Not Done): I think so? I'm not sure what the difference is from isInterposable. This also should check F.hasAddressTaken.
		if (Callee) {
		// Check if the value of the attribute is true
		arsenm (Done): std::make_pair
		if (Caller->getFnAttribute("uniform-work-group-size")
		.getValueAsString().equals("true")) {
		// Propagate the attribute to the Callee, if it does not have it
		arsenm (Not Done): I think this goes over the column limit
		if (!Callee->hasFnAttribute("uniform-work-group-size")) {
		arsenm (Done): You don't need this, which was the point of putting this into a function
		Callee->addFnAttr("uniform-work-group-size", "true");
		Changed = true;
		}
		} else {
		arsenm (Done): This shouldn't use recursion. This will fail when there's a recursive call. It also shouldn't be necessary, because callees should be called before callers in an SCC pass
		Callee->addFnAttr("uniform-work-group-size", "false");
		arsenm (Done): Typo propagte
		Changed = true;
		}
		}
		} else {
		arsenm (Done): You can just return true directly
		// If the attribute is absent, set it as false
		Caller->addFnAttr("uniform-work-group-size", "false");
		if (Callee)
		Callee->addFnAttr("uniform-work-group-size", "false");
		Changed = true;
		}
		}
		}
		return Changed;
		}
		arsenm (Done): It would be cleaner to pull all of this into a function which returns true on changed
		arsenm (Done): You can just return true

bool AMDGPUAnnotateKernelFeatures::addFeatureAttributes(Function &F) {		bool AMDGPUAnnotateKernelFeatures::addFeatureAttributes(Function &F) {
const GCNSubtarget &ST = TM->getSubtarget<GCNSubtarget>(F);		const GCNSubtarget &ST = TM->getSubtarget<GCNSubtarget>(F);
bool HasFlat = ST.hasFlatAddressSpace();		bool HasFlat = ST.hasFlatAddressSpace();
		arsenm (Done): You can just return true
bool HasApertureRegs = ST.hasApertureRegs();		bool HasApertureRegs = ST.hasApertureRegs();
SmallPtrSet<const Constant *, 8> ConstantExprVisited;		SmallPtrSet<const Constant *, 8> ConstantExprVisited;

bool Changed = false;		bool Changed = false;
bool NeedQueuePtr = false;		bool NeedQueuePtr = false;
		arsenm (Not Done): Can you avoid adding the attribute if not originally present?
		aakanksha555 (Author, Not Done): If I don't add the attribute when it is not originally present, it can create a discrepancy in certain scenarios. For example, a function is called by two kernels, one without the attribute and the other with uniform-work-group-attribute = true. The function will be set as uniform-work-group-attribute = true, which may not be the correct approach.
bool HaveCall = false;		bool HaveCall = false;
bool IsFunc = !AMDGPU::isEntryFunctionCC(F.getCallingConv());		bool IsFunc = !AMDGPU::isEntryFunctionCC(F.getCallingConv());

for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
for (Instruction &I : BB) {		for (Instruction &I : BB) {
CallSite CS(&I);		CallSite CS(&I);
if (CS) {		if (CS) {
Function *Callee = CS.getCalledFunction();		Function *Callee = CS.getCalledFunction();

// TODO: Do something with indirect calls.		// TODO: Do something with indirect calls.
if (!Callee) {		if (!Callee) {
if (!CS.isInlineAsm())		if (!CS.isInlineAsm())
HaveCall = true;		HaveCall = true;
continue;		continue;
}		}

Intrinsic::ID IID = Callee->getIntrinsicID();		Intrinsic::ID IID = Callee->getIntrinsicID();
if (IID == Intrinsic::not_intrinsic) {		if (IID == Intrinsic::not_intrinsic) {
		arsenm (Done): OS check isn't necessary
HaveCall = true;		HaveCall = true;
copyFeaturesToFunction(F, *Callee, NeedQueuePtr);		copyFeaturesToFunction(F, *Callee, NeedQueuePtr);
Changed = true;		Changed = true;
} else {		} else {
		arsenm (Not Done): I don't see how this prevents propagating the attribute if other callers do not have it
		aakanksha555 (Author, Not Done): By other callers, do you mean other kernel functions? AttrNames[] does not include "uniform-work-group-size" in the list, so it wouldn't get copied to other kernel functions from the callee function.
bool NonKernelOnly = false;		bool NonKernelOnly = false;
StringRef AttrName = intrinsicToAttrName(IID,		StringRef AttrName = intrinsicToAttrName(IID,
NonKernelOnly, NeedQueuePtr);		NonKernelOnly, NeedQueuePtr);
if (!AttrName.empty() && (IsFunc || !NonKernelOnly)) {		if (!AttrName.empty() && (IsFunc || !NonKernelOnly)) {
		arsenm (Not Done): Should only be looking for == "true"
F.addFnAttr(AttrName);		F.addFnAttr(AttrName);
Changed = true;		Changed = true;
}		}
}		}
}		}

if (NeedQueuePtr || HasApertureRegs)		if (NeedQueuePtr || HasApertureRegs)
continue;		continue;
Show All 30 Lines	if (HasFlat && !IsFunc && HaveCall) {
F.addFnAttr("amdgpu-flat-scratch");		F.addFnAttr("amdgpu-flat-scratch");
Changed = true;		Changed = true;
}		}

return Changed;		return Changed;
}		}

bool AMDGPUAnnotateKernelFeatures::runOnSCC(CallGraphSCC &SCC) {		bool AMDGPUAnnotateKernelFeatures::runOnSCC(CallGraphSCC &SCC) {
Module &M = SCC.getCallGraph().getModule();
Triple TT(M.getTargetTriple());

bool Changed = false;		bool Changed = false;

for (CallGraphNode *I : SCC) {		for (CallGraphNode *I : SCC) {
		arsenm (Done): Comment should be capitalized, have a space after // and be punctuated
		// Build a list of CallGraphNodes that is sorted by the number of uses
		arsenm (Not Done): Comment is misleading since you aren't really sorting it. Also still not clear why the number of uses matter. Is it just to find unused functions?
		aakanksha555 (Author, Not Done): I didn't need to explicitly sort it as push_back is doing it for me. The result is a list ranging from the most uses to the least. The uses help to identify the Callers and the Callees. For example, kernel functions which are only calling other functions have just one use, and that is where we need to start the attribute propagation from. This ensures that attributes get propagated through nested function calls.
		if (I->getNumReferences())
		SortedNodeList.push_back(I);
		else
		propagateUniformWorkGroupAttribute();

Function *F = I->getFunction();		Function *F = I->getFunction();
		// Add feature attributes
if (!F || F->isDeclaration())		if (!F || F->isDeclaration())
continue;		continue;

Changed |= addFeatureAttributes(*F);		Changed |= addFeatureAttributes(*F);
}		}

return Changed;		return Changed;

		arsenm (Not Done): Extra whitespace change
}		}

bool AMDGPUAnnotateKernelFeatures::doInitialization(CallGraph &CG) {		bool AMDGPUAnnotateKernelFeatures::doInitialization(CallGraph &CG) {
auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();		auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
if (!TPC)		if (!TPC)
report_fatal_error("TargetMachine is required");		report_fatal_error("TargetMachine is required");

TM = &TPC->getTM<TargetMachine>();		TM = &TPC->getTM<TargetMachine>();
return false;		return false;
}		}

Pass *llvm::createAMDGPUAnnotateKernelFeaturesPass() {		Pass *llvm::createAMDGPUAnnotateKernelFeaturesPass() {
return new AMDGPUAnnotateKernelFeatures();		return new AMDGPUAnnotateKernelFeatures();
}		}
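
The caller-to-callee rule in propagateUniformWorkGroupAttribute, together with the review request to avoid recursion, can be illustrated outside LLVM. The following is only a sketch under stated assumptions, not code from the patch: the Uniform enum, the CallGraph and AttrMap aliases, and the propagateUniformWorkGroupSize function are hypothetical stand-ins built on the standard library, and the worklist traversal is one possible way to handle recursive calls rather than the approach the revision finally took.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical three-state model of "uniform-work-group-size" on a function.
// Absent is 0 so that a default-constructed map entry means "no attribute".
enum class Uniform { Absent = 0, True, False };

// Toy call graph: function name -> names of the functions it calls directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;
using AttrMap = std::map<std::string, Uniform>;

// Walk the graph top-down from the kernels with an explicit worklist, so a
// recursive call in the input cannot cause unbounded recursion here.
void propagateUniformWorkGroupSize(const CallGraph &Graph, AttrMap &Attrs,
                                   const std::vector<std::string> &Kernels) {
  std::deque<std::string> Worklist(Kernels.begin(), Kernels.end());
  std::set<std::string> Visited(Kernels.begin(), Kernels.end());

  while (!Worklist.empty()) {
    const std::string Caller = Worklist.front();
    Worklist.pop_front();

    // A function without the attribute is treated as "false".
    Uniform &CallerAttr = Attrs[Caller];
    if (CallerAttr == Uniform::Absent)
      CallerAttr = Uniform::False;
    assert(CallerAttr != Uniform::Absent);

    auto It = Graph.find(Caller);
    if (It == Graph.end())
      continue; // Leaf function: nothing to propagate.

    for (const std::string &Callee : It->second) {
      Uniform &CalleeAttr = Attrs[Callee];
      bool Changed = false;
      if (CallerAttr == Uniform::True) {
        // "true" only fills in a missing attribute; it never overrides.
        if (CalleeAttr == Uniform::Absent) {
          CalleeAttr = Uniform::True;
          Changed = true;
        }
      } else if (CalleeAttr != Uniform::False) {
        // A caller that is not "true" forces the callee to "false".
        CalleeAttr = Uniform::False;
        Changed = true;
      }
      // Revisit a callee whenever its value changed so the new value reaches
      // its own callees; visit every reachable function at least once.
      const bool FirstVisit = Visited.insert(Callee).second;
      if (Changed || FirstVisit)
        Worklist.push_back(Callee);
    }
  }
}
```

In this model a value can only move from absent to "true" or "false", or from "true" to "false", so each function is re-queued a bounded number of times and the loop terminates even when a function calls itself.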

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Show First 20 Lines • Show All 674 Lines • ▼ Show 20 Lines	void AMDGPUPassConfig::addIRPasses() {
// %1 = shl %a, 2		// %1 = shl %a, 2
//		//
// but EarlyCSE can do neither of them.		// but EarlyCSE can do neither of them.
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None)
addEarlyCSEOrGVNPass();		addEarlyCSEOrGVNPass();
}		}

void AMDGPUPassConfig::addCodeGenPrepare() {		void AMDGPUPassConfig::addCodeGenPrepare() {
		if (TM->getTargetTriple().getArch() == Triple::amdgcn)
		addPass(createAMDGPUAnnotateKernelFeaturesPass());
		arsenm (Done): Extra space at end
if (TM->getTargetTriple().getArch() == Triple::amdgcn &&		if (TM->getTargetTriple().getArch() == Triple::amdgcn &&
EnableLowerKernelArguments)		EnableLowerKernelArguments)
addPass(createAMDGPULowerKernelArgumentsPass());		addPass(createAMDGPULowerKernelArgumentsPass());

TargetPassConfig::addCodeGenPrepare();		TargetPassConfig::addCodeGenPrepare();

if (EnableLoadStoreVectorizer)		if (EnableLoadStoreVectorizer)
addPass(createLoadStoreVectorizerPass());		addPass(createLoadStoreVectorizerPass());
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
bool GCNPassConfig::addPreISel() {		bool GCNPassConfig::addPreISel() {
AMDGPUPassConfig::addPreISel();		AMDGPUPassConfig::addPreISel();

if (EnableAtomicOptimizations) {		if (EnableAtomicOptimizations) {
addPass(createAMDGPUAtomicOptimizerPass());		addPass(createAMDGPUAtomicOptimizerPass());
}		}

// FIXME: We need to run a pass to propagate the attributes when calls are		// FIXME: We need to run a pass to propagate the attributes when calls are
// supported.		// supported.
addPass(createAMDGPUAnnotateKernelFeaturesPass());

		arsenm (Done): Delete
// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit		// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit
// regions formed by them.		// regions formed by them.
addPass(&AMDGPUUnifyDivergentExitNodesID);		addPass(&AMDGPUUnifyDivergentExitNodesID);
if (!LateCFGStructurize) {		if (!LateCFGStructurize) {
addPass(createStructurizeCFGPass(true)); // true -> SkipUniformRegions		addPass(createStructurizeCFGPass(true)); // true -> SkipUniformRegions
}		}
addPass(createSinkingPass());		addPass(createSinkingPass());
addPass(createAMDGPUAnnotateUniformValues());		addPass(createAMDGPUAnnotateUniformValues());
▲ Show 20 Lines • Show All 140 Lines • Show Last 20 Lines

test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll

	Show First 20 Lines • Show All 238 Lines • ▼ Show 20 Lines

	; HSA: define amdgpu_kernel void @kern_use_implicitarg_ptr() #15 {			; HSA: define amdgpu_kernel void @kern_use_implicitarg_ptr() #15 {
	define amdgpu_kernel void @kern_use_implicitarg_ptr() #1 {			define amdgpu_kernel void @kern_use_implicitarg_ptr() #1 {
	%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()			%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()
	store volatile i8 addrspace(4)* %implicitarg.ptr, i8 addrspace(4)* addrspace(1)* undef			store volatile i8 addrspace(4)* %implicitarg.ptr, i8 addrspace(4)* addrspace(1)* undef
	ret void			ret void
	}			}

	; HSA: define void @use_implicitarg_ptr() #15 {			; HSA: define void @use_implicitarg_ptr() #16 {
	define void @use_implicitarg_ptr() #1 {			define void @use_implicitarg_ptr() #1 {
	%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()			%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()
	store volatile i8 addrspace(4)* %implicitarg.ptr, i8 addrspace(4)* addrspace(1)* undef			store volatile i8 addrspace(4)* %implicitarg.ptr, i8 addrspace(4)* addrspace(1)* undef
	ret void			ret void
	}			}

	; HSA: define void @func_indirect_use_implicitarg_ptr() #15 {			; HSA: define void @func_indirect_use_implicitarg_ptr() #16 {
	define void @func_indirect_use_implicitarg_ptr() #1 {			define void @func_indirect_use_implicitarg_ptr() #1 {
	call void @use_implicitarg_ptr()			call void @use_implicitarg_ptr()
	ret void			ret void
	}			}

	; HSA: declare void @external.func() #16			; HSA: declare void @external.func() #17
	declare void @external.func() #3			declare void @external.func() #3

	; HSA: define internal void @defined.func() #16 {			; HSA: define internal void @defined.func() #17 {
	define internal void @defined.func() #3 {			define internal void @defined.func() #3 {
	ret void			ret void
	}			}

	; HSA: define void @func_call_external() #16 {			; HSA: define void @func_call_external() #17 {
	define void @func_call_external() #3 {			define void @func_call_external() #3 {
	call void @external.func()			call void @external.func()
	ret void			ret void
	}			}

	; HSA: define void @func_call_defined() #16 {			; HSA: define void @func_call_defined() #17 {
	define void @func_call_defined() #3 {			define void @func_call_defined() #3 {
	call void @defined.func()			call void @defined.func()
	ret void			ret void
	}			}

	; HSA: define void @func_call_asm() #16 {			; HSA: define void @func_call_asm() #17 {
	define void @func_call_asm() #3 {			define void @func_call_asm() #3 {
	call void asm sideeffect "", ""() #3			call void asm sideeffect "", ""() #3
	ret void			ret void
	}			}

	; HSA: define amdgpu_kernel void @kern_call_external() #17 {			; HSA: define amdgpu_kernel void @kern_call_external() #18 {
	define amdgpu_kernel void @kern_call_external() #3 {			define amdgpu_kernel void @kern_call_external() #3 {
	call void @external.func()			call void @external.func()
	ret void			ret void
	}			}

	; HSA: define amdgpu_kernel void @func_kern_defined() #17 {			; HSA: define amdgpu_kernel void @func_kern_defined() #18 {
	define amdgpu_kernel void @func_kern_defined() #3 {			define amdgpu_kernel void @func_kern_defined() #3 {
	call void @defined.func()			call void @defined.func()
	ret void			ret void
	}			}

	attributes #0 = { nounwind readnone speculatable }			attributes #0 = { nounwind readnone speculatable }
	attributes #1 = { nounwind "target-cpu"="fiji" }			attributes #1 = { nounwind "target-cpu"="fiji" }
	attributes #2 = { nounwind "target-cpu"="gfx900" }			attributes #2 = { nounwind "target-cpu"="gfx900" }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

	; HSA: attributes #0 = { nounwind readnone speculatable }			; HSA: attributes #0 = { nounwind readnone speculatable }
	; HSA: attributes #1 = { nounwind "amdgpu-work-item-id-x" "target-cpu"="fiji" }			; HSA: attributes #1 = { nounwind "amdgpu-work-item-id-x" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #2 = { nounwind "amdgpu-work-item-id-y" "target-cpu"="fiji" }			; HSA: attributes #2 = { nounwind "amdgpu-work-item-id-y" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #3 = { nounwind "amdgpu-work-item-id-z" "target-cpu"="fiji" }			; HSA: attributes #3 = { nounwind "amdgpu-work-item-id-z" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #4 = { nounwind "amdgpu-work-group-id-x" "target-cpu"="fiji" }			; HSA: attributes #4 = { nounwind "amdgpu-work-group-id-x" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #5 = { nounwind "amdgpu-work-group-id-y" "target-cpu"="fiji" }			; HSA: attributes #5 = { nounwind "amdgpu-work-group-id-y" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #6 = { nounwind "amdgpu-work-group-id-z" "target-cpu"="fiji" }			; HSA: attributes #6 = { nounwind "amdgpu-work-group-id-z" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #7 = { nounwind "amdgpu-dispatch-ptr" "target-cpu"="fiji" }			; HSA: attributes #7 = { nounwind "amdgpu-dispatch-ptr" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #8 = { nounwind "amdgpu-queue-ptr" "target-cpu"="fiji" }			; HSA: attributes #8 = { nounwind "amdgpu-queue-ptr" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #9 = { nounwind "amdgpu-dispatch-id" "target-cpu"="fiji" }			; HSA: attributes #9 = { nounwind "amdgpu-dispatch-id" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #10 = { nounwind "amdgpu-work-group-id-y" "amdgpu-work-group-id-z" "target-cpu"="fiji" }			; HSA: attributes #10 = { nounwind "amdgpu-work-group-id-y" "amdgpu-work-group-id-z" "target-cpu"="fiji" }
	; HSA: attributes #11 = { nounwind "target-cpu"="fiji" }			; HSA: attributes #11 = { nounwind "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #12 = { nounwind "target-cpu"="gfx900" }			; HSA: attributes #12 = { nounwind "target-cpu"="gfx900" "uniform-work-group-size"="false" }
	; HSA: attributes #13 = { nounwind "amdgpu-queue-ptr" "target-cpu"="gfx900" }			; HSA: attributes #13 = { nounwind "amdgpu-queue-ptr" "target-cpu"="gfx900" "uniform-work-group-size"="false" }
	; HSA: attributes #14 = { nounwind "amdgpu-kernarg-segment-ptr" "target-cpu"="fiji" }			; HSA: attributes #14 = { nounwind "amdgpu-kernarg-segment-ptr" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #15 = { nounwind "amdgpu-implicitarg-ptr" "target-cpu"="fiji" }			; HSA: attributes #15 = { nounwind "amdgpu-implicitarg-ptr" "target-cpu"="fiji" }
	; HSA: attributes #16 = { nounwind }			; HSA: attributes #16 = { nounwind "amdgpu-implicitarg-ptr" "target-cpu"="fiji" "uniform-work-group-size"="false" }
	; HSA: attributes #17 = { nounwind "amdgpu-flat-scratch" }			; HSA: attributes #17 = { nounwind "uniform-work-group-size"="false" }

test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll

This file was added.

				; RUN: opt -S -mtriple=amdgcn-amd- -amdgpu-annotate-kernel-features %s | FileCheck %s

				; Test 1
				; "If the kernel does not have the uniform-work-group-attribute, set both callee and caller as false"
				; CHECK: define void @foo() #[[FOO:[0-9]+]] {
				define void @foo() #0 {
				ret void
				}

				; CHECK: define amdgpu_kernel void @kernel1() #[[FOO]] {
				define amdgpu_kernel void @kernel1() #1 {
				call void @foo()
				ret void
				}

				attributes #0 = { "uniform-work-group-size"="true" }

				; CHECK: attributes #[[FOO]] = { "uniform-work-group-size"="false" }

test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll

This file was added.

				; RUN: opt -S -mtriple=amdgcn-amd- -amdgpu-annotate-kernel-features %s | FileCheck %s

				; Test 3
				; "Test to verify if the attribute gets propagated across nested function calls"
				; CHECK: define void @func1() #[[FUNC:[0-9]+]] {
				define void @func1() #0 {
				ret void
				}

				; CHECK: define void @func2() #[[FUNC]] {
				define void @func2() #1 {
				call void @func1()
				ret void
				}

				; CHECK: define amdgpu_kernel void @kernel3() #[[FUNC:[0-9]+]] {
				define amdgpu_kernel void @kernel3() #2 {
				call void @func2()
				ret void
				}

				attributes #2 = { "uniform-work-group-size"="true" }
				arsenm (Done): Should also test a recursive loop

				; CHECK: attributes #[[FUNC]] = { "uniform-work-group-size"="true" }

test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll

This file was added.

				; RUN: opt -S -mtriple=amdgcn-amd- -amdgpu-annotate-kernel-features %s | FileCheck %s

				; Test 5
				; "Two kernels with different values of the uniform-work-group-attribute call the same function"
				; CHECK: define void @func() #[[FUNC:[0-9]+]] {
				define void @func() #0 {
				ret void
				}

				; CHECK: define amdgpu_kernel void @kernel1() #[[KERNEL1:[0-9]+]] {
				define amdgpu_kernel void @kernel1() #1 {
				call void @func()
				ret void
				}

				; CHECK: define amdgpu_kernel void @kernel2() #[[FUNC]] {
				define amdgpu_kernel void @kernel2() #2 {
				call void @func()
				ret void
				}

				attributes #1 = { "uniform-work-group-size"="true" }

				; CHECK: attributes #[[FUNC]] = { "uniform-work-group-size"="false" }
				; CHECK: attributes #[[KERNEL1]] = { "uniform-work-group-size"="true" }
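
The scenario in this test, a function shared by one kernel that has the attribute set to "true" and another that lacks it, is also what the review question about "other callers" is asking about. The following is a minimal C++ sketch of the per-callee merge, not code from the patch: the Uniform enum and the applyCaller and resolveCallee helpers are hypothetical names, and the rule they encode is read off the diff (a "true" caller only fills in a missing attribute, any other caller forces "false").

```cpp
#include <cassert>
#include <vector>

// Hypothetical three-state model of the attribute on one function.
enum class Uniform { Absent, True, False };

// Value the callee holds after a single caller has been processed.
Uniform applyCaller(Uniform Callee, Uniform Caller) {
  if (Caller == Uniform::True)
    return Callee == Uniform::Absent ? Uniform::True : Callee;
  // The caller is "false" or has no attribute: the callee cannot be uniform.
  return Uniform::False;
}

// Fold every caller into the callee, in the order the callers are visited.
Uniform resolveCallee(Uniform Callee, const std::vector<Uniform> &Callers) {
  for (Uniform Caller : Callers)
    Callee = applyCaller(Callee, Caller);
  // Anything still unannotated ends up "false" after the pass.
  if (Callee == Uniform::Absent)
    Callee = Uniform::False;
  assert(Callee != Uniform::Absent && "every function ends up annotated");
  return Callee;
}
```

For a callee that starts without the attribute, visiting the "true" kernel first or second gives the same answer in this model: once any caller is not "true", the callee is "false" and a later "true" caller no longer changes it.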

test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll

This file was added.

				; RUN: opt -S -mtriple=amdgcn-amd- -amdgpu-annotate-kernel-features %s | FileCheck %s

				; Test 2
				; "Propagate the uniform-work-group-attribute from the kernel to callee if it doesn't have it"
				; CHECK: define void @func() #[[FUNC:[0-9]+]] {
				define void @func() #0 {
				ret void
				}

				; CHECK: define amdgpu_kernel void @kernel() #[[KERNEL:[0-9]+]] {
				define amdgpu_kernel void @kernel() #1 {
				call void @func()
				ret void
				}

				attributes #0 = { nounwind }
				attributes #1 = { "uniform-work-group-size"="false" }

				; CHECK: attributes #[[FUNC]] = { nounwind "uniform-work-group-size"="false" }
				; CHECK: attributes #[[KERNEL]] = { "uniform-work-group-size"="false" }

test/CodeGen/AMDGPU/uniform-work-group-resursion-test.ll

This file was added.

				; RUN: opt -S -mtriple=amdgcn-amd- -amdgpu-annotate-kernel-features %s | FileCheck %s
				arsenm (Done): Missing amdhsa in triple. Testname spelling resursion

				; Test 5
				; Test to ensure recursive functions exhibit proper behaviour
				; Add all numbers below n

				; CHECK: define i32 @fib(i32 %n) #[[FOO:[0-9]+]] {
				arsenm (Done): Better check variable name
				define i32 @fib(i32 %n) #0 {
				%cmp1 = icmp eq i32 %n, 0
				br i1 %cmp1, label %exit, label %cont1

				cont1:
				%cmp2 = icmp eq i32 %n, 1
				br i1 %cmp2, label %exit, label %cont2

				cont2:
				%nm1 = sub i32 %n, 1
				%fibm1 = call i32 @fib(i32 %nm1)
				%nm2 = sub i32 %n, 2
				%fibm2 = call i32 @fib(i32 %nm2)
				%retval = add i32 %fibm1, %fibm2

				ret i32 %retval

				exit:
				ret i32 1

				}

				; CHECK: define amdgpu_kernel void @kernel1(i32 addrspace(1)* %m) #[[FOO]] {
				define amdgpu_kernel void @kernel1(i32 addrspace(1)* %m) #1 {
				%r = call i32 @fib(i32 5)
				store i32 %r, i32 addrspace(1)* %m
				ret void
				}

				attributes #1 = { "uniform-work-group-size"="true" }

				; CHECK: attributes #[[FOO]] = { "uniform-work-group-size"="true" }

test/CodeGen/AMDGPU/uniform-work-group-test.ll

This file was added.

				; RUN: opt -S -mtriple=amdgcn-amd- -amdgpu-annotate-kernel-features %s | FileCheck %s

				; Test 3
				; "Test to verify if the attribute gets propagated across nested function calls"
				; CHECK: define void @func1() #[[FUNC:[0-9]+]] {
				define void @func1() #0 {
				ret void
				}
				; CHECK: define void @func4() #[[FUNC]] {
				define void @func4() #1 {
				ret void
				}

				; CHECK: define void @func2() #[[FUNC]] {
				define void @func2() #1 {
				call void @func4()
				call void @func1()
				ret void
				}

				; CHECK: define void @func3() #[[FUNC]] {
				define void @func3() #1 {
				call void @func1()
				ret void
				}

				; CHECK: define amdgpu_kernel void @kernel3() #[[FUNC:[0-9]+]] {
				define amdgpu_kernel void @kernel3() #2 {
				call void @func2()
				call void @func3()
				ret void
				}

				arsenm (Done): A testcase with the attribute on an external function might also be useful
				attributes #2 = { "uniform-work-group-size"="true" }

				; CHECK: attributes #[[FUNC]] = { "uniform-work-group-size"="true" }