Details

Reviewers

arsenm
b-sumner
aakanksha555

Commits

rG729309cc8956: [AMDGPU] Support for "uniform-work-group-size" attribute
rL348971: [AMDGPU] Support for "uniform-work-group-size" attribute

Summary

Updated the annotate-kernel-features pass to propagate the "uniform-work-group-size" attribute. The attribute propagates top down, from callers to callees. The pass maintains a list of call graph nodes in increasing order of uses (caller to callee) and, for each node, propagates the attribute from the calling function to the functions it calls. Any function that still lacks the attribute after this pass is given it with the value "false".
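
As a rough illustration of the top-down propagation described in the summary, here is a minimal, self-contained C++ sketch. It does not use LLVM at all: `ToyFunction`, `ToyModule`, `ToyEdges` and `propagateTopDown` are hypothetical names invented for this example, and the sketch assumes an acyclic call graph whose edges are listed so that every edge into a function comes before any edge out of it. The actual patch may well differ in detail.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR function: just a bag of string attributes.
struct ToyFunction {
  std::map<std::string, std::string> Attrs;
};

using ToyModule = std::map<std::string, ToyFunction>;
// (caller, callee) pairs; assumed ordered so that every edge into a function
// is listed before any edge out of it.
using ToyEdges = std::vector<std::pair<std::string, std::string>>;

static const char *const UWGS = "uniform-work-group-size";

void propagateTopDown(ToyModule &M, const ToyEdges &Edges) {
  for (const auto &E : Edges) {
    ToyFunction &Caller = M[E.first];
    ToyFunction &Callee = M[E.second];
    auto It = Caller.Attrs.find(UWGS);
    const bool CallerUniform =
        It != Caller.Attrs.end() && It->second == "true";
    if (!CallerUniform)
      Callee.Attrs[UWGS] = "false"; // Any non-uniform caller forces "false".
    else if (!Callee.Attrs.count(UWGS))
      Callee.Attrs[UWGS] = "true"; // Inherit only if nothing was set yet.
  }
  // Functions still lacking the attribute get an explicit "false".
  for (auto &Entry : M)
    if (!Entry.second.Attrs.count(UWGS))
      Entry.second.Attrs[UWGS] = "false";
}
```

In this toy model, a helper called from one kernel with the attribute set to "true" and from another kernel without it ends up "false", and so does everything the helper calls.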

Diff Detail

Event Timeline

aakanksha555 created this revision. Aug 2 2018, 1:26 PM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Aug 2 2018, 1:26 PM

arsenm requested changes to this revision. Aug 2 2018, 1:29 PM

arsenm added inline comments.

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
807	We cannot rely on function names, or introduce functions such as this. This is also not necessary for optimizing this.

This revision now requires changes to proceed. Aug 2 2018, 1:29 PM

A more correct way to optimize this would be to have a CallGraphSCC pass that propagates the uniform-work-group-size attribute to callees only reachable from kernels with uniform-work-group-size.
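
One way to read this suggestion, sketched in plain C++ rather than as an actual CallGraphSCC pass: a function may be treated as uniform only if it is reachable from a kernel that has the attribute and from no kernel that lacks it. The names `CallMap`, `reachableFrom` and `uniformOnly` are hypothetical, and the call graph is modelled as a simple map from a function name to the names it calls. An explicit worklist with a visited set is used so that recursive calls terminate.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph model: function name -> names of its callees.
using CallMap = std::map<std::string, std::vector<std::string>>;

// Collect every function reachable from the given roots (roots included).
// The visited set makes call cycles harmless.
std::set<std::string> reachableFrom(const CallMap &Calls,
                                    const std::vector<std::string> &Roots) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist(Roots.begin(), Roots.end());
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(F).second)
      continue; // Already visited.
    auto It = Calls.find(F);
    if (It == Calls.end())
      continue; // Leaf function: no recorded callees.
    for (const std::string &Callee : It->second)
      Worklist.push_back(Callee);
  }
  return Seen;
}

// Functions reachable from a uniform kernel and from no other kernel.
std::set<std::string>
uniformOnly(const CallMap &Calls,
            const std::vector<std::string> &UniformKernels,
            const std::vector<std::string> &OtherKernels) {
  const std::set<std::string> FromUniform = reachableFrom(Calls, UniformKernels);
  const std::set<std::string> FromOther = reachableFrom(Calls, OtherKernels);
  std::set<std::string> Result;
  for (const std::string &F : FromUniform)
    if (!FromOther.count(F))
      Result.insert(F);
  return Result;
}
```

A function shared between a uniform and a non-uniform kernel is excluded from the result, which matches the later concern in this review about not propagating the attribute when other callers do not have it.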

aakanksha555 updated this revision to Diff 164503. Sep 7 2018, 1:39 PM

aakanksha555 edited the summary of this revision.

Herald added a subscriber: jvesely. Sep 7 2018, 1:39 PM

Added the tests.

arsenm added inline comments. Sep 10 2018, 8:49 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
241 ↗	(On Diff #164518)	OS check isn't necessary
244–245 ↗	(On Diff #164518)	I don't see how this prevents propagating the attribute if other callers do not have it
247–249 ↗	(On Diff #164518)	Should only be looking for == "true"
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
671 ↗	(On Diff #164518)	Extra space at end
756 ↗	(On Diff #164518)	Delete
test/CodeGen/AMDGPU/uniform-workgroup-test1.ll
1 ↗	(On Diff #164518)	Tests are missing run lines
25–30 ↗	(On Diff #164518)	This includes attributes added by the pass

aakanksha555 marked 4 inline comments as done. Sep 11 2018, 10:00 AM

aakanksha555 added inline comments.

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
244–245 ↗	(On Diff #164518)	By other callers, do you mean other kernel functions? AttrNames[] does not include "uniform-work-group-size" in the list, so it wouldn't get copied to other kernel functions from the callee function.

Added support to ensure the attribute propagates from the caller to the function even within nested function calls.

Updated Test5

Updated the patch to prevent propagating the attribute if other callers do not have it.
Updated Test 5 to show the update.

Found 2 failing tests. Updated code to fix them.

I think you need to split this into a separate loop before the propagate attributes function instead of adding the recursive call at the same time. This is different from the other attributes because it is inferred top down. You should have a first loop over the CallGraphSCC that adds this. Since the CallGraphSCC should have all of the nodes reachable from each other, this should be some set building / checks from there. You shouldn't need to be looking at the instructions inside the functions and looking for specific call sites

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
227–228 ↗	(On Diff #166333)	You don't need to check if it has the attribute and check the value == false. Just check the value == true
238 ↗	(On Diff #166333)	This shouldn't use recursion. This will fail when there's a recursive call. It also shouldn't be necessary, because callees should be called before callers in an SCC pass
test/CodeGen/AMDGPU/uniform-work-group-test1.ll
1 ↗	(On Diff #166333)	GCN check prefix is usually used for ISA check lines. It doesn't really matter, but would require more changes if for some reason in the future an llc run line were added
4–5 ↗	(On Diff #166333)	Functions and attribute group variables could use better names (as well as the test file). It would also help to add a comment at the top of each test file for what case this is supposed to be
test/CodeGen/AMDGPU/uniform-work-group-test5.ll
1 ↗	(On Diff #166333)	Triple is broken
23 ↗	(On Diff #166333)	This is accepted by the IR parser? I would remove all of these empty attribute groups

aakanksha555 accepted this revision. Oct 30 2018, 1:04 PM

aakanksha555 marked 5 inline comments as done.

aakanksha555 updated this revision to Diff 171769. Oct 30 2018, 1:17 PM

aakanksha555 edited the summary of this revision.

arsenm added inline comments. Oct 30 2018, 2:24 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
91–93 ↗	(On Diff #171769)	No globals
229 ↗	(On Diff #171769)	I don't understand the point of anything this function is doing. You're just copying the same information that's already present in the SCC into a slightly different structure
232 ↗	(On Diff #171769)	Range loop
235 ↗	(On Diff #171769)	This leaks and should also be unnecessary
242 ↗	(On Diff #171769)	std::make_pair
251 ↗	(On Diff #171769)	Typo propagte
372 ↗	(On Diff #171769)	Comment should be capitalized, have a space after // and be punctuated
test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll
21 ↗	(On Diff #171769)	Should also test a recursive loop

aakanksha555 marked 8 inline comments as done. Nov 5 2018, 2:29 PM

Got rid of extra structures and added a recursive test.

arsenm added inline comments. Nov 6 2018, 9:10 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
49 ↗	(On Diff #172657)	llvm:: unnecessary
189 ↗	(On Diff #172657)	Random whitespace change
220 ↗	(On Diff #172657)	Space before (
221 ↗	(On Diff #172657)	You aren't using this as a stack, so this is weird. You could just do a range loop on reverse(SortedNodeList)
338 ↗	(On Diff #172657)	Comment is misleading since you aren't really sorting it. Also still not clear why the number of uses matter. Is it just to find unused functions?
351 ↗	(On Diff #172657)	Extra whitespace change
test/CodeGen/AMDGPU/uniform-work-group-resursion-test.ll
1 ↗	(On Diff #172657)	Missing amdhsa in triple. Test name spelling: resursion
7 ↗	(On Diff #172657)	Better check variable name
test/CodeGen/AMDGPU/uniform-work-group-test.ll
33 ↗	(On Diff #172657)	A testcase with the attribute on an external function might also be useful
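
The comment on line 221 above suggests a range loop over the reversed list instead of popping elements off the back. A minimal standard-library sketch of that idea follows; `visitReversed` is a hypothetical name, and plain strings stand in for call graph nodes. The point is only that the list can be walked last-to-first without being modified.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Visit the list from its last element to its first without mutating it,
// returning the order in which the elements were seen.
std::vector<std::string>
visitReversed(const std::vector<std::string> &NodeList) {
  std::vector<std::string> Order;
  for (auto It = NodeList.rbegin(), End = NodeList.rend(); It != End; ++It)
    Order.push_back(*It);
  return Order;
}
```

Unlike a pop_back loop, this leaves the list intact, so it can be traversed again later if needed.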

aakanksha555 added inline comments. Nov 6 2018, 9:26 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
338 ↗	(On Diff #172657)	I didn't need to sort it explicitly, as push_back does it for me. The result is a list ordered from the most uses to the least. The uses help to identify the callers and the callees. For example, kernel functions that only call other functions have just one use, and that is where the attribute propagation needs to start. This ensures that attributes get propagated through nested function calls.

aakanksha555 added inline comments. Nov 6 2018, 9:29 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
221 ↗	(On Diff #172657)	I used pop_back since I wanted the last element first. I'll change it to a range loop.

-Updated a test case to include an external function.
-Uploaded the correct version of the recursion test.
-Fixed other small errors.

arsenm added inline comments. Nov 14 2018, 10:04 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
226–228 ↗	(On Diff #172835)	You should just deal with this fixme and add a test for it. This should be as easy as checking Callee->mayBeRedefined?
253 ↗	(On Diff #172835)	It would be cleaner to pull all of this into a function which returns true on changed

aakanksha555 added inline comments. Nov 14 2018, 1:53 PM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
226–228 ↗	(On Diff #172835)	Did you mean Callee->mayBeDerefined() ?

Added a check for external functions and modified the test for it

arsenm added inline comments. Nov 29 2018, 11:34 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
49 ↗	(On Diff #174290)	Remove Sorted from the name now?
235 ↗	(On Diff #174290)	You don't need this, which was the point of putting this into a function
243–244 ↗	(On Diff #174290)	You can just return true directly
254 ↗	(On Diff #174290)	You can just return true
258 ↗	(On Diff #174290)	You can just return true
261–263 ↗	(On Diff #174290)	Can you avoid adding the attribute if not originally present?
226–228 ↗	(On Diff #172835)	I think so? I'm not sure what the difference is from isInterposable. This also should check F.hasAddressTaken.

aakanksha555 added inline comments. Nov 29 2018, 11:55 AM

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
261–263 ↗	(On Diff #174290)	If I don't add the attribute when it is not originally present, it can create a discrepancy in certain scenarios. For example, a function is called by two kernels, one without the attribute and the other with "uniform-work-group-size"="true". The function would be set to "uniform-work-group-size"="true", which may not be correct.
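
The two-kernel scenario above can be made concrete with a small sketch. It assumes a conservative rule, which is an interpretation of this thread rather than the exact code of the patch: a callee is "true" only when every caller is "true". `resolveCalleeValue` is a hypothetical helper, and an empty string models a caller that does not carry the attribute.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each entry is one caller's attribute value; an empty string models a
// caller without the attribute. The callee is uniform only if all callers are.
std::string resolveCalleeValue(const std::vector<std::string> &CallerValues) {
  if (CallerValues.empty())
    return "false"; // No known callers: stay conservative.
  for (const std::string &V : CallerValues)
    if (V != "true")
      return "false"; // A missing or "false" caller wins.
  return "true";
}
```

With one caller at "true" and one without the attribute, the helper returns "false", which avoids the discrepancy described in the comment.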

-Directly returning "true" from the propagateAttribute function as per the comments.
-Renamed the list to NodeList, dropped "Sorted" from the name

LGTM with the long line fixed

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
234 ↗	(On Diff #175931)	I think this goes over the column limit

This revision is now accepted and ready to land. Dec 10 2018, 12:17 PM

Closed by commit rL348971: [AMDGPU] Support for "uniform-work-group-size" attribute (authored by aakanksha555). Dec 12 2018, 12:54 PM

This revision was automatically updated to reflect the committed changes.

Hi,

This patch breaks RADV (and probably RadeonSI as well). Here's a backtrace of the problem:

$ gdb --args ./deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./deqp-vk...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/hakzsam/programming/VK-GL-CTS/build/external/vulkancts/modules/vulkan/deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Writing test log into TestResults.qpa
dEQP Core git-12aa347f43c85df3a0daf930739551d3f53d3d48 (0x12aa347f) starting..
target implementation = 'Default'
[New Thread 0x7ffff1968700 (LWP 4723)]
[Thread 0x7ffff1968700 (LWP 4723) exited]
[New Thread 0x7ffff1968700 (LWP 4724)]

Thread 1 "deqp-vk" received signal SIGSEGV, Segmentation fault.
0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
745 *List = this;
(gdb) bt
#0 0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
#1 0x00007ffff572fd87 in llvm::ValueHandleBase::ValueHandleBase (RHS=..., Kind=llvm::ValueHandleBase::WeakTracking, this=0x7fffffffbc50) at ../include/llvm/ADT/PointerIntPair.h:150
#2 llvm::WeakTrackingVH::WeakTrackingVH (RHS=..., this=0x7fffffffbc50) at ../include/llvm/IR/ValueHandle.h:187
#3 std::pair<llvm::WeakTrackingVH, llvm::CallGraphNode*>::pair (this=0x7fffffffbc50) at /usr/include/c++/8.2.1/bits/stl_pair.h:303
#4 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::processUniformWorkGroupAttribute (this=0x555557e87570) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:224
#5 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::runOnSCC (this=<optimized out>, SCC=...) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:355
#6 0x00007ffff51d3d47 in (anonymous namespace)::CGPassManager::RunPassOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CallGraphUpToDate=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., P=0x555557e87570, this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:141
#7 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:442
#8 (anonymous namespace)::CGPassManager::runOnModule (this=0x555557e87650, M=...) at ../lib/Analysis/CallGraphSCCPass.cpp:498
#9 0x00007ffff4428fba in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x555557e65c20) at ../lib/IR/LegacyPassManager.cpp:1744
#10 llvm::legacy::PassManagerImpl::run (this=0x555557e657b0, M=...) at ../lib/IR/LegacyPassManager.cpp:1857
#11 0x00007ffff61fa7a6 in ac_compile_module_to_binary (p=0x555557e65750, module=module@entry=0x555557eb65a0, binary=binary@entry=0x7fffffffc080) at /home/hakzsam/install/llvm/debug/master/include/llvm/IR/Module.h:889
#12 0x00007ffff61b6e2b in radv_llvm_per_thread_info::compile_to_memory_buffer (this=<optimized out>, binary=0x7fffffffc080, module=0x555557eb65a0) at radv_llvm_helper.cpp:97
#13 radv_compile_to_binary (info=info@entry=0x7fffffffc050, module=module@entry=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080) at radv_llvm_helper.cpp:97
#14 0x00007ffff61b0d81 in ac_llvm_compile (ac_llvm=0x7fffffffc050, binary=0x7fffffffc080, M=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>) at radv_nir_to_llvm.c:3660
#15 ac_compile_llvm_module (ac_llvm=ac_llvm@entry=0x7fffffffc050, llvm_module=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080, config=0x7fffffffc080, config@entry=0x555557f71778, stage=MESA_SHADER_COMPUTE, options=0x7fffffffc140, shader_info=<optimized out>, shader_info=<optimized out>) at radv_nir_to_llvm.c:3684
#16 0x00007ffff61b65d0 in radv_compile_nir_shader (ac_llvm=ac_llvm@entry=0x7fffffffc050, binary=binary@entry=0x7fffffffc080, config=config@entry=0x555557f71778, shader_info=shader_info@entry=0x555557f717a0, nir=nir@entry=0x7fffffffc388, nir_count=nir_count@entry=1, options=0x7fffffffc140) at radv_nir_to_llvm.c:3808
#17 0x00007ffff61c56db in shader_variant_create (device=device@entry=0x555557e4d920, module=0x7fffffffcd40, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, stage=MESA_SHADER_COMPUTE, options=options@entry=0x7fffffffc140, gs_copy_shader=false, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:612
#18 0x00007ffff61c5b04 in radv_shader_variant_create (device=device@entry=0x555557e4d920, module=<optimized out>, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, layout=<optimized out>, key=key@entry=0x7fffffffc810, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:666
#19 0x00007ffff61b8aa6 in radv_create_shaders (pipeline=0x555557ef5ca0, device=<optimized out>, cache=0x555557e4d998, key=<optimized out>, pStages=<optimized out>, flags=<optimized out>) at radv_pipeline.c:2151
#20 0x00007ffff61bf7eb in radv_compute_pipeline_create (pPipeline=0x555557e4ef70, pAllocator=<optimized out>, pCreateInfo=0x7fffffffcbd0, _cache=<optimized out>, _device=0x555557e4d920) at radv_pipeline.c:3787
#21 radv_CreateComputePipelines (_device=_device@entry=0x555557e4d920, pipelineCache=pipelineCache@entry=0x555557e4d998, count=count@entry=1, pCreateInfos=pCreateInfos@entry=0x7fffffffcbd0, pAllocator=pAllocator@entry=0x0, pPipelines=pPipelines@entry=0x555557e4ef70) at radv_pipeline.c:3817
#22 0x00007ffff619653a in radv_device_init_meta_itob_state (device=0x555557e4d920) at radv_private.h:1986
#23 radv_device_init_meta_bufimage_state (device=device@entry=0x555557e4d920) at radv_meta_bufimage.c:1489
#24 0x00007ffff6175a4a in radv_device_init_meta (device=device@entry=0x555557e4d920) at radv_meta.c:365
#25 0x00007ffff61680d0 in radv_CreateDevice (physicalDevice=0x555557d7c0e0, pCreateInfo=0x7fffffffd0d0, pAllocator=<optimized out>, pDevice=0x555557d82ec0) at radv_device.c:1702
#26 0x00007ffff640c574 in ?? () from /usr/lib/libvulkan.so.1
#27 0x00007ffff641599b in ?? () from /usr/lib/libvulkan.so.1
#28 0x00007ffff6419b29 in vkCreateDevice () from /usr/lib/libvulkan.so.1
#29 0x0000555556942f7d in vk::createDevice(vk::PlatformInterface const&, vk::VkInstance_s*, vk::InstanceInterface const&, vk::VkPhysicalDevice_s*, vk::VkDeviceCreateInfo const*, vk::VkAllocationCallbacks const*) ()
#30 0x00005555558a3384 in vkt::DefaultDevice::DefaultDevice(vk::PlatformInterface const&, tcu::CommandLine const&) ()
#31 0x00005555558a40e5 in vkt::Context::Context(tcu::TestContext&, vk::PlatformInterface const&, vk::ProgramCollection<vk::ProgramBinary, vk::BinaryBuildOptions>&) ()
#32 0x000055555588c3e2 in vkt::TestCaseExecutor::TestCaseExecutor(tcu::TestContext&) ()
#33 0x000055555588c552 in vkt::TestPackage::createExecutor() const ()
#34 0x0000555556e04964 in tcu::TestSessionExecutor::iterate() ()
#35 0x0000555556dd89a9 in tcu::App::iterate() ()
#36 0x000055555587e4e8 in main ()

Can you look into it?

Thanks!

Sorry about that. I have reverted the changes until I fix this.

Would you be able to provide LLVM IR? I am trying to reproduce this issue locally but it is taking some time as I haven't worked with RADV yet.

Thanks.

Unfortunately, I can't get any LLVM IR because it crashes too early in the process. You can also reproduce the problem with RadeonSI btw.

Thanks for the revert!

Added a fix for the failing deqp-vk test.

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Handle "uniform-work-group-size" attribute
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 158825

lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

test/CodeGen/AMDGPU/amdgpu-codegenprepare-uniform-work-group-size.ll