This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Handle "uniform-work-group-size" attribute
ClosedPublic

Authored by aakanksha555 on Aug 2 2018, 1:26 PM.

Details

Summary

Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute. This attribute propagates top down from the callers till the callees. Maintained a list in the increasing order of uses (caller to callee). Propagated the uniform-work-group attribute from called functions for each node to the callees. Any functions which do not have the attribute will be added this attribute with a "false" value after this pass.

Diff Detail

Event Timeline

aakanksha555 created this revision.Aug 2 2018, 1:26 PM
arsenm requested changes to this revision.Aug 2 2018, 1:29 PM
arsenm added inline comments.
lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
807

We cannot rely on function names, or introduce functions such as this. This is also not necessary for optimizing this

This revision now requires changes to proceed.Aug 2 2018, 1:29 PM
arsenm added a comment.Aug 2 2018, 1:35 PM

A more correct way to optimize this would be to have a CallGraphSCC pass that propagates the uniform-work-group-size attribute to callees only reachable from kernels with uniform-work-group-size

aakanksha555 edited the summary of this revision. (Show Details)

Added the tests.

arsenm added inline comments.Sep 10 2018, 8:49 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
241 ↗(On Diff #164518)

OS check isn't necessary

244–245 ↗(On Diff #164518)

I don't see how this prevents propagating the attribute if other callers do not have it

247–249 ↗(On Diff #164518)

Should only be looking for == "true"

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
671 ↗(On Diff #164518)

Extra space at end

756 ↗(On Diff #164518)

Delete

test/CodeGen/AMDGPU/uniform-workgroup-test1.ll
1 ↗(On Diff #164518)

Tests are missing run lines

25–30 ↗(On Diff #164518)

This includes attributes added by the pass

aakanksha555 marked 4 inline comments as done.Sep 11 2018, 10:00 AM
aakanksha555 added inline comments.
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
244–245 ↗(On Diff #164518)

By other callers, do you mean other kernel function?
AttrNames[] does not include "uniform-work-group-size" in the list so it wouldn't get copied to other kernel functions form the callee function.

aakanksha555 marked 2 inline comments as not done.
aakanksha555 edited the summary of this revision. (Show Details)

Added support to ensure the attribute propagates from the caller to the function even within nested function calls.

aakanksha555 marked an inline comment as done.

Updated Test5

Updated the patch to prevent propagating the attribute if other callers do not have it.
Updated Test 5 to show the update.

Found 2 failing tests. Updated code to fix them.

I think you need to split this into a separate loop before the propagate attributes function instead of adding the recursive call at the same time. This is different from the other attributes because it is inferred top down. You should have a first loop over the CallGraphSCC that adds this. Since the CallGraphSCC should have all of the nodes reachable from each other, this should be some set building / checks from there. You shouldn't need to be looking at the instructions inside the functions and looking for specific call sites

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
227–228 ↗(On Diff #166333)

You don't need to check if it has the attribute and check the value == false. Just check the value == true

238 ↗(On Diff #166333)

This shouldn't use recursion. This will fail when there's a recursive call. It also shouldn't be necessary, because callees should be called before callers in an SCC pass

test/CodeGen/AMDGPU/uniform-work-group-test1.ll
1 ↗(On Diff #166333)

GCN check prefix is usually used for ISA check lines. It doesn't really matter, but would require more changes if for some reason in the future an llc run line were added

4–5 ↗(On Diff #166333)

Functions and attribute group variables could use better names (as well as the test file). It would also help to add a comment at the top of each test file for what case this is supposed to be

test/CodeGen/AMDGPU/uniform-work-group-test5.ll
1 ↗(On Diff #166333)

Triple is broken

23 ↗(On Diff #166333)

This is accepted by the IR parser? I would remove all of these empty attribute groups

aakanksha555 accepted this revision.Oct 30 2018, 1:04 PM
aakanksha555 marked 5 inline comments as done.
aakanksha555 edited the summary of this revision. (Show Details)
arsenm added inline comments.Oct 30 2018, 2:24 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
91–93 ↗(On Diff #171769)

No globals

229 ↗(On Diff #171769)

I don't understand the point of anything this function is doing. You're just copying the same information that's already present in the SCC into a slightly different structure

232 ↗(On Diff #171769)

Range loop

235 ↗(On Diff #171769)

This leaks and should also be unnecessary

242 ↗(On Diff #171769)

std::make_pair

251 ↗(On Diff #171769)

Typo propagte

372 ↗(On Diff #171769)

Comment should be capitalized, have a space after // and be punctuated

test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll
21 ↗(On Diff #171769)

Should also test a recursive loop

aakanksha555 marked 8 inline comments as done.Nov 5 2018, 2:29 PM
aakanksha555 edited the summary of this revision. (Show Details)

Got rid of extra structures and added a recursive test.

arsenm added inline comments.Nov 6 2018, 9:10 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
49 ↗(On Diff #172657)

llvm:: unnecessary

189 ↗(On Diff #172657)

Random whitespace change

220 ↗(On Diff #172657)

Space before (

221 ↗(On Diff #172657)

You aren't using this as a stack, so this is weird. You could just do a range loop on reverse(SortedNodeList)

338 ↗(On Diff #172657)

Comment is misleading since you aren't really sorting it. Also still not clear why the number of uses matter. Is it just to find unused functions?

351 ↗(On Diff #172657)

Extra whitespace change

test/CodeGen/AMDGPU/uniform-work-group-resursion-test.ll
1 ↗(On Diff #172657)

Missing amdhsa in triple

Testname spelling resursion

7 ↗(On Diff #172657)

Better check variable name

test/CodeGen/AMDGPU/uniform-work-group-test.ll
33 ↗(On Diff #172657)

A testcase with the attribute on an external function might also be useful

aakanksha555 added inline comments.Nov 6 2018, 9:26 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
338 ↗(On Diff #172657)

I didn't need to explicitly sort it as push_back is doing it for me. The result is a list ranging from the most number of uses to the least.
The uses help to identify Callers and the Callees. For eg. kernel functions which are only calling other functions have just one use.
And that is where we need to start the attribute propagation from. This ensures that attributes get propagated through nested function calls.

aakanksha555 added inline comments.Nov 6 2018, 9:29 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
221 ↗(On Diff #172657)

I used pop_back since I wanted the last element first. I'll change it to a range loop.

aakanksha555 marked 6 inline comments as done.

-Updated a test case to include an external function.
-Uploaded the correct version of the recursion test.
-Fixed other small errors.

arsenm added inline comments.Nov 14 2018, 10:04 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
226–228 ↗(On Diff #172835)

You should just deal with this fixme and add a test for it. This should be as easy as checking Callee->mayBeRedefined?

253 ↗(On Diff #172835)

It would be cleaner to pull all of this into a function which returns true on changed

aakanksha555 added inline comments.Nov 14 2018, 1:53 PM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
226–228 ↗(On Diff #172835)

Did you mean Callee->mayBeDerefined() ?

aakanksha555 marked 2 inline comments as done.

Added a check for external functions and modified the test for it

arsenm added inline comments.Nov 29 2018, 11:34 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
49 ↗(On Diff #174290)

Remove Sorted from the name now?

235 ↗(On Diff #174290)

You don't need this, which was the point of putting this into a function

243–244 ↗(On Diff #174290)

You can just return true directly

254 ↗(On Diff #174290)

You can just return true

258 ↗(On Diff #174290)

You can just return true

261–263 ↗(On Diff #174290)

Can you avoid adding the attribute if not originally present?

226–228 ↗(On Diff #172835)

I think so? I'm not sure what the difference is from isInterposable. This also should check F.hasAddressTaken.

aakanksha555 added inline comments.Nov 29 2018, 11:55 AM
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
261–263 ↗(On Diff #174290)

If I don't add the attribute if not originally present, it can create a discrepancy in certain scenarios.
For eg. A function is called by two kernels, one without the attribute and the other with uniform-work-group-attribute = true. The function will be set as uniform-work-group-attribute = true, which may not be the correct approach.

aakanksha555 marked 5 inline comments as done.

-Directly returning "true" from the propagateAttribute function as per the comments.
-Renamed the list to NodeList, dropped "Sorted" from the name

arsenm accepted this revision.Dec 10 2018, 12:17 PM

LGTM with the long line fixed

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp
234 ↗(On Diff #175931)

I think this goes over the column limit

This revision is now accepted and ready to land.Dec 10 2018, 12:17 PM
This revision was automatically updated to reflect the committed changes.

Hi,

This patch breaks RADV (and probably RadeonSI as well). Here's a backtrace of the problem:

$ gdb --args ./deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./deqp-vk...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/hakzsam/programming/VK-GL-CTS/build/external/vulkancts/modules/vulkan/deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Writing test log into TestResults.qpa
dEQP Core git-12aa347f43c85df3a0daf930739551d3f53d3d48 (0x12aa347f) starting..

target implementation = 'Default'

[New Thread 0x7ffff1968700 (LWP 4723)]
[Thread 0x7ffff1968700 (LWP 4723) exited]
[New Thread 0x7ffff1968700 (LWP 4724)]

Thread 1 "deqp-vk" received signal SIGSEGV, Segmentation fault.
0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
745 *List = this;
(gdb) bt
#0 0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
#1 0x00007ffff572fd87 in llvm::ValueHandleBase::ValueHandleBase (RHS=..., Kind=llvm::ValueHandleBase::WeakTracking, this=0x7fffffffbc50) at ../include/llvm/ADT/PointerIntPair.h:150
#2 llvm::WeakTrackingVH::WeakTrackingVH (RHS=..., this=0x7fffffffbc50) at ../include/llvm/IR/ValueHandle.h:187
#3 std::pair<llvm::WeakTrackingVH, llvm::CallGraphNode*>::pair (this=0x7fffffffbc50) at /usr/include/c++/8.2.1/bits/stl_pair.h:303
#4 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::processUniformWorkGroupAttribute (this=0x555557e87570) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:224
#5 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::runOnSCC (this=<optimized out>, SCC=...) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:355
#6 0x00007ffff51d3d47 in (anonymous namespace)::CGPassManager::RunPassOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CallGraphUpToDate=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., P=

0x555557e87570, this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:141

#7 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:442
#8 (anonymous namespace)::CGPassManager::runOnModule (this=0x555557e87650, M=...) at ../lib/Analysis/CallGraphSCCPass.cpp:498
#9 0x00007ffff4428fba in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x555557e65c20) at ../lib/IR/LegacyPassManager.cpp:1744
#10 llvm::legacy::PassManagerImpl::run (this=0x555557e657b0, M=...) at ../lib/IR/LegacyPassManager.cpp:1857
#11 0x00007ffff61fa7a6 in ac_compile_module_to_binary (p=0x555557e65750, module=module@entry=0x555557eb65a0, binary=binary@entry=0x7fffffffc080) at /home/hakzsam/install/llvm/debug/master/include/llvm/IR/Module.h:889
#12 0x00007ffff61b6e2b in radv_llvm_per_thread_info::compile_to_memory_buffer (this=<optimized out>, binary=0x7fffffffc080, module=0x555557eb65a0) at radv_llvm_helper.cpp:97
#13 radv_compile_to_binary (info=info@entry=0x7fffffffc050, module=module@entry=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080)

at radv_llvm_helper.cpp:97

#14 0x00007ffff61b0d81 in ac_llvm_compile (ac_llvm=0x7fffffffc050, binary=0x7fffffffc080, M=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>) at radv_nir_to_llvm.c:3660
#15 ac_compile_llvm_module (ac_llvm=ac_llvm@entry=0x7fffffffc050, llvm_module=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080,

config=0x7fffffffc080, config@entry=0x555557f71778, stage=MESA_SHADER_COMPUTE, options=0x7fffffffc140, shader_info=<optimized out>, shader_info=<optimized out>) at radv_nir_to_llvm.c:3684

#16 0x00007ffff61b65d0 in radv_compile_nir_shader (ac_llvm=ac_llvm@entry=0x7fffffffc050, binary=binary@entry=0x7fffffffc080, config=config@entry=0x555557f71778, shader_info=shader_info@entry=0x555557f717a0,

nir=nir@entry=0x7fffffffc388, nir_count=nir_count@entry=1, options=0x7fffffffc140) at radv_nir_to_llvm.c:3808

#17 0x00007ffff61c56db in shader_variant_create (device=device@entry=0x555557e4d920, module=0x7fffffffcd40, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, stage=MESA_SHADER_COMPUTE,

options=options@entry=0x7fffffffc140, gs_copy_shader=false, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:612

#18 0x00007ffff61c5b04 in radv_shader_variant_create (device=device@entry=0x555557e4d920, module=<optimized out>, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, layout=<optimized out>,

key=key@entry=0x7fffffffc810, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:666

#19 0x00007ffff61b8aa6 in radv_create_shaders (pipeline=0x555557ef5ca0, device=<optimized out>, cache=0x555557e4d998, key=<optimized out>, pStages=<optimized out>, flags=<optimized out>) at radv_pipeline.c:2151
#20 0x00007ffff61bf7eb in radv_compute_pipeline_create (pPipeline=0x555557e4ef70, pAllocator=<optimized out>, pCreateInfo=0x7fffffffcbd0, _cache=<optimized out>, _device=0x555557e4d920) at radv_pipeline.c:3787
#21 radv_CreateComputePipelines (_device=_device@entry=0x555557e4d920, pipelineCache=pipelineCache@entry=0x555557e4d998, count=count@entry=1, pCreateInfos=pCreateInfos@entry=0x7fffffffcbd0, pAllocator=pAllocator@entry=0x0,

pPipelines=pPipelines@entry=0x555557e4ef70) at radv_pipeline.c:3817

#22 0x00007ffff619653a in radv_device_init_meta_itob_state (device=0x555557e4d920) at radv_private.h:1986
#23 radv_device_init_meta_bufimage_state (device=device@entry=0x555557e4d920) at radv_meta_bufimage.c:1489
#24 0x00007ffff6175a4a in radv_device_init_meta (device=device@entry=0x555557e4d920) at radv_meta.c:365
#25 0x00007ffff61680d0 in radv_CreateDevice (physicalDevice=0x555557d7c0e0, pCreateInfo=0x7fffffffd0d0, pAllocator=<optimized out>, pDevice=0x555557d82ec0) at radv_device.c:1702
#26 0x00007ffff640c574 in ?? () from /usr/lib/libvulkan.so.1
#27 0x00007ffff641599b in ?? () from /usr/lib/libvulkan.so.1
#28 0x00007ffff6419b29 in vkCreateDevice () from /usr/lib/libvulkan.so.1
#29 0x0000555556942f7d in vk::createDevice(vk::PlatformInterface const&, vk::VkInstance_s*, vk::InstanceInterface const&, vk::VkPhysicalDevice_s*, vk::VkDeviceCreateInfo const*, vk::VkAllocationCallbacks const*) ()
#30 0x00005555558a3384 in vkt::DefaultDevice::DefaultDevice(vk::PlatformInterface const&, tcu::CommandLine const&) ()
#31 0x00005555558a40e5 in vkt::Context::Context(tcu::TestContext&, vk::PlatformInterface const&, vk::ProgramCollection<vk::ProgramBinary, vk::BinaryBuildOptions>&) ()
#32 0x000055555588c3e2 in vkt::TestCaseExecutor::TestCaseExecutor(tcu::TestContext&) ()
#33 0x000055555588c552 in vkt::TestPackage::createExecutor() const ()
#34 0x0000555556e04964 in tcu::TestSessionExecutor::iterate() ()
#35 0x0000555556dd89a9 in tcu::App::iterate() ()
#36 0x000055555587e4e8 in main ()

Can you look into it?

Thanks!

Hi,

This patch breaks RADV (and probably RadeonSI as well). Here's a backtrace of the problem:

$ gdb --args ./deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./deqp-vk...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/hakzsam/programming/VK-GL-CTS/build/external/vulkancts/modules/vulkan/deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Writing test log into TestResults.qpa
dEQP Core git-12aa347f43c85df3a0daf930739551d3f53d3d48 (0x12aa347f) starting..

target implementation = 'Default'

[New Thread 0x7ffff1968700 (LWP 4723)]
[Thread 0x7ffff1968700 (LWP 4723) exited]
[New Thread 0x7ffff1968700 (LWP 4724)]

Thread 1 "deqp-vk" received signal SIGSEGV, Segmentation fault.
0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
745 *List = this;
(gdb) bt
#0 0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
#1 0x00007ffff572fd87 in llvm::ValueHandleBase::ValueHandleBase (RHS=..., Kind=llvm::ValueHandleBase::WeakTracking, this=0x7fffffffbc50) at ../include/llvm/ADT/PointerIntPair.h:150
#2 llvm::WeakTrackingVH::WeakTrackingVH (RHS=..., this=0x7fffffffbc50) at ../include/llvm/IR/ValueHandle.h:187
#3 std::pair<llvm::WeakTrackingVH, llvm::CallGraphNode*>::pair (this=0x7fffffffbc50) at /usr/include/c++/8.2.1/bits/stl_pair.h:303
#4 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::processUniformWorkGroupAttribute (this=0x555557e87570) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:224
#5 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::runOnSCC (this=<optimized out>, SCC=...) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:355
#6 0x00007ffff51d3d47 in (anonymous namespace)::CGPassManager::RunPassOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CallGraphUpToDate=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., P=

0x555557e87570, this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:141

#7 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:442
#8 (anonymous namespace)::CGPassManager::runOnModule (this=0x555557e87650, M=...) at ../lib/Analysis/CallGraphSCCPass.cpp:498
#9 0x00007ffff4428fba in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x555557e65c20) at ../lib/IR/LegacyPassManager.cpp:1744
#10 llvm::legacy::PassManagerImpl::run (this=0x555557e657b0, M=...) at ../lib/IR/LegacyPassManager.cpp:1857
#11 0x00007ffff61fa7a6 in ac_compile_module_to_binary (p=0x555557e65750, module=module@entry=0x555557eb65a0, binary=binary@entry=0x7fffffffc080) at /home/hakzsam/install/llvm/debug/master/include/llvm/IR/Module.h:889
#12 0x00007ffff61b6e2b in radv_llvm_per_thread_info::compile_to_memory_buffer (this=<optimized out>, binary=0x7fffffffc080, module=0x555557eb65a0) at radv_llvm_helper.cpp:97
#13 radv_compile_to_binary (info=info@entry=0x7fffffffc050, module=module@entry=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080)

at radv_llvm_helper.cpp:97

#14 0x00007ffff61b0d81 in ac_llvm_compile (ac_llvm=0x7fffffffc050, binary=0x7fffffffc080, M=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>) at radv_nir_to_llvm.c:3660
#15 ac_compile_llvm_module (ac_llvm=ac_llvm@entry=0x7fffffffc050, llvm_module=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080,

config=0x7fffffffc080, config@entry=0x555557f71778, stage=MESA_SHADER_COMPUTE, options=0x7fffffffc140, shader_info=<optimized out>, shader_info=<optimized out>) at radv_nir_to_llvm.c:3684

#16 0x00007ffff61b65d0 in radv_compile_nir_shader (ac_llvm=ac_llvm@entry=0x7fffffffc050, binary=binary@entry=0x7fffffffc080, config=config@entry=0x555557f71778, shader_info=shader_info@entry=0x555557f717a0,

nir=nir@entry=0x7fffffffc388, nir_count=nir_count@entry=1, options=0x7fffffffc140) at radv_nir_to_llvm.c:3808

#17 0x00007ffff61c56db in shader_variant_create (device=device@entry=0x555557e4d920, module=0x7fffffffcd40, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, stage=MESA_SHADER_COMPUTE,

options=options@entry=0x7fffffffc140, gs_copy_shader=false, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:612

#18 0x00007ffff61c5b04 in radv_shader_variant_create (device=device@entry=0x555557e4d920, module=<optimized out>, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, layout=<optimized out>,

key=key@entry=0x7fffffffc810, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:666

#19 0x00007ffff61b8aa6 in radv_create_shaders (pipeline=0x555557ef5ca0, device=<optimized out>, cache=0x555557e4d998, key=<optimized out>, pStages=<optimized out>, flags=<optimized out>) at radv_pipeline.c:2151
#20 0x00007ffff61bf7eb in radv_compute_pipeline_create (pPipeline=0x555557e4ef70, pAllocator=<optimized out>, pCreateInfo=0x7fffffffcbd0, _cache=<optimized out>, _device=0x555557e4d920) at radv_pipeline.c:3787
#21 radv_CreateComputePipelines (_device=_device@entry=0x555557e4d920, pipelineCache=pipelineCache@entry=0x555557e4d998, count=count@entry=1, pCreateInfos=pCreateInfos@entry=0x7fffffffcbd0, pAllocator=pAllocator@entry=0x0,

pPipelines=pPipelines@entry=0x555557e4ef70) at radv_pipeline.c:3817

#22 0x00007ffff619653a in radv_device_init_meta_itob_state (device=0x555557e4d920) at radv_private.h:1986
#23 radv_device_init_meta_bufimage_state (device=device@entry=0x555557e4d920) at radv_meta_bufimage.c:1489
#24 0x00007ffff6175a4a in radv_device_init_meta (device=device@entry=0x555557e4d920) at radv_meta.c:365
#25 0x00007ffff61680d0 in radv_CreateDevice (physicalDevice=0x555557d7c0e0, pCreateInfo=0x7fffffffd0d0, pAllocator=<optimized out>, pDevice=0x555557d82ec0) at radv_device.c:1702
#26 0x00007ffff640c574 in ?? () from /usr/lib/libvulkan.so.1
#27 0x00007ffff641599b in ?? () from /usr/lib/libvulkan.so.1
#28 0x00007ffff6419b29 in vkCreateDevice () from /usr/lib/libvulkan.so.1
#29 0x0000555556942f7d in vk::createDevice(vk::PlatformInterface const&, vk::VkInstance_s*, vk::InstanceInterface const&, vk::VkPhysicalDevice_s*, vk::VkDeviceCreateInfo const*, vk::VkAllocationCallbacks const*) ()
#30 0x00005555558a3384 in vkt::DefaultDevice::DefaultDevice(vk::PlatformInterface const&, tcu::CommandLine const&) ()
#31 0x00005555558a40e5 in vkt::Context::Context(tcu::TestContext&, vk::PlatformInterface const&, vk::ProgramCollection<vk::ProgramBinary, vk::BinaryBuildOptions>&) ()
#32 0x000055555588c3e2 in vkt::TestCaseExecutor::TestCaseExecutor(tcu::TestContext&) ()
#33 0x000055555588c552 in vkt::TestPackage::createExecutor() const ()
#34 0x0000555556e04964 in tcu::TestSessionExecutor::iterate() ()
#35 0x0000555556dd89a9 in tcu::App::iterate() ()
#36 0x000055555587e4e8 in main ()

Can you look into it?

Thanks!

Sorry about that. I have reverted the changes until I fix this.

Hi,

This patch breaks RADV (and probably RadeonSI as well). Here's a backtrace of the problem:

$ gdb --args ./deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./deqp-vk...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/hakzsam/programming/VK-GL-CTS/build/external/vulkancts/modules/vulkan/deqp-vk --deqp-case=dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Writing test log into TestResults.qpa
dEQP Core git-12aa347f43c85df3a0daf930739551d3f53d3d48 (0x12aa347f) starting..

target implementation = 'Default'

[New Thread 0x7ffff1968700 (LWP 4723)]
[Thread 0x7ffff1968700 (LWP 4723) exited]
[New Thread 0x7ffff1968700 (LWP 4724)]

Thread 1 "deqp-vk" received signal SIGSEGV, Segmentation fault.
0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
745 *List = this;
(gdb) bt
#0 0x00007ffff447a81a in llvm::ValueHandleBase::AddToExistingUseList (this=this@entry=0x7fffffffbc50, List=0x7ffff5fdab80 <vtable for llvm::AAResults::Model<llvm::ScopedNoAliasAAResult>+16>) at ../lib/IR/Value.cpp:745
#1 0x00007ffff572fd87 in llvm::ValueHandleBase::ValueHandleBase (RHS=..., Kind=llvm::ValueHandleBase::WeakTracking, this=0x7fffffffbc50) at ../include/llvm/ADT/PointerIntPair.h:150
#2 llvm::WeakTrackingVH::WeakTrackingVH (RHS=..., this=0x7fffffffbc50) at ../include/llvm/IR/ValueHandle.h:187
#3 std::pair<llvm::WeakTrackingVH, llvm::CallGraphNode*>::pair (this=0x7fffffffbc50) at /usr/include/c++/8.2.1/bits/stl_pair.h:303
#4 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::processUniformWorkGroupAttribute (this=0x555557e87570) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:224
#5 (anonymous namespace)::AMDGPUAnnotateKernelFeatures::runOnSCC (this=<optimized out>, SCC=...) at ../lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:355
#6 0x00007ffff51d3d47 in (anonymous namespace)::CGPassManager::RunPassOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CallGraphUpToDate=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., P=

0x555557e87570, this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:141

#7 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (DevirtualizedCall=<synthetic pointer>: <optimized out>, CG=..., CurSCC=..., this=0x555557e87650) at ../lib/Analysis/CallGraphSCCPass.cpp:442
#8 (anonymous namespace)::CGPassManager::runOnModule (this=0x555557e87650, M=...) at ../lib/Analysis/CallGraphSCCPass.cpp:498
#9 0x00007ffff4428fba in (anonymous namespace)::MPPassManager::runOnModule (M=..., this=0x555557e65c20) at ../lib/IR/LegacyPassManager.cpp:1744
#10 llvm::legacy::PassManagerImpl::run (this=0x555557e657b0, M=...) at ../lib/IR/LegacyPassManager.cpp:1857
#11 0x00007ffff61fa7a6 in ac_compile_module_to_binary (p=0x555557e65750, module=module@entry=0x555557eb65a0, binary=binary@entry=0x7fffffffc080) at /home/hakzsam/install/llvm/debug/master/include/llvm/IR/Module.h:889
#12 0x00007ffff61b6e2b in radv_llvm_per_thread_info::compile_to_memory_buffer (this=<optimized out>, binary=0x7fffffffc080, module=0x555557eb65a0) at radv_llvm_helper.cpp:97
#13 radv_compile_to_binary (info=info@entry=0x7fffffffc050, module=module@entry=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080)

at radv_llvm_helper.cpp:97

#14 0x00007ffff61b0d81 in ac_llvm_compile (ac_llvm=0x7fffffffc050, binary=0x7fffffffc080, M=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>) at radv_nir_to_llvm.c:3660
#15 ac_compile_llvm_module (ac_llvm=ac_llvm@entry=0x7fffffffc050, llvm_module=0x7ffff61fa7a6 <ac_compile_module_to_binary(ac_compiler_passes*, LLVMModuleRef, ac_shader_binary*)+22>, binary=binary@entry=0x7fffffffc080,

config=0x7fffffffc080, config@entry=0x555557f71778, stage=MESA_SHADER_COMPUTE, options=0x7fffffffc140, shader_info=<optimized out>, shader_info=<optimized out>) at radv_nir_to_llvm.c:3684

#16 0x00007ffff61b65d0 in radv_compile_nir_shader (ac_llvm=ac_llvm@entry=0x7fffffffc050, binary=binary@entry=0x7fffffffc080, config=config@entry=0x555557f71778, shader_info=shader_info@entry=0x555557f717a0,

nir=nir@entry=0x7fffffffc388, nir_count=nir_count@entry=1, options=0x7fffffffc140) at radv_nir_to_llvm.c:3808

#17 0x00007ffff61c56db in shader_variant_create (device=device@entry=0x555557e4d920, module=0x7fffffffcd40, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, stage=MESA_SHADER_COMPUTE,

options=options@entry=0x7fffffffc140, gs_copy_shader=false, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:612

#18 0x00007ffff61c5b04 in radv_shader_variant_create (device=device@entry=0x555557e4d920, module=<optimized out>, shaders=shaders@entry=0x7fffffffc388, shader_count=shader_count@entry=1, layout=<optimized out>,

key=key@entry=0x7fffffffc810, code_out=0x7fffffffc3b8, code_size_out=0x7fffffffc2e4) at radv_shader.c:666

#19 0x00007ffff61b8aa6 in radv_create_shaders (pipeline=0x555557ef5ca0, device=<optimized out>, cache=0x555557e4d998, key=<optimized out>, pStages=<optimized out>, flags=<optimized out>) at radv_pipeline.c:2151
#20 0x00007ffff61bf7eb in radv_compute_pipeline_create (pPipeline=0x555557e4ef70, pAllocator=<optimized out>, pCreateInfo=0x7fffffffcbd0, _cache=<optimized out>, _device=0x555557e4d920) at radv_pipeline.c:3787
#21 radv_CreateComputePipelines (_device=_device@entry=0x555557e4d920, pipelineCache=pipelineCache@entry=0x555557e4d998, count=count@entry=1, pCreateInfos=pCreateInfos@entry=0x7fffffffcbd0, pAllocator=pAllocator@entry=0x0,

pPipelines=pPipelines@entry=0x555557e4ef70) at radv_pipeline.c:3817

#22 0x00007ffff619653a in radv_device_init_meta_itob_state (device=0x555557e4d920) at radv_private.h:1986
#23 radv_device_init_meta_bufimage_state (device=device@entry=0x555557e4d920) at radv_meta_bufimage.c:1489
#24 0x00007ffff6175a4a in radv_device_init_meta (device=device@entry=0x555557e4d920) at radv_meta.c:365
#25 0x00007ffff61680d0 in radv_CreateDevice (physicalDevice=0x555557d7c0e0, pCreateInfo=0x7fffffffd0d0, pAllocator=<optimized out>, pDevice=0x555557d82ec0) at radv_device.c:1702
#26 0x00007ffff640c574 in ?? () from /usr/lib/libvulkan.so.1
#27 0x00007ffff641599b in ?? () from /usr/lib/libvulkan.so.1
#28 0x00007ffff6419b29 in vkCreateDevice () from /usr/lib/libvulkan.so.1
#29 0x0000555556942f7d in vk::createDevice(vk::PlatformInterface const&, vk::VkInstance_s*, vk::InstanceInterface const&, vk::VkPhysicalDevice_s*, vk::VkDeviceCreateInfo const*, vk::VkAllocationCallbacks const*) ()
#30 0x00005555558a3384 in vkt::DefaultDevice::DefaultDevice(vk::PlatformInterface const&, tcu::CommandLine const&) ()
#31 0x00005555558a40e5 in vkt::Context::Context(tcu::TestContext&, vk::PlatformInterface const&, vk::ProgramCollection<vk::ProgramBinary, vk::BinaryBuildOptions>&) ()
#32 0x000055555588c3e2 in vkt::TestCaseExecutor::TestCaseExecutor(tcu::TestContext&) ()
#33 0x000055555588c552 in vkt::TestPackage::createExecutor() const ()
#34 0x0000555556e04964 in tcu::TestSessionExecutor::iterate() ()
#35 0x0000555556dd89a9 in tcu::App::iterate() ()
#36 0x000055555587e4e8 in main ()

Can you look into it?

Thanks!

Would you be able to provide LLVM IR? I am trying to reproduce this issue locally but it is taking some time as I haven't worked with RADV yet.

Thanks.

Unfortunately, I can't get any LLVM IR because it crashes too early in the process. You can also reproduce the problem with RadeonSI btw.

Thanks for the revert!

Added a fix for the failing deqp-vk test.