This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] lower-switch in preISel as a workaround for legacy DA
ClosedPublic

Authored by sameerds on Sep 18 2018, 1:04 AM.

Details

Summary

The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.

Diff Detail

Repository
rL LLVM

Event Timeline

sameerds created this revision.Sep 18 2018, 1:04 AM

This workaround will not longer be required with the new Divergence Analysis (https://reviews.llvm.org/D50433). I will add this test case as a unit test for the new implementation.

Good to hear the new DA can handle this. We need lowered switches for the control flow lowering anyway, though, so we may as well do it this way.

The test case is extremely mysterious to me, though. What is it that's actually being tested here? From the description I understand it's the divergence of the phi in sw.epilog? I don't see how that relates to the CHECK lines.

sameerds updated this revision to Diff 166000.Sep 18 2018, 10:35 AM

improve the lit test

The test is now beefed up to use FileCheck variables. Also added a new
comment to explain what is being checked.

This workaround will not longer be required with the new Divergence Analysis (https://reviews.llvm.org/D50433). I will add this test case as a unit test for the new implementation.

Thanks @simoll. The new unit test in D50433 correctly captures the issue.

Good to hear the new DA can handle this. We need lowered switches for the control flow lowering anyway, though, so we may as well do it this way.

The test case is extremely mysterious to me, though. What is it that's actually being tested here? From the description I understand it's the divergence of the phi in sw.epilog? I don't see how that relates to the CHECK lines.

Updated the test to better convey the intention. Does that help? Note that since this change is a backend-specific workaround, the test focuses on the symptom seen in the results of si-annotate-control-flow rather than checking the results of divergence analysis.

Good to hear the new DA can handle this. We need lowered switches for the control flow lowering anyway, though, so we may as well do it this way.

The test case is extremely mysterious to me, though. What is it that's actually being tested here? From the description I understand it's the divergence of the phi in sw.epilog? I don't see how that relates to the CHECK lines.

Updated the test to better convey the intention. Does that help? Note that since this change is a backend-specific workaround, the test focuses on the symptom seen in the results of si-annotate-control-flow rather than checking the results of divergence analysis.

@nhaehnle, is this good to go? Waiting for an LGTM ...

arsenm added inline comments.Sep 20 2018, 12:37 AM
test/CodeGen/AMDGPU/diverge-switch-default.ll
4 ↗(On Diff #166000)

Drop the triple in the file since it is redundant with the run lines

Needs a comment that it’s a workaround for DA

Yes, thank you, this is clearer. I'm fine with the change, but please do address Matt's comments.

sameerds updated this revision to Diff 166315.Sep 20 2018, 9:39 AM

addressed comments from @arsenm

sameerds marked an inline comment as done.Sep 20 2018, 9:44 AM

I changed the first line in the change description to make it explicit that this is a workaround for DA, but it is not showing up here. I just used "arc diff master" ... how do I get the web interface to show the updated description?

I changed the first line in the change description to make it explicit that this is a workaround for DA, but it is not showing up here. I just used "arc diff master" ... how do I get the web interface to show the updated description?

I'd be interested in this as well. Maybe there's an arc flag for this? In the past I always just updated manually using the web interface.

nhaehnle accepted this revision.Sep 21 2018, 1:34 AM
This revision is now accepted and ready to land.Sep 21 2018, 1:34 AM
sameerds updated this revision to Diff 166441.Sep 21 2018, 4:04 AM
sameerds retitled this revision from require lower-switch in preISel passes for AMDGPU to [AMDGPU] lower-switch in preISel as a workaround for legacy DA.

figured out how to fix the revision using "arc diff --edit"

This revision was automatically updated to reflect the committed changes.

This broke glamor with the radeonsi driver, hits unreachable():

Pass ID not registered
UNREACHABLE executed at ../lib/CodeGen/TargetPassConfig.cpp:536!

Thread 1 "X" received signal SIGABRT, Aborted.
GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0
GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff67a12f1 in GI_abort () at abort.c:79
#2 0x00007ffff0c9d3c3 in llvm::llvm_unreachable_internal (msg=<optimized out>, file=<optimized out>, line=<optimized out>) at ../lib/Support/ErrorHandling.cpp:222
#3 0x00007ffff11cbfce in llvm::TargetPassConfig::addPass (this=0x555555856560, PassID=<optimized out>, verifyAfter=true, printAfter=true) at ../lib/CodeGen/TargetPassConfig.cpp:536
#4 0x00007ffff2075582 in (anonymous namespace)::AMDGPUPassConfig::addPreISel (this=0x555555856560) at ../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:682
#5 (anonymous namespace)::GCNPassConfig::addPreISel (this=0x555555856560) at ../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:750
#6 0x00007ffff11cc4bb in llvm::TargetPassConfig::addISelPrepare (this=0x555555856560) at ../lib/CodeGen/TargetPassConfig.cpp:693
#7 0x00007ffff11cc8bf in llvm::TargetPassConfig::addISelPasses (this=0x555555856560) at ../lib/CodeGen/TargetPassConfig.cpp:773
#8 0x00007ffff0fcaa65 in addPassesToGenerateCode (TM=0x55555584a320, PM=..., DisableVerify=<optimized out>, WillCompleteCodeGenPipeline=@0x7fffffffda47: true, Out=..., MMI=0x5555558608c0) at ../lib/CodeGen/LLVMTargetMachine.cpp:113
#9 0x00007ffff0fca968 in llvm::LLVMTargetMachine::addPassesToEmitFile (this=<optimized out>, PM=..., Out=..., DwoOut=<optimized out>, FileType=llvm::TargetMachine::CGFT_ObjectFile, DisableVerify=false, MMI=0x0) at ../lib/CodeGen/LLVMTargetMachine.cpp:205
#10 0x00007ffff4501dbc in ac_create_llvm_passes () at ../../../src/amd/common/ac_llvm_helper.cpp:134
#11 0x00007ffff43f9669 in si_init_compiler (sscreen=sscreen@entry=0x5555558400a0, compiler=compiler@entry=0x5555558407b0) at ../../../../../src/gallium/drivers/radeonsi/si_pipe.c:128
#12 0x00007ffff43fb792 in radeonsi_screen_create (ws=<optimized out>, config=<optimized out>) at ../../../../../src/gallium/drivers/radeonsi/si_pipe.c:1079
#13 0x00007ffff44cdbbb in amdgpu_winsys_create (fd=fd@entry=16, config=config@entry=0x7fffffffdc18, screen_create=screen_create@entry=0x7ffff43fb200 <radeonsi_screen_create>) at ../../../../../../src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:351
#14 0x00007ffff3cb0cb2 in pipe_radeonsi_create_screen (fd=16, config=0x7fffffffdc18) at ../../../../../src/gallium/auxiliary/target-helpers/drm_helper.h:171
#15 0x00007ffff42f764d in pipe_loader_create_screen (dev=0x55555583be00) at ../../../../../src/gallium/auxiliary/pipe-loader/pipe_loader.c:137
#16 0x00007ffff4133bfe in dri2_init_screen (sPriv=0x55555583a1c0) at ../../../../../src/gallium/state_trackers/dri/dri2.c:2112
#17 0x00007ffff412e011 in driCreateNewScreen2 (scrn=0, fd=13, extensions=<optimized out>, driver_extensions=<optimized out>, driver_configs=0x5555557f8c58, data=0x5555557f8a90) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:153
#18 0x00007ffff57a556e in dri_screen_create_dri2 (dri=dri@entry=0x5555557f8a90, driver_name=<optimized out>) at ../../../src/gbm/backends/dri/gbm_dri.c:451
#19 0x00007ffff57a5922 in dri_screen_create (dri=0x5555557f8a90) at ../../../src/gbm/backends/dri/gbm_dri.c:526
#20 dri_device_create (fd=13) at ../../../src/gbm/backends/dri/gbm_dri.c:1433
#21 0x00007ffff57a31a9 in gbm_create_device (fd=13) at ../../../src/gbm/main/gbm.c:137
#22 0x00007ffff57c0b22 in AMDGPUPreInitAccel_KMS (pScrn=0x5555557f62a0) at ../../src/amdgpu_kms.c:1167
#23 AMDGPUPreInit_KMS (pScrn=0x5555557f62a0, flags=<optimized out>) at ../../src/amdgpu_kms.c:1394
#24 0x00005555555ef898 in InitOutput (pScreenInfo=pScreenInfo@entry=0x5555557b5fa0 <screenInfo>, argc=argc@entry=4, argv=argv@entry=0x7fffffffeb18) at ../../../../hw/xfree86/common/xf86Init.c:483
#25 0x00005555555b0a88 in dix_main (argc=4, argv=0x7fffffffeb18, envp=<optimized out>) at ../../dix/main.c:191
#26 0x00007ffff678cb17 in
libc_start_main (main=0x55555559a1c0 <main>, argc=4, argv=0x7fffffffeb18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffeb08) at ../csu/libc-start.c:310
#27 0x000055555559a1fa in _start ()

Sorry about that! I am reverting the change for now. Waiting for the tests
to pass locally with the change reverted.

Sameer.