Avoid assert/crash during liveness calculation in situations where the
incoming machine function has statically unreachable BBs.
Fixes PR37130.
Differential D46265
StackColoring: better handling of statically unreachable code thanm on Apr 30 2018, 7:01 AM. Authored by
Details Avoid assert/crash during liveness calculation in situations where the Fixes PR37130.
Diff Detail
Event TimelineComment Actions I would welcome suggestions on how to write a regression test / lit test for this issue. The tricky part seems to be insuring that a statically unreachable BB survives all the way through the llc pipeline to make it to stack coloring. Comment Actions @thanm You can take the reproducer in the bug report, produce LLVM-IR from it (-S -emit-llvm), then run it through llc with "-stop-before stack-slot-coloring" to get MIR output. The test file would then be invoked with the same options as before except you'd add '-start-before stack-slot-coloring'. test/CodeGen/X86/branch_instruction_and_target_split_perf_nops.mir is an example of such a test. Comment Actions @thanm You can take the reproducer in the bug report, produce LLVM-IR from it (-S -emit-llvm), then run it through llc with "-stop-before stack-slot-coloring" to get MIR output. The test file would then be invoked with the same options as before except you'd add '-start-before stack-slot-coloring'.
Well, I am not having much luck creating a runnable MIR test case using this recipe. I tried: clang -S -emit-llvm ... then edited the resulting file to introduce the statically unreachable block. When I run it using llc ../llvm/test/CodeGen/X86/stack-coloring-unreachable-bb.mir -no-stack-coloring=false -start-before stack-coloring I get a fatal error in the pass manager: "Unable to schedule 'Slot index numbering' required by 'Merge disjoint stack slots'". I am probably making some sort of rookie mistake here somewhere, but I am not sure what. Comment Actions Sorry for the delay, but this change looks ok, but the supplied test case is flaky and crashes sometimes on my machine: Assertion failed: (!isa<Constant>(V) && "Can't get a constant or global slot with this!"), function getLocalSlot, file /Users/sdardis/dev/llvm/llvm/lib/IR/AsmWriter.cpp, line 1038.
0 llc 0x0000000107cfdeff llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 63 Digging in with valgrind shows several invalid reads and "Conditional jump or move depends on uninitialised value(s)". Digging in with LLDB gives differing results (somewhat expected). Can you run valgrind on the run of llc for this test case and report your findings?
Comment Actions You are right, the testcase is indeed flaky. I left out the "-start-before stack-coloring" in the llc run rule -- as a result, what's happening is that the UnreachableBlockElimLegacyPass is running (prior to stack coloring), so it deletes the unreachable BB "unreachable:". The corresponding unreachable machine BB still exists, however, and it includes a reference to the BB, which becomes stale, leading to the crash. If I add back in "-start-before stack-coloring" to the LLC command line (so as not preserve the BBs), I am back at my original problem of Unable to schedule 'Slot index numbering' required by 'Merge disjoint stack slots' for which I have no viable solution. I think what I will do is add a new knob to disable unreachable block elimination, then run llc as usual. Let me work on that.
Comment Actions Revised test case. New hidden option to turn off statically Comment Actions I've tried the revised patch, but I am still seeing flakiness when running this on my darwin machine. It appears CodeGenPrepare::eliminateMostlyEmptyBlock is leaving around stale information that is then read by the MIRPrintingPass. I have a suggestion inline, and with that, the change to UnreachableBlockElim.cpp seems unnecessary. Can you also test on your side?
Comment Actions OK, got it. Sounds like the test case still needs more work. I am traveling this week and don't have access to a darwin machine, but I should be able to test that scenario when I get back.
Will do. Comment Actions Incorporate changes suggested by Simon:
Comment Actions Thanks! Review comments and suggestions much appreciated. I will try to submit this later in the week. Comment Actions Hi Simon, Your suggested revision to the MIR testcase ("-start-before isel") doesn't seem to agree with some of the buildbots. Do you have any insight into why this might be the case? I will go ahead and revert the change now, will try to iron out the test case issues later on. Thanks, Than Comment Actions Did you land the revert? I'm seeing the same failure running tests locally as on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/4445/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3APR37310.mir If you haven't reverted yet, please do. Comment Actions Hi Than, Sorry I'm not sure what to do in that case. Locally it runs fine on my machine, and I've tested it on another system and it's ok there. You're probably safe enough to re-land the change without the test case. Thanks, Comment Actions
I agree, given the nature of the change. I'll prepare another patch. Comment Actions
OK, I finally figured out the problem. It turns out that "isel" is not (as I had naively assumed) the instruction selection pass, it's a separate pass specific to AMDGPU (confusing name, IMHO). I changed the test to use "-start-before dwarfehprepare" (a real pass that is not target-specific) and it seems to work now. I will send another patch. |