Tail merge was making the assumption that a layout successor or
predecessor was always a cfg successor/predecessor. Remove that
assumption. Changes to tests are necessary because the errant cfg edges
were preventing optimizations.
Details
Diff Detail
Event Timeline
The following 4 tests don't have an obvious fix with this test. I was hoping to get some help by the test authors.
LLVM :: CodeGen/Thumb2/v8_IT_5.ll LLVM :: CodeGen/WebAssembly/cfg-stackify.ll LLVM :: CodeGen/X86/licm-dominance.ll LLVM :: CodeGen/X86/wineh-coreclr.ll
licm-dominance is a real pain because the undefineds get loop-optimized away and invalidate the test assumptions. It looks like it was never correct.
Feel free to disable CodeGen/WebAssembly/cfg-stackify.ll, such as by just removing the "| FileCheck %s" parts of the RUN lines, for now. I'm happy to fix the test and re-enable it myself later.
(This is a particularly hairy test. I'm hoping that MIR serialization will allow us to rewrite it in a less fragile way. But for now, it seems best to just disable it so that it doesn't block other work.)
For CodeGen/X86/licm-dominance.ll I was specifically hoping that the test could be re-reduced with this change in place. I tried to create a test that would exercise the code in question, but I don't seem to understand licm well enough.
I just noticed that test/CodeGen/WebAssembly/mem-intrinsics.ll seems to be failing as well with this change.
If you can look at that as well, it would be awesome.
test/CodeGen/WebAssembly/mem-intrinsics.ll passes for me, with this patch applied. Let me know if there's anything I can help with.
Hi Kyle,
I think FallThrough does not have to be the CFG successor of MBB. It might be easier to understand it as the potential fallthrough block of either MBB or PrevBB.
Haicheng
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
1381 | If you change FallThrough to MF.end(), I think you may miss this optimization. This piece of code tries to optimize this case PrevBB---MBB FallThrough |____________| Where MBB is the return block, PrevBB is both the layout predecessor and CFG predecessor of MBB, Fallthourgh is just a layout successor of MBB and must *not* be the CFG successor of MBB. In this case, MBB can be moved to the bottom of the MF and PrevBB can fallthrough to FallThrough. | |
1603 | Similar situation here. Fallthrough is used to iterate all blocks below MBB in the layout to find the first non-EHPad. I think changing it to MF.end() may miss this one too. |
Narrowed the check for FallThrough being an actual successor to the one place where this was assumed.
Thanks Haicheng.
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
1381 | I moved the test to the empty block removal, which does assume that FallThrough is a CFG successor. |
That only leaves licm-dominance and Thumb2/v8_IT_5.ll
The optimizer seems to be doing reasonable things in both of those cases.
I don't really want to remove tests from the regression suite, but the tests seem to be relying on bad assumptions, and I'm not the best person to fix them.
I could disable the FileCheck lines and open bugs for them.
Please see the inlined comment.
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
1258 | I think we can enter here only if MBB is empty which means MBB does not have any branch. In this case, I think MBB has nowhere to go but fallthrough to FallThrough and this check is not necessary. |
Barring comment from anyone else, I'm going to go ahead and remove
LLVM :: CodeGen/Thumb2/v8_IT_5.ll LLVM :: CodeGen/X86/licm-dominance.ll
In licm-dominance it's clear that bugpoint has gone too far and the optimizer is being perfectly reasonable,
and it looks pretty reasonable for v8_IT_5.ll as well.
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
1258 | This is actually what caused me to look into fallthrough in this file. An empty unreachable block has no successors, and replacing it with its layout successor can create invalid cfg edges. |
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
1258 | That makes sense. Do you want to add a comment to explain this? Do you also want to create a test case for this? I think you don't have any test case yet to show what you want to fix. |
AFAICT, all of the test updates are just side-effects of this patch. Therefore, I agree with Haicheng in that we need a test case that exposes the issue this patch is trying to address.
Added a test case, and another check where we weren't being explicit about cfg vs layout successor.
There's now a test for the codepath that handles empty blocks with no successors. I tested, and that seems to be the driver behind all the other test changes as well.
The two other code changes at 969 and 1270 were from looking over the file to see if there were other places where fallthrough was being assumed instead of actually tested.
I can split those out into a separate patch if you would prefer.
Hi Kyle,
Do you have test cases for your change in line 969 and 1270?
Haicheng
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
969 | My understanding is that PredBB is just the layout predecessor of MBB. The CFG predecessors of MBB are stored in MergePotentials and they are compared with PredBB to see if any of them is also the layout predecessor. I know the debug info is misleading.... Do you have a test case that the current code miscompiles? | |
1270 | Is this check you added covered by the rest existing conditions? Do you have a test case for this? |
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
999 | No, there is no test case that changes, and nothing in the test suite changes hashes. However, the comments documenting the function disagree with you. See the comments on lines 796-797. The fact that nothing bad has happened so far isn't a good argument for not fixing the code to match the assumptions given in the comments. | |
1270 | You're right it's covered. Looking at AnalyzeBranch, Cond.empty && !TBB implies fallthrough. I'll remove it. |
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
999 | I need to get this in, so if you are set against this check, I can pull it out of this patch and worry about it later. |
Sorry for the late response.
Have you updated the broken test cases? I ran your change on my machine and I could pass test-sharedidx.ll always-ext.ll thumb2-ifcvt3.ll
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
999 | Comment 796-797 emphasizes *if any*. So, maybe we can worry about it later. | |
test/CodeGen/X86/tail-merge-unreachable.ll | ||
16–19 | Not directly related to your change. Do you think tail merging should merge sw.bb and sw.bb2? I know tail merging currently cannot merge empty blocks or unconditional branch only blocks. I may work on it as next. |
I updated the test cases as requested.
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
999 | That to me implies that it will be null if there is no such predecessor. |
I checked the modifications to the broken test cases.
I think you still break v8_IT_5.ll and licm-dominance.ll. What is your plan about these two?
Your modifications to missinglabel.ll and eh.ll are reasonable.
Your patch causes unnecessary branches in thumb2-cbnz.ll and br-fold.ll, but I think it is not your fault. I will try to create a patch to merge unreachable only blocks so that your change would not break these two.
lib/CodeGen/BranchFolding.cpp | ||
---|---|---|
999 | Please remove it from this patch. I think it is not closely related to the rest of the change. I understand your concern and do not have strong opinion about it. If you insist, it might be better to do it in your next patch. |
OK, the two failing tests are gone, I don't think they were valid to begin with. Are there any other concerns about this change?
In general, we don't want to reduce our test coverage. Please provide additional details as to why you believe these two tests aren't valid and why it's reasonable to delete them.
OK, the two failing tests are gone, I don't think they were valid to begin with. Are there any other concerns about this change?
In general, we want coverage by correct tests. Incorrect tests are a liability, not an asset.
OK, let's take them one at a time:
licm-dominance:
This test has NEVER been correct. I've checked the entire version history. The goal of the test is to verify that LICM checks the dominance relation to make sure a load is guaranteed to be executed. The problem with the test is that with all the undefineds, the compiler is free to make sure the load is guaranteed to be executed. The test also relies on the particular behavior of undefined as setting eax to 0
Thumb2/v8_IT_5.ll:
There are 4 other v8_it tests, but this one suffers from a similar problem as the one above. It has too many undefineds for the assumptions about the resulting code to ever be correct.
I feel I've given plenty of time for people to respond to the tests in question. On May 19, almost a month ago I said:
"""
That only leaves licm-dominance and Thumb2/v8_IT_5.ll
The optimizer seems to be doing reasonable things in both of those cases.
I don't really want to remove tests from the regression suite, but the tests seem to be relying on bad assumptions, and I'm not the best person to fix them.
I could disable the FileCheck lines and open bugs for them.
"""
and a week later on the 25th I said:
"""
Barring comment from anyone else, I'm going to go ahead and remove
LLVM :: CodeGen/Thumb2/v8_IT_5.ll
LLVM :: CodeGen/X86/licm-dominance.ll
In licm-dominance it's clear that bugpoint has gone too far and the optimizer is being perfectly reasonable,
and it looks pretty reasonable for v8_IT_5.ll as well.
"""
A comment from you then would have been very helpful.
I agree. Please understand I just want to make sure we're doing our due diligence here. I'm not trying to impeded your progress.
OK, let's take them one at a time:
licm-dominance:
This test has NEVER been correct. I've checked the entire version history. The goal of the test is to verify that LICM checks the dominance relation to make sure a load is guaranteed to be executed. The problem with the test is that with all the undefineds, the compiler is free to make sure the load is guaranteed to be executed. The test also relies on the particular behavior of undefined as setting eax to 0
After further investigation I tend to agree that bugpoint was overly aggressive and the reduced test case doesn't appear to be actually testing anything.
Thumb2/v8_IT_5.ll:
There are 4 other v8_it tests, but this one suffers from a similar problem as the one above. It has too many undefineds for the assumptions about the resulting code to ever be correct.
I think v8_IT_5.ll could have been better written, but I believe it is testing what it is intended to test. Specifically, that we can predicate a tCMPi8 instruction.
I think the critical checks are:
; CHECK: it ne
; CHECK-NEXT: cmpne
; CHECK-NEXT: bne [[JUMPTARGET:.LBB[0-9]+_[0-9]+]]
If you edit isV8EligibleForIT() in ARMFeatures.h to return false for ARM::tCMPi8 you'll break this test, which is the regression we're trying to avoid. We'll generate code like the following:
cmp r0, #3 beq .LBB0_3 cmp r0, #1 beq .LBB0_3 ...
IMO, we should figure out how to fix the test so it continues to test this behavior while passing with your patch.
I've reduced the test a little without changing the CHECKs. Let me know if this works with your patch.
; RUN: llc < %s -mtriple=thumbv8 -arm-atomic-cfg-tidy=0 | FileCheck %s ; RUN: llc < %s -mtriple=thumbv7 -arm-atomic-cfg-tidy=0 -arm-restrict-it | FileCheck %s ; CHECK: it ne ; CHECK-NEXT: cmpne ; CHECK-NEXT: bne [[JUMPTARGET:.LBB[0-9]+_[0-9]+]] ; CHECK: cbz ; CHECK-NEXT: %if.else163 ; CHECK-NEXT: mov.w ; CHECK-NEXT: b ; CHECK: [[JUMPTARGET]]:{{.*}}%if.else173 ; CHECK-NEXT: mov.w ; CHECK-NEXT: bx lr ; CHECK-NEXT: %if.else145 ; CHECK-NEXT: mov.w %struct.hc = type { i32, i32, i32, i32 } define i32 @t(i32 %type) optsize { entry: switch i32 %type, label %if.else173 [ i32 3, label %if.then115 i32 1, label %if.then102 ] if.then102: unreachable if.then115: br i1 undef, label %if.else163, label %if.else145 if.else145: %call150 = call fastcc %struct.hc* @foo(%struct.hc* undef, i32 34865152) optsize br label %while.body172 if.else163: %call168 = call fastcc %struct.hc* @foo(%struct.hc* undef, i32 34078720) optsize br label %while.body172 while.body172: br label %while.body172 if.else173: ret i32 -1 } declare hidden fastcc %struct.hc* @foo(%struct.hc* nocapture, i32) nounwind optsize
Chad
I feel I've given plenty of time for people to respond to the tests in question. On May 19, almost a month ago I said:
"""
That only leaves licm-dominance and Thumb2/v8_IT_5.llThe optimizer seems to be doing reasonable things in both of those cases.
I don't really want to remove tests from the regression suite, but the tests seem to be relying on bad assumptions, and I'm not the best person to fix them.
I could disable the FileCheck lines and open bugs for them.
"""
and a week later on the 25th I said:
"""
Barring comment from anyone else, I'm going to go ahead and removeLLVM :: CodeGen/Thumb2/v8_IT_5.ll
LLVM :: CodeGen/X86/licm-dominance.ll
In licm-dominance it's clear that bugpoint has gone too far and the optimizer is being perfectly reasonable,
and it looks pretty reasonable for v8_IT_5.ll as well.
"""
A comment from you then would have been very helpful.
I have a fixed version of v8_IT_5.ll
Your short version made it obvious that they're relying on then102 and
then115 becoming the same destination.
Maybe you can help me with licm-dominance. I can't seem to write a load in
IR that ends up being considered as invariant to save my life.
https://xkcd.com/1168/
I can tag the loads as invariant, and then the flags are correct, but for
the load to be *actually* invariant it also has to be "dereferenceable".
I'm not certain what that means or how to write it into the IR. Pointers
would be useful.
Err, are you talking about https://github.com/llvm-mirror/llvm/blob/961fcb527d3c49bfcf9d6ff212cca3dc15682dbe/lib/CodeGen/MachineLICM.cpp#L869 ? You can't write a constant pool load in IR. You can write something like store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32*> @g, and SelectionDAG will generate a constant pool load.
mcrosier, thanks for your help with the v8 test. When you shrunk it, it was obvious what was intended, and what needed to change.
As an aside, we can't use 1 and 3 any more, as the compiler gets clever with constants, oring with 2 and then checking against 3. Hence 6 and 13.
It looks good to me now. Thank you for working on this, Kyle. I will create a patch to cleanup the extra branches later.
My understanding is that PredBB is just the layout predecessor of MBB. The CFG predecessors of MBB are stored in MergePotentials and they are compared with PredBB to see if any of them is also the layout predecessor. I know the debug info is misleading....
Do you have a test case that the current code miscompiles?