This fixes the failures in PR27738, our longstanding incorrect codegen
bug. It appears that our hypothesis that we were generating control
flow that's too complicated for ptxas to analyze correctly was right.
Details
- Reviewers
tra
Diff Detail
- Build Status
Buildable 1472 Build 1472: arc lint + arc unit
Event Timeline
@majnemer points out that the structurizer has known bugs, e.g. PR25378, PR31076, PR27488.
Guess we may need to fix those before turning this on. Bummer.
You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.
I tried and it didn't make any difference in my testcases.
If we have a test where it does make a difference, I am absolutely onboard with turning it on without getting the structurizer.
Here is one:
$ cat t.ll
define void @f(i32 %x, i32 %n, i32* %p) { entry: %tmp29 = lshr i32 %x, %n %tmp3 = and i32 %tmp29, 1 %tmp4 = icmp eq i32 %tmp3, 0 br i1 %tmp4, label %bb, label %UnifiedReturnBlock bb: store volatile i32 0, i32* %p ret void UnifiedReturnBlock: ret void }
$ diff <(~/llvm/Debug+Asserts/bin/llc bt.ll -o - -mtriple nvptx64 -enable-tail-merge=true) <(~/llvm/Debug+Asserts/bin/llc bt.ll -o - -mtriple nvptx64 -enable-tail-merge=false)
31a32ret;
Here is one:
Capturing IRC conversation: I meant, a difference in the correctness of ptxas's generated code.
I am definitely willing to hear arguments that this is too conservative, it's just my instinct not to change stuff without a failing testcase.