This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Structurize the NVPTX CFG.
Changes PlannedPublic

Authored by jlebar on Nov 21 2016, 4:30 PM.

Details

Reviewers
tra
Summary

This fixes the failures in PR27738, our longstanding incorrect codegen
bug. It appears that our hypothesis that we were generating control
flow that's too complicated for ptxas to analyze correctly was right.

Event Timeline

jlebar updated this revision to Diff 78801.Nov 21 2016, 4:30 PM
jlebar retitled this revision from to [NVPTX] Structurize the NVPTX CFG..
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar updated this object.Nov 21 2016, 4:32 PM
jlebar updated this object.
jlebar added subscribers: jingyue, majnemer.

@majnemer points out that the structurizer has known bugs, e.g. PR25378, PR31076, PR27488.

Guess we may need to fix those before turning this on. Bummer.

tra edited edge metadata.Nov 21 2016, 4:44 PM

Bummer, indeed. But we do see the light at the end of this particular tunnel. :-)

You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.

You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.

I tried and it didn't make any difference in my testcases.

If we have a test where it does make a difference, I am absolutely onboard with turning it on without getting the structurizer.

You can do the setRequiresStructuredCFG(true); independently of enabling the structurizer.

I tried and it didn't make any difference in my testcases.

If we have a test where it does make a difference, I am absolutely onboard with turning it on without getting the structurizer.

Here is one:

$ cat t.ll

define void @f(i32 %x, i32 %n, i32* %p) {
entry:
  %tmp29 = lshr i32 %x, %n
  %tmp3 = and i32 %tmp29, 1
  %tmp4 = icmp eq i32 %tmp3, 0
  br i1 %tmp4, label %bb, label %UnifiedReturnBlock

bb:
  store volatile i32 0, i32* %p
  ret void

UnifiedReturnBlock:
  ret void
}

$ diff <(~/llvm/Debug+Asserts/bin/llc bt.ll -o - -mtriple nvptx64 -enable-tail-merge=true) <(~/llvm/Debug+Asserts/bin/llc bt.ll -o - -mtriple nvptx64 -enable-tail-merge=false)
31a32

ret;

Here is one:

Capturing IRC conversation: I meant, a difference in the correctness of ptxas's generated code.

I am definitely willing to hear arguments that this is too conservative, it's just my instinct not to change stuff without a failing testcase.

jlebar planned changes to this revision.Jan 16 2017, 11:33 PM

Waiting for a more correct structurizer before we turn this on.