This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Simplification to ARMBlockPlacement Pass.
ClosedPublic

Authored by malharJ on Apr 8 2021, 2:58 AM.

Details

Summary

It simplifies the logic by moving the predecessor (preHeader or it's predecessor) above the target (or loopExit),
instead of moving the target to after the predecessor.

Since the loopExit is no longer being moved, directions of any branches within/to it are unaffected.

While the predecessor is being moved, the backwards movement simplifies some considerations,
and the only consideration now required is that a forward WLS to the predecessor should not become backwards.

Diff Detail

Event Timeline

malharJ created this revision.Apr 8 2021, 2:58 AM
malharJ requested review of this revision.Apr 8 2021, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2021, 2:58 AM
malharJ added a comment.EditedApr 8 2021, 3:18 AM

I'm putting some details of my understanding of the existing logic, and how it is being simplified by this patch:


Existing logic:

When moving the loopExit forwards (after preheader), 4 things can be considered:

  1. directionality of WLS (if any) within the loopExit should not regress (forward->backward).
  2. directionality of LE (if any) within the loopExit should not regress (backward->forward).
  3. directionality of WLS (if any) to the loopExit should not regress (forward->backward).
  4. directionality of LE (if any) to the loopExit should not regress (backward->forward).

Existing code takes care to handle cases 1) and 4).
There is no need to handle 2) and 3) since they can not regress due to moving forward.

Simplification:

But if we move preHeader backwards (before loopExit), the same 4 points above can be considered but
now in the context of moving the preHeader:

  • 1) & 2) above cannot occur since existing WLS is a terminator and there can't be another WLS/LE in the same block.
  • 3) is the only case (ie. WLS to PreHeader) that needs to be handled
  • 4) Any LE to PreHeader cannot regress if moving backwards.
malharJ edited the summary of this revision. (Show Details)Apr 8 2021, 3:30 AM
malharJ updated this revision to Diff 337105.EditedApr 13 2021, 5:03 AM

Rebased the patch on top of main branch to get the changes from D99649

malharJ added inline comments.Apr 13 2021, 5:08 AM
llvm/test/CodeGen/Thumb2/block-placement.mir
17–20

This test checked that the loopExit/target block is unaffected when moved forwards,
if it already contains a backwards WLS.

However, it has been removed since the updates in this patch imply that
the loopExit/target is no longer moved around.

22–25

This test was removed because it didn't seem correct/intent was not clear.
The two blocks containing the t2WhileLoopStartLR have exit branches to each other.

Just thought it might be better to remove it as the intent is unclear.
Otherwise, happy to leave it in if required.

@SjoerdMeijer,

Do you think you can review this patch ?
It's not really a priority (but potentially a good simplification), but since you had reviewed the original patch, I thought I'd ask you first.

malharJ edited the summary of this revision. (Show Details)Apr 13 2021, 5:18 AM

Adding @samtebbs as the original author. I need to get back into this, wrap my head around this; perhaps it's less ramping up for Sam. But I will do it anyway tomorrow.

malharJ updated this revision to Diff 337124.EditedApr 13 2021, 6:09 AM

Corrected leftover (from the rebase) error in code.

malharJ updated this revision to Diff 337420.Apr 14 2021, 5:31 AM

Updated failing tests.

Sorry for the delay.
First a few questions: just checking did you run all testing with this patch?

llvm/test/CodeGen/Thumb2/block-placement.mir
149

Looks like you're removing a few tests here. Are they not good tests to keep?

I have run (and it passes)

make/ninja check-all

if that's what you mean by all testing ?

llvm/test/CodeGen/Thumb2/block-placement.mir
149

Please check the 2 comments here: https://reviews.llvm.org/D100094#inline-945352

I have run (and it passes)

make/ninja check-all

if that's what you mean by all testing ?

I meant our downstream tests, and thus also double checking that performance is unaffected would be good too.

When comparing moving Pred before Exit, vs Exit after Pred, is either expected to be more or less efficient? Or are they both just a bit inefficient?

Please add this test case:

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -run-pass arm-block-placement %s -o - | FileCheck %s

--- |
  target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "thumbv8.1m.main-arm-none-eabi"

  define i32 @c(i32 %d, i32 %e, i32 %f) #0 {
  entry:
    %g = alloca i16, align 2
    %0 = bitcast i16* %g to i8*
    call void @llvm.lifetime.start.p0i8(i64 2, i8* nonnull %0) #5
    %cond = icmp eq i32 %e, 8
    %cmp = icmp eq i32 %d, 0
    %or.cond = and i1 %cmp, %cond
    br i1 %or.cond, label %while.cond.preheader, label %sw.epilog

  while.cond.preheader:                             ; preds = %entry
    %tobool.not16 = icmp eq i32 %f, 0
    %1 = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %f)
    %2 = extractvalue { i32, i1 } %1, 1
    %3 = extractvalue { i32, i1 } %1, 0
    br i1 %2, label %while.body.preheader, label %sw.epilog

  while.body.preheader:                             ; preds = %while.cond.preheader
    br label %while.body

  while.cond1.preheader:                            ; preds = %while.body
    %4 = icmp eq i32 %f, 0
    %5 = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %f)
    %6 = extractvalue { i32, i1 } %5, 1
    %7 = extractvalue { i32, i1 } %5, 0
    br i1 %6, label %while.body3.preheader, label %sw.epilog

  while.body3.preheader:                            ; preds = %while.cond1.preheader
    br label %while.body3

  while.body:                                       ; preds = %while.body.preheader, %while.body
    %h.017 = phi i32 [ %10, %while.body ], [ undef, %while.body.preheader ]
    %8 = phi i32 [ %3, %while.body.preheader ], [ %12, %while.body ]
    %9 = call { <8 x i16>, i32 } @llvm.arm.mve.viwdup.v8i16(i32 %h.017, i32 0, i32 1)
    %10 = extractvalue { <8 x i16>, i32 } %9, 1
    %11 = extractvalue { <8 x i16>, i32 } %9, 0
    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* nonnull %g, <8 x i16> %11, <8 x i16> undef, i32 16, i32 1)
    %12 = call i32 @llvm.loop.decrement.reg.i32(i32 %8, i32 1)
    %13 = icmp ne i32 %12, 0
    br i1 %13, label %while.body, label %while.cond1.preheader

  while.body3:                                      ; preds = %while.body3.preheader, %while.body3
    %.pn = phi { <8 x i16>, i32 } [ %15, %while.body3 ], [ %9, %while.body3.preheader ]
    %14 = phi i32 [ %7, %while.body3.preheader ], [ %17, %while.body3 ]
    %h.121 = extractvalue { <8 x i16>, i32 } %.pn, 1
    %15 = call { <8 x i16>, i32 } @llvm.arm.mve.viwdup.v8i16(i32 %h.121, i32 0, i32 2)
    %16 = extractvalue { <8 x i16>, i32 } %15, 0
    call void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16* nonnull %g, <8 x i16> %16, <8 x i16> undef, i32 16, i32 1)
    %17 = call i32 @llvm.loop.decrement.reg.i32(i32 %14, i32 1)
    %18 = icmp ne i32 %17, 0
    br i1 %18, label %while.body3, label %sw.epilog

  sw.epilog:                                        ; preds = %while.body3, %while.cond1.preheader, %while.cond.preheader, %entry
    %19 = bitcast i16* %g to i8*
    call void @llvm.lifetime.end.p0i8(i64 2, i8* nonnull %19) #5
    ret i32 undef
  }

  declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
  declare { <8 x i16>, i32 } @llvm.arm.mve.viwdup.v8i16(i32, i32, i32) #2
  declare void @llvm.arm.mve.vstr.scatter.offset.p0i16.v8i16.v8i16(i16*, <8 x i16>, <8 x i16>, i32, i32) #3
  declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
  declare { i32, i1 } @llvm.test.start.loop.iterations.i32(i32) #4
  declare i32 @llvm.loop.decrement.reg.i32(i32, i32) #4

  attributes #0 = { "target-features"="+armv8.1-m.main,+hwdiv,+mve.fp,+ras,+thumb-mode" }

...
---
name:            c
alignment:       2
tracksRegLiveness: true
liveins:
  - { reg: '$r0', virtual-reg: '' }
  - { reg: '$r1', virtual-reg: '' }
  - { reg: '$r2', virtual-reg: '' }
stack:
  - { id: 0, name: g, type: default, offset: -10, size: 2, alignment: 2,
      stack-id: default, callee-saved-register: '', callee-saved-restored: true,
      local-offset: -2, debug-info-variable: '', debug-info-expression: '',
      debug-info-location: '' }
  - { id: 1, name: '', type: spill-slot, offset: -4, size: 4, alignment: 4,
      stack-id: default, callee-saved-register: '$lr', callee-saved-restored: false,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
  - { id: 2, name: '', type: spill-slot, offset: -8, size: 4, alignment: 4,
      stack-id: default, callee-saved-register: '$r7', callee-saved-restored: true,
      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
body:             |
  ; CHECK-LABEL: name: c
  ; CHECK: bb.0.entry:
  ; CHECK:   successors: %bb.1(0x55555555), %bb.2(0x2aaaaaab)
  ; CHECK:   liveins: $r0, $r1, $r2, $r7, $lr
  ; CHECK:   frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
  ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 8
  ; CHECK:   frame-setup CFI_INSTRUCTION offset $lr, -4
  ; CHECK:   frame-setup CFI_INSTRUCTION offset $r7, -8
  ; CHECK:   $sp = frame-setup tSUBspi $sp, 1, 14 /* CC::al */, $noreg
  ; CHECK:   frame-setup CFI_INSTRUCTION def_cfa_offset 12
  ; CHECK:   tCMPi8 killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr
  ; CHECK:   t2IT 0, 8, implicit-def $itstate
  ; CHECK:   tCMPi8 killed renamable $r1, 8, 0 /* CC::eq */, killed $cpsr, implicit-def $cpsr, implicit killed $itstate
  ; CHECK:   t2Bcc %bb.2, 0 /* CC::eq */, killed $cpsr
  ; CHECK:   tB %bb.1, 14 /* CC::al */, $noreg
  ; CHECK: bb.5.while.cond1.preheader:
  ; CHECK:   successors: %bb.6(0x40000000), %bb.1(0x40000000)
  ; CHECK:   liveins: $r0, $r1, $r2
  ; CHECK:   renamable $lr = t2WhileLoopStartLR killed renamable $r2, %bb.1, implicit-def dead $cpsr
  ; CHECK:   t2B %bb.6, 14 /* CC::al */, $noreg
  ; CHECK: bb.2.while.cond.preheader:
  ; CHECK:   successors: %bb.3(0x40000000), %bb.1(0x40000000)
  ; CHECK:   liveins: $r2
  ; CHECK:   renamable $lr = t2WhileLoopStartLR renamable $r2, %bb.1, implicit-def dead $cpsr
  ; CHECK:   t2B %bb.3, 14 /* CC::al */, $noreg
  ; CHECK: bb.1.sw.epilog:
  ; CHECK:   $sp = frame-destroy tADDspi $sp, 1, 14 /* CC::al */, $noreg
  ; CHECK:   frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc, implicit undef $r0
  ; CHECK: bb.3.while.body.preheader:
  ; CHECK:   successors: %bb.4(0x80000000)
  ; CHECK:   liveins: $lr, $r2
  ; CHECK:   renamable $r3, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg
  ; CHECK:   renamable $r1 = t2ADDri $sp, 2, 14 /* CC::al */, $noreg, $noreg
  ; CHECK:   renamable $r0 = IMPLICIT_DEF
  ; CHECK: bb.4.while.body (align 4):
  ; CHECK:   successors: %bb.4(0x7c000000), %bb.5(0x04000000)
  ; CHECK:   liveins: $lr, $r0, $r1, $r2, $r3
  ; CHECK:   renamable $q0, renamable $r0 = MVE_VIWDUPu16 killed renamable $r0, renamable $r3, 1, 0, $noreg, undef renamable $q0
  ; CHECK:   MVE_VSTRH16_rq undef renamable $q0, renamable $r1, killed renamable $q0, 0, $noreg
  ; CHECK:   renamable $lr = t2LoopEndDec killed renamable $lr, %bb.4, implicit-def dead $cpsr
  ; CHECK:   t2B %bb.5, 14 /* CC::al */, $noreg
  ; CHECK: bb.6.while.body3.preheader:
  ; CHECK:   successors: %bb.7(0x80000000)
  ; CHECK:   liveins: $lr, $r0, $r1
  ; CHECK:   renamable $r3, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg
  ; CHECK: bb.7.while.body3 (align 4):
  ; CHECK:   successors: %bb.7(0x7c000000), %bb.1(0x04000000)
  ; CHECK:   liveins: $lr, $r0, $r1, $r3
  ; CHECK:   renamable $q0, renamable $r0 = MVE_VIWDUPu16 killed renamable $r0, renamable $r3, 2, 0, $noreg, undef renamable $q0
  ; CHECK:   MVE_VSTRH16_rq undef renamable $q0, renamable $r1, killed renamable $q0, 0, $noreg
  ; CHECK:   renamable $lr = t2LoopEndDec killed renamable $lr, %bb.7, implicit-def dead $cpsr
  ; CHECK:   t2B %bb.1, 14 /* CC::al */, $noreg

  bb.0.entry:
    successors: %bb.7(0x80000000), %bb.1(0x40000000)
    liveins: $r0, $r1, $r2, $r7, $lr

    frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r7, killed $lr, implicit-def $sp, implicit $sp
    frame-setup CFI_INSTRUCTION def_cfa_offset 8
    frame-setup CFI_INSTRUCTION offset $lr, -4
    frame-setup CFI_INSTRUCTION offset $r7, -8
    $sp = frame-setup tSUBspi $sp, 1, 14 /* CC::al */, $noreg
    frame-setup CFI_INSTRUCTION def_cfa_offset 12
    tCMPi8 killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr
    t2IT 0, 8, implicit-def $itstate
    tCMPi8 killed renamable $r1, 8, 0 /* CC::eq */, killed $cpsr, implicit-def $cpsr, implicit killed $itstate
    t2Bcc %bb.1, 0 /* CC::eq */, killed $cpsr

  bb.7.sw.epilog:
    $sp = frame-destroy tADDspi $sp, 1, 14 /* CC::al */, $noreg
    frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r7, def $pc, implicit undef $r0

  bb.1.while.cond.preheader:
    successors: %bb.3(0x40000000), %bb.7(0x40000000)
    liveins: $r2

    renamable $lr = t2WhileLoopStartLR renamable $r2, %bb.7, implicit-def dead $cpsr
    t2B %bb.3, 14 /* CC::al */, $noreg

  bb.3.while.body.preheader:
    successors: %bb.4(0x80000000)
    liveins: $lr, $r2

    renamable $r3, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg
    renamable $r1 = t2ADDri $sp, 2, 14 /* CC::al */, $noreg, $noreg
    renamable $r0 = IMPLICIT_DEF

  bb.4.while.body (align 4):
    successors: %bb.4(0x7c000000), %bb.2(0x04000000)
    liveins: $lr, $r0, $r1, $r2, $r3

    renamable $q0, renamable $r0 = MVE_VIWDUPu16 killed renamable $r0, renamable $r3, 1, 0, $noreg, undef renamable $q0
    MVE_VSTRH16_rq undef renamable $q0, renamable $r1, killed renamable $q0, 0, $noreg
    renamable $lr = t2LoopEndDec killed renamable $lr, %bb.4, implicit-def dead $cpsr
    t2B %bb.2, 14 /* CC::al */, $noreg

  bb.2.while.cond1.preheader:
    successors: %bb.5(0x40000000), %bb.7(0x40000000)
    liveins: $r0, $r1, $r2

    renamable $lr = t2WhileLoopStartLR killed renamable $r2, %bb.7, implicit-def dead $cpsr
    t2B %bb.5, 14 /* CC::al */, $noreg

  bb.5.while.body3.preheader:
    successors: %bb.6(0x80000000)
    liveins: $lr, $r0, $r1

    renamable $r3, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg

  bb.6.while.body3 (align 4):
    successors: %bb.6(0x7c000000), %bb.7(0x04000000)
    liveins: $lr, $r0, $r1, $r3

    renamable $q0, renamable $r0 = MVE_VIWDUPu16 killed renamable $r0, renamable $r3, 2, 0, $noreg, undef renamable $q0
    MVE_VSTRH16_rq undef renamable $q0, renamable $r1, killed renamable $q0, 0, $noreg
    renamable $lr = t2LoopEndDec killed renamable $lr, %bb.6, implicit-def dead $cpsr
    t2B %bb.7, 14 /* CC::al */, $noreg

...
llvm/lib/Target/ARM/ARMBlockPlacement.cpp
225–227

I think it's worth just renumbering all nodes and recalculating all offsets.

malharJ updated this revision to Diff 341453.Apr 29 2021, 3:06 AM

Addressed review comments:

  • Added a test for multiple predecessors.
  • Updated code to renumber all blocks + compute offset of all blocks.
malharJ marked an inline comment as done.Apr 29 2021, 3:08 AM

@SjoerdMeijer and @dmgreen , I can confirm the performance of downstream tests is unaffected by this patch.

malharJ retitled this revision from [ARM] This patch adds some simplifications to ARMBlockPlacement Pass. to [ARM] Simplification to ARMBlockPlacement Pass..Apr 29 2021, 4:03 AM
samtebbs added inline comments.Apr 29 2021, 8:16 AM
llvm/test/CodeGen/Thumb2/block-placement.mir
22–25

Sorry for not looking at this sooner. The backwards_branch_sibling test made sure that if a backwards WLS (bb3 -> bb1) in this case was introduced by moving the loop exit (bb1) then no transformation would occur. It would still be useful to include this but adapt it to make sure that no moving happens if it would create a forwards LE.

samtebbs added inline comments.Apr 29 2021, 8:17 AM
llvm/test/CodeGen/Thumb2/block-placement.mir
22–25

I meant to write "backwards WLS (bb3 -> bb1 in this case) was introduced..."!

malharJ updated this revision to Diff 342662.May 4 2021, 12:35 AM

Updated tests:

  • Added a test that checks if Predecessor is moved backwards if moving it does not convert a forward WLS into backwards WLS.
  • simplified the test file a bit further.
llvm/test/CodeGen/Thumb2/block-placement.mir
17–20

Instead of removing this, I've now replaced this with an equivalent test: backwards_branch_forwards_wls.

(The naming is as such because there is already another test called backwards_branch_backwards_wls, which
tests the opposite scenario)

22–25

Thanks for explaining this, but there are a few things I'd like to mention:

  1. I've added a test (for checking that backwards WLS is not introduced): backwards_branch_target_already_backwards.

(It was essentially a replacement for the older test named backwards_branch_forwards_le)

So would it be ok to not include backwards_branch_sibling test since it's already being tested ?

It would still be useful to include this but adapt it to make sure that no moving happens if it would create a forwards LE.

With the new logic, a forward LE cannot be introduced since only the PreHeader is being moved backwards.
Any LE to it will remain backwards, and there is no chance of it containing a LE (since it already contains WLS which is a terminator) unless Im mistaken there.

dmgreen accepted this revision.May 4 2021, 9:34 AM

Thanks for the tests. In the interest of fixing the bug that this fixes, LGTM.

llvm/lib/Target/ARM/ARMBlockPlacement.cpp
130

Can you format this line so it flows better.

llvm/test/CodeGen/Thumb2/block-placement.mir
195

Should this be bb.1 and bb.4?

This revision is now accepted and ready to land.May 4 2021, 9:34 AM
malharJ updated this revision to Diff 342952.May 4 2021, 11:20 PM
malharJ marked 2 inline comments as done.

Minor updates: formatting + test

dmgreen accepted this revision.May 5 2021, 3:01 AM

Thanks

llvm/lib/Target/ARM/ARMBlockPlacement.cpp
130

I guess it might be too long to format nicely.

This revision was landed with ongoing or failed builds.May 5 2021, 5:20 PM
This revision was automatically updated to reflect the committed changes.