This is an archive of the discontinued LLVM Phabricator instance.

[SWP] Recognize mem carried dep with different base
ClosedPublic

Authored by thopre on Oct 21 2022, 9:26 AM.

Details

Summary

The loop-carried dependency detection logic in isLoopCarriedDep relies
on the load and store using the same definition for the base register.
This misses the case of post-increment loads and stores whose base
register are different PHI initialized from the same initial value.

This commit extends the logic to accept the load and store having
different PHI base address provided that they had the same initial value
when entering the loop and are incremented by the same amount in each
loop.

Diff Detail

Event Timeline

thopre created this revision.Oct 21 2022, 9:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 9:26 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
thopre requested review of this revision.Oct 21 2022, 9:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 9:26 AM
thopre updated this revision to Diff 469664.Oct 21 2022, 9:41 AM

Remove IR block in test

thopre added inline comments.Oct 21 2022, 9:47 AM
llvm/test/CodeGen/Hexagon/swp-carried-dep3.mir
58

The loadrh_io + this A2_addi is meant to represent a postinc MIR load. Likewise for the S2_storeh_io and the A2_addi on the line before. This is the usecase we are facing on the Graphcore IPU.

If there's a way to reproduce the same either on Hexagon or PowerPC I'm happy to amend the testcase.

bcahoon added inline comments.Oct 23 2022, 12:48 PM
llvm/lib/CodeGen/MachinePipeliner.cpp
2301

I don't think this is needed because computeDelta performs this check. However, this isn't really related to your change...

llvm/test/CodeGen/Hexagon/swp-carried-dep3.mir
15

It looks like this is the same output without your patch. Can you add a test, or change this one, that shows the effect of the patch?

53

If the offset is changed from -8 to 0, then there is no loop carried dependence with your patch.

BTW, if you want to use post-increment instructions it would be
%19:intregs, %22:intregs = L2_loadrh_pi %21, 2 :: (load (s16))

55

The post-increment version is
%9:intregs = S2_storerh_pi %4, 2, killed %20 :: (store (s16))

thopre updated this revision to Diff 470187.Oct 24 2022, 9:31 AM
thopre marked 2 inline comments as done.

Modify tests as per suggestion and checked it shows a difference with and without diff

thopre marked an inline comment as done.Oct 24 2022, 9:32 AM
thopre added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
2301

Do you want me to remove that then?

llvm/test/CodeGen/Hexagon/swp-carried-dep3.mir
15

I had checked that the code was being hit with the earlier version of the testcase but hadn't really understood the CHECK lines. I've verify that the updated test does indeed show a difference with and without the patch.

bcahoon added inline comments.Oct 28 2022, 7:59 AM
llvm/lib/CodeGen/MachinePipeliner.cpp
2301

I don't think that needs to be done with this patch. Maybe a separate NFC patch.

llvm/test/CodeGen/Hexagon/swp-carried-dep3.mir
15

I don't think the nodeset that includes SU(4) and SU(6) is affected by by this patch?

I believe the patch eliminates Nodeset that includes SU(5) and SU(7) because the patch eliminates the need for a loop carried dependence. If so, then I think the test should check that the Nodeset with SU(5) and SU(7) is no longer created.

thopre updated this revision to Diff 471978.Oct 31 2022, 5:33 AM
thopre marked an inline comment as done.

Rewrite CHECK lines

thopre marked an inline comment as done.Oct 31 2022, 5:33 AM
thopre added inline comments.
llvm/test/CodeGen/Hexagon/swp-carried-dep3.mir
15

You're absolutely right:

--- swp-carried-dep3.withoutpatch.log   2022-10-31 11:17:38.775837761 +0000
+++ swp-carried-dep3.withpatch.log      2022-10-31 11:06:53.083501068 +0000
@@ -228,10 +228,6 @@ SU(8):   %17:intregs = M2_acci %19:intre
    SU(6) %20:intregs = A2_addi %18:intregs, 10
    SU(7) %13:intregs = S2_storerh_pi %12:intregs(tied-def 0), 2, %20:intregs :: (store (s16))
 
-  Rec NodeSet Num nodes 2 rec 0 mov 0 depth 2 col 0
-   SU(5) %19:intregs, %15:intregs = L2_loadrh_pi %14:intregs(tied-def 1), 2 :: (load (s16))
-   SU(7) %13:intregs = S2_storerh_pi %12:intregs(tied-def 0), 2, %20:intregs :: (store (s16))
-
   NodeSet Num nodes 3 rec 1 mov 0 depth 2 col 0
    SU(4) %18:intregs, %11:intregs = L2_loadrh_pi %10:intregs(tied-def 1), 2 :: (load (s16))
    SU(6) %20:intregs = A2_addi %18:intregs, 10
bcahoon accepted this revision.Nov 6 2022, 3:41 PM
This revision is now accepted and ready to land.Nov 6 2022, 3:41 PM
This revision was automatically updated to reflect the committed changes.
thopre marked an inline comment as done.
sgundapa added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
2282

Too late to comment, but I believe this if condition will mark wrong loop carried dependences.

Eg:
SU(0): %62:intregs = PHI %12:intregs, %bb.8, %85:intregs, %bb.9
SU(1): %63:intregs = L2_loadrh_io %62:intregs, 0 :: (load (s16) from %ir.arrayidx.phi)
......
......
......
SU(5): %85:intregs = S2_storerh_pi %62:intregs(tied-def 0), 8, %66:intregs :: (store (s16) into %ir.arrayidx.phi)
.......
.......
SU(10): S2_storerh_io %85:intregs, -6, %72:intregs :: (store (s16) into %ir.arrayidx.phi + 2)

SU1-SU10 will me marked as an LCD

thopre marked an inline comment as done.Dec 15 2022, 2:03 PM
thopre added inline comments.
llvm/lib/CodeGen/MachinePipeliner.cpp
2282

Indeed, I've created https://reviews.llvm.org/D140168 to fix that. Thanks!