Page MenuHomePhabricator

[X86][AMX] Hoist ldtilecfg
ClosedPublic

Authored by pengfei on Mar 19 2021, 11:24 PM.

Details

Summary

The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.

This patch try to calculate the nearest point where post dominats all shapes of AMX registers.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Minor fix.
Add a test to verify the sink code.

pengfei added inline comments.Fri, Apr 2, 8:28 PM
llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll
139

This seems another bug, I'll investigate it.

pengfei updated this revision to Diff 335062.Fri, Apr 2, 11:38 PM

Fix a bug found by test.

pengfei added inline comments.Fri, Apr 2, 11:40 PM
llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll
139

Fixed.

pengfei updated this revision to Diff 335078.Sat, Apr 3, 3:48 AM

Refactor.

pengfei updated this revision to Diff 335089.Sat, Apr 3, 7:03 AM

Fixed the problem when the sink need to be forked. I.e.

     +------+
     |Entry | BB0
     +------+
     /     \
+------+  +------+
|Shape1|  |Shape2| BB2
+------+  +------+
BB1  \       /
     +------+
     | AMX  | BB3
     +------+

If BB1 and BB2 don't have a call, we will try to insert ldtilecfg from BB0.
Since BB0 doesn't dominate all shapes, we will sink the insert point to its successors.
The previous code has limitation that only one of BB0's successors has NeedTileCfgLiveIn = true.
This update fixes the problem without major changes. So I amended it instead of putting into a new revision.

pengfei planned changes to this revision.Sat, Apr 3, 8:17 AM

The algorithm for updating shape postdominate BBs is buggy. For a given ShapeBB, clear all its predecessors flag is not enough since its unreachable BBs are also need to clear.
I have thought out a new method but need major refactor. Stay tuned~

pengfei updated this revision to Diff 335142.Sun, Apr 4, 5:27 AM

The algorithm for updating shape postdominate BBs is buggy. For a given ShapeBB, clear all its predecessors flag is not enough since its unreachable BBs are also need to clear.

I have thought out a new method but need major refactor. Stay tuned~

Worked it out.

pengfei updated this revision to Diff 335203.Sun, Apr 4, 10:57 PM

A litter format & comments refactor.

pengfei updated this revision to Diff 335266.Mon, Apr 5, 8:49 AM

Fix a bug in collectShapeInfo.

LuoYuanke added inline comments.Wed, Apr 7, 6:33 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
27

Do you mean all tile registers are caller saved?

364–365

Is this semicolon necessary?

367

May update the comments. All shapes are reachable.

371

Do we need to recursively handle phi instruction? In below example, record "def shape" instruction?

def shape

\
phi
  \
 phi
373

So we would try to insert ldtilecfg at the entry if there is not shape dependency issue?

378

Can we sink the ldtilecfg in below case? We need insert ldtilecfg to B4.

     Entry
   /         \
 B1         B2
call        call
   \           /
       B3
         |
    B4 shape defs
         |
      AMX
pengfei updated this revision to Diff 335975.Wed, Apr 7, 7:15 PM
pengfei marked an inline comment as done.

Address Yuanke's comments.

llvm/lib/Target/X86/X86PreTileConfig.cpp
27

No, tile registers are visible to RA, we don't need to restore them ourselves.
We are mainly doing the restore for the shape register in this pass.

364–365

Good catch. It's residue from previous lambda expression.

371

I don't think so. Each phi will be iterated once. So we just need to peek its latest level and ignore the phi in it.

s1 s2
 \ /
  p1  s3  ...
   \  /   /
    p2   p3
     \  /
      p4

For exampe, if we meet p4 firstly, we find all its defs are phis, we don't do anything. Then, we meet p2 in another iteration and record s3. Then p1 etc. The order doesn't metter, you can see all the real shape defs will be recorded without recursion.

373

Yes. Here the insert points are candidates. We will sink them based on the shape defs. We will insert at the entry if shape dependency satisfies.

378

Yes. We will sink two candidates from B1 anf B2. (If there's no call in B2, the candidates are B1 and Entry) We avoid BB being multi visited. So the second sink will stop at B3. This first sink will cross B3 and reach B4.

xiangzhangllvm added inline comments.Wed, Apr 7, 7:38 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
68

use C++ comments // between line 54,55

80

use getMI() or directly p->MI is more readable.

201

I suggest use tile operand to judge if a MI is AMX Instruction or not. and spilt the collection of shape to another special function.

366

here may be a constant imm in PHI's incomings.

371
  1. MI2->isMoveImmediate() may need to be handled, because the imm may always keep in a register after PHI.

In other wards, if AMX MI use the imm shape, they must use the register after PHI.

  1. the recursive PHI case is very rare, we can let it easy, not handle it, there is not correction problem.
xiangzhangllvm added inline comments.Wed, Apr 7, 8:47 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
375

Not sure the non-bottom BBs in loop can both with "fall through + continue".

pengfei updated this revision to Diff 335989.Wed, Apr 7, 8:55 PM
pengfei marked 2 inline comments as done.

Address Xiang's comments.

llvm/lib/Target/X86/X86PreTileConfig.cpp
201

But we still need to judge if it is PHI. I don't know if there're other exceptions.
We have split it to collectShapeInfo.

366

Seems IR parser will normalize this case, so we don't need to handle it.

371
  1. The PostRA pass can handle this case.
  2. We have covered this case here.
pengfei added inline comments.Wed, Apr 7, 8:57 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
375

non-bottom BBs can propagate until the bottom BB.

xiangzhangllvm added inline comments.Wed, Apr 7, 10:56 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
201

Yes, PHI is exception, the exceptions is very few, more better than list all the AMX opcodes.

371

I don't think "Each phi will be iterated once."

MRI->def_operands(ShapeReg) only contains the shape reg used in AMX MI

s1 s2
 \ /
  p1  s3  ...
   \  /   /
    p2   p3
     \  /
      p4
 then AMX MI use p4 as this shape,

only p4 in def_operands(ShapeReg).

371

But here missed the def shape BB, it is not meet the following logic of use "shapeBBNumber == ShapeReachedCount" to judge the BB is all shape def reachable.

   reg=imm        ...
   /     \         / 
amx          phi
              |
             BB
pengfei added inline comments.Wed, Apr 7, 11:12 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
371

You are right, I excluded phi at first. I'll refactor for it. Thanks!

xiangzhangllvm added inline comments.Wed, Apr 7, 11:28 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
373

I am a little puzzle here, The I.MBB is all shape def reachable, but how can we make sure the BBVisitedInfo[I.MBB].LastShape is shape def reachable ? They may be not same BB.

pengfei added inline comments.Wed, Apr 7, 11:50 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
373

LastShape is the last shape def in a BB. It is only used when its BB is all shape def reachable. I.e. insert ldtilecfg after it. But it can be null if the BB has no shape def, which means we can insert at the begining of the BB.

pengfei updated this revision to Diff 336036.Thu, Apr 8, 2:13 AM
pengfei marked an inline comment as done.

Address Xiang's comment.
Iterate all PHIs recursively to find all shapes.

LuoYuanke added inline comments.Thu, Apr 8, 5:23 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
368

Better to add test case to cover the logic.

373

Better to add test case for it.

pengfei updated this revision to Diff 336094.Thu, Apr 8, 7:12 AM
pengfei marked 2 inline comments as done.

Address Yuanke's comments.

Add test5 for checking shape peek's loop break.
Add test6 for checking shape reachability's loop break.

pengfei updated this revision to Diff 336610.Sat, Apr 10, 6:49 AM

Fix a silly bug that uses != in case == and a bug when records shape for phi.

This is found by work with PostRA implementation. This should be found by case amx-ldtilecfg-insert.ll:, but happened to get expected output with these 2 bugs.

pengfei updated this revision to Diff 336636.Sat, Apr 10, 8:01 PM

Fix crush when build with -g.

pengfei updated this revision to Diff 336638.Sat, Apr 10, 8:24 PM

Add back a condition deleted last time by mistaken.

LuoYuanke added inline comments.Sun, Apr 11, 1:09 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
173

!MI.isPseudo()?

xiangzhangllvm added inline comments.Sun, Apr 11, 6:23 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
370

This should be calculated in ShapeNum, for AMX always use reg as its shape after PHI.

pengfei updated this revision to Diff 336816.Mon, Apr 12, 6:20 AM

Refactor for excluding DBG_VALUE case.

llvm/lib/Target/X86/X86PreTileConfig.cpp
173

Yeah, I meant to. Surprisingly, we didn't set this flag when we define PseudoI. So the tests happened to pass. :)

370

The immediate value is specially handled in PostRA pass. Since the immediate value is always defined adjoining its use, we cannot calculate it as shape def. Otherwise, they will intersect with other AMX instructions.

LuoYuanke accepted this revision.Mon, Apr 12, 6:22 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Mon, Apr 12, 6:22 AM
This revision was landed with ongoing or failed builds.Mon, Apr 12, 7:37 AM
This revision was automatically updated to reflect the committed changes.