This is an archive of the discontinued LLVM Phabricator instance.

[X86][AMX] Hoist ldtilecfg
ClosedPublic

Authored by pengfei on Mar 19 2021, 11:24 PM.

Details

Summary

The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.

This patch try to calculate the nearest point where post dominats all shapes of AMX registers.

Diff Detail

Event Timeline

pengfei created this revision.Mar 19 2021, 11:24 PM
pengfei requested review of this revision.Mar 19 2021, 11:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2021, 11:24 PM
pengfei planned changes to this revision.Mar 20 2021, 2:51 AM

Since D98845 is landed. I'd like to do the ldtilecfg hoist together with this patch. WIP.

pengfei updated this revision to Diff 334577.Mar 31 2021, 7:50 PM

Implement ldtilecfg hoist.

pengfei retitled this revision from [X86] Fix a bug when calculating the ldtilecfg insertion points. to [X86][AMX] Hoist ldtilecfg.Mar 31 2021, 7:52 PM
pengfei edited the summary of this revision. (Show Details)
pengfei updated this revision to Diff 334582.Mar 31 2021, 9:09 PM

Minor fix.

pengfei updated this revision to Diff 334599.Mar 31 2021, 11:36 PM

Fix a bug in calculating HasAMXBeforeCall.

pengfei updated this revision to Diff 334606.Apr 1 2021, 12:30 AM

Minor fix.

LuoYuanke added inline comments.Apr 1 2021, 8:06 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
8–9

Since we change the algorithm, we need to update the pass description.

97

In line 68 Pos is size_t, but the input is unsigned. Need to align the type?

112

Better to comment on each condition.

call
here we need to reload tile config.
amx instr
115

Does HasAMXBeforeCall mean amx config register live in MBB?

127

PostDominateAllShapes.

128

This looks to represent the tile register live through MBB in the code. Basically we need to reconfig after last call if the config register live out the MBB.

129–130

We can initialize at the beginning. Maybe at runOnMachineFunction().

155

Does HasAMXBeforeCall mean amx tile config register live through the MBB?

164

Maybe rename it as "update(MachineBasicBlock *Pred, MachineBasicBlock *MBB)"

pengfei updated this revision to Diff 334892.Apr 1 2021, 9:32 PM
pengfei marked 8 inline comments as done.

Address Yuanke and Xiang's comments.

llvm/lib/Target/X86/X86PreTileConfig.cpp
128

Yes, But it's equal to requiring instead of describing tile config register live into MBB.

    +----+
BB0 |Cfg |
    +----+
    /    \
+----+  +----+ 
|Call|  |AMX |
|AMX |  |Call|
+----+  +----+
  BB1     BB2

For example, tilecfg lives out BB0 and lives in BB1 and BB2, but we require it lives in for BB2 only. So,

BB1.NeedTileCfgLiveIn = false;
BB2.NeedTileCfgLiveIn = true;
164

How about MBB and Succ, I'd like to use MBB to respect the current node that is operating.
It's more clear to me if we consider UpdateShapeDominators is more independent with DFSPredsUpdater.

pengfei updated this revision to Diff 334917.Apr 2 2021, 1:17 AM

Minor fix.

Perhaps we need more comments and more test cases (maybe in a sperate file) to cover those scenario.

llvm/lib/Target/X86/X86PreTileConfig.cpp
164

Better to add comments to say shape is defined in Succ, so the pred won't postdominate the shape.

166

Here need comments to explain why the except is needed.

175

Add comment to explain why the loop is needed, and how to find a position to insert.

177

Add comment to explain why it fails.

179

If possible, better add a test case for it.

181

Better to split the function into 2 function. One is isAMXInstruction(), the other is collectShapeInfo().

191

Is it possible that LastShape is null?

pengfei updated this revision to Diff 334967.Apr 2 2021, 9:02 AM
pengfei marked 4 inline comments as done.

Address Yuanke's comments.

llvm/lib/Target/X86/X86PreTileConfig.cpp
179

Will do later.

191

Yes, compare with MIRef() is allowed. The Pos of it is 0. So the condition is false here.

pengfei updated this revision to Diff 335056.Apr 2 2021, 8:20 PM

Minor fix.
Add a test to verify the sink code.

pengfei added inline comments.Apr 2 2021, 8:28 PM
llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll
139

This seems another bug, I'll investigate it.

pengfei updated this revision to Diff 335062.Apr 2 2021, 11:38 PM

Fix a bug found by test.

pengfei added inline comments.Apr 2 2021, 11:40 PM
llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll
139

Fixed.

pengfei updated this revision to Diff 335078.Apr 3 2021, 3:48 AM

Refactor.

pengfei updated this revision to Diff 335089.Apr 3 2021, 7:03 AM

Fixed the problem when the sink need to be forked. I.e.

     +------+
     |Entry | BB0
     +------+
     /     \
+------+  +------+
|Shape1|  |Shape2| BB2
+------+  +------+
BB1  \       /
     +------+
     | AMX  | BB3
     +------+

If BB1 and BB2 don't have a call, we will try to insert ldtilecfg from BB0.
Since BB0 doesn't dominate all shapes, we will sink the insert point to its successors.
The previous code has limitation that only one of BB0's successors has NeedTileCfgLiveIn = true.
This update fixes the problem without major changes. So I amended it instead of putting into a new revision.

pengfei planned changes to this revision.Apr 3 2021, 8:17 AM

The algorithm for updating shape postdominate BBs is buggy. For a given ShapeBB, clear all its predecessors flag is not enough since its unreachable BBs are also need to clear.
I have thought out a new method but need major refactor. Stay tuned~

pengfei updated this revision to Diff 335142.Apr 4 2021, 5:27 AM

The algorithm for updating shape postdominate BBs is buggy. For a given ShapeBB, clear all its predecessors flag is not enough since its unreachable BBs are also need to clear.

I have thought out a new method but need major refactor. Stay tuned~

Worked it out.

pengfei updated this revision to Diff 335203.Apr 4 2021, 10:57 PM

A litter format & comments refactor.

pengfei updated this revision to Diff 335266.Apr 5 2021, 8:49 AM

Fix a bug in collectShapeInfo.

LuoYuanke added inline comments.Apr 7 2021, 6:33 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
17–19

Do you mean all tile registers are caller saved?

204

Do we need to recursively handle phi instruction? In below example, record "def shape" instruction?

def shape

\
phi
  \
 phi
267

So we would try to insert ldtilecfg at the entry if there is not shape dependency issue?

307

May update the comments. All shapes are reachable.

318

Can we sink the ldtilecfg in below case? We need insert ldtilecfg to B4.

     Entry
   /         \
 B1         B2
call        call
   \           /
       B3
         |
    B4 shape defs
         |
      AMX
357–358

Is this semicolon necessary?

pengfei updated this revision to Diff 335975.Apr 7 2021, 7:15 PM
pengfei marked an inline comment as done.

Address Yuanke's comments.

llvm/lib/Target/X86/X86PreTileConfig.cpp
17–19

No, tile registers are visible to RA, we don't need to restore them ourselves.
We are mainly doing the restore for the shape register in this pass.

204

I don't think so. Each phi will be iterated once. So we just need to peek its latest level and ignore the phi in it.

s1 s2
 \ /
  p1  s3  ...
   \  /   /
    p2   p3
     \  /
      p4

For exampe, if we meet p4 firstly, we find all its defs are phis, we don't do anything. Then, we meet p2 in another iteration and record s3. Then p1 etc. The order doesn't metter, you can see all the real shape defs will be recorded without recursion.

267

Yes. Here the insert points are candidates. We will sink them based on the shape defs. We will insert at the entry if shape dependency satisfies.

318

Yes. We will sink two candidates from B1 anf B2. (If there's no call in B2, the candidates are B1 and Entry) We avoid BB being multi visited. So the second sink will stop at B3. This first sink will cross B3 and reach B4.

357–358

Good catch. It's residue from previous lambda expression.

xiangzhangllvm added inline comments.Apr 7 2021, 7:38 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
55

use C++ comments // between line 54,55

67

use getMI() or directly p->MI is more readable.

140

I suggest use tile operand to judge if a MI is AMX Instruction or not. and spilt the collection of shape to another special function.

199

here may be a constant imm in PHI's incomings.

204
  1. MI2->isMoveImmediate() may need to be handled, because the imm may always keep in a register after PHI.

In other wards, if AMX MI use the imm shape, they must use the register after PHI.

  1. the recursive PHI case is very rare, we can let it easy, not handle it, there is not correction problem.
xiangzhangllvm added inline comments.Apr 7 2021, 8:47 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
312

Not sure the non-bottom BBs in loop can both with "fall through + continue".

pengfei updated this revision to Diff 335989.Apr 7 2021, 8:55 PM
pengfei marked 2 inline comments as done.

Address Xiang's comments.

llvm/lib/Target/X86/X86PreTileConfig.cpp
140

But we still need to judge if it is PHI. I don't know if there're other exceptions.
We have split it to collectShapeInfo.

199

Seems IR parser will normalize this case, so we don't need to handle it.

204
  1. The PostRA pass can handle this case.
  2. We have covered this case here.
pengfei added inline comments.Apr 7 2021, 8:57 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
312

non-bottom BBs can propagate until the bottom BB.

xiangzhangllvm added inline comments.Apr 7 2021, 10:56 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
140

Yes, PHI is exception, the exceptions is very few, more better than list all the AMX opcodes.

204

I don't think "Each phi will be iterated once."

MRI->def_operands(ShapeReg) only contains the shape reg used in AMX MI

s1 s2
 \ /
  p1  s3  ...
   \  /   /
    p2   p3
     \  /
      p4
 then AMX MI use p4 as this shape,

only p4 in def_operands(ShapeReg).

204

But here missed the def shape BB, it is not meet the following logic of use "shapeBBNumber == ShapeReachedCount" to judge the BB is all shape def reachable.

   reg=imm        ...
   /     \         / 
amx          phi
              |
             BB
pengfei added inline comments.Apr 7 2021, 11:12 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
204

You are right, I excluded phi at first. I'll refactor for it. Thanks!

xiangzhangllvm added inline comments.Apr 7 2021, 11:28 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
332

I am a little puzzle here, The I.MBB is all shape def reachable, but how can we make sure the BBVisitedInfo[I.MBB].LastShape is shape def reachable ? They may be not same BB.

pengfei added inline comments.Apr 7 2021, 11:50 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
332

LastShape is the last shape def in a BB. It is only used when its BB is all shape def reachable. I.e. insert ldtilecfg after it. But it can be null if the BB has no shape def, which means we can insert at the begining of the BB.

pengfei updated this revision to Diff 336036.Apr 8 2021, 2:13 AM
pengfei marked an inline comment as done.

Address Xiang's comment.
Iterate all PHIs recursively to find all shapes.

LuoYuanke added inline comments.Apr 8 2021, 5:23 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
201

Better to add test case to cover the logic.

310

Better to add test case for it.

pengfei updated this revision to Diff 336094.Apr 8 2021, 7:12 AM
pengfei marked 2 inline comments as done.

Address Yuanke's comments.

Add test5 for checking shape peek's loop break.
Add test6 for checking shape reachability's loop break.

pengfei updated this revision to Diff 336610.Apr 10 2021, 6:49 AM

Fix a silly bug that uses != in case == and a bug when records shape for phi.

This is found by work with PostRA implementation. This should be found by case amx-ldtilecfg-insert.ll:, but happened to get expected output with these 2 bugs.

pengfei updated this revision to Diff 336636.Apr 10 2021, 8:01 PM

Fix crush when build with -g.

pengfei updated this revision to Diff 336638.Apr 10 2021, 8:24 PM

Add back a condition deleted last time by mistaken.

LuoYuanke added inline comments.Apr 11 2021, 1:09 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
109

!MI.isPseudo()?

xiangzhangllvm added inline comments.Apr 11 2021, 6:23 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
184

This should be calculated in ShapeNum, for AMX always use reg as its shape after PHI.

pengfei updated this revision to Diff 336816.Apr 12 2021, 6:20 AM

Refactor for excluding DBG_VALUE case.

llvm/lib/Target/X86/X86PreTileConfig.cpp
109

Yeah, I meant to. Surprisingly, we didn't set this flag when we define PseudoI. So the tests happened to pass. :)

184

The immediate value is specially handled in PostRA pass. Since the immediate value is always defined adjoining its use, we cannot calculate it as shape def. Otherwise, they will intersect with other AMX instructions.

LuoYuanke accepted this revision.Apr 12 2021, 6:22 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Apr 12 2021, 6:22 AM
This revision was landed with ongoing or failed builds.Apr 12 2021, 7:37 AM
This revision was automatically updated to reflect the committed changes.