Page MenuHomePhabricator

[X86] Fix tile config register spill issue.
ClosedPublic

Authored by LuoYuanke on Jan 5 2021, 10:55 PM.

Details

Summary

Previous code build the model that tile config register is the user of
each AMX instruction. There is a problem for the tile config register
spill. When across function, the ldtilecfg instruction may be inserted
on each AMX instruction which use tile config register. This cause all
tile data register clobber.
To fix this issue, we remove the model of tile config register. We
analyze the regmask of call instruction and insert ldtilecfg if there is
any tile data register live across the call. Inserting the sttilecfg
before the call is unneccessary, because the tile config doesn't change
and we can just reload the config.
We also need check tile config register interference. Since we
don't model the config register we should check interference from the
ldtilecfg to each tile data register def.

  ldtilecfg
  /       \
BB1      BB2
 /         \
call       BB3
/           \

%1=tileload %2=tilezero
We can start from the instruction of each tile def, and backward to
ldtilecfg. If there is any call instruction, and tile data register is
not preserved, we should insert ldtilecfg after the call instruction.

Diff Detail

Event Timeline

LuoYuanke created this revision.Jan 5 2021, 10:55 PM
LuoYuanke requested review of this revision.Jan 5 2021, 10:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 10:55 PM
LuoYuanke updated this revision to Diff 314806.Jan 5 2021, 11:14 PM

Remove dead code.

pengfei added inline comments.Jan 5 2021, 11:43 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4602

I think the removal of CFG may enable you to use TD patterns.

LuoYuanke added inline comments.Jan 5 2021, 11:45 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4602

Good idea. I'll try it.

LuoYuanke updated this revision to Diff 314815.Jan 6 2021, 12:38 AM

Add test case for IPRA.

pengfei added inline comments.Jan 12 2021, 8:10 PM
llvm/include/llvm/CodeGen/LiveIntervals.h
383

Seems we usually use SmallVectorImpl<...> here.

llvm/lib/CodeGen/LiveIntervals.cpp
956

The same here.

982

Should Found always be true in our case?

llvm/test/CodeGen/X86/AMX/amx-across-func.ll
133

How about if it is a tail call?

137

It's better to add another foo after all users of tmm registers to make sure no ldtilecfg been generated.

llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll
107

It's better add a call before any user of tmm instructions to see if ldtilecfg can be sunk

LuoYuanke added inline comments.Jan 12 2021, 11:40 PM
llvm/lib/CodeGen/LiveIntervals.cpp
982

No. For example slots is { 16, 32, 64 } and LiveI is <17, 30>. 17 is LiveI->start, then SlotI is 32, but 32 is not in the range of <17, 30>, so Found will be false.

llvm/test/CodeGen/X86/AMX/amx-across-func.ll
133
  1. Tail call should be the last instruction of a function, so I think no tile register should live through tail call.
  2. Tail call will be transformed to Jmp and won't return back, so even we insert ldtilecfg, it won't be executed.
137

Good suggestion. I'll add anther case for it.

llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll
107

Yes, this looks a good case to me, I'll add it.

LuoYuanke updated this revision to Diff 316364.Jan 13 2021, 4:16 AM

Address Pengfei's comments.

LuoYuanke updated this revision to Diff 316634.Jan 14 2021, 5:30 AM

We also need check tile config register interference. Since we
don't model the config register we should check interference from the
ldtilecfg to each tile data register def.

      ldtilecfg
      /       \
    BB1      BB2
     /         \
    call       BB3
    /           \
%1=tileload   %2=tilezero

We can start from the instruction of each tile def, and backward to
ldtilecfg. If there is any call instruction, and tile data register is
not preserved, we should insert ldtilecfg after the call instruction.

LuoYuanke edited the summary of this revision. (Show Details)Jan 14 2021, 5:31 AM
LuoYuanke updated this revision to Diff 316635.Jan 14 2021, 5:42 AM

Get tile register number from backend.

tile config register is the user of each AMX instruction

Should be all AMX instructions are users of tile config register?

We can start from the instruction of each tile def

I think instructions like tilestore only have tile use, right? How about analyzing the tile instructions? Can we simply add a ldtilecfg after slot1 (assuming it is a call) if we saw tile instructions between slot1 and slot2?

slot0 ... slot1 ... tileInst1 ... slot2

LuoYuanke updated this revision to Diff 317161.Jan 16 2021, 1:41 AM

Make the logic more clean.

xiangzhangllvm added inline comments.Jan 16 2021, 5:44 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
325

If you don't want to re-iterate the handled instructions in the first BB, PLS re-consider of following changes:
if (MIVisited.count(&*I)) {

MBBVisited.insert(MBB);
break;

}

346–347

Pls move after line 358

350–351
if (Terminate || MBBVisited.count(MBB))
  break;
359
if (!First)
  MBBVisited.insert(MBB);
First = false;
LuoYuanke updated this revision to Diff 317196.Jan 16 2021, 6:02 PM

Address Xiang's comments.

LuoYuanke marked an inline comment as done.Jan 16 2021, 6:05 PM
LuoYuanke added inline comments.
llvm/lib/Target/X86/X86PreTileConfig.cpp
325

I'd like to make it stable first, so I would make the logic as simple as possible. I prefer not judging the visited call to determine we terminate the iteration or not.

350–351

The same reason, not count on visited call to terminate the iteration.

LuoYuanke marked an inline comment as done.Jan 16 2021, 6:05 PM
pengfei added inline comments.Jan 19 2021, 1:17 AM
llvm/lib/Target/X86/X86FrameLowering.cpp
2098

Better to use unsigned

llvm/lib/Target/X86/X86PreTileConfig.cpp
110

Better to keep the function the same place with before.

296–300

I think using push_back and pop_front seems more clear.

351

Curly brackets can be removed.

353

Better to use push_back

LuoYuanke added inline comments.Jan 19 2021, 3:45 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
296–300

It seems there is no member push_back and pop_front for std::queue.

LuoYuanke updated this revision to Diff 317515.Jan 19 2021, 3:45 AM

Address Pengfei's comments.

pengfei accepted this revision.Jan 19 2021, 6:40 PM

LGTM, but let's wait for one day or two to see others' opinions.

llvm/lib/Target/X86/X86PreTileConfig.cpp
227–290

The code here looks correct but not optimized to me. But I think we can land it first and refine it later.

This revision is now accepted and ready to land.Jan 19 2021, 6:40 PM

LGTM, but let's wait for one day or two to see others' opinions.

Thank Pengfei. Sure, I'll wait for one day or two to see others' opinions.

This revision was landed with ongoing or failed builds.Jan 21 2021, 12:02 AM
This revision was automatically updated to reflect the committed changes.
pengfei added inline comments.Jan 21 2021, 12:35 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
372

Can moving this after line 375 solve this problem?

LuoYuanke added inline comments.Jan 21 2021, 12:38 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
372

That may be caused by live interval analysis. Let me try.
Anyone knows how to duplicate the regression?

pengfei added inline comments.Jan 21 2021, 12:49 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
372
LuoYuanke added inline comments.Jan 21 2021, 12:58 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
372

Thank you. I'll try to reproduce it.

I have an optimizated patch D95136, which is based on the method I talked to you. It doesn't use live interval analysis, so I think its also will fix the compile time regression.