This is an archive of the discontinued LLVM Phabricator instance.

[X86][AMX] Replace PXOR instruction with SET0 in AMX pre config.
ClosedPublic

Authored by LuoYuanke on May 3 2022, 10:02 PM.

Details

Summary

To generate zero value, the PXOR instruction need 3 operands that is
tied to the same vreg. If is not good in SSA form and with undef value
two address instruction pass may convert
%0:vr128 = PXORrr undef %0, undef %0
to %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128.
It is not expected.
It can be simplified to SET0 instruction which only take 1 destination
operand. It should be more friendly to two address instruction pass and
register allocation pass.
%0:vr128 = V_SET0

Diff Detail

Event Timeline

LuoYuanke created this revision.May 3 2022, 10:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 10:02 PM
LuoYuanke requested review of this revision.May 3 2022, 10:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 10:02 PM
LuoYuanke retitled this revision from [X86][AMX] Replace PXOR instruction with SET0 in AMX pre config. to [X86][AMX][NFC] Replace PXOR instruction with SET0 in AMX pre config..May 3 2022, 10:03 PM
craig.topper added inline comments.May 3 2022, 10:04 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

Why is there no AVX1 code path to use VMOVUPSmr for 128 bits?

LuoYuanke added inline comments.May 3 2022, 10:11 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

I think usaully AMX imply AVX512 enabled, and for the correctness, it can be covered by SSE2 instruction MOVUPSmr. So I think AVX1 code path is nice to have, but not must.

craig.topper added inline comments.May 3 2022, 10:14 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

We usually try not to mix AVX and SSE code due to the impact to the upper bits and how it affects the vzeroupper insertion. But I guess since these are stores they don't dirty the upper bits so maybe it isn't an impact.

LuoYuanke added inline comments.May 3 2022, 10:18 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

Thanks for reminding. Actually I'm going to ask why use VMOVUPSmr for 128 bits given MOVUPSmr works both for SEE and AVX. So AVX-SSE switch penatly is a reason.

LuoYuanke added inline comments.May 3 2022, 10:22 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

@craig.topper, do you prefer adding AVX1 code path?

Is this patch really NFC?

LuoYuanke retitled this revision from [X86][AMX][NFC] Replace PXOR instruction with SET0 in AMX pre config. to [X86][AMX] Replace PXOR instruction with SET0 in AMX pre config..May 3 2022, 10:27 PM
craig.topper added inline comments.May 3 2022, 10:27 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

It's more consistent with other code. I wouldn't do it in this patch though.

LuoYuanke added inline comments.May 3 2022, 10:30 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
403

OK, I'll add the AVX1 code path.

Is this patch really NFC?

I do see some lit cases fail. I will update the patch.

LuoYuanke updated this revision to Diff 426920.May 3 2022, 11:28 PM

Address Craig's comments.

Is it possible to get it to use a register other than xmm0 in a test? Maybe add a vector function argument?

LuoYuanke updated this revision to Diff 426923.May 3 2022, 11:45 PM

Address Craig's comments.

Is it possible to get it to use a register other than xmm0 in a test? Maybe add a vector function argument?

Done in test/CodeGen/X86/AMX/amx-config.ll which is used to test the stack slot initialization for different target features.

pengfei added inline comments.May 4 2022, 2:23 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
401

hasAVX1?

LuoYuanke added inline comments.May 4 2022, 3:37 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
401

Yes, I forgot to change it to hasAVX. Thanks.

craig.topper added inline comments.May 4 2022, 8:42 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
401

AVX should be a separate patch. Can you share most of the AVX and SSE2 code by picking the store opcode at the start of the block?

xiangzhangllvm added inline comments.May 4 2022, 6:09 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

Why change the void to return parameter %xmm0,I didn't see %xmm0 join any calculation.

LuoYuanke added inline comments.May 4 2022, 6:19 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

Because we want register other than xmm0 is allocated to V_SET0. I think it is to test if there is some potential issue in register allocation. @craig.topper, am I right?

To generate zero value, ... may convert

>%0:vr128 = PXORrr undef %0, undef %0
>to %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128.

It is not expected.

And I think this is not only "expected", this convert is not equal/right, it should not do this convert.

craig.topper added inline comments.May 4 2022, 6:39 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

If you run this modified test without this patch, does it show the bug?

craig.topper added inline comments.May 4 2022, 6:43 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

I guess not because you're doing this after the Two Address Instruction Pass runs and you've left SSA?

LuoYuanke added inline comments.May 4 2022, 6:49 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

This pass runs before Two Address Instruction Pass. It is in SSA form.

craig.topper added inline comments.May 4 2022, 6:55 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

My mistake, I guess it worked because the same vreg was used for src and dest even though we're in SSA form. Is that allowed because it is undef?

LuoYuanke added inline comments.May 4 2022, 6:58 PM
llvm/test/CodeGen/X86/AMX/amx-config.ll
11

It seems invalid to declare it is in SSA form, while have src and dest tied to same vreg.

LuoYuanke updated this revision to Diff 427176.May 4 2022, 7:15 PM

Address Craig and Phoebe's comments.

LuoYuanke added inline comments.May 4 2022, 7:19 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
401

@craig.topper , do you prefer to having AVX code path in a separate patch?

craig.topper added inline comments.May 4 2022, 7:28 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
401

The lambda feels like overkill. keep has.SSE2 else block. And do

unsigned StoreOpc = hasAVX() ? X86::VMOVUPSmr : MOVUPSmr.

LuoYuanke added inline comments.May 4 2022, 7:31 PM
llvm/lib/Target/X86/X86PreTileConfig.cpp
401

Ok, got you now. :)

LuoYuanke updated this revision to Diff 427178.May 4 2022, 7:37 PM

Address Craig's comments.

This revision is now accepted and ready to land.May 4 2022, 7:40 PM
This revision was landed with ongoing or failed builds.May 4 2022, 7:53 PM
This revision was automatically updated to reflect the committed changes.