To generate zero value, the PXOR instruction need 3 operands that is
tied to the same vreg. If is not good in SSA form and with undef value
two address instruction pass may convert
%0:vr128 = PXORrr undef %0, undef %0
to %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128.
It is not expected.
It can be simplified to SET0 instruction which only take 1 destination
operand. It should be more friendly to two address instruction pass and
register allocation pass.
%0:vr128 = V_SET0
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | Why is there no AVX1 code path to use VMOVUPSmr for 128 bits? |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | I think usaully AMX imply AVX512 enabled, and for the correctness, it can be covered by SSE2 instruction MOVUPSmr. So I think AVX1 code path is nice to have, but not must. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | We usually try not to mix AVX and SSE code due to the impact to the upper bits and how it affects the vzeroupper insertion. But I guess since these are stores they don't dirty the upper bits so maybe it isn't an impact. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | Thanks for reminding. Actually I'm going to ask why use VMOVUPSmr for 128 bits given MOVUPSmr works both for SEE and AVX. So AVX-SSE switch penatly is a reason. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | @craig.topper, do you prefer adding AVX1 code path? |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | It's more consistent with other code. I wouldn't do it in this patch though. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | OK, I'll add the AVX1 code path. |
Is it possible to get it to use a register other than xmm0 in a test? Maybe add a vector function argument?
Done in test/CodeGen/X86/AMX/amx-config.ll which is used to test the stack slot initialization for different target features.
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | hasAVX1? |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | Yes, I forgot to change it to hasAVX. Thanks. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | AVX should be a separate patch. Can you share most of the AVX and SSE2 code by picking the store opcode at the start of the block? |
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | Why change the void to return parameter %xmm0,I didn't see %xmm0 join any calculation. |
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | Because we want register other than xmm0 is allocated to V_SET0. I think it is to test if there is some potential issue in register allocation. @craig.topper, am I right? |
To generate zero value, ... may convert
>%0:vr128 = PXORrr undef %0, undef %0 >to %1:vr128 = PXORrr undef %1:vr128(tied-def 0), undef %0:vr128.
It is not expected.
And I think this is not only "expected", this convert is not equal/right, it should not do this convert.
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | If you run this modified test without this patch, does it show the bug? |
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | I guess not because you're doing this after the Two Address Instruction Pass runs and you've left SSA? |
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | This pass runs before Two Address Instruction Pass. It is in SSA form. |
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | My mistake, I guess it worked because the same vreg was used for src and dest even though we're in SSA form. Is that allowed because it is undef? |
llvm/test/CodeGen/X86/AMX/amx-config.ll | ||
---|---|---|
11 | It seems invalid to declare it is in SSA form, while have src and dest tied to same vreg. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | @craig.topper , do you prefer to having AVX code path in a separate patch? |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | The lambda feels like overkill. keep has.SSE2 else block. And do unsigned StoreOpc = hasAVX() ? X86::VMOVUPSmr : MOVUPSmr. |
llvm/lib/Target/X86/X86PreTileConfig.cpp | ||
---|---|---|
393 | Ok, got you now. :) |
Why is there no AVX1 code path to use VMOVUPSmr for 128 bits?