This is an archive of the discontinued LLVM Phabricator instance.

[X86] SET0 to use XMM registers where possible PR26018 PR32862
ClosedPublic

Authored by dtemirbulatov on Jul 25 2017, 7:36 AM.

Details

Summary

VEX-encoded vxorps %xmm0, %xmm0, %xmm0 does the same thing as EVEX-encoded vxorps %zmm0, %zmm0, %zmm0, zeroing the full-width vector and breaking dependencies on the old value of the architectural register.

The VEX version is one byte shorter than the EVEX version.
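For context, a minimal sketch of how a SET0 pseudo expansion in X86InstrInfo.cpp might pick the xmm subregister is shown below. It assumes LLVM's expandPostRAPseudo hook and the Expand2AddrUndef helper; the exact case labels and opcodes in the committed patch may differ.

case X86::AVX_SET0: {
  assert(HasAVX && "AVX not supported");
  const TargetRegisterInfo *TRI = &getRegisterInfo();
  unsigned SrcReg = MIB->getOperand(0).getReg();
  // Rewrite the ymm destination to its xmm subregister and emit a
  // VEX-encoded 128-bit xor; VEX implicitly zeroes the upper bits.
  unsigned XReg = TRI->getSubReg(SrcReg, X86::sub_xmm);
  MIB->getOperand(0).setReg(XReg);
  Expand2AddrUndef(MIB, get(X86::VXORPSrr));
  // Keep the full-width register as an implicit def so liveness stays correct.
  MIB.addReg(SrcReg, RegState::ImplicitDefine);
  return true;
}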

Diff Detail

Repository
rL LLVM

Event Timeline

dtemirbulatov created this revision.Jul 25 2017, 7:36 AM

oh, I missed "X86::AVX512_256_SET0:" case , I have to check some testcases, I think I saw there something incorrect. I will redo my change.

RKSimon added inline comments.Jul 25 2017, 7:59 AM
test/CodeGen/X86/vector-shuffle-combining-avx.ll
7 ↗(On Diff #108074)

Please can you improve the test prefixes to avoid so much duplication:

; RUN: llc < %s -mtriple=i686-unknown -mattr=+avx | FileCheck %s --check-prefix=X32 --check-prefix=X32-AVX
; RUN: llc < %s -mtriple=i686-unknown -mattr=+avx2 | FileCheck %s --check-prefix=X32 --check-prefix=X32-AVX
; RUN: llc < %s -mtriple=i686-unknown -mattr=+avx512f | FileCheck %s --check-prefix=X32 --check-prefix=X32-AVX512
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2 | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx512f | FileCheck %s --check-prefix=X64 --check-prefix=X64-AVX512
delena edited edge metadata.Jul 25 2017, 10:03 AM

We have an EVEX-to-VEX pass to shorten the encoding. But you replaced ymm with xmm. I don't understand why it's better.

RKSimon edited edge metadata.Jul 25 2017, 10:45 AM

We have an EVEX-to-VEX pass to shorten the encoding. But you replaced ymm with xmm. I don't understand why it's better.

This came up on PR32862, which was mainly about AMD 128-bit SIMD ALUs being able to create fewer uops in 256-bit vector code by making use of VEX implicit zeroing of the upper subvectors. Is there any benefit to using xmm on AVX512 hardware or would ymm be better?

We have an EVEX-to-VEX pass to shorten the encoding. But you replaced ymm with xmm. I don't understand why it's better.

This came up on PR32862, which was mainly about AMD 128-bit SIMD ALUs being able to create fewer uops in 256-bit vector code by making use of VEX implicit zeroing of the upper subvectors. Is there any benefit to using xmm on AVX512 hardware or would ymm be better?

There is no difference between xmm and ymm on Intel processors, AFAIK.
I was wrong about the EVEX2VEX pass, btw. It will do nothing with zmm.
Do AMD processors support AVX-512?

lib/Target/X86/X86InstrInfo.cpp
7743 ↗(On Diff #108074)

Please run clang-format on this code.

There is no difference between xmm and ymm on Intel processors, AFAIK.
I was wrong about the EVEX2VEX pass, btw. It will do nothing with zmm.

So should AVX512 zmm zeroing be done with ymm instead of xmm?

Do AMD processors support AVX-512?

No, just AVX2 with Excavator (bdver4) and Ryzen (znver1); both have 128-bit ALUs and must 'double-pump' 256-bit instructions as separate uops and then merge them at retirement. Agner's microarchitecture.pdf has more details.

So should AVX512 zmm zeroing be done with ymm instead of xmm?

It doesn't matter. They are both good.

add X86::AVX512_256_SET0 handling, format, rebase

Oh, I missed that vec_uint_to_fp-fastmath.ll should be updated manually. I will redo my change.

And memset.ll, sorry.

Updated vec_uint_to_fp-fastmath.ll and memset.ll manually.

We decided to split this change into two parts, AVX and AVX512; this is part one, covering AVX and AVX2.

A couple of minor issues, but otherwise this seems almost ready.

test/CodeGen/X86/vec_uint_to_fp-fastmath.ll
1 ↗(On Diff #108342)

Don't regenerate this file (we want the symbols to still be tested); just manually tweak the xor cases.

test/CodeGen/X86/vector-shuffle-combining-avx.ll
7 ↗(On Diff #108074)

This still stands

Update after RKSimon's remarks.

craig.topper edited edge metadata.Jul 26 2017, 10:36 PM

Should we go ahead and do X86::AVX512_256_SET0 as part of this so the tests don't split?

OK, thanks, I will redo the change.

We don't need to change X86::AVX512_256_SET0; it is already using X86::sub_ymm there, and I have not seen any ZMM-to-YMM change in the diff.

Update with X86::AVX512_256_SET0 included; please ignore my last comment.

Just vector-shuffle-combining-avx.ll left to fix, and then I think it's ready.

Update to vector-shuffle-combining-avx.ll

RKSimon accepted this revision.Jul 27 2017, 7:56 AM

LGTM

This revision is now accepted and ready to land.Jul 27 2017, 7:56 AM

I know Simon already accepted this, but shouldn't we be using xmm with VLX?

lib/Target/X86/X86InstrInfo.cpp
7735 ↗(On Diff #108458)

I think we should use VPXORDZ128rr and the equivalent xmm register instead, to be consistent. That will bring some of the tests back together, I think.
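For illustration only, the VLX-consistent variant suggested here might look roughly like the following, reusing the names from the sketch under the Summary (the exact case and opcode choice were left to the follow-up patch):

// Hypothetical: zero via the EVEX-encoded 128-bit xor on the xmm
// subregister; unlike the VEX form, it can also encode xmm16-xmm31.
unsigned XReg = TRI->getSubReg(SrcReg, X86::sub_xmm);
MIB->getOperand(0).setReg(XReg);
Expand2AddrUndef(MIB, get(X86::VPXORDZ128rr));
MIB.addReg(SrcReg, RegState::ImplicitDefine);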

test/CodeGen/X86/vector-shuffle-combining-avx.ll
26 ↗(On Diff #108458)

There's no RUN line that uses this check prefix.

If we used xmm instead of ymm for VLX, would this test not need its FileCheck lines split?

I know Simon already accepted this, but shouldn't we be using xmm with VLX?

Discussed this with @dtemirbulatov offline: AVX512/AVX512VL (including AVX512_256_SET0) will be done as a separate patch, as this one is getting too big as it is.

This revision was automatically updated to reflect the committed changes.