This is an archive of the discontinued LLVM Phabricator instance.

[X86] Generate VZEROUPPER for Skylake-avx512
ClosedPublic

Authored by aaboud on Feb 12 2017, 7:59 AM.

Details

Summary

VZEROUPPER should not be issued on Knights Landing (KNL) , but on Skylake-avx512 it should be.
Modified the x86vzeroupper pass to support that.

A lot of LIT tests had to be modified, most of them were fixed automatically by regenerating the "CHECK" lines using update_llc_test_checks.py.
Other changes are name changing to illustrate that VZEROUPPER is needed for zmm as well as ymm.

Diff Detail

Repository
rL LLVM

Event Timeline

aaboud created this revision.Feb 12 2017, 7:59 AM
craig.topper added inline comments.Feb 13 2017, 8:11 PM
lib/Target/X86/X86.td
241 ↗(On Diff #88137)

Minor nit. The word "Some" shouldn't be capitalized.

lib/Target/X86/X86VZeroUpper.cpp
222 ↗(On Diff #88137)

Why did this change to not check specific register clobbers?

225 ↗(On Diff #88137)

This comments is stale with ZMM. It's not upper 128 bits.

aaboud marked an inline comment as done.Feb 13 2017, 11:02 PM

Thanks Craig for the comments, see answers inline.

lib/Target/X86/X86VZeroUpper.cpp
222 ↗(On Diff #88137)

We can get to this point only if ClobberAllTmmReg is true, in such case ClobberAnyYmmReg will also be true, and we will fall-through exactly as we do with this implementation.
This was a kind of did code.

Also, can you explain what is unique with "cannot clobber any" and "cannot clobber all" regarding these lines?

225 ↗(On Diff #88137)

Actually, this is what it does. even for the ZMM registers it reset the upper 128bits of the YMM corresponding register. Is not it?

The important behavior is the next sentence, which emphasize that processor changes back to Clean state.

craig.topper added inline comments.Feb 13 2017, 11:20 PM
lib/Target/X86/X86VZeroUpper.cpp
222 ↗(On Diff #88137)

This is probably the first time I've really looked at this code so I think explain what is unique without more thought.

225 ↗(On Diff #88137)

Doesn't it clear the upper 384 bits of ZMM? The comment itself doesn't say YMM. It just says "AVX registers".

aaboud added inline comments.Feb 13 2017, 11:48 PM
lib/Target/X86/X86VZeroUpper.cpp
222 ↗(On Diff #88137)

Michael added this code about 3 years ago, he might be able to explain better why he implemented it like this.

225 ↗(On Diff #88137)

I am fine with rephrasing the comment, though AVX registers does not include any ZMM, just YMM.
Do you think I should replace YMM with ZMM?

mkuper added a subscriber: mkuper.Feb 14 2017, 12:01 AM
mkuper added inline comments.
lib/Target/X86/X86VZeroUpper.cpp
111 ↗(On Diff #88137)

Drive by comment - is this relevant for Y/ZMM16-31? If not, then this should be explicitly documented, since these are obviously YmmOrZmmRegs.

aaboud added inline comments.Feb 14 2017, 12:05 AM
lib/Target/X86/X86VZeroUpper.cpp
111 ↗(On Diff #88137)

Y/ZMM16-31 are not relevant, I will add a comment emphasizing that fact.

hliao added inline comments.Feb 14 2017, 9:46 AM
lib/Target/X86/X86VZeroUpper.cpp
159 ↗(On Diff #88137)

Remove this empty line unless it's definitely required for formatting.

222 ↗(On Diff #88137)

for CALL, clobberAllTmmReg checks whether it clobbers ALL tmm regs. Here, clobberAnyTmmReg checks whether that call will uses any YMM registers. they are not equivalent.

however, a previous test case added seems not valid any more due to the calling into builtin library _ftol2 is replaced with native code sequence. not sure whether we will have other code to reproduce PR17631.

aaboud marked an inline comment as done.Feb 15 2017, 4:08 AM

Thanks Michael for the comments.
Answer inline.

lib/Target/X86/X86VZeroUpper.cpp
222 ↗(On Diff #88137)

That is right that clobberAllYmmReg and clobberAnyYmmReg are different.
However:

  1. If clobberAllYmmReg is false, we will not get to this line.
  2. If clobberAllYmmReg is true, then clobberAnyYmmReg is true, i.e. !clobberAnyYmmReg is false.

So at this point, we will "continue" only if Call instruction has no RegMask.

Now, can you explain why we would not add vzeroupper before call instruction that does not clobberAnyTmmReg (i.e. has all YMM registers in its RegMask), but we do not want to add the vzeroupper before function that has only some of the YMM registers in the RegMask?
I believe that we want to add vzeroupper in both cases.

In fact, I believe that we should not check for clobbering at all in this pass (but this is for other patch).

aaboud added inline comments.Feb 15 2017, 4:14 AM
lib/Target/X86/X86VZeroUpper.cpp
225 ↗(On Diff #88137)

I am fine with rephrasing the comment, though AVX registers does not include any ZMM, just YMM.
Do you think I should replace YMM with ZMM?

I meant to say: Do you think I should replace AVX with YMM (or YMM0-16)?

craig.topper added inline comments.Feb 15 2017, 8:14 AM
lib/Target/X86/X86VZeroUpper.cpp
225 ↗(On Diff #88137)

Probably should at least say YMM. "AVX register" is potentially confusing since the letters AVX appear in AVX512.

aaboud updated this revision to Diff 88547.Feb 15 2017, 8:31 AM
aaboud marked 2 inline comments as done.

Applied minor fixes (documentation and style) based on several comments.

aaboud added a comment.Mar 1 2017, 4:46 AM

Craig,
Do you have more comments? I addressed all the comments in this review.

Thanks

This revision is now accepted and ready to land.Mar 1 2017, 10:11 AM
This revision was automatically updated to reflect the committed changes.