This is an archive of the discontinued LLVM Phabricator instance.

[X86][[AVX512] Code size reduction in X86 by replacing EVEX with VEX encoding
ClosedPublic

Authored by gadi.haber on Dec 18 2016, 7:08 AM.

Details

Summary

This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.

There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

For example: “vmovss %xmm0, 32(%rsp,%rax,4)“, has the following 2 possible encodings:

EVEX encoding (8 bytes long):

62 f1 7e 08 11 44 84 08         vmovss  %xmm0, 32(%rsp,%rax,4)

VEX encoding (6 bytes long):

c5 fa 11 44 84 20                     vmovss  %xmm0, 32(%rsp,%rax,4)

Reported Bugzilla bugs related to this patch:
https://llvm.org/bugs/show_bug.cgi?id=23376
https://llvm.org/bugs/show_bug.cgi?id=29162

In this patch we created a new pass called createX86EvexToVexInsts at the pre-emit stage which uses a table of all EVEX opcodes that can be encoded via VEX.
The table is placed in a separate header file along with the above pass file under lib/Target/X86.

The patch requires many modifications to CodeGen/X86 unit tests since a new string: "EVEX TO VEX Compression " was added to the encoding string of each optimized instruction. The string is printed whenever the llc --show-mc-encoding flag is applied.
Finally, an additional MIR test file was added to the CodeGen/X86 unit tests called: "evex-to-vex.mir" containing all the EVEX instructions that are handled by this optimization.

Diff Detail

Repository
rL LLVM

Event Timeline

gadi.haber updated this revision to Diff 81882.Dec 18 2016, 7:08 AM
gadi.haber retitled this revision from to [X86][[AVX512] Code size reduction in X86 by replacing EVEX with VEX encoding.
gadi.haber updated this object.
gadi.haber added reviewers: zvi, delena, craig.topper, igorb.
gadi.haber set the repository for this revision to rL LLVM.
gadi.haber added a subscriber: llvm-commits.
craig.topper added inline comments.Dec 18 2016, 11:05 AM
lib/Target/X86/InstPrinter/X86InstComments.h
20 ↗(On Diff #81882)

The other bit (bit 0) used for AsmComments is target independent. How do we keep someone from adding a new target independent bit 1 and conflicting with this usage? We should have some documentation and probably an enum contant in MachineInstr.h that indicates which bits can be used for target dependent things.

lib/Target/X86/X86EvexToVex.cpp
54 ↗(On Diff #81882)

The second line is indented too far.

125 ↗(On Diff #81882)

This line should be indented further to line up arguments.

151 ↗(On Diff #81882)

This should be (Desc.TSFlags & X86II::EncodingMask) == X86II::EVEX)

X86II::EVEX is not a bit mask its a value for the encoding field.

159 ↗(On Diff #81882)

Variable names should start with a capital letter.

162 ↗(On Diff #81882)

Comments in LLVM are generally written as sentences starting with a capitalized first word and ending with a period.

168 ↗(On Diff #81882)

What do we get out of having two different DenseMaps? Couldn't we have 128 and 256 in the same map?

206 ↗(On Diff #81882)

Can we use TargetRegisterInfo::getRegEncoding() to simplify this?

224 ↗(On Diff #81882)

Remove space before parentheses

lib/Target/X86/X86MCInstLower.cpp
1294 ↗(On Diff #81882)

EVE should be EVEX

lib/Target/X86/X86TargetMachine.cpp
403 ↗(On Diff #81882)

Fix the indentation here.

zvi added inline comments.Dec 19 2016, 7:50 AM
lib/MC/MCAsmStreamer.cpp
303 ↗(On Diff #81882)

Have you considered using GetCommentOS() or EmitRawComment() before modifying this method?

308 ↗(On Diff #81882)

The 'by default' comment should be placed where EOL is initialized by default to true?

lib/Target/X86/X86EvexToVex.cpp
1 ↗(On Diff #81882)

Can you please run clang-format on this file? it will fix some of Craig's comments about indentation and perhaps a few more glitches.

105 ↗(On Diff #81882)

What if MF is empty?

108 ↗(On Diff #81882)

Please remove this DEBUG() macro and the one in line 118 as they don't contribute much.

112 ↗(On Diff #81882)

Please consider:
for (MachineBasicBlock &MBB : F)

rc = ....
130 ↗(On Diff #81882)

*encoding

142 ↗(On Diff #81882)

There is a convention in other passes to name this variable 'Changed'

159 ↗(On Diff #81882)

Consider dropping this variable since it's used only in the following statement.

198 ↗(On Diff #81882)

consider this:
for (const MachineOperand &MO : I->operands()) {

222 ↗(On Diff #81882)

if you change the loop header in line 145 to the following you won't need this extra variable:
for (MachineInstr &MI : MBB) {

test/CodeGen/X86/evex-to-vex-compress.mir
17 ↗(On Diff #81882)

It's hard to follow this enormous test. Can you please put the '# CHECK:' line above and adjacent to its corresponding input instruction?

gadi.haber marked 7 inline comments as done.Dec 19 2016, 7:55 AM
gadi.haber added inline comments.
lib/Target/X86/InstPrinter/X86InstComments.h
20 ↗(On Diff #81882)

I'm open to suggestions on this.

One option is to simply add the following comment in MachineInstr.h:

enum CommentFlag {

  ReloadReuse = 0x1 // higher bits are reserved for target dep comments.
};
lib/Target/X86/X86EvexToVex.cpp
151 ↗(On Diff #81882)

Good catch. Thanx!
You probably meant:
(Desc.TSFlags & X86II::EncodingMask) != X86II::EVEX

168 ↗(On Diff #81882)

The smaller the DenseMap table is the faster the find() method returns a value. It is therefore, recommended to chunk the DenseMap table to smaller tables whenever possible.

206 ↗(On Diff #81882)

I can simplify the code by defining a lambda function called isAVX512RegOnly(unsigned Reg) and use it to check on the Reg operand to be between X86::ZMM0 and X86::ZMM31 or between X86::YMM16 and X86::YMM31 or between X86:XMM16 and X86::XMM31 as follows:

auto isAVX512RegOnly = [](unsigned Reg) {

if (Reg >= X86::ZMM0 && Reg <= X86::ZMM31)
    return true;

if (Reg >= X86::XMM16 && Reg <= X86::XMM31)
    return true;

if (Reg >= X86::YMM16 && Reg <= X86::YMM31)
    return true;

return false;

};

craig.topper added inline comments.Dec 19 2016, 9:37 PM
lib/Target/X86/X86EvexToVex.cpp
206 ↗(On Diff #81882)

Is it even possible to get a VR512 on any of the instructions in the table? The register class is tightly bound to the instruction.

For XMM/YMM, getRegEncoding would return 0-31 and you can just check that.

delena added inline comments.Dec 19 2016, 11:10 PM
lib/Target/X86/X86EvexToVex.cpp
206 ↗(On Diff #81882)

I can't find any getRegEncoding() method, do you mean to write a new one?

craig.topper added inline comments.Dec 19 2016, 11:24 PM
lib/Target/X86/X86EvexToVex.cpp
206 ↗(On Diff #81882)

Sorry, it's getEncodingValue().

gadi.haber marked 6 inline comments as done.Dec 19 2016, 11:42 PM
gadi.haber added inline comments.
lib/MC/MCAsmStreamer.cpp
303 ↗(On Diff #81882)

Yes, but they did not generate a concatenated comment.

lib/Target/X86/X86EvexToVex.cpp
130 ↗(On Diff #81882)

Did you mean removing the comma that follows encoding?

craig.topper added inline comments.Dec 20 2016, 12:08 AM
lib/Target/X86/X86EvexToVex.cpp
47 ↗(On Diff #81882)

This also says "encodig"

130 ↗(On Diff #81882)

The first line of the comment says "encodig" rather than "encoding"

gadi.haber marked 2 inline comments as done.Dec 20 2016, 12:47 AM
gadi.haber marked 2 inline comments as done.Dec 20 2016, 1:03 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
105 ↗(On Diff #81882)

Added a check:

if (!MF)
  return false;
206 ↗(On Diff #81882)

True. There shouldn't be any VR512 registers in the EVEX to VEX tables.
This is an additional safety check.

gadi.haber marked 4 inline comments as done.

Updated diff following review comments by Craig and Zvi.

zvi added inline comments.Dec 21 2016, 12:57 AM
lib/Target/X86/X86EvexToVex.cpp
100 ↗(On Diff #82128)

Maybe bail-out early if subtarget does not support avx512?

const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
if (!ST.hasAVX512())
  return false;
133 ↗(On Diff #82128)

Please remove the comment

141 ↗(On Diff #82128)

I think it would be helpful if the comment said that in these cases the transformation is not possible because EVEX is needed to carry this information.

199 ↗(On Diff #82128)

An improvement may be to iterate only over MI.explicit_operands()

gadi.haber marked 3 inline comments as done.Dec 21 2016, 3:15 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
141 ↗(On Diff #82128)

Updated comment:

// Check for EVEX instructions with mask or broadcast as in these cases 
 // the EVEX prefix is needed in order to carry this information 
 // thus preventing the transformation to VEX encoding.
craig.topper added inline comments.Dec 21 2016, 9:14 AM
lib/Target/X86/X86EvexToVex.cpp
196 ↗(On Diff #82128)

I think this comment should say "16" instead of "6".

206 ↗(On Diff #81882)

If it shouldn't happen, shouldn't it just be an assert?

gadi.haber marked 4 inline comments as done.Dec 21 2016, 12:22 PM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
196 ↗(On Diff #82128)

Good catch! Thanx!

206 ↗(On Diff #81882)

good point.
replaced the check with the following assert:

assert (!(Reg >= X86::ZMM0 && Reg <= X86::ZMM31));
gadi.haber updated this revision to Diff 82341.Dec 22 2016, 7:41 AM
gadi.haber marked 2 inline comments as done.

Updated diff after additional comments by Craig and Zvi

craig.topper added inline comments.Dec 23 2016, 6:23 PM
lib/Target/X86/X86EvexToVex.cpp
111 ↗(On Diff #82341)

Don't we need to OR into rc? This just captures the last returned value.

gadi.haber marked an inline comment as done.Dec 24 2016, 11:27 PM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
111 ↗(On Diff #82341)

Good catch!
Here is the modified code:

bool Changed = false;

  /// Go over all basic blocks in function and replace
  /// EVEX encoded instrs by VEX encoding when possible.
  for (MachineBasicBlock &MBB : MF)
    Changed |= CompressEvexToVexImpl(MBB);

  return Changed;
zvi added inline comments.Dec 25 2016, 12:21 AM
lib/Target/X86/X86EvexToVex.cpp
49 ↗(On Diff #82341)

Since the transform of an individual instruction is independant of other instructions in the BB, i think it would be better to reduce the scope of this function from a MBB to a MachineInstr. (So the loop on the MBB can be moved to the caller of this function)

219 ↗(On Diff #82341)

Now that this API takes also target-specific flags, I think that instead of casting, setAsmPrinterFlag(CommentFlag) (and friends) should be changed to setAsmPrinterFlag(uint8_t) or something similar.

gadi.haber marked 2 inline comments as done.Dec 25 2016, 5:50 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
219 ↗(On Diff #82341)

There is only single definition of setAsmPrinterFlag in MachineInstr.h and it receives the CommenFlag parameter.

gadi.haber updated this revision to Diff 82470.Dec 25 2016, 5:57 AM

More updates to X86EvexToVex.cpp following comments by Zvi and Craig.

zvi added inline comments.Dec 25 2016, 7:09 AM
lib/Target/X86/X86EvexToVex.cpp
118 ↗(On Diff #82470)

I think these 3 if's belong in CompressEvexToVexImpl()

165 ↗(On Diff #82470)

assert here that (IsEVEX_V128 XOR IsEVEX_V256) ?

165 ↗(On Diff #82470)

Please ignore my other comment on this line, Phabricator won't let me remove it.

gadi.haber marked an inline comment as done.

updated diff after one comment by Zvi

gadi.haber marked an inline comment as done.Dec 27 2016, 7:48 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
219 ↗(On Diff #82341)

OK. I understand your comment now.
I changed the definition of SetAsmPrinterFlag(CommentFlag) to SetAsmPrinterFlag(unit8_t) in MachineInstr.h

gadi.haber updated this revision to Diff 82539.Dec 27 2016, 7:59 AM

Updated diff file after comment by Zvi on changing the setAsmPrinterFlag to receive unit8_t instead of CommentFlag.

gadi.haber updated this revision to Diff 82541.Dec 27 2016, 8:33 AM

updated diff after Zvi's comment on setAsmPrinterFlag + fixed typo in the enum AC_EVEX_2_VEX.

zvi accepted this revision.Dec 27 2016, 8:43 AM
zvi edited edge metadata.

LGTM. Thanks, Gadi.

This revision is now accepted and ready to land.Dec 27 2016, 8:43 AM
This revision was automatically updated to reflect the committed changes.