This is an archive of the discontinued LLVM Phabricator instance.

[X86][[AVX512] Code size reduction in X86 by replacing EVEX with VEX encoding
ClosedPublic

Authored by gadi.haber on Dec 18 2016, 7:08 AM.

Details

Summary

This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.

There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

For example: “vmovss %xmm0, 32(%rsp,%rax,4)“, has the following 2 possible encodings:

EVEX encoding (8 bytes long):

62 f1 7e 08 11 44 84 08         vmovss  %xmm0, 32(%rsp,%rax,4)

VEX encoding (6 bytes long):

c5 fa 11 44 84 20                     vmovss  %xmm0, 32(%rsp,%rax,4)

Reported Bugzilla bugs related to this patch:
https://llvm.org/bugs/show_bug.cgi?id=23376
https://llvm.org/bugs/show_bug.cgi?id=29162

In this patch we created a new pass called createX86EvexToVexInsts at the pre-emit stage which uses a table of all EVEX opcodes that can be encoded via VEX.
The table is placed in a separate header file along with the above pass file under lib/Target/X86.

The patch requires many modifications to CodeGen/X86 unit tests since a new string: "EVEX TO VEX Compression " was added to the encoding string of each optimized instruction. The string is printed whenever the llc --show-mc-encoding flag is applied.
Finally, an additional MIR test file was added to the CodeGen/X86 unit tests called: "evex-to-vex.mir" containing all the EVEX instructions that are handled by this optimization.

Diff Detail

Repository
rL LLVM

Event Timeline

gadi.haber updated this revision to Diff 81882.Dec 18 2016, 7:08 AM
gadi.haber retitled this revision from to [X86][[AVX512] Code size reduction in X86 by replacing EVEX with VEX encoding.
gadi.haber updated this object.
gadi.haber added reviewers: zvi, delena, craig.topper, igorb.
gadi.haber set the repository for this revision to rL LLVM.
gadi.haber added a subscriber: llvm-commits.
craig.topper added inline comments.Dec 18 2016, 11:05 AM
lib/Target/X86/InstPrinter/X86InstComments.h
20

The other bit (bit 0) used for AsmComments is target independent. How do we keep someone from adding a new target independent bit 1 and conflicting with this usage? We should have some documentation and probably an enum contant in MachineInstr.h that indicates which bits can be used for target dependent things.

lib/Target/X86/X86EvexToVex.cpp
55

The second line is indented too far.

126

This line should be indented further to line up arguments.

152

This should be (Desc.TSFlags & X86II::EncodingMask) == X86II::EVEX)

X86II::EVEX is not a bit mask its a value for the encoding field.

160

Variable names should start with a capital letter.

163

Comments in LLVM are generally written as sentences starting with a capitalized first word and ending with a period.

169

What do we get out of having two different DenseMaps? Couldn't we have 128 and 256 in the same map?

207

Can we use TargetRegisterInfo::getRegEncoding() to simplify this?

225

Remove space before parentheses

lib/Target/X86/X86MCInstLower.cpp
1294

EVE should be EVEX

lib/Target/X86/X86TargetMachine.cpp
404

Fix the indentation here.

zvi added inline comments.Dec 19 2016, 7:50 AM
lib/MC/MCAsmStreamer.cpp
304–305

Have you considered using GetCommentOS() or EmitRawComment() before modifying this method?

310

The 'by default' comment should be placed where EOL is initialized by default to true?

lib/Target/X86/X86EvexToVex.cpp
2

Can you please run clang-format on this file? it will fix some of Craig's comments about indentation and perhaps a few more glitches.

106

What if MF is empty?

109

Please remove this DEBUG() macro and the one in line 118 as they don't contribute much.

113

Please consider:
for (MachineBasicBlock &MBB : F)

rc = ....
131

*encoding

143

There is a convention in other passes to name this variable 'Changed'

160

Consider dropping this variable since it's used only in the following statement.

199

consider this:
for (const MachineOperand &MO : I->operands()) {

223

if you change the loop header in line 145 to the following you won't need this extra variable:
for (MachineInstr &MI : MBB) {

test/CodeGen/X86/evex-to-vex-compress.mir
18

It's hard to follow this enormous test. Can you please put the '# CHECK:' line above and adjacent to its corresponding input instruction?

gadi.haber marked 7 inline comments as done.Dec 19 2016, 7:55 AM
gadi.haber added inline comments.
lib/Target/X86/InstPrinter/X86InstComments.h
20

I'm open to suggestions on this.

One option is to simply add the following comment in MachineInstr.h:

enum CommentFlag {

  ReloadReuse = 0x1 // higher bits are reserved for target dep comments.
};
lib/Target/X86/X86EvexToVex.cpp
152

Good catch. Thanx!
You probably meant:
(Desc.TSFlags & X86II::EncodingMask) != X86II::EVEX

169

The smaller the DenseMap table is the faster the find() method returns a value. It is therefore, recommended to chunk the DenseMap table to smaller tables whenever possible.

207

I can simplify the code by defining a lambda function called isAVX512RegOnly(unsigned Reg) and use it to check on the Reg operand to be between X86::ZMM0 and X86::ZMM31 or between X86::YMM16 and X86::YMM31 or between X86:XMM16 and X86::XMM31 as follows:

auto isAVX512RegOnly = [](unsigned Reg) {

if (Reg >= X86::ZMM0 && Reg <= X86::ZMM31)
    return true;

if (Reg >= X86::XMM16 && Reg <= X86::XMM31)
    return true;

if (Reg >= X86::YMM16 && Reg <= X86::YMM31)
    return true;

return false;

};

craig.topper added inline comments.Dec 19 2016, 9:37 PM
lib/Target/X86/X86EvexToVex.cpp
207

Is it even possible to get a VR512 on any of the instructions in the table? The register class is tightly bound to the instruction.

For XMM/YMM, getRegEncoding would return 0-31 and you can just check that.

delena added inline comments.Dec 19 2016, 11:10 PM
lib/Target/X86/X86EvexToVex.cpp
207

I can't find any getRegEncoding() method, do you mean to write a new one?

craig.topper added inline comments.Dec 19 2016, 11:24 PM
lib/Target/X86/X86EvexToVex.cpp
207

Sorry, it's getEncodingValue().

gadi.haber marked 6 inline comments as done.Dec 19 2016, 11:42 PM
gadi.haber added inline comments.
lib/MC/MCAsmStreamer.cpp
304–305

Yes, but they did not generate a concatenated comment.

lib/Target/X86/X86EvexToVex.cpp
131

Did you mean removing the comma that follows encoding?

craig.topper added inline comments.Dec 20 2016, 12:08 AM
lib/Target/X86/X86EvexToVex.cpp
48

This also says "encodig"

131

The first line of the comment says "encodig" rather than "encoding"

gadi.haber marked 2 inline comments as done.Dec 20 2016, 12:47 AM
gadi.haber marked 2 inline comments as done.Dec 20 2016, 1:03 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
106

Added a check:

if (!MF)
  return false;
207

True. There shouldn't be any VR512 registers in the EVEX to VEX tables.
This is an additional safety check.

gadi.haber marked 4 inline comments as done.

Updated diff following review comments by Craig and Zvi.

zvi added inline comments.Dec 21 2016, 12:57 AM
lib/Target/X86/X86EvexToVex.cpp
101

Maybe bail-out early if subtarget does not support avx512?

const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
if (!ST.hasAVX512())
  return false;
134

Please remove the comment

142

I think it would be helpful if the comment said that in these cases the transformation is not possible because EVEX is needed to carry this information.

200

An improvement may be to iterate only over MI.explicit_operands()

gadi.haber marked 3 inline comments as done.Dec 21 2016, 3:15 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
142

Updated comment:

// Check for EVEX instructions with mask or broadcast as in these cases 
 // the EVEX prefix is needed in order to carry this information 
 // thus preventing the transformation to VEX encoding.
craig.topper added inline comments.Dec 21 2016, 9:14 AM
lib/Target/X86/X86EvexToVex.cpp
197

I think this comment should say "16" instead of "6".

207

If it shouldn't happen, shouldn't it just be an assert?

gadi.haber marked 4 inline comments as done.Dec 21 2016, 12:22 PM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
197

Good catch! Thanx!

207

good point.
replaced the check with the following assert:

assert (!(Reg >= X86::ZMM0 && Reg <= X86::ZMM31));
gadi.haber updated this revision to Diff 82341.Dec 22 2016, 7:41 AM
gadi.haber marked 2 inline comments as done.

Updated diff after additional comments by Craig and Zvi

craig.topper added inline comments.Dec 23 2016, 6:23 PM
lib/Target/X86/X86EvexToVex.cpp
112

Don't we need to OR into rc? This just captures the last returned value.

gadi.haber marked an inline comment as done.Dec 24 2016, 11:27 PM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
112

Good catch!
Here is the modified code:

bool Changed = false;

  /// Go over all basic blocks in function and replace
  /// EVEX encoded instrs by VEX encoding when possible.
  for (MachineBasicBlock &MBB : MF)
    Changed |= CompressEvexToVexImpl(MBB);

  return Changed;
zvi added inline comments.Dec 25 2016, 12:21 AM
lib/Target/X86/X86EvexToVex.cpp
50

Since the transform of an individual instruction is independant of other instructions in the BB, i think it would be better to reduce the scope of this function from a MBB to a MachineInstr. (So the loop on the MBB can be moved to the caller of this function)

220

Now that this API takes also target-specific flags, I think that instead of casting, setAsmPrinterFlag(CommentFlag) (and friends) should be changed to setAsmPrinterFlag(uint8_t) or something similar.

gadi.haber marked 2 inline comments as done.Dec 25 2016, 5:50 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
220

There is only single definition of setAsmPrinterFlag in MachineInstr.h and it receives the CommenFlag parameter.

gadi.haber updated this revision to Diff 82470.Dec 25 2016, 5:57 AM

More updates to X86EvexToVex.cpp following comments by Zvi and Craig.

zvi added inline comments.Dec 25 2016, 7:09 AM
lib/Target/X86/X86EvexToVex.cpp
119

I think these 3 if's belong in CompressEvexToVexImpl()

166

assert here that (IsEVEX_V128 XOR IsEVEX_V256) ?

166

Please ignore my other comment on this line, Phabricator won't let me remove it.

gadi.haber marked an inline comment as done.

updated diff after one comment by Zvi

gadi.haber marked an inline comment as done.Dec 27 2016, 7:48 AM
gadi.haber added inline comments.
lib/Target/X86/X86EvexToVex.cpp
220

OK. I understand your comment now.
I changed the definition of SetAsmPrinterFlag(CommentFlag) to SetAsmPrinterFlag(unit8_t) in MachineInstr.h

gadi.haber updated this revision to Diff 82539.Dec 27 2016, 7:59 AM

Updated diff file after comment by Zvi on changing the setAsmPrinterFlag to receive unit8_t instead of CommentFlag.

gadi.haber updated this revision to Diff 82541.Dec 27 2016, 8:33 AM

updated diff after Zvi's comment on setAsmPrinterFlag + fixed typo in the enum AC_EVEX_2_VEX.

zvi accepted this revision.Dec 27 2016, 8:43 AM
zvi edited edge metadata.

LGTM. Thanks, Gadi.

This revision is now accepted and ready to land.Dec 27 2016, 8:43 AM
This revision was automatically updated to reflect the committed changes.