This is an archive of the discontinued LLVM Phabricator instance.

[MC][X86] Code padding for performance stability - Branch instructions and targets alignment
Needs ReviewPublic

Authored by opaparo on Nov 9 2017, 6:09 AM.

Details

Summary

Adding X86-specific code padding (the X86MCCodePadder class) that will utilize the previously introduced code padding infrastructure.

The Intel optimization reference manual states that:

When executing code from the legacy decode pipeline, direct branches that are mostly taken should have all their instruction bytes in a 16B aligned chunk of memory and nearer the end of that 16B aligned chunk.

(Rule 12 under section 3.4.1.5 "Code Alignment").

In this patch a new policy is introduced, which implements this rule by inserting MCPaddingFragments before branches and branch targets and returning positive penalty weight for 16 Byte windows that contain the said situation.

Diff Detail

Event Timeline

opaparo created this revision.Nov 9 2017, 6:09 AM
zvi added inline comments.Nov 12 2017, 11:19 PM
lib/Target/X86/MCTargetDesc/X86MCCodePadder.cpp
71

Is it possible to encode this info in the .td's with something like isConditionalBranch=1?

109

This info should already be encoded in the MCDesc look for something like MCID::Call, so instead o f looking for explicit opcode check if this attribute is set. Or does this function look only for a subset of the call instructions?

lib/Target/X86/MCTargetDesc/X86MCCodePadder.h
24

*in charge

test/CodeGen/X86/branch-instructions-end-of-16B-chunk-perf-nops.mir
1 ↗(On Diff #122230)
  1. Please commit this file as a NFC change and rebase patch so the diff will be more apparent.
  2. Please add a negative tests.
gadi.haber added inline comments.Nov 12 2017, 11:58 PM
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
77

why is there a diff here? Did some unseen character was inserted?

lib/Target/X86/MCTargetDesc/X86MCCodePadder.cpp
39

isn't it safer just to check that the CPU is atom instead?
The rule is good for any future CPU by default.

lib/Target/X86/MCTargetDesc/X86MCCodePadder.h
24

decisions --> policies ?

42

*branch

108

both direct and indirect?

craig.topper added inline comments.Nov 13 2017, 12:02 AM
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
80

The std::move here should be unnecessary the unique_ptr would be a temporary.

Maybe better to just do MCAsmBackend(llvm::make_unique<X86::X86MCCodePadder>(CPU))

lib/Target/X86/MCTargetDesc/X86MCCodePadder.cpp
39

Should skylake-avx512 be in this list? What about "cannonlake". This seems destined to become unmaintainable quickly. Is there some way we can put this in a subtarget property.

70

Immediate*

71

Inst.getDesc().isConditionalBranch() might work?

88

Variable names should be capitalized.

LLVM normally prefers "unsigned" over "unsigned int"

100

Unconditional*

opaparo updated this revision to Diff 122662.Nov 13 2017, 8:45 AM

Now using MCInstrDesc via MCInstrInfo instead of checking for specific opcodes.
Checked in the test and rebased to make changes more clear.
Added a negative test.
Fixed some typos.

opaparo marked 9 inline comments as done.Nov 13 2017, 8:53 AM
opaparo added inline comments.
lib/Target/X86/MCTargetDesc/X86MCCodePadder.cpp
39

I rather have an inclusive list than an exclusive list.
That way you have more control and no CPU can "snick in".

I don't see a better way to achieve that. The Backend has no access to a MCSubtargetInfo (And so X86AsmBackend's constructor also checks for individual CPUs when initalizing the field HasNopl, for example).
Do you know of a way around it, or a better solution?

lib/Target/X86/MCTargetDesc/X86MCCodePadder.h
24

"policies" is more of an implementation detail

108

Yes.

craig.topper added inline comments.Nov 13 2017, 7:02 PM
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
79

Use llvm::make_unique not std::make_unique. std::make_unique is a C++14 feature. LLVM code is required to build in C++11.

lib/Target/X86/MCTargetDesc/X86MCCodePadder.cpp
34

I didn't see an answer about skylake-avx512.

opaparo updated this revision to Diff 122783.Nov 13 2017, 11:48 PM

Replacing std::make_unique with llvm::make_unique.
Adding "skylake-avx512" to the architectures which the optimization will be enabled for.

opaparo marked 2 inline comments as done.Nov 13 2017, 11:49 PM
xur added a subscriber: xur.Nov 29 2017, 4:11 PM