This is an archive of the discontinued LLVM Phabricator instance.

[MC] Adding code padding for performance stability - infrastructure. NFC.
ClosedPublic

Authored by opaparo on Jun 20 2017, 4:25 AM.

Details

Summary

Implementation of padding code with nop instructions in key places that will cause performance improvement.
The implementation is such that the adding of the nops is done in the Assembler after the layout is done and all IPs and alignments are known.
The patch is the result of this RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107223.html
This is the first patch of the implemented solution, and contains only the infrastructure needed for this change (this by itself is a NFC).

Diff Detail

Event Timeline

opaparo created this revision.Jun 20 2017, 4:25 AM

Should we expose this to the .s?

That would make it easy to test codegen because you would just need to
check for things like .pad_bb_begin.

It would make it easier to test the layout as you could use a simple .s
file for that.

Cheers,
Rafael

Hi Rafael,

Unfortunately it cannot be exposed to the .s. I'll explain:
When generating an object file, MCObjectStreamer (or more specifically, one of its descendants) is used as the MCStreamer by MCAsmPrinter. All the decisions regarding the places to insert padding and the final size of the padding are made in the MCAsmBackend, which is a member of MCObjectStreamer (via the MCAssembler). Some of those decisions (the decisions regarding the final size of the padding) rely on the final layout of the code (as manifested by MCAsmLayout). On the other hand, when generating a .s file, MCAsmStreamer is used as the MCStreamer by MCAsmPrinter. Unlike MCObjectStreamer, this streamer does not require a MCAsmBackend. In addition, .s files don't have a layout, so there's no MCAsmLayout either, which, as I stated, is needed for the decision making. All in all, we can't mimic the padding behavior when generating a .s file.

Hope that explains it.

Thanks,
Omer

craig.topper edited edge metadata.Jun 21 2017, 1:56 PM

Can you please repost the diff with full context using one of the commands mentioned here. http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

That will allow phabricator to show the entire file that's being touched so we can see areas around your patch in the web interface.

include/llvm/MC/MCCodePadder.h
43

I think LLVM coding practices would prefer DenseSet here, but I'm not sure.

64

I think we would normally use DenseMap here.

craig.topper added inline comments.Jun 21 2017, 2:00 PM
include/llvm/MC/MCAsmBackend.h
22

I don't think MC layer supposed to use any of the Machine* classes.

zvi edited edge metadata.Jun 21 2017, 2:08 PM

Please update the patch with a context that contains the full file (svn diff -U99999 ...)

Can you add a short section to docs/CodeGenerator.rst explaining how code padding works? It will make it easier to review this patch and following patches, and would serve as a good reference.

It would be helpful if you uploaded the next patch in the queue to justify this patch.

include/llvm/MC/MCCodePadder.h
71

Where is this used?

include/llvm/MC/MCFragment.h
44

check the alignment

327

Please add a description for this class

338

Use:
struct Foo {};

instead of:
typedef struct {} Foo;

339

check alignment

407

Not sure we need this method

lib/MC/MCAsmBackend.cpp
21

Avoid the need for explicit initialization and destruction by using a unique_ptr<> or a similar object.
Or even better (if possible) avoid allocating on the heap.

25

is getOrCreateCodePadder a better name?

26

Is there a way to avoid these repeating runtime checks by requiring the user to ensure createCodePadder() happends before any getCodePadder()?
If handleCodePaddingFunctionStart() is called before other callers of getCodePadder(), maybe the allocation should happen there?

31

Is this function called anywhere other than getCodePadder? If no, i think we don't need it

lib/MC/MCCodePadder.cpp
64

if (SealedSections.count(...)) is a bit more compact

196

auto

228

Maybe "Fragment key already exists" would be more accurate?

234

auto

opaparo added inline comments.Jun 26 2017, 6:19 AM
include/llvm/MC/MCAsmBackend.h
22

Thanks for the heads up. I am now looking for ways to remove those unwanted dependencies.

opaparo updated this revision to Diff 106196.Jul 12 2017, 6:47 AM

Removed unwanted dependencies introduces in my previous patch: MC layer no longer depends on CodeGen, Core or Target. This was achieved by introducing a new structure named MCCodePaddingContext that is being created by AsmPrinter and passed on to the MC layer (all the way to MCCodePadder). This replaces the direct use of TargetMachine, MachineBasicBlock, MachineFunction and MachineLoopInfo by the MCCodePadder.
Note: This time I used "svn diff -x -U999999" to create the patch, so comparison between this patch and the previous one may prove to be challenging (This had to be done for future work and review).

opaparo marked 14 inline comments as done.Jul 12 2017, 6:55 AM

I'm going to add @rafael to this since he's much more familiar with the MC layer than myself.

lib/MC/MCAssembler.cpp
938

Remove the blank lines above and below your code to match the prevailing style of this switch.

Adding folks here with a strong interest in this. =]

Can you send the full patch so that we can help evaluate it?

gberry added a subscriber: gberry.Aug 17 2017, 7:14 AM

Tried the two patches with our internal benchmark -- the alignment related performance issue is still there. The problem disappears with -mllvm -x86-experimental-pref-loop-alignment=5 is used.

Tried the two patches with our internal benchmark -- the alignment related performance issue is still there. The problem disappears with -mllvm -x86-experimental-pref-loop-alignment=5 is used.

I can think of two reasons that might have caused the patch to fail to help you:

  1. This optimization is not enabled for all architectures. Use -target-cpu to specify the target (if you haven't already) to make sure you are compiling for the desired architecture, which will enable the optimization (if relevant). See the constructor of X86MCCodePadder for the list of architectures for which the optimization is enabled.
  2. You might have encountered a different alignment issue than the one handled by this patch. I will send you soon another patch (which was not yet uploaded for review) that addresses another alignment issue (in addition to the first one), which will hopefully handle your case. Regardless of the results you'll get, could you send a small reproducer for this problem? I would like to investigate it and see the kind of alignment issue we are facing here.
gadi.haber added inline comments.Oct 15 2017, 7:04 AM
include/llvm/CodeGen/AsmPrinter.h
630

Consider to rename the method to: setupCodePaddingContext

include/llvm/MC/MCAsmBackend.h
153

when --> after

include/llvm/MC/MCCodePadder.h
24

MCPaddingFragment --> MCCodePaddingFragment ?

47

PaddingPolicies --> CodePaddingPolicies ?

216

basicBlockRequiresPaddingFragment --> isBasicBlockRequiresPaddingFragment ?

224

instructionRequiresPaddingFragment --> isInstructionRequiresPaddingFragment

236

computeRangePenaltyWeight --> computeScopePenaltyWeight ?

include/llvm/MC/MCFragment.h
332

MCPaddingFragment --> MCCodePaddingFragment

335

PaddingPoliciesMask --> AggregatedPaddingPoliciesMask ?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2607

IsPaddingActive --> IsCodePaddingActive ?

opaparo retitled this revision from Adding code padding for performance stability - infrastructure to [MC] Adding code padding for performance stability - infrastructure. NFC..Oct 16 2017, 1:46 AM
opaparo updated this revision to Diff 119116.Oct 16 2017, 1:52 AM
  1. MCAsmBackend c'tor now takes a unique_ptr of MCCodePadder instead of a raw pointer
  2. MCCodePaddingContext has a new field: IsBasicBlockReachableViaBranch
  3. Changed names of a function and several variables
opaparo marked 3 inline comments as done.Oct 16 2017, 1:58 AM
opaparo added inline comments.
include/llvm/MC/MCCodePadder.h
24

I actually don't want to limit the padding to code padding. My current implementation indeed pads with code (nops), but future implementation may pad in a different way.

216

This convention also exists in LLVM, I rather not change the name.

224

This convention also exists in LLVM, I rather not change the name.

gadi.haber accepted this revision.Oct 18 2017, 12:41 AM
This revision is now accepted and ready to land.Oct 18 2017, 12:41 AM
This revision was automatically updated to reflect the committed changes.
xur added a subscriber: xur.Nov 29 2017, 4:09 PM