Implementation of padding code with nop instructions in key places that will cause performance improvement.
The implementation is such that the adding of the nops is done in the Assembler after the layout is done and all IPs and alignments are known.
The patch is the result of this RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107223.html
This is the first patch of the implemented solution, and contains only the infrastructure needed for this change (this by itself is a NFC).
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Should we expose this to the .s?
That would make it easy to test codegen because you would just need to
check for things like .pad_bb_begin.It would make it easier to test the layout as you could use a simple .s
file for that.Cheers,
Rafael
Hi Rafael,
Unfortunately it cannot be exposed to the .s. I'll explain:
When generating an object file, MCObjectStreamer (or more specifically, one of its descendants) is used as the MCStreamer by MCAsmPrinter. All the decisions regarding the places to insert padding and the final size of the padding are made in the MCAsmBackend, which is a member of MCObjectStreamer (via the MCAssembler). Some of those decisions (the decisions regarding the final size of the padding) rely on the final layout of the code (as manifested by MCAsmLayout). On the other hand, when generating a .s file, MCAsmStreamer is used as the MCStreamer by MCAsmPrinter. Unlike MCObjectStreamer, this streamer does not require a MCAsmBackend. In addition, .s files don't have a layout, so there's no MCAsmLayout either, which, as I stated, is needed for the decision making. All in all, we can't mimic the padding behavior when generating a .s file.
Hope that explains it.
Thanks,
Omer
Can you please repost the diff with full context using one of the commands mentioned here. http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface
That will allow phabricator to show the entire file that's being touched so we can see areas around your patch in the web interface.
include/llvm/MC/MCCodePadder.h | ||
---|---|---|
42 ↗ | (On Diff #103181) | I think LLVM coding practices would prefer DenseSet here, but I'm not sure. |
63 ↗ | (On Diff #103181) | I think we would normally use DenseMap here. |
include/llvm/MC/MCAsmBackend.h | ||
---|---|---|
22 ↗ | (On Diff #103181) | I don't think MC layer supposed to use any of the Machine* classes. |
Please update the patch with a context that contains the full file (svn diff -U99999 ...)
Can you add a short section to docs/CodeGenerator.rst explaining how code padding works? It will make it easier to review this patch and following patches, and would serve as a good reference.
It would be helpful if you uploaded the next patch in the queue to justify this patch.
include/llvm/MC/MCCodePadder.h | ||
---|---|---|
70 ↗ | (On Diff #103181) | Where is this used? |
include/llvm/MC/MCFragment.h | ||
44 ↗ | (On Diff #103181) | check the alignment |
327 ↗ | (On Diff #103181) | Please add a description for this class |
338 ↗ | (On Diff #103181) | Use: instead of: |
339 ↗ | (On Diff #103181) | check alignment |
407 ↗ | (On Diff #103181) | Not sure we need this method |
lib/MC/MCAsmBackend.cpp | ||
21 ↗ | (On Diff #103181) | Avoid the need for explicit initialization and destruction by using a unique_ptr<> or a similar object. |
25 ↗ | (On Diff #103181) | is getOrCreateCodePadder a better name? |
26 ↗ | (On Diff #103181) | Is there a way to avoid these repeating runtime checks by requiring the user to ensure createCodePadder() happends before any getCodePadder()? |
31 ↗ | (On Diff #103181) | Is this function called anywhere other than getCodePadder? If no, i think we don't need it |
lib/MC/MCCodePadder.cpp | ||
63 ↗ | (On Diff #103181) | if (SealedSections.count(...)) is a bit more compact |
195 ↗ | (On Diff #103181) | auto |
227 ↗ | (On Diff #103181) | Maybe "Fragment key already exists" would be more accurate? |
233 ↗ | (On Diff #103181) | auto |
include/llvm/MC/MCAsmBackend.h | ||
---|---|---|
22 ↗ | (On Diff #103181) | Thanks for the heads up. I am now looking for ways to remove those unwanted dependencies. |
Removed unwanted dependencies introduces in my previous patch: MC layer no longer depends on CodeGen, Core or Target. This was achieved by introducing a new structure named MCCodePaddingContext that is being created by AsmPrinter and passed on to the MC layer (all the way to MCCodePadder). This replaces the direct use of TargetMachine, MachineBasicBlock, MachineFunction and MachineLoopInfo by the MCCodePadder.
Note: This time I used "svn diff -x -U999999" to create the patch, so comparison between this patch and the previous one may prove to be challenging (This had to be done for future work and review).
Tried the two patches with our internal benchmark -- the alignment related performance issue is still there. The problem disappears with -mllvm -x86-experimental-pref-loop-alignment=5 is used.
I can think of two reasons that might have caused the patch to fail to help you:
- This optimization is not enabled for all architectures. Use -target-cpu to specify the target (if you haven't already) to make sure you are compiling for the desired architecture, which will enable the optimization (if relevant). See the constructor of X86MCCodePadder for the list of architectures for which the optimization is enabled.
- You might have encountered a different alignment issue than the one handled by this patch. I will send you soon another patch (which was not yet uploaded for review) that addresses another alignment issue (in addition to the first one), which will hopefully handle your case. Regardless of the results you'll get, could you send a small reproducer for this problem? I would like to investigate it and see the kind of alignment issue we are facing here.
include/llvm/CodeGen/AsmPrinter.h | ||
---|---|---|
630 ↗ | (On Diff #106196) | Consider to rename the method to: setupCodePaddingContext |
include/llvm/MC/MCAsmBackend.h | ||
153 ↗ | (On Diff #106196) | when --> after |
include/llvm/MC/MCCodePadder.h | ||
24 ↗ | (On Diff #106196) | MCPaddingFragment --> MCCodePaddingFragment ? |
47 ↗ | (On Diff #106196) | PaddingPolicies --> CodePaddingPolicies ? |
216 ↗ | (On Diff #106196) | basicBlockRequiresPaddingFragment --> isBasicBlockRequiresPaddingFragment ? |
224 ↗ | (On Diff #106196) | instructionRequiresPaddingFragment --> isInstructionRequiresPaddingFragment |
236 ↗ | (On Diff #106196) | computeRangePenaltyWeight --> computeScopePenaltyWeight ? |
include/llvm/MC/MCFragment.h | ||
332 ↗ | (On Diff #106196) | MCPaddingFragment --> MCCodePaddingFragment |
335 ↗ | (On Diff #106196) | PaddingPoliciesMask --> AggregatedPaddingPoliciesMask ? |
lib/CodeGen/AsmPrinter/AsmPrinter.cpp | ||
2607 ↗ | (On Diff #106196) | IsPaddingActive --> IsCodePaddingActive ? |
- MCAsmBackend c'tor now takes a unique_ptr of MCCodePadder instead of a raw pointer
- MCCodePaddingContext has a new field: IsBasicBlockReachableViaBranch
- Changed names of a function and several variables
include/llvm/MC/MCCodePadder.h | ||
---|---|---|
24 ↗ | (On Diff #106196) | I actually don't want to limit the padding to code padding. My current implementation indeed pads with code (nops), but future implementation may pad in a different way. |
216 ↗ | (On Diff #106196) | This convention also exists in LLVM, I rather not change the name. |
224 ↗ | (On Diff #106196) | This convention also exists in LLVM, I rather not change the name. |