This is an archive of the discontinued LLVM Phabricator instance.

Differential D75203

[X86] Relax existing instructions to reduce the number of nops needed for alignment purposes
ClosedPublic

Authored by reames on Feb 26 2020, 11:21 AM.

Details

Reviewers

MaskRay
jyknight
craig.topper
tstellar
skan

Commits

rGf708c823f06c: [X86] Relax existing instructions to reduce the number of nops needed for…

Summary

If we have an explicit align directive, we currently default to emitting nops to fill the space. As discussed in the context of the prefix padding work for branch alignment (D72225), we're allowed to play other tricks such as extending the size of previous instructions instead.

This patch will convert near jumps to far jumps if doing so decreases the number of bytes of nops needed for a following align. It does so as a post-pass after relaxation is complete. It intentionally works without moving any labels or doing anything which might require another round of relaxation.
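As a rough illustration of the post-pass described above, here is a minimal sketch. It is not the patch's code: `RelaxableInst` and `padViaRelaxation` are hypothetical names, and instruction sizes are supplied by hand rather than produced by an encoder. It walks backwards from the align directive and grows an instruction only if the growth fits inside the nop bytes that would otherwise be emitted, so the end of the padded region never moves.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a relaxable instruction: its current encoded
// size and the size of its relaxed (longer) encoding, both in bytes.
struct RelaxableInst {
  unsigned Size;
  unsigned RelaxedSize;
};

// Insts lists the relaxable instructions preceding an align directive, in
// program order. NopBytes is the padding the directive currently needs.
// Returns the number of nop bytes still required after padding.
unsigned padViaRelaxation(std::vector<RelaxableInst> &Insts,
                          unsigned NopBytes) {
  // Prefer the instructions closest to the align directive.
  for (auto It = Insts.rbegin(); It != Insts.rend() && NopBytes != 0; ++It) {
    assert(It->RelaxedSize >= It->Size && "size decrease during relaxation?");
    unsigned Delta = It->RelaxedSize - It->Size;
    if (Delta == 0)
      continue; // Already fully relaxed; safe to look further back.
    if (Delta > NopBytes)
      break; // Does not fit, and we must not skip past an unrelaxed branch.
    It->Size = It->RelaxedSize;
    NopBytes -= Delta;
  }
  return NopBytes;
}
```

For example, assuming a 2-byte `jne` that relaxes to 6 bytes sits in front of 4 bytes of padding (sizes chosen to resemble the `test_pad_via_relax` case in this revision), the sketch consumes all 4 bytes and no nops remain.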

The point of this patch is mainly to mock out the approach. The optimization implemented is real, and possibly useful, but the main point is to demonstrate an approach for implementing such "pad previous instruction" approaches. The key notion in this patch is to treat padding previous instructions as an optional optimization, not as a core part of relaxation. The benefit is that we avoid the potential concern about increasing the distance between two labels and thus causing further, potentially non-local, code growth due to relaxation. The downside is that we may miss some opportunities to avoid nops.

For the moment, this patch only implements a small set of existing relaxations. Assuming the approach is satisfactory, I plan to extend this to a broader set of instructions where there are obvious "relaxations" which are roughly performance equivalent.

Note that this patch *doesn't* change which instructions are relaxable. We may wish to explore that separately to increase optimization opportunity, but I figured that deserved its own separate discussion.

There are possible downsides to this optimization (and all "pad previous instruction" variants). The two major ones are potentially increasing instruction fetch and perturbing uop caching (i.e. the usual alignment risks). Specifically:

If we pad an instruction such that it crosses a fetch window (16 bytes on modern X86), we may cause the decoder to trigger a fetch it wouldn't have otherwise. This can affect both decode speed and icache pressure.
Intel's uop cache has particular restrictions on which instruction combinations can fit in a particular way. By moving instructions around, we can both cause misses and change misses into hits. Many of the most painful cases are around branch density, so I don't expect this to be too bad on the whole.
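The first risk comes down to simple address arithmetic. The helper below is a hypothetical sketch: the name `crossesWindow` and the default 16-byte window are assumptions taken from the description above, not code from the patch. It reports whether an instruction's bytes span more than one aligned fetch window.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if an instruction occupying [Offset, Offset + Size) touches
// more than one aligned window of WindowSize bytes.
bool crossesWindow(uint64_t Offset, uint64_t Size, uint64_t WindowSize = 16) {
  assert(WindowSize != 0 && "window size must be non-zero");
  if (Size == 0)
    return false;
  uint64_t FirstWindow = Offset / WindowSize;
  uint64_t LastWindow = (Offset + Size - 1) / WindowSize;
  return FirstWindow != LastWindow;
}
```

Under these assumptions, a 2-byte jump at offset 0x20d stays inside one window, while the same jump relaxed to 6 bytes at that offset spans two, which is the kind of perturbation the summary warns about.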

On the whole, I expect to see small swings (i.e. the typical alignment change problem), but nothing major or systematic in either direction.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. · Feb 26 2020, 11:21 AM

Herald added a project: Restricted Project. · Feb 26 2020, 11:21 AM

Herald added subscribers: bollu, hiraditya, mcrosier.

Remove some stray debugging code, and tweak style a bit as a result of no-longer needed variables.

Fix a minor bug in new test - intel vs att syntax bites again.

Harbormaster failed remote builds in B47350: Diff 246795! · Feb 26 2020, 12:20 PM

reames edited the summary of this revision. · Feb 26 2020, 2:34 PM

Add support for boundary align

At first, I thought we'd be able to handle other directives as well (such as .org), but a closer read indicates I had misread the semantics of those.

LuoYuanke added a subscriber: LuoYuanke. · Feb 26 2020, 5:51 PM

skan added a subscriber: skan. · Feb 26 2020, 6:17 PM

skan added inline comments. · Feb 26 2020, 7:31 PM

llvm/include/llvm/MC/MCFragment.h
278 ↗	(On Diff #246848)	For data
279 ↗	(On Diff #246848)	"Value is repeated ValueSize times": I am afraid this statement is not true. `ValueSize` is the size (in bytes) of the integer `Value`, and `Value` is repeated `FragmentSize / AF.getValueSize()` times.
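To make the corrected semantics concrete, here is a small hypothetical sketch. `expandFill` is not an LLVM function, and little-endian byte order is an assumption made only for illustration. The fill value occupies `ValueSize` bytes and is emitted `FragmentSize / ValueSize` times.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Expand an alignment fill: Value is ValueSize bytes wide and is repeated
// FragmentSize / ValueSize times (not ValueSize times).
std::vector<uint8_t> expandFill(uint64_t Value, unsigned ValueSize,
                                uint64_t FragmentSize) {
  assert(ValueSize >= 1 && ValueSize <= 8 && "value must fit in 64 bits");
  assert(FragmentSize % ValueSize == 0 && "fragment not a multiple of value");
  std::vector<uint8_t> Out;
  for (uint64_t I = 0, N = FragmentSize / ValueSize; I != N; ++I)
    for (unsigned B = 0; B != ValueSize; ++B)
      Out.push_back(static_cast<uint8_t>(Value >> (8 * B))); // low byte first
  return Out;
}
```

With a 2-byte value and a 6-byte fragment, the value appears three times, which is the case the original comment got wrong.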

skan mentioned this in D75268: A light-weight solution to align branches within 32B boundary by prefix padding. · Feb 27 2020, 8:50 AM

reames marked an inline comment as done. · Feb 27 2020, 9:44 AM

reames added inline comments.

llvm/include/llvm/MC/MCFragment.h
279 ↗	(On Diff #246848)	You're correct. I'm just going to remove this. The comment change isn't really related to the patch.

Remove unrelated comment, and a couple of splittable changes.

Restructure code triggered by discussion on D75268. If I simply admit that padding instructions is inherently target specific, we have a nice clean API for any form of target specific padding. As a separate patch, I'll add prefix padding to this API for x86. It also appears that Hexagon has a bundle nop padding (currently via finishLayout) which can be refactored into this API as well.

On top of this, I prototyped what prefix padding for relaxable instructions might look like in D75300.

reames marked an inline comment as done. · Feb 27 2020, 1:50 PM

reames added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
689	Note to self: Since this is increasing the size of the instruction, we need to make sure we're not creating a branch boundary crossing case. If we are, we can just skip the optional expansion here.

skan added inline comments. · Feb 29 2020, 1:00 AM

llvm/lib/MC/MCAssembler.cpp
1186–1187 ↗	(On Diff #247044)	Capitalize the first character of the variable `it`.
1198–1200 ↗	(On Diff #247044)	`F.getKind() == MCFragment::FT_Data || F.getKind() == FT_CompactEncodedInst` I think we can handle `MCCompactEncodedInstFragment` here.
1214–1215 ↗	(On Diff #247044)	The return type of `getFragmentOffset` and `computeFragmentSize` is `uint64_t`, so I suggest using `uint64_t` here.
1266–1267 ↗	(On Diff #247044)	uint64_t
llvm/test/MC/X86/align-via-relaxation.s
32	around

skan added a reviewer: skan. · Feb 29 2020, 1:06 AM

skan added inline comments. · Feb 29 2020, 1:22 AM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
689	I think we can add the code `if needAlignInst(Inst) return false; auto *PF = cast_or_null<MCBoundaryAlignFragment>(RF.getPrevNode()); if(PF && PF.canEmitNops()) return false;` in the function `canBeRelaxedForPadding` to make sure we're not creating a branch boundary crossing case.

skan added inline comments. · Feb 29 2020, 2:29 AM

llvm/lib/MC/MCAssembler.cpp
1249–1251 ↗	(On Diff #247044)	We can avoid setting size for boundaryalign with D75404

MaskRay added inline comments. · Feb 29 2020, 10:32 PM

llvm/lib/MC/MCAssembler.cpp
1186 ↗	(On Diff #247044)	`for (MCSection &Sec : *this) {`
1192 ↗	(On Diff #247044)	`for (MCFragment &F : Sec) {`
1198–1200 ↗	(On Diff #247044)	It seems we can use `switch (F.getKind())` here.
1237 ↗	(On Diff #247044)	The comment can be moved before the `if`. Then the brace can be deleted.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
672	`*RF.getSubtargetInfo()` -> `STI` (it is a member variable)
675	`SmallString<16>` should be sufficient. (The SmallString used in MCAssembler::relaxInstruction should be fixed instead)
llvm/test/MC/X86/align-via-relaxation.s
2	The `RUN` line is unnecessarily indented. It is not common. `-pc-linux-gnu` can be deleted to make the line length smaller.
7	`.text` and `.section .text` do the same thing. We can actually delete `.file`, `.text` and `.section .text`.
11	`jmp` is misaligned.

MaskRay added inline comments. · Feb 29 2020, 10:41 PM

llvm/lib/MC/MCAssembler.cpp
793 ↗	(On Diff #247044)	I suspect we may have to do `while (layoutOnce(Layout))` and `optimizeLayout(Layout)` in a lockstep. `optimizeLayout` may cause some JCC_1/JMP_1 in MCRelaxableFragment to need relaxation.
794 ↗	(On Diff #247044)	We can dump the layout only when something has changed. This requires a change to `optimizeLayout`'s return type.
1028 ↗	(On Diff #247044)	Unneeded

reames marked 7 inline comments as done. · Mar 2 2020, 3:14 PM

reames added inline comments.

llvm/lib/MC/MCAssembler.cpp
793 ↗	(On Diff #247044)	The whole point of the design is that we don't have to iterate them. If you see a case where we do, please point it out; that's a bug. See the comments about which starting offsets are changing and why we're not skipping over any relaxable fragment.
794 ↗	(On Diff #247044)	I see no value in this. I can make the change if you want, but I don't think it's actually useful.
1198–1200 ↗	(On Diff #247044)	I'd written it with a switch originally. It was much harder to read. I could use a switch for the filtering and then fall into the complicated logic for the align cases if you want?
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
672	I'd prefer to leave this as is. It follows the idiom of the other uses of relaxInstruction, and I haven't cross checked that the two subtarget infos are identical.
llvm/test/MC/X86/align-via-relaxation.s
7	We need to know it's a text section not a data section.
11	Er, what? Are you possibly thinking of the branch-align feature? That's not enabled in this test file.
32	No. That would change meaning of comment.

Address most of the stylistic comments. A further update to address the one functional bug is forthcoming.

Fix the functional bug and add tests for it.

Remove the relaxFragment loop. Now that boundary align fragments no longer need relaxed to update size, this is now just a nop.

Give up and just make all the code target specific. I'd originally hoped to apply this for other targets as well, but when looking at Hexagon (the only one with anything analogous), I realized that forcing a common implementation was just adding complexity for little value. Each backend has its own set of constraints as to which padding is valuable. Instead of introducing new APIs, just follow the precedent that Hexagon already set.

Once this is in and has baked a bit, we can revisit to see if any of it can be made target agnostic after all.

craig.topper added inline comments. · Mar 2 2020, 4:49 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
755	it's ->its
778	The end of this line and into the next line doesn't read right. It says "a it's"
779	it's -> its

skan added inline comments. · Mar 2 2020, 5:55 PM

llvm/test/MC/X86/align-via-relaxation.s
11	I think his meaning is "jmp" is misaligned in a text editor.

skan mentioned this in D75357: [X86] Add a private member function determinePaddingPrefix for X86AsmBackend. · Mar 2 2020, 10:16 PM

skan mentioned this in D75404: [X86] Not track size of the boudaryalign fragment during the layout. · Mar 2 2020, 10:28 PM

In D75203#1901962, @reames wrote:

Remove the relaxFragment loop. Now that boundary align fragments no longer need relaxed to update size, this is now just a nop.

We may need to add the relaxFragment loop back since the commit for D75404 was reverted.

Rebase and address grammar comments.

Remove tabs from test file to fix (textual) alignment.

reames marked 9 inline comments as done. · Mar 3 2020, 10:21 AM

Add command line options to selectively disable. This is to ease performance regression triage post commit.

I think this is ready to land. I'm just waiting on an LGTM.

skan added inline comments. · Mar 3 2020, 7:14 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
661	Is16BitMode
802	Capitalize i here otherwise clang tidy would report a warning
811	Capitalize i and N here otherwise clang tidy would report a warning

reames marked 3 inline comments as done. · Mar 3 2020, 7:54 PM

reames added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
661	This code is copied from existing example. Will change both after commit.
802	"i" and "I" are different variables. Also, if clang-tidy reports a warning for "i", clang-tidy has a bug.
811	No, clang-tidy is wrong.

LGTM. I applied this patch, it passed the check-all tests, and I found no new failures when running SPEC.

This revision is now accepted and ready to land. · Mar 3 2020, 8:52 PM

Closed by commit rGf708c823f06c: [X86] Relax existing instructions to reduce the number of nops needed for… (authored by reames). · Mar 4 2020, 4:54 PM

This revision was automatically updated to reflect the committed changes.

reames mentioned this in D75300: Support prefix padding for alignment purposes (Relaxable instructions only). · Mar 9 2020, 1:57 PM

reames mentioned this in D42616: [X86] Emit 11-byte or 15-byte NOPs on recent AMD targets, else default to 10-byte NOPs (PR22965). · Mar 9 2020, 3:13 PM

reames mentioned this in rGa79863f2f727: Support prefix padding for alignment purposes (Relaxable instructions only). · Mar 15 2020, 7:54 PM

The patch locates MCRelaxableFragment's within two MCSymbol's and relaxes some MCRelaxableFragment's to reduce the size of a MCAlignFragment. The behavior is hence dependent on additional temporary labels due to -g.

@condy reported an example where clang -O1 -g and clang -O1 have different .text content
https://bugs.llvm.org/show_bug.cgi?id=42138#c13 (a MCRelaxableFragment (jmp) has 5 bytes with -O1 and 2 bytes with -O1 -g)
(There is also a thread https://lists.llvm.org/pipermail/llvm-dev/2021-January/147568.html CC @vsk for thoughts).

.p2align 4, 0x90 is common due to loops. For a larger program with a lot of temporary labels (-g vs. non -g), the difference in assembly output may be quite significant.

--- a.s 2021-01-11 13:57:25.055152745 -0800
+++ b.s 2021-01-11 13:57:20.627140370 -0800
@@ -2,2 +2,3 @@
        .file   "czw.cc"
+       .file   1 "/tmp/c" "czw.cc"
        .globl  _ZN1k1lEv                       # -- Begin function _ZN1k1lEv
@@ -7,2 +8,3 @@
 .Lfunc_begin0:
+       .loc    1 26 0                          # czw.cc:26:0
        .cfi_startproc
@@ -11,2 +13,3 @@
 # %bb.0:                                # %entry
+       #DEBUG_VALUE: l:this <- $rdi
        pushq   %r15
@@ -27,2 +30,4 @@
 .Ltmp0:
+.Ltmp3:
+       #DEBUG_VALUE: l:this <- $rbx
        .cfi_escape 0x2e, 0x00
@@ -30,9 +35,17 @@
        movq    %rsp, %rdx
+.Ltmp4:
+       .loc    1 27 15 prologue_end            # czw.cc:27:15
        movl    $.L.str, %esi
        callq   _ZN1eIciEC1EPc9allocator

I am thinking of whether we can use some properties of MCSymbol to filter out some MCSymbol's in LabeledFragments but cannot find one which works (for example, IsUsedInReloc is only used in ObjectWriter, which is after the assembler optimization).

@condy: -mllvm -x86-pad-for-align=false is a workaround.
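The dependence on temporary labels described in this comment can be reproduced with a toy model. This is a sketch under stated assumptions: `Frag`, `FragKind`, and `nopsAfterPadding` are hypothetical names, and the model only mirrors the "clear the candidate list when a fragment carries a label" rule visible in `finishLayout` in this revision, not the real fragment layout.

```cpp
#include <cassert>
#include <vector>

enum class FragKind { Relaxable, Data, Align };

// Toy fragment. For Align, Size is the nop padding currently required.
struct Frag {
  FragKind Kind;
  bool HasLabel;        // a symbol (possibly a temporary label) points here
  unsigned Size;        // current size in bytes
  unsigned RelaxedSize; // size after relaxation (only used for Relaxable)
};

// Returns the total number of nop bytes left in all Align fragments.
unsigned nopsAfterPadding(std::vector<Frag> Frags) {
  std::vector<Frag *> Candidates;
  unsigned Nops = 0;
  for (Frag &F : Frags) {
    if (F.HasLabel)
      Candidates.clear(); // padding earlier fragments would move this label
    if (F.Kind == FragKind::Relaxable) {
      Candidates.push_back(&F);
    } else if (F.Kind == FragKind::Align) {
      unsigned Remaining = F.Size;
      while (!Candidates.empty() && Remaining != 0) {
        Frag *C = Candidates.back();
        Candidates.pop_back();
        assert(C->RelaxedSize >= C->Size);
        unsigned Delta = C->RelaxedSize - C->Size;
        if (Delta == 0)
          continue; // already relaxed
        if (Delta > Remaining)
          break; // cannot skip an unrelaxed instruction
        C->Size = C->RelaxedSize;
        Remaining -= Delta;
      }
      Candidates.clear();
      Nops += Remaining;
    }
  }
  return Nops;
}
```

In this model, a 2-byte `jmp` followed by unlabeled data and 3 bytes of alignment padding is relaxed to 5 bytes and leaves no nops. Attaching a label to the data fragment, as an extra `.Ltmp` from `-g` might, leaves the `jmp` at 2 bytes and keeps the 3 nop bytes, matching the 5-byte versus 2-byte difference reported above.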

Herald added subscribers: dantrushin, pengfei. · Jan 11 2021, 3:26 PM

In D75203#2491618, @MaskRay wrote:

I am thinking of whether we can use some properties of MCSymbol to filter out some MCSymbol's in LabeledFragments but cannot find one which works (for example, IsUsedInReloc is only used in ObjectWriter, which is after the assembler optimization).

@condy: -mllvm -x86-pad-for-align=false is a workaround.

Option -x86-pad-for-align is for assembler optimization. I think we should set its default value to false and add some comments that it may interact with labels in text sections. If someone turns it on explicitly, the difference in machine code will be fine since it's an assembler optimization.

MaskRay mentioned this in D94542: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or w/o -g. · Jan 12 2021, 12:15 PM

This review has been closed for nearly 10 months. Please move discussion of proposed changes to the bug or another relevant location.

Right, there's no fundamental reason why moving a label has to be forbidden -- but it'd be extremely complex if we allowed moving a label which could cause the re-layout of a fragment we thought we'd already finalized the offsets for. This would happen if the label offset required relaxation of some instruction/data referencing it. That, then, might require undoing the padding from an instruction we've already padded out, due to less alignment-padding being required overall.

So, maybe we can allow changing the address of labels that CAN'T cause such an issue, then? Can we identify them straightforwardly? We would need to avoid changing the offset of any label which might require further relaxation of instructions in a text segment (because that's all this pass can modify).

But that doesn't mean we can modify all labels in text which are referred to only by non-text fixups. Consider relaxable uleb128 data section containing offsets of labels in text sections -- naively it seems fine to allow further relaxation of the uleb128 data after running the padding...but unfortunately that's wrong, because if relaxable instructions in text point _back_ to labels in the uleb128 section, they may need to be relaxed when/if the address offsets in the uleb128 data grow. I think the criteria we'd need to use is something like: find all sections referenced by a relaxable fixup in the text section. Then find all sections referenced by a relaxable fixup in one of those sections, recursively. This is the set of sections that cannot point TO a label in text, if we want to be able to move it. That seems...complicated, itself.
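The recursive criterion proposed here amounts to a reachability computation over a "section A has a relaxable fixup referring to section B" graph. The sketch below is purely illustrative: `SectionGraph` and `reachableFrom` are hypothetical names, sections are identified by plain strings, and nothing like this exists in the patch.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Edges: section name -> sections referenced by its relaxable fixups.
using SectionGraph = std::map<std::string, std::vector<std::string>>;

// Collect every section reachable from Start by following relaxable
// fixups, transitively. Start itself is included only if a cycle leads
// back to it.
std::set<std::string> reachableFrom(const SectionGraph &RelaxableRefs,
                                    const std::string &Start) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Start};
  while (!Worklist.empty()) {
    std::string Sec = Worklist.back();
    Worklist.pop_back();
    auto It = RelaxableRefs.find(Sec);
    if (It == RelaxableRefs.end())
      continue;
    for (const std::string &Target : It->second)
      if (Seen.insert(Target).second)
        Worklist.push_back(Target);
  }
  return Seen;
}
```

Under the criterion in the comment, a label in text could only be moved if none of the sections in the returned set points to it. A section that merely refers to text, without being reachable from it, stays outside the set.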

More simply, would it be valid to assume that text sections cannot ever refer, even transitively, to debug sections?

In D75203#2494775, @reames wrote:

This review has been closed for nearly 10 months. Please move discussion of proposed changes to the bug or another relevant location.

Created https://bugs.llvm.org/show_bug.cgi?id=48742

MaskRay mentioned this in rGa048ce13e32d: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or…. · Jan 16 2021, 4:40 PM

Diff 248358

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

//===-- X86AsmBackend.cpp - X86 Assembler Backend -------------------------===//		//===-- X86AsmBackend.cpp - X86 Assembler Backend -------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MCTargetDesc/X86BaseInfo.h"		#include "MCTargetDesc/X86BaseInfo.h"
#include "MCTargetDesc/X86FixupKinds.h"		#include "MCTargetDesc/X86FixupKinds.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/BinaryFormat/MachO.h"		#include "llvm/BinaryFormat/MachO.h"
#include "llvm/MC/MCAsmBackend.h"		#include "llvm/MC/MCAsmBackend.h"
		#include "llvm/MC/MCAsmLayout.h"
#include "llvm/MC/MCAssembler.h"		#include "llvm/MC/MCAssembler.h"
		#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDwarf.h"		#include "llvm/MC/MCDwarf.h"
#include "llvm/MC/MCELFObjectWriter.h"		#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCFixupKindInfo.h"		#include "llvm/MC/MCFixupKindInfo.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCMachObjectWriter.h"		#include "llvm/MC/MCMachObjectWriter.h"
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines
cl::opt<bool> X86AlignBranchWithin32BBoundaries(		cl::opt<bool> X86AlignBranchWithin32BBoundaries(
"x86-branches-within-32B-boundaries", cl::init(false),		"x86-branches-within-32B-boundaries", cl::init(false),
cl::desc(		cl::desc(
"Align selected instructions to mitigate negative performance impact "		"Align selected instructions to mitigate negative performance impact "
"of Intel's micro code update for errata skx102. May break "		"of Intel's micro code update for errata skx102. May break "
"assumptions about labels corresponding to particular instructions, "		"assumptions about labels corresponding to particular instructions, "
"and should be used with caution."));		"and should be used with caution."));

		cl::opt<bool> X86PadForAlign(
		"x86-pad-for-align", cl::init(true), cl::Hidden,
		cl::desc("Pad previous instructions to implement align directives"));

		cl::opt<bool> X86PadForBranchAlign(
		"x86-pad-for-branch-align", cl::init(true), cl::Hidden,
		cl::desc("Pad previous instructions to implement branch alignment"));

class X86ELFObjectWriter : public MCELFObjectTargetWriter {		class X86ELFObjectWriter : public MCELFObjectTargetWriter {
public:		public:
X86ELFObjectWriter(bool is64Bit, uint8_t OSABI, uint16_t EMachine,		X86ELFObjectWriter(bool is64Bit, uint8_t OSABI, uint16_t EMachine,
bool HasRelocationAddend, bool foobar)		bool HasRelocationAddend, bool foobar)
: MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {}		: MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {}
};		};

class X86AsmBackend : public MCAsmBackend {		class X86AsmBackend : public MCAsmBackend {
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	public:

bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,		bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,
const MCRelaxableFragment *DF,		const MCRelaxableFragment *DF,
const MCAsmLayout &Layout) const override;		const MCAsmLayout &Layout) const override;

void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,		void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
MCInst &Res) const override;		MCInst &Res) const override;

		bool padInstructionEncoding(MCRelaxableFragment &RF, MCCodeEmitter &Emitter,
		unsigned &RemainingSize) const;
		void finishLayout(MCAssembler const &Asm, MCAsmLayout &Layout) const override;

bool writeNopData(raw_ostream &OS, uint64_t Count) const override;		bool writeNopData(raw_ostream &OS, uint64_t Count) const override;
};		};
} // end anonymous namespace		} // end anonymous namespace

static unsigned getRelaxedOpcodeBranch(const MCInst &Inst, bool is16BitMode) {		static unsigned getRelaxedOpcodeBranch(const MCInst &Inst, bool is16BitMode) {
unsigned Op = Inst.getOpcode();		unsigned Op = Inst.getOpcode();
switch (Op) {		switch (Op) {
default:		default:
▲ Show 20 Lines • Show All 450 Lines • ▼ Show 20 Lines	if (RelaxedOp == Inst.getOpcode()) {
OS << "\n";		OS << "\n";
report_fatal_error("unexpected instruction to relax: " + OS.str());		report_fatal_error("unexpected instruction to relax: " + OS.str());
}		}

Res = Inst;		Res = Inst;
Res.setOpcode(RelaxedOp);		Res.setOpcode(RelaxedOp);
}		}

		static bool canBeRelaxedForPadding(const MCRelaxableFragment &RF) {
		// TODO: There are lots of other tricks we could apply for increasing
		// encoding size without impacting performance.
		auto &Inst = RF.getInst();
		auto &STI = *RF.getSubtargetInfo();
		bool is16BitMode = STI.getFeatureBits()[X86::Mode16Bit];
		return getRelaxedOpcode(Inst, is16BitMode) != Inst.getOpcode();
		}

		bool X86AsmBackend::padInstructionEncoding(MCRelaxableFragment &RF,
		MCCodeEmitter &Emitter,
		unsigned &RemainingSize) const {
		if (!canBeRelaxedForPadding(RF))
		return false;

		MCInst Relaxed;
		relaxInstruction(RF.getInst(), *RF.getSubtargetInfo(), Relaxed);

		SmallVector<MCFixup, 4> Fixups;
		SmallString<15> Code;
		raw_svector_ostream VecOS(Code);
		Emitter.encodeInstruction(Relaxed, VecOS, Fixups, *RF.getSubtargetInfo());
		const unsigned OldSize = RF.getContents().size();
		const unsigned NewSize = Code.size();
		assert(NewSize >= OldSize && "size decrease during relaxation?");
		unsigned Delta = NewSize - OldSize;
		if (Delta > RemainingSize)
		return false;
		RF.setInst(Relaxed);
		RF.getContents() = Code;
		RF.getFixups() = Fixups;
		RemainingSize -= Delta;
		return true;
		}

		void X86AsmBackend::finishLayout(MCAssembler const &Asm,
		MCAsmLayout &Layout) const {
		// See if we can further relax some instructions to cut down on the number of
		// nop bytes required for code alignment. The actual win is in reducing
		// instruction count, not number of bytes. Modern X86-64 can easily end up
		// decode limited. It is often better to reduce the number of instructions
		// (i.e. eliminate nops) even at the cost of increasing the size and
		// complexity of others.
		if (!X86PadForAlign && !X86PadForBranchAlign)
		return;

		DenseSet<MCFragment *> LabeledFragments;
		for (const MCSymbol &S : Asm.symbols())
		LabeledFragments.insert(S.getFragment(false));

		for (MCSection &Sec : Asm) {
		if (!Sec.getKind().isText())
		continue;

		SmallVector<MCRelaxableFragment *, 4> Relaxable;
		for (MCSection::iterator I = Sec.begin(), IE = Sec.end(); I != IE; ++I) {
		MCFragment &F = *I;

		if (LabeledFragments.count(&F))
		Relaxable.clear();

		if (F.getKind() == MCFragment::FT_Data ||
		F.getKind() == MCFragment::FT_CompactEncodedInst)
		// Skip and ignore
		continue;

		if (F.getKind() == MCFragment::FT_Relaxable) {
		auto &RF = cast<MCRelaxableFragment>(*I);
		Relaxable.push_back(&RF);
		continue;
		}

		auto canHandle = [](MCFragment &F) -> bool {
		switch (F.getKind()) {
		default:
		return false;
		case MCFragment::FT_Align:
		return X86PadForAlign;
		case MCFragment::FT_BoundaryAlign:
		return X86PadForBranchAlign;
		}
		};
		// For any unhandled kind, assume we can't change layout.
		if (!canHandle(F)) {
		Relaxable.clear();
		continue;
		}

		const uint64_t OrigOffset = Layout.getFragmentOffset(&F);
		const uint64_t OrigSize = Asm.computeFragmentSize(Layout, F);
		if (OrigSize == 0 || Relaxable.empty()) {
		Relaxable.clear();
		continue;
		}

		// To keep the effects local, prefer to relax instructions closest to
		// the align directive. This is purely about human understandability
		// of the resulting code. If we later find a reason to expand
		// particular instructions over others, we can adjust.
		MCFragment *FirstChangedFragment = nullptr;
		unsigned RemainingSize = OrigSize;
		while (!Relaxable.empty() && RemainingSize != 0) {
		auto &RF = *Relaxable.pop_back_val();
		// Give the backend a chance to play any tricks it wishes to increase
		// the encoding size of the given instruction. Target independent code
		// will try further relaxation, but targets may play further tricks.
		if (padInstructionEncoding(RF, Asm.getEmitter(), RemainingSize))
		FirstChangedFragment = &RF;

		// If we have an instruction which hasn't been fully relaxed, we can't
		// skip past it and insert bytes before it. Changing its starting
		// offset might require a larger negative offset than it can encode.
		// We don't need to worry about larger positive offsets as none of the
		// possible offsets between this and our align are visible, and the
		// ones afterwards aren't changing.
		if (mayNeedRelaxation(RF.getInst(), *RF.getSubtargetInfo()))
		break;
		}
		Relaxable.clear();

		if (FirstChangedFragment) {
		// Make sure the offsets for any fragments in the affected range get
		// updated. Note that this (conservatively) invalidates the offsets of
		// those following, but this is not required.
		Layout.invalidateFragmentsFrom(FirstChangedFragment);
		}

		// BoundaryAlign explicitly tracks its size (unlike align)
		if (F.getKind() == MCFragment::FT_BoundaryAlign)
		cast<MCBoundaryAlignFragment>(F).setSize(RemainingSize);

		const uint64_t FinalOffset = Layout.getFragmentOffset(&F);
		const uint64_t FinalSize = Asm.computeFragmentSize(Layout, F);
		assert(OrigOffset + OrigSize == FinalOffset + FinalSize &&
		"can't move start of next fragment!");
		assert(FinalSize == RemainingSize && "inconsistent size computation?");

		// If we're looking at a boundary align, make sure we don't try to pad
		// its target instructions for some following directive. Doing so would
		// break the alignment of the current boundary align.
		if (F.getKind() == MCFragment::FT_BoundaryAlign) {
		auto &BF = cast<MCBoundaryAlignFragment>(F);
		const MCFragment *F = BF.getNextNode();
		// If the branch is unfused, it is emitted into one fragment, otherwise
		// it is emitted into two fragments at most, the next
		// MCBoundaryAlignFragment(if exists) also marks the end of the branch.
		for (int i = 0, N = BF.isFused() ? 2 : 1;
		i != N && !isa<MCBoundaryAlignFragment>(F);
		++i, F = F->getNextNode(), I++) {
		}
		}
		}
		}

		// The layout is done. Mark every fragment as valid.
		for (unsigned int i = 0, n = Layout.getSectionOrder().size(); i != n; ++i) {
		MCSection &Section = *Layout.getSectionOrder()[i];
		Layout.getFragmentOffset(&*Section.getFragmentList().rbegin());
		Asm.computeFragmentSize(Layout, *Section.getFragmentList().rbegin());
		}
		}

		/// Write a sequence of optimal nops to the output, covering \p Count
		/// bytes.
		/// \return - true on success, false on failure
		bool X86AsmBackend::writeNopData(raw_ostream &OS, uint64_t Count) const {
		static const char Nops[10][11] = {
		// nop
		"\x90",
		// xchg %ax,%ax

llvm/test/MC/X86/align-branch-64.s

		test_indirect:
		.endr
		jmpq *(%rax)

		.p2align 4
		.type bar,@function
		bar:
		retq

		# CHECK: test_pad_via_relax:
		# CHECK: 200: testq
		# CHECK: 203: jne
		# CHECK: 209: int3
		# note 6 byte jne which could be a 2 byte jne, but is instead
		# expanded for padding purposes
		# CHECK-NOT: nop
		# CHECK: 220: callq
		.global test_pad_via_relax
		.p2align 5
		test_pad_via_relax:
		testq %rax, %rax
		jnz bar
		.rept 23
		int3
		.endr
		callq bar
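
The offsets in the CHECK lines above can be verified by hand. With a 2-byte jne, testq (3 bytes), the jne, and the 23 int3 bytes would put the callq at 0x21c, and a 5-byte callq starting there spans 0x21c through 0x220. Growing the jne to its 6-byte form supplies exactly the 4 bytes needed to start the callq at 0x220. The sketch below is only an illustration of that check: `crossesBoundary` is a hypothetical helper, not a function from the patch, and the 32-byte boundary is an assumption inferred from the expected offsets.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch): returns true if an
// instruction occupying [Offset, Offset + Size) spans a multiple of
// Boundary. Size is assumed to be nonzero.
bool crossesBoundary(uint64_t Offset, uint64_t Size, uint64_t Boundary) {
  const uint64_t FirstBlock = Offset / Boundary;
  const uint64_t LastBlock = (Offset + Size - 1) / Boundary;
  return FirstBlock != LastBlock;
}
```

Under the same assumption, the negative tests that follow behave the same way: a callq placed at 0x25f would also cross a 32-byte boundary, which is why one byte of padding is still required there.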

		# This case looks really tempting to pad, but doing so for the call causes
		# the jmp to be misaligned.
		# CHECK: test_pad_via_relax_neg1:
		# CHECK: 240: int3
		# CHECK: 25a: testq
		# CHECK: 25d: jne
		# CHECK: 25f: nop
		# CHECK: 260: callq
		.global test_pad_via_relax_neg1
		.p2align 5
		test_pad_via_relax_neg1:
		.rept 26
		int3
		.endr
		testq %rax, %rax
		jnz bar
		callq bar

		# Same as previous, but without fusion
		# CHECK: test_pad_via_relax_neg2:
		# CHECK: 280: int3
		# CHECK: 29d: jmp
		# CHECK: 29f: nop
		# CHECK: 2a0: callq
		.global test_pad_via_relax_neg2
		.p2align 5
		test_pad_via_relax_neg2:
		.rept 29
		int3
		.endr
		jmp bar2
		callq bar2

		bar2:

		.section "unknown"
		.p2align 4
		.type baz,@function
		baz:
		retq

llvm/test/MC/X86/align-via-relaxation.s

This file was added.

				# RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s | llvm-objdump -d --section=.text - | FileCheck %s

				MaskRay: The `RUN` line is unnecessarily indented. It is not common. `-pc-linux-gnu` can be deleted to make the line length smaller.

				.file "test.c"
				.text
				.section .text
				# Demonstrate that we can relax instructions to provide padding, not
				MaskRay: `.text` and `.section .text` do the same thing. We can actually delete `.file`, `.text` and `.section .text`.
				reames: We need to know it's a text section not a data section.
				# just insert nops. jmps are being used for ease of demonstration.
				# CHECK: .text
				# CHECK: 0: eb 1f jmp 31 <foo>
				# CHECK: 2: e9 1a 00 00 00 jmp 26 <foo>
				MaskRay: `jmp` is misaligned.
				reames: Er, what? Are you possibly thinking of the branch-align feature? That's not enabled in this test file.
				skan: I think his meaning is "jmp" is misaligned in a text editor.
				# CHECK: 7: e9 15 00 00 00 jmp 21 <foo>
				# CHECK: c: e9 10 00 00 00 jmp 16 <foo>
				# CHECK: 11: e9 0b 00 00 00 jmp 11 <foo>
				# CHECK: 16: e9 06 00 00 00 jmp 6 <foo>
				# CHECK: 1b: e9 01 00 00 00 jmp 1 <foo>
				# CHECK: 20: cc int3
				.p2align 4
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				.p2align 5
				int3
				foo:
				ret
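
The expected output above follows from simple arithmetic: seven 2-byte jmps occupy 14 bytes, so reaching the 32-byte alignment would take 18 bytes of nops. Relaxing a jmp from its 2-byte to its 5-byte form adds 3 bytes, so relaxing six of them absorbs all 18 bytes and no nop is needed. A minimal sketch of such a greedy pass is given below. It is an illustration only, not the code from X86AsmBackend.cpp: `RelaxableInst` and `padViaRelaxation` are hypothetical names, and walking the instructions from last to first is an assumption chosen because it matches the CHECK lines, where only the first jmp keeps its short encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one relaxable instruction: the size of its
// current (short) encoding and of its relaxed encoding, in bytes.
struct RelaxableInst {
  uint64_t ShortSize;
  uint64_t RelaxedSize;
  bool Relaxed;
};

// Greedily relax instructions, last to first, whenever the growth fits
// in the nop bytes still required. Returns the nop bytes left over.
uint64_t padViaRelaxation(std::vector<RelaxableInst> &Insts,
                          uint64_t NopBytes) {
  for (auto It = Insts.rbegin(); It != Insts.rend() && NopBytes != 0; ++It) {
    assert(It->RelaxedSize >= It->ShortSize && "relaxation must not shrink");
    const uint64_t Growth = It->RelaxedSize - It->ShortSize;
    if (Growth == 0 || Growth > NopBytes)
      continue; // Would overshoot the alignment point; keep it short.
    It->Relaxed = true;
    NopBytes -= Growth;
  }
  return NopBytes;
}
```

Because each step only consumes bytes that would otherwise be nops, the start of the following fragment never moves, which is the property the summary relies on to avoid another round of relaxation.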

				# Check that we're not shifting aroudn the offsets of labels - doing
				# that would require a further round of relaxation
				skan: around
				reames: No. That would change meaning of comment.
				# CHECK: bar:
				# CHECK: 22: eb fe jmp -2 <bar>
				# CHECK: 24: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
				# CHECK: 2e: 66 90 nop
				# CHECK: 30: 0f 0b ud2

				bar:
				jmp bar
				nobypass:
				.p2align 4
				ud2


				# Canonical toy loop to show benefit - we can align the loop header with
				# fewer nops by relaxing the branch, even though we don't need to
				# CHECK: loop_preheader:
				# CHECK: 45: 48 85 c0 testq %rax, %rax
				# CHECK: 48: 0f 8e 22 00 00 00 jle 34 <loop_exit>
				# CHECK: 4e: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
				# CHECK: 58: 0f 1f 84 00 00 00 00 00 nopl (%rax,%rax)
				# CHECK: loop_header:
				# CHECK: 60: 48 83 e8 01 subq $1, %rax
				# CHECK: 64: 48 85 c0 testq %rax, %rax
				# CHECK: 67: 7e 07 jle 7 <loop_exit>
				# CHECK: 69: e9 f2 ff ff ff jmp -14 <loop_header>
				# CHECK: 6e: 66 90 nop
				# CHECK: loop_exit:
				# CHECK: 70: c3 retq
				.p2align 5
				.skip 5
				loop_preheader:
				testq %rax, %rax
				jle loop_exit
				.p2align 5
				loop_header:
				subq $1, %rax
				testq %rax, %rax
				jle loop_exit
				jmp loop_header
				.p2align 4
				loop_exit:
				ret
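
The nop counts in this test can be checked against the offsets. After the testq at 0x45, a 2-byte jle would end at 0x4a and leave 22 bytes to the next 32-byte boundary; the 6-byte form ends at 0x4e and leaves 18, which matches the 10-byte nopw plus the 8-byte nopl. Likewise, relaxing the final jmp from 2 to 5 bytes reduces the padding before loop_exit from 5 bytes to 2. The helper below is a hypothetical illustration of that computation (the name `nopBytesToAlign` is not from the patch).

```cpp
#include <cstdint>

// Hypothetical helper: number of padding bytes needed to advance Offset
// to the next multiple of Alignment (zero if it is already aligned).
// Alignment is assumed to be nonzero.
uint64_t nopBytesToAlign(uint64_t Offset, uint64_t Alignment) {
  return (Alignment - Offset % Alignment) % Alignment;
}
```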


Diff 248358
