This is an archive of the discontinued LLVM Phabricator instance.

Differential D75203

[X86] Relax existing instructions to reduce the number of nops needed for alignment purposes
ClosedPublic

Authored by reames on Feb 26 2020, 11:21 AM.

Details

Reviewers

MaskRay
jyknight
craig.topper
tstellar
skan

Commits

rGf708c823f06c: [X86] Relax existing instructions to reduce the number of nops needed for…

Summary

If we have an explicit align directive, we currently default to emitting nops to fill the space. As discussed in the context of the prefix padding work for branch alignment (D72225), we're allowed to play other tricks such as extending the size of previous instructions instead.

This patch will convert near jumps to far jumps if doing so decreases the number of bytes of nops needed for a following align. It does so as a post-pass after relaxation is complete. It intentionally works without moving any labels or doing anything which might require another round of relaxation.
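
The post-pass described above can be modeled with a small standalone sketch. This is not the patch's code: the `Candidate` struct and `absorbPadding` function are hypothetical names, and each relaxable instruction is reduced to the number of bytes it grows when relaxed (for example 3, for a 2-byte jmp that becomes a 5-byte jmp).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a relaxable instruction that precedes an align
// directive. Candidates are stored in program order, so the last element
// is the one closest to the align.
struct Candidate {
  uint64_t GrowthBytes; // Extra bytes the instruction occupies once relaxed.
  bool Relaxed;         // Set when the sketch decides to relax it.
};

// Walk backwards from the align directive and relax each candidate whose
// growth still fits into the remaining nop padding. Returns the number of
// nop bytes that are still needed afterwards.
uint64_t absorbPadding(std::vector<Candidate> &Candidates, uint64_t NopBytes) {
  uint64_t Remaining = NopBytes;
  for (auto It = Candidates.rbegin();
       It != Candidates.rend() && Remaining != 0; ++It) {
    if (It->GrowthBytes > Remaining)
      break; // Stop rather than skip: relaxing anything earlier would
             // shift this unrelaxed instruction.
    Remaining -= It->GrowthBytes;
    It->Relaxed = true;
  }
  assert(Remaining <= NopBytes && "padding can only shrink");
  return Remaining;
}
```

With three candidates that each grow by 3 bytes and 7 bytes of padding, the two candidates nearest the align are relaxed and one nop byte remains. Because the total size of the range is unchanged, no label after the align moves.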

The point of this patch is mainly to mock out the approach. The optimization implemented is real, and possibly useful, but the main point is to demonstrate a way of implementing such "pad previous instruction" approaches. The key notion in this patch is to treat padding previous instructions as an optional optimization, not as a core part of relaxation. The benefit to this is that we avoid the potential concern about increasing the distance between two labels and thus causing further potentially non-local code growth due to relaxation. The downside is that we may miss some opportunities to avoid nops.

For the moment, this patch only implements a small set of existing relaxations. Assuming the approach is satisfactory, I plan to extend this to a broader set of instructions where there are obvious "relaxations" which are roughly performance equivalent.

Note that this patch *doesn't* change which instructions are relaxable. We may wish to explore that separately to increase optimization opportunity, but I figured that deserved its own separate discussion.

There are possible downsides to this optimization (and all "pad previous instruction" variants). The two major ones are potentially increasing instruction fetch and perturbing uop caching (i.e. the usual alignment risks). Specifically:

If we pad an instruction such that it crosses a fetch window (16 bytes on modern X86), we may cause the decoder to have to trigger a fetch it wouldn't have otherwise. This can affect both decode speed and icache pressure.
Intel's uop caching has particular restrictions on instruction combinations which can fit in a particular way. By moving around instructions, we can both cause misses and change misses into hits. Many of the most painful cases are around branch density, so I don't expect this to be too bad on the whole.
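
The fetch-window concern can be made concrete with a minimal check. The helper below is a hypothetical sketch, not part of the patch. It assumes the window size is a power of two and uses the 16-byte figure quoted above as the default.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if an instruction occupying [Offset, Offset + Size) spans
// more than one fetch window. WindowBytes is assumed to be a power of two.
bool crossesFetchWindow(uint64_t Offset, uint64_t Size,
                        uint64_t WindowBytes = 16) {
  assert(Size > 0 && "instruction must have at least one byte");
  assert(WindowBytes != 0 && (WindowBytes & (WindowBytes - 1)) == 0 &&
         "window size must be a power of two");
  const uint64_t FirstWindow = Offset / WindowBytes;
  const uint64_t LastWindow = (Offset + Size - 1) / WindowBytes;
  return FirstWindow != LastWindow;
}
```

For instance, a 2-byte jump at offset 13 stays inside one window, while the same jump grown to 5 bytes at that offset covers bytes 13 through 17 and straddles two windows.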

On the whole, I expect to see small swings (i.e. the typical alignment change problem), but nothing major or systematic in either direction.

Diff Detail

Event Timeline

reames created this revision. Feb 26 2020, 11:21 AM

Herald added a project: Restricted Project. Feb 26 2020, 11:21 AM

Herald added subscribers: bollu, hiraditya, mcrosier.

Remove some stray debugging code, and tweak style a bit as a result of no-longer needed variables.

Fix a minor bug in the new test: Intel vs. AT&T syntax bites again.

Harbormaster failed remote builds in B47350: Diff 246795! Feb 26 2020, 12:20 PM

reames edited the summary of this revision. Feb 26 2020, 2:34 PM

Add support for boundary align

At first, I thought we'd be able to handle other directives as well (such as .org), but a closer read indicates I had misread the semantics of those.

LuoYuanke added a subscriber: LuoYuanke. Feb 26 2020, 5:51 PM

skan added a subscriber: skan. Feb 26 2020, 6:17 PM

skan added inline comments. Feb 26 2020, 7:31 PM

llvm/include/llvm/MC/MCFragment.h
278	For data
279	"Value is repeated ValueSize times" , I am afraid this statement is not true. `ValueSize` is the size of the integer (in bytes) of `Value`, and the `Value` is repeated `FragmentSize / AF.getValueSize()` times.
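
The repeat count skan describes can be illustrated with a short sketch. `fillAlignment` is a hypothetical helper, not an LLVM function. It assumes little-endian byte order and a fragment size that is an exact multiple of the value size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fills FragmentSize bytes by repeating the low ValueSize bytes of Value.
// The value is written FragmentSize / ValueSize times, not ValueSize times.
std::vector<uint8_t> fillAlignment(uint64_t FragmentSize, int64_t Value,
                                   unsigned ValueSize) {
  assert(ValueSize >= 1 && ValueSize <= 8 && "value must fit in 8 bytes");
  assert(FragmentSize % ValueSize == 0 && "sketch assumes an exact multiple");
  std::vector<uint8_t> Bytes;
  Bytes.reserve(FragmentSize);
  const uint64_t Count = FragmentSize / ValueSize;
  for (uint64_t I = 0; I != Count; ++I)
    for (unsigned B = 0; B != ValueSize; ++B) // Little-endian byte order.
      Bytes.push_back(
          static_cast<uint8_t>(static_cast<uint64_t>(Value) >> (8 * B)));
  return Bytes;
}
```

A 6-byte fragment filled with the 2-byte value 0x1234 therefore holds three copies of the value, which is the `FragmentSize / ValueSize` count from the comment.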

skan mentioned this in D75268: A light-weight solution to align branches within 32B boundary by prefix padding. Feb 27 2020, 8:50 AM

reames marked an inline comment as done. Feb 27 2020, 9:44 AM

reames added inline comments.

llvm/include/llvm/MC/MCFragment.h
279	You're correct. I'm just going to remove this. The comment change isn't really related to the patch.

Remove unrelated comment, and a couple of splittable changes.

Restructure code triggered by discussion on D75268. If I simply admit that padding instructions is inherently target specific, we have a nice clean API for any form of target specific padding. As a separate patch, I'll add prefix padding to this API for x86. It also appears that Hexagon has a bundle nop padding (currently via finishLayout) which can be refactored into this API as well.

On top of this, I prototyped what prefix padding for relaxable instructions might look like in D75300.

reames marked an inline comment as done. Feb 27 2020, 1:50 PM

reames added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
682	Note to self: Since this is increasing the size of the instruction, we need to make sure we're not creating a branch boundary crossing case. If we are, we can just skip the optional expansion here.

skan added inline comments. Feb 29 2020, 1:00 AM

llvm/lib/MC/MCAssembler.cpp
1182–1183	Capitalize the first character of the variable `it`.
1194–1196	`F.getKind() == MCFragment::FT_Data || F.getKind() == FT_CompactEncodedInst` I think we can handle `MCCompactEncodedInstFragment` here.
1210–1211	The return types of `getFragmentOffset` and `computeFragmentSize` are `uint64_t`, so I suggest using `uint64_t` here.
1262–1263	uint64_t
llvm/test/MC/X86/align-via-relaxation.s
31	around

skan added a reviewer: skan. Feb 29 2020, 1:06 AM

skan added inline comments. Feb 29 2020, 1:22 AM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
682	I think we can add the code `if (needAlignInst(Inst)) return false; auto *PF = cast_or_null<MCBoundaryAlignFragment>(RF.getPrevNode()); if (PF && PF->canEmitNops()) return false;` to the function `canBeRelaxedForPadding` to make sure we're not creating a branch boundary crossing case.

skan added inline comments. Feb 29 2020, 2:29 AM

llvm/lib/MC/MCAssembler.cpp
1245–1247	We can avoid setting size for boundaryalign with D75404

MaskRay added inline comments. Feb 29 2020, 10:32 PM

llvm/lib/MC/MCAssembler.cpp
1182	`for (MCSection &Sec : *this) {`
1188	`for (MCFragment &F : Sec) {`
1194–1196	It seems we can use `switch (F.getKind())` here.
1233	The comment can be moved before the `if`. Then the brace can be deleted.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
665	`*RF.getSubtargetInfo()` -> `STI` (it is a member variable)
668	`SmallString<16>` should be sufficient. (The SmallString used in MCAssembler::relaxInstruction should be fixed instead)
llvm/test/MC/X86/align-via-relaxation.s
2	The `RUN` line is unnecessarily indented. It is not common. `-pc-linux-gnu` can be deleted to make the line length smaller.
7	`.text` and `.section .text` do the same thing. We can actually delete `.file`, `.text` and `.section .text`.
11	`jmp` is misaligned.

MaskRay added inline comments. Feb 29 2020, 10:41 PM

llvm/lib/MC/MCAssembler.cpp
793–799	I suspect we may have to do `while (layoutOnce(Layout))` and `optimizeLayout(Layout)` in a lockstep. `optimizeLayout` may cause some JCC_1/JMP_1 in MCRelaxableFragment to need relaxation.
794	We can dump the layout only when something has changed. This requires a change to `optimizeLayout`'s return type.
1025	Unneeded

reames marked 7 inline comments as done. Mar 2 2020, 3:14 PM

reames added inline comments.

llvm/lib/MC/MCAssembler.cpp
793–799	The whole point of the design is that we don't have to iterate them. If you see a case where we do, please point it out; that's a bug. See the comments about which starting offsets are changing and why we're not skipping over any relaxable fragment.
794	I see no value in this. I can make the change if you want, but I don't think it's actually useful.
1194–1196	I'd written it with a switch originally. It was much harder to read. I could use a switch for the filtering and then fall into the complicated logic for the align cases if you want?
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
665	I'd prefer to leave this as is. It follows the idiom of the other uses of relaxInstruction, and I haven't cross checked that the two subtarget infos are identical.
llvm/test/MC/X86/align-via-relaxation.s
7	We need to know it's a text section not a data section.
11	Er, what? Are you possibly thinking of the branch-align feature? That's not enabled in this test file.
31	No. That would change meaning of comment.

Address most of the stylistic comments. A further update to address the one functional bug is forthcoming.

Fix the functional bug and add tests for it.

Remove the relaxFragment loop. Now that boundary align fragments no longer need relaxed to update size, this is now just a nop.

Give up and just make all the code target specific. I'd originally hoped to apply this for other targets as well, but when looking at Hexagon (the only one with anything analogous), I realized that forcing a common implementation was just adding complexity for little value. Each backend has its own set of constraints as to which padding is valuable. Instead of introducing new APIs, just follow the precedent that Hexagon already set.

Once this is in and baked a bit, we can revisit to see if any of it can be made target agnostic after all.

craig.topper added inline comments. Mar 2 2020, 4:49 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
748	it's ->its
771	The end of this line and into the next line doesn't read right. It says "a it's"
772	it's -> its

skan added inline comments. Mar 2 2020, 5:55 PM

llvm/test/MC/X86/align-via-relaxation.s
11	I think he means that "jmp" is misaligned when viewed in a text editor.

skan mentioned this in D75357: [X86] Add a private member function determinePaddingPrefix for X86AsmBackend. Mar 2 2020, 10:16 PM

skan mentioned this in D75404: [X86] Not track size of the boudaryalign fragment during the layout. Mar 2 2020, 10:28 PM

In D75203#1901962, @reames wrote:

Remove the relaxFragment loop. Now that boundary align fragments no longer need relaxed to update size, this is now just a nop.

We may need to add the relaxFragment loop back, since the commit for D75404 was reverted.

Rebase and address grammar comments.

Remove tabs from test file to fix (textual) alignment.

reames marked 9 inline comments as done. Mar 3 2020, 10:21 AM

Add command line options to selectively disable. This is to ease performance regression triage post commit.

I think this is ready to land. I'm just waiting on an LGTM.

skan added inline comments. Mar 3 2020, 7:14 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
654	Is16BitMode
795	Capitalize i here otherwise clang tidy would report a warning
804	Capitalize i and N here otherwise clang tidy would report a warning

reames marked 3 inline comments as done. Mar 3 2020, 7:54 PM

reames added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
654	This code is copied from existing example. Will change both after commit.
795	"i" and "I" are different variables. Also, if clang-tidy reports a warning for "i", clang-tidy has a bug.
804	No, clang-tidy is wrong.

LGTM. I applied this patch; it passed the check-all tests, and I found no new failures when running SPEC.

This revision is now accepted and ready to land. Mar 3 2020, 8:52 PM

Closed by commit rGf708c823f06c: [X86] Relax existing instructions to reduce the number of nops needed for… (authored by reames). Mar 4 2020, 4:54 PM

This revision was automatically updated to reflect the committed changes.

reames mentioned this in D75300: Support prefix padding for alignment purposes (Relaxable instructions only). Mar 9 2020, 1:57 PM

reames mentioned this in D42616: [X86] Emit 11-byte or 15-byte NOPs on recent AMD targets, else default to 10-byte NOPs (PR22965). Mar 9 2020, 3:13 PM

reames mentioned this in rGa79863f2f727: Support prefix padding for alignment purposes (Relaxable instructions only). Mar 15 2020, 7:54 PM

The patch locates MCRelaxableFragment's within two MCSymbol's and relaxes some MCRelaxableFragment's to reduce the size of a MCAlignFragment. The behavior is hence dependent on additional temporary labels due to -g.

@condy reported an example where clang -O1 -g and clang -O1 have different .text content
https://bugs.llvm.org/show_bug.cgi?id=42138#c13 (a MCRelaxableFragment (jmp) has 5 bytes with -O1 and 2 bytes with -O1 -g)
(There is also a thread https://lists.llvm.org/pipermail/llvm-dev/2021-January/147568.html CC @vsk for thoughts).

.p2align 4, 0x90 is common due to loops. For a larger program, with a lot of temporary labels (-g vs non -g), the assembly output difference may be quite destined.

--- a.s 2021-01-11 13:57:25.055152745 -0800
+++ b.s 2021-01-11 13:57:20.627140370 -0800
@@ -2,2 +2,3 @@
        .file   "czw.cc"
+       .file   1 "/tmp/c" "czw.cc"
        .globl  _ZN1k1lEv                       # -- Begin function _ZN1k1lEv
@@ -7,2 +8,3 @@
 .Lfunc_begin0:
+       .loc    1 26 0                          # czw.cc:26:0
        .cfi_startproc
@@ -11,2 +13,3 @@
 # %bb.0:                                # %entry
+       #DEBUG_VALUE: l:this <- $rdi
        pushq   %r15
@@ -27,2 +30,4 @@
 .Ltmp0:
+.Ltmp3:
+       #DEBUG_VALUE: l:this <- $rbx
        .cfi_escape 0x2e, 0x00
@@ -30,9 +35,17 @@
        movq    %rsp, %rdx
+.Ltmp4:
+       .loc    1 27 15 prologue_end            # czw.cc:27:15
        movl    $.L.str, %esi
        callq   _ZN1eIciEC1EPc9allocator

I am thinking of whether we can use some properties of MCSymbol to filter out some MCSymbol's in LabeledFragments but cannot find one which works (for example, IsUsedInReloc is only used in ObjectWriter, which is after the assembler optimization).

@condy: -mllvm -x86-pad-for-align=false is a workaround.
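
The label dependence described in this comment can be reproduced with a toy model. The sketch below is hypothetical and much simpler than the real pass: a section is a flat list of items, and a label is its own item rather than a property of a fragment. What it keeps from the patch is the rule that a label discards the relaxable candidates collected so far.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Kind { Label, Relaxable, Align };

// Hypothetical flattened view of a text section.
struct Item {
  Kind K;
  uint64_t Bytes; // Relaxable: growth when relaxed. Align: nop bytes needed.
                  // Label: unused, pass 0.
};

// Returns the total number of nop bytes left in the section after padding.
uint64_t remainingNops(const std::vector<Item> &Section) {
  std::vector<uint64_t> Candidates; // Growth of each usable instruction.
  uint64_t Total = 0;
  for (const Item &I : Section) {
    switch (I.K) {
    case Kind::Label:
      // Instructions before a label can no longer absorb padding, since
      // growing them would move the label.
      Candidates.clear();
      break;
    case Kind::Relaxable:
      Candidates.push_back(I.Bytes);
      break;
    case Kind::Align: {
      uint64_t Remaining = I.Bytes;
      while (!Candidates.empty() && Remaining != 0) {
        const uint64_t Growth = Candidates.back();
        if (Growth > Remaining)
          break;
        Candidates.pop_back();
        Remaining -= Growth;
      }
      assert(Remaining <= I.Bytes && "padding can only shrink");
      Candidates.clear();
      Total += Remaining;
      break;
    }
    }
  }
  return Total;
}
```

In this model a jump followed by a 5-byte align absorbs 3 bytes of padding, but inserting a temporary label between the jump and the align (as -g does) leaves all 5 nop bytes in place and the jump at its short encoding. That matches the 5-byte versus 2-byte jmp difference reported above.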

Herald added subscribers: dantrushin, pengfei. Jan 11 2021, 3:26 PM

In D75203#2491618, @MaskRay wrote:

I am thinking of whether we can use some properties of MCSymbol to filter out some MCSymbol's in LabeledFragments but cannot find one which works (for example, IsUsedInReloc is only used in ObjectWriter, which is after the assembler optimization).

@condy: -mllvm -x86-pad-for-align=false is a workaround.

Option -x86-pad-for-align is for assembler optimization. I think we should set its default value to false and add some comments noting that it may interact with labels in the text section. If someone turns it on explicitly, the difference in machine code will be fine, since it's an assembler optimization.

MaskRay mentioned this in D94542: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or w/o -g. Jan 12 2021, 12:15 PM

This review has been closed for nearly 10 months. Please move discussion of proposed changes to the bug or another relevant location.

Right, there's no fundamental reason why moving a label has to be forbidden -- but it'd be extremely complex if we allowed moving a label which could cause the re-layout of a fragment we thought we'd already finalized the offsets for. This would happen if the label offset required relaxation of some instruction/data referencing it. That, then, might require undoing the padding from an instruction we've already padded out, due to less alignment-padding being required overall.

So, maybe we can allow changing the address of labels that CAN'T cause such an issue, then? Can we identify them straightforwardly? We would need to avoid changing the offset of any label which might require further relaxation of instructions in a text segment (because that's all this pass can modify).

But that doesn't mean we can modify all labels in text which are referred to only by non-text fixups. Consider a relaxable uleb128 data section containing offsets of labels in text sections -- naively it seems fine to allow further relaxation of the uleb128 data after running the padding...but unfortunately that's wrong, because if relaxable instructions in text point _back_ to labels in the uleb128 section, they may need to be relaxed when/if the address offsets in the uleb128 data grow. I think the criteria we'd need to use is something like: find all sections referenced by a relaxable fixup in the text section. Then find all sections referenced by a relaxable fixup in one of those sections, recursively. This is the set of sections that cannot point TO a label in text, if we want to be able to move it. That seems...complicated, itself.
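
The recursive criterion proposed here amounts to a reachability computation over sections. The following is only a sketch of that idea under invented assumptions: sections are identified by name, and `FixupGraph` is a hypothetical map from each section to the sections its relaxable fixups refer to. Nothing like this exists in the patch.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: for each section, the sections referenced by its
// relaxable fixups.
using FixupGraph = std::map<std::string, std::vector<std::string>>;

// Collects every section reachable from Start by following relaxable
// fixup references transitively. Start itself appears in the result only
// if some chain of references leads back to it.
std::set<std::string> reachableSections(const FixupGraph &Graph,
                                        const std::string &Start) {
  assert(!Start.empty() && "section name expected");
  std::set<std::string> Seen;
  std::queue<std::string> Work;
  Work.push(Start);
  while (!Work.empty()) {
    const std::string Cur = Work.front();
    Work.pop();
    const auto It = Graph.find(Cur);
    if (It == Graph.end())
      continue; // Section has no relaxable fixups.
    for (const std::string &Next : It->second)
      if (Seen.insert(Next).second) // Visit each section only once.
        Work.push(Next);
  }
  return Seen;
}
```

Under the rule stated above, a label in text would be movable only if no section in the returned set refers to it. A debug section that points at text but is never reached from text stays outside the set.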

More simply, would it be valid to assume that text sections cannot ever refer, even transitively, to debug sections?

In D75203#2494775, @reames wrote:

This review has been closed for nearly 10 months. Please move discussion of proposed changes to the bug or another relevant location.

Created https://bugs.llvm.org/show_bug.cgi?id=48742

MaskRay mentioned this in rGa048ce13e32d: [X86] Default to -x86-pad-for-align=false to drop assembler difference with or…. Jan 16 2021, 4:40 PM

Revision Contents

Path                                                 Size
llvm/include/llvm/MC/MCAsmBackend.h                  9 lines
llvm/include/llvm/MC/MCAssembler.h                   4 lines
llvm/include/llvm/MC/MCFragment.h                    3 lines
llvm/lib/MC/MCAssembler.cpp                          130 lines
llvm/lib/MC/MCFragment.cpp                           1 line
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp   14 lines
llvm/test/MC/X86/align-branch-64.s                   18 lines
llvm/test/MC/X86/align-via-relaxation.s              74 lines

Diff 246848

llvm/include/llvm/MC/MCAsmBackend.h

@@ 138 lines hidden @@ public:
   /// Check whether the given instruction may need relaxation.
   ///
   /// \param Inst - The instruction to test.
   /// \param STI - The MCSubtargetInfo in effect when the instruction was
   /// encoded.
   virtual bool mayNeedRelaxation(const MCInst &Inst,
                                  const MCSubtargetInfo &STI) const = 0;

+  /// Return true if the given instruction can be further relaxed. This is
+  /// different than mayNeedFurther relaxation in that it is allowed to return
+  /// true when relaxation is possible, but not required.
+  virtual bool canBeFurtherRelaxed(const MCInst &Inst,
+                                   const MCSubtargetInfo &STI) const {
+    return mayNeedRelaxation(Inst, STI);
+  }
+
   /// Target specific predicate for whether a given fixup requires the
   /// associated instruction to be relaxed.
   virtual bool fixupNeedsRelaxationAdvanced(const MCFixup &Fixup, bool Resolved,
                                             uint64_t Value,
                                             const MCRelaxableFragment *DF,
                                             const MCAsmLayout &Layout,
                                             const bool WasForced) const;

@@ 50 lines hidden @@

llvm/include/llvm/MC/MCAssembler.h

@@ 197 lines hidden @@ private:
   bool relaxBoundaryAlign(MCAsmLayout &Layout, MCBoundaryAlignFragment &BF);
   bool relaxDwarfLineAddr(MCAsmLayout &Layout, MCDwarfLineAddrFragment &DF);
   bool relaxDwarfCallFrameFragment(MCAsmLayout &Layout,
                                    MCDwarfCallFrameFragment &DF);
   bool relaxCVInlineLineTable(MCAsmLayout &Layout,
                               MCCVInlineLineTableFragment &DF);
   bool relaxCVDefRange(MCAsmLayout &Layout, MCCVDefRangeFragment &DF);

+  /// Once relaxation is complete, try to reduce number of nops required
+  /// without requiring any further relaxation.
+  void optimizeLayout(MCAsmLayout &Layout);
+
   /// finishLayout - Finalize a layout, including fragment lowering.
   void finishLayout(MCAsmLayout &Layout);

   std::tuple<MCValue, uint64_t, bool>
   handleFixup(const MCAsmLayout &Layout, MCFragment &F, const MCFixup &Fixup);

 public:
   std::vector<std::pair<StringRef, const MCSymbol *>> Symvers;
@@ 255 lines hidden @@

llvm/include/llvm/MC/MCFragment.h

@@ 268 lines hidden @@ public:
   const MCInst &getInst() const { return Inst; }
   void setInst(const MCInst &Value) { Inst = Value; }

   static bool classof(const MCFragment *F) {
     return F->getKind() == MCFragment::FT_Relaxable;
   }
 };

+/// Represents alignment for either code or data. For code, the Value and
+/// ValueSize fields are ignored; nops are always emitted. For date, the
+/// Value is repeated ValueSize times. Value is usually 0.
 class MCAlignFragment : public MCFragment {
   /// The alignment to ensure, in bytes.
   unsigned Alignment;

   /// Flag to indicate that (optimal) NOPs should be emitted instead
   /// of using the provided value. The exact interpretation of this flag is
   /// target dependent.
   bool EmitNops : 1;
@@ 274 lines hidden @@

llvm/lib/MC/MCAssembler.cpp

@@ 784 lines hidden @@ void MCAssembler::layout(MCAsmLayout &Layout) {
   // Layout until everything fits.
   while (layoutOnce(Layout))
     if (getContext().hadError())
       return;

   DEBUG_WITH_TYPE("mc-dump", {
       errs() << "assembler backend - post-relaxation\n--\n";
       dump(); });

+  optimizeLayout(Layout);
+
+  DEBUG_WITH_TYPE("mc-dump", {
+      errs() << "assembler backend - post-optimization\n--\n";
+      dump(); });
+
   // Finalize the layout, including fragment lowering.
   finishLayout(Layout);

   DEBUG_WITH_TYPE("mc-dump", {
       errs() << "assembler backend - final-layout\n--\n";
       dump(); });

   // Allow the object writer a chance to perform post-layout binding (for
@@ 207 lines hidden @@ for (auto i = 0, N = BF.isFused() ? 2 : 1;
     AlignedSize += computeFragmentSize(Layout, *F);
   }
   uint64_t OldSize = BF.getSize();
   AlignedOffset -= OldSize;
   Align BoundaryAlignment = BF.getAlignment();
   uint64_t NewSize = needPadding(AlignedOffset, AlignedSize, BoundaryAlignment)
                          ? offsetToAlignment(AlignedOffset, BoundaryAlignment)
                          : 0U;
-  if (NewSize == OldSize)
-    return false;
   BF.setSize(NewSize);
-  Layout.invalidateFragmentsFrom(&BF);
-  return true;
+  return (NewSize != OldSize);
 }

 bool MCAssembler::relaxDwarfLineAddr(MCAsmLayout &Layout,
                                      MCDwarfLineAddrFragment &DF) {
   MCContext &Context = Layout.getAssembler().getContext();
   uint64_t OldSize = DF.getContents().size();
   int64_t AddrDelta;
   bool Abs = DF.getAddrDelta().evaluateKnownAbsolute(AddrDelta, Layout);
   assert(Abs && "We created a line delta with an invalid expression");
▲ Show 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	for (iterator it = begin(), ie = end(); it != ie; ++it) {
MCSection &Sec = *it;		MCSection &Sec = *it;
while (layoutSectionOnce(Layout, Sec))		while (layoutSectionOnce(Layout, Sec))
WasRelaxed = true;		WasRelaxed = true;
}		}

return WasRelaxed;		return WasRelaxed;
}		}

		void MCAssembler::optimizeLayout(MCAsmLayout &Layout) {
		// See if we can further relax some instructions to cut down on the number of
		// nop bytes required for code alignment. The actual win is in reducing
		// instruction count, not number of bytes. Some micro-architectures (such
		// as, say, modern X86-64) can easily end up decode limited. It is
		// often better to reduce the number of instructions (i.e. eliminate nops)
		// even at the cost of increasing the size and complexity of others.

		DenseSet<MCFragment *> LabeledFragments;
		for (const MCSymbol &S : symbols())
		LabeledFragments.insert(S.getFragment(false));

		for (iterator it = begin(), ie = end(); it != ie; ++it) {
		MaskRayUnsubmitted Not Done Reply Inline Actions `for (MCSection &Sec : this) {` MaskRay:* `for (MCSection &Sec : *this) {`
		MCSection &Sec = *it;
		skanUnsubmitted Not Done Reply Inline Actions Capitalize the first character of the variable it skan: Capitalize the first character of the variable it
		if (!Sec.getKind().isText())
		continue;

		SmallVector<MCRelaxableFragment*, 4> Relaxable;
		for (MCSection::iterator I = Sec.begin(), IE = Sec.end(); I != IE; ++I) {
		MaskRayUnsubmitted Not Done Reply Inline Actions `for (MCFragment &F : Sec) {` MaskRay: `for (MCFragment &F : Sec) {`
		MCFragment &F = *I;

		if (LabeledFragments.count(&F))
		Relaxable.clear();

		if (F.getKind() == MCFragment::FT_Data)
		// Skip and ignore
		continue;
		skanUnsubmitted Not Done Reply Inline Actions `F.getKind() == MCFragment::FT_Data \|\| F.getKind() == FT_CompactEncodedInst` I think we can handle `MCCompactEncodedInstFragment` here. skan: `F.getKind() == MCFragment::FT_Data \|\| F.getKind() == FT_CompactEncodedInst` I think we can…
		MaskRayUnsubmitted Not Done Reply Inline Actions It seems we can use `switch (F.getKind())` here. MaskRay: It seems we can use `switch (F.getKind())` here.
		reamesAuthorUnsubmitted Done Reply Inline Actions I'd written it with a switch originally. It was much harder to read. I could use a switch for the filtering and then fall into the complicated logic for the align cases if you want? reames: I'd written it with a switch originally. It was much harder to read. I could use a switch for…

		if (F.getKind() == MCFragment::FT_Relaxable) {
		auto &RF = cast<MCRelaxableFragment>(*I);
		if (getBackend().canBeFurtherRelaxed(RF.getInst(),
		*RF.getSubtargetInfo()))
		Relaxable.push_back(&RF);
		// Note: It's okay to skip an instruction which can't be further
		// relaxed as it a) will be a fixed number of bytes, and b) must be
		// able to encode any larger offsets which result from shifting it
		// around.
		continue;
		}

		// For any unhandled kind, assume we can't change layout.
		if (F.getKind() != MCFragment::FT_Align &&
		skanUnsubmitted Not Done Reply Inline Actions The return type of `getFragmentOffset` and `computeFragmentSize` is `uint64_t`, so I suggest use `uint64_t` here is better skan: The return type of `getFragmentOffset` and `computeFragmentSize` is `uint64_t`, so I suggest…
		F.getKind() != MCFragment::FT_BoundaryAlign) {
		Relaxable.clear();
		continue;
		}
		const unsigned OrigOffset = Layout.getFragmentOffset(&F);
		const unsigned OrigSize = computeFragmentSize(Layout, F);
		if (OrigSize == 0 \|\| Relaxable.empty()) {
		Relaxable.clear();
		continue;
		}

		// To keep the effects local, prefer to relax instructions closest to
		// the align directive. This is purely about human understandability
		// of the resulting code. If we later find a reason to expand
		// particular instructions over others, we can adjust.
		MCFragment *FirstChangedFragment = nullptr;
		unsigned RemainingSize = OrigSize;
		while (!Relaxable.empty() && RemainingSize != 0) {
		auto *RF = Relaxable.pop_back_val();

		MCInst Relaxed;
		getBackend().relaxInstruction(RF->getInst(), *RF->getSubtargetInfo(),
		MaskRayUnsubmitted Not Done Reply Inline Actions The comment can be moved before the `if`. Then the brace can be deleted. MaskRay: The comment can be moved before the `if`. Then the brace can be deleted.
		Relaxed);
		SmallVector<MCFixup, 4> Fixups;
		SmallString<256> Code;
		raw_svector_ostream VecOS(Code);
		getEmitter().encodeInstruction(Relaxed, VecOS, Fixups,
		*RF->getSubtargetInfo());
		const unsigned OldSize = RF->getContents().size();
		const unsigned NewSize = Code.size();
		assert(NewSize >= OldSize && "size decrease during relaxation?");
		unsigned Delta = NewSize - OldSize;
		if (Delta > RemainingSize) {
		// Too big, can't use this relocation without requiring another
		// round of relaxation. Subtly, we can't skip past this one either
		// as we haven't relaxed it. Changing its starting offset might
		skan (Not Done): We can avoid setting size for boundaryalign with D75404
		// require a larger negative offset than it can encode. We don't
		// need to worry about larger positive offsets as none of the
		// possible offsets between this and our align are visible, and the
		// ones afterwards aren't changing.
		break;
		}
		RemainingSize -= Delta;
		RF->setInst(Relaxed);
		RF->getContents() = Code;
		RF->getFixups() = Fixups;
		FirstChangedFragment = RF;
		}
		Relaxable.clear();

		// Unlike align, boundary align tracks its own size after relaxation.
		if (F.getKind() == MCFragment::FT_BoundaryAlign)
		skan (Not Done): uint64_t
		cast<MCBoundaryAlignFragment>(F).setSize(RemainingSize);

		if (FirstChangedFragment) {
		// Redo the layout for any fragments in the affected range. This is
		// mostly updating start offsets, but also may need to apply other
		// updates (such as changing offsets) to the fragments in question.
		// Note that the relaxation itself has already been done above, and
		// thus the total size of the range isn't changing.
		Layout.invalidateFragmentsFrom(FirstChangedFragment);
		while (FirstChangedFragment != &F) {
		relaxFragment(Layout, *FirstChangedFragment);
		FirstChangedFragment = FirstChangedFragment->getNextNode();
		}
		}

		const unsigned FinalOffset = Layout.getFragmentOffset(&F);
		const unsigned FinalSize = computeFragmentSize(Layout, F);
		assert(OrigOffset + OrigSize == FinalOffset + FinalSize &&
		"can't move start of next fragment!");
		assert(FinalSize == RemainingSize && "inconsistent size computation?");
		}
		}
		}
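
The greedy loop above can be modeled outside of LLVM. The following is a minimal sketch, not the patch's code: `Insn` and `padByRelaxing` are hypothetical names, and each instruction is reduced to its current and fully relaxed sizes. It walks backwards from the alignment point, relaxes while the growth still fits in the padding, and stops at the first instruction that does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an MCRelaxableFragment: only sizes are modeled.
struct Insn {
  uint64_t Size;        // current encoded size in bytes
  uint64_t RelaxedSize; // size after relaxation (>= Size)
};

// Relax instructions nearest the alignment point first, as long as the growth
// fits in the padding. Returns the number of nop bytes still required.
inline uint64_t padByRelaxing(std::vector<Insn> &Relaxable, uint64_t PadSize) {
  uint64_t Remaining = PadSize;
  for (std::size_t Idx = Relaxable.size(); Idx != 0 && Remaining != 0; --Idx) {
    Insn &I = Relaxable[Idx - 1];
    assert(I.RelaxedSize >= I.Size && "size decrease during relaxation?");
    const uint64_t Delta = I.RelaxedSize - I.Size;
    if (Delta > Remaining)
      break; // too big; we also cannot skip past an unrelaxed instruction
    Remaining -= Delta;
    I.Size = I.RelaxedSize;
  }
  return Remaining;
}
```

With seven 2-byte jumps that each relax to 5 bytes and 18 bytes of padding, the model relaxes the last six and leaves no nops, which is consistent with the first case in align-via-relaxation.s.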


void MCAssembler::finishLayout(MCAsmLayout &Layout) {		void MCAssembler::finishLayout(MCAsmLayout &Layout) {
assert(getBackendPtr() && "Expected assembler backend");		assert(getBackendPtr() && "Expected assembler backend");
// The layout is done. Mark every fragment as valid.		// The layout is done. Mark every fragment as valid.
for (unsigned int i = 0, n = Layout.getSectionOrder().size(); i != n; ++i) {		for (unsigned int i = 0, n = Layout.getSectionOrder().size(); i != n; ++i) {
MCSection &Section = *Layout.getSectionOrder()[i];		MCSection &Section = *Layout.getSectionOrder()[i];
Layout.getFragmentOffset(&*Section.getFragmentList().rbegin());		Layout.getFragmentOffset(&*Section.getFragmentList().rbegin());
computeFragmentSize(Layout, *Section.getFragmentList().rbegin());		computeFragmentSize(Layout, *Section.getFragmentList().rbegin());
}		}
[26 lines not shown]

llvm/lib/MC/MCFragment.cpp

[388 lines not shown]	OS << " Value:" << static_cast<unsigned>(FF->getValue())
<< " NumValues:" << FF->getNumValues();		<< " NumValues:" << FF->getNumValues();
break;		break;
}		}
case MCFragment::FT_Relaxable: {		case MCFragment::FT_Relaxable: {
const auto *F = cast<MCRelaxableFragment>(this);		const auto *F = cast<MCRelaxableFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " Inst:";		OS << " Inst:";
F->getInst().dump_pretty(OS);		F->getInst().dump_pretty(OS);
		OS << " (" << F->getContents().size() << ")";
break;		break;
}		}
case MCFragment::FT_Org: {		case MCFragment::FT_Org: {
const auto *OF = cast<MCOrgFragment>(this);		const auto *OF = cast<MCOrgFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " Offset:" << OF->getOffset()		OS << " Offset:" << OF->getOffset()
<< " Value:" << static_cast<unsigned>(OF->getValue());		<< " Value:" << static_cast<unsigned>(OF->getValue());
break;		break;
[61 lines not shown]

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

[162 lines not shown]	public:
void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,		void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup,
const MCValue &Target, MutableArrayRef<char> Data,		const MCValue &Target, MutableArrayRef<char> Data,
uint64_t Value, bool IsResolved,		uint64_t Value, bool IsResolved,
const MCSubtargetInfo *STI) const override;		const MCSubtargetInfo *STI) const override;

bool mayNeedRelaxation(const MCInst &Inst,		bool mayNeedRelaxation(const MCInst &Inst,
const MCSubtargetInfo &STI) const override;		const MCSubtargetInfo &STI) const override;

		bool canBeFurtherRelaxed(const MCInst &Inst,
		const MCSubtargetInfo &STI) const override;

bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,		bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value,
const MCRelaxableFragment *DF,		const MCRelaxableFragment *DF,
const MCAsmLayout &Layout) const override;		const MCAsmLayout &Layout) const override;

void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,		void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI,
MCInst &Res) const override;		MCInst &Res) const override;

bool writeNopData(raw_ostream &OS, uint64_t Count) const override;		bool writeNopData(raw_ostream &OS, uint64_t Count) const override;
[419 lines not shown]	bool X86AsmBackend::mayNeedRelaxation(const MCInst &Inst,
// relaxable instructions, the relaxable operand is always the last operand.		// relaxable instructions, the relaxable operand is always the last operand.
unsigned RelaxableOp = Inst.getNumOperands() - 1;		unsigned RelaxableOp = Inst.getNumOperands() - 1;
if (Inst.getOperand(RelaxableOp).isExpr())		if (Inst.getOperand(RelaxableOp).isExpr())
return true;		return true;

return false;		return false;
}		}

		bool X86AsmBackend::canBeFurtherRelaxed(const MCInst &Inst,
		const MCSubtargetInfo &STI) const {
		// TODO: There are lots of other tricks we could apply for increasing
		// encoding size without impacting performance.
		bool is16BitMode = STI.getFeatureBits()[X86::Mode16Bit];
		return getRelaxedOpcode(Inst, is16BitMode) != Inst.getOpcode();
		}


bool X86AsmBackend::fixupNeedsRelaxation(const MCFixup &Fixup,		bool X86AsmBackend::fixupNeedsRelaxation(const MCFixup &Fixup,
uint64_t Value,		uint64_t Value,
const MCRelaxableFragment *DF,		const MCRelaxableFragment *DF,
const MCAsmLayout &Layout) const {		const MCAsmLayout &Layout) const {
// Relax if the value is too big for a (signed) i8.		// Relax if the value is too big for a (signed) i8.
return !isInt<8>(Value);		return !isInt<8>(Value);
}		}
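
`fixupNeedsRelaxation` reduces to a signed 8-bit range check. As a sketch of that check in isolation (the helper names `fitsInSigned8` and `needsRelaxation` are hypothetical, and the value is taken as a signed 64-bit integer instead of going through LLVM's `isInt<8>`):

```cpp
#include <cstdint>

// True when V can be encoded as a 1-byte signed pc-relative displacement.
constexpr bool fitsInSigned8(int64_t V) { return V >= -128 && V <= 127; }

// Mirrors the decision above: relax when the displacement does not fit.
constexpr bool needsRelaxation(int64_t V) { return !fitsInSigned8(V); }
```

A displacement of 127 stays in the short form, while 128 or -129 would force the 4-byte pcrel encoding.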

// FIXME: Can tblgen help at all here to verify there aren't other instructions		// FIXME: Can tblgen help at all here to verify there aren't other instructions
// we can relax?		// we can relax?
void X86AsmBackend::relaxInstruction(const MCInst &Inst,		void X86AsmBackend::relaxInstruction(const MCInst &Inst,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
MCInst &Res) const {		MCInst &Res) const {
		assert(canBeFurtherRelaxed(Inst, STI));

// The only relaxations X86 does is from a 1byte pcrel to a 4byte pcrel.		// The only relaxations X86 does is from a 1byte pcrel to a 4byte pcrel.
bool is16BitMode = STI.getFeatureBits()[X86::Mode16Bit];		bool is16BitMode = STI.getFeatureBits()[X86::Mode16Bit];
unsigned RelaxedOp = getRelaxedOpcode(Inst, is16BitMode);		unsigned RelaxedOp = getRelaxedOpcode(Inst, is16BitMode);

if (RelaxedOp == Inst.getOpcode()) {		if (RelaxedOp == Inst.getOpcode()) {
SmallString<256> Tmp;		SmallString<256> Tmp;
raw_svector_ostream OS(Tmp);		raw_svector_ostream OS(Tmp);
Inst.dump_pretty(OS);		Inst.dump_pretty(OS);
OS << "\n";		OS << "\n";
report_fatal_error("unexpected instruction to relax: " + OS.str());		report_fatal_error("unexpected instruction to relax: " + OS.str());
}		}

Res = Inst;		Res = Inst;
Res.setOpcode(RelaxedOp);		Res.setOpcode(RelaxedOp);
}		}

/// Write a sequence of optimal nops to the output, covering \p Count		/// Write a sequence of optimal nops to the output, covering \p Count
/// bytes.		/// bytes.
/// \return - true on success, false on failure		/// \return - true on success, false on failure
bool X86AsmBackend::writeNopData(raw_ostream &OS, uint64_t Count) const {		bool X86AsmBackend::writeNopData(raw_ostream &OS, uint64_t Count) const {
static const char Nops[10][11] = {		static const char Nops[10][11] = {
// nop		// nop
		skan (Not Done): Is16BitMode
		reames (Author, Done): This code is copied from existing example. Will change both after commit.
"\x90",		"\x90",
// xchg %ax,%ax		// xchg %ax,%ax
"\x66\x90",		"\x66\x90",
// nopl (%[re]ax)		// nopl (%[re]ax)
"\x0f\x1f\x00",		"\x0f\x1f\x00",
// nopl 0(%[re]ax)		// nopl 0(%[re]ax)
"\x0f\x1f\x40\x00",		"\x0f\x1f\x40\x00",
// nopl 0(%[re]ax,%[re]ax,1)		// nopl 0(%[re]ax,%[re]ax,1)
"\x0f\x1f\x44\x00\x00",		"\x0f\x1f\x44\x00\x00",
// nopw 0(%[re]ax,%[re]ax,1)		// nopw 0(%[re]ax,%[re]ax,1)
"\x66\x0f\x1f\x44\x00\x00",		"\x66\x0f\x1f\x44\x00\x00",
		MaskRay (Not Done): `*RF.getSubtargetInfo()` -> `STI` (it is a member variable)
		reames (Author, Done): I'd prefer to leave this as is. It follows the idiom of the other uses of relaxInstruction, and I haven't cross checked that the two subtarget infos are identical.
// nopl 0L(%[re]ax)		// nopl 0L(%[re]ax)
"\x0f\x1f\x80\x00\x00\x00\x00",		"\x0f\x1f\x80\x00\x00\x00\x00",
// nopl 0L(%[re]ax,%[re]ax,1)		// nopl 0L(%[re]ax,%[re]ax,1)
		MaskRay (Not Done): `SmallString<16>` should be sufficient. (The SmallString used in MCAssembler::relaxInstruction should be fixed instead)
"\x0f\x1f\x84\x00\x00\x00\x00\x00",		"\x0f\x1f\x84\x00\x00\x00\x00\x00",
// nopw 0L(%[re]ax,%[re]ax,1)		// nopw 0L(%[re]ax,%[re]ax,1)
"\x66\x0f\x1f\x84\x00\x00\x00\x00\x00",		"\x66\x0f\x1f\x84\x00\x00\x00\x00\x00",
// nopw %cs:0L(%[re]ax,%[re]ax,1)		// nopw %cs:0L(%[re]ax,%[re]ax,1)
"\x66\x2e\x0f\x1f\x84\x00\x00\x00\x00\x00",		"\x66\x2e\x0f\x1f\x84\x00\x00\x00\x00\x00",
};		};

// This CPU doesn't support long nops. If needed add more.		// This CPU doesn't support long nops. If needed add more.
// FIXME: We could generate something better than plain 0x90.		// FIXME: We could generate something better than plain 0x90.
if (!STI.getFeatureBits()[X86::FeatureNOPL]) {		if (!STI.getFeatureBits()[X86::FeatureNOPL]) {
for (uint64_t i = 0; i < Count; ++i)		for (uint64_t i = 0; i < Count; ++i)
OS << '\x90';		OS << '\x90';
return true;		return true;
}		}
		reames (Author, Done): Note to self: Since this is increasing the size of the instruction, we need to make sure we're not creating a branch boundary crossing case. If we are, we can just skip the optional expansion here.
		skan (Done): I think we add the code `if needAlignInst(Inst) return false; auto *PF = cast_or_null<MCBoundaryAlignFragment>(RF.getPrevNode()); if(PF && PF.canEmitNops()) return false;` in the function `canBeRelaxedForPadding` to make sure we're not creating a branch boundary crossing case

// 15-bytes is the longest single NOP instruction, but 10-bytes is		// 15-bytes is the longest single NOP instruction, but 10-bytes is
// commonly the longest that can be efficiently decoded.		// commonly the longest that can be efficiently decoded.
uint64_t MaxNopLength = 10;		uint64_t MaxNopLength = 10;
if (STI.getFeatureBits()[X86::ProcIntelSLM])		if (STI.getFeatureBits()[X86::ProcIntelSLM])
MaxNopLength = 7;		MaxNopLength = 7;
else if (STI.getFeatureBits()[X86::FeatureFast15ByteNOP])		else if (STI.getFeatureBits()[X86::FeatureFast15ByteNOP])
MaxNopLength = 15;		MaxNopLength = 15;
[49 lines not shown]	public:
createObjectTargetWriter() const override {		createObjectTargetWriter() const override {
return createX86ELFObjectWriter(/*IsELF64*/ false, OSABI,		return createX86ELFObjectWriter(/*IsELF64*/ false, OSABI,
ELF::EM_X86_64);		ELF::EM_X86_64);
}		}
};		};

class ELFX86_IAMCUAsmBackend : public ELFX86AsmBackend {		class ELFX86_IAMCUAsmBackend : public ELFX86AsmBackend {
public:		public:
ELFX86_IAMCUAsmBackend(const Target &T, uint8_t OSABI,		ELFX86_IAMCUAsmBackend(const Target &T, uint8_t OSABI,
		craig.topper (Done): it's -> its
const MCSubtargetInfo &STI)		const MCSubtargetInfo &STI)
: ELFX86AsmBackend(T, OSABI, STI) {}		: ELFX86AsmBackend(T, OSABI, STI) {}

std::unique_ptr<MCObjectTargetWriter>		std::unique_ptr<MCObjectTargetWriter>
createObjectTargetWriter() const override {		createObjectTargetWriter() const override {
return createX86ELFObjectWriter(/*IsELF64*/ false, OSABI,		return createX86ELFObjectWriter(/*IsELF64*/ false, OSABI,
ELF::EM_IAMCU);		ELF::EM_IAMCU);
}		}
};		};

class ELFX86_64AsmBackend : public ELFX86AsmBackend {		class ELFX86_64AsmBackend : public ELFX86AsmBackend {
public:		public:
ELFX86_64AsmBackend(const Target &T, uint8_t OSABI,		ELFX86_64AsmBackend(const Target &T, uint8_t OSABI,
const MCSubtargetInfo &STI)		const MCSubtargetInfo &STI)
: ELFX86AsmBackend(T, OSABI, STI) {}		: ELFX86AsmBackend(T, OSABI, STI) {}

std::unique_ptr<MCObjectTargetWriter>		std::unique_ptr<MCObjectTargetWriter>
createObjectTargetWriter() const override {		createObjectTargetWriter() const override {
return createX86ELFObjectWriter(/*IsELF64*/ true, OSABI, ELF::EM_X86_64);		return createX86ELFObjectWriter(/*IsELF64*/ true, OSABI, ELF::EM_X86_64);
}		}
};		};

class WindowsX86AsmBackend : public X86AsmBackend {		class WindowsX86AsmBackend : public X86AsmBackend {
		craig.topper (Done): The end of this line and into the next line doesn't read right. It says "a it's"
bool Is64Bit;		bool Is64Bit;
		craig.topper (Done): it's -> its

public:		public:
WindowsX86AsmBackend(const Target &T, bool is64Bit,		WindowsX86AsmBackend(const Target &T, bool is64Bit,
const MCSubtargetInfo &STI)		const MCSubtargetInfo &STI)
: X86AsmBackend(T, STI)		: X86AsmBackend(T, STI)
, Is64Bit(is64Bit) {		, Is64Bit(is64Bit) {
}		}

Optional<MCFixupKind> getFixupKind(StringRef Name) const override {		Optional<MCFixupKind> getFixupKind(StringRef Name) const override {
return StringSwitch<Optional<MCFixupKind>>(Name)		return StringSwitch<Optional<MCFixupKind>>(Name)
.Case("dir32", FK_Data_4)		.Case("dir32", FK_Data_4)
.Case("secrel32", FK_SecRel_4)		.Case("secrel32", FK_SecRel_4)
.Case("secidx", FK_SecRel_2)		.Case("secidx", FK_SecRel_2)
.Default(MCAsmBackend::getFixupKind(Name));		.Default(MCAsmBackend::getFixupKind(Name));
}		}

std::unique_ptr<MCObjectTargetWriter>		std::unique_ptr<MCObjectTargetWriter>
createObjectTargetWriter() const override {		createObjectTargetWriter() const override {
return createX86WinCOFFObjectWriter(Is64Bit);		return createX86WinCOFFObjectWriter(Is64Bit);
}		}
};		};

namespace CU {		namespace CU {
		skan (Not Done): Capitalize i here otherwise clang tidy would report a warning
		reames (Author, Done): "i" and "I" are different variables. Also, if clang-tidy reports a warning for "i", clang-tidy has a bug.

/// Compact unwind encoding values.		/// Compact unwind encoding values.
enum CompactUnwindEncodings {		enum CompactUnwindEncodings {
/// [RE]BP based frame where [RE]BP is pushed on the stack immediately after		/// [RE]BP based frame where [RE]BP is pushed on the stack immediately after
/// the return address, then [RE]SP is moved to [RE]BP.		/// the return address, then [RE]SP is moved to [RE]BP.
UNWIND_MODE_BP_FRAME = 0x01000000,		UNWIND_MODE_BP_FRAME = 0x01000000,

/// A frameless function with a small constant stack size.		/// A frameless function with a small constant stack size.
UNWIND_MODE_STACK_IMMD = 0x02000000,		UNWIND_MODE_STACK_IMMD = 0x02000000,
		skan (Not Done): Capitalize i and N here otherwise clang tidy would report a warning
		reames (Author, Done): No, clang-tidy is wrong.

/// A frameless function with a large constant stack size.		/// A frameless function with a large constant stack size.
UNWIND_MODE_STACK_IND = 0x03000000,		UNWIND_MODE_STACK_IND = 0x03000000,

/// No compact unwind encoding is available.		/// No compact unwind encoding is available.
UNWIND_MODE_DWARF = 0x04000000,		UNWIND_MODE_DWARF = 0x04000000,

/// Mask for encoding the frame registers.		/// Mask for encoding the frame registers.
[366 lines not shown]

llvm/test/MC/X86/align-branch-64.s

[97 lines not shown]	test_indirect:
.endr		.endr
jmpq *(%rax)		jmpq *(%rax)

.p2align 4		.p2align 4
.type bar,@function		.type bar,@function
bar:		bar:
retq		retq

		# CHECK: test_pad_via_relax:
		# CHECK: 200: testq
		# CHECK: 203: jne
		# CHECK: 209: int3
		# note 6 byte jne which could be a 2 byte jne, but is instead
		# expanded for padding purposes
		# CHECK-NOT: nop
		# CHECK: 220: callq
		.global test_pad_via_relax
		.p2align 5
		test_pad_via_relax:
		testq %rax, %rax
		jnz bar
		.rept 23
		int3
		.endr
		callq bar

.section "unknown"		.section "unknown"
.p2align 4		.p2align 4
.type baz,@function		.type baz,@function
baz:		baz:
retq		retq

llvm/test/MC/X86/align-via-relaxation.s

This file was added.

				# RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s | llvm-objdump -d --section=.text - | FileCheck %s

				MaskRay (Done): The `RUN` line is unnecessarily indented. It is not common. `-pc-linux-gnu` can be deleted to make the line length smaller.

				.file "test.c"
				.text
				.section .text
				# Demonstrate that we can relax instructions to provide padding, not
				MaskRay (Done): `.text` and `.section .text` do the same thing. We can actually delete `.file`, `.text` and `.section .text`.
				reames (Author, Done): We need to know it's a text section not a data section.
				# just insert nops. jmps are being used for ease of demonstration.
				# CHECK: .text
				# CHECK: 0: eb 1f jmp 31 <foo>
				# CHECK: 2: e9 1a 00 00 00 jmp 26 <foo>
				MaskRay (Done): `jmp` is misaligned.
				reames (Author, Done): Er, what? Are you possibly thinking of the branch-align feature? That's not enabled in this test file.
				skan (Done): I think his meaning is "jmp" is misaligned in a text editor.
				# CHECK: 7: e9 15 00 00 00 jmp 21 <foo>
				# CHECK: c: e9 10 00 00 00 jmp 16 <foo>
				# CHECK: 11: e9 0b 00 00 00 jmp 11 <foo>
				# CHECK: 16: e9 06 00 00 00 jmp 6 <foo>
				# CHECK: 1b: e9 01 00 00 00 jmp 1 <foo>
				# CHECK: 20: cc int3
				.p2align 4
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				jmp foo
				.p2align 5
				int3
				foo:
				ret
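
The displacements in the CHECK lines above can be verified by hand: an x86 pc-relative branch targets the address of the next instruction plus the encoded displacement. A small sketch of that arithmetic follows; `branchTarget` is a hypothetical helper, not part of the patch or of LLVM.

```cpp
#include <cstdint>

// Target of an x86 pc-relative branch. The displacement is relative to the
// end of the branch instruction, i.e. Addr + Size.
constexpr uint64_t branchTarget(uint64_t Addr, uint64_t Size, int64_t Disp) {
  return Addr + Size + static_cast<uint64_t>(Disp);
}
```

Every jump in this case resolves to 0x21, the byte after the `int3` at 0x20 where `foo` lives: the unrelaxed `eb 1f` at 0x0 gives 0x0 + 2 + 0x1f, and the relaxed `e9 1a 00 00 00` at 0x2 gives 0x2 + 5 + 0x1a.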

				# Check that we're not shifting around the offsets of labels - doing
				skan (Done): around
				reames (Author, Done): No. That would change meaning of comment.
				# that would require a further round of relaxation
				# CHECK: bar:
				# CHECK: 22: eb fe jmp -2 <bar>
				# CHECK: 24: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
				# CHECK: 2e: 66 90 nop
				# CHECK: 30: 0f 0b ud2

				bar:
				jmp bar
				nobypass:
				.p2align 4
				ud2
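
The 12 bytes of padding in this case show up as a 10-byte `nopw` followed by a 2-byte `nop`. That is consistent with splitting the byte count greedily into chunks of at most `MaxNopLength` (10 by default in `writeNopData`). The sketch below assumes that greedy strategy; the rest of `writeNopData` is not shown on this page, and `splitNops` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Split Count bytes of padding into nop lengths, each at most MaxNopLength.
// Assumes a greedy strategy: longest nops first, remainder last.
inline std::vector<uint64_t> splitNops(uint64_t Count, uint64_t MaxNopLength) {
  assert(MaxNopLength != 0 && "a zero maximum would never make progress");
  std::vector<uint64_t> Lengths;
  while (Count != 0) {
    const uint64_t This = std::min(Count, MaxNopLength);
    Lengths.push_back(This);
    Count -= This;
  }
  return Lengths;
}
```

Under this assumption, 12 bytes become lengths 10 and 2, and the 18 bytes before `loop_header` in the next case become 10 and 8.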


				# Canonical toy loop to show benefit - we can align the loop header with
				# fewer nops by relaxing the branch, even though we don't need to
				# CHECK: loop_preheader:
				# CHECK: 45: 48 85 c0 testq %rax, %rax
				# CHECK: 48: 0f 8e 22 00 00 00 jle 34 <loop_exit>
				# CHECK: 4e: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
				# CHECK: 58: 0f 1f 84 00 00 00 00 00 nopl (%rax,%rax)
				# CHECK: loop_header:
				# CHECK: 60: 48 83 e8 01 subq $1, %rax
				# CHECK: 64: 48 85 c0 testq %rax, %rax
				# CHECK: 67: 7e 07 jle 7 <loop_exit>
				# CHECK: 69: e9 f2 ff ff ff jmp -14 <loop_header>
				# CHECK: 6e: 66 90 nop
				# CHECK: loop_exit:
				# CHECK: 70: c3 retq
				.p2align 5
				.skip 5
				loop_preheader:
				testq %rax, %rax
				jle loop_exit
				.p2align 5
				loop_header:
				subq $1, %rax
				testq %rax, %rax
				jle loop_exit
				jmp loop_header
				.p2align 4
				loop_exit:
				ret
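
The benefit in the loop case is simple arithmetic. The `jle` at 0x48 is emitted in its 6-byte form, so the code ends at 0x4e instead of 0x4a and less padding is needed to reach the 32-byte boundary at 0x60. A sketch of that computation, with `paddingTo` as a hypothetical helper:

```cpp
#include <cstdint>

// Bytes needed to advance Offset to the next multiple of Align (Align > 0).
constexpr uint64_t paddingTo(uint64_t Offset, uint64_t Align) {
  return (Align - Offset % Align) % Align;
}
```

`paddingTo(0x4e, 32)` is 18, matching the 10-byte and 8-byte nops in the CHECK lines, while the unrelaxed layout would have needed `paddingTo(0x4a, 32)`, which is 22 bytes of nops.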

Diff 246848