
Details

Reviewers

reames
MaskRay
craig.topper
LuoYuanke
jyknight

Commits

rG916044d819c8: [X86][MC] Support enhanced relaxation for branch align

Summary

Since D75300 has landed, I want to support enhanced relaxation when we need to align branches and prefix padding is allowed. "Enhanced relaxation" means we allow an instruction that cannot traditionally be relaxed to be emitted into a RelaxableFragment, so that its length can be increased by adding prefixes for optimization.

The motivation is straightforward: RelaxableFragment is mostly used for relative jumps, and we cannot increase the length of a jump when we need to align it. So to achieve D75300's purpose (reducing the number of NOP bytes) when aligning jumps, we have to make more instructions "relaxable".
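To make the goal concrete: a branch needs alignment work when it would cross, or end exactly at, the alignment boundary, and the number of padding bytes needed is the distance to the next boundary. With NOP padding those bytes become a NOP before the branch; with enhanced relaxation they can instead be absorbed as prefixes on earlier instructions. The sketch below models that rule in plain C++. The function names are hypothetical, and the "crosses or ends at" condition is our reading of the branch-alignment rule (it is consistent with the 2-byte `jmp` at 0x3e being moved to 0x40 in the test added by this patch), not code taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does an instruction occupying [Start, Start + Size)
// cross a Boundary-aligned line, or end exactly on one?
bool branchNeedsPadding(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  uint64_t End = Start + Size;  // one past the last byte (Size > 0 assumed)
  bool Crosses = (Start / Boundary) != ((End - 1) / Boundary);
  bool EndsOnBoundary = (End % Boundary) == 0;
  return Crosses || EndsOnBoundary;
}

// Hypothetical helper: bytes needed to move Start up to the next boundary
// (0 if it is already aligned).
uint64_t paddingToBoundary(uint64_t Start, uint64_t Boundary) {
  return (Boundary - Start % Boundary) % Boundary;
}
```

For the `jmp` that would otherwise start at 0x3e, `branchNeedsPadding(0x3e, 2, 32)` is true and `paddingToBoundary(0x3e, 32)` is 2, which matches the two prefix bytes the test expects on the preceding `int3` and `push %rsp`.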

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

skan created this revision. Mar 17 2020, 7:54 AM

Herald added a project: Restricted Project. Mar 17 2020, 7:54 AM

Harbormaster completed remote builds in B49431: Diff 250775. Mar 17 2020, 8:31 AM

skan added a parent revision: D76285: [X86][MC] Fix the bug for prefix padding support. Mar 17 2020, 8:26 PM

skan mentioned this in D76176: [X86] Disable nop padding before instruction following hardcode. Mar 18 2020, 8:26 PM

Split part of the code into D76475; the newly added test passes once D76475 has landed.

skan added a parent revision: D76475: [X86][MC] Disable Prefix padding after hardcode/prefix. Mar 19 2020, 8:21 PM

skan mentioned this in D72225: Align branches within 32-Byte boundary(Prefix padding). Mar 19 2020, 8:27 PM

skan mentioned this in D75268: A light-weight solution to align branches within 32B boundary by prefix padding.

[NFC] align the text in the test file

Harbormaster failed remote builds in B49831: Diff 251545! Mar 19 2020, 8:50 PM

Harbormaster failed remote builds in B49833: Diff 251548! Mar 19 2020, 9:23 PM

LuoYuanke added inline comments. Mar 19 2020, 9:56 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
Line 167: Why not just name it "allowRelaxing"? The relaxation information is ultimately indicated in the instruction's .td file, right?

Fix the warning reported by clang-format

skan marked an inline comment as done. Mar 19 2020, 10:19 PM

skan added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
Line 167: See the description in the summary. Traditional relaxation means the size of an instruction's operand may change (e.g., JCC1->JCC4), whereas with enhanced relaxation we change the instruction's size by adding prefixes to it. They are different, and enhanced relaxation has nothing to do with the .td file.
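The distinction can be seen at the byte level. In this illustrative sketch (function names are hypothetical, and an unconditional `jmp` stands in for the JCC1->JCC4 case), traditional relaxation swaps `jmp rel8` (`EB cb`, 2 bytes) for `jmp rel32` (`E9 cd`, 5 bytes), changing the opcode and operand width; enhanced relaxation keeps the encoding and prepends redundant prefixes, here the CS segment override `0x2E` that the test added by this patch expects in front of `int3`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Traditional relaxation (sketch): jmp rel8 (EB cb) becomes jmp rel32 (E9 cd).
// The displacement is left as a zero placeholder because in MC it is a fixup
// that gets resolved after layout.
Bytes relaxShortJmp(const Bytes &Short) {
  assert(Short.size() == 2 && Short[0] == 0xEB && "expected jmp rel8");
  (void)Short;  // only inspected by the assert
  return Bytes{0xE9, 0x00, 0x00, 0x00, 0x00};
}

// Enhanced relaxation (sketch): the same instruction, made longer by
// prepending N redundant CS segment-override prefixes (0x2E).
Bytes padWithPrefixes(const Bytes &Inst, unsigned N) {
  Bytes Out(N, 0x2E);
  Out.insert(Out.end(), Inst.begin(), Inst.end());
  return Out;
}
```

Only the first transformation needs knowledge of the instruction's alternative encodings (the .td-driven part); the second only needs to know that a prefix is safe to add.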

Harbormaster completed remote builds in B49836: Diff 251553. Mar 19 2020, 10:58 PM

I don't think this is a good idea. Without evidence to the contrary, I would assume that this suffers from exactly the same memory overhead problems as the original rejected patch. The basic challenge is that a MCInst is quite large (~144 bytes if my quick mental math is right). And the containing RelaxableInst is a bit larger still. The storage for a small encoded instruction in a DataFragment without fixups is ~2-6 bytes. That's a huge difference.
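The estimate above can be turned into a per-instruction ratio. The constants are the rough figures quoted in this comment, not sizeof results, and the function name is hypothetical; because the containing RelaxableFragment is larger still, the real ratio would be somewhat higher than this lower bound.

```cpp
#include <cassert>

// Rough figures quoted in the review (estimates, not measured sizeof values).
constexpr unsigned MCInstBytesEstimate = 144;
constexpr unsigned MinEncodedBytes = 2;
constexpr unsigned MaxEncodedBytes = 6;

// Lower bound on per-instruction blowup: the MCInst alone, ignoring the rest
// of the RelaxableFragment, divided by the encoded size in a DataFragment.
constexpr unsigned blowup(unsigned EncodedBytes) {
  return MCInstBytesEstimate / EncodedBytes;
}

static_assert(blowup(MaxEncodedBytes) == 24, "about 24x for a 6-byte instruction");
static_assert(blowup(MinEncodedBytes) == 72, "about 72x for a 2-byte instruction");
```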

An alternate approach is to explore allowing prefixes to be spliced into DataFragments. This would require keeping something analogous to a Fixup with the offset, prefix, and maximum bytes to insert.
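This alternative can be sketched as follows. Everything here is hypothetical (the record type, its fields, and the splice function are not LLVM APIs): a data fragment keeps its encoded bytes plus a small side list of "pad sites", each recording an instruction's offset, the prefix byte to use, and how many prefix bytes may be inserted there. Once layout decides how many bytes a site needs, the prefixes are spliced in. Note that the side list is exactly the per-instruction position record that the reply to this comment objects to.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixup-like record for one paddable instruction.
struct PrefixPadSite {
  uint32_t Offset;    // where the instruction starts in the fragment
  uint8_t Prefix;     // prefix byte to insert, e.g. 0x2E (CS override)
  uint8_t MaxBytes;   // upper bound on prefixes allowed at this site
  uint8_t Requested;  // bytes layout asked for (clamped to MaxBytes)
};

// Splice the requested prefixes into Data. Sites must be sorted by Offset;
// iterating from the back keeps earlier offsets valid while inserting.
void splicePrefixes(std::vector<uint8_t> &Data,
                    const std::vector<PrefixPadSite> &Sites) {
  for (auto It = Sites.rbegin(); It != Sites.rend(); ++It) {
    unsigned N = It->Requested < It->MaxBytes ? It->Requested : It->MaxBytes;
    Data.insert(Data.begin() + It->Offset, N, It->Prefix);
  }
}
```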

Another idea would be a new fragment subclass for instructions which are paddable, but not relaxable. Or possibly a compressed representation for RelaxableFragment (i.e. most of the space/generality appears to be overkill.)

This revision now requires changes to proceed. Mar 24 2020, 4:10 PM

In D76286#1940291, @reames wrote:

I don't think this is a good idea. Without evidence to the contrary, I would assume that this suffers from exactly the same memory overhead problems as the original rejected patch. The basic challenge is that a MCInst is quite large (~144 bytes if my quick mental math is right). And the containing RelaxableInst is a bit larger still. The storage for a small encoded instruction in a DataFragment without fixups is ~2-6 bytes. That's a huge difference.

I did some measurement for the memory usage, and will paste the data next week.

An alternate approach is to explore allowing prefixes to be spliced into DataFragments. This would require keeping something analogous to a Fixup with the offset, prefix, and maximum bytes to insert.

I don't want a DataFragment to be able to change its size; that would be very strange. And if we allow prefixes to be spliced into DataFragments, we have to add a vector member to DataFragment to record where each encoded instruction starts. That would increase the size of DataFragment even when we do not need prefix padding.

Another idea would be a new fragment subclass for instructions which are paddable, but not relaxable. Or possibly a compressed representation for RelaxableFragment (i.e. most of the space/generality appears to be overkill.)

Unless we can drop the MCInst member, we can hardly build a fragment for padding that is smaller than RelaxableFragment.
In addition, introducing a new kind of fragment would add a lot of complexity to MCAssembler.cpp and X86AsmBackend::padInstruction.

Applied D76286 and D76475 on top of commit 7b808b105f6aedc4066502b68b71cf205bafa582.

I collected the memory usage of prefix padding and NOP padding when building SPEC.

With LTO, the prefix padding solution consumes 17% more memory than NOP padding in geomean. The highest outlier is 43%. Without LTO, the prefix padding solution consumes 5% more memory than NOP padding in geomean. The highest outlier is 28%.
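A geomean summary like this is normally the geometric mean of per-benchmark ratios (prefix-padding memory divided by NOP-padding memory), minus one. The per-benchmark SPEC numbers were not posted in the review, so the sketch below only shows that computation; the function name is hypothetical and any inputs you feed it are your own, not the measurements above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios, returned as a relative overhead
// (0.17 means "17% more memory"). Averaging logs avoids overflow.
double geomeanOverhead(const std::vector<double> &Ratios) {
  assert(!Ratios.empty());
  double LogSum = 0.0;
  for (double R : Ratios)
    LogSum += std::log(R);
  return std::exp(LogSum / Ratios.size()) - 1.0;
}
```

Unlike an arithmetic mean, the geomean keeps a single large outlier (such as the 43% case) from dominating the summary.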

Options for NOP padding:
-x86-align-branch-boundary=32 -x86-align-branch=fused+jcc+jmp+indirect+call+ret -x86-pad-max-prefix-size=0

Options for prefix padding:
-x86-align-branch-boundary=32 -x86-align-branch=fused+jcc+jmp+indirect+call+ret -x86-pad-max-prefix-size=5
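The two option sets differ only in `-x86-pad-max-prefix-size`. A simplified, self-contained model of the `allowEnhancedRelaxation()` predicate added by this patch shows why the NOP-padding configuration never routes unrelaxable instructions into RelaxableFragment. The struct and its fields are hypothetical stand-ins for the backend state and the `X86PadMaxPrefixSize`/`X86PadForBranchAlign` options; treating `X86PadForBranchAlign` as defaulting to true is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the X86AsmBackend state and cl::opt values.
struct X86PaddingOptions {
  unsigned AlignBoundary = 1;     // -x86-align-branch-boundary (1 = off)
  unsigned AlignBranchKinds = 0;  // hypothetical bitmask for -x86-align-branch
  unsigned PadMaxPrefixSize = 0;  // -x86-pad-max-prefix-size
  bool PadForBranchAlign = true;  // assumed default of X86PadForBranchAlign
};

// Mirrors X86AsmBackend::allowAutoPadding().
bool allowAutoPadding(const X86PaddingOptions &O) {
  return O.AlignBoundary != 1 && O.AlignBranchKinds != 0;
}

// Mirrors X86AsmBackend::allowEnhancedRelaxation() from this diff.
bool allowEnhancedRelaxation(const X86PaddingOptions &O) {
  return allowAutoPadding(O) && O.PadMaxPrefixSize != 0 && O.PadForBranchAlign;
}
```

With the NOP-padding options (boundary 32, several branch kinds, prefix size 0) auto padding is on but enhanced relaxation is off; raising the prefix size to 5 turns it on.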

skan edited the summary of this revision. Apr 4 2020, 6:46 AM

skan added a child revision: D77628: [Driver][X86] Add -mpad-max-prefix-size. Apr 6 2020, 9:19 PM


Is the extra memory overhead acceptable to reviewers? It occurs only when branches are aligned by prefix padding, and it gives users more choices for padding branches.

Ping

LGTM w/one required change before submit. This desperately needs test cases demonstrating the padding of instructions which aren't relaxable. Please add at least a couple. I'm fine adding some additional ones in a separate commit, but landing this without any would be inappropriate.

On the memory usage, I think the demonstrated results are not great (a 45% worst-case increase with LTO on *spec* is actually pretty bad), but it's a reasonable starting point to optimize from. The non-LTO maximum looks a lot more reasonable, and I suspect this is good enough for many real users. We can continue iterating on the representation to reduce memory usage in tree.

This revision is now accepted and ready to land. Apr 7 2020, 8:59 PM

Address review comments (demonstrate that the instruction being padded may be unrelaxable).

Harbormaster failed remote builds in B52309: Diff 255917! Apr 8 2020, 1:03 AM

Remote builds failed due to a bug in the pre-merge check: it tried to apply a patch that had already landed. What's the right way to work around it? Remove some parent revisions, or ignore the failure?

skan removed parent revisions: D76475: [X86][MC] Disable Prefix padding after hardcode/prefix, D76285: [X86][MC] Fix the bug for prefix padding support. Apr 8 2020, 1:55 AM

No changes; just trying to trigger a new remote build.

Harbormaster completed remote builds in B52315: Diff 255927. Apr 8 2020, 2:40 AM

Harbormaster completed remote builds in B52309: Diff 255917.

Closed by commit rG916044d819c8: [X86][MC] Support enhanced relaxation for branch align (authored by skan). Apr 8 2020, 4:18 AM

This revision was automatically updated to reflect the committed changes.

Diff 255957

llvm/include/llvm/MC/MCAsmBackend.h

[43 unchanged lines omitted; context: public:]
   MCAsmBackend &operator=(const MCAsmBackend &) = delete;
   virtual ~MCAsmBackend();

   const support::endianness Endian;

   /// Return true if this target might automatically pad instructions and thus
   /// need to emit padding enable/disable directives around sensative code.
   virtual bool allowAutoPadding() const { return false; }
+  /// Return true if this target allows an unrelaxable instruction to be
+  /// emitted into RelaxableFragment and then we can increase its size in a
+  /// tricky way for optimization.
+  virtual bool allowEnhancedRelaxation() const { return false; }

   /// Give the target a chance to manipulate state related to instruction
   /// alignment (e.g. padding for optimization), instruction relaxablility, etc.
   /// before and after actually emitting the instruction.
   virtual void emitInstructionBegin(MCObjectStreamer &OS, const MCInst &Inst) {}
   virtual void emitInstructionEnd(MCObjectStreamer &OS, const MCInst &Inst) {}

   /// lifetime management
[145 unchanged lines omitted]

llvm/lib/MC/MCObjectStreamer.cpp

[379 unchanged lines omitted; context: void MCObjectStreamer::emitInstructionImpl(const MCInst &Inst,]
   Sec->setHasInstructions(true);

   // Now that a machine instruction has been assembled into this section, make
   // a line entry for any .loc directive that has been seen.
   MCDwarfLineEntry::Make(this, getCurrentSectionOnly());

   // If this instruction doesn't need relaxation, just emit it as data.
   MCAssembler &Assembler = getAssembler();
-  if (!Assembler.getBackend().mayNeedRelaxation(Inst, STI)) {
+  MCAsmBackend &Backend = Assembler.getBackend();
+  if (!(Backend.mayNeedRelaxation(Inst, STI) ||
+        Backend.allowEnhancedRelaxation())) {
     EmitInstToData(Inst, STI);
     return;
   }

   // Otherwise, relax and emit it as data if either:
   // - The RelaxAll flag was passed
   // - Bundling is enabled and this instruction is inside a bundle-locked
   // group. We want to emit all such instructions into the same data
[373 unchanged lines omitted]
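The change to `emitInstructionImpl` comes down to one routing decision. The sketch below models only that first check (the real function goes on to handle the RelaxAll flag and bundle-locked groups, which this ignores); the enum and function are hypothetical. It also makes the memory concern concrete: once enhanced relaxation is enabled, every instruction, relaxable or not, skips the cheap data-fragment path at this point.

```cpp
#include <cassert>

enum class FragmentPath { Data, Relaxable };

// Hypothetical model of the first check in emitInstructionImpl after this
// patch: only instructions that neither may need relaxation nor are allowed
// enhanced relaxation go straight into a data fragment.
FragmentPath chooseFragment(bool MayNeedRelaxation,
                            bool AllowEnhancedRelaxation) {
  if (!(MayNeedRelaxation || AllowEnhancedRelaxation))
    return FragmentPath::Data;
  return FragmentPath::Relaxable;  // still subject to the later checks
}
```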

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

[158 unchanged lines omitted; context: X86AsmBackend(const Target &T, const MCSubtargetInfo &STI)]
     // Allow overriding defaults set by master flag
     if (X86AlignBranchBoundary.getNumOccurrences())
       AlignBoundary = assumeAligned(X86AlignBranchBoundary);
     if (X86AlignBranch.getNumOccurrences())
       AlignBranchType = X86AlignBranchKindLoc;
   }

   bool allowAutoPadding() const override;
+  bool allowEnhancedRelaxation() const override;
   void emitInstructionBegin(MCObjectStreamer &OS, const MCInst &Inst) override;
   void emitInstructionEnd(MCObjectStreamer &OS, const MCInst &Inst) override;

   unsigned getNumFixupKinds() const override {
     return X86::NumTargetFixupKinds;
   }

   Optional<MCFixupKind> getFixupKind(StringRef Name) const override;
[277 unchanged lines omitted; context: static bool hasVariantSymbol(const MCInst &MI) {]
   }
   return false;
 }

 bool X86AsmBackend::allowAutoPadding() const {
   return (AlignBoundary != Align(1) && AlignBranchType != X86::AlignBranchNone);
 }

+bool X86AsmBackend::allowEnhancedRelaxation() const {
+  return allowAutoPadding() && X86PadMaxPrefixSize != 0 && X86PadForBranchAlign;
+}
+
 bool X86AsmBackend::needAlign(MCObjectStreamer &OS) const {
   if (!OS.getAllowAutoPadding())
     return false;
   assert(allowAutoPadding() && "incorrect initialization!");

   // To be Done: Currently don't deal with Bundle cases.
   if (OS.getAssembler().isBundlingEnabled())
     return false;
[1,134 unchanged lines omitted]

llvm/test/MC/X86/align-branch-enhanced-relaxation.s

This file was added.

# RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s -x86-pad-max-prefix-size=1 --x86-align-branch-boundary=32 --x86-align-branch=jmp+indirect | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -mcpu=skylake -filetype=obj -triple x86_64-pc-linux-gnu %s --mc-relax-all | llvm-objdump -d - | FileCheck --check-prefixes=RELAX-ALL %s

# Exercise cases where we are allowed to increase the length of unrelaxable
# instructions (by adding prefixes) for alignment purposes.

# The first test checks instructions 'int3', 'push %rbp', which will be padded
# later are unrelaxable (their encoding size is still 1 byte when
# --mc-relax-all is passed).
.text
.globl labeled_unrelaxable_test
labeled_unrelaxable_test:
# RELAX-ALL: 0: cc int3
# RELAX-ALL: 1: 54 pushq %rsp
int3
push %rsp

# The second test is a basic test, we just check the jmp is aligned by prefix
# padding the previous instructions.
.text
.globl labeled_basic_test
labeled_basic_test:
.p2align 5
.rept 28
int3
.endr
# CHECK: 3c: 2e cc int3
# CHECK: 3e: 2e 54 pushq %rsp
# CHECK: 40: eb 00 jmp
int3
push %rsp
jmp foo
foo:
ret

# The third test check the correctness cornercase - can't add prefixes on a
# prefix or a instruction following by a prefix.
.globl labeled_prefix_test
labeled_prefix_test:
.p2align 5
.rept 28
int3
.endr
# CHECK: 7c: 2e cc int3
int3
# CHECK: 7e: 3e cc int3
DS
int3
# CHECK: 80: eb 00 jmp
jmp bar
bar:
ret
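The expected offsets in this test can be reproduced with a small model. It is a simplified sketch, not the backend's actual traversal: it walks backwards from the branch, gives each paddable instruction up to `MaxPrefix` prefix bytes (1 here, from `-x86-pad-max-prefix-size=1` in the RUN line) until the branch reaches the next 32-byte boundary, and skips instructions that cannot be padded (a prefix byte such as `DS`, or an instruction following one). The struct and function names are hypothetical, and the model does not cover what happens when there is not enough room for prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one instruction in a straight-line run.
struct Inst {
  unsigned Size;      // encoded size in bytes before padding
  bool Paddable;      // false for a prefix byte or an instruction after one
  unsigned Prefixes;  // padding prefixes assigned by padBeforeBranch
};

// Pads instructions before the branch (the last element of Run, which starts
// at offset Start) so that the branch moves to the next Boundary.
// Returns the branch's final offset.
uint64_t padBeforeBranch(std::vector<Inst> &Run, uint64_t Start,
                         uint64_t Boundary, unsigned MaxPrefix) {
  uint64_t BranchOffset = Start;
  for (size_t I = 0; I + 1 < Run.size(); ++I)
    BranchOffset += Run[I].Size;
  uint64_t Need = (Boundary - BranchOffset % Boundary) % Boundary;
  // Walk backwards from the instruction just before the branch.
  for (size_t I = Run.size() - 1; I-- > 0 && Need > 0;) {
    if (!Run[I].Paddable)
      continue;
    unsigned Add = Need < MaxPrefix ? static_cast<unsigned>(Need) : MaxPrefix;
    Run[I].Prefixes = Add;
    Need -= Add;
    BranchOffset += Add;
  }
  return BranchOffset;
}
```

For `labeled_basic_test` (run starting at 0x20: 28 `int3`, `int3`, `push %rsp`, then the 2-byte `jmp`), the model pads the last `int3` and the `push` by one byte each and the branch lands at 0x40. For `labeled_prefix_test` (starting at 0x60), it skips the `int3` after `DS` and the `DS` itself, pads the earlier `int3`, and the branch lands at 0x80, matching the CHECK lines.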

This is an archive of the discontinued LLVM Phabricator instance.

[X86][MC] Support enhanced relaxation for branch align
Closed, Public
