This is an archive of the discontinued LLVM Phabricator instance.

Differential D71238

Align non-fused branches within 32-Byte boundary (basic case)
Abandoned, Public

Authored by reames on Dec 9 2019, 5:11 PM.

Details

Reviewers

xiangzhangllvm
LuoYuanke
pengfei
craig.topper
MaskRay
jyknight
chandlerc
annita.zhang
ruiu
fedor.sergeev
skan

Summary

This patch is derived from D70157. Changes from that one:

- Remove prefix padding and simply pad with variable-length nops. This reduces the complexity sharply. We can and should return to prefix padding in a follow-up patch.
- Remove fused instruction handling. It needs to be reimplemented, but not having it makes the mechanics of the basic padding requirements more straightforward.
- Rename MCMachineDependentFragment to MCBoundaryAlignFragment and try to clarify (in comments and asserts) associated invariants and expectations.
- Remove master flag as user expectations (which are set by our patch and the corresponding gcc patch) would not be met by only aligning a subset of branches.

The intent here is to create something straightforward and easy to review. If others agree with the incremental approach, the original patch can be incrementally reimplemented on top of this one.

Diff Detail

Event Timeline

reames created this revision. Dec 9 2019, 5:11 PM

Herald added a project: Restricted Project. Dec 9 2019, 5:12 PM

Herald added subscribers: bollu, hiraditya, mcrosier.

skan added a reviewer: skan. Dec 9 2019, 5:41 PM

The patch doesn't handle the hard code case either, but as far as I can see that change was not mentioned in the description of this patch.

In D71238#1776443, @skan wrote:

The patch doesn't handle the hard code case either, but as far as I can see that change was not mentioned in the description of this patch.

Unless I'm misreading the original patch, the notion of "hard code" is only relevant to the prefix padding scheme. Admittedly, I might be misreading; it isn't clearly stated anywhere in the original patch exactly what "hard code" is or what purpose it serves.

tstellar added a subscriber: tstellar. Dec 9 2019, 6:46 PM

In D71238#1776488, @reames wrote:

Unless I'm misreading the original patch, the notion of "hard code" is only relevant to the prefix padding scheme. Admittedly, I might be misreading; it isn't clearly stated anywhere in the original patch exactly what "hard code" is or what purpose it serves.

In function X86AsmBackend::alignBranchesBegin, I added the comment

// The prefix or nop isn't inserted if the previous item is hard code, which
// may be used to hardcode an instruction, since there is no clear instruction
// boundary.

I'm sorry if I didn't make it clear. As far as I am concerned, the hard code case should be dealt with.

In D71238#1776512, @skan wrote:
In function X86AsmBackend::alignBranchesBegin, I added the comment
// The prefix or nop isn't inserted if the previous item is hard code, which
// may be used to hardcode an instruction, since there is no clear instruction
// boundary.
I'm sorry if I didn't make it clear. As far as I am concerned, the hard code case should be dealt with.

I'm still not following. Can you explain the use case here? What test case differs based on the presence of the "hard code" support in the original patch?

In D71238#1776518, @reames wrote:
I'm still not following. Can you explain the use case here? What test case differs based on the presence of the "hard code" support in the original patch?

    .text
    nop
.Ltmp0:
    .p2align 3, 0x90
    .rept 16
    nop
    .endr
.Ltmp3:
    movl  %eax, -4(%rsp)
    .rept 2
    .byte 0x2e
    .endr
    jmp .Ltmp0

The bytes emitted by .rept 2 .byte 0x2e .endr are meant as prefixes for the instruction jmp .Ltmp0. We shouldn't add a nop before this jump, because the nop would land between the prefix bytes and the jump, and the prefixes would then apply to the nop instead.

In D71238#1776518, @reames wrote:
In D71238#1776512, @skan wrote:
In function X86AsmBackend::alignBranchesBegin, I added the comment
// The prefix or nop isn't inserted if the previous item is hard code, which
// may be used to hardcode an instruction, since there is no clear instruction
// boundary.
I'm sorry if I didn't make it clear. As far as I am concerned, the hard code case should be dealt with.
I'm still not following. Can you explain the use case here? What test case differs based on the presence of the "hard code" support in the original patch?

Consider hand-written assembly code which looks like this:

	 .byte	0x67
	 mov	%rdx, %rax

That is from boringssl, fwiw, https://github.com/google/boringssl/blob/master/crypto/fipsmodule/bn/asm/rsaz-avx2.pl.

Or, an example from Mesa xform4.S:

p4_general_done:
	.byte 0xf3
	ret

Inserting anything between the tacked-on-prefix with .byte, and the rest of the instruction written mnemonically, would be bad.
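To make the failure mode concrete, here is a minimal C++ sketch of what a naive padder would do to the Mesa example above. It assumes the padder treats the `.byte 0xf3` as opaque data and sees `ret` (encoded as 0xc3) as an instruction starting at offset 1. `insertNopAt` is a hypothetical helper written for this illustration, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: insert a one-byte nop (0x90) at byte position Pos,
// the way a padder that only knows about instruction starts might.
std::vector<uint8_t> insertNopAt(std::vector<uint8_t> Bytes, std::size_t Pos) {
  assert(Pos <= Bytes.size() && "insertion point must be inside the buffer");
  Bytes.insert(Bytes.begin() + static_cast<std::ptrdiff_t>(Pos), 0x90);
  return Bytes;
}
```

Starting from the two bytes f3 c3 and inserting at offset 1 gives f3 90 c3. The pair f3 90 is the encoding of `pause`, so the hand-written prefix no longer belongs to the `ret`.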

That said...I don't like the "hard-code" code. It feels like a workaround for the root issue -- that this feature represents a big shift in what x86 assemblers might do to your code, and thus should be opt-in by asm writers, and not enabled by any global flag.

skan added inline comments. Dec 9 2019, 8:30 PM

llvm/lib/MC/MCAssembler.cpp
1007–1008	It seems that you didn't simplify the code based on my latest patch; I have removed the class/function name in my comment.

skan added inline comments. Dec 9 2019, 8:50 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
394–395	There is a deficiency here. For example, there may be an `MCAlignFragment` between the `MCBoundaryAlignFragment` and the branch to be aligned, which will result in an infinite loop.

I have a general comment. I think the macro-fused jcc instructions should be taken into account even for NOP padding. In our experience, it's better to treat a macro-fused pair as a single instruction and pad as necessary so that it neither crosses nor ends at a 32-byte boundary.

Given the lack of macro-fused jcc and hard code handling in this patch, I would suggest that we build on the previous patch and split it into two: one for NOP padding and the other for prefix padding. I suppose the first one is what you want. It should also cover more of the corner cases we have already met and resolved. What do you think?

@skan

I just want to mention here, that my comment on the original patch still stands and really needs to be addressed:

We need to have detailed performance and binary size numbers for this kind of change. We should be looking at the test suite and SPEC for performance. I'd like to see before/after performance numbers on a system that is *not* using the microcode feature that addresses jCC (either a newer CPU, an older CPU, or an unpatched CPU). I'd also like to see before/after performance numbers on a system that *is* using the microcode feature to address the jCC issue. I'd suggest Clang itself, Firefox, and Chromium as binaries to show the binary size impact of this patch. Happy for others to suggest binaries to use to evaluate the binary size impact.

I continue to think that any patch we plan to land should have these numbers, and each incremental update to that patch should have the updated numbers. Otherwise, there is no way to do a good job reviewing the *impact* of these changes even after folks are satisfied with the *code* in these changes.

In D71238#1776539, @jyknight wrote:
In D71238#1776518, @reames wrote:
In D71238#1776512, @skan wrote:
In function X86AsmBackend::alignBranchesBegin, I added the comment
// The prefix or nop isn't inserted if the previous item is hard code, which
// may be used to hardcode an instruction, since there is no clear instruction
// boundary.
I'm sorry if I didn't make it clear. As far as I am concerned, the hard code case should be dealt with.
I'm still not following. Can you explain the use case here? What test case differs based on the presence of the "hard code" support in the original patch?
Consider hand-written assembly code which looks like this:
...
Inserting anything between the tacked-on-prefix with .byte, and the rest of the instruction written mnemonically, would be bad.

Ok, this makes more sense now.

That said...I don't like the "hard-code" code. It feels like a workaround for the root issue -- that this feature represents a big shift in what x86 assemblers might do to your code, and thus should be opt-in by asm writers, and not enabled by any global flag.

Honestly, I both strongly agree with this, and see the argument as a somewhat lost cause. I think in practical terms, we *will* have to support something which works for both compiler generated assembly and legacy hand written assembly. I dislike that fact and the design it encourages, but it's reality.

On the hard code notion, after thinking about it overnight, I suggest that we do *not* include it in the first patch.

First, exactly what "hard code" is has not been settled and is worthy of careful discussion. For instance, I see no particular reason why the "manual prefix" trick described here should be supported, but the assumptions about label alignment mentioned in the previous thread shouldn't be. There is room for discussion here, and if we block all progress on settling this, the review will stall.

Second, there are use cases for which the "hard code" support is unneeded. For instance, compiler generated assembly which doesn't do the silly "manual prefix" trick. Supporting them while we work through the support needed for opt in legacy assembly seems worthwhile.

In D71238#1776751, @chandlerc wrote:

I just want to mention here, that my comment on the original patch still stands and really needs to be addressed:

We need to have detailed performance and binary size numbers for this kind of change. We should be looking at the test suite and SPEC for performance. I'd like to see before/after performance numbers on a system that is *not* using the microcode feature that addresses jCC (either a newer CPU, an older CPU, or an unpatched CPU). I'd also like to see before/after performance numbers on a system that *is* using the microcode feature to address the jCC issue. I'd suggest Clang itself, Firefox, and Chromium as binaries to show the binary size impact of this patch. Happy for others to suggest binaries to use to evaluate the binary size impact.

I strongly agree that Intel should share these numbers in detail. So far, we've seen some summary shared on llvm-dev, but nothing in detail.

I can't currently share the details of our internal runs, but let me summarize takeaways:

- All results are very preliminary - this is all from a single machine with a subset of our workloads.
- Applying microcode without mitigation hurts. Geomean impacts aren't too terrible (5-10%), but some individual workloads suffer badly.
- Applying mitigation (either nop padding or prefix padding) addresses most of the regressions seen. Not all; further work is needed to understand the outliers, but it's a definite step forward.
- The difference between nop and prefix padding is in the noise for the current measurements. (Noise is fairly high though, so take that with a grain of salt.)
- Applying mitigations on an unpatched machine does have a small negative impact. For us, it appears to be around a -1% geomean impact. (Again, *very* noisy results so take with a grain of salt.)
- Reminder: This is all in the context of a JIT targeting the specific machine architecture run on. That may be important w.r.t. nop padding cost for instance.

I continue to think that any patch we plan to land should have these numbers, and each incremental update to that patch should have the updated numbers. Otherwise, there is no way to do a good job reviewing the *impact* of these changes even after folks are satisfied with the *code* in these changes.

I understand your concern, but I want to caution against drawing too firm a line here. We have a situation where review on a very important patch is stalled, and we need to unblock it. This subsetting was my attempt to do so, but I'm not in a position to be able to share data at the scale you're arguing for. If we can't work incrementally here, I suspect this effort is going to stall.

For context, we will be shipping a mitigation for this. I would strongly prefer that be based on upstream work, but if I have to, we'll diverge and drop from the upstream discussion.

Also, to be careful, I think there's a huge difference between landing infrastructure (this patch) and enabling by default. This patch does not do the latter, and I'm not sure I support *ever* doing so. That's for later discussion.

Honestly, I both strongly agree with this, and see the argument as a somewhat lost cause. I think in practical terms, we *will* have to support something which works for both compiler generated assembly and legacy hand written assembly. I dislike that fact and the design it encourages, but it's reality.

Hand-written assembly is generally responsible for its own performance optimizations, and we do not usually expect the assembler to muck around with the code. So, why is it assumed that _this_ optimization should be done automatically on such assembly code?

emaste added a subscriber: emaste. Dec 10 2019, 1:35 PM

In D71238#1777857, @jyknight wrote:

Honestly, I both strongly agree with this, and see the argument as a somewhat lost cause. I think in practical terms, we *will* have to support something which works for both compiler generated assembly and legacy hand written assembly. I dislike that fact and the design it encourages, but it's reality.

Hand-written assembly is generally responsible for its own performance optimizations, and we do not usually expect the assembler to muck around with the code. So, why is it assumed that _this_ optimization should be done automatically on such assembly code?

The more I think about this, the more I think James is right here.

Regardless of whether he is or not, we're clearly going to need an opt in assembler syntax for this at some point. Given that, I'm wondering if my time wouldn't be better spent there. The majority of the code would be shared with this patch, and it defers the controversy around optimizing assemblers until later. If we get that piece done, we can build either compiler or optimizing assembler (or both) support on top.

I've gone as far as writing a rough draft of textual assembler support. If folks agree this is helpful, I'll abandon this patch, and post one in that direction. Any high level suggestions as to syntax? I see two major options, but definitely welcome suggestions w.r.t. naming/semantics/etc...

Option 1

.boundary_align 4
jmp foo

and

.boundary_align 4
.bundle_lock
test ...
jcc foo
.bundle_unlock

Option 2

.boundary_align 4
jmp foo
.end_boundary_align

and

.boundary_align 4
test ...
jcc foo
.end_boundary_align

p.s. If anyone has a better name than "boundary align" please suggest it. That's not a good name; it's just a placeholder.

reames mentioned this in D71315: [WIP] Draft assembler support for branch alignment. Dec 10 2019, 4:28 PM

In D71238#1778356, @reames wrote:

I've gone as far as writing a rough draft of textual assembler support.

I posted the WIP draft here (https://reviews.llvm.org/D71315). Please keep the discussion here or on the original thread. I don't want to fork discussion further. Once we've settled direction, I can clean up and repost the patch if needed.

I've gone as far as writing a rough draft of textual assembler support. If folks agree this is helpful, I'll abandon this patch, and post one in that direction. Any high level suggestions as to syntax? I see two major options, but definitely welcome suggestions w.r.t. naming/semantics/etc...

Option 1

.boundary_align 4
jmp foo

and

.boundary_align 4
.bundle_lock
test ...
jcc foo
.bundle_unlock

Option 2

.boundary_align 4
jmp foo
.end_boundary_align

and

.boundary_align 4
test ...
jcc foo
.end_boundary_align

p.s. If anyone has a better name than "boundary align" please suggest it. That's not a good name; it's just a placeholder.

Do we need some option or syntax to disable ".boundary_align 4", so that the same code can be compiled for machines that do not need branch alignment?

In D71238#1777857, @jyknight wrote:

Honestly, I both strongly agree with this, and see the argument as a somewhat lost cause. I think in practical terms, we *will* have to support something which works for both compiler generated assembly and legacy hand written assembly. I dislike that fact and the design it encourages, but it's reality.

Hand-written assembly is generally responsible for its own performance optimizations, and we do not usually expect the assembler to muck around with the code. So, why is it assumed that _this_ optimization should be done automatically on such assembly code?

I think we need to think about legacy assembly code, libraries written in assembly, and so on. If they are impacted by the microcode update, the assembler can provide a quick way to mitigate it without any code change. Why don't we do it? At least we can give users the option to use it or not.

In D71238#1778856, @LuoYuanke wrote:

Do we need some option or syntax to disable ".boundary_align 4", so that the same code can be compiled for some machine that do not need align branch?

I wouldn't think so. Macros could be used for this purpose if needed. We - to my admittedly limited knowledge - don't have anything analogous for other forms of alignment for instance.

bcain added a subscriber: bcain. Dec 12 2019, 7:15 AM

andrew.w.kaylor added a subscriber: andrew.w.kaylor. Dec 12 2019, 11:29 PM

I'm late jumping into the discussion of these patches and may not have seen all comments, so go easy on me if I say something that has already been covered.

In D71238#1776751, @chandlerc wrote:

I just want to mention here, that my comment on the original patch still stands and really needs to be addressed:

We need to have detailed performance and binary size numbers for this kind of change. We should be looking at the test suite and SPEC for performance. I'd like to see before/after performance numbers on a system that is *not* using the microcode feature that addresses jCC (either a newer CPU, an older CPU, or an unpatched CPU). I'd also like to see before/after performance numbers on a system that *is* using the microcode feature to address the jCC issue. I'd suggest Clang itself, Firefox, and Chromium as binaries to show the binary size impact of this patch. Happy for others to suggest binaries to use to evaluate the binary size impact.

In D71238#1777747, @reames wrote:
I strongly agree that Intel should share these numbers in detail. So far, we've seen some summary shared on llvm-dev, but nothing in detail.

We ran into a very slow required internal process here. Annita tells me that we'll have data to share very soon, hopefully on Monday. I haven't seen the data myself, but I have a suspicion that it isn't going to be what you're looking for. I'm talking to people to see what I can do to establish a faster process for turning around data for incremental updates to these patches. More detailed numbers mean more scrutiny before we're allowed to share. I like the suggestion of using clang. A self build would give us code size and an execution time that I hope would be good enough for evaluating relative differences introduced by the patch. All interested reviewers should easily be able to repeat this test locally on whatever machine they like, and everyone could share their own data from whatever variations they were interested in trying.

In D71238#1776751, @chandlerc wrote:

I continue to think that any patch we plan to land should have these numbers, and each incremental update to that patch should have the updated numbers. Otherwise, there is no way to do a good job reviewing the *impact* of these changes even after folks are satisfied with the *code* in these changes.

In D71238#1777747, @reames wrote:
I understand your concern, but I want to caution against drawing too firm a line here. We have a situation where review on a very important patch is stalled, and we need to unblock it. This subsetting was my attempt to do so, but I'm not in a position to be able to share data at the scale you're arguing for. If we can't work incrementally here, I suspect this effort is going to stall.

I agree with Philip here. Obviously, having data is critical to making decisions and knowing when we've reached an acceptable state. We just need to agree on where the threshold is for enough data to make a reasonable decision.

In D71238#1776751, @chandlerc wrote:

I just want to mention here, that my comment on the original patch still stands and really needs to be addressed:

We need to have detailed performance and binary size numbers for this kind of change. We should be looking at the test suite and SPEC for performance. I'd like to see before/after performance numbers on a system that is *not* using the microcode feature that addresses jCC (either a newer CPU, an older CPU, or an unpatched CPU). I'd also like to see before/after performance numbers on a system that *is* using the microcode feature to address the jCC issue. I'd suggest Clang itself, Firefox, and Chromium as binaries to show the binary size impact of this patch. Happy for others to suggest binaries to use to evaluate the binary size impact.

I continue to think that any patch we plan to land should have these numbers, and each incremental update to that patch should have the updated numbers. Otherwise, there is no way to do a good job reviewing the *impact* of these changes even after folks are satisfied with the *code* in these changes.

More performance data was posted on http://lists.llvm.org/pipermail/llvm-dev/2019-December/137609.html and http://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html. Let's move on based on the performance data.

I do like the idea of generalizing bundle_lock to mean generally "keep these instructions together and don't introduce anything extraneous in the middle". So, ".boundary_align <argument>" would apply to the next instruction-or-instruction-bundle, and will emit nops at its location, such that the next instruction is guaranteed to have the proper "within but not ending at a given 2**<arg> block" alignment.
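As a rough illustration of that rule, the following C++ sketch computes how many nop bytes would be needed in front of an instruction or bundle. It assumes the directive argument is the log2 of the block size, that the padded item is smaller than the block, and that offsets are relative to a section start aligned to at least the block size. The function names are hypothetical and not taken from any patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the byte range [Start, Start + Size) spans two
// blocks or ends exactly on a block boundary.
bool crossesOrEndsAtBoundary(uint64_t Start, uint64_t Size, uint64_t Block) {
  uint64_t End = Start + Size; // One past the last byte of the item.
  bool Crosses = (Start / Block) != ((End - 1) / Block);
  bool EndsAt = (End % Block) == 0;
  return Crosses || EndsAt;
}

// Hypothetical helper: number of nop bytes to emit before an item of Size
// bytes at offset Start so that it sits within, but does not end at, a
// 2**Log2Block block.
uint64_t boundaryAlignPadding(uint64_t Start, uint64_t Size,
                              unsigned Log2Block) {
  assert(Log2Block < 64 && "block size must fit in 64 bits");
  uint64_t Block = uint64_t(1) << Log2Block;
  assert(Size != 0 && Size < Block && "item must be smaller than the block");
  if (!crossesOrEndsAtBoundary(Start, Size, Block))
    return 0;
  // Pad up to the next block boundary; the item then starts a fresh block.
  return Block - (Start % Block);
}
```

For a 32-byte block (argument 5), a 2-byte jump at offset 30 would get 2 bytes of padding because it would otherwise end at the boundary, while the same jump at offset 0 would get none.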

This has the advantage that padding is only added to locations that explicitly ask for it -- unmodified assembly code will not be broken. (BTW, not breaking existing assembly code by introducing unexpected padding seems pretty important IMO, which is why I keep coming back to it. If we don't figure that out, I don't think we can enable the feature by default, which we will probably want to do in certain tunings.)

The disadvantage would be that the compiler needs to know exactly which instructions to request padding for.

But now I'm pondering if we may wish to leave space in our design to be able to avoid the other possible fallbacks out of the DSB. (E.g., avoiding having more than 3 jumps or 18 uops per 32-byte block of code.) I don't mean to suggest to implement any of those optimizations now, only that it's worth considering how we could implement that, as a potential future enhancement.
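For what it's worth, a check for the first of those constraints could be sketched as below. This is C++ with made-up types (`ToyInst` and `blocksWithTooManyJumps` are hypothetical), it attributes each jump to the block containing its first byte, and it takes the limit of 3 jumps per 32-byte block from the sentence above rather than from any vendor documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified view of a laid-out instruction.
struct ToyInst {
  uint64_t Offset; // Offset of the first byte within the section.
  bool IsJump;     // Whether the instruction is a jump.
};

// Hypothetical helper: return the start offsets of all blocks that contain
// more than MaxJumps jumps, in increasing order.
std::vector<uint64_t> blocksWithTooManyJumps(const std::vector<ToyInst> &Insts,
                                             uint64_t BlockSize = 32,
                                             unsigned MaxJumps = 3) {
  assert(BlockSize != 0 && "block size must be non-zero");
  std::map<uint64_t, unsigned> JumpsPerBlock;
  for (const ToyInst &I : Insts)
    if (I.IsJump)
      ++JumpsPerBlock[I.Offset / BlockSize];

  std::vector<uint64_t> Result;
  for (const auto &Entry : JumpsPerBlock)
    if (Entry.second > MaxJumps)
      Result.push_back(Entry.first * BlockSize);
  return Result;
}
```

A region-based mode could run a check like this over the instructions inside an opted-in region and add padding only where a block is flagged. A per-instruction directive has no natural place to express that.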

And, thinking about that, I can't really see a way to implement any of that with explicit per-instruction directives generated by the compiler. But, it seems like it would fit very well with a region-based mode. So, considering that potential future extension, I'm currently thinking that something similar to D71238, but with a directive to opt-in rather than command-line all-or-nothing, would be a great first-step.

And, we need a region-based annotation to mark which instructions can get prefix-padded, too.

So, compiler would request (let's say, with ".autopadding" and ".noautopadding") this mode for code where it's safe, and profitable to do. This would declare to the assembler that nop or prefix padding may be added as necessary before any instruction within the region to keep the instructions within whatever target microarchitectural constraints exist for the given architecture.

I imagine it not indicating any particular padding action be taken, only the opt-in for the assembler being _allowed_ to insert nops or padding.
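The opt-in semantics described above can be modeled with a small C++ sketch. It uses the placeholder directive names `.autopadding` and `.noautopadding` from the suggestion, assumes one directive or instruction per line with no surrounding whitespace, and treats everything outside a region as off limits. `paddingAllowedPerLine` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: for every non-directive line, report whether the
// assembler would be allowed (not required) to insert padding before it.
std::vector<bool> paddingAllowedPerLine(const std::vector<std::string> &Lines) {
  bool InRegion = false; // Padding is off until a region is opened.
  std::vector<bool> Allowed;
  for (const std::string &Line : Lines) {
    if (Line == ".autopadding") {
      InRegion = true;
      continue;
    }
    if (Line == ".noautopadding") {
      InRegion = false;
      continue;
    }
    Allowed.push_back(InRegion);
  }
  return Allowed;
}
```

Under this model, a hand-written `.byte 0xf3` followed by `ret` outside any region is never touched, which is the property the existing assembly examples in this thread depend on.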

So, my suggestions for next steps to take are:

- Add support for such a directive into this (D71238, basic case) patch.
- Add support for emitting it to clang. By default, I'd think it should be enabled around code clang emits, except for cold code, inline assembly, and patchable instructions emitted for PATCHPOINT, STATEPOINT, xray.
- Incrementally extend support to use prefix-padding, and any other such future padding smarts we want.

Unfortunately, I need to disappear right now, will not be able to join the zoom call. Sorry about that!

Abandoning this patch. After the call, we're consolidating back on the original patch now that a direction has been agreed on.

Revision Contents

Path                                                 Size
llvm/include/llvm/MC/MCAsmBackend.h                  3 lines
llvm/include/llvm/MC/MCAssembler.h                   3 lines
llvm/include/llvm/MC/MCFragment.h                    33 lines
llvm/lib/MC/MCAssembler.cpp                          79 lines
llvm/lib/MC/MCFragment.cpp                           13 lines
llvm/lib/MC/MCObjectStreamer.cpp                     2 lines
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp   166 lines
llvm/test/MC/X86/align-branch-64.s                   87 lines

Diff 232971

llvm/include/llvm/MC/MCAsmBackend.h

[45 lines hidden]

 public:
   MCAsmBackend(const MCAsmBackend &) = delete;
   MCAsmBackend &operator=(const MCAsmBackend &) = delete;
   virtual ~MCAsmBackend();

   const support::endianness Endian;

+  virtual void alignBranchesBegin(MCObjectStreamer &OS, const MCInst &Inst) {}
+  virtual void alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) {}
+
   /// lifetime management
   virtual void reset() {}

   /// Create a new MCObjectWriter instance for use by the assembler backend to
   /// emit the final object file.
   std::unique_ptr<MCObjectWriter>
   createObjectWriter(raw_pwrite_stream &OS) const;

[164 lines hidden]

llvm/include/llvm/MC/MCAssembler.h

[189 lines hidden; enclosing context: private:]

   /// if any offsets were adjusted.
   bool layoutSectionOnce(MCAsmLayout &Layout, MCSection &Sec);

   bool relaxInstruction(MCAsmLayout &Layout, MCRelaxableFragment &IF);

   bool relaxPaddingFragment(MCAsmLayout &Layout, MCPaddingFragment &PF);

   bool relaxLEB(MCAsmLayout &Layout, MCLEBFragment &IF);
+  bool relaxBoundaryAlign(MCAsmLayout &Layout,
+                          MCBoundaryAlignFragment &MF);
   bool relaxDwarfLineAddr(MCAsmLayout &Layout, MCDwarfLineAddrFragment &DF);
   bool relaxDwarfCallFrameFragment(MCAsmLayout &Layout,
                                    MCDwarfCallFrameFragment &DF);
   bool relaxCVInlineLineTable(MCAsmLayout &Layout,
                               MCCVInlineLineTableFragment &DF);
   bool relaxCVDefRange(MCAsmLayout &Layout, MCCVDefRangeFragment &DF);

   /// finishLayout - Finalize a layout, including fragment lowering.

[262 lines hidden]

llvm/include/llvm/MC/MCFragment.h

[35 lines hidden; enclosing context: enum FragmentType : uint8_t {]

     FT_Data,
     FT_CompactEncodedInst,
     FT_Fill,
     FT_Relaxable,
     FT_Org,
     FT_Dwarf,
     FT_DwarfFrame,
     FT_LEB,
+    FT_BoundaryAlign,
     FT_Padding,
     FT_SymbolId,
     FT_CVInlineLines,
     FT_CVDefRange,
     FT_Dummy
   };

 private:

[599 lines hidden; enclosing context: public:]

   StringRef getFixedSizePortion() const { return FixedSizePortion; }
   /// @}

   static bool classof(const MCFragment *F) {
     return F->getKind() == MCFragment::FT_CVDefRange;
   }
 };

+/// Representing required padding such that a particular other fragment does
+/// not cross a particular power-of-two boundary. The other fragment must
+/// follow this one within the same section, but is not guaranteed to
+/// immediately follow.
+class MCBoundaryAlignFragment : public MCFragment {
+private:
+  /// The size of the MCBoundaryAlignFragment. Lazily populated
+  /// during relaxation.
+  unsigned Size = 0;
+  /// The fragment which must be aligned so as not to cross boundary..
+  const MCFragment *Fragment = nullptr;
+  /// The boundary which must not be crossed. Must be a power of two.
+  unsigned AlignBoundarySize = 0;
+
+public:
+  MCBoundaryAlignFragment(unsigned AlignBoundarySize,
+                          MCSection *Sec = nullptr)
+      : MCFragment(FT_BoundaryAlign, false, Sec),
+        AlignBoundarySize(AlignBoundarySize) {}
+
+  unsigned getBoundarySize() const { return AlignBoundarySize; }
+
+  void setSize(unsigned Value) { Size = Value; }
+  uint64_t getSize() const { return Size; }
+
+  void setFragment(const MCFragment *Target) { Fragment = Target; }
+  const MCFragment *getFragment() const { return Fragment; }
+
+  static bool classof(const MCFragment *F) {
+    return F->getKind() == MCFragment::FT_BoundaryAlign;
+  }
+};
 } // end namespace llvm

 #endif // LLVM_MC_MCFRAGMENT_H

llvm/lib/MC/MCAssembler.cpp

Show First 20 Lines • Show All 307 Lines • ▼ Show 20 Lines	if (Size < 0) {
return 0;		return 0;
}		}
return Size;		return Size;
}		}

case MCFragment::FT_LEB:		case MCFragment::FT_LEB:
return cast<MCLEBFragment>(F).getContents().size();		return cast<MCLEBFragment>(F).getContents().size();

		case MCFragment::FT_BoundaryAlign:
		return cast<MCBoundaryAlignFragment>(F).getSize();

case MCFragment::FT_Padding:		case MCFragment::FT_Padding:
return cast<MCPaddingFragment>(F).getSize();		return cast<MCPaddingFragment>(F).getSize();

case MCFragment::FT_SymbolId:		case MCFragment::FT_SymbolId:
return 4;		return 4;

case MCFragment::FT_Align: {		case MCFragment::FT_Align: {
const MCAlignFragment &AF = cast<MCAlignFragment>(F);		const MCAlignFragment &AF = cast<MCAlignFragment>(F);
▲ Show 20 Lines • Show All 283 Lines • ▼ Show 20 Lines	static void writeFragment(raw_ostream &OS, const MCAssembler &Asm,
}		}

case MCFragment::FT_LEB: {		case MCFragment::FT_LEB: {
const MCLEBFragment &LF = cast<MCLEBFragment>(F);		const MCLEBFragment &LF = cast<MCLEBFragment>(F);
OS << LF.getContents();		OS << LF.getContents();
break;		break;
}		}

		case MCFragment::FT_BoundaryAlign: {
		if (FragmentSize == 0)
		break;
		if (!Asm.getBackend().writeNopData(OS, FragmentSize))
		report_fatal_error("unable to write nop sequence of " +
		Twine(FragmentSize) + " bytes");
		break;
		}

case MCFragment::FT_Padding: {		case MCFragment::FT_Padding: {
if (!Asm.getBackend().writeNopData(OS, FragmentSize))		if (!Asm.getBackend().writeNopData(OS, FragmentSize))
report_fatal_error("unable to write nop sequence of " +		report_fatal_error("unable to write nop sequence of " +
Twine(FragmentSize) + " bytes");		Twine(FragmentSize) + " bytes");
break;		break;
}		}

case MCFragment::FT_SymbolId: {		case MCFragment::FT_SymbolId: {
▲ Show 20 Lines • Show All 341 Lines • ▼ Show 20 Lines	bool MCAssembler::relaxLEB(MCAsmLayout &Layout, MCLEBFragment &LF) {
// only increase an LEB fragment size here, not decrease it. See PR35809.		// only increase an LEB fragment size here, not decrease it. See PR35809.
if (LF.isSigned())		if (LF.isSigned())
encodeSLEB128(Value, OSE, OldSize);		encodeSLEB128(Value, OSE, OldSize);
else		else
encodeULEB128(Value, OSE, OldSize);		encodeULEB128(Value, OSE, OldSize);
return OldSize != LF.getContents().size();		return OldSize != LF.getContents().size();
}		}

		/// mayCrossBoundary - Check if the branch with given address and size crosses
		/// the boundary.
		static bool mayCrossBoundary(unsigned StartAddr, unsigned Size,
		unsigned BoundarySize) {
		unsigned EndAddr = StartAddr + Size;
		return StartAddr / BoundarySize != ((EndAddr - 1) / BoundarySize);
		}

		/// isAgainstBoundary - Check if the branch with given address and size is
		/// against the boundary.
		static bool isAgainstBoundary(unsigned StartAddr, unsigned Size,
		unsigned BoundarySize) {
		unsigned EndAddr = StartAddr + Size;
		return EndAddr % BoundarySize == 0;
		}

		/// needPadding - Check if the branch with given address and size needs padding.
		static bool needPadding(unsigned StartAddr, unsigned Size,
		unsigned BoundarySize) {
		return mayCrossBoundary(StartAddr, Size, BoundarySize) ||
		isAgainstBoundary(StartAddr, Size, BoundarySize);
		}

		/// getPaddingSize - Get how many bytes of padding are needed to align the
		/// branch at the given address if the branch crosses or is against the boundary.
		skan (inline comment, not done): It seems that you didn't simplify the code based on my latest patch; I have removed the class/function name in my comment.
		static unsigned getPaddingSize(unsigned StartAddr, unsigned BoundarySize) {
		return BoundarySize - (StartAddr % BoundarySize);
		}

		/// getInstSize - Get the size of encoded instruction in the fragment.
		// Can't this be replaced w/computeFragmentSize?
		static unsigned getInstSize(const MCFragment &F) {
		switch (F.getKind()) {
		default:
		llvm_unreachable("Illegal fragment type");
		case MCFragment::FT_Data:
		return cast<MCDataFragment>(F).getContents().size();
		case MCFragment::FT_Relaxable:
		return cast<MCRelaxableFragment>(F).getContents().size();
		case MCFragment::FT_CompactEncodedInst:
		return cast<MCCompactEncodedInstFragment>(F).getContents().size();
		}
		}

		bool MCAssembler::relaxBoundaryAlign(MCAsmLayout &Layout,
		MCBoundaryAlignFragment &MF) {
		auto *BranchFragment = MF.getFragment();
		if (!BranchFragment)
		return false;
		unsigned OldSize = MF.getSize();
		unsigned AlignedSize = getInstSize(*BranchFragment);
		unsigned AlignedOffset = Layout.getFragmentOffset(BranchFragment);
		AlignedOffset -= OldSize;
		unsigned NewSize = 0;
		unsigned BoundarySize = MF.getBoundarySize();
		if (needPadding(AlignedOffset, AlignedSize, BoundarySize)) {
		NewSize = getPaddingSize(AlignedOffset, BoundarySize);
		assert(NewSize < BoundarySize);
		}
		MF.setSize(NewSize);
		return (NewSize != OldSize);
		}
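
The padding rule applied by relaxBoundaryAlign can be checked in isolation. The following is a minimal standalone sketch, assuming the helper semantics shown above; `paddingFor` is a hypothetical wrapper introduced only for illustration and is not part of the patch.

```cpp
#include <cassert>

// Mirrors mayCrossBoundary: true if the bytes [Start, Start + Size) span a
// multiple of Boundary.
static bool mayCrossBoundary(unsigned Start, unsigned Size, unsigned Boundary) {
  unsigned End = Start + Size;
  return Start / Boundary != (End - 1) / Boundary;
}

// Mirrors isAgainstBoundary: true if the last byte ends exactly at a boundary.
static bool isAgainstBoundary(unsigned Start, unsigned Size,
                              unsigned Boundary) {
  return (Start + Size) % Boundary == 0;
}

// Hypothetical wrapper (not in the patch): number of nop bytes to place in
// front of a branch that would otherwise start at Start.
static unsigned paddingFor(unsigned Start, unsigned Size, unsigned Boundary) {
  if (mayCrossBoundary(Start, Size, Boundary) ||
      isAgainstBoundary(Start, Size, Boundary))
    return Boundary - (Start % Boundary);
  return 0;
}
```

With a 32-byte boundary, a 5-byte call starting at offset 29 crosses the boundary and gets 3 bytes of padding, the same call starting at offset 27 ends exactly on the boundary and gets 5 bytes, and a call starting at offset 10 needs none.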

bool MCAssembler::relaxDwarfLineAddr(MCAsmLayout &Layout,		bool MCAssembler::relaxDwarfLineAddr(MCAsmLayout &Layout,
MCDwarfLineAddrFragment &DF) {		MCDwarfLineAddrFragment &DF) {
MCContext &Context = Layout.getAssembler().getContext();		MCContext &Context = Layout.getAssembler().getContext();
uint64_t OldSize = DF.getContents().size();		uint64_t OldSize = DF.getContents().size();
int64_t AddrDelta;		int64_t AddrDelta;
bool Abs = DF.getAddrDelta().evaluateKnownAbsolute(AddrDelta, Layout);		bool Abs = DF.getAddrDelta().evaluateKnownAbsolute(AddrDelta, Layout);
assert(Abs && "We created a line delta with an invalid expression");		assert(Abs && "We created a line delta with an invalid expression");
(void)Abs;		(void)Abs;
▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	for (MCSection::iterator I = Sec.begin(), IE = Sec.end(); I != IE; ++I) {
case MCFragment::FT_DwarfFrame:		case MCFragment::FT_DwarfFrame:
RelaxedFrag =		RelaxedFrag =
relaxDwarfCallFrameFragment(Layout,		relaxDwarfCallFrameFragment(Layout,
*cast<MCDwarfCallFrameFragment>(I));		*cast<MCDwarfCallFrameFragment>(I));
break;		break;
case MCFragment::FT_LEB:		case MCFragment::FT_LEB:
RelaxedFrag = relaxLEB(Layout, *cast<MCLEBFragment>(I));		RelaxedFrag = relaxLEB(Layout, *cast<MCLEBFragment>(I));
break;		break;
		case MCFragment::FT_BoundaryAlign:
		RelaxedFrag =
		relaxBoundaryAlign(Layout, *cast<MCBoundaryAlignFragment>(I));
		break;
case MCFragment::FT_Padding:		case MCFragment::FT_Padding:
RelaxedFrag = relaxPaddingFragment(Layout, *cast<MCPaddingFragment>(I));		RelaxedFrag = relaxPaddingFragment(Layout, *cast<MCPaddingFragment>(I));
break;		break;
case MCFragment::FT_CVInlineLines:		case MCFragment::FT_CVInlineLines:
RelaxedFrag =		RelaxedFrag =
relaxCVInlineLineTable(Layout, *cast<MCCVInlineLineTableFragment>(I));		relaxCVInlineLineTable(Layout, *cast<MCCVInlineLineTableFragment>(I));
break;		break;
case MCFragment::FT_CVDefRange:		case MCFragment::FT_CVDefRange:

llvm/lib/MC/MCFragment.cpp

Show First 20 Lines • Show All 269 Lines • ▼ Show 20 Lines	case FT_Dwarf:
delete cast<MCDwarfLineAddrFragment>(this);		delete cast<MCDwarfLineAddrFragment>(this);
return;		return;
case FT_DwarfFrame:		case FT_DwarfFrame:
delete cast<MCDwarfCallFrameFragment>(this);		delete cast<MCDwarfCallFrameFragment>(this);
return;		return;
case FT_LEB:		case FT_LEB:
delete cast<MCLEBFragment>(this);		delete cast<MCLEBFragment>(this);
return;		return;
		case FT_BoundaryAlign:
		delete cast<MCBoundaryAlignFragment>(this);
		return;
case FT_Padding:		case FT_Padding:
delete cast<MCPaddingFragment>(this);		delete cast<MCPaddingFragment>(this);
return;		return;
case FT_SymbolId:		case FT_SymbolId:
delete cast<MCSymbolIdFragment>(this);		delete cast<MCSymbolIdFragment>(this);
return;		return;
case FT_CVInlineLines:		case FT_CVInlineLines:
delete cast<MCCVInlineLineTableFragment>(this);		delete cast<MCCVInlineLineTableFragment>(this);
Show All 31 Lines	LLVM_DUMP_METHOD void MCFragment::dump() const {
case MCFragment::FT_CompactEncodedInst:		case MCFragment::FT_CompactEncodedInst:
OS << "MCCompactEncodedInstFragment"; break;		OS << "MCCompactEncodedInstFragment"; break;
case MCFragment::FT_Fill: OS << "MCFillFragment"; break;		case MCFragment::FT_Fill: OS << "MCFillFragment"; break;
case MCFragment::FT_Relaxable: OS << "MCRelaxableFragment"; break;		case MCFragment::FT_Relaxable: OS << "MCRelaxableFragment"; break;
case MCFragment::FT_Org: OS << "MCOrgFragment"; break;		case MCFragment::FT_Org: OS << "MCOrgFragment"; break;
case MCFragment::FT_Dwarf: OS << "MCDwarfFragment"; break;		case MCFragment::FT_Dwarf: OS << "MCDwarfFragment"; break;
case MCFragment::FT_DwarfFrame: OS << "MCDwarfCallFrameFragment"; break;		case MCFragment::FT_DwarfFrame: OS << "MCDwarfCallFrameFragment"; break;
case MCFragment::FT_LEB: OS << "MCLEBFragment"; break;		case MCFragment::FT_LEB: OS << "MCLEBFragment"; break;
		case MCFragment::FT_BoundaryAlign: OS << "MCBoundaryAlignFragment"; break;
case MCFragment::FT_Padding: OS << "MCPaddingFragment"; break;		case MCFragment::FT_Padding: OS << "MCPaddingFragment"; break;
case MCFragment::FT_SymbolId: OS << "MCSymbolIdFragment"; break;		case MCFragment::FT_SymbolId: OS << "MCSymbolIdFragment"; break;
case MCFragment::FT_CVInlineLines: OS << "MCCVInlineLineTableFragment"; break;		case MCFragment::FT_CVInlineLines: OS << "MCCVInlineLineTableFragment"; break;
case MCFragment::FT_CVDefRange: OS << "MCCVDefRangeTableFragment"; break;		case MCFragment::FT_CVDefRange: OS << "MCCVDefRangeTableFragment"; break;
case MCFragment::FT_Dummy: OS << "MCDummyFragment"; break;		case MCFragment::FT_Dummy: OS << "MCDummyFragment"; break;
}		}

OS << "<MCFragment " << (const void *)this << " LayoutOrder:" << LayoutOrder		OS << "<MCFragment " << (const void *)this << " LayoutOrder:" << LayoutOrder
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	case MCFragment::FT_DwarfFrame: {
break;		break;
}		}
case MCFragment::FT_LEB: {		case MCFragment::FT_LEB: {
const MCLEBFragment *LF = cast<MCLEBFragment>(this);		const MCLEBFragment *LF = cast<MCLEBFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " Value:" << LF->getValue() << " Signed:" << LF->isSigned();		OS << " Value:" << LF->getValue() << " Signed:" << LF->isSigned();
break;		break;
}		}
		case MCFragment::FT_BoundaryAlign: {
		const MCBoundaryAlignFragment *MF =
		cast<MCBoundaryAlignFragment>(this);
		OS << "\n ";
		OS << " Size:" << MF->getSize();
		OS << " AlignBoundarySize:" << MF->getBoundarySize();
		OS << " Fragment:" << MF->getFragment();
		break;
		}
case MCFragment::FT_Padding: {		case MCFragment::FT_Padding: {
const MCPaddingFragment *F = cast<MCPaddingFragment>(this);		const MCPaddingFragment *F = cast<MCPaddingFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " PaddingPoliciesMask:" << F->getPaddingPoliciesMask()		OS << " PaddingPoliciesMask:" << F->getPaddingPoliciesMask()
<< " IsInsertionPoint:" << F->isInsertionPoint()		<< " IsInsertionPoint:" << F->isInsertionPoint()
<< " Size:" << F->getSize();		<< " Size:" << F->getSize();
OS << "\n ";		OS << "\n ";
OS << " Inst:";		OS << " Inst:";

llvm/lib/MC/MCObjectStreamer.cpp


	bool MCObjectStreamer::mayHaveInstructions(MCSection &Sec) const {			bool MCObjectStreamer::mayHaveInstructions(MCSection &Sec) const {
	return Sec.hasInstructions();			return Sec.hasInstructions();
	}			}

	void MCObjectStreamer::EmitInstruction(const MCInst &Inst,			void MCObjectStreamer::EmitInstruction(const MCInst &Inst,
	const MCSubtargetInfo &STI) {			const MCSubtargetInfo &STI) {
	getAssembler().getBackend().handleCodePaddingInstructionBegin(Inst);			getAssembler().getBackend().handleCodePaddingInstructionBegin(Inst);
				getAssembler().getBackend().alignBranchesBegin(*this, Inst);
	EmitInstructionImpl(Inst, STI);			EmitInstructionImpl(Inst, STI);
				getAssembler().getBackend().alignBranchesEnd(*this, Inst);
	getAssembler().getBackend().handleCodePaddingInstructionEnd(Inst);			getAssembler().getBackend().handleCodePaddingInstructionEnd(Inst);
	}			}

	void MCObjectStreamer::EmitInstructionImpl(const MCInst &Inst,			void MCObjectStreamer::EmitInstructionImpl(const MCInst &Inst,
	const MCSubtargetInfo &STI) {			const MCSubtargetInfo &STI) {
	MCStreamer::EmitInstruction(Inst, STI);			MCStreamer::EmitInstruction(Inst, STI);

	MCSection *Sec = getCurrentSectionOnly();			MCSection *Sec = getCurrentSectionOnly();

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

#include "llvm/MC/MCAsmBackend.h"		#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAssembler.h"		#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDwarf.h"		#include "llvm/MC/MCDwarf.h"
#include "llvm/MC/MCELFObjectWriter.h"		#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCFixupKindInfo.h"		#include "llvm/MC/MCFixupKindInfo.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCMachObjectWriter.h"		#include "llvm/MC/MCMachObjectWriter.h"
		#include "llvm/MC/MCObjectStreamer.h"
#include "llvm/MC/MCObjectWriter.h"		#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSectionMachO.h"		#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCValue.h"		#include "llvm/MC/MCValue.h"
		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include "llvm/Support/TargetRegistry.h"
using namespace llvm;		using namespace llvm;

static unsigned getFixupKindSize(unsigned Kind) {		static unsigned getFixupKindSize(unsigned Kind) {
switch (Kind) {		switch (Kind) {
default:		default:
llvm_unreachable("invalid fixup kind!");		llvm_unreachable("invalid fixup kind!");
case FK_NONE:		case FK_NONE:
return 0;		return 0;
Show All 21 Lines	static unsigned getFixupKindSize(unsigned Kind) {
case FK_SecRel_8:		case FK_SecRel_8:
case FK_Data_8:		case FK_Data_8:
case X86::reloc_global_offset_table8:		case X86::reloc_global_offset_table8:
return 8;		return 8;
}		}
}		}

namespace {		namespace {
		class X86AlignBranchKind {
		private:
		uint8_t AlignBranchKind = 0;

		public:
		enum Flag : uint8_t {
		AlignBranchNone = 0,
		AlignBranchJmp = 1U << 1,
		AlignBranchCall = 1U << 2,
		AlignBranchRet = 1U << 3,
		AlignBranchIndirect = 1U << 4
		};

		void operator=(const std::string &Val) {
		if (Val.empty())
		return;
		SmallVector<StringRef, 6> BranchTypes;
		StringRef(Val).split(BranchTypes, '+', -1, false);
		for (auto BranchType : BranchTypes) {
		if (BranchType == "jmp")
		addKind(AlignBranchJmp);
		else if (BranchType == "call")
		addKind(AlignBranchCall);
		else if (BranchType == "ret")
		addKind(AlignBranchRet);
		else if (BranchType == "indirect")
		addKind(AlignBranchIndirect);
		}
		}

		operator uint8_t() const { return AlignBranchKind; }
		void addKind(Flag Value) { AlignBranchKind |= Value; }
		};

		X86AlignBranchKind X86AlignBranchKindLoc;

		cl::opt<unsigned> X86AlignBranchBoundary(
		"x86-align-branch-boundary", cl::init(0), cl::Hidden,
		cl::desc("Control how the assembler should align branches with segment "
		"prefixes or NOP. The boundary's size must be a power of 2. It "
		"should be 0 or no less than 32. Branches will be aligned within "
		"the boundary of specified size. -x86-align-branch-boundary=0 "
		"doesn't align branches."));

		cl::opt<X86AlignBranchKind, true, cl::parser<std::string>> X86AlignBranch(
		"x86-align-branch",
		cl::desc("Specify types of branches to align (plus separated list of "
		"types). The branch types are a combination of jmp, call, ret, "
		"and indirect."),
		cl::Hidden,
		cl::value_desc(
		"jmp, which aligns unconditional jumps; call, which aligns calls; "
		"ret, which aligns rets; indirect, which aligns indirect jumps."),
		cl::location(X86AlignBranchKindLoc));
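
The way a plus-separated option value such as `call+jmp+indirect+ret` turns into a bitmask can be sketched without any LLVM dependencies. This is only an illustration under stated assumptions: `parseAlignBranchKinds` is a hypothetical free function standing in for `X86AlignBranchKind::operator=`, and it uses `std::string` rather than `StringRef::split`. The flag values are copied from the enum above.

```cpp
#include <cstdint>
#include <string>

// Flag values copied from X86AlignBranchKind::Flag in the patch.
enum AlignBranchFlag : uint8_t {
  AlignBranchNone = 0,
  AlignBranchJmp = 1U << 1,
  AlignBranchCall = 1U << 2,
  AlignBranchRet = 1U << 3,
  AlignBranchIndirect = 1U << 4
};

// Hypothetical helper (not in the patch): split Val on '+' and OR together
// the flags of the recognized names. Unknown or empty tokens are ignored,
// matching the behavior of the operator= shown above.
static uint8_t parseAlignBranchKinds(const std::string &Val) {
  uint8_t Mask = AlignBranchNone;
  std::string::size_type Pos = 0;
  while (Pos <= Val.size()) {
    std::string::size_type Next = Val.find('+', Pos);
    if (Next == std::string::npos)
      Next = Val.size();
    const std::string Tok = Val.substr(Pos, Next - Pos);
    if (Tok == "jmp")
      Mask |= AlignBranchJmp;
    else if (Tok == "call")
      Mask |= AlignBranchCall;
    else if (Tok == "ret")
      Mask |= AlignBranchRet;
    else if (Tok == "indirect")
      Mask |= AlignBranchIndirect;
    Pos = Next + 1;
  }
  return Mask;
}
```

One consequence of silently ignoring unknown tokens is that a misspelled branch type disables alignment for that type without any diagnostic, which may be worth considering in review.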

class X86ELFObjectWriter : public MCELFObjectTargetWriter {		class X86ELFObjectWriter : public MCELFObjectTargetWriter {
public:		public:
X86ELFObjectWriter(bool is64Bit, uint8_t OSABI, uint16_t EMachine,		X86ELFObjectWriter(bool is64Bit, uint8_t OSABI, uint16_t EMachine,
bool HasRelocationAddend, bool foobar)		bool HasRelocationAddend, bool foobar)
: MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {}		: MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {}
};		};

class X86AsmBackend : public MCAsmBackend {		class X86AsmBackend : public MCAsmBackend {
const MCSubtargetInfo &STI;		const MCSubtargetInfo &STI;
		const MCInstrInfo &MCII;
		X86AlignBranchKind AlignBranchType;
		unsigned AlignBoundarySize = 0;
		MCBoundaryAlignFragment *PendingBA = nullptr;

		bool hasVariantSymbol(const MCInst &MI) const;
		bool needAlign(MCObjectStreamer &OS) const;
		bool needAlignInst(const MCInst &Inst) const;

public:		public:
X86AsmBackend(const Target &T, const MCSubtargetInfo &STI)		X86AsmBackend(const Target &T, const MCSubtargetInfo &STI)
: MCAsmBackend(support::little), STI(STI) {}		: MCAsmBackend(support::little), STI(STI),
		MCII(*(T.createMCInstrInfo())) {
		AlignBoundarySize = X86AlignBranchBoundary;
		AlignBranchType = X86AlignBranchKindLoc;
		}

		void alignBranchesBegin(MCObjectStreamer &OS, const MCInst &Inst) override;
		void alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) override;

unsigned getNumFixupKinds() const override {		unsigned getNumFixupKinds() const override {
return X86::NumTargetFixupKinds;		return X86::NumTargetFixupKinds;
}		}

Optional<MCFixupKind> getFixupKind(StringRef Name) const override;		Optional<MCFixupKind> getFixupKind(StringRef Name) const override;

const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const override {		const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const override {
▲ Show 20 Lines • Show All 165 Lines • ▼ Show 20 Lines

static unsigned getRelaxedOpcode(const MCInst &Inst, bool is16BitMode) {		static unsigned getRelaxedOpcode(const MCInst &Inst, bool is16BitMode) {
unsigned R = getRelaxedOpcodeArith(Inst);		unsigned R = getRelaxedOpcodeArith(Inst);
if (R != Inst.getOpcode())		if (R != Inst.getOpcode())
return R;		return R;
return getRelaxedOpcodeBranch(Inst, is16BitMode);		return getRelaxedOpcodeBranch(Inst, is16BitMode);
}		}

		/// hasVariantSymbol - Check if the instruction has variant symbol operand.
		bool X86AsmBackend::hasVariantSymbol(const MCInst &MI) const {

		for (auto &Operand : MI) {
		if (Operand.isExpr()) {
		const MCExpr &Expr = *Operand.getExpr();
		if (Expr.getKind() == MCExpr::SymbolRef &&
		cast<MCSymbolRefExpr>(*Operand.getExpr()).getKind() !=
		MCSymbolRefExpr::VK_None)
		return true;
		}
		}
		return false;
		}

		bool X86AsmBackend::needAlign(MCObjectStreamer &OS) const {
		if (AlignBoundarySize == 0 ||
		AlignBranchType == X86AlignBranchKind::AlignBranchNone)
		return false;

		MCAssembler &Assembler = OS.getAssembler();
		MCSection *Sec = OS.getCurrentSectionOnly();
		// TODO: Bundling is not handled yet.
		if (Assembler.isBundlingEnabled() && Sec->isBundleLocked())
		return false;

		// Branches only need to be aligned in 32-bit or 64-bit mode.
		if (!(STI.getFeatureBits()[X86::Mode64Bit] ||
		STI.getFeatureBits()[X86::Mode32Bit]))
		return false;

		return true;
		}

		/// needAlignInst - Check if the instruction needs to be aligned.
		/// Padding is disabled before an instruction which may be rewritten by the
		/// linker (e.g. TLSCALL).
		bool X86AsmBackend::needAlignInst(const MCInst &Inst) const {
		// The linker may rewrite an instruction with a variant symbol operand.
		if (hasVariantSymbol(Inst)) return false;

		const MCInstrDesc &InstDesc = MCII.get(Inst.getOpcode());
		return (InstDesc.isUnconditionalBranch() &&
		(AlignBranchType & X86AlignBranchKind::AlignBranchJmp)) ||
		(InstDesc.isCall() &&
		(AlignBranchType & X86AlignBranchKind::AlignBranchCall)) ||
		(InstDesc.isReturn() &&
		(AlignBranchType & X86AlignBranchKind::AlignBranchRet)) ||
		(InstDesc.isIndirectBranch() &&
		(AlignBranchType & X86AlignBranchKind::AlignBranchIndirect));
		}

		/// alignBranchesBegin - Insert MCBoundaryAlignFragment before instructions
		/// to align branches.
		void X86AsmBackend::alignBranchesBegin(MCObjectStreamer &OS,
		const MCInst &Inst) {
		if (!needAlign(OS) || !needAlignInst(Inst))
		return;

		// Insert a boundary-align fragment before the instruction that needs to be aligned.
		auto *F = new MCBoundaryAlignFragment(AlignBoundarySize);
		skan (inline comment, not done): There is a deficiency here. For example, there may be a `MCAlignFragment` fragment between the `MCBoundaryAlignFragment` and the branch to be aligned, which will result in an infinite loop.
		OS.insert(F);
		PendingBA = F;
		}

		void X86AsmBackend::alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) {
		if (!needAlign(OS) || !needAlignInst(Inst))
		return;

		MCFragment *CF = OS.getCurrentFragment();
		if (!CF)
		return;

		auto *F = PendingBA;
		PendingBA = nullptr;
		assert(F);

		// Link it for later relaxation - this allows the padding for alignment to
		// actually be computed and emitted.
		F->setFragment(CF);

		// Update the maximum alignment on the current section if necessary.
		MCSection *Sec = OS.getCurrentSectionOnly();
		if (AlignBoundarySize > Sec->getAlignment())
		Sec->setAlignment(Align(AlignBoundarySize));

		// Break the last fragment so that more instructions can't be pushed into it.
		OS.insert(new MCDataFragment());
		}

Optional<MCFixupKind> X86AsmBackend::getFixupKind(StringRef Name) const {		Optional<MCFixupKind> X86AsmBackend::getFixupKind(StringRef Name) const {
if (STI.getTargetTriple().isOSBinFormatELF()) {		if (STI.getTargetTriple().isOSBinFormatELF()) {
if (STI.getTargetTriple().getArch() == Triple::x86_64) {		if (STI.getTargetTriple().getArch() == Triple::x86_64) {
if (Name == "R_X86_64_NONE")		if (Name == "R_X86_64_NONE")
return FK_NONE;		return FK_NONE;
} else {		} else {
if (Name == "R_386_NONE")		if (Name == "R_386_NONE")
return FK_NONE;		return FK_NONE;

llvm/test/MC/X86/align-branch-64.s

This file was added.

				# RUN: llvm-mc -filetype=obj -triple x86_64-pc-linux-gnu --x86-align-branch-boundary=32 --x86-align-branch=call+jmp+indirect+ret %s | llvm-objdump -d --no-show-raw-insn - | FileCheck %s

				# instruction sizes for reference:
				# callq is 5 bytes long
				# push %rax is 1 byte
				# jmp <near-label> is 2 bytes
				# jmp <far-label> is 5 bytes
				# ret N is 2 bytes

				# These tests are checking the edge cases on the alignment computation

				.text
				# CHECK: test1:
				# CHECK: 20: callq
				.globl test1
				.p2align 5
				test1:
				.rept 29
				push %rax
				.endr
				callq bar

				# CHECK: test2:
				# CHECK: 60: callq
				.globl test2
				.p2align 5
				test2:
				.rept 31
				push %rax
				.endr
				callq bar

				# CHECK: test3:
				# CHECK: a0: callq
				.globl test3
				.p2align 5
				test3:
				.rept 27
				push %rax
				.endr
				callq bar

				# next couple check instruction type coverage

				# CHECK: test_jmp:
				# CHECK: e0: jmp
				.globl test_jmp
				.p2align 5
				test_jmp:
				.rept 31
				push %rax
				.endr
				jmp bar

				# CHECK: test_ret:
				# CHECK: 120: retq
				.globl test_ret
				.p2align 5
				test_ret:
				.rept 31
				push %rax
				.endr
				retq $0

				# check a case with a relaxable instruction

				# CHECK: test_jmp_far:
				# CHECK: 160: jmp
				.globl test_jmp_far
				.p2align 5
				test_jmp_far:
				.rept 31
				push %rax
				.endr
				jmp baz


				.p2align 4
				.type bar,@function
				bar:
				retq

				.section "unknown"
				.p2align 4
				.type baz,@function
				baz:
				retq
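
The addresses in the CHECK lines for the first four tests can be reproduced with a small layout model. This is a rough sketch only: `alignTo` and `expectedBranchAddress` are hypothetical names, the model assumes each test starts at the next 32-byte boundary (`.p2align 5`), that `push %rax` is 1 byte, and that the branch sizes are those listed in the test's reference comment. It does not model relaxation.

```cpp
#include <cstdint>

// Round Value up to the next multiple of Align.
static uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical model (not in the patch): Offset is where the previous test's
// code ended. Returns the address at which the branch is expected to land
// after NumPushes one-byte pushes and any boundary padding.
static uint64_t expectedBranchAddress(uint64_t Offset, unsigned NumPushes,
                                      unsigned BranchSize,
                                      uint64_t Boundary = 32) {
  uint64_t Start = alignTo(Offset, Boundary) + NumPushes;
  uint64_t End = Start + BranchSize;
  bool Crosses = Start / Boundary != (End - 1) / Boundary;
  bool Against = End % Boundary == 0;
  if (Crosses || Against)
    Start += Boundary - Start % Boundary; // nop padding up to the boundary
  return Start;
}
```

Under these assumptions, test1 (29 pushes, 5-byte call, starting at 0) places the call at 0x20; test2 (31 pushes, starting after 0x25) at 0x60; test3 (27 pushes, starting after 0x65) at 0xa0; and test_jmp (31 pushes, 2-byte jmp, starting after 0xa5) at 0xe0, in agreement with the CHECK lines.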