This is an archive of the discontinued LLVM Phabricator instance.

Differential D97982

[MC] Introduce NeverAlign fragment type
Needs Review · Public

Authored by Amir on Mar 4 2021, 2:29 PM.

Details

Reviewers

skan
MaskRay
RKSimon

Summary

Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).
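
A minimal sketch of the padding rule described above, assuming a standalone helper rather than the patch's actual code. The name `neverAlignPadding` and all of its parameters are hypothetical; it also assumes the minimum nop size is not itself a multiple of the alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the code from this patch): returns how many
// padding bytes to emit before the next fragment so that the fragment
// does not end exactly on an alignment boundary.
//   Offset    - section offset at which the next fragment would start
//   NextSize  - size in bytes of the next fragment (e.g. the cmp)
//   Alignment - boundary to avoid, e.g. 64 for a cache line
//   MinNop    - smallest nop the target can encode (1 on x86)
inline uint64_t neverAlignPadding(uint64_t Offset, uint64_t NextSize,
                                  uint64_t Alignment, uint64_t MinNop) {
  assert(Alignment != 0 && MinNop != 0 && "sizes must be positive");
  uint64_t End = Offset + NextSize;
  // Padding is only needed when the fragment ends right at the boundary.
  return End % Alignment == 0 ? MinNop : 0;
}
```

For a 3-byte cmp and a 64-byte boundary, the helper returns 1 when the cmp would start at offset 61 (ending at 64), and 0 at offsets 60 or 62, where the cmp either stops short of the boundary or crosses it.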

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:

- BoundaryAlign can be extended to perform the desired alignment for the
  first instruction in the macro-op fusion pair (D101817). However, this
  approach has higher overhead due to reliance on relaxation as
  BoundaryAlign requires in the general case - see
  https://reviews.llvm.org/D97982#2710638.

- Instruction bundling: the intent of NeverAlign fragment is to prevent
  the first instruction in a pair ending at a given alignment boundary, by
  inserting at most one minimum size nop. It's OK if either instruction
  crosses the cache line. Padding both instructions using bundles to not
  cross the alignment boundary would result in excessive padding. There's
  no straightforward way to request instruction bundling to avoid a given
  end alignment for the first instruction in the bundle.
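
To make the padding difference concrete, here is a small sketch comparing the two policies. Both functions are hypothetical simplifications: `bundlePadding` models "keep the whole pair inside one aligned block" and is not LLVM's bundling implementation, while `endAvoidPadding` models the NeverAlign rule with a one-byte nop.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of bundle-style padding: pad up to the next boundary
// whenever the pair as a whole would cross it.
inline uint64_t bundlePadding(uint64_t Offset, uint64_t PairSize,
                              uint64_t Alignment) {
  assert(Alignment != 0 && PairSize <= Alignment);
  uint64_t Start = Offset % Alignment;
  return Start + PairSize <= Alignment ? 0 : Alignment - Start;
}

// Simplified model of the NeverAlign rule: pad by one byte only when the
// first instruction of the pair would end exactly on the boundary.
inline uint64_t endAvoidPadding(uint64_t Offset, uint64_t FirstSize,
                                uint64_t Alignment) {
  assert(Alignment != 0);
  return (Offset + FirstSize) % Alignment == 0 ? 1 : 0;
}
```

With a 3-byte cmp followed by a 2-byte jcc at offset 60 and a 64-byte boundary, the bundle model pads 4 bytes while the end-avoid model pads none, since the cmp ends at 63 and only the jcc crosses the line. At offset 61 the bundle model pads 3 bytes and the end-avoid model pads 1.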

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Amir created this revision.Mar 4 2021, 2:29 PM

Herald added a subscriber: hiraditya. Mar 4 2021, 2:29 PM

Amir requested review of this revision.Mar 4 2021, 2:29 PM

Herald added a project: Restricted Project. Mar 4 2021, 2:29 PM

Formatting fixes

Harbormaster completed remote builds in B92161: Diff 328309.Mar 5 2021, 7:36 AM

Harbormaster completed remote builds in B92179: Diff 328332.Mar 5 2021, 9:55 AM

Amir updated this revision to Diff 328583.Mar 5 2021, 10:35 AM

Harbormaster completed remote builds in B92352: Diff 328583.Mar 6 2021, 3:52 AM

Is it the first Bolt upstreaming patch? Good to see it!

Can you update the summary with the semantics of FT_NeverAlign?

Also, I'm not seeing this new type of MCFragment being used anywhere in this patch. Looks like the real usage will come with upcoming changes. I'm wondering if it is possible to move some of the upcoming changes here so that this patch is self-contained and can be tested by a regression test?

hoy added a subscriber: wenlei.Mar 7 2021, 12:57 PM

rafauler added a subscriber: rafauler.Mar 8 2021, 9:20 AM

@hoy, thank you for your suggestions! Yes, it's one of the first BOLT patches, but not the first :)
We've decided with @maksfb to upstream our alignment-for-macrofusion logic as well, will include it into the follow-up revision. This would enable us to produce a test case.

Amir edited the summary of this revision. Mar 10 2021, 12:50 PM

We've considered two options to integrate this functionality into LLVM:

1. Similar to JCC errata/BoundaryAlign fragment: automatically insert NeverAlign into the section on cmp instruction, check if fuseable instruction follows it, and remove the fragment otherwise.
2. Similar to the current usage in BOLT (https://github.com/facebookincubator/BOLT/blob/68abc968b706b55585b1b8be315aef5d3bf90b1c/bolt/src/BinaryEmitter.cpp#L445): check basic block instructions in advance, only emit NeverAlign fragment if fuseable sequence is found.

Pros and cons of each approach:

1. Pro is that macro-op fusion alignment will be applied to inline assembly as well as LLVM-produced MCs, the downside is that I've observed a lot of insertions and removals of NeverAlign fragment, which may hurt processing time.
2. Pro is that the check and insertion are happening once per basic block, the downside is that we need to integrate the logic for determining the insertion of this fragment higher up in the MCStreamer stack where we see the basic block.

Thoughts?
+CC @skan

The patch should come with tests.

We probably want to avoid another feature which does complex things in emitInstructionBegin/emitInstructionEnd, so (1) helping just inline asm does not seem a good justification to me. Perhaps (2) should be used.

llvm/include/llvm/MC/MCFragment.h
338	Don’t duplicate function or class name at the beginning of the comment. https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
llvm/lib/MC/MCAssembler.cpp
394	The logic is wrong if NextFrag->getContents().size() % NAF.getAlignment() == 0. In that case you want to make the value 0
400	If neither, return 0
407	Store Size % getBackend().getMinimumNopSize() into a variable, check whether it is zero, instead of using a loop which increases size one at a time.
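
The code under review is not shown on this page, so the following is only a sketch of the kind of change the last comment asks for, assuming the loop in question rounds a size up to a multiple of the minimum nop size. The function name `roundUpToNopSize` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Size up to the next multiple of MinNop using one remainder
// computation instead of incrementing Size in a loop.
inline uint64_t roundUpToNopSize(uint64_t Size, uint64_t MinNop) {
  assert(MinNop != 0 && "minimum nop size must be positive");
  uint64_t Rem = Size % MinNop;
  return Rem == 0 ? Size : Size + (MinNop - Rem);
}
```

On a target with a 4-byte minimum nop, a size of 5 becomes 8 and a size of 8 stays 8; with a 1-byte minimum nop every size is returned unchanged.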

Some questions.

Is the purpose of this patch to avoid cmp ends and jmp starts exactly at 64b alignment? If it's the only purpose, I think we can reuse the BoundaryAlign fragment by adding a function like "needPadding" in MCAssembler.cpp.
Could you also upstream the code that inserts/remove the NeverAlign fragment and some test cases? Usage can help us determine if the design of the fragment is reasonable.

tschuett added a subscriber: tschuett.Mar 26 2021, 12:56 AM

@MaskRay, @skan, appreciate your comments and suggestions!
@MaskRay:

The patch should come with tests.

Agree, still trying to find a way to test added functionality outside of BOLT's use case. Thank you for reviewing this patch and BOLT upstreaming repo!

@skan:

Is the purpose of this patch to avoid cmp ends and jmp starts exactly at 64b alignment? If it's the only purpose, I think we can reuse the BoundaryAlign fragment by adding a function like "needPadding" in MCAssembler.cpp.

Yes, that's the purpose. We've looked at BoundaryAlign and aligned instruction bundles, but failed to come up with a simple solution. How do you envision BoundaryAlign avoiding the given alignment boundary falling between cmp and jmp? NeverAlign inserts a one-byte nop before cmp if it would end at a given boundary.

Could you also upstream the code that inserts/remove the NeverAlign fragment and some test cases? Usage can help us determine if the design of the fragment is reasonable.

We seek to upstream LLVM parts required by BOLT before pushing BOLT.
The usage is shown here: https://github.com/facebookincubator/BOLT/blob/68abc968b706b55585b1b8be315aef5d3bf90b1c/bolt/src/BinaryEmitter.cpp#L445
We analyze the basic block for macro-op fusion pairs; if such a pair is found, we insert a NeverAlign fragment before the first instruction (cmp) with a call to MCStreamer::emitNeverAlignCodeAtEnd(). That's the only external use of the interfaces we add, which makes it non-trivial to test. That's why I'm seeking to add a way to trigger the insertion of NeverAlign fragment (see comment above).
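
The client-side scan described above might look roughly like the following. This is a self-contained sketch, not BOLT's code: the `Inst` and `Kind` types and both function names are hypothetical, and the fusibility check is reduced to "cmp or test followed by jcc".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified instruction model.
enum class Kind { Cmp, Test, Jcc, Other };
struct Inst {
  Kind K;
  unsigned Size; // encoded size in bytes
};

// Simplified fusibility check: cmp/test immediately followed by jcc.
inline bool isFusiblePair(const Inst &First, const Inst &Second) {
  return (First.K == Kind::Cmp || First.K == Kind::Test) &&
         Second.K == Kind::Jcc;
}

// Returns the indices of instructions that start a fusible pair. A client
// would request a NeverAlign fragment just before each of them.
inline std::vector<std::size_t>
findNeverAlignPoints(const std::vector<Inst> &BB) {
  std::vector<std::size_t> Points;
  for (std::size_t I = 0; I + 1 < BB.size(); ++I)
    if (isFusiblePair(BB[I], BB[I + 1]))
      Points.push_back(I);
  return Points;
}
```

Because the whole basic block is visible up front, the scan runs once per block and no fragment has to be inserted speculatively and removed later.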

I think most of the logic should already be there for the "NeverAlign" purpose. My concern about the usage of MCStreamer::emitNeverAlignCodeAtEnd() is that it ignores some corner cases (see the check in X86AsmBackend::emitInstructionBegin), which may cause correctness issues. The BoundaryAlign fragment was originally designed for the JCC erratum, and the logic that decides whether the macro-fused pair crosses the boundary, and how long a nop to insert, lives in the function MCAssembler::relaxBoundaryAlign. As far as I can see, the interface of MCStreamer::emitNeverAlignCodeAtEnd() is unnecessary and a simple customization of relaxBoundaryAlign can achieve your purpose easily (setting the size of the BoundaryAlign fragment to 1 if the given alignment boundary falls between the macro-fused pair). In addition, reusing BoundaryAlign can make the test trivial, and MC/X86/align-branch*.s gives some examples.

pengfei added a subscriber: pengfei.Apr 6 2021, 10:52 PM

@skan: I've experimented with adding customization of BoundaryAlign fragment and conducted some tests to see if it's functionally identical and has similar processing time.
I've implemented two options: automatic alignment (similar to JCC erratum mitigation with -x86-branches-within-32B-boundaries), and client insertion: exposed interfaces to insert BoundaryAlign into OS from the client (identical to current use of NeverAlign fragment in BOLT).
Automatic alignment has noticeably higher overhead than current alignment with NeverAlign, which is expected since there are more fragments inserted (BoundaryAlign fragments are inserted on every cmp/test instruction).
However, even with client insertion (BoundaryAlign fragment is inserted using the same logic as NeverAlign), BoundaryAlign still has higher overhead. One test binary has up to ~40% increase in processing time (1:09.36 elapsed for NeverAlign vs 1:38.23 for BoundaryAlign).
The overhead comes from the fact that BoundaryAlign relies on relaxation to determine its size and invalidates all following fragments, while NeverAlign uses simpler logic similar to FT_Align which doesn't trigger invalidation. At this point I don't think we should pursue this direction further as the processing time is important for BOLT. I can submit a diff with automatic alignment for macro-fusion for a review, and it might be of use in a compiler/assembler (maybe as -O3 optimization?), but it won't be used by BOLT.
Thank you for suggesting to try to reuse BoundaryAlign and pointing out various correctness issues handled by BoundaryAlign.
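
As a quick check of the timing figures quoted in this comment, the relative slowdown can be worked out as below. The helper names are hypothetical and the inputs are just the two wall-clock readings given in the comment.

```cpp
#include <cassert>

// Converts a "minutes:seconds" wall-clock reading into seconds.
inline double toSeconds(unsigned Minutes, double Seconds) {
  return Minutes * 60.0 + Seconds;
}

// Relative increase of After over Before, in percent.
inline double percentIncrease(double Before, double After) {
  assert(Before > 0.0 && "baseline must be positive");
  return (After - Before) / Before * 100.0;
}
```

`percentIncrease(toSeconds(1, 9.36), toSeconds(1, 38.23))` comes out at roughly 41.6, which is in line with the "~40%" stated in the comment.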

To continue with this review: in order to produce a test case, I'll expose an assembly directive to insert NeverAlign fragment (it's hard to test otherwise outside of BOLT). I'll address the comments and update soon.

Addressed comments by @MaskRay, added X86 assembler directive, added a testcase using the directive

Harbormaster completed remote builds in B101090: Diff 340739.Apr 27 2021, 12:27 AM

Rebased

Amir marked 4 inline comments as done.Apr 27 2021, 3:41 PM

Lost the original diff in the rebase, fixed that

Harbormaster completed remote builds in B101284: Diff 341009.Apr 27 2021, 6:29 PM

Harbormaster completed remote builds in B101299: Diff 341029.Apr 27 2021, 8:05 PM

skan added a reviewer: skan.Apr 28 2021, 12:21 AM

skan added inline comments.Apr 28 2021, 1:52 AM

llvm/include/llvm/MC/MCFragment.h
343–350	If MCNeverAlignFragment only emits nop in your usage scenarios, the members EmitNops, Value and ValueSize can be removed. Otherwise, there may be a lot of untested code.
360–365	Remove these interfaces.
llvm/lib/MC/MCAssembler.cpp
402–404	The summary and comments in the code need to be updated: you insert a minimum-size nop rather than a single one-byte nop, although they are the same in most cases.
661–679	Line 611-628 are redundant and untested.
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
4743–4750	Could you add test cases for line 4743-4750?
llvm/test/MC/X86/x86_64-directive-avoid_end_align.s
1 ↗	(On Diff #341029)	Add "--no-show-raw-insn" after llvm-objdump since you do not need to check instruction encoding.
5–6 ↗	(On Diff #341029)	If you remove line 5-6 and change ".nops 59" to ".nops 62", the purpose of the test can be more clear. And you need to check that the extra nop is inserted only when cmp ends and jmp starts exactly at 64b alignment.
13–16 ↗	(On Diff #341029)	Line 13-16 can be removed.

skan added inline comments.Apr 28 2021, 5:00 AM

llvm/lib/MC/MCAssembler.cpp
396–397	Should this be a `llvm_unreachable`?

Addressed comments by @skan, added test case for errors in .avoid_end_align directive

Amir marked 9 inline comments as done.Apr 28 2021, 5:37 PM

Harbormaster completed remote builds in B101534: Diff 341364.Apr 28 2021, 7:00 PM

skan added inline comments.Apr 28 2021, 7:25 PM

llvm/lib/MC/MCAssembler.cpp
391–392	You can add a test case like

    .L0:
      .nops 57
      int3
      .avoid_end_align 64
      cmpl $(.L1-.L0), %eax
      je .L0
      .nops 65
    .L1:

to check the situation where the fragment after NeverAlign is a RelaxableFragment.

Added a test case suggested by @skan with NeverAlign followed by RelaxableFragment

Amir marked an inline comment as done.Apr 28 2021, 11:48 PM

Amir added inline comments.

llvm/lib/MC/MCAssembler.cpp
391–392	Thank you!

Harbormaster completed remote builds in B101569: Diff 341413.Apr 29 2021, 12:30 AM

Amir marked an inline comment as done.Apr 29 2021, 12:32 AM

LGTM

This revision is now accepted and ready to land.Apr 29 2021, 12:33 AM

Shengchen @skan, Fangrui @MaskRay: thank you for detailed comments and suggestions! I don't have commit access, can you please commit this for me?

Better to give other reviewers one or two days to confirm.

Do I need to add some documentation for the new assembler directive?

llvm/lib/MC/MCAssembler.cpp
399	Better reuse isAgainstBoundary

Moved up and reused isAgainstBoundary.

Harbormaster completed remote builds in B102469: Diff 342654.May 4 2021, 12:48 AM

@Amir Let me know if you'd like me to commit this for you.

MaskRay added inline comments.May 4 2021, 12:33 PM

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
4745	No need to mention the directive name. The diagnostic uses a caret to highlight the user input.
4749	ditto
4753	The code is self-explanatory

MaskRay requested changes to this revision.May 4 2021, 12:34 PM

MaskRay added inline comments.May 4 2021, 12:34 PM

llvm/test/MC/X86/x86_64-directive-avoid_end_align.s
1 ↗	(On Diff #342654)	Remove `x86_64-` from the filename.

This revision now requires changes to proceed.May 4 2021, 12:34 PM

MaskRay added inline comments.May 4 2021, 12:39 PM

llvm/include/llvm/MC/MCObjectStreamer.h
142	Why "AtEnd"?
llvm/test/MC/X86/x86_64-directive-avoid_end_align_errors.s
1 ↗	(On Diff #342654)	You can use `.ifdef ERR` to place negative tests into the main test file.
5 ↗	(On Diff #342654)	See `[[#@LINE+1]]` in existing tests how to attach line/column information properly.

Ignorant thinking out loud, just because i've been thinking about kinda-similar issue:

Where is this fragment going to be added? The alignment will differ per CPU.
Do fragments only have the start point, or can they specify a range of instructions? If they can, then NeverAlign seems contrived to me. Perhaps this should instead be generalized into something like "ensure that this group of instructions doesn't cross ?-byte alignment"?

llvm/include/llvm/MC/MCStreamer.h
838	emit where?

@lebedev.ri:

Where is this fragment going to be added? The alignment will differ per CPU.

This fragment is inserted before the first instruction in a macro-fusion pair (cmp instruction in cmp+jcc pair). Modern Intel Cores have a macro-fusion restriction that cmp+jcc shouldn't be split by a cache line boundary. It's OK for the cmp instruction to cross a cache line boundary. Not all X86 cores have macro-fusion or this restriction, so the insertion policy and alignment is up to the MC client (BOLT or assembly programmer).

Do fragments only have the start point, or can they specify a range of instructions? If they can, then NeverAlign seems contrived to me. Perhaps this should instead be generalized into something like "ensure that this group of instructions doesn't cross ?-byte alignment"?

This fragment only looks at the subsequent fragment (instruction) in the stream. It doesn't make sense for us to generalize to a range of instructions. There's BoundaryAlign fragment type that can handle alignment boundary crossing for macro-fusion pairs for Intel JCC erratum mitigation, or BundleAlign fragment that can align a group of instructions to the specified alignment (left, right alignment inside bundle, introduced by PNaCl to ensure control-flow integrity). These fragments are more general than NeverAlign, and come with a higher overhead for the client.

llvm/include/llvm/MC/MCObjectStreamer.h
142	"End" is the end of the subsequent fragment that we want _not_ to be at a given boundary. It can be replaced with an (arguably) more descriptive name like `emitAvoidEndAlign`, similar to the directive name. What do you think?
llvm/include/llvm/MC/MCStreamer.h
838	Thanks for flagging! The wording might be a bit misleading. It'd be more clear to state it as "If the end of the fragment following this NeverAlign fragment ever gets aligned to \p ByteAlignment, this fragment emits a single nop before the following fragment to break this end-alignment."

Addressed comments by @MaskRay and @lebedev.ri:

Directive parsing error messages
Combined tests into single file
Reworded comment for emitNeverAlignCodeAtEnd

Amir marked 5 inline comments as done.May 4 2021, 3:45 PM

Harbormaster completed remote builds in B102629: Diff 342888.May 4 2021, 4:35 PM

MaskRay added a reviewer: RKSimon.May 4 2021, 5:42 PM

skan mentioned this in D101817: [MC][X86] Automatic alignment for Macro-Op Fusion.May 4 2021, 7:49 PM

In D97982#2737483, @Amir wrote:

@lebedev.ri:

Where is this fragment going to be added? The alignment will differ per CPU.

This fragment is inserted before first instruction in macro-fusion pair (cmp instruction in cmp+jcc pair).

That much was already clear :)

Modern Intel Cores have a macro-fusion restriction that cmp+jcc shouldn't be split by a cache line boundary. It's OK for the cmp instruction to cross a cache line boundary. Not all X86 cores have macro-fusion or this restriction, so the insertion policy and alignment is up to the MC client (BOLT or assembly programmer).

But this problem exists regardless of BOLT, so shouldn't this also happen without BOLT?

Do fragments only have the start point, or can they specify a range of instructions? If they can, then NeverAlign seems contrived to me. Perhaps this should instead be generalized into something like "ensure that this group of instructions doesn't cross ?-byte alignment"?

This fragment only looks at the subsequent fragment (instruction) in the stream. It doesn't make sense for us to generalize to a range of instructions. There's BoundaryAlign fragment type that can handle alignment boundary crossing for macro-fusion pairs for Intel JCC erratum mitigation, or BundleAlign fragment that can align a group of instructions to the specified alignment (left, right alignment inside bundle, introduced by PNaCl to ensure control-flow integrity). These fragments are more general than NeverAlign, and come with a higher overhead for the client.

I understand that, i just think generalizing a bit may result in a somewhat more generally useful solution.

@lebedev.ri:

Where is this fragment going to be added? The alignment will differ per CPU.

Sorry, didn't understand this question then. Do you mean handling of X86 subtargets?

But this problem exists regardless of BOLT, so shouldn't this also happen without BOLT?

Yes, the issue exists regardless of BOLT. Clang currently doesn't ensure the alignment restriction of macro-fusion. I'm proposing a mechanism in D101817 to address that inside Streamer, similar to JCC erratum mitigation. The mechanism can potentially be enabled by the driver in -O3 mode.
But this diff exposes the fragment type used by BOLT, where we see the entire basic block in advance and control the emission, so we don't have to incur the overhead of automatic alignment insertion.

I understand that, i just think generalizing a bit may result in a somewhat more generally useful solution.

What exactly do you think is worth generalizing? If you mean automatic macro-fusion alignment insertion, which would cover both MC clients (BOLT) and compiler, it's proposed in D101817. But unfortunately the overhead in BOLT use case is significantly higher than with NeverAlign fragment insertion.

RKSimon removed a reviewer: llvm-commits.May 5 2021, 4:01 AM

RKSimon added a subscriber: llvm-commits.

@RKSimon, @skan, @MaskRay, @lebedev.ri: I've addressed all comments and suggestions. I suggest that we limit this discussion to NeverAlign fragment, and direct all questions and comments regarding automatic macro-fusion alignment to D101817.
Please let me know if there's anything to change here, otherwise I'd gently ask you to approve.

Ping

@skan, @MaskRay: it's been three weeks with no updates. Can we commit this diff?

LGTM, and I already accepted this patch before. @MaskRay requested changes on this, so he may have more concerns.

This is performance related. I hope folks like @RKSimon @lebedev.ri can comment.

I've made some code suggestions.

llvm/include/llvm/MC/MCFragment.h
37	Perhaps the name should include `X86`? Inserting a one-byte nop isn't something any RISC arch can do.
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
4733	Are you using the assembly syntax `.avoid_end_align` in tools? Or is it for testing purpose only?
4737	Delete
4743	`if (parseEOL())`
4748	expected a positive alignment
4753	delete blank line

Addressed comments

Amir marked 4 inline comments as done.May 26 2021, 1:30 PM

Amir added inline comments.

llvm/include/llvm/MC/MCFragment.h
37	NeverAlign fragment emits a minimum-size nop, so the usage is not limited to X86 and can potentially be extended to architectures with fixed-length instructions. Modern ARM and RISC-V cores also have macro-fusion and may also face similar cache line crossing restrictions.
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
4733	It's only for testing purposes.

MaskRay added inline comments.May 26 2021, 1:31 PM

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
4733	Add a comment to make clear this is not intended for general usage.

Added a comment re: avoid_end_align directive usage

Amir marked an inline comment as done.May 26 2021, 1:52 PM

Harbormaster completed remote builds in B106372: Diff 348083.May 26 2021, 2:24 PM

RKSimon resigned from this revision.May 27 2021, 2:46 AM

Rebase

@lebedev.ri: can you please weigh in on this diff?

Harbormaster completed remote builds in B107344: Diff 349409.Jun 2 2021, 4:44 PM

Amir removed reviewers: hoy, grosbach.Jun 3 2021, 2:30 PM

@MaskRay: we didn't get more comments in the last two weeks. @skan is the author of a related functionality (BoundaryAlign) and he has accepted this diff. Do you think we can proceed with it?

@lebedev.ri @reames Could you weigh in from performance perspective?

And tag @SjoerdMeijer on non-x86 usability.

In D97982#2806598, @MaskRay wrote:

@lebedev.ri @reames Could you weigh in from performance perspective?

I do agree that this is needed at least for intel (not quite sure about amd).

And tag @SjoerdMeijer on non-x86 usability.

efriedma added a subscriber: efriedma.Jun 8 2021, 4:17 PM

efriedma added inline comments.

llvm/lib/MC/MCAssembler.cpp
389	Recomputing the size of the fragment here is suspicious. For other fragments, computeFragmentSize is independent from the layout of the section. If the size needs to be changed based on the layout, it's mutated by MCAssembler::layoutSectionOnce. I realize FT_Org doesn't follow this rule, but that's not a great example to follow. The following currently crashes:

    foo:
      .org bar
    bar:

efriedma added inline comments.Jun 8 2021, 4:32 PM

llvm/lib/MC/MCAssembler.cpp
389	My biggest concern with the current approach is the potential for an infinite loop. There isn't any protection against the following fragment's offset moving backwards, which could lead to infinitely oscillating fragment sizes.

efriedma added inline comments.Jun 8 2021, 5:14 PM

llvm/lib/MC/MCAssembler.cpp
389	Thought a little more. Probably the simplest way to ensure that computeFragmentSize() is sane is to establish the following rule: it should never examine fragments after the current fragment in the section. If we logically need to examine any fragment after the current fragment, we need to do that using relaxation, inside MCAssembler::layoutSectionOnce. This means we can compute the "current" layout using a single pass of computeFragmentSize() calls. Under that rule, FT_Align is fine because it only cares about its own offset. The FT_Org code crashes because it tries to look at fragments after itself. FT_NeverAlign is... marginal. The size of an MCDataFragment is fixed, so that doesn't really count as looking forward. But I suspect MCRelaxableFragment doesn't work right: if we do end up relaxing the fragment, I'm not sure we recompute everything we need to.
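
The rule proposed in this comment can be illustrated with a toy layout in which every fragment's size depends only on the fragment itself and its own offset, so one forward pass is enough. This is a sketch with hypothetical types (`Frag`, `FragKind`) and function names; it does not use LLVM's MCAssembler or MCFragment classes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy fragments: Data has a fixed size; Align pads up to a boundary.
enum class FragKind { Data, Align };
struct Frag {
  FragKind Kind;
  uint64_t Value; // Data: size in bytes; Align: alignment in bytes
};

// Size depends only on the fragment and its own offset, never on any
// fragment that comes after it.
inline uint64_t fragmentSize(const Frag &F, uint64_t Offset) {
  if (F.Kind == FragKind::Data)
    return F.Value;
  assert(F.Value != 0 && "alignment must be positive");
  uint64_t Rem = Offset % F.Value;
  return Rem == 0 ? 0 : F.Value - Rem;
}

// Single forward pass: returns the start offset of every fragment,
// followed by the end offset of the section.
inline std::vector<uint64_t> layoutOffsets(const std::vector<Frag> &Frags) {
  std::vector<uint64_t> Offsets;
  uint64_t Offset = 0;
  for (const Frag &F : Frags) {
    Offsets.push_back(Offset);
    Offset += fragmentSize(F, Offset);
  }
  Offsets.push_back(Offset);
  return Offsets;
}
```

A fragment like NeverAlign does not fit this shape directly, because its size depends on the size of the fragment that follows it; that is the "marginal" case the comment refers to.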

Amir added inline comments.Jun 8 2021, 7:10 PM

llvm/lib/MC/MCAssembler.cpp
389	@efriedma: do you have a specific corner case in mind? Wouldn’t it be covered by a test case with MCRelaxableFragment following NA fragment?

efriedma added inline comments.Jun 8 2021, 9:01 PM

llvm/lib/MC/MCAssembler.cpp
389	You'd want a testcase where there's an MCRelaxableFragment that gets relaxed after we call computeFragmentSize(). Something like a forward jump of more than 128 bytes. Then make sure we chose the right size for the MCNeverAlignFragment. It looks like in your regression test, the fragment after is always an MCDataFragment?

Amir added inline comments.Jun 8 2021, 9:55 PM

llvm/lib/MC/MCAssembler.cpp
389	Let me experiment with the idea. The second test is a NeverAlign followed by MCRelaxableFragment (cmpl insn). Actually we wanted to avoid computing the size using relaxation (as BoundaryAlign does) because it slows down the layout and appears unnecessary if we can handle the RelaxableFragment following NA.

@efriedma:
Addressed the concern that NeverAlign may introduce infinite loops in layout,
checked the case if there's an MCRelaxableFragment that gets relaxed after
we invoke computeFragmentSize().

Added the test case confirming that this case is handled correctly:

1. NeverAlign is added before cmp+jcc, initially no padding.
2. cmp+jcc are both relaxable, initially short form and are split by an alignment boundary,
3. Which triggers NeverAlign padding, one byte is added before cmp+jcc.
4. Which triggers jcc relaxation (pushes out jcc to long form)
5. Which triggers cmp relaxation (pushes out cmp to long form)
6. Which makes NeverAlign redundant as cmp+jcc are no longer split by an alignment boundary.
7. Which disables NeverAlign padding (0 bytes)
8. Which in theory could shrink cmp+jcc to short forms causing an infinite loop, but as long as relaxation only increases instruction sizes, it converges at this point, leaving cmp and jcc in relaxed form.
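
The convergence argument above can be sketched as a toy fixed-point loop. Everything here is an assumption made for illustration: the section shape, the instruction sizes (3/6 bytes for cmp, 2/6 for jcc), the type `ToyLayout`, and the function `layoutUntilStable`. It does not model MCAssembler, and it folds several of the listed steps into a single pass.

```cpp
#include <cassert>
#include <cstdint>

struct ToyLayout {
  uint64_t Pad;     // NeverAlign padding in bytes (0 or 1)
  uint64_t CmpSize; // 3 = short form, 6 = long form (illustrative)
  uint64_t JccSize; // 2 = short form, 6 = long form (illustrative)
  unsigned Passes;  // passes run, including the final no-change pass
};

// Toy section:
//   [Base bytes] L0: [NeverAlign][cmp $(L1-L0)][jcc JccTarget][Tail] L1:
// JccTarget is an offset at or before L0, so the jcc branches backwards.
inline ToyLayout layoutUntilStable(uint64_t Base, uint64_t JccTarget,
                                   uint64_t Tail, uint64_t Alignment) {
  assert(Alignment != 0 && JccTarget <= Base);
  ToyLayout L{0, 3, 2, 0};
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++L.Passes;
    // NeverAlign: pad only if the cmp would otherwise end on the boundary.
    uint64_t NewPad = (Base + L.CmpSize) % Alignment == 0 ? 1 : 0;
    if (NewPad != L.Pad) {
      L.Pad = NewPad;
      Changed = true;
    }
    // jcc: grow to the long form once the backward distance exceeds 128.
    uint64_t Dist = (Base - JccTarget) + L.Pad + L.CmpSize + L.JccSize;
    if (L.JccSize == 2 && Dist > 128) {
      L.JccSize = 6;
      Changed = true;
    }
    // cmp: grow once the immediate L1-L0 no longer fits in a signed byte.
    uint64_t Imm = L.Pad + L.CmpSize + L.JccSize + Tail;
    if (L.CmpSize == 3 && Imm > 127) {
      L.CmpSize = 6;
      Changed = true;
    }
  }
  return L;
}
```

With `layoutUntilStable(125, 2, 118, 64)` the short cmp ends at offset 128, so padding is added; that pushes the jcc and then the cmp to their long forms, after which the padding is dropped again. Since the instruction sizes never shrink, the loop settles with no padding and both instructions in long form.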

Harbormaster completed remote builds in B108538: Diff 351058.Jun 9 2021, 10:29 PM

Which disables NeverAlign padding (0 bytes)

When do we actually do this computation? At first glance, MCAssembler::layoutSectionOnce never actually goes back to recompute the size of the NeverAlign padding.

Which in theory could shrink cmp+jcc to short forms causing an infinite loop, but as long as relaxation only increases instruction sizes, it converges at this point, leaving cmp and jcc in relaxed form.

That seems right.

In D97982#2811315, @efriedma wrote:

Which disables NeverAlign padding (0 bytes)

When do we actually do this computation? At first glance, MCAssembler::layoutSectionOnce never actually goes back to recompute the size of the NeverAlign padding.

There's one invocation of computeFragmentSize which would set the final size of NeverAlign fragment in MCAssembler::finishLayout, which is performed after layoutSectionOnce/layoutOnce is done.

In D97982#2811362, @Amir wrote:

There's one invocation of computeFragmentSize which would set the final size of NeverAlign fragment in MCAssembler::finishLayout, which is performed after layoutSectionOnce/layoutOnce is done.

That's not going to work out in general. Removing the NeverAlign padding can increase the distance between two symbols if .palign is involved. That might require relaxation.

In D97982#2811382, @efriedma wrote:

That's not going to work out in general. Removing the NeverAlign padding can increase the distance between two symbols if .palign is involved. That might require relaxation.

Do you mean p2align? Yes, removing the NeverAlign padding can increase the distance between two symbols. Let me check if that causes a problem.

In D97982#2806598, @MaskRay wrote:

@lebedev.ri @reames Could you weigh in from performance perspective?

JFYI, I no longer have access to the infrastructure used to test previous patches in this area. You could consider asking @skatkov if needed.

Drive by thought, not intended to be blocking.

Reading over the description, I'm left wondering why not treat this as a bundling problem? We have precedent for bundles of instructions which need to not be split across an alignment boundary. Why not simply say that the test/jcc are in a two-instruction bundle which can't cross the boundary? In fact, shouldn't this be able to reuse the existing boundary align mechanism pretty much exactly? It seems like the same basic problem, just for a different use case.

Or maybe I'm misunderstanding the problem. Do you get good performance if one of the instructions starts in the first cache line, but ends in the second? That doesn't match my memory of the performance characteristics, but I don't have that fully loaded any more either. That seems to be the difference between the two approaches right?

If that is the difference, and it's intention, I'd suggest updating the commit message to be really explicit about that being the desired behavior in the edge case.

Added Experiment B to address @efriedma comment.

Harbormaster completed remote builds in B108941: Diff 351623. Jun 11 2021, 8:32 PM

In D97982#2811382, @efriedma wrote:

Removing the NeverAlign padding can increase the distance between two symbols if .palign is involved. That might require relaxation.

I've tried to reproduce this situation, and came up with the following test case: see experiment B in llvm/test/MC/X86/directive-avoid_end_align.s.
It's a bit hard to follow, but please bear with me:
Experiment A:

1. NeverAlign is added before cmp+jcc, initially no padding.
2. cmp+jcc are both relaxable, initially short form and are split by an alignment boundary,
3. Which triggers NeverAlign padding, one byte is added before cmp+jcc.
4. Which triggers jcc relaxation (pushes out jcc to long form)
5. Which triggers cmp relaxation (pushes out cmp to long form)
6. Which makes NeverAlign redundant as cmp+jcc are no longer split by an alignment boundary.
7. Which disables NeverAlign padding (0 bytes)
8. Which in theory could shrink cmp+jcc to short forms causing an infinite loop, but as long as relaxation only increases instruction sizes, it converges at this point, leaving cmp and jcc in relaxed form.

Experiment B:

When NeverAlign padding is removed (step 7 in Experiment A),
causing an increased distance between two symbols (.L53-.L54),
which might require relaxation of the cmp instruction at .L52 (imm operand goes from -128 to -129),

this doesn't produce an incorrect result: the cmp instruction at .L52 is in the correct (relaxed) form and has the correct operand (-129, corresponding to the removed NeverAlign padding).
I've made a slideshow to highlight what I think is happening: https://drive.google.com/file/d/1ORpeGj9iK6q4NUi0AOFZW3poAkn-Q-tH/view?usp=sharing
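To make the -128 to -129 step concrete, here is a minimal sketch of the range check that decides whether an immediate still fits the short form. The helper names are hypothetical and not taken from the patch; the model only assumes that a 32-bit cmp with immediate has a sign-extended 8-bit form and a 32-bit form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): does V fit in a sign-extended
// 8-bit immediate, i.e. in the range [-128, 127]?
inline bool fitsInSignedImm8(int64_t V) {
  return V >= INT8_MIN && V <= INT8_MAX;
}

// Simplified model: number of immediate bytes a 32-bit cmp-with-immediate
// needs. 1 for the short (imm8) form, 4 for the relaxed (imm32) form.
inline unsigned immediateBytes(int64_t V) {
  assert(V >= INT32_MIN && V <= INT32_MAX && "value must fit in 32 bits");
  return fitsInSignedImm8(V) ? 1 : 4;
}
```

Under this model, -128 still fits the one-byte immediate, while -129 does not and needs the four-byte form, which is why a one-byte change in the symbol distance can require relaxation.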

When do we actually do this computation? At first glance, MCAssembler::layoutSectionOnce never actually goes back to recompute the size of the NeverAlign padding.

I'm not sure of the exact mechanism. It might be that layoutSectionOnce/layoutOnce actually recomputes the size of the NeverAlign padding (indirectly). I can try to debug what's going on, but relaxation is a black box to me.

In D97982#2813799, @reames wrote:

Drive by thought, not intended to be blocking.

Reading over the description, I'm left wondering why not treat this as a bundling problem? We have precedent for bundles of instructions which need to not be split across an alignment boundary. Why not simply say that the test/jcc are in a two instruction bundle which can't cross the boundary? In fact, shouldn't this be able to reuse the existing boundary align mechanism pretty much exactly? It seems like the same basic problem, just for a different use case.

Or maybe I'm misunderstanding the problem. Do you get good performance if one of the instructions starts in the first cache line, but ends in the second? That doesn't match my memory of the performance characteristics, but I don't have that fully loaded any more either. That seems to be the difference between the two approaches, right?

If that is the difference, and it's intentional, I'd suggest updating the commit message to be really explicit about that being the desired behavior in the edge case.

The use case is different: the intent is to prevent the first instruction in a pair from ending at a given alignment boundary, by inserting at most one byte. It's OK if either instruction crosses the cache line. The performance metric for this alignment is whether macro-op fusion is performed or not. BOLT aggressively removes nop padding to improve icache/iTLB utilization, so padding both instructions using bundles to not cross the cache line would go against this goal. The performance gain from enabling macro-op fusion in this rare case would be negated by the code size increase due to excessive padding. There's no straightforward way to request instruction bundling to avoid a given end alignment for the first instruction in the bundle.
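The distinction between "crosses the boundary" and "ends at the boundary" can be sketched as follows. This is an illustration only: the function names are hypothetical, the nop size defaults to one byte as an assumption, and the real check in the patch lives in computeFragmentSize.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these names are not from the patch.

// True if the byte range [Start, Start + Size) spans more than one
// Alignment-sized block.
inline bool crossesBoundary(uint64_t Start, uint64_t Size, uint64_t Alignment) {
  assert(Size > 0 && Alignment != 0);
  return Start / Alignment != (Start + Size - 1) / Alignment;
}

// True if the byte range ends exactly on an Alignment boundary.
inline bool endsAtBoundary(uint64_t Start, uint64_t Size, uint64_t Alignment) {
  assert(Alignment != 0);
  return (Start + Size) % Alignment == 0;
}

// Padding to emit before the first instruction of a fusible pair: one
// minimum-size nop if that instruction would end on the boundary, else none.
inline uint64_t neverAlignPadding(uint64_t Start, uint64_t FirstInsnSize,
                                  uint64_t Alignment, uint64_t MinNopSize = 1) {
  // A nop whose size is a multiple of Alignment would put the end of the
  // instruction right back on a boundary, so it could not help.
  assert(MinNopSize % Alignment != 0);
  return endsAtBoundary(Start, FirstInsnSize, Alignment) ? MinNopSize : 0;
}
```

For example, with a 64-byte boundary, a 3-byte instruction starting at offset 61 ends at 64, so one nop is requested. Starting at offset 62 it crosses the boundary but does not end on it, so no padding is added.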

Following the suggestion by @skan I've experimented with re-purposing BoundaryAlign (D101817) and achieved the desired alignment but this approach has more overhead due to reliance on relaxation (as BoundaryAlign requires in the general case) - see https://reviews.llvm.org/D97982#2710638.

So neither instruction bundling nor BoundaryAlign satisfies our functional and overhead requirements, which NeverAlign addresses. I'll add this information to the commit message.

In D97982#2811315, @efriedma wrote:

Which disables NeverAlign padding (0 bytes)

When do we actually do this computation? At first glance, MCAssembler::layoutSectionOnce never actually goes back to recompute the size of the NeverAlign padding.

So it's the case that layoutSectionOnce/layoutOnce actually recomputes the size of the NeverAlign padding indirectly:

Inside layoutSectionOnce, relaxFragment calls relaxInstruction if the fragment is FT_Relaxable. relaxInstruction checks fragmentNeedsRelaxation for the passed fragment F, which goes over F->getFixups, calling fixupNeedsRelaxation and evaluateFixup on each. evaluateFixup goes through Layout.getSymbolOffset(Sym) to Layout.getFragmentOffset, whose ensureValid check calls layoutFragment, which then computes the fragment size (computeFragmentSize):

(gdb) l
382           return 0;
383         return Size;
384       }
385
386       case MCFragment::FT_NeverAlign: {
>387         const MCNeverAlignFragment &NAF = cast<MCNeverAlignFragment>(F);
388         const MCFragment *NF = F.getNextNode();
389         uint64_t Offset = Layout.getFragmentOffset(&NAF);
390         size_t NextFragSize = 0;
391         if (const auto *NextFrag = dyn_cast<MCRelaxableFragment>(NF)) {
(gdb) bt
#0  llvm::MCAssembler::computeFragmentSize (this=0x10499e0, Layout=..., F=...) at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:387
#1  0x0000000000503e54 in llvm::MCAsmLayout::layoutFragment (this=0x7fffffffc4e0, F=0x104e770)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:470
#2  0x0000000000553b8f in llvm::MCAsmLayout::ensureValid (this=0x7fffffffc4e0, F=0x104e9a0)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCFragment.cpp:91
#3  0x0000000000553bc7 in llvm::MCAsmLayout::getFragmentOffset (this=0x7fffffffc4e0, F=0x104e9a0)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCFragment.cpp:97
#4  0x0000000000553ce7 in getLabelOffset (Layout=..., S=..., ReportError=true, Val=@0x7fffffffbc68: 17057240)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCFragment.cpp:111
#5  0x0000000000553d79 in getSymbolOffsetImpl (Layout=..., S=..., ReportError=true, Val=@0x7fffffffbc68: 17057240)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCFragment.cpp:118
#6  0x0000000000553fa8 in llvm::MCAsmLayout::getSymbolOffset (this=0x7fffffffc4e0, S=...)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCFragment.cpp:154
#7  0x00000000005030e2 in llvm::MCAssembler::evaluateFixup (this=0x10499e0, Layout=..., Fixup=..., DF=0x104e470, Target=...,
    Value=@0x7fffffffbdb8: 18446744073709551615, WasForced=@0x7fffffffbdb7: false) at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:258
#8  0x00000000005067ff in llvm::MCAssembler::fixupNeedsRelaxation (this=0x10499e0, Fixup=..., DF=0x104e470, Layout=...)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:1026
#9  0x0000000000506994 in llvm::MCAssembler::fragmentNeedsRelaxation (this=0x10499e0, F=0x104e470, Layout=...)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:1045
#10 0x0000000000506a2d in llvm::MCAssembler::relaxInstruction (this=0x10499e0, Layout=..., F=...)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:1055
#11 0x0000000000507664 in llvm::MCAssembler::relaxFragment (this=0x10499e0, Layout=..., F=...)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:1241
#12 0x00000000005077f2 in llvm::MCAssembler::layoutSectionOnce (this=0x10499e0, Layout=..., Sec=...)
    at /home/aaupov/local/llvm-project/llvm/lib/MC/MCAssembler.cpp:1270

Amir edited the summary of this revision. Jun 14 2021, 3:44 PM

Amir marked 4 inline comments as done.

In D97982#2818213, @Amir wrote:

In D97982#2811315, @efriedma wrote:

Which disables NeverAlign padding (0 bytes)

When do we actually do this computation? At first glance, MCAssembler::layoutSectionOnce never actually goes back to recompute the size of the NeverAlign padding.

So it's the case that layoutSectionOnce/layoutOnce actually recomputes the size of the NeverAlign padding indirectly:

Inside layoutSectionOnce, relaxFragment calls relaxInstruction if the fragment is FT_Relaxable. relaxInstruction checks fragmentNeedsRelaxation for the passed fragment F, which goes over F->getFixups, calling fixupNeedsRelaxation and evaluateFixup on each. evaluateFixup goes through Layout.getSymbolOffset(Sym) to Layout.getFragmentOffset, whose ensureValid check calls layoutFragment, which then computes the fragment size (computeFragmentSize):

That looks really fragile: it's depending on subtle details of the way the LastValidFragment cache works. I'd like to ensure the interaction here is more obviously correct.

@efriedma:

That looks really fragile: it's depending on subtle details of the way the LastValidFragment cache works. I'd like to ensure the interaction here is more obviously correct.

NeverAlign functionality doesn't depend on the details of valid fragment caching. I've stepped through the debugger more carefully and I see that the NeverAlign fragment size is updated when the subsequent fragment is relaxed, here in MCAsmLayout::layoutFragment: https://github.com/llvm/llvm-project/blob/main/llvm/lib/MC/MCAssembler.cpp#L412.

As a big picture: we're trying to avoid using relaxation at all costs, because macro-op fusion alignment fragments are inserted before every eligible cmp+jcc fragment, which is pretty much every basic block. It's known to significantly increase layout time for us. It's easier to reason about this padding in terms of relaxation, but if it can be avoided, it should be, for practical reasons.

Probably the simplest way to ensure that computeFragmentSize() is sane is to establish the following rule: it should never examine fragments after the current fragment in the section. If we logically need to examine any fragment after the current fragment, we need to do that using relaxation, inside MCAssembler::layoutSectionOnce. This means we can compute the "current" layout using a single pass of computeFragmentSize() calls.

I checked the implementation here too, and, indeed, when there is a RelaxableFragment that was relaxed, it gets invalidated by marking "last valid fragment" as its predecessor.

This forces the assembler to call "layoutFragment" on that exact relaxable fragment, which in turn will always ask the predecessor to compute its size.

So the bottom line is: if you have a fragment like neveralign that depends on the size of its immediate successor, that's fine, but if it depends on information of any other successor, then, and only then, you need to implement it as a relaxable fragment.
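The mechanism described above can be sketched with a toy model. Everything here (ToyLayout, Frag, relax, getOffset) is a hypothetical simplification, not LLVM's actual classes; it only assumes what this thread states: relaxing a fragment marks its predecessor as the last valid fragment, and laying out a fragment asks its predecessor for its size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class Kind { Data, Relaxable, NeverAlign };

struct Frag {
  Kind K;
  uint64_t ContentSize; // bytes of content; unused for NeverAlign
  uint64_t Offset = 0;  // valid only up to ToyLayout::LastValid
  explicit Frag(Kind K, uint64_t ContentSize = 0)
      : K(K), ContentSize(ContentSize) {}
};

struct ToyLayout {
  std::vector<Frag> Frags;
  uint64_t Alignment;
  int LastValid = -1; // index of the last fragment with a valid offset

  ToyLayout(std::vector<Frag> Frags, uint64_t Alignment)
      : Frags(std::move(Frags)), Alignment(Alignment) {}

  // NeverAlign looks only at its immediate successor's current size.
  uint64_t computeSize(size_t I) const {
    const Frag &F = Frags[I];
    if (F.K != Kind::NeverAlign)
      return F.ContentSize;
    assert(I + 1 < Frags.size() && "NeverAlign needs a successor");
    uint64_t End = F.Offset + Frags[I + 1].ContentSize;
    return End % Alignment == 0 ? 1 : 0; // one-byte nop assumed
  }

  // Laying out fragment I asks the predecessor to compute its size.
  void layoutFragment(size_t I) {
    Frags[I].Offset =
        I == 0 ? 0 : Frags[I - 1].Offset + computeSize(I - 1);
    LastValid = static_cast<int>(I);
  }

  // Lazily lay out everything up to and including fragment I.
  uint64_t getOffset(size_t I) {
    while (LastValid < static_cast<int>(I))
      layoutFragment(static_cast<size_t>(LastValid + 1));
    return Frags[I].Offset;
  }

  // Relaxing fragment I invalidates it and everything after it.
  void relax(size_t I, uint64_t NewSize) {
    Frags[I].ContentSize = NewSize;
    LastValid = static_cast<int>(I) - 1;
  }
};
```

With fragments Data(61), NeverAlign, Relaxable(3), Data(4) and a 64-byte boundary, the last fragment starts at offset 65 because one byte of padding is added. After relaxing the third fragment to 6 bytes, the next offset query re-lays it out, which recomputes the NeverAlign size as 0, and the last fragment moves to offset 67.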

@efriedma: did you have a chance to look at test cases and comments above?

NeverAlign functionality doesn't depend on the details of valid fragment caching. I've stepped through the debugger more carefully and I see that the NeverAlign fragment size is updated when the subsequent fragment is relaxed, here in MCAsmLayout::layoutFragment: https://github.com/llvm/llvm-project/blob/main/llvm/lib/MC/MCAssembler.cpp#L412.

Oh, hmm, I see. I guess in that case, it should be reliable; I'm okay with leaving it the way it is. Maybe add a comment to computeFragmentSize explaining that this is happening, though.

Added a comment in computeFragmentSize explaining how the NeverAlign
fragment size is updated when the successor fragment is relaxed.

Harbormaster completed remote builds in B110739: Diff 354124. Jun 23 2021, 6:59 PM

@efriedma: thanks for a good inquisitive review and bringing up the question of NA size recomputation. Would you care to accept this diff if you think all concerns are addressed?

Ping @efriedma, @MaskRay

There were no new unaddressed comments or issues regarding this diff. Can it be accepted and landed? @skan

skan accepted this revision. Jul 6 2021, 3:11 PM

Sorry for being slow on the review, but you may have noticed that people have been reluctant to accept this, likely because they are unsure whether NeverAlign is the correct design here. So I hope you will not land this as is (this is what the "Needs Review" state is meant to indicate).

@MaskRay: thank you for getting back to it. I understand the review process, but without actionable feedback it's hard to move forward. I also understand the fatigue of this back and forth for reviewers. Thus I appreciate your getting back to this diff.
I agree that this solution may appear complicated and redundant with respect to other (existing, simpler) mechanisms. However, the drawbacks of other solutions were discussed here and are outlined in the summary. Honest question: what other collateral do you think could show it's the right design?

Matt added a subscriber: Matt. Jul 13 2021, 7:10 AM

spupyrev mentioned this in rG6d0528636ae5: Rebase: [Facebook] [MC] Introduce NeverAlign fragment type. Jul 11 2022, 9:33 AM

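Revision Contents

Path                                              Size
llvm/include/llvm/MC/MCFragment.h                 16 lines
llvm/include/llvm/MC/MCObjectStreamer.h           1 line
llvm/include/llvm/MC/MCStreamer.h                 5 lines
llvm/lib/MC/MCAssembler.cpp                       116 lines
llvm/lib/MC/MCFragment.cpp                        12 lines
llvm/lib/MC/MCObjectStreamer.cpp                  4 lines
llvm/lib/MC/MCStreamer.cpp                        1 line
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp    25 lines
llvm/test/MC/X86/directive-avoid_end_align.s      208 lines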

Diff 354124

llvm/include/llvm/MC/MCFragment.h

[...]
class MCSymbol;		class MCSymbol;

class MCFragment : public ilist_node_with_parent<MCFragment, MCSection> {		class MCFragment : public ilist_node_with_parent<MCFragment, MCSection> {
friend class MCAsmLayout;		friend class MCAsmLayout;

public:		public:
enum FragmentType : uint8_t {		enum FragmentType : uint8_t {
FT_Align,		FT_Align,
		FT_NeverAlign,
		MaskRay: Perhaps the name should include `X86`? Inserting a one-byte nop isn't something any RISC arch can do.
		Amir: NeverAlign fragment emits a minimum-size nop, so the usage is not limited to X86 and can potentially be extended to architectures with fixed-length instructions. Modern ARM and RISC-V cores also have macro-fusion and may also face similar cache line crossing restrictions.
FT_Data,		FT_Data,
FT_CompactEncodedInst,		FT_CompactEncodedInst,
FT_Fill,		FT_Fill,
FT_Nops,		FT_Nops,
FT_Relaxable,		FT_Relaxable,
FT_Org,		FT_Org,
FT_Dwarf,		FT_Dwarf,
FT_DwarfFrame,		FT_DwarfFrame,
[...]	public:
bool hasEmitNops() const { return EmitNops; }		bool hasEmitNops() const { return EmitNops; }
void setEmitNops(bool Value) { EmitNops = Value; }		void setEmitNops(bool Value) { EmitNops = Value; }

static bool classof(const MCFragment *F) {		static bool classof(const MCFragment *F) {
return F->getKind() == MCFragment::FT_Align;		return F->getKind() == MCFragment::FT_Align;
}		}
};		};

		class MCNeverAlignFragment : public MCFragment {
		/// The alignment the end of the next fragment should avoid.
		MaskRay: Don’t duplicate function or class name at the beginning of the comment. https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
		unsigned Alignment;

		public:
		MCNeverAlignFragment(unsigned Alignment, MCSection *Sec = nullptr)
		: MCFragment(FT_NeverAlign, false, Sec), Alignment(Alignment) {}

		unsigned getAlignment() const { return Alignment; }

		static bool classof(const MCFragment *F) {
		return F->getKind() == MCFragment::FT_NeverAlign;
		}
		};
		skan: If MCNeverAlignFragment only emits nop in your usage scenarios, the members EmitNops, Value and ValueSize can be removed. Otherwise, there may be a lot of untested code.

class MCFillFragment : public MCFragment {		class MCFillFragment : public MCFragment {
uint8_t ValueSize;		uint8_t ValueSize;
/// Value to use for filling bytes.		/// Value to use for filling bytes.
uint64_t Value;		uint64_t Value;
/// The number of bytes to insert.		/// The number of bytes to insert.
const MCExpr &NumValues;		const MCExpr &NumValues;

/// Source location of the directive that this fragment was created for.		/// Source location of the directive that this fragment was created for.
SMLoc Loc;		SMLoc Loc;

public:		public:
MCFillFragment(uint64_t Value, uint8_t VSize, const MCExpr &NumValues,		MCFillFragment(uint64_t Value, uint8_t VSize, const MCExpr &NumValues,
SMLoc Loc, MCSection *Sec = nullptr)		SMLoc Loc, MCSection *Sec = nullptr)
: MCFragment(FT_Fill, false, Sec), ValueSize(VSize), Value(Value),		: MCFragment(FT_Fill, false, Sec), ValueSize(VSize), Value(Value),
		skan: Remove these interfaces.
NumValues(NumValues), Loc(Loc) {}		NumValues(NumValues), Loc(Loc) {}

uint64_t getValue() const { return Value; }		uint64_t getValue() const { return Value; }
uint8_t getValueSize() const { return ValueSize; }		uint8_t getValueSize() const { return ValueSize; }
const MCExpr &getNumValues() const { return NumValues; }		const MCExpr &getNumValues() const { return NumValues; }

SMLoc getLoc() const { return Loc; }		SMLoc getLoc() const { return Loc; }

[...]

llvm/include/llvm/MC/MCObjectStreamer.h

[...]	public:
void emitBundleLock(bool AlignToEnd) override;		void emitBundleLock(bool AlignToEnd) override;
void emitBundleUnlock() override;		void emitBundleUnlock() override;
void emitBytes(StringRef Data) override;		void emitBytes(StringRef Data) override;
void emitValueToAlignment(unsigned ByteAlignment, int64_t Value = 0,		void emitValueToAlignment(unsigned ByteAlignment, int64_t Value = 0,
unsigned ValueSize = 1,		unsigned ValueSize = 1,
unsigned MaxBytesToEmit = 0) override;		unsigned MaxBytesToEmit = 0) override;
void emitCodeAlignment(unsigned ByteAlignment,		void emitCodeAlignment(unsigned ByteAlignment,
unsigned MaxBytesToEmit = 0) override;		unsigned MaxBytesToEmit = 0) override;
		void emitNeverAlignCodeAtEnd(unsigned ByteAlignment) override;
		MaskRay: Why "AtEnd"?
		Amir: "End" is the end of the subsequent fragment that we want _not_ to be at a given boundary. It can be replaced with something (arguably) more descriptive like `emitAvoidEndAlign`, similar to the directive name. What do you think?
void emitValueToOffset(const MCExpr *Offset, unsigned char Value,		void emitValueToOffset(const MCExpr *Offset, unsigned char Value,
SMLoc Loc) override;		SMLoc Loc) override;
void emitDwarfLocDirective(unsigned FileNo, unsigned Line, unsigned Column,		void emitDwarfLocDirective(unsigned FileNo, unsigned Line, unsigned Column,
unsigned Flags, unsigned Isa,		unsigned Flags, unsigned Isa,
unsigned Discriminator,		unsigned Discriminator,
StringRef FileName) override;		StringRef FileName) override;
void emitDwarfAdvanceLineAddr(int64_t LineDelta, const MCSymbol *LastLabel,		void emitDwarfAdvanceLineAddr(int64_t LineDelta, const MCSymbol *LastLabel,
const MCSymbol *Label,		const MCSymbol *Label,
[...]

llvm/include/llvm/MC/MCStreamer.h

[...]	public:
/// \param ByteAlignment - The alignment to reach. This must be a power of		/// \param ByteAlignment - The alignment to reach. This must be a power of
/// two on some targets.		/// two on some targets.
/// \param MaxBytesToEmit - The maximum numbers of bytes to emit, or 0. If		/// \param MaxBytesToEmit - The maximum numbers of bytes to emit, or 0. If
/// the alignment cannot be reached in this many bytes, no bytes are		/// the alignment cannot be reached in this many bytes, no bytes are
/// emitted.		/// emitted.
virtual void emitCodeAlignment(unsigned ByteAlignment,		virtual void emitCodeAlignment(unsigned ByteAlignment,
unsigned MaxBytesToEmit = 0);		unsigned MaxBytesToEmit = 0);

		/// If the end of the fragment following this NeverAlign fragment ever gets
		/// aligned to \p ByteAlignment, this fragment emits a single nop before the
		lebedev.ri: emit where?
		Amir: Thanks for flagging! The wording might be a bit misleading. It'd be clearer to state it as "If the end of the fragment following this NeverAlign fragment ever gets aligned to \p ByteAlignment, this fragment emits a single nop before the following fragment to break this end-alignment."
		/// following fragment to break this end-alignment.
		virtual void emitNeverAlignCodeAtEnd(unsigned ByteAlignment);

/// Emit some number of copies of \p Value until the byte offset \p		/// Emit some number of copies of \p Value until the byte offset \p
/// Offset is reached.		/// Offset is reached.
///		///
/// This is used to implement assembler directives such as .org.		/// This is used to implement assembler directives such as .org.
///		///
/// \param Offset - The offset to reach. This may be an expression, but the		/// \param Offset - The offset to reach. This may be an expression, but the
/// expression must be associated with the current section.		/// expression must be associated with the current section.
/// \param Value - The value to use when filling bytes.		/// \param Value - The value to use when filling bytes.
[...]

llvm/lib/MC/MCAssembler.cpp

[...]	bool MCAssembler::evaluateFixup(const MCAsmLayout &Layout,
if (IsResolved && getBackend().shouldForceRelocation(*this, Fixup, Target)) {		if (IsResolved && getBackend().shouldForceRelocation(*this, Fixup, Target)) {
IsResolved = false;		IsResolved = false;
WasForced = true;		WasForced = true;
}		}

return IsResolved;		return IsResolved;
}		}

		/// Check if the branch crosses the boundary.
		///
		/// \param StartAddr start address of the fused/unfused branch.
		/// \param Size size of the fused/unfused branch.
		/// \param BoundaryAlignment alignment requirement of the branch.
		/// \returns true if the branch cross the boundary.
		static bool mayCrossBoundary(uint64_t StartAddr, uint64_t Size,
		Align BoundaryAlignment) {
		uint64_t EndAddr = StartAddr + Size;
		return (StartAddr >> Log2(BoundaryAlignment)) !=
		((EndAddr - 1) >> Log2(BoundaryAlignment));
		}

		/// Check if the branch is against the boundary.
		///
		/// \param StartAddr start address of the fused/unfused branch.
		/// \param Size size of the fused/unfused branch.
		/// \param BoundaryAlignment alignment requirement of the branch.
		/// \returns true if the branch is against the boundary.
		static bool isAgainstBoundary(uint64_t StartAddr, uint64_t Size,
		Align BoundaryAlignment) {
		uint64_t EndAddr = StartAddr + Size;
		return (EndAddr & (BoundaryAlignment.value() - 1)) == 0;
		}

		/// Check if the branch needs padding.
		///
		/// \param StartAddr start address of the fused/unfused branch.
		/// \param Size size of the fused/unfused branch.
		/// \param BoundaryAlignment alignment requirement of the branch.
		/// \returns true if the branch needs padding.
		static bool needPadding(uint64_t StartAddr, uint64_t Size,
		Align BoundaryAlignment) {
		return mayCrossBoundary(StartAddr, Size, BoundaryAlignment) ||
		isAgainstBoundary(StartAddr, Size, BoundaryAlignment);
		}

uint64_t MCAssembler::computeFragmentSize(const MCAsmLayout &Layout,		uint64_t MCAssembler::computeFragmentSize(const MCAsmLayout &Layout,
const MCFragment &F) const {		const MCFragment &F) const {
assert(getBackendPtr() && "Requires assembler backend");		assert(getBackendPtr() && "Requires assembler backend");
switch (F.getKind()) {		switch (F.getKind()) {
case MCFragment::FT_Data:		case MCFragment::FT_Data:
return cast<MCDataFragment>(F).getContents().size();		return cast<MCDataFragment>(F).getContents().size();
case MCFragment::FT_Relaxable:		case MCFragment::FT_Relaxable:
return cast<MCRelaxableFragment>(F).getContents().size();		return cast<MCRelaxableFragment>(F).getContents().size();
[...]	if (Size > 0 && AF.hasEmitNops()) {
while (Size % getBackend().getMinimumNopSize())		while (Size % getBackend().getMinimumNopSize())
Size += AF.getAlignment();		Size += AF.getAlignment();
}		}
if (Size > AF.getMaxBytesToEmit())		if (Size > AF.getMaxBytesToEmit())
return 0;		return 0;
return Size;		return Size;
}		}

		case MCFragment::FT_NeverAlign: {
		// Disclaimer: NeverAlign fragment size depends on the size of its immediate
		// successor, but NeverAlign need not be a MCRelaxableFragment.
		// NeverAlign fragment size is recomputed if the successor is relaxed:
		efriedma: Recomputing the size of the fragment here is suspicious. For other fragments, computeFragmentSize is independent from the layout of the section. If the size needs to be changed based on the layout, it's mutated by MCAssembler::layoutSectionOnce. I realize FT_Org doesn't follow this rule, but that's not a great example to follow. The following currently crashes:
		    foo:
		    .org bar
		    bar:
		efriedma: My biggest concern with the current approach is the potential for an infinite loop. There isn't any protection against the following fragment's offset moving backwards, which could lead to infinitely oscillating fragment sizes.
		efriedma: Thought a little more. Probably the simplest way to ensure that computeFragmentSize() is sane is to establish the following rule: it should never examine fragments after the current fragment in the section. If we logically need to examine any fragment after the current fragment, we need to do that using relaxation, inside MCAssembler::layoutSectionOnce. This means we can compute the "current" layout using a single pass of computeFragmentSize() calls. Under that rule, FT_Align is fine because it only cares about its own offset. The FT_Org code crashes because it tries to look at fragments after itself. FT_NeverAlign is... marginal. The size of an MCDataFragment is fixed, so that doesn't really count as looking forward. But I suspect MCRelaxableFragment doesn't work right: if we do end up relaxing the fragment, I'm not sure we recompute everything we need to.
		Amir: @efriedma: do you have a specific corner case in mind? Wouldn’t it be covered by a test case with MCRelaxableFragment following NA fragment?
		efriedma: You'd want a testcase where there's an MCRelaxableFragment that gets relaxed after the computeFragmentSize() call. Something like a forward jump of more than 128 bytes. Then make sure we chose the right size for the MCNeverAlignFragment. It looks like in your regression test, the fragment after is always an MCDataFragment?
		Amir: Let me experiment with the idea. The second test is a NeverAlign followed by MCRelaxableFragment (cmpl insn). Actually we wanted to avoid computing the size using relaxation (as BoundaryAlign does) because it slows down the layout and appears unnecessary if we can handle the RelaxableFragment following NA.
		// - If RelaxableFragment is relaxed, it gets invalidated by marking its
		// predecessor as LastValidFragment.
		// - This forces the assembler to call MCAsmLayout::layoutFragment on that
		skan: You can add a test case like
		    .L0:
		    .nops 57
		    int3
		    .avoid_end_align 64
		    cmpl $(.L1-.L0), %eax
		    je .L0
		    .nops 65
		    .L1:
		to check the situation where the fragment after NeverAlign is a RelaxableFragment.
		Amir: Thank you!
		// relaxable fragment, which in turn will always ask the predecessor to
		// compute its size (see "computeFragmentSize(prev)" in layoutFragment).
		MaskRay: The logic is wrong if NextFrag->getContents().size() % NAF.getAlignment() == 0. In that case you want to make the value 0.
		//
		// In short, the simplest way to ensure that computeFragmentSize() is sane
		// is to establish the following rule: it should never examine fragments
		skan: Should this be a `llvm_unreachable`?
		// after the current fragment in the section. If we logically need to
		// examine any fragment after the current fragment, we need to do that using
		Amir: Better reuse isAgainstBoundary
		// relaxation, inside MCAssembler::layoutSectionOnce.
		MaskRay: If neither, return 0
		const MCNeverAlignFragment &NAF = cast<MCNeverAlignFragment>(F);
		const MCFragment *NF = F.getNextNode();
		uint64_t Offset = Layout.getFragmentOffset(&NAF);
		size_t NextFragSize = 0;
		skan: The summary and comments in the code need to be updated: you insert a minimum-size nop rather than a single one-byte nop, although they are the same in most cases.
		if (const auto *NextFrag = dyn_cast<MCRelaxableFragment>(NF)) {
		NextFragSize = NextFrag->getContents().size();
		} else if (const auto *NextFrag = dyn_cast<MCDataFragment>(NF)) {
		MaskRay: Store Size % getBackend().getMinimumNopSize() into a variable, check whether it is zero, instead of using a loop which increases size one at a time.
		NextFragSize = NextFrag->getContents().size();
		} else {
		llvm_unreachable("Didn't find the expected fragment after NeverAlign");
		}
		// Check if the next fragment ends at the alignment we want to avoid.
		if (isAgainstBoundary(Offset, NextFragSize, Align(NAF.getAlignment()))) {
		// Avoid this alignment by introducing minimum nop.
		assert(getBackend().getMinimumNopSize() != NAF.getAlignment());
		return getBackend().getMinimumNopSize();
		}
		return 0;
		}

case MCFragment::FT_Org: {		case MCFragment::FT_Org: {
const MCOrgFragment &OF = cast<MCOrgFragment>(F);		const MCOrgFragment &OF = cast<MCOrgFragment>(F);
MCValue Value;		MCValue Value;
if (!OF.getOffset().evaluateAsValue(Value, Layout)) {		if (!OF.getOffset().evaluateAsValue(Value, Layout)) {
getContext().reportError(OF.getLoc(),		getContext().reportError(OF.getLoc(),
"expected assembly-time absolute expression");		"expected assembly-time absolute expression");
return 0;		return 0;
}		}
▲ Show 20 Lines • Show All 207 Lines • ▼ Show 20 Lines	for (uint64_t i = 0; i != Count; ++i) {
case 8:		case 8:
support::endian::write<uint64_t>(OS, AF.getValue(), Endian);		support::endian::write<uint64_t>(OS, AF.getValue(), Endian);
break;		break;
}		}
}		}
break;		break;
}		}

		case MCFragment::FT_NeverAlign: {
		if (!Asm.getBackend().writeNopData(OS, FragmentSize))
		report_fatal_error("unable to write nop sequence of " +
		Twine(FragmentSize) + " bytes");
		break;
		}

case MCFragment::FT_Data:		case MCFragment::FT_Data:
++stats::EmittedDataFragments;		++stats::EmittedDataFragments;
OS << cast<MCDataFragment>(F).getContents();		OS << cast<MCDataFragment>(F).getContents();
break;		break;

case MCFragment::FT_Relaxable:		case MCFragment::FT_Relaxable:
++stats::EmittedRelaxableFragments;		++stats::EmittedRelaxableFragments;
OS << cast<MCRelaxableFragment>(F).getContents();		OS << cast<MCRelaxableFragment>(F).getContents();
break;		break;

case MCFragment::FT_CompactEncodedInst:		case MCFragment::FT_CompactEncodedInst:
++stats::EmittedCompactEncodedInstFragments;		++stats::EmittedCompactEncodedInstFragments;
OS << cast<MCCompactEncodedInstFragment>(F).getContents();		OS << cast<MCCompactEncodedInstFragment>(F).getContents();
break;		break;

case MCFragment::FT_Fill: {		case MCFragment::FT_Fill: {
++stats::EmittedFillFragments;		++stats::EmittedFillFragments;
const MCFillFragment &FF = cast<MCFillFragment>(F);		const MCFillFragment &FF = cast<MCFillFragment>(F);
uint64_t V = FF.getValue();		uint64_t V = FF.getValue();
unsigned VSize = FF.getValueSize();		unsigned VSize = FF.getValueSize();
const unsigned MaxChunkSize = 16;		const unsigned MaxChunkSize = 16;
char Data[MaxChunkSize];		char Data[MaxChunkSize];
assert(0 < VSize && VSize <= MaxChunkSize && "Illegal fragment fill size");		assert(0 < VSize && VSize <= MaxChunkSize && "Illegal fragment fill size");
// Duplicate V into Data as byte vector to reduce number of		// Duplicate V into Data as byte vector to reduce number of
// writes done. As such, do endian conversion here.		// writes done. As such, do endian conversion here.
for (unsigned I = 0; I != VSize; ++I) {		for (unsigned I = 0; I != VSize; ++I) {
unsigned index = Endian == support::little ? I : (VSize - I - 1);		unsigned index = Endian == support::little ? I : (VSize - I - 1);
Data[I] = uint8_t(V >> (index * 8));		Data[I] = uint8_t(V >> (index * 8));
}		}
		skan: Line 611-628 are redundant and untested.
for (unsigned I = VSize; I < MaxChunkSize; ++I)		for (unsigned I = VSize; I < MaxChunkSize; ++I)
Data[I] = Data[I - VSize];		Data[I] = Data[I - VSize];

// Set to largest multiple of VSize in Data.		// Set to largest multiple of VSize in Data.
const unsigned NumPerChunk = MaxChunkSize / VSize;		const unsigned NumPerChunk = MaxChunkSize / VSize;
// Set ChunkSize to largest multiple of VSize in Data		// Set ChunkSize to largest multiple of VSize in Data
const unsigned ChunkSize = VSize * NumPerChunk;		const unsigned ChunkSize = VSize * NumPerChunk;

▲ Show 20 Lines • Show All 404 Lines • ▼ Show 20 Lines	bool MCAssembler::relaxLEB(MCAsmLayout &Layout, MCLEBFragment &LF) {
// only increase an LEB fragment size here, not decrease it. See PR35809.		// only increase an LEB fragment size here, not decrease it. See PR35809.
if (LF.isSigned())		if (LF.isSigned())
encodeSLEB128(Value, OSE, OldSize);		encodeSLEB128(Value, OSE, OldSize);
else		else
encodeULEB128(Value, OSE, OldSize);		encodeULEB128(Value, OSE, OldSize);
return OldSize != LF.getContents().size();		return OldSize != LF.getContents().size();
}		}

/// Check if the branch crosses the boundary.
///
/// \param StartAddr start address of the fused/unfused branch.
/// \param Size size of the fused/unfused branch.
/// \param BoundaryAlignment alignment requirement of the branch.
/// \returns true if the branch crosses the boundary.
static bool mayCrossBoundary(uint64_t StartAddr, uint64_t Size,
Align BoundaryAlignment) {
uint64_t EndAddr = StartAddr + Size;
return (StartAddr >> Log2(BoundaryAlignment)) !=
((EndAddr - 1) >> Log2(BoundaryAlignment));
}

/// Check if the branch is against the boundary.
///
/// \param StartAddr start address of the fused/unfused branch.
/// \param Size size of the fused/unfused branch.
/// \param BoundaryAlignment alignment requirement of the branch.
/// \returns true if the branch is against the boundary.
static bool isAgainstBoundary(uint64_t StartAddr, uint64_t Size,
Align BoundaryAlignment) {
uint64_t EndAddr = StartAddr + Size;
return (EndAddr & (BoundaryAlignment.value() - 1)) == 0;
}

/// Check if the branch needs padding.
///
/// \param StartAddr start address of the fused/unfused branch.
/// \param Size size of the fused/unfused branch.
/// \param BoundaryAlignment alignment requirement of the branch.
/// \returns true if the branch needs padding.
static bool needPadding(uint64_t StartAddr, uint64_t Size,
Align BoundaryAlignment) {
return mayCrossBoundary(StartAddr, Size, BoundaryAlignment) ||
isAgainstBoundary(StartAddr, Size, BoundaryAlignment);
}
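
The three boundary predicates above can be exercised outside LLVM. The following is a minimal standalone sketch, assuming the alignment is a power of two passed as a plain integer instead of `llvm::Align`. The `Sketch`-suffixed names are hypothetical copies for illustration, not the LLVM definitions.

```cpp
#include <cstdint>

// Illustrative stand-in for Log2(Align); assumes Alignment is a power of two.
constexpr unsigned log2Sketch(uint64_t Alignment) {
  unsigned Shift = 0;
  while ((uint64_t(1) << Shift) < Alignment)
    ++Shift;
  return Shift;
}

// True if [StartAddr, StartAddr + Size) spans more than one aligned block.
constexpr bool mayCrossBoundarySketch(uint64_t StartAddr, uint64_t Size,
                                      uint64_t Alignment) {
  uint64_t EndAddr = StartAddr + Size;
  return (StartAddr >> log2Sketch(Alignment)) !=
         ((EndAddr - 1) >> log2Sketch(Alignment));
}

// True if the end address lands exactly on an alignment boundary.
constexpr bool isAgainstBoundarySketch(uint64_t StartAddr, uint64_t Size,
                                       uint64_t Alignment) {
  uint64_t EndAddr = StartAddr + Size;
  return (EndAddr & (Alignment - 1)) == 0;
}

// True if either condition holds.
constexpr bool needPaddingSketch(uint64_t StartAddr, uint64_t Size,
                                 uint64_t Alignment) {
  return mayCrossBoundarySketch(StartAddr, Size, Alignment) ||
         isAgainstBoundarySketch(StartAddr, Size, Alignment);
}

// A 2-byte instruction at offset 62 ends exactly at 64: against, not crossing.
static_assert(!mayCrossBoundarySketch(62, 2, 64), "ends at the boundary");
static_assert(isAgainstBoundarySketch(62, 2, 64), "end address is 64");
// A 4-byte instruction at offset 62 occupies bytes 62..65 and crosses 64.
static_assert(mayCrossBoundarySketch(62, 4, 64), "spans two 64B blocks");
```

Note that NeverAlign only needs the "against" check: per the summary, it is acceptable for either instruction to cross the boundary, as long as the first one does not end on it.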

bool MCAssembler::relaxBoundaryAlign(MCAsmLayout &Layout,		bool MCAssembler::relaxBoundaryAlign(MCAsmLayout &Layout,
MCBoundaryAlignFragment &BF) {		MCBoundaryAlignFragment &BF) {
// BoundaryAlignFragment that doesn't need to align any fragment should not be		// BoundaryAlignFragment that doesn't need to align any fragment should not be
// relaxed.		// relaxed.
if (!BF.getLastFragment())		if (!BF.getLastFragment())
return false;		return false;

uint64_t AlignedOffset = Layout.getFragmentOffset(&BF);		uint64_t AlignedOffset = Layout.getFragmentOffset(&BF);
▲ Show 20 Lines • Show All 186 Lines • Show Last 20 Lines
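
The size computation for the FT_NeverAlign case shown earlier in this file reduces to a single end-address check. Below is a minimal sketch of that arithmetic, assuming a power-of-two alignment and a caller-supplied minimum nop size. `neverAlignPaddingSketch` is a hypothetical name and is not part of the patch; the offsets in the checks are taken from the first two cases of the test file in this revision.

```cpp
#include <cstdint>

// Returns the number of padding bytes a NeverAlign fragment would contribute:
// MinNopSize if the next fragment would end exactly on an Alignment boundary,
// otherwise 0. Assumes Alignment is a power of two and, as asserted in the
// patch, that MinNopSize != Alignment.
constexpr uint64_t neverAlignPaddingSketch(uint64_t Offset,
                                           uint64_t NextFragSize,
                                           uint64_t Alignment,
                                           uint64_t MinNopSize) {
  const bool EndsAtBoundary =
      ((Offset + NextFragSize) & (Alignment - 1)) == 0;
  return EndsAtBoundary ? MinNopSize : 0;
}

// testl (2 bytes) at offset 0 ends at 2: no padding.
static_assert(neverAlignPaddingSketch(0x0, 2, 64, 1) == 0, "no nop");
// testl (2 bytes) at offset 0x3e would end at 0x40: one nop is inserted.
static_assert(neverAlignPaddingSketch(0x3e, 2, 64, 1) == 1, "nop at 0x3e");
// Relaxed cmpl (6 bytes) at offset 0xba would end at 0xc0: one nop.
static_assert(neverAlignPaddingSketch(0xba, 6, 64, 1) == 1, "nop at 0xba");
```

The sketch deliberately ignores relaxation: in the patch the result depends on the current size of the following fragment, which may change between layout iterations.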

llvm/lib/MC/MCFragment.cpp

Show First 20 Lines • Show All 264 Lines • ▼ Show 20 Lines	if (Kind == FragmentType(~0)) {
delete this;		delete this;
return;		return;
}		}

switch (Kind) {		switch (Kind) {
case FT_Align:		case FT_Align:
delete cast<MCAlignFragment>(this);		delete cast<MCAlignFragment>(this);
return;		return;
		case FT_NeverAlign:
		delete cast<MCNeverAlignFragment>(this);
		return;
case FT_Data:		case FT_Data:
delete cast<MCDataFragment>(this);		delete cast<MCDataFragment>(this);
return;		return;
case FT_CompactEncodedInst:		case FT_CompactEncodedInst:
delete cast<MCCompactEncodedInstFragment>(this);		delete cast<MCCompactEncodedInstFragment>(this);
return;		return;
case FT_Fill:		case FT_Fill:
delete cast<MCFillFragment>(this);		delete cast<MCFillFragment>(this);
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MCFragment::dump() const {		LLVM_DUMP_METHOD void MCFragment::dump() const {
raw_ostream &OS = errs();		raw_ostream &OS = errs();

OS << "<";		OS << "<";
switch (getKind()) {		switch (getKind()) {
case MCFragment::FT_Align: OS << "MCAlignFragment"; break;		case MCFragment::FT_Align: OS << "MCAlignFragment"; break;
		case MCFragment::FT_NeverAlign:
		OS << "MCNeverAlignFragment";
		break;
case MCFragment::FT_Data: OS << "MCDataFragment"; break;		case MCFragment::FT_Data: OS << "MCDataFragment"; break;
case MCFragment::FT_CompactEncodedInst:		case MCFragment::FT_CompactEncodedInst:
OS << "MCCompactEncodedInstFragment"; break;		OS << "MCCompactEncodedInstFragment"; break;
case MCFragment::FT_Fill: OS << "MCFillFragment"; break;		case MCFragment::FT_Fill: OS << "MCFillFragment"; break;
case MCFragment::FT_Nops:		case MCFragment::FT_Nops:
OS << "MCFNopsFragment";		OS << "MCFNopsFragment";
break;		break;
case MCFragment::FT_Relaxable: OS << "MCRelaxableFragment"; break;		case MCFragment::FT_Relaxable: OS << "MCRelaxableFragment"; break;
Show All 23 Lines	case MCFragment::FT_Align: {
if (AF->hasEmitNops())		if (AF->hasEmitNops())
OS << " (emit nops)";		OS << " (emit nops)";
OS << "\n ";		OS << "\n ";
OS << " Alignment:" << AF->getAlignment()		OS << " Alignment:" << AF->getAlignment()
<< " Value:" << AF->getValue() << " ValueSize:" << AF->getValueSize()		<< " Value:" << AF->getValue() << " ValueSize:" << AF->getValueSize()
<< " MaxBytesToEmit:" << AF->getMaxBytesToEmit() << ">";		<< " MaxBytesToEmit:" << AF->getMaxBytesToEmit() << ">";
break;		break;
}		}
		case MCFragment::FT_NeverAlign: {
		const MCNeverAlignFragment *NAF = cast<MCNeverAlignFragment>(this);
		OS << "\n ";
		OS << " Alignment:" << NAF->getAlignment() << ">";
		break;
		}
case MCFragment::FT_Data: {		case MCFragment::FT_Data: {
const auto *DF = cast<MCDataFragment>(this);		const auto *DF = cast<MCDataFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " Contents:[";		OS << " Contents:[";
const SmallVectorImpl<char> &Contents = DF->getContents();		const SmallVectorImpl<char> &Contents = DF->getContents();
for (unsigned i = 0, e = Contents.size(); i != e; ++i) {		for (unsigned i = 0, e = Contents.size(); i != e; ++i) {
if (i) OS << ",";		if (i) OS << ",";
OS << hexdigit((Contents[i] >> 4) & 0xF) << hexdigit(Contents[i] & 0xF);		OS << hexdigit((Contents[i] >> 4) & 0xF) << hexdigit(Contents[i] & 0xF);
▲ Show 20 Lines • Show All 117 Lines • Show Last 20 Lines

llvm/lib/MC/MCObjectStreamer.cpp

	Show First 20 Lines • Show All 608 Lines • ▼ Show 20 Lines
	}			}

	void MCObjectStreamer::emitCodeAlignment(unsigned ByteAlignment,			void MCObjectStreamer::emitCodeAlignment(unsigned ByteAlignment,
	unsigned MaxBytesToEmit) {			unsigned MaxBytesToEmit) {
	emitValueToAlignment(ByteAlignment, 0, 1, MaxBytesToEmit);			emitValueToAlignment(ByteAlignment, 0, 1, MaxBytesToEmit);
	cast<MCAlignFragment>(getCurrentFragment())->setEmitNops(true);			cast<MCAlignFragment>(getCurrentFragment())->setEmitNops(true);
	}			}

				void MCObjectStreamer::emitNeverAlignCodeAtEnd(unsigned ByteAlignment) {
				insert(new MCNeverAlignFragment(ByteAlignment));
				}

	void MCObjectStreamer::emitValueToOffset(const MCExpr *Offset,			void MCObjectStreamer::emitValueToOffset(const MCExpr *Offset,
	unsigned char Value,			unsigned char Value,
	SMLoc Loc) {			SMLoc Loc) {
	insert(new MCOrgFragment(*Offset, Value, Loc));			insert(new MCOrgFragment(*Offset, Value, Loc));
	}			}

	// Associate DTPRel32 fixup with data and resize data area			// Associate DTPRel32 fixup with data and resize data area
	void MCObjectStreamer::emitDTPRel32Value(const MCExpr *Value) {			void MCObjectStreamer::emitDTPRel32Value(const MCExpr *Value) {
	▲ Show 20 Lines • Show All 254 Lines • Show Last 20 Lines

llvm/lib/MC/MCStreamer.cpp

	Show First 20 Lines • Show All 1,190 Lines • ▼ Show 20 Lines
	void MCStreamer::emitFill(const MCExpr &NumBytes, uint64_t Value, SMLoc Loc) {}			void MCStreamer::emitFill(const MCExpr &NumBytes, uint64_t Value, SMLoc Loc) {}
	void MCStreamer::emitFill(const MCExpr &NumValues, int64_t Size, int64_t Expr,			void MCStreamer::emitFill(const MCExpr &NumValues, int64_t Size, int64_t Expr,
	SMLoc Loc) {}			SMLoc Loc) {}
	void MCStreamer::emitValueToAlignment(unsigned ByteAlignment, int64_t Value,			void MCStreamer::emitValueToAlignment(unsigned ByteAlignment, int64_t Value,
	unsigned ValueSize,			unsigned ValueSize,
	unsigned MaxBytesToEmit) {}			unsigned MaxBytesToEmit) {}
	void MCStreamer::emitCodeAlignment(unsigned ByteAlignment,			void MCStreamer::emitCodeAlignment(unsigned ByteAlignment,
	unsigned MaxBytesToEmit) {}			unsigned MaxBytesToEmit) {}
				void MCStreamer::emitNeverAlignCodeAtEnd(unsigned ByteAlignment) {}
	void MCStreamer::emitValueToOffset(const MCExpr *Offset, unsigned char Value,			void MCStreamer::emitValueToOffset(const MCExpr *Offset, unsigned char Value,
	SMLoc Loc) {}			SMLoc Loc) {}
	void MCStreamer::emitBundleAlignMode(unsigned AlignPow2) {}			void MCStreamer::emitBundleAlignMode(unsigned AlignPow2) {}
	void MCStreamer::emitBundleLock(bool AlignToEnd) {}			void MCStreamer::emitBundleLock(bool AlignToEnd) {}
	void MCStreamer::finishImpl() {}			void MCStreamer::finishImpl() {}
	void MCStreamer::emitBundleUnlock() {}			void MCStreamer::emitBundleUnlock() {}

	void MCStreamer::SwitchSection(MCSection *Section, const MCExpr *Subsection) {			void MCStreamer::SwitchSection(MCSection *Section, const MCExpr *Subsection) {
	▲ Show 20 Lines • Show All 140 Lines • Show Last 20 Lines

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp

Show First 20 Lines • Show All 1,122 Lines • ▼ Show 20 Lines	bool CreateMemForMSInlineAsm(unsigned SegReg, const MCExpr *Disp,
unsigned Scale, SMLoc Start, SMLoc End,		unsigned Scale, SMLoc Start, SMLoc End,
unsigned Size, StringRef Identifier,		unsigned Size, StringRef Identifier,
const InlineAsmIdentifierInfo &Info,		const InlineAsmIdentifierInfo &Info,
OperandVector &Operands);		OperandVector &Operands);

bool parseDirectiveArch();		bool parseDirectiveArch();
bool parseDirectiveNops(SMLoc L);		bool parseDirectiveNops(SMLoc L);
bool parseDirectiveEven(SMLoc L);		bool parseDirectiveEven(SMLoc L);
		bool parseDirectiveAvoidEndAlign(SMLoc L);
bool ParseDirectiveCode(StringRef IDVal, SMLoc L);		bool ParseDirectiveCode(StringRef IDVal, SMLoc L);

/// CodeView FPO data directives.		/// CodeView FPO data directives.
bool parseDirectiveFPOProc(SMLoc L);		bool parseDirectiveFPOProc(SMLoc L);
bool parseDirectiveFPOSetFrame(SMLoc L);		bool parseDirectiveFPOSetFrame(SMLoc L);
bool parseDirectiveFPOPushReg(SMLoc L);		bool parseDirectiveFPOPushReg(SMLoc L);
bool parseDirectiveFPOStackAlloc(SMLoc L);		bool parseDirectiveFPOStackAlloc(SMLoc L);
bool parseDirectiveFPOStackAlign(SMLoc L);		bool parseDirectiveFPOStackAlign(SMLoc L);
▲ Show 20 Lines • Show All 3,488 Lines • ▼ Show 20 Lines	if (getLexer().isNot(AsmToken::EndOfStatement)) {
else if (Parser.getTok().getString() == "prefix")		else if (Parser.getTok().getString() == "prefix")
return Error(DirectiveID.getLoc(), "'.intel_syntax prefix' is not "		return Error(DirectiveID.getLoc(), "'.intel_syntax prefix' is not "
"supported: registers must not have "		"supported: registers must not have "
"a '%' prefix in .intel_syntax");		"a '%' prefix in .intel_syntax");
}		}
return false;		return false;
} else if (IDVal == ".nops")		} else if (IDVal == ".nops")
return parseDirectiveNops(DirectiveID.getLoc());		return parseDirectiveNops(DirectiveID.getLoc());
		else if (IDVal == ".avoid_end_align")
		return parseDirectiveAvoidEndAlign(DirectiveID.getLoc());
else if (IDVal == ".even")		else if (IDVal == ".even")
return parseDirectiveEven(DirectiveID.getLoc());		return parseDirectiveEven(DirectiveID.getLoc());
else if (IDVal == ".cv_fpo_proc")		else if (IDVal == ".cv_fpo_proc")
return parseDirectiveFPOProc(DirectiveID.getLoc());		return parseDirectiveFPOProc(DirectiveID.getLoc());
else if (IDVal == ".cv_fpo_setframe")		else if (IDVal == ".cv_fpo_setframe")
return parseDirectiveFPOSetFrame(DirectiveID.getLoc());		return parseDirectiveFPOSetFrame(DirectiveID.getLoc());
else if (IDVal == ".cv_fpo_pushreg")		else if (IDVal == ".cv_fpo_pushreg")
return parseDirectiveFPOPushReg(DirectiveID.getLoc());		return parseDirectiveFPOPushReg(DirectiveID.getLoc());
Show All 30 Lines	bool X86AsmParser::parseDirectiveArch() {
return false;		return false;
}		}

/// parseDirectiveNops		/// parseDirectiveNops
/// ::= .nops size[, control]		/// ::= .nops size[, control]
bool X86AsmParser::parseDirectiveNops(SMLoc L) {		bool X86AsmParser::parseDirectiveNops(SMLoc L) {
int64_t NumBytes = 0, Control = 0;		int64_t NumBytes = 0, Control = 0;
SMLoc NumBytesLoc, ControlLoc;		SMLoc NumBytesLoc, ControlLoc;
const MCSubtargetInfo STI = getSTI();
NumBytesLoc = getTok().getLoc();		NumBytesLoc = getTok().getLoc();
if (getParser().checkForValidSection() ||		if (getParser().checkForValidSection() ||
getParser().parseAbsoluteExpression(NumBytes))		getParser().parseAbsoluteExpression(NumBytes))
return true;		return true;

if (parseOptionalToken(AsmToken::Comma)) {		if (parseOptionalToken(AsmToken::Comma)) {
ControlLoc = getTok().getLoc();		ControlLoc = getTok().getLoc();
if (getParser().parseAbsoluteExpression(Control))		if (getParser().parseAbsoluteExpression(Control))
Show All 32 Lines	bool X86AsmParser::parseDirectiveEven(SMLoc L) {
}		}
if (Section->UseCodeAlign())		if (Section->UseCodeAlign())
getStreamer().emitCodeAlignment(2, 0);		getStreamer().emitCodeAlignment(2, 0);
else		else
getStreamer().emitValueToAlignment(2, 0, 1, 0);		getStreamer().emitValueToAlignment(2, 0, 1, 0);
return false;		return false;
}		}

		/// Directive for NeverAlign fragment testing, not for general usage!
		/// parseDirectiveAvoidEndAlign
		MaskRay: Are you using the assembly syntax `.avoid_end_align` in tools? Or is it for testing purpose only?
		Amir (author): It's only for testing purposes.
		MaskRay: Add a comment to make clear this is not intended for general usage.
		/// ::= .avoid_end_align alignment
		bool X86AsmParser::parseDirectiveAvoidEndAlign(SMLoc L) {
		int64_t Alignment = 0;
		SMLoc AlignmentLoc;
		MaskRay: Delete
		AlignmentLoc = getTok().getLoc();
		if (getParser().checkForValidSection() ||
		getParser().parseAbsoluteExpression(Alignment))
		return true;

		if (getParser().parseEOL("unexpected token in directive"))
		MaskRay: `if (parseEOL())`
		return true;

		MaskRay: No need to mention the directive name. The diagnostic uses caret to highlight to user input.
		if (Alignment <= 0)
		return Error(AlignmentLoc, "expected a positive alignment");

		MaskRay: expected a positive alignment
		getParser().getStreamer().emitNeverAlignCodeAtEnd(Alignment);
		MaskRay: ditto
		return false;
		skan: Could you add test cases for line 4743-4750?
		}

/// ParseDirectiveCode		/// ParseDirectiveCode
		MaskRay: The code is self-explanatory
		MaskRay: delete blank line
/// ::= .code16 | .code32 | .code64		/// ::= .code16 | .code32 | .code64
bool X86AsmParser::ParseDirectiveCode(StringRef IDVal, SMLoc L) {		bool X86AsmParser::ParseDirectiveCode(StringRef IDVal, SMLoc L) {
MCAsmParser &Parser = getParser();		MCAsmParser &Parser = getParser();
Code16GCC = false;		Code16GCC = false;
if (IDVal == ".code16") {		if (IDVal == ".code16") {
Parser.Lex();		Parser.Lex();
if (!is16BitMode()) {		if (!is16BitMode()) {
SwitchMode(X86::Mode16Bit);		SwitchMode(X86::Mode16Bit);
▲ Show 20 Lines • Show All 243 Lines • Show Last 20 Lines

llvm/test/MC/X86/directive-avoid_end_align.s

This file was added.

				# RUN: llvm-mc -triple=x86_64 -filetype=obj %s | llvm-objdump --no-show-raw-insn -d - | FileCheck %s
				# RUN: not llvm-mc -triple=x86_64 --defsym ERR=1 %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERR

				# avoid_end_align has no effect since test doesn't end at alignment boundary:
				.avoid_end_align 64
				# CHECK-NOT: nop
				testl %eax, %eax
				# CHECK: testl %eax, %eax
				je .LBB0

				.fill 58, 1, 0x00
				# NeverAlign followed by MCDataFragment:
				# avoid_end_align inserts nop because `test` would end at alignment boundary:
				.avoid_end_align 64
				# CHECK: 3e: nop
				testl %eax, %eax
				# CHECK-NEXT: 3f: testl %eax, %eax
				je .LBB0
				# CHECK-NEXT: 41: je
				.LBB0:
				retq

				.p2align 6
				.L0:
				.nops 57
				int3
				# NeverAlign followed by RelaxableFragment:
				.avoid_end_align 64
				# CHECK: ba: nop
				cmpl $(.L1-.L0), %eax
				# CHECK-NEXT: bb: cmpl
				je .L0
				# CHECK-NEXT: c1: je
				.nops 65
				.L1:

				###############################################################################
				# Experiment A:
				# Check that NeverAlign doesn't introduce infinite loops in layout.
				# Control:
				# 1. NeverAlign fragment is not added,
				# 2. Short formats of cmp and jcc are used (3 and 2 bytes respectively),
				# 3. cmp and jcc are placed so that they are split by a 64B alignment boundary.
				# 4. jcc would be relaxed to a longer format if at least one byte is added
				# between .L10 and je itself, e.g. by adding a NeverAlign padding byte,
				# or relaxing cmp instruction.
				# 5. cmp would be relaxed to a longer format if at least one byte is added
				# between .L11 and .L12, e.g. due to relaxing jcc instruction.
				.p2align 6
				# CHECK: 140: int3
				.fill 2, 1, 0xcc
				.L10:
				.nops 122
				int3
				# CHECK: 1bc: int3
				# no avoid_end_align here
				# CHECK-NOT: nop
				cmp $(.L12-.L11), %eax
				# CHECK: 1bd: cmpl
				.L11:
				je .L10
				# CHECK-NEXT: 1c0: je
				.nops 125
				.L12:

				# Experiment:
				# Same setup as control, except NeverAlign fragment is added before cmp.
				# Expected effect:
				# 1. NeverAlign pads cmp+jcc by one byte since cmp and jcc are split by a 64B
				# alignment boundary,
				# 2. This extra byte forces jcc relaxation to a longer format (Control rule #4),
				# 3. This results in a cmp relaxation (Control rule #5),
				# 4. Which in turn makes NeverAlign fragment unnecessary as cmp and jcc
				# are no longer split by an alignment boundary (cmp crosses the boundary).
				# 5. NeverAlign padding is removed.
				# 6. cmp and jcc instructions remain in relaxed form.
				# 7. Relaxation converges, layout succeeds.
				.p2align 6
				# CHECK: 240: int3
				.fill 2, 1, 0xcc
				.L20:
				.nops 122
				int3
				# CHECK: 2bc: int3
				.avoid_end_align 64
				# CHECK-NOT: nop
				cmp $(.L22-.L21), %eax
				# CHECK-NEXT: 2bd: cmpl
				.L21:
				je .L20
				# CHECK-NEXT: 2c3: je
				.nops 125
				.L22:

				###############################################################################
				# Experiment B: similar to exp A, but we check that once NeverAlign padding is
				# removed from the layout (exp A, experiment step 5), the increased distance
				# between the symbols L33 and L34 triggers the relaxation of instruction at
				# label L32.
				#
				# Control 1: using a one-byte instruction at L33 (site of NeverAlign) leads to
				# steps 2-3 of exp A, experiment:
				# 2. This extra byte forces jcc relaxation to a longer format (Control rule #4),
				# 3. This results in a cmp relaxation (Control rule #5),
				# => short cmp under L32
				.p2align 6
				# CHECK: 380: int3
				.fill 2, 1, 0xcc
				.L30:
				.nops 122
				int3
				# CHECK: 3fc: int3
				hlt
				#.avoid_end_align 64
				.L33:
				cmp $(.L32-.L31), %eax
				# CHECK: 3fe: cmpl
				.L31:
				je .L30
				# CHECK-NEXT: 404: je
				.nops 114
				.p2align 1
				int3
				int3
				# CHECK: 47c: int3
				.L34:
				.nops 9
				.L32:
				cmp $(.L33-.L34), %eax
				# CHECK: 487: cmp
				# note that the size of cmp is 48a-487 == 3 bytes (distance is exactly -128)
				int3
				# CHECK-NEXT: 48a: int3

				# Control 2: leaving out a byte at L43 (site of NeverAlign), plus
				# relaxed jcc and cmp leads to a relaxed cmp under L42 (-129 as cmp's immediate)
				.p2align 6
				# CHECK: 4c0: int3
				.fill 2, 1, 0xcc
				.L40:
				.nops 122
				int3
				# CHECK: 53c: int3
				# int3
				#.avoid_end_align 64
				.L43:
				cmp $(.L42-.L41+0x100), %eax
				# CHECK: 53d: cmpl
				.L41:
				je .L40+0x100
				# CHECK-NEXT: 543: je
				.nops 114
				.p2align 1
				int3
				int3
				# CHECK: 5bc: int3
				.L44:
				.nops 9
				.L42:
				cmp $(.L43-.L44), %eax
				# CHECK: 5c7: cmp
				# note that the size of cmp is 5cd-5c7 == 6 bytes (distance is exactly -129)
				int3
				# CHECK-NEXT: 5cd: int3

				# Experiment
				# Checking if removing NeverAlign padding at L53 as a result of alignment and
				# relaxation of cmp and jcc following it (see exp A), thus reproducing the case
				# in Control 2 (getting a relaxed cmp under L52), is handled correctly.
				.p2align 6
				# CHECK: 600: int3
				.fill 2, 1, 0xcc
				.L50:
				.nops 122
				int3
				# CHECK: 67c: int3
				.avoid_end_align 64
				.L53:
				# CHECK-NOT: nop
				cmp $(.L52-.L51), %eax
				# CHECK-NEXT: 67d: cmpl
				.L51:
				je .L50
				# CHECK-NEXT: 683: je
				.nops 114
				.p2align 1
				int3
				int3
				# CHECK: 6fc: int3
				.L54:
				.nops 9
				.L52:
				cmp $(.L53-.L54), %eax
				# CHECK: 707: cmp
				# note that the size of cmp is 70d-707 == 6 bytes (distance is exactly -129)
				int3
				# CHECK-NEXT: 70d: int3

				.ifdef ERR
				# ERR: {{.*}}.s:[[#@LINE+1]]:17: error: unknown token in expression
				.avoid_end_align
				# ERR: {{.*}}.s:[[#@LINE+1]]:18: error: expected absolute expression
				.avoid_end_align x
				# ERR: {{.*}}.s:[[#@LINE+1]]:18: error: expected a positive alignment
				.avoid_end_align 0
				# ERR: {{.*}}.s:[[#@LINE+1]]:20: error: unexpected token in directive
				.avoid_end_align 64, 0
				.endif
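
The "size of cmp" notes in Experiment B depend on whether the label difference fits in a signed 8-bit immediate. The following is a small sketch of that arithmetic, assuming the 3-byte short and 6-byte relaxed cmp sizes quoted in the test comments. `fitsInt8Sketch` and `cmpImmSizeSketch` are hypothetical helper names, and the addresses are the ones implied by the CHECK lines above.

```cpp
#include <cstdint>

// True if V is representable as a signed 8-bit immediate.
constexpr bool fitsInt8Sketch(int64_t V) { return V >= -128 && V <= 127; }

// Size in bytes of `cmp $imm, %eax` as observed in the test: 3 bytes for the
// short (imm8) form, 6 bytes for the relaxed (imm32) form. These sizes are
// taken from the test comments, not derived from an encoder.
constexpr unsigned cmpImmSizeSketch(int64_t Imm) {
  return fitsInt8Sketch(Imm) ? 3 : 6;
}

// Control 1: .L33 = 0x3fe, .L34 = 0x47e, distance -128, short cmp.
static_assert(cmpImmSizeSketch(0x3fe - 0x47e) == 3, "0x48a - 0x487");
// Control 2: .L43 = 0x53d, .L44 = 0x5be, distance -129, relaxed cmp.
static_assert(cmpImmSizeSketch(0x53d - 0x5be) == 6, "0x5cd - 0x5c7");
// Experiment: .L53 = 0x67d, .L54 = 0x6fe, distance -129, relaxed cmp.
static_assert(cmpImmSizeSketch(0x67d - 0x6fe) == 6, "0x70d - 0x707");
```

A single byte of NeverAlign padding before `.L33`/`.L53` is therefore enough to flip the later cmp between its short and relaxed forms, which is what Experiment B is designed to probe.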