This is an archive of the discontinued LLVM Phabricator instance.

Differential D72225

Align branches within 32-Byte boundary(Prefix padding)
Abandoned · Public

Authored by skan on Jan 5 2020, 4:42 AM.

Details

Reviewers

annita.zhang
LuoYuanke
craig.topper
jyknight
reames
MaskRay
tstellar
chandlerc

Summary

This patch is an enhanced version of D70157.

When -x86-align-branch-prefix-size is 0 (the default), it uses NOP padding.
When -x86-align-branch-prefix-size is 1~5, it uses prefix padding.

The prefix padding strategy:

First we try to add segment prefixes to the instructions before a branch.
If there is not enough room to add segment prefixes, a NOP is inserted before the branch.
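
The strategy above can be sketched roughly as follows. This is only an illustration, not the patch's code: `PaddingPlan` and `planPadding` are hypothetical names, and handing out prefix bytes to the earliest instruction first is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type, not an LLVM class.
struct PaddingPlan {
  std::vector<uint8_t> PrefixBytes; // extra prefix bytes per preceding instruction
  uint64_t NopBytes;                // bytes left over for a NOP before the branch
};

// Spread `Needed` padding bytes over the `NumInsts` instructions that precede
// a branch. Each instruction absorbs at most `MaxPrefixSize` prefix bytes (the
// role played by -x86-align-branch-prefix-size); the remainder becomes a NOP.
PaddingPlan planPadding(uint64_t Needed, std::size_t NumInsts,
                        uint8_t MaxPrefixSize) {
  PaddingPlan Plan{std::vector<uint8_t>(NumInsts, 0), 0};
  for (std::size_t I = 0; I < NumInsts && Needed > 0; ++I) {
    uint8_t Take =
        static_cast<uint8_t>(std::min<uint64_t>(Needed, MaxPrefixSize));
    Plan.PrefixBytes[I] = Take;
    Needed -= Take;
  }
  Plan.NopBytes = Needed; // with MaxPrefixSize == 0 everything ends up here
  return Plan;
}
```

With a prefix size of 0 the whole amount falls through to `NopBytes`, which corresponds to the NOP-only mode described in the summary.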

Event Timeline

skan created this revision. Jan 5 2020, 4:42 AM

Herald added a project: Restricted Project. Jan 5 2020, 4:42 AM

Herald added subscribers: llvm-commits, hiraditya.

skan added a parent revision: D72047: Add an interface emitPrefix for MCCodeEmitter. Jan 5 2020, 4:42 AM

skan added a child revision: D72227: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries. Jan 5 2020, 5:45 AM

skan updated this revision to Diff 236281. Jan 5 2020, 6:28 PM

skan updated this revision to Diff 236283. Jan 5 2020, 6:53 PM

Rebase

skan added a subscriber: andrew.w.kaylor. Jan 6 2020, 6:20 PM

skan updated this revision to Diff 236526. Jan 6 2020, 11:08 PM

skan edited the summary of this revision.

skan mentioned this in D72291: Reimplement BoundaryAlign mechanism (mostly, but not quite NFC). Jan 7 2020, 12:26 AM

Refine help information.

skan added reviewers: tstellar, chandlerc. Jan 7 2020, 6:30 PM

Rebase

skan updated this revision to Diff 237007. Jan 9 2020, 3:46 AM

LuoYuanke added inline comments. Jan 9 2020, 4:09 AM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
609	I notice that line 623 also checks for MCAlignFragment. Would it be better to check for MCAlignFragment at the beginning and return without inserting any MCBoundaryAlignFragment if the previous fragment is an MCAlignFragment? The code that follows could then assume the previous fragment is not an MCAlignFragment.

skan updated this revision to Diff 237024. Jan 9 2020, 5:05 AM

skan updated this revision to Diff 237027. Jan 9 2020, 5:10 AM

skan marked an inline comment as done. Jan 9 2020, 5:14 AM

skan added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
609	We shouldn't do that. Only a NOP must not be emitted after an `MCAlignFragment`, since the `MCAlignFragment` is meant to align the branch or the fused pair rather than the NOP. A prefix, however, can be emitted after an `MCAlignFragment`, since it is part of the instruction.

First set of purely minor stylistic comments as I start to get familiar with the code.

llvm/include/llvm/MC/MCAsmBackend.h
59 ↗	(On Diff #237027)	This looks like it can be combined w/the allowAutoPadding function which I recently added.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
443	This looks to be a formatting change? If so, remove. You can commit this without separate review if desired.

reames added inline comments. Jan 9 2020, 1:33 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
34	Remove the reordering here.
80–85	a) segment prefixes aren't the only ones used are they? b) the wording changes can be pulled into their own review (or simply committed)
495	Please define an enum which gives a symbolic name for these values (if there isn't already one). (i.e. what the heck are these integer constant values?)

I'm having a really hard time wrapping my head around the implementation. Can you give a high level summary of the usage of BoundaryAlign after this change? A couple of examples w/all the fragments written out might also help a lot.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
477	This should probably be: if (needAlign(Inst)) return false;
560	Please remove this. It should be covered by the compiler support patch which already landed for compiled code, and the assembler syntax is separate.
569	Please assert that the total number of prefixes fits within a uint8_t. It does, but having that explicitly asserted/noted would be helpful.

MaskRay added inline comments. Jan 9 2020, 6:24 PM

llvm/include/llvm/MC/MCFragment.h
538	You can keep `EmitNops` above. The first 2 bytes of `MCBoundaryAlignFragment` and the tail of `MCFragment` shared the same word.
548	Drop `\name Accessors`. It is not useful.
561	I find this method confusing. Update the call sites to use `hasEmitNops() || hasValue()`.
llvm/lib/MC/MCAssembler.cpp
1005	Is it guaranteed `F->getPrevNode()` will not be executed on the first Fragment?
1009	`AlignedOffset` can be defined after AlignedSize is computed.
1016	FixedValue doesn't have to be an immediately invoked function expression.
1026–1036	`if (NewSize == BF.getSize())`
1027	`*(xxx)` -> `xxx->`
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
167	`AlignMaxPrefixSize = X86AlignBranchPrefixSize;` The error checking and normalization (to 5) should be done in an earlier place.
llvm/test/MC/X86/align-branch-64-2d.s
2	When writing tests, make sure llvm-mc and GNU as emit jmp of the same length. There are differences (D72197).
llvm/test/MC/X86/align-branch-64-8a.s
2	Use `##` for comments. `x86_64-unknown-unknown` can be simplified to `x86_64` (the default is ELF).

skan updated this revision to Diff 237242. Jan 9 2020, 10:25 PM

skan marked 19 inline comments as done.

skan added inline comments.

llvm/lib/MC/MCAssembler.cpp
1009	Sorry, I didn't get your point.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
80–85	If there is no room to insert prefix, NOP will be emitted before the branch or fused pair.
167	Do you have any suggestions for an appropriate, earlier place?
495	It is the exact value to be emitted when it has corresponding segment prefix. See `X86MCCodeEmitter::emitSegmentOverridePrefix`
llvm/test/MC/X86/align-branch-64-8a.s
2	I prefer to use `x86_64-unknown-unknown`, it seems more clear to me.
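
For the request above to give the integer constants symbolic names, an enum along the following lines would be one option. The enumerator and function names are hypothetical and not taken from the patch; the byte values are the standard x86 segment-override prefix encodings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the values are the x86 segment-override prefix bytes.
enum SegmentPrefix : uint8_t {
  ES_Prefix = 0x26,
  CS_Prefix = 0x2e,
  SS_Prefix = 0x36,
  DS_Prefix = 0x3e,
  FS_Prefix = 0x64,
  GS_Prefix = 0x65,
};

// Returns true if Byte is one of the six segment-override prefixes.
bool isSegmentPrefix(uint8_t Byte) {
  switch (Byte) {
  case ES_Prefix:
  case CS_Prefix:
  case SS_Prefix:
  case DS_Prefix:
  case FS_Prefix:
  case GS_Prefix:
    return true;
  default:
    return false;
  }
}
```

Named constants like these would let the padding code say which prefix it emits without the reader having to look up the encoding.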

skan marked an inline comment as done. Jan 9 2020, 11:35 PM

skan added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
560	Removing this will cause test align-branch-32-2a.s to fail. This check is simple and direct, so I think we can keep it for now.

skan marked an inline comment as done. Jan 9 2020, 11:51 PM

skan added inline comments.

llvm/test/MC/X86/align-branch-64-2d.s
2	As I understand it, you are referring to the relocation of call? I will update the tests once D72197 has landed.

skan updated this revision to Diff 237250. Jan 9 2020, 11:55 PM

skan mentioned this in D72463: [Driver][X86] Add -malign-branch* and -mbranches-within-32B-boundaries. Jan 10 2020, 12:46 AM

skan updated this revision to Diff 237482. Jan 10 2020, 9:14 PM

Summary of the usage of MCBoundaryAlignFragment:

As I commented in MCFragment.h, MCBoundaryAlignFragment is a placeholder fragment used to emit NOPs or values to align a set of fragments within a specific boundary. In this application, the value is a segment prefix. For example, let's say the code is

pushl  %ebp
pushl  %edi
je  .L_0

Data1 holds pushl %ebp, Data2 holds pushl %edi, Relax1 holds je .L_0. BoundaryAlign1 and BoundaryAlign2 are used to emit prefix, BoundaryAlign3 is used to emit NOP.

2. Determine the range of the fragments needed to be aligned
MCBoundaryAlignFragment is designed to align a set of fragments within the same section as the BoundaryAlign. BoundaryAlign1~BoundaryAlign3 are used to align fragment Relax1. Each BoundaryAlign has a member called LastFragment, which marks the end of the set of fragments. If we call the nearest MCBoundaryAlignFragment before LastFragment the NBBF (short for nearest backward BoundaryAlign fragment), then the set of fragments to be aligned is (NBBF, LastFragment]. In this example, LastFragment is Relax1 and the NBBF is BoundaryAlign3.
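
A minimal sketch of how the (NBBF, LastFragment] range could be located, assuming the fragments of a section are held in a plain vector. `FragKind`, `findNBBF` and `alignedRangeLength` are hypothetical names used only for illustration, and the fragment order BoundaryAlign1, Data1, BoundaryAlign2, Data2, BoundaryAlign3, Relax1 is inferred from the description rather than stated explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the fragment classes involved.
enum class FragKind { Data, Relaxable, Align, BoundaryAlign };

// Index of the nearest BoundaryAlign strictly before Last, or -1 if none.
int findNBBF(const std::vector<FragKind> &Frags, int Last) {
  for (int I = Last - 1; I >= 0; --I)
    if (Frags[static_cast<std::size_t>(I)] == FragKind::BoundaryAlign)
      return I;
  return -1;
}

// Number of fragments in the half-open range (NBBF, Last].
int alignedRangeLength(const std::vector<FragKind> &Frags, int Last) {
  return Last - findNBBF(Frags, Last);
}
```

For the example sequence the range contains only Relax1. When another data fragment sits between the last BoundaryAlign and the branch, the range grows to two fragments.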

3. Relax the MCBoundaryAlignFragment
3.1 Prerequisites
Before relaxation, we should guarantee that the BoundaryAlign fragment is in the same section as (NBBF, LastFragment], and that each non-MCBoundaryAlignFragment in (BoundaryAlign, LastFragment] has a fixed size after a finite number of relaxation iterations. For example, if the code is

pushl  %ebp
.align 16
pushl  %edi
je  .L_0

Align1 is an MCAlignFragment that can grow and shrink; it is not guaranteed to have a fixed size after a finite number of relaxation iterations, so BoundaryAlign1's LastFragment should be NULL.
3.2 How to relax
Let's go back to the code

pushl  %ebp
pushl  %edi
je  .L_0

LastFragment = Relax1, NBBF = BoundaryAlign3.

The fragments are always relaxed from left to right: in each iteration, BoundaryAlign1 is relaxed first, then Data1, and then BoundaryAlign2. When we relax a BoundaryAlign, we assume that all the BoundaryAlign fragments in [this, LastFragment) have size 0. For example, let's say the sizes of BoundaryAlign1 and BoundaryAlign2 are each limited to 1, the size of BoundaryAlign3 is not limited, and the boundary size is 32.
In the first iteration, before Relax1 is relaxed, Relax1's offset is 0x1e and its size is 2, so Relax1 needs 2 bytes of padding.

After BoundaryAlign3 is relaxed, the sequence will be

In the second iteration, before Relax1 is relaxed, Relax1's offset may become 0x1a and its size 6.

When relaxing BoundaryAlign1, we assume Relax1's offset is 0x18 (0x1a - 1 - 1), so Relax1 needs 0 bytes of padding. After BoundaryAlign3 is relaxed, the sequence will be

Note
If a fused jcc needs to be aligned, 2 and 3 are the same as above; only 1 is slightly different. When the code is

pushl  %ebp
pushl  %edi
cmp  %eax, %ebp
je  .L_0

Data1 holds pushl %ebp, Data2 holds pushl %edi, Data3 holds cmp %eax, %ebp, Relax1 holds je .L_0. BoundaryAlign1 and BoundaryAlign2 are used to emit prefix, BoundaryAlign3 is used to emit NOP. LastFragment = Relax1, NBBF = BoundaryAlign3.
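
The padding arithmetic used in the walkthrough (offset 0x1e with size 2 needs 2 bytes; offset 0x18 with size 6 needs none) can be reproduced with a small sketch. The function names are hypothetical. Treating a branch that ends exactly on a boundary as needing padding is inferred from the 0x1e example; it is not quoted from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if the bytes [Offset, Offset + Size) span a Boundary-aligned address.
bool crossesBoundary(uint64_t Offset, uint64_t Size, uint64_t Boundary) {
  return Size != 0 && Offset / Boundary != (Offset + Size - 1) / Boundary;
}

// True if the last byte sits directly in front of a boundary.
bool endsAtBoundary(uint64_t Offset, uint64_t Size, uint64_t Boundary) {
  return Size != 0 && (Offset + Size) % Boundary == 0;
}

// Bytes needed to push the start of the range to the next boundary,
// or 0 if the range is already fine where it is.
uint64_t paddingNeeded(uint64_t Offset, uint64_t Size, uint64_t Boundary) {
  if (!crossesBoundary(Offset, Size, Boundary) &&
      !endsAtBoundary(Offset, Size, Boundary))
    return 0;
  return (Boundary - Offset % Boundary) % Boundary;
}

// Offset of the branch when the given BoundaryAlign fragments are assumed
// to be empty, as done while relaxing an earlier BoundaryAlign.
uint64_t offsetAssumingEmpty(uint64_t Offset,
                             const std::vector<uint64_t> &BoundaryAlignSizes) {
  for (uint64_t S : BoundaryAlignSizes)
    Offset -= S;
  return Offset;
}
```

Calling `offsetAssumingEmpty(0x1a, {1, 1})` gives 0x18, and `paddingNeeded(0x18, 6, 32)` then gives 0, matching the second iteration described above.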

Rebase after D72197 is landed.

The prefix insertion logic is scattered (alignBranchesBegin, alignBranchesEnd, and relaxBoundaryAlign), though they are tightly coupled and share a fair amount of information. I am thinking whether placing the logic in one place will remove some abstraction and make the overall logic simpler to understand.

MCBoundaryAlignFragment is currently relaxed in the loop as MCRelaxableFragment. MCBoundaryAlignFragments are pre-allocated.

If we don't require MCBoundaryAlignFragment to be processed in the same loop as other fragments, we can (1) remove pre-allocation of MCBoundaryAlignFragment, (2) lay out everything (except MCBoundaryAlignFragment), and (3) loop over fragments and compute the candidate placement points of MCBoundaryAlignFragment. When we find a jcc which requires padding, pick the candidates on its left and place padding. The candidates can be maintained as a sliding window. Whenever we encounter an MCAlignFragment, clear the sliding window. After handling a jcc which requires padding, clear the sliding window as well.
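
The single-pass idea proposed here might look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: `Frag`, `Placement` and `placePadding` are hypothetical, all sizes are assumed to be non-zero, and the window is cleared at every jcc, which is slightly more conservative than clearing it only after a padded one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Kind { Inst, Align, Jcc };

// Size is the encoded length for Inst/Jcc and the alignment for Align.
struct Frag {
  Kind K;
  uint64_t Size;
};

struct Placement {
  std::size_t Branch;                  // index of the jcc that needs padding
  uint64_t Padding;                    // bytes required in front of it
  std::vector<std::size_t> Candidates; // earlier instructions that may absorb them
};

std::vector<Placement> placePadding(const std::vector<Frag> &Frags,
                                    uint64_t Boundary) {
  std::vector<Placement> Out;
  std::vector<std::size_t> Window; // sliding window of candidate points
  uint64_t Offset = 0;
  for (std::size_t I = 0; I < Frags.size(); ++I) {
    const Frag &F = Frags[I];
    if (F.K == Kind::Align) {
      Offset = (Offset + F.Size - 1) / F.Size * F.Size; // round up
      Window.clear(); // padding must not be moved across an alignment
      continue;
    }
    if (F.K == Kind::Jcc) {
      uint64_t End = Offset + F.Size;
      bool Crosses = Offset / Boundary != (End - 1) / Boundary;
      bool EndsOnBoundary = End % Boundary == 0;
      if (Crosses || EndsOnBoundary) {
        uint64_t Pad = (Boundary - Offset % Boundary) % Boundary;
        Out.push_back(Placement{I, Pad, Window});
        Offset += Pad;
      }
      Window.clear(); // conservative: start afresh after every jcc
    } else {
      Window.push_back(I);
    }
    Offset += F.Size;
  }
  return Out;
}
```

Each `Placement` only records where padding is needed and which instructions could take it; how the bytes are split between prefixes and NOPs is left open.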

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
611	You can remove `PF->setAlignment(AlignBoundary);` here.
617	And here.
645	If you place `F->setAlignment(AlignBoundary)` here, you can avoid 2 setAlignment calls in `X86AsmBackend::alignBranchesBegin`.
662	It seems unnecessary to move it from alignBranchesBegin here.

skan marked 9 inline comments as done. Jan 13 2020, 12:44 AM

skan added inline comments.

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
611	The `MCBoundaryAlignFragment` may not emit anything, so its constructor doesn't set `AlignBoundary` to the value provided by the user. The data member `AlignBoundary` will not be set correctly if `PF->setAlignment(AlignBoundary)` is removed here.
617	I don't think we can remove it.
645	If an `MCBoundaryAlignFragment` is not used to emit anything, I prefer that it doesn't set the alignment. If we place `F->setAlignment(AlignBoundary)` here, the `setAlignment(AlignBoundary)` call is unnecessary for those `MCBoundaryAlignFragment`s that will not emit anything.
662	The purpose of moving `PrevInst = Inst` here is to keep the early return in `alignBranchesBegin` simple. Otherwise, we need to write `PrevInst = Inst; return` in `alignBranchesBegin` rather than `return`.

LuoYuanke added inline comments. Jan 13 2020, 1:01 AM

llvm/lib/MC/MCAssembler.cpp
616	Why do we declare Value as int64_t in MCAlignFragment? It seems only 1 byte is needed. Or should we add assert(BF.getValueSize() == 1)?
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
612	Just to confirm that there is no MCBoundaryAlignFragment between the 2 macro fusion instructions. Right?

In D72225#1816298, @MaskRay wrote:

The prefix insertion logic is scattered (alignBranchesBegin, alignBranchesEnd, and relaxBoundaryAlign), though they are tightly coupled and share a fair amount of information. I am thinking whether placing the logic in one place will remove some abstraction and make the overall logic simpler to understand.

MCBoundaryAlignFragment is currently relaxed in the loop as MCRelaxableFragment. MCBoundaryAlignFragments are pre-allocated.

If we don't require MCBoundaryAlignFragment to be processed in the same loop as other fragments, we can (1) remove pre-allocation of MCBoundaryAlignFragment, (2) lay out everything (except MCBoundaryAlignFragment), and (3) loop over fragments and compute the candidate placement points of MCBoundaryAlignFragment. When we find a jcc which requires padding, pick the candidates on its left and place padding. The candidates can be maintained as a sliding window. Whenever we encounter an MCAlignFragment, clear the sliding window. After handling a jcc which requires padding, clear the sliding window as well.

Although the data structure used to hold MCFragment is a doubly linked list, which means we can insert a fragment anywhere, MCObjectStreamer only has the interface insert(), which inserts a fragment at the end of the list. It may be hard to add an interface for MCObjectStreamer to insert a fragment anywhere safely; that's the reason why I preallocate the MCBoundaryAlignFragment.

If you have interest in designing such an interface for MCObjectStreamer and placing the logic in one place, you can reimplement the Prefix Padding after the patch is landed, I will be very glad to help you review that patch.

skan marked 4 inline comments as done. Jan 13 2020, 1:17 AM

skan added inline comments.

llvm/lib/MC/MCAssembler.cpp
616	I didn't declare `Value` as `int64_t`; I declared it as `Optional<uint8_t>`.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
612	Right, there is none.

LuoYuanke added inline comments. Jan 13 2020, 2:57 AM

llvm/lib/MC/MCAssembler.cpp
616	You are right.

The patch LGTM. I'd like to see the patch land before the llvm release.

This revision is now accepted and ready to land. Jan 13 2020, 3:02 AM

craig.topper added inline comments. Jan 13 2020, 11:31 AM

llvm/lib/MC/MCAssembler.cpp
1029	This 15 limit is X86 specific. Seems weird to have it mentioned in a target independent file. Can we abstract this somehow? It can probably happen after the branch.
1033	Same with this 15.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
492	I think this loop picks up instructions that don't use segment registers for just memory. It also probably picks the wrong prefix for the instruction `mov %fs:0x1, %ss`: the loop will find the %ss destination register first, but %fs is the correct segment to use.
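
On the 15-byte limit raised in the MCAssembler.cpp comments above: one way to keep the constant inside the target would be a small helper like the sketch below. The names `MaxX86InstLength` and `prefixRoom` are hypothetical; the 15-byte cap on an encoded x86 instruction, prefixes included, is architectural.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// x86 limits an encoded instruction, prefixes included, to 15 bytes.
constexpr unsigned MaxX86InstLength = 15;

// How many padding prefixes an instruction can still take.
// EncodedLength already counts any prefixes the instruction carries.
unsigned prefixRoom(unsigned EncodedLength, unsigned MaxPrefixSize) {
  if (EncodedLength >= MaxX86InstLength)
    return 0;
  return std::min(MaxPrefixSize, MaxX86InstLength - EncodedLength);
}
```

A 9-byte instruction with a prefix size of 5 can take all 5 bytes, while a 12-byte instruction can only take 3.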

In D72225#1816594, @LuoYuanke wrote:

The patch LGTM. I'd like to see the patch land before the llvm release.

We can play it safe. NOP padding is available which should achieve most of the mitigation benefits (http://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html hw vs hw_sw_nop). This patch recovers a bit more of the lost performance at the risk of some more uncertainty. I can remove -mllvm -x86-align-branch-prefix-size= from D72463 and commit it (if it gets approved) before the 10.0.0 branch date (2020-01-15). We should definitely speed up reviewing this patch and make it available soon, so that downstream users can play with -mbranches-within-32B-boundaries.

If you have interest in designing such an interface for MCObjectStreamer and placing the logic in one place, you can reimplement the Prefix Padding after the patch is landed, I will be very glad to help you review that patch.

Thanks. I'll definitely think more in this area.

@skan @craig.topper This patch introduces an unknown bug. I applied Diff 15 (https://reviews.llvm.org/D72225?id=237569) + D72463 (clangDriver) + a local patch which enables -mbranches-within-32B-boundaries by default, then randomly picked 1000 tests and 40 failed. I will try to provide a reproducer.

NOP padding alone seems good.

In D72225#1818149, @MaskRay wrote:

@skan @craig.topper This patch introduces an unknown bug. I applied Diff 15 (https://reviews.llvm.org/D72225?id=237569) + D72463 (clangDriver) + a local patch which enables -mbranches-within-32B-boundaries by default, then randomly picked 1000 tests and 40 failed. I will try to provide a reproducer.

NOP padding alone seems good.

Could you reproduce the failure with this patch only? Applying three patches together makes things complicated and cannot prove there is something wrong. I am glad to wait half a day for the reproducer.

In D72225#1818510, @skan wrote:

Could you reproduce the failure with this patch only? Applying three patches together makes things complicated and cannot prove there is something wrong. I am glad to wait half a day for the reproducer.

-mllvm -x86-align-branch-boundary=32 -mllvm -x86-align-branch=fused,jcc,jmp -mllvm -x86-align-branch-prefix-size=0 => pass (NOP padding only)
-mllvm -x86-align-branch-boundary=32 -mllvm -x86-align-branch=fused,jcc,jmp -mllvm -x86-align-branch-prefix-size=5 => fail (this patch)

How can I reproduce with this patch only? Without a clangDriver patch, the code path added in this patch is not exercised and surely nothing breaks.

As I explained earlier, we already have NOP padding (it passes for our internal 1000 tests). With part of https://reviews.llvm.org/D72463, we have something decent enough to ship for clang 10.0.0 (the difference between NOP padding alone and this patch is tiny). I understand that you are eager to ship the full feature for clang 10.0.0, but IMHO it is not very safe.

In D72225#1818531, @MaskRay wrote:

-mllvm -x86-align-branch-boundary=32 -mllvm -x86-align-branch=fused,jcc,jmp -mllvm -x86-align-branch-prefix-size=0 => pass (NOP padding only)

-mllvm -x86-align-branch-boundary=32 -mllvm -x86-align-branch=fused,jcc,jmp -mllvm -x86-align-branch-prefix-size=5 => fail (this patch)

How can I reproduce with this patch only? Without a clangDriver patch, the code path added in this patch is not exercised and surely nothing breaks.

As I explained earlier, we already have NOP padding (it passes for our internal 1000 tests). With part of https://reviews.llvm.org/D72463, we have something decent enough to ship for clang 10.0.0 (the difference between NOP padding alone and this patch is tiny). I understand that you are eager to ship the full feature for clang 10.0.0, but IMHO it is not very safe.

Passing the options -mllvm -x86-align-branch-boundary=32 -mllvm -x86-align-branch=fused+jcc+jmp -mllvm -x86-align-branch-prefix-size=5 doesn't need any driver patch.

skan updated this revision to Diff 237832. Jan 13 2020, 7:07 PM

skan marked an inline comment as done.

craig.topper added inline comments. Jan 13 2020, 7:19 PM

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
588	limted->limited peformance->performance
610	Your emitPrefix function includes the 2 byte and 3 byte VEX prefixes, the 4 byte EVEX prefix, and the 3 byte XOP prefix. The bytes after the leading byte of those prefixes can be any byte value and do not indicate a segment value.

skan updated this revision to Diff 237847. Jan 13 2020, 9:25 PM

skan marked 2 inline comments as done.

Reorder the checks in CanReuseDataFragment: check isBundlingEnabled before allowAutoPadding().

skan updated this revision to Diff 237851. Jan 13 2020, 10:13 PM

Here is one failure.

--x86-align-branch-prefix-size=0

    3444eca6:	64 48 8b 04 25 00 00 	mov    %fs:0x0,%rax
    3444ecad:	00 00 
    3444ecaf:	48 8d 80 60 f7 ff ff 	lea    -0x8a0(%rax),%rax
    3444ecb6:	83 78 04 00          	cmpl   $0x0,0x4(%rax)
    3444ecba:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
    3444ecc0:	0f 88 08 02 00 00    	js     3444eece

--x86-align-branch-prefix-size=5

    3444fe66:	2e 2e 2e 2e 2e 64 48 	cs cs cs cs cs mov %fs:0x0,%rax
    3444fe6d:	8b 04 25 00 00 00 00 
    3444fe74:	48 8d 80 60 f7 ff ff 	lea    -0x8a0(%rax),%rax
    3444fe7b:	00 83 78 04 00 0f    	add    %al,0xf000478(%rbx)   ###### incorrect
    3444fe81:	88 08                	mov    %cl,(%rax)
    3444fe83:	02 00                	add    (%rax),%al
    3444fe85:	00 48 8d             	add    %cl,-0x73(%rax)
    3444fe88:	05 73 b4 44 01       	add    $0x144b473,%eax

I still suggest we ship something safer. If you think D72463 is acceptable, I can delete line 2022 (AlignBranchPrefixSize = 5;), and we can ship just NOP padding for clang 10.0.
https://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html NOP padding only has smaller code size increase and is good enough for the mitigation purposes.

In D72225#1818812, @MaskRay wrote:
Here is one failure.
--x86-align-branch-prefix-size=0

    3444eca6:	64 48 8b 04 25 00 00 	mov    %fs:0x0,%rax
    3444ecad:	00 00 
    3444ecaf:	48 8d 80 60 f7 ff ff 	lea    -0x8a0(%rax),%rax
    3444ecb6:	83 78 04 00          	cmpl   $0x0,0x4(%rax)
    3444ecba:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
    3444ecc0:	0f 88 08 02 00 00    	js     3444eece

--x86-align-branch-prefix-size=5

    3444fe66:	2e 2e 2e 2e 2e 64 48 	cs cs cs cs cs mov %fs:0x0,%rax
    3444fe6d:	8b 04 25 00 00 00 00 
    3444fe74:	48 8d 80 60 f7 ff ff 	lea    -0x8a0(%rax),%rax
    3444fe7b:	00 83 78 04 00 0f    	add    %al,0xf000478(%rbx)   ###### incorrect
    3444fe81:	88 08                	mov    %cl,(%rax)
    3444fe83:	02 00                	add    (%rax),%al
    3444fe85:	00 48 8d             	add    %cl,-0x73(%rax)
    3444fe88:	05 73 b4 44 01       	add    $0x144b473,%eax
I still suggest we ship something safer. If you think D72463 is acceptable, I can delete line 2022 (AlignBranchPrefixSize = 5;), and we can ship just NOP padding for clang 10.0.
https://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html NOP padding only has smaller code size increase and is good enough for the mitigation purposes.

Could you provide the corresponding assembly code?

You cannot prepend prefixes to callq __tls_get_addr (General-Dynamic/Local-Dynamic TLS models). The code sequence is specially organized to allow linker relaxation. Prepending prefixes may cause the linker to mis-relax the code sequence.

--x86-align-branch-prefix-size=0

    4660: 0f 84 df 01 00 00            	je	479 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x295>
    4666: 66 48 8d 3d 00 00 00 00      	leaq	(%rip), %rdi
		000000000000466a:  R_X86_64_TLSGD	__rseq_abi-0x4
    466e: 66 66 48 e8 00 00            	callw	0 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0xc4>
		0000000000004672:  R_X86_64_PLT32	__tls_get_addr-0x4
    4674: 00 00                        	addb	%al, (%rax)
    4676: 83 78 04 00                  	cmpl	$0, 4(%rax)
    467a: 66 0f 1f 44 00 00            	nopw	(%rax,%rax)
    4680: 0f 88 08 02 00 00            	js	520 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x2de>

--x86-align-branch-prefix-size=5

    4660:	0f 84 df 01 00 00    	je     4845 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x295>
    4666:	2e 2e 2e 2e 2e 66 48 	cs cs cs cs data16 lea %cs:0x0(%rip),%rdi        # 4673 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0xc3>
    466d:	8d 3d 00 00 00 00 
			466f: R_X86_64_TLSGD	__rseq_abi-0x4
    4673:	2e 66 66 48 e8 00 00 	cs data16 data16 callq 467c <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0xcc>
    467a:	00 00 
			4678: R_X86_64_PLT32	__tls_get_addr-0x4
    467c:	83 78 04 00          	cmpl   $0x0,0x4(%rax)
    4680:	0f 88 08 02 00 00    	js     488e <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x2de>

Note, with -fno-plt, clang will emit calll *___tls_get_addr@GOT(%ebx) (32-bit) callq *__tls_get_addr@GOTPCREL(%rip) (64-bit). prefix-size= cannot alter such instructions, either.

I'll try investigating other issues in a few hours (>9).

In D72225#1818849, @skan wrote:

In D72225#1818812, @MaskRay wrote:

I still suggest we ship something safer. If you think D72463 is acceptable, I can delete line 2022 (AlignBranchPrefixSize = 5;), and we can ship just NOP padding for clang 10.0.
https://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html NOP padding only has smaller code size increase and is good enough for the mitigation purposes.

If you think NOP padding is safer, I suggest we make the general option -mbranches-within-32B-boundaries equivalent to -malign-branch=fused,jcc,jmp -malign-branch-boundary=32 -malign-branch-prefix-size=0.

In D72225#1818906, @MaskRay wrote:

--x86-align-branch-prefix-size=0

    4660: 0f 84 df 01 00 00            	je	479 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x295>
    4666: 66 48 8d 3d 00 00 00 00      	leaq	(%rip), %rdi
		000000000000466a:  R_X86_64_TLSGD	__rseq_abi-0x4
    466e: 66 66 48 e8 00 00            	callw	0 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0xc4>
		0000000000004672:  R_X86_64_PLT32	__tls_get_addr-0x4
    4674: 00 00                        	addb	%al, (%rax)
    4676: 83 78 04 00                  	cmpl	$0, 4(%rax)
    467a: 66 0f 1f 44 00 00            	nopw	(%rax,%rax)
    4680: 0f 88 08 02 00 00            	js	520 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x2de>

--x86-align-branch-prefix-size=5

    4660:	0f 84 df 01 00 00    	je     4845 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x295>
    4666:	2e 2e 2e 2e 2e 66 48 	cs cs cs cs data16 lea %cs:0x0(%rip),%rdi        # 4673 <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0xc3>
    466d:	8d 3d 00 00 00 00 
			466f: R_X86_64_TLSGD	__rseq_abi-0x4
    4673:	2e 66 66 48 e8 00 00 	cs data16 data16 callq 467c <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0xcc>
    467a:	00 00 
			4678: R_X86_64_PLT32	__tls_get_addr-0x4
    467c:	83 78 04 00          	cmpl   $0x0,0x4(%rax)
    4680:	0f 88 08 02 00 00    	js     488e <_ZN12_GLOBAL__N_116do_free_no_hooksEPv+0x2de>

Note, with -fno-plt, clang will emit calll *___tls_get_addr@GOT(%ebx) (32-bit) callq *__tls_get_addr@GOTPCREL(%rip) (64-bit). prefix-size= cannot alter such instructions, either.

As far as I know, a TLSCALL must have a variant symbol, e.g. call ___tls_get_addr@PLT, call *___tls_get_addr@GOT(%ecx). The patch does not prepend prefixes to an instruction with a variant symbol, which is guaranteed by the function X86AsmBackend::shouldAddPrefix(). We can check that with this test case:

  .text
  .globl  foo
  .p2align  4
foo:
  .rept 5
  call    ___tls_get_addr@PLT
  .endr
  cmp     %eax, %ebp
  je      foo

Did I miss any TLSCALL?

Add one test case, align-branch-64-9a.s, to show that a prefix is not prepended to an instruction that has a variant symbol operand (e.g. TLSCALL).

Make sure prefixes are not prepended to any CALL, JUMP, or RET.

I have serious reservations about the rush to land this patch. I have expressed some of this privately, and other bits have been in previous review threads, but I want to put everything in one public place.

I don't see a strong value in having this in the current LLVM release. We have support for nop padding, and the delta between nop padding and prefix padding is minimal. My personal take is that wrapping up all of the wiring to provide the clang driver option to enable padding (via whatever mechanism is available, for now nops) should take preference over making the padding code marginally better.

I am also deeply uncomfortable that there does not appear to have been any meaningful progress on defining an assembler syntax for this feature. I understand - and in fact pushed for - the urgency in getting a mitigation out, but that urgency is now gone. We have a mitigation which closes most of the gap, we can now focus on ensuring that we're well positioned for the long term.

Returning specifically to this patch, I see the following technical concerns:

1. The use of a black list - instead of a white list - for deciding which instructions can be safely padded has already been shown to be problematic correctness wise. This seems to ignore the possibility that an instruction might *already have* prefixes when deciding how many to insert, and it's not clear to me that all chosen prefixes are valid on *all* instructions.
2. The fact that *every single instruction* ends up in its own fragment is a *huge* increase in memory usage. This has been brought up repeatedly (see original thread), but I have seen *no* data provided to make the overhead concrete. There are several ways to potentially mitigate this, but it doesn't appear to have been assessed.
3. Stylistic points - such as defining an enum with names for prefixes - previously raised in review comments have not been addressed.
4. I see little discussion of what validation has been done on this patch. The only comments I can find - https://reviews.llvm.org/D72225#1818149 - seem to indicate that fairly basic testing exposes functional issues. That does not inspire confidence in the quality of the implementation.

Out of these, (4) and (2) are by far the most worrying.

Putting all this together, I don't think this patch should land at this time.

Diff 21 is still incorrect. I'll give a reproduce.

(DATA16_PREFIX)
(LEA64r)
<--- Diff 21 can place cs (0x2e) here and break the General Dynamic TLS code sequence --->
<MCInst 851> (DATA16_PREFIX)
<MCInst 851> (DATA16_PREFIX)
<MCInst 2450> (REX64_PREFIX)
<MCInst 602 <MCOperand Expr:(__tls_get_addr@PLT)>> CALL64pcrel32

It seems we need more time for further discussion. I will hold this patch until the issues mentioned by @reames and @MaskRay are resolved. Feel free to post any failing tests.

In D72225#1820978, @MaskRay wrote:

Diff 21 is still incorrect. I'll give a reproduce.

(DATA16_PREFIX)
(LEA64r)
<--- Diff 21 can place cs (0x2e) here and break the General Dynamic TLS code sequence --->
<MCInst 851> (DATA16_PREFIX)
<MCInst 851> (DATA16_PREFIX)
<MCInst 2450> (REX64_PREFIX)
<MCInst 602 <MCOperand Expr:(__tls_get_addr@PLT)>> CALL64pcrel32

Pls. give us a C or assembly file to reproduce. Thx a lot.

llvm/include/llvm/MC/MCAsmBackend.h
59 ↗	(On Diff #237007)	No semicolon in the end.

In D72225#1820813, @reames wrote:

I have serious reservations about the rush to land this patch. I have expressed some of this privately, and other bits have been in previous review threads, but I want to put everything in one public place.

I don't see a strong value in having this in the current LLVM release. We have support for nop padding, and the delta between nop padding and prefix padding is minimal. My personal take is that wrapping up all of the wiring to provide the clang driver option to enable padding (via whatever mechanism is available, for now nops) should take preference over making the padding code marginally better.

Regarding prefix padding vs. nop padding, we observed a 0.3% improvement in the SPECINT geomean and 0.5% in the SPECFP geomean. That is not much in geomean terms, but we saw a 1~2% improvement in specific SPEC benchmarks. We also observed cases in which nop padding doesn't mitigate the effect very well but prefix padding does. That is why we intend to land prefix padding in LLVM 10: users would then have alternative approaches to mitigating the JCC microcode update and could choose whichever provides better performance.
Anyway, I understand your concern and agree to hold the patch until it's pretty mature.

I am also deeply uncomfortable that there does not appear to have been any meaningful progress on defining an assembler syntax for this feature. I understand - and in fact pushed for - the urgency in getting a mitigation out, but that urgency is now gone. We have a mitigation which closes most of the gap, we can now focus on ensuring that we're well positioned for the long term.

I saw you proposed an assembler syntax in https://reviews.llvm.org/D71315, so I suppose you will continue to drive it. We can work together to finalize the syntax and implement it.

MaskRay mentioned this in D72878: [X86][BranchAlign] Suppress branch alignment for {,_}__tls_get_addr.Jan 16 2020, 6:04 PM

data16
leaq    i@TLSGD(%rip), %rdi
data16
data16
rex64
callq   __tls_get_addr@PLT

data16 is emitted as a separate instruction, though it is the prefix of the leaq. A padding prefix should not be prepended to data16; I will fix it.
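
To make the failure mode concrete: on x86-64 the General Dynamic TLS sequence has a fixed 16-byte encoding that the linker matches when it relaxes the sequence, so any byte placed between the leaq and the call changes the pattern. The following is a minimal standalone sketch, not code from the patch. `isGeneralDynamicTLSSequence` and `insertCSPrefix` are hypothetical helpers, and the two relocated 32-bit fields are treated as wildcards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLVM): returns true if `Code` contains, at
// offset `Off`, the fixed x86-64 General Dynamic TLS byte pattern
//   66             data16
//   48 8d 3d rel32 leaq x@TLSGD(%rip), %rdi
//   66 66 48       data16; data16; rex64
//   e8 rel32       callq __tls_get_addr@PLT
// The rel32 fields are filled in by relocations, so they are not compared.
bool isGeneralDynamicTLSSequence(const std::vector<uint8_t> &Code, size_t Off) {
  static constexpr std::array<int, 16> Pattern = {
      0x66, 0x48, 0x8d, 0x3d, -1, -1, -1, -1,
      0x66, 0x66, 0x48, 0xe8, -1, -1, -1, -1};
  if (Off > Code.size() || Code.size() - Off < Pattern.size())
    return false;
  for (size_t I = 0; I != Pattern.size(); ++I)
    if (Pattern[I] >= 0 && Code[Off + I] != Pattern[I])
      return false;
  return true;
}

// Hypothetical helper: returns a copy of `Code` with a CS segment-override
// prefix (0x2e) inserted at byte position `Pos`, modelling the padding
// described in the report above.
std::vector<uint8_t> insertCSPrefix(std::vector<uint8_t> Code, size_t Pos) {
  assert(Pos <= Code.size() && "insertion point out of range");
  Code.insert(Code.begin() + Pos, 0x2e);
  return Code;
}
```

With a 16-byte buffer holding the sequence, the check succeeds on the original bytes and fails once `insertCSPrefix(Code, 8)` places 0x2e after the leaq, which is the position marked in the report.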

reames mentioned this in D75203: [X86] Relax existing instructions to reduce the number of nops needed for alignment purposes.Feb 26 2020, 11:21 AM

skan mentioned this in D75268: A light-weight solution to align branches within 32B boundary by prefix padding.Feb 27 2020, 8:50 AM

reames mentioned this in rGf708c823f06c: [X86] Relax existing instructions to reduce the number of nops needed for….Mar 4 2020, 4:54 PM

Reimplemented this by D76286 based on D75300

Revision Contents

Path	Size
llvm/include/llvm/MC/MCFragment.h	51 lines
llvm/include/llvm/MC/MCObjectStreamer.h	5 lines
llvm/lib/MC/MCAssembler.cpp	60 lines
llvm/lib/MC/MCFragment.cpp	11 lines
llvm/lib/MC/MCObjectStreamer.cpp	23 lines
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp	273 lines
llvm/test/MC/X86/ (8 files)	5, 25, 114, 27, 35, 20, 29, 24 lines

Diff 237482

llvm/include/llvm/MC/MCFragment.h

[511 lines not shown]	public:

StringRef getFixedSizePortion() const { return FixedSizePortion; }		StringRef getFixedSizePortion() const { return FixedSizePortion; }

static bool classof(const MCFragment *F) {		static bool classof(const MCFragment *F) {
return F->getKind() == MCFragment::FT_CVDefRange;		return F->getKind() == MCFragment::FT_CVDefRange;
}		}
};		};

/// Represents required padding such that a particular other set of fragments		/// This is a placeholder fragment used to emit NOP or values to align a set of
/// does not cross a particular power-of-two boundary. The other fragments must		/// fragments within specific boundary. If we call the nearest backward
/// follow this one within the same section.		/// MCBoundaryAlignFragment of LastFragment as NBBF, then the set of fragments
		/// to be aligned is (NBBF, LastFragment]. The fragments to be aligned should be
		/// in the same section with this fragment, and each non-BF fragment on the path
		/// from this fragment to the fragments to be aligned must have a fixed size
		/// after finite times of relaxation.
class MCBoundaryAlignFragment : public MCFragment {		class MCBoundaryAlignFragment : public MCFragment {
		/// Flag to indicate that (optimal) NOPs should be emitted instead
		/// of using the provided value.
		bool EmitNops = false;
/// The alignment requirement of the branch to be aligned.		/// The alignment requirement of the branch to be aligned.
Align AlignBoundary;		Align AlignBoundary;
/// Flag to indicate whether the branch is fused. Use in determining the
/// region of fragments being aligned.
bool Fused : 1;
/// Flag to indicate whether NOPs should be emitted.
bool EmitNops : 1;
/// The size of the fragment. The size is lazily set during relaxation, and		/// The size of the fragment. The size is lazily set during relaxation, and
/// is not meaningful before that.		/// is not meaningful before that.
uint64_t Size = 0;		uint64_t Size = 0;
		/// Value to use for filling padding bytes if existing.
		Optional<uint8_t> Value;
		/// The maximum number of bytes to emit; if the Flag EmitNops is true,
		MaskRay (Done): You can keep `EmitNops` above. The first 2 bytes of `MCBoundaryAlignFragment` and the tail of `MCFragment` share the same word.
		/// then this constraint is ignored.
		uint64_t MaxBytesToEmit = 0;
		/// The fragment to be aligned.
		const MCFragment *LastFragment = nullptr;

public:		public:
MCBoundaryAlignFragment(Align AlignBoundary, bool Fused = false,		MCBoundaryAlignFragment(MCSection *Sec = nullptr)
bool EmitNops = false, MCSection *Sec = nullptr)		: MCFragment(FT_BoundaryAlign, false, Sec) {}
: MCFragment(FT_BoundaryAlign, false, Sec), AlignBoundary(AlignBoundary),
Fused(Fused), EmitNops(EmitNops) {}

uint64_t getSize() const { return Size; }		uint64_t getSize() const { return Size; }
		MaskRay (Done): Drop `\name Accessors`. It is not useful.
void setSize(uint64_t Value) { Size = Value; }		void setSize(uint64_t V) { Size = V; }

Align getAlignment() const { return AlignBoundary; }		Align getAlignment() const { return AlignBoundary; }
		void setAlignment(Align V) { AlignBoundary = V; }

bool isFused() const { return Fused; }		bool hasValue() const { return Value.hasValue(); }
void setFused(bool Value) { Fused = Value; }		uint8_t getValue() const { return Value.getValue(); }
		void setValue(uint8_t V) { Value = V; }

bool canEmitNops() const { return EmitNops; }		bool hasEmitNops() const { return EmitNops; }
void setEmitNops(bool Value) { EmitNops = Value; }		void setEmitNops(bool V) { EmitNops = V; }

		bool hasEmitNopsOrValue() const { return EmitNops || Value.hasValue(); }
		MaskRay (Done): I find this method confusing. Update the call sites to use `hasEmitNops() || hasValue()` instead.

		uint8_t getMaxBytesToEmit() const { return MaxBytesToEmit; }
		void setMaxBytesToEmit(uint64_t V) { MaxBytesToEmit = V; }

		const MCFragment *getFragment() const { return LastFragment; }
		void setFragment(const MCFragment *F) { LastFragment = F; }

static bool classof(const MCFragment *F) {		static bool classof(const MCFragment *F) {
return F->getKind() == MCFragment::FT_BoundaryAlign;		return F->getKind() == MCFragment::FT_BoundaryAlign;
}		}
};		};
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_MC_MCFRAGMENT_H		#endif // LLVM_MC_MCFRAGMENT_H

llvm/include/llvm/MC/MCObjectStreamer.h

[81 lines not shown]	public:
}		}

/// Get a data fragment to write into, creating a new one if the current		/// Get a data fragment to write into, creating a new one if the current
/// fragment is not a data fragment.		/// fragment is not a data fragment.
/// Optionally a \p STI can be passed in so that a new fragment is created		/// Optionally a \p STI can be passed in so that a new fragment is created
/// if the Subtarget differs from the current fragment.		/// if the Subtarget differs from the current fragment.
MCDataFragment *getOrCreateDataFragment(const MCSubtargetInfo *STI = nullptr);		MCDataFragment *getOrCreateDataFragment(const MCSubtargetInfo *STI = nullptr);

		/// Get a boundary-align fragment to write into, creating a new one if the
		/// current fragment is not a boundary-align fragment or has been used to emit
		/// something.
		MCBoundaryAlignFragment *getOrCreateBoundaryAlignFragment();

protected:		protected:
bool changeSectionImpl(MCSection *Section, const MCExpr *Subsection);		bool changeSectionImpl(MCSection *Section, const MCExpr *Subsection);

/// Assign a label to the current Section and Subsection even though a		/// Assign a label to the current Section and Subsection even though a
/// fragment is not yet present. Use flushPendingLabels(F) to associate		/// fragment is not yet present. Use flushPendingLabels(F) to associate
/// a fragment with this label.		/// a fragment with this label.
void addPendingLabel(MCSymbol* label);		void addPendingLabel(MCSymbol* label);

[112 lines not shown]

llvm/lib/MC/MCAssembler.cpp

[600 lines not shown]	static void writeFragment(raw_ostream &OS, const MCAssembler &Asm,

case MCFragment::FT_LEB: {		case MCFragment::FT_LEB: {
const MCLEBFragment &LF = cast<MCLEBFragment>(F);		const MCLEBFragment &LF = cast<MCLEBFragment>(F);
OS << LF.getContents();		OS << LF.getContents();
break;		break;
}		}

case MCFragment::FT_BoundaryAlign: {		case MCFragment::FT_BoundaryAlign: {
		const MCBoundaryAlignFragment &BF = cast<MCBoundaryAlignFragment>(F);
		if (BF.hasEmitNops()) {
if (!Asm.getBackend().writeNopData(OS, FragmentSize))		if (!Asm.getBackend().writeNopData(OS, FragmentSize))
report_fatal_error("unable to write nop sequence of " +		report_fatal_error("unable to write nop sequence of " +
Twine(FragmentSize) + " bytes");		Twine(FragmentSize) + " bytes");
		} else if (BF.hasValue()) {
		for (uint64_t i = 0; i != FragmentSize; ++i)
		OS << char(BF.getValue());
		LuoYuanke (Done): Why do we declare Value as int64_t in MCAlignFragment? It seems only 1 byte is needed. Or should we add assert(BF.getValueSize() == 1)?
		skan (Author, Done): I didn't declare `Value` as `int64_t`; I declared it as `Optional<uint8_t>`.
		LuoYuanke (Not Done): You are right.
		}
break;		break;
}		}

case MCFragment::FT_SymbolId: {		case MCFragment::FT_SymbolId: {
const MCSymbolIdFragment &SF = cast<MCSymbolIdFragment>(F);		const MCSymbolIdFragment &SF = cast<MCSymbolIdFragment>(F);
support::endian::write<uint32_t>(OS, SF.getSymbol()->getIndex(), Endian);		support::endian::write<uint32_t>(OS, SF.getSymbol()->getIndex(), Endian);
break;		break;
}		}
[362 lines not shown]
static bool needPadding(uint64_t StartAddr, uint64_t Size,		static bool needPadding(uint64_t StartAddr, uint64_t Size,
Align BoundaryAlignment) {		Align BoundaryAlignment) {
return mayCrossBoundary(StartAddr, Size, BoundaryAlignment) ||		return mayCrossBoundary(StartAddr, Size, BoundaryAlignment) ||
isAgainstBoundary(StartAddr, Size, BoundaryAlignment);		isAgainstBoundary(StartAddr, Size, BoundaryAlignment);
}		}

bool MCAssembler::relaxBoundaryAlign(MCAsmLayout &Layout,		bool MCAssembler::relaxBoundaryAlign(MCAsmLayout &Layout,
MCBoundaryAlignFragment &BF) {		MCBoundaryAlignFragment &BF) {
// The MCBoundaryAlignFragment that doesn't emit NOP should not be relaxed.		// The MCBoundaryAlignFragment that does not emit anything or not have any
if (!BF.canEmitNops())		// fragment to be aligned should not be relaxed.
		if (!BF.hasEmitNopsOrValue() || !BF.getFragment())
return false;		return false;

uint64_t AlignedOffset = Layout.getFragmentOffset(BF.getNextNode());		// Compute the size of all the fragments in the range we're trying to align.
uint64_t AlignedSize = 0;		const MCFragment *TF = BF.getFragment();
const MCFragment *F = BF.getNextNode();		uint64_t AlignedSize = computeFragmentSize(Layout, *TF);
// If the branch is unfused, it is emitted into one fragment, otherwise it is		uint64_t AlignedOffset = Layout.getFragmentOffset(TF);
// emitted into two fragments at most, the next MCBoundaryAlignFragment(if		// Note: It should be guaranteed that there is a MCBoundaryAlignFragment
		MaskRay (Done): Is it guaranteed that `F->getPrevNode()` will not be executed on the first fragment?
// exists) also marks the end of the branch.		// before TF in the same section.
for (auto i = 0, N = BF.isFused() ? 2 : 1;		for (auto *F = TF->getPrevNode(); !isa<MCBoundaryAlignFragment>(F);
i != N && !isa<MCBoundaryAlignFragment>(F); ++i, F = F->getNextNode()) {		F = F->getPrevNode()) {
AlignedSize += computeFragmentSize(Layout, *F);		uint64_t Size = computeFragmentSize(Layout, *F);
		MaskRay (Not Done): `AlignedOffset` can be defined after AlignedSize is computed.
		skan (Author, Done): Sorry, I didn't get your point.
}		AlignedSize += Size;
uint64_t OldSize = BF.getSize();		AlignedOffset -= Size;
AlignedOffset -= OldSize;		}

		// Compute the size of all the MCBoundaryAlignFragments in the range
		// [BF,BF.getFragment).
		uint64_t FixedValue = 0;
		MaskRay (Not Done): FixedValue doesn't have to be an immediately invoked function expression.
		for (const MCFragment *F = &BF; F != TF; F = F->getNextNode())
		if (auto *MBF = dyn_cast<MCBoundaryAlignFragment>(F))
		FixedValue += MBF->getSize();

		AlignedOffset -= FixedValue;
Align BoundaryAlignment = BF.getAlignment();		Align BoundaryAlignment = BF.getAlignment();
uint64_t NewSize = needPadding(AlignedOffset, AlignedSize, BoundaryAlignment)		uint64_t NewSize = needPadding(AlignedOffset, AlignedSize, BoundaryAlignment)
? offsetToAlignment(AlignedOffset, BoundaryAlignment)		? offsetToAlignment(AlignedOffset, BoundaryAlignment)
: 0U;		: 0U;
if (NewSize == OldSize)		if (!BF.hasEmitNops()) {
		assert(BF.getNextNode()->hasInstructions() &&
		MaskRay (Done): `*(xxx)` -> `xxx->`
		"The fragment doesn't have any instruction.");
		assert(computeFragmentSize(Layout, *(BF.getNextNode())) <= 15 &&
		craig.topper (Not Done): This 15 limit is X86 specific. Seems weird to have it mentioned in a target independent file. Can we abstract this somehow? It can probably happen after the branch.
		"The fragment's size must be no longer than 15 since it should only "
		"hold one instruction.");
		NewSize = std::min({NewSize,
		15 - computeFragmentSize(Layout, *(BF.getNextNode())),
		craig.topper (Not Done): Same with this 15.
		static_cast<uint64_t>(BF.getMaxBytesToEmit())});
		}
		if (NewSize == BF.getSize())
		MaskRay (Done): `if (NewSize == BF.getSize()`
return false;		return false;
BF.setSize(NewSize);		BF.setSize(NewSize);
Layout.invalidateFragmentsFrom(&BF);		Layout.invalidateFragmentsFrom(&BF);
return true;		return true;
}		}

bool MCAssembler::relaxDwarfLineAddr(MCAsmLayout &Layout,		bool MCAssembler::relaxDwarfLineAddr(MCAsmLayout &Layout,
MCDwarfLineAddrFragment &DF) {		MCDwarfLineAddrFragment &DF) {
[183 lines not shown]
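
The size computation in `relaxBoundaryAlign` can be easier to follow with concrete numbers. Below is a minimal standalone model, not the LLVM code itself: plain integers stand in for `Align`, and `offsetToBoundary`, `nopPaddingSize` and `prefixPaddingSize` are hypothetical names. The bodies of the two boundary predicates are an assumption about what `mayCrossBoundary` and `isAgainstBoundary` check, since their definitions are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// `Boundary` is assumed to be a power of two, e.g. 32.
// True if the byte range [Start, Start + Size) spans two boundary-sized blocks.
bool mayCrossBoundary(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  if (Size == 0)
    return false;
  return Start / Boundary != (Start + Size - 1) / Boundary;
}

// True if the byte range ends exactly on a boundary.
bool isAgainstBoundary(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  return (Start + Size) % Boundary == 0;
}

bool needPadding(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  return mayCrossBoundary(Start, Size, Boundary) ||
         isAgainstBoundary(Start, Size, Boundary);
}

// Bytes needed to move `Addr` up to the next multiple of `Boundary`.
uint64_t offsetToBoundary(uint64_t Addr, uint64_t Boundary) {
  return (Boundary - Addr % Boundary) % Boundary;
}

// Padding wanted in front of the aligned range when NOPs are used.
uint64_t nopPaddingSize(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  return needPadding(Start, Size, Boundary) ? offsetToBoundary(Start, Boundary)
                                            : 0;
}

// With prefix padding the wanted size is clamped twice: the padded
// instruction must stay within the 15-byte x86 instruction length limit, and
// no more than `MaxPrefixes` bytes may be added to a single instruction.
uint64_t prefixPaddingSize(uint64_t Wanted, uint64_t InstSize,
                           uint64_t MaxPrefixes) {
  constexpr uint64_t MaxInstLen = 15;
  assert(InstSize <= MaxInstLen && "fragment holds more than one instruction");
  return std::min({Wanted, MaxInstLen - InstSize, MaxPrefixes});
}
```

For example, a 6-byte branch at offset 28 with a 32-byte boundary crosses the boundary and wants 4 bytes of padding. If the preceding instruction is already 13 bytes long, prefix padding can contribute only 2 of those bytes, which is why the summary says NOPs are used when there is not enough room for prefixes.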

llvm/lib/MC/MCFragment.cpp

[418 lines not shown]	LLVM_DUMP_METHOD void MCFragment::dump() const {
case MCFragment::FT_LEB: {		case MCFragment::FT_LEB: {
const auto *LF = cast<MCLEBFragment>(this);		const auto *LF = cast<MCLEBFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " Value:" << LF->getValue() << " Signed:" << LF->isSigned();		OS << " Value:" << LF->getValue() << " Signed:" << LF->isSigned();
break;		break;
}		}
case MCFragment::FT_BoundaryAlign: {		case MCFragment::FT_BoundaryAlign: {
const auto *BF = cast<MCBoundaryAlignFragment>(this);		const auto *BF = cast<MCBoundaryAlignFragment>(this);
if (BF->canEmitNops())		if (BF->hasEmitNops())
OS << " (can emit nops to align";		OS << " (emit nops)";
if (BF->isFused())
OS << " fused branch)";
else
OS << " unfused branch)";
OS << "\n ";		OS << "\n ";
		if (BF->hasValue())
		OS << " Value:" << hexdigit(BF->getValue());
OS << " BoundarySize:" << BF->getAlignment().value()		OS << " BoundarySize:" << BF->getAlignment().value()
		<< " MaxBytesToEmit:" << BF->getMaxBytesToEmit()
<< " Size:" << BF->getSize();		<< " Size:" << BF->getSize();
break;		break;
}		}
case MCFragment::FT_SymbolId: {		case MCFragment::FT_SymbolId: {
const auto *F = cast<MCSymbolIdFragment>(this);		const auto *F = cast<MCSymbolIdFragment>(this);
OS << "\n ";		OS << "\n ";
OS << " Sym:" << F->getSymbol();		OS << " Sym:" << F->getSymbol();
break;		break;
[23 lines not shown]

llvm/lib/MC/MCObjectStreamer.cpp

[185 lines not shown]	MCFragment *MCObjectStreamer::getCurrentFragment() const {
assert(getCurrentSectionOnly() && "No current section!");		assert(getCurrentSectionOnly() && "No current section!");

if (CurInsertionPoint != getCurrentSectionOnly()->getFragmentList().begin())		if (CurInsertionPoint != getCurrentSectionOnly()->getFragmentList().begin())
return &*std::prev(CurInsertionPoint);		return &*std::prev(CurInsertionPoint);

return nullptr;		return nullptr;
}		}

static bool CanReuseDataFragment(const MCDataFragment &F,		static bool CanReuseDataFragment(const MCDataFragment &F, MCObjectStreamer &OS,
const MCAssembler &Assembler,
const MCSubtargetInfo *STI) {		const MCSubtargetInfo *STI) {
if (!F.hasInstructions())		if (!F.hasInstructions())
return true;		return true;

		MCAssembler &Assembler = OS.getAssembler();

		// When the target need align instructions, we need to determine the size
		// of some instructions during the relaxation, the easiest way to do it is
		// to emit each instruction into fragment of its own.
		if (Assembler.getBackend().allowAutoPadding())
		return false;

// When bundling is enabled, we don't want to add data to a fragment that		// When bundling is enabled, we don't want to add data to a fragment that
// already has instructions (see MCELFStreamer::EmitInstToData for details)		// already has instructions (see MCELFStreamer::EmitInstToData for details)
if (Assembler.isBundlingEnabled())		if (Assembler.isBundlingEnabled())
return Assembler.getRelaxAll();		return Assembler.getRelaxAll();
// If the subtarget is changed mid fragment we start a new fragment to record		// If the subtarget is changed mid fragment we start a new fragment to record
// the new STI.		// the new STI.
return !STI || F.getSubtargetInfo() == STI;		return !STI || F.getSubtargetInfo() == STI;
}		}

MCDataFragment *		MCDataFragment *
MCObjectStreamer::getOrCreateDataFragment(const MCSubtargetInfo *STI) {		MCObjectStreamer::getOrCreateDataFragment(const MCSubtargetInfo *STI) {
MCDataFragment *F = dyn_cast_or_null<MCDataFragment>(getCurrentFragment());		MCDataFragment *F = dyn_cast_or_null<MCDataFragment>(getCurrentFragment());
if (!F || !CanReuseDataFragment(*F, *Assembler, STI)) {		if (!F || !CanReuseDataFragment(*F, *this, STI)) {
F = new MCDataFragment();		F = new MCDataFragment();
insert(F);		insert(F);
}		}
return F;		return F;
}		}

		MCBoundaryAlignFragment *MCObjectStreamer::getOrCreateBoundaryAlignFragment() {
		auto *F = dyn_cast_or_null<MCBoundaryAlignFragment>(getCurrentFragment());
		if (!F || F->hasEmitNopsOrValue()) {
		F = new MCBoundaryAlignFragment();
		insert(F);
		}
		return F;
		}

void MCObjectStreamer::visitUsedSymbol(const MCSymbol &Sym) {		void MCObjectStreamer::visitUsedSymbol(const MCSymbol &Sym) {
Assembler->registerSymbol(Sym);		Assembler->registerSymbol(Sym);
}		}

void MCObjectStreamer::EmitCFISections(bool EH, bool Debug) {		void MCObjectStreamer::EmitCFISections(bool EH, bool Debug) {
MCStreamer::EmitCFISections(EH, Debug);		MCStreamer::EmitCFISections(EH, Debug);
EmitEHFrame = EH;		EmitEHFrame = EH;
EmitDebugFrame = Debug;		EmitDebugFrame = Debug;
[544 lines not shown]

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp

//===-- X86AsmBackend.cpp - X86 Assembler Backend -------------------------===//		//===-- X86AsmBackend.cpp - X86 Assembler Backend -------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MCTargetDesc/X86BaseInfo.h"		#include "MCTargetDesc/X86BaseInfo.h"
#include "MCTargetDesc/X86FixupKinds.h"		#include "MCTargetDesc/X86FixupKinds.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/BinaryFormat/MachO.h"		#include "llvm/BinaryFormat/MachO.h"
#include "llvm/MC/MCAsmBackend.h"		#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAssembler.h"		#include "llvm/MC/MCAssembler.h"
		#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDwarf.h"		#include "llvm/MC/MCDwarf.h"
#include "llvm/MC/MCELFObjectWriter.h"		#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCFixupKindInfo.h"		#include "llvm/MC/MCFixupKindInfo.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCMachObjectWriter.h"		#include "llvm/MC/MCMachObjectWriter.h"
#include "llvm/MC/MCObjectStreamer.h"		#include "llvm/MC/MCObjectStreamer.h"
#include "llvm/MC/MCObjectWriter.h"		#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSectionMachO.h"		#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCValue.h"		#include "llvm/MC/MCValue.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		reames (Not Done): Remove the reordering here.

using namespace llvm;		using namespace llvm;

namespace {		namespace {
/// A wrapper for holding a mask of the values from X86::AlignBranchBoundaryKind		/// A wrapper for holding a mask of the values from X86::AlignBranchBoundaryKind
class X86AlignBranchKind {		class X86AlignBranchKind {
private:		private:
uint8_t AlignBranchKind = 0;		uint8_t AlignBranchKind = 0;
[29 lines not shown]	public:
operator uint8_t() const { return AlignBranchKind; }		operator uint8_t() const { return AlignBranchKind; }
void addKind(X86::AlignBranchBoundaryKind Value) { AlignBranchKind |= Value; }		void addKind(X86::AlignBranchBoundaryKind Value) { AlignBranchKind |= Value; }
};		};

X86AlignBranchKind X86AlignBranchKindLoc;		X86AlignBranchKind X86AlignBranchKindLoc;

cl::opt<unsigned> X86AlignBranchBoundary(		cl::opt<unsigned> X86AlignBranchBoundary(
"x86-align-branch-boundary", cl::init(0),		"x86-align-branch-boundary", cl::init(0),
cl::desc(		cl::desc(
"Control how the assembler should align branches with NOP. If the "		"Control how the assembler should align branches with NOP or segment "
"boundary's size is not 0, it should be a power of 2 and no less "		"override prefix. If the boundary's size is not 0, it should be a "
"than 32. Branches will be aligned to prevent from being across or "		"power of 2 and no less than 16. Branches will be aligned to prevent "
"against the boundary of specified size. The default value 0 does not "		"from being across or against the boundary of specified size. The "
"align branches."));		"default value 0 does not align branches."));
		reames (Done): a) segment prefixes aren't the only ones used, are they? b) the wording changes can be pulled into their own review (or simply committed)
		skan (Author, Done): If there is no room to insert a prefix, a NOP will be emitted before the branch or fused pair.

cl::opt<X86AlignBranchKind, true, cl::parser<std::string>> X86AlignBranch(		cl::opt<X86AlignBranchKind, true, cl::parser<std::string>> X86AlignBranch(
"x86-align-branch",		"x86-align-branch",
cl::desc("Specify types of branches to align (plus separated list of "		cl::desc("Specify types of branches to align (plus separated list of "
"types). The branches's types are combination of jcc, fused, "		"types). The branches's types are combination of jcc, fused, "
"jmp, call, ret, indirect."),		"jmp, call, ret, indirect."),
cl::value_desc("jcc indicates conditional jumps, fused indicates fused "		cl::value_desc("jcc indicates conditional jumps, fused indicates fused "
"conditional jumps, jmp indicates unconditional jumps, call "		"conditional jumps, jmp indicates unconditional jumps, call "
"indicates direct and indirect calls, ret indicates rets, "		"indicates direct and indirect calls, ret indicates rets, "
"indirect indicates indirect jumps."),		"indirect indicates indirect jumps."),
cl::location(X86AlignBranchKindLoc));		cl::location(X86AlignBranchKindLoc));

		cl::opt<unsigned> X86AlignBranchPrefixSize(
		"x86-align-branch-prefix-size", cl::init(0),
		cl::desc("Specify the maximum number of prefixes on an instruction to "
		"align branches. The number should be between 0 and 5."));

static unsigned getFixupKindSize(unsigned Kind) {		static unsigned getFixupKindSize(unsigned Kind) {
switch (Kind) {		switch (Kind) {
default:		default:
llvm_unreachable("invalid fixup kind!");		llvm_unreachable("invalid fixup kind!");
case FK_NONE:		case FK_NONE:
return 0;		return 0;
case FK_PCRel_1:		case FK_PCRel_1:
case FK_SecRel_1:		case FK_SecRel_1:
[30 lines not shown]	X86ELFObjectWriter(bool is64Bit, uint8_t OSABI, uint16_t EMachine,
: MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {}		: MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {}
};		};

class X86AsmBackend : public MCAsmBackend {		class X86AsmBackend : public MCAsmBackend {
const MCSubtargetInfo &STI;		const MCSubtargetInfo &STI;
std::unique_ptr<const MCInstrInfo> MCII;		std::unique_ptr<const MCInstrInfo> MCII;
X86AlignBranchKind AlignBranchType;		X86AlignBranchKind AlignBranchType;
Align AlignBoundary;		Align AlignBoundary;
		uint8_t AlignMaxPrefixSize;

bool isMacroFused(const MCInst &Cmp, const MCInst &Jcc) const;		bool isMacroFused(const MCInst &Cmp, const MCInst &Jcc) const;

bool needAlign(MCObjectStreamer &OS) const;		bool needAlign(MCObjectStreamer &OS) const;
bool needAlignInst(const MCInst &Inst) const;		bool needAlignInst(const MCInst &Inst) const;
MCBoundaryAlignFragment *
getOrCreateBoundaryAlignFragment(MCObjectStreamer &OS) const;		bool shouldAddPrefix(const MCInst &Inst) const;
		uint8_t choosePrefix(const MCInst &Inst) const;
MCInst PrevInst;		MCInst PrevInst;
		const MCFragment *LastFragmentToBeAligned = nullptr;

public:		public:
X86AsmBackend(const Target &T, const MCSubtargetInfo &STI)		X86AsmBackend(const Target &T, const MCSubtargetInfo &STI)
: MCAsmBackend(support::little), STI(STI),		: MCAsmBackend(support::little), STI(STI),
MCII(T.createMCInstrInfo()) {		MCII(T.createMCInstrInfo()) {
AlignBoundary = assumeAligned(X86AlignBranchBoundary);		AlignBoundary = assumeAligned(X86AlignBranchBoundary);
AlignBranchType = X86AlignBranchKindLoc;		AlignBranchType = X86AlignBranchKindLoc;
		AlignMaxPrefixSize = std::min<uint8_t>(X86AlignBranchPrefixSize, 5);
		MaskRay (Not Done): `AlignMaxPrefixSize = X86AlignBranchPrefixSize;` The error checking and normalization (to 5) should be done in an earlier place.
		skan (Author, Done): Do you have any suggestions for an appropriate, earlier place?
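
One detail that may be relevant to where the normalization happens: `std::min<uint8_t>(X86AlignBranchPrefixSize, 5)` converts the `unsigned` option value to `uint8_t` before the comparison, so an out-of-range value such as 256 wraps around instead of being clamped to 5. The sketch below only illustrates that C++ behaviour; `clampNarrowFirst` and `clampThenNarrow` are hypothetical names, not functions in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the expression in the constructor: both arguments are converted to
// uint8_t before std::min compares them, so Requested is reduced modulo 256.
uint8_t clampNarrowFirst(unsigned Requested) {
  return std::min<uint8_t>(Requested, 5);
}

// Compares at full width first and narrows only the already clamped result.
uint8_t clampThenNarrow(unsigned Requested) {
  unsigned Clamped = std::min(Requested, 5u);
  assert(Clamped <= 5 && "clamped value must fit in the documented range");
  return static_cast<uint8_t>(Clamped);
}
```

With these definitions, a request of 256 yields 0 from `clampNarrowFirst` (silently selecting NOP padding) but 5 from `clampThenNarrow`. Rejecting out-of-range values when the option is parsed would avoid the question entirely.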
}		}

bool allowAutoPadding() const override;		bool allowAutoPadding() const override;
void alignBranchesBegin(MCObjectStreamer &OS, const MCInst &Inst) override;		void alignBranchesBegin(MCObjectStreamer &OS, const MCInst &Inst) override;
void alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) override;		void alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) override;

unsigned getNumFixupKinds() const override {		unsigned getNumFixupKinds() const override {
return X86::NumTargetFixupKinds;		return X86::NumTargetFixupKinds;
[259 lines not shown]	bool X86AsmBackend::needAlign(MCObjectStreamer &OS) const {

MCAssembler &Assembler = OS.getAssembler();		MCAssembler &Assembler = OS.getAssembler();
MCSection *Sec = OS.getCurrentSectionOnly();		MCSection *Sec = OS.getCurrentSectionOnly();
// To be Done: Currently don't deal with Bundle cases.		// To be Done: Currently don't deal with Bundle cases.
if (Assembler.isBundlingEnabled() && Sec->isBundleLocked())		if (Assembler.isBundlingEnabled() && Sec->isBundleLocked())
return false;		return false;

// Branches only need to be aligned in 32-bit or 64-bit mode.		// Branches only need to be aligned in 32-bit or 64-bit mode.
if (!(STI.hasFeature(X86::Mode64Bit) || STI.hasFeature(X86::Mode32Bit)))		if (!(STI.hasFeature(X86::Mode64Bit) || STI.hasFeature(X86::Mode32Bit)))
		reames (Done): This looks to be a formatting change? If so, remove. You can commit this without separate review if desired.
return false;		return false;

return true;		return true;
}		}

/// Check if the instruction operand needs to be aligned. Padding is disabled		/// Check if the instruction operand needs to be aligned. Padding is disabled
/// before intruction which may be rewritten by linker(e.g. TLSCALL).		/// before intruction which may be rewritten by linker(e.g. TLSCALL).
bool X86AsmBackend::needAlignInst(const MCInst &Inst) const {		bool X86AsmBackend::needAlignInst(const MCInst &Inst) const {
[9 lines not shown]	return (InstDesc.isConditionalBranch() &&
(InstDesc.isCall() &&		(InstDesc.isCall() &&
(AlignBranchType & X86::AlignBranchCall)) ||		(AlignBranchType & X86::AlignBranchCall)) ||
(InstDesc.isReturn() &&		(InstDesc.isReturn() &&
(AlignBranchType & X86::AlignBranchRet)) ||		(AlignBranchType & X86::AlignBranchRet)) ||
(InstDesc.isIndirectBranch() &&		(InstDesc.isIndirectBranch() &&
(AlignBranchType & X86::AlignBranchIndirect));		(AlignBranchType & X86::AlignBranchIndirect));
}		}
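
The branch-kind test in `needAlignInst` is a bitmask check against the kinds named on the `--x86-align-branch=` option (for example `call+jmp` in the tests below). The following is a minimal standalone sketch of that idea and is not part of the patch: the bit values, the `AlignBranchJcc` and `AlignBranchJmp` enumerators, and the `parseAlignBranch` and `wantsAlignment` helpers are assumptions made for illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical bit assignments; the diff only shows the names
// X86::AlignBranchFused, AlignBranchCall, AlignBranchRet and
// AlignBranchIndirect, not their values.
enum AlignBranchKind : uint8_t {
  AlignBranchNone = 0,
  AlignBranchFused = 1U << 0,
  AlignBranchJcc = 1U << 1,
  AlignBranchJmp = 1U << 2,
  AlignBranchCall = 1U << 3,
  AlignBranchRet = 1U << 4,
  AlignBranchIndirect = 1U << 5
};

// Turn a '+'-separated list such as "fused+jcc+jmp" into a bitmask.
// Unknown names are silently ignored in this sketch.
uint8_t parseAlignBranch(const std::string &Spec) {
  uint8_t Mask = AlignBranchNone;
  std::istringstream In(Spec);
  std::string Name;
  while (std::getline(In, Name, '+')) {
    if (Name == "fused")
      Mask |= AlignBranchFused;
    else if (Name == "jcc")
      Mask |= AlignBranchJcc;
    else if (Name == "jmp")
      Mask |= AlignBranchJmp;
    else if (Name == "call")
      Mask |= AlignBranchCall;
    else if (Name == "ret")
      Mask |= AlignBranchRet;
    else if (Name == "indirect")
      Mask |= AlignBranchIndirect;
  }
  return Mask;
}

// Same shape as the checks in needAlignInst: a kind is aligned only if
// its bit is present in the mask.
bool wantsAlignment(uint8_t Mask, AlignBranchKind Kind) {
  return (Mask & Kind) != 0;
}
```

With this sketch, `parseAlignBranch("call+jmp")` yields a mask for which `wantsAlignment(Mask, AlignBranchCall)` is true and `wantsAlignment(Mask, AlignBranchRet)` is false.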

static bool canReuseBoundaryAlignFragment(const MCBoundaryAlignFragment &F) {		/// Check if prefix can be added before instruction \p Inst.
// If a MCBoundaryAlignFragment has not been used to emit NOP,we can reuse it.		bool X86AsmBackend::shouldAddPrefix(const MCInst &Inst) const {
return !F.canEmitNops();		// No prefix can be added if AlignMaxPrefixSize is 0.
		if (AlignMaxPrefixSize == 0)
		return false;

		if (needAlignInst(Inst))
		return false;

		reames (Done): This should probably be: if (needAlign(Inst)) return false;
		// Linker may rewrite the instruction with variant symbol operand.
		return !hasVariantSymbol(Inst);
}		}

MCBoundaryAlignFragment *		/// Choose which prefix should be inserted before the instruction.
X86AsmBackend::getOrCreateBoundaryAlignFragment(MCObjectStreamer &OS) const {		///
auto *F = dyn_cast_or_null<MCBoundaryAlignFragment>(OS.getCurrentFragment());		/// If there is one, use the existing segment override prefix.
if (!F || !canReuseBoundaryAlignFragment(*F)) {		/// If the target is 64-bit, use the CS.
F = new MCBoundaryAlignFragment(AlignBoundary);		/// If the target is 32-bit,
OS.insert(F);		/// - If the instruction has a ESP/EBP base register, use SS.
		/// - Otherwise use DS.
		uint8_t X86AsmBackend::choosePrefix(const MCInst &Inst) const {
		assert((STI.hasFeature(X86::Mode32Bit) || STI.hasFeature(X86::Mode64Bit)) &&
		"Prefixes can be added only in 32-bit or 64-bit mode.");
		for (const auto &Operand : Inst) {
		craig.topper (Done): I think this loop picks up instructions that don't use segment registers for just memory, and it probably picks the wrong prefix for this instruction: `mov %fs:0x1, %ss`. The loop will find the %ss destination register first, but %fs is the correct segment to use.
		if (Operand.isReg())
		switch (Operand.getReg()) {
		default:
		reames (Not Done): Please define an enum which gives a symbolic name for these values (if there isn't already one). (i.e. what the heck are these integer constant values?)
		skan (Author, Done): These are the exact byte values emitted for the corresponding segment override prefix. See `X86MCCodeEmitter::emitSegmentOverridePrefix`.
		break;
		case X86::CS:
		return 0x2e;
		case X86::SS:
		return 0x36;
		case X86::DS:
		return 0x3e;
		case X86::ES:
		return 0x26;
		case X86::FS:
		return 0x64;
		case X86::GS:
		return 0x65;
}		}
return F;		}
		if (STI.hasFeature(X86::Mode64Bit))
		return 0x2e;

		unsigned Opcode = Inst.getOpcode();
		const MCInstrDesc &Desc = MCII->get(Opcode);
		uint64_t TSFlags = Desc.TSFlags;
		int MemoryOperand = X86II::getMemoryOperandNo(TSFlags);
		if (MemoryOperand >= 0) {
		unsigned CurOp = X86II::getOperandBias(Desc);
		unsigned BaseRegNum = MemoryOperand + CurOp + X86::AddrBaseReg;
		unsigned BaseReg = Inst.getOperand(BaseRegNum).getReg();
		if (BaseReg == X86::ESP || BaseReg == X86::EBP)
		return 0x36;
		}
		return 0x3e;
}		}
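
The selection rule documented above `choosePrefix` can be sketched in isolation. This is a minimal standalone sketch, not the patch's code: the `Seg` and `Base` enums and the `choosePrefixByte` function are hypothetical names, and the existing segment override is passed in explicitly instead of being found by scanning operands (the part of the patch that craig.topper questions above). Only the prefix byte values are taken from the diff.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the X86 register enums used in the patch.
enum class Seg { CS, SS, DS, ES, FS, GS };
enum class Base { None, ESP, EBP, Other };

// Segment override prefix bytes, as returned by choosePrefix in the diff.
constexpr uint8_t segmentPrefixByte(Seg S) {
  switch (S) {
  case Seg::CS: return 0x2e;
  case Seg::SS: return 0x36;
  case Seg::DS: return 0x3e;
  case Seg::ES: return 0x26;
  case Seg::FS: return 0x64;
  case Seg::GS: return 0x65;
  }
  return 0x3e; // Not reached for valid enumerators.
}

// Documented rule:
//  1. reuse an existing segment override if the instruction has one;
//  2. in 64-bit mode use CS;
//  3. in 32-bit mode use SS for an ESP/EBP base register, else DS.
uint8_t choosePrefixByte(std::optional<Seg> ExistingOverride, bool Is64Bit,
                         Base MemBase) {
  if (ExistingOverride)
    return segmentPrefixByte(*ExistingOverride);
  if (Is64Bit)
    return segmentPrefixByte(Seg::CS);
  if (MemBase == Base::ESP || MemBase == Base::EBP)
    return segmentPrefixByte(Seg::SS);
  return segmentPrefixByte(Seg::DS);
}
```

The results line up with the expected output in align-branch-32-3a.s: `movl %eax, %gs:0x1` is padded with `65`, `movl %eax, -4(%esp)` with `36`, and `movl %esi, -12(%ebx)` with `3e`.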

/// Insert MCBoundaryAlignFragment before instructions to align branches.		/// Insert MCBoundaryAlignFragment before instructions to align branches.
void X86AsmBackend::alignBranchesBegin(MCObjectStreamer &OS,		void X86AsmBackend::alignBranchesBegin(MCObjectStreamer &OS,
const MCInst &Inst) {		const MCInst &Inst) {
if (!needAlign(OS))		if (!needAlign(OS))
return;		return;

		// Summary of inserting scheme(Two Steps):
		// Step 1:
		// If the previous instruction is the first instruction in a fusible pair
		// - If macro fusion actually happens, emit NOP before the first instruction
		// in the fused pair and skip step 2.
		// - If the macro fusion doesn't happen indeed, emit prefix before the
		// previous instruction.
		//
		// Step 2:
		// If the instruction needs to be aligned, emit NOP before the instruction.
		//
		// If the instruction is the first instruction in a fusible pair, put a
		// placeholder here.
		//
		// Otherwise emit prefix before the instruction.

MCFragment *CF = OS.getCurrentFragment();		MCFragment *CF = OS.getCurrentFragment();

		// Prefix or NOP shouldn't be inserted after hardcode, e.g.
		//
		// \code
		// .byte 0x2e
		// jmp .Label0
		// \endcode
		//
		// since there is no clear instruction boundary.
		if (isa_and_nonnull<MCDataFragment>(CF) && !CF->hasInstructions())
		reames (Not Done): Please remove this. It should be covered by the compiler support patch which already landed for compiled code, and the assembler syntax is separate.
		skan (Author, Done): Removing this will cause test align-branch-32-2a.s to fail. This check is simple and direct, so I think we can keep it for now.
		return;

		// The number of prefixes is limted by AlignMaxPrefixSize for some peformance
		// reasons, so we need to compute how many prefixes can be added.
		auto GetRemainingPrefixSize = [&](const MCInst &Inst) {
		SmallString<256> Code;
		raw_svector_ostream VecOS(Code);
		OS.getAssembler().getEmitter().emitPrefix(Inst, VecOS, STI);
		assert(Code.size() < 15 && "The number of prefixes must be less than 15.");
		reames (Done): Please assert that the total number of prefixes fits within a uint8_t. It does, but having that explicitly asserted/noted would be helpful.
		uint8_t ExistingPrefixSize = static_cast<uint8_t>(Code.size());
		return (AlignMaxPrefixSize > ExistingPrefixSize)
		? (AlignMaxPrefixSize - ExistingPrefixSize)
		: 0;
		};

bool NeedAlignFused = AlignBranchType & X86::AlignBranchFused;		bool NeedAlignFused = AlignBranchType & X86::AlignBranchFused;
if (NeedAlignFused && isMacroFused(PrevInst, Inst) && CF) {		// Step 1:
// Macro fusion actually happens and there is no other fragment inserted		// Handle the condition when the previous the instruction is the first
// after the previous instruction. NOP can be emitted in PF to align fused		// instruction in a fusible pair. Note: We need to check the previous
// jcc.		// fragment is a BF since we may encounter the case:
if (auto *PF =
dyn_cast_or_null<MCBoundaryAlignFragment>(CF->getPrevNode())) {
const_cast<MCBoundaryAlignFragment *>(PF)->setEmitNops(true);
const_cast<MCBoundaryAlignFragment *>(PF)->setFused(true);
}
} else if (needAlignInst(Inst)) {
// Note: When there is at least one fragment, such as MCAlignFragment,
// inserted after the previous instruction, e.g.
//		//
// \code		// \code
// cmp %rax %rcx		// cmp %rax %rcx
// .align 16		// .align 16
// je .Label0		// je .Label0
// \endcode		// \endcode
//		//
// We will treat the JCC as a unfused branch although it may be fused		// MCAlignFragment can grow and shrink, so it is not ensured to get a fixed
		craig.topper (Done): limted -> limited, peformance -> performance
// with the CMP.		// size after finite times of relaxation. NOP or prefix should not emitted
auto *F = getOrCreateBoundaryAlignFragment(OS);		// before the CMP since it may cause MCAssembler::relaxBoundaryAlign not to
		// converge.
		if (NeedAlignFused && isFirstMacroFusibleInst(PrevInst, *MCII) && CF &&
		isa_and_nonnull<MCBoundaryAlignFragment>(CF->getPrevNode())) {
		auto PF = const_cast<MCBoundaryAlignFragment *>(
		cast<MCBoundaryAlignFragment>(CF->getPrevNode()));
		// Macro fusion actually happens, so emit NOP before the first instruction in
		// the fused pair. Note: When there is a MCAlignFragment inserted just
		// before the first instruction in the fused pair, e.g.
		//
		// \code
		// .align 16
		// cmp %rax %rcx
		// je .Label0
		// \endcode
		//
		// We will not emit NOP before the CMP since the align directive is
		// used to align the fused pair rather than NOP.
		if (isMacroFused(PrevInst, Inst)) {
		if (isa_and_nonnull<MCAlignFragment>(PF->getPrevNode()))
		LuoYuanke (Done): I notice line 623 also checks for MCAlignFragment. Would it be better to check for MCAlignFragment at the beginning and return without inserting any MCBoundaryAlignFragment if the previous fragment is an MCAlignFragment? Then, in the following code, we could assume the previous fragment is not an MCAlignFragment.
		skan (Author, Done): We shouldn't do that. Only NOP should not be emitted after a `MCAlignFragment`, since the `MCAlignFragment` is used to align the branch or the fused pair rather than the NOP. However, a prefix can be emitted after a `MCAlignFragment`, since it is part of the instruction.
		return;
		craig.topper (Done): Your emitPrefix function includes the 2 byte and 3 byte VEX prefixes, the 4 byte EVEX prefix, and the 3 byte XOP prefix. The bytes after the leading byte of those prefixes can be any byte value and do not indicate a segment value.
		PF->setAlignment(AlignBoundary);
		MaskRay (Done): You can remove `PF->setAlignment(AlignBoundary);` here.
		skan (Author, Done): The `MCBoundaryAlignFragment` may not emit anything, so the constructor of `MCBoundaryAlignFragment` doesn't set `AlignBoundary` to the value provided by the user. The data member `AlignBoundary` will not be correctly set here if `PF->setAlignment(AlignBoundary)` is removed.
		PF->setEmitNops(true);
		LuoYuanke (Done): Just to confirm that there is no MCBoundaryAlignFragment between the 2 macro fusion instructions. Right?
		skan (Author, Done): Right, there is none.
		return;
		} else if (shouldAddPrefix(PrevInst)) {
		// Macro fusion doesn't happen indeed, emit prefix before the previous
		// instruction.
		PF->setAlignment(AlignBoundary);
		MaskRay (Done): And here.
		skan (Author, Done): I think we cannot remove it.
		PF->setMaxBytesToEmit(GetRemainingPrefixSize(PrevInst));
		PF->setValue(choosePrefix(PrevInst));
		}
		}

		// Step 2:
		if (needAlignInst(Inst)) {
		// Handle the condition when the instruction to be aligned is unfused. Note:
		// When there is a MCAlignFragment inserted just before the instruction to
		// be aligned, e.g.
		//
		// \code
		// .align 16
		// je .Label0
		// \endcode
		//
		// We will not emit NOP before the instruction since the align directive is
		// used to align JCC rather than NOP.
		if (isa_and_nonnull<MCAlignFragment>(CF))
		return;
		// Emit NOP before the instruction to be aligned.
		auto *F = OS.getOrCreateBoundaryAlignFragment();
		F->setAlignment(AlignBoundary);
F->setEmitNops(true);		F->setEmitNops(true);
F->setFused(false);
} else if (NeedAlignFused && isFirstMacroFusibleInst(Inst, *MCII)) {		} else if (NeedAlignFused && isFirstMacroFusibleInst(Inst, *MCII)) {
// We don't know if macro fusion happens until the reaching the next		// We don't know if macro fusion happens until reaching the next
// instruction, so a place holder is put here if necessary.		// instruction, so a placeholder is put here if necessary.
getOrCreateBoundaryAlignFragment(OS);		OS.getOrCreateBoundaryAlignFragment();
		MaskRay (Done): If you place `F->setAlignment(AlignBoundary)` here, you can avoid 2 setAlignment calls in `X86AsmBackend::alignBranchesBegin`.
		skan (Author, Done): If a `MCBoundaryAlignFragment` is not used to emit anything, I prefer that it doesn't set the alignment. If we place `F->setAlignment(AlignBoundary)` here, the `setAlignment(AlignBoundary)` call is unnecessary for those `MCBoundaryAlignFragment`s that will not emit anything.
		} else if (shouldAddPrefix(Inst)) {
		// Emit prefixes before instruction that doesn't need to be aligned.
		auto *F = OS.getOrCreateBoundaryAlignFragment();
		F->setAlignment(AlignBoundary);
		F->setMaxBytesToEmit(GetRemainingPrefixSize(Inst));
		F->setValue(choosePrefix(Inst));
}		}

PrevInst = Inst;
}		}
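
The prefix budget computed by the `GetRemainingPrefixSize` lambda, and the way a required amount of padding could be spread over the instructions before a branch, can be sketched as follows. This is a minimal standalone sketch under stated assumptions: `remainingPrefixSize` mirrors the lambda shown in the diff, while `PaddingPlan` and `distributePadding` are hypothetical and simply fill the earliest instructions first, which is consistent with the expected output of align-branch-64-8a.s and align-branch-64-2d.s. The actual distribution happens during relaxation in MCAssembler.cpp, which is not shown in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors GetRemainingPrefixSize: how many more prefix bytes an instruction
// may receive, given the limit and the prefix bytes it already carries.
uint8_t remainingPrefixSize(uint8_t MaxPrefixSize, uint8_t ExistingPrefixSize) {
  return MaxPrefixSize > ExistingPrefixSize ? MaxPrefixSize - ExistingPrefixSize
                                            : 0;
}

// Hypothetical result type: prefix bytes given to each instruction, plus
// whatever could not be placed as prefixes.
struct PaddingPlan {
  std::vector<uint8_t> PrefixesPerInst;
  uint64_t Unfilled;
};

// Assumed policy: walk the instructions in order and give each one as many
// prefix bytes as its budget allows until the padding is used up.
PaddingPlan distributePadding(uint64_t Needed,
                              const std::vector<uint8_t> &Budgets) {
  PaddingPlan Plan{std::vector<uint8_t>(Budgets.size(), 0), 0};
  for (std::size_t I = 0; I < Budgets.size() && Needed > 0; ++I) {
    uint8_t Take =
        static_cast<uint8_t>(std::min<uint64_t>(Needed, Budgets[I]));
    Plan.PrefixesPerInst[I] = Take;
    Needed -= Take;
  }
  Plan.Unfilled = Needed;
  return Plan;
}
```

For example, in align-branch-64-8a.s the limit is 5 and each `cmpq` already carries one prefix byte (`48`), so each budget is 4; distributing 5 bytes gives 4 prefixes to the first `cmpq` and 1 to the second, matching the `2e 2e 2e 2e 48 39 c5` and `2e 48 39 c5` lines. According to the summary, NOP padding is used when there is not enough room for prefixes.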

/// Insert a MCBoundaryAlignFragment to mark the end of the branch to be aligned		/// Set the last fragment in the set of fragments to be aligned (which is
/// if necessary.		/// current fragment indeed) for BF and insert a new BF to prevent further
		/// instruction from being added to the current fragment if necessary.
void X86AsmBackend::alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) {		void X86AsmBackend::alignBranchesEnd(MCObjectStreamer &OS, const MCInst &Inst) {
if (!needAlign(OS))		if (!needAlign(OS))
return;		return;
// If the branch is emitted into a MCRelaxableFragment, we can determine the
// size of the branch easily in MCAssembler::relaxBoundaryAlign. When the		PrevInst = Inst;
		MaskRay (Done): It seems unnecessary to move it from alignBranchesBegin here.
		skan (Author, Done): The purpose of moving `PrevInst = Inst` here is to keep the early returns in `alignBranchesBegin` simple. Otherwise, we need to write `PrevInst = Inst; return` in `alignBranchesBegin` rather than `return`.
// branch is fused, the fused branch(macro fusion pair) must be emitted into
// two fragments. Or when the branch is unfused, the branch must be emitted		if (!needAlignInst(Inst))
// into one fragment. The MCRelaxableFragment naturally marks the end of the		return;
// fused or unfused branch.
// Otherwise, we need to insert a MCBoundaryAlignFragment to mark the end of		const MCFragment *CF = OS.getCurrentFragment();
// the branch. This MCBoundaryAlignFragment may be reused to emit NOP to align		for (auto *F = CF; F && F != LastFragmentToBeAligned &&
// other branch.		(F->hasInstructions() || isa<MCBoundaryAlignFragment>(F));
if (needAlignInst(Inst) && !isa<MCRelaxableFragment>(OS.getCurrentFragment()))		F = F->getPrevNode()) {
OS.insert(new MCBoundaryAlignFragment(AlignBoundary));		// The fragments to be aligned should be in the same section with this
		// fragment, and each non-BF fragment on the path from this fragment to the
		// fragments to be aligned must have a fixed size after finite times of
		// relaxation. Currently, we conservatively use hasInstruction to ensure
		// that.
		if (auto *BF = dyn_cast<MCBoundaryAlignFragment>(F)) {
		if (BF->hasEmitNopsOrValue())
		const_cast<MCBoundaryAlignFragment *>(BF)->setFragment(CF);
		// There is at most one MCBoundaryAlignFragment to align one instruction
		// if we only emit NOP to align instruction.
		if (AlignMaxPrefixSize == 0)
		break;
		}
		}

		LastFragmentToBeAligned = CF;

		// We need to ensure that no further instructions can be emitted into the
		// current fragment.
		//
		// If current fragment is a MCRelaxableFragment, then no more
		// instructions can be pushed into since MCRelaxableFragment only holds one
		// instruction.
		//
		// Otherwise, we need to insert a new BF to truncate the current fragment.
		// This MCBoundaryAlignFragment may be reused to emit NOP or segment override
		// prefix to align other instruction.

		if (!isa<MCRelaxableFragment>(OS.getCurrentFragment()))
		OS.insert(new MCBoundaryAlignFragment());

// Update the maximum alignment on the current section if necessary.		// Update the maximum alignment on the current section if necessary.
MCSection *Sec = OS.getCurrentSectionOnly();		MCSection *Sec = OS.getCurrentSectionOnly();
if (AlignBoundary.value() > Sec->getAlignment())		if (AlignBoundary.value() > Sec->getAlignment())
Sec->setAlignment(AlignBoundary);		Sec->setAlignment(AlignBoundary);
}		}

Optional<MCFixupKind> X86AsmBackend::getFixupKind(StringRef Name) const {		Optional<MCFixupKind> X86AsmBackend::getFixupKind(StringRef Name) const {
▲ Show 20 Lines • Show All 633 Lines • Show Last 20 Lines

llvm/test/MC/X86/align-branch-32-1a.s

	# Check NOP padding is disabled before instruction that has variant symbol operand.			## Check NOP/Prefix padding is disabled for instruction that has variant symbol operand.
	# RUN: llvm-mc -filetype=obj -triple i386-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=call %s | llvm-objdump -d - | FileCheck %s			# RUN: llvm-mc -filetype=obj -triple i386-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=call+jmp %s | llvm-objdump -d - | FileCheck %s
				# RUN: llvm-mc -filetype=obj -triple i386-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=call+jmp --x86-align-branch-prefix-size=4 %s | llvm-objdump -d - | FileCheck %s

	# CHECK: 00000000 foo:			# CHECK: 00000000 foo:
	# CHECK-COUNT-5: : 64 a3 01 00 00 00 movl %eax, %fs:1			# CHECK-COUNT-5: : 64 a3 01 00 00 00 movl %eax, %fs:1
	# CHECK: 1e: e8 fc ff ff ff calll {{.*}}			# CHECK: 1e: e8 fc ff ff ff calll {{.*}}
	# CHECK-COUNT-4: : 64 a3 01 00 00 00 movl %eax, %fs:1			# CHECK-COUNT-4: : 64 a3 01 00 00 00 movl %eax, %fs:1
	# CHECK: 3b: 55 pushl %ebp			# CHECK: 3b: 55 pushl %ebp
	# CHECK-NEXT: 3c: ff 91 00 00 00 00 calll *(%ecx)			# CHECK-NEXT: 3c: ff 91 00 00 00 00 calll *(%ecx)
	# CHECK-COUNT-4: : 64 a3 01 00 00 00 movl %eax, %fs:1			# CHECK-COUNT-4: : 64 a3 01 00 00 00 movl %eax, %fs:1
	Show All 28 Lines

llvm/test/MC/X86/align-branch-32-2a.s

This file was added.

				## Check no prefix is inserted after hardcode.
				# RUN: llvm-mc -filetype=obj -triple i386-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc+jmp --x86-align-branch-prefix-size=2 %s | llvm-objdump -d - | FileCheck %s

				# CHECK: 00000000 main:
				# CHECK-NEXT: 0: 2e 55 pushl %ebp
				# CHECK-NEXT: 2: 2e 89 e5 movl %esp, %ebp
				# CHECK-NEXT: 5: 3e 55 pushl %ebp
				# CHECK-COUNT-25: 55 pushl %ebp
				# CHECK-NEXT: 20: eb 00 jmp {{.*}}
				# CHECK: 00000022 infiniteLoop:
				# CHECK-NEXT: 22: eb dc jmp {{.*}}

				.text
				.globl infiniteLoop
				main:
				.byte 0x2e
				pushl %ebp
				.byte 0x2e
				movl %esp, %ebp
				.rept 26
				pushl %ebp
				.endr
				jmp infiniteLoop
				infiniteLoop:
				jmp main

llvm/test/MC/X86/align-branch-32-3a.s

This file was added.

				## Check appropriate prefix is chosen to prefix an instruction.
				# RUN: llvm-mc -filetype=obj -triple i386-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc+jmp --x86-align-branch-prefix-size=2 %s | llvm-objdump -d - | FileCheck %s

				# CHECK: 00000000 foo:
				# CHECK-NEXT: 0: 65 65 a3 01 00 00 00 movl %eax, %gs:1
				# CHECK-NEXT: 7: 3e 55 pushl %ebp
				# CHECK-NEXT: 9: 57 pushl %edi
				# CHECK-COUNT-2: : 55 pushl %ebp
				# CHECK: c: 89 e5 movl %esp, %ebp
				# CHECK-NEXT: e: 89 7d f8 movl %edi, -8(%ebp)
				# CHECK-COUNT-5: : 89 75 f4 movl %esi, -12(%ebp)
				# CHECK: 20: 39 c5 cmpl %eax, %ebp
				# CHECK-NEXT: 22: 74 5e je {{.*}}
				# CHECK-NEXT: 24: 3e 89 73 f4 movl %esi, %ds:-12(%ebx)
				# CHECK-NEXT: 28: 89 75 f4 movl %esi, -12(%ebp)
				# CHECK-NEXT: 2b: 89 7d f8 movl %edi, -8(%ebp)
				# CHECK-COUNT-5: : 89 75 f4 movl %esi, -12(%ebp)
				# CHECK-COUNT-3: : 5d popl %ebp
				# CHECK: 40: 74 40 je {{.*}}
				# CHECK-NEXT: 42: 5d popl %ebp
				# CHECK-NEXT: 43: 74 3d je {{.*}}
				# CHECK-NEXT: 45: 36 89 44 24 fc movl %eax, %ss:-4(%esp)
				# CHECK-NEXT: 4a: 89 75 f4 movl %esi, -12(%ebp)
				# CHECK-NEXT: 4d: 89 7d f8 movl %edi, -8(%ebp)
				# CHECK-COUNT-5: : 89 75 f4 movl %esi, -12(%ebp)
				# CHECK: 5f: 5d popl %ebp
				# CHECK-NEXT: 60: eb 26 jmp {{.*}}
				# CHECK-NEXT: 62: eb 24 jmp {{.*}}
				# CHECK-NEXT: 64: eb 22 jmp {{.*}}
				# CHECK-NEXT: 66: 89 45 fc movl %eax, -4(%ebp)
				# CHECK-NEXT: 69: 89 75 f4 movl %esi, -12(%ebp)
				# CHECK-NEXT: 6c: 89 7d f8 movl %edi, -8(%ebp)
				# CHECK-COUNT-3: : 89 75 f4 movl %esi, -12(%ebp)
				# CHECK-COUNT-2: : 5d popl %ebp
				# CHECK-NEXT: 7a: 39 c5 cmpl %eax, %ebp
				# CHECK-NEXT: 7c: 74 04 je {{.*}}
				# CHECK-COUNT-2: : 90 nop
				# CHECK-NEXT: 80: eb 06 jmp {{.*}}
				# CHECK-NEXT: 82: 8b 45 f4 movl -12(%ebp), %eax
				# CHECK-NEXT: 85: 89 45 fc movl %eax, -4(%ebp)
				# CHECK-COUNT-4: : 89 b5 50 fb ff ff movl %esi, -1200(%ebp)
				# CHECK: a0: 89 75 0c movl %esi, 12(%ebp)
				# CHECK-NEXT: a3: e9 fc ff ff ff jmp {{.*}}
				# CHECK-COUNT-4: : 89 b5 50 fb ff ff movl %esi, -1200(%ebp)
				# CHECK: c0: 89 75 00 movl %esi, (%ebp)
				# CHECK-NEXT: c3: 74 c3 je {{.*}}
				# CHECK-NEXT: c5: 74 c1 je {{.*}}

				.text
				.globl foo
				.p2align 4
				foo:
				movl %eax, %gs:0x1
				pushl %ebp
				pushl %edi
				.rept 2
				pushl %ebp
				.endr
				movl %esp, %ebp
				movl %edi, -8(%ebp)
				.rept 5
				movl %esi, -12(%ebp)
				.endr
				cmp %eax, %ebp
				je .L_2
				movl %esi, -12(%ebx)
				movl %esi, -12(%ebp)
				movl %edi, -8(%ebp)
				.rept 5
				movl %esi, -12(%ebp)
				.endr
				.rept 3
				popl %ebp
				.endr
				je .L_2
				popl %ebp
				je .L_2
				movl %eax, -4(%esp)
				movl %esi, -12(%ebp)
				movl %edi, -8(%ebp)
				.rept 5
				movl %esi, -12(%ebp)
				.endr
				popl %ebp
				jmp .L_3
				jmp .L_3
				jmp .L_3
				movl %eax, -4(%ebp)
				movl %esi, -12(%ebp)
				movl %edi, -8(%ebp)
				.rept 3
				movl %esi, -12(%ebp)
				.endr
				.rept 2
				popl %ebp
				.endr
				cmp %eax, %ebp
				je .L_2
				jmp .L_3
				.L_2:
				movl -12(%ebp), %eax
				movl %eax, -4(%ebp)
				.L_3:
				.rept 4
				movl %esi, -1200(%ebp)
				.endr
				movl %esi, 12(%ebp)
				jmp bar
				.rept 4
				movl %esi, -1200(%ebp)
				.endr
				movl %esi, (%ebp)
				je .L_3
				je .L_3

llvm/test/MC/X86/align-branch-32-4a.s

This file was added.

				## Check prefix of instruction is limited by option --x86-align-branch-prefix-size=NUM.
				# RUN: llvm-mc -filetype=obj -triple i386-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc+jmp --x86-align-branch-prefix-size=4 %s | llvm-objdump -d - | FileCheck %s

				# CHECK: 00000000 foo:
				# CHECK-NEXT: 0: 3e 66 0f 3a 60 00 03 pcmpestrm $3, %ds:(%eax), %xmm0
				# CHECK-NEXT: 7: 3e c4 e3 79 60 00 03 vpcmpestrm $3, %ds:(%eax), %xmm0
				# CHECK-NEXT: e: 65 65 65 a3 01 00 00 00 movl %eax, %gs:1
				# CHECK-COUNT-3: : 89 75 f4 movl %esi, -12(%ebp)
				# CHECK-NEXT: 1f: 55 pushl %ebp
				# CHECK-NEXT: 20: a8 04 testb $4, %al
				# CHECK-NEXT: 22: 70 dc jo {{.*}}

				.text
				.globl foo
				.p2align 4
				foo:
				.L1:
				pcmpestrm $3, (%eax), %xmm0
				vpcmpestrm $3, (%eax), %xmm0
				movl %eax, %gs:0x1
				movl %esi, -12(%ebp)
				movl %esi, -12(%ebp)
				movl %esi, -12(%ebp)
				pushl %ebp
				testb $0x4,%al
				jo .L1

llvm/test/MC/X86/align-branch-64-1e.s

This file was added.

				## Check only fused conditional jumps, conditional jumps and unconditional jumps are aligned with option --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc+jmp --x86-align-branch-prefix-size=4
				# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc+jmp --x86-align-branch-prefix-size=4 %p/Inputs/align-branch-64-1.s | llvm-objdump -d - > %t1
				# RUN: FileCheck --input-file=%t1 %s

				# CHECK: 0000000000000000 foo:
				# CHECK-NEXT: 0: 64 64 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-COUNT-2: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK: 1b: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 1e: 31 c0 xorl %eax, %eax
				# CHECK-NEXT: 20: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 23: 74 5d je {{.*}}
				# CHECK-NEXT: 25: 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-COUNT-2: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK: 3e: 31 c0 xorl %eax, %eax
				# CHECK-NEXT: 40: 74 40 je {{.*}}
				# CHECK-NEXT: 42: 5d popq %rbp
				# CHECK-NEXT: 43: 74 3d je {{.*}}
				# CHECK-NEXT: 45: 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-COUNT-2: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK: 5e: 31 c0 xorl %eax, %eax
				# CHECK-NEXT: 60: eb 26 jmp {{.*}}
				# CHECK-NEXT: 62: eb 24 jmp {{.*}}
				# CHECK-NEXT: 64: eb 22 jmp {{.*}}
				# CHECK-COUNT-2: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK: 76: 89 45 fc movl %eax, -4(%rbp)
				# CHECK-NEXT: 79: 5d popq %rbp
				# CHECK-NEXT: 7a: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 7d: 74 03 je {{.*}}
				# CHECK-NEXT: 7f: 90 nop
				# CHECK-NEXT: 80: eb 06 jmp {{.*}}
				# CHECK-NEXT: 82: 8b 45 f4 movl -12(%rbp), %eax
				# CHECK-NEXT: 85: 89 45 fc movl %eax, -4(%rbp)
				# CHECK-COUNT-10: : 89 b5 50 fb ff ff movl %esi, -1200(%rbp)
				# CHECK: c4: eb c2 jmp {{.*}}
				# CHECK-NEXT: c6: c3 retq

llvm/test/MC/X86/align-branch-64-2d.s

This file was added.

				## Check only indirect jumps and calls are aligned with option --x86-align-branch-boundary=32 --x86-align-branch=indirect+call --x86-align-branch-prefix-size=4
				# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=indirect+call --x86-align-branch-prefix-size=4 %p/Inputs/align-branch-64-2.s | llvm-objdump -d - | FileCheck %s
				MaskRay (Not Done): When writing tests, make sure llvm-mc and GNU as emit jmp of the same length. There are differences (D72197).
				skan (Author, Done): As I understand it, you are referring to the reallocation of call? I will update the tests when D72197 has landed.

				# CHECK: 0000000000000000 foo:
				# CHECK-NEXT: 0: 64 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-COUNT-2: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-COUNT-2: : 89 75 f4 movl %esi, -12(%rbp)
				# CHECK: 20: ff e0 jmpq *%rax
				# CHECK-NEXT: 22: 64 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-COUNT-2: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK: 3c: 89 75 f4 movl %esi, -12(%rbp)
				# CHECK-NEXT: 3f: 55 pushq %rbp
				# CHECK-NEXT: 40: ff d0 callq *%rax
				# CHECK-NEXT: 42: 64 64 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-NEXT: 4d: 64 64 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-NEXT: 57: 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK-NEXT: 5f: 55 pushq %rbp
				# CHECK-NEXT: 60: e8 9b ff ff ff callq {{.*}}
				# CHECK-COUNT-4: : 64 89 04 25 01 00 00 00 movl %eax, %fs:1
				# CHECK: 85: ff 14 25 00 00 00 00 callq *0

llvm/test/MC/X86/align-branch-64-7a.s

This file was added.

				## Check no prefix is added to the instruction if there is an align directive between the instruction and the target branch
				# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=jmp --x86-align-branch-prefix-size=5 %s | llvm-objdump -d - | FileCheck %s

				# CHECK: 0000000000000000 test1:
				# CHECK-NEXT: 0: 31 d2 xorl %edx, %edx
				# CHECK-NEXT: 2: 89 8c 24 84 00 00 00 movl %ecx, 132(%rsp)
				# CHECK-NEXT: 9: 4c 89 c1 movq %r8, %rcx
				# CHECK-NEXT: c: 4c 8b 8c 24 88 00 00 00 movq 136(%rsp), %r9
				# CHECK-COUNT-4: : 90 nop
				# CHECK: 18: 66 66 90 nop
				# CHECK-NEXT: 1b: 2e 2e 4c 89 c1 movq %r8, %rcx
				# CHECK-NEXT: 20: eb de jmp {{.*}}
				# CHECK-NEXT: 22: c3 retq

				.text
				.globl test1
				test1:
				.Ltmp0:
				xorl %edx, %edx
				movl %ecx, 132(%rsp)
				movq %r8, %rcx
				movq 136(%rsp), %r9
				.p2align 3, 0x90
				.byte 102
				.byte 102
				nop
				movq %r8, %rcx
				jmp .Ltmp0
				retq

llvm/test/MC/X86/align-branch-64-8a.s

This file was added.

				## Check that the case where multiple CMPs are followed by a jcc is handled correctly.
				# RUN: llvm-mc -filetype=obj -triple x86_64-unknown-unknown --x86-align-branch-boundary=32 --x86-align-branch=fused+jcc --x86-align-branch-prefix-size=5 %s | llvm-objdump -d - | FileCheck %s
				MaskRay (Done): Use `##` for comments. `x86_64-unknown-unknown` can be simplified to `x86_64` (the default is ELF).
				skan (Author, Done): I prefer to use `x86_64-unknown-unknown`; it seems clearer to me.

				# CHECK: 0000000000000000 test1:
				# CHECK-NEXT: 0: 2e 2e 2e 2e 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 7: 2e 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: b: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: e: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 11: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 14: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 17: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 1a: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 1d: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 20: 48 39 c5 cmpq %rax, %rbp
				# CHECK-NEXT: 23: 74 db je {{.*}}

				.text
				.globl test1
				test1:
				.Ltmp0:
				.rept 10
				cmp %rax, %rbp
				.endr
				je .Ltmp0
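
The amount of padding in the expected output above can be reproduced with a short calculation. This is a minimal sketch under an assumed rule: a branch (or fused pair) needs padding if it crosses a 32-byte boundary or ends exactly on one, and the padding moves its start to the next boundary. The function names are hypothetical; the real computation lives in MCAssembler.cpp, which is not shown in this section.

```cpp
#include <cassert>
#include <cstdint>

// True if the byte range [Start, Start + Size) spans two Boundary-sized
// windows.
bool crossesBoundary(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  return Start / Boundary != (Start + Size - 1) / Boundary;
}

// True if the byte range ends exactly on a boundary.
bool endsAtBoundary(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  return (Start + Size) % Boundary == 0;
}

// Bytes of padding to insert before Start so that the range begins on the
// next boundary; 0 if the range is already acceptable.
uint64_t paddingNeeded(uint64_t Start, uint64_t Size, uint64_t Boundary) {
  assert(Boundary > 0 && Size > 0 && Size < Boundary);
  if (!crossesBoundary(Start, Size, Boundary) &&
      !endsAtBoundary(Start, Size, Boundary))
    return 0;
  return Boundary - Start % Boundary;
}
```

In align-branch-64-8a.s, the unpadded fused pair (the last `cmpq` plus `je`, 5 bytes) would start at 0x1b and end at 0x20, so `paddingNeeded(0x1b, 5, 32)` is 5. That matches the five `2e` prefix bytes in the expected output, after which the pair starts at 0x20. In align-branch-64-7a.s, the unpadded 2-byte `jmp` would start at 0x1e, giving a padding of 2, which matches the `2e 2e` on the preceding `movq`.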

Revision Contents

Diff 237482
