Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre.
Closed, Public

Authored by chandlerc on Jan 4 2018, 1:14 AM.

Details

Summary

First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
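
As a sketch of the rewrite (the block and value names here are illustrative, not the pass's actual output), an indirectbr over blockaddress values becomes a switch over small integers:

```llvm
; Before: an indirect branch over blockaddress values, which lowers
; to a poisonable indirect jump.
define void @f(i8* %dest) {
entry:
  indirectbr i8* %dest, [label %bb1, label %bb2]
bb1:
  ret void
bb2:
  ret void
}

; After (conceptually): each blockaddress is replaced by a small
; integer, and a single switch dispatches to the successor blocks
; using conditional branches only.
define void @f.expanded(i32 %dest.index) {
entry:
  switch i32 %dest.index, label %bb1 [
    i32 1, label %bb2
  ]
bb1:
  ret void
bb2:
  ret void
}
```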

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
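
Concretely, the x86-64 thunk looks roughly like the following sketch (shown here as the r11-based variant with illustrative labels; the lfence in the capture loop was added later in this review for AMD processors):

```asm
__llvm_retpoline_r11:
        call    .Lset_up_target    # push return address; CPU predicts the
                                   # return will come back here
.Lcapture_spec:                    # speculative execution is trapped in
        pause                      # this loop instead of running a gadget
        lfence
        jmp     .Lcapture_spec
.Lset_up_target:
        movq    %r11, (%rsp)       # smash the pushed return address with
                                   # the real target of the indirect call
        ret                        # actual control transfer goes to *%r11
```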

On 64-bit x86 ABIs, this is especially easy to do in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch -mretpoline is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with -mretpoline and additionally link the dynamic
executable and all shared libraries with LLD and pass -z retpolineplt
(or use similar functionality from some other linker). We strongly
recommend also using -z now as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
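
Putting these flags together for a dynamically linked program (file names and paths here are illustrative), a fully mitigated build looks roughly like:

```shell
# Compile every object -- including those of all shared libraries the
# executable loads -- with the mitigation enabled.
clang -mretpoline -c app.c -o app.o

# Link with LLD, emitting retpoline-style PLT entries, and force
# non-lazy binding so the mitigated PLT can be substantially smaller.
clang -fuse-ld=lld -Wl,-z,retpolineplt -Wl,-z,now app.o -o app
```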

When manually applying transformations similar to -mretpoline to the
Linux kernel, we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-call-, or
virtual-call-heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the
time-sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

chandlerc added inline comments.Jan 4 2018, 4:32 PM
llvm/lib/CodeGen/IndirectBrExpandPass.cpp
113

Yeah, I'm convinced. It actually makes the code *way* simpler. I've implemented all of the suggestions here.

llvm/lib/Target/X86/X86InstrCompiler.td
1160

Awesome, applied!

ab accepted this revision.Jan 4 2018, 5:00 PM
ab added a subscriber: ab.

Clang/LLVM pieces LGTM. Thanks for doing this, people.

llvm/lib/CodeGen/IndirectBrExpandPass.cpp
56

Nit: Unnecessary 'private'?

llvm/lib/Target/X86/X86ISelLowering.cpp
27120

I wonder: should this also check the CSR regmask? There are a couple CCs that do preserve R11, but it probably doesn't matter for -mretpoline users. E.g.,

define void @t(void ()* %f) {
  call cc 17 void %f()
  ret void
}

Worth an assert?

llvm/lib/Target/X86/X86RetpolineThunks.cpp
119

'retq' at the end

171

Nit: newlines between functions

183

'noinline' makes sense, but I'm curious: is it necessary?

llvm/lib/Target/X86/X86Subtarget.h
344

Nit: "Use retpoline to avoid ..."

MaskRay added a subscriber: MaskRay.Jan 4 2018, 5:58 PM
chandlerc updated this revision to Diff 128698.Jan 4 2018, 6:06 PM

Add support for externally provided thunks. This is an independent feature;
when combined with the overall retpoline feature it suppresses the thunk
emission and rotates the names to be distinct names that an external build
system for the kernel (for example) can provide.

I've added some minimal documentation about the semantic requirements of these
thunks to the commit log, although it is fairly obvious. More comprehensive
documentation will be part of the large follow-up effort around docs.

Also adds Rui's work to provide more efficient PLT thunks.

Also addresses remaining feedback from Ahmed.

In D41723#967861, @ruiu wrote:

Chandler,

Please apply https://reviews.llvm.org/D41744 to this patch. It includes the following changes:

  1. xchg is replaced with mov/pop instructions
  2. x86-64 lazy PLT relocation target is now aligned to 16 byte
  3. the x86-64 PLT header for lazy PLT resolution is shrunk from 48 bytes to 32 bytes (which became possible by utilizing the space made by (2))

Done, and awesome!

Reid, do you want me to adjust the code for the 32-bit _push thunk?

chandlerc updated this revision to Diff 128700.Jan 4 2018, 6:12 PM
chandlerc marked 16 inline comments as done.

Add the new test file for external thunks.

chandlerc marked an inline comment as done.Jan 4 2018, 6:13 PM

All comments addressed now I think, thanks everyone! See some thoughts about a few of these below though.

llvm/lib/Target/X86/X86ISelLowering.cpp
27120

We should already have the assert below? If we get through this and have no available reg, we assert that we're in 32-bit.

That said, this really is a case that will come up. We shouldn't silently miscompile the code. I've changed both cases to report_fatal_error because these are fundamental incompatibilities with retpoline.

llvm/lib/Target/X86/X86RetpolineThunks.cpp
183

Nope, not necessary. I've removed it.

Actually, I don't think any of these are necessary because we hand build the MI and we are the last pass before the asm printer. But I've left the others for now. They seem at worst harmless.

I had one comment I think got missed: there are some references to X86::CALL64r/X86::CALL64m in X86FrameLowering.cpp and X86MCInstLower.cpp which look like they could be relevant, but aren't addressed by this patch.

ruiu added a comment.Jan 4 2018, 9:43 PM

Chandler, please apply https://reviews.llvm.org/P8058 to address latest review comments.

eax added a subscriber: eax.Jan 5 2018, 12:30 AM
chandlerc updated this revision to Diff 128715.Jan 5 2018, 1:51 AM
chandlerc marked an inline comment as done.

Rebase and fold in the latest batch of changes from Rui based on the review
from Rafael.

royger added a comment.Jan 5 2018, 1:58 AM

Add support for externally provided thunks. This is an independent feature;
when combined with the overall retpoline feature it suppresses the thunk
emission and rotates the names to be distinct names that an external build
system for the kernel (for example) can provide.

I've added some minimal documentation about the semantic requirements of these
thunks to the commit log, although it is fairly obvious. More comprehensive
documentation will be part of the large follow-up effort around docs.

Thanks! I'm however not seeing the updated commit message that contains the usage documentation of the new option in the differential revision.

AFAICT from the code, the new option is going to be named "mretpoline_external_thunk", could we have a more generic name that could be used by all options, like:

-mindirect-thunk={retpoline,external}

This should also allow clang to implement new techniques as they become available, without having to add new options for each one of them.

Add support for externally provided thunks. This is an independent feature;
when combined with the overall retpoline feature it suppresses the thunk
emission and rotates the names to be distinct names that an external build
system for the kernel (for example) can provide.

I've added some minimal documentation about the semantic requirements of these
thunks to the commit log, although it is fairly obvious. More comprehensive
documentation will be part of the large follow-up effort around docs.

Thanks! I'm however not seeing the updated commit message that contains the usage documentation of the new option in the differential revision.

Yeah, not sure why phab doesn't update that -- it will be updated when it lands.

AFAICT from the code, the new option is going to be named "mretpoline_external_thunk", could we have a more generic name that could be used by all options, like:

-mindirect-thunk={retpoline,external}

This should also allow clang to implement new techniques as they become available, without having to add new options for each one of them.

Adding options is actually cheaper than adding option parsing like this...

I'm hesitant to name the option in the external case something completely generic, because it isn't as generic as it seems IMO. We hide indirect calls and indirect branches, but we *don't* hide returns which are technically still indirects... The reason is that we are assuming that rewriting arbitrary indirects into a return is an effective mitigation. If it weren't, we'd also need to rewrite returns. This is a large component of the discussion in the linked detailed article around RSB filling -- we have to use some *other* mechanisms to defend ret, and then we rely on them being "safe".

As a consequence to all of this, this flag to clang shouldn't be used for *arbitrary* mitigations. It really should be used for mitigations stemming from return-based trampolining of indirect branches.

If we want truly generic thunk support in the future, we should add that under a separate flag name, and it should likely apply to returns as well as other forms of indirects. I know there has been some discussion about using lfence to protect indirect branch and call on AMD processors, but I'm still extremely dubious about this being an advisable mitigation. I have *significantly* more confidence in the retpoline technique being a truly effective mitigation against the attacks, and the performance we observe in the wild really is completely acceptable (for the Linux kernel, hard to even measure outside of contrived micro-benchmarks).

chandlerc updated this revision to Diff 128721.Jan 5 2018, 2:46 AM

Add explicit checks for various lowerings that would need direct retpoline
support that are not yet implemented. These are constructs that don't show up
in typical C, C++, and other static languages today: state points, patch
points, stack probing and morestack.

I had one comment I think got missed: there are some references to X86::CALL64r/X86::CALL64m in X86FrameLowering.cpp and X86MCInstLower.cpp which look like they could be relevant, but aren't addressed by this patch.

Sorry for failing to respond previously...

I've audited all of these. They should have already triggered a hard failure if we made it through ISel though. I've added explicit and more user-friendly fatal errors if we hit these cases.

The only really interesting cases are statepoints and patchpoints. The other cases are only relevant when in the large code model on x86-64 which is pretty much riddled with bugs already. =/

For AMD processors we may be able to handle indirect jumps via a simpler lfence mechanism. Indirect calls may still require retpoline. If this turns out to be the right solution for AMD processors we may need to put some code in to support this.

Thanks! I'm however not seeing the updated commit message that contains the usage documentation of the new option in the differential revision.

Yeah, not sure why phab doesn't update that -- it will be updated when it lands.

Phabricator keeps the summary field of a revision independent of the commit message (beyond the initial upload). You can use arc diff --verbatim to force the summary to be re-copied from the commit message. However, running that will also update the Reviewers and Subscribers fields with their values in the commit message, which would probably drop a lot of subscribers in this case, so I wouldn't recommend it. You can always Edit Revision and update the summary manually if you really want.

The only really interesting cases are statepoints and patchpoints. The other cases are only relevant when in the large code model on x86-64 which is pretty much riddled with bugs already. =/

To be explicit, you can ignore statepoints and patchpoints for the moment. As the only major user of this functionality, we'll follow up with patches if needed. We're still in the process of assessing actual risk in our environment, but at the moment it looks like we likely won't need this functionality. (Obviously, subject to change as we learn more.)

Any more comments? I'd love to land this and start the backporting and merging process. =D

FWIW, I've built and linked the test suite with this in various modes, both 64-bit and 32-bit, and have no functional failures. I've not done any specific performance measurements using the LLVM test suite, but you can see our initial (very rough) performance data in the OP.

For AMD processors we may be able to handle indirect jumps via a simpler lfence mechanism. Indirect calls may still require retpoline. If this turns out to be the right solution for AMD processors we may need to put some code in to support this.

Yeah, if it ends up that we want non-retpoline mitigations for AMD we can and should add them. One hope I have is that this patch is at least generically *sufficient* (when paired with correct RSB filling) even if it is suboptimal in some cases and we end up adding more precise tools later.

(Er sorry, have two updates from Rafael I need to actually include, will be done momentarily...)

chandlerc updated this revision to Diff 128834.Jan 5 2018, 5:45 PM

Teach the thunk emission to put them in comdats and enhance tests to verify
this.

Also add test coverage for nonlazybind calls which on 64-bit architectures
require retpoline there despite no user written indirect call. This already
worked, but Rafael rightly pointed out we should test it.

For AMD processors we may be able to handle indirect jumps via a simpler lfence mechanism. Indirect calls may still require retpoline. If this turns out to be the right solution for AMD processors we may need to put some code in to support this.

Yeah, if it ends up that we want non-retpoline mitigations for AMD we can and should add them. One hope I have is that this patch is at least generically *sufficient* (when paired with correct RSB filling) even if it is suboptimal in some cases and we end up adding more precise tools later.

Just to say that at Sony we're still doing our investigation and might be interested in lfence. But who knows, we might just zap the predictor on syscalls and context switches; for environments that have mostly a few long-running processes with comparatively few syscalls it might be net cheaper than making every indirection more expensive.

For AMD processors we may be able to handle indirect jumps via a simpler lfence mechanism. Indirect calls may still require retpoline. If this turns out to be the right solution for AMD processors we may need to put some code in to support this.

Yeah, if it ends up that we want non-retpoline mitigations for AMD we can and should add them. One hope I have is that this patch is at least generically *sufficient* (when paired with correct RSB filling) even if it is suboptimal in some cases and we end up adding more precise tools later.

Just to say that at Sony we're still doing our investigation and might be interested in lfence. But who knows, we might just zap the predictor on syscalls and context switches; for environments that have mostly a few long-running processes with comparatively few syscalls it might be net cheaper than making every indirection more expensive.

But retpoline doesn't make every indirection more expensive any more or less than zapping the predictor... You only build the code running in the privileged domain with retpoline, not all of the code, and they both accomplish very similar things.

The performance difference we see between something like retpoline and disabling the predictor on context switches is very significant (retpoline is much, much cheaper).

A good way to think about the cost of these things is this. The cost of retpoline we have observed on the kernel:

  1. the cost of executing the system call with "broken" indirect branch prediction (i.e., reliably mispredicted), plus
  2. the cost of a few extra instructions (very, very few cycles)

Both of these are very effectively mitigated by efforts to remove hot indirect branches from the system call code in the kernel. Because of the nature of most kernels, this tends to be pretty easy and desirable for performance anyways.

By comparison, the cost of toggling off the predictor is:

  1. the exact same cost as #1 above, plus
  2. the cost of toggling the MSR on every context switch

This second cost, very notably, cannot be meaningfully mitigated by PGO, or hand-implemented hot-path specializations without an indirect branch. And our measurements on Intel hardware at least show that this cost of toggling is actually the dominant cost by a very large margin.

So, you should absolutely measure the impact of the AMD solutions you have on your AMD hardware as it may be very significantly different. But I wanted to set the expectation correctly based on what limited experience we have so far (sadly only on Intel hardware).

But retpoline doesn't make every indirection more expensive any more or less than zapping the predictor... You only build the code running in the privileged domain with retpoline, not all of the code, and they both accomplish very similar things.

I do understand that it applies only to privileged code.

The performance difference we see between something like retpoline and disabling the predictor on context switches is very significant (retpoline is much, much cheaper).

I expect you are measuring this in a normal timesharing environment, which is not what we have (more below).

A good way to think about the cost of these things is this. The cost of retpoline we have observed on the kernel:

  1. the cost of executing the system call with "broken" indirect branch prediction (i.e., reliably mispredicted), plus
  2. the cost of a few extra instructions (very, very few cycles)

    Both of these are very effectively mitigated by efforts to remove hot indirect branches from the system call code in the kernel. Because of the nature of most kernels, this tends to be pretty easy and desirable for performance anyways.

Right.

By comparison, the cost of toggling off the predictor is:

  1. the exact same cost as #1 above, plus
  2. the cost of toggling the MSR on every context switch

    This second cost, very notably, cannot be meaningfully mitigated by PGO, or hand-implemented hot-path specializations without an indirect branch. And our measurements on Intel hardware at least show that this cost of toggling is actually the dominant cost by a very large margin.

As I said above, I expect you are measuring this on a normal timesharing system. A game console is not a normal timesharing environment. We reserve a core for the system and the game gets the rest of them. My understanding is that the game is one process and it essentially doesn't get process-context-switched (this understanding could be very flawed), although it probably does get thread-context-switched (which should still be cheap as there is no protection-domain transition involved). If there basically aren't any process context switches while running a game (which is when performance matters most), then moving even a small in-process execution cost to a larger context-switch cost is a good thing, not a bad thing. We have hard-real-time constraints to worry about.

So, you should absolutely measure the impact of the AMD solutions you have on your AMD hardware as it may be very significantly different. But I wanted to set the expectation correctly based on what limited experience we have so far (sadly only on Intel hardware).

I appreciate all the work you've done on this, and your sharing of your findings. Unfortunately on our side we are playing catch-up as we were unaware of the problem before the public announcements and the publication of this patch. We're definitely doing research and our own measurements to understand what is our best way forward.

(If it wasn't obvious, I'm not standing in the way of the patch, just noting the AMD thing which clearly can be a follow-up if it seems to be a better choice there.)

MatzeB added a subscriber: MatzeB.Jan 8 2018, 1:20 PM
MatzeB added inline comments.
llvm/lib/Target/X86/X86RetpolineThunks.cpp
83–84

We need this pass for "correctness" and should never skip it I think.

lattera added a subscriber: lattera.Jan 9 2018, 6:23 AM

FYI: I've imported this patch into a feature branch in HardenedBSD's playground repo. All of HardenedBSD (world + kernel) is compiled with it. I've been running it on my laptop for the past couple days without a single issue. I've enabled it locally on my box for a few applications, notably Firefox and ioquake3. Both work without issue. I plan to start an experimental package build tomorrow with it applied to the entire ports tree (around 29,400 packages).

chandlerc updated this revision to Diff 129849.Jan 15 2018, 5:50 AM

Re-implement the indirectbr removal pass based on Rafael's suggestion. It now
creates a single switch block and threads all the indirictbr's in the function
through that block. This should give substantially smaller code especially in
the case of using retpoline where the switch block expands to a search tree
that we probably don't want to duplicate 100s of times.

chandlerc updated this revision to Diff 129850.Jan 15 2018, 5:55 AM
chandlerc marked an inline comment as done.

Couple of other minor fixes.

Ok, I think this is all done. Rafael, I think I've implemented your suggestion as well and it still passes all my tests (including the test-suite) and a bunch of internal code I have.

Any last comments?

llvm/lib/Target/X86/X86RetpolineThunks.cpp
83–84

Agreed and done.

jyknight added a comment.EditedJan 15 2018, 6:45 PM

Per kernel https://marc.info/?l=linux-kernel&m=151580566622935&w=2 and gcc https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01059.html, it seems AMD needs there to be an lfence in the speculation trap (and the pause is not useful for them, but does no harm). There seems to be some speculation (but no confirmation yet?) that pause *is* necessary vs lfence on intel. So in order to work generically, they seem to be suggesting using both instructions:

loop:
  pause
  lfence
  jmp loop

Some more links
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01209.html
and final patch:
https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b

Per kernel https://marc.info/?l=linux-kernel&m=151580566622935&w=2 and gcc https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01059.html, it seems AMD needs there to be an lfence in the speculation trap (and the pause is not useful for them, but does no harm). There seems to be some speculation (but no confirmation yet?) that pause *is* necessary vs lfence on intel. So in order to work generically, they seem to be suggesting using both instructions:

loop:
  pause
  lfence
  jmp loop

Some more links
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01209.html
and final patch:
https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b

Yes, for AMD we require an "lfence" instruction after the "pause" in the "retpoline" loop filler. This solution has already been accepted in GCC and the Linux kernel.
Can you please do the same in LLVM as well?

Per kernel https://marc.info/?l=linux-kernel&m=151580566622935&w=2 and gcc https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01059.html, it seems AMD needs there to be an lfence in the speculation trap (and the pause is not useful for them, but does no harm). There seems to be some speculation (but no confirmation yet?) that pause *is* necessary vs lfence on intel. So in order to work generically, they seem to be suggesting using both instructions:

loop:
  pause
  lfence
  jmp loop

Some more links
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01209.html
and final patch:
https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b

Thanks for digging all of this up, but I have to say that it would be really awesome if folks from AMD would actually comment on this thread and/or patch rather than us relaying things 2nd and 3rd hand....

I'll look at implementing this, but I'm not super thrilled to change so much of the code at this point. The code as-is is secure, and merely power-inefficient on AMD chips. I'd like to fix that, but if it creates problems in testing, I'm inclined to wait for AMD to actually join the discussion.

Per kernel https://marc.info/?l=linux-kernel&m=151580566622935&w=2 and gcc https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01059.html, it seems AMD needs there to be an lfence in the speculation trap (and the pause is not useful for them, but does no harm). There seems to be some speculation (but no confirmation yet?) that pause *is* necessary vs lfence on intel. So in order to work generically, they seem to be suggesting using both instructions:

loop:
  pause
  lfence
  jmp loop

Some more links
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01209.html
and final patch:
https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b

Per kernel https://marc.info/?l=linux-kernel&m=151580566622935&w=2 and gcc https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01059.html, it seems AMD needs there to be an lfence in the speculation trap (and the pause is not useful for them, but does no harm). There seems to be some speculation (but no confirmation yet?) that pause *is* necessary vs lfence on intel. So in order to work generically, they seem to be suggesting using both instructions:

loop:
  pause
  lfence
  jmp loop

Some more links
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01209.html
and final patch:
https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b

Yes, for AMD we require an "lfence" instruction after the "pause" in the "retpoline" loop filler. This solution has already been accepted in GCC and the Linux kernel.
Can you please do the same in LLVM as well?

Ahh, I see my email crossed yours, sorry.

Have you tested adding 'lfence' and this patch on any AMD platforms? Do you have any results? Can you confirm that these patches are actually working?

I'll look at implementing this, but I'm not super thrilled to change so much of the code at this point.

It seems potentially reasonable to me to commit as-is, and do that as a follow-on patch.

I still think the interface should be fixed, so that later on when lfence (or other thunks) is added clang doesn't end up with:

-mretpoline_external_thunk and -mlfence_external_thunk

Which doesn't make sense. IMHO the interface should be something like:

-mindirect-thunk={retpoline,lfence,external,...}

Or similar, ie: a single option that allows the user to select the thunk to use.

gcc is using such interface:

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01041.html

anujm1 added a subscriber: anujm1.Jan 16 2018, 5:35 PM


Given the lack of a test case for this issue, we tested SPEC2k17 with the GCC 'retpoline' patch ('pause' vs. 'pause+lfence') on AMD Zen.
There is no overhead from adding 'lfence' after 'pause'.

Also, for AMD we need 'lfence' because it is a dispatch-serializing instruction.
Please refer to: https://www.spinics.net/lists/kernel/msg2697621.html

chandlerc updated this revision to Diff 130532.Jan 18 2018, 6:14 PM

Add the lfence instruction to all of the speculation capture loops, both
those generated by LLVM and those generated by LLD for the PLT.

This should ensure that both Intel and AMD processors stop speculating in these loops rather than continuing to consume resources (and power) uselessly.

Ping! Would really like to get this landed, but as it has changed somewhat, looking for a final round of review....

MatzeB accepted this revision.Jan 19 2018, 5:11 PM

LGTM, I'm all for getting this into the LLVM tree, we can fine tune later as necessary.

This is the first widely used machine module pass. Adding a module pass to the pipeline means we will have all the MI in memory at the same time, rather than creating MI for one function at a time and freeing it after emitting each function. So it would be good to keep an eye on the compiler's memory consumption... Maybe we can find a way to refactor the codegen pipeline later so that we can go back to one function at a time and still have a way to emit some new code afterwards...

llvm/include/llvm/CodeGen/TargetPassConfig.h
416–417

Even though this documentation is mostly copied, how about changing it to:

Targets may add passes immediately before machine code is emitted in this callback.
This is called later than addPreEmitPass().
418

I find the name addEmitPass() misleading as it isn't adding the assembly emission passes. The best bad name I can think of right now is addPreEmit2(), in the spirit of the existing addPreSched2() callback...

llvm/lib/Target/X86/X86RetpolineThunks.cpp
84–85

There shouldn't be any way to use Target/X86 without also having a TargetPassConfig in the pipeline. An assert should be enough.

chandlerc marked 3 inline comments as done.Jan 22 2018, 1:29 AM

Thanks Matthias and Eric!

After a discussion on IRC, the general conclusion I came to is: let's try this as-is, with a clean machine module pass. If we get reports of issues, it should be pretty straightforward to hack it up to use machine function passes instead, but it seems quite a bit cleaner as-is, and everyone seems very hopeful that this will not be an issue in practice. (And none of the testing we've done so far indicates any issues with memory consumption.)

Seems like this has pretty good consensus now (Matthias, Rafael, several others). I'm planning to land this first thing tomorrow morning (west coast US time) and then will start working on backports to release branches.

Thanks again everyone, especially those who helped author large parts of this. Will of course keep an eye out for follow-ups or other issues.

llvm/include/llvm/CodeGen/TargetPassConfig.h
418

I suspect the right thing to do is to change the name of addPreEmitPass to something more representative of reality and document that correctly. Then we can use the obvious name of addPreEmitPass.

But I'd prefer to do the name shuffling of *existing* APIs in a follow-up patch. I've used addPreEmitPass2 here even though I really dislike that name, and left a FIXME.
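
For reference, the X86 target then uses this hook to schedule the thunk-insertion pass at the very end of the machine pipeline; in pseudocode (names per this revision, not guaranteed verbatim):

  void X86PassConfig::addPreEmitPass2() {
    // Runs after addPreEmitPass(), once all other machine passes are done,
    // so the thunk machine functions are created just before emission.
    addPass(createX86RetpolineThunksPass());
  }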

This revision was automatically updated to reflect the committed changes.
chandlerc marked an inline comment as done.

@chandlerc is MacOS support omitted intentionally (ie. not implemented yet)? When I try using the -mretpoline flag on High Sierra I get the following:

fatal error: error in backend: MachO doesn't support COMDATs, '__llvm_retpoline_r11' cannot be lowered.

@chandlerc is MacOS support omitted intentionally (ie. not implemented yet)?

Not at all, I just don't have a Mac system to test with...

When I try using the -mretpoline flag on High Sierra I get the following:

fatal error: error in backend: MachO doesn't support COMDATs, '__llvm_retpoline_r11' cannot be lowered.

This is just that I unconditionally used comdats and that doesn't work on MachO. We should use some other lowering strategy to ensure the thunks are merged by the Mac linker. I'm not an expert there, so would defer to others like Ahmed. Happy to review a patch fixing it though.
