Support -fstack-clash-protection for x86
Needs Review · Public

Authored by serge-sans-paille on Oct 9 2019, 11:30 AM.

Details

Reviewers
rnk
craig.topper
Summary

Implement protection against the stack clash attack [0] through inline stack probing.

Probe stack allocations every PAGE_SIZE during frame lowering and dynamic allocation, so that the guard page, if any, is touched whenever the stack grows, in a manner similar to GCC [1].

This extends the existing probe-stack mechanism with a special value, inline-asm. The former emits a function call before the stack allocation, while this patch emits inlined stack probes and allocates the stack in chunks.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
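
To illustrate (a minimal, hand-written sketch, not actual compiler output), a frame larger than a page is allocated one page at a time, each chunk being probed right away:

    subq    $4096, %rsp             # allocate one page...
    movq    $0, (%rsp)              # ...and probe it, faulting on the guard page if we reached it
    subq    $4096, %rsp             # next page
    movq    $0, (%rsp)
    subq    $1000, %rsp             # remainder (< PAGE_SIZE)
    movq    $0, (%rsp)              # final probe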

Diff Detail

Event Timeline


Is there some reason this isn't using the existing stack-probe-size attribute?

Sorry, I meant the "probe-stack" attribute.

@efriedma: there is indeed an overlap with the probe-stack attribute. The probe-stack attribute (a) forces a function call, and (b) this call happens only before the stack gets expanded.

(a) is probably a performance issue in several cases, and it requires an extra register (as mentioned in https://reviews.llvm.org/D9653)
(b) is an issue, as pointed out in https://lwn.net/Articles/726587/ (grep for valgrind): from Valgrind's point of view, accessing unallocated stack memory triggers an error, and we probably want to keep Valgrind happy

Doing the call *after* the stack allocation is not an option either, as a signal could be raised between the stack allocation and the stack probing; if a custom signal handler runs at that point, it executes on unprobed stack memory.
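
Concretely, the unsafe window looks like this (a sketch; __probe_stack stands in for whatever probing routine is used):

    subq    $8192, %rsp             # stack grown past the guard page, nothing probed yet
                                    # <- a signal delivered here runs its handler on the
                                    #    unprobed (possibly unmapped) stack
    callq   __probe_stack           # too late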

That being said, I do think it would be good to have a special value for probe-stack, say probe-stack=inline-asm, that would trigger generation of inline probes as this patch does. That way we have all the pieces in one place, with different strategies, and clang would set the attribute on each function when -fstack-clash-protection is given.

Move to probe-stack compatibility, using a dedicated name to trigger inline probing. It looks better to me because

  1. it leverages existing mechanics
  2. it has finer granularity

@efriedma: also, compared to probe-stack with a function, this version can use existing MOV operations to avoid generating probes, which looks like a big plus to me.

> (b) is an issue, as pointed out in https://lwn.net/Articles/726587/ (grep for valgrind): from Valgrind's point of view, accessing unallocated stack memory triggers an error, and we probably want to keep Valgrind happy
>
> Doing the call *after* the stack allocation is not an option either, as a signal could be raised between the stack allocation and the stack probing; if a custom signal handler runs at that point, it executes on unprobed stack memory.

I'm not sure I follow. How are you solving this problem in your patch? By limiting the amount you adjust the stack at a time? What limit is sufficient to avoid this issue?


Can you give a complete assembly listing for small examples of static and dynamic stack probing?

llvm/lib/Target/X86/X86FrameLowering.cpp
482

Why is this code in a different location from the stack probing code that generates a call?

490

This algorithm needs some documentation; it isn't at all obvious what it's doing. Particularly the interaction with "free" stack probes.

Should we generate a loop if the stack frame is large?

Please add info about this new feature to the release notes.

Added documentation + release note entry

@efriedma the free probe algorithm requires more testing, and I'd like to take memset and memcpy into account as free probes too. To showcase the algorithm, consider the following LLVM IR:

define i32 @foo() local_unnamed_addr {
  %a = alloca i32, i64 2000, align 16
  %b = getelementptr inbounds i32, i32* %a, i64 1198
  store volatile i32 1, i32* %b
  %c = load volatile i32, i32* %a
  ret i32 %c
}

When compiled with llc, it outputs the following assembly:

foo:                                    # @foo
    subq    $7880, %rsp             # imm = 0x1EC8
    movl    $1, 4664(%rsp)
    movl    -128(%rsp), %eax
    addq    $7880, %rsp             # imm = 0x1EC8
    retq

When probe-stack is set to inline-asm, it outputs:

foo:                                    # @foo
	subq	$4096, %rsp             # imm = 0x1000
	movl	$1, 880(%rsp)
	subq	$3784, %rsp             # imm = 0xEC8
	movq	$0, (%rsp)
	movl	-128(%rsp), %eax
	addq	$7880, %rsp             # imm = 0x1EC8
	retq

The stack allocation is split in two, but only one MOV is added; the first one is what I call a free probe. It turns out we could rely on natural probes alone here; I still need to implement that :-)

For comparison, setting probe-stack to an arbitrary function name like __probe_stack outputs the following:

foo:                                    # @foo
	movl	$8008, %eax             # imm = 0x1F48
	callq	__probe_stack
	subq	%rax, %rsp
	movl	$1, 4792(%rsp)
	movl	(%rsp), %eax
	addq	$8008, %rsp             # imm = 0x1F48
	retq

which requires runtime support (to provide __probe_stack) and incurs function call overhead, while ideally just an extra adjustment of %rsp would be needed.

serge-sans-paille marked 3 inline comments as done. · Oct 10 2019, 2:09 AM
serge-sans-paille added inline comments.
llvm/lib/Target/X86/X86FrameLowering.cpp
482

Because BuildStackAdjustment has more call sites than just emitPrologue, and we want to capture all stack manipulations.

490

> Should we generate a loop if the stack frame is large?

That's an option. It's more complex to make looping compatible with free probes, but it's possible.

490

> This algorithm needs some documentation

Done

serge-sans-paille edited the summary of this revision.

Added test cases and statistics, and refactored interactions with the existing stack-probing mechanism.

Some early stats: on the sqlite amalgamation [0], free probe reuse makes it possible to skip 123 of the 474 probes needed during frame lowering.

[0] https://www.sqlite.org/download.html

xbolva00 added inline comments. · Oct 10 2019, 12:06 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
506

nit: llvm::any_of

efriedma added inline comments. · Oct 10 2019, 3:38 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
498

Each probe has to have an offset of at most PageSize bytes from the previous probe. If each probe is exactly PageSize bytes away from the previous probe, that's fine. But it looks like you don't enforce the distance between free probes correctly?

508

There are instructions that don't refer to any FI, but are still relevant. For example, calls.

tstellar added inline comments. · Oct 10 2019, 4:05 PM
clang/test/CodeGen/stack-clash-protection.c
4

There were concerns [1] raised recently about adding clang tests that depend on codegen. Is something being tested here that can't be tested with an IR test? If you only need to test that the frontend option works, checking the IR for the necessary function attributes might be enough.

[1] http://lists.llvm.org/pipermail/cfe-dev/2019-September/063309.html

Ensure the distance between two probes is at most PAGE_SIZE.
Use calls as free probes.
Fix alignment for dynamic alloca.

This passes the llvm test-suite, and thanks to the use of calls, no inserted probes are needed to compile sqlite!
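
For the record, calls act as free probes because the CALL instruction itself writes to the stack (a sketch; bar is an arbitrary callee):

    subq    $4096, %rsp             # allocate one page
    callq   bar                     # CALL pushes the return address at the new top of
                                    # the stack, touching the fresh page: a free probe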

rnk added a comment. · Oct 14 2019, 3:25 PM

For maintenance reasons, I'd really prefer it if we could find a way to reuse the existing code that calls an external stack probe function. What do you think about taking a look at X86RetpolineThunks.cpp and doing something similar to that? Basically, when the user sets -fstack-clash-protection, LLVM will emit a small comdat+weak function into every object file that has the same ABI as the existing stack probe mechanism. For other prior art, you can also look at how __clang_call_terminate works.

llvm/lib/Target/X86/X86FrameLowering.cpp
428

Please no dynamically initialized globals. The LLVM-y thing would be to use a switch.

482

If you care about those cases, we should have tests for all of them. These are all the cases:

  1. TailCallReturnAddrDelta: When doing guaranteed TCO for a callee with >4K of argument memory
  2. eliminateCallFramePseudoInstr: When doing stack adjustments to set up a call with more than 4K of arguments

Both of these cases involve setting up arguments to calls, and I think we can guarantee that the compiler will emit writes to every argument stack slot. So, to set up one of these cases, we'd have this code:

#include <string.h>  // for memset

struct EightK { int a[2048]; };
void passEightK(EightK);
void foo() {
  EightK x;
  memset(&x, 0, sizeof(x));
  passEightK(x); // some targets will pass indirect, 32-bit probably uses byval.
}

In this case, LLVM happens to use rep;movsl to copy the argument memory. It could use memcpy, but either way, it will probe all those bytes.
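
Roughly like this (a sketch with illustrative offsets, not the exact output for the snippet above):

    leaq    8192(%rsp), %rsi        # source: the local copy of x
    movq    %rsp, %rdi              # destination: the outgoing argument area
    movl    $2048, %ecx             # 2048 dwords = 8 KiB
    rep;movsl                       # writes every byte of the argument area,
                                    # probing those pages as a side effect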

I think that means it's safe to move to emitSPUpdate. I would greatly prefer that BuildStackAdjustment remain a simple primitive that generates a single instruction.

llvm/test/CodeGen/X86/stack-clash-dynamic-alloca.ll
10–12

This seems like a good use case for update_llc_test_checks.py. I'd want to see the body of the loop, the cmp, the jl, etc.

Some benchmarks, running the Python performance suite on two recompiled CPython builds, one built with -O2 (baseline) and one built with -O2 -fstack-clash-protection (protection). Surprisingly enough, the stack protection makes the code faster in various scenarios; in my view, none of these changes represents a significant regression.

baseline.json
=============

Performance version: 0.9.1
Report on Linux-3.10.0-891.el7.x86_64-x86_64-with-redhat-7.5-Maipo
Number of logical CPUs: 8
Start date: 2019-10-15 11:16:28.425344
End date: 2019-10-15 11:37:55.064599

protection.json
===============

Performance version: 0.9.1
Report on Linux-3.10.0-891.el7.x86_64-x86_64-with-redhat-7.5-Maipo
Number of logical CPUs: 8
Start date: 2019-10-15 10:55:24.270166
End date: 2019-10-15 11:16:27.195366

+-------------------------+---------------+-----------------+--------------+-----------------------+
| Benchmark               | baseline.json | protection.json | Change       | Significance          |
+=========================+===============+=================+==============+=======================+
| 2to3                    | 435 ms        | 448 ms          | 1.03x slower | Significant (t=-4.26) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| chameleon               | 15.0 ms       | 14.7 ms         | 1.02x faster | Significant (t=4.76)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| chaos                   | 176 ms        | 174 ms          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| crypto_pyaes            | 153 ms        | 150 ms          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| deltablue               | 11.9 ms       | 11.9 ms         | 1.00x slower | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| django_template         | 210 ms        | 223 ms          | 1.06x slower | Significant (t=-3.84) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| dulwich_log             | 96.1 ms       | 102 ms          | 1.06x slower | Significant (t=-8.42) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| fannkuch                | 703 ms        | 698 ms          | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| float                   | 161 ms        | 160 ms          | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| genshi_text             | 45.5 ms       | 45.5 ms         | 1.00x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| genshi_xml              | 95.3 ms       | 95.2 ms         | 1.00x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| go                      | 392 ms        | 382 ms          | 1.03x faster | Significant (t=5.95)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| hexiom                  | 16.0 ms       | 15.9 ms         | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| html5lib                | 137 ms        | 135 ms          | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| json_dumps              | 18.8 ms       | 17.9 ms         | 1.05x faster | Significant (t=4.81)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| json_loads              | 41.3 us       | 40.2 us         | 1.03x faster | Significant (t=2.53)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| logging_format          | 17.4 us       | 17.5 us         | 1.00x slower | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| logging_silent          | 509 ns        | 500 ns          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| logging_simple          | 14.3 us       | 14.8 us         | 1.04x slower | Significant (t=-5.64) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| mako                    | 30.6 ms       | 30.4 ms         | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| meteor_contest          | 134 ms        | 130 ms          | 1.03x faster | Significant (t=4.07)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| nbody                   | 186 ms        | 180 ms          | 1.03x faster | Significant (t=4.87)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| nqueens                 | 152 ms        | 145 ms          | 1.05x faster | Significant (t=5.26)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| pathlib                 | 29.5 ms       | 27.7 ms         | 1.07x faster | Significant (t=4.33)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| pickle                  | 15.7 us       | 15.6 us         | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| pickle_dict             | 45.1 us       | 43.2 us         | 1.04x faster | Significant (t=34.35) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| pickle_list             | 6.05 us       | 5.67 us         | 1.07x faster | Significant (t=5.15)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| pickle_pure_python      | 766 us        | 786 us          | 1.03x slower | Significant (t=-3.19) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| pidigits                | 206 ms        | 208 ms          | 1.01x slower | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| python_startup          | 12.4 ms       | 12.7 ms         | 1.03x slower | Significant (t=-8.03) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| python_startup_no_site  | 7.62 ms       | 7.75 ms         | 1.02x slower | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| raytrace                | 830 ms        | 805 ms          | 1.03x faster | Significant (t=15.48) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| regex_compile           | 264 ms        | 261 ms          | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| regex_dna               | 195 ms        | 195 ms          | 1.00x slower | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| regex_effbot            | 3.59 ms       | 3.50 ms         | 1.02x faster | Significant (t=7.90)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| regex_v8                | 30.0 ms       | 29.3 ms         | 1.02x faster | Significant (t=7.10)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| richards                | 122 ms        | 120 ms          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| scimark_fft             | 466 ms        | 462 ms          | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| scimark_lu              | 358 ms        | 334 ms          | 1.07x faster | Significant (t=5.46)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| scimark_monte_carlo     | 167 ms        | 155 ms          | 1.08x faster | Significant (t=4.68)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| scimark_sor             | 346 ms        | 328 ms          | 1.05x faster | Significant (t=7.55)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| scimark_sparse_mat_mult | 5.49 ms       | 5.14 ms         | 1.07x faster | Significant (t=13.23) |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| spectral_norm           | 209 ms        | 199 ms          | 1.05x faster | Significant (t=4.10)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| sympy_expand            | 650 ms        | 638 ms          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| sympy_integrate         | 30.8 ms       | 30.2 ms         | 1.02x faster | Significant (t=9.94)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| sympy_str               | 425 ms        | 407 ms          | 1.04x faster | Significant (t=4.26)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| sympy_sum               | 248 ms        | 248 ms          | 1.00x slower | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| telco                   | 9.63 ms       | 9.37 ms         | 1.03x faster | Significant (t=9.26)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| tornado_http            | 371 ms        | 369 ms          | 1.00x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| unpack_sequence         | 60.1 ns       | 59.0 ns         | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| unpickle                | 19.3 us       | 18.9 us         | 1.02x faster | Significant (t=8.65)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| unpickle_list           | 4.72 us       | 4.67 us         | 1.01x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| unpickle_pure_python    | 576 us        | 562 us          | 1.02x faster | Significant (t=7.27)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| xml_etree_generate      | 156 ms        | 153 ms          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| xml_etree_iterparse     | 157 ms        | 146 ms          | 1.08x faster | Significant (t=6.27)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| xml_etree_parse         | 187 ms        | 183 ms          | 1.02x faster | Significant (t=4.14)  |
+-------------------------+---------------+-----------------+--------------+-----------------------+
| xml_etree_process       | 126 ms        | 123 ms          | 1.02x faster | Not significant       |
+-------------------------+---------------+-----------------+--------------+-----------------------+

> For maintenance reasons, I'd really prefer it if we could find a way to reuse the existing code that calls an external stack probe function. What do you think about taking a look at X86RetpolineThunks.cpp and doing something similar to that? Basically, when the user sets -fstack-clash-protection, LLVM will emit a small comdat+weak function into every object file that has the same ABI as the existing stack probe mechanism. For other prior art, you can also look at how __clang_call_terminate works.

@rnk I 100% understand the reasoning and will have a look at the files you're pointing at. AFAIU the approach you're suggesting is incompatible with the "free probe" algorithm that moves instructions in between stack allocations, but let me confirm that first.

Another test run on an x264 encoder (source: https://openbenchmarking.org/test/pts/x264)

Compiled with -O2, with and without -fstack-clash-protection; run without threads (x265 --pools 1 -F 1 ./Bosphorus_1920x1080_120fps_420_8bit_YUV.y4m /dev/null)

Clang

baseline:  318.60s 
protection:  317.72s

So no performance impact beyond noise.

The compilation inserts 44 inline probes in 9 functions.

gcc

For comparison, with gcc 8.2 (yeah, it's a bit old), I get (same flags & setup):

baseline: 417.53 sec
protected: 412.6 sec

The compilation inserts inline probes in 22 functions.
So gcc instruments more functions; the impact on performance is equally surprising. I need to gather more data points to draw statistically meaningful conclusions.

Get rid of static mapping + update test cases

serge-sans-paille marked 6 inline comments as done. · Oct 15 2019, 8:37 AM
dmajor added a subscriber: dmajor. · Oct 16 2019, 10:44 AM

Moved the implementation to a specialization of emitStackProbeInline, which used to be Windows-centric, and provided a generic implementation that generates inline probes instead. That way we don't clutter existing functions and only add a new case to emitSPUpdate.

Handle large stack allocation with a loop instead of a (large) basic block.
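
The loop has roughly the following shape (a sketch; register choice and constants are illustrative):

    movq    %rsp, %r11
    subq    $65536, %r11            # %r11 = target stack pointer
.LBB0_1:
    subq    $4096, %rsp             # allocate one page
    movq    $0, (%rsp)              # probe it
    cmpq    %r11, %rsp
    jne     .LBB0_1                 # loop until the target is reached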

@rnk: this is far less intrusive and better integrated with the existing structure; thanks a lot for pointing me in that direction.

Extra note: an older version of the patch has been tested by the Firefox team without much performance impact (and no test failures); see https://bugzilla.mozilla.org/show_bug.cgi?id=1588710

Explicitly prefer the existing mechanism on Windows.
Split the loop/block allocation strategies into separate functions.

You should probably also document the flag in clang/docs/ClangCommandLineReference.rst; we have -fstack-protector-strong & co. in that doc.

Better integration with the MachineInstr description.
Handle stacks with multiple stack objects.
More test cases.

Update documentation as suggested by @sylvestre.ledru.
Fix a corner case where a write touched memory beyond the allocated stack.

Some benchmark / instrumentation reporting: due to the way memory moves are ordered in the entry block, there tend to be relatively few free probes between two stack allocations within a function, and a large number after the last one.

When recompiling llc, I get only 18 free probes between stack allocations, and a large number (several thousand) after the last one.
When recompiling cpython, I get only 4 free probes between stack allocations.

This may be improved by rescheduling instructions or changing the stack object layout, but that's probably not worth the effort (there aren't that many functions with stack space larger than a page).

Temporarily comment out call support as free probes; everything else passes validation, but a call may have some stack effect I don't handle (yet).

clang/docs/ClangCommandLineReference.rst
1901 ↗(On Diff #226109)

Maybe add that it is Linux only? :)

serge-sans-paille edited the summary of this revision.

Moved to a simpler, easier-to-review, less error-prone implementation. It validates just fine on an LLVM bootstrap and the cpython code base. @sylvestre.ledru: that's the patch you can play with.

@rnk what's your take on that version?

@sylvestre.ledru did the testing and benchmarking on Firefox (see https://bugzilla.mozilla.org/show_bug.cgi?id=1588710#c12); everything seems OK, so let's move forward?

> @sylvestre.ledru did the testing and benchmarking on Firefox (see https://bugzilla.mozilla.org/show_bug.cgi?id=1588710#c12); everything seems OK, so let's move forward?

Actually, I'm not sure about the benchmark anymore. I am checking with experts. Sorry.