This is an archive of the discontinued LLVM Phabricator instance.

[stack-protection] Add support for MSVC buffer security check
ClosedPublic

Authored by etienneb on May 17 2016, 5:51 PM.

Details

Summary

This patch is adding support for the MSVC buffer security check implementation

The buffer security check is turned on with the '/GS' compiler switch.

Some overview of buffer security check feature and implementation:

For the following example:

int example(int offset, int index) {
  char buffer[10];
  memset(buffer, 0xCC, index);
  return buffer[index];
}

The MSVC compiler is adding these instructions to perform stack integrity check:

      push        ebp  
      mov         ebp,esp  
      sub         esp,50h  
[1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
[2]   xor         eax,ebp  
[3]   mov         dword ptr [ebp-4],eax  
      push        ebx  
      push        esi  
      push        edi  
      mov         eax,dword ptr [index]  
      push        eax  
      push        0CCh  
      lea         ecx,[buffer]  
      push        ecx  
      call        _memset (010610B9h)  
      add         esp,0Ch  
      mov         eax,dword ptr [index]  
      movsx       eax,byte ptr buffer[eax]  
      pop         edi  
      pop         esi  
      pop         ebx  
[4]   mov         ecx,dword ptr [ebp-4]  
[5]   xor         ecx,ebp  
[6]   call        @__security_check_cookie@4 (01061276h)  
      mov         esp,ebp  
      pop         ebp  
      ret

The instrumentation above is:

  • [1] is loading the global security canary,
  • [3] is storing the local computed ([2]) canary to the guard slot,
  • [4] is loading the guard slot and ([5]) re-compute the global canary,
  • [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.

Overview of the current stack-protection implementation:

  • lib/CodeGen/StackProtector.cpp
    • There is a default stack-protection implementation applied on intermediate representation.
    • The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
    • An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
    • Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
    • Guard manipulation and comparison are added directly to the intermediate representation.
  • lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  • lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    • There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
      • see long comment above 'class StackProtectorDescriptor' declaration.
    • The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
      • 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
    • The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
  • include/llvm/Target/TargetLowering.h
    • Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.
  • lib/Target/X86/X86ISelLowering.cpp
    • Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.

Function-based Instrumentation:

  • The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
  • To support function-based instrumentation, this patch is
    • adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
      • If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
    • modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
    • generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
    • if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).

Modifications

  • adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
  • adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)

Results

  • IR generated instrumentation:
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
*** Final LLVM Code input to ISel ***

; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
  %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
  %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
  call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
  %index.addr = alloca i32, align 4
  %offset.addr = alloca i32, align 4
  %buffer = alloca [10 x i8], align 1
  store i32 %index, i32* %index.addr, align 4
  store i32 %offset, i32* %offset.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
  %1 = load i32, i32* %index.addr, align 4
  call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
  %2 = load i32, i32* %index.addr, align 4
  %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
  %3 = load i8, i8* %arrayidx, align 1
  %conv = sext i8 %3 to i32
  %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
  call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
  ret i32 %conv
}
  • SelectionDAG generated instrumentation:
clang-cl /GS test.cc /O1 /c /FA
"?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
# BB#0:                                 # %entry
        pushl   %esi
        subl    $16, %esp
        movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
        movl    28(%esp), %esi
        movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
        leal    2(%esp), %eax
        pushl   %esi
        pushl   $204
        pushl   %eax
        calll   _memset
        addl    $12, %esp
        movsbl  2(%esp,%esi), %esi
        movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
        calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
        movl    %esi, %eax
        addl    $16, %esp
        popl    %esi
        retl

Diff Detail

Event Timeline

etienneb updated this revision to Diff 57544.May 17 2016, 5:51 PM
etienneb retitled this revision from to [draft] StackGuad prototype.
etienneb updated this object.
etienneb retitled this revision from [draft] StackGuad prototype to [draft] StackGuard prototype.May 17 2016, 6:00 PM
etienneb updated this revision to Diff 57568.May 18 2016, 12:55 AM

fixing /Od instrumentation

etienneb retitled this revision from [draft] StackGuard prototype to [stack-protection] Add support for MSVC buffer security security.May 18 2016, 8:38 AM
etienneb updated this object.
etienneb updated this object.May 18 2016, 8:59 AM
etienneb updated this object.May 18 2016, 9:09 AM
etienneb updated this object.May 18 2016, 9:22 AM
etienneb updated this object.May 18 2016, 9:24 AM
etienneb updated this object.May 18 2016, 12:07 PM
etienneb updated this object.May 18 2016, 12:10 PM
etienneb updated this object.May 18 2016, 12:13 PM
etienneb updated this object.May 18 2016, 12:18 PM
etienneb updated this object.May 18 2016, 12:28 PM
etienneb updated this object.May 18 2016, 12:40 PM
etienneb updated this object.
etienneb retitled this revision from [stack-protection] Add support for MSVC buffer security security to [stack-protection] Add support for MSVC buffer security check.May 18 2016, 1:00 PM
etienneb updated this revision to Diff 57716.May 18 2016, 5:21 PM

unittests

etienneb updated this revision to Diff 57792.May 19 2016, 7:52 AM

tests for MSVC

etienneb updated this object.May 19 2016, 8:02 AM
rnk added a subscriber: rnk.May 19 2016, 2:48 PM

Neat, looks like a reasonable approach.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2062–2063

A tiny bit of x86 knowledge here doesn't hurt that much. :)

lib/CodeGen/StackProtector.cpp
355

"mechanism"

etienneb updated this object.May 20 2016, 9:25 AM
rnk added a subscriber: thakis.May 24 2016, 11:23 AM
rnk added a subscriber: hans.

Nice! x64 is for a separate CL, I'm guessing?

Nice! x64 is for a separate CL, I'm guessing?

This is not yet fully working.
I'm debugging some corner cases found when running CPU2006, chrome and llvm.
Compiling with Fast-ISEL isn't working as intended. The callconv is not propagated correctly.

I'm gonna check for 64-bits and see what is missing., after.

etienneb updated this revision to Diff 58303.May 24 2016, 1:10 PM

fix calling convention; breaking with fast-isel.

etienneb marked an inline comment as done.May 24 2016, 1:14 PM
etienneb added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2062–2063

I moved the architecture specific code to the symbol declaration.
It's cleaner that way.

The 64-bits version is working the same way.

Prologue:

000000013F1C17B6  mov         rax,qword ptr [__security_cookie (013F1CC000h)]  
000000013F1C17BD  xor         rax,rbp  
000000013F1C17C0  mov         qword ptr [rbp+0E8h],rax

Epilogue:

000000013F1C17FF  mov         rcx,qword ptr [rbp+0E8h]  
000000013F1C1806  xor         rcx,rbp  
000000013F1C1809  call        __security_check_cookie (013F1C12EEh)

The 64-bits version is working the same way.

Prologue:

000000013F1C17B6  mov         rax,qword ptr [__security_cookie (013F1CC000h)]  
000000013F1C17BD  xor         rax,rbp  
000000013F1C17C0  mov         qword ptr [rbp+0E8h],rax

Epilogue:

000000013F1C17FF  mov         rcx,qword ptr [rbp+0E8h]  
000000013F1C1806  xor         rcx,rbp  
000000013F1C1809  call        __security_check_cookie (013F1C12EEh)

Cool! Probably good to give it some test coverage still.

etienneb marked an inline comment as done.May 25 2016, 10:50 AM

Here are some metrics for a chromium build on Windows with/without stack guard activated.

The first set of column is when the /GS is off by default.
The second set of column is when the /GS is on by default.

Relocs are the number of relocation to security_cookie.

                          Without /GS               With  /GS                Delta          
                  ----------------------     ----------------------    ----------------------
                           Size   Relocs           Size       Relocs          Size
                          (bytes)                (bytes)                    (bytes)
chrome.exe             1,081,344    206         1,107,456     1,128           26,112    2.41%
chrome.dll            47,528,960    228        49,414,144    65,208        1,885,184    3.97%
chrome_child.dll      57,482,752    242        59,433,984    72,722        1,951,232    3.39%
hans added a comment.May 25 2016, 10:55 AM

Here are some metrics for a chromium build on Windows with/without stack guard activated.

The first set of column is when the /GS is off by default.
The second set of column is when the /GS is on by default.

Relocs are the number of relocation to security_cookie.

                          Without /GS               With  /GS                Delta          
                  ----------------------     ----------------------    ----------------------
                           Size   Relocs           Size       Relocs          Size
                          (bytes)                (bytes)                    (bytes)
chrome.exe             1,081,344    206         1,107,456     1,128           26,112    2.41%
chrome.dll            47,528,960    228        49,414,144    65,208        1,885,184    3.97%
chrome_child.dll      57,482,752    242        59,433,984    72,722        1,951,232    3.39%

Are those "branding=Chrome buildtype=Official" builds? The numbers without /GS look larger than what I see for 32-bit Clang builds.

In D20346#439546, @hans wrote:

Here are some metrics for a chromium build on Windows with/without stack guard activated.

The first set of column is when the /GS is off by default.
The second set of column is when the /GS is on by default.

Relocs are the number of relocation to security_cookie.

                          Without /GS               With  /GS                Delta          
                  ----------------------     ----------------------    ----------------------
                           Size   Relocs           Size       Relocs          Size
                          (bytes)                (bytes)                    (bytes)
chrome.exe             1,081,344    206         1,107,456     1,128           26,112    2.41%
chrome.dll            47,528,960    228        49,414,144    65,208        1,885,184    3.97%
chrome_child.dll      57,482,752    242        59,433,984    72,722        1,951,232    3.39%

Are those "branding=Chrome buildtype=Official" builds? The numbers without /GS look larger than what I see for 32-bit Clang builds.

They are Chromium build, not Chrome build. So, it's not an official build.
They are ToT for chromium, and ToT for clang.

{
  'GYP_GENERATORS': 'ninja',
  'GYP_DEFINES': 'clang=1 clang_use_chrome_plugins=0 make_clang_dir=d:/src/llvm/build/Release/'  
}
hans added a comment.May 25 2016, 11:35 AM
In D20346#439546, @hans wrote:

Here are some metrics for a chromium build on Windows with/without stack guard activated.

The first set of column is when the /GS is off by default.
The second set of column is when the /GS is on by default.

Relocs are the number of relocation to security_cookie.

                          Without /GS               With  /GS                Delta          
                  ----------------------     ----------------------    ----------------------
                           Size   Relocs           Size       Relocs          Size
                          (bytes)                (bytes)                    (bytes)
chrome.exe             1,081,344    206         1,107,456     1,128           26,112    2.41%
chrome.dll            47,528,960    228        49,414,144    65,208        1,885,184    3.97%
chrome_child.dll      57,482,752    242        59,433,984    72,722        1,951,232    3.39%

Are those "branding=Chrome buildtype=Official" builds? The numbers without /GS look larger than what I see for 32-bit Clang builds.

They are Chromium build, not Chrome build. So, it's not an official build.
They are ToT for chromium, and ToT for clang.

I worry that the relative deltas will be larger with buildtype=Official since that builds large parts of the code with -Os. And since those are the kind of binaries we ship, those are the most interesting numbers.

Another question: when does and doesn't the stack checks get emitted? For example, in this code where there are no stack objects, MSVC will not emit the stack protector. What happens with your patch?

void g(int, void*);
void f(void *p) {
  g(5, p);
}
In D20346#439611, @hans wrote:
In D20346#439546, @hans wrote:

Here are some metrics for a chromium build on Windows with/without stack guard activated.

The first set of column is when the /GS is off by default.
The second set of column is when the /GS is on by default.

Relocs are the number of relocation to security_cookie.

                          Without /GS               With  /GS                Delta          
                  ----------------------     ----------------------    ----------------------
                           Size   Relocs           Size       Relocs          Size
                          (bytes)                (bytes)                    (bytes)
chrome.exe             1,081,344    206         1,107,456     1,128           26,112    2.41%
chrome.dll            47,528,960    228        49,414,144    65,208        1,885,184    3.97%
chrome_child.dll      57,482,752    242        59,433,984    72,722        1,951,232    3.39%

Are those "branding=Chrome buildtype=Official" builds? The numbers without /GS look larger than what I see for 32-bit Clang builds.

They are Chromium build, not Chrome build. So, it's not an official build.
They are ToT for chromium, and ToT for clang.

I worry that the relative deltas will be larger with buildtype=Official since that builds large parts of the code with -Os. And since those are the kind of binaries we ship, those are the most interesting numbers.

Another question: when does and doesn't the stack checks get emitted? For example, in this code where there are no stack objects, MSVC will not emit the stack protector. What happens with your patch?

void g(int, void*);
void f(void *p) {
  g(5, p);
}

This patch is not touching anything related to the algorithm that choose whether or not we should emit a guard.
It's the implementation using the MSVC technique (and the CRT for error reporting).

The heuristic is implemented in stackprotector.cpp and it's the same used by other platforms.

The previous metrics come from this patch: http://reviews.llvm.org/D20347
which activates /GS by default and bind it to "-fstack-protector-strong".

We could investigate when MSVC and Clang are producing guards and comparing both algorithms.
This may lead to adding a flag -fstack-protector-msvc or something similar if needed.

For your example, they won't be a stack guard emitted.

void g(int, void*);
void f(void *p) {
  g(5, p);
}

Guards are emitted when there are unsafe function of buffers declared locally.
(see unittests)

hans added a comment.May 25 2016, 1:37 PM

For your example, they won't be a stack guard emitted.

void g(int, void*);
void f(void *p) {
  g(5, p);
}

Guards are emitted when there are unsafe function of buffers declared locally.
(see unittests)

That sounds great, thanks.

I think we can look into situations where Clang emits stack guards and MSVC doesn't if that turns out to be a problem later.

Overall, your size numbers don't look too scary. If they're about the same on 64-bit, I think we might have headroom for that without growing bigger than MSVC's binaries (currently clang-cl's 64-bit chrome_child.dll is about 4.5% smaller than MSVC's).

The patch looks fine to me, but rnk should probably sign off on it.

etienneb updated this revision to Diff 58500.May 25 2016, 1:56 PM
etienneb marked an inline comment as done.

add more tests

@rnk How does this play with WinEHPrepare, AsmPrinter and the state number insertion pass? Should those be updated later? Should we not enable /GS by default until they are implemented?

lib/CodeGen/StackProtector.cpp
357–363

Pointers lean right.

lib/Target/X86/X86ISelLowering.cpp
1962

Pointers lean right.

rnk added a comment.May 25 2016, 3:47 PM

@rnk How does this play with WinEHPrepare, AsmPrinter and the state number insertion pass? Should those be updated later? Should we not enable /GS by default until they are implemented?

Hm, maybe we should block stack guard insertion into functions with EH funclets for now. I think we have to update the tables and switch from _except_handler3 to _except_handler4 for SEH.

The general approach here should work, though. We don't need to do anything special for funclet prologue/epilogues because they never hold any address-taken stack objects. Those all live in the parent stack frame.

More metrics for Clang and MSVC on a chromium build.
LLVM heurisitic is producing more than 3x candidates to stack protection than MSVC compiler.

Clang
    Size    Relocs
 1,107,456   1,128
49,414,144  65,208
59,433,984  72,722
MSVC
    Size    Relocs
 1,139,200     682
44,576,768  26,336
52,203,008  13,954
etienneb updated this revision to Diff 58852.May 27 2016, 3:49 PM
etienneb marked 2 inline comments as done.

fix nits

I'm gonna take a look to WinEHPrepare.

But, there is a link between EH and GS.

struct _EH4_SCOPETABLE {
     DWORD GSCookieOffset;
     DWORD GSCookieXOROffset;
     DWORD EHCookieOffset;
     DWORD EHCookieXOROffset;
     _EH4_SCOPETABLE_RECORD ScopeRecord[1];
 };

When using version 4 of the exception handler, the GS cookie can be present.

GSCookieOffset = -2 means that GS cookie is not used. EH cookie is always present. Offsets are ebp relative.

Should those be updated later?

Yes.

I had few examples where the choice of 'personality' is changed from version 3 to version 4 when /GS is on.
Also, the stack/EH security cookies must be added to the structure when using version 3.

Should we not enable /GS by default until they are implemented?

This should/could be kept off until other pieces are in place.

rnk added a comment.May 31 2016, 2:30 PM

I had few examples where the choice of 'personality' is changed from version 3 to version 4 when /GS is on.
Also, the stack/EH security cookies must be added to the structure when using version 3.

I actually thought all functions using SEH got upgraded to _except_handler4. So, maybe we should change clang to use _except_handler4 when -fstack-protector is on.

lib/CodeGen/StackProtector.cpp
357–363

Not actually done?

In D20346#440161, @rnk wrote:

@rnk How does this play with WinEHPrepare, AsmPrinter and the state number insertion pass? Should those be updated later? Should we not enable /GS by default until they are implemented?

Hm, maybe we should block stack guard insertion into functions with EH funclets for now. I think we have to update the tables and switch from _except_handler3 to _except_handler4 for SEH.

There are some upsides to transparently upgrading the personality in the backend: we'd support transparently support inlining between the two and other frontends would need similar changes.

The general approach here should work, though. We don't need to do anything special for funclet prologue/epilogues because they never hold any address-taken stack objects. Those all live in the parent stack frame.

Also, the stack/EH security cookies must be added to the structure when using version 3.

I actually thought all functions using SEH got upgraded to _except_handler4. So, maybe we should change clang to use _except_handler4 when -fstack-protector is on.

For a given compilation unit with stack-protection turned on, it seems to me that functions using SEH are all upgraded to version 4.
I tried to produce a counter example using these MSVC features:

But, I was not able to produce a case with functions on different exception handlers.

By using two separate source files with either /GS and /GS-, I was able to produce an executable with both exception handler.
Which means, both version must co-exist well.

Functions (same compilation unit) using SEH and GS are upgraded to version 4 (containing the cookie slots in EH ScopeTable).
Which make sense because exception handler version 4 is performing additional checks on the security cookies when an exception occurs.
If /GS is off, it's better to produce smaller structures. So, it better to produce version 3.

Graphs showing stack layout:

rnk added a comment.Jun 1 2016, 8:47 AM

I looked into x64 a bit more, and this example shows that we need to do a similar upgrade from __C_specific_handler to __GSHandlerCheck_SEH:

extern "C" void *memset(void*, int, int);
int example(int offset, int index) {
  char buffer[10];
  __try {
    memset(buffer, 0xCC, index);
  } __except(1) {
  }
  return buffer[index];
}

Looking at this example, I agree with David we should do this EH personality function upgrade in the backend if we decide to protect the stack. It just so happens that for 32-bit, MSVC always protects the stack when SEH is used, which is why I thought we should pick _except_handler4 in the frontend. We should also look at C++ EH, I forget what that does.

Anyway, I would have stack protector bail out on functions with funclet EH personalities for now (use isFuncletEHPersonality the way WinEHPrepare.cpp does). We can deal with the personality function upgrade later, and land this once we're sure this doesn't break EH. We should also add the XOR with RSP/EBP/ESP as a followup.

etienneb updated this revision to Diff 59230.Jun 1 2016, 9:03 AM
etienneb marked an inline comment as done.

nits

etienneb updated this revision to Diff 59383.Jun 2 2016, 8:23 AM

disabling stack protection with funclets

etienneb updated this revision to Diff 59385.Jun 2 2016, 8:30 AM

change cookie type to be consistent:
./lib/Target/X86/X86WinEHState.cpp:

Cookie = TheModule->getOrInsertGlobal("__security_cookie", Int32Ty);
etienneb updated this revision to Diff 59395.Jun 2 2016, 9:03 AM

revert previous patch

rnk accepted this revision.Jun 6 2016, 4:37 PM
rnk added a reviewer: rnk.

lgtm, thanks!

This revision is now accepted and ready to land.Jun 6 2016, 4:37 PM
etienneb closed this revision.Jun 7 2016, 1:22 PM