This is an archive of the discontinued LLVM Phabricator instance.

MemorySanitizer: Add option to insert init checks at call site
ClosedPublic

Authored by guiand on Jun 11 2020, 4:31 PM.

Details

Summary

This change depends on a previous patch adding the noundef attribute to LLVM calls.

With this change, specifying the option -msan-eager-checks will cause MemorySanitizer to insert initialization checks for function arguments before calling them, and for return values before returning.

This change depends on the noundef keyword to determining cases where it it sound to check these shadows, and falls back to passing shadows values by TLS.

Checking at call boundaries enforces undefined behavior rules with passing uninitialized arguments by value. It also provides substantial improvements to the optimized instrumented code, decreasing binary size by up to 17% and runtime up to 25%.

Diff Detail

Event Timeline

guiand created this revision.Jun 11 2020, 4:31 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 11 2020, 4:31 PM
Herald added subscribers: llvm-commits, Restricted Project, aaron.ballman and 2 others. · View Herald Transcript

May I ask you to move compiler-rt/ into a separate patch? With ClEagerChecks false by default we can do that and have smaller patches.

vitalybuka added inline comments.Jun 17 2020, 2:20 PM
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
880

this line is to long. please clang-format the patch
I usually do that by:
git clang-format -f --style=file HEAD^

1082–1083

Please undo such irrelevant changes or move them into separate patches

1220

var names are inconsistent
same for i and it
llvm style is to start with uppercase

1222

so if we need integer index, then just iterate with getNumElements() getElementType(unsigned N) ?
range base loop and separate var is fine as well

BTW. the first "i" is 1, is it expected?

1223

declare and intialize variable at the same statement if possible.
I assume it's to destroy IRB before recursive call
maybe then extract lines in {} into a separate function?

1625

FArgCheck -> FArgEagerCheck?

1684

why do we need this?

3411

Check -> EagerCheck?

3467

ClEagerChecks && !PartialInit;

llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll
2

would you like to try go generate test with llvm/utils/update_analyze_test_checks.py

10–21

llvm/utils/update_analyze_test_checks.py will likely drop this, but they are not important for this check

glider added a subscriber: glider.Jun 18 2020, 6:11 AM

Wrapping all variables in the existing tests in PartInit<> doesn't look good.
I believe instead of doing this you need to:

  • change existing tests to also run with eager checks, adding expected MSan outputs as needed. Do not scatter PartInit<> across all tests.
  • add more tests that use the partinit attribute to cover new functionality.

I also don't quite understand how this partinit thing works with programs written in C. We probably need a test for it as well.

compiler-rt/test/msan/in-struct-padding.cpp
4 ↗(On Diff #270260)

It's unclear what this test is doing. Please consider adding comments.

compiler-rt/test/msan/param_tls_limit.cpp
39 ↗(On Diff #270260)

Please don't introduce unnecessary diff.

llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
3442

Please remove unrelated diff from the patch.

aaron.ballman added inline comments.Jun 18 2020, 6:33 AM
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
867

Whitelist msan unaligned -> Allow msan unaligned

877

Whitelist msan unaligned -> Allow msan unaligned

It looks like folks are still deciding what to call the partialinit / frozen / nopoison attribute. Once that's settled I'll go through and update this patch and address the reviews.

One thing that's changed is that struct and union types in clang are never marked for checking anymore (at least, for now). So the tests I added to check that can likely go away.

guiand marked 2 inline comments as done.Jun 22 2020, 10:42 AM
guiand added inline comments.
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
1222

Makes sense to use getElementType. But I think here the first i is still 0, as the i++ expression happens after each loop body.

1684

This change allows checked arguments to not take up TLS space since they don't use the TLS for anything anyway. That frees up more TLS for arguments which can't be checked.

eugenis added inline comments.Jun 22 2020, 4:31 PM
compiler-rt/test/msan/cxa_atexit.cpp
20 ↗(On Diff #270260)

For tests that do not make sense with the eager checks, you could introduce a lit feature, and guard them with
// REQUIRES: !msan_eager_checks

llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
288

... and return values ...

885

There is only one argument that can be uninit - the value to be stored. No need to set the attribute on all of them.

990

The module flag is gone, and in any case this should not be a fatal error - if the frontend can not promise correct attribute value, we always delay the check, that's all.

1223

It's also common to reuse the builder with IRB.SetInsertPoint

1233

This change can be done in a separate patch.

It's more efficient to OR everything together and have a single branch. I'm not sure which would generate better code:

  • ShadowCombiner and materializeOneCheck on the result, this might do bad things if there are very large members (like vectors)
  • icmp NE 0 each members' shadow and materializeOneCheck of that.
guiand updated this revision to Diff 273864.Jun 26 2020, 4:50 PM
guiand edited the summary of this revision. (Show Details)

Addressed comments, updated for noundef

guiand marked 12 inline comments as done.Jun 26 2020, 4:53 PM
guiand updated this revision to Diff 276550.Jul 8 2020, 2:11 PM
guiand marked 2 inline comments as done.

I'm splitting this patch into this new one, which depends on nothing but noundef correctly parsed by LLVM, and the rest which relies on clang emitting noundef everywhere.

guiand marked an inline comment as done.Jul 8 2020, 2:11 PM
guiand added inline comments.
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
3467

This one is for the return attribute, the other was for the arguments

vitalybuka accepted this revision.Jul 8 2020, 2:43 PM
vitalybuka added inline comments.
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
1684

clang-format this

llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll
2

?

This revision is now accepted and ready to land.Jul 8 2020, 2:43 PM
guiand updated this revision to Diff 276607.Jul 8 2020, 5:15 PM
guiand marked an inline comment as done.

Fix test checking wrong __msan_warning variant, clang-format

vitalybuka added inline comments.Jul 9 2020, 3:30 AM
llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll
2
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -msan-check-access-address=0 -msan-track-origins=1 -msan-eager-checks -S -passes='module(msan-module),function(msan)' 2>&1 | \
; RUN:   FileCheck -allow-deprecated-dag-overlap -check-prefixes=CHECK,CHECK-ORIGINS %s

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define noundef i32 @NormalRet() nounwind uwtable sanitize_memory {
; CHECK-LABEL: @NormalRet(
; CHECK-NEXT:    ret i32 123
;
  ret i32 123
}

define i32 @PartialRet() nounwind uwtable sanitize_memory {
; CHECK-LABEL: @PartialRet(
; CHECK-NEXT:    store i32 0, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8
; CHECK-NEXT:    store i32 0, i32* @__msan_retval_origin_tls, align 4
; CHECK-NEXT:    ret i32 123
;
  ret i32 123
}

define noundef i32 @LoadedRet() nounwind uwtable sanitize_memory {
; CHECK-LABEL: @LoadedRet(
; CHECK-NEXT:    [[P:%.*]] = inttoptr i64 0 to i32*
; CHECK-NEXT:    [[O:%.*]] = load i32, i32* [[P]], align 4
; CHECK-NEXT:    [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64
; CHECK-NEXT:    [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080
; CHECK-NEXT:    [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32*
; CHECK-NEXT:    [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416
; CHECK-NEXT:    [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32*
; CHECK-NEXT:    [[_MSLD:%.*]] = load i32, i32* [[TMP3]], align 4
; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4
; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i32 [[_MSLD]], 0
; CHECK-NEXT:    br i1 [[_MSCMP]], label [[TMP7:%.*]], label [[TMP8:%.*]], !prof !0
; CHECK:       7:
; CHECK-NEXT:    call void @__msan_warning_with_origin_noreturn(i32 [[TMP6]]) #1
; CHECK-NEXT:    unreachable
; CHECK:       8:
; CHECK-NEXT:    ret i32 [[O]]
;
  %p = inttoptr i64 0 to i32 *
  %o = load i32, i32 *%p
  ret i32 %o
}


define void @NormalArg(i32 noundef %a) nounwind uwtable sanitize_memory {
; CHECK-LABEL: @NormalArg(
; CHECK-NEXT:    [[P:%.*]] = inttoptr i64 0 to i32*
; CHECK-NEXT:    [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64
; CHECK-NEXT:    [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080
; CHECK-NEXT:    [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32*
; CHECK-NEXT:    [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416
; CHECK-NEXT:    [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32*
; CHECK-NEXT:    store i32 0, i32* [[TMP3]], align 4
; CHECK-NEXT:    store i32 [[A:%.*]], i32* [[P]], align 4
; CHECK-NEXT:    ret void
;
  %p = inttoptr i64 0 to i32 *
  store i32 %a, i32 *%p
  ret void
}

define void @PartialArg(i32 %a) nounwind uwtable sanitize_memory {
; CHECK-LABEL: @PartialArg(
; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* bitcast ([100 x i64]* @__msan_param_tls to i32*), align 8
; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([200 x i32], [200 x i32]* @__msan_param_origin_tls, i32 0, i32 0), align 4
; CHECK-NEXT:    [[P:%.*]] = inttoptr i64 0 to i32*
; CHECK-NEXT:    [[TMP3:%.*]] = ptrtoint i32* [[P]] to i64
; CHECK-NEXT:    [[TMP4:%.*]] = xor i64 [[TMP3]], 87960930222080
; CHECK-NEXT:    [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32*
; CHECK-NEXT:    [[TMP6:%.*]] = add i64 [[TMP4]], 17592186044416
; CHECK-NEXT:    [[TMP7:%.*]] = inttoptr i64 [[TMP6]] to i32*
; CHECK-NEXT:    store i32 [[TMP1]], i32* [[TMP5]], align 4
; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i32 [[TMP1]], 0
; CHECK-NEXT:    br i1 [[_MSCMP]], label [[TMP8:%.*]], label [[TMP9:%.*]], !prof !0
; CHECK:       8:
; CHECK-NEXT:    store i32 [[TMP2]], i32* [[TMP7]], align 4
; CHECK-NEXT:    br label [[TMP9]]
; CHECK:       9:
; CHECK-NEXT:    store i32 [[A:%.*]], i32* [[P]], align 4
; CHECK-NEXT:    ret void
;
  %p = inttoptr i64 0 to i32 *
  store i32 %a, i32 *%p
  ret void
}

define void @CallNormal() nounwind uwtable sanitize_memory {
; CHECK-LABEL: @CallNormal(
; CHECK-NEXT:    [[R:%.*]] = call i32 @NormalRet() #0
; CHECK-NEXT:    call void @NormalArg(i32 [[R]]) #0
; CHECK-NEXT:    ret void
;
  %r = call i32 @NormalRet() nounwind uwtable sanitize_memory
  call void @NormalArg(i32 %r) nounwind uwtable sanitize_memory
  ret void
}

define void @CallWithLoaded() nounwind uwtable sanitize_memory {
; CHECK-LABEL: @CallWithLoaded(
; CHECK-NEXT:    [[P:%.*]] = inttoptr i64 0 to i32*
; CHECK-NEXT:    [[O:%.*]] = load i32, i32* [[P]], align 4
; CHECK-NEXT:    [[TMP1:%.*]] = ptrtoint i32* [[P]] to i64
; CHECK-NEXT:    [[TMP2:%.*]] = xor i64 [[TMP1]], 87960930222080
; CHECK-NEXT:    [[TMP3:%.*]] = inttoptr i64 [[TMP2]] to i32*
; CHECK-NEXT:    [[TMP4:%.*]] = add i64 [[TMP2]], 17592186044416
; CHECK-NEXT:    [[TMP5:%.*]] = inttoptr i64 [[TMP4]] to i32*
; CHECK-NEXT:    [[_MSLD:%.*]] = load i32, i32* [[TMP3]], align 4
; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[TMP5]], align 4
; CHECK-NEXT:    [[_MSCMP:%.*]] = icmp ne i32 [[_MSLD]], 0
; CHECK-NEXT:    br i1 [[_MSCMP]], label [[TMP7:%.*]], label [[TMP8:%.*]], !prof !0
; CHECK:       7:
; CHECK-NEXT:    call void @__msan_warning_with_origin_noreturn(i32 [[TMP6]]) #1
; CHECK-NEXT:    unreachable
; CHECK:       8:
; CHECK-NEXT:    call void @NormalArg(i32 [[O]]) #0
; CHECK-NEXT:    ret void
;
  %p = inttoptr i64 0 to i32 *
  %o = load i32, i32 *%p
  call void @NormalArg(i32 %o) nounwind uwtable sanitize_memory
  ret void
}

define void @CallPartial() nounwind uwtable sanitize_memory {
; CHECK-LABEL: @CallPartial(
; CHECK-NEXT:    store i32 0, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8
; CHECK-NEXT:    [[R:%.*]] = call i32 @PartialRet() #0
; CHECK-NEXT:    [[_MSRET:%.*]] = load i32, i32* bitcast ([100 x i64]* @__msan_retval_tls to i32*), align 8
; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* @__msan_retval_origin_tls, align 4
; CHECK-NEXT:    store i32 [[_MSRET]], i32* bitcast ([100 x i64]* @__msan_param_tls to i32*), align 8
; CHECK-NEXT:    store i32 [[TMP1]], i32* getelementptr inbounds ([200 x i32], [200 x i32]* @__msan_param_origin_tls, i32 0, i32 0), align 4
; CHECK-NEXT:    call void @PartialArg(i32 [[R]]) #0
; CHECK-NEXT:    ret void
;
  %r = call i32 @PartialRet() nounwind uwtable sanitize_memory
  call void @PartialArg(i32 %r) nounwind uwtable sanitize_memory
  ret void
}
guiand updated this revision to Diff 276761.Jul 9 2020, 9:39 AM

Use autogenerated test

guiand marked 10 inline comments as done.Jul 9 2020, 9:40 AM
guiand added inline comments.
llvm/test/Instrumentation/MemorySanitizer/msan_eager.ll
2

Thanks for helping me out with this, Vitaly!

eugenis accepted this revision.Jul 13 2020, 3:54 PM

LGTM with a nit

llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
3416

Remove "else".

3525–3534

This logic is a bit confusing.
Maybe move the "main" check into this function, and express it as

bool EagerCheck = ((ClEagerChecks && hasAttribute) || (name == "main"));
bool StoreShadow = !(ClEagerChecks && hasAttribute);

This revision was automatically updated to reflect the committed changes.
guiand marked an inline comment as done.