This is an archive of the discontinued LLVM Phabricator instance.

[dfsan] Track field/index-level shadow values in variables
ClosedPublic

Authored by stephan.yichao.zhao on Nov 27 2020, 11:13 PM.

Details

Summary

  • The problem

See the motivating examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a single 16-bit shadow value for a variable of any type,
computed by combining the shadow values of all bytes of the variable. So it
cannot distinguish two fields of a struct: each field's shadow value equals
the combined shadow value of all fields. This introduces an overtaint issue.

Consider a parsing function

std::pair<char*, int> get_token(char* p);

where p points to a buffer to parse. The returned pair contains the next
token and a pointer to the position in the buffer just after the token.

If the token is tainted, then both the returned pointer and int are
tainted. If the parser keeps calling get_token to parse the rest of the
buffer, all the following outputs are tainted because of the tainted pointer.
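
For illustration, a parsing loop like the following shows how the overtaint spreads. This is only a sketch, not the actual pair.cpp test: parse_all, first_token_len, and the choice of which pair member is the continuation pointer are assumptions made here.

#include <sanitizer/dfsan_interface.h>

#include <cstddef>
#include <utility>

// Assumed parser primitive from the description; not the actual pair.cpp code.
std::pair<char *, int> get_token(char *p);

void parse_all(char *buf, size_t first_token_len) {
  // Taint only the bytes of the first token.
  dfsan_label lbl = dfsan_create_label("first_token", nullptr);
  dfsan_set_label(lbl, buf, first_token_len);

  char *p = buf;
  while (*p) {
    std::pair<char *, int> r = get_token(p);
    // With a single combined i16 shadow for the whole pair, the token's label
    // also covers the returned pointer (assumed here to be r.first), so from
    // now on every value derived from p looks tainted, even though only the
    // first token actually came from tainted bytes.
    p = r.first;
  }
}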

This CL is the first change to address the issue.


  • The proposed improvement

Eventually, every field and array index will have its own shadow value, both
in variables and in memory.

For example, variables of type {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values of type {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] respectively; variables of primitive
types still have i16 shadow values.
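
For illustration, this original-to-shadow type mapping can be computed by a recursive walk over struct and array types. The following is a minimal sketch against the LLVM C++ API, not the patch's actual code; the function name getShadowTySketch and the treatment of vectors/pointers as plain i16 leaves are assumptions of this sketch.

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Map an original type to its shadow type: structs and arrays keep their
// shape, and every leaf becomes an i16 label.
static Type *getShadowTySketch(Type *OrigTy, LLVMContext &Ctx) {
  if (auto *ST = dyn_cast<StructType>(OrigTy)) {
    SmallVector<Type *, 4> Elements;
    for (Type *Elem : ST->elements())
      Elements.push_back(getShadowTySketch(Elem, Ctx));
    return StructType::get(Ctx, Elements);
  }
  if (auto *AT = dyn_cast<ArrayType>(OrigTy))
    return ArrayType::get(getShadowTySketch(AT->getElementType(), Ctx),
                          AT->getNumElements());
  // Primitive types (and, in this sketch, everything else such as vectors
  // and pointers) keep a single i16 shadow.
  return Type::getInt16Ty(Ctx);
}

Presumably the same recursion can also build the zero-shadow constant and convert shadow values, which is what item 2 in the list below refers to.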


  • A potential implementation plan

The idea is to adopt the change incrementally.

  1. This CL

    Support field-level accuracy for variables/args/ret in TLS/fast16 mode; load/store/alloca still use combined shadow values.

    After the alloca-promotion and SSA-construction phases (>= -O1), we assume most allocas and memory operations have been eliminated. So if struct variables do not touch memory, their tracking is accurate at field level.
  2. Support field-level accuracy at alloca
  3. Support field-level accuracy at load/store

    These two should make -O0 and real memory accesses work.
  4. Support vectors if necessary.
  5. Support Args mode if necessary.
  6. Support passing more accurate shadow values via custom functions if necessary.
  7. Support legacy non-fast16 mode if necessary.

  • About this CL

This CL does the following:

  1. extended TLS arg/ret to work with aggregate types. This is similar to what MSan does.
  2. implemented the mapping between an original type/value/zero-const and its shadow type/value/zero-const.
  3. extended (insert|extract)value to use field/index-level propagation (see the sketch after this list).
  4. for other instructions, the propagation rule still combines inputs by OR; the CL converts between aggregate and primitive shadow values in those cases.
  5. custom function interfaces also need such a conversion because all existing custom functions use i16. It is not yet clear whether custom functions need more accurate shadow propagation.
  6. added test cases for aggregate types.
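
As an illustration of item 3, field/index-level propagation for (insert|extract)value can mirror the original instruction's indices on the shadow value. This is a rough sketch only; getShadow/setShadow stand in for the pass's shadow bookkeeping and are declared here just so the snippet is self-contained, and the visitor names are made up rather than taken from the patch.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Assumed shadow bookkeeping helpers (illustrative declarations only).
Value *getShadow(Value *V);
void setShadow(Instruction *I, Value *Shadow);

// The shadow of `extractvalue %agg, i0, i1, ...` is the same extractvalue
// applied to the aggregate's shadow, so only the accessed field's label
// flows to the result.
void visitExtractValueSketch(ExtractValueInst &I) {
  IRBuilder<> IRB(&I);
  Value *AggShadow = getShadow(I.getAggregateOperand());
  setShadow(&I, IRB.CreateExtractValue(AggShadow, I.getIndices()));
}

// insertvalue likewise inserts the element's shadow into the aggregate's
// shadow at the same indices.
void visitInsertValueSketch(InsertValueInst &I) {
  IRBuilder<> IRB(&I);
  Value *AggShadow = getShadow(I.getAggregateOperand());
  Value *ElemShadow = getShadow(I.getInsertedValueOperand());
  setShadow(&I, IRB.CreateInsertValue(AggShadow, ElemShadow, I.getIndices()));
}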

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · Nov 27 2020, 11:13 PM
Herald added subscribers: llvm-commits, Restricted Project, arphaman, hiraditya.
stephan.yichao.zhao requested review of this revision. · Nov 27 2020, 11:13 PM
stephan.yichao.zhao edited the summary of this revision.
stephan.yichao.zhao edited the summary of this revision.

Thanks for your work on this, Jianzhou. Overall I think this is a reasonable approach to improving label precision.

This is a fairly large patch; would it be possible to split it into a series of dependent patches? For example, the renaming/refactoring stuff could go in its own patch and probably the changes to select.ll could as well. If there's a way to also split the functional changes, that would be even better. This would make it easier for me to review and give better feedback.

stephan.yichao.zhao added a comment (edited). · Dec 1 2020, 9:50 AM

This is a fairly large patch; would it be possible to split it into a series of dependent patches? For example, the renaming/refactoring stuff could go in its own patch and probably the changes to select.ll could as well. If there's a way to also split the functional changes, that would be even better. This would make it easier for me to review and give better feedback.

Is the following split better?

  1. the extension of dfsan_arg_tls and dfsan_ret_tls. This also updates the test cases that check them, for example select.ll.
  2. renaming ShadowTy/ZeroShadow to PrimaryShadowTy/ZeroShadowTy.
  3. modification of the test cases phi.ll and load.ll, since they can also test the existing work.
  4. adding dfsan end-to-end test cases about structs to lock down the existing behavior.
  5. the rest of the functional changes that support field-level accuracy. This part is hard to split because we may have to make all instruction propagation use the same shadow values for it to work.

Updated after rebasing those child diffs.

How does this change affect instrumented binary size?

llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
450
454
469
473
507

Nit: This is only used for mapping expanded shadow to collapsed shadow. Maybe CachedCollapsedShadows is more descriptive, while being consistent with the naming of CachedShadows?

539–540
547–548
553

Nit: I think expandFromPrimitiveShadow and collapseToPrimitiveShadow are more intuitive names.

564–566
569–571
674–699
689–691
705

Nit: This is really just a recursive helper function. Maybe a name like expandFromPrimitiveShadowRecursive is more descriptive? Also since it doesn't access any member variables, it could just be a local static function.

722–723
734

This function is small enough, I think we should just inline it into convertFromPrimitiveShadow below.

772

OR only works for fast16 mode.

791

collapseStructShadow and collapseArrayShadow are practically identical. Can we use templates or polymorphism to share their implementation?

794

Since this function is small, I think we should inline it into convertToPrimitiveShadow below.

814

This additional check isn't used for the CCS cache. Why do we need it here?

962
1412

Nit: This function is tiny. Maybe we should just inline it everywhere instead?

1418

Do you plan to make this work without converting everything to primitive shadow?

1524
1769

What's the reason for expanding the shadow when we're about to collapse it again in storeShadow?

stephan.yichao.zhao marked 22 inline comments as done.

update

How does this change affect instrumented binary size?

For the same large application used by D92440, this diff added 0.004% code-size overhead.
Because the change in D92440 reduces code size by 0.1%, the code size after this diff is still smaller than before.
I think this is because returning or passing structs/arrays is rare in C++ code, although when it does happen on a critical code path it introduces overtaint, as in the motivating example in the description.

We also looked at a small piece of code to see how this changes code generation. Interestingly, the diff's code at the TLS part is slightly smaller.
For example, for this C code:

typedef struct Pair {
  int i;
  char *ptr;
} Pair;
Pair make_pair(int i, char *ptr) {
  Pair pair;
  pair.i = i;
  pair.ptr = ptr;
  return pair;
}

At ret, the old DFSan does:

mov    %rsi,%rdx
mov    %edi,%eax
mov    0x0(%rip),%rcx 
movzwl %fs:(%rcx),%esi
or     %fs:0x2(%rcx),%si
mov    0x0(%rip),%rcx 
mov    %si,%fs:(%rcx)
retq

the new DFSan does:

mov    %rsi,%rdx
mov    %edi,%eax
mov    0x0(%rip),%rcx 
mov    %fs:(%rcx),%ecx
mov    0x0(%rip),%rsi  
mov    %ecx,%fs:(%rsi)
retq

We can see that although Pair has two fields, its shadow is represented by one register.
Because the old DFSan needs to union the two fields' shadows before returning, it generates more complicated register-level code (ORing parts of one register with itself), while the new DFSan does not do such a union.

I feel there could be cases where the new DFSan generates more code.
But because passing/returning complicated structs is not common in a large application, it seems fine.

llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
553

renamed to collapseToPrimitiveShadow

705

moved it to a static function expandFromPrimitiveShadowRecursive.

772

Thank you for catching this! My original test case worked for non-fast16 accidentally because it used only labels 1 and 2, and our large applications use only fast16 mode.

The union operation of the legacy mode is more complicated; in addition, introducing more unions can consume legacy labels more quickly. So it is not clear whether it is a good trade-off between accuracy and running out of labels.

I added a shouldTrackFieldsAndIndices method to ensure this diff only enables field-level shadow for fast16 mode. Non-fast16 mode needs more evaluation in the next step.

794

This convertToPrimitiveShadow and the above collapseArrayShadow form a mutual recursion.

The convertToPrimitiveShadow below acts as the main entry point.

814

The use of the CCS cache assumes that instructions within the same block are visited sequentially, so it does not need to check domination inside the same block.

CachedShadow2PrimitiveShadow has a case here.
It may insert a new instruction in reversed order because of phi-node insertion and because the ClDebugNonzeroLabels feature runs as a post-process.
So we added this check to ensure that, within the same block, CS.Shadow dominates Pos.

1412

It was inlined before. I found it is used many times, so I made it a function.

1418

Yes.

The conversion or collapsing happens mainly in the following cases:

  1. memory operations

The next step is making load/store/alloca use non-collapsed values. This is required for O0-compiled code: O0 does not promote allocas to variables, so most struct operations go through memory.

  2. ClCombinePointerLabelsOnStore/ClCombinePointerLabelsOnLoad/ClTrackSelectControlFlow

When ClCombinePointerLabelsOnStore/ClCombinePointerLabelsOnLoad/ClTrackSelectControlFlow are true, we can also do field-level combining rather than combining-then-union.

  3. custom function wrapper

It is not clear yet whether any wrapper needs struct parameters or return values.

  4. when combineShadows is used by combineOperandShadows

I feel that in this case most operands are not aggregate types, except select and insert/extract, which are already handled separately. If any other op can use aggregate types, we could treat it specially.

For the above four cases, we probably want to measure how non-collapsed propagation changes code size to find a better trade-off (see the collapsing sketch below).
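
For context, the collapsing referred to above can be sketched as a recursive OR over all leaves of an aggregate shadow. This is illustrative only: it assumes fast16 mode, where labels are combined with OR, it omits the CachedCollapsedShadows caching discussed above, and the function name is made up rather than taken from the patch.

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// OR every leaf i16 of an aggregate shadow into a single i16 label.
static Value *collapseShadowSketch(Value *Shadow, IRBuilder<> &IRB) {
  Type *Ty = Shadow->getType();
  if (!isa<StructType>(Ty) && !isa<ArrayType>(Ty))
    return Shadow; // already a primitive i16 shadow

  uint64_t Num = isa<StructType>(Ty)
                     ? cast<StructType>(Ty)->getNumElements()
                     : cast<ArrayType>(Ty)->getNumElements();
  Value *Combined = nullptr;
  for (unsigned I = 0; I < Num; ++I) {
    Value *Elem = IRB.CreateExtractValue(Shadow, I);
    Value *Leaf = collapseShadowSketch(Elem, IRB);
    Combined = Combined ? IRB.CreateOr(Combined, Leaf) : Leaf;
  }
  return Combined; // assumes non-empty aggregates for simplicity
}

In fast16 mode OR is a sound union because a label is a bit set; the legacy mode unions labels through the runtime instead, which is one reason this CL restricts field-level tracking to fast16 via shouldTrackFieldsAndIndices, as mentioned above.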

1769

Good catch! I renamed storeShadow to storePrimitiveShadow to indicate that it takes primitive shadow values, and removed the unnecessary conversions.

Without the change, it indeed generated dead code, although a following pass may remove it.

morehouse added inline comments. · Dec 8 2020, 10:44 AM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
215

Nit: Moving this helper function closer to where it's used would improve readability.

546

WDYT about renaming to expandFromPrimitiveShadow?

814

Please add a comment explaining this.

If CS.Shadow dominates Pos, do we still need to check that CS.Block dominates Pos->getParent()?

1705

Please delete commented-out code here and elsewhere.

llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll
181

I'm not familiar with the custom wrapper logic. Why does this function store 0 to arg TLS?

191

This function should store {a1, b0} shadow to retval TLS, right? Should we verify that?

llvm/test/Instrumentation/DataFlowSanitizer/array.ll
213

Why are there two stores to SP? 2 x i1 is less than 1 byte, so wouldn't a single i16 shadow be enough?

Or is there a hidden 1 byte alignment in the array?

245

Shouldn't there be more ORs for each element in %a?

260

What does this last store to "P3" do?

llvm/test/Instrumentation/DataFlowSanitizer/struct.ll
20

What's the reason for having DEBUG_NONZERO_LABELS here when it tests nothing interesting?

260
273
llvm/test/Instrumentation/DataFlowSanitizer/union-large.ll
3013

What's the reason for changing this test?

stephan.yichao.zhao marked 15 inline comments as done. · Dec 8 2020, 5:07 PM
stephan.yichao.zhao added inline comments.
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
814

Added the comments at CachedCollapsedShadows. Actually we only need to check values, since we do not need to check block domination if we already check domination between values.

llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll
181

This seems to be an effect of the existing design.

dfs$call_custom_cb calls the customized @__dfsw_custom_cb.
The user-provided @__dfsw_custom_cb calls @call_custom_cb.

And the instrumentation around "call %cb" inside @call_custom_cb is by this visitCallInst.

This visitor is from

DFSanFunction DFSF(*this, F, /*IsNativeABI=*/true);

When IsNativeABI is true, getShadow always returns 0 for arguments, and
@custom_cb does not return a shadow.

@__dfsw_custom_cb receives dfst0$custom_cb and @"dfs$cb" instead of @cb.
@"dfs$cb" is the DFSan-instrumented @cb that uses TLS to pass shadows for args/ret, while dfst0$custom_cb is a wrapper of @"dfs$cb" that passes shadows via additional arguments.

I think the user code in @__dfsw_custom_cb looks something like this:

def @__dfsw_custom_cb(@"dfst0$custom_cb", @"dfs$cb", arg1, arg2, ..., arg_shadow1, arg_shadow2, ..., ret_shadow) {
    auto cb_ret_shadow;
    auto my_cb = [&] (...) {
      ...
      ret @"dfst0$custom_cb"(@"dfs$cb", cb_arg1, cb_arg2, ..., cb_arg_shadow1, cb_arg_shadow2, ..., cb_ret_shadow);
    };
    auto r = @custom_cb(my_cb, ...);
    // set ret_shadow in terms of all shadows...
}

The puzzle is that shadows are assigned for all arguments in this DFSF, although they should never be used. This has been the case since the first version.

I am not sure this is the intended form of a custom wrapper for a function with callbacks.

llvm/test/Instrumentation/DataFlowSanitizer/array.ll
213

This is defined by the data layout above, which specifies i1:8:8. So each i1 takes 1 byte, and we have a 2-byte shadow for each byte.

245

This is where the current diff loses accuracy.
When saving an aggregate value into memory, we call the collapse function to convert an accurate shadow to an i16 label.
So this diff only increases accuracy for variables, arguments and return values.

This works for O1-compiled targets, because alloca promotion removes most memory operations and code in practice rarely saves aggregate types to memory.
If we build with O0, it does not work, as the pair.cc and struct.c tests show.

We need to address this in the next change.

260

This is testing the loop from here to here.

When the data to store is large, it first stores vectors, then stores the rest as primitive types.
The instructions before P3 handle the vector stores; the rest handle the primitive-data stores.

This is another reason this diff collapses aggregate shadows to i16: it lets storing/loading keep reusing the same code.

The next change, which preserves aggregate accuracy, will need to redesign this logic.

llvm/test/Instrumentation/DataFlowSanitizer/struct.ll
20

Thank you for catching this. Added.

llvm/test/Instrumentation/DataFlowSanitizer/union-large.ll
3013

This is not necessary; removed.

stephan.yichao.zhao marked 7 inline comments as done.

updated

stephan.yichao.zhao edited the summary of this revision. · Dec 8 2020, 5:09 PM
stephan.yichao.zhao edited the summary of this revision.
morehouse added inline comments. · Dec 9 2020, 6:45 AM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1777–1778

Please remove this commented-out code.

llvm/test/Instrumentation/DataFlowSanitizer/abilist_aggregate.ll
181

I think I understand... Normally custom_cb would not be instrumented by DFSan (which is why we added it to the ABI list). So whatever happens here is probably unimportant. Maybe we should remove all these checks except the first line:

; TLS_ABI: define { i1, i7 } @custom_cb({ i1, i7 } ({ i32, i1 }, [2 x i7])* %cb, { i32, i1 } %a, [2 x i7] %b)
llvm/test/Instrumentation/DataFlowSanitizer/array.ll
245

So there *should* be more ORs in the current diff, right? But the plan is to fix this, so that's why they aren't listed here?

stephan.yichao.zhao marked 3 inline comments as done.

addressed comments

llvm/test/Instrumentation/DataFlowSanitizer/array.ll
245

added.

morehouse accepted this revision. · Dec 9 2020, 10:56 AM

LGTM. Thanks for the nice test coverage too!

This revision is now accepted and ready to land. · Dec 9 2020, 10:56 AM