This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Check NULL pointer dereference issue for memset function
ClosedPublic

Authored by xiangzhai on Apr 9 2017, 8:38 PM.

Details

Summary

Hi LLVM developers,

As Anna mentioned:

One idea is to check that we do not pass a pointer that is known to be NULL to functions that are known to dereference pointers such as memcpy. There is a checker that determines if a null pointer could be dereferenced already, but there is no extension to check if such a pointer could be passed to a function tat could dereference it.

So I implemented evalMemset in the CStringChecker to detect null-deref issue. please review my patch, thanks a lot!

Regards,
Leslie Zhai

Diff Detail

Repository
rL LLVM

Event Timeline

xiangzhai created this revision.Apr 9 2017, 8:38 PM

Thanks! Looks like a valueable addition.

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2004

even better: != 3

2009

The name "S" does not tell me much.. how about something like Data / DataArg / PtrArg / ..?

2011

Variables should start with capital.. State, SizeVal, SizeTy, ...

2034

use early return:

if (!stateNonZeroSize)
  return;

Hi Daniel,

Thanks for your review!

Sorry I am not available until this Friday, then I will update my patch.

Regards,
Leslie Zhai

NoQ edited edge metadata.Apr 13 2017, 4:09 AM

This looks correct! Thanks for taking this up.

One of the possible improvements for future work here would be to actually bind the second argument value to the buffer instead of just invalidating it. Like, after memset(buf, 0, sizeof(buf)) the analyzer should know that all values in the buf array are 0. In the analyzer we have the notion of *default bindings* to handle that (see documentation in docs/analyzer/RegionStore.txt for more details). However, they would only work when the whole region is memseted, not when we're wiping a smaller portion of the region. But the special case of memseting the whole region is important enough to be worth handling separately anyway. In order to detect this special case, you can find the region's *extent* and see if it's exactly equal to the size argument (ideally, handle comparison with the assume method). This would work not only for arrays, but also for regions allocated via malloc or C++ operator new. There is an attempt to do a similar thing for strdup in D20863 (mixed with more complicated stuff).

Since we're in CStringChecker, we may also want to see how memset affects our model of string length. If we memset to zero everything from the beginning of the region, the length of the resulting C string is 0; if not to zero, then the length of the resulting C string is at least the size argument; if not from the beginning, then we've no idea, unless we knew the length before and it was less then the offset from which we started memseting.

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2025

Trailing spaces here.

2037

One more trailing space.

2051–2052

And a few more trailing spaces :) I usually catch those with a colored git diff and highlight them in my editor as well, and also have a hotkey to clang-format current line, which is probably the most handy.

Hi Artem,

Thanks for your review! I will update my patch tomorrow :)

Regards,
Leslie Zhai

xiangzhai updated this revision to Diff 95273.Apr 13 2017, 11:58 PM

Hi All,

I have updated my patch as your suggestion, please review it, thanks a lot!

Regards,
Leslie Zhai

Hi Artem,

If it looks good to you, I want to try commit by myself for testing commit, thanks!

Regards,
Leslie Zhai

Please click "Done" on fixed review comments.

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2038

This "if (StateNonZeroSize)" condition is useless now since you added a early return

2039

I suggest code cleanup:

State = StateNonZeroSize;
State = checkNonNull(C, State, Mem, MemVal);

Change that to:

State = checkNonNull(C, StateNonZeroSize, Mem, MemVal);
2063

indentation. I personally like the clang-format-diff script also. That would cleanup all your changes.

cd llvm/tools/clang
svn diff | python tools/clang-format/clang-format-diff.py -i

It should work with git diff also but I have not tried that.

NoQ added a comment.Apr 18 2017, 7:17 AM

Wow, so you're doing the binding thing now? Thanks! It was not critical for landing this patch, so you could have fixed comments here, allowing us to commit what's already done, and then proceed with further improvements. It's also easier to review and aligns with the LLVM's policy of incremental development.

Could you add test cases for the new feature? For instance,

void foo() {
  int *x = malloc(sizeof(int));
  memset(x, 0, sizeof(int));
  1 / *x; // expected-warning{{Division by zero}}
}
void bar() {
  int *x = malloc(sizeof(int));
  memset(x, 0, 1);
  1 / *x; // no-warning
}

Tests that involve setting memory to anything but 0 are also welcome!

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2066

I believe that if the size is not the same, you'd still need to do invalidation.

Hi All,

Thanks for your review! I will update my patch tomorrow! Almost 4+ days no see, I miss you :)

Regards,
Leslie Zhaii

xiangzhai updated this revision to Diff 95690.EditedApr 18 2017, 11:39 PM

Hi Artem,

so you're doing the binding thing now?

No! it only works for RetVal for example int *ret = memset(x, 0, sizeof(int)); please see my testcase:

void foo() {
  int *x = malloc(sizeof(int));
  int *ret = memset(x, 0, sizeof(int));
  int n = 1 / *ret; // expected-warning {{Division by zero}}
  free(x);
}

but not work for MemVal for example int n = 1 / *x;

Please carefully review my patch to point out my fault: wrongly use bindDefault? I have tried bindDefault for MemVal, and for RetVal. thanks a lot!

Regards,
Leslie Zhai

xiangzhai updated this revision to Diff 96344.Apr 23 2017, 10:14 PM

Hi Artem,

Because memcpy checked NULL pointer dereference for src:

state = checkNonNull(C, state, Source, srcVal);
...

so such testcase can not point out my fault:

void f15 () {                                                                      
  int *x = malloc(sizeof(int));                                                    
  memcpy(x, 0, sizeof(int)); // expected-warning {{Null pointer argument in call to memory copy function}}
  int n = 1 / *x;                                                                  
  free(x);                                                                         
}

And I have no idea how to copy RetVal to Mem s:

if (StateSameSize) {                                                            
      SVal ConstVal = State->getSVal(Const, LCtx);                                  
      State = State->BindExpr(CE, LCtx, RetVal);                                    
      // Actually bind the second argument value to the buffer.                     
      State = State->bindDefault(RetVal, ConstVal, LCtx);                           
      // FIXME: Copy to Mem                                                         
      const MemRegion *MR = RetVal.getAsRegion();                                   
      if (!MR)                                                                      
        return;                                                                     
      MR = MR->StripCasts();                                                        
      if (const TypedValueRegion *TVR = MR->getAs<TypedValueRegion>()) {            
        MemVal = SB.makeLazyCompoundVal(StoreRef(State->getStore(),                 
                    State->getStateManager().getStoreManager()), TVR);              
        State = State->BindExpr(CE, LCtx, MemVal);                                  
      }                                                                             
      C.addTransition(State);                                                       
    }

Please give me some advice, thanks a lot!

Regards,
Leslie Zhai

NoQ added a comment.Apr 27 2017, 8:45 AM

I've a feeling we need to roll this back a little bit.

The memset() function does the following:

  1. Accesses pointers in range R = [first argument, first argument + third argument] and crashes when accessing an invalid pointer.
  2. Writes second argument to all bytes in range R.
  3. Returns its first argument.

Assuming R is an empty set, steps 1 and 2 are skipped.

We handle step 1 through checkNonNull and checkBufferAccess. These easy-to-use functions are already available in the checker, just pass the arguments there directly.

For step 2, we decided to skip most of the step, instead invalidating the whole base region around R. There's a separate task to come up with a better behavior here, but we decided to do it later, probably in the next patch. So for now, the relationship between the size of the buffer and the third argument are not relevant and should not be considered - this is an idea for the future.

Finally, step 3 can be done by binding the call expression to the symbolic value of the first argument through BindExpr. You are already doing this in the zero-size case, but it should be similar in all cases. There is no need to construct a new symbol here - just use the existing SVal.

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2034

In fact this early return is unnecessary. Of the two states returned by assume(), at least one is always non-null (we have an assertion for that). However, the since the situation (StateZeroSize && !StateNonZeroSize) was checked above, it follows from (!StateNonZeroSize) that StateZeroSize is also null, which contradicts the above.

So we can assert here that StateNonZeroSize is not null.

2045–2049

Here you construct a new symbol that represents the pointer returned. However, we return our first argument, which is already denoted by symbolic value MemVal, so we don't need a new symbol here - we'd return MemVal directly, and work on it directly during invalidation.

Also, note that it's not necessarily on the heap.

Hi Artem,

Thank you so much! you are my mentor teach me patiently and carefully, I will update my patch tomorrow, good night from my timezone:)

Regards,
Leslie Zhai

xiangzhai updated this revision to Diff 97042.Apr 27 2017, 10:14 PM

Hi Artem,

Please review my updated patch, thanks a lot!

Regards,
Leslie Zhai

NoQ added a comment.Jun 1 2017, 4:34 AM

The code looks good now! A few minor comments and we can commit this :)

lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2010

This variable is unused. It might make buildbots angry.

test/Analysis/null-deref-ps-region.c
23

Could you mark this as FIXME? Eg:

int n = 1 / *x; // FIXME: no-warning

Because eventually it should warn.

34

Can we make function names more fancy?

Eg. "testConcreteNull", "testStackArray", "testHeapSymbol", etc.

Hi Artem,

Long time no see! miss you :)

I will update my patch next Thursday! I am doing my work assignment about L4 right now.

Regards,
Leslie Zhai

xiangzhai updated this revision to Diff 101847.Jun 7 2017, 9:00 PM

Hi Artem,

I updated my patch please review it, thanks a lot!

Regards,
Leslie Zhai

xiangzhai marked 9 inline comments as done.Jun 7 2017, 9:01 PM

Kindly ping Artem and Anna :)

NoQ accepted this revision.Jun 19 2017, 11:08 AM

Thanks, this was very useful, please commit!

This revision is now accepted and ready to land.Jun 19 2017, 11:08 AM
This revision was automatically updated to reflect the committed changes.
MTC added a subscriber: MTC.Sep 26 2017, 1:50 AM
MTC added a comment.Oct 24 2017, 2:01 AM

One of the possible improvements for future work here would be to actually bind the second argument value to the buffer instead of just invalidating it. Like, after memset(buf, 0, sizeof(buf)) the analyzer should know that all values in the buf array are 0. In the analyzer we have the notion of *default bindings* to handle that (see documentation in docs/analyzer/RegionStore.txt for more details).

BindDefault() is the only function that can make the default binding, is it? If so, evalMemset() uses bindDefault(), the binding may not take effect. Because the current BindDefault() logic is that if the memory area has been initialized, then the default binding will no longer be done, see https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/RegionStore.cpp#L429. Before evalMemset(), MallocMemAux() in MallocChecker.cpp may have already made the default binding. Am I right?

Thank you!

Dear Henry,

Sorry for my poor English, please Create a New patch or provide Chinese translation :)

Regards,
Leslie Zhai

NoQ added a comment.Oct 24 2017, 3:04 AM
In D31868#904814, @MTC wrote:

One of the possible improvements for future work here would be to actually bind the second argument value to the buffer instead of just invalidating it. Like, after memset(buf, 0, sizeof(buf)) the analyzer should know that all values in the buf array are 0. In the analyzer we have the notion of *default bindings* to handle that (see documentation in docs/analyzer/RegionStore.txt for more details).

BindDefault() is the only function that can make the default binding, is it? If so, evalMemset() uses bindDefault(), the binding may not take effect. Because the current BindDefault() logic is that if the memory area has been initialized, then the default binding will no longer be done, see https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Core/RegionStore.cpp#L429. Before evalMemset(), MallocMemAux() in MallocChecker.cpp may have already made the default binding. Am I right?

Seems so, and it also looks super counter-intuitive. We definitely do need to overwrite default bindings from time to time. When RegionStore itself wants to overwrite a default binding, it does removeBindings() first (see invalidateGlobalRegion() as an example). I think ProgramState::bindDefault() should behave this way as well, not sure about impact of such change.