This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Improve PlacementNewChecker
ClosedPublic

Authored by f00kat on Mar 28 2020, 1:45 PM.

Details

Summary
  1. Added insufficient storage check for arrays
  2. Added align support check

Based on https://reviews.llvm.org/D76229

Diff Detail

Event Timeline

f00kat created this revision.Mar 28 2020, 1:45 PM
f00kat updated this revision to Diff 253408.EditedMar 29 2020, 1:17 AM

lint fixes

NoQ added a comment.Mar 29 2020, 7:34 AM

That was fast! Looks alright.

clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
25

Before i forget: Ideally @martong should have subscribed to [[ https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerDocumentation.html#a7fdb3b5ff726f4c5e782cef0d59c01ad | checkNewAllocator ]] because it fires before the construct-expression whereas this callback fires after construct-expression which is too late as the UB we're trying to catch has occured much earlier.

165–166

You're saying that A is a struct and a is of type A and &a is sufficiently aligned then for every field f in the struct &a.f is sufficiently aligned. I'm not sure it's actually the case.

196–197

I don't think you'll ever see this case in a real-world program. Even if you would, i doubt we'll behave as expected, because we have certain hacks in place that mess up arithmetic on concrete pointers. I appreciate your thinking but i suggest removing this section for now as it'll probably cause more false positives than true positives.

Wohoow! I am impressed, this is really nice work, I like it! :) Could not find any glitch, looks good from my side once you address NoQ's concerns.

clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
125

Maybe the below could be a wording that's more easy to follow?
{0} bytes is possibly not enough for array allocation which ...

f00kat updated this revision to Diff 253628.Mar 30 2020, 10:17 AM
f00kat marked 3 inline comments as done.
  1. Removed code and tests for ConcreteInt cases
  2. Fixed FieldRegion check
  3. Added handling for ElementRegion cases such as
void f7() {
  short b[10];

  // ok. 2(short align) + 3*2(index '1' offset)
  ::new (&b[3]) long;
}
  1. Fixed align error message
  2. Maybe fixed lint warnings
f00kat updated this revision to Diff 253642.Mar 30 2020, 10:57 AM
f00kat marked an inline comment as done.

test fix

martong added inline comments.Mar 31 2020, 12:44 AM
clang/test/Analysis/placement-new.cpp
265

Maybe it is just me, but the contents of the parens here and above seems a bit muddled (index '2' offset). This should be (index '1' offset), shouldn't it?
What is the exact meaning of the number in the hyphens ('2' in this case), could you please elaborate?

f00kat updated this revision to Diff 253803.Mar 31 2020, 1:30 AM
f00kat marked an inline comment as done.

Fixed comments in tests

martong accepted this revision.Mar 31 2020, 2:37 AM

LGTM! Thanks! But I am not that confident with the element regions and field regions, so @NoQ could you please take another look?

clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
82

This will break build-bots that run with -Werror.

../../git/llvm-project/clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp:82:15: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
   if (IsArray = NE->isArray()) {
clang/test/Analysis/placement-new.cpp
256

First I was wondering if we indeed handle correctly structs with nested arrays whose element's type is a structs with nested arrays (... and so on).

So, I tried the below test, and it seems okay. Thus I think it might be worth to add something similar to it.

void f9_1() {
  struct Y {
    char a;
    alignas(alignof(short)) char b[20];
  };
  struct X {
    char e;
    Y f[20];
  } Xi; // expected-note {{'Xi' initialized here}}

  // ok 2(custom align) + 6*1(index '6' offset)
  ::new (&Xi.f[6].b[6]) long;

  // bad 2(custom align) + 1*1(index '1' offset)
  ::new (&Xi.f[1].b[1]) long; // expected-warning{{Storage type is aligned to 3 bytes but allocated type is aligned to 8 bytes}} expected-note 1 {{}}
}
This revision is now accepted and ready to land.Mar 31 2020, 2:37 AM
f00kat updated this revision to Diff 255039.Apr 4 2020, 8:21 AM
  1. Maybe build fix
  2. Added tests for nested arrays of structures
  3. Fixed bugs in implementation for ElementRegion cases
NoQ added inline comments.Apr 6 2020, 12:58 AM
clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
187

The sequence of FieldRegions and ElementRegions on top of a base region may be arbitrary: var.a[0].b[1][2].c.d[3] etc.

I'd rather unwrap those regions one-by-one in a loop and look at the alignment of each layer.

NoQ added inline comments.Apr 6 2020, 4:01 AM
clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
187

Alternatively, just decompose the whole region into base region and offset and see if base region has the necessary alignment and the offset is divisible by the necessary alignment.

f00kat marked 4 inline comments as done.Apr 19 2020, 12:17 AM

Ping? :)

clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
25

Ops, I forgot about it. Will be fixed soon.

25

When I use checkNewAllocator instead of check::PreStmt<CXXNewExpr> an error occures in method PathDiagnosticBuilder::generate.
In the code ErrorNode->getLocation().getTag() because ProgramPoint contains an empty ProgramPointTag.
I`m not very good with analyzer so its hard to understand what`s going on, but I`m still trying..

125

Yeah. Sure. I have some troubles with english :)

165–166

Yeah..Maybe we should take into account struct type align plus field offset? Currently, I am relying only on the fact that just struct type is aligned properly

For example

struct X
{
  char a;
  int b;
} x;

Type X is aligned to 'int' and thus field 'x.a' is also aligned to 'int' because it goes first

struct Y
{
  char a;
  int b;
  char c;
  char d;
} y;

But here field 'y.d' is aligned to 'char'

I will learn more about RegionOffset and will be back :)

187

The sequence of FieldRegions and ElementRegions on top of a base region may be arbitrary: var.a[0].b[1][2].c.d[3] etc.

But i think(hope) I already do this and even have tests for this cases. For example

void test22() {
  struct alignas(alignof(short)) Z {
    char p;
    char c[10];
  };

  struct Y {
    char p;
    Z b[10];
  };

  struct X {
    Y a[10];
  } Xi; // expected-note {{'Xi' initialized here}}

  // ok. 2(X align) + 1 (offset Y.p) + 1(align Z.p to 'short') + 1(offset Z.p) + 3(index)
  ::new (&Xi.a[0].b[0].c[3]) long;
}

Cases with multidimensional arrays will also be handled correctly because method 'TheElementRegion->getAsArrayOffset()' calculates the offset for multidimensional arrays

void testXX() {
	struct Y {
		char p;
		char b[10][10];
	};

	struct X {
		Y a[10];
	} Xi;

	::new (&Xi.a[0].b[0][0]) long;
}

I can explain the code below for ElementRegion if needed.

187

?

196–197

Okay I will remove it!

clang/test/Analysis/placement-new.cpp
256

Thanks! Added tests for this cases.

265

Yeah, sorry it is copy-paste error. Of course there should be '1'.

// bad 2(custom align) + 1(index '1' offset)
martong added inline comments.Apr 20 2020, 1:49 AM
clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
187

Yeah, the tests are convincing and I think that you are handling the regions well.

On the other hand, this code is getting really complex, we should make it easier to read and understand.
E.g. FieldOffsetValue should be explained more, is it the offset started from the the start address of the multidimensional array, or it is just the offset from one element's start address?
Also, you have two variables named as Offset. They are offsets from which starting address? Perhaps we should have in the comments a running example, maybe for &Xi.a[0].b[0][0]? I mean is FieldOffseValue is standing for b or for a?

NoQ added inline comments.Apr 20 2020, 2:41 AM
clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
125

Why do we say "possibly"? Where does the uncertainty come from?

187

Alternatively, just decompose the whole region into base region and offset and see if base region has the necessary alignment and the offset is divisible by the necessary alignment.

I expect this to be, like, 5 lines of code. I don't understand why the current code is so complicated, it looks like you're considering multiple cases but ultimately doing the same thing.

f00kat updated this revision to Diff 259499.Apr 23 2020, 2:36 AM
  1. Rewroted ElementRegion processing and fixed tests for this cases.
  2. Simplified the code a bit.
martong added inline comments.Apr 23 2020, 9:03 AM
clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
271

Perhaps you could call instead checkVarRegionAlign()?

271

Also, I think BaseRegion can be an ElementRegion here as well. So, in that case we should call into checkElementRegionAlign(), shouldn't we? This draws a pattern that we should recursively descend down to the top most base region. I.e. the different check*RegionAlign methods should call into each other until we reach the top level base region.

The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.

I hope this makes sense, please correct me if I am wrong.

clang/test/Analysis/placement-new.cpp
245

So these tests failed after you rewrote ElementRegion processing, right?
Actually, I wonder why we thought that if x is divisible by 2 then (x+6) will be divisible by 8 unconditionally.It's good you have this fixed.

f00kat updated this revision to Diff 259636.Apr 23 2020, 11:05 AM
f00kat marked 4 inline comments as done.
  1. Refactoring
  2. Build fix
f00kat updated this revision to Diff 259667.Apr 23 2020, 12:13 PM
  1. Build fix
martong accepted this revision.May 4 2020, 7:18 AM

... This draws a pattern that we should recursively descend down to the top most base region. I.e. the different check*RegionAlign methods should call into each other until we reach the top level base region.

The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.

This could be an issue not just with alignment but maybe with the size as well, I am not sure if we handle the offset properly in compound cases like this: Xi.b[0].a[1][6].

Even though the above issue is still not investigated/handled, I think this patch is now acceptable because seems like most of the practical cases are handled. We could further investigate the concern and improve in a follow-up patch.
I'd like to see this landed and thanks for your work!

... This draws a pattern that we should recursively descend down to the top most base region. I.e. the different check*RegionAlign methods should call into each other until we reach the top level base region.

The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.

This could be an issue not just with alignment but maybe with the size as well, I am not sure if we handle the offset properly in compound cases like this: Xi.b[0].a[1][6].

Even though the above issue is still not investigated/handled, I think this patch is now acceptable because seems like most of the practical cases are handled. We could further investigate the concern and improve in a follow-up patch.
I'd like to see this landed and thanks for your work!

Thanks for feedback!

I still have no rights to push in the repo so if you think that it is acceptable could you commit it please?

clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp
125

I don’t know what specific size Clang uses for its internal needs in array cases. If you can tell this size, I will use it here.

187

Thank you for feedback!
Rewroted the code for ElementRegion cases. Now it is much easier to understand, and there was also a bug in logic.

271

Perhaps you could call instead checkVarRegionAlign()?

Yeah, you are right. Thank you!

Also, I think BaseRegion can be an ElementRegion here as well. So, in that case we should call into checkElementRegionAlign(), shouldn't we? This draws a pattern that we should recursively descend down to the top most base region. I.e. the different check*RegionAlign methods should call into each other until we reach the top level base region.

I`m not sure that BaseRegion can be ElementRegion. Anyway there always must be some Variable in the end? Or maybe I am wrong?

The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.

I hope this makes sense, please correct me if I am wrong.

I split check into to two stages.

  1. Check BaseRegion align.

But if Var has its own align specifier we ignore BaseRegion align.

  1. Check that total offset is divisible by the necessary alignment.

When I say total offset I mean that it is calculated from the BaseRegion(through all Fields and Elements Regions. e.g. Xi.a[10].b[20].c[30]. Total offset of 'c[30]' is offset from &Xi).

So in this solution we no need to recursively check all regions align.

clang/test/Analysis/placement-new.cpp
245

Yes, It is my fault. For example variable can be allocated at address 0x149E730 and it is well aligned to 2,4,8. And the assert that '0+6' is well
aligned to '8' will be wrong.

Thanks, just committed.

This revision was automatically updated to reflect the committed changes.