This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Retrieve a value from list initialization of multi-dimensional array declaration.
ClosedPublic

Authored by ASDenysPetrov on Oct 12 2021, 9:15 AM.

Details

Summary

Add support of multi-dimensional arrays in RegionStoreManager::getBindingForElement. Handle nested ElementRegion's getting offsets and checking for being in bounds. Get values from the nested initialization lists using obtained offsets.

Example:

const int arr[3][2] = {{1, 2}, {3, 4}};
arr[0][0];  // 1
arr[0][1];  // 2
arr[0][2];  // UB
arr[1][0];  // 3
arr[1][1];  // 4
arr[1][-1]; // UB
arr[2][0];  // 0
arr[2][1];  // 0
arr[-2][0]; // UB

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Oct 12 2021, 9:15 AM
ASDenysPetrov requested review of this revision.Oct 12 2021, 9:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 9:15 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ASDenysPetrov edited the summary of this revision. (Show Details)Oct 12 2021, 9:18 AM

Could you please rebase on top of D111542? That would make the changes direct and clear and the review would be easier.

Could you please rebase on top of D111542? That would make the changes direct and clear and the review would be easier.

OK. I'll rebase.

Improved. Rebased on top of D107339.
@martong kind ping.

This is an important step towards better handling of global initializer expressions.
I'm looking forward to it. Although, I have concerns to address.

clang/lib/AST/Type.cpp
141–143 ↗(On Diff #381562)

These comments should be in the header file instead.

149 ↗(On Diff #381562)

I suspect this not gonna look through typedefs.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1750–1761

I dislike this part of code. It has side effects all over the place.
Please use immediately called lambdas to describe your intentions in a more pure manner.

Besides that, when I see ElementRegions, I'm always scared. They sometimes represent reinterpret casts.
Since you have dealt with so many casts and region store stuff in the past, I'm assuming you have thought about this here.
Although, I'm still scared slightly, and I'd like to have evidence of this thought process. Likely some assertions or at least some comments? Could you @ASDenysPetrov elaborate on this?

1814

I would rather take it by value. They are designed so.
A mutable reference is always a bad sign anyway.
BTW for a brief moment I thought reverse() returns a temporary, but it doesn't, so we don't dangle at least.

1820

Please use the isNegative(). And don't have sideffects in conditions. advance the iterator at the end of the if.
I also think you can eliminate the I variable.
That way the getExtValue() would be guarded by the negativity check. I know that your code would behave the same, but I would find that phrasing slightly easier to follow.
This way you could even spare the 2 comment lines.

1824

This is probably a rare case when a continue is preferred instead of the else statement.

1861–1872

Just spell out uint64_t here. We don't win much by using auto.

1879–1884

Please, restructure this. Probably negating the condition and returning on that path would do the job.

clang/test/Analysis/initialization.cpp
17–23

Sorry If I was misunderstood in the previous patches.
I think, for this instance, the key is that arr is const, so this TU is supposed to provide this linker symbol, thus any other definitions would violate the ODR.
So, the FIXME is actually accurate, and we should report 0 here.

56

Does this work if the = {} is not present?

99

Uh, this would be a regression.
Accessing Unknown is not a bug, unlike accessing Undefined - which is a clear indication that we must have had UB in the calculation previously, to get this. So, this is slightly similar to llvm poison on that sense.

110

typo

martong added inline comments.Oct 25 2021, 6:59 AM
clang/include/clang/AST/Type.h
2953 ↗(On Diff #381562)

Could this be a free function (not a member function)? That way we could put it as an implementation detail of RegionStore.cpp and we could spare the timely review from the AST code owners.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
444–446

Could you please add docs to this function? Especially to ConcreteOffsets param.

1748–1761

I think this could be implemented in it's own separate function, couldn't it?

1763

Please address this TODO.

1799–1800

Perhaps this hunk could be in an individual implementation function?

clang/test/Analysis/initialization.c
76

Very good!

ASDenysPetrov added inline comments.Oct 26 2021, 7:40 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1750–1761

Please use immediately called lambdas.

I'll do.

They sometimes represent reinterpret casts. ... I'm assuming you have thought about this here.

Yes. This is my another revision in the stack. You are welcome: D110927

1763

I found the tests:

  • Analysis/MemRegion.cpp
  • Analysis/ctor.mm
  • Analysis/mpichecker.cpp
  • Analysis/string.c

They cover this check.

ASDenysPetrov added inline comments.Oct 26 2021, 8:10 AM
clang/test/Analysis/initialization.cpp
17–23

Right. FALSE expected. And it is fixed with the patch. But this case duplicates some more detailed cases I've added below. So I decide to remove it.

56

The compiler (AST) doesn't pass you through without an initializer by emitting a warning. But still there is a case without initializer at the end of the file. Yes, it does work.

99

This is a big problem with casts (in SValBuilder::evalCast). I've investigated it. Relates to D89055. IMO the solution will take a separate patch stack. I'm going to fix this next.

ASDenysPetrov added inline comments.Oct 26 2021, 9:12 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1799–1800

There are two returns inside the loop. That would not such easy to do a separate function for this.

@martong @steakhal
Thank you for your remarks. Updated according to them (at least the best I could do for now).

martong added inline comments.Oct 27 2021, 6:23 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1799–1800

There are two returns inside the loop. That would not such easy to do a separate function for this.

I think you could return with an Optional<SVal> and then in the call site you could check if the optional is set.
Optional<SVal> areOffsetsOutOfBounds(...) ?

Then in the caller:

if (Optional<SVal> CheckResult = areOffsetsOutOfBounds(...))
  return *CheckResult;

Please, try to focus on marking inline comments done if you accomplished them.
It takes some time to reevaluate them one-by-one on each update.
Aside from that, I'd like to apply these and play with them but I'm frequently having conflicts applying your patches.

Regarding the patch, I think it's in a pretty good shape, given that you fix the regression I pointed in an inline comment.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1810–1811

Can this be empty? If so, the subsequent .front() is UB.
Put here an assertion if you are sure that it cannot be empty.

clang/test/Analysis/initialization.cpp
17–23

Awesome, thanks!

56

Okay, thanks!

99

I think until you do that we should stick to the previous behavior.
Please address this regression, so report UNKNOWN for these cases or the correct answer.

I'll do an update soon taking into account all the suggestions.

Please, try to focus on marking inline comments done if you accomplished them.

In the next update.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1799–1800

Great idea.

ASDenysPetrov marked 17 inline comments as done.Oct 28 2021, 8:57 AM

@steakhal @martong Updated according to your comments.
I think I'm done with this patch.

ASDenysPetrov updated this revision to Diff 383396.EditedOct 29 2021, 9:37 AM
  • Added support of aliased types (work with canonical array type).
  • Added tests aliased types.
  • Added tests with initialization values in parentheses.

Changed test to fix a buildbot failure.

I'm still worried about the fact that you assume that there is a correspondence between ElementRegions and InitListExprs.
I cannot see why this assumption holds, since reinterpret casts might introduce ElementRegions which could mess with this assumption.
Aside from that, I'm okay with the general idea of doing this kind of stuff.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1661–1662

If I'm right then this is called the base of the region more often than not.
It would be probably a better choice than using MR.

1663–1665

IMO this is not a concern in real-life code.

1666–1667

I would prefer returning a pair instead of an output parameter.

1788–1790

I think you can hide this within the getConstantArrayExtents(). That is the only function touching this variable anyway.
That way you could also spare the comments about CAT requiring it to be canonical.

@steakhal Thank you for your suggestion. I'll make corresponding changes.

I'm still worried about the fact that you assume that there is a correspondence between ElementRegions and InitListExprs.
I cannot see why this assumption holds, since reinterpret casts might introduce ElementRegions which could mess with this assumption.

Currently in practice ElementRegions for array element access expression look like InitListExprs structure. But you are right about mess with different casts. That's why I have FIXME's in glob_ptr_index2 and glob_invalid_index6.
Right now I'm working on fixing this stuff. I'm gonna refactor StoreManager::castRegion in a part of ElementRegions to distinguish between array indirections and casts.

@steakhal Thank you for your suggestion. I'll make corresponding changes.

I'm still worried about the fact that you assume that there is a correspondence between ElementRegions and InitListExprs.
I cannot see why this assumption holds, since reinterpret casts might introduce ElementRegions which could mess with this assumption.

Currently in practice ElementRegions for array element access expression look like InitListExprs structure. But you are right about mess with different casts. That's why I have FIXME's in glob_ptr_index2 and glob_invalid_index6.
Right now I'm working on fixing this stuff. I'm gonna refactor StoreManager::castRegion in a part of ElementRegions to distinguish between array indirections and casts.

Awesome! Let's get the minor nits addressed and land it.
Thank you for working on this.

Updated according to @steakhal's comments.

steakhal accepted this revision.Nov 5 2021, 10:53 AM

Feel free to commit your change given that you fix my final nits.
Yeah, I'm soo bad in review. I should have proposed these previously. Sorry.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1641

You don't need to document this.
Aside from that consider marking these implementation detail functions static if they are not yet in an anonymous namespace - I haven't checked.

1646

I think we will get it :D

1749

The std::tie() pattern is probably more readable.

This revision is now accepted and ready to land.Nov 5 2021, 10:53 AM
ASDenysPetrov marked 7 inline comments as done.

Updated according to the nits.

@steakhal

Thanks for your effort!

Yeah, I'm soo bad in review. I should have proposed these previously. Sorry.

No, you're not! I'm telling you. Keep it up! :)