This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] MPIChecker: MPI_Waitall should respect count arg
Needs RevisionPublic

Authored by danix800 on Aug 24 2023, 10:03 PM.

Details

Summary

MPI_Waitall should respect count arg (ConcreteInt only)

Depends on https://reviews.llvm.org/D158707

Diff Detail

Event Timeline

danix800 created this revision.Aug 24 2023, 10:03 PM
Herald added a project: Restricted Project. · View Herald Transcript
danix800 requested review of this revision.Aug 24 2023, 10:03 PM
danix800 edited the summary of this revision. (Show Details)Aug 24 2023, 10:05 PM

Let me try to summarize some of the variables:

So, given an invocation like MPI_Waitall(C, Requests, Statues):

  • MR is lval(Requests)
  • ElementCount is the number of elements Requests consist of.
  • ArrSize is basically ElementCount but an APSInt.
  • MROffset is the offset and allocated region of the Requests region.
  • Index is ConcreteInt(0, ..., ArrSize)
  • Count is C, which is likely a ConcreteInt.

Please note that I have no idea what this checker does.

clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp
46–48

If these hunks are not closely related to the issue you intend to fix in this PR, I'd suggesst submitting it separately. That is a common API, and we wouldn't need tests for such defensive checks, as they rarely trigger.

185–188

How can ElemSizeInChar be zero?
If it can be zero, could you demonstrate it by a test?

189–190

What implies that MR->getAsOffset() succeeds, and also what implies that the Offset within that is not symbolic?
Also, how can you use this without also using the result region?
Without using that you don't know what is the anchor point, from which this offset represent anything.
ATM I believe the code assumes that MR->getRegion() equals to SuperRegion, which might not be always the case.
This could materialize a problem when you construct the element region later.

192–198

It's suspicious to me to compare stuff in the symbolic domain, such as Idx >= Count here.
And let me elaborate on why I think so.

I'd assume that the first argument of MPI_Waitall is usually just a number literal, thus it's gonna be a ConcreteInt, so now we have Count.
The Index you just created, obviously is just a ConcreteInt.
After this, we could simply unwrap them and do the comparison without doing it in the symbolic domain.


As a sidenote, State->assume(*CountReached, true) would return a state in two cases:

  1. We can prove that the assumption holds.
  2. We cannot disprove the assumption, thus it might hold, so we assume it holds.

This is probably not what you wanted in the first place, but it's probably irrelelevant anyway.

195

In C for example, we might not have a Boot type AFAIK.
Usually, we use SVB.getConditionType() in such cases.

200–202

My guess is that we only care about if MROffset is a concrete int, thus we could also just do this assidion in regular c++ and just transfer the result into the symbolic domain.

205

I believe you have a variable ASTCtx that you could use.

clang/test/Analysis/mpichecker-crash-gh64647.cpp
5–13

Unless you have a compelling reason creating this file, I'd suggest to append this to the end of the test/Analysis/mpichecker.cpp, and just leave a comment there like // GithHub Issue 64647.

On a side note, do we need this to be recursive? Is this the minimal case to reproduce the issue?

clang/test/Analysis/mpichecker.cpp
286–288

This loop here implicitly depends on the fact that we only unroll loops 4 times.
I'd prefer a simpler test where we don't even have a loop.

danix800 updated this revision to Diff 553514.Aug 25 2023, 9:24 AM
danix800 edited the summary of this revision. (Show Details)
  1. Move out complicated computation into separate function;
  2. Only check ConreteInt request count.

Let me try to summarize some of the variables:

So, given an invocation like MPI_Waitall(C, Requests, Statues):

  • MR is lval(Requests)
  • ElementCount is the number of elements Requests consist of.
  • ArrSize is basically ElementCount but an APSInt.
  • MROffset is the offset and allocated region of the Requests region.
  • Index is ConcreteInt(0, ..., ArrSize)
  • Count is C, which is likely a ConcreteInt.

Please note that I have no idea what this checker does.

It's really messy. I tried to refactor and clarify intentions with better naming. Thanks for the summary.

danix800 added inline comments.Aug 25 2023, 9:32 AM
clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp
46–48

Moved (together with the extra testcase file) into https://reviews.llvm.org/D158858.

Checking ExplodedNode against nullptr is a convention in almost all checkers. It surely can be null (quite often).

MPIChecker is one of the only two checkers that forget to do this checking and crashed as reported by #64647

danix800 added inline comments.Aug 25 2023, 9:39 AM
clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp
185–188

Type in the wild does not have to be of non-zero size. MPI vendors might choose whatever implementation as they want.
We are doing division so non-zero checking is necessary.

https://reviews.llvm.org/D158707 added a few general testcases for zero-sized type processing on dynamic extent.

danix800 added inline comments.Aug 25 2023, 9:41 AM
clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp
189–190

I'll restrict the checker to handle non-symbolic offset only.

clang/test/Analysis/mpichecker.cpp
286–288

I'll unroll this loop.

danix800 added inline comments.Aug 25 2023, 9:42 AM
clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp
192–198

I'll restrict this checker to handle ConcreteInt count only.

195

Thanks for this tip.

Ping~, could anyone have a look at this revision please? Thanks!

I don't have time this week, sorry.
Maybe others can take this over.
@donat.nagy could you please have a look?

donat.nagy requested changes to this revision.Sep 6 2023, 7:54 AM

I reviewed this change and collected my suggestions in comments.

In general, I feel that the code added by this commit is "sloppy" in the sense that it's designed for the common case and would behave unpredictably in unusual situations. This is a fault of the toolbox provided by the analyzer: in a saner world, you could freely combine the API functions (as you did) and expect that they would "just work" together; but in reality every function has its quirks and limitations, so the developer must understand the details of the implementation.

clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp
155

Use long long (or int64_t, ssize_t, whatever you prefer) instead of NonLoc here because it's easier to handle them (and you don't work with symbolic offsets anyway).

183

What happens here in the case when TypeSizeInChars.isZero() holds?

I see that you used a ternary operator to avoid division by zero; but do you get a meaningful and valid offset value in the case when MPI_Request has zero size and you use TypeSizeInBits == 8?

I understand that the divisior does not matter in the case when RequestRegionOffset.getOffset() is zero; but if I understand it correctly this offset value can be nonzero even in the case when MPI_Request has zero size. (Under the hood it calls MemRegion::getAsOffset() which may include the offset of e.g. a data member of an object.)

189–190

ATM I believe the code assumes that MR->getRegion() equals to SuperRegion, which might not be always the case.

This remark of @steakhal highlights a problem which is still present in your code: when you calculate SuperRegion you strip a single ElementRegion layer; while the offset that you get via getDynamicElementCountWithOffset() is calculated from MemRegion::getAsOffset() which can strip multiple ElementRegion, FieldRegion etc. layers.

Unfortunately the memory region API of Clang Static Analyzer is very inconsistent and quirky; you must dig deep into the implementation to understand the exact meaning of these function calls.

For example, if you have a data type like

struct ReqStruct {
    int custom_info;
    MPI_Request reqs[8];
} rs;

then the SuperRegion will refer to the immediate parent of the ElementRegion (which is presumably the FieldRegion corresponding to reqs); but the offset value is measured from the start of the VarRegion corresponding to the variable rs (and includes sizeof(int) + potential padding due to the presence of custom_info).

I think the correct solution would measure the offset from the start of the array of MPI_Request objects; because if you measure it from the start of the struct, then you may encounter serious issues, e.g. there is no guarantee that the offset will be divisible by the size of MPI_Request.

Unfortunately this would mean that you cannot rely on the logic of getDynamicElementCountWithOffset(); and I don't see a clear way to generalize that library function to limit the number/type of region layers that it strips.

Here I don't see a clear way forward, as the two potential solutions are a deep rewrite that complicates the library code and a creating a locally tweaked duplicate of the library logic -- and these both have serious drawbacks.

196

Paranoid remark: if these count values are both very large (e.g. negative numbers converted to size_t), then this loop could hang for a long time and/or exhaust the memory. Consider adding a hard limit on the number of iterations.

198–200

If you represent RegionOffset as a long long, you can replace this complicated evalBinOp with a simple SVB.makeArrayIndex(i + RegionOffset);.

Note: @steakhal already suggested this improvement in the earlier remark

My guess is that we only care about if MROffset is a concrete int, thus we could also just do this assidion in regular c++ and just transfer the result into the symbolic domain.

which referred to an older variant of your patch.

clang/test/Analysis/mpichecker.cpp
276–280

Please add a dummy field above the other fields to verify that the offsets are handled correctly. For example, char request_name[99]; would be sufficiently large to highlight errors.

This change could be applied in other tests as well.

This revision now requires changes to proceed.Sep 6 2023, 7:54 AM