This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Create `Value`s for integer literals.
ClosedPublic

Authored by mboehme on Jun 13 2023, 6:26 AM.

Details

Summary

This patch includes a test that fails without the fix.

I discovered that we weren't creating Values for integer literals when, in a
different patch, I tried to overwrite the value of a struct field with a literal
for the purposes of a test and was surprised to find that the struct compared
the same before and after the assignment.

This functionality therefore seems useful at least for tests, but is probably
also useful for actual analysis of code.

Diff Detail

Event Timeline

mboehme created this revision.Jun 13 2023, 6:26 AM
Herald added a project: Restricted Project. · View Herald Transcript
mboehme requested review of this revision.Jun 13 2023, 6:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 6:26 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
xazax.hun accepted this revision.Jun 13 2023, 7:27 AM
This revision is now accepted and ready to land.Jun 13 2023, 7:27 AM
gribozavr2 accepted this revision.Jun 13 2023, 7:41 AM
mboehme added a comment.EditedJun 14 2023, 2:29 AM

Pre-merge check failures are in a test for the unchecked-optional-access check clang-tidy. The test that's failing is the following from unchecked-optional-access.cpp:

void multiple_unchecked_accesses(absl::optional<int> opt1,
                                 absl::optional<int> opt2) {
  for (int i = 0; i < 10; i++) {
    opt1.value();
    // CHECK-MESSAGES: :[[@LINE-1]]:5: warning: unchecked access to optional
  }
  opt2.value();
  // CHECK-MESSAGES: :[[@LINE-1]]:3: warning: unchecked access to optional value
}

Both of the expected diagnostics are not output.

It looks to me as if the values we're now newly producing for integer literals are causing non-convergence of the analysis on the for-loop.

This looks like a pretty serious issue. We would certainly want analysis of a simple loop like this to converge, especially with a small bounded trip count, but even with an unbounded trip count. It looks as if, currently, the analysis is converging merely because we don't generate values for integer literals.

Next step will be to repro this issue with a test in TransferTest.cpp so that I can take a closer look at why we're failing to converge.

It looks to me as if the values we're now newly producing for integer literals are causing non-convergence of the analysis on the for-loop.

For integer literals specifically, we should be returning the same value for the same literal over multiple loop iterations. And also, ideally, for multiple equal literals, too, so that an integer literal "42", no matter how many times spelled in the program, creates the same symbolic value.

It looks to me as if the values we're now newly producing for integer literals are causing non-convergence of the analysis on the for-loop.

For integer literals specifically, we should be returning the same value for the same literal over multiple loop iterations. And also, ideally, for multiple equal literals, too, so that an integer literal "42", no matter how many times spelled in the program, creates the same symbolic value.

That's true, and I'll modify the patch so that we do this. (I should probably also include a test that checks that 42 == 42 is equivalent to true.)

But even without this, wouldn't we expect the analysis to converge? The way I'm treating integer literals right now, the analysis is in effect "seeing" the example I quoted above like this:

int return_some_random_int();
void multiple_unchecked_accesses(absl::optional<int> opt1,
                                 absl::optional<int> opt2) {
  for (int i = return_some_random_int(); i < return_some_random_int(); i++) {
    opt1.value();
    // CHECK-MESSAGES: :[[@LINE-1]]:5: warning: unchecked access to optional
  }
  opt2.value();
  // CHECK-MESSAGES: :[[@LINE-1]]:3: warning: unchecked access to optional value
}

We would expect analysis of this code to converge too -- right? So I think the problem we're seeing here is independent of how we treat integer literals specifically (though I agree the same integer literal should always be represented by the same Value).

We would expect analysis of this code to converge too -- right?

Yes - but we might need universal top values for that? maybe?

We would expect analysis of this code to converge too -- right?

Yes - but we might need universal top values for that? maybe?

Ah, I see what you mean.

Anyway, I'll look at producing "reproducible" Values for the same integer.

mboehme updated this revision to Diff 531694.Jun 15 2023, 4:22 AM

Always return the same Value for the same integer.

ymandel accepted this revision.Jun 15 2023, 4:37 AM
gribozavr2 accepted this revision.Jun 15 2023, 5:49 AM
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/Arena.h
89

Should we be taking the type into account? If not, why not? Please add the type or document why the type isn't tracked.

xazax.hun added inline comments.Jun 15 2023, 8:30 AM
clang/include/clang/Analysis/FlowSensitive/Arena.h
89

What would happen if we try to create the same value (like 5) but with different bit widths? (E.g., 64 bit integer vs 32 bit integer). I suspect we might end up having the same value twice in our constant pool, which might be fine. Just wanted to double check what is the desired behavior.

mboehme updated this revision to Diff 531863.Jun 15 2023, 12:07 PM

Clarify that integer literals aren't typed.

mboehme marked 2 inline comments as done.Jun 15 2023, 12:13 PM
mboehme added inline comments.
clang/include/clang/Analysis/FlowSensitive/Arena.h
89

Should we be taking the type into account? If not, why not? Please add the type or document why the type isn't tracked.

I've clarified that the type isn't tracked but is instead determined by the Expr that the integer literal is associated with. (This is consistent with our treatment of IntegerValues more generally.)

If we want to do constant propagation through integral conversions, this should be done when processing the IntegerCast node. This would, if necessary, produce an integer literal with a different value.

89

What would happen if we try to create the same value (like 5) but with different bit widths? (E.g., 64 bit integer vs 32 bit integer). I suspect we might end up having the same value twice in our constant pool, which might be fine. Just wanted to double check what is the desired behavior.

This isn't a consideration, as integer literals aren't typed (see reply to comment above).

gribozavr2 accepted this revision.Jun 15 2023, 4:50 PM
This revision was automatically updated to reflect the committed changes.
mboehme marked an inline comment as done.