This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add transfer function for addrof
ClosedPublic

Authored by sgatev on Jan 17 2022, 8:28 AM.

Details

Summary

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Diff Detail

Event Timeline

sgatev created this revision.Jan 17 2022, 8:28 AM
sgatev requested review of this revision.Jan 17 2022, 8:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2022, 8:28 AM
xazax.hun accepted this revision.Jan 17 2022, 11:02 PM

The address-of part looks good to me. However, just realized that I'm not convinced whether the dereference operator, or references, in general, are correctly handled.

clang/lib/Analysis/FlowSensitive/Transfer.cpp
193–214

I know this is not strictly related to this patch, but why do we actually need do getPointeeLoc here? I'd expect UO_Deref to be a noop for all intents and purposes.

E.g. for a snippet like:

int f(int *p) {
    return *p;
}

The AST looks like:

`-CompoundStmt 0x5565d151d038 <col:15, line:4:1>
  `-ReturnStmt 0x5565d151d028 <line:3:5, col:13>
    `-ImplicitCastExpr 0x5565d151d010 <col:12, col:13> 'int' <LValueToRValue>
      `-UnaryOperator 0x5565d151cff8 <col:12, col:13> 'int' lvalue prefix '*' cannot overflow
        `-ImplicitCastExpr 0x5565d151cfe0 <col:13> 'int *' <LValueToRValue>
          `-DeclRefExpr 0x5565d151cfc0 <col:13> 'int *' lvalue ParmVar 0x5565d151ce00 'p' 'int *'

I'd expect any actual dereference to happen in the LValueToRValue cast.
The reason is, this is how references can be handled. E.g.:

void f(int &p) {
    int &q = p;
    int r = p;
}

Has the AST:

|-DeclStmt 0x55d49a1f00b0 <line:3:5, col:15>
| `-VarDecl 0x55d49a1effd0 <col:5, col:14> col:10 q 'int &' cinit
|   `-DeclRefExpr 0x55d49a1f0038 <col:14> 'int' lvalue ParmVar 0x55d49a1efe00 'p' 'int &'
`-DeclStmt 0x55d49a1f0180 <line:4:5, col:14>
  `-VarDecl 0x55d49a1f00e0 <col:5, col:13> col:9 r 'int' cinit
    `-ImplicitCastExpr 0x55d49a1f0168 <col:13> 'int' <LValueToRValue>
      `-DeclRefExpr 0x55d49a1f0148 <col:13> 'int' lvalue ParmVar 0x55d49a1efe00 'p' 'int &'

The reason why you know when should you propagate the reference or the pointee of the reference from the subexpression is because of the LValueToRValue cast. So you already need to do the "dereference" operator there to correctly handle references. And if you already do that, I see no reason to duplicate this logic here for pointers. Do I miss something?

This revision is now accepted and ready to land.Jan 17 2022, 11:02 PM
sgatev marked an inline comment as done.Jan 18 2022, 3:03 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/Transfer.cpp
193–214

In the first example you shared the type of the LValueToRValue expression is int * and the type of the UO_Deref expression is int so I think we need to do something to translate one into the other. My understanding is that LValueToRValue performs indirection through references whereas UO_Deref performs indirection through pointers. This is why I think we ultimately need both. I might be wrong so please correct me and I'd be happy to address this in a follow up.

This revision was landed with ongoing or failed builds.Jan 18 2022, 3:33 AM
This revision was automatically updated to reflect the committed changes.
sgatev marked an inline comment as done.
xazax.hun added inline comments.Jan 18 2022, 4:02 AM
clang/lib/Analysis/FlowSensitive/Transfer.cpp
193–214

Indeed, the type did change at UO_Deref.
I think the problem with handling both as performing the indirection is that we would end up performing two indirections for the first code snippet as both the cast and the dereference operator are present and we would need to introduce special code to avoid that.

When we implemented lifetime analysis, initially we also handled UO_Deref as performing the dereference operator. It did not quite work, but after replacing UO_Deref with noop, and relying on Clang to always put an LValueToRValue cast whenever we need to do a dereference operation worked out pretty well and we have not seen a code snippet yet that would not work with that approach.

I agree that the types are confusing in the AST. I am not even convinced about their correctness. For us, it just worked out OK as the type of the pointee always had the right type, so we did not even have to look at the AST.

Basically, in the current model, there is a decision one needs to make every time we look a location up. Namely, whether we should pass SkipPast::Reference. I wonder if we actually need to make that decision in the code at all. I think it might be possible that the AST has LValueToRValue casts every time we need to look past references, and we could just follow what the AST suggests without forcing us to make a decision point at every API call. That would make it way less error prone to work with this codebase and probably also making it more resilient to future changes in the language or the AST.

Don't get me wrong, I think what you have currently is perfectly OK and the tests demonstrate that this approach works, but I wonder if it is possible to simplify it. And the main reason I think there should be a way to make this simpler, because I implemented something similar in the past and I did not need the equivalent of SkipPast::Reference. Here is most of that code: https://github.com/mgehre/llvm-project/blob/lifetime/clang/lib/Analysis/LifetimePsetBuilder.cpp#L324