This is the bug that I'll be using in the LLVM Developer's Meeting presentation as an example of how to debug things, so spoiler alert!
And let me explain it a bit slower than usual.
Here's the original false positive that we're fixing:
It has appeared by applying the Static Analyzer to LLVM itself. Namely, you should be able to reproduce it locally by applying the static analyzer to LLVM rL373385; the static analyzer itself should be one commit behind this commit.
The warning tells us that variable Reg on line 824 is uninitialized:
This, however, is clearly a false positive, because the lambda invocation isKilledReg(MO, Reg) receives a non-const reference to Reg and initializes it on the current path:
The static analyzer somehow fails to realize that Reg is initialized, and even displays a clearly incorrect note (41) "Returning without writing to 'Reg'".
The variable Reg itself is declared in the caller function, transferSpillOrRestoreInst, and passed into the current function by reference, which is also named Reg:
In order to debug the issue, I dumped the trimmed exploded graph for the current analysis and used exploded-graph-rewriter to remove the unnecessary details:
$ clang ... -analyze-function="(anonymous namespace)::LiveDebugValue::runOnMachineFunction(class llvm::MachineFunction &)" -analyzer-dump-egraph=graph.dot -trim-egraph $ exploded-graph-rewriter.py --topology graph.dot # I searched the topology dump and found out that our bug node is 52419. $ exploded-graph-rewriter.py --to 52419 --single-path --diff graph.dot
Here's the final graph dump:
I tried to find out which transition in the graph was incorrect.
First of all I confirmed that there is no binding in the Store for the Reg in the bug state. The referece Reg was there and was known to refer to the variable Reg, but the variable itself is indeed nowhere to be found, as if it was never written to:
This gave me confidence that the checker is not to be blamed: the variable was "known" to be uninitialized, therefore the checker was right to report it, and we need to find out why was the variable incorrectly known to be uninitialized.
Therefore, logically, the next question was why did the assignment on line 812, before step 41, was not recorded in the Store. Here's the Exploded Node in which the assignment operator was evaluated:
Here we can see that the assignment was in fact recorded in the Store, but in a wrong memory region. In our notation, SymRegion{reg_$3519<unsigned int & Reg>} is an unknown region of memory that is the pointee of the reference Reg points to.
However, given that Reg is a parameter in a function call into which we dived during analysis, this symbolic value should not have existed to begin with, because we have concrete knowledge about what Reg points to. Instead, we should have had a binding in the Store from Reg (the memory region of the reference parameter) to &Reg (the pointer to the variable Reg), and loading from Reg (as part of evaluating which location should be written into) should have yielded &Reg rather than &SymRegion{reg_$3519<unsigned int & Reg>}.
In order to confirm this educated guess, I compared the evaluation that the Static Analyzer did for call sites of isLocationSpill():
and isKilledReg():
And, indeed, the binding Reg : &Reg is supposed to be there in both cases, but it is missing in case of isKilledReg(). Interestingly, the binding for the other parameter, MO, is also missing in case of isKilledReg.
By scrolling up a few nodes we can see that the bindings for expressions Reg and MO are present in the Environment when the call is being entered:
In other words, the translation of parameter values from the Environment to the Store was not performed correctly during call enter, even though the necessary information seems to have been present in the program state. Therefore this transition was the primary suspect.
Then I attached debugger to the transition. Even though there are many call enters that were evaluated throughout this analysis, I can attach the debugger precisely with the help of the stable IDs:
(lldb) br s -n inlineCall -c 'Pred->Id = 52357'
Here inlineCall is a method of ExprEngine that is responsible for "diving" into calls during analysis. Because the ExprEngine class is responsible for evaluating the effects of specific statements (and other CFG elements) over the program state, any transition can be debugged by setting a similar breakpoint on a method of ExprEngine.
Step-by-step debugging led me into function addParameterValuesToBindings() in CallEvent.cpp, which is indeed the function responsible for the translation of parameter values from the Environment to the Store:
528 SVal ArgVal = Call.getArgSVal(Idx); 529 if (!ArgVal.isUnknown()) { 530 Loc ParamLoc = SVB.makeLoc(MRMgr.getVarRegion(ParamDecl, CalleeCtx)); 531 Bindings.push_back(std::make_pair(ParamLoc, ArgVal)); 532 }
This transition seems straightforward, however there is a caveat - in C++ the transition is skipped for parameters that were constructed by invoking a constructor:
522 if (Call.isArgumentConstructedDirectly(Idx)) 523 continue;
Indeed, if a constructor was invoked, then the Store bindings should already be there by the time the call is entered, so there is no need to manually bind them.
We do in fact have an argument that is constructed directly in this example, which is MO - a C++ object of class MachineOperand that gets passed into isKilledReg by value:
By the way, you can learn more about how constructors are modeled in a talk by @george.karpenkov and me in LLVM DevMtg 2018: https://www.youtube.com/watch?v=4n3l-ZcDJNY
The problem, however, was that isArgumentConstructedDirectly(Idx) returned false for MO but true for Reg! It should clearly be the other way round.
This was happening due to an off-by-one error caused by how in Clang AST for historical reasons the numbering of parameters in member operator declarations is different by 1 from the numbering of arguments in member operator call expressions: one of them counts "this", the other doesn't. I was already aware about this problem, which is why in D49443 I added convenient accessor methods for dealing with these mismatched offsets. However, I forgot to use them in this tiny code snippet. Hence the patch.
Thanks for reading!
What about this one? It smells the same issue.