In https://bugs.llvm.org/show_bug.cgi?id=34460 CStringChecker tries to evalCast() a memory region from void * to char * for the purposes of modeling mempcpy(). The memory region turned out to be an element region of type unsigned int with a symbolic index. For such regions, even such trivial casts have turned out to be unsupported (modeled as UnknownVal), and the checker crashed upon asserting them to be supported.
I tried to get such trivial cast working for a few days and ended up believing that it not worth the effort because of:
- How we represent casts in the SVal hierarchy.
- In our case, the region is an ElementRegion with element type unsigned int, which essentially represents a pointer of type unsigned int *, not void *. It means that the cast is definitely not a no-op; the region needs to be modified.
- Now, if the offset was concrete, we already try to replace ElementRegion with another ElementRegion of the other type, and recompute element index accordingly. It is only possible if byte offset of the region is divisible by size of the target type (in our case it is because the target type is char). If it's not divisible, we make two ElementRegions on top of each other, first with element-type char to represent a raw offset, other of the target type to represent the cast. It means that for symbolic index case, we need to account for being able to receive such two-element-regions construct in the first place, and then investigate divisibility of a linear expression of form a*N + b, where N is concrete but a and b are possibly symbolic, to a concrete type size, which our range constraint manager would never handle.
- In fact, now we not only unpack the top one or two ElementRegions while computing the offset (that needs to be divisible), but also we unpack all other top-level element regions (i.e. if we look into a multi-dimensional array; see ElementRegion::getAsArrayOffset()). This was never tested; i added relevant tests into casts.c in this patch to demonstrate the problems caused by that (which are unrelated to the rest of the patch). I believe that unpacking multi-dimensional arrays in this way is not ideal; however, i also added tests that would fail if this behavior is disabled. This needs much more work. Of course, this work would be even harder if we intend to support symbolic offsets.
- How we split the responsibilities between entities. When trying to implement the plan described in 1., i ended up moving a lot of code around and passing ProgramState through dozens of APIs. We need the state to perform binary operations on SVal objects through SValBuilder. Originally, castRegion() resides in RegionStore which is not intended to access the program state. It is trivial to move castRegion to SimpleSValBuilder, but passing State into it results in a huge cascade of changes in SValBuilder method arguments. I have given up when i had to pass State to SValBuilder::getConstantVal(const Expr *);, which seemed ridiculous (it's constant, why does it depend on the state?). Introducing all this complexity only to compute an SVal that nobody would be able to understand was not worth the effort, in my opinion.
Hence the quick fix. Code slightly simplified to use a more high-level API.
Here we have y1 pointing to element{element{x, 3}, 5} vs. both y2 and y3 pointing to element{x, 35}. While y3 is inevitably looking like that, y2 could have been same as y1 if we didn't unwrap multi-dimensional array elements.