Page MenuHomePhabricator

[clang-tidy] Add explicitly given array size heuristic to misc-suspicious-missing-comma check.
Needs RevisionPublic

Authored by szdominik on Apr 30 2016, 10:05 AM.

Details

Reviewers
alexfh
Summary

Additional heuristic to misc-suspicious-missing-comma checker, based on the (in)equality of the explicitly given array size and the real array size.
Note: in these cases we don't know that the given size is wrong or there is a missing comma.

Original checker revision: http://reviews.llvm.org/D18457

Diff Detail

Event Timeline

szdominik updated this revision to Diff 55721.Apr 30 2016, 10:05 AM
szdominik retitled this revision from to [clang-tidy] Add explicitly given array size heuristic to misc-suspicious-missing-comma check..
szdominik updated this object.
szdominik added reviewers: alexfh, etienneb.
szdominik added subscribers: cfe-commits, xazax.hun.
etienneb edited edge metadata.Apr 30 2016, 10:54 AM

If you are interested to a quick chat on this Checker, ping me. I know other cases that should be filtered and I'm lacking time to implement them. Here is a common one I would lie to see happening (base on comments):

const char* A[] = {
  // This is a entry
  "entry1",
  // This is the next entry
  "entry2"
  "entry2"
  "entry2",
  // This is the last entry
  "entry2",
};

Other common cases to append literals are:

"Program version:"  VERSION 
"Program version: %"  PRIu64 
"Let add strange characters \xFF" "For nothing"  (you can't happend these two literals or the escaped char will be "\xFFF")
clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
94

The issue I see by matching the "hasParent" that way is that you are assuming that the matcher is only looking for initialisation-list attached to variable declaration (direct child).

Assume this case:

struct StrArray {
  int n;
  const char* list[5];
} A[2] = {
  {5, {"a", "b", "c", "d", "e" }},
  {5, {"a", "b", "c", "d", "e" }},
};

It is giving you this AST representation:

VarDecl 0x1e82738 </usr/local/google/home/etienneb/examples/test.cc:3:1, line:9:1> line:6:3 A 'struct StrArray [2]' cinit
`-InitListExpr 0x1e82c38 <col:10, line:9:1> 'struct StrArray [2]'
  |-InitListExpr 0x1e82c88 <line:7:3, col:33> 'struct StrArray':'struct StrArray'
  | |-IntegerLiteral 0x1e827d8 <col:4> 'int' 5
  | `-InitListExpr 0x1e82cd8 <col:7, col:32> 'const char *[5]'
  |   |-ImplicitCastExpr 0x1e82d40 <col:8> 'const char *' <ArrayToPointerDecay>
  |   | `-StringLiteral 0x1e82878 <col:8> 'const char [2]' lvalue "a"
  |   |-ImplicitCastExpr 0x1e82d58 <col:13> 'const char *' <ArrayToPointerDecay>
  |   | `-StringLiteral 0x1e828a8 <col:13> 'const char [2]' lvalue "b"
  |   |-ImplicitCastExpr 0x1e82d70 <col:18> 'const char *' <ArrayToPointerDecay>
  |   | `-StringLiteral 0x1e828d8 <col:18> 'const char [2]' lvalue "c"
  |   |-ImplicitCastExpr 0x1e82d88 <col:23> 'const char *' <ArrayToPointerDecay>
  |   | `-StringLiteral 0x1e82908 <col:23> 'const char [2]' lvalue "d"
  |   `-ImplicitCastExpr 0x1e82da0 <col:28> 'const char *' <ArrayToPointerDecay>
  |     `-StringLiteral 0x1e82938 <col:28> 'const char [2]' lvalue "e"
  `-InitListExpr 0x1e82db8 <line:8:3, col:33> 'struct StrArray':'struct StrArray'
    |-IntegerLiteral 0x1e82a20 <col:4> 'int' 5
    `-InitListExpr 0x1e82e08 <col:7, col:32> 'const char *[5]'
      |-ImplicitCastExpr 0x1e82e70 <col:8> 'const char *' <ArrayToPointerDecay>
      | `-StringLiteral 0x1e82a40 <col:8> 'const char [2]' lvalue "a"
      |-ImplicitCastExpr 0x1e82e88 <col:13> 'const char *' <ArrayToPointerDecay>
      | `-StringLiteral 0x1e82a70 <col:13> 'const char [2]' lvalue "b"
      |-ImplicitCastExpr 0x1e82ea0 <col:18> 'const char *' <ArrayToPointerDecay>
      | `-StringLiteral 0x1e82aa0 <col:18> 'const char [2]' lvalue "c"
      |-ImplicitCastExpr 0x1e82eb8 <col:23> 'const char *' <ArrayToPointerDecay>
      | `-StringLiteral 0x1e82ad0 <col:23> 'const char [2]' lvalue "d"
      `-ImplicitCastExpr 0x1e82ed0 <col:28> 'const char *' <ArrayToPointerDecay>
        `-StringLiteral 0x1e82b00 <col:28> 'const char [2]' lvalue "e"

I believe this can be solved with something like:

hasParent(anyOf(varDecl(allOf(hasType(arrayType()), isDefinition()),
                            anything()))  <<-- to accept all other cases.

That way, you are adding an heuristic to filter some incorrect warnings.

I believe it's possible to match the example above as the size is part of the type.

If I try this example:

{5, {"a", "b", "c", "d"}},    // only four string literal

I'm getting the following AST:

`-InitListExpr 0x28ffd50 <line:8:3, col:27> 'struct StrArray':'struct StrArray'
   |-IntegerLiteral 0x28ff9e8 <col:4> 'int' 5
   `-InitListExpr 0x28ffda0 <col:7, col:26> 'const char *[5]'
     |-array filler
     | `-ImplicitValueInitExpr 0x28ffe78 <<invalid sloc>> 'const char *'
     |-ImplicitCastExpr 0x28ffde0 <col:8> 'const char *' <ArrayToPointerDecay>
     | `-StringLiteral 0x28ffa08 <col:8> 'const char [2]' lvalue "a"
     |-ImplicitCastExpr 0x28ffe00 <col:13> 'const char *' <ArrayToPointerDecay>
     | `-StringLiteral 0x28ffa38 <col:13> 'const char [2]' lvalue "b"
     |-ImplicitCastExpr 0x28ffe28 <col:18> 'const char *' <ArrayToPointerDecay>
     | `-StringLiteral 0x28ffa68 <col:18> 'const char [2]' lvalue "c"
     `-ImplicitCastExpr 0x28ffe60 <col:23> 'const char *' <ArrayToPointerDecay>
       `-StringLiteral 0x28ffa98 <col:23> 'const char [2]' lvalue "d"

For the direct case:

const char* list[5] = { "a", "b"};

It has the following AST-representation:

VarDecl 0x2c67f00 </usr/local/google/home/etienneb/examples/test.cc:11:1, col:33> col:13 list 'const char *[5]' cinit
`-InitListExpr 0x2c68010 <col:23, col:33> 'const char *[5]'
  |-array filler
  | `-ImplicitValueInitExpr 0x2c68098 <<invalid sloc>> 'const char *'
  |-ImplicitCastExpr 0x2c68050 <col:25> 'const char *' <ArrayToPointerDecay>
  | `-StringLiteral 0x2c67f60 <col:25> 'const char [2]' lvalue "a"
  `-ImplicitCastExpr 0x2c68070 <col:30> 'const char *' <ArrayToPointerDecay>
    `-StringLiteral 0x2c67f90 <col:30> 'const char [2]' lvalue "b"

So, it seems the length could be taken from the type instead of the declaration?!
Or, look at what is the "Array Filler" (I still don't know). This may be an more straight forward way.

etienneb requested changes to this revision.Apr 30 2016, 11:03 AM
etienneb edited edge metadata.
This revision now requires changes to proceed.Apr 30 2016, 11:03 AM
szdominik added inline comments.May 1 2016, 1:44 AM
clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
94

The array filler could be our solution.
I didn't find too much description about it, but based on this:
http://clang.llvm.org/doxygen/Expr_8cpp_source.html#l01963
I guess, there is always an array filler in these cases (when the explicit size is bigger than the real, so when there is a "hole" in the array because of the size-diff).
It's more simplier way, you're right.

szdominik updated this revision to Diff 55741.May 1 2016, 3:08 AM
szdominik edited edge metadata.

Implement the heuristic in a different (and simpler) way.
Based on the array fillers.

this is cool :)
It's a simple and precise rule to avoid using the probabilistic heuristic.

clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
94

If it's working as-is,... this is neat :)

106

The error should still be reported to the missing comma (concatenated token):

ConcatenatedLiteral->getLocStart(),

We could add a NOTE to point to the array, stating that the size mismatch.

What do you think?

test/clang-tidy/misc-suspicious-missing-comma.cpp
84

Could you add the more complicated example I sent you.
More tests is always welcome.

Currently, there is no tests for initialisation list in initialisation list.

szdominik marked 3 inline comments as done.May 2 2016, 3:16 AM
szdominik added inline comments.
clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
106

Interesting question (the first idea was that we can't decide that the comma is missing or the size is wrong, so report to the array, that's a more secure solution), but I agree that the note could be more effective.
And... it's still a suspicious-missing-comma checker, not a wrong-string-array-size checker :)

szdominik added inline comments.May 2 2016, 3:57 AM
clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
94

Well, the array filler has a problematic limitation.
I forget that I used the explicit given size for reduce false positives in my checker.
e.g.

struct MusicIntervalArray {
	int n;
	const char* list[5];
	} intervals[2] = {
		{5, {"second", "third", "fourth", "fifth" "sixth"}},
		{5, {"ninth", "tenth", "eleventh", "twelfth", "thir" "teenth"}},
	};

The first is simple: has array filler, so warn because of size.
But the second one is a good initialization, it has 5 elements, which is the explicit size. But your implementation warns us, that there is a missing comma. If I work with the length (from declaration or type), I can filter these cases.

etienneb requested changes to this revision.May 4 2016, 7:35 PM
etienneb edited edge metadata.

If the 'original' size is available, the checkers should by-pass the heuristic.
Can you check if there is a way to get the original size.

clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
106

How can you be sure the size was provided by the user? And not inferred by the type system?

For the following examples:

const char* listA[] = {"a", "b" };
const char* listB[5] = {"a", "b" };

We've got this:

VarDecl 0x5e9d840 <C:\src\llvm\examples\hello_world.cc:11:1, col:33> col:13 listA 'const char *[2]' cinit
`-InitListExpr 0x5e9d950 <col:23, col:33> 'const char *[2]'
  |-ImplicitCastExpr 0x5e9d978 <col:24> 'const char *' <ArrayToPointerDecay>
  | `-StringLiteral 0x5e9d8d8 <col:24> 'const char [2]' lvalue "a"
  `-ImplicitCastExpr 0x5e9d988 <col:29> 'const char *' <ArrayToPointerDecay>
    `-StringLiteral 0x5e9d8fc <col:29> 'const char [2]' lvalue "b"
VarDecl 0x5e9d840 <C:\src\llvm\examples\hello_world.cc:11:1, col:33> col:13 listA 'const char *[2]' cinit
`-InitListExpr 0x5e9d950 <col:23, col:33> 'const char *[2]'
  |-ImplicitCastExpr 0x5e9d978 <col:24> 'const char *' <ArrayToPointerDecay>
  | `-StringLiteral 0x5e9d8d8 <col:24> 'const char [2]' lvalue "a"
  `-ImplicitCastExpr 0x5e9d988 <col:29> 'const char *' <ArrayToPointerDecay>
    `-StringLiteral 0x5e9d8fc <col:29> 'const char [2]' lvalue "b"

How can I tell the "size" was written by the user?
How can you get the "5" and not the "2".

This revision now requires changes to proceed.May 4 2016, 7:35 PM

The original size is available - but from the decleration, not from the initListExpr.

clang-tidy/misc/SuspiciousMissingCommaCheck.cpp
106

(You copied twice the AST-tree part of listA :))

The array, which was declared with explicit given size, has different type than the usual one. And I can filter the proper cases because of the different (incomplete / constant) array type.
e.g.

VariableDeclaration->getTypeSourceInfo()->getType()->dump()
const char *string_array_init[3] = {
		"first elem",
		"second elem"
		"third elem"
	};
ConstantArrayType 0x1efae70 'const char *[3]' 3 
`-PointerType 0x1efae40 'const char *'
  `-QualType 0x1efa511 'const char' const
    `-BuiltinType 0x1efa510 'char'

But

const char *string_array_init2[] = {
		"first elem",
		"second elem"
		"third elem"
	};
IncompleteArrayType 0x1f49ba0 'const char *[]' 
`-PointerType 0x1efae40 'const char *'
  `-QualType 0x1efa511 'const char' const
    `-BuiltinType 0x1efa510 'char'

However, the type of InitListExpr is always ConstantArrayType (I tried to find a way to be sure the size is explicit given or inferred, but... I don't find anything).
So (unless you haven't better idea) I have two options, as I see:
a) use the array filler solution, but I can't reduce the false-positive cases.
b) use the decleration's type, but I have to use some kind of heuristic, what you told me at your first comment.
Probably we should decide which one is less used.

The original size is available - but from the decleration, not from the initListExpr.

Let do the copy paste (correctly):

clang-query> match namedDecl(hasName("listA"))

Match #1:

Binding for "root":
VarDecl 0x1a06370 </usr/local/google/home/etienneb/examples/test.cc:2:1, col:33> col:13 listA 'const char *[2]' cinit
`-InitListExpr 0x1a06538 <col:23, col:33> 'const char *[2]'
  |-ImplicitCastExpr 0x1a06578 <col:24> 'const char *' <ArrayToPointerDecay>
  | `-StringLiteral 0x1a06488 <col:24> 'const char [2]' lvalue "a"
  `-ImplicitCastExpr 0x1a06598 <col:29> 'const char *' <ArrayToPointerDecay>
    `-StringLiteral 0x1a064b8 <col:29> 'const char [2]' lvalue "b"

1 match.
clang-query> match namedDecl(hasName("listB"))

Match #1:

Binding for "root":
VarDecl 0x1a06678 </usr/local/google/home/etienneb/examples/test.cc:3:1, col:34> col:13 listB 'const char *[5]' cinit
`-InitListExpr 0x1a06788 <col:24, col:34> 'const char *[5]'
  |-array filler
  | `-ImplicitValueInitExpr 0x1a06810 <<invalid sloc>> 'const char *'
  |-ImplicitCastExpr 0x1a067c8 <col:25> 'const char *' <ArrayToPointerDecay>
  | `-StringLiteral 0x1a066d8 <col:25> 'const char [2]' lvalue "a"
  `-ImplicitCastExpr 0x1a067e8 <col:30> 'const char *' <ArrayToPointerDecay>
    `-StringLiteral 0x1a06708 <col:30> 'const char [2]' lvalue "b"

Now, if you are telling me there is a way to handle correctly this example:

const char* A[2] = {"a" "b");

Even using "hasParent" and going to the declaration to get the original size.... this worth implementing it.
Just be sure your matcher won't remove actually supported cases. You first proposal was breaking initialization list in initialization list.

So, the plan is:
If the parent is a declaration, and has a size, and the size match.... skip the heuristic even if there is concatenated string literals.

wdyt?

I think the main success is that we don't warn here:

const char* A[2] = {"a", "b" "c");

The problem is, as you said, not remove already supported cases (such as when the parent is not a vardecl (but also an initListExpr)).
But I'll implement, and we'll see it.