Page MenuHomePhabricator

Improvements to localizability checks for iOS / OS X
ClosedPublic

Authored by kulpreet on Aug 27 2015, 4:46 PM.

Details

Summary
  • Adjusted copy to be consistent with diagnostic text in other Apple API checkers
  • Added in ~150 UIKit / AppKit methods that require localized strings in UnlocalizedStringsChecker
  • UnlocalizedStringChecker now checks for UI methods up the class hierarchy and UI methods that conform for a certain Objective-C protocol.
  • Added in alpha version of PluralMisuseChecker and some regression tests. False positives are still not ideal.

Diff Detail

Repository
rL LLVM

Event Timeline

kulpreet updated this revision to Diff 33373.Aug 27 2015, 4:46 PM
kulpreet retitled this revision from to Improvements to localizability checks for iOS / OS X.
kulpreet updated this object.
xazax.hun edited edge metadata.Aug 28 2015, 1:59 PM

I think in general, it might be better to have all the UI methods in a separate file, in a specific format, and you could include that file and use a macro, to generate the code.

zaks.anna edited edge metadata.Aug 28 2015, 2:52 PM

Thanks for working on this!

Added in ~150 UIKit / AppKit methods that require localized strings in UnlocalizedStringsChecker

It might be more efficient to store identifiers. Have you considered that approach?

For the Plural Misuse Checker Tests, please, add some test cases that show the common false positives that your checker already avoids as well as common false positives it does not deal with right now. (And mark them clearly to tell the difference.) These are important to ensure that we do not regress in false positive rates once someone less familiar with the checker modifies it!

Other comments are inline.

lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
1114 ↗(On Diff #33373)

For better readability, please, rename into something like isCheckingPlurality.

1138 ↗(On Diff #33373)

Is there a "working" test case for this?

1145 ↗(On Diff #33373)

Should this I/O line be removed?

1174 ↗(On Diff #33373)

FD->getNameInfo().getAsString() is a temporary variable, so it might get deallocated by the time you use NormalizeName. Please, build and test with ASan on.

You could also check the return type of the Decl.

1213 ↗(On Diff #33373)

You could simplify this code with an early return statement.

Why do we need to override traverse?
"/ users may override Traverse* and WalkUpFrom* to implement custom
/ traversal strategies. Returning false from one of these overridden
/// functions will abort the entire traversal."

If you are trying to avoid traversing everything that's not an IfStmt, maybe you could call Visitor.TraverseStmt on the top level if statements..

1270 ↗(On Diff #33373)

"Use a .stringsdict file" -> "Use a .stringsdict file instead"

kulpreet marked 5 inline comments as done.Sep 1 2015, 2:52 PM

It might be more efficient to store identifiers. Have you considered that approach?

I did consider this approach a while back, but I ran into a problem when I realized that Selector comparison doesn’t work the way I expected it. You can see it in this debug output:

   765 	
   766 	  Selector S = msg.getSelector();
   767 	
-> 768 	  std::string SelectorString = S.getAsString();
   769 	  StringRef SelectorName = SelectorString;
   770 	  assert(!SelectorName.empty());
   771 	
(lldb) p S
(clang::Selector) $3 = (InfoPtr = 4513118723)
(lldb) p S.getAsString()
(std::__1::string) $4 = "localizedStringForKey:value:table:"
(lldb) p getKeywordSelector(C.getASTContext(),"localizedStringForKey:","value:","table:",nullptr)
(clang::Selector) $5 = (InfoPtr = 4513118835)
(lldb) p S == getKeywordSelector(C.getASTContext(),"localizedStringForKey:","value:","table:",nullptr)
(bool) $7 = false

Looking at the implementation of getKeywordSelector(), it looks like a new Selector object is being constructed from the identifiers that make up the components of the selector. In its constructor, Selector maintains InfoPtr to an underlying MultiKeywordSelector object. When comparing two Selectors, this underlying InfoPtr is compared. Unfortunately, MultiKeywordSelector does not override operator==(), so it just compares the pointers directly. My guess is that it probably should compare the IdentifierInfo pointer in each slot to see if they are the same.

For the Plural Misuse Checker Tests, please, add some test cases that show the common false positives that your checker already avoids as well as common false positives it does not deal with right now.

I added a couple test cases for the one heuristic I added to reduce false positives and one for the common false positive I saw in some projects. I've been looking around for good localized projects to test this on, but there aren't many. I tested on Wikipedia iOS and there were no diagnostics raised for PluralMisuseChecker. (but there were 11 real diagnostics raised with the NonLocalizedStringChecker - hooray!)

I think in general, it might be better to have all the UI methods in a separate file, in a specific format, and you could include that file and use a macro, to generate the code.

This is something I will look into in the future, since I've already generated this code.

lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
1138 ↗(On Diff #33373)

This is tested in test5: in localization.m

1174 ↗(On Diff #33373)

I tested with ASan on and there were no memory issues reported. I think here NormalizeName doesn't get deallocated because it's the result of StringRef.lower().

1213 ↗(On Diff #33373)

I simplified the code as per your recommendation.

We override TraverseIfStmt to keep track of whether we are inside (visiting the children) of an IfStmt. By overriding TraverseIfStmt, we know when we have finished visiting all of the children of a matching IfStmt and can set InMatchingStatment to false.

I discussed this problem with Gabor a while back and that was the solution we came up with.

kulpreet updated this revision to Diff 33737.Sep 1 2015, 2:53 PM
kulpreet edited edge metadata.
kulpreet marked 2 inline comments as done.

Incorporated feedback from Anna.

xazax.hun added inline comments.Sep 1 2015, 4:16 PM
lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
1209 ↗(On Diff #33737)

In the visitor pattern BeginVisit and EndVisit are commonly used terms. Maybe it would be cleaner to refactor the code bellow into a separate method that has a name like EndVisitIfStmt and add a comment here that EndVisit callbacks are not provided by the recursiveASTVisitor, and this is the code to add it.

Looking at the implementation of getKeywordSelector(), it looks like a new Selector object is being constructed from the identifiers that make up the components of the selector. In its constructor, Selector maintains InfoPtr to an underlying MultiKeywordSelector object. When comparing two Selectors, this underlying InfoPtr is compared. Unfortunately, MultiKeywordSelector does not override operator==(), so it just compares the pointers directly. My guess is that it probably should compare the IdentifierInfo pointer in each slot to see if they are the same.

I think, if that is the only reason why storing identifiers/selectors does not work, it could be a good enough motivation to implement proper == operators in clang!

In general, == should be either disabled or work intuitively.

zaks.anna requested changes to this revision.Sep 10 2015, 6:57 PM
zaks.anna edited edge metadata.
This revision now requires changes to proceed.Sep 10 2015, 6:57 PM
kulpreet updated this revision to Diff 35207.Sep 20 2015, 12:21 PM
kulpreet edited edge metadata.

NonLocalizedStringChecker:

  • Got the Selector/Identifier object comparison working! So no more string comparison for UIMethods
  • Moved over to using macros for list of UIMethods as per Gabor's recommendation

PluralMisuseChecker:

  • Separated out TraverseIfStmt function into EndVisitIfStmt in to make it clear that this is after an IfStmt is visited

(Sorry for the delay!)

zaks.anna accepted this revision.Sep 21 2015, 3:49 PM
zaks.anna edited edge metadata.
zaks.anna added inline comments.
lib/StaticAnalyzer/Checkers/Checkers.td
493 ↗(On Diff #35207)

It would be good to shorten the description.

This revision is now accepted and ready to land.Sep 21 2015, 3:49 PM
kulpreet updated this revision to Diff 35404.Sep 22 2015, 12:59 PM
kulpreet edited edge metadata.
kulpreet added a subscriber: cfe-commits.

Shortened description of PluralMisuseChecker in Checkers.td as per Anna's recommendation.

This revision was automatically updated to reflect the committed changes.
dcoughlin edited edge metadata.Sep 22 2015, 5:23 PM

This is causing tests to fail on the bots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/31373

/home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/Release+Asserts/bin/clang -cc1 -internal-isystem /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.obj/Release+Asserts/bin/../lib/clang/3.8.0/include -nostdsysteminc -analyze -fblocks -analyzer-store=region -analyzer-checker=alpha.osx.cocoa.NonLocalizedStringChecker -analyzer-checker=alpha.osx.cocoa.PluralMisuseChecker -verify  /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m
--
Exit Code: 1

Command Output (stderr):
--
error: 'warning' diagnostics expected but not seen: 
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 115: Plural cases are not supported accross all languages. Use a .stringsdict file
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 117: Plural cases are not supported accross all languages. Use a .stringsdict file
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 124: Plural cases are not supported accross all languages. Use a .stringsdict file
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 126: Plural cases are not supported accross all languages. Use a .stringsdict file
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 134: Plural cases are not supported accross all languages. Use a .stringsdict file
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 136: Plural cases are not supported accross all languages. Use a .stringsdict file
  File /home/llvmbb/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/tools/clang/test/Analysis/localization.m Line 150: Plural cases are not supported accross all languages. Use a .stringsdict file
7 errors generated.