Page MenuHomePhabricator

format: Remove redundant calls to guessIsObjC to speed up clang-format on unknown file types
Needs RevisionPublic

Authored by davidvc1 on Nov 30 2021, 6:26 PM.



Running clang-format on the following input

int lambdas() {
  return [&] { 
  return [&] { 
  return [&] { 
  return [&] { 
  return [&] { 
  return [&] { 
  return [&] { return 3; } (); 
  } (); } (); } (); } (); } (); } (); }

will finish instantly if you pass clang-format a .cpp input with this content,
but hang for tens of seconds if you pass the same via stdin.

Adding some debug statements showed that guessIsObjC was getting called
tens of millions of times in a manner that scales very rapidly with the amount
of nesting (if clang-format just takes a few seconds with that input passed
on stdin, try adding a couple more levels of nesting).

This change moves the recursive guessIsObjC call one level of nesting out of
an inner loop whose iterations don't affect the input to the recursive call. This
resolves the performance issue.

Diff Detail

Event Timeline

davidvc1 requested review of this revision.Nov 30 2021, 6:26 PM
davidvc1 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2021, 6:26 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Hi @curdeius, I'm new to the project and I'm adding you as a reviewer because I noticed you reviewed a number of recent changes in this directory. Are you a good reviewer for this change? Thanks!

+ @rsmith from CODE_OWNERS.txt: Hi Richard, could you please suggest a reviewer for this change? Thanks!

MyDeveloperDay requested changes to this revision.Jan 17 2022, 9:40 AM

This needs a full context diff and ideally some unit tests

This revision now requires changes to proceed.Jan 17 2022, 9:40 AM

I see indeed that this change fixes the superlinear complexity when formatting code from stdin:

cat test.cpp | time clang-format --style=LLVM

0:01.44 ==   1.44s  //  9 returns
0:13.17 ==  13s     // 10 returns
1:56.63 == 116s     // 11 returns

time clang-format --style=LLVM test.cpp

And after this change the stdin case takes as much time as with a file.

It's definitely a good change, but I'd like to understand what's the reason, why stdin is handled so much differently?
Have you investigated this?

Also, a sort of performance test (e.g. a test with a timeout) would be nice to have.

@curdeius , the PR description mentions the reason: when you pipe input from stdin, clang-format has to run through the "is it objective C?" codepath, and this is the codepath with the bug.

On the other hand, when you pass a .cpp file as a file path input, it doesn't need to figure out whether the code is objective C.

Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2022, 11:25 AM