This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] improve distinction of K&R function definitions vs attributes
ClosedPublic

Authored by krasimir on Aug 12 2021, 12:30 AM.

Details

Summary

After
https://github.com/llvm/llvm-project/commit/9da70ab3d43c79116f80fc06aa7cf517374ce42c
we saw a few regressions around trailing attribute definitions and in
typedefs (examples in the added test cases). There's some tension
distinguishing K&R definitions from attributes at the parser level,
where we have to decide if we need to put the type of the K&R definition
on a new unwrapped line before we have access to the rest of the line,
so we're scanning backwards and looking for a pattern like f(a, b). But
this type of pattern could also be an attribute macro, or the whole
declaration could be a typedef itself. I updated the code to check for a
typedef at the beginning of the line and to not consider raw identifiers
as a possible first K&R declaration (they are treated as an attribute
macro instead). This is not a 100% correct heuristic, but I think it
should be reasonably good in practice, where we'll:

  • likely be in some very C-ish code when using K&R style (e.g., stuff that uses struct name a; instead of name a;)
  • likely be in some very C++-ish code when using attributes
  • be unlikely to mix up the two in the same declaration.
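
The heuristic described above can be sketched roughly as follows. This is a simplified, standalone model and not the code from this patch: the Tok type, the TokKind enum, and the functions isBuiltinTypeKeyword, endsWithIdentifierList and mayStartKnRParamDecl are hypothetical names made up for illustration, and the real parser works on clang-format's own token representation. The sketch assumes the line has already been split into tokens and only answers one question: given the tokens seen so far and the next token, might the next token start a K&R parameter declaration?

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified token model (not clang-format's real token type).
enum class TokKind { Keyword, Identifier, Punct };

struct Tok {
  TokKind kind;
  std::string text;
};

// Keywords that can start a C type, as in "struct name a;" or "int a;".
static bool isBuiltinTypeKeyword(const Tok &t) {
  static const std::unordered_set<std::string> kTypes = {
      "enum",   "struct", "union",  "long",     "int",     "char",
      "float",  "double", "short",  "signed",   "unsigned", "register"};
  return t.kind == TokKind::Keyword && kTypes.count(t.text) > 0;
}

// Scans backwards over the tokens seen so far and checks whether they end in
// a pattern like f(a, b): an identifier, '(', a comma-separated list of
// identifiers, ')'.
static bool endsWithIdentifierList(const std::vector<Tok> &line) {
  std::size_t i = line.size();
  if (i < 4 || line[i - 1].text != ")")
    return false;
  --i; // skip the closing parenthesis
  bool expectIdent = true;
  while (i > 0) {
    const Tok &t = line[i - 1];
    if (expectIdent) {
      if (t.kind != TokKind::Identifier)
        return false;
    } else {
      if (t.text == "(") // the token before '(' must be the function name
        return i >= 2 && line[i - 2].kind == TokKind::Identifier;
      if (t.text != ",")
        return false;
    }
    expectIdent = !expectIdent;
    --i;
  }
  return false;
}

// Returns true if `next` may be the first token of a K&R parameter
// declaration that should start a new unwrapped line.
bool mayStartKnRParamDecl(const std::vector<Tok> &lineSoFar, const Tok &next) {
  if (lineSoFar.empty())
    return false;
  // A line that starts with typedef is a type declaration, not a definition.
  if (lineSoFar.front().text == "typedef")
    return false;
  if (!endsWithIdentifierList(lineSoFar))
    return false;
  // A raw identifier here is assumed to be an attribute macro, not a type.
  return isBuiltinTypeKeyword(next);
}
```

With this model, the tokens of int f(a, b) followed by int are accepted as a possible K&R declaration, while the same tokens followed by a raw identifier such as a hypothetical MY_ATTR macro are not. The trade-off is that a parameter declared with a typedef name is also a raw identifier, so the sketch does not recognize it.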

Ideally, we should only decide to add the unwrapped line before the K&R
declaration after we've scanned the rest of the line and noticed the
variable declarations and the semicolon, but I don't see a good way to
do this in the current parser, which only has good context for the
previously visited tokens. I also tried not emitting an unwrapped line
there and resolving the situation later in the token annotator and the
continuation indenter. That approach seems promising, but I couldn't
make it work without messing up a bunch of other cases in unit tests.

Event Timeline

krasimir requested review of this revision. Aug 12 2021, 12:30 AM
krasimir created this revision.
Herald added a project: Restricted Project. Aug 12 2021, 12:30 AM
Herald added a subscriber: cfe-commits.

Thanks for doing this, it LGTM. I personally think tok::identifier tends to be just too general; it's hard to use it correctly in the rules, especially in the presence of macros.

MyDeveloperDay accepted this revision. Aug 12 2021, 1:11 AM
This revision is now accepted and ready to land. Aug 12 2021, 1:11 AM

> I personally think tok::identifier tends to be just too general

However, you can't avoid it as it's used for user-defined types:

typedef unsigned char byte;

byte *f(a)
byte a[];
{
  return a && *a ? a + 1 : 0;
}

I will try to come up with a solution.

HazardyKnusperkeks added a project: Restricted Project. Aug 12 2021, 8:28 AM