This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Add simple macro replacements in formatting.
ClosedPublic

Authored by klimek on Feb 16 2023, 1:29 AM.

Details

Summary

Add configuration to specify macros.
Macros will be expanded, and the code will be parsed and annotated
in the expanded state. In a second step, the formatting decisions
in the annotated expanded code will be reconstructed onto the
original unexpanded macro call.

Eventually, this will allow to remove special-case code for
various macro options we accumulated over the years in favor of
one principled mechanism.

Diff Detail

Event Timeline

klimek created this revision.Feb 16 2023, 1:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 1:29 AM
klimek requested review of this revision.Feb 16 2023, 1:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 1:29 AM

It's happening!

clang/lib/Format/ContinuationIndenter.cpp
743

nit: parentheses

744

This is a bit hard to follow without context, I had to look up MacroCallReconstructor::takeResult() to recall the structure it's referring to, and there's no obvious breadcrumb to point you there. And if you don't follow the details, it's hard to understand what the relevance is.

Maybe something like: this doesn't apply to macro expansion lines, which are MACRO( , , ) with args as children of ( and ,. It does not make sense to align the commas with the opening paren.

(I think describing here how the alignment ends up working in that case might just add confusion: it's not handled by this code, right?)

clang/lib/Format/TokenAnnotator.h
68

this seems like a separate bugfix: previously if the first node in Line.Tokens had children, we wouldn't be adding them.

Is this true? Was there some reason the first node could never have children before?

76

does "add children" here refer to the procedure in addChildren(), or to the token insertion idea introduced in the previous patch?

(i.e. is this just describing that previous/next skip over trees of children rather than including them in the chain, or is there something new going on?)

clang/lib/Format/UnwrappedLineFormatter.cpp
924

comment as to why?

926

why do we no longer finalize the child tree?

clang/lib/Format/UnwrappedLineParser.cpp
224

I'm sure you got this right, but it's hard for me to evaluate this code because I don't know what UnwrappedLineConsumer::finishRun() is for - it's not documented and the implementation is mysterious.

You might consider adding a comment to that interface/method, it's not really related to this change though.

4209–4210

nit: you check this condition a bunch of times, and this patch adds more.

giving it a name like !parsingPPDirective() would be less cryptic

4246

This is a bit vague as to what the "indexes" refer to and what the implications are (between comment & code "indexes" is mentioned 3 times but it's not clear without digging what they are).

AFAICS in practice this only affects indentation of braces in whitesmiths style, which may be worth mentioning. (It'll get stale, but at the moment the comment sounds like arbitrary things are likely to be broken)

4570

it looks like you go into parsing the macro call including parens and arguments regardless of whether the macro is object-like or function-like.

So given

#define DEBUG puts
DEBUG("foo");

you're going to expand to puts; instead of puts("foo");

4579

Calling a macro with the wrong arity might be worth flagging in debug output?

4668

this mishandles:

#define BAR() foo
int y = BAR; // no macro used here, just a variable named BAR
4706

hitting eof is an error condition here, so it seems more natural to handle it as a case in the switch as a kind of early-return, and maybe it's worth adding an LLVM_DEBUG() for that case. If anything it'd be nice to have r_paren break the loop (but too annoying with the switch inside)

clang/lib/Format/UnwrappedLineParser.h
78

maybe slightly more descriptive like resetBlockLineIndexes?

in principle you want to clear any indices you have here, so it's nice to be vague, but the caller also needs to reason about the consequences of destroying this data.

104

you forward declare this here, move the destructor below out-of-line etc
but also include Macros.h and use MacroExpander by value below
Do we want to avoid the dependency or not?

294

if this doesn't need to be incomplete, optional would be clearer

clang/unittests/Format/FormatTest.cpp
70

hmm, testing formatting in this way without messing up the formatting first isn't very convincing from a black-box perspective!

if messUp isn't feasible here (why precisely?) then maybe these tests should specify both input + expected output?

22587

If this is assign_or_return, the treatment of "c" is fairly strange. Also you are mostly calling this with only two args. Maybe drop C and use a different macro for the 3-arg case?

22615

MOCK_METHOD is another motivating case that might be worth testing (even if it's just a more complicated version of the same thing)

klimek updated this revision to Diff 499771.Feb 23 2023, 2:01 AM
klimek marked 9 inline comments as done.

Address reviewer comments.

clang/lib/Format/ContinuationIndenter.cpp
744

Excellent wording, applied. And yes, alignment in the macro case is not handled here.

clang/lib/Format/TokenAnnotator.h
68

I think the main problem is that it's impossible to write C++ code that has a child in the first node based on how this worked. That is, previously we only added children for blocks within one line, that is, within a function or lambda. Those need intro tokens. So I really couldn't make this trigger, or I would have fixed it in a separate patch.

76

It's about adding children in general.
When fixing this I realized this is in principle wrong, but then noticed that we have code that depends on this behavior.

clang/lib/Format/UnwrappedLineFormatter.cpp
926

We do - markFinalized() is called for the child tree again. Previously we marked multiple times, which doesn't work with the new algorithm, as it would move ExpandedArg -> UnexpandedArg -> Finalized in one go.

clang/lib/Format/UnwrappedLineParser.cpp
224

Done. Not sure I've put too much info about how it's going to be used into the interface, where it doesn't really belong.

4570

Excellent find. Completely invalidated the way I was doing formatting of expanded lines, but fixed the problem with the line indices needing resetting \o/

Now we're formatting all lines in the first round with the macro calls replaced with expanded lines, and then again format everything including the macro calls. That is more in line with clang-format's assumptions anyway.

4706

Usually we don't see hitting eof as an error, as that can always happen while we're in the process of writing a new file, or formatting "up to cursor". Thus, I think this fits the pattern of "stop where we are when we hit eof and just go with what we have" that's everywhere in the UnwrappedLineParser.

clang/lib/Format/UnwrappedLineParser.h
104

That was a leftover from a previous phase, thanks for catching. Removed forward decl and removed out of line destructor.

294

Excellent idea, done.

clang/unittests/Format/FormatTest.cpp
22587

ASSIGN_OR_RETURN allows 3 args, though; this is basically the thing I'd propose to configure for ASSIGN_OR_RETURN in general; is there anything specific you don't like about this?

klimek updated this revision to Diff 499777.Feb 23 2023, 2:29 AM

Add comment.

Thanks, this all looks good, at least as far as I understand it! The two-passes-over-the-whole-file approach is easier for me to follow, too.

Mostly just comment nits, but I have one annoying question left about using macros with the wrong arity - I thought this was just an implementation limitation, but it sounds like you want it to be a feature, so I'd like to nail it down a little...

clang/include/clang/Format/Format.h
2750

I find "try to match the structure of the expanded call" a bit hard to follow from a user perspective.

Maybe "Code will be parsed with macros expanded, in order to determine how to interpret and format the macro arguments"?

2753

This is a good example. I think you should spell out both sides, and probably start with the simpler case (no macro definition) to motivate the problem.

For example, the code:
  A(a*b);
will usually be interpreted as a call to a function A, and the multiplication expression will be formatted as `a * b`.

If we specify the macro definition:
  Macros:
    - A(x)=x

the code will now be parsed as a declaration of the variable b of type a*,
and formatted as `a* b` (depending on pointer-binding rules).
2757

type a*, not A*

2762

I think the restrictions should be spelled out here: object-like and function-like macros are both supported, macro args should be used exactly once, macros should not reference other macros.

clang/lib/Format/UnwrappedLineParser.cpp
224

Thanks, comment seems just right to me.

clang/lib/Format/UnwrappedLineParser.h
88

this is a bit of a garden path sentence: for code where ((macros are expanded) and (the code with (the original macro calls) ... ERR_MISSING_PREDICATE))

maybe when macros are involved, for the expanded code and the as-written code?

262

nit: should these added functions be const?

(maybe this class doesn't bother with const - computePPHash is const though)

clang/unittests/Format/FormatTest.cpp
22587

OK, this is a bit sticky.

  • ASSIGN_OR_RETURN is almost always called with two args
  • some versions of ASSIGN_OR_RETURN support an optional third arg (there's no canonical public version)
  • these emulate overloading using variadic macro tricks
  • this patch doesn't claim to support either variadic macros or overloading

So the principled options seem to be:

  • macros have fixed arity: clang-format can support the two-arg version of ASSIGN_OR_RETURN but not the three-arg version. (This is what I was assuming)
  • macros have variable arity: the public docs need to describe how passing too many/too few args is treated, because this isn't obvious and it sounds like we want people to rely on it. This feels like stretching "principled" to me - we're relying on the expansion still parsing correctly when args are missing, which basically seems like coincidence.
  • macros support overloading: instead of looking up macros by name, the key is (string Name, optional<int> Arity)
klimek updated this revision to Diff 500121.Feb 24 2023, 2:21 AM
klimek marked 6 inline comments as done.

Address review comments.

clang/unittests/Format/FormatTest.cpp
22587

Chose option 3, as this is pretty straight-forward and I believe the best user experience.

sammccall accepted this revision.Feb 24 2023, 4:04 AM

Great!

The overloading impl is a little surprising to me.
I was assuming that object would always win over function (or that it would be disallowed to combine the two).
I think that is simpler to implement, as you know in advance what type of expansion you're attempting.
But what you have isn't that much more complicated, and the extra flexibility may prove useful.

One weird side-effect: if you define an object-like macro FOO and then write FOO( and never close the paren, then we die inside macro-arg-parsing rather than hitting the normal EOF-while-formatting behavior. I left a comment on parseMacroCall() about how to fix this but I'm not sure whether it matters at a high level.

clang/lib/Format/MacroExpander.cpp
162

this assertion failure isn't going to be as useful as it could be!

maybe rather

if (OptionalArgs)
  assert(...);
else
  assert(...);

Also I think we're now ensuring to only call this if arity matches, so I'd make this a precondition and replace the first assert with hasArity to make it easier to reason about

clang/lib/Format/UnwrappedLineParser.cpp
4580

nit: call => Macro call in dbgs()?

4592

Strictly I don't think this comment is true, we hit this path when it's only an object-like macro, used before parens.

For this reason you might *not* want the dbgs() here but rather in the bottom else

4610

(nit: std::move(args)?)

4610

nit: maybe "expansion" is more descriptive than "new"?

4620

I think you may want a comment and/or dbgs() here explaining this case: Didn't expand macro FOO because it was used {with 3|without} args, which doesn't match any definition

4708

I believe this has changed meaning along with the return type: previously it returned an empty vector, now it returns nullopt. Not sure which you intend here.

I think the ideal thing here would be to rewind the token stream back to before the (, and return nullopt. We'll end up performing an object-like substitution if possible, and non-macro parsing will continue.

This revision is now accepted and ready to land.Feb 24 2023, 4:04 AM
klimek updated this revision to Diff 500181.Feb 24 2023, 6:19 AM
klimek marked 5 inline comments as done.

Address review comments.

clang/lib/Format/MacroExpander.cpp
162

Hopefully done, the comment took me a while to parse :)

clang/lib/Format/UnwrappedLineParser.cpp
4592

I'm pretty sure we hit this when it's overloaded for both, but we call it with the wrong arity.
E.g. A=x, A(x, y, z)=x y z

A(1, 2)

  • Macros.objectLike(A) -> true
  • Args -> true
  • !Macros.hasArity(A, 2) -> true

Still LG

clang/lib/Format/MacroExpander.cpp
162

Sorry about that, LG though

clang/lib/Format/UnwrappedLineParser.cpp
4592

I agree that (the case described in the comment) => (we get to this point in the code).

I'm disputing that (we get to this point in the code) => (the case described in the comment).

i.e. given A=x, code=A(1, 2), we also get: objectLike(A) -> true, Args -> true, !Macros.hasArity(A, 2) -> true, but this time the comment is lying about the state.

klimek updated this revision to Diff 500191.Feb 24 2023, 7:00 AM

Address review comments.

clang/lib/Format/UnwrappedLineParser.cpp
4592

Ah, your comment made me think it was the former; I think the dbgs() is still what I want, but with an "either or" dbgs().

Adapted the comment in the code and the dbgs() to match.

This revision was landed with ongoing or failed builds.Feb 24 2023, 7:49 AM
This revision was automatically updated to reflect the committed changes.
MyDeveloperDay added a subscriber: MyDeveloperDay.EditedMar 23 2023, 2:08 AM

I know this is like "telling my grandmother to suck eggs" but @klimek the change here to Format.h means you need to regenerate ClangFormatStyleOptions.rst via docs/tools/dump_format_style.py

Impacting the next change to generate - D125171: [clang-format] Add a new clang-format option AlwaysBreakBeforeFunctionParameters

ok fixed your your behalf via rG7a5b95732ade: [clang-format] NFC Format.h and ClangFormatStyleOptions.rst are out of date

I am so sorry, thanks for sending out the patch already and fixing the
layout!

HazardyKnusperkeks added a project: Restricted Project.Mar 27 2023, 11:49 AM
HazardyKnusperkeks added inline comments.
clang/include/clang/Format/Format.h
4343

I put a lot of effort into bringing the stuff sorted. And now a change which was completely transparent to me, because of the missing clang-format tag.

It seems that this patch caused a regression. See https://github.com/llvm/llvm-project/issues/62768.

NOTE: Clang-Format Team Automated Review Comment

Your review contains a change to clang/include/clang/Format/Format.h but does not contain an update to ClangFormatStyleOptions.rst

ClangFormatStyleOptions.rst is generated via clang/docs/tools/dump_format_style.py, please run this to regenerate the .rst

You can validate that the rst is valid by running.

./docs/tools/dump_format_style.py
mkdir -p html
/usr/bin/sphinx-build -n ./docs ./html