This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Further improve support for requires expressions
ClosedPublic

Authored by HazardyKnusperkeks on Feb 7 2022, 7:01 AM.

Details

Summary

Detect requires expressions in more unusable contexts. This is far from
perfect, but currently we have no good metric to decide between a
requires expression and a trailing requires clause.

Diff Detail

Event Timeline

HazardyKnusperkeks requested review of this revision.Feb 7 2022, 7:01 AM
HazardyKnusperkeks created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2022, 7:01 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Some nits.

clang/lib/Format/UnwrappedLineParser.cpp
1541–1542
2791
2818
2826
2854
2862
HazardyKnusperkeks marked 6 inline comments as done.

Rebased and updated

Quuxplusone added inline comments.
clang/lib/Format/UnwrappedLineParser.cpp
2798–2802

s/doesn't/don't/
IIUC you're talking about, like, void member() && requires ( — is that right?
It might help the reader to give an example snippet right here. (OTOH, it might be "obvious," I don't know. I'm not in the target audience for this code.)
...Ah, I see you give a snippet on line 2837 that's basically what I mean; I just first felt the need for that snippet all the way up here.

2835–2841

I think it's weird that your heuristic parses backward rather than forward. I would think that the next token after the requires keyword tells you what it is with pretty high probability:
requires requires — it's a clause
requires identifier — it's a clause
requires { — it's an expression
requires ( — unclear, apply further heuristics

Or are those heuristics already present in trunk, and this PR is just dealing with the "unclear" case?

2882
HazardyKnusperkeks planned changes to this revision.Feb 12 2022, 12:55 AM
HazardyKnusperkeks added inline comments.
clang/lib/Format/UnwrappedLineParser.cpp
2835–2841

That would be so much better, but I can't easily look forward. Next is still nullptr, until I call nextToken(), but then I'm already moved along.

But this got me thinking, at least for the easy stuff I can just go forward and don't start on the keyword in parseRequiresClause() and parseRequiresExpression(). The paren case is more tricky, but I will try something.

New approach for identifying the expressions.

HazardyKnusperkeks marked 3 inline comments as done.Feb 12 2022, 4:28 PM
HazardyKnusperkeks added inline comments.
clang/lib/Format/UnwrappedLineParser.cpp
2835–2841

Present in main everything is a clause, except for requires expressions in a constraint expression. So the stuff where you use the requires expression in a "normal" boolean expression are misparsed and thus most likely misformatted.

There is actually a peekToken(), let's see if this is better.

This revision is now accepted and ready to land.Feb 14 2022, 12:59 AM
klimek added a subscriber: klimek.Nov 25 2022, 7:30 AM

Hey ho, sorry for the late comment here, but adding peekNextToken(n) is problematic, as this gets in the way of future changes we want to do to handle macros better.
Usually we want to use X = Tokens->getPosition() and FormatTok = Tokens->setPosition(X) pairs when doing look-ahead.
I did a quick attempt at fixing this, but ran into infinite loops later in the annotator :(

Herald added a project: Restricted Project. · View Herald TranscriptNov 25 2022, 7:30 AM

Generally, why do we need to have that much information? I.e. why do we need to know the exact type of the "requires" keyword?
I do understand we need to know the brace type, but that seems like it would be easier to figure out in the TokenAnnotator (where we already parsed UnwrappedLines).
Do we ever parse UnwrappedLines differently depending on requires clauses/expressions?
If not, we should really do the annotation in TokenAnnotator, where we already have nice parsing bounds from the parsed UnwrappedLines.

Generally, why do we need to have that much information? I.e. why do we need to know the exact type of the "requires" keyword?
I do understand we need to know the brace type, but that seems like it would be easier to figure out in the TokenAnnotator (where we already parsed UnwrappedLines).
Do we ever parse UnwrappedLines differently depending on requires clauses/expressions?
If not, we should really do the annotation in TokenAnnotator, where we already have nice parsing bounds from the parsed UnwrappedLines.

Who is we, I'm not part of that we and haven't heard of some macro improvements. And I don't see how that feature is harming you, but be my guest in changing that. If you look into the history of this change I had a heuristic approach which would only look behind to differentiate.

I don't know if that can be solved in the TokenAnnotator, but you and I have different opinions about that. I'd put more annotating in the UnwrappedLineParser, annotate it as soon as we can.

I'll happily review any changes proposed, but I will not rework this piece of code, unless I can see a big flaw in it (which I can't right now).

I changed it in 49aca00d63e14df8bc68fc4329e6cbc9c9805eb8.

"We" is the people working on clang-format :) I hope that we have a common goal of making clang-format as easy to maintain as we can.

FWIW, I once had the same opinion as you about best doing all parsing as early as possible, but djasper convinced me that the split was a good idea, and in the end, I think it turns out to be significantly less brittle to do more complex annotation in TokenAnnotator. E.g. we now have a lookahead limit of 50, which seems rather arbitrary, while in TokenAnnotator we could simply limit lookahead towards the current UnwrappedLine. Similarly, in TokenAnnotator, we already have all the parens connected, so we could simply look from requires l_paren to the corresponding r_paren and whether the next token is an l_brace. If I can find a bit of time I'll take an attempt at implementing it.

I changed it in 49aca00d63e14df8bc68fc4329e6cbc9c9805eb8.

"We" is the people working on clang-format :) I hope that we have a common goal of making clang-format as easy to maintain as we can.

FWIW, I once had the same opinion as you about best doing all parsing as early as possible, but djasper convinced me that the split was a good idea, and in the end, I think it turns out to be significantly less brittle to do more complex annotation in TokenAnnotator. E.g. we now have a lookahead limit of 50, which seems rather arbitrary, while in TokenAnnotator we could simply limit lookahead towards the current UnwrappedLine. Similarly, in TokenAnnotator, we already have all the parens connected, so we could simply look from requires l_paren to the corresponding r_paren and whether the next token is an l_brace. If I can find a bit of time I'll take an attempt at implementing it.

Your commit is in my view a an example of making that maintaining a bit harder, it didn't went through review, had you not posted it here I'd never seen it. LLVM receives to many commits to scan them for changes in clang-format. And as someone who isn't that long involved in clang-format I think there is an overview really missing.

For non-functional clean-ups generally llvm doesn't require pre-commit review - I did communicate here so people involved in the original change wouldn't miss the clean-up. I do agree that what commits to pre-review is a fine line, and usually try to err on the side of pre-review; I'll take your feedback into consideration for future changes.

Regarding a better overview, you're 100% right. This is something we've definitely not been good enough and we need to get better at.