This is an archive of the discontinued LLVM Phabricator instance.

[Support] Allow complex names for annotation points and ranges via ${}
ClosedPublic

Authored by tom-anders on Nov 13 2022, 7:52 AM.

Details

Summary

This is useful when tests encode information in these names. Currently,
this is pretty limited because names consist of only alphanumeric
characters and '_'.

Arguably, writing foo${name}^bar instead of foo$name^bar might also be a
bit more readable in general.

The new syntax should be fully backwards compatible (if I haven't missed
anything). I tested this against clangd unit tests and everything still passes.

Diff Detail

Event Timeline

tom-anders created this revision.Nov 13 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 13 2022, 7:52 AM
tom-anders requested review of this revision.Nov 13 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 13 2022, 7:52 AM

This seems pretty nice (I do find foo$long_word^bar a bit hard to read). I think I'm in favor, my concerns would be that for a small benefit we're:

  • adding some extra complexity people have to understand
  • providing more ways to write something, so more bikeshedding opportunities
  • encouraging people to embed structured information in the annotations, in a way that won't be very readable *and* isn't well-supported by the API

Can you give examples of where you want to use this?
(I think we should *probably* accept this regardless, but tests want to iterate over payload-name pairs or things like that we should think a couple of steps ahead)

llvm/include/llvm/Testing/Support/Annotations.h
31

How do you feel about ${...} rather than $(...)?

$foo looks like shell to me, and so ${foo} seems like a quoting version while $(foo) reminds me more of some kind of embedded expression.

Small difference but maybe others would have similar associations.

This seems pretty nice (I do find foo$long_word^bar a bit hard to read). I think I'm in favor, my concerns would be that for a small benefit we're:

  • adding some extra complexity people have to understand
  • providing more ways to write something, so more bikeshedding opportunities
  • encouraging people to embed structured information in the annotations, in a way that won't be very readable *and* isn't well-supported by the API

Can you give examples of where you want to use this?
(I think we should *probably* accept this regardless, but tests want to iterate over payload-name pairs or things like that we should think a couple of steps ahead)

Yeah sorry, should've included this example right away:

This clangd test (and some others in this file) use the names of annotation for encoding the expected Attribute field of the Referencestruct.
For testing https://github.com/clangd/clangd/issues/177 (https://reviews.llvm.org/D137894) I wanted to extend those tests instead of writing new one, so the idea was to encode both the expected Attribute and the expected Context inside the annotation name (e.g. via comma-separated list).
Or maybe there's a better way for extending the test without abusing annotations? I couldn't really find a solution that wouldn't involve a major rewrite of the test logic here.

llvm/include/llvm/Testing/Support/Annotations.h
31

Ah yes I agree, this is definitely more intuitive, I'll change it

${} instead of $()

tom-anders retitled this revision from [Support] Allow complex names for annotation points and ranges via $() to [Support] Allow complex names for annotation points and ranges via ${}.Nov 15 2022, 11:14 AM
tom-anders edited the summary of this revision. (Show Details)

Fix one test I missed

TL;DR: I think the ${} is a nice quality-of-life improvement as-is, but attaching complex metadata needs something more. And we should spend our syntax-complexity-budget on the latter. What about making payload a separate attribute of points, with syntax $name(payload)^, which collapses in simple cases to $name^ or $(payload)^ or just ^?


This clangd test (and some others in this file) use the names of annotation for encoding the expected Attribute field of the Referencestruct.
For testing https://github.com/clangd/clangd/issues/177 (https://reviews.llvm.org/D137894) I wanted to extend those tests instead of writing new one, so the idea was to encode both the expected Attribute and the expected Context inside the annotation name (e.g. via comma-separated list).

Right, I think this is a slightly bigger can of worms...

Annotations was designed to support lookup of points/ranges with known names and the API works pretty well for this purpose without being complicated.
The tests you link work OK because we're looking up a fixed set of names (unnamed, def, decl). Using names for metadata beyond basic categories doesn't work well because you need the name to look them up. (We sometimes hack around this, at the cost of test clarity/simplicity).

Metadata comes up enough that it might be worth accommodating. Simply providing iterators over (name,point) pairs would make this possible but I don't think overloading names in this way is a great design:

  • makes it less obvious what a name is
  • makes llvm::Annotations API harder to understand
  • means names + metadata can't be used together

I think it would be better to make metadata orthogonal to names. Silly strawman syntax:

^  // just a point
@foo^ // unnamed (i.e. def+decl), container is foo
$def^ // definition, no container
$decl@foo^ // definition, contained by foo

then add functions to Annotations like ArrayRef<pair<unsigned, StringRef>> pointsWithPayload(StringRef Name = "")

Of course good syntax is both important and hard: @ isn't really available as a sigil (ObjC), and brackets are nicer in these complicated chains than prefix sigils too... and we need to be backwards compatible.

What do you think about $name(payload)^ for the general case, with $(payload)^, $name, and ^ as abbreviated forms? I don't think the lack of brackets on names hurts a lot if they stay simple, and if they're "just labels" that don't have to carry arbitrary payloads then they can be simple.

D137894

this looks cool! I left a couple of comments, mentioning here because phab won't notify for comments on drafts.

Keep name syntax, add optional payload instead

Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 2:25 PM

What do you think about $name(payload)^ for the general case, with $(payload)^, $name, and ^ as abbreviated forms? I don't think the lack of brackets on names hurts a lot if they stay simple, and if they're "just labels" that don't have to carry arbitrary payloads then they can be simple.

Thanks for the detailed feedback, I think this looks perfect! I adapted the code and docs accordingly (I added some comments to the parts I'm not sure about)

D137894

this looks cool! I left a couple of comments, mentioning here because phab won't notify for comments on drafts.

Thanks! I replied to one of them :)

clang-tools-extra/clangd/unittests/Annotations.h
29

I thought about just using std::pair<clangd::Position, llvm::Optional<llvm::StringRef>> instead, but that is pretty verbose and unreadable here IMO. Another nice thing about subclassing here is that we can keep the API stable and don't have to introduce an additional pointWithPayload()` getter like in the base class.

llvm/include/llvm/Testing/Support/Annotations.h
61

One could argue that this should just be llvm::StringRef. Right now, $name^ results in llvm::None for the payload, while $name()^ will yield an empty string.

Do you think they should just both be empty strings instead?

Fix clangd::Annotations interface, now also has pointWithPayload etc.

tom-anders added inline comments.Nov 16 2022, 3:44 PM
clang-tools-extra/clangd/unittests/Annotations.h
29

Ok the part about not needing API changes was a total lie, I had a typo so that ninja would only build clang but not clangd tests - I now kept the struct but added the same withPayload methods like in llvm::Annotations

Thanks! sorry about continuing to drift this patch, but this looks like a great improvement.

Critical comments below are:

  • dropping Optional and just use StringRef
  • using one consistent way to to include payloads across the API. I think pair<T, StringRef> is actually best overall

Others are less important/optional if you want to get this wrapped up sooner :-)

clang-tools-extra/clangd/unittests/tweaks/TweakTesting.h
119 ↗(On Diff #475939)

not being able to ignore payloads (e.g. not assert the presence/absence) seems like a regression to me, this points further towards separating ranges() and rangesWithPayload

llvm/include/llvm/Testing/Support/Annotations.h
58

this patch takes too many different approaches to attaching the payloads to the points:

  • adding it to an existing struct that is already returned
  • defining a new wrapping struct for the andPayload version
  • defining a new inheriting struct for the andPayload version

I think we should keep it simple, and use pair<T, StringRef> for all the andPayload versions. This isn't optimal in isolation, but it keeps things coherent, avoiding:

  • confusion around clever type-system features (inheritance, implicit conversions, smart-pointer like types, templates)
  • introducing new structs whose names we have to remember
  • inconsistency between how payloads are accessed between ranges/points/clangd/llvm versions

In many cases structured bindings make using the pair versions less cumbersome than they would have been in the past.

61

Yes, I don't think that distinguishing between "no payload" and "empty payload" is useful, so would prefer the simpler API.

95

I hadn't seen these context on these all_ functions, I reached out to the authors to find out about them (they're used out-of-tree).

Basically the use case is to use llvm::Annotations to do the parsing, but dump out the data into some external test helper which lookups/assertions will run on.
So instead of adding extra variants, we just need to expose the parse results in some convenient way.

125

Internally I'd consider switching to a uniform representation of points and ranges. For two reasons:

  • we're adding a new feature that's common between them
  • all_points/all_ranges want to expose such a representation

Something like:

struct Annotation {
  size_t Start;
  size_t End; // or size_t(-1) for a point
  StringRef Name;
  StringRef Payload;
}
vector<Annotation> All;
StringMap<SmallVector<unsigned>> Points; // values are indices into All
StringMap<SmallVector<unsigned>> Ranges; // values are indices into All

ArrayRef<Annotation> all() { return All; } // replaces all_points and all_ranges

up to you though

llvm/lib/Testing/Support/Annotations.cpp
30

(If using the Annotation struct I mentioned before, an instance of it could cover Name+Payload and it could also be the elements of OpenRanges)

llvm/unittests/Support/AnnotationsTest.cpp
110

this is no payload specified, we should also have one test where it's specified but empty e.g. $()^ (even if we make this equivalent to unspecified)

tom-anders marked 8 inline comments as done.

Use std::pair for returning payloads along with points/ranges. Also, refactor internal logic.

Thanks! sorry about continuing to drift this patch, but this looks like a great improvement.

No problem! I wasn't quite happy yet with the previous version of this patch either, really appreciate the continuous feedback!

llvm/include/llvm/Testing/Support/Annotations.h
58

Yeah structured bindings help a lot here. Also, gtest has a Pair matcher which is really nice to use for this stuff

95

Ok, got rid of all_points_and_payloads() and kept all_points() as-is for now

llvm/lib/Testing/Support/Annotations.cpp
30

Felt a bit less readable to me, so I kept the two optionals for name and payload

sammccall accepted this revision.Nov 17 2022, 8:43 PM

Fantastic, thanks!

clang-tools-extra/clangd/unittests/Annotations.h
29–31

Nit: no need for clangd:: qualifiers here or in cpp file, apart from on Range (to distinguish from llvm:: Annotations::Range)

llvm/include/llvm/Testing/Support/Annotations.h
28

Nit: I'd probably rather show payload both with and without name here, and then not bother repeating different combinations for range:

$definition^ // points can be named: "definition"
$(foo)^ // or have a payload: "foo"
$definition(foo)^ // or both
$fail[[...]] // ranges can have names/payloads too
28

Nit: examples here are realistic, so avoid "foo". Maybe "complete"?

78

Nit: I think it's more valuable to keep the one-sentence summary to one line (as above) than to repeat (some of) the syntax options again here, since this is clearly a variant of the method above.
Also the llvm::Optional part is obsolete.

Maybe Return the position of the named marked point, and its payload if any.?

90–91

Can you add a FIXME here and on all_points() to remove these functions in favor of exposing All directly?

(It's great that this isn't done in this patch though: that API break will cause some out-of-tree headaches I'm happy to deal with myself)

118

Document the sentinel value, or maybe better

size_t End = -1;
bool isPoint() const { return End == size_t(-1); }
This revision is now accepted and ready to land.Nov 17 2022, 8:43 PM

(please update the commit message too since the design changed)

This revision was automatically updated to reflect the committed changes.
tom-anders marked 6 inline comments as done.

Thanks again! Landed this with your suggestions incorporated.