Introduce signals to rank providers of a symbol.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I haven't read all code yet, left a few initial comments.
clang-tools-extra/include-cleaner/lib/AnalysisInternal.h | ||
---|---|---|
85 | IIUC, we're using a union bit for symbols and headers: some properties are for both (e.g. Complete); some properties are just for headers (e.g. Public, Canonical). I found that this model is hard to follow (e.g. for each property, I need to keep thinking about whether this property is for just symbols/headers or both, the hint conversion step from Symbol to Header is not obvious). What do you think about the following alternative ?
| |
89 | nit: since this is a bit union, I'd suggest to use = 1 << 0, = 1 << 1 syntax to define the value. | |
90 | I think this is no non-private declaration. The Public property is for headers, it indicates the symbol is provided by a non-private header. | |
93 | In most cases, this bit is set. If we use a flipped bit (Private), we don't need to set it in many places. | |
95 | This comment is hard to follow, it doesn't explain the Canonical. From the design doc, it means NameMatch and Annotated headers, I think it would be nice to document them in the comment. I'd suggest avoid using the Canonical name, it remains me too much about the canonical decl in clang. Maybe Preferred? | |
clang-tools-extra/include-cleaner/lib/FindHeaders.cpp | ||
1 | any reason to change the file property? | |
100 | I think Exporter is the case we need to handle -- the exporter header should have a higher value (we probably need to add another hint bit). | |
103 | if FE is private, do we want the Verbatim set the public bit? My inclination is yes (the current code behavior is no), it feels counterintuitive that headers from getPublic() is not marked as Public. | |
clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp | ||
25–42 | nit: declHints? | |
clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp | ||
161 | I think we should have some tests for the hints bit (verify the hint bits are set correctly). | |
266 | The result is correct according to the (canonical, public, complete) triple order. But we can come up with a counterexample, e.g. // transitive.h #include "public_complete.h" // main.cpp #include "transitive.h" Foo foo; We will remove the transitive.h and insert the foo.proto.h, then we cause a compiling error (incomplete type of Foo). Not sure what we should do about it. |
clang-tools-extra/include-cleaner/lib/AnalysisInternal.h | ||
---|---|---|
84 | along with SymbolLocation, I think mixing our fundamental types and analysis functions in this header makes the structure harder to understand and think they both belong in Types.h | |
85 | I agree the model isn't obvious, but I think it is the right one and should be explained. I think the model is:
| |
87 | As discussed elsewhere, we could define this as "provides a generally-usable definition for a symbol", i.e. clear it for template/class forward-decls and set it for everything else. This isn't really more code, and makes it less likely we're going to subtly misinterpret the value. | |
90 | I agree with this one: because hints aggregate as union, it would be a mistake to flag a Symbol as Public - that would mean any header that provided it would end up with the Public hint, even if the header file is private! I wonder if we should use a naming-convention for this, e.g. CompleteSymbol, PublicHeader etc, indicating the layer at which the signal is expected to be set. | |
95 | +1 to Preferred and also documenting the conditions when a signal is set here in some detail (also for Public and Complete). I think we really have to understand what the signal is quite precisely to make correct use of it. | |
103 | std::forward should just be std::move here, T is not a reference type | |
142–143 | missed this in the previous review: both locateSymbol and headersForSymbol should probably be above writeHTMLReport | |
144 | tarversing -> traversing (uber-nit: i.e. should be e.g. or simply deleted, as "top element first" is not equivalent to "list is sorted") | |
clang-tools-extra/include-cleaner/lib/FindHeaders.cpp | ||
45 | I would suggest making this function public (or operator< on Hinted<Header>) and unit-testing it to demonstrate the intended behavior. The implementation is trivial now but may not always be, and editing enum constants at the end rather than in the correct position seems like an easy mistake to make. | |
54 | doing string rendering in a comparator seems gratuitously wasteful. call stable_sort instead? the input order was used prior to this patch and seems OK. | |
63 | again, this seems unneccesary - switch over the variants instead? | |
89 | why never canonical? In general, handling !PI as a special case up front seems like a bad idea: the purpose of supporting it being nullable is to allow the code below to used/tested without one. Seems we can get away with guarding PI-bits with if (PI) below. | |
100 | I think we discussed and thought that was a bit unsafe. export can simply mean "downstream code can assumes this symbol is available", not "i'm the real owner of these symbols". Prominent example: https://github.com/protocolbuffers/protobuf/blob/14e579436fb53d0e8271c4d26d3d88c394d96454/src/google/protobuf/stubs/port.h#L90 | |
103 | +1, conceptually this is a reference to a sibling edge in the DAG, not an edge chaining off this one, so it should not inherit hints | |
171 | why not define operator< on Header if you need one? this will come up again. | |
176 | not sure if this is really a FIXME: it's a pretty deep consequence of the design that will never be fixed. Since you're relying on the distinction between "header" and "Header" here, maybe be a bit more explicit? "If two Headers probably refer to the same file (e.g. Verbatim(foo.h) and Physical(/path/to/foo.h), we won't deduplicate them or merge their hints". | |
197 | again, no need to stringify and then reparse here (in fact operator<< for Symbol is extra-inefficient, assuming that we never call it in algorithms), we have a perfectly good underlying string that's pretty easy to get at in the cases where matching is appropriate | |
201 | BTW This stringify-and-parse isn't quite correct, e.g. for operator std::string it yields operator string but for operator typename T::foo() it yields foo! (I realize we do this in clangd, which isn't great...) | |
clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp | ||
161 | these tests seem to be in the wrong file | |
161 | yes, I think this will be easier to maintain & make complete if you test the hints extracted, test the sort order, and only smoke-test the interaction between the two | |
263 | This is confusing, the most obvious interpretation is "this is the generated header for the .proto file that defines foo", but such a header would have a complete definition. I can't think of a remotely realistic proto example that actually triggers this case, so maybe drop the references to proto? It seems to give some false intuitions about whether this behavior is correct. | |
266 | yeah, this is one of the tradeoffs of not examining completeness requirements at use sites. However it's also a reason to believe that operator< on Hints may not be a simple lexicographic compare, several "lesser" hints may be able to overpower a "greater" one ==> another reason to make the comparison public & test it. | |
280 | This comment describes an implementation (IMO is unlikely to be the right one) to solve a problem (which is left implicit) - it would be nice to state the problem instead. I think we should have some special handling for the main-file, but it's not clear that boosting it is the way to go. For example, if the main-file provides a forward-decl for a class, we probably shouldn't consider any other decl of that class used. (The forward decl clearly signals that it doesn't intend to rely on some external decl). So maybe headersForSymbol should only return the main file, or return some sentinel "provided in main file" value instead of a list. |
clang-tools-extra/include-cleaner/lib/AnalysisInternal.h | ||
---|---|---|
84 | i still don't feel great about exposing these types as public API. would it help if I moved these into a TypesInternal.h ? | |
85 | Yes, as discussed offline and Sam mentions in the comment idea is to have all hints expressed positively and take union of all the capabilities on the path from an AST node to a provider (also union in case there are multiple paths ending at the same provider). Each hint is to be provided by at most a single stage of the traversal (not sure why you think Complete is for both), I'll make this more concrete by renaming hint types to encode that in the name. I'll also add some docs explaining the model. | |
90 | i agree with renaming these to indicate the stage that produced them. | |
93 | as discussed, these need to be positively expressed (like others). | |
clang-tools-extra/include-cleaner/lib/FindHeaders.cpp | ||
89 | in the absence of PI, we can't really do much in the loop below (as we would assume each file is self-contained by default) and would bail out in the first run. Hence i don't see much difference between handling it here vs right after the calculation of CurrentHints down below (I am slightly inclined towards this version as it prevents thinking about both cases in the loop for no reason). I can move it into the body if you feel strongly about it though. regarding canonicalness, right now it's inferred either by namematch (which is calculated in headersForSymbol) or via IWYU pragmas, which isn't available in absence of PI. | |
100 | agreed, exports shouldn't really be boosted. | |
103 | oops you're right. this was an oversight. same applies to exporters. | |
201 | i think using the identifier name should be fine here, operators will be unnamed and won't get any name match. | |
clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp | ||
161 |
I've chosen this one as the implementation lives in FindHeaders.cpp, do you think it belongs to AnalysisTests instead? | |
266 | in addition to what Sam said, note that we'll preserve public_complete.h if it's already included directly. |
LG, just nits really!
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h | ||
---|---|---|
133 | this is a bit confusing... kind of the opposite of what std::in_place means? Maybe just a private struct Sentinel{}; instead of in_place_t? Symbol does this without any sentinel at all, if that's causing ambiguity then it might be worth fixing both to be the same. | |
clang-tools-extra/include-cleaner/lib/AnalysisInternal.h | ||
84 | AnalysisInternal < TypesInternal < Types I think, so yes (I think separating the concepts/data structures from the algorithm/implementation is the important concern here. I don't think defensively minimizing surface area is important as the library simply won't have many users, and that exposing these is a net win because it makes the structure of the system easier for human readers to understand. Anyway, up to you) | |
86 | nit: touched => traversed? Since you haven't explicitly mentioned a graph here (which is fine), hint at it more specifically | |
91 | "positive" makes intuitive sense, but not *quite* the right concept. (existential vs universal quantification) Maybe just give an example: We choose the boolean sense accordingly, e.g. "Public" rather than "Private", because a header is good if it provides any public definition, even if it also provides private ones. | |
94 | nit: maybe Hints over Hint to indicate it's a bitset | |
95 | nit: the conditions seem well-documented now, but I think we should drop the e.g. these should be the actual definitions, not just examples. | |
110 | nit: augment values with hints? (or "augment types with Hint", but I think values is clearer) | |
clang-tools-extra/include-cleaner/lib/FindHeaders.cpp | ||
38 | just stable_sort(reverse(Headers))? (reverse() returns an rbegin/rend view which should DTRT) | |
46 | this signature seems unneccesarily complicated/hard to understand what about nameMatch(DeclName, Header) -> Hint (or ->bool) and write the trivial loop in the caller? | |
60 | debug stuff, and below | |
61 | even though it's a few lines, if you pull out StringRef basename(StringRef Header) this code looks a lot clearer (and if returning bool, directly return basename(H.physical()->getName()).equals_insensitive(DeclName) etc inline above) | |
118 | headers() returns the alternatives in preferred order, so you only want to set PreferredHeader on the first. | |
130 | Somewhere say what you're doing: deduplicating headers and merging their hints | |
clang-tools-extra/include-cleaner/lib/Types.cpp | ||
114 | I think comparing pointers is going to give inconsistent results, and would suggest comparing Name (without trying to normalize, just for at least reproducibility across identical runs) | |
clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp | ||
161 | Nope, I think I was just confused, sorry | |
256 | I don't think it does, it looks like you're sorting by the FileEntry pointers - this may happen to be discovery order a lot of the time but it's of course not guaranteed. |
- Address comments & rebase
clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h | ||
---|---|---|
133 | without a disambiguation tag we get an ambigious conversion when we have a constructor call that looks like Header("foo.h") (whether to use Header(StringRef) or Header(variant<StringRef>), not sure if it's fine for an "inaccessible" constructor to participate in overload resolution, nor how variant can be viable here despite needing a double implicit conversion from stringref to variant) This didn't bite us with Symbol yet because constructor takes a Decl& but variant stores a Decl*, and for Macro case we always explicitly initialized with spelling of Macro. You're right about usage of inplace_t though, I was confused. using a sentinel tag instead. | |
clang-tools-extra/include-cleaner/lib/Types.cpp | ||
114 | i was rather worried about correctness of a name based match, e.g. we can have the same FileEntry pointers for foo.h and bar/foo.h but i guess it's OK to assume that a FileManager won't be accessed in the middle of a comparison sequence. |
this is a bit confusing... kind of the opposite of what std::in_place means?
(All of the other constructors build the state in-place in some sense, but this one doesn't)
Maybe just a private struct Sentinel{}; instead of in_place_t?
Symbol does this without any sentinel at all, if that's causing ambiguity then it might be worth fixing both to be the same.