This is an archive of the discontinued LLVM Phabricator instance.

[include-mapping] Allow multiple headers for the same symbol. Choose the first header of available ones.
ClosedPublic

Authored by VitaNuo on Jan 19 2023, 1:59 AM.

Diff Detail

Event Timeline

VitaNuo created this revision.Jan 19 2023, 1:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2023, 1:59 AM
VitaNuo requested review of this revision.Jan 19 2023, 1:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2023, 1:59 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
VitaNuo updated this revision to Diff 490468.Jan 19 2023, 5:21 AM

Remove variant logic in generator.

VitaNuo updated this revision to Diff 490476.Jan 19 2023, 5:47 AM

Restore variant logic for overloads. Remove broken logic for accepted variants.

VitaNuo updated this revision to Diff 490478.Jan 19 2023, 5:49 AM

Fix typo.

since we modify the python script, could you please update its artifacts (StdSymbolMap.inc, CSymbolMap) as well?

VitaNuo updated this revision to Diff 490824.Jan 20 2023, 7:05 AM

Add re-generated symbol maps.

Added the re-generated symbol maps to this patch as per @hokein's request.

Added the re-generated symbol maps to this patch as per @hokein's request.

Sorry for not being cleared. My suggestion was to generate the maps on the old data (2018-10-28), the reason is that we can see the diffs caused by the change of the python script, it would make the review easier. We plan to update the symbol mappings to latest version, I think we should do it in a followup patch.

clang/tools/include-mapping/gen_std.py
17 ↗(On Diff #490478)

nit: std::swap doesn't have variants, it is provided by multiple headers.

VitaNuo updated this revision to Diff 491258.Jan 23 2023, 1:40 AM

Update the 20181028 archive maps.

VitaNuo updated this revision to Diff 491260.Jan 23 2023, 1:43 AM
VitaNuo marked an inline comment as done.

Address review comment.

Oh I misunderstood your comment regarding which maps need to be regenerated as part of this patch. Regenerated the 2018 ones now. As expected, there are only added symbols now.

kadircet added inline comments.
clang/tools/include-mapping/cppreference_parser.py
174 ↗(On Diff #491260)

this is actually checking for something else (sorry for the confusing naming).

the variant here refers to library name mentioned in parentheses (this is same problem as std::move) on the std symbol index page https://en.cppreference.com/w/cpp/symbol_index (e.g. remove<>() (algorithm)). by getting rid of this we're introducing a regression, as previously std::remove wouldn't be recognized by the library, but now it'll be recognized and we'll keep suggesting <cstdio> for it.

so we should actually keep this around.

Can you add a unittest in the StandardLibraryTest.cpp?

clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
100 ↗(On Diff #491260)

Conceptually, this (and other atomic_* symbols) doesn't feel correct:

  • <atomic> provides the generic template template<class T> struct atomic;
  • <memory> provides partial template specializations for std::shared_ptr and std::weak_ptr

They are variant symbols (ideally, they should be treat as the std::move()). The issue here is that unlike std::move which has two individual entries in the index page, we only have one entry for std::atomic (extending the cppreference_parser.py script to perfectly distinguish these two cases in the page seems non-trivial). Some options:

  1. treat them as multiple-header symbols (like what this patch does now)
  2. special-case these symbols like std::move()
  3. always prefer the header providing generic templates

@kadircet, what do you think?

369 ↗(On Diff #491260)

this one as well, both headers provide different overloads.

1053 ↗(On Diff #491258)

I think std::remove should not be in the scope of the patch, it has two variants:

  • std::remove from <cstdio>
  • and std::remove from <algorithm>.
clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
21–22

int => unsigned.

Any reason to change it to a map? My understanding is that SymbolNames is a mapping from the SymbolID => {Scope, Name}, it should not be affected if we add multiple-header symbols.

120

int => unsigned.

121

Given the fact that multiple-header symbols are grouped in the .inc file, we could simplify the code of checking a new symbol by looking at the last available SymIndex:

if (NextAvailSymIndex > 0 && SymbolNames[NextAvailSymIndex-1].first == NS && SymbolNames[NextAvailSymIndex-1].second == Name) {
   // this is not a new symbol.
}
132

nit: inline the HeaderID.

170

The ID field is guaranteed to be valid, so no need to do the sanity check.

VitaNuo updated this revision to Diff 491367.Jan 23 2023, 7:23 AM

Re-introduce the special std::remove handling for now.

clang/tools/include-mapping/cppreference_parser.py
174 ↗(On Diff #491260)

Ok, I can keep this out of this patch, but we'll have to remove this logic evetually when we deal with overloads.

I have a slight suspicion that this code might be buggy, because it suggests that one _of_ the variants should be accepted. What is does in reality, though, is it keeps algorithm in the list of headers suitable for std::remove alongside cstdio, and then in the last step std::remove is ignored by the generator because of being defined in two headers.

With this patch, the result will be both {cstdio, algorithm}. Is this (more) satisfactory for now compared to skipping algorithm due to being an overload?

VitaNuo updated this revision to Diff 491380.Jan 23 2023, 7:55 AM
VitaNuo marked 2 inline comments as done.

Address review comments regarding data structure use and style.

VitaNuo updated this revision to Diff 491382.Jan 23 2023, 7:56 AM

Fix formatting.

VitaNuo added inline comments.Jan 23 2023, 8:01 AM
clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
21–22

Right. I am not sure what I was thinking. Possibly I was thinking that symbol IDs won't be continuous anymore after this change, but they actually are. Thanks!

121

I know this is easier in terms of code, but this heavily relies on the order in the generated files. I would prefer to keep the library and the generator as decoupled as possible, even if it means slightly more complex code here. Overall, it seems more future-proof in case of unrelated generator changes (bugs?) that might change the order.

170

This is all gone now, since the code is back to the array version of SymbolNames.

kadircet added inline comments.Jan 23 2023, 8:12 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
100 ↗(On Diff #491260)

right, i believe this patch is definitely adding keeping multiple headers for a symbol around, but mixing it with keeping headers for variants (e.g. overloads provided by different headers, or specializations as mentioned here).

we definitely need some more changes in parser to make sure we're only preserving alternatives for the same symbol and not for any other variants (overloads, specializations). IIRC there are currently 2 places these variants could come into play:

  • first is the symbol index page itself, symbols that have (foo) next to them have variants and should still be ignored (e.g. std::remove variant of cstdio shouldn't be preserved in the scope of this patch)
  • second is inside the detail pages for symbols, e.g. std::atomic, we can have headers for different declarations, they're clearly different variants so we shouldn't add such symbols into the mapping _ParseSymbolPage already does some detection of declarations in between headers.

in the scope of this patch, we should keep ignoring both.

clang/tools/include-mapping/cppreference_parser.py
183 ↗(On Diff #491367)

nit: formatting

174 ↗(On Diff #491260)

Ok, I can keep this out of this patch, but we'll have to remove this logic evetually when we deal with overloads.

Surely, I wasn't saying this should stay here forever, i am just saying that what's done in the scope of this patch doesn't really address the issues "worked around" by this piece.

I have a slight suspicion that this code might be buggy, because it suggests that one _of_ the variants should be accepted. What is does in reality, though, is it keeps algorithm in the list of headers suitable for std::remove alongside cstdio, and then in the last step std::remove is ignored by the generator because of being defined in two headers.

right, it's because we have logic to prefer "non-variant" versions of symbols when available (i.e. in the absence of this logic, we'd prefer std::remove from cstdio). this logic enables us to preserve certain variants (in addition to non-variants). that way we treat std::remove as ambigious rather than always resolving to <cstdio>, hence it's marked as "missing", similar to std::move.

With this patch, the result will be both {cstdio, algorithm}. Is this (more) satisfactory for now compared to skipping algorithm due to being an overload?

in the end this should probably look like {algorithm, cstdio}, but as mentioned elsewhere, this is not the same problem as "same symbol being provided by multiple header" but rather "different symbols having same name and different headers". so treatment of it shouldn't change by this patch.

VitaNuo updated this revision to Diff 491407.Jan 23 2023, 8:59 AM

Special case the overloads for now.

VitaNuo added inline comments.Jan 23 2023, 9:22 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
100 ↗(On Diff #491260)

I suggest to special-case the overloads for now, just not to solve all the problems in the same patch.

369 ↗(On Diff #491260)

I suggest to special-case the overloads for now, just not to solve all the problems in the same patch.

clang/tools/include-mapping/cppreference_parser.py
174 ↗(On Diff #491260)

On second thought, I think we'd rather special-case the overloads for now (until we get to them). See the newest patch version.

VitaNuo added inline comments.Jan 23 2023, 10:02 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
1053 ↗(On Diff #491258)

Yes, agreed. Let's take care of the overloads going forward.

clang/tools/include-mapping/cppreference_parser.py
174 ↗(On Diff #491260)

right, it's because we have logic to prefer "non-variant" versions of symbols when available (i.e. in the absence of this logic, we'd prefer std::remove from cstdio).

Where is this logic? AFAICS the generator in the current state doesn't generate anything for std::remove.

VitaNuo added inline comments.Jan 24 2023, 1:09 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
100 ↗(On Diff #491260)

The first group are already ignored. We need to take a lot at how to ignore the second one.

hokein added inline comments.Jan 24 2023, 5:22 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
106 ↗(On Diff #491407)

Looks like the regex filters too many symbols -- e.g. atomic_fetch_add is not a variant symbol, we should keep it instead. The same to other following symbols as well.

100 ↗(On Diff #491260)

Ideally, we should tweak the _ParseSymbolPage to handle this case, but I don't see a easy way to do it (the atomic case is quite tricky where the symbol name is identical, only the template argument is different, and the simple text-match heuristic in _ParseSymbolPage fails in this case).

so specialcasing these (there are not too many of them) looks fine to me.

clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
121

but this heavily relies on the order in the generated files.

Yeah, this is true. I actually think we should make it as an invariant (multiple-header symbols are grouped) of the generated .inc files. This invariant is important and useful, it is much easier for human to read and justify. We can probably guarantee it in the generator side.

clang/tools/include-mapping/gen_std.py
112 ↗(On Diff #491407)

I'd use a list of exact symbol names (the number of them should not be huge), to avoid filtering out symbols by accident (see my other comment).

kadircet added inline comments.Jan 24 2023, 7:32 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
100 ↗(On Diff #491260)

I don't follow why we can't perform this detection directly, haven't been taking a look at the details but the page looks like:

Defined in header <atomic>
template< class T > struct atomic;
template< class U > struct atomic<U*>;
Defined in header <memory>
template< class U > struct atomic<std::shared_ptr<U>>;
template< class U > struct atomic<std::weak_ptr<U>>;
Defined in header <stdatomic.h>
#define _Atomic(T) /* see below */

So _ParseSymbolPage, first sees <atomic> then a bunch of decls, then <memory> then a bunch more decls and so on, and in the end it returns all the headers it has found.

Why can't we change the logic here to return nothing when there are different decls && headers? i.e. return empty if we see a new header after seeing some declarations.
does this result in undesired behaviour elsewhere?

VitaNuo updated this revision to Diff 491796.Jan 24 2023, 7:44 AM

Add early return for found symbols.

kadircet added inline comments.Jan 24 2023, 8:08 AM
clang/tools/include-mapping/cppreference_parser.py
174 ↗(On Diff #491260)

Where is this logic? AFAICS the generator in the current state doesn't generate anything for std::remove.

It's the logic that you're deleting:

if variant and variant not in variants_for_symbol:
    continue

we ignore any symbols that has a variant we shouldn't accept and always prefer the non-variant versions. to be more concrete when parsing c++ symbol index we'll see two alternatives for std::remove:

remove()
remove<>() (algorithm)

first one is a non-variant, hence it's accepted by default. second one is algorithm variant, and is accepted per this deleted logic. in the end we used to didn't generate anything because we now have multiple headers providing std::remove.

i think it creates more confusion to special case only std::remove down below, and not other symbols whose non-variant versions we accept, e.g. sin also has non-variant versions, but we don't "special case" it to keep it out of the map.

i'd suggest treating std::remove similar to other symbols with an accepted variant, rather than creating a divergence. so my suggestion is changing the logic above to only accept variants mentioned in the map when there's a mapping for the symbol (and reject the non-variant version). net effect will be that we'll now include the mapping from std::remove to <algorithm>, which is currently inline with our behaviour against other symbols with different variants. we try to pick the most common variant, and that's almost always the "non-variant" version, apart from std::remove.

that way we should get rid of the special casing below completely. does that make sense?

clang/tools/include-mapping/gen_std.py
112 ↗(On Diff #491796)

different headers providing std::swap don't provide different variants of it, these are just ADL extension points, similar to std::begin (which also shouldn't be missing from the mapping).

for remove, i've provided more context in the discussion above, about keeping a different variant of std::remove around.

VitaNuo updated this revision to Diff 492130.Jan 25 2023, 8:33 AM

Refactor the symbol parsing loop.

VitaNuo updated this revision to Diff 492160.Jan 25 2023, 9:51 AM

Refine C parsing.

VitaNuo updated this revision to Diff 492352.Jan 26 2023, 12:59 AM

Sort the C symbol map.

VitaNuo updated this revision to Diff 492366.Jan 26 2023, 2:41 AM

Fix "atomic" and "div" overloads programmatically. Add a FIXME for future proper handling of overloads.

VitaNuo marked 2 inline comments as done.Jan 26 2023, 2:46 AM
VitaNuo added inline comments.
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
106 ↗(On Diff #491407)

They're all back now.

100 ↗(On Diff #491260)

Ok, I've fixed (or hacked :) this programmatically.
The parsing of constants in https://en.cppreference.com/w/cpp/locale/codecvt_mode is not complete now, everything is attributed to the "codecvt" header, the "locale" header is ignored. Hope it's OK.
If not, then I'm afraid I can only special-case the "atomic.*" symbols.

369 ↗(On Diff #491260)

Should be fixed now, together with "atomic.*" cases.

VitaNuo updated this revision to Diff 492396.Jan 26 2023, 5:01 AM
VitaNuo marked an inline comment as done.

Remove special casing from gen_std.py. Re-introduce special handling for std::remove.

VitaNuo updated this revision to Diff 492398.Jan 26 2023, 5:07 AM

Formatting.

VitaNuo updated this revision to Diff 492400.Jan 26 2023, 5:08 AM

Formatting.

VitaNuo updated this revision to Diff 492408.Jan 26 2023, 5:39 AM
VitaNuo marked an inline comment as done.

Rely on alphabetic ordering to determine whether the symbol is new.

Thank you all for comments! The patch should be ready for the next review now.

clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
1053 ↗(On Diff #491258)

The latest version keeps the std::remove overload from <algorithm>.

clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
121

Ok, sure. Sorted the symbols in the generator now and also applied the snippet from above. I'm not 100% sure the code became much simpler but this version seems fine too :)

clang/tools/include-mapping/cppreference_parser.py
174 ↗(On Diff #491260)

Ok, I've tweaked the logic to keep std::remove from <algorithm> now.

clang/tools/include-mapping/gen_std.py
112 ↗(On Diff #491796)

Thank you. The latest version should take care both of the overloads that have accidentally made it to this patch and of the std::remove case.

VitaNuo updated this revision to Diff 492413.Jan 26 2023, 5:42 AM

Remove extra import.

I haven't read though all the change of _ParseSymbolPage, left some comments.

I think we need to update&add proper tests in include-mapping/test.py as we has changed the parser.

clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
1162 ↗(On Diff #492413)

Is this intended? it seems to me the cppparser doesn't handle this case correctly, in the swap symbol page, we have two headers in a single t-dsc-header tr, the parser should consider both (like the above size_t).

Defined in header <algorithm>
Defined in header <utility>
clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
108

We can avoid a local variable by

auto Add = [.. SymIndex(-1)] (....) {
if (SymIndex >=0 &&
    SymbolNames[SymIndex].first == NS && ...) {
} else {
    // the first symbol, or new symbol.
    ++SymIndex;
}

SymbolNames[SymIndex] = {NS, Name};
....
}
clang/tools/include-mapping/cppreference_parser.py
39 ↗(On Diff #492413)

If the header_to_accept is set, I think we can just return {header_to_accept} directly.

96 ↗(On Diff #492413)

IIUC, this is the major and intended change of the function, this is not a FIXME.

105 ↗(On Diff #492413)

The comment doesn't match the current behavior. IIUC, iof the symbol was never named, we only consider the first header now.

107 ↗(On Diff #492413)

why sort the headers?

kadircet added inline comments.Jan 27 2023, 5:55 AM
clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
205 ↗(On Diff #492413)

i think we should have other providers here, https://en.cppreference.com/w/cpp/iterator/begin. do we know why they're dropped?

clang/tools/include-mapping/cppreference_parser.py
165 ↗(On Diff #492413)

variant is not necessarily the header, eg:

acos()
acos<>() (std::complex) (since C++11)
acos<>() (std::valarray)

in this case variants are std::complex and std::valarray.

hence we're not trying to "infer" the header we want to preserve but rather decide on which symbol page we want to parse. we should still accept "all the headers" mentioned in that variant symbol page.

VitaNuo updated this revision to Diff 492774.Jan 27 2023, 8:40 AM
VitaNuo marked 6 inline comments as done.

Address review comments. No new tests in this version.

VitaNuo updated this revision to Diff 492775.Jan 27 2023, 8:42 AM

Fix tests.

VitaNuo marked an inline comment as done.Jan 27 2023, 8:43 AM

Thanks for the comments! AFAICS I've addressed all of them.
Re tests, thanks for the reminder @hokein! I've fixed them now, everything is green.

clang/include/clang/Tooling/Inclusions/StdSymbolMap.inc
205 ↗(On Diff #492413)

Yes, this list is the regeneration of the 2018 html book. It's explicitly _not_ generated from the 2022 book in order to have a clear diff related to the topic of this patch.
In the 2018 book, the only provider for std::begin is iterator.

1162 ↗(On Diff #492413)

You're right. There's another random class name (t-dcl-rev-aux) that needs to be skipped. I've changed the condition in line 81 of cppreference_parser.py (the loop that skips unneeded rows between the header block and the symbol block) to negation in order to make it more robust and avoid listing all these unneeded classes unnecessarily.

clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
108

Ok, sure.

clang/tools/include-mapping/cppreference_parser.py
39 ↗(On Diff #492413)

We'd need to add <> then. It needs a couple of extra lines here or there anyways, I don't think it makes a difference.

96 ↗(On Diff #492413)

I guess it depends on how you view it: I meant it as FIXME in the context of overloads, i.e., we might want to remove this and add infrastructure to differentiate between overloads explicitly.
I can remove the FIXME part, no problem.

105 ↗(On Diff #492413)

I've removed all the hacking with all_headers now, so the comment is valid again.

107 ↗(On Diff #492413)

This was a hack to make the generator return atomic rather than memory for std::atomic.* symbols. But I've noticed now that this adds some trouble to the C symbol map. I think now that cutting all_headers to the first element is wrong.
I will special-case the std::atomic.* symbols instead, because so far I don't see a way to solve this programmatically without other collateral effects.

Note that the problem is not std::atomic itself (for which Kadir extensively described the solution in some of the other comments), but rather atomic_bool, atomic_char etc. mentioned further in the page. The headers for them come from all_headers rather than symbol_headers, and there seems to be no way to limit them to <atomic> without other collateral effects.

165 ↗(On Diff #492413)

Ah ok, I was mostly looking at std::remove (algorithm). Ok, I will use the term "variant" as before. Thanks.

kadircet added inline comments.Jan 27 2023, 10:39 AM
clang/tools/include-mapping/cppreference_parser.py
196 ↗(On Diff #492775)

there's no variant of "std::atomic.*" called "atomic", in https://en.cppreference.com/w/cpp/symbol_index. is it something in 2018 version of the html book? otherwise there's only the unnamed variant and std::shared_ptr variant, and we should be preferring unnamed variant.

107 ↗(On Diff #492413)

I think now that cutting all_headers to the first element is wrong.

What's exactly breaking once you do that? (I guess by first element you still mean "all the headers we've seen until the first declaration in the page")

but rather atomic_bool, atomic_char etc. mentioned further in the page. The headers for them come from all_headers rather than symbol_headers, and there seems to be no way to limit them to <atomic> without other collateral effects.

I guess this is happening because we're trying to find headers that are for the declaration block containing the interesting symbol name. Any idea if that logic is winning us anything? e.g. if we dropped the if symbol_name in found_symbols: what does change?

165 ↗(On Diff #492413)

sorry i wasn't just trying to nitpick on the name of the variable/parameters.

i was trying to say _ReadSymbolPage should not be a function of variant_to_accept.
we should "completely skip" parsing of symbol pages for variants we don't want to take in (that's what this continue here is trying to achieve)

later on when we're parsing a symbol page, the name of the variant shouldn't effect the headers we're picking in any way (especially in the way you're relying right now that says variant name and the header name will be the same, as i pointed out this might as well be std::complex).

Is this trying to work around a different problem? (especially around std::atomic_.* I assume, i'll follow up on that thread)

VitaNuo updated this revision to Diff 493564.Jan 31 2023, 4:25 AM
VitaNuo marked 2 inline comments as done.

Remove all special-casing, skip all variant pages.

Thanks for the comments!

clang/tools/include-mapping/cppreference_parser.py
196 ↗(On Diff #492775)

This is obsolete. As discussed offline, I've removed all the special-casing logic. All the variants are skipped now without exceptions.

107 ↗(On Diff #492413)

I guess by first element you still mean "all the headers we've seen until the first declaration in the page"

No, I mean the first element of the all_headers collection. all_headers is a collection that collects _all_ the headers in the page and, in case a symbol is not defined in a table with an individual header block, all_headers in the page are returned as an approximation.

What's exactly breaking once you do that?

It is essentially arbitrary to limit all_headers to the first element (at the moment it would be the first element alphabetically). E.g., for the INT8_C and other constants here, the result would be inttypes.h and stdint.h would be cut, so this result doesn't make sense.

if we dropped the if symbol_name in found_symbols what does change?

Then all the headers in the table will be attributed to all the symbols in the table. Consider this page: std::imaxdiv would be reported as defined both in cstdlib and cinttypes, which is wrong.

I guess this is happening because we're trying to find headers that are for the declaration block containing the interesting symbol name.

Not really. Headers for atomic_bool, atomic_char etc. come from all_headers rather than symbol_headers. The reason for that is they are defined in a table which doesn't have its own header block, so we are trying to approximate the headers by naming all the headers in the file.

165 ↗(On Diff #492413)

This is obsolete. As discussed offline, I've removed all the special-casing logic. All the variants are skipped now without exceptions.

VitaNuo updated this revision to Diff 493565.Jan 31 2023, 4:28 AM

Remove a FIXME.

hokein added a comment.Feb 1 2023, 1:00 AM

As discussed offline, we decided to stop spending effort on improving the cppreference_parser.py, instead, we allow human edits in the generated StdSymbolMap.inc and CSymbolMap.inc, it gives us more flexibility: sort the headers for multi-header symbol in a humn-desired order rather than suboptimal alphabet order; symbols (e.g. std::experimental::filesystem) that are not available in cppreference website; symbols (e.g. PRIX16) from a complicated cppreference HTML page which is hard to parse in cppreference_parser.py etc.

We have two major things in this patch:

  1. extend the Stdlib API to support multiple-header symbols
  2. emits multiple-header symbols in the *.inc files

For 1), this change is good. We need to make sure the change not break other users of the .inc files. clangd is the main user, the behavior will be changed in https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/index/CanonicalIncludes.cpp#L736 as now we have duplicated qualified names (the ideal solution is to ignore multiple-header symbols, I think it is fine now if we just add a single std::size_t symbol, see my comment below)
For 2), I have some concerns, it looks like some multiple-headers symbols that are handled incorrectly by the cppreference_parser.py (see my other comments). I think the best way for us is to list them manually (no action required, I will make a followup)

I'd suggest to narrow the scope of the patch:

  • keep 1) and add a unittest in llvm-project/clang/unittests/Tooling/StandardLibraryTest.cpp;
  • drop all generated multi-header symbols in *.inc files, instead, we manually add a single one std::size_t symbol to the StdSymbolMap.inc for the StandardLibrary unittest. This means we need to revert the current change of gen_std.py;
clang/include/clang/Tooling/Inclusions/CSymbolMap.inc
80 ↗(On Diff #493565)

These symbols seem to only have a single header, I think it is an improvement effect of the new change of parsing logic in cppreference_parser.py.

96 ↗(On Diff #493565)

This type of symbol (INTX_C) seems not correct, they are only from <stdint.h> -- (this is a limitation of our current cppreference parser (it can not handle the complex page: https://en.cppreference.com/w/c/types/integer)

172 ↗(On Diff #493565)

I think this kind of symbol (PRIdN, PRIdLEASTN, PRIdFASTN, PRIdMAX, PRIdPTR etc) is only from <inttypes.h> (another limitation of the cppreference parser).

clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp
67–68

It is sad that we're allocating more memory for these two arrays now (as we have multiple SYMBOLs for the same qualified name). No action needed, can you add a comment saying that we're allocate more space here?

clang/tools/include-mapping/test.py
56 ↗(On Diff #493565)

The test needs to update as well, as you remove the change of _ParseSymbolPage.

VitaNuo updated this revision to Diff 493869.Feb 1 2023, 1:25 AM
VitaNuo marked an inline comment as done.

Add comment.

VitaNuo updated this revision to Diff 493870.Feb 1 2023, 1:28 AM

Revert all changes to generator.

VitaNuo updated this revision to Diff 493871.Feb 1 2023, 1:29 AM

Revert more.

Reverted everything apart from library support.

nridge added a subscriber: nridge.Feb 4 2023, 7:08 PM
hokein added a comment.Feb 7 2023, 4:45 AM

Thanks, it looks mostly good. This needs some work on rebase (as we have landed a few patches that change StandardLibrary.cpp file). I will rebase it and land it for you.

hokein updated this revision to Diff 495477.Feb 7 2023, 4:56 AM
This revision was not accepted when it landed; it landed in state Needs Review.Feb 7 2023, 5:18 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.