This is an archive of the discontinued LLVM Phabricator instance.

[clang] Don't use the AST to display backend diagnostics
ClosedPublic

Authored by aeubanks on Sep 28 2021, 4:01 PM.

Details

Summary

We keep a map from function name to source location so we don't have to
do it via looking up a source location from the AST. However, since
function names can be long, we actually use a hash of the function name
as the key.

Additionally, we can't rely on Clang's printing of function names via
the AST, so we just demangle the name instead.

This is necessary to implement
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.

Diff Detail

Event Timeline

aeubanks requested review of this revision.Sep 28 2021, 4:01 PM
aeubanks created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2021, 4:01 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
dblaikie added inline comments.Sep 28 2021, 8:10 PM
clang/lib/CodeGen/CodeGenAction.cpp
345–346

Could we record these earlier, partly to avoid having to do the lookup by mangled name here? (eg: when the function's created instead)

652–655

Not worth doing anything faster (hash table, etc)?

If it's going to stay linear - could use llvm::find_if perhaps - though I realize it doesn't add a /huge/ amount of value here.

If it's going to stay with the explicit loop, the Pair reference should probably be made const.

clang/test/Frontend/backend-diagnostic.c
18–21

I'm surprised these and other warning updates don't show the signature of the function, I'd have expected demangle to have returned something like 'stackSizeWarning()' rather than just 'stackSizeWarning'?

(also it seems nice to retain the word "function" and quotation marks around the function name in the diagnostic messages, I think? (could end up with weird situations where simple function names could be confused with prose in the error message))

aeubanks updated this revision to Diff 375958.Sep 29 2021, 10:49 AM

address comments

clang/lib/CodeGen/CodeGenAction.cpp
345–346

We don't know if a function will be emitted into the IR when creating its AST node. And GetDeclForMangledName may not work if we haven't fully constructed the AST yet (not sure about this, but this would definitely require a lot of frontend knowledge).

652–655

yeah, find_if doesn't really make it any easier to read/shorter

I'm trying to optimize for construction time, not reporting time since reporting warnings/errors is much rarer. Can't get much better than a vector.

clang/test/Frontend/backend-diagnostic.c
18–21

I've added quotes.
For the word "function", sometimes the demangling prints out actual prose, e.g. in backend-stack-frame-diagnostics.cpp below, we have a invocation function for block in frameSizeBlocksWarning(), and adding "function" would print function 'invocation function for block in frameSizeBlocksWarning()'. Maybe that's fine since those are rarer, WDYT?
For C functions, I believe since there's no overloading, we don't print types/parentheses.

rnk added inline comments.Sep 29 2021, 11:58 AM
clang/lib/CodeGen/CodeGenAction.cpp
129

As David says, this could be a plain hash map, maybe hash_code isn't hashable, but it must have some way to convert to a plain integer.

This also deserves a comment about why the key here is a hash code and not the full string. The purpose of this is to avoid storing another copy of all the mangled names, which are large. In the case of a hash collision, it is possible that the wrong location will be returned. The probability is roughly #functions / size_t_max, so pretty negligible, especially for a 64-bit platform. In any case, the consequences of a collision are that we use the wrong source location, which is a diagnostic quality issue, which isn't severe.

This would be a good use case for a fast content hash, such as blake3. I believe it would be really useful for LLD to have such a fast content hash for parellelizing symbol resolution and other deduplication actions, so I think it is worth a FIXME or TODO to use a fast content hash when one is available.

651

There are LLVM passes that will rename functions, so this lookup will fail for such functions, but I guess that was already the case in the old code. Nothing to do here.

aeubanks updated this revision to Diff 376024.Sep 29 2021, 1:22 PM

add comments (including why it's a vector and not an actual map)

dblaikie accepted this revision.Oct 4 2021, 1:57 PM

Sounds OK to me.

This revision is now accepted and ready to land.Oct 4 2021, 1:57 PM
This revision was automatically updated to reflect the committed changes.