This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Make sure file entry for artificial functions has an MD5 checksum
ClosedPublic

Authored by probinson on Jul 21 2023, 1:39 PM.

Details

Summary

The DIFile cache was keyed on a string pointer instead of string content,
which was causing misses and resulted in an entry without a checksum.
In DWARF v5 if any checksum is missing, we can't write any to the output
file, so this had consequences.

Fixes https://github.com/llvm/llvm-project/issues/63955

Diff Detail

Event Timeline

probinson created this revision.Jul 21 2023, 1:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2023, 1:39 PM
probinson requested review of this revision.Jul 21 2023, 1:39 PM

The fix is straightforward but the test was surprisingly tricky; I had to add the -main-file-name option to keep one of the DIFile's from ending up as <stdin>.
I'm still befuddled about why there are two DIFile entries, when the DIFileCache clearly has only one. The metadata remains mysterious to me.

aprantl accepted this revision.Jul 24 2023, 9:24 AM
This revision is now accepted and ready to land.Jul 24 2023, 9:24 AM
This revision was landed with ongoing or failed builds.Jul 24 2023, 10:53 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2023, 10:53 AM

Any memory usage measurements to check this doesn't have a significant adverse impact by copying all the strings? (or performance by having to do string rather than pointer equality comparisons?)

Could/should we do the lookup on the CU filename before it goes into the DI metadata, and store that FileID somewhere for later use? Rather than requerying it later/making all queries string comparisons/storing strings, etc? (we could put an "expensive checks" assertion in that our queries always return in agreement with the pointer equality - so in case there are other things that trip over this issue we'll know about them sooner)

Any memory usage measurements to check this doesn't have a significant adverse impact by copying all the strings?

Not actual measurements, no; but intuitively the size should not be much greater than the size of the filename entries in the .debug_line section for the CU, which should be KB not MB. (Plus StringMap entry overhead, which is constant per node.) Therefore I didn't take the time to measure. But see below for a possible alternate approach which would avoid even that much.

Could/should we do the lookup on the CU filename before it goes into the DI metadata, and store that FileID somewhere for later use?

There's a note in issue 63955 to the effect that I can't find an API to turn a filename into a FileID. If there is one that I didn't find, we could use FileIDs instead of pointers to name strings.

FYI this is a 0.5% compile-time regression on O0 builds (https://llvm-compile-time-tracker.com/compare.php?from=69593aa5c054cec6be6b822c073ccdc63748a68d&to=7abb5fc618cec66841a8280d2a099a4c9c8cb91b&stat=instructions:u). Is that expected?

That's higher than I expected.

It might be feasible to do this a different way: When an invalid loc comes in, search the DIFileCache for a matching string, and use that instead of using the CU's copy of the filename. Then we can go back to using pointers as the keys. Might eliminate the duplicate DIFile as well. I'll look into that. Then the time cost would be incurred only when invalid locs come in, which is a minority of lookups (depends on the number of artificial functions generated).