The DIFile cache was keyed on a string pointer instead of string content,
which was causing misses and resulted in an entry without a checksum.
In DWARF v5 if any checksum is missing, we can't write any to the output
file, so this had consequences.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The fix is straightforward but the test was surprisingly tricky; I had to add the -main-file-name option to keep one of the DIFile's from ending up as <stdin>.
I'm still befuddled about why there are two DIFile entries, when the DIFileCache clearly has only one. The metadata remains mysterious to me.
Any memory usage measurements to check this doesn't have a significant adverse impact by copying all the strings? (or performance by having to do string rather than pointer equality comparisons?)
Could/should we do the lookup on the CU filename before it goes into the DI metadata, and store that FileID somewhere for later use? Rather than requerying it later/making all queries string comparisons/storing strings, etc? (we could put an "expensive checks" assertion in that our queries always return in agreement with the pointer equality - so in case there are other things that trip over this issue we'll know about them sooner)
FYI this is a 0.5% compile-time regression on O0 builds (https://llvm-compile-time-tracker.com/compare.php?from=69593aa5c054cec6be6b822c073ccdc63748a68d&to=7abb5fc618cec66841a8280d2a099a4c9c8cb91b&stat=instructions:u). Is that expected?
Any memory usage measurements to check this doesn't have a significant adverse impact by copying all the strings?
Not actual measurements, no; but intuitively the size should not be much greater than the size of the filename entries in the .debug_line section for the CU, which should be KB not MB. (Plus StringMap entry overhead, which is constant per node.) Therefore I didn't take the time to measure. But see below for a possible alternate approach which would avoid even that much.
Could/should we do the lookup on the CU filename before it goes into the DI metadata, and store that FileID somewhere for later use?
There's a note in issue 63955 to the effect that I can't find an API to turn a filename into a FileID. If there is one that I didn't find, we could use FileIDs instead of pointers to name strings.
FYI this is a 0.5% compile-time regression on O0 builds (https://llvm-compile-time-tracker.com/compare.php?from=69593aa5c054cec6be6b822c073ccdc63748a68d&to=7abb5fc618cec66841a8280d2a099a4c9c8cb91b&stat=instructions:u). Is that expected?
That's higher than I expected.
It might be feasible to do this a different way: When an invalid loc comes in, search the DIFileCache for a matching string, and use that instead of using the CU's copy of the filename. Then we can go back to using pointers as the keys. Might eliminate the duplicate DIFile as well. I'll look into that. Then the time cost would be incurred only when invalid locs come in, which is a minority of lookups (depends on the number of artificial functions generated).