This is an archive of the discontinued LLVM Phabricator instance.

[lld] Use -1 as tombstone value for discarded code ranges
ClosedPublic

Authored by Eric on Nov 19 2020, 9:28 AM.

Details

Summary

Under existing behavior discarded functions are relocated to have the start pc 0. This causes problems when debugging as they typically overlap the first function and lldb symbol resolution frequently chooses a discarded function instead of the correct one. Using the value -1 or -2 (depending on which DWARF section we are writing) is sufficient to prevent lldb from resolving to these symbols.

Diff Detail

Event Timeline

Eric created this revision.Nov 19 2020, 9:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 19 2020, 9:28 AM
Eric requested review of this revision.Nov 19 2020, 9:28 AM
lebedev.ri retitled this revision from Use -1 as tombstone value for discarded code ranges to [lld] Use -1 as tombstone value for discarded code ranges.Nov 19 2020, 9:31 AM

Are there no tests that fail under this change? If not, I guess we should add one.

Are there no tests that fail under this change? If not, I guess we should add one.

Failed Tests (3):
  lld :: wasm/debug-removed-fn.ll
  lld :: wasm/debuginfo.test
  lld :: wasm/undefined-weak-call.ll

The change does not seem right. For non-debug sections (when --allow-undefined is set), addend; .debug_ranges/.debug_loc => -2; other .debug_* => -1

Are there no tests that fail under this change? If not, I guess we should add one.

Failed Tests (3):
  lld :: wasm/debug-removed-fn.ll
  lld :: wasm/debuginfo.test
  lld :: wasm/undefined-weak-call.ll

The change does not seem right. For non-debug sections (when --allow-undefined is set), addend; .debug_ranges/.debug_loc => -2; other .debug_* => -1

+1 to that. Could also use a test case for this new functionality.

Eric updated this revision to Diff 306711.Nov 20 2020, 9:20 AM
Eric edited the summary of this revision. (Show Details)
Eric added a comment.Nov 20 2020, 9:23 AM

Hope this looks better!

lgtm!.. I'll leave the final approval to @yurydelendik

lld/wasm/InputChunks.cpp
408

Maybe a comment here explaining why we do this?

lld/wasm/InputChunks.h
230

We use the lowerCamel style so I think this should be tombstoneValue?

lld/wasm/InputFiles.cpp
196

So setting the tombstone to zero means we should include the Addend ?

lld/wasm/InputFiles.h
106

It seems like all of the callsites should probably include the tombstone no?

Perhaps remove the default and specify it at each call site?

yurydelendik accepted this revision.Nov 20 2020, 11:28 AM
This revision is now accepted and ready to land.Nov 20 2020, 11:28 AM
MaskRay added inline comments.Nov 20 2020, 11:37 AM
lld/wasm/InputChunks.h
230

Is the space consumption a concern?

sbc100 added inline comments.Nov 20 2020, 12:25 PM
lld/wasm/InputChunks.h
230

I not been to concerted about adding data O(input chunk). Honestly, we have not done much in the way optimizing wasm-ld for speed or memory consumption. Rui often guided me to keep the symbol size down and to avoid per-symbol hash lookup.. but per-input-chunk data I've not been too concerned about.

I guess one alternative would be to store a tombstone-type enum with three possible values?

dblaikie added inline comments.Nov 20 2020, 1:14 PM
lld/test/wasm/debug-removed-fn.ll
7

This looks suspcious - is something in the implementation (or the dumper) truncating the 64 bit -1 to a 32 bit -1? Or is this a bug in the dumper that it's printing out a 32bit value as a 64 bit value?

lld/wasm/InputChunks.cpp
411–417

Avoid "else after return" per the LLVM style guide. Also nice to reduce nesting, so I'd probably go with something like:

if (!.debug_)
  return 0;
if (.debug_ranges || .debug_loc)
  return -2
return -1
lld/wasm/InputFiles.cpp
196

Yeah, I guess that's beyond my debug info purview - but I'd suggest checking what bfd ld does here & maybe going with that, rather than what gold does/lld used to do. Which is probably absolute 0 rather than addend. For broader compatibility (& probably make this code a bit less subtle - avoiding the tombstone of zero not really being zero)

MaskRay added inline comments.Nov 20 2020, 1:26 PM
lld/wasm/InputChunks.h
230

The sizes of SymbolUnion and InputSections are important for the ELF's port.

As an example, the basic block sections feature cost us more than 1% peak memory usage. D91018 claimed back 0.6%

sbc100 added inline comments.Nov 20 2020, 1:40 PM
lld/wasm/InputChunks.h
230

Point taken. And since wasm mandates -ffunction-sections and -fdata-sections is the default we are even more effected than ELF by default I guess.

I'm embarrassed to say I have done zero benchmarking of wasm-ld thus far.

Eric updated this revision to Diff 307059.Nov 23 2020, 5:57 AM
Eric marked 3 inline comments as done.
Eric added inline comments.
lld/test/wasm/debug-removed-fn.ll
7

Looks like this file is coming from a R_WASM_FUNCTION_OFFSET_I32 type relocation so this is being written as 32 bits. I assume the dumper casts to 64 bits whether it read a 32 or 64 bit address.

lld/wasm/InputChunks.h
230

Note that this is not in InputChunk, it is in InputSection, so this is once per custom wasm section, of which there might be what, a dozen? I could get rid of it but then we would need to do 1-3 string comparisons against the section name for every relocation in that section, which I was concerned could be costly, though I also have done no wasm-ld benchmarking.

I actually considered putting it in InputChunk so as to avoid lots of virtual function calls. If I did that this would cost us a megabyte for especially large applications.

lld/wasm/InputFiles.cpp
196

Yes, this was based on MaskRay@'s comment on my initial commit. If there is not agreement on this perhaps bring discussion back to the email thread?

lld/wasm/InputFiles.h
106

This felt different to me because zero has a unique meaning, it is not used as a tombstone it means we don't do tombstoning. I don't think adding ", 0" to the other callsites would make the code any clearer but of course I can do that if there is a strong preference for it.

Eric updated this revision to Diff 307075.Nov 23 2020, 7:24 AM
Eric added inline comments.Nov 23 2020, 7:26 AM
lld/test/wasm/debug-removed-fn.ll
7

It turns out if I update my llvm repo to include your tombstone handling code in llvm-dwarfdump, this line disappears.

There are 4 callers of calcNewValue():

InputChunk::writeTo
InputFunction::calculateSize
InputFunction::writeTo
InputSegment::generateRelocationCode

It seems that they should all be consistent in passing getTombstone() as the second argument. It doesn't make sense that different callsites would get a different new value for a given relocation. They all logically want to get the same answer I believe.

Eric updated this revision to Diff 307299.Nov 24 2020, 3:10 AM

Okay, I've changed it to call getTombstone everywhere.

dblaikie added inline comments.Nov 24 2020, 2:39 PM
lld/wasm/InputFiles.cpp
196

Ah, this one?

The change does not seem right. For non-debug sections (when --allow-undefined is set), addend; .debug_ranges/.debug_loc => -2; other .debug_* => -1

@MaskRay is that what ld.bfd does? ("addend" for non-debug sections)

MaskRay added inline comments.Nov 24 2020, 2:53 PM
lld/wasm/InputChunks.cpp
298

For all these file->calcNewValue(rel, getTombstone()) calls, shouldn't getTombstone() cached to avoid performance overhead?

lld/wasm/InputFiles.cpp
196

Yes for "addend" for non-debug sections. This makes subtraction work.

dblaikie added inline comments.Nov 24 2020, 2:57 PM
lld/wasm/InputFiles.cpp
196

Ah, fair enough - thanks for the context!

Eric updated this revision to Diff 307583.Nov 25 2020, 4:46 AM
Eric marked an inline comment as done.
Eric added a comment.Nov 30 2020, 9:35 AM

Hope everyone had a nice Thanksgiving break! Are any further changes necessary here?

sbc100 accepted this revision.Nov 30 2020, 9:49 AM
MaskRay accepted this revision.Nov 30 2020, 10:35 AM

Looks great! Please include diff context in future patches (https://llvm.org/docs/GettingStarted.html#sending-patches)

lld/wasm/InputChunks.cpp
435

-2l is correct but is not an appropriate literal suffix. UINT64_C(-2) can be used.

lld/wasm/InputFiles.cpp
210
Eric updated this revision to Diff 308459.Nov 30 2020, 12:47 PM
Eric marked 2 inline comments as done.
This revision was automatically updated to reflect the committed changes.