This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Handle overflow beyond the 127 common encodings limit
ClosedPublic

Authored by gkm on Dec 14 2020, 6:53 PM.

Details

Reviewers
int3
Group Reviewers
Restricted Project
Commits
rG99930719c66d: Handle overflow beyond the 127 common encodings limit
Summary

The common encodings table holds only 127 entries. The encodings index for compact entries is 8 bits wide, and indexes 127..255 are stored locally to each second-level page. Prior to this diff, lld would fatal() if encodings overflowed the 127 limit.

This diff populates a per-second-level-page encodings table as needed. When the per-page encodings table hits its limit, we must terminate the page. If such early termination would consume fewer entries than a regular (non-compact) encoding page, then we prefer the regular format.

Caveat: one reason the common-encoding table might overflow is because of DWARF debug-info references, which are not yet implemented and will come with a later diff.

Diff Detail

Event Timeline

gkm requested review of this revision.Dec 14 2020, 6:53 PM
gkm created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2020, 6:53 PM
int3 added a subscriber: int3.Dec 16 2020, 7:19 PM
int3 added inline comments.
lld/MachO/UnwindInfoSection.cpp
160–164

I know this is existing code, but isn't this the default behavior of std::pair::operator>()?

167–227

depending on how large the vector is, might be a useful perf optimization to use quickselect to find the 127th largest element and just sort over those smaller than it... but we can consider that later

205

2 * sizeof(uint32_t) is due to the size of unwind_info_regular_second_level_entry, right? how about using sizeof(unwind_info_regular_second_level_entry) instead?

211–212

let's use braces here too since the other if-else clauses have them

215

why does it matter if i < cuPtrVector.size()?

235–237

why the + 1?

274–275

avoid auto for types with simple names

298

avoid auto

322–323

could use memcpy

337

so I don't know if this is spec-guaranteed behavior, but there are already several places in LLD where we assume that an unwritten section of an mmap'ed buffer will be zero. I.e. we don't zero things out explicitly, and it seems to work in practice. I think doing that would allow us to eliminate padByteCount?

338–339

rm / uncomment?

lld/MachO/UnwindInfoSection.h
17

doesn't seem like unordered_map is actually used

41–46

why not use find() instead of lookup()?

75–76

general aesthetic suggestion: if we moved this into the .cpp file, we could do away with all the UNWIND_INFO_ prefixes and reduce some of the (extensive) line wrapping. Probably better in a separate diff though

gkm marked 14 inline comments as done.Dec 18 2020, 12:26 AM
gkm added inline comments.
lld/MachO/UnwindInfoSection.cpp
160–164

No. std::pair::operator>() collates primarily on .first and falls back on .second. Here, we need .second as the primary and .first as the fallback.

167–227

This is good thinkin'! The Chrome bug that motivated this diff has inputs with approx 380 common encodings. IMO, filtering with quickselect (1) offers an insignificant micro-efficiency, and (2) is possibly more expensive than sort alone.

205

This is a valid comment for handling the regular encoding format a bit further down in the function. Here, we are still handling compact encoding, and specifically the case where we need to extend the local encoding table by 1 word, and the compact entry list by 1 word.

215

When i == cuPtrVector.size(), the final page was filled using the compact format, so it is irrelevant to check if we might fit more entries using the regular format.

222–223

s/2 * 2 * sizeof(uint32_t)/sizeof(unwind_info_regular_second_level_entry)/ is appropriate here.

235–237

The first level index contains a sentinel.

lld/MachO/UnwindInfoSection.h
41–46

find() returns an iterator at key.
lookup() returns the value at key, and is simpler for my purposes.

75–76

I did this here. Vertical space is reduced by an underwhelming 2 lines, but horizontal line density is also lower, so on balance it seems an improvement.

gkm updated this revision to Diff 312705.Dec 18 2020, 12:46 AM
gkm marked 7 inline comments as done.
  • revise according to review feedback
int3 added inline comments.Dec 18 2020, 6:29 AM
lld/MachO/UnwindInfoSection.cpp
215

ah, and if the final page can be filled with the compact format, it will always end up using fewer bytes than if it had used the regular format, right?

235–237

would be nice to have a conment

lld/MachO/UnwindInfoSection.h
41–46

I don't understand how it is simpler, since it looks like you always have to check if the returned value is a sentinel value -- just as you'd have to check if find() had returned a valid iterator. But it seems that using find() would remove the need for size_tx.

52

valid lint

gkm marked 3 inline comments as done.Dec 18 2020, 11:29 PM
gkm added inline comments.
lld/MachO/UnwindInfoSection.cpp
215

Yes, equal or fewer.

lld/MachO/UnwindInfoSection.h
41–46

Aside from the crufty helper class size_tx, I thought ...

  • The current approach has a simpler access procedure:
    1. call .lookup() to receive the desired .second of the DenseMap's pair
    2. which we can compare with the default constructor size_tx() to detect "not found"
  • With the iterator there are more steps:
    1. call .find() on to return an iterator
    2. compare iterator to .end() for "not found"
    3. if found, dereference the iterator to return a pair
    4. access .second to get the encoding-table index

I rewrote using .find() and it came-out nicer than anticipated, so I dropped size_tx and .lookup().

gkm updated this revision to Diff 312920.Dec 18 2020, 11:33 PM
gkm marked 2 inline comments as done.
  • rewrite encoding-index maps to use .find() rather than .lookup() so we can remove the crufty size_tx
int3 accepted this revision.Dec 19 2020, 12:14 PM

lgtm

lld/MachO/UnwindInfoSection.cpp
197

wouldn't hurt to address this lint

200–203

could use count() instead of find() here

318–319
This revision is now accepted and ready to land.Dec 19 2020, 12:14 PM
gkm marked 3 inline comments as done.Dec 19 2020, 2:53 PM
This revision was landed with ongoing or failed builds.Dec 19 2020, 2:54 PM
This revision was automatically updated to reflect the committed changes.
gkm added a comment.Dec 19 2020, 8:30 PM

I fixed the clang-tidy failures before landing. Unfortunately, I neglected to update the diff with the final version, which is why you see the notice that I landed with ongoing build failures. Oops. :(