This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Simplify getSectionPiece
ClosedPublic

Authored by MaskRay on Dec 3 2018, 8:37 PM.

Event Timeline

MaskRay created this revision.Dec 3 2018, 8:37 PM
grimar added a subscriber: grimar.Dec 4 2018, 1:54 AM

I think this kills the additional optimization we have here. Unfortunately, I do not remember
how much improvement this piece had in comparison with the simple code as your patch has,
but this was the code initially implemented by Rafael in r284594, which comment says:

Add a faster binary search.

Even with the hash table cache, binary search was still pretty
hot. This can be made even faster with prefetching.

Idea from http://cglab.ca/~morin/misc/arraylayout-v2/

I will suggest moving this to llvm.

I do not know if that was moved to llvm (or if that what we might want atm, if not).

I think this kills the additional optimization we have here. Unfortunately, I do not remember
how much improvement this piece had in comparison with the simple code as your patch has,
but this was the code initially implemented by Rafael in r284594, which comment says:

Add a faster binary search.

Even with the hash table cache, binary search was still pretty
hot. This can be made even faster with prefetching.

Idea from http://cglab.ca/~morin/misc/arraylayout-v2/

I will suggest moving this to llvm.

I do not know if that was moved to llvm (or if that what we might want atm, if not).

The optimization is that by using a simple conditional assignment, the compilers will know to generate conditional move (branch-less binary search as the paper says)

// either one below generates conditional mov
if (Pieces[Idx + H].InputOff <= Offset)
 Idx += H;

Idx = Pieces[Idx + H].InputOff <= Offset ? Idx + H : Idx;

// but std::upper_bound doesn't

The (negligible) disadvantage of this approach is that in each iteration, it reduces the range size from n to ceil(n/2), instead of floor(n/2) as is the case of std::upper_bound (which produces the not; add instruction sequence to do size -= H+1).

But note OffsetMap.find(Offset) also takes some time. The improvement here is negligible to me..

A static -DCMAKE_BUILD_TYPE=Debug build of bin/clang-8:

perf stat -r 10 ~/llvm/Release/bin/ld.lld @response.txt -o /dev/null => The times vary from 4.5435 to 4.7403 with or without this optimization. The differences are purely noise.. I cannot find the (very minor) optimization (branch instruction -> conditional move) improve the linking time.

ruiu added a comment.Dec 4 2018, 1:30 PM

Yeah, I think with an optimizing compiler you cannot see any difference between the old and the new code.

ELF/InputSection.cpp
1226

Can you add llvm:: to upper_bound so that that looks obviously different from std::upper_bound?

1226–1228

Calculating a distance between two iterators to use it as an array access index seems a bit awkward.

MaskRay updated this revision to Diff 176709.Dec 4 2018, 1:53 PM

Add llvm::

ruiu accepted this revision.Dec 4 2018, 2:26 PM

LGTM

ELF/InputSection.cpp
1230

You can remove this assert as it doesn't make much sense anymore.

This revision is now accepted and ready to land.Dec 4 2018, 2:26 PM
This revision was automatically updated to reflect the committed changes.