This is an archive of the discontinued LLVM Phabricator instance.

[MathExtras] Add alignToPowerOf2
Needs ReviewPublic

Authored by MaskRay on May 17 2018, 12:12 PM.

Details

Reviewers
jlebar
ruiu
Summary

When Align is a variable known to be a power of 2, this can be faster than alignTo by replacing a div instruction with bitwise operation.

Diff Detail

Event Timeline

MaskRay created this revision.May 17 2018, 12:12 PM
ruiu added a comment.May 17 2018, 12:47 PM

I wonder if this actually makes a measurable difference. Where are you planning to use this function?

They may be applicable in some places of LLD, e.g.

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L487

void EhFrameSection::finalizeContents() {
...
  for (CieRecord *Rec : CieRecords) {
...
    Off += alignTo(Rec->Cie->Size, Config->Wordsize);

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2536

void MergeNoTailSection::finalizeContents() {
...
  for (size_t I = 0; I < NumShards; ++I) {
...
      Off = alignTo(Off, Alignment);
    ShardOffsets[I] = Off;
ruiu added a comment.May 17 2018, 12:59 PM

Yeah, but we don't usually optimize unless optimizing it makes things measurably faster, so I wonder if it will be effective.

MaskRay added a comment.EditedMay 18 2018, 8:42 AM

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2517

I found this when I was reading how SHF_MERGE sections are merged. Why does the code claim that power of 2 makes the parallelism more efficient?

  // Concurrency level. Must be a power of 2 to avoid expensive modulo
  // operations in the following tight loop.
  size_t Concurrency = 1;
  if (ThreadsEnabled)
    Concurrency =
        std::min<size_t>(PowerOf2Floor(hardware_concurrency()), NumShards);

  // Add section pieces to the builders.
  parallelForEachN(0, Concurrency, [&](size_t ThreadId) {
    for (MergeInputSection *Sec : Sections) {
      for (size_t I = 0, E = Sec->Pieces.size(); I != E; ++I) {
        size_t ShardId = getShardId(Sec->Pieces[I].Hash);
///////////////////////////////////// it this bitwise operation effective?
        if ((ShardId & (Concurrency - 1)) == ThreadId && Sec->Pieces[I].Live)
          Sec->Pieces[I].OutputOff = Shards[ShardId].add(Sec->getData(I));
      }
    }
  });
ruiu added a comment.May 18 2018, 10:40 AM

I don't think that code computes an aligned value. Rather, it computes a modulo.

Sorry for being unclear. I mean that code also wants to avoid a division instruction. Is that because the symbols are iterated NumSymbols*Concurrency times to the saving is significant but other alignment improvement may be unnoticeable?

ruiu added a comment.May 18 2018, 11:08 AM

Sorry for being unclear. I mean that code also wants to avoid a division instruction. Is that because the symbols are iterated NumSymbols*Concurrency times to the saving is significant but other alignment improvement may be unnoticeable?

Well, I don't know the answer, and I don't think I should make a guess whether some optimization is effective or not, as the axiom "don't guess, measure!" says. If you believe that that could make a difference, you should just run a benchmark with and without your change.

(But before start optimizing something, you want to make sure that the code you are trying to optimize is a bottleneck. If the code takes only 1% of total execution time, optimizing it by 1% improves the total execution time only by 0.01% which is perhaps too small to measure.)

That particular code you pointed out was chosen as a result of benchmarking. We generally don't want to micro-optimize code like that, but since if it was proved to be effective, we did that.