When Align is a variable known to be a power of 2, this can be faster than alignTo by replacing a div instruction with bitwise operation.
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 18278 Build 18278: arc lint + arc unit
Event Timeline
I wonder if this actually makes a measurable difference. Where are you planning to use this function?
They may be applicable in some places of LLD, e.g.
https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L487
void EhFrameSection::finalizeContents() { ... for (CieRecord *Rec : CieRecords) { ... Off += alignTo(Rec->Cie->Size, Config->Wordsize);
https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2536
void MergeNoTailSection::finalizeContents() { ... for (size_t I = 0; I < NumShards; ++I) { ... Off = alignTo(Off, Alignment); ShardOffsets[I] = Off;
Yeah, but we don't usually optimize unless optimizing it makes things measurably faster, so I wonder if it will be effective.
https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2517
I found this when I was reading how SHF_MERGE sections are merged. Why does the code claim that power of 2 makes the parallelism more efficient?
// Concurrency level. Must be a power of 2 to avoid expensive modulo // operations in the following tight loop. size_t Concurrency = 1; if (ThreadsEnabled) Concurrency = std::min<size_t>(PowerOf2Floor(hardware_concurrency()), NumShards); // Add section pieces to the builders. parallelForEachN(0, Concurrency, [&](size_t ThreadId) { for (MergeInputSection *Sec : Sections) { for (size_t I = 0, E = Sec->Pieces.size(); I != E; ++I) { size_t ShardId = getShardId(Sec->Pieces[I].Hash); ///////////////////////////////////// it this bitwise operation effective? if ((ShardId & (Concurrency - 1)) == ThreadId && Sec->Pieces[I].Live) Sec->Pieces[I].OutputOff = Shards[ShardId].add(Sec->getData(I)); } } });
Sorry for being unclear. I mean that code also wants to avoid a division instruction. Is that because the symbols are iterated NumSymbols*Concurrency times to the saving is significant but other alignment improvement may be unnoticeable?
Well, I don't know the answer, and I don't think I should make a guess whether some optimization is effective or not, as the axiom "don't guess, measure!" says. If you believe that that could make a difference, you should just run a benchmark with and without your change.
(But before start optimizing something, you want to make sure that the code you are trying to optimize is a bottleneck. If the code takes only 1% of total execution time, optimizing it by 1% improves the total execution time only by 0.01% which is perhaps too small to measure.)
That particular code you pointed out was chosen as a result of benchmarking. We generally don't want to micro-optimize code like that, but since if it was proved to be effective, we did that.