When Align is a variable known to be a power of 2, this can be faster than alignTo by replacing a div instruction with bitwise operation.
Diff Detail
- Repository
- rL LLVM
- Build Status
Buildable 18278 Build 18278: arc lint + arc unit
Event Timeline
I wonder if this actually makes a measurable difference. Where are you planning to use this function?
They may be applicable in some places of LLD, e.g.
https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L487
void EhFrameSection::finalizeContents() {
...
for (CieRecord *Rec : CieRecords) {
...
Off += alignTo(Rec->Cie->Size, Config->Wordsize);https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2536
void MergeNoTailSection::finalizeContents() {
...
for (size_t I = 0; I < NumShards; ++I) {
...
Off = alignTo(Off, Alignment);
ShardOffsets[I] = Off;Yeah, but we don't usually optimize unless optimizing it makes things measurably faster, so I wonder if it will be effective.
https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2517
I found this when I was reading how SHF_MERGE sections are merged. Why does the code claim that power of 2 makes the parallelism more efficient?
// Concurrency level. Must be a power of 2 to avoid expensive modulo
// operations in the following tight loop.
size_t Concurrency = 1;
if (ThreadsEnabled)
Concurrency =
std::min<size_t>(PowerOf2Floor(hardware_concurrency()), NumShards);
// Add section pieces to the builders.
parallelForEachN(0, Concurrency, [&](size_t ThreadId) {
for (MergeInputSection *Sec : Sections) {
for (size_t I = 0, E = Sec->Pieces.size(); I != E; ++I) {
size_t ShardId = getShardId(Sec->Pieces[I].Hash);
///////////////////////////////////// it this bitwise operation effective?
if ((ShardId & (Concurrency - 1)) == ThreadId && Sec->Pieces[I].Live)
Sec->Pieces[I].OutputOff = Shards[ShardId].add(Sec->getData(I));
}
}
});Sorry for being unclear. I mean that code also wants to avoid a division instruction. Is that because the symbols are iterated NumSymbols*Concurrency times to the saving is significant but other alignment improvement may be unnoticeable?
Well, I don't know the answer, and I don't think I should make a guess whether some optimization is effective or not, as the axiom "don't guess, measure!" says. If you believe that that could make a difference, you should just run a benchmark with and without your change.
(But before start optimizing something, you want to make sure that the code you are trying to optimize is a bottleneck. If the code takes only 1% of total execution time, optimizing it by 1% improves the total execution time only by 0.01% which is perhaps too small to measure.)
That particular code you pointed out was chosen as a result of benchmarking. We generally don't want to micro-optimize code like that, but since if it was proved to be effective, we did that.