This is an archive of the discontinued LLVM Phabricator instance.

[MathExtras] Add alignToPowerOf2
Needs Review · Public

Authored by MaskRay on May 17 2018, 12:12 PM.


Details

Reviewers

jlebar
ruiu

Summary

When Align is a variable known to be a power of 2, this can be faster than alignTo because it replaces a div instruction with a bitwise operation.
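A minimal sketch of the idea, not the patch itself: `alignToDiv` and `alignToMask` are hypothetical stand-ins for `alignTo` and the proposed `alignToPowerOf2`, written without LLVM headers so the snippet compiles on its own. For a power-of-two `Align`, rounding up by divide-and-multiply and rounding up by masking should give the same result. The mask is spelled `~(Align - 1)` here, which is the same value as the `-Align` used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo: rounds up with a divide and a
// multiply, and works for any non-zero Align.
inline uint64_t alignToDiv(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "Align must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical stand-in for the proposed alignToPowerOf2. For a power-of-two
// Align, ~(Align - 1) has every bit set except the low log2(Align) bits, so
// the AND clears the remainder after the round-up addition.
inline uint64_t alignToMask(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of 2");
  return (Value + Align - 1) & ~(Align - 1);
}

// Returns true if both versions agree for every Value below Limit.
inline bool agreeUpTo(uint64_t Limit, uint64_t Align) {
  for (uint64_t V = 0; V < Limit; ++V)
    if (alignToDiv(V, Align) != alignToMask(V, Align))
      return false;
  return true;
}
```

For example, `alignToMask(13, 8)` and `alignToDiv(13, 8)` both return 16, while a value that is already aligned, such as 16, is returned unchanged.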

Diff Detail

Repository

rL LLVM

Build Status

Buildable 18278
Build 18278: arc lint + arc unit

Event Timeline

MaskRay created this revision. May 17 2018, 12:12 PM

Harbormaster completed remote builds in B18278: Diff 147367. May 17 2018, 12:12 PM

Herald added a subscriber: llvm-commits. May 17 2018, 12:12 PM

I wonder if this actually makes a measurable difference. Where are you planning to use this function?

It may be applicable in some places in LLD, e.g.

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L487

void EhFrameSection::finalizeContents() {
...
  for (CieRecord *Rec : CieRecords) {
...
    Off += alignTo(Rec->Cie->Size, Config->Wordsize);

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2536

void MergeNoTailSection::finalizeContents() {
...
  for (size_t I = 0; I < NumShards; ++I) {
...
      Off = alignTo(Off, Alignment);
    ShardOffsets[I] = Off;

Yeah, but we don't usually optimize unless optimizing it makes things measurably faster, so I wonder if it will be effective.

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2517

I found this while reading how SHF_MERGE sections are merged. Why does the code claim that a power of 2 makes the parallelism more efficient?

  // Concurrency level. Must be a power of 2 to avoid expensive modulo
  // operations in the following tight loop.
  size_t Concurrency = 1;
  if (ThreadsEnabled)
    Concurrency =
        std::min<size_t>(PowerOf2Floor(hardware_concurrency()), NumShards);

  // Add section pieces to the builders.
  parallelForEachN(0, Concurrency, [&](size_t ThreadId) {
    for (MergeInputSection *Sec : Sections) {
      for (size_t I = 0, E = Sec->Pieces.size(); I != E; ++I) {
        size_t ShardId = getShardId(Sec->Pieces[I].Hash);
        // <-- Is this bitwise operation effective?
        if ((ShardId & (Concurrency - 1)) == ThreadId && Sec->Pieces[I].Live)
          Sec->Pieces[I].OutputOff = Shards[ShardId].add(Sec->getData(I));
      }
    }
  });

In D47024#1104598, @MaskRay wrote:

https://github.com/llvm-mirror/lld/tree/master/ELF/SyntheticSections.cpp#L2517

I don't think that code computes an aligned value. Rather, it computes a modulo.
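To illustrate the distinction, here is a small sketch of the modulo-by-mask idiom used in that loop. The helper names `ownerByMask` and `ownerByModulo` are hypothetical and do not appear in LLD; the sketch only assumes, as the code comment states, that the concurrency level is a power of 2.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: picks the thread that owns a shard. When Concurrency
// is a power of two, masking with Concurrency - 1 keeps only the low bits,
// which is the same value as ShardId % Concurrency.
inline std::size_t ownerByMask(std::size_t ShardId, std::size_t Concurrency) {
  assert(Concurrency != 0 && (Concurrency & (Concurrency - 1)) == 0 &&
         "Concurrency must be a power of 2");
  return ShardId & (Concurrency - 1);
}

// Reference version using the modulo operator; valid for any non-zero
// Concurrency, but it may compile to a division when the value is not a
// compile-time constant.
inline std::size_t ownerByModulo(std::size_t ShardId, std::size_t Concurrency) {
  assert(Concurrency != 0 && "Concurrency must be non-zero");
  return ShardId % Concurrency;
}
```

Unlike alignment, which rounds a value up to a multiple, this computes a remainder: with a concurrency of 4, shard 13 maps to thread 1 in both versions.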

Sorry for being unclear. I mean that code also wants to avoid a division instruction. Is that because the symbols are iterated NumSymbols*Concurrency times, so the saving is significant there, whereas other alignment improvements may be unnoticeable?

In D47024#1104843, @MaskRay wrote:

Sorry for being unclear. I mean that code also wants to avoid a division instruction. Is that because the symbols are iterated NumSymbols*Concurrency times, so the saving is significant there, whereas other alignment improvements may be unnoticeable?

Well, I don't know the answer, and I don't think I should make a guess whether some optimization is effective or not, as the axiom "don't guess, measure!" says. If you believe that that could make a difference, you should just run a benchmark with and without your change.

(But before you start optimizing something, you want to make sure that the code you are trying to optimize is a bottleneck. If the code takes only 1% of the total execution time, speeding it up by 1% improves the total execution time by only 0.01%, which is probably too small to measure.)
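The arithmetic in the parenthetical can be written out as a one-line helper. This is only an illustration; `totalTimeSaved` and its parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the total run time saved when a region that accounts for
// `Fraction` of the run time has its own time reduced by `LocalGain`.
// Both arguments are fractions in the range 0..1.
inline double totalTimeSaved(double Fraction, double LocalGain) {
  return Fraction * LocalGain;
}
```

With a region that takes 1% of the run time and a 1% local gain, `totalTimeSaved(0.01, 0.01)` is 0.0001, that is, 0.01% of the total.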

That particular code you pointed out was chosen as a result of benchmarking. We generally don't want to micro-optimize code like that, but since it proved to be effective, we did.
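One way to follow the "measure" advice is a quick micro-benchmark before timing a full link. The sketch below is an assumption-laden illustration rather than the benchmark used for LLD: `roundUpDiv`, `roundUpMask`, and `timeLoop` are hypothetical names, and a loop like this says nothing about whether the call sites are a bottleneck in a real link.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins for the two rounding strategies under discussion.
inline uint64_t roundUpDiv(uint64_t V, uint64_t A) {
  return (V + A - 1) / A * A;
}
inline uint64_t roundUpMask(uint64_t V, uint64_t A) {
  return (V + A - 1) & ~(A - 1);
}

// Applies Fn to the values 0..N-1, returns the sum of the results as a
// checksum, and stores the elapsed time in nanoseconds in *Nanos. The
// alignment is read through a volatile so the compiler cannot treat it as a
// compile-time constant and turn the division into a shift on its own.
template <typename F>
uint64_t timeLoop(F Fn, uint64_t N, uint64_t AlignIn, long long *Nanos) {
  volatile uint64_t Opaque = AlignIn;
  uint64_t Align = Opaque;
  auto Start = std::chrono::steady_clock::now();
  uint64_t Sum = 0;
  for (uint64_t I = 0; I < N; ++I)
    Sum += Fn(I, Align);
  auto End = std::chrono::steady_clock::now();
  *Nanos =
      std::chrono::duration_cast<std::chrono::nanoseconds>(End - Start).count();
  return Sum;
}
```

Calling `timeLoop(roundUpDiv, N, 8, &T1)` and `timeLoop(roundUpMask, N, 8, &T2)` should return the same checksum, and comparing `T1` with `T2` over several runs gives a rough idea of the per-call difference. The measurement that matters for this review is still total link time with and without the change.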

Revision Contents

Path                                    Size
include/llvm/Support/MathExtras.h       6 lines

Diff 147367

include/llvm/Support/MathExtras.h


  /// Returns the next integer (mod 2**64) that is greater than or equal to
  /// \p Value and is a multiple of \c Align. \c Align must be non-zero.
  template <uint64_t Align> constexpr inline uint64_t alignTo(uint64_t Value) {
    static_assert(Align != 0u, "Align must be non-zero");
    return (Value + Align - 1) / Align * Align;
  }

+ inline uint64_t alignToPowerOf2(uint64_t Value, uint64_t Align) {
+   assert(Align != 0 && (Align & Align - 1) == 0 &&
+          "Align must be a power of 2");
+   return (Value + Align - 1) & -Align;
+ }
+
  /// Returns the integer ceil(Numerator / Denominator).
  inline uint64_t divideCeil(uint64_t Numerator, uint64_t Denominator) {
    return alignTo(Numerator, Denominator) / Denominator;
  }

  /// \c alignTo for contexts where a constant expression is required.
  /// \sa alignTo
  ///