
Details

Reviewers

lebedev.ri
spatel

Commits

rOLDT369707: [MemFunctions] Add microbenchmarks for memory functions.
rL369707: [MemFunctions] Add microbenchmarks for memory functions.

Summary

Memory functions (memcmp, memcpy, ...) are typically recognized by the
compiler and expanded to specific asm patterns when the size is known at
compile time.

These benchmarks will help catch regressions in those expansions.

Right now we're only testing memcmp (see context in D60318).
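The summary's point is that a compile-time-constant length lets an optimizing compiler replace a `memcmp` call with a short run of loads and compares. This sketch contrasts a fixed-size comparison with a run-time-size one. `EqualFixed` and `EqualDynamic` are hypothetical names. Whether the fixed-size call actually gets expanded depends on the compiler, the target and the optimization level, so the generated assembly is the place to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Size fixed at compile time: optimizing compilers commonly expand this call
// into a few loads and compares (whether they do depends on target and kSize).
template <std::size_t kSize>
bool EqualFixed(const char* p, const char* q) {
  return std::memcmp(p, q, kSize) == 0;
}

// Size known only at run time: typically compiled to a call into the C
// library's memcmp.
inline bool EqualDynamic(const char* p, const char* q, std::size_t n) {
  assert(p != nullptr && q != nullptr);
  return std::memcmp(p, q, n) == 0;
}
```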

Diff Detail

Repository

rOLDT svn-test-suite

Build Status

Buildable 34202
Build 34201: arc lint + arc unit

Event Timeline

courbet created this revision. Jul 2 2019, 8:51 AM

Herald added a project: Restricted Project. Jul 2 2019, 8:51 AM

Herald added a subscriber: mgorny.

Harbormaster completed remote builds in B34202: Diff 207570. Jul 2 2019, 8:53 AM

Thanks for working on this! I'm not familiar with how the benchmarking framework works, so someone else should definitely have a look.

Does the framework automatically account for and filter out noisy results? I'm guessing that tiny memcmp() will have a lot of run-to-run variation.

In D64082#1566955, @spatel wrote:

Thanks for working on this! I'm not familiar with how the benchmarking framework works, so someone else should definitely have a look.

That could be me, I suppose.
Does not look too wrong to me.

Does the framework automatically account for and filter out noisy results? I'm guessing that tiny memcmp() will have a lot of run-to-run variation.

The `for (auto _ : state)` loop will run for up to 0.5 sec, so the results should be good.
Plus, at the test-suite/LNT level, the test suite can be run several times, so it's possible to ensure that timing changes are meaningful (e.g. with a U test).
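The "U test" here is presumably the Mann-Whitney U test, which compares two samples without assuming they are normally distributed. The sketch below assumes each sample is a list of per-run timings for one benchmark (for example, before and after a change). `MannWhitneyU` is a hypothetical name, and this is not LNT's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mann-Whitney U statistic of sample a against sample b: the number of pairs
// (x, y), x from a and y from b, with x > y, where ties count as 1/2.
// U ranges from 0 to a.size() * b.size(). This O(n*m) pairwise form keeps the
// sketch short; converting U into a p-value is omitted.
inline double MannWhitneyU(const std::vector<double>& a,
                           const std::vector<double>& b) {
  assert(!a.empty() && !b.empty());
  double u = 0.0;
  for (double x : a) {
    for (double y : b) {
      if (x > y) {
        u += 1.0;
      } else if (x == y) {
        u += 0.5;
      }
    }
  }
  return u;
}
```

A U close to 0 or to `a.size() * b.size()` means one set of timings is consistently lower or higher than the other. A value near the middle suggests neither sample tends to be larger.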

MicroBenchmarks/MemFunctions/main.cpp
34–38	So what about `q`? Is it intentionally left all zeros? That warrants a comment.
82–94	I'd do one or two macro levels here
111–118	BENCHMARK_MAIN();
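The "one or two macro levels" suggestion could mean generating the twelve `BENCHMARK_TEMPLATE` lines per size from nested macros. In this sketch, registration is replaced by a hypothetical `Registry()`/`Registrar` pair that only records case names, so it stays self-contained. The sketch also relies on `__COUNTER__`, which is a widely supported compiler extension (GCC, Clang, MSVC) rather than standard C++.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for benchmark registration: records case names only.
inline std::vector<std::string>& Registry() {
  static std::vector<std::string> names;
  return names;
}

struct Registrar {
  explicit Registrar(const char* name) {
    assert(name != nullptr);
    Registry().push_back(name);
  }
};

#define CONCAT_IMPL(a, b) a##b
#define CONCAT(a, b) CONCAT_IMPL(a, b)

// Level 0: one (size, predicate, modifier) case; __COUNTER__ gives each
// registrar object a unique name.
#define MEMCMP_CASE(size, pred, mod) \
  static Registrar CONCAT(memcmp_registrar_, __COUNTER__)(#size "/" #pred "/" #mod);

// Level 1: every modifier for one predicate.
#define MEMCMP_PRED(size, pred)  \
  MEMCMP_CASE(size, pred, None)  \
  MEMCMP_CASE(size, pred, First) \
  MEMCMP_CASE(size, pred, Mid)   \
  MEMCMP_CASE(size, pred, Last)

// Level 2: every predicate for one size (12 cases per size).
#define MEMCMP_BENCHMARK(size)    \
  MEMCMP_PRED(size, EqZero)       \
  MEMCMP_PRED(size, LessThanZero) \
  MEMCMP_PRED(size, GreaterThanZero)

MEMCMP_BENCHMARK(1)
MEMCMP_BENCHMARK(64)
```

Each added predicate or modifier then costs one line instead of one line per size.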

Address review comments.

Thanks Roman.

MicroBenchmarks/MemFunctions/main.cpp
111–118	Thanks for the pointer.

In D64082#1566955, @spatel wrote:

Thanks for working on this! I'm not familiar with how the benchmarking framework works, so someone else should definitely have a look.

Does the framework automatically account for and filter out noisy results? I'm guessing that tiny memcmp() will have a lot of run-to-run variation.

The framework grows the number of iterations until the measurements stabilize. This is usually sufficient. However, it does not do statistical significance testing for you (which is what I did in the attached PDF, just to be sure).
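Here is a simplified sketch of the iteration-growing idea described above. It assumes a plain minimum-run-time stopping rule and doubling growth. This is not Google Benchmark's actual algorithm, and `RunUntilMinTime` and `Measurement` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Result of one adaptive measurement: how many times the body ran in the final
// batch, and how long that batch took.
struct Measurement {
  std::uint64_t iterations;
  std::chrono::nanoseconds elapsed;
};

// Runs `body` in batches of n iterations, doubling n until a batch takes at
// least `min_time` (or n reaches `max_iterations`). Only the final batch is
// reported, so the per-iteration time is elapsed / iterations.
template <typename Body>
Measurement RunUntilMinTime(Body body, std::chrono::nanoseconds min_time,
                            std::uint64_t max_iterations = 1000000000) {
  assert(max_iterations >= 1);
  std::uint64_t n = 1;
  for (;;) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) body();
    const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    if (elapsed >= min_time || n >= max_iterations) return {n, elapsed};
    // Grow geometrically, without overshooting max_iterations.
    n = (n > max_iterations / 2) ? max_iterations : n * 2;
  }
}
```

Timing a whole batch and dividing by `iterations` amortizes clock overhead. That matters when a single call, such as a small inlined `memcmp`, takes only a few nanoseconds.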

Harbormaster completed remote builds in B34257: Diff 207732. Jul 3 2019, 2:36 AM

courbet added a reviewer: lebedev.ri. Jul 3 2019, 2:36 AM

lebedev.ri added inline comments. Jul 3 2019, 2:48 AM

MicroBenchmarks/MemFunctions/main.cpp
44	This may be paranoia, but I'm not sure this is sufficient to guarantee that the compiler can't just look into `p`/`q`. I'd suggest adding this here: `benchmark::DoNotOptimize(p); benchmark::DoNotOptimize(q); benchmark::ClobberMemory();` (I see that you already do that for the `std::vector<char>`s, but by then you have already acquired `_storage.data()`.)

courbet marked an inline comment as done. Jul 3 2019, 3:50 AM

courbet added inline comments.

MicroBenchmarks/MemFunctions/main.cpp
44	Sounds reasonable. I've even moved the ClobberMemory inside the call (and verified that benchmark numbers do not change).
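This is some context on why pinning `p`/`q` matters. On GCC and Clang, `benchmark::DoNotOptimize` and `benchmark::ClobberMemory` work by emitting an empty inline-asm statement that the optimizer must treat as opaque. The sketch below re-creates that idea under hypothetical names (`DoNotOptimizeSketch`, `ClobberMemorySketch`, `CountEqualBytes`). It assumes a GCC/Clang-compatible compiler and is not the library's exact code.

```cpp
#include <cassert>

// Makes the compiler assume `value` is read by opaque code, so whatever
// produced it cannot be dropped. The "memory" clobber also makes it assume
// that any memory may have been read or written. No instruction is emitted.
template <typename T>
inline void DoNotOptimizeSketch(T const& value) {
  asm volatile("" : : "m"(value) : "memory");
}

// Makes the compiler assume all memory may have been read and written, so it
// cannot cache loaded values across this point or discard earlier stores.
inline void ClobberMemorySketch() { asm volatile("" : : : "memory"); }

// Usage in the spirit of the review: hide the raw pointers themselves (not
// only the owning containers) before the measured work.
inline int CountEqualBytes(const char* p, const char* q, int n) {
  assert(n >= 0);
  DoNotOptimizeSketch(p);
  DoNotOptimizeSketch(q);
  ClobberMemorySketch();
  int equal = 0;
  for (int i = 0; i < n; ++i) equal += (p[i] == q[i]) ? 1 : 0;
  DoNotOptimizeSketch(equal);
  return equal;
}
```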

Be even less permissive about what we allow the compiler to see.

Harbormaster completed remote builds in B34265: Diff 207750. Jul 3 2019, 3:50 AM

lebedev.ri added inline comments. Jul 3 2019, 5:08 AM

MicroBenchmarks/MemFunctions/main.cpp
30	Magical constant. I'm guessing that by `4096` you limit the maximal size of the `p` and `q` buffers, implying that they should fit into the L1 cache? Do you want to use the actual L1 size instead? Otherwise: `static constexpr size_t kMaxBufSizeBytes = 4096; constexpr size_t kNumElements = kMaxBufSizeBytes / kSize;`

Name the magic constant.

Harbormaster completed remote builds in B34271: Diff 207762. Jul 3 2019, 5:10 AM

Add comment for buffer size.

Harbormaster completed remote builds in B34272: Diff 207763. Jul 3 2019, 5:12 AM

courbet marked an inline comment as done. Jul 3 2019, 5:13 AM

courbet added inline comments.

MicroBenchmarks/MemFunctions/main.cpp
30	It's a combination of things, caching among them. But you're right that this warrants a comment. Done.
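This sketch shows the named constant from the review and the element count derived from it. It uses the `kMaxBufSizeBytes` name suggested by lebedev.ri. `NumElements` and `BufferBytes` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Total bytes per buffer, named as suggested in the review (value from the diff).
static constexpr std::size_t kMaxBufSizeBytes = 4096;

// Compile-time element count for kSize-byte elements, as used to size p and q.
// When kSize does not divide 4096, the remainder bytes are simply not used.
template <std::size_t kSize>
constexpr std::size_t NumElements() {
  static_assert(kSize > 0 && kSize <= kMaxBufSizeBytes, "invalid element size");
  return kMaxBufSizeBytes / kSize;
}

// Run-time counterpart: bytes actually occupied by elements of `size` bytes,
// i.e. what p_storage.size() would be in the benchmark.
inline std::size_t BufferBytes(std::size_t size) {
  assert(size > 0 && size <= kMaxBufSizeBytes);
  return (kMaxBufSizeBytes / size) * size;
}

static_assert(NumElements<1>() == 4096, "one-byte elements fill the buffer");
static_assert(NumElements<3>() == 1365, "4096 / 3 rounds down");
```

With this bound, `p` and `q` together occupy at most 8 KiB, which fits in the L1 data cache of most current CPUs. Using the actual L1 size instead would require a platform-specific query at run time.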

Looks OK to me from the benchmark perspective, but here are some more thoughts about the benchmark itself.

MicroBenchmarks/MemFunctions/main.cpp
44	Nice.
51	I think I'm forgetting about some magic. All the predicates (`EqZero`, ...) take a single argument; how does this work if it passes two args?
59–67	To be noted, none of these is the actual `memcmp`, I think?
60	Does it matter that these take `int` while you always pass `char`?

This revision is now accepted and ready to land. Jul 3 2019, 5:19 AM

Clarify top comment.

MicroBenchmarks/MemFunctions/main.cpp
51	I think you missed that the result of calling memcmp is passed to `Pred`. `Pred` just defines which of `==`, `<` or `>` we're benchmarking. I updated the benchmark comment to make that clearer.
59–67	See my comment above.
60	See my comment above.
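Courbet's answer is that each predicate receives `memcmp`'s `int` result, never the `char` buffers. That also settles the `int` versus `char` question. This minimal sketch uses the same predicate shapes as main.cpp. `CompareWith` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>

// Same predicate shapes as in main.cpp: each one receives memcmp's int result.
struct EqZero {
  bool operator()(int v) const { return v == 0; }
};
struct LessThanZero {
  bool operator()(int v) const { return v < 0; }
};
struct GreaterThanZero {
  bool operator()(int v) const { return v > 0; }
};

// Compares kSize bytes of p and q, then asks Pred about the sign of the result.
template <int kSize, typename Pred>
bool CompareWith(const char* p, const char* q) {
  assert(p != nullptr && q != nullptr);
  return Pred()(std::memcmp(p, q, kSize));
}
```

`memcmp` compares bytes as `unsigned char`. So the 0x80 byte written by `First`/`Mid`/`Last` makes `p` compare greater than the all-zero `q`, whether plain `char` is signed or not. In this setup, `LessThanZero` is therefore always false.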

Harbormaster completed remote builds in B34273: Diff 207765. Jul 3 2019, 5:29 AM

lebedev.ri marked an inline comment as done. Jul 3 2019, 5:33 AM

lebedev.ri added inline comments.

MicroBenchmarks/MemFunctions/main.cpp
51	Oh, I see, that explains it, thanks!

lg too

Thanks!

Closed by commit rL369707: [MemFunctions] Add microbenchmarks for memory functions. (authored by courbet). Aug 22 2019, 2:24 PM

This revision was automatically updated to reflect the committed changes.

Diff 207570

MicroBenchmarks/CMakeLists.txt

file(COPY lit.local.cfg DESTINATION ${CMAKE_CURRENT_BINARY_DIR})

add_subdirectory(libs)
add_subdirectory(XRay)
add_subdirectory(LCALS)
add_subdirectory(harris)
add_subdirectory(ImageProcessing)
add_subdirectory(LoopInterchange)
+add_subdirectory(MemFunctions)

MicroBenchmarks/MemFunctions/CMakeLists.txt

This file was added.

llvm_test_run(WORKDIR ${CMAKE_CURRENT_BINARY_DIR})

llvm_test_executable(MemFunctions main.cpp)

target_link_libraries(MemFunctions benchmark)

MicroBenchmarks/MemFunctions/main.cpp

This file was added.

//===- main.cc - Memory Functions Benchmarks ------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Memory functions (memcmp, memcpy, ...) are typically recognized by the
// compiler and expanded to specific asm patterns when the size is known at
// compile time. These microbenchmarks help catch potential CodeGen regressions.
//
// Note that these microbenchmarks do not represent a typical real-life
// situation. They are designed to test the LLVM CodeGen. In particular,
// real-life applications will typically be memory- rather than compute-bound
// when manipulating memory.
//
//===----------------------------------------------------------------------===//

#include "benchmark/benchmark.h"

#include <cstring>
#include <vector>

// Benchmarks memcmp(p, q, kSize) where kSize is known at compile time.
// The compiler typically inlines this to loads and compares.
template <int kSize, typename Pred, typename Mod>
void BM_MemCmp(benchmark::State &state) {
  constexpr const size_t kNumElements = 4096 / kSize;

  std::vector<char> p_storage(kNumElements * kSize);
  std::vector<char> q_storage(kNumElements * kSize);
  char* p = p_storage.data();
  const char* q = q_storage.data();

  for (int i = 0; i < kNumElements; ++i)
    Mod().template Change<kSize>(p + i * kSize);

  benchmark::DoNotOptimize(p_storage);
  benchmark::DoNotOptimize(q_storage);

  for (auto _ : state) {
    for (int i = 0; i < kNumElements; ++i) {
      int res = Pred()(memcmp(p + i * kSize, q + i * kSize, kSize));
      benchmark::DoNotOptimize(res);
    }
  }
  state.SetBytesProcessed(p_storage.size() * state.iterations());

}


// Predicates.
struct EqZero {
  bool operator()(int v) const { return v == 0; }
};
struct LessThanZero {
  bool operator()(int v) const { return v < 0; }
};
struct GreaterThanZero {
  bool operator()(int v) const { return v > 0; }
};

// Functors to change the first/mid/last or no value.
struct None {
  template <int kSize>
  void Change(char* const p) const {}
};
struct First {
  template <int kSize>
  void Change(char* const p) const { p[0] = 128; }
};
struct Mid {
  template <int kSize>
  void Change(char* const p) const { p[kSize / 2] = 128; }
};
struct Last {
  template <int kSize>
  void Change(char* const p) const { p[kSize - 1] = 128; }
};

#define MEMCMP_BENCHMARK(size) \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, EqZero, None)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, EqZero, First)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, EqZero, Mid)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, EqZero, Last)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, LessThanZero, None)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, LessThanZero, First)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, LessThanZero, Mid)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, LessThanZero, Last)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, GreaterThanZero, None)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, GreaterThanZero, First)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, GreaterThanZero, Mid)->Unit(benchmark::kNanosecond); \
  BENCHMARK_TEMPLATE(BM_MemCmp, size, GreaterThanZero, Last)->Unit(benchmark::kNanosecond);

MEMCMP_BENCHMARK(1)
MEMCMP_BENCHMARK(2)
MEMCMP_BENCHMARK(3)
MEMCMP_BENCHMARK(4)
MEMCMP_BENCHMARK(5)
MEMCMP_BENCHMARK(6)
MEMCMP_BENCHMARK(7)
MEMCMP_BENCHMARK(8)
MEMCMP_BENCHMARK(15)
MEMCMP_BENCHMARK(16)
MEMCMP_BENCHMARK(31)
MEMCMP_BENCHMARK(32)
MEMCMP_BENCHMARK(63)
MEMCMP_BENCHMARK(64)

int main(int argc, char *argv[]) {
  ::benchmark::Initialize(&argc, argv);
  if (::benchmark::ReportUnrecognizedArguments(argc, argv))
    return 1;
  ::benchmark::RunSpecifiedBenchmarks();

  return 0;
}

This is an archive of the discontinued LLVM Phabricator instance.

[MemFunctions] Add microbenchmarks for memory functions.
Closed, Public

Details

Diff Detail

Event Timeline

Revision Contents

Diff 207570

MicroBenchmarks/CMakeLists.txt

MicroBenchmarks/MemFunctions/CMakeLists.txt

MicroBenchmarks/MemFunctions/main.cpp
