This is an archive of the discontinued LLVM Phabricator instance.

[MemFunctions] Add microbenchmarks for memory functions.
ClosedPublic

Authored by courbet on Jul 2 2019, 8:51 AM.

Details

Summary

Memory functions (memcmp, memcpy, ...) are typically recognized by the
compiler and expanded to specific asm patterns when the size is known at
compile time.

This will help catch any regressions in expansions.

Right now we're only testing memcmp (see context in D60318).

Event Timeline

courbet created this revision.Jul 2 2019, 8:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2019, 8:51 AM
Herald added a subscriber: mgorny. · View Herald Transcript

Thanks for working on this! I'm not familiar with how the benchmarking framework works, so someone else should definitely have a look.

Does the framework automatically account for and filter out noisy results? I'm guessing that tiny memcmp() will have a lot of run-to-run variation.

lebedev.ri added a subscriber: lebedev.ri.EditedJul 2 2019, 11:28 AM

Thanks for working on this! I'm not familiar with how the benchmarking framework works, so someone else should definitely have a look.

That could be me, i suppose.
Does not look too wrong to me.

Does the framework automatically account for and filter out noisy results? I'm guessing that tiny memcmp() will have a lot of run-to-run variation.

The for (auto _ : state) loop will run for up to 0.5 sec, so the results should be good;
Plus at the test-suite/lnt level the test suite can be run several times so it's possible ensure that timing changes are meaningful (U Test, e.g.)

MicroBenchmarks/MemFunctions/main.cpp
34–38

So what about q? It's intentionally left all-zeros? That warrants a comment.

82–94

I'd do one or two macro levels here

111–118

BENCHMARK_MAIN();

courbet updated this revision to Diff 207732.Jul 3 2019, 2:34 AM
courbet marked 3 inline comments as done.

Address review comments.

Thanks Roman.

MicroBenchmarks/MemFunctions/main.cpp
111–118

Thanks for the pointer.

Thanks for working on this! I'm not familiar with how the benchmarking framework works, so someone else should definitely have a look.

Does the framework automatically account for and filter out noisy results? I'm guessing that tiny memcmp() will have a lot of run-to-run variation.

The framework will grow number of iterations until measurements stabilize. This is usually sufficient. However it will not do statistical significance testing for you (which is what I've done in the attached PDF just to be sure).

lebedev.ri added inline comments.Jul 3 2019, 2:48 AM
MicroBenchmarks/MemFunctions/main.cpp
44

This may be paranoia, but i'm not sure this is sufficient to guarantee that compiler *can't* just look into p/q.
I'd suggest adding this here:

benchmark::DoNotOptimize(p);
benchmark::DoNotOptimize(q);

benchmark::ClobberMemory(p);
benchmark::ClobberMemory(q);

(i see that you do that for std::vector<char>'s already, but you have already acquired _storage.data()..)

courbet marked an inline comment as done.Jul 3 2019, 3:50 AM
courbet added inline comments.
MicroBenchmarks/MemFunctions/main.cpp
44

Sounds reasonable. I've even moved the ClobberMemory inside the call (and verified that benchmark numbers do not change).

courbet updated this revision to Diff 207750.Jul 3 2019, 3:50 AM

Be evel less permissive as to what we allow the compiler to see.

lebedev.ri added inline comments.Jul 3 2019, 5:08 AM
MicroBenchmarks/MemFunctions/main.cpp
30

Magical constant
I'm guessing that by 4096 you limit the maximal size of p and q buffers,
implying that they should fit into L1 cache?
Do you want to use the actual L1 size instead?
Otherwise,

static constexpr size_t kMaxBufSizeBytes = 4096;
constexpr size_t kNumElements = kMaxBufSizeBytes / kSize;
courbet updated this revision to Diff 207762.Jul 3 2019, 5:10 AM

Name magical constant.

courbet updated this revision to Diff 207763.Jul 3 2019, 5:12 AM

Add comment for buffer size.

courbet marked an inline comment as done.Jul 3 2019, 5:13 AM
courbet added inline comments.
MicroBenchmarks/MemFunctions/main.cpp
30

It's combination of things, among which caching. But you're right that this warrants a comment. Done.

lebedev.ri accepted this revision.Jul 3 2019, 5:19 AM
lebedev.ri marked an inline comment as done.

Looks ok to me from benchmark perspective, but some more thoughts about the benchmark itself..

MicroBenchmarks/MemFunctions/main.cpp
44

Nice.

51

I think i'm forgetting about some magic.
All the predicates (EqZero, ...) take a single argument, how does this work if it passes two args?

59–67

To be noted, none of these is the actual memcmp, i think?

60

Does it matter that these take int while you always pass char?

This revision is now accepted and ready to land.Jul 3 2019, 5:19 AM
courbet updated this revision to Diff 207765.Jul 3 2019, 5:29 AM
courbet marked 3 inline comments as done.

Clarify top comment.

MicroBenchmarks/MemFunctions/main.cpp
51

I think you missed that the result of calling memcmp is passed to pred. Pred just defines which of ==, < or > we're benching. I updated the bench comment to make that clearer.

59–67

See my comment above.

60

See my comment above.

lebedev.ri marked an inline comment as done.Jul 3 2019, 5:33 AM
lebedev.ri added inline comments.
MicroBenchmarks/MemFunctions/main.cpp
51

Oh i see, that explains it, thanks!

xbolva00 added a subscriber: xbolva00.

lg too

This revision was automatically updated to reflect the committed changes.