This is an archive of the discontinued LLVM Phabricator instance.

[MicroBenchmarks,AArch64] Added correctness test & other performance tests for truncate or zero-extend vector operations
ClosedPublic

Authored by nilanjana_basu on Nov 15 2022, 1:13 PM.

Details

Summary

This patch adds a correctness test to check the outcome of vectorized truncate or zero-extend operations in a loop for different vector types.
This patch also adds performance tests for combined operations of truncate/zero-extend and addition.

The goal of this benchmark is to check the impact of AArch64 specific changes in
D133495, D120571, D135229 and D136722.

Event Timeline

nilanjana_basu created this revision.Nov 15 2022, 1:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 1:13 PM
nilanjana_basu requested review of this revision.Nov 15 2022, 1:13 PM

Why randomized?

Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.

Why randomized?

You're right, it was unnecessary. Removed it in the latest patch & combined the correctness tests with the performance ones.

fhahn added a comment.Nov 15 2022, 3:54 PM

Why randomized?

! In D138059#3929074, @nilanjana_basu wrote:

Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.

I think the main reason for initializing with random data is to make the benchmarks more robust so the optimizer won't be able to (partly) optimize out our benchmark code?

fhahn edited the summary of this revision. (Show Details)Nov 15 2022, 4:00 PM

Why randomized?

! In D138059#3929074, @nilanjana_basu wrote:

Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.

I think the main reason for initializing with random data is to make the benchmarks more robust so the optimizer won't be able to (partly) optimize out our benchmark code?

Why randomized?

! In D138059#3929074, @nilanjana_basu wrote:

Removed randomization in input & combined correctness tests with performance ones. Explicitly added vectorization width for 16 elements since the related patches target this width.

I think the main reason for initializing with random data is to make the benchmarks more robust so the optimizer won't be able to (partly) optimize out our benchmark code?

I checked that at the IR level the generated code had the relevant trunc & zext instructions, 4 times per function for interleave_count 4. I could also see a performance difference based on the related patches it is meant to test. To be on the safe side, what do you think about adding "benchmark::DoNotOptimize(A)" instead? Or should we prefer reverting to the old form?

fhahn added a comment.Nov 17 2022, 2:15 PM

I checked that at the IR level the generated code had the relevant trunc & zext instructions, 4 times per function for interleave_count 4. I could also see a performance difference based on the related patches it is meant to test. To be on the safe side, what do you think about adding "benchmark::DoNotOptimize(A)" instead? Or should we prefer reverting to the old form?

Yeah, it is probably fine now, but testing with a single value also seems to make the test less interesting. You could keep the random initialization and add a version of truncOrZextVecInLoopWithVW8 that disables vectorization to generate comparison data for testing.

Reverted to using random inputs & changed correctness test to compare against same operations with no vectorization

Yeah, it is probably fine now, but testing with a single value also seems to make the test less interesting. You could keep the random initialization and add a version of truncOrZextVecInLoopWithVW8 that disables vectorization to generate comparison data for testing.

Yes, that sounds like a better correctness test. Have updated it.

MicroBenchmarks/LoopVectorization/VectorOperations.cpp
25

Added noinline for ease of verifying the IR to check if this part is not optimized out. Can be removed if deemed unnecessary.

fhahn added inline comments.Nov 22 2022, 3:18 AM
MicroBenchmarks/LoopVectorization/VectorOperations.cpp
25

it might confuse readers, so its better to remove it I think

26

LLM coding style uses uppercase for variables

67

it would also be good to benchmark a version of the loop without any pragmas.

Also, why fix the VF?

141

It might be better to move this before the main loop so we fail early.

159–160

we also need versions with different dst types than i8. Same for the src types for zexts

nilanjana_basu marked an inline comment as done.

Addressed reviewer comments

nilanjana_basu marked an inline comment as done.

Variable name changes

nilanjana_basu marked 2 inline comments as done.

Ran clang-format

nilanjana_basu marked an inline comment as done.Nov 29 2022, 12:58 AM
nilanjana_basu added inline comments.
MicroBenchmarks/LoopVectorization/VectorOperations.cpp
67

Added a version of the benchmark with only vectorization enabled but no other pragmas. Changed the VF to test vectors of length 16 specifically, similar to the VF 8 case, assuming these two are commonly used VFs.

141

Done. Also added the correctness for each version of the benchmarks i.e. for different vectorization configurations. Each configuration should trigger different paths of the trunc/zext lowering & therefore, will be good to be tested for correctness.

fhahn added inline comments.Dec 1 2022, 9:36 AM
MicroBenchmarks/LoopVectorization/VectorOperations.cpp
79

Is this effectively the same code as benchForTruncOrZextVecInLoopWithVW8 and just calling a different benchmark function? If so, better to make the function an argument to avoid duplication?

Removed duplicate code by adding function pointers as parameter as advised in the reviews. Added more performance tests using ZExt/Trunc operations in combination with addition operation.

nilanjana_basu marked an inline comment as done.Dec 1 2022, 12:21 PM
fhahn accepted this revision.Dec 1 2022, 1:51 PM

LGTM, thanks!

MicroBenchmarks/LoopVectorization/VectorOperations.cpp
148

it might help with readability if there was a newline after each BENCHMARK...

This revision is now accepted and ready to land.Dec 1 2022, 1:51 PM
nilanjana_basu retitled this revision from [MicroBenchmarks,AArch64] Added correctness test for truncate or zero-extend vector operations to [MicroBenchmarks,AArch64] Added correctness test & other performance tests for truncate or zero-extend vector operations.Dec 1 2022, 9:49 PM
nilanjana_basu edited the summary of this revision. (Show Details)