This is an archive of the discontinued LLVM Phabricator instance.


Differential D119121

[test-suite] Add unit tests for vectorizer memory runtime checks.
ClosedPublic

Authored by fhahn on Feb 7 2022, 2:30 AM.

Details

Reviewers

Meinersbur
dmgreen
lebedev.ri

Commits

rT01e720025669: [test-suite] Add unit tests for vectorizer memory runtime checks.

Summary

This patch adds a first set of tests to check memory runtime checks
generated by the vectorizer.

It runs scalar and vectorized versions of a loop requiring runtime
checks on the same inputs with pointers to the same buffer using various
offsets. It fails if they do not produce the same results.

The test functions are provided as lambdas, which are passed to a
driver function that generates the inputs and calls the lambdas with
pointers to overlapping buffers. The driver functions are marked as
noinline, which should act as an optimization barrier so the lambdas in
turn cannot be inlined and optimized without runtime checks.

Unfortunately 2 separate lambdas need to be specified for the scalar and
vector versions, with the only difference being the pragma to disable
vectorization. If anybody knows a nice generic & convenient way to
specify the loop once, that would be great.

Diff Detail

Repository: rT test-suite

Event Timeline

fhahn created this revision. Feb 7 2022, 2:30 AM

Herald added a subscriber: mgorny. Feb 7 2022, 2:30 AM

Harbormaster completed remote builds in B147912: Diff 406370. Feb 7 2022, 2:30 AM

fhahn requested review of this revision. Feb 7 2022, 2:30 AM

xbolva00 added a subscriber: xbolva00. Feb 7 2022, 3:03 AM

xbolva00 added inline comments.

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
212
template <int F>
int foo(int *d, int N) {
    int s = 0;
#pragma unroll F
    for (int i = 0; i < N; ++i) {
        s += d[i];
    }
    return s;
}

int p(int *d, int N) { return foo<2>(d, N); }

Maybe a similar idea can be used here? Or with _Pragma.

Add macro to define both ScalarFn and VectorFn, given a loop.

Harbormaster completed remote builds in B147973: Diff 406445. Feb 7 2022, 6:56 AM

fhahn added inline comments. Feb 7 2022, 6:58 AM

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
212	Thanks, I added a macro that generates both `ScalarFn` and `VectorFn`, given a loop body. It uses `_Pragma`. Alternative would be a template + `vectorize_width`, but template lambdas are a bit awkward.

I doubt that __attribute__((noinline)) suffices as an optimization barrier; IPO such as IPConstantPropagation/FunctionSpecialization can still take place. Even worse, the allocations are made within the check functions, with the lambda being inlined such that the optimizer can see the allocation.

But do we even need the optimization barrier? How could the memory check be optimized away?

[not a change request] If we vary interleaving as well, we would not be restricted to the target architecture's vector width.

SingleSource/UnitTests/Vectorizer/CMakeLists.txt
2	Can we let cmake select the flag? https://cmake.org/cmake/help/latest/prop_tgt/CXX_STANDARD.html
SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
8	Something that would have made me understand the purpose easier.
25	I assume it is not possible for NaNs to appear?
58–59	I'd have preferred some explicit expression that gives confidence that we don't accidentally go out of range, e.g. when increasing the span for offsets, and that tells us when they actually overlap.
61–63	It might be more predictable if reusing the same allocation. Otherwise each allocation may have a different alignment and effectively some offsets (relative to virtual address space, eg. page boundaries) are skipped. In case the vectorizer adds a prologue to ensure vector memory accesses are aligned.
72	[style] No almost-always-auto
76
83–84	Do we really need that many offsets? The trip count is just 100 so everything outside [-100,100] already seems redundant.
219
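The bounds and overlap conditions asked for in the comments on lines 58–59 and 83–84 could be written down explicitly along the following lines. This is only a sketch: it assumes the layout used by the driver in this revision (one buffer of `N * 8` elements, the base pointer at its midpoint, and the second pointer at `base + Offset`), and the helper names `accessInBounds` and `rangesOverlap` as well as the `Slack` parameter are hypothetical, not part of the patch.

```cpp
#include <cassert>

// Hypothetical helpers, not part of the patch. All values are element
// indices into a single buffer of NumArrayElements elements.

// True if the N-element window starting at Start + Offset, widened by Slack
// elements on both sides (to cover loops such as B[i + 3] or B[i - 3]),
// stays inside the buffer.
constexpr bool accessInBounds(long NumArrayElements, long Start, long Offset,
                              long N, long Slack) {
  return Start + Offset - Slack >= 0 &&
         Start + Offset + N + Slack <= NumArrayElements;
}

// True if [Start, Start + N) and [Start + Offset, Start + Offset + N) share
// at least one element, which is the case exactly when |Offset| < N.
constexpr bool rangesOverlap(long Offset, long N) {
  return (Offset < 0 ? -Offset : Offset) < N;
}

// With N = 100, a buffer of 800 elements and the base at index 400, the
// extreme offsets -100 and 100 stay in bounds even with a slack of 3.
static_assert(accessInBounds(800, 400, -100, 100, 3), "lower bound");
static_assert(accessInBounds(800, 400, 100, 100, 3), "upper bound");
```

Under these assumptions, unit-stride windows of 100 elements stop overlapping once the offset reaches 100 in either direction, which matches the observation that offsets outside [-100, 100] add little for a trip count of 100.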

Meinersbur added inline comments. Feb 9 2022, 5:31 PM

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
8	I just noticed the added "from/to same memory buffer" would be redundant.

I doubt that __attribute__((noinline)) suffices as an optimization barrier; IPO such as IPConstantPropagation/FunctionSpecialization can still take place. Even worse, the allocations are made within the check functions, with the lambda being inlined such that the optimizer can see the allocation.

Sadly, work on ‘noipa’ stalled. And we totally miss “noclone”, but maybe that is not an issue yet, since the funcspec pass is not enabled yet.

Thank you very much for taking a look! Comments should be addressed.

In D119121#3310103, @xbolva00 wrote:

I doubt that __attribute__((noinline)) suffices as an optimization barrier; IPO such as IPConstantPropagation/FunctionSpecialization can still take place. Even worse, the allocations are made within the check functions, with the lambda being inlined such that the optimizer can see the allocation.

Sadly, work on ‘noipa’ stalled. And we totally miss “noclone”, but maybe that is not an issue yet, since the funcspec pass is not enabled yet.

My thinking was that because checkOverlappingMemoryTwoRuntimeChecks won't be inlined and called with different lambda arguments, the scalar/vector lambdas won't be inlined. It's true that this would not prevent function specialization & co from interfering, but it seems unlikely that it would be profitable.

But do we even need the optimization barrier? How could the memory check be optimized away?

In the current version, I don't think there's a need for a real barrier any longer. Previously, with the larger offset there may have been cases where AA could prove that the 2 accessed regions won't overlap.

[not a change request] If we vary interleaving as well, we would not be restricted to the target architecture's vector width.

I was wondering whether we should rely on the vectorizer choosing vectorization factors and interleave counts automatically or if we should force them instead. My reasoning for letting the compiler choose is that we get different combinations for different targets, possibly increasing coverage overall. We could choose the VF automatically and cover a range of user-provided interleave counts?

Harbormaster completed remote builds in B148824: Diff 407652. Feb 10 2022, 12:52 PM

fhahn marked 3 inline comments as done. Feb 10 2022, 12:54 PM

fhahn added inline comments.

SingleSource/UnitTests/Vectorizer/CMakeLists.txt
2	That looks better, thanks!
SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
8	Thanks, I added `between reads and writes`. I hope this makes it clearer.
25	At the moment it is only used with integer types. I think `uniform_int_distribution` also only works with integer types, but I think we can skip testing with floating-point types.
58–59	Agreed, but I'm not sure how to best provide such an expression, combined with enforcing it when specifying the test loops. I've left it as is for now, but I'm more than happy to adjust it if there's a better alternative.
61–63	Sounds good, I moved it out to `checkOverlappingMemoryOneRuntimeCheck`. Was that what you had in mind?
72	Fixed, thanks!
76	Fixed, thanks!
83–84	Good point! Originally the tests used larger trip counts, but now [-100, 100] should be sufficient to cover all cases.
219	Updated, thanks!

ping :)

I tried copying this into a godbolt link, and it suggests you may need to #include <functional> on some systems. There are also some warnings that might be worth cleaning up: https://godbolt.org/z/sMTjfKbGo

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
209	Should this one have this loop unroll Pragma here? It's different to the others.

[suggestion] I am still not convinced __attribute__((noinline)) is useful, and it gives the wrong impression that its semantics solve the problem. What should not be inlined is the VectorFn lambda. Is it possible to attach a noinline attribute to it? Or maybe only call it indirectly through an __attribute__((optnone)) function. As a suggestion, I will leave it up to you whether you think it is worth it.
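On the question of attaching the attribute to the lambda itself: GCC and Clang accept a GNU-style attribute between a lambda's parameter list and its body, where it applies to the closure's call operator. The following is a minimal sketch of that spelling, not what the patch ended up doing (the committed version calls the lambda through an optnone helper instead); the function name `sumThroughNoinlineLambda` is hypothetical. Note that `noinline` only blocks inlining and does not by itself rule out other interprocedural optimizations.

```cpp
#include <cassert>

// Hypothetical example, not part of the patch. The attribute after the
// parameter list marks the lambda's operator() as noinline.
inline int sumThroughNoinlineLambda(const int *Data, unsigned TC) {
  auto Fn = [](const int *A, unsigned N) __attribute__((noinline)) {
    int Sum = 0;
    for (unsigned I = 0; I < N; ++I)
      Sum += A[I];
    return Sum;
  };
  return Fn(Data, TC);
}
```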

In D119121#3312322, @fhahn wrote:

[not a change request] If we vary interleaving as well, we would not be restricted to the target architecture's vector width.

I was wondering whether we should rely on the vectorizer choosing vectorization factors and interleave counts automatically or if we should force them instead. My reasoning for letting the compiler choose is that we get different combinations for different targets, possibly increasing coverage overall. We could choose the VF automatically and cover a range of user-provided interleave counts?

[not a change request] My reasoning was to force range of VFs (or interleave count since it does not require a specific instruction set), so the test covers all possible VFs without requiring access to platforms for each of them to test. The native choice by LoopVectorize can still be tested in addition to that, in case the overlap test makes a difference.
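Forcing a range of interleave counts, as proposed here, might look roughly like the sketch below. It is not part of the patch: the function names and the `runWithOffset` helper are hypothetical, and the loop is the simple `A[i] = B[i] + 10` case. It relies on Clang's `#pragma clang loop` hints `vectorize(enable)` and `interleave_count(N)`; other compilers ignore the pragma, possibly with an unknown-pragma warning, and then all three functions are plain scalar loops.

```cpp
#include <cassert>
#include <vector>

// Scalar reference version of the loop.
void addTenScalar(int *A, const int *B, unsigned TC) {
#pragma clang loop vectorize(disable)
  for (unsigned I = 0; I < TC; ++I)
    A[I] = B[I] + 10;
}

// Same loop, vectorization requested with a forced interleave count of 2.
void addTenInterleave2(int *A, const int *B, unsigned TC) {
#pragma clang loop vectorize(enable) interleave_count(2)
  for (unsigned I = 0; I < TC; ++I)
    A[I] = B[I] + 10;
}

// Same loop, vectorization requested with a forced interleave count of 4.
void addTenInterleave4(int *A, const int *B, unsigned TC) {
#pragma clang loop vectorize(enable) interleave_count(4)
  for (unsigned I = 0; I < TC; ++I)
    A[I] = B[I] + 10;
}

// Runs Fn on a freshly initialized buffer with A = Start + Offset and
// B = Start, where Start is the buffer midpoint, and returns the buffer.
template <typename FnTy>
std::vector<int> runWithOffset(FnTy Fn, int Offset) {
  const unsigned N = 100;
  std::vector<int> Buf(N * 8);
  for (unsigned I = 0; I < Buf.size(); ++I)
    Buf[I] = static_cast<int>(I);
  int *Start = Buf.data() + Buf.size() / 2;
  Fn(Start + Offset, Start, N);
  return Buf;
}
```

A driver would then compare `runWithOffset(addTenInterleave2, Off)` and `runWithOffset(addTenInterleave4, Off)` against `runWithOffset(addTenScalar, Off)` for each offset, in addition to a variant that leaves the choice to the vectorizer.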

In D119121#3325503, @dmgreen wrote:

I tried copying this into a godbolt link, and it suggests you may need to #include <functional> on some systems. There are also some warnings that might be worth cleaning up: https://godbolt.org/z/sMTjfKbGo

gcc indeed does not compile it: https://godbolt.org/z/3acnP796E It would be nice to keep the test-suite buildable with gcc as a comparison/reference.

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
61–63	yes, thank you.
72	Can you make the `&Reference[NumArrayElements / 2]` change too? It's exactly the same as `&Reference[0] + NumArrayElements / 2`. `&Reference[0]` is only a more convoluted way to say `Reference.get()`.
125
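The equivalence claimed in the comment on line 72 can be checked directly. The sketch below is illustrative only; `sameMidpoint` is a hypothetical helper, and it assumes a non-empty array so that element 0 exists.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper, not part of the patch. Returns true if the three
// spellings of "pointer to the middle element" name the same address.
// Only addresses are formed; the uninitialized elements are never read.
inline bool sameMidpoint(unsigned NumArrayElements) {
  std::unique_ptr<int[]> Reference(new int[NumArrayElements]);
  int *ViaIndex = &Reference[NumArrayElements / 2];
  int *ViaFirst = &Reference[0] + NumArrayElements / 2;
  int *ViaGet = Reference.get() + NumArrayElements / 2;
  return ViaIndex == ViaFirst && ViaFirst == ViaGet;
}
```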

Address latest comments, thanks!

In D119121#3327683, @Meinersbur wrote:

[suggestion] I am still not convinced __attribute__((noinline)) is useful, and it gives the wrong impression that its semantics solve the problem. What should not be inlined is the VectorFn lambda. Is it possible to attach a noinline attribute to it? Or maybe only call it indirectly through an __attribute__((optnone)) function. As a suggestion, I will leave it up to you whether you think it is worth it.

Updated to call the vector functions through an optnone function :)

In D119121#3312322, @fhahn wrote:

[not a change request] If we vary interleaving as well, we would not be restricted to the target architecture's vector width.

I was wondering whether we should rely on the vectorizer choosing vectorization factors and interleave counts automatically or if we should force them instead. My reasoning for letting the compiler choose is that we get different combinations for different targets, possibly increasing coverage overall. We could choose the VF automatically and cover a range of user-provided interleave counts?

[not a change request] My reasoning was to force range of VFs (or interleave count since it does not require a specific instruction set), so the test covers all possible VFs without requiring access to platforms for each of them to test. The native choice by LoopVectorize can still be tested in addition to that, in case the overlap test makes a difference.

Yeah, I think that might be worthwhile as a follow-up.

In D119121#3325503, @dmgreen wrote:

I tried copying this into a godbolt link, and it suggests you may need to #include <functional> on some systems. There are also some warnings that might be worth cleaning up: https://godbolt.org/z/sMTjfKbGo

gcc indeed does not compile it: https://godbolt.org/z/3acnP796E It would be nice to keep the test-suite buildable with gcc as a comparison/reference.

Thanks, I added the missing include (probably a slight libc++/libstdc++ difference).

Harbormaster completed remote builds in B150260: Diff 409676. Feb 17 2022, 8:50 AM

fhahn added inline comments. Feb 17 2022, 8:50 AM

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
72	Missed that originally, updated, thanks!
125	Adjusted, thanks!
209	I think Clang performs runtime unrolling here before vectorization and that's why I added the pragma. It's not strictly necessary though.

LGTM

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
50–51	Could you add a description for why we are using this function?

This revision is now accepted and ready to land. Feb 17 2022, 10:27 AM

Closed by commit rT01e720025669: [test-suite] Add unit tests for vectorizer memory runtime checks. (authored by fhahn). Feb 19 2022, 1:12 PM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rT01e720025669: [test-suite] Add unit tests for vectorizer memory runtime checks..

fhahn marked an inline comment as done. Feb 19 2022, 1:15 PM

fhahn added inline comments.

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
50–51	Added in the committed version, thanks!

Kai added a subscriber: Kai. Mar 16 2022, 6:30 AM

Kai added inline comments.

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
17	According to https://reviews.llvm.org/D120630 `std::uniform_int_distribution<char>` is UB. It gives a compile error with latest C++lib.

Herald added a project: Restricted Project. Mar 16 2022, 6:30 AM

fhahn marked an inline comment as done. Mar 21 2022, 9:05 AM

fhahn added inline comments.

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
17	Thanks for the heads-up! IIUC it needs to be an unsigned type. So probably the best way to fix this is to instantiate `std::uniform_int_distribution` with something like `uint64_t` and then have the result converted?
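For context, the C++ standard only defines `std::uniform_int_distribution` for `short`, `int`, `long`, `long long` and their unsigned counterparts, so character types are excluded regardless of signedness. One possible shape of the fix described here is sketched below. It is not necessarily what the follow-up commit does: `initDataWide` and `WideTy` are hypothetical names, and passing the generator as a parameter instead of using the file-level `rng` is a choice made only to keep the example self-contained.

```cpp
#include <cassert>
#include <limits>
#include <memory>
#include <random>
#include <type_traits>

// Hypothetical variant of init_data. Draws from a distribution over a wide
// integer type that the standard permits, bounded by Ty's own limits, and
// converts each value back to Ty.
template <typename Ty>
void initDataWide(const std::unique_ptr<Ty[]> &A, unsigned N,
                  std::mt19937 &Rng) {
  using WideTy =
      typename std::conditional<std::is_signed<Ty>::value, long long,
                                unsigned long long>::type;
  std::uniform_int_distribution<WideTy> Distrib(
      std::numeric_limits<Ty>::min(), std::numeric_limits<Ty>::max());
  for (unsigned I = 0; I < N; ++I)
    A[I] = static_cast<Ty>(Distrib(Rng));
}
```

Because the distribution bounds are taken from `Ty`, every drawn value is representable in `Ty` and the final conversion does not change it.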

fhahn marked an inline comment as done. Mar 28 2022, 10:31 AM

fhahn added inline comments.

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp
17	Should be fixed by c6828626a33992288943cdf363b997d540289638

Revision Contents

Path	Size
SingleSource/UnitTests/Vectorizer/CMakeLists.txt	1 line
SingleSource/UnitTests/Vectorizer/runtime-checks.cpp	273 lines
SingleSource/UnitTests/Vectorizer/runtime-checks.reference_output	28 lines

Diff 410105

SingleSource/UnitTests/Vectorizer/CMakeLists.txt

	llvm_singlesource()			llvm_singlesource()
				set_property(TARGET runtime-checks PROPERTY CXX_STANDARD 17)

SingleSource/UnitTests/Vectorizer/runtime-checks.cpp

This file was added.

#include <functional>
#include <iostream>
#include <limits>
#include <memory>
#include <random>

// Tests for memory runtime checks generated by the vectorizer. Runs scalar and
// vectorized versions of a loop requiring runtime checks on the same inputs
// with pointers to the same buffer using various offsets between reads and
// writes. Fails if they do not produce the same results.

Meinersbur (suggested edit, line 8, Done):
- // with pointers to the same buffer using various offsets. Fails if they do not
+ // with pointers to the same buffer using various offsets between reading and writing from/to same memory buffer. Fails if they do not

static std::mt19937 rng;

// Initialize arrays A with random numbers.
template <typename Ty>
static void init_data(const std::unique_ptr<Ty[]> &A, unsigned N) {
  std::uniform_int_distribution<Ty> distrib(std::numeric_limits<Ty>::min(),
                                            std::numeric_limits<Ty>::max());
  for (unsigned i = 0; i < N; i++)
    A[i] = distrib(rng);
}

template <typename Ty>
static void check(const std::unique_ptr<Ty[]> &Reference,
                  const std::unique_ptr<Ty[]> &Tmp, unsigned NumElements,
                  int Offset) {
  if (!std::equal(&Reference[0], &Reference[0] + NumElements, &Tmp[0])) {
    std::cerr << "Miscompare with offset " << Offset << "\n";
    exit(1);
  }
}

#define DEFINE_SCALAR_AND_VECTOR_FN2(Loop) \
  auto ScalarFn = [](auto *A, auto *B, unsigned TC) { \
    _Pragma("clang loop vectorize(disable)") Loop \
  }; \
  auto VectorFn = [](auto *A, auto *B, unsigned TC) { \
    _Pragma("clang loop vectorize(enable)") Loop \
  };

#define DEFINE_SCALAR_AND_VECTOR_FN3(Loop) \
  auto ScalarFn = [](auto *A, auto *B, auto *C, unsigned TC) { \
    _Pragma("clang loop vectorize(disable)") Loop \
  }; \
  auto VectorFn = [](auto *A, auto *B, auto *C, unsigned TC) { \
    _Pragma("clang loop vectorize(enable)") Loop \
  };

// Helper to call \p f with \p args and acts as optimization barrier for \p f.
template <typename F, typename... Args>
__attribute__((optnone)) static void callThroughOptnone(F &&f, Args &&...args) {
  f(std::forward<Args>(args)...);
}

template <typename Ty> using Fn2Ty = std::function<void(Ty *, Ty *, unsigned)>;

// Run both \p ScalarFn and \p VectorFn on the same inputs with pointers to the
// same buffer. Fail if they do not produce the same results.
template <typename Ty>
static void checkOverlappingMemoryOneRuntimeCheck(Fn2Ty<Ty> ScalarFn,
                                                  Fn2Ty<Ty> VectorFn,
                                                  const char *Name) {
  std::cout << "Checking " << Name << "\n";

  unsigned N = 100;
  // Make sure we have enough extra elements so we can be liberal with offsets.
  unsigned NumArrayElements = N * 8;
  std::unique_ptr<Ty[]> Input1(new Ty[NumArrayElements]);
  std::unique_ptr<Ty[]> Reference(new Ty[NumArrayElements]);
  std::unique_ptr<Ty[]> ToCheck(new Ty[NumArrayElements]);

  auto CheckWithOffset = [&](int Offset) {
    init_data(Input1, NumArrayElements);
    for (unsigned i = 0; i < NumArrayElements; i++) {
      Reference[i] = Input1[i];
      ToCheck[i] = Input1[i];
    }

    // Run scalar function to generate reference output.
    Ty *ReferenceStart = &Reference[NumArrayElements / 2];
    ScalarFn(ReferenceStart + Offset, ReferenceStart, N);

    // Run vector function to generate output to check.
    Ty *StartPtr = &ToCheck[NumArrayElements / 2];
    callThroughOptnone(VectorFn, StartPtr + Offset, StartPtr, N);

    // Compare scalar and vector output.
    check(Reference, ToCheck, NumArrayElements, Offset);
  };

  for (int i = -100; i <= 100; i++)
    CheckWithOffset(i);
}

Meinersbur (suggested edit, line 72, Done):
- auto *ReferenceStart = &Reference[0] + NumArrayElements / 2;
+ Ty *ReferenceStart = &Reference[NumArrayElements / 2];

Meinersbur (suggested edit, line 76, Done):
- auto *StartPtr = &ToCheck[0] + NumArrayElements / 2;
+ Ty *StartPtr = &ToCheck[NumArrayElements / 2];

template <typename Ty>
using Fn3Ty = std::function<void(Ty *, Ty *, Ty *, unsigned)>;

template <typename Ty>
static void checkOverlappingMemoryTwoRuntimeChecks(Fn3Ty<Ty> ScalarFn,
                                                   Fn3Ty<Ty> VectorFn,
                                                   const char *Name) {
  std::cout << "Checking " << Name << "\n";

  unsigned N = 100;
  // Make sure we have enough extra elements so we can be liberal with offsets.
  unsigned NumArrayElements = N * 8;
  std::unique_ptr<Ty[]> Input1(new Ty[NumArrayElements]);
  std::unique_ptr<Ty[]> Input2(new Ty[NumArrayElements]);
  std::unique_ptr<Ty[]> Reference(new Ty[NumArrayElements]);
  std::unique_ptr<Ty[]> ToCheck(new Ty[NumArrayElements]);

  auto CheckWithOffsetSecond = [&](int Offset) {
    init_data(Input1, NumArrayElements);
    init_data(Input2, NumArrayElements);
    for (unsigned i = 0; i < NumArrayElements; i++) {
      Reference[i] = Input1[i];
      ToCheck[i] = Input1[i];
    }

    // Run scalar function to generate reference output.
    Ty *ReferenceStart = &Reference[NumArrayElements / 2];
    ScalarFn(ReferenceStart + Offset, &Input2[0], ReferenceStart, N);

    // Run vector function to generate output to check.
    Ty *StartPtr = &ToCheck[NumArrayElements / 2];
    callThroughOptnone(VectorFn, StartPtr + Offset, &Input2[0], StartPtr, N);

    // Compare scalar and vector output.
    check(Reference, ToCheck, NumArrayElements, Offset);
  };

  for (int i = -100; i <= 100; i++)
    CheckWithOffsetSecond(i);
}

Meinersbur (suggested edit, line 125, Done):
- for (int i = -200; i <= 200; i++)
+ for (int i = -100; i <= 100; i++)

int main(void) {
  rng = std::mt19937(15);

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        for (unsigned i = 0; i < TC; i++)
          A[i] = B[i] + 10;
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn, "1 read, 1 write, step 1, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(ScalarFn, VectorFn,
                                               "1 read, 1 write, step 1, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn, "1 read, 1 write, step 1, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        for (unsigned i = 0; i < TC; i++)
          A[i] = B[i + 3] + 10;
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn, "1 read, 1 write, offset 3, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(
        ScalarFn, VectorFn, "1 read, 1 write, offset 3, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn, "1 read, 1 write, offset 3, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        for (unsigned i = 3; i < TC; i++)
          A[i] = B[i - 3] + 10;
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn, "1 read, 1 write, offset -3, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(
        ScalarFn, VectorFn, "1 read, 1 write, offset -3, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn, "1 read, 1 write, offset -3, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        for (unsigned i = TC; i > 0; i--)
          A[i] = B[i] + 10;
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn, "1 read, 1 write, index count down, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(
        ScalarFn, VectorFn, "1 read, 1 write, index count down, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn, "1 read, 1 write, index count down, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        for (unsigned i = TC; i > 2; i -= 2)
          A[i] = B[i] + 10;
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn, "1 read, 1 write, index count down 2, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(
        ScalarFn, VectorFn, "1 read, 1 write, index count down 2, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn, "1 read, 1 write, index count down 2, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        for (unsigned i = 0, j = 0; i < TC; i++) {
          A[i] = B[j] + 10;
          j += 2;
        }
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn,
        "1 read, 1 write, 2 inductions, different steps, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(
        ScalarFn, VectorFn,
        "1 read, 1 write, 2 inductions, different steps, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn,
        "1 read, 1 write, 2 inductions, different steps, long long");
  }

Meinersbur (suggested edit, line 219, Done):
- _Pragma("loop unroll(disable)")
+ _Pragma("clang loop unroll(disable)")

  {
    DEFINE_SCALAR_AND_VECTOR_FN2(
        _Pragma("clang loop unroll(disable)")
        for (unsigned i = 0; i < TC; i += 2)
          A[i] = B[i] + 10;
    );
    checkOverlappingMemoryOneRuntimeCheck<char>(
        ScalarFn, VectorFn, "1 read, 1 write, induction increment 2, char");
    checkOverlappingMemoryOneRuntimeCheck<int>(
        ScalarFn, VectorFn, "1 read, 1 write, induction increment 2, int");
    checkOverlappingMemoryOneRuntimeCheck<long long>(
        ScalarFn, VectorFn,
        "1 read, 1 write, induction increment 2, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN3(
        for (unsigned i = 0; i < TC; i++)
          A[i] = B[i] + C[i] + 10;
    );
    checkOverlappingMemoryTwoRuntimeChecks<int>(
        ScalarFn, VectorFn, "2 reads, 1 write, simple indices, int");
    checkOverlappingMemoryTwoRuntimeChecks<char>(
        ScalarFn, VectorFn, "2 reads, 1 write, simple indices, char");
    checkOverlappingMemoryTwoRuntimeChecks<long long>(
        ScalarFn, VectorFn, "2 reads, 1 write, simple indices, long long");
  }

  {
    DEFINE_SCALAR_AND_VECTOR_FN3(
        for (unsigned i = 0; i < TC; i++) {
          auto X = C[i] + 10;
          A[i] = X;
          B[i] = X + 9;
        }
    );
    checkOverlappingMemoryTwoRuntimeChecks<char>(
        ScalarFn, VectorFn, "1 read, 2 writes, simple indices, char");
    checkOverlappingMemoryTwoRuntimeChecks<int>(
        ScalarFn, VectorFn, "1 read, 2 writes, simple indices, int");
    checkOverlappingMemoryTwoRuntimeChecks<long long>(
        ScalarFn, VectorFn, "1 read, 2 writes, simple indices, long long");
  }

  return 0;
}

SingleSource/UnitTests/Vectorizer/runtime-checks.reference_output

This file was added.

				Checking 1 read, 1 write, step 1, char
				Checking 1 read, 1 write, step 1, int
				Checking 1 read, 1 write, step 1, long long
				Checking 1 read, 1 write, offset 3, char
				Checking 1 read, 1 write, offset 3, int
				Checking 1 read, 1 write, offset 3, long long
				Checking 1 read, 1 write, offset -3, char
				Checking 1 read, 1 write, offset -3, int
				Checking 1 read, 1 write, offset -3, long long
				Checking 1 read, 1 write, index count down, char
				Checking 1 read, 1 write, index count down, int
				Checking 1 read, 1 write, index count down, long long
				Checking 1 read, 1 write, index count down 2, char
				Checking 1 read, 1 write, index count down 2, int
				Checking 1 read, 1 write, index count down 2, long long
				Checking 1 read, 1 write, 2 inductions, different steps, char
				Checking 1 read, 1 write, 2 inductions, different steps, int
				Checking 1 read, 1 write, 2 inductions, different steps, long long
				Checking 1 read, 1 write, induction increment 2, char
				Checking 1 read, 1 write, induction increment 2, int
				Checking 1 read, 1 write, induction increment 2, long long
				Checking 2 reads, 1 write, simple indices, int
				Checking 2 reads, 1 write, simple indices, char
				Checking 2 reads, 1 write, simple indices, long long
				Checking 1 read, 2 writes, simple indices, char
				Checking 1 read, 2 writes, simple indices, int
				Checking 1 read, 2 writes, simple indices, long long
				exit 0
