This is an archive of the discontinued LLVM Phabricator instance.

Target Power9 bit counting and vector comparison instructions through builtins (front end portion)
ClosedPublic

Authored by nemanjai on Sep 9 2016, 7:43 AM.

Details

Summary

This patch introduces the following builtins:
unsigned int vec_first_match_index (vector signed char, vector signed char);
unsigned int vec_first_match_index (vector unsigned char, vector unsigned char);
unsigned int vec_first_match_index (vector signed int, vector signed int);
unsigned int vec_first_match_index (vector unsigned int, vector unsigned int);
unsigned int vec_first_match_index (vector signed short, vector signed short);
unsigned int vec_first_match_index (vector unsigned short, vector unsigned short);
unsigned int vec_first_match_or_eos_index (vector signed char, vector signed char);
unsigned int vec_first_match_or_eos_index (vector unsigned char, vector unsigned char);
unsigned int vec_first_match_or_eos_index (vector signed int, vector signed int);
unsigned int vec_first_match_or_eos_index (vector unsigned int, vector unsigned int);
unsigned int vec_first_match_or_eos_index (vector signed short, vector signed short);
unsigned int vec_first_match_or_eos_index (vector unsigned short, vector unsigned short);
unsigned int vec_first_mismatch_index (vector signed char, vector signed char);
unsigned int vec_first_mismatch_index (vector unsigned char, vector unsigned char);
unsigned int vec_first_mismatch_index (vector signed int, vector signed int);
unsigned int vec_first_mismatch_index (vector unsigned int, vector unsigned int);
unsigned int vec_first_mismatch_index (vector signed short, vector signed short);
unsigned int vec_first_mismatch_index (vector unsigned short, vector unsigned short);
unsigned int vec_first_mismatch_or_eos_index (vector signed char, vector signed char);
unsigned int vec_first_mismatch_or_eos_index (vector unsigned char, vector unsigned char);
unsigned int vec_first_mismatch_or_eos_index (vector signed int, vector signed int);
unsigned int vec_first_mismatch_or_eos_index (vector unsigned int, vector unsigned int);
unsigned int vec_first_mismatch_or_eos_index (vector signed short, vector signed short);
unsigned int vec_first_mismatch_or_eos_index (vector unsigned short, vector unsigned short);
vector bool char vec_cmpne (vector bool char, vector bool char);
vector bool char vec_cmpne (vector signed char, vector signed char);
vector bool char vec_cmpne (vector unsigned char, vector unsigned char);
vector bool int vec_cmpne (vector bool int, vector bool int);
vector bool int vec_cmpne (vector signed int, vector signed int);
vector bool int vec_cmpne (vector unsigned int, vector unsigned int);
vector bool long long vec_cmpne (vector bool long long, vector bool long long);
vector bool long long vec_cmpne (vector signed long long, vector signed long long);
vector bool long long vec_cmpne (vector unsigned long long, vector unsigned long long);
vector bool short vec_cmpne (vector bool short, vector bool short);
vector bool short vec_cmpne (vector signed short, vector signed short);
vector bool short vec_cmpne (vector unsigned short, vector unsigned short);
vector bool long long vec_cmpne (vector double, vector double);
vector bool int vec_cmpne (vector float, vector float);
vector signed char vec_cnttz (vector signed char);
vector unsigned char vec_cnttz (vector unsigned char);
vector signed int vec_cnttz (vector signed int);
vector unsigned int vec_cnttz (vector unsigned int);
vector signed long long vec_cnttz (vector signed long long);
vector unsigned long long vec_cnttz (vector unsigned long long);
vector signed short vec_cnttz (vector signed short);
vector unsigned short vec_cnttz (vector unsigned short);
vector unsigned char vec_popcnt (vector signed char);
vector unsigned char vec_popcnt (vector unsigned char);
vector unsigned int vec_popcnt (vector signed int);
vector unsigned int vec_popcnt (vector unsigned int);
vector unsigned long long vec_popcnt (vector signed long long);
vector unsigned long long vec_popcnt (vector unsigned long long);
vector unsigned short vec_popcnt (vector signed short);
vector unsigned short vec_popcnt (vector unsigned short);

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 70835.Sep 9 2016, 7:43 AM
nemanjai retitled this revision from to Target Power9 bit counting and vector comparison instructions through builtins (front end portion).
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added subscribers: cfe-commits, echristo.

Looking over the patch, I realized that I forgot to add a test case for the POWER9_VECTOR macro and the builtins that target the record forms of the instructions. I'll add those on the next revision along with addressing any review comments.

kbarton edited edge metadata.Sep 22 2016, 12:26 PM

Aside from one minor comment, this LGTM.

lib/Basic/Targets.cpp
1364

Please update this comment to also include float128 and power9-vector.

kbarton accepted this revision.Sep 22 2016, 12:27 PM
kbarton edited edge metadata.
This revision is now accepted and ready to land.Sep 22 2016, 12:27 PM
nemanjai closed this revision.Sep 27 2016, 3:54 AM

Committed revision 282481.

bjope added a subscriber: bjope.Sep 28 2016, 2:37 AM

This test/CodeGen/builtins-ppc-p9vector.c test will fail together with this upcoming LLVM patch https://reviews.llvm.org/D24955

Problem is that lots of your

add i64 {{.*}}, 64

checks will fails since the improved analysis will find out that the add has the "nsw" "nuw" properties.

I'm not so familiar with the regexps used by FileCheck, but somehow we need to (also) allow

add nsw nuw i64 {{.*}}, 64

in the checks to make it more future proof.

This test/CodeGen/builtins-ppc-p9vector.c test will fail together with this upcoming LLVM patch https://reviews.llvm.org/D24955

Problem is that lots of your

add i64 {{.*}}, 64

checks will fails since the improved analysis will find out that the add has the "nsw" "nuw" properties.

I'm not so familiar with the regexps used by FileCheck, but somehow we need to (also) allow

add nsw nuw i64 {{.*}}, 64

in the checks to make it more future proof.

I can change the patterns that check for the add instructions to the following:
// CHECK: add {{[nsuw ]*}}i64 {{.*}}, 64

That will pass with:
add nsw i64
add nuw i64
add nsw nuw i64
...

Basically if all that is found between the "add" and "i64" is any combination of the letters "nsuw" and space, it will pass. As far as I'm concerned, ensuring that the strings there are well formed is irrelevant - all I'm testing is that an add instruction is emitted that adds the constant 64.

I can make the change and check it in if you're in agreement.

bjope added a comment.Sep 28 2016, 4:03 AM

This test/CodeGen/builtins-ppc-p9vector.c test will fail together with this upcoming LLVM patch https://reviews.llvm.org/D24955

Problem is that lots of your

add i64 {{.*}}, 64

checks will fails since the improved analysis will find out that the add has the "nsw" "nuw" properties.

I'm not so familiar with the regexps used by FileCheck, but somehow we need to (also) allow

add nsw nuw i64 {{.*}}, 64

in the checks to make it more future proof.

I can change the patterns that check for the add instructions to the following:
// CHECK: add {{[nsuw ]*}}i64 {{.*}}, 64

That will pass with:
add nsw i64
add nuw i64
add nsw nuw i64
...

Basically if all that is found between the "add" and "i64" is any combination of the letters "nsuw" and space, it will pass. As far as I'm concerned, ensuring that the strings there are well formed is irrelevant - all I'm testing is that an add instruction is emitted that adds the constant 64.

I can make the change and check it in if you're in agreement.

Solution sounds good to me!

And it would be very helpful if you do that. D24955 will be my first patch contributing to llvm :-)

spatel added a subscriber: spatel.Sep 28 2016, 8:52 AM

Having a clang regression/unit test that depends on optimizer behavior is generally viewed as wrong. Can the tests be split into front-end (clang) tests and separate tests for the IR optimizer? Both x86 and AArch64 have done something like that in the last few months for testing of builtins/intrinsics.

Having a clang regression/unit test that depends on optimizer behavior is generally viewed as wrong. Can the tests be split into front-end (clang) tests and separate tests for the IR optimizer? Both x86 and AArch64 have done something like that in the last few months for testing of builtins/intrinsics.

Yeah, that sounds reasonable. I'll remove the -O2 from the test case and remove the checks for the select instructions. That's really the only major difference. So am I to understand the nsw/nuw flags will not be added without -O2 and the aforementioned changes will suffice?

Having a clang regression/unit test that depends on optimizer behavior is generally viewed as wrong. Can the tests be split into front-end (clang) tests and separate tests for the IR optimizer? Both x86 and AArch64 have done something like that in the last few months for testing of builtins/intrinsics.

Yeah, that sounds reasonable. I'll remove the -O2 from the test case and remove the checks for the select instructions. That's really the only major difference. So am I to understand the nsw/nuw flags will not be added without -O2 and the aforementioned changes will suffice?

Changing to -O0 or using -disable-llvm-optzns should keep the clang tests from breaking due to underlying changes in the IR optimizer. That may lead to a lot of bloat though. In http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160307/152324.html , it was viewed as ok, if not ideal, to pipe the clang IR output using "opt -S -mem2reg".

Note that clang itself uses APIs like IRBuilder::CreateNUWSub(), so I think it's possible to see no-wrap IR even without the IR optimizer kicking in (but probably isn't a concern in this case?).

Should also mention:
https://reviews.llvm.org/D17999
has scripts attached that could make this kind of test generation a lot easier. :)

echristo added inline comments.Sep 28 2016, 2:23 PM
lib/Basic/Targets.cpp
1393

Can you combine this with the power8 check above?

1500

This is starting to look a bit complicated here as I think you're now enabling power9 vector on power8? :\

test/CodeGen/builtins-ppc-p9vector.c
2

Please no code generation checks in clang. I doubt you really need to even pipe this through opt unless you feel that alloca issues are going to cause problems. I'd prefer just to check unoptimized IR.

bjope added a comment.Oct 5 2016, 11:02 AM

What is the progress about getting rid of these code generation checks?

(I'm still hesitating about commiting D24955 in llvm since that would make these clang tests fail...)

(I'm still hesitating about commiting D24955 in llvm since that would make these clang tests fail...)

You can't do that. Bots will send you fail mail all day as they choke on the clang tests - speaking from experience. :)
We either need to fix or revert this commit in order to let D24955 proceed.