This is an archive of the discontinued LLVM Phabricator instance.

Patched clang to emit x86 blends as shufflevectors.
ClosedPublic

Authored by filcab on May 2 2014, 5:55 PM.

Details

Summary

Most of the clang header patch by Simon Pilgrim @ SCEE.
Also fixed (or added) clang tests for these intrinsics.

LLVM tests to make sure we get the blend instruction out of these
shufflevectors are at http://reviews.llvm.org/D3600

Diff Detail

Event Timeline

filcab updated this revision to Diff 9053.May 2 2014, 5:55 PM
filcab retitled this revision from to Patched clang to emit x86 blends as shufflevectors..
filcab updated this object.
filcab edited the test plan for this revision. (Show Details)
filcab added reviewers: eli.friedman, craig.topper.
filcab added a subscriber: Unknown Object (MLST).

Should code that is directly using the builtins themselves (like
builtin_ia32_pblendw256) be optimized too? If so wouldn't it be
better to, for example, leave _mm256_blend_epi16 as is, remove
builtin_ia32_pblendw256 from BuiltinsX86.def and make it a #define
to shufflevector?

filcab added a comment.May 6 2014, 6:52 PM

Ah, I hadn't thought of that.
But it seems that the gcc manual explicitly says they're functions and that
they're available:
http://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions
I'm not sure, but I suppose we should keep them if they're documented as
available, right? Or are we not maintaining compatibility here (our manual
doesn't mention these builtins, AFAICT)?

I also don't see any other intrinsic doing the same (which doesn't mean we
can't start now, obviously).

Filipe

Filipe points out that we cannot use a #define for the __builtin since it has to be available even when no .h is include. Optimizing the builtin would then require custom code in clang/llvm, which is probably not worth it.

lib/Headers/avx2intrin.h
163–179

Why the change to __m256d? The intel manual says the signature is

m256i _mm256_blend_epi16 (m256i v1, __m256i v2, const int mask)

168

The masks looks wrong for the hight bits. Shouldn't this be

(((M) & 0x01) ? 16 : 0), \
(((M) & 0x02) ? 17 : 1), \
(((M) & 0x04) ? 18 : 2), \
(((M) & 0x08) ? 19 : 3), \
(((M) & 0x10) ? 20 : 4), \
(((M) & 0x20) ? 21 : 5), \
(((M) & 0x40) ? 22 : 6), \
(((M) & 0x80) ? 23 : 7), \
(((M) & 0x01) ? 24 : 8), \
(((M) & 0x02) ? 25 : 9), \
(((M) & 0x04) ? 26 : 10), \
(((M) & 0x08) ? 27 : 11), \
(((M) & 0x10) ? 28 : 12), \
(((M) & 0x20) ? 29 : 13), \
(((M) & 0x40) ? 30 : 14), \
(((M) & 0x80) ? 31 : 15), \

filcab updated this revision to Diff 9190.May 7 2014, 2:27 PM

Fixed masks in the blend shufflemasks.

filcab updated this revision to Diff 9191.May 7 2014, 2:29 PM

Fixed a typo pointed by Rafael.

rafael accepted this revision.May 7 2014, 3:05 PM
rafael added a reviewer: rafael.

LGTM with a few small requests.

test/CodeGen/avx-builtins.c
118

Write the constant in hex, so it is easier to read.

57 is 0x39. So, the lower 4 bits are 1001. It would probably be test to use a non symmetrical constant in the test.

This revision is now accepted and ready to land.May 7 2014, 3:05 PM

I just realized that one advantage of expanding the _builtins in clang is that it would allow us to remove the redundant ones from llvm in the future. Can you leave a FIXME about it in one of the tests?

filcab closed this revision.May 12 2014, 7:44 PM