This is an archive of the discontinued LLVM Phabricator instance.

[IR] Make {extract,insert}element accept an index of any integer type.
ClosedPublic

Authored by Bigcheese on Apr 25 2014, 7:17 PM.

Details

Summary

Given the following C code llvm currently generates suboptimal code for
x86-64:

m128 bss4( const m128 *ptr, size_t i, size_t j )
{

float f = ptr[i][j];
return (__m128) { f, f, f, f };

}

define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float>* nocapture readonly %ptr, i64 %i, i64 %j) #0 {

%a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i
%a2 = load <4 x float>* %a1, align 16, !tbaa !1
%a3 = trunc i64 %j to i32
%a4 = extractelement <4 x float> %a2, i32 %a3
%a5 = insertelement <4 x float> undef, float %a4, i32 0
%a6 = insertelement <4 x float> %a5, float %a4, i32 1
%a7 = insertelement <4 x float> %a6, float %a4, i32 2
%a8 = insertelement <4 x float> %a7, float %a4, i32 3
ret <4 x float> %a8

}

shlq    $4, %rsi
addq    %rdi, %rsi
movslq  %edx, %rax
vbroadcastss    (%rsi,%rax,4), %xmm0
retq

The movslq is uneeded, but is present because of the trunc to i32 and then
sext back to i64 that the backend adds for vbroadcastss.

We can't remove it because it changes the meaning. The IR that clang
generates is already suboptimal. What clang really should emit is:

%a4 = extractelement <4 x float> %a2, i64 %j

This patch makes that legal. A separate patch will teach clang to do it.

Diff Detail

Event Timeline

Bigcheese updated this revision to Diff 8860.Apr 25 2014, 7:17 PM
Bigcheese retitled this revision from to [IR] Make {extract,insert}element accept an index of any integer type..
Bigcheese updated this object.
Bigcheese edited the test plan for this revision. (Show Details)
Bigcheese added a subscriber: Unknown Object (MLST).
Bigcheese updated this revision to Diff 8938.Apr 29 2014, 12:25 PM

Updated serialization.

I audited all uses of {Insert,Extract}ElementInst and nothing makes assumptions about the type other than the bitcode reader/writer. There are lots of cases of generating a constant index, and these always use i32, but this is fine.

There are no changes needed to selection dag because it is lowered with:

SDValue InIdx = DAG.getSExtOrTrunc(getValue(I.getOperand(1)),
                                   getCurSDLoc(), TLI.getVectorIdxTy());

So the rest of the backend already sees the type it is expecting.

Bigcheese accepted this revision.May 1 2014, 3:22 PM
Bigcheese added a reviewer: Bigcheese.
This revision is now accepted and ready to land.May 1 2014, 3:22 PM
Bigcheese closed this revision.May 1 2014, 3:23 PM