Page MenuHomePhabricator

[RFC] Patch for Bug 18808 - clang -O3 -flto drops Target feature flags
AbandonedPublic

Authored by suyog on Mar 26 2015, 3:41 AM.

Details

Summary

This is regarding bug 18808 where target feature was set at the front end, but not passed to the linker when using -flto flag (Gold linker)

The option string "-plugin-opt=-mattr=feature" was not appended in command generated for linker.

Approach for this patch :

In lib/tools/clang/lib/Driver/Tools.cpp file

  1. Add a new function getTargetFeatureList() to return the list of target features
  2. Modify the getTargetFeature() function to set target-feature separately in clang options
  3. In the AddGoldPlugin() function, retrieve the list of target features and append to "-plugin-opt=-mattr=". Multiple features are appended separated by comma.

No make-check regression observed with the patch.

Test case mentioned in 18808 :

$ cat test.c

#include <stdint.h>
#include <stdlib.h>

typedef uint64_t avxreg attribute((ext_vector_type(4)));

int main(int argc, char **argv) {
avxreg x=argc,y=argc;
for (int i=0; i<argc; i++) {

		int foo = atoi(argv[i]);
		x = y + (avxreg)foo;
		y = x;

}
return x.x;
}

Assembly generated before patch :

$ clang -O3 -flto -mavx2 test.c -o test
$ objdump -d test > 1.s
$ cat 1.s

00000000004005d0 <main>:

4005d0:       55                      push   %rbp
4005d1:       53                      push   %rbx
4005d2:       48 83 ec 28             sub    $0x28,%rsp
4005d6:       48 89 f3                mov    %rsi,%rbx
4005d9:       89 fd                   mov    %edi,%ebp
4005db:       48 63 c5                movslq %ebp,%rax
4005de:       66 48 0f 6e c0          movq   %rax,%xmm0
4005e3:       66 0f 70 c0 44          pshufd $0x44,%xmm0,%xmm0
4005e8:       85 c0                   test   %eax,%eax
4005ea:       7e 51                   jle    40063d <main+0x6d>
4005ec:       66 0f 6f d0             movdqa %xmm0,%xmm2
4005f0:       66 0f 7f 14 24          movdqa %xmm2,(%rsp)
4005f5:       66 0f 7f 44 24 10       movdqa %xmm0,0x10(%rsp)
4005fb:       48 8b 3b                mov    (%rbx),%rdi
4005fe:       31 f6                   xor    %esi,%esi
400600:       ba 0a 00 00 00          mov    $0xa,%edx
400605:       e8 c6 fe ff ff          callq  4004d0 <strtol@plt>
40060a:       66 0f 6f 14 24          movdqa (%rsp),%xmm2
40060f:       48 98                   cltq
400611:       66 48 0f 6e c0          movq   %rax,%xmm0
400616:       66 0f 70 c0 44          pshufd $0x44,%xmm0,%xmm0
40061b:       66 0f d4 d0             paddq  %xmm0,%xmm2
40061f:       66 0f 6f 4c 24 10       movdqa 0x10(%rsp),%xmm1
400625:       66 0f d4 c8             paddq  %xmm0,%xmm1
400629:       66 0f 7f 4c 24 10       movdqa %xmm1,0x10(%rsp)
40062f:       66 0f 6f 44 24 10       movdqa 0x10(%rsp),%xmm0
400635:       48 83 c3 08             add    $0x8,%rbx
400639:       ff cd                   dec    %ebp
40063b:       75 b3                   jne    4005f0 <main+0x20>
40063d:       66 48 0f 7e c0          movq   %xmm0,%rax
400642:       48 83 c4 28             add    $0x28,%rsp
400646:       5b                      pop    %rbx
400647:       5d                      pop    %rbp
400648:       c3                      retq
400649:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Assembly generated after patch :

$ clang -O3 -flto -mavx2 test.c -o test
$ objdump -d test > 2.s
$ cat 2.s

0000000004005d0 <main>:

4005d0:       55                      push   %rbp
4005d1:       53                      push   %rbx
4005d2:       48 83 ec 28             sub    $0x28,%rsp
4005d6:       48 89 f3                mov    %rsi,%rbx
4005d9:       89 fd                   mov    %edi,%ebp
4005db:       48 63 c5                movslq %ebp,%rax
4005de:       c4 e1 f9 6e c0          vmovq  %rax,%xmm0
4005e3:       c4 e2 7d 19 c0          vbroadcastsd %xmm0,%ymm0
4005e8:       85 c0                   test   %eax,%eax
4005ea:       7e 42                   jle    40062e <main+0x5e>
4005ec:       0f 1f 40 00             nopl   0x0(%rax)
4005f0:       c5 fc 11 04 24          vmovups %ymm0,(%rsp)
4005f5:       48 8b 3b                mov    (%rbx),%rdi
4005f8:       31 f6                   xor    %esi,%esi
4005fa:       ba 0a 00 00 00          mov    $0xa,%edx
4005ff:       c5 f8 77                vzeroupper
400602:       e8 c9 fe ff ff          callq  4004d0 <strtol@plt>
400607:       48 98                   cltq
400609:       c4 e1 f9 6e c0          vmovq  %rax,%xmm0
40060e:       c4 e2 7d 59 c0          vpbroadcastq %xmm0,%ymm0
400613:       c5 fe 6f 0c 24          vmovdqu (%rsp),%ymm1
400618:       c5 fd d4 c9             vpaddq %ymm1,%ymm0,%ymm1
40061c:       c5 fe 7f 0c 24          vmovdqu %ymm1,(%rsp)
400621:       c5 fc 10 04 24          vmovups (%rsp),%ymm0
400626:       48 83 c3 08             add    $0x8,%rbx
40062a:       ff cd                   dec    %ebp
40062c:       75 c2                   jne    4005f0 <main+0x20>
40062e:       c4 e1 f9 7e c0          vmovq  %xmm0,%rax
400633:       48 83 c4 28             add    $0x28,%rsp
400637:       5b                      pop    %rbx
400638:       5d                      pop    %rbp
400639:       c5 f8 77                vzeroupper
40063c:       c3                      retq
40063d:       0f 1f 00                nopl   (%rax)

Visible clearly from above that avx instructions are generated after applying the patch.

Can some one please help in reviewing this patch?
Also, can someone help in forming a test case for this patch? I couldn't figure out how to write test case for checking the generation of avx instructions.

Regards,
Suyog

Diff Detail

Repository
rL LLVM

Event Timeline

suyog updated this revision to Diff 22705.Mar 26 2015, 3:41 AM
suyog retitled this revision from to [RFC] Patch for Bug 18808 - clang -O3 -flto drops Target feature flags.
suyog updated this object.
suyog edited the test plan for this revision. (Show Details)
suyog added reviewers: void, dexonsmith, samsonov, rafael.
suyog set the repository for this revision to rL LLVM.
suyog added a subscriber: Unknown Object (MLST).

Eric - this sounds related to your recent work. I assume we should be
producing the same asm, but not by passing any flags to the linker, instead
by using function attributes?

Very much so. Right now, this works without doing this:

dzur:~/tmp> cat bar.c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t avxreg attribute((ext_vector_type(4)));

int main(int argc, char **argv) {

avxreg x=argc,y=argc;
for (int i=0; i<argc; i++) {
        int foo = atoi(argv[i]);
        x = y + (avxreg)foo;
        y = x;
}
return x.x;

}
dzur:~/tmp> ~/builds/build-llvm/Debug+Asserts/bin/clang -O3 -flto -mavx2
bar.c
dzur:~/tmp> llvm-objdump -disassemble a.out | more

main:

400520:       55      pushq   %rbp
400521:       53      pushq   %rbx
400522:       48 83 ec 28     subq    $40, %rsp
400526:       48 89 f3        movq    %rsi, %rbx
400529:       89 fd   movl    %edi, %ebp
40052b:       48 63 c5        movslq  %ebp, %rax
40052e:       c4 e1 f9 6e c0  vmovq   %rax, %xmm0
400533:       c4 e2 7d 19 c0  vbroadcastsd    %xmm0, %ymm0
400538:       85 c0   testl   %eax, %eax
40053a:       7e 42   jle     66
40053c:       0f 1f 40 00     nopl    (%rax)
400540:       c5 fc 11 04 24  vmovups %ymm0, (%rsp)
400545:       48 8b 3b        movq    (%rbx), %rdi
400548:       31 f6   xorl    %esi, %esi
40054a:       ba 0a 00 00 00  movl    $10, %edx
40054f:       c5 f8 77        vzeroupper
400552:       e8 b9 fe ff ff  callq   -327
400557:       48 98   cltq
400559:       c4 e1 f9 6e c0  vmovq   %rax, %xmm0
40055e:       c4 e2 7d 59 c0  vpbroadcastq    %xmm0, %ymm0
400563:       c5 fe 6f 0c 24  vmovdqu (%rsp), %ymm1
400568:       c5 fd d4 c9     vpaddq  %ymm1, %ymm0, %ymm1
40056c:       c5 fe 7f 0c 24  vmovdqu %ymm1, (%rsp)
400571:       c5 fc 10 04 24  vmovups (%rsp), %ymm0
400576:       48 83 c3 08     addq    $8, %rbx
40057a:       ff cd   decl    %ebp
40057c:       75 c2   jne     -62
40057e:       c4 e1 f9 7e c0  vmovq   %xmm0, %rax
400583:       48 83 c4 28     addq    $40, %rsp
400587:       5b      popq    %rbx
400588:       5d      popq    %rbp
400589:       c5 f8 77        vzeroupper
40058c:       c3      retq
40058d:       0f 1f 00        nopl    (%rax)

which should be precisely what's expected.

If you look at the IR generated from clang:

dzur:~/tmp> ~/builds/build-llvm/Debug+Asserts/bin/clang -O3 -flto -mavx2
bar.c -S -o -
; ModuleID = 'bar.c'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** nocapture readonly %argv) #0 {
entry:

%conv = sext i32 %argc to i64
%splat.splatinsert = insertelement <4 x i64> undef, i64 %conv, i32 0
%splat.splat = shufflevector <4 x i64> %splat.splatinsert, <4 x i64>

undef, <4 x i32> zeroinitializer

%cmp14 = icmp sgt i32 %argc, 0
br i1 %cmp14, label %for.body.preheader, label %for.end

for.body.preheader: ; preds = %entry

br label %for.body

for.body: ; preds =
%for.body.preheader, %for.body

%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0,

%for.body.preheader ]

%y.015 = phi <4 x i64> [ %add, %for.body ], [ %splat.splat,

%for.body.preheader ]

%arrayidx = getelementptr inbounds i8*, i8** %argv, i64 %indvars.iv
%0 = load i8*, i8** %arrayidx, align 8, !tbaa !1
%call.i = tail call i64 @strtol(i8* nocapture %0, i8** null, i32 10) #2
%sext = shl i64 %call.i, 32
%conv5 = ashr exact i64 %sext, 32
%splat.splatinsert6 = insertelement <4 x i64> undef, i64 %conv5, i32 0
%splat.splat7 = shufflevector <4 x i64> %splat.splatinsert6, <4 x i64>

undef, <4 x i32> zeroinitializer

%add = add <4 x i64> %splat.splat7, %y.015
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv.next to i32
%exitcond = icmp eq i32 %lftr.wideiv, %argc
br i1 %exitcond, label %for.end.loopexit, label %for.body

for.end.loopexit: ; preds = %for.body

%add.lcssa = phi <4 x i64> [ %add, %for.body ]
br label %for.end

for.end: ; preds =
%for.end.loopexit, %entry

%y.0.lcssa = phi <4 x i64> [ %splat.splat, %entry ], [ %add.lcssa,

%for.end.loopexit ]

%1 = extractelement <4 x i64> %y.0.lcssa, i64 0
%conv8 = trunc i64 %1 to i32
ret i32 %conv8

}

; Function Attrs: nounwind
declare i64 @strtol(i8* readonly, i8** nocapture, i32) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false"
"no-frame-pointer-elim"="false" "no-infs-fp-math"="false"
"no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
"target-cpu"="x86-64"
"target-features"="+sse4.2,+avx2,+ssse3,+sse3,+sse,+sse2,+sse4.1,+avx,+popcnt"
"unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind "less-precise-fpmad"="false"
"no-frame-pointer-elim"="false" "no-infs-fp-math"="false"
"no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
"target-cpu"="x86-64"
"target-features"="+sse4.2,+avx2,+ssse3,+sse3,+sse,+sse2,+sse4.1,+avx,+popcnt"
"unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.7.0 (trunk 233238) (llvm/trunk 233240)"}
!1 = !{!2, !2, i64 0}
!2 = !{!"any pointer", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}

You'll see that the target-features for the function are set to the
appropriate level of avx.

I'll take a look at the PR to make sure we can just close it.

-eric

I'll take a look at the PR to make sure we can just close it.

So the PR was also about adding options at LTO time to fix code generation.
I don't think this is the right way to do it, I think I'd prefer to use the
clang machinery rather than -mattr etc. (Which you can do via command line
option if the gold plugin calls cl::ParseCommandLineOptions I think...)

-eric

-eric

suyog added a comment.Mar 30 2015, 3:54 AM

Hi David, Eric,

Thanks a lot for looking into this.

Was this a recent commit? If so, can you point me to the CL (unable to exactly find)?

By the way, i am also getting the same output as Eric with latest revision.

Thanks a lot again. :)

Regards,
Suyog

The original set of patches was clang r232888 and llvm r232885. The clang
one was reverted and reapplied after fixes for sse4 in r233227.

-eric

suyog abandoned this revision.Mar 30 2015, 11:36 AM

Thanks Eric for pointing out the CL's.

Abandoning this revision.