basic_string's constructor calls init which was not getting inlined.
This prevented optimization of const string as  init would appear as a call in between a string's def and use.
Worked in collaboration with Aditya Kumar.
 Differential  D22782  
Added 'inline' attribute to __init to inline the basic_string's constructor Authored by laxmansole on Jul 25 2016, 2:41 PM. 
Details 
 basic_string's constructor calls init which was not getting inlined. Worked in collaboration with Aditya Kumar. 
Diff Detail 
 Event TimelineComment Actions I think we can write a testcase that shows that copy constructors are not optimized away unless the string constructor is inlined. This patch fixes the performance of a proprietary benchmark when compiled with libc++, closing the performance gap with the same benchmark compiled with the gnu libstdc++. Overall with this patch we have half a billion fewer instructions out of about 10 billion. Comment Actions We are looking at the libcxx testsuite, it seems there are only correctness/functionality tests. Do you have any pointers on how to add tests to check if functions are inlined. In llvm it is easy to do because we can print the assembly and use FileCheck/grep to CHECK/CHECK-NOT if a string is present. Comment Actions $ cat foo.cpp int foo(const std::string name);
int bar(){
    return foo("bar");
}$clang++ -S -O3 -fno-exceptions foo.cpp Assembly output without patch: _Z3barv:                                // @_Z3barv
// BB#0:                                // %entry
    sub sp, sp, #64             // =64
    adrp    x1, .L.str
    add x1, x1, :lo12:.L.str
    add x0, sp, #8              // =8
    orr w2, wzr, #0x3
    stp xzr, x19, [sp, #24]     // 8-byte Folded Spill
    stp x29, x30, [sp, #48]     // 8-byte Folded Spill
    add x29, sp, #48            // =48
    stp xzr, xzr, [sp, #8]
    bl  _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm
    add x0, sp, #8              // =8
    bl  _Z3fooNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
    and w19, w0, #0x1
    add x0, sp, #8              // =8
    bl  _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev
    mov  w0, w19
    ldp x29, x30, [sp, #48]     // 8-byte Folded Reload
    ldr x19, [sp, #32]          // 8-byte Folded Reload
    add sp, sp, #64             // =64
    retAssembly output with patch: _Z3barv:                                // @_Z3barv
// BB#0:                                // %entry
    sub sp, sp, #64             // =64
    orr w8, wzr, #0x6
    mov w9, #114
    mov w10, #24930
    add x0, sp, #8              // =8
    stp xzr, x19, [sp, #24]     // 8-byte Folded Spill
    stp x29, x30, [sp, #48]     // 8-byte Folded Spill
    add x29, sp, #48            // =48
    stp xzr, xzr, [sp, #8]
    strb    w8, [sp, #8]
    strb    w9, [sp, #11]
    sturh   w10, [sp, #9]
    strb    wzr, [sp, #12]
    bl  _Z3fooNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
    and w19, w0, #0x1
    add x0, sp, #8              // =8
    bl  _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev
    mov  w0, w19
    ldp x29, x30, [sp, #48]     // 8-byte Folded Reload
    ldr x19, [sp, #32]          // 8-byte Folded Reload
    add sp, sp, #64             // =64
    retWe can see that call to _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm get inlined. Comment Actions The change itself LGTM, although we probably want to inline the forward/input iterator __init's as well. However I would like to see a small benchmark that demonstrates the performance change. Please try and write the benchmark using Google Benchmark. Comment Actions Sure, Thanks, Comment Actions Added inline attribute to the forward/input iterator __init's.  Comment Actions I would love to see a benchmark with this, but I've done enough investigating on my own that I *know* this patch is beneficial. Comment Actions This patch was motivated by perf analysis we did on a proprietary benchmark in which we have seen a reduction of about 1 billion instructions (out of 10B) on x86_64-linux and aarch64-linux.  |