This is an archive of the discontinued LLVM Phabricator instance.

Added 'inline' attribute to __init to inline the basic_string's constructor
ClosedPublic

Authored by laxmansole on Jul 25 2016, 2:41 PM.

Details

Summary

basic_string's constructor calls init which was not getting inlined.
This prevented optimization of const string as init would appear as a call in between a string's def and use.

Worked in collaboration with Aditya Kumar.

Diff Detail

Repository
rL LLVM

Event Timeline

laxmansole updated this revision to Diff 65424.Jul 25 2016, 2:41 PM
laxmansole retitled this revision from to Added 'inline' attribute to __init to inline the basic_string's constructor.
laxmansole updated this object.
laxmansole updated this object.
laxmansole added subscribers: hiraditya, sebpop, laxmansole and 3 others.
laxmansole updated this object.Jul 25 2016, 2:44 PM
laxmansole updated this object.
laxmansole updated this object.
majnemer edited subscribers, added: cfe-commits; removed: llvm-commits.Jul 25 2016, 2:49 PM
mclow.lists edited edge metadata.Jul 25 2016, 3:13 PM

Do we have a test for the problem that this is solving?

Do we have a test for the problem that this is solving?

I think we can write a testcase that shows that copy constructors are not optimized away unless the string constructor is inlined.

This patch fixes the performance of a proprietary benchmark when compiled with libc++, closing the performance gap with the same benchmark compiled with the gnu libstdc++. Overall with this patch we have half a billion fewer instructions out of about 10 billion.

Do we have a test for the problem that this is solving?

We are looking at the libcxx testsuite, it seems there are only correctness/functionality tests. Do you have any pointers on how to add tests to check if functions are inlined. In llvm it is easy to do because we can print the assembly and use FileCheck/grep to CHECK/CHECK-NOT if a string is present.

laxmansole added a comment.EditedJul 26 2016, 3:28 PM

Do we have a test for the problem that this is solving?

$ cat foo.cpp

int foo(const std::string name);
int bar(){
    return foo("bar");
}

$clang++ -S -O3 -fno-exceptions foo.cpp

Assembly output without patch:

_Z3barv:                                // @_Z3barv
// BB#0:                                // %entry
    sub sp, sp, #64             // =64
    adrp    x1, .L.str
    add x1, x1, :lo12:.L.str
    add x0, sp, #8              // =8
    orr w2, wzr, #0x3
    stp xzr, x19, [sp, #24]     // 8-byte Folded Spill
    stp x29, x30, [sp, #48]     // 8-byte Folded Spill
    add x29, sp, #48            // =48
    stp xzr, xzr, [sp, #8]
    bl  _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm
    add x0, sp, #8              // =8
    bl  _Z3fooNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
    and w19, w0, #0x1
    add x0, sp, #8              // =8
    bl  _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev
    mov  w0, w19
    ldp x29, x30, [sp, #48]     // 8-byte Folded Reload
    ldr x19, [sp, #32]          // 8-byte Folded Reload
    add sp, sp, #64             // =64
    ret

Assembly output with patch:

_Z3barv:                                // @_Z3barv
// BB#0:                                // %entry
    sub sp, sp, #64             // =64
    orr w8, wzr, #0x6
    mov w9, #114
    mov w10, #24930
    add x0, sp, #8              // =8
    stp xzr, x19, [sp, #24]     // 8-byte Folded Spill
    stp x29, x30, [sp, #48]     // 8-byte Folded Spill
    add x29, sp, #48            // =48
    stp xzr, xzr, [sp, #8]
    strb    w8, [sp, #8]
    strb    w9, [sp, #11]
    sturh   w10, [sp, #9]
    strb    wzr, [sp, #12]
    bl  _Z3fooNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
    and w19, w0, #0x1
    add x0, sp, #8              // =8
    bl  _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEED1Ev
    mov  w0, w19
    ldp x29, x30, [sp, #48]     // 8-byte Folded Reload
    ldr x19, [sp, #32]          // 8-byte Folded Reload
    add sp, sp, #64             // =64
    ret

We can see that call to

_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm

get inlined.

EricWF added a subscriber: EricWF.Aug 2 2016, 10:33 PM

The change itself LGTM, although we probably want to inline the forward/input iterator __init's as well.

However I would like to see a small benchmark that demonstrates the performance change. Please try and write the benchmark using Google Benchmark.
Some helpful links:

The change itself LGTM, although we probably want to inline the forward/input iterator __init's as well.

However I would like to see a small benchmark that demonstrates the performance change. Please try and write the benchmark using Google Benchmark.
Some helpful links:

Sure,
We'll come up with a synthetic benchmark to expose performance improvements.

Thanks,

laxmansole updated this revision to Diff 67597.Aug 10 2016, 1:59 PM
laxmansole edited edge metadata.

Added inline attribute to the forward/input iterator __init's.
Thanks @EricWF for suggestion.

EricWF accepted this revision.Aug 11 2016, 1:19 AM
EricWF added a reviewer: EricWF.

I would love to see a benchmark with this, but I've done enough investigating on my own that I *know* this patch is beneficial.

This revision is now accepted and ready to land.Aug 11 2016, 1:19 AM
This revision was automatically updated to reflect the committed changes.

I would love to see a benchmark with this, but I've done enough investigating on my own that I *know* this patch is beneficial.

This patch was motivated by perf analysis we did on a proprietary benchmark in which we have seen a reduction of about 1 billion instructions (out of 10B) on x86_64-linux and aarch64-linux.