This change uses __has_builtin_constant_p, if supported, to check for short inputs whose length is known at compile time, and directly inlines the assignment in that case. The compiler will elide the outlined or inlined code as needed.
The outlined code paths have been moved into __assign_external(...). Combined with the ABI changes for the unstable ABI (inline assign, outline assign_external), this results in considerable speedups for the Transparent benchmarks with empty and small inputs; the Opaque, Large and Huge cases are unchanged within noise.
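The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the libc++ implementation: TinyString, kInlineCap, SKETCH_IS_CONSTANT and SKETCH_NOINLINE are hypothetical names, the buffer layout is invented, and the slow path truncates where a real string would allocate. It assumes a GCC or Clang style __builtin_constant_p and falls back to always taking the outlined path when the builtin is unavailable.

```cpp
#include <cstddef>
#include <cstring>

// Detect the builtin where possible; otherwise never take the inline path.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_constant_p)
#    define SKETCH_IS_CONSTANT(x) __builtin_constant_p(x)
#  endif
#endif
#ifndef SKETCH_IS_CONSTANT
#  define SKETCH_IS_CONSTANT(x) false
#endif

#if defined(__GNUC__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#else
#  define SKETCH_NOINLINE
#endif

// Hypothetical stand-in for a string with an inline buffer only.
struct TinyString {
    static constexpr std::size_t kInlineCap = 23;  // assumed capacity, incl. NUL
    char buf[kInlineCap] = {};
    std::size_t len = 0;

    // Outlined general path. A real string would switch to heap storage
    // here; this sketch simply truncates to keep the example small.
    SKETCH_NOINLINE void assign_external(const char* s, std::size_t n) {
        if (n >= kInlineCap) n = kInlineCap - 1;
        std::memcpy(buf, s, n);
        buf[n] = '\0';
        len = n;
    }

    // Inline entry point: when the optimizer can prove the length is a
    // small constant, the copy below folds into a few stores and the
    // call to assign_external is dropped; otherwise the reverse happens.
    inline void assign(const char* s) {
        const std::size_t n = std::strlen(s);
        if (SKETCH_IS_CONSTANT(n) && n < kInlineCap) {
            std::memcpy(buf, s, n);
            buf[n] = '\0';
            len = n;
        } else {
            assign_external(s, n);
        }
    }
};
```

Both branches produce the same result, so which one runs is purely an optimization decision. Without optimization the builtin typically evaluates to false and every call goes through assign_external.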
Before
-----------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_StringAssignAsciiz_Empty_Opaque           6.83 ns         6.83 ns    100196352
BM_StringAssignAsciiz_Empty_Transparent      6.57 ns         6.58 ns    105873408
BM_StringAssignAsciiz_Small_Opaque           8.47 ns         8.47 ns     82526208
BM_StringAssignAsciiz_Small_Transparent      8.25 ns         8.25 ns     84996096
BM_StringAssignAsciiz_Large_Opaque           19.0 ns         19.0 ns     36720640
BM_StringAssignAsciiz_Large_Transparent      18.5 ns         18.5 ns     37986304
BM_StringAssignAsciiz_Huge_Opaque            1683 ns         1683 ns       401408
BM_StringAssignAsciiz_Huge_Transparent       1680 ns         1679 ns       397312
BM_StringAssignAsciizMix_Opaque              11.0 ns         11.0 ns     63692800
BM_StringAssignAsciizMix_Transparent         11.2 ns         11.1 ns     62808064
After with ABI change
-----------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_StringAssignAsciiz_Empty_Opaque           6.74 ns         6.75 ns    100999168
BM_StringAssignAsciiz_Empty_Transparent     0.863 ns        0.864 ns    809365504
BM_StringAssignAsciiz_Small_Opaque           8.53 ns         8.54 ns     81907712
BM_StringAssignAsciiz_Small_Transparent      1.15 ns         1.15 ns    605679616
BM_StringAssignAsciiz_Large_Opaque           18.9 ns         18.9 ns     37072896
BM_StringAssignAsciiz_Large_Transparent      18.6 ns         18.6 ns     37535744
BM_StringAssignAsciiz_Huge_Opaque            1687 ns         1687 ns       405504
BM_StringAssignAsciiz_Huge_Transparent       1690 ns         1690 ns       409600
BM_StringAssignAsciizMix_Opaque              10.9 ns         10.9 ns     64319488
BM_StringAssignAsciizMix_Transparent         5.31 ns         5.31 ns    131194880
Inlined code for assign empty string
void AssignEmpty(std::string* p) {
  p->assign("");
}

AssignEmpty:
        test    byte ptr [rdi], 1
        jne     .LBB1_2
        mov     byte ptr [rdi], 0
        add     rdi, 1
        mov     byte ptr [rdi], 0
        ret
.LBB1_2:
        mov     rax, qword ptr [rdi + 16]
        mov     qword ptr [rdi + 8], 0
        mov     byte ptr [rax], 0
        ret
Inlined code for assign short string
void Assign(std::string* p) {
  p->assign("Hello world");
}

Assign:
        test    byte ptr [rdi], 1
        jne     .LBB0_2
        mov     byte ptr [rdi], 22
        add     rdi, 1
        jmp     .LBB0_3
.LBB0_2:
        mov     rax, qword ptr [rdi + 16]
        mov     qword ptr [rdi + 8], 11
        mov     rdi, rax
.LBB0_3:
        movabs  rax, 8031924123371070792
        mov     qword ptr [rdi], rax
        mov     dword ptr [rdi + 7], 1684828783
        mov     byte ptr [rdi + 11], 0
        ret
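The immediates in the short-string listing can be checked by hand. The sketch below replays the three stores after .LBB0_3 into a local buffer, assuming a little-endian host as on x86-64: the 8-byte store writes "Hello wo", the 4-byte store at offset 7 overlaps the last of those bytes and writes "orld", and the final store adds the terminator. The function names are hypothetical. The value 22 written on the short path equals 11 << 1, which is consistent with the length sitting above the flag bit checked by `test byte ptr [rdi], 1`; that reading of the layout is an inference from the listing, not something it states.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Replays the stores from the "Hello world" listing into a scratch buffer.
// Assumes a little-endian host; on a big-endian host the bytes would land
// in the opposite order and the result would differ.
std::string replay_short_assign_stores() {
    char buf[12];
    std::memset(buf, 0x7f, sizeof(buf));  // poison so a missing store is visible

    const std::uint64_t qword = 8031924123371070792ULL;  // movabs rax, ...
    const std::uint32_t dword = 1684828783U;             // mov dword ptr [rdi + 7], ...

    std::memcpy(buf, &qword, sizeof(qword));      // bytes 0..7:  "Hello wo"
    std::memcpy(buf + 7, &dword, sizeof(dword));  // bytes 7..10: "orld" (byte 7 rewritten)
    buf[11] = '\0';                               // mov byte ptr [rdi + 11], 0

    return std::string(buf);
}

// Size byte as it appears on the short path of the listing:
// 22 for an 11-character string, 0 for the empty string.
constexpr unsigned char short_size_byte(unsigned len) {
    return static_cast<unsigned char>(len << 1);
}
```

Two overlapping stores cover the 11 characters with one 8-byte and one 4-byte write instead of an 8 + 2 + 1 sequence, which is why the second store starts at offset 7 rather than 8.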