This change optimizes the assign() methods for string where either the contents or lengths are compile time known constants. For small strings (< min_cap) we can execute the assignment entirely inline. For strings up to 128 bytes we allow the compiler to efficiently inline the copy operation after we call the offline __resize<>() method. Short / long branches are taken at the call site for better branch prediction and allowing FDO optimizations.
Benchmarks (unstable / google perflab):
name old time/op new time/op delta BM_StringAssignAsciiz_Empty_Opaque 5.69ns ± 7% 5.97ns ± 7% ~ (p=0.056 n=5+5) BM_StringAssignAsciiz_Empty_Transparent 5.39ns ± 7% 0.79ns ± 8% -85.36% (p=0.008 n=5+5) BM_StringAssignAsciiz_Small_Opaque 11.2ns ± 5% 11.0ns ± 6% ~ (p=0.548 n=5+5) BM_StringAssignAsciiz_Small_Transparent 10.1ns ± 7% 1.0ns ± 8% -89.76% (p=0.008 n=5+5) BM_StringAssignAsciiz_Large_Opaque 23.5ns ± 7% 23.8ns ± 7% ~ (p=0.841 n=5+5) BM_StringAssignAsciiz_Large_Transparent 21.4ns ± 7% 12.7ns ± 7% -40.83% (p=0.008 n=5+5) BM_StringAssignAsciiz_Huge_Opaque 336ns ± 4% 327ns ± 7% ~ (p=0.421 n=5+5) BM_StringAssignAsciiz_Huge_Transparent 331ns ± 5% 324ns ± 7% ~ (p=0.548 n=5+5) BM_StringAssignAsciizMix_Opaque 13.6ns ±10% 13.7ns ± 9% ~ (p=0.690 n=5+5) BM_StringAssignAsciizMix_Transparent 12.9ns ± 8% 3.6ns ± 8% -71.82% (p=0.008 n=5+5)
It used to be __n <= ..., now it's __n < ... -- what am I missing?