Fold strcmp for short string literals
https://github.com/llvm/llvm-project/issues/58003
Depends D155053.
Details
Diff Detail
Event Timeline
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
576 | This will return an incorrect result if the first character of Str1 and Str2 is the same, but Str1 has length greater than one. | |
581 | This may perform an out of bounds access for zero-length strings. | |
630 | This is the transform that implements what you're trying to do, but taking dereferenceability into account. |
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
630 | https://github.com/llvm/llvm-project/issues/58003 is requesting that we generate control flow for small strings. For example, transforming strcmp(c, "x") == 0 to c[0] == 'x' && c[1] == '\0'. This is distinct from transforming it to a memcmp. (Of course, the current version of this patch doesn't implement that.) |
For such program.
#include <cstring> bool testFunction1_1(const char* c) { return strcmp(c, "0") == 0; }
GCC emits:
.type _Z15testFunction1_1PKc, @function _Z15testFunction1_1PKc: .LFB27: .cfi_startproc endbr64 movzbl (%rdi), %eax subl $48, %eax jne .L2 movzbl 1(%rdi), %eax .L2: testl %eax, %eax sete %al ret .cfi_endproc .LFE27: .size _Z15testFunction1_1PKc, .-_Z15testFunction1_1PKc .p2align 4 .globl _Z15testFunction1_2PKc .type _Z15testFunction1_2PKc, @function
After changes clang emits:
.type _Z15testFunction1_1PKc,@function _Z15testFunction1_1PKc: # @_Z15testFunction1_1PKc .cfi_startproc # %bb.0: # %entry movsbl (%rdi), %eax addl $-48, %eax je .LBB0_1 # %bb.2: # %strcmp_fold_sub_join testl %eax, %eax sete %al retq .LBB0_1: # %strcmp_fold_sub_is_zero movsbl 1(%rdi), %eax testl %eax, %eax sete %al retq .Lfunc_end0: .size _Z15testFunction1_1PKc, .Lfunc_end0-_Z15testFunction1_1PKc .cfi_endproc
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
630 | I updated implementation for single character str2 constant. If current approach is valid I will implement it for two characters str2 constant as well. |
efriedma:
(instcombine specifically won't work; for the sake of compile-time performance, we've forbidden instcombine from changing the cfg.)
Your patch changes it, indeed. (SLC is part of InstCombine).
Right, InstCombine is control-flow preserving transform. We don't really have a great place to put such a transform right now, so SimplifyCFG is probably your best bet.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp | ||
---|---|---|
567 | You need to guard this behind isOnlyUsedInComparisonWithZero(), as the exact return value of strcmp() is unspecified (only its relation to zero is). |
I checked SimplifyCFG and if I decide to put transform into this pass, it seems that implementation will look complex and unexpected. It will require to get TargetLibraryInfo in SimplifyCFGPass and pass it to simplifyCFG function in Local.h, but this function is called from a lot of places, and it will be unexpected if this function can optimize library calls.
There is PartiallyInlineLibCallsPass that currently optimizes sqrt function and changes CFG, but it is registered in CodeGenPassBuilder and applied during codegen, but I expect that our optimization need to be registered in PassBuilderPipelines and applied during common IR optimizations.
Can we move this PartiallyInlineLibCallsPass to common IR optimizations, so we can extend it ? Or maybe it is better to create separate pass ?
Maybe AggressiveInstCombine? It currently doesn't contain any CFG modifications, I think, but I don't see any reason they can't be added.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
908 ↗ | (On Diff #539144) | more like expand? |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
920 ↗ | (On Diff #539144) | GCC I think “inlines “ strcmp up to some length, but not strictly 1, no? Also it can inline (expand) strcmp calls if we know that src/dst are dereferenceable with known size. |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
920 ↗ | (On Diff #539144) | It expands strcmp if size is 1 or 2 https://godbolt.org/z/jhbToh5Kb. |
It would be great to have reusable general algo with N as parameter.
Then it can be used also some memcmp calls…
Generally on board with the direction of doing this in AggressiveInstCombine.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
884 ↗ | (On Diff #539475) | Aren't these conditions already checked by the caller now? |
920 ↗ | (On Diff #539475) | // instead of ///. |
928 ↗ | (On Diff #539475) | We should also handle constant string on the LHS. |
942 ↗ | (On Diff #539475) | As mentioned on the previous review, I believe you need to require that the strcmp result is only used in a comparison with zero. The exact return value of strcmp is unspecified, and we should not optimize to a result that deviates from the used libc implementation. |
949 ↗ | (On Diff #539475) | Why the use of sign extension? The C standard says the comparison is performed on unsigned char, not signed char:
is determined by the sign of the difference between the values of the first pair of characters (both |
992 ↗ | (On Diff #539475) | Why is this relevant? |
1003 ↗ | (On Diff #539475) | Why is this relevant? |
1008 ↗ | (On Diff #539475) | This is only used by expandStrcmp(), I would sink it into there. |
llvm/test/Transforms/AggressiveInstCombine/strcmp.ll | ||
2 ↗ | (On Diff #539475) | TODO (If you write an extra comment, put it below RUN, not above.) |
51 ↗ | (On Diff #539475) | Should also test:
|
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
949 ↗ | (On Diff #539475) | I cast this char to int32 so I can later sub them as signed integers. This looks simillar to standard glibc implementation https://github.com/zerovm/glibc/blob/master/string/strcmp.c#L45C15-L45C15. If we take this standard implementation and look and the IR it will be similar to mine https://godbolt.org/z/Pd9xKcnE8. |
992 ↗ | (On Diff #539475) | I take it from SimplifyLibCalls.cpp. Not sure if this is safe to ignore it for our transformation. // Then check for known library functions. if (TLI->getLibFunc(*Callee, Func) && isLibFuncEmittable(M, TLI, Func)) { // We never change the calling convention. if (!ignoreCallingConv(Func) && !IsCallingConvC) return nullptr; if (Value *V = optimizeStringMemoryLibCall(CI, Builder)) return V; |
1003 ↗ | (On Diff #539475) | I take it from InstCombineCalls.cpp. Not sure if this is safe to ignore it for our transformation. // Skip optimizing notail and musttail calls so // LibCallSimplifier::optimizeCall doesn't have to preserve those invariants. // LibCallSimplifier::optimizeCall should try to preseve tail calls though. if (CI->isMustTailCall() || CI->isNoTailCall()) return nullptr; |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
959 ↗ | (On Diff #540021) | Rather than hardcoding B.getInt32Ty(), it is better to use the original type (CI->getType()). There are platforms where int is not 32-bit. |
949 ↗ | (On Diff #539475) | You can do the subtraction as you do now, but the characters need to be zero-extended. Otherwise you will claim that '\x7f' is larger than '\x80', which is not correct according to strcmp semantics. |
992 ↗ | (On Diff #539475) | We're removing the call entirely, so calling convention doesn't matter. |
1003 ↗ | (On Diff #539475) | Same here. This doesn't matter as the call is removed entirely. |
llvm/test/Transforms/AggressiveInstCombine/strcmp.ll | ||
25 ↗ | (On Diff #540021) | @expand_strcmp_eq_s1 |
47 ↗ | (On Diff #540021) | @expand_strcmp_eq_s1_commuted |
69 ↗ | (On Diff #540021) | @expand_strcmp_ne_s1 And so on, to get at least a slightly meaningful test name, rather than just a counter. |
108 ↗ | (On Diff #540021) | It's okay to have just one test with the constant on the left-hand-side. No need to test LHS and RHS for all predicates. |
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
269 ↗ | (On Diff #541567) | I am not sure about the requested change. You suggest to add max_iteration_count argument to isOnlyUsedInZeroComparison function, and relace all_of I->users() iteration with loop, so we can limit iteration count ? Can you please provide more information about the requested change. |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
907 ↗ | (On Diff #541567) | Do you propose to extract most of the changes from AggressiveInstCombine, that are not related to expandStrcmp in separate patch ? |
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
907 ↗ | (On Diff #541567) | Yeah, but this has already been approved, so no need to make it a blocker. |
I do not have commit access. Can you please land this one for me Maksim Kita <kitaetoya@gmail.com> ?
Done! Please follow https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access for requesting commit access.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp | ||
---|---|---|
907 ↗ | (On Diff #541567) | Yeah, this would have been a bit nicer, to make it clear that this doesn't really change anything about the sqrt fold, just moves code around. |
This is causing a mis-compilation. Repro:
#include <string.h> #define TOKMAXLEN 10 typedef struct { char token[TOKMAXLEN + 1]; /* always NUL-terminated */ } datetkn; constexpr datetkn TimezoneAbbrevTable[] = {{"z"}}; static const int TimezoneAbbrevTableSize = sizeof TimezoneAbbrevTable / sizeof TimezoneAbbrevTable[0]; int main() { const char* token = ""; for (int i = 0; i < TimezoneAbbrevTableSize; ++i) { const datetkn& timezone_abbrev = TimezoneAbbrevTable[i]; if (strcmp("z", token) <= 0) { return 10; } token = timezone_abbrev.token; } return 0; }
cmd: clang -O2 ./prog.cc -o ./prog.o
I believe the problem is that if the constant string is on the LHS, we also need to invert the greater/smaller results. The current code would be correct for equality comparisons, but not for relational comparisons.
I'll prepare a revert in case @kitaisreal doesn't respond soon.
Here a slightly more convenient reproducer (dumped IR right before the AggressiveInstCombine pass): https://gcc.godbolt.org/z/bGn3EnWG6
The commit can't be reverted cleanly, I'll have to also revert dependent commits: cbfcf90152de5392a36d0a0241eef25f5e159eef and 5dde755188e34c0ba5304365612904476c8adfda.
Prepared a revert of the three commits here: https://reviews.llvm.org/D157430. I'm running tests now and will land the revert once they finish, unless I hear any objections.
You need to guard this behind isOnlyUsedInComparisonWithZero(), as the exact return value of strcmp() is unspecified (only its relation to zero is).