This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstCombine] Fold strcmp for short string literals
ClosedPublic

Authored by kitaisreal on Jul 7 2023, 9:16 AM.

Diff Detail

Event Timeline

kitaisreal created this revision.Jul 7 2023, 9:16 AM
kitaisreal requested review of this revision.Jul 7 2023, 9:16 AM
nikic requested changes to this revision.Jul 7 2023, 12:25 PM
nikic added inline comments.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
576

This will return an incorrect result if the first character of Str1 and Str2 is the same, but Str1 has length greater than one.

581

This may perform an out of bounds access for zero-length strings.

630

This is the transform that implements what you're trying to do, but taking dereferenceability into account.

This revision now requires changes to proceed.Jul 7 2023, 12:25 PM
efriedma added inline comments.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
630

https://github.com/llvm/llvm-project/issues/58003 is requesting that we generate control flow for small strings. For example, transforming strcmp(c, "x") == 0 to c[0] == 'x' && c[1] == '\0'. This is distinct from transforming it to a memcmp. (Of course, the current version of this patch doesn't implement that.)

Added propert implementation with branches.

For such program.

#include <cstring>

bool testFunction1_1(const char* c)
{
    return strcmp(c, "0") == 0;
}

GCC emits:

	.type	_Z15testFunction1_1PKc, @function
_Z15testFunction1_1PKc:
.LFB27:
	.cfi_startproc
	endbr64
	movzbl	(%rdi), %eax
	subl	$48, %eax
	jne	.L2
	movzbl	1(%rdi), %eax
.L2:
	testl	%eax, %eax
	sete	%al
	ret
	.cfi_endproc
.LFE27:
	.size	_Z15testFunction1_1PKc, .-_Z15testFunction1_1PKc
	.p2align 4
	.globl	_Z15testFunction1_2PKc
	.type	_Z15testFunction1_2PKc, @function

After changes clang emits:

	.type	_Z15testFunction1_1PKc,@function
_Z15testFunction1_1PKc:                 # @_Z15testFunction1_1PKc
	.cfi_startproc
# %bb.0:                                # %entry
	movsbl	(%rdi), %eax
	addl	$-48, %eax
	je	.LBB0_1
# %bb.2:                                # %strcmp_fold_sub_join
	testl	%eax, %eax
	sete	%al
	retq
.LBB0_1:                                # %strcmp_fold_sub_is_zero
	movsbl	1(%rdi), %eax
	testl	%eax, %eax
	sete	%al
	retq
.Lfunc_end0:
	.size	_Z15testFunction1_1PKc, .Lfunc_end0-_Z15testFunction1_1PKc
	.cfi_endproc
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
630

I updated implementation for single character str2 constant. If current approach is valid I will implement it for two characters str2 constant as well.
I am not sure if current approach with changing control flow inside this optimization is valid and safe, because it seems that other SimplifyLibCalls optimizations does not change control flow.

kitaisreal added inline comments.Jul 8 2023, 10:33 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
576

Updated implementation.

581

Updated implementation.

xbolva00 added a subscriber: xbolva00.EditedJul 8 2023, 11:46 AM

efriedma:

(instcombine specifically won't work; for the sake of compile-time performance, we've forbidden instcombine from changing the cfg.)

Your patch changes it, indeed. (SLC is part of InstCombine).

nikic requested changes to this revision.Jul 8 2023, 12:48 PM

Right, InstCombine is control-flow preserving transform. We don't really have a great place to put such a transform right now, so SimplifyCFG is probably your best bet.

This revision now requires changes to proceed.Jul 8 2023, 12:48 PM
This comment was removed by kitaisreal.
nikic added inline comments.Jul 8 2023, 12:52 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
567

You need to guard this behind isOnlyUsedInComparisonWithZero(), as the exact return value of strcmp() is unspecified (only its relation to zero is).

Thanks. I will try to implement it in SimplifyCFG.

Right, InstCombine is control-flow preserving transform. We don't really have a great place to put such a transform right now, so SimplifyCFG is probably your best bet.

I checked SimplifyCFG and if I decide to put transform into this pass, it seems that implementation will look complex and unexpected. It will require to get TargetLibraryInfo in SimplifyCFGPass and pass it to simplifyCFG function in Local.h, but this function is called from a lot of places, and it will be unexpected if this function can optimize library calls.
There is PartiallyInlineLibCallsPass that currently optimizes sqrt function and changes CFG, but it is registered in CodeGenPassBuilder and applied during codegen, but I expect that our optimization need to be registered in PassBuilderPipelines and applied during common IR optimizations.
Can we move this PartiallyInlineLibCallsPass to common IR optimizations, so we can extend it ? Or maybe it is better to create separate pass ?

Maybe AggressiveInstCombine? It currently doesn't contain any CFG modifications, I think, but I don't see any reason they can't be added.

Moved transformation into AggressiveInstCombine

xbolva00 added inline comments.Jul 11 2023, 9:32 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
908 ↗(On Diff #539144)

more like expand?

kitaisreal retitled this revision from [SimplifyLibCalls] Fold strcmp for short string literals to [AggressiveInstCombine] Fold strcmp for short string literals.
kitaisreal edited the summary of this revision. (Show Details)
kitaisreal added a reviewer: xbolva00.
xbolva00 added inline comments.Jul 12 2023, 2:38 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
920 ↗(On Diff #539144)

GCC I think “inlines “ strcmp up to some length, but not strictly 1, no?

Also it can inline (expand) strcmp calls if we know that src/dst are dereferenceable with known size.

Rename fold into expand.

kitaisreal added inline comments.Jul 12 2023, 3:46 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
920 ↗(On Diff #539144)

It expands strcmp if size is 1 or 2 https://godbolt.org/z/jhbToh5Kb.
I plan to add case for string with size 2 in separate patch, because control flow becomes more complicated and it will make patch harder to review.

It would be great to have reusable general algo with N as parameter.

Then it can be used also some memcmp calls…

nikic requested changes to this revision.Jul 12 2023, 12:43 PM

Generally on board with the direction of doing this in AggressiveInstCombine.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
884 ↗(On Diff #539475)

Aren't these conditions already checked by the caller now?

920 ↗(On Diff #539475)

// instead of ///.

928 ↗(On Diff #539475)

We should also handle constant string on the LHS.

942 ↗(On Diff #539475)

As mentioned on the previous review, I believe you need to require that the strcmp result is only used in a comparison with zero. The exact return value of strcmp is unspecified, and we should not optimize to a result that deviates from the used libc implementation.

949 ↗(On Diff #539475)

Why the use of sign extension? The C standard says the comparison is performed on unsigned char, not signed char:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp

is determined by the sign of the difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being compared.

992 ↗(On Diff #539475)

Why is this relevant?

1003 ↗(On Diff #539475)

Why is this relevant?

1008 ↗(On Diff #539475)

This is only used by expandStrcmp(), I would sink it into there.

llvm/test/Transforms/AggressiveInstCombine/strcmp.ll
2 ↗(On Diff #539475)

TODO (If you write an extra comment, put it below RUN, not above.)

51 ↗(On Diff #539475)

Should also test:

  • Comparison with empty string (not folded, because left to SLC).
  • Comparison with two known strings (not folded, because left to SLC).
  • Comparison with two-char string (not folded for now).
  • Comparison with known string on the LHS rather than RHS.
This revision now requires changes to proceed.Jul 12 2023, 12:43 PM
kitaisreal marked 7 inline comments as done.Jul 13 2023, 7:21 AM
kitaisreal added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
949 ↗(On Diff #539475)

I cast this char to int32 so I can later sub them as signed integers. This looks simillar to standard glibc implementation https://github.com/zerovm/glibc/blob/master/string/strcmp.c#L45C15-L45C15. If we take this standard implementation and look and the IR it will be similar to mine https://godbolt.org/z/Pd9xKcnE8.

992 ↗(On Diff #539475)

I take it from SimplifyLibCalls.cpp. Not sure if this is safe to ignore it for our transformation.

// Then check for known library functions.
if (TLI->getLibFunc(*Callee, Func) && isLibFuncEmittable(M, TLI, Func)) {
  // We never change the calling convention.
  if (!ignoreCallingConv(Func) && !IsCallingConvC)
    return nullptr;
  if (Value *V = optimizeStringMemoryLibCall(CI, Builder))
    return V;
1003 ↗(On Diff #539475)

I take it from InstCombineCalls.cpp. Not sure if this is safe to ignore it for our transformation.

// Skip optimizing notail and musttail calls so
// LibCallSimplifier::optimizeCall doesn't have to preserve those invariants.
// LibCallSimplifier::optimizeCall should try to preseve tail calls though.
if (CI->isMustTailCall() || CI->isNoTailCall())
  return nullptr;
nikic added inline comments.Jul 13 2023, 1:31 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
959 ↗(On Diff #540021)

Rather than hardcoding B.getInt32Ty(), it is better to use the original type (CI->getType()). There are platforms where int is not 32-bit.

949 ↗(On Diff #539475)

You can do the subtraction as you do now, but the characters need to be zero-extended. Otherwise you will claim that '\x7f' is larger than '\x80', which is not correct according to strcmp semantics.

992 ↗(On Diff #539475)

We're removing the call entirely, so calling convention doesn't matter.

1003 ↗(On Diff #539475)

Same here. This doesn't matter as the call is removed entirely.

llvm/test/Transforms/AggressiveInstCombine/strcmp.ll
25 ↗(On Diff #540021)

@expand_strcmp_eq_s1

47 ↗(On Diff #540021)

@expand_strcmp_eq_s1_commuted

69 ↗(On Diff #540021)

@expand_strcmp_ne_s1

And so on, to get at least a slightly meaningful test name, rather than just a counter.

108 ↗(On Diff #540021)

It's okay to have just one test with the constant on the left-hand-side. No need to test LHS and RHS for all predicates.

kitaisreal marked 13 inline comments as done.

Updated implementation.

kitaisreal marked 5 inline comments as done.Jul 18 2023, 8:39 AM
goldstein.w.n added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
269 ↗(On Diff #541567)

This should probably have a max iteration cnt?

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
907 ↗(On Diff #541567)

should the sqrt patch be split into a preliminary NFC?

nikic accepted this revision.Jul 18 2023, 11:26 AM

LGTM

This revision is now accepted and ready to land.Jul 18 2023, 11:26 AM
kitaisreal added inline comments.Jul 18 2023, 11:52 AM
llvm/lib/Analysis/ValueTracking.cpp
269 ↗(On Diff #541567)

I am not sure about the requested change. You suggest to add max_iteration_count argument to isOnlyUsedInZeroComparison function, and relace all_of I->users() iteration with loop, so we can limit iteration count ? Can you please provide more information about the requested change.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
907 ↗(On Diff #541567)

Do you propose to extract most of the changes from AggressiveInstCombine, that are not related to expandStrcmp in separate patch ?

goldstein.w.n added inline comments.Jul 18 2023, 1:24 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
907 ↗(On Diff #541567)

Yeah, but this has already been approved, so no need to make it a blocker.

kitaisreal marked 2 inline comments as done.Jul 19 2023, 1:16 AM

I do not have commit access. Can you please land this one for me Maksim Kita <kitaetoya@gmail.com> ?

This revision was landed with ongoing or failed builds.Jul 19 2023, 8:12 AM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Jul 19 2023, 8:15 AM

Done! Please follow https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access for requesting commit access.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
907 ↗(On Diff #541567)

Yeah, this would have been a bit nicer, to make it clear that this doesn't really change anything about the sqrt fold, just moves code around.

asmok-g added a subscriber: asmok-g.Aug 8 2023, 2:45 AM

This is causing a mis-compilation. Repro:

#include <string.h>

#define TOKMAXLEN 10

typedef struct {
  char token[TOKMAXLEN + 1]; /* always NUL-terminated */
} datetkn;

constexpr datetkn TimezoneAbbrevTable[] = {{"z"}};

static const int TimezoneAbbrevTableSize =
    sizeof TimezoneAbbrevTable / sizeof TimezoneAbbrevTable[0];

int main() {
  const char* token = "";
  for (int i = 0; i < TimezoneAbbrevTableSize; ++i) {
    const datetkn& timezone_abbrev = TimezoneAbbrevTable[i];
    if (strcmp("z", token) <= 0) {
      return 10;
    }
    token = timezone_abbrev.token;
  }

  return 0;
}

cmd: clang -O2 ./prog.cc -o ./prog.o

nikic added a comment.Aug 8 2023, 2:49 AM

I believe the problem is that if the constant string is on the LHS, we also need to invert the greater/smaller results. The current code would be correct for equality comparisons, but not for relational comparisons.

This is serious, please fix or revert from release branches.

alexfh added a subscriber: alexfh.Aug 8 2023, 11:50 AM

This is serious, please fix or revert from release branches.

I'll prepare a revert in case @kitaisreal doesn't respond soon.

Here a slightly more convenient reproducer (dumped IR right before the AggressiveInstCombine pass): https://gcc.godbolt.org/z/bGn3EnWG6

This is serious, please fix or revert from release branches.

I'll prepare a revert in case @kitaisreal doesn't respond soon.

Here a slightly more convenient reproducer (dumped IR right before the AggressiveInstCombine pass): https://gcc.godbolt.org/z/bGn3EnWG6

The commit can't be reverted cleanly, I'll have to also revert dependent commits: cbfcf90152de5392a36d0a0241eef25f5e159eef and 5dde755188e34c0ba5304365612904476c8adfda.

This is serious, please fix or revert from release branches.

I'll prepare a revert in case @kitaisreal doesn't respond soon.

Here a slightly more convenient reproducer (dumped IR right before the AggressiveInstCombine pass): https://gcc.godbolt.org/z/bGn3EnWG6

The commit can't be reverted cleanly, I'll have to also revert dependent commits: cbfcf90152de5392a36d0a0241eef25f5e159eef and 5dde755188e34c0ba5304365612904476c8adfda.

Prepared a revert of the three commits here: https://reviews.llvm.org/D157430. I'm running tests now and will land the revert once they finish, unless I hear any objections.