This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Transform str(n)cmp to memcmp
ClosedPublic

Authored by xbolva00 on Aug 3 2018, 2:22 AM.

Details

Summary

Motivation examples:
int strcmp_memcmp() {

char buf[12];
return strcmp(buf, "key") == 0;

}

int strcmp_memcmp2() {

char buf[12];
return strcmp(buf, "key") != 0;

}

int strncmp_memcmp() {

char buf[12];
return strncmp(buf, "key", 3) == 0;

}

can be turned to memcmp.

See test file for more cases.

Diff Detail

Repository
rL LLVM

Event Timeline

xbolva00 created this revision.Aug 3 2018, 2:22 AM
xbolva00 retitled this revision from [InstCombine] Transnform str(n)cmp to memcmp to [InstCombine] Transform str(n)cmp to memcmp.Aug 3 2018, 2:23 AM
xbolva00 updated this revision to Diff 158991.Aug 3 2018, 5:50 AM
  • More negative tests
xbolva00 updated this revision to Diff 159017.Aug 3 2018, 8:26 AM
  • removed useless parameter
xbolva00 updated this revision to Diff 159026.EditedAug 3 2018, 9:24 AM
  • Added tests to check if the result of strcmp calls is used more than once

Tests ok now? May I precommit them?

xbolva00 updated this revision to Diff 159027.Aug 3 2018, 9:26 AM

Please fix the tests so they don't strcmp against uninitialized memory. Probably the simplest way to do that is use an argument instead of an alloca.

lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

Not sure why this check is relevant; memcmp should return the same value as strcmp.

164 ↗(On Diff #159027)

I don't see how this proves anything... just because you have a pointer of type [N x i8] * doesn't imply it's actually backed by N bytes of dereferenceable memory. Please just use llvm::isDereferenceableAndAlignedPointer.

xbolva00 added inline comments.Aug 3 2018, 12:44 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

\0 in the middle of string?
https://stackoverflow.com/questions/13095513/what-is-the-difference-between-memcmp-strcmp-and-strncmp-in-c

I think the check is probably right here..

164 ↗(On Diff #159027)

Ok, thanks

efriedma added inline comments.Aug 3 2018, 1:03 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

I'm not sure what you mean... as far as I can tell, strcmp and memcmp should return the same result even if the non-constant string has an embedded null. Can you give an example? (I'm interpreting "same" here loosely to mean the same zero-ness and same sign bit.)

xbolva00 updated this revision to Diff 159085.Aug 3 2018, 2:05 PM
  • Use isDereferenceableAndAlignedPointer

Updated.

Transformation works but I was unable to make it working with tests as you requested (as an argument). Not sure what is wrong, I had tried to put align info but nothing worked.

lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

Yeah, sign bit is same, but values are not. So need to check only if it is only used in condition

xbolva00 added inline comments.Aug 3 2018, 2:13 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

Oh no. strcmp(...) < -10 could be true, but memcmp(...) < - 10 is false.

xbolva00 marked 2 inline comments as done.Aug 3 2018, 2:13 PM
xbolva00 added inline comments.
test/Transforms/InstCombine/strcmp-memcmp.ll
10 ↗(On Diff #159085)

Not sure what breaks transformation with this test file, i tried [12 x i8]* align 1 %buf, but it did not worked.

Any advice? Otherwise, this transformation works with aligned alloca.

I think you need to add something like "dereferenceable(4)" or "byval" to the parameter. so it's recognized as dereferenceable.

Please get rid of the lifetime.start/end intrinsics in the test.

lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

The C standard doesn't specify a specific return value; we don't have to return exactly the same value libc strcmp would return anyway, I think? Haven't really thought through the implications of that. It's okay to leave it for now.

xbolva00 added inline comments.Aug 3 2018, 2:22 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
155 ↗(On Diff #159027)

But if rhs is zero, yes, we can lift that restriction for ne/eq only.

xbolva00 updated this revision to Diff 159172.EditedAug 4 2018, 1:20 AM
  • Added/fixed tests
  • Allow any comparision with zero

Tests should be ok now, I will wait ~1 day and I will precommit them.

xbolva00 updated this revision to Diff 159173.Aug 4 2018, 1:41 AM
xbolva00 updated this revision to Diff 159174.
xbolva00 updated this revision to Diff 159184.Aug 4 2018, 9:40 AM
  • Fixed logic
  • Some new tests
xbolva00 updated this revision to Diff 159186.Aug 4 2018, 10:32 AM
xbolva00 updated this revision to Diff 159200.Aug 5 2018, 2:41 AM
  • Rebased
efriedma added a subscriber: spatel.Aug 6 2018, 5:45 PM
efriedma added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
409 ↗(On Diff #159200)

std::min.

xbolva00 updated this revision to Diff 159485.EditedAug 7 2018, 4:49 AM
  • Use std::min
xbolva00 marked an inline comment as done.Aug 7 2018, 4:49 AM
  • Use std::min

Ok now?

efriedma accepted this revision.Aug 9 2018, 1:38 PM

LGTM

This revision is now accepted and ready to land.Aug 9 2018, 1:38 PM
This revision was automatically updated to reflect the committed changes.

There are concerns about this optimization expressed in

https://github.com/google/sanitizers/issues/993

Specifically, it can transform

strcmp("abc\0...undef...undef...", "foobar")

into a similar memcmp() which would observe uninitialized data at the end of the first string.

I'm not a 100% sure that it is undefined behavior because there are no indeterminate values in the code at all, and memcmp() as a standard library function does not have to follow the same rules (technically). @rsmith But it upsets MemorySanitizer, and for a good reason IMHO.

This optimization should be either limited to functions w/o sanitize_memory attribute, or disabled completely.

xbolva00 added a comment.EditedSep 4 2018, 12:01 PM

See https://godbolt.org/z/o9CEYv
So GCC should also upset MemorySanitizer here, no?

limited to functions w/o sanitize_memory attribute

Sounds OK for me, let's hear other folks.

Edit: Seems GCC does not do this opt if memory sanitizer is on.

GCC does not have MemorySanitizer.

"which would observe uninitialized data at the end of the first string"

The uninitialized data isn't a problem for any implementation of memcmp that doesn't involve LTO: a cross-translation-unit call effectively freezes all unintialized data. But I guess we can't really guarantee that reliably unless we introduce an llvm.memcmp intrinsic.

In theory there's also a potential race condition with this transform, if the memcmp implementation loads from the buffer multiple times and the "uninitialized" data is written by another thread. We can maybe work around this; haven't thought through what check would be necessary.

We can solve both problems, I think, if we don't actually call the libc memcmp; we can emit an inline implementation that avoids those issues, I think. At least when msan is disabled.

If this is blocking you, you can revert for now, and we'll rework this.

GCC does not have MemorySanitizer.

Ah, I was trying it with address sanitizer, sorry.

adamreichold added a comment.EditedSep 4 2018, 12:11 PM

limited to functions w/o sanitize_memory attribute

Sounds OK for me, let's hear other folks.

Edit: Seems GCC does not do this opt if memory sanitizer is on.

Just for my understanding of this: If I do that replacement by hand, I am reading uninitialized memory and all bets are of. The compiler however can do this behind the scenes since my program will not be able to observe the indeterminate state (since I am looking only at == 0 of the return value of strcmp).

Does LLVM's implementation somehow guarantee that no later optimization pass comes along, sees the program reading uninitialized memory and declares UB?

Does LLVM's implementation somehow guarantee that no later optimization pass comes along, sees the program reading uninitialized memory and declares UB?

In LLVM IR, reading uninitialized memory isn't itself UB; it's only UB if you perform certain operations on the result. Granted, a naive LLVM IR implementation of memcmp might involve operations which lead to UB.

adamreichold added a comment.EditedSep 4 2018, 12:40 PM

Does LLVM's implementation somehow guarantee that no later optimization pass comes along, sees the program reading uninitialized memory and declares UB?

In LLVM IR, reading uninitialized memory isn't itself UB; it's only UB if you perform certain operations on the result. Granted, a naive LLVM IR implementation of memcmp might involve operations which lead to UB.

But that also implies that you have to emit an inline implementation since you might not control what the implementation of memcmp supplied by libc does?

Does LLVM's implementation somehow guarantee that no later optimization pass comes along, sees the program reading uninitialized memory and declares UB?

In LLVM IR, reading uninitialized memory isn't itself UB; it's only UB if you perform certain operations on the result. Granted, a naive LLVM IR implementation of memcmp might involve operations which lead to UB.

But that also implies that you have to emit an inline implementation since you might not control what the implementation of memcmp supplied by libc does?

LLVM already expands memcmp if possible, as we can see https://godbolt.org/z/o9CEYv (vs. https://godbolt.org/z/D8RhVl)

But that also implies that you have to emit an inline implementation since you might not control what the implementation of memcmp supplied by libc does?

LLVM already expands memcmp if possible, as we can see https://godbolt.org/z/o9CEYv (vs. https://godbolt.org/z/D8RhVl)

From what I understand this is done as an optimization on a best-effort basis whereas in this case it is a precondition for the soundness of this particular optimization.

But that also implies that you have to emit an inline implementation since you might not control what the implementation of memcmp supplied by libc does?

Assuming there isn't a data race, the libc implementation of memcmp can't actually do anything undefined. In assembly, "uninitialized" data always has some fixed value.


Given the problems here, we probably want to move this transform to the ExpandMemCmp pass, and only transform when we'll expand the compare inline.

xbolva00 added a comment.EditedSep 4 2018, 1:11 PM

But that also implies that you have to emit an inline implementation since you might not control what the implementation of memcmp supplied by libc does?

Assuming there isn't a data race, the libc implementation of memcmp can't actually do anything undefined. In assembly, "uninitialized" data always has some fixed value.


Given the problems here, we probably want to move this transform to the ExpandMemCmp pass, and only transform when we'll expand the compare inline.

Cannot we just check the conditions of expansion https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/ExpandMemCmp.cpp#L678 and apply it here? We basically miss check for optForSize. So if this transformation passes, memcmp will be expanded too.

Do we guarantee the correctness of programs if a custom opt pipeline is used (eg. disabled ExpandMemCmp, enabled InstCombine pass)?

adamreichold added a comment.EditedSep 4 2018, 1:13 PM

But that also implies that you have to emit an inline implementation since you might not control what the implementation of memcmp supplied by libc does?

Assuming there isn't a data race, the libc implementation of memcmp can't actually do anything undefined. In assembly, "uninitialized" data always has some fixed value.

Sorry for going off into the land of imagined libc implementations here, but for example it could update a thread local variable whenever it reads a byte and my program could then read this variable and hence observe the value of the uninitialized memory. I know this a useless and contrived way to implement memcmp, I just think this shows that if one calls an external function that is not guaranteed to be free of side-effects, those side-effects could be reflected back into the current translation unit. And while even this would probably not be a UB problem, since the compiler will by construction not know that I process undefined values when I access that thread local variable and hence will not call out UB on me, this could even become a correctness issue if I make assumptions about this variable's value when the strcmp succeeds.

for example it could update a thread local variable whenever it reads a byte and my program could then read this variable and hence observe the value of the uninitialized memory

We assume memcmp has the semantics described in the C standard unless you specify -fno-builtin or something; otherwise, we couldn't transform it at all.

Do we guarantee the correctness of programs if a custom opt pipeline is used (eg. disabled ExpandMemCmp, enabled InstCombine pass)?

We try to, yes... and even if we didn't, I'd be worried about it getting out of sync.

xbolva00 added a comment.EditedSep 4 2018, 10:48 PM

So what is proposed solution?

1, bail out if sanitize memory attribute
2, check for opt for size (as in expandMemcmp)
3, move this instcombine transformation into expandmemcmp pass

If it were just msan, I would say we could just check for the msan attribute and call it done. But I'm a little worried about the potential race and potential weird interactions with LTO. So probably best to both bailout if msan is enabled, and move the transform to expandmemcmp.