This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Look through more casts when folding memchr and memcmp
ClosedPublic

Authored by msebor on Jun 22 2022, 11:00 AM.

Details

Summary

The memchr and memcmp folders fail for the results of pointer addition involving fractional offsets into constants of types other than i8, such as in

const int a[] = { 0x01010101, 0x02020202 };

void* f (void)
{
  return memchr((const char*)a + 1, 1, 7);
}

and similar examples involving structs and unions. This is due to what seems like an overly restrictive check in the getConstantDataArrayInfo that keeps the function from "looking through [all] bitcast instructions and geps" despite the comment documenting this intent.

In conjunction with the recent enhancement to let all libcall folders work with subobjects of constants of arbitrary types (D125114), this change removes the limitation above, bringing the memchr and memcmp folders up to par with GCC. Tested on by running make check-all on x86_64-linux.

(The code in getConstantDataArrayInfo could stand to be simplified by letting the function take a DataLayout argument. I stopped short of making that change in this patch to minimize the extent of the changes.)

Diff Detail

Event Timeline

msebor created this revision.Jun 22 2022, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2022, 11:00 AM
msebor requested review of this revision.Jun 22 2022, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2022, 11:00 AM
efriedma added inline comments.Jun 22 2022, 11:50 AM
llvm/lib/Analysis/ValueTracking.cpp
4196

Maybe it makes sense to just drop the dyn_cast<GlobalVariable> here?

4287

Not sure I understand this change; why does it matter if the slice is zero-length here?

msebor added inline comments.Jun 22 2022, 12:24 PM
llvm/lib/Analysis/ValueTracking.cpp
4196

This is what I meant by the simplification opportunity (I think we discussed it with @nikic in another review but I should have made that clear here). As far as I can tell the GlobalVariable cast is necessary to get at the DataLayout and to compute the offset below. (Or is there some other way to get at it?)

4287

The change prevents treating past-the-end accesses to char arrays the same was as those to empty strings, such as in strlen("123" + 4). This is exercised by str-int-3.ll that I added in D125114 based on my understanding of the convention that library calls that provably result in an out-of-bounds access are best left for sanitizers to report. (I think an argument can be made that folding such calls to zero is safer than counting on sanitizers, so I don't mind removing the check and adjusting the test if it's decided that's preferable.)

nikic added inline comments.Jun 23 2022, 8:27 AM
llvm/lib/Analysis/ValueTracking.cpp
4196

If you really want to avoid the DataLayout argument, you can do something like dyn_cast<GlobalVariable>(getUnderlyingObject(V)), take the DataLayout from there, and then use that to feed the stripAndAccumulateConstantOffsets() call.

Otherwise this is just shifting the problem to a GEP + bitcast + GEP sequence. (And it's worth noting that with opaque pointers the casts being skipped here don't exist anyway.)

efriedma added inline comments.Jun 23 2022, 11:11 AM
llvm/lib/Analysis/ValueTracking.cpp
4287

If we want getConstantStringInfo(TrimAtNul=true) to always fail if it doesn't trim a nul, we could do that, but only failing when we don't trim a null and Slice.Array is null doesn't really make sense to me.

lattner resigned from this revision.Jun 23 2022, 1:41 PM
msebor updated this revision to Diff 439819.Jun 24 2022, 10:56 AM
msebor marked an inline comment as done.
  • Have getConstantStringInfo fail for empty array slices when TrimAtNul is set.
  • Have GetStringLengthH fail for empty array slices.
  • Gracefully handle empty array slices in memchr and memrchr folders.
  • Add tests.
llvm/lib/Analysis/ValueTracking.cpp
4196

Thanks for the pointer to getUnderlyingObj. Unless there is a form of the function that also computes the aggregate GEP offset I don't think using it here would be appropriate since the rest of the function then has to (again) iterate over the same chain of casts and GEPs to compute the offset. (Would enhancing getUnderlyingObjto compute the offset be appropriate? FWIW, GCC has at least two utility functions that do both.)

I'm not opposed to adding the DataLayout argument (quite the contrary), but as I explained, I avoided it in this patch to keep it from growing too big. If making this small incremental improvement first is a problem I can instead start by submitting a change to add the new argument. Let me know your preference.

Opaque pointers obviously obviate the casts, so once typed pointers completely disappear the code can be simplified, The removal of the FisrstIdx test will still be necessary to handle these use cases either way but that can be done then. Would you prefer to defer this whole patch until then?

4287

I've made the change you suggest in the updated patch. It has a fairly pervasive impact on the couple of dozen or so clients of the function. There are no tests for this case so I've developed and added a subset of of them in the updated patch. It was a nontrivial effort whose goal is orthogonal to this enhancement and that I think would have been better handled separately and by someone who's less ambivalent about the benefits of doing this than me. (The test coverage would ideally be extended to all supported functions but I leave that for some other time.)

nikic added inline comments.Jun 24 2022, 12:28 PM
llvm/lib/Analysis/ValueTracking.cpp
4196

stripAndAccumulateConstantOffsets() is the function that computes aggregate GEP offsets. getUnderlyingObject() is just a suggestion on how you can fetch the DataLayout without passing an argument. Basically the start of this function (down to line 4222) would become something like this:

auto *GV = dyn_cast<GlobalVariable>(getUnderlyingObject(V));
if (!GV || !GV->isConstant() || !GV->hasDefinitiveInitializer())
  return false; // Not based on constant global.

const DataLayout &DL = GV->getParent()->getDataLayout();
APInt Offset(DL.getIndexTypeSizeInBits(V->getType()), 0);
if (GV != P->stripAndAccumulateConstantOffsets(DL, Offset, /*AllowNonInbounds*/ true))
  return false; // Could not determine offset from GV.
4287

Possibly I'm misreading @efriedma's comment, but I think the suggestion here was to just drop the change you originally did from the patch. This doesn't seem to have any direct relation to the bitcast stripping part.

(For what it's worth, I think I've come around on the sanitizer issue from "we should accommodate sanitizers if it doesn't cost any effort" to "we should ignore sanitizers completely and optimize as aggressively as possible", because these little checks are not a principled solution anyway, but they always cause a lot of discussion about whether some particular case should be special-cased or not.)

msebor updated this revision to Diff 440373.Jun 27 2022, 1:54 PM

Changes in revision 3 of the patch

  • return a conservative result from getConstantDataArrayInfo for past-the-end offsets,
  • adds comments to getConstantStringInfo and GetStringLengthH explaining the decision to return a conservative result for out-of-bounds offsets,
  • fold empty sequences in optimizeMemRChr and optimizeMemRhr,
  • add new and adjust existing tests.
llvm/lib/Analysis/ValueTracking.cpp
4196

I'm pretty sure (I think) I understood what you meant. My point is that a call to getUnderlyingObject has a linear complexity in the number of GEPs/casts, and so would a fully general recursive call to getConstantDataArrayInfo. They both walk the IR to find the underlying declaration. Calling the former from the latter would make the latter do double the work, with quadratic complexity in the number of GEPs.

The complexity could be kept linear by

  • adding a DataLayout argument to getConstantDataArrayInfo
  • having getUnderlyingObject also accumulate the offset from each GEP it strips on the way to the declaration of the object (it too would need a DataLayout argument).

Both changes seem useful to me independently of each other (and as I mentioned, a precedent for the getUnderlyingObject change is GCC's get_ref_base_and_extent utility function, among others).

But until at least the first is done, this patch handles a single cast + GEP + cast chain in O(N). Alternatively, I can add the DataLayout argument to getConstantDataArrayInfo first, and handle in O(N) arbitrarily long chains of cast/GEP sequences.

Having said all that, I have a feeling we might be talking past each other and this will not be the end of it.

4287

Okay, given that, I'm comfortable removing both the check and the case str-int-3.ll test that otherwise fails.

I've also updated the new strcall-no-nul.ll test to reflect the current behavior on trunk, and in addition removed the other checks I added to verify the opposite strategy. (Either way, being more consistent about this than the initial patch leads to churn that's strictly outside the scope of this enhancement.)

As an aside it would be helpful to capture somewhere the decision to fold even in spite of the effect on sanitizers so that we can point to it when the question comes up again in the future.

nikic added inline comments.Jun 27 2022, 2:28 PM
llvm/lib/Analysis/ValueTracking.cpp
4196

There shouldn't be any quadratic complexity -- when using getUnderlyingObject/stripAndAccumulateConstantOffsets there is no need for getConstantDataArrayInfo itself to be recursive, all the offsets are accumulated within a single call.

You are right that there is some redundant work going on in that getUnderlyingObject will first find the base object, and then stripAndAccumulateConstantOffsets will accumulate offsets down to it, but it's still a linear walk, repeated twice, so complexity is still O(n).

(If we want to really get down to it, doing two walks will actually be faster in practice, because getUnderlyingObject is significantly faster than stripAndAccumulateConstantOffsets, which means that we can quickly discard any cases that aren't based on a constant GV. This is just a side note, we generally wouldn't specifically optimize for that.)

msebor updated this revision to Diff 440431.Jun 27 2022, 4:36 PM

Changes in revision 4 of the patch:

  • use getUnderlyingObject and stripAndAccumulateConstantOffsets and avoid recursion in getConstantDataArrayInfo,
  • add tests to exercise long chain of GEPs.
llvm/lib/Analysis/ValueTracking.cpp
4196

Aah, that was my misunderstanding, sorry. I assumed stripAndAccumulateConstantOffsets and accumulateConstantOffset both operated on just a single GEP except that the former also stripped casts. It looks like the former actually iterates over all the GEPs and accumulates the offsets from all of them. So yes, that does work linearly.

nikic added inline comments.Jun 28 2022, 1:21 AM
llvm/lib/Analysis/ValueTracking.cpp
4194

You can move this out of the GEPOperator block and also drop the stripPointerCasts() above (and the repeated checks of GV below it). All cases can be treated uniformly.

4239

I wouldn't claim that folding these is really "safer", it's just a (UB-based) optimization.

msebor updated this revision to Diff 440696.EditedJun 28 2022, 10:58 AM

Changes in revision 5 of the patch:

  • Simplify getConstantDataArrayInfo some more.
  • Update comment.
llvm/lib/Analysis/ValueTracking.cpp
4194

Because stripAndAccumulateConstantOffsets is a member of the Value class and not one of GEPOperator. Nice, thanks!

4239

Many users would find replacing an undefined library call with a well-defined expression safer than letting the call take place and crash their program for example. I certainly do, but I see little point in arguing about it here.

nikic accepted this revision.Jun 28 2022, 12:30 PM

Implementation LG, some notes on the tests.

llvm/test/Transforms/InstCombine/memchr-10.ll
22

Incomplete check line?

llvm/test/Transforms/InstCombine/memchr-9.ll
318

These check lines aren't correct and are probably just getting ignored. You need to pass something like --check-prefixes=CHECK,BE and --check-prefixes=CHECK,LE and regenerate check lines.

llvm/test/Transforms/InstCombine/strcall-no-nul.ll
143

This check line doesn't make a lot of sense ... the syntax is incorrect (and the incorrect syntax is repeated in the cases below) and the ret wouldn't make sense here anyway (because strlen returns an integer not a pointer). The TODO also seems outdated, as this already folds to 0.

llvm/test/Transforms/InstCombine/strnlen-1.ll
166

This TODO is probably not relevant anymore given the updated patch?

This revision is now accepted and ready to land.Jun 28 2022, 12:30 PM
This revision was landed with ongoing or failed builds.Jun 28 2022, 3:00 PM
This revision was automatically updated to reflect the committed changes.
msebor marked an inline comment as done.

Thanks for the careful review!

llvm/test/Transforms/InstCombine/memchr-10.ll
22

It's intentional and documented above: the return value doesn't matter (i.e., it's indeterminate here). I didn't want to encode an expectation one way or the other.

llvm/test/Transforms/InstCombine/memchr-9.ll
318

Right. You do have a good eye for detail!

llvm/test/Transforms/InstCombine/strcall-no-nul.ll
143

I've fixed the strlen typo.

I hand-wrote all these lines and I realize not all of them are entirely correct. My main goal was to more prominently mark up the expected failures, but I was also hoping that llvm-lit would pick them up somehow and indicate when there are expected failures in the test. I've removed the rest of the XFAIL- lines from the final commit but, for future reference, is there a way to do what I want? (I couldn't find an example in the test suite that I didn't add myself.)

nikic added inline comments.Jun 29 2022, 12:36 AM
llvm/test/Transforms/InstCombine/strcall-no-nul.ll
143

It's possible to mark the entire test as XFAIL using XFAIL: *, but I don't think XFAIL interacts with FileCheck in any way.

General convention is to check for current codegen and put a TODO/FIXME if it's currently incorrect/suboptimal.

fhahn added a subscriber: fhahn.Jun 29 2022, 2:02 AM

It looks like Transforms/InstCombine/str-int-3.ll is failing on Darwin systems. The Darwin bots are currently down unfortunately though (https://green.lab.llvm.org/green/)

fhahn added a comment.Jun 29 2022, 6:42 AM

It looks like Transforms/InstCombine/str-int-3.ll is failing on Darwin systems. The Darwin bots are currently down unfortunately though (https://green.lab.llvm.org/green/)

GreenDragon is back up, please see https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/29787/testReport/LLVM/Transforms_InstCombine/str_int_3_ll/ for more details about the failing test. Please revert the patch if the fix requires more time.

GreenDragon is back up, please see https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/29787/testReport/LLVM/Transforms_InstCombine/str_int_3_ll/ for more details about the failing test. Please revert the patch if the fix requires more time.

Thanks for the heads up. I'm not having any luck reproducing the failure with a cross for either i386-apple-darwin or x86_64-apple-darwin. Can you please provide more detail? (The full test output and the target triple, or anything else that might help reproduce it on x86_64 Linux.)

I've managed to reproduce it by changing the test case. It seems as though the test input of

%pa_0_0_32 = getelementptr [2 x %struct.A], [2 x %struct.A]* @a, i64 0, i64 0, i32 0, i64 32
%ia_0_0_32 = call i32 @atoi(i8* %pa_0_0_32)

is on Darwin transformed into

%ia_0_0_32 = call i32 @atoi(i8* getelementptr inbounds ([2 x %struct.A], [2 x %struct.A]* @a, i64 1, i64 0, i32 0, i64 0))

which the folder doesn't handle.

I've managed to reproduce it by changing the test case...

Sorry, I was too hasty, that works as expected. I still have not been able to reproduce it.

msebor added a comment.EditedJun 29 2022, 9:25 AM

I suspect the Darwin strtol behaves differently than on Linux for en empty subject sequence (ie., for ""). On Linux strtol succeeds and returns zero without changing errno. On Darwin it sets errno, which then causes the convertStrToNumber utility function to fail. This is permitted by POSIX but the test assumes the Linux behavior.

This also seems be supported by the Apple open source implementation of the function in strtol.c.

I've relaxed the test in rGd515211a0ce1 to avoid exercising the target-dependent behavior. Please let me know if you see any other issues.

spatel added a subscriber: spatel.Jun 29 2022, 11:16 AM

I've relaxed the test in rGd515211a0ce1 to avoid exercising the target-dependent behavior. Please let me know if you see any other issues.

That avoids the test failure, but we still have a host-platform difference that seems unnecessary.

IIUC, the problem is here:
https://github.com/llvm/llvm-project/blob/649439e7aeeb3b30f54297b3c6f7d6549fb2f85a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp#L87

On Linux, errno isn't set for an empty string:
https://godbolt.org/z/Tn5j1o4s6

But on a Mac (I tested locally - not sure if it's possible to do that on godbolt), I get this output:

Value of errno: 22
res = 0

Instead of return nullptr on errno, just remove that check? Or return a null constant since anything goes at that point?

msebor added a comment.EditedJun 29 2022, 12:10 PM

I agree there is a detectable difference when folding either undefined calls to atoi (or strtol) either with past the end pointers, or even with the empty string. The former calls are undefined so the difference shouldn't matter, but the latter is well-defined. In the latter case the difference predates the patch which is why I just "papered" over it by disabling the test, but I agree it should be avoided in the well-defined case. I've raised PR 56293 to keep track of it and will look into removing it.

Maybe reland with better patch title as this patch affects more libcalls than just two ones?