This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyLibCalls] Add dereferenceable bytes from known callsites
ClosedPublic

Authored by xbolva00 on Aug 12 2019, 4:29 AM.

Details

Summary

int mm(char *a, char *b) {

return memcmp(a,b,16);

}

Currently:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:

%call = tail call i32 @memcmp(i8* %a, i8* %b, i64 16)
ret i32 %call

}

After patch:
define dso_local i32 @mm(i8* nocapture readonly %a, i8* nocapture readonly %b) local_unnamed_addr #1 {
entry:

%call = tail call i32 @memcmp(i8* dereferenceable(16)  %a, i8* dereferenceable(16)  %b, i64 16)
ret i32 %call

}

Diff Detail

Event Timeline

xbolva00 created this revision.Aug 12 2019, 4:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2019, 4:29 AM
xbolva00 retitled this revision from [SimplifyLibCalls] Add dereferecanble bytes from known callsites úWI] to [SimplifyLibCalls] Add dereferecanble bytes from known callsites [WIP].Aug 12 2019, 4:29 AM

Please @jdoerfert check it. I will add more tests when the code is OK.

xbolva00 marked an inline comment as done.Aug 12 2019, 4:34 AM
xbolva00 added a subscriber: spatel.
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
193

Copy paste from @spatel 's patch: https://reviews.llvm.org/D64258

We should find proper place for this.

xbolva00 retitled this revision from [SimplifyLibCalls] Add dereferecanble bytes from known callsites [WIP] to [SimplifyLibCalls] Add dereferenceable bytes from known callsites [WIP].Aug 12 2019, 4:34 AM

Some ideas
1, + memchr
2, infer noalias as follow up

I like the direction, first obstacle identified but solvable. We need tests.

lib/Transforms/Utils/SimplifyLibCalls.cpp
193

The Attributor has a similar, though more elaborate, utility which is not exposed yet. I would expect that to be usable but the interface is in flux right now. I'd go with this and a TODO note.


Edit: Reading through the use cases, maybe we should use the Attributor code which can handle more cases. What is happening here is not what we want, see my example below.


Proposal, do it here, sth like:

static void annotateDereferenceableBytes(CallSite CS, unsigned ArgNo, uint64_t DerefBytes) {
  if (CS.getDereferenceableBytes(ArgNo) < DerefBytes) {
    CS.removeParamAttr(ArgNo, Attribute::Dereferenceable);
    CS.removeParamAttr(ArgNo, Attribute::DereferenceableOrNull);
    // TODO: CallSite does not have an `addParamAttr` for integer attributes.
    CS.addAttribute(ArgNo + AttributeList::FirstArgIndex, Attribute::getWithDereferenceableBytes(CS->getContext(), DerefBytes));
  }
}

(I did not try to compile this!)

947

LHS and RHS are call site operands for which we deduced dereferenceability. Thus we want to annotate the call site. Now with this code, if I'm not mistaken, there are two possibilities for LHS, RHS is the same:

  1. LHS is anything but an argument -> no annotation -> not what we want
  2. LHS is an argument -> annotation at the argument -> not necessarily correct, see below:

We should not derive deref for the arguments a or b from the memcmp call. We can and should derive deref for their usage in the memcmp call.

void foo(int *a, int *b) {
  if (a && b)
    memcmp(a, b, 4);
}
986

What does the return value here mean? I'm also confused by the Op0 which was there already.

xbolva00 updated this revision to Diff 214692.Aug 12 2019, 1:03 PM

memchr support.

xbolva00 marked 2 inline comments as done.Aug 12 2019, 1:10 PM
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
947

Right. I also wanted to annotate callsite arguments but I didnt find a nice API.

I will try your code above.

986

nullptr means that nothing was simplified.

below, for memcpy, we can just turn memcpy libcall to memcpy intrinsic.

Thanks for review, I am going to address your notes now + tests.

jdoerfert added inline comments.Aug 12 2019, 1:35 PM
lib/Transforms/Utils/SimplifyLibCalls.cpp
986

but why is the pointer operand returned?

xbolva00 marked an inline comment as done.Aug 12 2019, 1:42 PM
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
986

llvm memcpy intrinsic "returns" void so in order to fix return value, we return dst pointer.

return memcpy(dst, ...)

to

llvm.memcpy(dst, ....)
return dst;

I dont know why this was designed this way :)

xbolva00 marked an inline comment as done.Aug 12 2019, 1:43 PM
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
986
xbolva00 updated this revision to Diff 214726.Aug 12 2019, 3:42 PM

Addressed review tips.
Added tests. (extensive tests for memcmp, light tests for others)

xbolva00 updated this revision to Diff 214728.Aug 12 2019, 3:45 PM
xbolva00 marked an inline comment as done.Aug 12 2019, 3:49 PM

(I havent run test suite yet, I expect many test files need to be updated.. I will do it tommorow)

lib/Transforms/Utils/SimplifyLibCalls.cpp
193

if (CI->getDereferenceableBytes(ArgNo + 1) < DerefBytes) {

quite surprising :) spent some time on find this :D

xbolva00 edited the summary of this revision. (Show Details)Aug 12 2019, 3:51 PM
jdoerfert accepted this revision.Aug 12 2019, 4:36 PM

One minor request, a few nits, LGTM.

lib/Transforms/Utils/SimplifyLibCalls.cpp
193

My bad, I added AttributeList::FirstArgIndex (= 1) below but forgot it above.
Can you replace 1 with AttributeList::FirstArgIndex please.

785

Maybe call annotateDereferenceableBytes after the zero check.

949

Again, I guess if the inner condition hits the annotations are lost (are they?). We could annotate afterwards.

988

I don't mind but you don't even need the Op0 .. changes anymore.

This revision is now accepted and ready to land.Aug 12 2019, 4:36 PM
xbolva00 updated this revision to Diff 214783.Aug 13 2019, 2:07 AM
xbolva00 retitled this revision from [SimplifyLibCalls] Add dereferenceable bytes from known callsites [WIP] to [SimplifyLibCalls] Add dereferenceable bytes from known callsites.

Fixed nits

This revision was automatically updated to reflect the committed changes.
xbolva00 added a comment.EditedAug 14 2019, 3:13 PM

Hello @jdoerfert.

I found another opportunity

char *d;

char * xmain(unsigned long n) {

char *s = malloc(32);
__builtin_memcpy(s, d, n);
return s;

}

define dso_local noalias i8* @xmain(i64 %n) local_unnamed_addr #0 {
entry:

%call = tail call i8* @malloc(i64 32)
%0 = load i8*, i8** @d, align 8, !tbaa !2
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %call, i8* align 1 %0, i64 %n, i1 false)
ret i8* %call

}

from malloc-like functions we can infer e.g. tail call void @llvm.memcpy.p0i8.p0i8.i64(i8*dereferenceable_or_null(32) align 1 %call, i8* align 1 %0, i64 %n, i1 false)

@spatel not sure what is best approach here.. in SLC's optimizeMalloc walk thru all malloc's users (+ sane threshold?) and annotate them?

xbolva00 added a comment.EditedAug 14 2019, 3:42 PM

Or by conservative (dont make IC much slower) and check only str(n)cpy, memcpy, memmove, memset buffers if they come from malloc/realloc/calloc?

Edit: maybe we can just add deref or null (C) to malloc ret value and it would do the right thing easily.

Edit: maybe we can just add deref or null (C) to malloc ret value and it would do the right thing easily.

Yes! That is what we should do.