This is an archive of the discontinued LLVM Phabricator instance.

Improve support of GNU mempcpy
ClosedPublic

Authored by serge-sans-paille on Dec 11 2019, 12:46 PM.

Details

Summary
  • Lower to the memcpy intrinsic
  • Raise warnings when size/bounds are known

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptDec 11 2019, 12:46 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Jim added a subscriber: Jim.Dec 12 2019, 1:14 AM
Jim added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
2518

Is it an error here?

It should be:

if (BuiltinID == Builtin::BImempcpy ||
    BuiltinID == Builtin::BI__builtin_mempcpy)
    return RValue::get(Builder.CreateGEP(Dest.getPointer(), SizeVal));
else
    return RValue::get(Dest.getPointer());

@Jim obviously :-) Thanks for spotting that.

serge-sans-paille marked an inline comment as done.Dec 12 2019, 5:07 AM
jedilyn edited reviewers, added: Jim, erik.pilkington; removed: jedilyn.Dec 12 2019, 9:06 PM
Jim added a comment.Dec 12 2019, 11:05 PM

I am curious what is difference of code generation after applying your changes?

Please add a IR codegen test.

serge-sans-paille added a comment.EditedDec 13 2019, 2:58 AM
In D71374#1783032, @Jim wrote:

I am curious what is difference of code generation after applying your changes?

[[updated]]
Before, when compiling

#define _GNU_SOURCE
#include <string.h>

void* foo(void* to, void* from, unsigned n) {
  return mempcpy(mempcpy(to, from, n), from, n);
}

We get (clang -O3)

define dso_local i8* @foo(i8* %0, i8* nocapture readonly %1, i32 %2) local_unnamed_addr #0 {
  %4 = zext i32 %2 to i64
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %0, i8* align 1 %1, i64 %4, i1 false) #2
  %5 = getelementptr inbounds i8, i8* %0, i64 %4
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %5, i8* align 1 %1, i64 %4, i1 false) #2
  %6 = getelementptr inbounds i8, i8* %5, i64 %4
  ret i8* %6
}

And we now get

define dso_local i8* @foo(i8* %to, i8* nocapture readonly %from, i32 %n) local_unnamed_addr #0 {
entry:
  %conv = zext i32 %n to i64
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %to, i8* align 1 %from, i64 %conv, i1 false)
  %0 = getelementptr i8, i8* %to, i64 %conv
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %0, i8* align 1 %from, i64 %conv, i1 false)
  %1 = getelementptr i8, i8* %0, i64 %conv
  ret i8* %1
}

So no real change apart from the inbound keyword that I forgot to add.

LLVM already converts mempcpy to memcpy..

LLVM already converts mempcpy to memcpy..

Indeed, the clang version I was using as base reference was clang-9, and the mempcpy optimisation at IR level got introduced after that.
Nevertheless, this patch does much more than lowering mempcpy, it also triggers new warnings, so It's still useful that clang understands it.

@xbolva00 should I remove the lowering part and leave it to llvm?

[[edited the example to reflect actual state]]

xbolva00 added a comment.EditedDec 13 2019, 4:53 AM

@xbolva00 should I remove the lowering part and leave it to llvm?

Probably okay to leave it + inbounds, but please add a test.

serge-sans-paille updated this revision to Diff 233797.EditedDec 13 2019, 7:02 AM

Added test cases + inbound GEP

This revision is now accepted and ready to land.Jan 9 2020, 7:36 AM
This revision was automatically updated to reflect the committed changes.