Page MenuHomePhabricator

[loop-idiom] Hoist loop memcpys to loop preheader
Needs RevisionPublic

Authored by zhuhan0 on Mar 1 2021, 1:21 AM.

Details

Summary

For a simple loop like:

struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}

We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of opt -loop-idiom -S < memcpy_before.ll

%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run opt -loop-idiom -S < memcpy_before.ll. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.

%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

Diff Detail

Event Timeline

zhuhan0 created this revision.Mar 1 2021, 1:21 AM
zhuhan0 requested review of this revision.Mar 1 2021, 1:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2021, 1:21 AM
zhuhan0 updated this revision to Diff 327182.Mar 1 2021, 10:46 AM

Fix linter.

zhuhan0 updated this revision to Diff 328938.Mar 7 2021, 10:39 PM
zhuhan0 added a reviewer: lebedev.ri.

Add function name to optimization remarks.

Friendly ping. I could ask my colleagues to review this, but would appreciate some community feedback. I didn't find a clear code owner for this pass, so I simply put the top contributors to LoopIdiomRecognize.cpp as reviewers. Please let me know if I should put somebody else.

This looks like a great optimization to handle, but I'm not a competent reviewer for this area any longer.

This revision is now accepted and ready to land.Mon, Mar 22, 10:09 AM
foad added a subscriber: foad.Tue, Mar 23, 9:23 AM

Typo in description: "perheader".

zhuhan0 retitled this revision from [loop-idiom] Hoist loop memcpys to loop perheader to [loop-idiom] Hoist loop memcpys to loop preheader.Tue, Mar 23, 9:43 AM

Typo in description: "perheader".

Good catch. Corrected title.

This revision was landed with ongoing or failed builds.Mon, Mar 29, 11:42 PM
This revision was automatically updated to reflect the committed changes.

Is @zino someone's replacement account?
I'm asking because the accept of this revision is the only contribution of that account.

This revision is now accepted and ready to land.Tue, Mar 30, 2:49 AM
lebedev.ri requested changes to this revision.Tue, Mar 30, 2:49 AM
This revision now requires changes to proceed.Tue, Mar 30, 2:49 AM
hoy added a comment.Tue, Mar 30, 9:20 AM

Is @zino someone's replacement account?
I'm asking because the accept of this revision is the only contribution of that account.

Zino is a real LLVM developer. @zino looks like your email is not verified.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
818

No need of this continue?

1211

Nit: Use 'StringRef` to avoid extra mem allocation?

hoy edited reviewers, added: hoy; removed: hoyFB.Tue, Mar 30, 9:20 AM

Is @zino someone's replacement account?
I'm asking because the accept of this revision is the only contribution of that account.

Hi Roman, Yes, this is replacement account for Zino (on PTO for the week), and I believe the original account was https://reviews.llvm.org/p/zinob/

Zino, Hongtao and myself did internal code review for this patch before it's sent up here - we should have pointed that out.

Sorry for the breakage though. Instruction or a reproducer would be helpful.

-Wenlei

In D97667#2658951, @hoy wrote:

Is @zino someone's replacement account?
I'm asking because the accept of this revision is the only contribution of that account.

Zino is a real LLVM developer. @zino looks like your email is not verified.

Not as per https://reviews.llvm.org/people/commits/21223/ / https://reviews.llvm.org/people/revisions/21223/, which is why i asked.
Docs aren't quite clear on this, but the subtext is that reviews only count
if the reviewee is actually familiar with the code (and that can be seen from git log),
and is an [upstream!] llvm dev.

Can you please

  1. precommit the tests
  2. split this up into whatever nfc preparatory patch and the actual memcpy patch?

@lebedev.ri @zino is the replacement account for @zinob. He had some issue with the old account and couldn't retrieve it.

  1. precommit the tests

Could you elaborate on this? What's the best way to run the failed buildbots build without committing this?

  1. split this up into whatever nfc preparatory patch and the actual memcpy patch?

Will do. Thanks.