This is an archive of the discontinued LLVM Phabricator instance.

[WIP][MemCpyOpt] implement liveness analysis based stack-move optimization
Changes PlannedPublic

Authored by khei4 on Aug 22 2023, 3:18 AM.

Details

Reviewers
None
Summary

This is WIP
This patch extends multi-BB stack-move optimization using a liveness analysis framework as pcwalton does.
only focused on unescaped static allocas as the previous patch does,

This patch introduce BB/Instruction-level Liveness Analysis, basically with the same terminology with Modern Compiler Implementation book

  1. capture check by CaptureTracker, collect whether all SSA uses are 1) LiveDef, which writes full size data to alloca (e.g. full-sized lifetime.start/end, memcpy/memset/memmove, store/load or 2) LiveUse, otherwise.
  2. Compute the src/dest alloca Liveness, by LiveUse based backward approach.
    • If the BB's first SSA use is LiveUse of src/dest, this BB is LiveIn for src/dest.
    • The LiveIn BB's predecessors are LiveOut for the allocas.
      • The LiveIn BB's predecessors, which don't contain any LiveUse or LiveDef, are also LiveIn.
  1. Check the BB-level liveness conflict; if it fails, check User-level liveness conflicts
    • BB-level
      • If the BB is also LiveIn or is LiveOut or contains LiveUse for src and dest simultaneously, then it conflict.
    • User-level
      • From the end of BB, propagate src and dest liveness (If met LiveUse, then live, otherwise LiveDef make it not live) (TODO: more proper explanation).

So basically above procedure is pcwalton's liveness analysis, but it miscompile and missed some opportunities as followings

misscompile (or miss opportunity, we can merge if we remove meaning less store %struct.Foo { i32 13, i32 13, i32 13 }, ptr %dest. These can be characterized by mergeable if we remove no Use, no LiveOut LiveDef `. But it requires more User-level checks if we try to merge them.

%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src

call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
store %struct.Foo { i32 13, i32 13, i32 13 }, ptr %dest
%1 = call i32 @use_nocapture(ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %dest)
ret void

missed opportunity (if the Use is only read, no conflict happens.)

%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src

call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)

%1 = call i32 @use_readonly(ptr nocapture %src)
%2 = call i32 @use_readonly(ptr nocapture %src)
%3 = call i32 @use_readonly(ptr nocapture %dest)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %dest)
ret void

Both cases can be covered by making LiveUse have ModRef information but balancing with compile time regressions.

TODO: Currently, the following will be miscompiled (src and dest would be merged)

%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src

call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)

%1 = call i32 @use_nocapture(ptr nocapture %src)
%2 = call i32 @use_nocapture(ptr nocapture %dest)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %dest)
ret void

Diff Detail

Event Timeline

khei4 created this revision.Aug 22 2023, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 3:18 AM
khei4 requested review of this revision.Aug 22 2023, 3:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 3:18 AM
nikic added a subscriber: nikic.Aug 22 2023, 6:04 AM

It looks like this is a diff against some intermediate version, rather than the full patch.

khei4 edited the summary of this revision. (Show Details)Aug 24 2023, 8:26 AM
khei4 edited the summary of this revision. (Show Details)Aug 24 2023, 8:29 AM
khei4 updated this revision to Diff 553152.Aug 24 2023, 9:03 AM

fix wrong diff

khei4 added a comment.Aug 24 2023, 9:03 AM

It looks like this is a diff against some intermediate version, rather than the full patch.

@nikic Thanks! I fixed and updated the description, although still WIP!

khei4 planned changes to this revision.Aug 28 2023, 12:31 PM

If I used the backward approach like this, I couldn't handle cleanly some cases that we could merge in the multi-BB case (for example, src and dest are read-only together; I believe it's a sufficiently familiar case for Rust). Still, I now noticed extending the ModRef analysis in a more forward flow-sensitive way could cover the use after def cases as complex as this patch is. So a change is planned.