This is WIP
This patch extends the multi-BB stack-move optimization using a liveness analysis framework, as pcwalton does.
It focuses only on unescaped static allocas, as the previous patch does.
This patch introduces BB/instruction-level liveness analysis, using basically the same terminology as the Modern Compiler Implementation book:
- Run the capture check with CaptureTracker, and classify every SSA use as either 1) a LiveDef, which writes full-size data to the alloca (e.g. full-sized lifetime.start/end, memcpy/memset/memmove, store/load), or 2) a LiveUse otherwise.
- Compute liveness for the src/dest allocas with a LiveUse-based backward approach:
  - If a BB's first SSA use is a LiveUse of src/dest, that BB is LiveIn for src/dest.
  - A LiveIn BB's predecessors are LiveOut for the alloca.
  - A LiveIn BB's predecessors that contain no LiveUse or LiveDef are also LiveIn.
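The backward propagation above can be sketched as a small worklist algorithm. This is a hedged illustration only, not the patch's actual code; `preds`, `first_use`, and `has_use_or_def` are hypothetical precomputed inputs (`first_use[bb]` is `"use"`, `"def"`, or `None` for the first SSA use of the alloca in `bb`).

```python
def compute_liveness(blocks, preds, first_use, has_use_or_def):
    live_in, live_out = set(), set()
    # Seed: a BB whose first SSA use of the alloca is a LiveUse is LiveIn.
    worklist = [bb for bb in blocks if first_use[bb] == "use"]
    live_in.update(worklist)
    while worklist:
        bb = worklist.pop()
        for pred in preds[bb]:
            # A LiveIn BB's predecessors are LiveOut for the alloca.
            live_out.add(pred)
            # Predecessors with no LiveUse/LiveDef are also LiveIn;
            # keep propagating backward through them.
            if not has_use_or_def[pred] and pred not in live_in:
                live_in.add(pred)
                worklist.append(pred)
    return live_in, live_out
```

Note that the propagation stops at any predecessor that touches the alloca: its own first use decides whether it was seeded as LiveIn.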
- Check the BB-level liveness conflict; if that check fails, check user-level liveness conflicts.
  - BB-level: if a BB is LiveIn, is LiveOut, or contains a LiveUse for both src and dest simultaneously, then they conflict.
  - User-level: from the end of the BB, propagate src and dest liveness backward (a LiveUse makes the alloca live; a LiveDef kills it); if src and dest are ever live at the same point, they conflict. (TODO: more proper explanation.)
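The user-level check could look roughly like the following sketch (not the patch's code; `insts` is a hypothetical list of `(alloca_name, kind)` events in program order, with `kind` in `{"use", "def"}`, and the two flags seed liveness at the BB's end):

```python
def user_level_conflict(insts, live_out_src, live_out_dest):
    live = {"src": live_out_src, "dest": live_out_dest}
    # Walk the BB bottom-up: a LiveUse makes the alloca live above this
    # point, a full-size LiveDef kills its liveness above this point.
    for name, kind in reversed(insts):
        live[name] = (kind == "use")
        if live["src"] and live["dest"]:
            return True  # both live at once: merging would miscompile
    return False
```

A memcpy between the two allocas would contribute two events here, a def of dest and a use of src, which is exactly why the copy itself does not create a conflict.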
So the procedure above is basically pcwalton's liveness analysis, but it miscompiles and misses some opportunities, as follows.

Miscompile (or missed opportunity: we could merge if we removed the meaningless `store %struct.Foo { i32 13, i32 13, i32 13 }, ptr %dest`). These cases can be characterized as mergeable if we remove LiveDefs that have no later Use and are not LiveOut, but that requires more user-level checks if we try to merge them:
```llvm
%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
store %struct.Foo { i32 13, i32 13, i32 13 }, ptr %dest
%1 = call i32 @use_nocapture(ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %dest)
ret void
```
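The "removable LiveDef" condition above (a full-size def with no later Use before the next def, and not LiveOut) can be sketched as follows; `events` and `live_out` are hypothetical inputs, not the patch's API:

```python
def removable_defs(events, live_out):
    # events: list of (inst_index, kind) for one alloca in program order,
    # kind in {"use", "def"}. Returns the indices of dead full-size defs.
    dead = []
    for i, (idx, kind) in enumerate(events):
        if kind != "def":
            continue
        for _, later_kind in events[i + 1:]:
            if later_kind == "use":
                break  # a later Use reads this def: keep it
            if later_kind == "def":
                dead.append(idx)  # overwritten before any read
                break
        else:
            # No later use/def in this block: dead only if not LiveOut.
            if not live_out:
                dead.append(idx)
    return dead
```

In the example above, the second `store` to `%dest` is such a def: nothing reads `%dest` afterwards and it is not LiveOut, so deleting it would make the allocas mergeable.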
Missed opportunity (if the Use is only a read, no conflict happens):
```llvm
%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
%1 = call i32 @use_readonly(ptr nocapture %src)
%2 = call i32 @use_readonly(ptr nocapture %src)
%3 = call i32 @use_readonly(ptr nocapture %dest)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %dest)
ret void
```
Both cases could be covered by making LiveUse carry ModRef information, balanced against compile-time regressions.
TODO: currently, the following is miscompiled (src and dest would be merged):
```llvm
%src = alloca %struct.Foo, align 4
%dest = alloca %struct.Foo, align 4
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.start.p0(i64 12, ptr nocapture %dest)
store %struct.Foo { i32 10, i32 20, i32 30 }, ptr %src
call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dest, ptr align 4 %src, i64 12, i1 false)
%1 = call i32 @use_nocapture(ptr nocapture %src)
%2 = call i32 @use_nocapture(ptr nocapture %dest)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %src)
call void @llvm.lifetime.end.p0(i64 12, ptr nocapture %dest)
ret void
```