This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
motivation and rationale are exactly the same: When mem2reg removes an alloca,
it erases the dbg.{addr,declare} instructions which refer to the alloca. It
would be better to instead remove all debug intrinsics which describe the
contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
DW_OP_deref)'s.
As far as I can tell, prior to D80264 these dbg.value+derefs would have been
silently dropped instead of being made undef, so we're just returning to
previous behaviour with these patches.
Testing
llvm-lit llvm/test and ninja check-clang gave no unexpected failures. Added
3 tests, each of which covers a dbg.value deletion path in mem2reg:
mem2reg-promote-alloca-1.ll mem2reg-promote-alloca-2.ll mem2reg-promote-alloca-3.ll
The first is based on the dexter test inlining.c from D89543. This patch also
improves the debugging experience for loop.c from D89543, which suffers
similarly after arg promotion instead of inlining.
An example
Looking at one of the tests added as an example, mem2reg-promote-alloca-1.ll,
based on inlining.c from D89543. Starting with this source:
int g; __attribute__((__always_inline__)) static void use(int* p) { g = *p; } void fun(int param) { use(¶m); }
After various transforms (including inlining) we get this IR. Note that the
dbg.values for param (!16) before the store and call were inserted by
instcombine's LowerDbgDeclare earlier in the pipeline.
define dso_local void @fun(i32 %param) !dbg !12 { entry: %param.addr = alloca i32, align 4 call void @llvm.dbg.value(metadata i32 %param, metadata !16, metadata !DIExpression()), !dbg !17 store i32 %param, i32* %param.addr, align 4 call void @llvm.dbg.value(metadata i32* %param.addr, metadata !16, metadata !DIExpression(DW_OP_deref)), !dbg !17 ;; ### inline 'use' start ### call void @llvm.dbg.value(metadata i32* %param.addr, metadata !22, metadata !DIExpression()), !dbg !28 call void @llvm.dbg.value(metadata i32* %param.addr, metadata !22, metadata !DIExpression()), !dbg !28 %0 = load i32, i32* %param.addr, align 4, !dbg !30 store i32 %0, i32* @g, align 4, !dbg !31 ;; ### inline 'use' end ### ret void, !dbg !32 } !16 = !DILocalVariable(name: "param", arg: 1, scope: !12, file: !6, line: 15, type: !7) !22 = !DILocalVariable(name: "param", arg: 1, scope: !12, file: !6, line: 15, type: !7)
Then sroa runs, invoking mem2reg, and replaces the store+load with the stored
value and deletes the alloca. Without this patch, the`dbg.value+deref` becomes
undef:
define dso_local void @fun(i32 %param) !dbg !12 { entry: call void @llvm.dbg.value(metadata i32 %param, metadata !16, metadata !DIExpression()), !dbg !17 call void @llvm.dbg.value(metadata i32* undef, metadata !16, metadata !DIExpression(DW_OP_deref)), !dbg !17 ;; ### inline 'use' start ### call void @llvm.dbg.value(metadata i32* %param.addr, metadata !22, metadata !DIExpression()), !dbg !28 call void @llvm.dbg.value(metadata i32* %param.addr, metadata !22, metadata !DIExpression()), !dbg !28 store i32 %param, i32* @g, align 4, !dbg !31 ;; ### inline 'use' end ### ret void, !dbg !32 } !16 = !DILocalVariable(name: "param", arg: 1, scope: !12, file: !6, line: 15, type: !7) !22 = !DILocalVariable(name: "param", arg: 1, scope: !12, file: !6, line: 15, type: !7)
With this patch, the dbg.value+deref is instead removed:
define dso_local void @fun(i32 %param) !dbg !12 { entry: call void @llvm.dbg.value(metadata i32 %param, metadata !16, metadata !DIExpression()), !dbg !17 ;; ### inline 'use' start ### call void @llvm.dbg.value(metadata i32* %param.addr, metadata !22, metadata !DIExpression()), !dbg !28 call void @llvm.dbg.value(metadata i32* %param.addr, metadata !22, metadata !DIExpression()), !dbg !28 store i32 %param, i32* @g, align 4, !dbg !31 ;; ### inline 'use' end ### ret void, !dbg !32 } !16 = !DILocalVariable(name: "param", arg: 1, scope: !12, file: !6, line: 15, type: !7) !22 = !DILocalVariable(name: "param", arg: 1, scope: !12, file: !6, line: 15, type: !7)
Considering that this replaces a TinyPtrVector which "is specialized for cases where there are normally 0 or 1 element in a vector", perhaps we ought to reserve less memory?