This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Adjust fragment offset for big endian targets when splitting alloca in SROA
Accepted · Public

Authored by Ka-Ka on Feb 14 2019, 1:17 AM.


Details

Reviewers

aprantl
bjope
dstenb

Summary

When handling debug info fragments of types where padding has been
introduced, the fragment offset has to be adjusted on big-endian targets
so that the fragment covers the correct part of the value.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 28135
Build 28134: arc lint + arc unit

Event Timeline

Ka-Ka created this revision. Feb 14 2019, 1:17 AM

Herald added a project: Restricted Project. Feb 14 2019, 1:17 AM

Herald added subscribers: jdoerfert, jsji, nemanjai.

Harbormaster completed remote builds in B28135: Diff 186801. Feb 14 2019, 1:17 AM

Ka-Ka added a project: debug-info. Feb 14 2019, 3:41 AM

aprantl added inline comments. Feb 14 2019, 8:35 AM

lib/Transforms/Scalar/SROA.cpp
4279	Since there are many places in the compiler where we create fragments, would it be an option to adapt the semantics of DW_OP_LLVM_fragment in a way that allows us to defer the special handling of big-endian targets to AsmPrinter/DwarfExpression.cpp? Or, if that doesn't work, create an API for creating new fragments that forces users to think about what to do on big-endian targets?

Ka-Ka added inline comments. Feb 15 2019, 4:58 AM

lib/Transforms/Scalar/SROA.cpp
4279	If I understand you correctly, you suggest extending DW_OP_LLVM_fragment to hold additional information about the hole (the undescribed bits between this and the next fragment) that might follow the fragment. That would be a larger change, but it might be worth it.

aprantl added inline comments. Feb 15 2019, 8:25 AM

lib/Transforms/Scalar/SROA.cpp
4279	I think I may need a refresher about what the problem here is. The code here is making the gap between fragments larger, but it's not immediately obvious why. Is that because we are counting the bits from offset 0 to the other end of the value in big endian? I think I may need some ASCII art to illustrate the problem :-)

Ka-Ka added inline comments. Feb 18 2019, 2:16 AM

lib/Transforms/Scalar/SROA.cpp
4279	The code only moves where the gap is. The problem here is that the code handles a type that doesn't fill the entire variable in memory (due to bitfields). The testcase added to this patch is a copy of the X86 testcase test/DebugInfo/X86/sroasplit-5.ll, adapted to PowerPC. It demonstrates how the patch works. The variable in the testcase consists of a struct of an i32 and an i24. On little endian the variable will look like this in memory:

+--+------+--------+
|XX| i24  |  i32   |
+--+------+--------+

The XX represents an 8-bit gap (padding). On big endian the variable will look like this in memory:

+------+--+--------+
| i24  |XX|  i32   |
+------+--+--------+

We therefore need to adjust the offset of the i24 fragment. As you said above, there are many places where we create fragments in the compiler, but I'm not sure all of those places need the kind of special handling of padding that is needed here in SROA.

LGTM with update to testcase.

lib/Transforms/Scalar/SROA.cpp
4279	Thanks :-) The variable in the testcase is a struct with a single member though? Probably over-reduced.

0  7     31       63
+--+------+--------+
|XX| i24  |  i32   |
+--+------+--------+
<-LSB

0     23 31       63
+------+--+--------+
| i24  |XX|  i32   |
+------+--+--------+
LSB->

I see. Because we don't explicitly describe the gaps in LLVM IR, we can't use a universal addressing scheme. But as you say, SROA is special since it chops up structs differently on big-endian anyway, so doing something special here may be the right solution. Can you adjust the testcase to match your example and include the ASCII art in a comment explaining what's going on/expected?
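
The "hole" discussed in this thread can be made concrete with a small standalone sketch. It is not part of the patch and uses no LLVM API: `Frag` and `undescribedBits` are hypothetical names introduced only for illustration. Given the (offset, size) pairs of a variable's fragments, it lists the bit ranges that no fragment describes. The big-endian pairs come from the CHECK lines of the new test; the little-endian pairs are what the pre-patch formula would yield, assuming a 64-bit variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one DW_OP_LLVM_fragment (offset, size), in bits.
struct Frag {
  uint64_t Offset;
  uint64_t Size;
};

// Returns the bit ranges of a VarSizeInBits-wide variable that are not
// covered by any fragment. Fragments are assumed not to overlap.
std::vector<Frag> undescribedBits(std::vector<Frag> Frags,
                                  uint64_t VarSizeInBits) {
  std::sort(Frags.begin(), Frags.end(),
            [](const Frag &A, const Frag &B) { return A.Offset < B.Offset; });
  std::vector<Frag> Gaps;
  uint64_t Cursor = 0; // first bit not yet accounted for
  for (const Frag &F : Frags) {
    assert(F.Offset >= Cursor && "fragments must not overlap");
    if (F.Offset > Cursor)
      Gaps.push_back({Cursor, F.Offset - Cursor});
    Cursor = F.Offset + F.Size;
  }
  if (Cursor < VarSizeInBits)
    Gaps.push_back({Cursor, VarSizeInBits - Cursor});
  return Gaps;
}
```

With fragments (0, 32) and (32, 24) the only gap is 8 bits at offset 56; with (0, 32) and (40, 24) the same 8-bit gap sits at offset 32 instead. This matches the statement that the patch only moves the gap and does not enlarge it.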

This revision is now accepted and ready to land. Feb 18 2019, 9:47 AM

Revision Contents

Path	Size
lib/Transforms/Scalar/SROA.cpp	5 lines
test/DebugInfo/PowerPC/sroasplit-5.ll	73 lines

Diff 186801

lib/Transforms/Scalar/SROA.cpp

@@ bool SROA::splitAlloca(AllocaInst &AI, AllocaSlices &AS) {
   for (auto &P : AS.partitions()) {
     if (AllocaInst *NewAI = rewritePartition(AI, AS, P)) {
       Changed = true;
       if (NewAI != &AI) {
         uint64_t SizeOfByte = 8;
         uint64_t AllocaSize = DL.getTypeSizeInBits(NewAI->getAllocatedType());
         // Don't include any padding.
         uint64_t Size = std::min(AllocaSize, P.size() * SizeOfByte);
-        Fragments.push_back(Fragment(NewAI, P.beginOffset() * SizeOfByte, Size));
+        uint64_t Offset = P.beginOffset() * SizeOfByte;
+        if (DL.isBigEndian())
+          Offset += std::max(AllocaSize, P.size() * SizeOfByte) - Size;
+        Fragments.push_back(Fragment(NewAI, Offset, Size));
       }
     }
     ++NumPartitions;
   }

   NumAllocaPartitions += NumPartitions;
   MaxPartitionsPerAlloca.updateMax(NumPartitions);
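
The offset arithmetic in this hunk can be tried outside of LLVM with a minimal sketch. `FragmentInfo` and `computeFragment` are hypothetical names, and the parameters stand in for `P.beginOffset()`, `P.size()`, `DL.getTypeSizeInBits(NewAI->getAllocatedType())`, and `DL.isBigEndian()`. The example inputs assume that the second partition of the test's struct covers bytes 4 to 7 and that its new alloca has type i24 (24 bits); this is inferred from the test's CHECK lines, not stated in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical plain-data result: fragment offset and size, in bits.
struct FragmentInfo {
  uint64_t Offset;
  uint64_t Size;
};

// Mirrors the arithmetic of the patched hunk (assumed parameter mapping):
//   BeginOffsetBytes   ~ P.beginOffset()
//   PartitionSizeBytes ~ P.size()
//   AllocaSizeBits     ~ DL.getTypeSizeInBits(NewAI->getAllocatedType())
//   IsBigEndian        ~ DL.isBigEndian()
FragmentInfo computeFragment(uint64_t BeginOffsetBytes,
                             uint64_t PartitionSizeBytes,
                             uint64_t AllocaSizeBits, bool IsBigEndian) {
  assert(PartitionSizeBytes > 0 && "a partition is assumed to be non-empty");
  const uint64_t SizeOfByte = 8;
  const uint64_t PartitionSizeBits = PartitionSizeBytes * SizeOfByte;
  // Don't include any padding in the fragment size.
  uint64_t Size = std::min(AllocaSizeBits, PartitionSizeBits);
  uint64_t Offset = BeginOffsetBytes * SizeOfByte;
  // On big endian, skip the padding bits that precede the value.
  if (IsBigEndian)
    Offset += std::max(AllocaSizeBits, PartitionSizeBits) - Size;
  return {Offset, Size};
}
```

Under those assumptions, `computeFragment(4, 4, 24, true)` yields offset 40 and size 24, and `computeFragment(0, 4, 32, true)` yields offset 0 and size 32, which agrees with the two CHECK lines in the PowerPC test. With `IsBigEndian` set to false the i24 fragment stays at offset 32.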

test/DebugInfo/PowerPC/sroasplit-5.ll

This file was added.

				; RUN: opt %s -sroa -verify -S -o - | FileCheck %s

				target datalayout = "E-m:e-i64:64-n32:64"
				target triple = "ppc64"

				; This is the big endian version of the testcase test/DebugInfo/X86/sroasplit-5.ll

				; When SROA is creating new smaller allocas, it may add padding.
				;
				; Verify that the fragment for this i24 starts at the correct offset for big endian targets.
				; CHECK: DIExpression(DW_OP_LLVM_fragment, 0, 32)
				; CHECK: DIExpression(DW_OP_LLVM_fragment, 40, 24)

				%struct.prog_src_register = type { i32, i24 }

				define i64 @src_reg_for_float() #0 !dbg !4 {
				entry:
				%retval = alloca %struct.prog_src_register, align 4
				%a = alloca %struct.prog_src_register, align 4
				%local = alloca i32, align 4
				call void @llvm.dbg.declare(metadata %struct.prog_src_register* %a, metadata !16, metadata !17), !dbg !18
				%0 = bitcast %struct.prog_src_register* %a to i8*, !dbg !19
				call void @llvm.memset.p0i8.i64(i8* align 4 %0, i8 0, i64 8, i1 false), !dbg !19
				call void @llvm.dbg.declare(metadata i32* %local, metadata !20, metadata !17), !dbg !21
				%1 = bitcast %struct.prog_src_register* %a to i32*, !dbg !21
				%bf.load = load i32, i32* %1, align 4, !dbg !21
				%bf.shl = shl i32 %bf.load, 15, !dbg !21
				%bf.ashr = ashr i32 %bf.shl, 19, !dbg !21
				store i32 %bf.ashr, i32* %local, align 4, !dbg !21
				%2 = bitcast %struct.prog_src_register* %retval to i8*, !dbg !22
				%3 = bitcast %struct.prog_src_register* %a to i8*, !dbg !22
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %2, i8* align 4 %3, i64 8, i1 false), !dbg !22
				%4 = bitcast %struct.prog_src_register* %retval to i64*, !dbg !22
				%5 = load i64, i64* %4, align 1, !dbg !22
				ret i64 %5, !dbg !22
				}

				declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

				declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1) #2

				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i1) #2

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }
				attributes #2 = { nounwind }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!13, !14}
				!llvm.ident = !{!15}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, producer: "clang version 3.7.0 ", isOptimized: false, emissionKind: FullDebug, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "<stdin>", directory: "")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "src_reg_for_float", line: 7, isLocal: false, isDefinition: true, isOptimized: false, unit: !0, scopeLine: 7, file: !5, scope: !6, type: !7, retainedNodes: !2)
				!5 = !DIFile(filename: "foo.c", directory: "")
				!6 = !DIFile(filename: "foo.c", directory: "")
				!7 = !DISubroutineType(types: !8)
				!8 = !{!9}
				!9 = !DICompositeType(tag: DW_TAG_structure_type, name: "prog_src_register", line: 1, size: 64, align: 32, file: !5, elements: !10)
				!10 = !{!11}
				!11 = !DIDerivedType(tag: DW_TAG_member, name: "Index", line: 3, size: 13, align: 32, offset: 4, file: !5, scope: !9, baseType: !12)
				!12 = !DIBasicType(tag: DW_TAG_base_type, name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
				!13 = !{i32 2, !"Dwarf Version", i32 4}
				!14 = !{i32 2, !"Debug Info Version", i32 3}
				!15 = !{!"clang version 3.7.0 "}
				!16 = !DILocalVariable(name: "a", line: 8, scope: !4, file: !6, type: !9)
				!17 = !DIExpression()
				!18 = !DILocation(line: 8, scope: !4)
				!19 = !DILocation(line: 9, scope: !4)
				!20 = !DILocalVariable(name: "local", line: 10, scope: !4, file: !6, type: !12)
				!21 = !DILocation(line: 10, scope: !4)
				!22 = !DILocation(line: 11, scope: !4)