That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703 "wrong line number info for obj file compiled with -ffunction-sections" bug. The problem happened with only .o files. If object file contains several .text sections then line number information showed incorrectly. The reason for this is that DwarfLineTable could not detect section which corresponds to specified address(because address is the local to the section). And as the result it could not select proper sequence in the line table. The fix is to pass SectionIndex with the address. So that it would be possible to differentiate addresses from various sections. With this fix llvm-objdump shows correct line numbers for disassembled code.
Details
Diff Detail
Event Timeline
This is topic on llvm-dev http://lists.llvm.org/pipermail/llvm-dev/2019-February/130187.html
This has some relatively extensive changes to some of the llvm tools (like sancov, for instance) that aren't tested - perhaps it'd be better to leave those tools usage unmodified (adding the undef section value as a fix to the API change, but nothing more) with fixits and/or followup patches that implement the improved functionality possible with the new API, along with tests?
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h | ||
---|---|---|
142–144 | Might be simpler to implement as: return std::tie(LHS.SectionIndex, LHS.Address) < std::tie(RHS.SectionIndex, RHS.Address); | |
150 | Maybe "presented" could be replaced with "is present" here and below? | |
llvm/include/llvm/Object/ObjectFile.h | ||
139 | Maybe use non-static data member initializers rather than constructors? | |
145–149 | If this can be changed to a plain struct, might be nice to move this operator out of the class definition (& generally/at least it should be a friend rather than a member, so that LHS and RHS get equal treatment for conversions, etc) & implement this with std::tie similar to the previous suggestion, maybe? | |
324 | Cleanups like this would ideally go in separate NFC (No Functional Change) commits - shouldn't require any review, feel free to just commit them if you have commit rights. | |
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp | ||
273 | Is this llvm:: qualifier to disambiguate different things called SectionedAddress? Perhaps we could avoid having different things with that name and use only one of them - maybe move the one that's already in libDebugInfo into libObject? (perhaps as an incremental patch before the rest of this change) | |
973 | Presumably this is going to have ambiguity problems if it doesn't have the section too? Which CU does it return if there are two CUs both with address zero in them (because they each describe functions in different sections), for instance? | |
977–978 | Similarly here | |
1058–1060 | Yeah, seems like it'd be best to have one SectionedAddress type rather than mapping between these equivalent but different types. | |
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp | ||
867–868 | Perhaps Row should contain a SectionedAddress rather than a separate Address and SectionIndex field (again, could be a preliminary or after-this-patch refactoring to clean it up if it'd otherwise introduce a lot of churn to this already fairly large patch) | |
890–893 | Under what conditions is this code needed? (it looks like this would only fail if the section index was incorrect? Which makes me think/feel like this is maybe working around a bug of some kind?) | |
906 | Maybe this should be "lookupAddressImpl" if it's the underlying implementation that is used by the more general lookupAddress entry point? | |
989–990 | Maybe this should be {Address.SectionIndex, EndAddr - 1} (though admittedly it's a bit subtle having to know if section index or address comes first) & passed in directly to findRowInSeq - no need to split out into a named variable, etc. | |
1072–1074 | Looks like unrelated refactoring? (better in a separate commit) | |
llvm/test/tools/llvm-objdump/Inputs/function-sections-line-numbers.cpp | ||
1–25 | Seems like a rather long/complicated test case. Is it sufficient to compile this with -ffunction-sections: void f1() { } void f2() { } and then symbolize address 0? | |
llvm/tools/llvm-cfi-verify/lib/FileAnalysis.cpp | ||
533–538 | Could use the ctor: object::SectionedAddress Address(SectionIndex, VMAddress); or braced init: IndirectInstructions.insert({SectionIndex, VMAddress}); (I think the latter's probably fine & worth doing across this patch - but I do appreciate the concern that without names one could easily mix up the section/address order - so perhaps you/other reviewers feel differentlny to me) | |
llvm/tools/llvm-cfi-verify/lib/GraphBuilder.h | ||
105 | Might as well pass a SectionedAddress here? | |
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
420–421 | Perhaps this should be changed to produce more than one result? Maybe just leaving a FIXME (given that it's no worse with this change than it was before) | |
llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp | ||
392–393 | You can roll the variable into the condition here: if (auto SecOrErr = Sym.getSection()) & LLVM style tends not to put {} on single-line statements like this |
David, thank you for the review. I would address all points. two major points ;
- If it is OK to left tools with UNDEF value and TODO comment - then I am also OK with this and would remove changes for sancov.
- As to having two SectionedAddress structures. I was thinking that existing SectionedAddress structure is local for Dwarf sources. So that Dwarf sources could be independend from another sources. But if it is not the case - then It would be better to leave only one SectionedAddress structure defined at Object/ObjectFile.h So if it is OK I will just remove SectionedAddress from DWARFSection.h
My comments about LLD side are below.
Question is: can LLD part have a test?
I also wonder could this patch just pass object::SectionedAddress::UndefSection,
would behavior be the same? If so, then probably would be better to split LLD change
to a different patch with the test case.
lld/ELF/InputFiles.cpp | ||
---|---|---|
208 | detect -> Detect (comments should start from uppercase) | |
210 | I would like to hear what Rui thinks about this, And FTR I do not know a better way to find the section index here, seems | |
213 | We usually use prefix increments I think: ++CurIndex |
Hi George, Thank you for comments. I will address them. According to the questions ;
- yes, I would create a test for LLD.
- Using object::SectionedAddress::UndefSection as section index makes following tests fail :
lld :: ELF/conflict.s lld :: ELF/debug-line-str.s lld :: ELF/multiple-cu.s lld :: ELF/undef.s lld :: ELF/x86-64-reloc-range-debug-loc.s
llvm/include/llvm/Object/ObjectFile.h | ||
---|---|---|
139 | I changed it to non-static data member initializers. But it looks like I could not drop following constructors. Since if I add non-static data member initializers then it stopped to compile inplace initializers = {Addr, Index}. To compile inplace initializer - it require constructor SectionedAddress (Addr, Index). It also require default constructor for default instantiation. | |
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp | ||
273 | moved SectionedAddress from libDebugInfo into linObject. | |
973 | Agreed. I left it as is to reduce size of the fix. Since It is not required for 40703 bug. I would put TODO comment here. | |
977–978 | would put TODO comment here. | |
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp | ||
890–893 | That is how DebugLineInfo organized. If it has absolute addresses - it does not care which section they relate. So if caller ask for information for an AbsoluteAddress from section X then there would not be possible to find such entry. All absolute addresses put in Undef section(in DebugLineInfo table). But if Line Info table has Relocation information then it would create entry for specific SectionIndex. Earlier there was not this differrentiation because addresses were not marked with SectionIndex. Now addresses which have relocations marked with proper section index and absolute addresses marked with undex section index. Thus the algorithm is following : find address in relocations for specified section. then if not found check it for Undef section which contains info about all absolute addresses. another alternative would be to set proper sectionIndex for absolute addresses while DebugLineInfo table create it`s map. Then there would not be neccessary to search twice, but there would be neccessary to create more complex map. I think this is just another way to implement the same. | |
llvm/test/tools/llvm-objdump/Inputs/function-sections-line-numbers.cpp | ||
1–25 | I took above short test case and check how llvm-objdump shows line numbers. for llvm-symbolizer it would probably be necessary to extend command line to tell to which section specified address related. | |
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp | ||
420–421 | added TODO comment for this. |
George, I added test for LLD. But.... It already pass for LLD without this fix. The reason for that is following quickfix :
lld/ELF/DWARF.cpp:90
// FIXME: We should be consistent about always adding the file // offset or not. if (DR->Section->Flags & ELF::SHF_ALLOC) Val += cast<InputSection>(DR->Section)->getOffsetInFile();
lld/ELF/InputFiles.cpp:208
// Use fake address calcuated by adding section file offset and offset in // section. See comments for ObjectInfo class. DILineInfo Info; for (const llvm::DWARFDebugLine::LineTable *LT : LineTables) if (LT->getFileLineInfoForAddress( S->getOffsetInFile() + Offset, nullptr, DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath, Info)) return Info;
FileOffset added instead of section offset(because section offset could be 0). I think that quickfix should not prevent making interfaces full. i.e. I still think that adding SectionIndex for getFileLineInfoForAddress is a right thing. Having this already done - the fix with getOffsetInFile could probably be refactored out.
Ok. I debugged this a little and think that:
- LLD code change LGTM.
- I find the LLD test case potentially useful because I did not find any test checking exactly this situation. Given that LLD has mentioned "TODOs" in the code that we might want to remove, it would be good to have such a test for the future possible changes probably.
- It feels that the test can be reduced and might need some cleanup (I left a few inline comments).
- Since without assigning the section index, we would have few existent LLD tests failures (and also since this test pass without any code changes atm), it should be fine to go without any additional test for LLD in this patch.
To summarize: I suggest to proceed with this patch without the LLD test and post it on a different review as an independent improvement.
lld/test/ELF/debug-line-obj.s | ||
---|---|---|
5 ↗ | (On Diff #187074) | I do not think you need to mention "crash" here. I would just say here that this test checks that we report locations correctly when we have the line number |
238 ↗ | (On Diff #187074) | You should not need .debug_macinfo section. |
251 ↗ | (On Diff #187074) | I think you can remove .ident and the .note.GNU-stack section below. |
llvm/test/tools/llvm-objdump/function-sections-line-numbers.test | ||
1 | I did not find any other llvm-objdump test that would use clang. |
deleted LLD test as it was requested. Modified llvm-objdump test to not to use clang.
llvm/include/llvm/Object/ObjectFile.h | ||
---|---|---|
139 | ah, seems that rule doesn't exist in C++14 anymore - so maybe a FIXME to remove the ctors once we've adopted C++14? | |
149 | These shouldn't be static (means they'd have internal linkage, and they'd be duplicated (& not deduplicated by the linker) and lead to ODR violations (any non-static inline function that calls these and is used in multiple translation units would be in violation of the ODR because that inline function would differ (it'd be calling different functions) between fileS)) | |
llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp | ||
939–950 | Probably flip this around to reduce indentation: if (lookupAddressRangeImpl(...)) return true; if (Address.SectionIndex == Undef) return false; Address.SectionIndex = Undef; return lookupAddressRangeImpl(...); | |
989–992 | In some places you've rolled the SectionedAddress creation into the call site, and in others you have a named variable like this. I think it probably makes sense to not name these variables in general (unless they're needed more than once)? But not a big deal. | |
1072–1075 | Still looks like some unrelated refactoring here? | |
llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp | ||
391–392 | Could you use "else if" here? | |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
202 | can't say I've seen this written this way - I think it's probably more common to write it as "if (!BinaryOrErr)"? | |
205 | Rather than x.get().y you could write this as "x->y"? |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
202 | There are more usages of error() in this file. so I did it exactly like in other places. Though if you think it would be better to change it into "if (!BinaryOrErr)" then I will change it. |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
200 | @avl This patch has broken sanitizer tests on Darwin (rdar://problem/48474543). In particular this now fails. FILECHECK_DUMP_INPUT_ON_FAILURE=1 bin/llvm-lit -v projects/compiler-rt/test/asan/X86_64DarwinConfig/TestCases/Darwin/haswell-symbolication.cc I've tracked down the issue to this commit and this function (getModuleSectionIndexForAddress()). The problem is that this code assumes the module name is the same as the path to the binary. This doesn't seem to be true on Darwin because while running the sanitizer test llvm-symbolizer tries to open... /path/to/llvm_build/projects/compiler-rt/test/asan/X86_64DarwinConfig/TestCases/Darwin/Output/haswell-symbolication.cc.tmp.thin.x86_64:x86_64 Note the :x86_64 at the end of the module name. That's the slice name inside the file and is not part of the file name itself. We need to land a fix for this or we'll need to revert this patch. I suspect this issue must be handled elsewhere in the codebase because Darwin specific logic probably doesn't belong here. @JDevlieghere Any ideas if there's an API that can be used here to get the binary file path from the module name? |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
200 | I will research how to fix it. Make me known if there is proper solution for this, please. |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
200 | I see that situation already handled in LLVMSymbolizer::getOrCreateModuleInfo(). I am going to handle this in similar manner : size_t ColonPos = ModuleName.find_last_of(':'); // Verify that substring after colon form a valid arch name. |
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp | ||
---|---|---|
200 | @delcypher I commit a fix for it. But now i think it would not work. I will prepare review for better version asap. |
detect -> Detect
(comments should start from uppercase)