Page MenuHomePhabricator

[SymbolFileDWARF] Introduce DWARFContext
ClosedPublic

Authored by zturner on Mar 19 2019, 2:18 PM.

Details

Summary

LLVM's DWARF parsing library has a class called DWARFContext which holds all of the various DWARF data sections and lots of other information. LLDB's on the other hand stores all of this directly in SymbolFileDWARF / SymbolFileDWARFDwo and passes this interface around through the parsing library. Obviously this is incompatible with a world where the low level interface does not depend on the high level interface, so we need to move towards a model similar to LLVM's - i.e. all of the context needed for low level parsing should be in a single class, and that class gets passed around.

This patch is a small incremental step towards achieving this. The interface and internals deviate from LLVM's for technical reasons, but the high level idea is the same. The goal is, eventually, to remove all occurrences of SymbolFileDWARF from the low level parsing code.

For now I've chosen a very simple section - the .debug_aranges section to move into DWARFContext while leaving everything else unchanged. In the short term this is a bit confusing because now the information you need might come from either of 2 different locations. But it's a huge refactor to do this all at once and runs a much higher risk of breaking things. So I think it would be wise to do this in very small pieces.

TL;DR - No functional change

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Mar 19 2019, 2:18 PM
JDevlieghere added inline comments.Mar 19 2019, 3:12 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
30 ↗(On Diff #191393)

Why do we need these two methods? I presume it's because one does work and the other doesn't? If I understand the code correctly, LoadOrGetSection will initialize the optional, so could you just check the optional to determine where we should try to do work or not?

zturner marked an inline comment as done.Mar 19 2019, 3:19 PM
zturner added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
30 ↗(On Diff #191393)

It's because one of them is const and a const version of the function cannot do any work. It's possible we won't ever need this, and we'll always want a "get or create" function, but i can imagine scenarios where we have a const reference or const pointer.

This is always a problem with these lazy type interfaces, because in that case should you actually use const_cast or should you just fail and return nullptr? It's a tough call. I chose to explicitly embed into the name whether the function was doing work (e.g. maybeLoad) versus being purely a getter that will never do work (e.g. get).

JDevlieghere added inline comments.Mar 19 2019, 3:40 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
30 ↗(On Diff #191393)

Aha, I missed that. nit about the name: how about getArangesData and getOrLoadArangesData

zturner updated this revision to Diff 191421.Mar 19 2019, 5:23 PM

I went ahead and just removed the const get version entirely, while changing the other function name to getOrLoad as you suggested. I have another patch while I'll upload after this one that converts all of the rest of the functions (except for the virtual ones required for DWO support), and the const version of the function was literally never needed, except in one place that was itself dead code. So, in the spirit of YAGNI, it's gone until it demonstrates a need.

This revision is now accepted and ready to land.Mar 19 2019, 5:27 PM

I remember LLVM's DWARFContext being irritating to deal with when I was doing the DWARF 5 stuff. The exact details have been swapped out of my memory, but I suspect it had something to do with needing to know whether I was in DWO mode or normal mode in order to snag the right section for some sort of cross-section lookup. This is because DWARFContext is the dumping ground for all the sections, regardless of what kind of file we're looking at, and it's up to the user of its interfaces to know what mode it's in.
I will admit that having one dumping ground is convenient in some ways, especially when we were looking at object files that contained a mix of both normal and .dwo sections, but IIRC we don't emit mixed-mode object files anymore.

Not objecting to this patch, but it was the most convenient place to raise this particular grump, and something to think about as you progress.

labath added a subscriber: labath.Mar 20 2019, 5:45 AM
labath added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
28 ↗(On Diff #191421)

Don't want to block progress here over this, but I just wanted to say that my impression was that this DwarfData business was a relict from the time when we re not mmapping the input object files. These days, we always mmap the input, and so I don't believe that should be necessary because ObjectFileMachO will automatically perform the DataBuffer slicing that you're doing here manually as a part of ReadSectionData.

Maybe one of the Apple folks could check whether this is really needed, because if it isn't, it will make your job easier here, as well as produce an API that is more similar to its llvm counterpart.

clayborg requested changes to this revision.Mar 20 2019, 7:38 AM
clayborg added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.cpp
17 ↗(On Diff #191421)

is "mapped_dwarf_data" actually the entire file mmap'ed? If so rename this to "mapped_file_data"?

40 ↗(On Diff #191421)

Use:

lldb::offset_t Section::GetSectionData(DataExtractor &data);

We might want to make the ObjectFile::ReadSectionData private so people don't use it and add a doxygen comment to use the above API in Section?

46–49 ↗(On Diff #191421)

Again, is this the entire file that is mapped?

lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
21 ↗(On Diff #191421)

What does m_dwarf_data contain? The entire file? Just the __DWARF segment for mach-o?

28 ↗(On Diff #191421)

We don't always mmap. We mmap if the file is local. We read the entire file if the file is on a network drive. If we don't do this and we mmap a NFS file, and that mount goes away, we crash the next time we access a page that hasn't been paged in. So we don't know if stuff is actually mmap'ed or not.

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
41 ↗(On Diff #191421)

Store as a reference maybe? We must assume that the DWARFContext will be alive as long as this class is around. We would need to supply this in the constructor then. Is that possible?

70 ↗(On Diff #191421)

Be nice if this can be a reference and be set in the ctor

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
690–691 ↗(On Diff #191421)
m_info.reset(new DWARFDebugInfo(m_dwarf_context));
693 ↗(On Diff #191421)

revert

This revision now requires changes to proceed.Mar 20 2019, 7:38 AM
labath added inline comments.Mar 20 2019, 7:43 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
28 ↗(On Diff #191421)

Yeah, I wasn't being fully correct here, but this distinction does not really matter here. Even if we don't mmap, we will end up with a full image of the file in memory, so anybody can slice any chunk of that file as he sees fit. Since ObjectFileMachO::ReadSectionData already implements this slicing, there's no advantage in slicing manually here.

zturner marked 5 inline comments as done.Mar 20 2019, 8:13 AM
zturner added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.cpp
17 ↗(On Diff #191421)

It's the contents of the __DWARF segment on MachO, unless my reading of the old code is wrong.

In the old code, SymbolFileDWARF would try to read this segment, and if it existed set it to SymbolFileDWARF::m_dwarf_data. Then, the ReadCachedSectionData function would first check if this member had data in it, and if so read the data from there instead of from the ObjectFile.

In this modified version of the patch, I call SetDwarfData with the contents of this section, then pass it through to this function so that we can perform the same logic.

40 ↗(On Diff #191421)

Yes, I like that better. Thanks for pointing it out.

46–49 ↗(On Diff #191421)

As mentioned elsewhere, this is the contents of __DWARF, which I didn't understand the purpose of, so I was hesitant to make any drastic changes. Perhaps you could clarify the significance of it now though and perhaps suggest an alternative approach to what we're doing here.

lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
28 ↗(On Diff #191421)

The parameter here represents the content of the __DWARF data segment on MachO. I don't really know what this is for or the implications of removing it, so for now I left it identical to the original code with NFC, and planned on addressing this in the future.

lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
70 ↗(On Diff #191421)

Fair enough, I kept it as a pointer for parity with SymbolFileDWARF* m_dwarf2Data, but a reference is definitely better.

We can get rid of the m_dwarf_data all together. As Pavel suggested, we used to not mmap the entire file, and now we do most of the time. The Section::GetSectionData() will ref count the mmap data in the data extractor and use the mmap already, so we should only support grabbing the contents via the Section::GetSectionData() and don't propagate this legacy code in the new DWARFContext.

clayborg added inline comments.Mar 20 2019, 8:36 AM
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.cpp
17 ↗(On Diff #191421)

remove "mapped_dwarf_data" arg

34–38 ↗(On Diff #191421)

remove

46–49 ↗(On Diff #191421)

remove

50 ↗(On Diff #191421)

rename to "getArangesData()"? Would it be better to return an Optional<DWARFDataExtractor>?

lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.h
21 ↗(On Diff #191421)

Remove this now that we don't need it since Section::GetSectionData() will do the right thing already.

28 ↗(On Diff #191421)

We can get rid of this after thinking about this.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
404–407 ↗(On Diff #191421)

Get rid of m_dwarf_data now that we don't need it? That will make subsequent patches easier.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
461 ↗(On Diff #191421)

"m_dwarf_ctx" to keep it shorter? Or just "m_ctx"?

zturner updated this revision to Diff 191510.Mar 20 2019, 9:32 AM
zturner marked an inline comment as done.

Updated based on suggestions, including removal of the m_dwarf_data member variable which holds the contents of the __DWARF segment on MachO.

clayborg added inline comments.Mar 20 2019, 1:14 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.cpp
24 ↗(On Diff #191510)

Don't we want to put an empty DWARFDataExtractor in here? Maybe this code should be:

const SectionList *section_list = module.GetSectionList();
  if (!section_list)
    return nullptr;

  auto section_sp = section_list->FindSectionByType(section_type, true);
  if (!section_sp)
    return nullptr;

  extractor.emplace();
  if (section_sp->GetSectionData(*extractor) == 0)
    extractor = llvm::None;
  return extractor.getPointer();
40 ↗(On Diff #191510)

Why do we care about "getOrLoadArangesData()"? What not just "getArangesData()"?

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
552 ↗(On Diff #191510)
if (m_obj_file->ReadSectionData(section_sp.get(), data) == 0)
  data.Clear();
clayborg added inline comments.Mar 20 2019, 1:15 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.cpp
24 ↗(On Diff #191510)

Or use a local DWARFDataExtractor and move it into "extractor" if we succeed?

DWARFDataExtractor data;
if (section_sp->GetSectionData(data) > 0)
    extractor = std::move(data);
zturner marked 2 inline comments as done.Mar 20 2019, 1:21 PM
zturner added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFContext.cpp
24 ↗(On Diff #191510)

The suggested code here has a bug. If you say

extractor = llvm::None;
return extractor.getPointer();

it will assert, because there is no pointer, it's uninitialized. When i write extractor.emplace(); it initializes it with a default constructed DataExtractor, the idea being that if we try this function and it fails for any reason, we should not try the function again, because it already failed once, and so we should just "cache" the fact that it failed and immediately return. By having a default constructed DataExtractor and changing the fast path exit condition to also return null in the case where the size is 0, we ensure that we only ever do work once, regardless of whether that work fails or succeeds.

40 ↗(On Diff #191510)

I like to emphasize in the name that the function might do work. People are trained to think of get functions as read-only, but it's not the case here and so the idea is to make sure that nobody can misinterpret the behavior.

50 ↗(On Diff #191421)

I actually prefer to keep the name as getOrLoad. get functions often sound like they don't modify anything, but this way it's clear that lazy evaluation will occur, which might be important to the caller for some reason or another.

clayborg added inline comments.Mar 20 2019, 1:34 PM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
552 ↗(On Diff #191510)

What do you think about this one? This is my last nit

zturner marked an inline comment as done.Mar 20 2019, 1:40 PM
zturner added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
552 ↗(On Diff #191510)

I did see that one, but I was having trouble figuring out how the behavior is different from what I already wrote. The only way I could see it being different is if ReadSectionData would actually write data there even when it failed. That would be strange though, do you know if it can happen?

The way I wrote it, it clears first, and then if the function fails / returns 0, I would expect it to be unchanged.

Happy to change it if there's actually a difference I'm overlooking, but if they're the same then I think the branchless version is slightly easier to read.

clayborg accepted this revision.Mar 20 2019, 1:41 PM

The suggestion just avoids the Clear() call at the expense of a branch. No big deal. Fine either way.

This revision is now accepted and ready to land.Mar 20 2019, 1:41 PM
Closed by commit rL356612: Introduce DWARFContext. (authored by zturner, committed by ). · Explain WhyMar 20 2019, 1:48 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2019, 1:48 PM