This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
ClosedPublic

Authored by avl on Mar 1 2019, 2:10 PM.

Download Raw Diff

Details

Reviewers

dblaikie
grimar
delcypher

Summary

This patch fixes buildbot failures after https://reviews.llvm.org/D58194.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
    
[Symbolizer] Add getModuleSectionIndexForAddress() helper routine
    
The https://reviews.llvm.org/D58194 patch changed symbolizer interface.
Particularily it requires not only Address but SectionIndex also.
Note object::SectionedAddress parameter:
    
Expected<DILineInfo> symbolizeCode(const std::string &ModuleName,
                                 object::SectionedAddress ModuleOffset,
                                 StringRef DWPName = "");
    
There are callers of symbolizer which do not know particular section index.
That patch creates getModuleSectionIndexForAddress() routine which
will detect section index for the specified address. Thus if caller
set ModuleOffset.SectionIndex into object::SectionedAddress::UndefSection
state then symbolizer would detect section index using
getModuleSectionIndexForAddress routine.

Diff Detail

Event Timeline

avl created this revision.Mar 1 2019, 2:10 PM

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2019, 2:10 PM

Herald added subscribers: rupprecht, hiraditya. · View Herald Transcript

dblaikie added inline comments.Mar 1 2019, 4:37 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142 ↗	(On Diff #188970)	Any reason this is becoming a member function from a file-local function?
146 ↗	(On Diff #188970)	I hadn't spotted this code (or looked very closely at it, at least) in the original review. I'm surprised/confused why the original change involved adding code that would look at input object file names at all. Why is this? Hmm, I see looking at the llvm-symbolizer code that this logic looks up the section name and then passes that in along with the file name to the symbolizing routine. But what's the point of that? It doesn't seem like it would change/improve the quality of the symbolizer's output compared to passing in a "I don't know what section" value, right?
148–157 ↗	(On Diff #188970)	Lots more get<N> rather than first/second. But there's also enough uses here it might be nicer to name the two variables instead of accessing them by pair repeatedly.
341 ↗	(On Diff #188970)	Perhaps this function should return StringRefs into the originally passed in string (perhaps the original input should be a StringRef too)?
355 ↗	(On Diff #188970)	Would make_pair be simpler here? Also potentially using std::move to avoid extra string copies: return std::make_pair(std::move(BinaryName), std::move(ArchName));
443–444 ↗	(On Diff #188970)	rather than using the tuple accessor (get<N>) I think it's probably still more canonical to use ".first" and ".second" for pairs.
448 ↗	(On Diff #188970)	You could replace the "unique_ptr<T>()" with "nullptr" here.
453 ↗	(On Diff #188970)	I don't think much code in LLVM checks for explicit "== nullptr" - this would probably more canonically be written as "if (!Objects.first)"
454 ↗	(On Diff #188970)	Guess this cast is needed because of the conversion to Expected? Is non-error, but null, a valid result from this function? That's a tricky thing, but could be the case. I wonder if Expected<T*> should be constructible from nullptr_t to simplify this sort of case?
llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp
150–158 ↗	(On Diff #188970)	Is this related to the rest of the change in this patch? If not, it should probably go in a separate patch/commit.

dblaikie added inline comments.Mar 1 2019, 4:39 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I think this follows on from one of the comments I made in the original review - I don't know it's worth plumbing through the section index to any callers except the original one you had in mind (objdump disassembly) as all the other callers will need work to do anytihng other than what they do today (eg: the symbolizer could have support for multiple answers - the same address in multiple sections (in which case it'd need tests for that, etc)- but this bit of code isn't doing that)

David, Thank you for the comments! I skipped code style issues for the moment. please check the comments for allowing setting SectionIndex into Undef state and for the necessity of getModuleSectionIndexForAddress() routine. After we will decide on these questions I will handle code style issues.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142 ↗	(On Diff #188970)	the idea was: to not copy ModuleName parsing code from Symbolizer. This code is neccessary to properly handle ModuleName on Darwin, since it contains architecture postfix. and another reason Is to use this function in any place where SectionIndex would be neccessary. Specifying Undef value is not enough usually, see later comment on the Undef SectionIndex usage.
146 ↗	(On Diff #188970)	I think that my original description was not clear enough and there is some misunderstanding of new interface usage. Let`s clarify it now and decide what should it mean. I assume that SectionIndex should always be set in the correct value. Setting SectionIndex into Undef state could lead to incorrect work if relocations present. At least that is how it is currently done. That assumes that all callers of new interface need to do extra work to set SectionIndex properly. In some cases I already done that additional work, in some cases left it in Undef state(if it did not brake testing) and for some cases getModuleSectionIndexForAddress could be used for setting SectionIndex(that is the temporarily routine until proper code implemented). These two last cases assumed to be changed into proper implementation. It differs from the logic - if SectionIndex presented then take it into account, if SectionIndex is not presented then handle it somehow. The rationale for that is that DWARFDebugLineTable looks to be improper place to decide how to handle "I don't know what section" case. Different use cases could assume different handling of that situation. So I decided that DWARFDebugLineTable should always receive full info and caller of this routine should decide how to handle this situation. f.e. case "I don't know what section" could be handled : a) find correct section and set it. b) find several sections which match to local offset and use them. c) find last section which match to local offset and use it(behavior before fix). d) ignore that case. f) display error ... something else... Do you think we need to change it and allow setting SectionIndex into Undef state and specify how DWARFDebugLineTable would handle it ?
llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp
150–158 ↗	(On Diff #188970)	The only relation is that it fixes several lit tests after "D58194" patch.

Hahnfeld added a subscriber: Hahnfeld.Mar 3 2019, 7:20 AM

dblaikie added inline comments.Mar 4 2019, 11:14 AM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142 ↗	(On Diff #188970)	Right, I understand the first bit - but that wouldn't change where this function should live. The second bit - it'd be nice to move it potentially in a separate refactoring patch (and/or including it in the patch where a second caller is added to this function) to help isolate/separate independent changes.
146 ↗	(On Diff #188970)	Ah. But most (all?) executables only have one executable section, right? That's why the way this worked before your change has worked OK for most/all users historically? On that basis I think it makes sense to let the exsiting/previous APIs (witohut specifying a section index (ie: have the original functions, without any section index parameter - then overloaded with the new versions that allow the section index to be specified) - or perhaps having to specify an Undef one) to continue to work, while letting those callers who need to, specify it.

avl marked an inline comment as done.Mar 5 2019, 3:38 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	right, most executables only have one executable section. That is why previous implementation worked correctly for them using only address as point locator. But for not-linked objects this is not right. Previous implementation did not work for them. i.e. if we allow previous implementation by setting SectionIndex = UndefSection then such a call would work for only linked executable. That means that DWARFDebugLine or LLVMSymbolizer would imply concrete state. following calls could be done for only linked executables : DWARFDebugLine.getFileLineInfoForAddress ( SectionedAddress.SectionIndex = UndefSection LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection it would be error to call them for non-linked object file. That means that all places where we could expect non-linked object file needs to have something like this : if (linked executable) { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection } else { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() } llvm-dwardump, llvm-objectdump, lld, llvm-symbolizer - could work not only with linked executable but with not-linked object files also. So it looks like most usages would expect both cases - linked executables and not-linked object files. Probably in that case stateless interface would be more convenient ? Current stateless interface does not assume whether it works for linked or not-linked object. It always receives correct SectionIndex and works equally for both types : LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() The places where it becomes less convenient is the code assumed to work with only linked-executable. In such places there would be neccessary to add code which would detect SectionIndex for new interface. But this looks to be quite natural task and written once it could be used later for free. I created such a routine - getModuleSectionIndexForAddress(though I agree this is quite naive implementation) . The advantage of the new stateless interface is that it could not be used incorrectly. i.e. it would not be possible to use UndefSection for non-linked object files. Probably it would make sense to not keep old behavior without SectionIndex ? currently UndefSection used in following modules : sanstats sancov llvm-xray llvm-dwarfdump

I separated ExecutionEngine into another review - https://reviews.llvm.org/D58959

avl marked an inline comment as done.Mar 8 2019, 11:46 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	David, do you think it makes sense to assume that SectionIndex should always be set in the correct value(not equal to undef) ? Or your preferred choice is to allow SectionIndex to be set into UndefState and assume that in that case linked executables should be processed ?

dblaikie added inline comments.Mar 8 2019, 12:07 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I don't think it makes sense to open the file separately, search for a valid section index, then pass that into the API - compared to the API doing this for you. If you know the section you want, you pass it in, if not, you don't & you get the best effort (first instance, or perhaps APIs could be updated to return multiple responses when it's ambiguous). I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump/symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be.

avl marked an inline comment as done.Mar 8 2019, 2:25 PM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? yes. that`s right. I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump /symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be. That is also thing which I am agree. i.e. users should have a built-in functionality for setting proper SectionIndex. They should not do it manually. Thing which seems to me questionable is that - Is it OK to hide that builtin functionality under old symbolizer interface : symbolizeCode(..., object::SectionedAddress ModuleOffset) symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset) symbolizeData(..., object::SectionedAddress ModuleOffset i.e. if ModuleOffset.SectionIndex = UndefSection and if we will keep old behavior in this case. That looks questionable because different use cases could require different behaviour. But in this case some concrete behavior would already be done. Instead we can specify interface so that it always receives correct unambitious information. i.e. these methods would always receive correct SectionIndex : symbolizeCode(..., object::SectionedAddress ModuleOffset) symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset) symbolizeData(..., object::SectionedAddress ModuleOffset so that SectionIndex either set into proper index of section, either set into Undef state and it means an absolute address. That allows to implement several builtin behaviors separately. for example : we could implement this method for symbolizer : enumerateSections (uint64_t address, std::function<void(uint_64_t SectionIndex)> on_section); then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them. If we will need another behavior(find first of, find last of/display error) then it would be possible to create another helper builtin routine(as part of symbolizer interface or as separate implementation). This patch follows that idea : it requires SectionIndex to be properly specified for symbolizeCode/InlinedCode/Data methods and provides helper getModuleSectionIndexForAddress() routine so that all existing tools would be able to call it and work the same way as before. This getModuleSectionIndexForAddress routine assumed to be temporarily and should be replaced later with proper solution(s). Something like above enumerateSections() suggestion. So it looks like this solution allows API to help users and at the same time to implement several use cases.

dblaikie added inline comments.Mar 8 2019, 2:41 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump/symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to.

avl marked an inline comment as done.Mar 9 2019, 10:49 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ? "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump /symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? Actually I do not suggest to change their behavior here. I just show that they could require different behavior than current one: llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 above usage will show only one entry from some.o. If some.o has more sections which match to the specified address then they would not be shown. It is possible that symbolizer could be extended to show entries from all sections matched to specified address. llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 func2() test.c:20 In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here. I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to. I do not suggest to change behavior. If current behavior is OK then let`s preserve it. I suggest to not hide specific behavior under API. My concern is that let`s support it in an explicit way. It does not look awkwardly : object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) It explicitly tells how Section Index received. getModuleSectionIndexForAddress() preserve old behavior. It could easily be extended for future needs(if necessary) The advantage is that it would be explicitly seen which behavior would be used: object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getSectionIndexForNewBehavior(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) API users can override the Undef section index to some specific section if/when they want to API users can not override behavior for Undef section index if Undef section index would mean something specific. If the idea would be to get symbol information for only absolute addresses - then call to symbolizeCode(SectionedAddress.SectionIndex = Undef) would return wrong results, since "old behavior" does extra work.

dblaikie added inline comments.Mar 11 2019, 5:45 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ? I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? So it seems clear to me how the caller can choose to get the old behavior, or choose not to. Actually I do not suggest to change their behavior here. I just show that they could require different behavior than current one: llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 above usage will show only one entry from some.o. If some.o has more sections which match to the specified address then they would not be shown. It is possible > that symbolizer could be extended to show entries from all sections matched to specified address. llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 func2() test.c:20 This ^ is what I would mean by a change in behavior. That's a change in what llvm-symbolizer does/how it behaves. In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here. Maybe - alternatively, the API could be changed to respond with multiple results when the request is ambiguous (when there's more than one section with that address in it). But the particular thing about this API is that it separately opens the file and searches through the sections - which seems like a fairly awkward API (to have to open the file, processes it, then call an API that will open the file again & parse it completely separately). Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). In the future, the API could be improved to, instead of returning the first (or some arbitrary) result for the address - instead it could be split into two APIs - one that takes a SectionedAddress and returns zero or one results, and one that takes a raw address and returns zero or more results. But I don't think there's any need to rush to implimenting that in any callers - they were mostly doing fine with the old behavior.

avl marked an inline comment as done.Mar 12 2019, 2:48 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour". Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before.

David, Could you also review https://reviews.llvm.org/D58959 please ? It seems not depending on decision discussed here. So it looks like it could already be integrated.

dblaikie added inline comments.Mar 12 2019, 8:09 AM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour". Ah, OK, thanks for clarifying/explaining that (re: absolute address). When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? My naive understanding is that, say, in an unlinked object file addresses are all section-relative, and in a linked executable they're absolute. Are there cases where you might have both? And knowing whether the address is absolute or relative would produce different answers? Keeping the previous behavior as a fallback when Undef is given seems like better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. It's an implementation detail that, to me, speaks to an issue in the design of the APIs - if the caller into the API is expected to somehow identify the section before calling into the API which then necessarily opens/parses the file for sections, etc. And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) - that speaks, to me, to something ill-fitting about the design of the API as it stands - which is where I'm coming from/motivated by in this discussion. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before. "before" being "before any of your recent changes"? OK - thanks! If there is a case where "Undef == fallback to old behavior" and "Undef == treat the address as absolute" would produce different answers - could you include a test case so we can keep this use case in mind in the future?

avl marked an inline comment as done.Mar 12 2019, 11:40 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗	(On Diff #188970)	When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? I thought about following difference : "I know this is an absolute address": symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3 symbolize(SectionIndex=Undef, not_linked_file) => error. not linked executable. "any section with this address in it": symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3 symbolize(SectionIndex=Undef, not_linked_file) => func1():test.c:3 note: I do not suggest to implement above behavior. I think that is possible scenario if caller does not want to handle case "address relates to several sections" (silently displaying one from several matches). And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) Ok, I assumed that caller should know which section specified address belongs. if symbolizer assumed to be used without section`s relation knowledge then we probably do not need SectionedAddress in symbolizer at all. "before" being "before any of your recent changes"? OK - thanks! yes.

addressed all comments.

dblaikie added inline comments.Mar 14 2019, 1:12 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
155–166 ↗	(On Diff #190423)	This code is repeated several times & again, still seems to be at the wrong layer of abstraction to me - having callers parse names, open files, search for sections - rather than passing in Undef and letting the underlying API do what it did before any of these changes. I think it's further down that the Undef=>BestEffortSectionIndex should be handled - again, where it was effectively handled before any of these changes.

I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.

In D58848#1430071, @avl wrote:

I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.

I'm still a little confused by the existence of a section search at all whet Undef is given - given that the previous implementation (before your work) there was no such search.

Is it possible the search could be omitted entirely?

In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.

Previously that decision was made just by accident. Property of sorting algorithm defined which concrete section would be selected.
I think that this decision should be unambiguous and currently for the same pair {Address, SectionIndex} there always would be returned the same result.

If DWARFDebugLineTable would be changed so that for Undef value there would be done search by one criteria then for overlapping address ranges there could be different results depending on concrete call to DWARFDebugLineTable. To prevent such random section selection there was implemented getModuleSectionIndexForAddress().

Do you think we need to omit that high level decision and allow random selection for sections with overlapping address ranges ?

David, there is a connected review - https://reviews.llvm.org/D59189 It cures compilation error after D58194 Could you take a look on it please ?
I think that change is OK and it is good to do in spite of decision made in this review.

ping.

In D58848#1430203, @avl wrote:

In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.

It still seems too high level to me if it involves the caller separately opening and walking the file contents.

Please move this down at least far enough that it doesn't involve separately computing the file name and opening the file. Where it's a "well, if you didn't specify a section index, we'll use the first section that covers this address".

Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".

It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :

uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(

  uint64_t Address) const {

for (SectionRef Sec : Module->sections()) {
  if (!Sec.isText() || Sec.isVirtual())
    continue;

  if (Address >= Sec.getAddress() &&
      Address <= Sec.getAddress() + Sec.getSize()) {
    return Sec.getIndex();
  }
}

return object::SectionedAddress::UndefSection;

}

I think it looks very close to above description.

In D58848#1440161, @avl wrote:
Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".

It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :

uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(
  uint64_t Address) const {

for (SectionRef Sec : Module->sections()) {
  if (!Sec.isText() || Sec.isVirtual())
    continue;

  if (Address >= Sec.getAddress() &&
      Address <= Sec.getAddress() + Sec.getSize()) {
    return Sec.getIndex();
  }
}

return object::SectionedAddress::UndefSection;
}

I think it looks very close to above description.

Ah, so it does - thanks for correcting me/ walking me through that. Perhaps I was just reading poorly, or Phab was showing a different delta or something.

This looks good, thanks!

(I guess perhaps another/future change could be for these APIs to return an error if it is ambiguous (the symbolize APIs have an Error return type already, so they could change semantics here without changing the syntax - and that'd probably give a pretty good user experience of "hey, you're trying to symbolize an ambiguous address, we can't currently give you a clear answer to this query") - and, as discussed, perhaps one day they return multiple results, etc)

This revision is now accepted and ready to land.Mar 22 2019, 2:52 PM

Thank you!

In D58848#1440177, @avl wrote:

Thank you!

It seems you indented the description by 2. That may be why the revision is not automatically closed..

Differential Revision: https://reviews.llvm.org/D58848

@MaskRay

It seems you indented the description by 2. That may be why the revision is not automatically closed..

Thanks! I will take it into account next time.

avl closed this revision.Mar 23 2019, 2:39 AM

Revision Contents

Path

Size

llvm/

lib/

DebugInfo/

Symbolize/

SymbolizableObjectFile.h

3 lines

SymbolizableObjectFile.cpp

26 lines

tools/

llvm-symbolizer/

llvm-symbolizer.cpp

57 lines

Diff 190735

llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.h

Show First 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	private:
// For big-endian PowerPC64 ELF, OpdAddress is the address of the .opd		// For big-endian PowerPC64 ELF, OpdAddress is the address of the .opd
// (function descriptor) section and OpdExtractor refers to its contents.		// (function descriptor) section and OpdExtractor refers to its contents.
std::error_code addSymbol(const object::SymbolRef &Symbol,		std::error_code addSymbol(const object::SymbolRef &Symbol,
uint64_t SymbolSize,		uint64_t SymbolSize,
DataExtractor *OpdExtractor = nullptr,		DataExtractor *OpdExtractor = nullptr,
uint64_t OpdAddress = 0);		uint64_t OpdAddress = 0);
std::error_code addCoffExportSymbols(const object::COFFObjectFile *CoffObj);		std::error_code addCoffExportSymbols(const object::COFFObjectFile *CoffObj);

		/// Search for the first occurence of specified Address in ObjectFile.
		uint64_t getModuleSectionIndexForAddress(uint64_t Address) const;

object::ObjectFile *Module;		object::ObjectFile *Module;
std::unique_ptr<DIContext> DebugInfoContext;		std::unique_ptr<DIContext> DebugInfoContext;

struct SymbolDesc {		struct SymbolDesc {
uint64_t Addr;		uint64_t Addr;
// If size is 0, assume that symbol occupies the whole memory range up to		// If size is 0, assume that symbol occupies the whole memory range up to
// the following symbol.		// the following symbol.
uint64_t Size;		uint64_t Size;
Show All 17 Lines

llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp

Show First 20 Lines • Show All 221 Lines • ▼ Show 20 Lines	return FNKind == FunctionNameKind::LinkageName && UseSymbolTable &&
isa<DWARFContext>(DebugInfoContext.get());		isa<DWARFContext>(DebugInfoContext.get());
}		}

DILineInfo		DILineInfo
SymbolizableObjectFile::symbolizeCode(object::SectionedAddress ModuleOffset,		SymbolizableObjectFile::symbolizeCode(object::SectionedAddress ModuleOffset,
FunctionNameKind FNKind,		FunctionNameKind FNKind,
bool UseSymbolTable) const {		bool UseSymbolTable) const {
DILineInfo LineInfo;		DILineInfo LineInfo;

		if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
		ModuleOffset.SectionIndex =
		getModuleSectionIndexForAddress(ModuleOffset.Address);

if (DebugInfoContext) {		if (DebugInfoContext) {
LineInfo = DebugInfoContext->getLineInfoForAddress(		LineInfo = DebugInfoContext->getLineInfoForAddress(
ModuleOffset, getDILineInfoSpecifier(FNKind));		ModuleOffset, getDILineInfoSpecifier(FNKind));
}		}
// Override function name from symbol table if necessary.		// Override function name from symbol table if necessary.
if (shouldOverrideWithSymbolTable(FNKind, UseSymbolTable)) {		if (shouldOverrideWithSymbolTable(FNKind, UseSymbolTable)) {
std::string FunctionName;		std::string FunctionName;
uint64_t Start, Size;		uint64_t Start, Size;
if (getNameFromSymbolTable(SymbolRef::ST_Function, ModuleOffset.Address,		if (getNameFromSymbolTable(SymbolRef::ST_Function, ModuleOffset.Address,
FunctionName, Start, Size)) {		FunctionName, Start, Size)) {
LineInfo.FunctionName = FunctionName;		LineInfo.FunctionName = FunctionName;
}		}
}		}
return LineInfo;		return LineInfo;
}		}

DIInliningInfo SymbolizableObjectFile::symbolizeInlinedCode(		DIInliningInfo SymbolizableObjectFile::symbolizeInlinedCode(
object::SectionedAddress ModuleOffset, FunctionNameKind FNKind,		object::SectionedAddress ModuleOffset, FunctionNameKind FNKind,
bool UseSymbolTable) const {		bool UseSymbolTable) const {
DIInliningInfo InlinedContext;		DIInliningInfo InlinedContext;

		if (ModuleOffset.SectionIndex == object::SectionedAddress::UndefSection)
		ModuleOffset.SectionIndex =
		getModuleSectionIndexForAddress(ModuleOffset.Address);

if (DebugInfoContext)		if (DebugInfoContext)
InlinedContext = DebugInfoContext->getInliningInfoForAddress(		InlinedContext = DebugInfoContext->getInliningInfoForAddress(
ModuleOffset, getDILineInfoSpecifier(FNKind));		ModuleOffset, getDILineInfoSpecifier(FNKind));
// Make sure there is at least one frame in context.		// Make sure there is at least one frame in context.
if (InlinedContext.getNumberOfFrames() == 0)		if (InlinedContext.getNumberOfFrames() == 0)
InlinedContext.addFrame(DILineInfo());		InlinedContext.addFrame(DILineInfo());

// Override the function name in lower frame with name from symbol table.		// Override the function name in lower frame with name from symbol table.
Show All 12 Lines

DIGlobal SymbolizableObjectFile::symbolizeData(		DIGlobal SymbolizableObjectFile::symbolizeData(
object::SectionedAddress ModuleOffset) const {		object::SectionedAddress ModuleOffset) const {
DIGlobal Res;		DIGlobal Res;
getNameFromSymbolTable(SymbolRef::ST_Data, ModuleOffset.Address, Res.Name,		getNameFromSymbolTable(SymbolRef::ST_Data, ModuleOffset.Address, Res.Name,
Res.Start, Res.Size);		Res.Start, Res.Size);
return Res;		return Res;
}		}

		/// Search for the first occurence of specified Address in ObjectFile.
		uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(
		uint64_t Address) const {

		for (SectionRef Sec : Module->sections()) {
		if (!Sec.isText() \|\| Sec.isVirtual())
		continue;

		if (Address >= Sec.getAddress() &&
		Address <= Sec.getAddress() + Sec.getSize()) {
		return Sec.getIndex();
		}
		}

		return object::SectionedAddress::UndefSection;
		}

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

Show First 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	if (ClBinaryName.empty()) {
ModuleName = ClBinaryName;		ModuleName = ClBinaryName;
}		}
// Skip delimiters and parse module offset.		// Skip delimiters and parse module offset.
pos += strspn(pos, kDelimiters);		pos += strspn(pos, kDelimiters);
int offset_length = strcspn(pos, kDelimiters);		int offset_length = strcspn(pos, kDelimiters);
return !StringRef(pos, offset_length).getAsInteger(0, ModuleOffset);		return !StringRef(pos, offset_length).getAsInteger(0, ModuleOffset);
}		}

// This routine returns section index for an address.
// Assumption: would work ambiguously for object files which have sections not
// assigned to an address(since the same address could belong to various
// sections).
static uint64_t getModuleSectionIndexForAddress(const std::string &ModuleName,
uint64_t Address) {

// following ModuleName processing was copied from
// LLVMSymbolizer::getOrCreateModuleInfo().
// it needs to be refactored to avoid code duplication.
std::string BinaryName = ModuleName;
size_t ColonPos = ModuleName.find_last_of(':');
// Verify that substring after colon form a valid arch name.
if (ColonPos != std::string::npos) {
std::string ArchStr = ModuleName.substr(ColonPos + 1);
if (Triple(ArchStr).getArch() != Triple::UnknownArch) {
BinaryName = ModuleName.substr(0, ColonPos);
}
}

Expected<OwningBinary<Binary>> BinaryOrErr = createBinary(BinaryName);

if (error(BinaryOrErr))
return object::SectionedAddress::UndefSection;

Binary &Binary = *BinaryOrErr->getBinary();

if (ObjectFile *O = dyn_cast<ObjectFile>(&Binary)) {
for (SectionRef Sec : O->sections()) {
if (!Sec.isText() \|\| Sec.isVirtual())
continue;

if (Address >= Sec.getAddress() &&
Address <= Sec.getAddress() + Sec.getSize()) {
return Sec.getIndex();
}
}
}

return object::SectionedAddress::UndefSection;
}

static void symbolizeInput(StringRef InputString, LLVMSymbolizer &Symbolizer,		static void symbolizeInput(StringRef InputString, LLVMSymbolizer &Symbolizer,
DIPrinter &Printer) {		DIPrinter &Printer) {
bool IsData = false;		bool IsData = false;
std::string ModuleName;		std::string ModuleName;
uint64_t Offset = 0;		uint64_t Offset = 0;
if (!parseCommand(StringRef(InputString), IsData, ModuleName, Offset)) {		if (!parseCommand(StringRef(InputString), IsData, ModuleName, Offset)) {
outs() << InputString;		outs() << InputString;
return;		return;
}		}

if (ClPrintAddress) {		if (ClPrintAddress) {
outs() << "0x";		outs() << "0x";
outs().write_hex(Offset);		outs().write_hex(Offset);
StringRef Delimiter = ClPrettyPrint ? ": " : "\n";		StringRef Delimiter = ClPrettyPrint ? ": " : "\n";
outs() << Delimiter;		outs() << Delimiter;
}		}
Offset -= ClAdjustVMA;		Offset -= ClAdjustVMA;
object::SectionedAddress ModuleOffset = {
Offset, getModuleSectionIndexForAddress(ModuleName, Offset)};
if (IsData) {		if (IsData) {
auto ResOrErr = Symbolizer.symbolizeData(ModuleName, ModuleOffset);		auto ResOrErr = Symbolizer.symbolizeData(
		ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DIGlobal() : ResOrErr.get());		Printer << (error(ResOrErr) ? DIGlobal() : ResOrErr.get());
} else if (ClPrintInlining) {		} else if (ClPrintInlining) {
auto ResOrErr =		auto ResOrErr = Symbolizer.symbolizeInlinedCode(
Symbolizer.symbolizeInlinedCode(ModuleName, ModuleOffset, ClDwpName);		ModuleName, {Offset, object::SectionedAddress::UndefSection},
		ClDwpName);
Printer << (error(ResOrErr) ? DIInliningInfo() : ResOrErr.get());		Printer << (error(ResOrErr) ? DIInliningInfo() : ResOrErr.get());
} else {		} else {
auto ResOrErr =		auto ResOrErr = Symbolizer.symbolizeCode(
Symbolizer.symbolizeCode(ModuleName, ModuleOffset, ClDwpName);		ModuleName, {Offset, object::SectionedAddress::UndefSection},
		ClDwpName);
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get());		Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get());
}		}
outs() << "\n";		outs() << "\n";
outs().flush();		outs().flush();
}		}

int main(int argc, char **argv) {		int main(int argc, char **argv) {
InitLLVM X(argc, argv);		InitLLVM X(argc, argv);
Show All 39 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 190735

llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.h

llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
ClosedPublic