This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
ClosedPublic

Authored by avl on Mar 1 2019, 2:10 PM.

Download Raw Diff

Details

Reviewers

dblaikie
grimar
delcypher

Summary

This patch fixes buildbot failures after https://reviews.llvm.org/D58194.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
    
[Symbolizer] Add getModuleSectionIndexForAddress() helper routine
    
The https://reviews.llvm.org/D58194 patch changed symbolizer interface.
Particularily it requires not only Address but SectionIndex also.
Note object::SectionedAddress parameter:
    
Expected<DILineInfo> symbolizeCode(const std::string &ModuleName,
                                 object::SectionedAddress ModuleOffset,
                                 StringRef DWPName = "");
    
There are callers of symbolizer which do not know particular section index.
That patch creates getModuleSectionIndexForAddress() routine which
will detect section index for the specified address. Thus if caller
set ModuleOffset.SectionIndex into object::SectionedAddress::UndefSection
state then symbolizer would detect section index using
getModuleSectionIndexForAddress routine.

Diff Detail

Event Timeline

avl created this revision.Mar 1 2019, 2:10 PM

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2019, 2:10 PM

Herald added subscribers: rupprecht, hiraditya. · View Herald Transcript

dblaikie added inline comments.Mar 1 2019, 4:37 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142	Any reason this is becoming a member function from a file-local function?
146	I hadn't spotted this code (or looked very closely at it, at least) in the original review. I'm surprised/confused why the original change involved adding code that would look at input object file names at all. Why is this? Hmm, I see looking at the llvm-symbolizer code that this logic looks up the section name and then passes that in along with the file name to the symbolizing routine. But what's the point of that? It doesn't seem like it would change/improve the quality of the symbolizer's output compared to passing in a "I don't know what section" value, right?
148–157	Lots more get<N> rather than first/second. But there's also enough uses here it might be nicer to name the two variables instead of accessing them by pair repeatedly.
341	Perhaps this function should return StringRefs into the originally passed in string (perhaps the original input should be a StringRef too)?
355	Would make_pair be simpler here? Also potentially using std::move to avoid extra string copies: return std::make_pair(std::move(BinaryName), std::move(ArchName));
443–444	rather than using the tuple accessor (get<N>) I think it's probably still more canonical to use ".first" and ".second" for pairs.
448	You could replace the "unique_ptr<T>()" with "nullptr" here.
453	I don't think much code in LLVM checks for explicit "== nullptr" - this would probably more canonically be written as "if (!Objects.first)"
454	Guess this cast is needed because of the conversion to Expected? Is non-error, but null, a valid result from this function? That's a tricky thing, but could be the case. I wonder if Expected<T*> should be constructible from nullptr_t to simplify this sort of case?
llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp
150–158	Is this related to the rest of the change in this patch? If not, it should probably go in a separate patch/commit.

dblaikie added inline comments.Mar 1 2019, 4:39 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I think this follows on from one of the comments I made in the original review - I don't know it's worth plumbing through the section index to any callers except the original one you had in mind (objdump disassembly) as all the other callers will need work to do anytihng other than what they do today (eg: the symbolizer could have support for multiple answers - the same address in multiple sections (in which case it'd need tests for that, etc)- but this bit of code isn't doing that)

David, Thank you for the comments! I skipped code style issues for the moment. please check the comments for allowing setting SectionIndex into Undef state and for the necessity of getModuleSectionIndexForAddress() routine. After we will decide on these questions I will handle code style issues.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142	the idea was: to not copy ModuleName parsing code from Symbolizer. This code is neccessary to properly handle ModuleName on Darwin, since it contains architecture postfix. and another reason Is to use this function in any place where SectionIndex would be neccessary. Specifying Undef value is not enough usually, see later comment on the Undef SectionIndex usage.
146	I think that my original description was not clear enough and there is some misunderstanding of new interface usage. Let`s clarify it now and decide what should it mean. I assume that SectionIndex should always be set in the correct value. Setting SectionIndex into Undef state could lead to incorrect work if relocations present. At least that is how it is currently done. That assumes that all callers of new interface need to do extra work to set SectionIndex properly. In some cases I already done that additional work, in some cases left it in Undef state(if it did not brake testing) and for some cases getModuleSectionIndexForAddress could be used for setting SectionIndex(that is the temporarily routine until proper code implemented). These two last cases assumed to be changed into proper implementation. It differs from the logic - if SectionIndex presented then take it into account, if SectionIndex is not presented then handle it somehow. The rationale for that is that DWARFDebugLineTable looks to be improper place to decide how to handle "I don't know what section" case. Different use cases could assume different handling of that situation. So I decided that DWARFDebugLineTable should always receive full info and caller of this routine should decide how to handle this situation. f.e. case "I don't know what section" could be handled : a) find correct section and set it. b) find several sections which match to local offset and use them. c) find last section which match to local offset and use it(behavior before fix). d) ignore that case. f) display error ... something else... Do you think we need to change it and allow setting SectionIndex into Undef state and specify how DWARFDebugLineTable would handle it ?
llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp
150–158	The only relation is that it fixes several lit tests after "D58194" patch.

Hahnfeld added a subscriber: Hahnfeld.Mar 3 2019, 7:20 AM

dblaikie added inline comments.Mar 4 2019, 11:14 AM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142	Right, I understand the first bit - but that wouldn't change where this function should live. The second bit - it'd be nice to move it potentially in a separate refactoring patch (and/or including it in the patch where a second caller is added to this function) to help isolate/separate independent changes.
146	Ah. But most (all?) executables only have one executable section, right? That's why the way this worked before your change has worked OK for most/all users historically? On that basis I think it makes sense to let the exsiting/previous APIs (witohut specifying a section index (ie: have the original functions, without any section index parameter - then overloaded with the new versions that allow the section index to be specified) - or perhaps having to specify an Undef one) to continue to work, while letting those callers who need to, specify it.

avl marked an inline comment as done.Mar 5 2019, 3:38 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	right, most executables only have one executable section. That is why previous implementation worked correctly for them using only address as point locator. But for not-linked objects this is not right. Previous implementation did not work for them. i.e. if we allow previous implementation by setting SectionIndex = UndefSection then such a call would work for only linked executable. That means that DWARFDebugLine or LLVMSymbolizer would imply concrete state. following calls could be done for only linked executables : DWARFDebugLine.getFileLineInfoForAddress ( SectionedAddress.SectionIndex = UndefSection LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection it would be error to call them for non-linked object file. That means that all places where we could expect non-linked object file needs to have something like this : if (linked executable) { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection } else { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() } llvm-dwardump, llvm-objectdump, lld, llvm-symbolizer - could work not only with linked executable but with not-linked object files also. So it looks like most usages would expect both cases - linked executables and not-linked object files. Probably in that case stateless interface would be more convenient ? Current stateless interface does not assume whether it works for linked or not-linked object. It always receives correct SectionIndex and works equally for both types : LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() The places where it becomes less convenient is the code assumed to work with only linked-executable. In such places there would be neccessary to add code which would detect SectionIndex for new interface. But this looks to be quite natural task and written once it could be used later for free. I created such a routine - getModuleSectionIndexForAddress(though I agree this is quite naive implementation) . The advantage of the new stateless interface is that it could not be used incorrectly. i.e. it would not be possible to use UndefSection for non-linked object files. Probably it would make sense to not keep old behavior without SectionIndex ? currently UndefSection used in following modules : sanstats sancov llvm-xray llvm-dwarfdump

I separated ExecutionEngine into another review - https://reviews.llvm.org/D58959

avl marked an inline comment as done.Mar 8 2019, 11:46 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	David, do you think it makes sense to assume that SectionIndex should always be set in the correct value(not equal to undef) ? Or your preferred choice is to allow SectionIndex to be set into UndefState and assume that in that case linked executables should be processed ?

dblaikie added inline comments.Mar 8 2019, 12:07 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I don't think it makes sense to open the file separately, search for a valid section index, then pass that into the API - compared to the API doing this for you. If you know the section you want, you pass it in, if not, you don't & you get the best effort (first instance, or perhaps APIs could be updated to return multiple responses when it's ambiguous). I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump/symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be.

avl marked an inline comment as done.Mar 8 2019, 2:25 PM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? yes. that`s right. I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump /symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be. That is also thing which I am agree. i.e. users should have a built-in functionality for setting proper SectionIndex. They should not do it manually. Thing which seems to me questionable is that - Is it OK to hide that builtin functionality under old symbolizer interface : symbolizeCode(..., object::SectionedAddress ModuleOffset) symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset) symbolizeData(..., object::SectionedAddress ModuleOffset i.e. if ModuleOffset.SectionIndex = UndefSection and if we will keep old behavior in this case. That looks questionable because different use cases could require different behaviour. But in this case some concrete behavior would already be done. Instead we can specify interface so that it always receives correct unambitious information. i.e. these methods would always receive correct SectionIndex : symbolizeCode(..., object::SectionedAddress ModuleOffset) symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset) symbolizeData(..., object::SectionedAddress ModuleOffset so that SectionIndex either set into proper index of section, either set into Undef state and it means an absolute address. That allows to implement several builtin behaviors separately. for example : we could implement this method for symbolizer : enumerateSections (uint64_t address, std::function<void(uint_64_t SectionIndex)> on_section); then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them. If we will need another behavior(find first of, find last of/display error) then it would be possible to create another helper builtin routine(as part of symbolizer interface or as separate implementation). This patch follows that idea : it requires SectionIndex to be properly specified for symbolizeCode/InlinedCode/Data methods and provides helper getModuleSectionIndexForAddress() routine so that all existing tools would be able to call it and work the same way as before. This getModuleSectionIndexForAddress routine assumed to be temporarily and should be replaced later with proper solution(s). Something like above enumerateSections() suggestion. So it looks like this solution allows API to help users and at the same time to implement several use cases.

dblaikie added inline comments.Mar 8 2019, 2:41 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump/symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to.

avl marked an inline comment as done.Mar 9 2019, 10:49 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ? "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump /symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? Actually I do not suggest to change their behavior here. I just show that they could require different behavior than current one: llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 above usage will show only one entry from some.o. If some.o has more sections which match to the specified address then they would not be shown. It is possible that symbolizer could be extended to show entries from all sections matched to specified address. llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 func2() test.c:20 In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here. I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to. I do not suggest to change behavior. If current behavior is OK then let`s preserve it. I suggest to not hide specific behavior under API. My concern is that let`s support it in an explicit way. It does not look awkwardly : object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) It explicitly tells how Section Index received. getModuleSectionIndexForAddress() preserve old behavior. It could easily be extended for future needs(if necessary) The advantage is that it would be explicitly seen which behavior would be used: object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getSectionIndexForNewBehavior(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) API users can override the Undef section index to some specific section if/when they want to API users can not override behavior for Undef section index if Undef section index would mean something specific. If the idea would be to get symbol information for only absolute addresses - then call to symbolizeCode(SectionedAddress.SectionIndex = Undef) would return wrong results, since "old behavior" does extra work.

dblaikie added inline comments.Mar 11 2019, 5:45 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ? I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? So it seems clear to me how the caller can choose to get the old behavior, or choose not to. Actually I do not suggest to change their behavior here. I just show that they could require different behavior than current one: llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 above usage will show only one entry from some.o. If some.o has more sections which match to the specified address then they would not be shown. It is possible > that symbolizer could be extended to show entries from all sections matched to specified address. llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 func2() test.c:20 This ^ is what I would mean by a change in behavior. That's a change in what llvm-symbolizer does/how it behaves. In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here. Maybe - alternatively, the API could be changed to respond with multiple results when the request is ambiguous (when there's more than one section with that address in it). But the particular thing about this API is that it separately opens the file and searches through the sections - which seems like a fairly awkward API (to have to open the file, processes it, then call an API that will open the file again & parse it completely separately). Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). In the future, the API could be improved to, instead of returning the first (or some arbitrary) result for the address - instead it could be split into two APIs - one that takes a SectionedAddress and returns zero or one results, and one that takes a raw address and returns zero or more results. But I don't think there's any need to rush to implimenting that in any callers - they were mostly doing fine with the old behavior.

avl marked an inline comment as done.Mar 12 2019, 2:48 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour". Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before.

David, Could you also review https://reviews.llvm.org/D58959 please ? It seems not depending on decision discussed here. So it looks like it could already be integrated.

dblaikie added inline comments.Mar 12 2019, 8:09 AM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour". Ah, OK, thanks for clarifying/explaining that (re: absolute address). When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? My naive understanding is that, say, in an unlinked object file addresses are all section-relative, and in a linked executable they're absolute. Are there cases where you might have both? And knowing whether the address is absolute or relative would produce different answers? Keeping the previous behavior as a fallback when Undef is given seems like better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. It's an implementation detail that, to me, speaks to an issue in the design of the APIs - if the caller into the API is expected to somehow identify the section before calling into the API which then necessarily opens/parses the file for sections, etc. And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) - that speaks, to me, to something ill-fitting about the design of the API as it stands - which is where I'm coming from/motivated by in this discussion. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before. "before" being "before any of your recent changes"? OK - thanks! If there is a case where "Undef == fallback to old behavior" and "Undef == treat the address as absolute" would produce different answers - could you include a test case so we can keep this use case in mind in the future?

avl marked an inline comment as done.Mar 12 2019, 11:40 AM

avl added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146	When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? I thought about following difference : "I know this is an absolute address": symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3 symbolize(SectionIndex=Undef, not_linked_file) => error. not linked executable. "any section with this address in it": symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3 symbolize(SectionIndex=Undef, not_linked_file) => func1():test.c:3 note: I do not suggest to implement above behavior. I think that is possible scenario if caller does not want to handle case "address relates to several sections" (silently displaying one from several matches). And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) Ok, I assumed that caller should know which section specified address belongs. if symbolizer assumed to be used without section`s relation knowledge then we probably do not need SectionedAddress in symbolizer at all. "before" being "before any of your recent changes"? OK - thanks! yes.

addressed all comments.

dblaikie added inline comments.Mar 14 2019, 1:12 PM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
131–142	This code is repeated several times & again, still seems to be at the wrong layer of abstraction to me - having callers parse names, open files, search for sections - rather than passing in Undef and letting the underlying API do what it did before any of these changes. I think it's further down that the Undef=>BestEffortSectionIndex should be handled - again, where it was effectively handled before any of these changes.

I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.

In D58848#1430071, @avl wrote:

I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.

I'm still a little confused by the existence of a section search at all whet Undef is given - given that the previous implementation (before your work) there was no such search.

Is it possible the search could be omitted entirely?

In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.

Previously that decision was made just by accident. Property of sorting algorithm defined which concrete section would be selected.
I think that this decision should be unambiguous and currently for the same pair {Address, SectionIndex} there always would be returned the same result.

If DWARFDebugLineTable would be changed so that for Undef value there would be done search by one criteria then for overlapping address ranges there could be different results depending on concrete call to DWARFDebugLineTable. To prevent such random section selection there was implemented getModuleSectionIndexForAddress().

Do you think we need to omit that high level decision and allow random selection for sections with overlapping address ranges ?

David, there is a connected review - https://reviews.llvm.org/D59189 It cures compilation error after D58194 Could you take a look on it please ?
I think that change is OK and it is good to do in spite of decision made in this review.

ping.

In D58848#1430203, @avl wrote:

In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.

It still seems too high level to me if it involves the caller separately opening and walking the file contents.

Please move this down at least far enough that it doesn't involve separately computing the file name and opening the file. Where it's a "well, if you didn't specify a section index, we'll use the first section that covers this address".

Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".

It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :

uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(

  uint64_t Address) const {

for (SectionRef Sec : Module->sections()) {
  if (!Sec.isText() || Sec.isVirtual())
    continue;

  if (Address >= Sec.getAddress() &&
      Address <= Sec.getAddress() + Sec.getSize()) {
    return Sec.getIndex();
  }
}

return object::SectionedAddress::UndefSection;

}

I think it looks very close to above description.

In D58848#1440161, @avl wrote:
Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".

It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :

uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(
  uint64_t Address) const {

for (SectionRef Sec : Module->sections()) {
  if (!Sec.isText() || Sec.isVirtual())
    continue;

  if (Address >= Sec.getAddress() &&
      Address <= Sec.getAddress() + Sec.getSize()) {
    return Sec.getIndex();
  }
}

return object::SectionedAddress::UndefSection;
}

I think it looks very close to above description.

Ah, so it does - thanks for correcting me/ walking me through that. Perhaps I was just reading poorly, or Phab was showing a different delta or something.

This looks good, thanks!

(I guess perhaps another/future change could be for these APIs to return an error if it is ambiguous (the symbolize APIs have an Error return type already, so they could change semantics here without changing the syntax - and that'd probably give a pretty good user experience of "hey, you're trying to symbolize an ambiguous address, we can't currently give you a clear answer to this query") - and, as discussed, perhaps one day they return multiple results, etc)

This revision is now accepted and ready to land.Mar 22 2019, 2:52 PM

Thank you!

In D58848#1440177, @avl wrote:

Thank you!

It seems you indented the description by 2. That may be why the revision is not automatically closed..

Differential Revision: https://reviews.llvm.org/D58848

@MaskRay

It seems you indented the description by 2. That may be why the revision is not automatically closed..

Thanks! I will take it into account next time.

avl closed this revision.Mar 23 2019, 2:39 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

DebugInfo/

Symbolize/

Symbolize.h

12 lines

lib/

DebugInfo/

Symbolize/

Symbolize.cpp

77 lines

ExecutionEngine/

IntelJITEvents/

IntelJITEventListener.cpp

12 lines

tools/

llvm-symbolizer/

llvm-symbolizer.cpp

44 lines

Diff 188970

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	public:
Expected<DIGlobal> symbolizeData(const std::string &ModuleName,		Expected<DIGlobal> symbolizeData(const std::string &ModuleName,
object::SectionedAddress ModuleOffset);		object::SectionedAddress ModuleOffset);
void flush();		void flush();

static std::string		static std::string
DemangleName(const std::string &Name,		DemangleName(const std::string &Name,
const SymbolizableModule *DbiModuleDescriptor);		const SymbolizableModule *DbiModuleDescriptor);

		// this is a temporarily routine. It could be used to detect
		// section index for an address until all places would not be patched
		// with the correct SectionIndex.
		uint64_t getModuleSectionIndexForAddress(const std::string &ModuleName,
		uint64_t Address);

private:		private:
// Bundles together object file with code/data and object file with		// Bundles together object file with code/data and object file with
// corresponding debug info. These objects can be the same.		// corresponding debug info. These objects can be the same.
using ObjectPair = std::pair<ObjectFile , ObjectFile >;		using ObjectPair = std::pair<ObjectFile , ObjectFile >;

/// Returns a SymbolizableModule or an error if loading debug info failed.		/// Returns a SymbolizableModule or an error if loading debug info failed.
/// Only one attempt is made to load a module, and errors during loading are		/// Only one attempt is made to load a module, and errors during loading are
/// only reported once. Subsequent calls to get module info for a module that		/// only reported once. Subsequent calls to get module info for a module that
Show All 13 Lines	Expected<ObjectPair> getOrCreateObjectPair(const std::string &Path,
const std::string &ArchName);		const std::string &ArchName);

/// Return a pointer to object file at specified path, for a specified		/// Return a pointer to object file at specified path, for a specified
/// architecture (e.g. if path refers to a Mach-O universal binary, only one		/// architecture (e.g. if path refers to a Mach-O universal binary, only one
/// object file from it will be returned).		/// object file from it will be returned).
Expected<ObjectFile *> getOrCreateObject(const std::string &Path,		Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
const std::string &ArchName);		const std::string &ArchName);

		/// Returns binary name and architecture name(if presented) extracted from
		/// ModuleName. <0> pair member is a binary name, <1> pair member is an
		/// architecture
		std::pair<std::string, std::string>
		parseModuleName(const std::string &ModuleName) const;

std::map<std::string, std::unique_ptr<SymbolizableModule>> Modules;		std::map<std::string, std::unique_ptr<SymbolizableModule>> Modules;

/// Contains cached results of getOrCreateObjectPair().		/// Contains cached results of getOrCreateObjectPair().
std::map<std::pair<std::string, std::string>, ObjectPair>		std::map<std::pair<std::string, std::string>, ObjectPair>
ObjectPairForPathArch;		ObjectPairForPathArch;

/// Contains parsed binary for each path, or parsing error.		/// Contains parsed binary for each path, or parsing error.
std::map<std::string, OwningBinary<Binary>> BinaryForPath;		std::map<std::string, OwningBinary<Binary>> BinaryForPath;
Show All 13 Lines

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

Show First 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	if (!Info)
return DIGlobal();		return DIGlobal();

// If the user is giving us relative addresses, add the preferred base of		// If the user is giving us relative addresses, add the preferred base of
// the object to the offset before we do the query. It's what DIContext		// the object to the offset before we do the query. It's what DIContext
// expects.		// expects.
if (Opts.RelativeAddresses)		if (Opts.RelativeAddresses)
ModuleOffset.Address += Info->getModulePreferredBase();		ModuleOffset.Address += Info->getModulePreferredBase();

DIGlobal Global = Info->symbolizeData(ModuleOffset);		DIGlobal Global = Info->symbolizeData(ModuleOffset);
if (Opts.Demangle)		if (Opts.Demangle)
Global.Name = DemangleName(Global.Name, Info);		Global.Name = DemangleName(Global.Name, Info);
return Global;		return Global;
}		}

		// This routine returns section index for an address.
		// Assumption: would work ambiguously for object files which have sections not
		// assigned to an address(since the same address could belong to various
		// sections).
		uint64_t
		LLVMSymbolizer::getModuleSectionIndexForAddress(const std::string &ModuleName,
		dblaikieUnsubmitted Not Done Reply Inline Actions Any reason this is becoming a member function from a file-local function? dblaikie: Any reason this is becoming a member function from a file-local function?
		avlAuthorUnsubmitted Done Reply Inline Actions the idea was: to not copy ModuleName parsing code from Symbolizer. This code is neccessary to properly handle ModuleName on Darwin, since it contains architecture postfix. and another reason Is to use this function in any place where SectionIndex would be neccessary. Specifying Undef value is not enough usually, see later comment on the Undef SectionIndex usage. avl: the idea was: - to not copy ModuleName parsing code from Symbolizer. This code is neccessary to…
		dblaikieUnsubmitted Not Done Reply Inline Actions Right, I understand the first bit - but that wouldn't change where this function should live. The second bit - it'd be nice to move it potentially in a separate refactoring patch (and/or including it in the patch where a second caller is added to this function) to help isolate/separate independent changes. dblaikie: Right, I understand the first bit - but that wouldn't change where this function should live.
		dblaikieUnsubmitted Not Done Reply Inline Actions This code is repeated several times & again, still seems to be at the wrong layer of abstraction to me - having callers parse names, open files, search for sections - rather than passing in Undef and letting the underlying API do what it did before any of these changes. I think it's further down that the Undef=>BestEffortSectionIndex should be handled - again, where it was effectively handled before any of these changes. dblaikie: This code is repeated several times & again, still seems to be at the wrong layer of…
		uint64_t Address) {

		std::pair<std::string, std::string> ParsedModuleName =
		parseModuleName(ModuleName);
		dblaikieUnsubmitted Not Done Reply Inline Actions I hadn't spotted this code (or looked very closely at it, at least) in the original review. I'm surprised/confused why the original change involved adding code that would look at input object file names at all. Why is this? Hmm, I see looking at the llvm-symbolizer code that this logic looks up the section name and then passes that in along with the file name to the symbolizing routine. But what's the point of that? It doesn't seem like it would change/improve the quality of the symbolizer's output compared to passing in a "I don't know what section" value, right? dblaikie: I hadn't spotted this code (or looked very closely at it, at least) in the original review.
		dblaikieUnsubmitted Not Done Reply Inline Actions I think this follows on from one of the comments I made in the original review - I don't know it's worth plumbing through the section index to any callers except the original one you had in mind (objdump disassembly) as all the other callers will need work to do anytihng other than what they do today (eg: the symbolizer could have support for multiple answers - the same address in multiple sections (in which case it'd need tests for that, etc)- but this bit of code isn't doing that) dblaikie: I think this follows on from one of the comments I made in the original review - I don't know…
		avlAuthorUnsubmitted Done Reply Inline Actions I think that my original description was not clear enough and there is some misunderstanding of new interface usage. Let`s clarify it now and decide what should it mean. I assume that SectionIndex should always be set in the correct value. Setting SectionIndex into Undef state could lead to incorrect work if relocations present. At least that is how it is currently done. That assumes that all callers of new interface need to do extra work to set SectionIndex properly. In some cases I already done that additional work, in some cases left it in Undef state(if it did not brake testing) and for some cases getModuleSectionIndexForAddress could be used for setting SectionIndex(that is the temporarily routine until proper code implemented). These two last cases assumed to be changed into proper implementation. It differs from the logic - if SectionIndex presented then take it into account, if SectionIndex is not presented then handle it somehow. The rationale for that is that DWARFDebugLineTable looks to be improper place to decide how to handle "I don't know what section" case. Different use cases could assume different handling of that situation. So I decided that DWARFDebugLineTable should always receive full info and caller of this routine should decide how to handle this situation. f.e. case "I don't know what section" could be handled : a) find correct section and set it. b) find several sections which match to local offset and use them. c) find last section which match to local offset and use it(behavior before fix). d) ignore that case. f) display error ... something else... Do you think we need to change it and allow setting SectionIndex into Undef state and specify how DWARFDebugLineTable would handle it ? avl: I think that my original description was not clear enough and there is some misunderstanding of…
		dblaikieUnsubmitted Not Done Reply Inline Actions Ah. But most (all?) executables only have one executable section, right? That's why the way this worked before your change has worked OK for most/all users historically? On that basis I think it makes sense to let the exsiting/previous APIs (witohut specifying a section index (ie: have the original functions, without any section index parameter - then overloaded with the new versions that allow the section index to be specified) - or perhaps having to specify an Undef one) to continue to work, while letting those callers who need to, specify it. dblaikie: Ah. But most (all?) executables only have one executable section, right? That's why the way…
		avlAuthorUnsubmitted Done Reply Inline Actions right, most executables only have one executable section. That is why previous implementation worked correctly for them using only address as point locator. But for not-linked objects this is not right. Previous implementation did not work for them. i.e. if we allow previous implementation by setting SectionIndex = UndefSection then such a call would work for only linked executable. That means that DWARFDebugLine or LLVMSymbolizer would imply concrete state. following calls could be done for only linked executables : DWARFDebugLine.getFileLineInfoForAddress ( SectionedAddress.SectionIndex = UndefSection LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection it would be error to call them for non-linked object file. That means that all places where we could expect non-linked object file needs to have something like this : if (linked executable) { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection } else { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() } llvm-dwardump, llvm-objectdump, lld, llvm-symbolizer - could work not only with linked executable but with not-linked object files also. So it looks like most usages would expect both cases - linked executables and not-linked object files. Probably in that case stateless interface would be more convenient ? Current stateless interface does not assume whether it works for linked or not-linked object. It always receives correct SectionIndex and works equally for both types : LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() The places where it becomes less convenient is the code assumed to work with only linked-executable. In such places there would be neccessary to add code which would detect SectionIndex for new interface. But this looks to be quite natural task and written once it could be used later for free. I created such a routine - getModuleSectionIndexForAddress(though I agree this is quite naive implementation) . The advantage of the new stateless interface is that it could not be used incorrectly. i.e. it would not be possible to use UndefSection for non-linked object files. Probably it would make sense to not keep old behavior without SectionIndex ? currently UndefSection used in following modules : sanstats sancov llvm-xray llvm-dwarfdump avl: right, most executables only have one executable section. That is why previous implementation…
		avlAuthorUnsubmitted Done Reply Inline Actions David, do you think it makes sense to assume that SectionIndex should always be set in the correct value(not equal to undef) ? Or your preferred choice is to allow SectionIndex to be set into UndefState and assume that in that case linked executables should be processed ? avl: David, do you think it makes sense to assume that SectionIndex should always be set in the…
		dblaikieUnsubmitted Not Done Reply Inline Actions I don't think it makes sense to open the file separately, search for a valid section index, then pass that into the API - compared to the API doing this for you. If you know the section you want, you pass it in, if not, you don't & you get the best effort (first instance, or perhaps APIs could be updated to return multiple responses when it's ambiguous). I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump/symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be. dblaikie: I don't think it makes sense to open the file separately, search for a valid section index…
		avlAuthorUnsubmitted Done Reply Inline Actions I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? yes. that`s right. I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump /symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be. That is also thing which I am agree. i.e. users should have a built-in functionality for setting proper SectionIndex. They should not do it manually. Thing which seems to me questionable is that - Is it OK to hide that builtin functionality under old symbolizer interface : symbolizeCode(..., object::SectionedAddress ModuleOffset) symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset) symbolizeData(..., object::SectionedAddress ModuleOffset i.e. if ModuleOffset.SectionIndex = UndefSection and if we will keep old behavior in this case. That looks questionable because different use cases could require different behaviour. But in this case some concrete behavior would already be done. Instead we can specify interface so that it always receives correct unambitious information. i.e. these methods would always receive correct SectionIndex : symbolizeCode(..., object::SectionedAddress ModuleOffset) symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset) symbolizeData(..., object::SectionedAddress ModuleOffset so that SectionIndex either set into proper index of section, either set into Undef state and it means an absolute address. That allows to implement several builtin behaviors separately. for example : we could implement this method for symbolizer : enumerateSections (uint64_t address, std::function<void(uint_64_t SectionIndex)> on_section); then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them. If we will need another behavior(find first of, find last of/display error) then it would be possible to create another helper builtin routine(as part of symbolizer interface or as separate implementation). This patch follows that idea : it requires SectionIndex to be properly specified for symbolizeCode/InlinedCode/Data methods and provides helper getModuleSectionIndexForAddress() routine so that all existing tools would be able to call it and work the same way as before. This getModuleSectionIndexForAddress routine assumed to be temporarily and should be replaced later with proper solution(s). Something like above enumerateSections() suggestion. So it looks like this solution allows API to help users and at the same time to implement several use cases. avl: >I take it your idea was that this "open the file, search for some section index which >covers…
		dblaikieUnsubmitted Not Done Reply Inline Actions I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump/symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to. dblaikie: I'm still not sure I follow the harm of having these APIs default to their old behavior - if…
		avlAuthorUnsubmitted Done Reply Inline Actions I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ? "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump /symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? Actually I do not suggest to change their behavior here. I just show that they could require different behavior than current one: llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 above usage will show only one entry from some.o. If some.o has more sections which match to the specified address then they would not be shown. It is possible that symbolizer could be extended to show entries from all sections matched to specified address. llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 func2() test.c:20 In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here. I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to. I do not suggest to change behavior. If current behavior is OK then let`s preserve it. I suggest to not hide specific behavior under API. My concern is that let`s support it in an explicit way. It does not look awkwardly : object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) It explicitly tells how Section Index received. getModuleSectionIndexForAddress() preserve old behavior. It could easily be extended for future needs(if necessary) The advantage is that it would be explicitly seen which behavior would be used: object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getSectionIndexForNewBehavior(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) API users can override the Undef section index to some specific section if/when they want to API users can not override behavior for Undef section index if Undef section index would mean something specific. If the idea would be to get symbol information for only absolute addresses - then call to symbolizeCode(SectionedAddress.SectionIndex = Undef) would return wrong results, since "old behavior" does extra work. avl: >I'm still not sure I follow the harm of having these APIs default to their old behavior - >if…
		dblaikieUnsubmitted Not Done Reply Inline Actions My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ? I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? So it seems clear to me how the caller can choose to get the old behavior, or choose not to. Actually I do not suggest to change their behavior here. I just show that they could require different behavior than current one: llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 above usage will show only one entry from some.o. If some.o has more sections which match to the specified address then they would not be shown. It is possible > that symbolizer could be extended to show entries from all sections matched to specified address. llvm-symbolizer -obj=test.o 0x10 func1() test.c:3 func2() test.c:20 This ^ is what I would mean by a change in behavior. That's a change in what llvm-symbolizer does/how it behaves. In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here. Maybe - alternatively, the API could be changed to respond with multiple results when the request is ambiguous (when there's more than one section with that address in it). But the particular thing about this API is that it separately opens the file and searches through the sections - which seems like a fairly awkward API (to have to open the file, processes it, then call an API that will open the file again & parse it completely separately). Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). In the future, the API could be improved to, instead of returning the first (or some arbitrary) result for the address - instead it could be split into two APIs - one that takes a SectionedAddress and returns zero or one results, and one that takes a raw address and returns zero or more results. But I don't think there's any need to rush to implimenting that in any callers - they were mostly doing fine with the old behavior. dblaikie: > My main concern about "old behavior for Undef section index" is that it is implicit solution…
		avlAuthorUnsubmitted Done Reply Inline Actions I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour". Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before. avl: >I'm not sure what you mean by "cancel the old behavior" - if a caller never passes >Undef…
		dblaikieUnsubmitted Not Done Reply Inline Actions I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour". Ah, OK, thanks for clarifying/explaining that (re: absolute address). When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? My naive understanding is that, say, in an unlinked object file addresses are all section-relative, and in a linked executable they're absolute. Are there cases where you might have both? And knowing whether the address is absolute or relative would produce different answers? Keeping the previous behavior as a fallback when Undef is given seems like better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. It's an implementation detail that, to me, speaks to an issue in the design of the APIs - if the caller into the API is expected to somehow identify the section before calling into the API which then necessarily opens/parses the file for sections, etc. And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) - that speaks, to me, to something ill-fitting about the design of the API as it stands - which is where I'm coming from/motivated by in this discussion. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before. "before" being "before any of your recent changes"? OK - thanks! If there is a case where "Undef == fallback to old behavior" and "Undef == treat the address as absolute" would produce different answers - could you include a test case so we can keep this use case in mind in the future? dblaikie: >> I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef…
		avlAuthorUnsubmitted Done Reply Inline Actions When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? I thought about following difference : "I know this is an absolute address": symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3 symbolize(SectionIndex=Undef, not_linked_file) => error. not linked executable. "any section with this address in it": symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3 symbolize(SectionIndex=Undef, not_linked_file) => func1():test.c:3 note: I do not suggest to implement above behavior. I think that is possible scenario if caller does not want to handle case "address relates to several sections" (silently displaying one from several matches). And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) Ok, I assumed that caller should know which section specified address belongs. if symbolizer assumed to be used without section`s relation knowledge then we probably do not need SectionedAddress in symbolizer at all. "before" being "before any of your recent changes"? OK - thanks! yes. avl: >When would there be an ambiguity here between "any section with this address in it" >and "I…

		auto ObjectsOrErr = getOrCreateObjectPair(std::get<0>(ParsedModuleName),
		std::get<1>(ParsedModuleName));
		if (!ObjectsOrErr) {
		// ignore error at this place.
		consumeError(ObjectsOrErr.takeError());
		Modules.erase(ModuleName);
		ObjectPairForPathArch.erase(std::make_pair(std::get<0>(ParsedModuleName),
		std::get<1>(ParsedModuleName)));
		ObjectForUBPathAndArch.erase(std::make_pair(std::get<0>(ParsedModuleName),
		std::get<1>(ParsedModuleName)));
		dblaikieUnsubmitted Not Done Reply Inline Actions Lots more get<N> rather than first/second. But there's also enough uses here it might be nicer to name the two variables instead of accessing them by pair repeatedly. dblaikie: Lots more get<N> rather than first/second. But there's also enough uses here it might be nicer…
		BinaryForPath.erase(std::get<0>(ParsedModuleName));
		return object::SectionedAddress::UndefSection;
		}
		ObjectPair Objects = ObjectsOrErr.get();

		for (SectionRef Sec : std::get<0>(Objects)->sections()) {
		if (!Sec.isText() \|\| Sec.isVirtual())
		continue;

		if (Address >= Sec.getAddress() &&
		Address <= Sec.getAddress() + Sec.getSize()) {
		return Sec.getIndex();
		}
		}

		return object::SectionedAddress::UndefSection;
		}

void LLVMSymbolizer::flush() {		void LLVMSymbolizer::flush() {
ObjectForUBPathAndArch.clear();		ObjectForUBPathAndArch.clear();
BinaryForPath.clear();		BinaryForPath.clear();
ObjectPairForPathArch.clear();		ObjectPairForPathArch.clear();
Modules.clear();		Modules.clear();
}		}

namespace {		namespace {
▲ Show 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	ObjectFile *LLVMSymbolizer::lookUpDebuglinkObject(const std::string &Path,
if (!DbgObjOrErr) {		if (!DbgObjOrErr) {
// Ignore errors, the file might not exist.		// Ignore errors, the file might not exist.
consumeError(DbgObjOrErr.takeError());		consumeError(DbgObjOrErr.takeError());
return nullptr;		return nullptr;
}		}
return DbgObjOrErr.get();		return DbgObjOrErr.get();
}		}

		std::pair<std::string, std::string>
		dblaikieUnsubmitted Not Done Reply Inline Actions Perhaps this function should return StringRefs into the originally passed in string (perhaps the original input should be a StringRef too)? dblaikie: Perhaps this function should return StringRefs into the originally passed in string (perhaps…
		LLVMSymbolizer::parseModuleName(const std::string &ModuleName) const {
		std::string BinaryName = ModuleName;
		std::string ArchName = Opts.DefaultArch;
		size_t ColonPos = ModuleName.find_last_of(':');
		// Verify that substring after colon form a valid arch name.
		if (ColonPos != std::string::npos) {
		std::string ArchStr = ModuleName.substr(ColonPos + 1);
		if (Triple(ArchStr).getArch() != Triple::UnknownArch) {
		BinaryName = ModuleName.substr(0, ColonPos);
		ArchName = ArchStr;
		}
		}

		return std::pair<std::string, std::string>(BinaryName, ArchName);
		dblaikieUnsubmitted Not Done Reply Inline Actions Would make_pair be simpler here? Also potentially using std::move to avoid extra string copies: return std::make_pair(std::move(BinaryName), std::move(ArchName)); dblaikie: Would make_pair be simpler here? Also potentially using std::move to avoid extra string copies…
		}

Expected<LLVMSymbolizer::ObjectPair>		Expected<LLVMSymbolizer::ObjectPair>
LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,		LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
const std::string &ArchName) {		const std::string &ArchName) {
const auto &I = ObjectPairForPathArch.find(std::make_pair(Path, ArchName));		const auto &I = ObjectPairForPathArch.find(std::make_pair(Path, ArchName));
if (I != ObjectPairForPathArch.end()) {		if (I != ObjectPairForPathArch.end()) {
return I->second;		return I->second;
}		}

▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines

Expected<SymbolizableModule *>		Expected<SymbolizableModule *>
LLVMSymbolizer::getOrCreateModuleInfo(const std::string &ModuleName,		LLVMSymbolizer::getOrCreateModuleInfo(const std::string &ModuleName,
StringRef DWPName) {		StringRef DWPName) {
const auto &I = Modules.find(ModuleName);		const auto &I = Modules.find(ModuleName);
if (I != Modules.end()) {		if (I != Modules.end()) {
return I->second.get();		return I->second.get();
}		}
std::string BinaryName = ModuleName;
std::string ArchName = Opts.DefaultArch;		std::pair<std::string, std::string> ParsedModuleName =
size_t ColonPos = ModuleName.find_last_of(':');		parseModuleName(ModuleName);
// Verify that substring after colon form a valid arch name.
if (ColonPos != std::string::npos) {		auto ObjectsOrErr = getOrCreateObjectPair(std::get<0>(ParsedModuleName),
std::string ArchStr = ModuleName.substr(ColonPos + 1);		std::get<1>(ParsedModuleName));
		dblaikieUnsubmitted Not Done Reply Inline Actions rather than using the tuple accessor (get<N>) I think it's probably still more canonical to use ".first" and ".second" for pairs. dblaikie: rather than using the tuple accessor (get<N>) I think it's probably still more canonical to use…
if (Triple(ArchStr).getArch() != Triple::UnknownArch) {
BinaryName = ModuleName.substr(0, ColonPos);
ArchName = ArchStr;
}
}
auto ObjectsOrErr = getOrCreateObjectPair(BinaryName, ArchName);
if (!ObjectsOrErr) {		if (!ObjectsOrErr) {
// Failed to find valid object file.		// Failed to find valid object file.
Modules.insert(		Modules.insert(
std::make_pair(ModuleName, std::unique_ptr<SymbolizableModule>()));		std::make_pair(ModuleName, std::unique_ptr<SymbolizableModule>()));
		dblaikieUnsubmitted Not Done Reply Inline Actions You could replace the "unique_ptr<T>()" with "nullptr" here. dblaikie: You could replace the "unique_ptr<T>()" with "nullptr" here.
return ObjectsOrErr.takeError();		return ObjectsOrErr.takeError();
}		}
ObjectPair Objects = ObjectsOrErr.get();		ObjectPair Objects = ObjectsOrErr.get();

		if (Objects.first == nullptr)
		dblaikieUnsubmitted Not Done Reply Inline Actions I don't think much code in LLVM checks for explicit "== nullptr" - this would probably more canonically be written as "if (!Objects.first)" dblaikie: I don't think much code in LLVM checks for explicit "== nullptr" - this would probably more…
		return static_cast<SymbolizableModule *>(nullptr);
		dblaikieUnsubmitted Not Done Reply Inline Actions Guess this cast is needed because of the conversion to Expected? Is non-error, but null, a valid result from this function? That's a tricky thing, but could be the case. I wonder if Expected<T> should be constructible from nullptr_t to simplify this sort of case? dblaikie:* Guess this cast is needed because of the conversion to Expected? Is non-error, but null, a…

std::unique_ptr<DIContext> Context;		std::unique_ptr<DIContext> Context;
// If this is a COFF object containing PDB info, use a PDBContext to		// If this is a COFF object containing PDB info, use a PDBContext to
// symbolize. Otherwise, use DWARF.		// symbolize. Otherwise, use DWARF.
if (auto CoffObject = dyn_cast<COFFObjectFile>(Objects.first)) {		if (auto CoffObject = dyn_cast<COFFObjectFile>(Objects.first)) {
const codeview::DebugInfo *DebugInfo;		const codeview::DebugInfo *DebugInfo;
StringRef PDBFileName;		StringRef PDBFileName;
auto EC = CoffObject->getDebugPDBInfo(DebugInfo, PDBFileName);		auto EC = CoffObject->getDebugPDBInfo(DebugInfo, PDBFileName);
if (!EC && DebugInfo != nullptr && !PDBFileName.empty()) {		if (!EC && DebugInfo != nullptr && !PDBFileName.empty()) {
▲ Show 20 Lines • Show All 99 Lines • Show Last 20 Lines

llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp

Show First 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	for (const std::pair<SymbolRef, uint64_t> &P : computeSymbolSizes(*DebugObj)) {
uint64_t Size = P.second;		uint64_t Size = P.second;

// Record this address in a local vector		// Record this address in a local vector
Functions.push_back((void*)Addr);		Functions.push_back((void*)Addr);

// Build the function loaded notification message		// Build the function loaded notification message
iJIT_Method_Load FunctionMessage =		iJIT_Method_Load FunctionMessage =
FunctionDescToIntelJITFormat(*Wrapper, Name->data(), Addr, Size);		FunctionDescToIntelJITFormat(*Wrapper, Name->data(), Addr, Size);
// TODO: it is neccessary to set proper SectionIndex here.
// object::SectionedAddress::UndefSection works for only absolute addresses.		uint64_t SectionIndex = object::SectionedAddress::UndefSection;
DILineInfoTable Lines = Context->getLineInfoForAddressRange({Addr, object::SectionedAddress::UndefSection}, Size);		auto SectOrErr = Sym.getSection();
		if (SectOrErr) {
		SectionIndex = SectOrErr.get()->getIndex();
		}

		DILineInfoTable Lines =
		Context->getLineInfoForAddressRange({Addr, SectionIndex}, Size);
		dblaikieUnsubmitted Not Done Reply Inline Actions Is this related to the rest of the change in this patch? If not, it should probably go in a separate patch/commit. dblaikie: Is this related to the rest of the change in this patch? If not, it should probably go in a…
		avlAuthorUnsubmitted Done Reply Inline Actions The only relation is that it fixes several lit tests after "D58194" patch. avl: The only relation is that it fixes several lit tests after "D58194" patch.
DILineInfoTable::iterator Begin = Lines.begin();		DILineInfoTable::iterator Begin = Lines.begin();
DILineInfoTable::iterator End = Lines.end();		DILineInfoTable::iterator End = Lines.end();
for (DILineInfoTable::iterator It = Begin; It != End; ++It) {		for (DILineInfoTable::iterator It = Begin; It != End; ++It) {
LineInfo.push_back(		LineInfo.push_back(
DILineInfoToIntelJITFormat((uintptr_t)Addr, It->first, It->second));		DILineInfoToIntelJITFormat((uintptr_t)Addr, It->first, It->second));
}		}
if (LineInfo.size() == 0) {		if (LineInfo.size() == 0) {
FunctionMessage.source_file_name = 0;		FunctionMessage.source_file_name = 0;
▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

Show First 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	if (ClBinaryName.empty()) {
ModuleName = ClBinaryName;		ModuleName = ClBinaryName;
}		}
// Skip delimiters and parse module offset.		// Skip delimiters and parse module offset.
pos += strspn(pos, kDelimiters);		pos += strspn(pos, kDelimiters);
int offset_length = strcspn(pos, kDelimiters);		int offset_length = strcspn(pos, kDelimiters);
return !StringRef(pos, offset_length).getAsInteger(0, ModuleOffset);		return !StringRef(pos, offset_length).getAsInteger(0, ModuleOffset);
}		}

// This routine returns section index for an address.
// Assumption: would work ambiguously for object files which have sections not
// assigned to an address(since the same address could belong to various
// sections).
static uint64_t getModuleSectionIndexForAddress(const std::string &ModuleName,
uint64_t Address) {

// following ModuleName processing was copied from
// LLVMSymbolizer::getOrCreateModuleInfo().
// it needs to be refactored to avoid code duplication.
std::string BinaryName = ModuleName;
size_t ColonPos = ModuleName.find_last_of(':');
// Verify that substring after colon form a valid arch name.
if (ColonPos != std::string::npos) {
std::string ArchStr = ModuleName.substr(ColonPos + 1);
if (Triple(ArchStr).getArch() != Triple::UnknownArch) {
BinaryName = ModuleName.substr(0, ColonPos);
}
}

Expected<OwningBinary<Binary>> BinaryOrErr = createBinary(BinaryName);

if (error(BinaryOrErr))
return object::SectionedAddress::UndefSection;

Binary &Binary = *BinaryOrErr->getBinary();

if (ObjectFile *O = dyn_cast<ObjectFile>(&Binary)) {
for (SectionRef Sec : O->sections()) {
if (!Sec.isText() \|\| Sec.isVirtual())
continue;

if (Address >= Sec.getAddress() &&
Address <= Sec.getAddress() + Sec.getSize()) {
return Sec.getIndex();
}
}
}

return object::SectionedAddress::UndefSection;
}

static void symbolizeInput(StringRef InputString, LLVMSymbolizer &Symbolizer,		static void symbolizeInput(StringRef InputString, LLVMSymbolizer &Symbolizer,
DIPrinter &Printer) {		DIPrinter &Printer) {
bool IsData = false;		bool IsData = false;
std::string ModuleName;		std::string ModuleName;
uint64_t Offset = 0;		uint64_t Offset = 0;
if (!parseCommand(StringRef(InputString), IsData, ModuleName, Offset)) {		if (!parseCommand(StringRef(InputString), IsData, ModuleName, Offset)) {
outs() << InputString;		outs() << InputString;
return;		return;
}		}

if (ClPrintAddress) {		if (ClPrintAddress) {
outs() << "0x";		outs() << "0x";
outs().write_hex(Offset);		outs().write_hex(Offset);
StringRef Delimiter = ClPrettyPrint ? ": " : "\n";		StringRef Delimiter = ClPrettyPrint ? ": " : "\n";
outs() << Delimiter;		outs() << Delimiter;
}		}
Offset -= ClAdjustVMA;		Offset -= ClAdjustVMA;
object::SectionedAddress ModuleOffset = {		object::SectionedAddress ModuleOffset = {
Offset, getModuleSectionIndexForAddress(ModuleName, Offset)};		Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)};
if (IsData) {		if (IsData) {
auto ResOrErr = Symbolizer.symbolizeData(ModuleName, ModuleOffset);		auto ResOrErr = Symbolizer.symbolizeData(ModuleName, ModuleOffset);
Printer << (error(ResOrErr) ? DIGlobal() : ResOrErr.get());		Printer << (error(ResOrErr) ? DIGlobal() : ResOrErr.get());
} else if (ClPrintInlining) {		} else if (ClPrintInlining) {
auto ResOrErr =		auto ResOrErr =
Symbolizer.symbolizeInlinedCode(ModuleName, ModuleOffset, ClDwpName);		Symbolizer.symbolizeInlinedCode(ModuleName, ModuleOffset, ClDwpName);
Printer << (error(ResOrErr) ? DIInliningInfo() : ResOrErr.get());		Printer << (error(ResOrErr) ? DIInliningInfo() : ResOrErr.get());
} else {		} else {
▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 188970

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
ClosedPublic