This patch fixes buildbot failures after https://reviews.llvm.org/D58194. [DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces" [Symbolizer] Add getModuleSectionIndexForAddress() helper routine The https://reviews.llvm.org/D58194 patch changed symbolizer interface. Particularily it requires not only Address but SectionIndex also. Note object::SectionedAddress parameter: Expected<DILineInfo> symbolizeCode(const std::string &ModuleName, object::SectionedAddress ModuleOffset, StringRef DWPName = ""); There are callers of symbolizer which do not know particular section index. That patch creates getModuleSectionIndexForAddress() routine which will detect section index for the specified address. Thus if caller set ModuleOffset.SectionIndex into object::SectionedAddress::UndefSection state then symbolizer would detect section index using getModuleSectionIndexForAddress routine.
Details
Diff Detail
Event Timeline
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
142 ↗ | (On Diff #188970) | Any reason this is becoming a member function from a file-local function? |
146 ↗ | (On Diff #188970) | I hadn't spotted this code (or looked very closely at it, at least) in the original review. I'm surprised/confused why the original change involved adding code that would look at input object file names at all. Why is this? Hmm, I see looking at the llvm-symbolizer code that this logic looks up the section name and then passes that in along with the file name to the symbolizing routine. But what's the point of that? It doesn't seem like it would change/improve the quality of the symbolizer's output compared to passing in a "I don't know what section" value, right? |
148–157 ↗ | (On Diff #188970) | Lots more get<N> rather than first/second. But there's also enough uses here it might be nicer to name the two variables instead of accessing them by pair repeatedly. |
341 ↗ | (On Diff #188970) | Perhaps this function should return StringRefs into the originally passed in string (perhaps the original input should be a StringRef too)? |
355 ↗ | (On Diff #188970) | Would make_pair be simpler here? Also potentially using std::move to avoid extra string copies: return std::make_pair(std::move(BinaryName), std::move(ArchName)); |
443–444 ↗ | (On Diff #188970) | rather than using the tuple accessor (get<N>) I think it's probably still more canonical to use ".first" and ".second" for pairs. |
448 ↗ | (On Diff #188970) | You could replace the "unique_ptr<T>()" with "nullptr" here. |
453 ↗ | (On Diff #188970) | I don't think much code in LLVM checks for explicit "== nullptr" - this would probably more canonically be written as "if (!Objects.first)" |
454 ↗ | (On Diff #188970) | Guess this cast is needed because of the conversion to Expected? Is non-error, but null, a valid result from this function? That's a tricky thing, but could be the case. I wonder if Expected<T*> should be constructible from nullptr_t to simplify this sort of case? |
llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp | ||
150–158 ↗ | (On Diff #188970) | Is this related to the rest of the change in this patch? If not, it should probably go in a separate patch/commit. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) | I think this follows on from one of the comments I made in the original review - I don't know it's worth plumbing through the section index to any callers except the original one you had in mind (objdump disassembly) as all the other callers will need work to do anytihng other than what they do today (eg: the symbolizer could have support for multiple answers - the same address in multiple sections (in which case it'd need tests for that, etc)- but this bit of code isn't doing that) |
David, Thank you for the comments! I skipped code style issues for the moment. please check the comments for allowing setting SectionIndex into Undef state and for the necessity of getModuleSectionIndexForAddress() routine. After we will decide on these questions I will handle code style issues.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
142 ↗ | (On Diff #188970) | the idea was:
|
146 ↗ | (On Diff #188970) | I think that my original description was not clear enough and there is some misunderstanding of new interface usage. Let`s clarify it now and decide what should it mean. I assume that SectionIndex should always be set in the correct value. Setting SectionIndex into Undef state could lead to incorrect work if relocations present. At least that is how it is currently done. That assumes that all callers of new interface need to do extra work to set SectionIndex properly. In some cases I already done that additional work, in some cases left it in Undef state(if it did not brake testing) and for some cases getModuleSectionIndexForAddress could be used for setting SectionIndex(that is the temporarily routine until proper code implemented). These two last cases assumed to be changed into proper implementation. It differs from the logic - if SectionIndex presented then take it into account, if SectionIndex is not presented then handle it somehow. The rationale for that is that DWARFDebugLineTable looks to be improper place to decide how to handle "I don't know what section" case. Different use cases could assume different handling of that situation. So I decided that DWARFDebugLineTable should always receive full info and caller of this routine should decide how to handle this situation. f.e. case "I don't know what section" could be handled : a) find correct section and set it. Do you think we need to change it and allow setting SectionIndex into Undef state and specify how DWARFDebugLineTable would handle it ? |
llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp | ||
150–158 ↗ | (On Diff #188970) | The only relation is that it fixes several lit tests after "D58194" patch. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
142 ↗ | (On Diff #188970) | Right, I understand the first bit - but that wouldn't change where this function should live. The second bit - it'd be nice to move it potentially in a separate refactoring patch (and/or including it in the patch where a second caller is added to this function) to help isolate/separate independent changes. |
146 ↗ | (On Diff #188970) | Ah. But most (all?) executables only have one executable section, right? That's why the way this worked before your change has worked OK for most/all users historically? On that basis I think it makes sense to let the exsiting/previous APIs (witohut specifying a section index (ie: have the original functions, without any section index parameter - then overloaded with the new versions that allow the section index to be specified) - or perhaps having to specify an Undef one) to continue to work, while letting those callers who need to, specify it. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) | right, most executables only have one executable section. That is why previous implementation worked correctly for them using only address as point locator. But for not-linked objects this is not right. Previous implementation did not work for them. i.e. if we allow previous implementation by setting SectionIndex = UndefSection then such a call would work for only linked executable. That means that DWARFDebugLine or LLVMSymbolizer would imply concrete state. following calls could be done for only linked executables : DWARFDebugLine.getFileLineInfoForAddress ( SectionedAddress.SectionIndex = UndefSection it would be error to call them for non-linked object file. That means that all places where we could expect non-linked object file needs to have something like this : if (linked executable) { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection } else { LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() } llvm-dwardump, llvm-objectdump, lld, llvm-symbolizer - could work not only with linked executable but with not-linked object files also. So it looks like most usages would expect both cases - linked executables and not-linked object files. Probably in that case stateless interface would be more convenient ? Current stateless interface does not assume whether it works for linked or not-linked object. It always receives correct SectionIndex and works equally for both types : LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex() The places where it becomes less convenient is the code assumed to work with *only* linked-executable. In such places there would be neccessary to add code which would detect SectionIndex for new interface. But this looks to be quite natural task and written once it could be used later for free. I created such a routine - getModuleSectionIndexForAddress(though I agree this is quite naive implementation) . The advantage of the new stateless interface is that it could not be used incorrectly. i.e. it would not be possible to use UndefSection for non-linked object files. Probably it would make sense to not keep old behavior without SectionIndex ? currently UndefSection used in following modules :
|
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) | David, do you think it makes sense to assume that SectionIndex should always be set in the correct value(not equal to undef) ? Or your preferred choice is to allow SectionIndex to be set into UndefState and assume that in that case linked executables should be processed ? |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) | I don't think it makes sense to open the file separately, search for a valid section index, then pass that into the API - compared to the API doing this for you. If you know the section you want, you pass it in, if not, you don't & you get the best effort (first instance, or perhaps APIs could be updated to return multiple responses when it's ambiguous). I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in? I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump/symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) |
yes. that`s right.
That is also thing which I am agree. i.e. users should have a built-in functionality for setting proper SectionIndex. They should not do it manually. Thing which seems to me questionable is that - Is it OK to hide that builtin functionality under old symbolizer interface : symbolizeCode(..., object::SectionedAddress ModuleOffset) i.e. if ModuleOffset.SectionIndex = UndefSection and if we will keep old behavior in this case. That looks questionable because different use cases could require different behaviour. But in this case some concrete behavior would already be done. Instead we can specify interface so that it always receives correct unambitious information. i.e. these methods would always receive correct SectionIndex : symbolizeCode(..., object::SectionedAddress ModuleOffset) so that SectionIndex either set into proper index of section, either set into Undef state and it means an absolute address. That allows to implement several builtin behaviors separately. for example : we could implement this method for symbolizer : enumerateSections (uint64_t address, std::function<void(uint_64_t SectionIndex)> on_section); then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them. If we will need another behavior(find first of, find last of/display error) then it would be possible to create another helper builtin routine(as part of symbolizer interface or as separate implementation). This patch follows that idea : it requires SectionIndex to be properly specified for symbolizeCode/InlinedCode/Data methods and provides helper getModuleSectionIndexForAddress() routine so that all existing tools would be able to call it and work the same way as before. This getModuleSectionIndexForAddress routine assumed to be temporarily and should be replaced later with proper solution(s). Something like above enumerateSections() suggestion. So it looks like this solution allows API to help users and at the same time to implement several use cases. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) | I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway. "then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them." What's the new behavior you're describing here - how would dwarfdump/objdump/symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here? I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) |
My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for : symbolizeCode(SectionedAddress.SectionIndex = Undef) How would it be possible to cancel "old behavior for Undef section index" ?
Actually I do not suggest to change their behavior here. llvm-symbolizer -obj=test.o 0x10 above usage will show only one entry from some.o. If some.o has more sections llvm-symbolizer -obj=test.o 0x10 In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here.
I do not suggest to change behavior. If current behavior is OK then let`s preserve it. I suggest to not hide specific behavior under API. My concern is that let`s support it in an explicit way. It does not look awkwardly : object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset) It explicitly tells how Section Index received. getModuleSectionIndexForAddress() It could easily be extended for future needs(if necessary) object::SectionedAddress ModuleOffset = { Offset, Symbolizer.getSectionIndexForNewBehavior(ModuleName, Offset)}; Symbolizer.symbolizeCode(ModuleName, ModuleOffset)
API users can not override behavior for Undef section index if symbolizeCode(SectionedAddress.SectionIndex = Undef) would return wrong results, since "old behavior" does extra work. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) |
I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right? So it seems clear to me how the caller can choose to get the old behavior, or choose not to.
This ^ is what I would mean by a change in behavior. That's a change in what llvm-symbolizer does/how it behaves.
Maybe - alternatively, the API could be changed to respond with multiple results when the request is ambiguous (when there's more than one section with that address in it). But the particular thing about this API is that it separately opens the file and searches through the sections - which seems like a fairly awkward API (to have to open the file, processes it, then call an API that will open the file again & parse it completely separately). Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). In the future, the API could be improved to, instead of returning the first (or some arbitrary) result for the address - instead it could be split into two APIs - one that takes a SectionedAddress and returns zero or one results, and one that takes a raw address and returns zero or more results. But I don't think there's any need to rush to implimenting that in any callers - they were mostly doing fine with the old behavior. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) |
not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour".
reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later... Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before. |
David, Could you also review https://reviews.llvm.org/D58959 please ? It seems not depending on decision discussed here. So it looks like it could already be integrated.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) |
Ah, OK, thanks for clarifying/explaining that (re: absolute address). When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? My naive understanding is that, say, in an unlinked object file addresses are all section-relative, and in a linked executable they're absolute. Are there cases where you might have both? And knowing whether the address is absolute or relative would produce different answers?
It's an implementation detail that, to me, speaks to an issue in the design of the APIs - if the caller into the API is expected to somehow identify the section before calling into the API which then necessarily opens/parses the file for sections, etc. And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) - that speaks, to me, to something ill-fitting about the design of the API as it stands - which is where I'm coming from/motivated by in this discussion.
"before" being "before any of your recent changes"? OK - thanks! If there is a case where "Undef == fallback to old behavior" and "Undef == treat the address as absolute" would produce different answers - could you include a test case so we can keep this use case in mind in the future? |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
146 ↗ | (On Diff #188970) |
I thought about following difference : "I know this is an absolute address": "any section with this address in it": note: I do not suggest to implement above behavior. I think that is possible scenario if caller does not want to handle case "address relates to several sections" (silently displaying one from several matches).
Ok, I assumed that caller should know which section specified address belongs.
yes. |
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp | ||
---|---|---|
155–166 ↗ | (On Diff #190423) | This code is repeated several times & again, still seems to be at the wrong layer of abstraction to me - having callers parse names, open files, search for sections - rather than passing in Undef and letting the underlying API do what it did before any of these changes. I think it's further down that the Undef=>BestEffortSectionIndex should be handled - again, where it was effectively handled before any of these changes. |
I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.
I'm still a little confused by the existence of a section search at all whet Undef is given - given that the previous implementation (before your work) there was no such search.
Is it possible the search could be omitted entirely?
In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.
Previously that decision was made just by accident. Property of sorting algorithm defined which concrete section would be selected.
I think that this decision should be unambiguous and currently for the same pair {Address, SectionIndex} there always would be returned the same result.
If DWARFDebugLineTable would be changed so that for Undef value there would be done search by one criteria then for overlapping address ranges there could be different results depending on concrete call to DWARFDebugLineTable. To prevent such random section selection there was implemented getModuleSectionIndexForAddress().
Do you think we need to omit that high level decision and allow random selection for sections with overlapping address ranges ?
David, there is a connected review - https://reviews.llvm.org/D59189 It cures compilation error after D58194 Could you take a look on it please ?
I think that change is OK and it is good to do in spite of decision made in this review.
It still seems too high level to me if it involves the caller separately opening and walking the file contents.
Please move this down at least far enough that it doesn't involve separately computing the file name and opening the file. Where it's a "well, if you didn't specify a section index, we'll use the first section that covers this address".
Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".
It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :
uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(
uint64_t Address) const { for (SectionRef Sec : Module->sections()) { if (!Sec.isText() || Sec.isVirtual()) continue; if (Address >= Sec.getAddress() && Address <= Sec.getAddress() + Sec.getSize()) { return Sec.getIndex(); } } return object::SectionedAddress::UndefSection;
}
I think it looks very close to above description.
Ah, so it does - thanks for correcting me/ walking me through that. Perhaps I was just reading poorly, or Phab was showing a different delta or something.
This looks good, thanks!
(I guess perhaps another/future change could be for these APIs to return an error if it is ambiguous (the symbolize APIs have an Error return type already, so they could change semantics here without changing the syntax - and that'd probably give a pretty good user experience of "hey, you're trying to symbolize an ambiguous address, we can't currently give you a clear answer to this query") - and, as discussed, perhaps one day they return multiple results, etc)
It seems you indented the description by 2. That may be why the revision is not automatically closed..
Differential Revision: https://reviews.llvm.org/D58848
It seems you indented the description by 2. That may be why the revision is not automatically closed..
Thanks! I will take it into account next time.