This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
ClosedPublic

Authored by avl on Mar 1 2019, 2:10 PM.

Details

Summary
This patch fixes buildbot failures after https://reviews.llvm.org/D58194.

[DebugInfo] follow up for "add SectionedAddress to DebugInfo interfaces"
    
[Symbolizer] Add getModuleSectionIndexForAddress() helper routine
    
The https://reviews.llvm.org/D58194 patch changed symbolizer interface.
Particularily it requires not only Address but SectionIndex also.
Note object::SectionedAddress parameter:
    
Expected<DILineInfo> symbolizeCode(const std::string &ModuleName,
                                 object::SectionedAddress ModuleOffset,
                                 StringRef DWPName = "");
    
There are callers of symbolizer which do not know particular section index.
That patch creates getModuleSectionIndexForAddress() routine which
will detect section index for the specified address. Thus if caller
set ModuleOffset.SectionIndex into object::SectionedAddress::UndefSection
state then symbolizer would detect section index using
getModuleSectionIndexForAddress routine.

Diff Detail

Event Timeline

avl created this revision.Mar 1 2019, 2:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2019, 2:10 PM
dblaikie added inline comments.Mar 1 2019, 4:37 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142 ↗(On Diff #188970)

Any reason this is becoming a member function from a file-local function?

146 ↗(On Diff #188970)

I hadn't spotted this code (or looked very closely at it, at least) in the original review.

I'm surprised/confused why the original change involved adding code that would look at input object file names at all. Why is this?

Hmm, I see looking at the llvm-symbolizer code that this logic looks up the section name and then passes that in along with the file name to the symbolizing routine.

But what's the point of that? It doesn't seem like it would change/improve the quality of the symbolizer's output compared to passing in a "I don't know what section" value, right?

148–157 ↗(On Diff #188970)

Lots more get<N> rather than first/second. But there's also enough uses here it might be nicer to name the two variables instead of accessing them by pair repeatedly.

341 ↗(On Diff #188970)

Perhaps this function should return StringRefs into the originally passed in string (perhaps the original input should be a StringRef too)?

355 ↗(On Diff #188970)

Would make_pair be simpler here?

Also potentially using std::move to avoid extra string copies:

return std::make_pair(std::move(BinaryName), std::move(ArchName));
443–444 ↗(On Diff #188970)

rather than using the tuple accessor (get<N>) I think it's probably still more canonical to use ".first" and ".second" for pairs.

448 ↗(On Diff #188970)

You could replace the "unique_ptr<T>()" with "nullptr" here.

453 ↗(On Diff #188970)

I don't think much code in LLVM checks for explicit "== nullptr" - this would probably more canonically be written as "if (!Objects.first)"

454 ↗(On Diff #188970)

Guess this cast is needed because of the conversion to Expected?

Is non-error, but null, a valid result from this function? That's a tricky thing, but could be the case.

I wonder if Expected<T*> should be constructible from nullptr_t to simplify this sort of case?

llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp
150–158 ↗(On Diff #188970)

Is this related to the rest of the change in this patch?

If not, it should probably go in a separate patch/commit.

dblaikie added inline comments.Mar 1 2019, 4:39 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I think this follows on from one of the comments I made in the original review - I don't know it's worth plumbing through the section index to any callers except the original one you had in mind (objdump disassembly) as all the other callers will need work to do anytihng other than what they do today (eg: the symbolizer could have support for multiple answers - the same address in multiple sections (in which case it'd need tests for that, etc)- but this bit of code isn't doing that)

avl marked 3 inline comments as done.Mar 2 2019, 7:51 AM

David, Thank you for the comments! I skipped code style issues for the moment. please check the comments for allowing setting SectionIndex into Undef state and for the necessity of getModuleSectionIndexForAddress() routine. After we will decide on these questions I will handle code style issues.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142 ↗(On Diff #188970)

the idea was:

  • to not copy ModuleName parsing code from Symbolizer. This code is neccessary to properly handle ModuleName on Darwin, since it contains architecture postfix.
  • and another reason Is to use this function in any place where SectionIndex would be neccessary. Specifying Undef value is not enough usually, see later comment on the Undef SectionIndex usage.
146 ↗(On Diff #188970)

I think that my original description was not clear enough and there is some misunderstanding of new interface usage. Let`s clarify it now and decide what should it mean.

I assume that SectionIndex should always be set in the correct value. Setting SectionIndex into Undef state could lead to incorrect work if relocations present. At least that is how it is currently done. That assumes that all callers of new interface need to do extra work to set SectionIndex properly. In some cases I already done that additional work, in some cases left it in Undef state(if it did not brake testing) and for some cases getModuleSectionIndexForAddress could be used for setting SectionIndex(that is the temporarily routine until proper code implemented). These two last cases assumed to be changed into proper implementation.

It differs from the logic - if SectionIndex presented then take it into account, if SectionIndex is not presented then handle it somehow.

The rationale for that is that DWARFDebugLineTable looks to be improper place to decide how to handle "I don't know what section" case. Different use cases could assume different handling of that situation. So I decided that DWARFDebugLineTable should always receive full info and caller of this routine should decide how to handle this situation.

f.e. case "I don't know what section" could be handled :

a) find correct section and set it.
b) find several sections which match to local offset and use them.
c) find last section which match to local offset and use it(behavior before fix).
d) ignore that case.
f) display error ... something else...

Do you think we need to change it and allow setting SectionIndex into Undef state and specify how DWARFDebugLineTable would handle it ?

llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp
150–158 ↗(On Diff #188970)

The only relation is that it fixes several lit tests after "D58194" patch.

dblaikie added inline comments.Mar 4 2019, 11:14 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
142 ↗(On Diff #188970)

Right, I understand the first bit - but that wouldn't change where this function should live.

The second bit - it'd be nice to move it potentially in a separate refactoring patch (and/or including it in the patch where a second caller is added to this function) to help isolate/separate independent changes.

146 ↗(On Diff #188970)

Ah.

But most (all?) executables only have one executable section, right? That's why the way this worked before your change has worked OK for most/all users historically?

On that basis I think it makes sense to let the exsiting/previous APIs (witohut specifying a section index (ie: have the original functions, without any section index parameter - then overloaded with the new versions that allow the section index to be specified) - or perhaps having to specify an Undef one) to continue to work, while letting those callers who need to, specify it.

avl marked an inline comment as done.Mar 5 2019, 3:38 AM
avl added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

right, most executables only have one executable section. That is why previous implementation worked correctly for them using only address as point locator. But for not-linked objects this is not right. Previous implementation did not work for them. i.e. if we allow previous implementation by setting SectionIndex = UndefSection then such a call would work for only linked executable. That means that DWARFDebugLine or LLVMSymbolizer would imply concrete state.

following calls could be done for only linked executables :

DWARFDebugLine.getFileLineInfoForAddress ( SectionedAddress.SectionIndex = UndefSection
LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection

it would be error to call them for non-linked object file. That means that all places where we could expect non-linked object file needs to have something like this :

if (linked executable) {

LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = UndefSection

} else {

LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex()

}

llvm-dwardump, llvm-objectdump, lld, llvm-symbolizer - could work not only with linked executable but with not-linked object files also. So it looks like most usages would expect both cases - linked executables and not-linked object files. Probably in that case stateless interface would be more convenient ?

Current stateless interface does not assume whether it works for linked or not-linked object. It always receives correct SectionIndex and works equally for both types :

LLVMSymbolizer.symbolizeInlinedCode ( SectionedAddress.SectionIndex = getSectionIndex()

The places where it becomes less convenient is the code assumed to work with *only* linked-executable. In such places there would be neccessary to add code which would detect SectionIndex for new interface. But this looks to be quite natural task and written once it could be used later for free. I created such a routine - getModuleSectionIndexForAddress(though I agree this is quite naive implementation) .

The advantage of the new stateless interface is that it could not be used incorrectly. i.e. it would not be possible to use UndefSection for non-linked object files.

Probably it would make sense to not keep old behavior without SectionIndex ?

currently UndefSection used in following modules :

  1. sanstats
  2. sancov
  3. llvm-xray
  4. llvm-dwarfdump
avl added a comment.Mar 5 2019, 4:19 AM

I separated ExecutionEngine into another review - https://reviews.llvm.org/D58959

avl marked an inline comment as done.Mar 8 2019, 11:46 AM
avl added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

David, do you think it makes sense to assume that SectionIndex should always be set in the correct value(not equal to undef) ? Or your preferred choice is to allow SectionIndex to be set into UndefState and assume that in that case linked executables should be processed ?

dblaikie added inline comments.Mar 8 2019, 12:07 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I don't think it makes sense to open the file separately, search for a valid section index, then pass that into the API - compared to the API doing this for you. If you know the section you want, you pass it in, if not, you don't & you get the best effort (first instance, or perhaps APIs could be updated to return multiple responses when it's ambiguous).

I take it your idea was that this "open the file, search for some section index which covers the address, then use that section index when calling the underlying API" was meant as a stop-gap until some longer term solution where section indexes would always be passed in?

I don't think that's a future we'd really have - most of these uses/users aren't going to pass in section indexes (users of llvm-dwarfdump/objdump/symbolizeretc are unlikely to specify a section index on the command line too), so I think it makes sense to have the built-in functionality be what's currently grafted on top using getModuleSectionIndexForAddress - leave that as the built-in functionality as it was before and let users override/be specific if they want to be.

avl marked an inline comment as done.Mar 8 2019, 2:25 PM
avl added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I take it your idea was that this "open the file, search for some section index which
covers the address, then use that section index when calling the underlying API" was
meant as a stop-gap until some longer term solution where section indexes would
always be passed in?

yes. that`s right.

I don't think that's a future we'd really have - most of these uses/users aren't
going to pass in section indexes (users of llvm-dwarfdump/objdump
/symbolizeretc are unlikely to specify a section index on the command line too), so
I think it makes sense to have the built-in functionality be what's currently grafted
on top using getModuleSectionIndexForAddress - leave that as the built-in
functionality as it was before and let users override/be specific if they want to be.

That is also thing which I am agree. i.e. users should have a built-in functionality for setting proper SectionIndex. They should not do it manually.

Thing which seems to me questionable is that - Is it OK to hide that builtin functionality under old symbolizer interface :

symbolizeCode(..., object::SectionedAddress ModuleOffset)
symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset)
symbolizeData(..., object::SectionedAddress ModuleOffset

i.e. if ModuleOffset.SectionIndex = UndefSection and if we will keep old behavior in this case. That looks questionable because different use cases could require different behaviour. But in this case some concrete behavior would already be done.

Instead we can specify interface so that it always receives correct unambitious information. i.e. these methods would always receive correct SectionIndex :

symbolizeCode(..., object::SectionedAddress ModuleOffset)
symbolizeInlinedCode(..., object::SectionedAddress ModuleOffset)
symbolizeData(..., object::SectionedAddress ModuleOffset

so that SectionIndex either set into proper index of section, either set into Undef state and it means an absolute address.

That allows to implement several builtin behaviors separately. for example : we could implement this method for symbolizer :

enumerateSections (uint64_t address, std::function<void(uint_64_t SectionIndex)> on_section);

then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them.

If we will need another behavior(find first of, find last of/display error) then it would be possible to create another helper builtin routine(as part of symbolizer interface or as separate implementation).

This patch follows that idea : it requires SectionIndex to be properly specified for symbolizeCode/InlinedCode/Data methods and provides helper getModuleSectionIndexForAddress() routine so that all existing tools would be able to call it and work the same way as before. This getModuleSectionIndexForAddress routine assumed to be temporarily and should be replaced later with proper solution(s). Something like above enumerateSections() suggestion.

So it looks like this solution allows API to help users and at the same time to implement several use cases.

dblaikie added inline comments.Mar 8 2019, 2:41 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I'm still not sure I follow the harm of having these APIs default to their old behavior - if the current solution is to have them all use this extra API to more awkwardly get the old behavior anyway.

"then llvm-dwarfdump/objdump/symbolizer would read an address from command line ask symbolizer for sections which specified address relates to and call symbolizeCode for the received info. Then this enumerateSections method would be builtin solution for them."

What's the new behavior you're describing here - how would dwarfdump/objdump/symbolizer behave differently if they enumerated the sections? What sort of user-facing behavior change are you picturing here?

I think the existing behavior's really not too problematic & fine to preserve (what do GNU tools do - eg: addr2line on an object file?) & API users can override the Undef section index to some specific section if/when they want to.

avl marked an inline comment as done.Mar 9 2019, 10:49 AM
avl added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I'm still not sure I follow the harm of having these APIs default to their old behavior -
if the current solution is to have them all use this extra API to more awkwardly get
the old behavior anyway.

My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for :

symbolizeCode(SectionedAddress.SectionIndex = Undef)

How would it be possible to cancel "old behavior for Undef section index" ?

"then llvm-dwarfdump/objdump/symbolizer would read an address from command
line ask symbolizer for sections which specified address relates to and call
symbolizeCode for the received info. Then this enumerateSections method would
be builtin solution for them."

What's the new behavior you're describing here - how would dwarfdump/objdump
/symbolizer behave differently if they enumerated the sections? What sort of
user-facing behavior change are you picturing here?

Actually I do not suggest to change their behavior here.
I just show that they could require different behavior than current one:

llvm-symbolizer -obj=test.o 0x10
func1()
test.c:3

above usage will show only one entry from some.o. If some.o has more sections
which match to the specified address then they would not be shown. It is possible that symbolizer could be extended to show entries from all sections matched to specified address.

llvm-symbolizer -obj=test.o 0x10
func1()
test.c:3
func2()
test.c:20

In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here.

I think the existing behavior's really not too problematic & fine to preserve (what do
GNU tools do - eg: addr2line on an object file?) & API users can override the Undef
section index to some specific section if/when they want to.

I do not suggest to change behavior. If current behavior is OK then let`s preserve it. I suggest to not hide specific behavior under API.

My concern is that let`s support it in an explicit way. It does not look awkwardly :

object::SectionedAddress ModuleOffset = {

Offset, Symbolizer.getModuleSectionIndexForAddress(ModuleName, Offset)};

Symbolizer.symbolizeCode(ModuleName, ModuleOffset)

It explicitly tells how Section Index received. getModuleSectionIndexForAddress()
preserve old behavior.

It could easily be extended for future needs(if necessary)
The advantage is that it would be explicitly seen which behavior would be used:

object::SectionedAddress ModuleOffset = {

Offset, Symbolizer.getSectionIndexForNewBehavior(ModuleName, Offset)};

Symbolizer.symbolizeCode(ModuleName, ModuleOffset)

API users can override the Undef
section index to some specific section if/when they want to

API users can not override behavior for Undef section index if
Undef section index would mean something specific. If the idea would be to
get symbol information for only absolute addresses - then call to

symbolizeCode(SectionedAddress.SectionIndex = Undef)

would return wrong results, since "old behavior" does extra work.

dblaikie added inline comments.Mar 11 2019, 5:45 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

My main concern about "old behavior for Undef section index" is that it is implicit solution and this solution does not have cancel option. If we will need to implement another behavior for :

symbolizeCode(SectionedAddress.SectionIndex = Undef)

How would it be possible to cancel "old behavior for Undef section index" ?

I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right?

So it seems clear to me how the caller can choose to get the old behavior, or choose not to.

Actually I do not suggest to change their behavior here.
I just show that they could require different behavior than current one:

llvm-symbolizer -obj=test.o 0x10
func1()
test.c:3

above usage will show only one entry from some.o. If some.o has more sections
which match to the specified address then they would not be shown. It is possible > that symbolizer could be extended to show entries from all sections matched to specified address.

llvm-symbolizer -obj=test.o 0x10
func1()
test.c:3
func2()
test.c:20

This ^ is what I would mean by a change in behavior. That's a change in what llvm-symbolizer does/how it behaves.

In this case there would be necessary to implement some sort of sections enumeration. "old behavior for Undef section index" would not help here.

Maybe - alternatively, the API could be changed to respond with multiple results when the request is ambiguous (when there's more than one section with that address in it).

But the particular thing about this API is that it separately opens the file and searches through the sections - which seems like a fairly awkward API (to have to open the file, processes it, then call an API that will open the file again & parse it completely separately).

Keeping the previous behavior as a fallback when Undef is given seems like a better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc"). In the future, the API could be improved to, instead of returning the first (or some arbitrary) result for the address - instead it could be split into two APIs - one that takes a SectionedAddress and returns zero or one results, and one that takes a raw address and returns zero or more results. But I don't think there's any need to rush to implimenting that in any callers - they were mostly doing fine with the old behavior.

avl marked an inline comment as done.Mar 12 2019, 2:48 AM
avl added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I'm not sure what you mean by "cancel the old behavior" - if a caller never passes
Undef, they don't get the old behavior, right?

not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour".

Keeping the previous behavior as a fallback when Undef is given seems like a
better option to me (avoids the "open the same file again and do your own search
for the section index before calling the API which will independently
reopen/reparse the file, etc").

reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files. From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later...

Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before.

avl added a comment.Mar 12 2019, 2:52 AM

David, Could you also review https://reviews.llvm.org/D58959 please ? It seems not depending on decision discussed here. So it looks like it could already be integrated.

dblaikie added inline comments.Mar 12 2019, 8:09 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

I'm not sure what you mean by "cancel the old behavior" - if a caller never passes Undef, they don't get the old behavior, right?

not quite right. I mean the case when caller passes Undef and wants different than "old behavior". I.e. Undef Section index means - absolute address(it is known that this is global address , not local to the section). If user asked for absolute address He might do not want to handle local section addresses which would be done as "old behaviour".

Ah, OK, thanks for clarifying/explaining that (re: absolute address).

When would there be an ambiguity here between "any section with this address in it" and "I know this is an absolute address"? My naive understanding is that, say, in an unlinked object file addresses are all section-relative, and in a linked executable they're absolute. Are there cases where you might have both? And knowing whether the address is absolute or relative would produce different answers?

Keeping the previous behavior as a fallback when Undef is given seems like better option to me (avoids the "open the same file again and do your own search for the section index before calling the API which will independently reopen/reparse the file, etc").

reopening file is just implementation detail and could be handled in the same way as llvm-symbolizer currently does - it caches files.

It's an implementation detail that, to me, speaks to an issue in the design of the APIs - if the caller into the API is expected to somehow identify the section before calling into the API which then necessarily opens/parses the file for sections, etc. And one where callers may reasonably not have up-front knowledge of which section ID they want (& have to perform a search like this) - that speaks, to me, to something ill-fitting about the design of the API as it stands - which is where I'm coming from/motivated by in this discussion.

From the other side I think symbolizer`s interface would be changed anyway - to support several sections case. So that current "old behavior" would probably be removed later...

Anyway, if "Keeping the previous behavior as a fallback when Undef is given" seems better than "separating behavior in explicit way" then let`s continue with this preferable approach. So I am going to implement - if SectionIndex set into Undef state, then symbolizer would internally detect SectionIndex similarly to how it was before.

"before" being "before any of your recent changes"? OK - thanks!

If there is a case where "Undef == fallback to old behavior" and "Undef == treat the address as absolute" would produce different answers - could you include a test case so we can keep this use case in mind in the future?

avl marked an inline comment as done.Mar 12 2019, 11:40 AM
avl added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
146 ↗(On Diff #188970)

When would there be an ambiguity here between "any section with this address in it"
and "I know this is an absolute address"?

I thought about following difference :

"I know this is an absolute address":
symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3
symbolize(SectionIndex=Undef, not_linked_file) => error. not linked executable.

"any section with this address in it":
symbolize(SectionIndex=Undef, linked_file) => func1():test.c:3
symbolize(SectionIndex=Undef, not_linked_file) => func1():test.c:3

note: I do not suggest to implement above behavior. I think that is possible scenario if caller does not want to handle case "address relates to several sections" (silently displaying one from several matches).

And one where callers may reasonably not have up-front knowledge of which
section ID they want (& have to perform a search like this)

Ok, I assumed that caller should know which section specified address belongs.
if symbolizer assumed to be used without section`s relation knowledge then we probably do not need SectionedAddress in symbolizer at all.

"before" being "before any of your recent changes"? OK - thanks!

yes.

avl updated this revision to Diff 190423.Mar 13 2019, 8:43 AM
avl edited the summary of this revision. (Show Details)

addressed all comments.

dblaikie added inline comments.Mar 14 2019, 1:12 PM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
155–166 ↗(On Diff #190423)

This code is repeated several times & again, still seems to be at the wrong layer of abstraction to me - having callers parse names, open files, search for sections - rather than passing in Undef and letting the underlying API do what it did before any of these changes.

I think it's further down that the Undef=>BestEffortSectionIndex should be handled - again, where it was effectively handled before any of these changes.

avl updated this revision to Diff 190735.Mar 14 2019, 3:06 PM

I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.

In D58848#1430071, @avl wrote:

I moved handling Undef case from Symbolizer into SymbolizableObjectFile. Please check whether it is proper place.

I'm still a little confused by the existence of a section search at all whet Undef is given - given that the previous implementation (before your work) there was no such search.

Is it possible the search could be omitted entirely?

avl added a comment.Mar 14 2019, 4:26 PM

In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.

Previously that decision was made just by accident. Property of sorting algorithm defined which concrete section would be selected.
I think that this decision should be unambiguous and currently for the same pair {Address, SectionIndex} there always would be returned the same result.

If DWARFDebugLineTable would be changed so that for Undef value there would be done search by one criteria then for overlapping address ranges there could be different results depending on concrete call to DWARFDebugLineTable. To prevent such random section selection there was implemented getModuleSectionIndexForAddress().

Do you think we need to omit that high level decision and allow random selection for sections with overlapping address ranges ?

avl added a comment.Mar 15 2019, 12:19 AM

David, there is a connected review - https://reviews.llvm.org/D59189 It cures compilation error after D58194 Could you take a look on it please ?
I think that change is OK and it is good to do in spite of decision made in this review.

In D58848#1430203, @avl wrote:

In case Undef specified then there could be several sections with matching address ranges.
It seems to me that DWARFDebugLineTable improper place to make such a decision - which of them should be taken as a result.
This in fact high level decision, not low level.

It still seems too high level to me if it involves the caller separately opening and walking the file contents.

Please move this down at least far enough that it doesn't involve separately computing the file name and opening the file. Where it's a "well, if you didn't specify a section index, we'll use the first section that covers this address".

avl added a comment.Mar 22 2019, 2:41 PM

Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".

It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :

uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(

  uint64_t Address) const {

for (SectionRef Sec : Module->sections()) {
  if (!Sec.isText() || Sec.isVirtual())
    continue;

  if (Address >= Sec.getAddress() &&
      Address <= Sec.getAddress() + Sec.getSize()) {
    return Sec.getIndex();
  }
}

return object::SectionedAddress::UndefSection;

}

I think it looks very close to above description.

dblaikie accepted this revision.Mar 22 2019, 2:52 PM
In D58848#1440161, @avl wrote:

Please move this down at least far enough that it doesn't involve separately
computing the file name and opening the file. Where it's a "well, if you didn't
specify a section index, we'll use the first section that covers this address".

It looks like this version of patch does exactly this.
It does not open file separately and does not parse file name.
The only thing which is additionally done is for already opened file :
"well, if you didn't specify a section index, we'll use the first section that covers this address" :

uint64_t SymbolizableObjectFile::getModuleSectionIndexForAddress(

  uint64_t Address) const {

for (SectionRef Sec : Module->sections()) {
  if (!Sec.isText() || Sec.isVirtual())
    continue;

  if (Address >= Sec.getAddress() &&
      Address <= Sec.getAddress() + Sec.getSize()) {
    return Sec.getIndex();
  }
}

return object::SectionedAddress::UndefSection;

}

I think it looks very close to above description.

Ah, so it does - thanks for correcting me/ walking me through that. Perhaps I was just reading poorly, or Phab was showing a different delta or something.

This looks good, thanks!

(I guess perhaps another/future change could be for these APIs to return an error if it is ambiguous (the symbolize APIs have an Error return type already, so they could change semantics here without changing the syntax - and that'd probably give a pretty good user experience of "hey, you're trying to symbolize an ambiguous address, we can't currently give you a clear answer to this query") - and, as discussed, perhaps one day they return multiple results, etc)

This revision is now accepted and ready to land.Mar 22 2019, 2:52 PM
avl added a comment.Mar 22 2019, 2:59 PM

Thank you!

In D58848#1440177, @avl wrote:

Thank you!

It seems you indented the description by 2. That may be why the revision is not automatically closed..

Differential Revision: https://reviews.llvm.org/D58848
avl added a comment.Mar 23 2019, 2:38 AM

@MaskRay

It seems you indented the description by 2. That may be why the revision is not automatically closed..

Thanks! I will take it into account next time.

avl closed this revision.Mar 23 2019, 2:39 AM