Add in the ability of parsing symbol table for 64 bit object.
Details
- Reviewers
DiggerLin daltenty hubert.reinterpretcast jhenderson Xiangling_L MaskRay - Group Reviewers
Restricted Project - Commits
- rG8e84311a84b3: [XCOFF][AIX] Enable tooling support for 64 bit symbol table parsing
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
411 | can we add new member function as getNumberOfSymbolTableEntries() return is64Bit() is64Bit() ? getNumberOfSymbolTableEntries64() : getLogicalNumberOfSymbolTableEntries32(); } the function can also use in |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
205 | Please see my other comments regarding combining the 32bit and 64 bit version into 1 function. | |
209 | Please see my other comments regarding combining the 32bit and 64 bit version into 1 function. | |
411 | About all the comments mentioning if we could combining the 32bit and 64 bit version into 1 function. |
It seems to me like this should be using inheritance here. You have a base class that has the common members, and provides pure virtual declarations of the various getters, with the sub-classes defining them to do the right thing. Yes, it would introduce a number of getters, but I feel like it would make everything a bit cleaner from a usability standpoint. In most cases, you then don't need any is64Bit queries, because the getters hide that from you.
On a testing note, there are several places in the new code which detect some kind of error. You need testing for these code paths too.
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
411 | From my experience working with tools that had to support 32-bit and 64-bit ELF, you don't worry about the underlying type in most cases and always use the larger type. The same probably applies here. Of course, it becomes a bit moot if you add a common getter interface as suggested out-of-line, because those getters will have to return the larger of the two return types anyway. Is there a strong reason to not use the larger type everywhere? | |
756 | Related to my comments elsewhere - it looks to me like most consumers will need to handle both 32 and 64-bit versions, so they'll always have to do this dance. Thus your concern about how the caller uses them is misplaced - the caller is more likely to do the wrong thing i.e. call the wrong version than have problems with the return types. | |
792 | Better than errorCodeToError(/*some error code*/) is to use createStringError() or createFileError() to provide more context to the failure (how did the parsing fail? where? etc). | |
807 | I don't think you want to use int here. There's always going to be a positive number of entries, and there are no subtractions etc inolving Index here. Better would be an unsigned type of some form (presumably the return type of getNumberOfAuxEntries()). | |
816 | Same comment as before - use createStringError() or createFileError(). | |
llvm/test/tools/llvm-objdump/XCOFF/disassemble-symbol-description64.test | ||
14–15 | I'm not going to stop you checking in a pre-compiled object, as I'm not an XCOFF maintainer, but as you are continuing to add more functionality here, I strongly advise you to write a yaml2obj XCOFF port, to avoid pre-canned binaries. You'll find pre-built binaries extremely inconvenient to work with as you maintain things going forward. Not only that, but they are harmful to the git repository size, especially if you have to occasionally rebuild them. Using yaml2obj may also be about the only way you can test most parse failure paths. If yaml2obj isn't viable, at least consider llvm-mc or similar, if possible. | |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
412–417 | @grimar has gone to a lot of effort to get rid of unwrapOrError from the ELF dumping code. I'd prefer it if we could avoid using it here too. It is generally better in dumping tools to report a warning and abort dumping the current section than to emit an error and terminate the program, since it gives the user more of the information they've asked for. |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
411 |
I don't know what strength this reason has, but we had noticed that some of the tools do not reflect the width of the 32-bit format fields very well (even for relatively uninterpreted output). Where the producer of the binary is under development, developers are better served if the tools emit the correct width for fields in the format. |
llvm/test/tools/llvm-objdump/XCOFF/disassemble-symbol-description64.test | ||
---|---|---|
14–15 | I agree that we would want to move away from pre-canned binaries at some point. |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
412–417 | Yeah. Having unwrapOrError available is my concern. I am trying to cleanup ELF dumper, but other files (e.g. COFF) are still using it, thought ideally I'd just remove this API from llvm-readobj code, it seems does more harm than good for a long term. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | the name must be consistent with the aix OS file syms.h? |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | The current style in this file seems to be (correct me if I'm wrong): |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
412–417 | Thanks. Agreed. Will avoid using unwrapOrError in future code. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | The issue here is that the name from the OS header is a reserved name. So by reason of not wanting undefined behaviour, we cannot use the name taken from the OS header. Unfortunately, that means we cannot be consistent in terms of using the name taken from the OS header without switching all the enumerator names to be descriptive and in the LLVM style. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | hmm... Is the namespace XCOFF not enough to prevent the undefined behavior happening? Or are we afraid of people just use using namespace XCOFF to defeat it? |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | The practical cause of undefined behaviour in such a case is usually that the instance of the identifier here is misparsed, or otherwise has surprising behaviour, either because it is defined as a macro or is an extension keyword. Whether the name turns out to be in scope elsewhere is not a factor for such mechanisms. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | Got it. I will switch the style of SymbolAuxType in the next revision. We will need to come up with a plan to switch the rest of the classes in this file (There are a lot). |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
300 | +1 to dropping the underscore. I think it's okay for that to be the only change, but have no strong opinion either way, so happy with whichever you prefer. | |
305 | Not that it really matters, but it's more traiditional to order enums in ascending numerical order. Any particular reason you've done this in the reverse order? | |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
411 | In think in the context of printing the appropriately formatted output, you'd want to switch on the source type (i.e. is64Bit or whatever), at the formatting time. Certainly, this is how we've done it in our own internal code bases I work on, and there are examples of this in a number of other LLVM utilities. For example in https://github.com/llvm/llvm-project/blob/master/llvm/lib/DebugInfo/DWARF/DWARFCompileUnit.cpp#L17, the dump function dumps the offset with a width according to the DWARF format (i.e. 32 or 64 bit), but the getLength function returns a 64-bit value always. Similarly https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-objdump/ELFDump.cpp#L257 identifies the ELF format kind and uses that in determining the width of offset, size and address fields (which are stored as uint64_t) when printing ELF program header tables. There are certainly plenty of places where this hasn't been done. Sometimes this is a mistake, other times it's for consistency with GNU output, but I think the preferred approach is the "store large, explicitly specify format on output" approach. |
llvm/test/tools/llvm-objdump/XCOFF/disassemble-symbol-description64.test | ||
---|---|---|
14–15 | Yeah, chicken-or-egg problem is a bit of an issue. I'm not sure there's always a clear answer to this. The one I've encouraged for yaml2obj DWARF support testing is to actually inspect the hex output (with sufficient additional commenting to make it clear what the output represents). By keeping the initial functionality small enough, you can boostrap up from there. The issue is that a lot of our low-level tool testing (i.e. testing of things like llvm-readobj) has switched over to yaml2obj, but clearly we can't (in theory) then use llvm-readobj to test the basic output of yaml2obj or we end up with a circular test dependency - a bug in a common library might not obviously manifest itself in this context, but would if using a tool from outside the ecosystem. Another strategy which I've used occasionally for testing DWARF parsing before the yaml2obj support existed was writing assembly using just .byte/.quad etc directives to craft the input format precisely, without relying on the higher-level assembly directives (like .file/.loc etc). This may not work in all situations though. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
305 | I sort of "copied" the list from the OS header, and that's just the order it appeared in OS header. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
305 | I haven't got any particular preference, so am happy to defer to whatever you prefer on this one. |
Addresses comments.
Add in test case to test errors.
Use view/reference class to encapsulate 32-bit and 64-bit differences instead.
@jhenderson I tried to use inheritance as suggested. But inheritance would mean I need to use pointers to enable the runtime polymorphism. Then there is a life time issue that need to be managed when using pointers. The easier way to achieve that is to return via unique_ptr. But using unique_ptr introduced usability issue in the caller/user side, as we would see std::move, SymbolRef.get() before getting to the query we want. Also underneath of the unique_ptr, new/delete is not very efficient as well.
In the end, I tried to solve this in similar manner as COFF does, which is using a view class without inheritance. Although the downside of it is we basically have an if query in every call to the view class to differentiate which version (32/64) we are having right now, the good thing is that caller side is much more cleaner.
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
158 | change to and change in the following functions too ? | |
176 | change to return reinterpret_cast<uintptr_t>(Entry32 ? Entry32 : Entry64) | |
400 | this means for getSymbolEntryAddressByIndex(uint32_t SymbolTableIndex) const ? | |
484 | assert(OwningObjectPtr != nullptr) here ? | |
495–528 | maybe we can use a macro here. int16_t getSectionNumber() const { return GETVALUE(SectionNumber); } uint16_t getSymbolType() const { return GETVALUE(SymbolType); } and so on | |
535 | getAddress() may confuse with getting the address of the symbol. maybe good to rename to getEntryAddress() ? | |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
600 | several place use above NumberOfSymTableEntries , maybe good to provide a helper function. | |
757 | change to const uint64_t SymbolTableSize ? | |
823–825 | not all the symbol has Csect entry. | |
827 | I think assert(isCsectSymbol()) myabe better. | |
llvm/tools/llvm-objdump/XCOFFDump.cpp | ||
49 | I can not see benefit to change |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
158 | I was initially worried about MSVC breakage here: https://github.com/llvm/llvm-project/commit/210314ae8c59bc0a8801c2528eda892cd5960c31 |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
823–825 | I returned Expected<XCOFFCsectAuxRef> partly because of the original comment on this function: I believe that's a change from your previous commit. Is there any reason that you changed your mind? | |
827 | Sure. | |
llvm/tools/llvm-objdump/XCOFFDump.cpp | ||
49 | I did it for consistency reason, i.e: always get XCOFFSymbolRef via toSymbolRef. |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
400 | Sorry, what's the difference between Symbol and SymbolEntry? |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
400 | the value of symbol maybe a symbol relocation address , I was confused getSymbolAddress with getting the relocation address at my first glance of the code, getSymbolEntryAddress, that means we need the SymbolEntry address not relocation address of a symbol. |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
400 | I don't think we have a symbol relocation address IMO. The address related to relocation could be relocation entry's address, or the virtual address data member inside of a relocation entry. But those addresses are not related to symbol in any ways. | |
535 | Do you still find it confusing after seeing my other comments about Symbol vs SymbolEntry? |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
145 | You probably don't need these assert. The dereference will crash anyway |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
145 | That's not true. There are some number of addressable bytes containing 0 starting from address 0x0 on AIX. |
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
158 | thanks let me know. | |
171 | what about return reinterpret_cast<uintptr_t>(Entry32 ? Entry32 : Entry64) ? | |
202 | ruse GETVALUE(SymbolAlignmentAndType) ? | |
425 | not sure we want to Distance to be negative value future? I think change to int32_t Distance, means that we can backward | |
435 | not sure whether we want to define a enum for the LanguageID in this patch. | |
484 | "Symbol table pointer can not be nullptr!" --> "Symbol table entry pointer can not be nullptr!" | |
493 | using GETVALUE(Value) for consistent ? | |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
223 | const int16_t SectNum ? | |
498 | const int16_t SectionNum ? |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
805–812 | const int16_t SectNum |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
839–845 | create a static function getSymbolAuxType in a file scope maybe better? |
Hi Jason, what's your plan about the patch? When will you move forward with it?
Currently D100375 relies on the patch for its 64-bit llvm-readobj. In additional, D97656, D99164 and D98003 should be rebased on it.
Besides, I would like to add support for the line number dump in llvm-objdump if this patch is ready.
It looks like this is a fundamental patch for tools implementation. And it looks good except for some comments haven't been addressed yet.
Please let me know if I can be of any help. Thanks!
Rebase and Address comments.
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
171 | No, you could not do that. It only works if Entry32 and Entry64 are the same type. But they are not here. | |
425 | I don't see a need to jump backward now. If it's needed in the future, we could always change in the future patch. | |
435 | I think we are already doing enum mapping in tools/llvm-readobj/XCOFFDumper.cpp. I don't see a strong need to create an enum for it. | |
493 | I would prefer to be more explicit here because we are doing a conversion to larger value for 32 bit version, which is different from the rest of GETVALUE(Value). | |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
797 | Thanks. I brought back the old behavior and added the FIXME to say that this function does not return a correct value if we have -ffunction-sections enabled. | |
839–845 | Yes, there is a XCOFFCsectAuxRef ::getAuxType64(), but if you notice, this function is used to create an XCOFFCsectAuxRef object. So you don't have that function available in the creator. |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
831 | Is this error user-facing (I'm assuming so)? Assuming it is, you should record here which symbol is causing the problem. Otherwise the user will be faced with an error along the lines of this: error: this csec symbol contains no auxiliary entry which is not really actionable (imagine the input had 100000 symbols in - the user can't realistically go through each to find the offending one). | |
848 | Similar to my above comment - which entry was not found? Give the user more context so that they can act on the problem. | |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
416 | This will write the error inline, rather than to stderr. Are you sure that's what you want? it isn't what most dumping tools do on failure. |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
831 | I believe this requires std::move(Error);, as you're returning an Expected, not an Error. | |
835–836 | I think it would make more sense to insert the name in the middle of the message to make it a bit more concise. Something like: | |
863 | I'd suggest quoting somehow the symbol name, so that any whitespace or similar that happens to end up in the name (rare, but possible to write using assembly, at least for other platforms) is easily understood to be part of the name. | |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
414–415 | Use reportError or reportWarning so that the error is reported in a clean manner and consistent with other llvm-readobj varieties, and not report_fatal_error which looks like a crash. General rule of thumb: try to avoid using report_fatal_error, especially in tool code where it is easy to report errors properly. In llvm-readobj for ELF, we try to avoid even using reportError where possible, as that stops the tool from continuing dumping, which can be problematic occasionally. We prefer reportWarning (or more specifically the local reportUniqueWarning which avoids reporting the same warning multiple times) and bailing out of the current routine. Take a look at ELFDumper.cpp for examples. |
Address comments.
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
414–415 | Thanks for the elaboration. That clears up things a lot for me. I will use reportUniqueWarning here. |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
414–415 | reportUniqueWarning can take an Error directly, so you can just do: if (!ErrOrCsectAuxRef) reportUniqueWarning(ErrOrCsectAuxRef.takeError()); Also, be careful, as the program continues, so referencing ErrOrCsectAuxRef after this may result in things going wrong... |
I'll try to take another look in the next day or two. The pre-merge bots are failing. Is that an issue with this patch?
Thanks a lot.
I think it's more of an issue that the binaries did not get into the phabricator probably. Not really an issue with the patch itself.
Only some minor nits, otherwise LGTM. I haven't attempted to review the test coverage for all the new code, as I don't feel like I'm in a good position to do that, not being an XCOFF developer. I assume someone else has though.
llvm/include/llvm/Object/XCOFFObjectFile.h | ||
---|---|---|
483 | ||
485 | ||
llvm/lib/Object/XCOFFObjectFile.cpp | ||
797 | ||
808 | Basically any time you use consumeError, add a comment explaining why it's justified that we don't report the error to the user. |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
390–391 | this is only for the 32bits . for 64bit, it maybe look for the x_auxtype ==AUX_CSECT |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
187 | if (AuxEntPtr->AuxType != XCOFF::AUX_FILE ) , it should not be parsed as | |
236 | if (AuxEntPtr->AuxType != XCOFF::AUX_CSECT) , it should not be parsed as |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
187 | I think it's the caller's responsibility to make sure they are passing in the right auxiliary type. So we should assume when we enter this function, we have the right auxiliary type here. So I modified it and made it an assert instead. | |
236 | Same above. I modified it to be an assertion instead. | |
390–391 | Good point. Updated the code. |
llvm/lib/Object/XCOFFObjectFile.cpp | ||
---|---|---|
94 | SymbolAuxType is only for the 64 bits. | |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
378 | as you mention "I think it's the caller's responsibility to make sure they are passing in the right auxiliary type. " I think we need to check the AuxType == XCOFF::AUX_FILE for 64 bits. if not , print out the raw data as AUX_CSECT did ? |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
416 | we have iterated auxiliary entries from line 382~396 and reiterated again in the printCsectAuxEnt(). I think we can improve on it. And in 64bits, The auxiliary entries maybe be reordered in above implement. |
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
416 | I don't think we reiterated again in printCsectAuxEnt() for other auxiliary entries. |
LGTM with address comment.
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
366 | it need "continue;" |
I took a quick look at the pre-merge output. The message is that the object isn't recognised as a valid object file, not that it wasn't found, which suggests either the object or code is broken in some manner, or you're missing a dependent patch. Does this patch depend on another patch that isn't in main yet?
llvm/tools/llvm-readobj/XCOFFDumper.cpp | ||
---|---|---|
357–372 | i -> I (you're changing most of the loop body - you might as well fix this whilst you're here) | |
363 | Test case? | |
392–394 | My English language ping went off at the word "till" in these two sentences. I'd probably change it to "to". Also, use "first" rather than "1st", I suggest in both places. Also "skips" -> "skip" for grammatical consistency. | |
396 | i -> I |
No this patch does not depend on other patches. I think this is caused by git generated binary for the patch doesn't really assemble to the same one?
llvm/test/tools/llvm-readobj/XCOFF/file-aux-wrong64.test | ||
---|---|---|
20 | I added an newline after all the raw bytes. Other than that, I think the output is good. We have 18 bytes per symbol table entry, and we are printing 18 raw bytes here. |
llvm/test/tools/llvm-readobj/XCOFF/file-aux-wrong64.test | ||
---|---|---|
20 | I don't know if it is, but you probably want 00fb indented to line up nicely with the previous line. You can then confirm that this indentation is maintained by enabling --match-full-lines and --strict-whitespace in FileCheck. (If you do that, you'll need to remove the space after CHECK-NEXT:) |
llvm/test/tools/llvm-readobj/XCOFF/file-aux-wrong64.test | ||
---|---|---|
20 | Hi James, I adjusted the code to print 00fb in the same line, I think that actually works better because we could have multiply auxiliary entry data to print out. Putting each in the same line is easier to parse. |
the name must be consistent with the aix OS file syms.h?
what about the to change to AUX_EXCEPT . it consistent with our current style.