This is an archive of the discontinued LLVM Phabricator instance.

[AIX][XCOFF] add symbol priority for the llvm-objdump -D -symbol-description
ClosedPublic

Authored by DiggerLin on Apr 17 2020, 10:35 AM.

Details

Summary

when there are two symbol has the same address. llvm-objdump -D -symbol-description will select symbol based on the following rule:

  1. using Label first if there is a Label symbol.
  2. If there is not Label, using a symbol which has Storage Mapping class.
  3. if more than one symbol has storage mapping class, put the TC0 has the low priority, for other storage mapping class , compare based on the value.

Diff Detail

Event Timeline

DiggerLin created this revision.Apr 17 2020, 10:35 AM
llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
32

It's not okay for the library header to refer to a function defined in the tool implementation. Either the function should be part of the library or the tool should define and use a custom comparator. I'm leaning towards the latter.

llvm/tools/llvm-objdump/XCOFFDump.cpp
78

To understand the "priority" we need to understand whether sorting by this predicate (and the operator < that calls it) produces a list with the priority in ascending or in descending order. Please add a comment that indicates which. Doxygen-style interface documentation might be appropriate for the comparator that will be used by the tool.

80

Using < on booleans is weird, the expression here is equivalent to SymInfo2.IsLabel.

82

There should be additional tiebreakers until the symbols are mostly indistinguishable. The symbol with the index is higher priority than the one without. Since we already prefer labels over csects, the symbol with the lower index is likely to be higher priority (at least if both have a storage mapping class; if not, then it's hard to tell). We can retain the old final tiebreaker on the name as well.

96

We must never return true for comparing a symbol with itself. Also, we should prefer lower storage mapping class values: this conveniently means that the code interpretation is preferred over the data one (code being less likely to be of zero length).

DiggerLin marked 5 inline comments as done.Apr 21 2020, 9:29 AM
DiggerLin added inline comments.Apr 21 2020, 9:29 AM
llvm/tools/llvm-objdump/XCOFFDump.cpp
82
  1. llvm-readobj sort the symbol on ascend order.
  2. the symbol without index is always the first symbol of the symbols of each section.
  3. if there are two symbols has the same address, llvm -D -symbol-description always use the symbol in the high ascend order.
96

thanks for point out comparing a symbol with itself

DiggerLin updated this revision to Diff 259028.Apr 21 2020, 9:35 AM
DiggerLin marked an inline comment as done.

address comment

DiggerLin marked 2 inline comments as done.Apr 22 2020, 5:55 AM
DiggerLin added inline comments.
llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
63

I will delete the "else" in the next patch update

llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
79 ↗(On Diff #259028)

I will delete the "if" in next patch update.

DiggerLin updated this revision to Diff 259313.Apr 22 2020, 9:01 AM

This behaviour seems significantly under-tested. Just looking at the new operator, there are three different comparisons, so I'd expect to see at least 4 test cases for that alone (one for each side of the branch point). Furthermore, there's a big new switch in the same file that isn't exercised in any depth.

llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
83 ↗(On Diff #259313)

Nit: Get rid of the blank line after the comment.

high order symbol

It's not clear to me what this phrase means.

symbols have the same

Should be symbols with the same

85 ↗(On Diff #259313)

not a label

90 ↗(On Diff #259313)

If a symbol does not have a StorageMappingClass ... which has a StorageMappingClass

DiggerLin added a comment.EditedApr 29 2020, 7:53 AM

This behaviour seems significantly under-tested. Just looking at the new operator, there are three different comparisons, so I'd expect to see at least 4 test cases for that alone (one for each side of the branch point). Furthermore, there's a big new switch in the same file that isn't exercised in any depth.

I do not think it is under-tested.

I dump all the symbol of the xcoff-section-headers.o

bash-5.0$ dump -tv xcoff-section-headers.o

xcoff-section-headers.o:

                        ***Symbol Table Information***
[Index] m        Value       Scn   Aux  Sclass    Type            Name
[Index] a0                                                        Fname
[Index] a1      Tagndx      Lnno  Size  Lnoptr    Endndx
[Index] a2      Tagndx            Fsiz  Lnoptr    Endndx
[Index] a3      Tagndx      Lnno  Size  Dimensions
[Index] a4   CSlen     PARMhsh SNhash SMtype SMclass Stab SNstab
[Index] a5      SECTlen    #RELent    #LINnums

[0]     m   0x00000000     debug     3    FILE          C:COM     .file
[1]     a0                                                        sections.c
[2]     a0                                                        Thu Apr 18 10:26:01 2019
[3]     a0                                                        IBM XL C for AIX, Version 16.1.0.1
[4]     m   0x00000000     .text     1  static                    .text
[5]     a5  0x00000080     0x0001     0x0000
[6]     m   0x00000080     .data     1  static                    .data
[7]     a5  0x00000024     0x0007     0x0000
[8]     m   0x000000a4      .bss     1  static                    .bss
[9]     a5  0x00000004     0x0000     0x0000
[10]    m   0x00000000    .tdata     1  static                    .tdata
[11]    a5  0x00000024     0x0000     0x0000
[12]    m   0x00000008     .tbss     1  static                    .tbss
[13]    a5  0x00000004     0x0000     0x0000
[14]    m   0x00000000     .text     1  unamex                    **No Symbol**
[15]    a4  0x00000080       0    0     SD       PR    0    0
[16]    m   0x00000000     .text     1  extern                    .func
[17]    a4  0x0000000e       0    0     LD       PR    0    0
[18]    m   0x00000080     .data     1  unamex                    TOC
[19]    a4  0x00000000       0    0     SD      TC0    0    0
[20]    m   0x00000094     .data     1  extern                    func
[21]    a4  0x0000000c       0    0     SD       DS    0    0
[22]    m   0x00000080     .data     1  unamex                    func
[23]    a4  0x00000004       0    0     SD       TC    0    0
[24]    m   0x000000a4      .bss     1  extern                    a
[25]    a4  0x00000004       0    0     CM       RW    0    0
[26]    m   0x00000084     .data     1  unamex                    a
[27]    a4  0x00000004       0    0     SD       TC    0    0
[28]    m   0x000000a0     .data     1  extern                    b
[29]    a4  0x00000004       0    0     SD       RW    0    0
[30]    m   0x00000088     .data     1  unamex                    b
[31]    a4  0x00000004       0    0     SD       TC    0    0
[32]    m   0x00000008     .tbss     1  extern                    c
[33]    a4  0x00000004       0    0     CM       UL    0    0
[34]    m   0x0000008c     .data     1  unamex                    c
[35]    a4  0x00000004       0    0     SD       TC    0    0
[36]    m   0x00000000    .tdata     1  extern                    d
[37]    a4  0x00000008       0    0     SD       TL    0    0
[38]    m   0x00000090     .data     1  unamex                    d
[39]    a4  0x00000004       0    0     SD       TC    0    0

there are
[4] m 0x00000000 .text 1 static .text
[5] a5 0x00000080 0x0001 0x0000
[14] m 0x00000000 .text 1 unamex No Symbol
[15] a4 0x00000080 0 0 SD PR 0 0
[16] m 0x00000000 .text 1 extern .func
[17] a4 0x0000000e 0 0 LD PR 0 0

symbol .text and .func have the same address. symbol .text do not has storage mapping class , No Symbol has storage mapping class and .func is a label.
the output is label ".func" . we tested it.

there are
[6] m 0x00000080 .data 1 static .data
[7] a5 0x00000024 0x0007 0x0000
[20] m 0x00000094 .data 1 extern func
[21] a4 0x0000000c 0 0 SD DS 0 0
symbol .data and func[DS] have the same address. symbol .data do not have not has storage mapping class , func[DS] has storage mapping class. the output is func[DS].
we tested it.

there are
[18] m 0x00000080 .data 1 unamex TOC
[19] a4 0x00000000 0 0 SD TC0 0 0
[22] m 0x00000080 .data 1 unamex func
[23] a4 0x00000004 0 0 SD TC 0 0
symbol .TOC and func[TC] have the same address . the output is func[TC] we tested it.

for the XCOFF. If symbols have storage mapping class, We only put the TC0 in the low proirity, We already tested it. for other.

DiggerLin updated this revision to Diff 260926.Apr 29 2020, 8:24 AM
DiggerLin marked 5 inline comments as done.

address comment

DiggerLin marked an inline comment as done.Apr 29 2020, 8:29 AM
DiggerLin added inline comments.
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
77 ↗(On Diff #260926)

since we only has know that the TC0 si low priority than storage mapping class TC, TD, TE. For other storage mapping class, we do not have priority on them currently. I am not sure using switch case to generate the priority is a good idea or not? The advantage of using a switch case is: if we want to change the storage mapping class priority someday, we just need to change the switch case. @hubert.reinterpretcast , @jasonliu

83 ↗(On Diff #259313)

thanks

85 ↗(On Diff #259313)

thanks

90 ↗(On Diff #259313)

thanks

There seems to be quite a lot of new code here, but only one minor test change. The MCDisassembler ordering stuff looks like a prime candidate for gtest unit-testing. You could probably fairly easily unit-test the SymbolInfoTy comparison method too.

llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
81 ↗(On Diff #260926)

the highest priority symbol

85 ↗(On Diff #260926)

I think the following is a little clearer: "Label symbols have higher priorty than non-label symbols."

90–91 ↗(On Diff #260926)

Now that I have a better grasp of what is going on, I'd actually recommend a slightly different wording:

"Symbols with a StorageMappingClass have higher priority than those without."

DiggerLin marked 5 inline comments as done.EditedMay 8 2020, 7:17 AM

There seems to be quite a lot of new code here, but only one minor test change. The MCDisassembler ordering stuff looks like a prime candidate for gtest unit-testing. You could probably fairly easily unit-test the SymbolInfoTy comparison method too.

we will change all the test cases which has "llvm-objdump -D" to "llvm-objdump -D -symbol-description" . and "llvm-objdump --disassemble" to "llvm-objdump --disassemble -symbol-description"
in new patch. I think there will enough test change in the that patch.

And I will add a gtest unit test for the function bool operator<(const XCOFFSymbolInfo &SymInfo) const

llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
81 ↗(On Diff #260926)

thanks

85 ↗(On Diff #260926)

thanks

DiggerLin updated this revision to Diff 262946.May 8 2020, 1:59 PM
DiggerLin marked 2 inline comments as done.

address comment and added a new unit test case

jhenderson added inline comments.May 12 2020, 12:12 AM
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
81 ↗(On Diff #260926)

Hi @DiggerLin. Thank you for addressing my comments, but please try to be more careful in addressing them in their entirety:

high -> highest

(if you're uncertain about something I wrote, feel free to ask for clarity!)

llvm/unittests/MC/CMakeLists.txt
20 ↗(On Diff #262946)

Traditionally, test file names are derived from the file they are testing. In other words, this should probably be MCDisassemblerTest.cpp

llvm/unittests/MC/XCOFFSymbolPriorityTest.cpp
1 ↗(On Diff #262946)

Please fix this header.

25–28 ↗(On Diff #262946)

It would probably be a good idea to show that these comparisons are stable - in other words, comparing them the other way around still works as expected. Consequently, I'd expect an EXPECT_GT(SIT2, SIT1); line or equivalent for each of these pairs.

It might also be helpful to add a comment for each test case (or pair of test cases once you add EXPECT_GT) explaining what each case shows, along the lines of "non-label symbols have lower priority than label symbols" etc.

DiggerLin marked 9 inline comments as done.May 13 2020, 6:52 AM
DiggerLin added inline comments.
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
81 ↗(On Diff #260926)

thanks

llvm/unittests/MC/CMakeLists.txt
20 ↗(On Diff #262946)

thanks

llvm/unittests/MC/XCOFFSymbolPriorityTest.cpp
1 ↗(On Diff #262946)

thanks

25–28 ↗(On Diff #262946)

thanks.

since we do not provide a XCOFFSymbolInfo::operator< , testing EXPECT_GT will has a compile error.

DiggerLin updated this revision to Diff 263701.May 13 2020, 7:01 AM
DiggerLin marked 4 inline comments as done.

address comment

jhenderson added inline comments.May 14 2020, 1:24 AM
llvm/unittests/MC/MCDisassemblerTest.cpp
27 ↗(On Diff #263701)

If EXPECT_GT doesn't work, perhaps you could test something like EXPECT_FALSE(SIT2 < SIT1); etc? Probably also EXPECT_FALSE(SIT1 < SIT1)?

I think it's important to demonstrate both the true cases, like you have done, and the false cases. If you don't then your less than operator function could be the below to make all the unit tests pass:

bool operator< (auto LHS, auto RHS) {
  return true;
}
DiggerLin marked 2 inline comments as done.May 15 2020, 10:20 AM
DiggerLin added inline comments.
llvm/unittests/MC/MCDisassemblerTest.cpp
27 ↗(On Diff #263701)

I do not think we need to test EXPECT_FALSE(SIT1 < SIT1) , if there are two symbol is the same, we do not care which symbol is display.

DiggerLin marked an inline comment as done.

address comment

jhenderson added inline comments.May 19 2020, 12:21 AM
llvm/unittests/MC/MCDisassemblerTest.cpp
27 ↗(On Diff #263701)

It's important to test this because a number of sorting-related algorithms use std::less to compare symbols. If two identical values were to compare as less to each other simultaneously, it will render the algorithm unstable and potentially non-deterministic. Non-deterministic code is always bad, because it can make it hard to reproduce things. In some cases, an unsuitable comparator may even cause assertions (I believe MSVC extra checks do checks to ensure comparators are sane).

Furthermore, you are focused on sorting, but the code you have written is more widely applicable. People may use it in the future assuming it works like they would expect. They would expect the less than operator to return false when comparing the same symbol to itself. This could result in bugs.

DiggerLin updated this revision to Diff 265523.May 21 2020, 9:39 AM
DiggerLin marked 2 inline comments as done.

address comment

jhenderson accepted this revision.May 27 2020, 12:38 AM

LGTM, with one nit and a suggestion. Please wait for somebody else to confirm too.

llvm/unittests/MC/MCDisassemblerTest.cpp
27 ↗(On Diff #265523)

Up to you, but it might be good to switch these EXPECT_LT lines for the more explicit EXPECT_TRUE(SIT1 < SIT2); for consistency with the EXPECT_FALSE checks. I'm happy either way though.

44 ↗(On Diff #265523)

compare with itself. -> comparing with themselves.

This revision is now accepted and ready to land.May 27 2020, 12:38 AM
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
85 ↗(On Diff #265523)

Nit: s/priorty/priority/;

90 ↗(On Diff #265523)

Please use the same construction as used for IsLabel to make a decision based on (SymInfo.)StorageMappingClass.hasValue() only if there is a difference between the two.

83 ↗(On Diff #259313)

Nit: Blank line still here.

llvm/unittests/MC/MCDisassemblerTest.cpp
25 ↗(On Diff #265523)

I think the description here puts too much emphasis on the "priority" result of this particular comparison when the "priority" is a concept that was meant to describe the sorting when the address is the same.

Suggestion:
"Test that higher addresses would appear later than lower ones when symbols are sorted in ascending order."

For future reference, it would be preferable if the less significant fields of the symbols differed in such a way that each field would suggest the reverse answer.

DiggerLin marked 6 inline comments as done.
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
83 ↗(On Diff #266623)

Blank line still here.

90 ↗(On Diff #265523)

Latest refresh did not contain new changes here?

llvm/unittests/MC/MCDisassemblerTest.cpp
16 ↗(On Diff #266623)

There's no need to create Optionals instead of passing in the values that they are created from directly.

21 ↗(On Diff #266623)

I think this symbol is too unnatural. I believe a label here with XMC_RW should work.

30 ↗(On Diff #266623)

s/Test symbol/Test that symbols/;

35 ↗(On Diff #266623)

s/Test symbol/Test that symbols/;
s/has/have/;

40 ↗(On Diff #266623)

Please see the requested change to the definition of SIT3. As it is, the ordering of the tests would have suggested that the presence of a StorageMappingClass is more significant than the property of being a label. The current test input would not match that expectation (but I don't think such input is possible).

DiggerLin marked 7 inline comments as done.May 28 2020, 7:39 AM
DiggerLin added inline comments.
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
90 ↗(On Diff #265523)

I do not think I can do it as IsLabel .

for example , if I implement as

if (StorageMappingClass != SymInfo.StorageMappingClass)

return  SymInfo.StorageMappingClass.hasValue();

it will have problem. we can get the code from optional.h

template <typename T, typename U>
bool operator==(const Optional<T> &X, const Optional<U> &Y) {

if (X && Y)
  return *X == *Y;
return X.hasValue() == Y.hasValue();

}

template <typename T, typename U>
bool operator!=(const Optional<T> &X, const Optional<U> &Y) {

return !(X == Y);

}

it will compare the value of storagemappingclass too , not only check whether it hasValue or not.

I change the code to

bool XCOFFSymbolInfo::operator<(const XCOFFSymbolInfo &SymInfo) const {
  // Label symbols have higher priority than non-label symbols.
  if (IsLabel != SymInfo.IsLabel)
    return SymInfo.IsLabel;

  // Symbols with a StorageMappingClass have higher priority than those without.
  if (StorageMappingClass.hasValue() != SymInfo.StorageMappingClass.hasValue())
    return  SymInfo.StorageMappingClass.hasValue();

  if (StorageMappingClass.hasValue()) {
    return getSMCPriority(StorageMappingClass.getValue()) <
           getSMCPriority(SymInfo.StorageMappingClass.getValue());
  }

  return false;
}
`

I do not think it is more simple than current implement, I keep the current implement, if you still want to change to above , I can change to as above code.

llvm/unittests/MC/MCDisassemblerTest.cpp
44 ↗(On Diff #265523)

thanks

DiggerLin updated this revision to Diff 266878.May 28 2020, 7:42 AM

LGTM with minor nits.

llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
90 ↗(On Diff #266878)

Minor nit: There's two spaces here; should only have one.

91 ↗(On Diff #266878)

There's trailing whitespace at the end of the line here.

llvm/unittests/MC/MCDisassemblerTest.cpp
33 ↗(On Diff #266878)

s/Test symbol/Test that symbols/;
s/that those/than those/;

34 ↗(On Diff #266878)

s/other/some other/;

jhenderson accepted this revision.May 29 2020, 2:29 AM

LGTM, once @hubert.reinterpretcast's comments have been addressed.

DiggerLin marked 4 inline comments as done.May 29 2020, 6:53 AM
This revision was automatically updated to reflect the committed changes.